Optimizing Storage Costs with Cloudflare R2 for Data Ingestion into Snowflake
In an increasingly cost-conscious business landscape, one area ripe for exploration is migrating data lakes away from hyperscalers. Egress fees, the charges incurred when transferring data out of a cloud provider's network, are a significant cost driver on hyperscalers. Cloudflare R2 not only offers lower storage costs but also charges no egress fees at all. Depending on a company's data usage patterns, this can result in substantial savings. R2 also includes a generous free tier of up to 10 GB of storage per month.
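To make the egress argument concrete, here is a minimal sketch of the arithmetic in Python. The per-GB rates below are hypothetical placeholders chosen only to illustrate the calculation, not quoted prices from any provider; the 10 GB free storage allowance is the only figure taken from this post. Substitute the current rates from your own providers' pricing pages.

```python
def monthly_cost(stored_gb, egress_gb, storage_rate, egress_rate, free_storage_gb=0.0):
    """Estimate a monthly bill in dollars from storage and egress volumes.

    All rates are dollars per GB. Storage below the free allowance is not billed.
    """
    billable_storage = max(stored_gb - free_storage_gb, 0.0)
    return billable_storage * storage_rate + egress_gb * egress_rate


# Hypothetical rates for illustration only (assumptions, not real price lists).
HYPERSCALER = {"storage_rate": 0.02, "egress_rate": 0.09, "free_storage_gb": 0.0}
ZERO_EGRESS = {"storage_rate": 0.015, "egress_rate": 0.0, "free_storage_gb": 10.0}

if __name__ == "__main__":
    # A small project: 5 GB stored, 50 GB read out of the provider per month.
    for name, rates in (("hyperscaler", HYPERSCALER), ("zero-egress", ZERO_EGRESS)):
        print(name, round(monthly_cost(5, 50, **rates), 2))
```

With these placeholder numbers, the egress term dominates the hyperscaler bill, which is the pattern the paragraph above describes: the more often data leaves the provider, the more a zero-egress store saves.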
Snowflake, the leader in next-gen cloud computing, offers faster and more cost-effective solutions than its competitors, hyperscalers included. Although Snowflake is designed for high data consumption, it is often more affordable for small businesses than a hyperscaler environment. This will be discussed further in a future blog post.
Recently, I stumbled upon a blog by Felipe Hoffa that detailed using Cloudflare R2 as the data storage location for external tables in Snowflake (https://medium.com/snowflake/snowflake-tables-on-cloudflare-r2-b5496c4ae9ec). I had a project that required less than 5 GB of storage per month, and I wanted to avoid paying for storage if possible. Therefore, I decided to experiment with Cloudflare's free tier as my data lake. The results were nothing short of impressive.
Here's a step-by-step guide to what I did:
1. Create FREE Cloudflare account
2. Enable Cloudflare R2 (a credit card is required, but it's free as long as you stay within the free tier limits)
3. Create API token for the bucket (you will need this when creating the Snowflake Stage)
4. Create FREE Snowflake testing account
https://signup.snowflake.com
5. Since S3-compatible storage is in preview, open a ticket with Snowflake support to have the endpoint added for use.
6. Create Snowflake Table
7. Create File Format
8. Create External Stage (with API token information)
9. Stage file in Cloudflare bucket
(I just dragged and dropped the file into the bucket.)
10. Run COPY INTO command to load the data into my RAW table
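Steps 6 through 8 and step 10 can be sketched as four SQL statements. The Python helper below only builds the statements as strings so they can be pasted into a Snowflake worksheet; it does not connect to Snowflake. Everything named here is a hypothetical placeholder: the table name and its columns, the file format and stage names, the CSV options, and the account ID, bucket, and key values. The `s3compat://` URL scheme, the `ENDPOINT` parameter, and the `<account_id>.r2.cloudflarestorage.com` endpoint shape reflect my understanding of the Snowflake and Cloudflare documentation, so check them against the current docs before relying on them.

```python
def build_statements(account_id, bucket, key_id, secret_key,
                     table="raw_events", file_format="csv_format", stage="r2_stage"):
    """Return the SQL for the table, file format, stage, and load as strings."""
    # R2 exposes an S3-compatible endpoint per Cloudflare account (assumed shape).
    endpoint = f"{account_id}.r2.cloudflarestorage.com"
    return [
        # Step 6: target table; the two columns are placeholders.
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, payload STRING)",
        # Step 7: file format; assumes a CSV file with one header row.
        f"CREATE FILE FORMAT IF NOT EXISTS {file_format} TYPE = CSV SKIP_HEADER = 1",
        # Step 8: external stage; the R2 API token supplies the key pair.
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"URL = 's3compat://{bucket}/' "
        f"ENDPOINT = '{endpoint}' "
        f"CREDENTIALS = (AWS_KEY_ID = '{key_id}' AWS_SECRET_KEY = '{secret_key}')",
        # Step 10: load every staged file into the table.
        f"COPY INTO {table} FROM @{stage} "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')",
    ]


if __name__ == "__main__":
    # Placeholder values; never hard-code real credentials.
    for sql in build_statements("my_account_id", "my-bucket", "MY_KEY_ID", "MY_SECRET"):
        print(sql + ";")
```

Keeping the statements as plain strings makes it easy to review them before running anything, and the same list could later be fed to whichever Snowflake client you already use.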
Here are my results!
These are the current vendors supporting S3-compatible storage:
Cloudflare
Cloudian
Dell
Hitachi Content Platform
MinIO
NetApp (StorageGRID)
PureStorage
Scality
By leveraging Cloudflare R2 as a data lake for ingestion into Snowflake, businesses can optimize their storage costs and potentially benefit from significant cost savings.
Contact Moser Consulting for more information.