How to choose the right Virtual Warehouse size in Snowflake for your workload
Snowflake uses Virtual Warehouse to execute your queries or run a batch load. The virtual warehouse is a fancy term for on-demand computing. For simplicity sake, Snowflake offers Warehouses in T-shirt sizes.
Servers / Cluster
Snowflake Credits / Hour
Snowflake is a Massive parallel processing data warehouse. Which means you need to have multiple active servers to take advantage of parallel computing. Snowflake recommends using X-Large for data warehouses (That's the default warehouse size if you create one from their web interface)
The bad news, There is no precise documentation on when to use what. The good news, Now its time to wake that little Einstein in you and get your hands dirty.
Choose a hardware considering your workload
Are you running a basic query against a table with tens of thousands of rows, You will probably want to choose an XS instance. Larger is not necessarily faster for small, basic queries.
Be heroic and experiment different sizes
Experiment with different types of queries and different warehouse sizes to determine the combinations that best meet your specific query needs and workload. Before you get worried to burn your employer's dollars, let me clarify Snowflake utilizes per-second billing, so you can run larger warehouses (Large, X-Large, 2X-Large, etc.) and simply suspend them when not in use and still charged only for what you used.
Tweak your warehouse size based on the response times
Upping the warehouse size linearly affects query times. 1hr query with XS warehouse takes 30m query with S warehouse, 15m query with M warehouse, etc..
So, rather than thinking in terms of data set sizes, you may want to think about how long your typical session usage pattern usually takes. Keep that in mind when you set your auto_suspend value.
Remember, It does take a while to provision large warehouses
Also, consider the larger warehouse sizes may take a little longer (> 3 minutes) to get provisioned or to scale up. So remember to consider the provisioning latency as well.
When to Scaling up vs Scaling out
Scale up by resizing a warehouse. Scale out by adding clusters to a warehouse
Resizing a warehouse generally improves query performance, particularly for larger, more complex queries. It can also help reduce the queuing that occurs if a warehouse does not have enough servers to process all the queries that are submitted concurrently
Whereas, Multi-cluster warehouses are designed specifically for handling queuing and performance issues related to large numbers of concurrent users and/or queries. In addition, multi-cluster warehouses can help automate this process if your number of users/queries tend to fluctuate.
Read great educational content like this and a lot more !
Members get free exclusive access to content, new courses, and discounts. Signup for a free account to write a post / comment / upvote posts. Creating an account takes less than 5 seconds and you can start earning badges & points too