30 October 2018        Add to Favorites   Report

How to choose the right Virtual Warehouse size in Snowflake for your workload

Snowflake uses Virtual Warehouse to execute your queries or run a batch load. The virtual warehouse is a fancy term for on-demand computing. For simplicity sake, Snowflake offers Warehouses in T-shirt sizes.

Warehouse Size

Servers / Cluster

Snowflake Credits / Hour

X-Small
11
Small22
Medium44
Large88
X-Large
1616
2X-Large
3232
3X-Large 
6464
4X-Large
128128



Snowflake is a Massive parallel processing data warehouse. Which means you need to have multiple active servers to take advantage of parallel computing. Snowflake recommends using X-Large for data warehouses (That's the default warehouse size if you create one from their web interface)

The bad news, There is no precise documentation on when to use what. The good news, Now its time to wake that little Einstein in you and get your hands dirty.

Choose a hardware considering your workload

Are you running a basic query against a table with tens of thousands of rows, You will probably want to choose an XS instance. Larger is not necessarily faster for small, basic queries.

Be heroic and experiment different sizes

Experiment with different types of queries and different warehouse sizes to determine the combinations that best meet your specific query needs and workload. Before you get worried to burn your employer's dollars, let me clarify Snowflake utilizes per-second billing, so you can run larger warehouses (Large, X-Large, 2X-Large, etc.) and simply suspend them when not in use and still charged only for what you used.

Tweak your warehouse size based on the response times 

Upping the warehouse size linearly affects query times. 1hr query with XS warehouse takes 30m query with S warehouse, 15m query with M warehouse, etc..

So, rather than thinking in terms of data set sizes, you may want to think about how long your typical session usage pattern usually takes. Keep that in mind when you set your auto_suspend value. 

Remember, It does take a while to provision large warehouses

Also, consider the larger warehouse sizes may take a little longer (> 3 minutes) to get provisioned or to scale up. So remember to consider the provisioning latency as well.

When to Scaling up vs Scaling out

Scale up by resizing a warehouse. Scale out by adding clusters to a warehouse

Resizing a warehouse generally improves query performance, particularly for larger, more complex queries. It can also help reduce the queuing that occurs if a warehouse does not have enough servers to process all the queries that are submitted concurrently

Whereas, Multi-cluster warehouses are designed specifically for handling queuing and performance issues related to large numbers of concurrent users and/or queries. In addition, multi-cluster warehouses can help automate this process if your number of users/queries tend to fluctuate.



Read great educational content like this and a lot more ! Create my free account now 🎁

Copied