How does Snowflake handle large data updates with data pointers and concurrent reads?

How does Snowflake handle large data updates, given that multiple virtual warehouses are created on the same baseline data? How are the data pointers updated in real time to avoid dirty reads?

vjcloud

posted on 05 Nov 19

nVector 05-Nov-19

Even though there are multiple virtual warehouses, there is only one underlying data store shared across them. As for how they maintain data integrity and avoid dirty reads, that information is not made available to the public. They handle all of these operations under the hood.

vjcloud 06-Nov-19

Thanks npack for your response. This seems to be one of the biggest factors for any organization looking to leverage Snowflake: users need to be assured that they are looking at a correct and up-to-date dataset.

But that is something Snowflake should clarify for its users. The question I have now is whether a large table can be updated (an update that might take at least 5-10 minutes) while an active user is running analytical queries against the same table. Do you happen to have any insights on that?

nVector 06-Nov-19

I am sure the queries won't block each other. The question is whether your second query operates on the pre-update data or on the latest data while the update is running. We will have to try it out with a large dataset to find out.
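
To make the "pre-update data vs. latest data" question concrete, here is a toy Python model of one common way to avoid dirty reads: each committed version of a table is immutable, a writer builds the new version off to the side, and the commit is a single pointer swap. This is only a sketch under that assumption, not Snowflake's actual implementation. `VersionedTable`, `snapshot`, `update`, and the two events are hypothetical names made up for the illustration.

```python
import threading


class VersionedTable:
    """Toy model (hypothetical): every committed version is an immutable tuple."""

    def __init__(self, rows):
        self._current = tuple(rows)    # pointer to the latest committed version
        self._lock = threading.Lock()  # serializes writers only; readers never take it

    def snapshot(self):
        # A reader picks up the pointer once and keeps that version for its whole query.
        return self._current

    def update(self, fn, started=None, may_commit=None):
        with self._lock:
            base = self._current
            if started is not None:
                started.set()                      # signal: the update is now in flight
            new_rows = tuple(fn(r) for r in base)  # build the new version off to the side
            if may_commit is not None:
                may_commit.wait()                  # stand-in for a long-running update
            self._current = new_rows               # commit = one pointer swap


table = VersionedTable([1, 2, 3])
started, may_commit = threading.Event(), threading.Event()

writer = threading.Thread(
    target=table.update, args=(lambda r: r * 10, started, may_commit)
)
writer.start()

started.wait()                     # the update has begun but has not committed
during_update = table.snapshot()   # concurrent read while the update is in flight

may_commit.set()                   # let the writer commit
writer.join()
after_commit = table.snapshot()

print(during_update)  # (1, 2, 3)
print(after_commit)   # (10, 20, 30)
```

In this model the concurrent reader is never blocked and never sees a half-applied update: it gets the last committed version until the pointer swap happens. Whether Snowflake behaves the same way would still need to be confirmed by the experiment described above.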