Loading large volumes of data into Snowflake using ODBC / JDBC
We all have our favorite ETL tools (SSIS, Informatica, Ab Initio, DataStage). When using these tools to load into a cloud data warehouse such as Snowflake, be cautious about the connection type you use. The easiest to set up is a relational connection (ODBC / JDBC), but it is also the slowest.
What happens with a Relational loader:
When you use an ODBC connection, the driver internally calls the Snowflake API to send data, and it is not a streaming operation: the client sends a batch of rows, waits for the server to acknowledge it, and then repeats. For small volumes this is fine, but as the volume grows, those round trips add up and throughput suffers.
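Conceptually, each round trip from a relational loader amounts to a batched INSERT like the sketch below. The table, columns, and values are illustrative only, not what any particular driver emits verbatim:

```sql
-- One round trip: the client sends this batch, then waits for the ACK
-- before sending the next one. Loading 10 million rows at 1,000 rows
-- per batch means 10,000 such request/acknowledge cycles.
INSERT INTO emp_basic (first_name, last_name, email)
VALUES ('Arlene',   'Davidovits', 'adavidovits@example.com'),
       ('Violette', 'Shermore',   'vshermore@example.com'),
       ('Ron',      'Mattys',     'rmattys@example.com');
```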
Here's the alternative: try a file loader.
Once the ETL processing is complete, write the results to a flat file, move the file to a Snowflake staging area, and then load it into the target table. Make sure the SnowSQL client is installed on the server where your ETL jobs run. SnowSQL is Snowflake's command-line interface (CLI) tool; it lets you move files to and from the cloud, as well as run SQL queries and table operations directly.
You can use the PUT command to transfer the files to the cloud:
put file://C:\getting-started\employees0*.csv @SALES_DB.SALES_DATA.%emp_basic;
Once the transfer is done, you can load the data into the table with COPY INTO:
copy into emp_basic
file_format = (type = csv field_optionally_enclosed_by='"')
pattern = '.*employees0[1-5].csv.gz'
on_error = 'skip_file';
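Putting the two steps above together, a minimal end-to-end load might look like the following sketch. The database, schema, table, and file names follow the examples in this post; the verification and cleanup steps at the end are optional additions:

```sql
-- Step 1: transfer the local CSV files to the table stage.
-- SnowSQL gzip-compresses them on the way up, hence the .gz pattern below.
put file://C:\getting-started\employees0*.csv @SALES_DB.SALES_DATA.%emp_basic;

-- Step 2: load everything that matched into the target table.
copy into emp_basic
  file_format = (type = csv field_optionally_enclosed_by='"')
  pattern = '.*employees0[1-5].csv.gz'
  on_error = 'skip_file';

-- Optional: verify the load, then clear the staged files.
select count(*) from emp_basic;
remove @%emp_basic pattern = '.*employees0.*';
```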
Check out the video: How to load data into Snowflake using SnowSQL.
SnowSQL compresses the files and transfers them to the cloud in parallel chunks, which makes it by far the more efficient way to bulk-load data into the cloud. It works well for smaller workloads too.
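Both of those behaviors can be tuned on the PUT command itself. PARALLEL and AUTO_COMPRESS are documented PUT options, though the values below are just examples:

```sql
-- Upload with up to 8 parallel threads (default is 4) and explicit
-- compression control; auto_compress = true gzips files that are not
-- already compressed before transfer.
put file://C:\getting-started\employees0*.csv @SALES_DB.SALES_DATA.%emp_basic
    parallel = 8
    auto_compress = true;
```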