28 May 2019 · victor

What's the best way to copy data from a Hive table to BigQuery?

Hive tables are often stored in ORC format. This answer assumes ORC cannot be loaded into BigQuery directly, so you have to convert your ORC data into one of three formats: Avro, JSON, or CSV. I personally prefer Avro because its files carry their schema with the data, which makes the serialization more robust than JSON or CSV.

So the process to follow is:

  • Create your BigQuery table with the correct data types. This has to be the first step, so that Avro logical types such as timestamps are cast correctly.
  • Run a Hive query to generate the data in Avro format
  • Use distcp to copy the Avro files to Google Cloud Storage
  • Run "bq load" to load the files into your table
  • Verify the result by checking that the Hive and BigQuery tables contain the same data
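
As a rough illustration, the steps after table creation could be sketched in Python as below. This is only a sketch under assumptions, not a tested pipeline: the table names, paths, and the helpers build_pipeline and same_rows are hypothetical, the Hadoop cluster is assumed to have the Cloud Storage connector installed so that distcp can write to gs:// paths, and hdfs_dir is assumed to be the directory where Hive stores the files of the new Avro table. The script only builds and prints the commands; it does not run them.

```python
import shlex
from collections import Counter


def build_pipeline(hive_table, avro_table, hdfs_dir, gcs_dir, bq_table):
    """Return the commands for the export, copy, and load steps as argument lists.

    All names and paths are supplied by the caller; nothing is executed here.
    """
    # Export: copy the source table into a new Avro-backed Hive table (CTAS).
    hive_sql = f"CREATE TABLE {avro_table} STORED AS AVRO AS SELECT * FROM {hive_table}"
    export_cmd = ["hive", "-e", hive_sql]

    # Copy: move the Avro files from HDFS to Cloud Storage.
    # Assumes the Cloud Storage connector is available to Hadoop.
    copy_cmd = ["hadoop", "distcp", hdfs_dir, gcs_dir]

    # Load: append the files to the BigQuery table created beforehand.
    # --use_avro_logical_types asks bq to map Avro logical types
    # (for example timestamps) instead of loading their raw underlying types.
    load_cmd = [
        "bq", "load",
        "--source_format=AVRO",
        "--use_avro_logical_types",
        bq_table,
        gcs_dir.rstrip("/") + "/*",
    ]
    return [export_cmd, copy_cmd, load_cmd]


def same_rows(hive_rows, bq_rows):
    """Order-independent comparison of two row samples, duplicates included."""
    return Counter(hive_rows) == Counter(bq_rows)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    commands = build_pipeline(
        hive_table="db.events",
        avro_table="db.events_avro",
        hdfs_dir="hdfs:///warehouse/db.db/events_avro",
        gcs_dir="gs://my-bucket/events_avro",
        bq_table="my_dataset.events",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

For the final check, same_rows compares rows fetched from both sides as multisets, so row order does not matter but duplicated or missing rows are detected. On large tables, comparing row counts and per-column aggregates computed by each engine is likely more practical than fetching rows.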

