05 Jun 20 · npack · #Bigquery ·   Bookmark   ×

How to find DISTINCT values in a ARRAY in Bigquery

Often, we have array datatypes and want to remove the duplicates (often called as dedup) and find out the unique values in an array in Bigquery, like shown in the example below:

Solution:

Much cleaner way is to use SQL UDF to encapsulate dedup logic as in below example and reuse it whenever needed:

#Declare the function once
#standardSQL
CREATE TEMP FUNCTION dedup(val ANY TYPE) AS ((
SELECT ARRAY_AGG(t)
FROM (SELECT DISTINCT * FROM UNNEST(val) v) t
));

And call the function for all the array columns that you want to perform dedup:

SELECT * REPLACE(
dedup(country) AS country,
dedup(product) AS product
)
FROM `project.dataset.table`

npack

posted on 05 Jun 20

Enjoy great content like this and a lot more !

Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds