Bigquery Error: Table too large for JOIN. Consider using JOIN EACH

Problem:

I'm trying to run a query in BigQuery which has joins and a Group By Clause, It throws the error:

Table too large for JOIN. Consider using JOIN EACH. For more details, 
please see https://developers.google.com/bigquery/docs/query-reference#joins

Below is the problematic SQL,

select
t1.device_uuid,
t1.session_uuid,
t1.nth,
t1.Diamonds_Launch,
t2.Diamonds_Close
from (
    select
    device_uuid,
    session_uuid,
    nth,
    sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Launch
    from [project_id].[table_id]
    where name = 'App Launch'
    and attributes.Name = 'Inventory - Diamonds'
    group by device_uuid, session_uuid, nth
    ) as t1
join each (
    select
    device_uuid,
    session_uuid,
    nth,
    sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Close
    from [project_id].[table_id]
    where name = 'App Close'
    and attributes.Name = 'Inventory - Diamonds'
    group by device_uuid, session_uuid, nth
    ) as t2
on t1.device_uuid = t2.device_uuid
and t1.session_uuid = t2.session_uuid

Solution:

The Quey has got a GROUP BY inside a JOIN EACH. "GROUP BY" hits limits with cardinality (the number of distinct values) and the final grouping is not parallelizable. This limits BigQuery's ability to do the join.

If you change the GROUP BY to GROUP EACH BY, this will most likely work

dan-irving

posted on 10 Sep 18

Enjoy great content like this and a lot more !

Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds