BigQuery ROW_NUMBER Order by Error: Resources exceeded during query execution: The query could not be executed in allotted memory

Error Message: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 144% of limit. Top memory consumer(s): ORDER BY operations: 100%

Why does this happen?

Even though Big query is a distributed database and most queries are executed on multiple nodes, there are certain operations that requires all data to be processed on a single node due to calculation needs, which means that all the data needed for the calculation needs to exist on the same node. When your query requests for too much data to be processed on a single node that can no longer fit onto that node, you will get the “Resources exceeded during query execution” error and the whole query fails. You will not be billed for any failed queries in BigQuery

Which SQL Operations cause this?

  • Un-partitioned window functions like RANK() OVER() or ROW_NUMBER() OVER() will operate on a single node
  • If your query contains an ORDER BY clause, all the data is still going to be passed to a single node and then sorted

Solution:

If you are using a ROW_NUMBER() OVER() as id to generate a sequence number for a each row in your table. Use the below workaround.

Simply change from 

ROW_NUMBER() OVER(ORDER BY eventdate) AS STRING)

to 

ROW_NUMBER() OVER(PARTITION BY eventdate) AS STRING)

The above command generates a non-unique value, try the below commands instead if you want  a unique key

or 

CONCAT(CAST(ROW_NUMBER() OVER(PARTITION BY eventdate) AS STRING),'|',
(CAST(event_date AS STRING)) as id

or use

GENERATE_UUID()

nVector

posted on 15 Jan 20

Enjoy great content like this and a lot more !

Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds