How to fetch Random Samples in Google BigQuery

RAND() function can be used to generate a (seemingly) random number. You can use it to fetch a sample of 10 words from the Shakespeare dataset using:

#standardSQL
SELECT word
FROM `publicdata.samples.shakespeare`
WHERE RAND() < 10/164656

or even better:

#standardSQL
SELECT word
FROM `publicdata.samples.shakespeare`
WHERE RAND() < 10/(SELECT COUNT(*) FROM `publicdata.samples.shakespeare`)

(where 10 is the approximate number of results I want to get, and 164656 the number of rows that table has)

Atori

posted on 03 Aug 18

Enjoy great content like this and a lot more !

Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds