Sampling filtering with N1QL query

remy · March 21, 2022, 1:01pm

Hello,

What would be the best way to do “sampling” with a N1QL query?

The aim is to retrieve, for example, 10% or 20% of the complete result (by random selection filtering), and so reduce the size of the result data.

Regards.

Rémy

Kevin.Cherkauer · March 21, 2022, 6:06pm

@remy This does not answer your N1QL question, but on a related note I thought it worth pointing out that if you are trying to get a “good sample” for computing statistics such as sample variance and standard deviation, there is no need to select a sample that is a percentage of the total population. The confidence intervals for variance and standard deviation estimates do not depend on the population size, only on the sample size. In most cases, sampling 1,000 items randomly gives sufficient confidence. So it does not matter whether you have 10,000 items or 1,000,000,000 items: you only need to randomly sample a fixed number of them (e.g. 1,000) to do statistical analyses. (The idea that sample sizes must scale up with population size is a common misconception in the software world, in my experience.)
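A small simulation can illustrate the point about sample size. The sketch below is not from the thread: it assumes a made-up population of uniformly distributed values, and the function name `sample_std_spread` is hypothetical. It draws repeated samples of 1,000 items from populations of different sizes and measures how much the estimated standard deviation varies from one sample to the next.

```python
import random
import statistics


def sample_std_spread(population_size, sample_size=1000, trials=100, seed=0):
    """Spread (std dev) of the sample-std estimate over repeated random samples."""
    rng = random.Random(seed)
    # Hypothetical population: uniform values on [0, 100).
    population = [rng.uniform(0, 100) for _ in range(population_size)]
    estimates = []
    for _ in range(trials):
        # Draw a fixed-size sample without replacement.
        sample = rng.sample(population, sample_size)
        estimates.append(statistics.stdev(sample))
    return statistics.stdev(estimates)


# Same sample size, very different population sizes.
for n in (10_000, 200_000):
    print(n, round(sample_std_spread(n), 3))
```

Under these assumptions, the two printed spreads should come out close to each other, since the precision of the estimate is driven mainly by the 1,000-item sample size rather than by the size of the population.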

See this Wikipedia entry for the formula to compute sample variance – population size is nowhere in the formula:
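For reference, the standard (Bessel-corrected) sample variance, which is presumably the formula the entry refers to, depends only on the sample size $n$, the sampled values $x_i$, and their mean $\bar{x}$:

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```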

vsr1 · March 21, 2022, 7:23pm

You can add the random() function to the predicate.
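As a rough sketch of what this suggestion could look like: the query strings below are assumptions (the keyspace name `mybucket` is hypothetical), and the Python function only simulates locally what a `RANDOM() < 0.1` predicate does, namely keeping each row independently with a fixed probability.

```python
import random

# Assumed N1QL form of the suggestion; `mybucket` is a hypothetical keyspace.
SAMPLE_QUERY = "SELECT d.* FROM mybucket AS d WHERE RANDOM() < 0.1"

# Assumed variant for a fixed-size sample (e.g. 1,000 documents).
FIXED_SIZE_QUERY = "SELECT d.* FROM mybucket AS d ORDER BY RANDOM() LIMIT 1000"


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


rows = list(range(100_000))
kept = bernoulli_sample(rows, 0.1, seed=42)
# The kept fraction is close to 0.1, but generally not exactly 0.1.
print(len(kept) / len(rows))
```

Note that a random predicate gives an approximate percentage, not an exact row count, and the fixed-size variant probably has to sort the whole result before applying the limit, so it may be expensive on large keyspaces.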

remy · March 23, 2022, 2:19pm

Hello Kevin,

Thanks for the clarification! Indeed, you are right: there is no need to take a percentage, only to choose a fixed number.

Rémy
