Sampling filtering with N1QL query

Hello,

What is the best way to take a random “sample” with a N1QL query?

The aim is to retrieve, for example, 10% or 20% of the complete result (by filtering with a random selection), and so reduce the size of the result data.

Regards.

Rémy

@remy This does not answer your N1QL question, but on a related note I thought it worth pointing out: if you are trying to get a “good sample” for statistics such as sample variance and standard deviation, there is no need to select a sample that is a percentage of the total population. The confidence intervals for variance and standard deviation estimates depend on the sample size, not on the population size (as long as the population is much larger than the sample). In most cases, randomly sampling 1,000 items gives sufficient confidence. So whether you have 10,000 items or 1,000,000,000 items, you only need to randomly sample a fixed number of them (e.g. 1,000) to do statistical analyses. (In my experience, the idea that sample sizes must scale up with population size is a common misconception in the software world.)
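A fixed-size random sample can be drawn in a single pass, even when the total number of items is not known in advance. The sketch below uses reservoir sampling (Algorithm R), a standard technique swapped in here purely for illustration; it is not something N1QL itself provides. The normally distributed "population" (mean 100, standard deviation 15) is synthetic, hypothetical data.

```python
import random
import statistics


def reservoir_sample(items, k, rng):
    """Return up to k items chosen uniformly at random from an iterable (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


rng = random.Random(42)
for population_size in (10_000, 1_000_000):
    # Hypothetical population, generated lazily so it is never stored in full.
    population = (rng.gauss(100, 15) for _ in range(population_size))
    sample = reservoir_sample(population, 1_000, rng)
    print(population_size, len(sample), round(statistics.stdev(sample), 1))
```

Both printed standard deviations should land near 15 even though one population is 100 times larger than the other, because the precision of the estimate is governed by the 1,000-item sample.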

See this Wikipedia entry for the formula to compute sample variance; population size appears nowhere in the formula:
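For reference, the usual (Bessel-corrected) sample variance of n sampled values x_1, ..., x_n is shown below. Assuming independent draws from a normal distribution with variance σ², the variance of this estimate is 2σ⁴/(n − 1), which again involves only the sample size n and not the population size.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
\operatorname{Var}\!\left(s^2\right) = \frac{2\sigma^4}{n-1}
```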

You can add the random() function to the query predicate.
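A predicate such as `WHERE RANDOM() < 0.1` (assuming N1QL's `RANDOM()` returns a pseudo-random number in [0, 1)) keeps each row independently with probability 0.1. The Python sketch below mimics that filter on a hypothetical in-memory result set to show the consequence: the number of rows returned is itself random, close to 10% but generally not exactly 10%. The function name `bernoulli_sample` is illustrative only.

```python
import random


def bernoulli_sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`.

    Mimics a `WHERE RANDOM() < fraction` predicate: the result size is
    random, with expected value fraction * len(rows).
    """
    return [row for row in rows if rng.random() < fraction]


rng = random.Random(7)
rows = list(range(100_000))  # hypothetical query result
kept = bernoulli_sample(rows, 0.10, rng)
print(len(kept))  # close to 10,000, but usually not exactly
```

If an exact count is needed rather than an approximate percentage, a fixed-size technique such as reservoir sampling is a better fit.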

Hello Kevin,

Thanks for the clarification! Indeed, you're right: there is no need to take a percentage, only to choose a fixed number.

Rémy