Hi,
From CB7.1 doc, in order to use CBO, we need to run “update statistics” to gather distribution statistics for the expressions used in the query. My questions are:
-
After the initial run of “update statistics”, do we need to schedule a job to run it to refresh statistics or CB cluster automatically refresh statistics if it’s staled?
-
Is the statistics stored in system-like collection N1QL_CBO_STATS purged regularly?
Thanks,
Matthew
Hi,
Thank you for your quick response.
Does update statistics command have the parameter to only refresh stale documents or it always update stats on all documents in bucket?
At this moment, our production only uses default scope and collection. It has about 20B documents. I am afraid of the time will be very long for entire statistics update.
Thanks,
Matthew
Currently UPDATE STATISTICS does not have any way of differentiate “stale” vs “non-stale” bucket/collection. An UPDATE STATISTICS command currently always gather new optimizer statistics (from the entire bucket/collection). There is currently no way to only incrementally update optimizer statistics with newer documents. This aspect is also under evaluation for future enhancement.
In terms of the size of a bucket/collection, that’s not directly related to how long an UPDATE STATISTICS command will run. UPDATE STATISTICS uses random sampling to gather optimizer statistics, it does not need to scan the entire bucket/collection. Therefore even for large bucket/collection UPDATE STATISTICS should not take a very long time.
Hi,
Based on this thread How to Disable Cost Based Optimizer via REST API cmd
I can disable CBO using a command. do I need to run this command on every query node?
Thanks,
Matthew
The curl command given in that thread is a cluster-level setting, i.e. it affects the entire cluster. You don’t need to run it for each query node.
For more information on settings see:
1 Like