Is CB cluster automatically update statistics?

Van2008 · April 17, 2023, 10:01pm

Hi,

From CB7.1 doc, in order to use CBO, we need to run “update statistics” to gather distribution statistics for the expressions used in the query. My questions are:

After the initial run of “update statistics”, do we need to schedule a job to run it to refresh statistics or CB cluster automatically refresh statistics if it’s staled?
Is the statistics stored in system-like collection N1QL_CBO_STATS purged regularly?

Thanks,

Matthew

bingjie.miao · April 17, 2023, 11:24pm

Currently optimizer statistics are gathered via UPDATE STATISTICS/ANALYZE command issued by the user, thus user needs to manually refresh optimizer statistics with UPDATE STATISTICS/ANALYZE command. The frequency of this refresh will depend on the changes of the bucket/collection since the last time UPDATE STATISTICS/ANALYZE is run. Active bucket/collection (with lots of mutations) will require more frequent refresh than a more stale bucket/collection (few mutations). Future enhancement is being evaluated to make this process more automatic.
Optimizer statistics stored in N1QL_CBO_STATS is purged only when a bucket/collection is being dropped, or when an index is being dropped. It is also overwritten by newer optimizer statistics collected with new UPDATE STATISTICS/ANALYZE commands. Use can also manually purge optimizer statistics by using the DELETE option for UPDATE STATISTICS/ANALYZE.

Van2008 · April 17, 2023, 11:56pm

Hi,

Thank you for your quick response.

Does update statistics command have the parameter to only refresh stale documents or it always update stats on all documents in bucket?

At this moment, our production only uses default scope and collection. It has about 20B documents. I am afraid of the time will be very long for entire statistics update.

Thanks,

Matthew

bingjie.miao · April 18, 2023, 12:15am

Currently UPDATE STATISTICS does not have any way of differentiate “stale” vs “non-stale” bucket/collection. An UPDATE STATISTICS command currently always gather new optimizer statistics (from the entire bucket/collection). There is currently no way to only incrementally update optimizer statistics with newer documents. This aspect is also under evaluation for future enhancement.
In terms of the size of a bucket/collection, that’s not directly related to how long an UPDATE STATISTICS command will run. UPDATE STATISTICS uses random sampling to gather optimizer statistics, it does not need to scan the entire bucket/collection. Therefore even for large bucket/collection UPDATE STATISTICS should not take a very long time.

Van2008 · April 21, 2023, 8:44pm

Hi,

Based on this thread How to Disable Cost Based Optimizer via REST API cmd

I can disable CBO using a command. do I need to run this command on every query node?

Thanks,

Matthew

bingjie.miao · April 21, 2023, 8:55pm

The curl command given in that thread is a cluster-level setting, i.e. it affects the entire cluster. You don’t need to run it for each query node.
For more information on settings see:

system · July 20, 2023, 8:55pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Gathering Stats In Couchbase Couchbase Server	4	311	March 4, 2024
Saving cluster statistics through an upgrade Couchbase Server	0	1627	October 24, 2013
Best scenarios where CBO will work efficiently Couchbase Server couchbase-capella	3	292	June 27, 2024
View are inconsistent very long after data creation Couchbase Server	5	2021	January 7, 2016
View auto update semantics Couchbase Server	8	3328	April 9, 2015

Is CB cluster automatically update statistics?

Related topics