How to determine bucket document types and count accurately

FarisAhmed · December 10, 2021, 4:19pm

Hi,

We are running Couchbase Server 6.6.2 Enterprise Edition together with Sync Gateway 2.8 and CBL. All our data resides in our main bucket. Sync Gateway is the only service that CREATES documents in the main bucket; our back-end services only modify existing data there.

At the moment our main bucket contains 10 million documents. Only 3 million of them were created by our CBL clients. We don’t know what the rest of the documents are.

How can we:

Identify all types of documents we have in the main bucket accurately?
Determine the number of documents for each type accurately?

The “Data Insights” pane on the right of the query editor shows incorrect values because it is based on a 1,000-document sample. How can we accurately determine the two numbers above?

Regards,
Faris Ahmed

dh · December 10, 2021, 4:44pm

Whilst it is what “Data Insights” uses under the covers, you may still want to look into INFER and set the sample size to something larger. Don't go as far as 7 million, though: that shouldn’t be statistically necessary (and would probably fail due to memory requirements).

But this isn’t 100% accurate - it is statistical sampling.
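
For reference, a minimal sketch of such an INFER call. The bucket name `main` is a placeholder and 10,000 is an arbitrary larger sample size, so adjust both to your setup:

```sql
-- Sample 10,000 documents instead of the default 1,000.
-- `main` is a placeholder bucket name; the sample size is an arbitrary choice.
INFER `main` WITH {"sample_size": 10000};
```

Each schema flavour in the result carries a `#docs` figure, but that is a count within the sample, not across the whole bucket.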

The only way to test/qualify every single document is to examine every single document, typically via queries. This would then come down to what you know of the documents - for example, if you know a field that only CBL clients would set, you can filter on it etc. to produce counts. If you simply have no knowledge of what fields are present in the documents then functions like OBJECT_NAMES would likely come in handy in initial queries to perhaps group/count documents based on what fields are present.

(Perhaps something like:

SELECT f, COUNT(1) AS cnt FROM (SELECT OBJECT_NAMES(t) AS f FROM `travel-sample` t LIMIT 1000) flds GROUP BY f

would give initial flavours; vary the limit once it has been deemed to be useful. If you have a “type” field of course, just use that to identify document types - similarly if the document keys carry type-identifying information, you can query/group/count etc. on that embedded information too.)
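
To make the exact-count options above concrete, here is a rough sketch of the three approaches. Everything named here is an assumption for illustration: the bucket name `main`, a `type` field, a `createdByClient` field that only CBL clients set, and document keys shaped like `order::123` with `::` as the delimiter. These queries scan every document, so they need a primary index (or a suitable secondary index) and may take a while on 10 million documents.

```sql
-- 1. Exact count per value of a (hypothetical) "type" field.
--    Documents without the field are grouped under "(no type)".
SELECT IFMISSINGORNULL(d.type, "(no type)") AS doc_type,
       COUNT(*) AS cnt
FROM `main` AS d
GROUP BY IFMISSINGORNULL(d.type, "(no type)")
ORDER BY cnt DESC;

-- 2. Exact count per key prefix, assuming keys look like "order::123".
SELECT SPLIT(META(d).id, "::")[0] AS key_prefix,
       COUNT(*) AS cnt
FROM `main` AS d
GROUP BY SPLIT(META(d).id, "::")[0]
ORDER BY cnt DESC;

-- 3. Exact count of documents carrying a field that only CBL clients set
--    ("createdByClient" is a made-up field name).
SELECT COUNT(*) AS cnt
FROM `main` AS d
WHERE d.createdByClient IS NOT MISSING;
```

If none of these markers exist in your data, the OBJECT_NAMES grouping is the fallback; drop its LIMIT once the output looks useful to get exact numbers.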

HTH.
