To check Uniqueness in Document ID

Debasis_Mallick · May 18, 2023, 5:44am

Hi Team,

I have around 24 million docs generated in a bucket to perform some stress tests. Before testing, I just want to confirm whether the doc ID (meta(t).id) is generated sequentially or not. Is there a N1QL query to find that out?

Thanks,
Debasis

dh · May 18, 2023, 7:37am

META().id is the document key as it was inserted - it isn’t generated, it is user supplied. There is no serial identification of documents within the server - document identification is entirely down to what the user supplies as the key (and/or document content).

So you’d have to examine how the document keys were generated to identify them.

Typically a scheme including a key prefix would be used, for example "test_"||uuid() would generate unique keys all with the prefix “test_” for easy later identification. Or alternatively a field is sometimes added to the documents to group them instead.
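As a client-side illustration of the prefix scheme above, here is a minimal Python sketch. The function name `make_key` is hypothetical, and Python's `uuid.uuid4()` is used as a stand-in for the server-side uuid() function; the point is only that the key is built by the client before the document is inserted.

```python
import uuid


def make_key(prefix: str = "test_") -> str:
    """Return a unique document key that carries a recognisable prefix."""
    # uuid4() gives a random UUID; the prefix makes the keys easy to find later.
    return prefix + str(uuid.uuid4())


keys = [make_key() for _ in range(3)]
for key in keys:
    print(key)
```

Keys built this way can later be selected with a simple prefix filter on the key string.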

If say uuid() alone was used to generate the document keys, then there is no way to distinguish them from others that also used uuid() for their keys.

HTH.

Debasis_Mallick · May 18, 2023, 9:16am

Actually, I used the pillowfight command below to generate 24 million records. As per the pillowfight documentation, it should generate doc IDs sequentially, i.e. a0000000000000000000 to a0000000000024000000. I just want to reconfirm this so that we can perform some tests accordingly. Could you please help with a query to confirm this?

date; /opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data2 -u cbadmin -P cbadmin --min-size 1000 --max-size 1000 --json --set-pct 100 --batch-size 1 --num-items 100000000 --sequential --num-threads 1 --rate-limit 8000 --key-prefix a; date

Thanks,
Debasis

dh · May 18, 2023, 12:33pm

From the Server standpoint, pillowfight is just another client - so keys are still “user” determined.

The document key in the database is just a string. You can perform any operations on it you’d perform on other strings to filter.

e.g.

WHERE meta().id BETWEEN 'a0000000000000000000' AND 'a0000000000024000000'

or for more precision (the above would include keys with non-numeric trailing elements, e.g. ‘a0000000000000000000_2’):

WHERE meta().id BETWEEN 'a0000000000000000000' AND 'a0000000000024000000'
AND TONUMBER(SUBSTR(meta().id,LENGTH(meta().id)-8)) BETWEEN 0 AND 24000000

(Remember the more complex operations won’t push down to the index; the initial BETWEEN should. Also remember you’ll need a primary index for this.)
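To see why the second condition matters, here is a rough Python sketch of the two filters applied to plain strings. It assumes Python's code-point string comparison behaves like the server's string ordering for these ASCII keys, and the digit check is only an approximation of TONUMBER returning a non-number for a suffix such as '000000_2'. The function names are hypothetical.

```python
LOW = "a0000000000000000000"
HIGH = "a0000000000024000000"


def in_string_range(key: str) -> bool:
    """Mirror of: meta().id BETWEEN LOW AND HIGH (plain string comparison)."""
    return LOW <= key <= HIGH


def in_numeric_range(key: str) -> bool:
    """String range check plus a numeric check on the last 8 characters."""
    if not in_string_range(key):
        return False
    suffix = key[-8:]
    # Reject suffixes that are not plain ASCII digits (approximates TONUMBER failing).
    if not (suffix.isascii() and suffix.isdigit()):
        return False
    return 0 <= int(suffix) <= 24_000_000


stray = LOW + "_2"  # a key with a non-numeric trailing element
print(in_string_range(stray), in_numeric_range(stray))
```

The stray key sorts between the two bounds as a string, so only the numeric check on the suffix excludes it.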

HTH.

dh · May 18, 2023, 2:47pm

This would be very slow but would identify any gaps in the range you’re looking at:

SELECT k
FROM ARRAY_RANGE(0,24000000) n
LET p = "00000000"||TO_STRING(n)
   ,k = 'a00000000000'||SUBSTR(p,LENGTH(p)-8)
WHERE NOT EXISTS (SELECT true FROM default USE KEYS[k])
;

If all you need to know is if there are gaps, not the missing keys themselves, then a COUNT would be more efficient.
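The same two ideas (listing the missing keys versus only counting) can be sketched client-side in Python, given a collection of keys already fetched from the bucket. This is an illustration on small ranges, not something tuned for 24 million keys. The names `expected_key`, `missing_keys` and `has_gaps` are hypothetical, and the default width of 19 digits after the prefix is an assumption that must be adjusted to the real key format.

```python
from typing import Iterable, Iterator


def expected_key(n: int, prefix: str = "a", width: int = 19) -> str:
    """Build the zero-padded key for item n (width is an assumption)."""
    return f"{prefix}{n:0{width}d}"


def missing_keys(existing: Iterable[str], start: int, stop: int,
                 prefix: str = "a", width: int = 19) -> Iterator[str]:
    """Yield every expected key in [start, stop) that is not present."""
    present = set(existing)
    for n in range(start, stop):
        key = expected_key(n, prefix, width)
        if key not in present:
            yield key


def has_gaps(existing: Iterable[str], start: int, stop: int,
             prefix: str = "a", width: int = 19) -> bool:
    """Cheaper check: compare the count of matching keys with the range size."""
    expected = {expected_key(n, prefix, width) for n in range(start, stop)}
    return len(expected & set(existing)) != len(expected)


sample = [expected_key(n) for n in range(10) if n not in (3, 7)]
print(list(missing_keys(sample, 0, 10)))
print(has_gaps(sample, 0, 10))
```

As in the query above, the end of the range is exclusive, so `stop` should be one past the last expected item number.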

HTH.

dh · May 18, 2023, 2:52pm

(Make sure the prefix has the correct number of leading zeros after the ‘a’)
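One way to get the number of zeros right is to derive it from a key that actually exists in the bucket rather than counting by eye. A minimal Python sketch, assuming the keys are the prefix followed only by decimal digits; `infer_width` is a hypothetical helper name:

```python
def infer_width(sample_key: str, prefix: str = "a") -> int:
    """Return how many digits follow the prefix in a sample key."""
    if not sample_key.startswith(prefix):
        raise ValueError(f"key does not start with prefix {prefix!r}")
    digits = sample_key[len(prefix):]
    # Only plain ASCII digits are accepted after the prefix.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("key has non-digit characters after the prefix")
    return len(digits)


# Hypothetical short key for illustration; use a real key from the bucket.
width = infer_width("a000123")
print(width)
```

The resulting width tells you how many characters the zero-padded number must occupy when building the keys to look up.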

mreiche · May 18, 2023, 3:11pm

“As per pillowfight documentation it should generate doc id sequentially i:e. a0000000000000000000 to a0000000000024000000”

system · August 16, 2023, 3:11pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
