To check Uniqueness in Document ID

Hi Team,

I had around 24 million of docs generated in bucket to perform some stress test. So before testing I just want to confirm if the doc id (meta(t).id) is generated sequentially or not. Is there any such NQL query to find out that one.

Thanks,
Debasis

META().id is the document key as it was inserted - it isn’t generated, it is user supplied. There is no serial identification of documents within the server - document identification is entirely down to what the user supplies as the key (and/or document content).

So you’d have to examine how the document keys were generated to identify them.

Typically a scheme including a key prefix would be used, for example "test_"||uuid() would generate unique keys all with the prefix “test_” for easy later identification. Or alternatively a field is sometimes added to the documents to group them instead.

If say uuid() alone was used to generate the document keys, then there is no way to distinguish them from others that also used uuid() for their keys.

HTH.

Actually I had used below pillowfight command to generate 24 million records . As per pillowfight documentation it should generate doc id sequentially i:e. a0000000000000000000 to a0000000000024000000. I just want to reconfirm this part so that we will perform some tests accordingly. Could you please help me how we can query to confirm this part.

date; /opt/couchbase/bin/cbc-pillowfight -U couchbase://xx.xx.xx.xx/data2 -u cbadmin -P cbadmin --min-size 1000 --max-size 1000 --json --set-pct 100 --batch-size 1 --num-items 100000000 --sequential --num-threads 1 --rate-limit 8000 --key-prefix a; date

Thanks,
Debasis

From the Server standpoint, pillowfight is just another client - so keys are still “user” determined.

The document key in the database is just a string. You can perform any operations on it you’d perform on other strings to filter.

e.g.

WHERE meta().id BETWEEN 'a0000000000000000000' AND 'a0000000000024000000'

or for more precision (this the above would include non-number trailing elements, e.g. ‘a0000000000000000000_2’):

WHERE meta().id BETWEEN 'a0000000000000000000' AND 'a0000000000024000000'
AND TONUMBER(SUBSTR(meta().id,LENGTH(meta().id)-8)) BETWEEN 0 AND 24000000

(Remember the more complex operations won’t push down to the index; the initial LIKE should. Also remember you’ll need a primary index for this.)

HTH.

This would be very slow but would identify any gaps in the range you’re looking at:

SELECT k                                                                                                                            
FROM ARRAY_RANGE(0,24000000) n
LET p = "00000000"||TO_STRING(n)
   ,k = 'a00000000000'||SUBSTR(p,LENGTH(p)-8)
WHERE NOT EXISTS (SELECT true FROM default USE KEYS[k])
;

If all you need to know is if there are gaps, not the missing keys themselves, then a COUNT would be more efficient.

HTH.

1 Like

(Make sure the prefix has the correct number of leading zeros after the ‘a’)

“As per pillowfight documentation it should generate doc id sequentially i:e. a0000000000000000000 to a0000000000024000000”

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.