Document details in a particular vbucket

Hi team,

Is there a way to find the document IDs in a particular vbucket?

Thank you

I cannot think of a simple way to get the documentIds of all documents in a vbucket. What do you need them for?

If you have the documentId, you can compute which vbucket (partition) it is in with the calculation below. Most clusters have 1024 partitions; clusters on macOS have 64 partitions.

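  import java.util.zip.CRC32;

  // Maps a document ID to its partition (vbucket) number.
  // numPartitions must be a power of two (e.g. 1024 or 64) for the mask to work.
  public static int partitionForKey(final byte[] id, final int numPartitions) {
    CRC32 crc32 = new CRC32();
    crc32.update(id, 0, id.length);
    long rv = (crc32.getValue() >> 16) & 0x7fff;
    return (int) rv & (numPartitions - 1);
  }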

You can get all the documentIds in a collection using collection.scan().
If you want to write code, you could copy the implementation in RangeScanOrchestrator, and have streamForPartitions() only do a specific partition (vbucket).
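If modifying the orchestrator is more than you need, a simpler (though less efficient) option is to take the full list of IDs and filter it client-side with the calculation above. The following is a minimal sketch of that idea, not SDK code: `VBucketFilter` and `idsInPartition` are names I made up, the IDs in `main` are placeholders, and 1024 partitions is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.CRC32;

// Illustrative helper, not part of the Couchbase SDK.
final class VBucketFilter {

  // Same calculation as partitionForKey above; numPartitions must be a power of two.
  static int partitionForKey(byte[] id, int numPartitions) {
    CRC32 crc32 = new CRC32();
    crc32.update(id, 0, id.length);
    long rv = (crc32.getValue() >> 16) & 0x7fff;
    return (int) rv & (numPartitions - 1);
  }

  // Keeps only the IDs whose computed partition equals the requested vbucket.
  static List<String> idsInPartition(List<String> ids, int vbucket, int numPartitions) {
    return ids.stream()
        .filter(id -> partitionForKey(id.getBytes(StandardCharsets.UTF_8), numPartitions) == vbucket)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Placeholder IDs; in practice these would come from collection.scan() or a query.
    List<String> ids = List.of("airline_10", "airline_10123", "route_9422");
    int target = partitionForKey("route_9422".getBytes(StandardCharsets.UTF_8), 1024);
    System.out.println("vbucket " + target + ": " + idsInPartition(ids, target, 1024));
  }
}
```

This still reads every ID in the collection, so it only makes sense for occasional troubleshooting, not for anything run frequently.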

Many thanks for your reply and the details, @mreiche, much appreciated.
I am looking for all the document IDs present in a particular vbucket. Is there a REST API call or query that returns these details? Kindly share.
Using the command below, I can find which vbucket a particular document is stored in.
/opt/couchbase/bin/cbc-hash documentID -U couchbase://localhost/test -u Administrator -P password

You could use SQL++ to extract all the IDs in a bucket (or collection) and use a script and cbc-hash to calculate the v-buckets. Obviously this would really only suit a one-off type task, not integration into anything that is to be run frequently. [Edit: Collections of course share the v-bucket map with the bucket so cbc-hash doesn’t need to distinguish between collections and the containing bucket.]

( Ref: https://docs.couchbase.com/server/current/cli/cbstats/cbstats-key.html#find-vbucket-ids )

e.g.

$ /usr/bin/curl -su Administrator:password http://localhost:8093/query/service -d 'statement=select meta().id from `travel-sample`'\
|jq -r '.results[].id'\
|xargs  cbc-hash -U couchbase://localhost/travel-sample -u Administrator -P password 2>&1\
|sed 's/[:=,]/ /g'\
|awk '{print $1" "$3}'\
|head
airline_10 361
airline_10123 747
airline_10226 199
airline_10642 761
airline_10748 1006
airline_10765 873
airline_109 440
airline_112 881
airline_1191 867
airline_1203 745

(You can then of course sort/filter based on the second column as suits your needs.)

HTH.

Thank you @dh for your time and the details. Suppose a bucket holds 2048 documents spread across 1024 vbuckets, so each vbucket holds roughly 2 documents under the CRC hash algorithm. Is there a way to list the document IDs in a particular vbucket, say number 999, along with how many documents it contains?

You can use the same script - instead of piping to head, pipe to grep " 999$" to list keys only in v-bucket 999, or you could be more elaborate with the prior processing, e.g.

/usr/bin/curl -su Administrator:password http://localhost:8093/query/service -d 'statement=select meta().id from `travel-sample`'\
|jq -r '.results[].id'\
|xargs  cbc-hash -U couchbase://localhost/travel-sample -u Administrator -P password 2>&1\
|sed 's/[:=,]/ /g'\
|awk '$3==999{print $1;t+=1}END{print "Total keys: "t}'
airport_9399
landmark_28966
landmark_40011
landmark_40181
route_11211
route_11563
route_12030
route_12742
route_13934
route_14290
route_14472
route_21353
route_21421
route_24530
route_27681
route_27711
route_29136
route_30428
route_35539
route_36688
route_43364
route_43416
route_45054
route_45726
route_4756
route_48452
route_55959
route_5920
route_59329
route_7205
route_7395
route_9350
route_9422
Total keys: 33

HTH.

Here’s an alternative that lists all details in one go rather than having to run for each v-bucket:

curl -su Administrator:password http://localhost:8093/query/service -d 'statement=select meta().id from `travel-sample`'\
|jq -r '.results[].id'\
|xargs  cbc-hash -U couchbase://localhost/travel-sample -u Administrator -P password 2>&1\
|sed 's/[:=,]/ /g'\
|sort -nk3\
|awk 'BEGIN{v=-1}{if ($3!=v) {if (c>0) print "Total: "c"\n"; v=$3; c=0; print "v-bucket: "v}; print "\t"$1;c++}END{if (c>0) print "Total: "c"\n"}'

(Sorry don’t know what’s going on with the mark-up not displaying the sed correctly but up to the sort, everything is the same as before.)
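If you would rather not spawn cbc-hash at all, the same grouping can be sketched in Java using the CRC32 calculation quoted earlier in the thread. Treat this as a rough sketch under assumptions: `VBucketGrouper` and `groupByPartition` are hypothetical names, 1024 partitions is assumed, and the IDs are expected as command-line arguments (a few placeholder IDs are used when none are given).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Illustrative helper, not part of any Couchbase tool or SDK.
final class VBucketGrouper {

  // CRC32-based partition calculation; numPartitions must be a power of two.
  static int partitionForKey(byte[] id, int numPartitions) {
    CRC32 crc32 = new CRC32();
    crc32.update(id, 0, id.length);
    long rv = (crc32.getValue() >> 16) & 0x7fff;
    return (int) rv & (numPartitions - 1);
  }

  // Groups IDs by computed vbucket; TreeMap keeps the vbuckets in numeric order.
  static Map<Integer, List<String>> groupByPartition(Iterable<String> ids, int numPartitions) {
    Map<Integer, List<String>> groups = new TreeMap<>();
    for (String id : ids) {
      int vb = partitionForKey(id.getBytes(StandardCharsets.UTF_8), numPartitions);
      groups.computeIfAbsent(vb, k -> new ArrayList<>()).add(id);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Placeholder IDs are used when no arguments are supplied.
    List<String> ids = args.length > 0
        ? Arrays.asList(args)
        : List.of("airline_10", "airline_10123", "route_9422");
    // Mirrors the layout of the awk report: vbucket header, keys, then a total.
    groupByPartition(ids, 1024).forEach((vb, keys) -> {
      System.out.println("v-bucket: " + vb);
      keys.forEach(k -> System.out.println("\t" + k));
      System.out.println("Total: " + keys.size() + "\n");
    });
  }
}
```

On a cluster with a different partition count (for example 64 on macOS), pass that number instead of 1024.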

Hi @lakshram24, out of interest, why are you looking for this? Perhaps if we knew your use case better we could suggest alternative approaches.

I appreciate your help on this, @graham.pople. I had a node failover caused by a hardware issue, and I am curious to know which document IDs are present in the vbuckets on node1 and in the vbuckets on node2. If you have any troubleshooting queries or links for this, kindly share.

@lakshram24 I see. Then I don’t think I have anything to offer beyond what my colleagues have already provided above, sorry.

Thank you @graham.pople, your query is very helpful and I am now able to see the document details in a particular vbucket. Your help is much appreciated.

@lakshram24 I did just think of another “more” SQL approach, but it identifies v-buckets by UUID and not “number”.

cbq> SELECT meta().xattrs.`$document`.vbucket_uuid
           ,array_agg(meta().id) ids
     FROM `travel-sample`
     GROUP BY meta().xattrs.`$document`.vbucket_uuid
     ;
{
    "requestID": "e1e0f1b3-734e-48dd-9316-7ab5c7ae94a5",
    "signature": {
        "vbucket_uuid": "json",
        "ids": "array"
    },
    "results": [
    {
        "vbucket_uuid": "0x00003c56e7017fb6",
        "ids": [
            "airport_3542",
            "airport_8274",
            "airport_8496",
            "airport_8506",
            "landmark_16198",
            "landmark_25419",
            "landmark_32052",
            "landmark_37143",
            "route_126",
            "route_32674",
            "route_3369",
...

You can use something like:

cbstats localhost:11210 -u Administrator -p password\
 -b travel-sample vbucket-details|\
awk '/uuid/{printf("%s 0x%016x\n",gensub(":.*","",1,$1),$2)}'

to get a list of vbuckets and their corresponding UUIDs to match with the query results. (gensub is a GNU awk extension, so this needs gawk.)
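If gawk is not available, the same UUID-to-number lookup could be built in Java. This is only a sketch: it assumes the cbstats lines look like `vb_<n>:uuid: <decimal>`, which is what the awk script implies, and `VBucketUuidMap` and `uuidToVbucket` are hypothetical names. The UUID values in `main` are made-up placeholders, not real cluster output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of any Couchbase tool or SDK.
final class VBucketUuidMap {

  // Assumed line shape, inferred from the awk script: "vb_<n>:uuid: <decimal>".
  private static final Pattern UUID_LINE =
      Pattern.compile("^\\s*vb_(\\d+):uuid:\\s+(\\d+)\\s*$");

  // Returns a map from the hex UUID (formatted like the query output) to the vbucket number.
  static Map<String, Integer> uuidToVbucket(List<String> lines) {
    Map<String, Integer> result = new LinkedHashMap<>();
    for (String line : lines) {
      Matcher m = UUID_LINE.matcher(line);
      if (m.matches()) {
        long uuid = Long.parseUnsignedLong(m.group(2));
        // Same formatting as the awk printf: 0x followed by 16 zero-padded hex digits.
        result.put(String.format("0x%016x", uuid), Integer.parseInt(m.group(1)));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Placeholder lines; real input would be the output of cbstats vbucket-details.
    List<String> sample = List.of(" vb_0:uuid: 255", " vb_0:high_seqno: 42", " vb_1:uuid: 4096");
    System.out.println(uuidToVbucket(sample));
  }
}
```

Looking up a `vbucket_uuid` value from the query results in this map then gives the vbucket number.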

HTH.

Thank you @dh for another alternative. I am getting an error while executing it.

cbq> \connect

SELECT meta().xattrs.$document.vbucket_uuid,array_agg(meta().id) ids FROM travel-sample GROUP BY meta().xattrs.$document.vbucket_uuid ;
ERROR 138 : Too many input arguments to command.

What’s your version? I quoted the execution for 7.6.

And here it is for the (out of support) 6.6.5:

cbq> SELECT meta().xattrs.`$document`.vbucket_uuid, array_agg(meta().id) ids FROM `travel-sample` GROUP BY meta().xattrs.`$document`.vbucket_uuid limit 1;
{
    "requestID": "b3d9d21e-4d12-4304-a502-120014b12382",
    "signature": {
        "ids": "array",
        "vbucket_uuid": "json"
    },
    "results": [
    {
        "ids": [
            "airport_7579",
            "landmark_22793",
            "landmark_7049",
            "route_1062",
            "route_12785",
            "route_14257",
            "route_14525",
            "route_1710",
            "route_21204",
            "route_21394",
            "route_21576",
            "route_24467",
            "route_27134",
            "route_28965",
            "route_29061",
            "route_4173",
            "route_43541",
            "route_4601",
            "route_4791",
            "route_51769",
            "route_54678",
            "route_57459",
            "route_5877",
            "route_62378",
            "route_64048",
            "route_9207",
            "route_9397",
            "route_9575"
        ],
        "vbucket_uuid": "0x0000bda53cc4d510"
    }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "1.808505262s",
        "executionTime": "1.80829058s",
        "resultCount": 1,
        "resultSize": 832
    }
}

Or if you’re not getting on with cbq, you could use the REST API:

$ curl -su Administrator:password http://localhost:8093/query/service -d 'pretty=true&statement=SELECT meta().xattrs.`$document`.vbucket_uuid, array_agg(meta().id) ids FROM `travel-sample` GROUP BY meta().xattrs.`$document`.vbucket_uuid limit 1'
{
    "requestID": "478d0eae-2042-44c9-a218-b041fb137f86",
    "signature": {
        "ids": "array",
        "vbucket_uuid": "json"
    },
    "results": [
    {
        "ids": [
            "airline_210",
            "airport_8648",
            "hotel_16648",
            "landmark_16434",
            "landmark_20134",
...

HTH.

Wonderful, @dh, it is working now. Thank you so much.
