Document details in a particular vbucket

lakshram24 · August 27, 2024, 12:00pm

Hi team,

Is there a way to find the document IDs in a particular vbucket?

Thank you

mreiche · August 27, 2024, 5:43pm

I cannot think of a simple way to get the documentIds of all documents in a vbucket. What do you need them for?

If you have the documentId, you can compute which vbucket (partition) it is in with the calculation below. Most clusters have 1024 partitions; clusters on macOS have 64.

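  import java.util.zip.CRC32;

  // Maps a document ID to its vbucket; numPartitions must be a power of two.
  public static int partitionForKey(final byte[] id, final int numPartitions) {
    CRC32 crc32 = new CRC32();
    crc32.update(id, 0, id.length);
    // Keep 15 bits of the CRC32 value, starting at bit 16.
    long rv = (crc32.getValue() >> 16) & 0x7fff;
    // Equivalent to rv % numPartitions when numPartitions is a power of two.
    return (int) rv & (numPartitions - 1);
  }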
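For reference, here is a minimal self-contained version of the calculation above, wrapped in a hypothetical `VbucketCalc` class with a `main` method. It assumes document IDs are encoded as UTF-8 bytes and that the partition count is a power of two (1024, or 64 on macOS), which is what the bitmask relies on.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class: maps document IDs to vbucket (partition) numbers.
final class VbucketCalc {

  // Same calculation as partitionForKey above; numPartitions must be a power of two.
  static int partitionForKey(final byte[] id, final int numPartitions) {
    CRC32 crc32 = new CRC32();
    crc32.update(id, 0, id.length);
    long rv = (crc32.getValue() >> 16) & 0x7fff;
    return (int) rv & (numPartitions - 1);
  }

  // Convenience overload, assuming the document ID is UTF-8 encoded.
  static int partitionForKey(final String id, final int numPartitions) {
    return partitionForKey(id.getBytes(StandardCharsets.UTF_8), numPartitions);
  }

  public static void main(String[] args) {
    for (String id : new String[] {"airline_10", "route_11211"}) {
      // 1024 partitions on most clusters, 64 on macOS clusters
      System.out.println(id + " -> " + partitionForKey(id, 1024)
          + " (1024 partitions), " + partitionForKey(id, 64) + " (64 partitions)");
    }
  }
}
```

Because both partition counts are powers of two and the mask only keeps the low bits, the 64-partition result for a key is always its 1024-partition result modulo 64.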

You can get all the documentIds in a collection using collection.scan().
If you want to write code, you could copy the implementation in RangeScanOrchestrator and modify streamForPartitions() so that it scans only one specific partition (vbucket).

github.com/couchbase/couchbase-jvm-clients/blob/master/core-io/src/main/java/com/couchbase/client/core/kv/RangeScanOrchestrator.java

lakshram24 · August 28, 2024, 4:45am

Many thanks for your reply and the details, @mreiche; much appreciated.
I am looking for all the document IDs present in a particular vbucket. Is there a REST API call or query that provides these details? Kindly share.
Using the command below, I can find which vbucket a particular document is stored in:
/opt/couchbase/bin/cbc-hash documentID -U couchbase://localhost/test -u Administrator -P password

dh · August 28, 2024, 8:34am

You could use SQL++ to extract all the IDs in a bucket (or collection) and then use a script and cbc-hash to calculate their v-buckets. This would really only suit a one-off task, not integration into anything that runs frequently. [Edit: Collections share the v-bucket map with their bucket, so cbc-hash doesn’t need to distinguish between collections and the containing bucket.]

( Ref: https://docs.couchbase.com/server/current/cli/cbstats/cbstats-key.html#find-vbucket-ids )

e.g.

$ /usr/bin/curl -su Administrator:password http://localhost:8093/query/service -d 'statement=select meta().id from `travel-sample`'\
|jq -r '.results[].id'\
|xargs cbc-hash -U couchbase://localhost/travel-sample -u Administrator -P password 2>&1\
|sed 's/[:=,]/ /g'\
|awk '{print $1" "$3}'\
|head
airline_10 361
airline_10123 747
airline_10226 199
airline_10642 761
airline_10748 1006
airline_10765 873
airline_109 440
airline_112 881
airline_1191 867
airline_1203 745

(You can then of course sort/filter based on the second column as suits your needs.)
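If cbc-hash isn't available, the hashing step of that pipeline could in principle be done locally with the same CRC32 calculation mreiche posted. A sketch, assuming 1024 v-buckets and one document ID per line on standard input (for example, the output of the `jq -r` step); the class name `IdsToVbuckets` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical stand-in for the cbc-hash step: reads document IDs (one per line)
// and prints "<id> <vbucket>", the same two columns the awk step extracts.
final class IdsToVbuckets {
  static final int NUM_VBUCKETS = 1024; // assumption: use 64 for macOS clusters

  // CRC32-based mapping, same as partitionForKey earlier in the thread.
  static int vbucketFor(String id, int numVbuckets) {
    byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
    CRC32 crc32 = new CRC32();
    crc32.update(bytes, 0, bytes.length);
    long rv = (crc32.getValue() >> 16) & 0x7fff;
    return (int) rv & (numVbuckets - 1);
  }

  static String formatLine(String id) {
    return id + " " + vbucketFor(id, NUM_VBUCKETS);
  }

  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(
        new InputStreamReader(System.in, StandardCharsets.UTF_8));
    String line;
    while ((line = in.readLine()) != null) {
      if (!line.isEmpty()) { // skip blank lines
        System.out.println(formatLine(line));
      }
    }
  }
}
```

With Java 11+ single-file launching, this could replace the xargs/cbc-hash/sed/awk steps, e.g. `... | jq -r '.results[].id' | java IdsToVbuckets.java | head`.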

HTH.

lakshram24 · August 28, 2024, 9:24am

Thank you @dh for your time and the details. Here is what I am looking for: suppose 1024 vbuckets hold 2048 documents in a bucket, so with the CRC hash each vbucket holds roughly 2 documents on average. Is there a way to list, for a particular vbucket number (say 999), how many documents are present in it and what their document IDs are?

dh · August 28, 2024, 9:34am

You can use the same script: instead of piping to head, pipe to grep " 999$" to list only the keys in v-bucket 999. Or you could make the final awk step more elaborate, e.g.

/usr/bin/curl -su Administrator:password http://localhost:8093/query/service -d 'statement=select meta().id from `travel-sample`'\
|jq -r '.results[].id'\
|xargs cbc-hash -U couchbase://localhost/travel-sample -u Administrator -P password 2>&1\
|sed 's/[:=,]/ /g'\
|awk '$3==999{print $1;t+=1}END{print "Total keys: "t+0}'
airport_9399
landmark_28966
landmark_40011
landmark_40181
route_11211
route_11563
route_12030
route_12742
route_13934
route_14290
route_14472
route_21353
route_21421
route_24530
route_27681
route_27711
route_29136
route_30428
route_35539
route_36688
route_43364
route_43416
route_45054
route_45726
route_4756
route_48452
route_55959
route_5920
route_59329
route_7205
route_7395
route_9350
route_9422
Total keys: 33
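The filter-and-count done by the awk step above can be sketched in Java as well. This assumes the IDs have already been fetched (for example, from the SQL++ query) and reuses the CRC32 mapping from earlier in the thread; `KeysInVbucket` and `keysIn` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper: keeps only the IDs that hash to one target vbucket,
// mirroring the awk '$3==999{...}' filter and its total count.
final class KeysInVbucket {

  // CRC32-based mapping, same as partitionForKey earlier in the thread.
  static int vbucketFor(String id, int numVbuckets) {
    byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
    CRC32 crc32 = new CRC32();
    crc32.update(bytes, 0, bytes.length);
    long rv = (crc32.getValue() >> 16) & 0x7fff;
    return (int) rv & (numVbuckets - 1);
  }

  static List<String> keysIn(Iterable<String> ids, int target, int numVbuckets) {
    List<String> matches = new ArrayList<>();
    for (String id : ids) {
      if (vbucketFor(id, numVbuckets) == target) {
        matches.add(id);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Sample IDs only; in practice, pass in the full result of the SQL++ query.
    List<String> ids = List.of("airport_9399", "landmark_28966", "airline_10", "route_7205");
    List<String> inVb999 = keysIn(ids, 999, 1024);
    inVb999.forEach(System.out::println);
    System.out.println("Total keys: " + inVb999.size());
  }
}
```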

HTH.

dh · August 28, 2024, 9:46am

Here’s an alternative that lists all details in one go rather than having to run for each v-bucket:

curl -su Administrator:password http://localhost:8093/query/service -d 'statement=select meta().id from `travel-sample`'\
|jq -r '.results[].id'\
|xargs cbc-hash -U couchbase://localhost/travel-sample -u Administrator -P password 2>&1\
|sed 's/[:=,]/ /g'\
|sort -nk3\
|awk 'BEGIN{v=-1}{if ($3!=v) {if (c>0) print "Total: "c"\n"; v=$3; c=0; print "v-bucket: "v}; print "\t"$1;c++}END{if (c>0) print "Total: "c"\n"}'

(Sorry, I don’t know why the forum mark-up isn’t displaying the sed command correctly, but up to the sort, everything is the same as before.)
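For comparison, the grouping done by the sort/awk script above could be sketched in Java with a sorted map. This assumes the IDs are already in hand and 1024 v-buckets; `GroupByVbucket` is a hypothetical name, and IDs within a v-bucket keep their input order rather than the order `sort` would produce.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch: groups document IDs by vbucket number (ascending) and prints
// them in the same "v-bucket: N / ids / Total: c" layout as the awk script.
final class GroupByVbucket {

  // CRC32-based mapping, same as partitionForKey earlier in the thread.
  static int vbucketFor(String id, int numVbuckets) {
    byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
    CRC32 crc32 = new CRC32();
    crc32.update(bytes, 0, bytes.length);
    long rv = (crc32.getValue() >> 16) & 0x7fff;
    return (int) rv & (numVbuckets - 1);
  }

  // TreeMap keeps vbucket numbers sorted, playing the role of `sort -nk3`.
  static Map<Integer, List<String>> group(Iterable<String> ids, int numVbuckets) {
    Map<Integer, List<String>> byVb = new TreeMap<>();
    for (String id : ids) {
      byVb.computeIfAbsent(vbucketFor(id, numVbuckets), k -> new ArrayList<>()).add(id);
    }
    return byVb;
  }

  public static void main(String[] args) {
    // Sample IDs only; in practice, use the SQL++ query results.
    List<String> ids = List.of("airline_10", "airline_10123", "airline_10226", "route_11211");
    for (Map.Entry<Integer, List<String>> e : group(ids, 1024).entrySet()) {
      System.out.println("v-bucket: " + e.getKey());
      for (String id : e.getValue()) {
        System.out.println("\t" + id);
      }
      System.out.println("Total: " + e.getValue().size() + "\n");
    }
  }
}
```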

graham.pople · August 28, 2024, 10:02am

Hi @lakshram24, out of interest, why are you looking for this? Perhaps if we know your use case better, we can suggest alternative approaches.

lakshram24 · August 29, 2024, 9:11am

I appreciate your help on this, @graham.pople. I had a node failover caused by a hardware issue, and I am curious to know which document IDs are present in node1's vbuckets and node2's vbuckets. If you have links to any such troubleshooting queries, kindly share them.

graham.pople · August 29, 2024, 9:44am

@lakshram24 I see. Then I don’t think I have anything to offer beyond what my colleagues have already provided above, sorry.

lakshram24 · August 29, 2024, 10:30am

Thank you, @graham.pople; your query is very helpful, and I am now able to see the document details for a particular vbucket. Much appreciated.

dh · August 31, 2024, 9:30am

@lakshram24 I did just think of another “more” SQL approach, but it identifies v-buckets by UUID and not “number”.

cbq> SELECT meta().xattrs.`$document`.vbucket_uuid
            ,array_agg(meta().id) ids
     FROM `travel-sample`
     GROUP BY meta().xattrs.`$document`.vbucket_uuid
     ;
{
    "requestID": "e1e0f1b3-734e-48dd-9316-7ab5c7ae94a5",
    "signature": {
        "vbucket_uuid": "json",
        "ids": "array"
    },
    "results": [
    {
        "vbucket_uuid": "0x00003c56e7017fb6",
        "ids": [
            "airport_3542",
            "airport_8274",
            "airport_8496",
            "airport_8506",
            "landmark_16198",
            "landmark_25419",
            "landmark_32052",
            "landmark_37143",
            "route_126",
            "route_32674",
            "route_3369",
...

You can use something like:

cbstats localhost:11210 -u Administrator -p password\
 -b travel-sample vbucket-details|\
awk '/uuid/{printf("%s 0x%016x\n",gensub(":.*","",1,$1),$2)}'

to get a list of vbuckets and their corresponding UUIDs to match with the query results.
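Matching the two outputs can also be scripted. A Java sketch, assuming the cbstats/awk output lines look like `vb_N 0x<16 hex digits>` and that the query results have already been parsed into a map from `vbucket_uuid` to ID list; `UuidJoin`, `parseCbstats` and `join` are hypothetical names, and the sample UUIDs in `main` are placeholders, not real values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: joins the cbstats/awk output ("vb_N 0x<uuid>") with the
// SQL++ results (vbucket_uuid -> ids) so each group is labelled by vbucket name.
final class UuidJoin {

  // Parses lines such as "vb_999 0x0000aaaaaaaaaaaa" into uuid -> vbucket name.
  static Map<String, String> parseCbstats(List<String> lines) {
    Map<String, String> uuidToVb = new HashMap<>();
    for (String line : lines) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length == 2) { // ignore lines that are not "name uuid"
        uuidToVb.put(parts[1].toLowerCase(), parts[0]);
      }
    }
    return uuidToVb;
  }

  // Relabels query results; UUIDs missing from cbstats are kept as "unknown:<uuid>".
  // TreeMap sorts by vbucket name as a string (so "vb_10" sorts before "vb_2").
  static Map<String, List<String>> join(Map<String, List<String>> idsByUuid,
                                        Map<String, String> uuidToVb) {
    Map<String, List<String>> idsByVb = new TreeMap<>();
    for (Map.Entry<String, List<String>> e : idsByUuid.entrySet()) {
      String uuid = e.getKey().toLowerCase();
      String vb = uuidToVb.getOrDefault(uuid, "unknown:" + uuid);
      idsByVb.computeIfAbsent(vb, k -> new ArrayList<>()).addAll(e.getValue());
    }
    return idsByVb;
  }

  public static void main(String[] args) {
    // Placeholder data shaped like the outputs in this thread (not real UUIDs).
    List<String> cbstatsLines = List.of("vb_0 0x0000aaaaaaaaaaaa", "vb_1 0x0000bbbbbbbbbbbb");
    Map<String, List<String>> idsByUuid = Map.of(
        "0x0000aaaaaaaaaaaa", List.of("doc_a", "doc_b"),
        "0x0000cccccccccccc", List.of("doc_c"));
    join(idsByUuid, parseCbstats(cbstatsLines))
        .forEach((vb, ids) -> System.out.println(vb + " " + ids));
  }
}
```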

HTH.

lakshram24 · August 31, 2024, 10:02am

Thank you @dh for another alternative. I am getting an error while executing it:

cbq> \connect

SELECT meta().xattrs.$document.vbucket_uuid,array_agg(meta().id) ids FROM travel-sample GROUP BY meta().xattrs.$document.vbucket_uuid ;
ERROR 138 : Too many input arguments to command.

dh · August 31, 2024, 3:51pm

What’s your version? I quoted the execution for 7.6.

And here it is for the (out-of-support) 6.6.5:

cbq> SELECT meta().xattrs.`$document`.vbucket_uuid, array_agg(meta().id) ids FROM `travel-sample` GROUP BY meta().xattrs.`$document`.vbucket_uuid limit 1;
{
    "requestID": "b3d9d21e-4d12-4304-a502-120014b12382",
    "signature": {
        "ids": "array",
        "vbucket_uuid": "json"
    },
    "results": [
    {
        "ids": [
            "airport_7579",
            "landmark_22793",
            "landmark_7049",
            "route_1062",
            "route_12785",
            "route_14257",
            "route_14525",
            "route_1710",
            "route_21204",
            "route_21394",
            "route_21576",
            "route_24467",
            "route_27134",
            "route_28965",
            "route_29061",
            "route_4173",
            "route_43541",
            "route_4601",
            "route_4791",
            "route_51769",
            "route_54678",
            "route_57459",
            "route_5877",
            "route_62378",
            "route_64048",
            "route_9207",
            "route_9397",
            "route_9575"
        ],
        "vbucket_uuid": "0x0000bda53cc4d510"
    }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "1.808505262s",
        "executionTime": "1.80829058s",
        "resultCount": 1,
        "resultSize": 832
    }
}

dh · August 31, 2024, 3:56pm

Or if you’re not getting on with cbq, you could use the REST API:

$ curl -su Administrator:password http://localhost:8093/query/service -d 'pretty=true&statement=SELECT meta().xattrs.`$document`.vbucket_uuid, array_agg(meta().id) ids FROM `travel-sample` GROUP BY meta().xattrs.`$document`.vbucket_uuid limit 1'
{
    "requestID": "478d0eae-2042-44c9-a218-b041fb137f86",
    "signature": {
        "ids": "array",
        "vbucket_uuid": "json"
    },
    "results": [
    {
        "ids": [
            "airline_210",
            "airport_8648",
            "hotel_16648",
            "landmark_16434",
            "landmark_20134",
...
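The same REST request could also be sent from Java 11+ with `java.net.http.HttpClient`. A sketch assuming the host, port 8093, credentials and bucket name from the curl example above (without the `limit 1` used there for display); unlike `curl -d`, the form body is URL-encoded here. `QueryVbucketUuids` is a hypothetical name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: POSTs the GROUP BY vbucket_uuid statement to the Query service.
final class QueryVbucketUuids {

  static final String STATEMENT =
      "SELECT meta().xattrs.`$document`.vbucket_uuid, array_agg(meta().id) ids "
      + "FROM `travel-sample` GROUP BY meta().xattrs.`$document`.vbucket_uuid";

  // Builds an application/x-www-form-urlencoded body equivalent to curl's -d argument.
  static String formBody(String statement) {
    return "pretty=true&statement=" + URLEncoder.encode(statement, StandardCharsets.UTF_8);
  }

  // HTTP Basic auth header value, as curl -u produces.
  static String basicAuth(String user, String password) {
    String token = user + ":" + password;
    return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws Exception {
    HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8093/query/service"))
        .header("Authorization", basicAuth("Administrator", "password"))
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(formBody(STATEMENT)))
        .build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode());
    System.out.println(response.body());
  }
}
```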

HTH.

lakshram24 · September 1, 2024, 4:35am

Wonderful, @dh, it is working now. Thank you so much.

system · November 30, 2024, 4:36am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
