How do I get all keys from the bucket?

ingenthr · March 22, 2010, 12:05am

It sounds like what you want is TAP. It is not fully supported, because if you’re not careful you can make requests that consume a lot of memory or other resources. See:
http://www.couchbase.com/wiki/display/couchbase/TAP+Protocol

Note that the Java client (couchbase.com/develop/java/current) has TAP implemented. Use it with caution and at your own risk. As long as you stay away from checkpoint and registration, it will generally be okay, but it can cause quite a bit of disk I/O if you have much more data on disk than in memory.

Updated Sept. 2016: TAP was superseded some time ago (around Couchbase Server 3.0) by a new protocol called DCP. The Go client now has DCP implemented as an unsupported/uncommitted feature. There is also a new Java DCP client as a separate library, also unsupported/uncommitted.

a_pintori · March 22, 2012, 12:04am

Hello,

I have a question about Couchbase 2.0.
I’m using Membase 1.7 and I need to retrieve all keys from the bucket.
I’ve read that Couchbase 2.0 adds query support.
Will it be possible with Couchbase 2.0 to query the bucket and retrieve all keys and/or values? How?

Thanks.

a_pintori · March 23, 2012, 12:01am

I’ve downloaded jtap (https://github.com/mikewied/jtap) and I’ve compiled this example to retrieve all keys:

— TapRunner.java —
import com.membase.jtap.*;
import com.membase.jtap.exporter.*;
import com.membase.jtap.ops.*;
public class TapRunner
{
    public static void main(String args[])
    {
        TapStreamClient client = new TapStreamClient("localhost", 11210, "default", null);
        Exporter exporter = new FileExporter("results.txt");
        CustomStream tapListener = new CustomStream(exporter, "node1");
        tapListener.keysOnly();
        tapListener.doDump();
        client.start(tapListener);
    }
}
— TapRunner.java —
It compiles without errors with:

javac -cp .:jtap.jar TapRunner.java

But when I run it I receive these errors:

java -cp .:jtap.jar TapRunner

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
	at com.membase.jtap.TapStreamClient.<init>(Unknown Source)
	at TapRunner.main(TapRunner.java:11)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:336)
	... 2 more

What is the problem?
Is this the only way to retrieve all keys with the TAP protocol, into a txt file?
Can’t I iterate over all the keys some other way?

Thanks!

ingenthr · March 23, 2012, 12:09am

Sorry for the confusion, but definitely do not use JTap. All of that functionality has been added to the Couchbase Java Client at couchbase.com/develop/java/current.

Again, fair warning: use this at your own risk (have a look at that wiki page).

pengu1n · August 26, 2012, 11:08pm

Is it possible to iterate through all the key/value pairs using the .NET API?

Thanks

ingenthr · August 26, 2012, 11:08pm

No, it’s not currently available in the .NET client library, and it’s only experimental in the Java client library.

adavidson · September 10, 2012, 11:04pm

Wouldn’t it be easiest to just add a view with the following map?

function (doc, meta) {
  emit(meta.id, null);
}

It does add the overhead of an index, but then you can just use the regular query() calls to get all keys.

dipti · September 11, 2012, 11:11pm

Yes, create a primary index as adavidson pointed out.
function (doc, meta) {
  emit(meta.id, null);
}
This will give you the ability to get all the doc IDs back, or to search over a range, etc. Then fetch the documents using the GET API or a multi-get (mget). That’s the most performant way.
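Before the IDs can be passed to GET or mget, they have to be pulled out of view rows like the sample response shown in this post. The following is a minimal Java sketch, assuming the raw response body is already available as a string. The class name ViewIdExtractor is hypothetical, and the regular expression only handles IDs without escaped quotes or backslashes, so a proper JSON parser is the safer choice in real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any Couchbase SDK.
public class ViewIdExtractor {
    // Matches "id":"<value>" in a view row. Assumes IDs contain no escaped
    // quotes or backslashes; use a real JSON parser for anything else.
    private static final Pattern ID_FIELD =
            Pattern.compile("\"id\"\\s*:\\s*\"([^\"\\\\]*)\"");

    /** Returns the document IDs found in a raw view response body, in order. */
    public static List<String> extractIds(String viewResponseJson) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID_FIELD.matcher(viewResponseJson);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String sample = "{\"total_rows\":7315,\"rows\":["
                + "{\"id\":\"110f033e61\",\"key\":\"110f033e61\",\"value\":null},"
                + "{\"id\":\"110f03499b\",\"key\":\"110f03499b\",\"value\":null}]}";
        // These IDs would then be fetched with individual gets or a multi-get.
        System.out.println(extractIds(sample)); // prints [110f033e61, 110f03499b]
    }
}
```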
One thing to remember, though, is that this will give you ONLY the documents that have been persisted and indexed. Given Couchbase’s asynchronous architecture, there may be additional documents in the managed cache that haven’t been persisted yet.
You can also use “limit” and “skip” to step through the result set.

http://127.0.0.1:8092/beer-sample/_design/dev_primary_key/view/primary…
http://127.0.0.1:8092/beer-sample/_design/dev_primary_key/view/primary…
{"total_rows":7315,"rows":[
{"id":"110f033e61","key":"110f033e61","value":null},
{"id":"110f03499b","key":"110f03499b","value":null},
{"id":"110f035200","key":"110f035200","value":null},
{"id":"110f035db2","key":"110f035db2","value":null},
{"id":"110f035e84","key":"110f035e84","value":null},
{"id":"110f03622c","key":"110f03622c","value":null},
{"id":"110f036718","key":"110f036718","value":null},
…
Hope this helps.
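Stepping through the result set with limit and skip, as suggested above, amounts to one request per page with a growing skip value. A minimal Java sketch that only builds the query strings, assuming the total row count is known from total_rows; the class and method names are hypothetical, and appending the strings to the view URL and sending the requests is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it only builds query strings.
public class SkipLimitPaging {
    /**
     * Builds the query strings needed to step through a view result of
     * totalRows rows, pageSize rows at a time, using limit and skip.
     */
    public static List<String> pageQueries(long totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (long skip = 0; skip < totalRows; skip += pageSize) {
            queries.add("limit=" + pageSize + "&skip=" + skip);
        }
        return queries;
    }

    public static void main(String[] args) {
        // 7315 is the total_rows value from the sample response above.
        for (String q : pageQueries(7315, 1000)) {
            System.out.println(q); // limit=1000&skip=0 up to limit=1000&skip=7000
        }
    }
}
```

Every page is a separate request. The server appears to have to walk past all skipped rows, so on large views the later pages can get noticeably slower, which matches the timings reported later in this thread.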

markvincze · March 23, 2017, 3:04pm

@dipti,
I tried to create a view with the following definition:

function (doc, meta) {
  emit(meta.id, null);
}

I’m querying it both by calling the view URL directly and through the .NET SDK.

My bucket contains ~17 million documents. If I request a range of documents near the beginning of the bucket, for example &limit=1000&skip=1000, it returns quickly, in under 200 ms. If I request 1000 documents starting at 500,000 (&limit=1000&skip=500000), it’s much slower and takes over 5 seconds. If I request documents starting at 10,000,000 (&limit=1000&skip=10000000), it takes two and a half minutes.

Is there a way to speed it up? Just by creating this view, was the “primary index” automatically created, or do I have to explicitly create it?
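One commonly used alternative for CouchDB-style views, not something confirmed in this thread, is keyset pagination: remember the last key of each page and start the next request at that key with startkey=<last key>&skip=1, so skip stays at 1 however deep you page. Below is a minimal Java sketch that builds those query strings. It assumes the view emits meta.id as its key (as in the map function above), so keys are unique; with non-unique keys you would also need startkey_docid. The class and method names are hypothetical, and the key has to be JSON-encoded and then URL-encoded before it goes into the query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; assumes unique view keys (meta.id).
public class KeysetPaging {
    /** Encodes a Java string as a JSON string literal (minimal escaping). */
    public static String toJsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // control chars
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /**
     * Builds the query string for the next page; pass null for the first page.
     * Later pages start at the last key already seen and skip only that row.
     */
    public static String nextPageQuery(String lastKeySeen, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (lastKeySeen == null) {
            return "limit=" + pageSize;
        }
        // URLEncoder produces form encoding, so spaces are rewritten as %20.
        String startkey = URLEncoder.encode(toJsonString(lastKeySeen), StandardCharsets.UTF_8)
                .replace("+", "%20");
        return "startkey=" + startkey + "&skip=1&limit=" + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(nextPageQuery(null, 1000));         // limit=1000
        System.out.println(nextPageQuery("110f03499b", 1000)); // startkey=%22110f03499b%22&skip=1&limit=1000
    }
}
```

In a loop, you would take the key of the last row of each response and feed it back in until a page comes back with fewer than pageSize rows.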

markvincze · March 24, 2017, 9:36am

I worked around this by not processing in batches at all: I simply download the whole JSON result without any skip or limit and load it all into memory at the start of my script.

nick-couchbase · March 24, 2017, 10:08am

I have a similar question.

Using the Java API, I often want to perform operations on large sets of data. I obviously don’t want to load everything into memory at once. What I think I want to do is load a batch of keys, perform the data changes, then move on to the next batch of keys. Is there a recommended way to do this?

avsej · March 24, 2017, 11:17am

You can use the Java DCP client to iterate over the bucket contents consistently. It does not accumulate objects into batches, but that can be done on the application side.
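Since the stream delivers items one at a time, the batching asked about above has to happen in the application. A minimal Java sketch of such a user-side batcher, assuming you call add() from whatever callback receives each key. KeyBatcher is a hypothetical helper, not part of any Couchbase library, and it is not thread-safe, so synchronize externally if callbacks arrive on several threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Collects items delivered one at a time (for example, keys from a stream
 * callback) and hands them to a handler in fixed-size batches.
 * Hypothetical helper, not part of any Couchbase library. Not thread-safe.
 */
public class KeyBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> handler;
    private List<T> current = new ArrayList<>();

    public KeyBatcher(int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.handler = handler;
    }

    /** Adds one item; runs the handler as soon as a batch is full. */
    public void add(T item) {
        current.add(item);
        if (current.size() == batchSize) {
            flush();
        }
    }

    /** Hands over any partially filled batch, e.g. once the stream has ended. */
    public void flush() {
        if (!current.isEmpty()) {
            List<T> batch = current;
            current = new ArrayList<>();
            handler.accept(batch);
        }
    }

    public static void main(String[] args) {
        KeyBatcher<String> batcher = new KeyBatcher<>(2,
                batch -> System.out.println("processing " + batch));
        for (String key : new String[] {"k1", "k2", "k3"}) {
            batcher.add(key); // in a real stream this would run inside the event callback
        }
        batcher.flush();
        // prints "processing [k1, k2]" then "processing [k3]"
    }
}
```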

You can start with

github.com/couchbase/java-dcp-client/blob/master/src/test/java/examples/AirportsInFrance.java

/*
 * Copyright (c) 2016 Couchbase, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package examples;

import java.util.concurrent.atomic.AtomicInteger;

import com.couchbase.client.core.event.CouchbaseEvent;

This file has been truncated. show original

Then look into the flow control settings:

github.com/couchbase/java-dcp-client/blob/master/src/test/java/examples/FlowControl.java

/*
 * Copyright (c) 2016 Couchbase, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package examples;

import java.util.concurrent.atomic.AtomicLong;

import com.couchbase.client.dcp.Client;

This file has been truncated. show original

If your application accumulates data and does not send back acknowledgements, the cluster will pause transmission until the application has processed the data and released the objects.
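The flow-control behaviour described above can be pictured as a byte window shared by sender and receiver. The following Java toy model is only an illustration under that assumption; FlowControlWindow and its methods are hypothetical and are not the java-dcp-client API, which is what the FlowControl.java example linked above demonstrates.

```java
/**
 * Toy model of flow control: the sender pauses once sending another message
 * would push the receiver's unacknowledged data past windowBytes, and resumes
 * after the receiver acknowledges what it has processed.
 * Hypothetical class for illustration, not the java-dcp-client API.
 */
public class FlowControlWindow {
    private final long windowBytes;
    private long unackedBytes = 0;

    public FlowControlWindow(long windowBytes) {
        this.windowBytes = windowBytes;
    }

    /** Sender side: may another message of this size be sent right now? */
    public synchronized boolean canSend(long messageBytes) {
        return unackedBytes + messageBytes <= windowBytes;
    }

    /** Sender side: record a message as delivered but not yet acknowledged. */
    public synchronized void onSent(long messageBytes) {
        unackedBytes += messageBytes;
    }

    /** Receiver side: the application has processed and released these bytes. */
    public synchronized void acknowledge(long processedBytes) {
        unackedBytes = Math.max(0, unackedBytes - processedBytes);
    }

    public synchronized long unacknowledged() {
        return unackedBytes;
    }

    public static void main(String[] args) {
        FlowControlWindow window = new FlowControlWindow(100);
        window.onSent(60);
        System.out.println(window.canSend(60)); // false: would exceed the window, so the sender pauses
        window.acknowledge(60);
        System.out.println(window.canSend(60)); // true: transmission can resume
    }
}
```

The point of the model is the last two lines: holding on to data without acknowledging it is what stalls the stream, so acknowledge as soon as each batch has been processed.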

Related topics

Topic		Replies	Views	Activity
Retrieving list of all keys from bucket ( Couchbase Java SDK 1.4) Java SDK java	3	4024	June 9, 2016
Fetching keys across all docs in a bucket from the CLI Couchbase Server	7	528	August 24, 2023
Retrieve all bucket documents from Couchbase server C SDK	9	5287	September 1, 2015
How to create a list of all the keys in a Couchbase cluster? SQL++ query	1	1309	July 22, 2019
Key dump - buckets 101 PHP SDK	2	2760	March 24, 2014
