Parallel read from a bucket

EdgarLGB · March 21, 2019, 10:00am

I’m new to Couchbase and am wondering whether there is any way to implement a parallel read from a bucket. A bucket contains 1024 vBuckets by default, so would it be possible to split a N1QL query such as select * from bucket1 into several queries, each of which reads data from only a range of vBuckets, e.g. from vBucket 1 to vBucket 100? Since the partition key decides which node a value is persisted on, I think it should be possible to read part of the data in a bucket according to a range of partition keys. Could someone help me out with this? Thanks a lot.

vsr1 · March 21, 2019, 12:32pm

N1QL already fetches from KV using a pool of parallel threads (2.5 * number of cores) shared by all fetch requests. For CE, the number of cores is limited to 4.
If you really need more parallelism, get the document keys using the following N1QL and then use streaming SDK asynchronous calls across multiple threads to get the documents.

SELECT RAW META().id FROM bucket1

As you don't need any processing of the data, this avoids 2 hops of data (Data Node --> N1QL --> Client), uses the client machine's resources, and lets you control the number of threads in the client.
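
A minimal Python sketch of the client-side fan-out described above. `fetch_document` and `FAKE_BUCKET` are hypothetical stand-ins (not SDK names) for the key-value get call of whichever SDK you use; the in-memory dict is only there so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the data service: a dict keyed by document ID.
FAKE_BUCKET = {f"doc::{i}": {"value": i} for i in range(1000)}

def fetch_document(key):
    """Hypothetical placeholder for an SDK key-value get (assumption)."""
    return FAKE_BUCKET[key]

def fetch_all(keys, max_workers=8):
    """Fetch every key with a client-side thread pool; returns {key: document}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so keys and results line up.
        return dict(zip(keys, pool.map(fetch_document, keys)))

# In practice the keys would come from: SELECT RAW META().id FROM bucket1
keys = list(FAKE_BUCKET)
docs = fetch_all(keys, max_workers=8)
```

Here `max_workers` is the knob that controls how many threads the client uses, which is the point made above about controlling parallelism on the client side.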

EdgarLGB · March 21, 2019, 12:54pm

Thanks for the response!

EdgarLGB · March 21, 2019, 2:55pm

Thanks for the update! What do you mean by 2-hops of data?

EdgarLGB · March 21, 2019, 3:28pm

In fact, I have another question. Would it be possible to do a batch-wise read by splitting a bucket into several parts? The idea is to divide the bucket into small batches of a given size and then read these batches one by one. Thanks in advance for the reply.

vsr1 · March 21, 2019, 3:39pm

You can use OFFSET/LIMIT and repeat the statement with the next offset each time.
SELECT RAW META().id FROM bucket1 OFFSET 0 LIMIT 10000;
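
A rough Python sketch of that repeat-with-next-offset loop. `run_query` and `ALL_IDS` are hypothetical: `run_query` stands in for executing the statement through your SDK, and it slices an in-memory list so the example is runnable. The sketch also assumes the result order is stable between statements, which may require an ORDER BY.

```python
# Hypothetical data standing in for the document IDs in bucket1.
ALL_IDS = [f"doc::{i:05d}" for i in range(25)]

def run_query(offset, limit):
    """Hypothetical stand-in for executing
    SELECT RAW META().id FROM bucket1 OFFSET <offset> LIMIT <limit>;"""
    return ALL_IDS[offset:offset + limit]

def read_in_batches(limit):
    """Yield pages of document IDs until a statement returns nothing."""
    offset = 0
    while True:
        page = run_query(offset, limit)
        if not page:
            break
        yield page
        offset += limit  # the next statement starts where this page ended

batches = list(read_in_batches(limit=10))
batch_sizes = [len(b) for b in batches]  # [10, 10, 5] for 25 IDs
```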

You can also do a range scan using LIKE, changing the value every time based on your keys: "a%", "b%", ... "A%", ...
SELECT RAW META().id FROM bucket1 WHERE META().id LIKE "a%";
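
As an illustration, the per-prefix statements could be generated like this in Python. `prefix_statements` is a hypothetical helper, and the sketch assumes every key starts with an ASCII letter; keys that start with digits or other characters would need their own prefixes.

```python
import string

def prefix_statements(bucket, prefixes):
    """Build one LIKE range-scan statement per key prefix (hypothetical helper)."""
    return [
        f'SELECT RAW META().id FROM {bucket} WHERE META().id LIKE "{prefix}%";'
        for prefix in prefixes
    ]

# Assumption: keys start with a-z or A-Z only.
statements = prefix_statements(
    "bucket1", string.ascii_lowercase + string.ascii_uppercase
)
```

Each statement covers a disjoint slice of the key space, so the statements can be run one by one or handed to separate client threads.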

EdgarLGB · March 21, 2019, 3:43pm

Thanks a lot! I’m also wondering whether I could first divide the bucket by batch size in bytes and then read the batches with “LIMIT”. Could you help me out with this?

vsr1 · March 21, 2019, 9:04pm

There is a limit of 10 buckets. If you want to divide the data, you can divide it by number of documents. Why not use LIMIT and OFFSET and run multiple queries with different offsets?
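
A small Python sketch of dividing by number of documents. `offset_ranges` is a hypothetical helper, and `total_docs` is assumed to be known up front (for example from a SELECT COUNT(*) FROM bucket1 run beforehand).

```python
def offset_ranges(total_docs, batch_size):
    """Split total_docs into (offset, limit) pairs of at most batch_size each."""
    return [
        (offset, min(batch_size, total_docs - offset))
        for offset in range(0, total_docs, batch_size)
    ]

# Assumed document count; in practice obtained from a COUNT(*) query.
ranges = offset_ranges(total_docs=25000, batch_size=10000)

# One independent statement per range.
statements = [
    f"SELECT RAW META().id FROM bucket1 OFFSET {offset} LIMIT {limit};"
    for offset, limit in ranges
]
```

Because each statement carries its own offset and limit, the statements do not depend on one another and can be issued from different client threads.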

EdgarLGB · March 22, 2019, 1:55pm

Thanks for the response!
