Reading a subset of documents using Couchbase View

nagaraj.irock · March 1, 2016, 8:21am

Hi. I have around 25M documents in my cluster. I need to read 1M documents at a time without any specific criterion. I don’t have access to the keys, so I need to create a view which will emit documents until a counter reaches 1M.

I have written a Map function inside which I am trying to create a static variable, but JS doesn’t support static variables. I am not sure how to do this operation. The map function which I have written is just to return 1000 documents and it is full of errors. Can someone help me with this functionality?

function (doc, meta) {
  value = foo();
  if(value < 1000)
  {
    emit(meta.id, null);
  }else{
    return;       
  }
}

function incrementor(){
  if(typeof incrementor.counter == 'undefined'){
       incrementor.counter = 0; 
  }
  
  return ++incrementor.counter;
}

daschl · March 1, 2016, 8:39am

Hi @nagaraj.irock, your map function can’t have an outer context. Why don’t you just emit(meta.id, null); straight away and then use the limit at query time to limit your results to the desired number of documents?

There is no built-in way to cap the number of documents that get indexed, other than defining some criterion yourself. For example, imagine your doc has a “counter” field like {"counter": 400}; in your map function you could then do if (doc.counter <= 1000) { emit()…}

nagaraj.irock · March 1, 2016, 8:43am

How do I limit straight away? Can I achieve that using ViewQuery? Right now I am using

            ViewQuery query = ViewQuery.from("LCDD", "findAllUsers").stale(Stale.FALSE);
            ViewResult result = theBucket.query(query);

to retrieve the documents.

I think the second method would work, but I don’t have the control over the fields in the document.

simonbasle · March 1, 2016, 8:44am

after the .stale you can use a .limit(1000)

nagaraj.irock · March 1, 2016, 8:45am

Ah! Okay. Too bad I didn’t see the options available. Will that return 1000 random documents from the database?

daschl · March 1, 2016, 9:13am

It will return the first 1000 entries stored in the index, and as long as the index is not mutated they will be the same on the next query. There is also a “skip” option in addition to limit, which you can use with a random number to generate a different offset every time. Just make sure that skip + limit <= docs_in_index, otherwise your result set won’t have 1000 entries.
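To make the skip + limit <= docs_in_index constraint concrete, here is a minimal sketch of picking the random offset. The class and method names (`RandomWindow`, `randomSkip`) are hypothetical, and it assumes you already know the number of entries in the index; the result would be passed to the query's skip option next to the limit.

```java
import java.util.Random;

final class RandomWindow {
    /**
     * Picks a skip value in [0, docsInIndex - limit], so that
     * skip + limit never exceeds the number of entries in the index.
     * docsInIndex is assumed to be known by the caller.
     */
    static long randomSkip(long docsInIndex, int limit, Random rnd) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        if (docsInIndex <= limit) {
            return 0L; // the whole index fits into a single window
        }
        long maxSkip = docsInIndex - limit; // largest offset that still leaves `limit` rows
        // floorMod maps any long into [0, maxSkip]; the slight bias is fine for sampling
        return Math.floorMod(rnd.nextLong(), maxSkip + 1);
    }
}
```

With 25M entries and a limit of 1000, every value returned lies between 0 and 24,999,000.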

simonbasle · March 2, 2016, 12:15pm

On a side note, skip with views still forces the view query to visit (and then discard) the skipped entries, so using it repeatedly with an incrementing skip isn’t very performant (e.g. you could be tempted to do that for paging, but it’s not the best way).

nagaraj.irock · March 2, 2016, 1:14pm

I can store the batch’s last document’s id and then pass it in the next iteration which can then be used as a parameter for .startKey(String Id) right? Will that improve the performance?

simonbasle · March 2, 2016, 2:20pm

Yes, almost, but depending on the emit of the view it could need an extra step, which is to also use startKeyDocId(lastDocumentIdInPage).

Sometimes your view will emit several documents that share the same view key. But their document IDs will be unique, so you can combine both pieces of information to restart correctly at the next page. See this blog post on what startkey_docId is all about: http://blog.couchbase.com/startkeydocid-behaviour
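The restart logic can be sketched without the SDK, using an in-memory sorted list as a stand-in for the view index. Everything here (`ViewPaging`, `Row`, `page`, `readAll`) is hypothetical and only illustrates why the key alone is not enough when several rows share it. Two simplifications to keep in mind: plain `String.compareTo` replaces the view's own collation, and the start position is treated as exclusive. As far as I know a real view query starts inclusively at the given key and document ID, so there you would additionally drop the one row you have already seen.

```java
import java.util.ArrayList;
import java.util.List;

final class ViewPaging {
    /** One row of the simulated view index: the emitted key plus the document ID. */
    static final class Row {
        final String key;
        final String docId;
        Row(String key, String docId) { this.key = key; this.docId = docId; }
    }

    /**
     * Returns up to `limit` rows that sort strictly after (startKey, startDocId).
     * `index` must be sorted by key, then by docId. A null startKey reads from the start.
     */
    static List<Row> page(List<Row> index, String startKey, String startDocId, int limit) {
        List<Row> out = new ArrayList<Row>();
        for (Row r : index) {
            if (out.size() == limit) break;
            if (startKey != null) {
                int c = r.key.compareTo(startKey);
                // Skip rows before the cursor; on equal keys the docId breaks the tie
                if (c < 0 || (c == 0 && r.docId.compareTo(startDocId) <= 0)) continue;
            }
            out.add(r);
        }
        return out;
    }

    /** Walks the whole index page by page and returns the document IDs in order. */
    static List<String> readAll(List<Row> index, int limit) {
        List<String> ids = new ArrayList<String>();
        String key = null;
        String docId = null;
        while (true) {
            List<Row> rows = page(index, key, docId, limit);
            if (rows.isEmpty()) break;
            for (Row r : rows) ids.add(r.docId);
            Row last = rows.get(rows.size() - 1);
            key = last.key;     // remember both parts of the cursor
            docId = last.docId;
        }
        return ids;
    }
}
```

If the view emits meta.id as the key, as suggested earlier in the thread, key and document ID are identical and the tie-break never triggers; it matters once the emitted key is something non-unique.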

nagaraj.irock · March 3, 2016, 5:40am

Hi, are these operations thread safe? Can I use multiple threads to read from different blocks of data? I have used a multi-threaded approach for insertion. I am not sure if it will work for reads too, given that I have to use .startKey and .startKeyDocId. Right now I am able to read the documents using a single thread (i.e., the main process) without any problem. But the throughput is just ~1500 ops/sec, which is low for my application. Can I improve the performance by using threads?

daschl · March 3, 2016, 5:52am

The view requests are thread safe, but not synchronized across requests. What I mean by that is that you can use the client and its methods totally fine from multiple threads, but if you want to pass around the startKey(docId) across threads you need to handle that yourself in a thread safe manner (maybe a volatile String, but I don’t know your exact application semantics).
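One possible way to share the cursor, sketched under assumptions: a `volatile` field makes the latest value visible to all threads, but it does not make "read the cursor, fetch a page, store the new cursor" a single atomic step, so this sketch puts that step behind a `synchronized` method instead. The names (`SharedCursor`, `nextPage`, `readWithThreads`) are hypothetical, and a sorted in-memory list of IDs stands in for the view query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedCursor {
    private final List<String> sortedIds; // stand-in for the view index, sorted by ID
    private final int pageSize;
    private String lastId = null;         // guarded by `this`

    SharedCursor(List<String> sortedIds, int pageSize) {
        this.sortedIds = sortedIds;
        this.pageSize = pageSize;
    }

    /** Atomically claims the next page; returns an empty list once the index is exhausted. */
    synchronized List<String> nextPage() {
        List<String> page = new ArrayList<String>();
        for (String id : sortedIds) {
            if (page.size() == pageSize) break;
            if (lastId == null || id.compareTo(lastId) > 0) page.add(id);
        }
        if (!page.isEmpty()) lastId = page.get(page.size() - 1);
        return page;
    }

    /** Reads everything with `threads` workers and returns every ID that was claimed. */
    static List<String> readWithThreads(List<String> sortedIds, int pageSize, int threads) {
        final SharedCursor cursor = new SharedCursor(sortedIds, pageSize);
        final List<String> seen = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(new Runnable() {
                public void run() {
                    List<String> page;
                    while (!(page = cursor.nextPage()).isEmpty()) {
                        seen.addAll(page); // real code would load and process the documents here
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```

In this design the page lookups themselves are serialized, because each page depends on the previous cursor; the parallelism comes from the per-page work the workers do after claiming a page.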

nagaraj.irock · March 3, 2016, 10:51am

Yes. I was able to put in my own logic in order to make the threads work without any problem. Thanks a lot!
