Couchbase sdk pagination not working properly

deepw · August 8, 2022, 3:39pm

I am trying to pull records from a view that emits rows in the following form:

DefaultViewRow{id=0329a6ac-84cb-403e-9d1d, key=["X","Y","0329a6ac-84cb-403e-9d1d","31816552700"], value=1}

As we have millions of records, we are trying to implement pagination: pull 500 records per page, process them, then get the next 500 records.
I have implemented the following code with the Java client:

def cluster = CouchbaseCluster.create(host)
def placesBucket = cluster.openBucket("pass", "pass")
def startKey = JsonArray.from(X, "Y")
def endKey = JsonArray.from(X, "Y", new JsonObject())
hasRow = true
rowPerPage = 500
page = 0
currentStartkey = ""
startDocId = ""
def viewResult
def counter = 0
while (hasRow) {
    hasRow = false
    def skip = page == 0 ? 0 : 1
    page = page + 1
    viewResult = placesBucket.query(ViewQuery.from("design", "view")
            .startKey(startKey)
            .endKey(endKey)
            .reduce(false)
            .inclusiveEnd()
            .limit(rowPerPage)
            .stale(Stale.FALSE)
            .skip(skip)
            .startKeyDocId(startDocId)
    )
    def runResult = viewResult.allRows()
    for (ViewRow row : runResult) {
        hasRow = true
        println(row)
        counter++
        startDocId = row.document().id()
    }
    println("Page NUMBER " + page)
}
println("total " + counter)

After running this, I get some repeated rows. Even though one small scenario has only around 1000 records in total, I get 3000+ rows in the response and it keeps going.
Can someone please tell me if I am doing something wrong? PS: My startKey value will be the same for each run, as I am trying to get each unique doc _id.
Are documents in Couchbase stored in sorted order of _id?
Please help.

david.nault · August 8, 2022, 4:19pm

Hi @deepw,

Take a look at these articles that describe the proper use of the startKeyDocId parameter and how to paginate large result sets. In a nutshell, the document identified by startKeyDocId should exist and be indexed by the view, and strange things can happen if it doesn’t. For pagination, instead of using skip, the recommended approach is to use keyset pagination by specifying a new startKey and startKeyDocId for each loop iteration, so you start from the end of the previous page.
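The keyset approach described above can be illustrated with a small, self-contained Java sketch. It does not call the Couchbase SDK: the in-memory `query` method, the `Row` class, and the `doc-N` sample data are hypothetical stand-ins, and plain string comparison replaces the view engine's real collation. The sketch assumes rows are ordered by key and then by document id, and that the doc id only narrows the start position among rows whose key equals startKey exactly. The endKey bound is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class KeysetPaginationSketch {

    /** One emitted view row: a complex (array) key plus the id of the emitting document. */
    static final class Row {
        final List<String> key;
        final String docId;
        Row(List<String> key, String docId) { this.key = key; this.docId = docId; }
    }

    /** Compares array keys element by element; a shorter prefix sorts before a longer key. */
    static int compareKeys(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    }

    /** Hypothetical stand-in for a view query; index must be sorted by key, then docId. */
    static List<Row> query(List<Row> index, List<String> startKey,
                           String startKeyDocId, int skip, int limit) {
        List<Row> page = new ArrayList<>();
        int skipped = 0;
        for (Row row : index) {
            int c = compareKeys(row.key, startKey);
            if (c < 0) continue; // row sorts before startKey
            // Assumption: the doc id is only consulted when the row key equals startKey.
            if (c == 0 && startKeyDocId != null && row.docId.compareTo(startKeyDocId) < 0) continue;
            if (skipped < skip) { skipped++; continue; }
            if (page.size() == limit) break;
            page.add(row);
        }
        return page;
    }

    /** Keyset pagination: each page starts at the last row of the previous page. */
    static List<Row> fetchAll(List<Row> index, List<String> firstStartKey, int pageSize) {
        List<Row> all = new ArrayList<>();
        List<String> startKey = firstStartKey;
        String startKeyDocId = null;
        int skip = 0;
        while (true) {
            List<Row> page = query(index, startKey, startKeyDocId, skip, pageSize);
            if (page.isEmpty()) break;
            all.addAll(page);
            Row last = page.get(page.size() - 1);
            startKey = last.key;        // full key of the last row, not the original prefix
            startKeyDocId = last.docId; // doc id of the last row
            skip = 1;                   // skip only the row that was already returned
        }
        return all;
    }

    /** Builds a small sorted sample index shaped like [country, status, id1, id2]. */
    static List<Row> demoIndex(int n) {
        List<Row> index = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            index.add(new Row(List.of("IND", "INVALID", "doc-" + i, "n" + i), "doc-" + i));
        }
        index.sort((a, b) -> {
            int c = compareKeys(a.key, b.key);
            return c != 0 ? c : a.docId.compareTo(b.docId);
        });
        return index;
    }

    public static void main(String[] args) {
        List<Row> all = fetchAll(demoIndex(7), List.of("IND", "INVALID"), 3);
        System.out.println("fetched " + all.size() + " rows");
    }
}
```

Under the same assumptions, keeping startKey fixed at the two-element prefix while changing only the doc id makes `query` return the same rows on every call, because no row key equals the prefix and the doc id is never consulted. That would match a loop that keeps returning repeated rows and never terminates.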

Note that the second article refers to Couchbase SDK 2 which is EOL, but the general concepts are still valid. Also be aware that the Views service is deprecated in Couchbase 7; if you’re developing a new app, N1QL will be more future-proof.

Thanks,
David

deepw · August 8, 2022, 4:40pm

Hi David,

Thanks for the documents.
The Couchbase version we are using right now is 7.10.2.
Regarding the new startKey: in my case I have a constant startKey value for each run.
For example, my view emits [country, status, id1, id2], like:

DefaultViewRow{id=0329a6ac-84cb-403e-9d1d, key=["IND","INVALID","0329a6ac-84cb-403e-9d1d","31816552700"], value=1}

So my startKey is ["IND","INVALID"] and my endKey is ["IND","INVALID",{}], and I want to extract by document ID.
In this case I am trying to provide my last document ID as startKeyDocId, but my startKey will remain the same for each extract.
