Couchbase SDK pagination not working properly

I am trying to pull records from a view that emits rows in the following form:

DefaultViewRow{id=0329a6ac-84cb-403e-9d1d, key=["X","Y","0329a6ac-84cb-403e-9d1d","31816552700"], value=1}

As we have millions of records, we are trying to implement pagination: pull 500 records per page, process them, then get the next 500 records.
I have implemented the following code with the Java client:

def cluster = CouchbaseCluster.create(host)
def placesBucket = cluster.openBucket("pass", "pass")
def startKey = JsonArray.from(X, "Y")
def endKey = JsonArray.from(X, "Y", new JsonObject())
hasRow = true
rowPerPage = 500
page = 0
currentStartkey = ""
startDocId = ""
def viewResult
def counter = 0
while (hasRow) {
    hasRow = false
    def skip = page == 0 ? 0 : 1
    page = page + 1
    viewResult = placesBucket.query(ViewQuery.from("design", "view")
            .startKey(startKey)
            .endKey(endKey)
            .reduce(false)
            .inclusiveEnd()
            .limit(rowPerPage)
            .stale(Stale.FALSE)
            .skip(skip)
            .startKeyDocId(startDocId)
    )
    def runResult = viewResult.allRows()
    for (ViewRow row : runResult) {
        hasRow = true
        println(row)
        counter++
        startDocId = row.document().id()
    }
    println("Page NUMBER " + page)
}
println("total " + counter)

After execution, I am getting some repeated rows, and even though the total is only around 1000 records for one small scenario, I get 3000+ rows in the response and it keeps going.
Can someone please tell me if I am doing something wrong? PS: My start key value will be the same for each run, as I am trying to get each unique doc _id.
Are documents in Couchbase stored in sequenced/sorted order of _id?
Please help.

Hi @deepw,

Take a look at these articles that describe the proper use of the startKeyDocId parameter and how to paginate large result sets. In a nutshell, the document identified by startKeyDocId should exist and be indexed by the view, and strange things can happen if it doesn’t. For pagination, instead of relying on skip, the recommended approach is keyset pagination: specify a new startKey and startKeyDocId for each loop iteration, so that each page starts from the end of the previous one.

Note that the second article refers to Couchbase SDK 2 which is EOL, but the general concepts are still valid. Also be aware that the Views service is deprecated in Couchbase 7; if you’re developing a new app, N1QL will be more future-proof.
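
To make the keyset idea concrete, here is a minimal sketch in plain Java that runs without a cluster. It is an in-memory model, not SDK code: the names Row, query, paginate and sampleIndex are made up for illustration, and the key comparison is a simplified stand-in for the real view collation. With the SDK 2 client, the corresponding values would presumably come from row.key() and row.id() of the last row on each page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeysetPaginationSketch {
    // One view row: the emitted key (an array of strings here) and the document ID.
    static final class Row {
        final List<String> key;
        final String id;
        Row(List<String> key, String id) { this.key = key; this.id = id; }
    }

    // Simplified collation: compare element by element; a shorter array that is a prefix sorts first.
    static int compareKeys(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    }

    // Model of a view query: rows at or after (startKey, startDocId), then skip, then limit.
    // The index is assumed to be sorted and already restricted to the wanted range.
    static List<Row> query(List<Row> index, List<String> startKey, String startDocId, int skip, int limit) {
        List<Row> out = new ArrayList<>();
        int skipped = 0;
        for (Row r : index) {
            int c = compareKeys(r.key, startKey);
            boolean atOrAfter = c > 0 || (c == 0 && (startDocId == null || r.id.compareTo(startDocId) >= 0));
            if (!atOrAfter) continue;
            if (skipped < skip) { skipped++; continue; }
            if (out.size() == limit) break;
            out.add(r);
        }
        return out;
    }

    // Keyset pagination: every page restarts from the last row of the previous page.
    static List<Row> paginate(List<Row> index, List<String> rangeStart, int pageSize) {
        List<Row> all = new ArrayList<>();
        List<String> startKey = rangeStart;
        String startDocId = null;
        int skip = 0;
        while (true) {
            List<Row> page = query(index, startKey, startDocId, skip, pageSize);
            if (page.isEmpty()) break;
            all.addAll(page);
            Row last = page.get(page.size() - 1);
            startKey = last.key;   // full emitted key of the last row, not the range prefix
            startDocId = last.id;  // doc ID of that same row
            skip = 1;              // the last row comes back first on the next page; skip only that one
        }
        return all;
    }

    // Hypothetical sample data shaped like [country, status, id1, id2], already in sorted order.
    static List<Row> sampleIndex(int n) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String id = String.format("doc-%03d", i);
            rows.add(new Row(Arrays.asList("IND", "INVALID", id, String.valueOf(i)), id));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> all = paginate(sampleIndex(7), Arrays.asList("IND", "INVALID"), 3);
        System.out.println("total " + all.size()); // prints "total 7"
    }
}
```

Here skip stays at 1 only to drop the row that both pages share; the position itself is carried by the key and doc ID, so the cost per page does not grow with the page number.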

Thanks,
David

Hi David,

Thanks for the documents.
We are currently using Couchbase 7.10.2.
Regarding the new startKey: in my case I have a constant startKey value for each run.
For example, my view emits [country, status, id1, id2], like:

DefaultViewRow{id=0329a6ac-84cb-403e-9d1d, key=["IND","INVALID","0329a6ac-84cb-403e-9d1d","31816552700"], value=1}

So my startKey is ["IND","INVALID"] and my endKey is ["IND","INVALID",], and I want to extract by document ID.
In this case I am trying to provide my last document ID as startKeyDocId, but my startKey will remain the same for each extract.
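
One likely reason the constant startKey does not work here: as far as I understand the view semantics, startKeyDocId is only a tie-breaker among rows whose key equals startKey exactly. No row has the two-element key ["IND","INVALID"], so the doc ID would never be consulted and every page would start again near the top of the range. The small Java sketch below models only that comparison; atOrAfterStart and compareKeys are hypothetical helper names, and the doc IDs are made-up sample values, not real data.

```java
import java.util.Arrays;
import java.util.List;

class StartKeyDocIdSketch {
    // Simplified collation: compare element by element; a shorter array that is a prefix sorts first.
    static int compareKeys(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    }

    // True if the row at (rowKey, rowId) lies at or after the start position.
    // Assumption modelled here: the doc ID matters only when rowKey equals startKey exactly.
    static boolean atOrAfterStart(List<String> rowKey, String rowId, List<String> startKey, String startDocId) {
        int c = compareKeys(rowKey, startKey);
        if (c != 0) return c > 0;
        return rowId.compareTo(startDocId) >= 0;
    }

    public static void main(String[] args) {
        List<String> firstRowKey = Arrays.asList("IND", "INVALID", "doc-001", "1");
        List<String> lastSeenKey = Arrays.asList("IND", "INVALID", "doc-500", "500");
        List<String> prefix = Arrays.asList("IND", "INVALID");

        // Constant prefix as startKey plus the last doc ID: the very first row still qualifies.
        System.out.println(atOrAfterStart(firstRowKey, "doc-001", prefix, "doc-500"));      // true
        // Full key of the last row as startKey: the first row is now excluded.
        System.out.println(atOrAfterStart(firstRowKey, "doc-001", lastSeenKey, "doc-500")); // false
    }
}
```

If that model is right, it would match the symptom in the original post: after the first page, each query with skip 1 returns rows 2 to 501 again, so the loop never ends. Passing the full four-element key of the last row as the next startKey, together with its doc ID, moves the start position forward.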