Bulk update / Batch update for millions of records and Handle the exception

Hi CB,
I am doing some analysis where i have to update single field for fetched millions of records using Ni1QL query.

  • To considering the performance issue , would not prefer single update at a time , hence thinking to perform bulk / batch update.
  • Also thinking what if any CB exception occurred while updating the document ( Ex: If update would be perform on 5 records and exception occurred , do we need to re perform the operation or there is a way to handle it.
  • Also would like to add one more scenario when insertion/update would happen at the same document by other process can we blocking the document …?

So far i am reading and found the below info but not sure about it.
Any suggestion / input would be really appreciated.
https://developer.couchbase.com/documentation/server/3.x/developer/java-2.0/documents-bulk.html

Hi Mehul,

The JavaSDK approach as documented in the link will provide great performance, and, as you have rightly pointed out, will need to be expanded to take care of any update exception. We plan to expand the example in the future.

However there is also another approach that you can consider. This is shown in our Couchbase performance dashboard that we run on a weekly basis. Here is the blog that provides more detail about these tests: https://blog.couchbase.com/ycsb-json-benchmarking-json-databases-by-extending-ycsb/
The specific performance test case that is similar to what you are looking for is the N1QL/YCSB/Workload-E http://showfast.sc.couchbase.com/#/timeline/Linux/n1ql/ycsb/Plasma

In this test, the range query is randomized between 0…20M docs, then each document is then UPSERTed by meta().id . Tests are run for both Memory Optimized Index (MOI) and Plasma (Standard Storage Engine) with different cluster size configuration.

Please reach out to me directly if you need any additional information.

-binh

Thank you so much we will get back to you

Hi Binh le,
So, here i am struggling with response time of my N1QL query .
I have around 41,458,301 documents in my bucket , and when i ran my below N1QL query it’s took around 1 minute to fetch 141892 documents , which is unacceptable.
I also ran with some other query parameter as will with longer window it took “3Minutes”
Can, you please help us do some query performance tuning.

Total document : 41,458,301

Index created on
CREATE INDEX IDX_Cus_Order_Detl ON
B551_CUS_ORDER_TRKR(dsDateTime, isdsAlertTriggered,OrderNumber)
PARTITION BY hash(OrderNumber) WITH { “defer_build”:true, “num_replica”:1, “num_partition”:32 }

SELECT b.OrderNumber AS CusOrdNumber
FROM B551_CUS_ORDER_TRKR b
WHERE b.dsDateTime IS NOT MISSING
AND (b.dsDateTime >= ‘2020-02-06 00.00.00’
AND b.dsDateTime<=‘2020-02-06 19.22.00’)
AND b.isdsAlertTriggered = false

{
“latestDelvEpochTm”: 1581082200,
“dsDateTime”: “2020-02-06 19.21.53”,
“gmtOffset”: “+00.00”,
“expiryTime”: 1582119000,
“latestDelvDateTm”: “2020-02-07 13.30.00”,
“OrderNumber”: “1Z9950850391546639”,
“isdsAlertTriggered”: false
}

Not sure how much it’ll help, but you could be performing an index scan of some sort because you have this in your query: WHERE b.dsDateTime IS NOT MISSING and you should add that to your Index WHERE clause.

You might also want to add ASC or DESC as a modifier to sort on dsDateTime.

If your Couchbase bucket is shared, you should try to refine the documents contained in the index by filtering by some subset. You may be currently putting all 41M docs in the index (with null values) when you’re really trying to target a subset.