Is there a way to update multiple documents at once using the subdocument API? I need to update a single field in a large number of documents, and it seems incredibly wasteful to
- Call the subdoc API thousands of times
- Send the entire document in a batch operation when all I’m updating is a single field.
Would it be better to run this as a query string instead?
UPDATE bucket USE KEYS […, …, …] SET doc.field = …
or can multiple docs be updated at once using the subdoc api?
var result = await buck.MutateIn<Annotation>(annot0)
    .Upsert("s", "B", false)
    .ExecuteAsync();
There’s no way to update multiple documents at once with Sub-Document, no. Bear in mind that a query, under the hood, will still ultimately be doing a KV update on each of those documents, so it likely won’t gain you anything over just making the multiple Sub-Document calls.
Couchbase is a high-performance key-value store designed to operate at scale, and personally, given reasonable hardware, I wouldn’t be at all concerned about a few thousand calls.
I’m more concerned with the thousand-plus round trips to the database to send the requests. I didn’t realise Couchbase was internally sending them one by one even with the bulk operations. To minimise the round trips, is wrapping it all up into a query the best way to go?
As a side note, is there a limit to the size of a query string that I can send through the API? I can’t find any information on this. My USE KEYS […, …, …] clause could potentially be really large; is there a risk of it failing at some point?
Whether it’s a single node or many nodes (in which case, of course, the SDK has to break up the requests per node), the Sub-Document operations are always pipelined for efficiency. You can avoid multiple round trips by following the techniques that maximise pipelining, described in the batching operations section of the docs.
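To make that concrete, here’s a minimal sketch of the pipelining pattern with the 2.x .NET SDK. It assumes an open `IBucket` called `buck` and an `IEnumerable<string>` of document keys called `keys` (both names are illustrative); the point is to start every mutation without awaiting it individually, so the SDK can pipeline them on the connection, and only then await the whole set:

```
// Requires: using System.Linq; using System.Threading.Tasks;
// Sketch only — needs a live cluster and the Couchbase 2.x .NET SDK.

// Start all Sub-Document mutations up front; do NOT await inside the loop,
// or the operations serialise into one round trip each.
var tasks = keys
    .Select(key => buck.MutateIn<Annotation>(key)
        .Upsert("s", "B", false)
        .ExecuteAsync())
    .ToList();

// A single await for the whole batch; the SDK pipelines the requests.
var results = await Task.WhenAll(tasks);

foreach (var r in results)
{
    if (!r.Success)
    {
        // Inspect r.Status and retry or log the individual failure here.
    }
}
```

The key design point is that batching here is a client-side pattern, not a separate API: issuing the operations concurrently is what lets the SDK fill the pipeline.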
And yes, there is a limit to the query size. That approach would also be less efficient, since the entire statement would need to be parsed, and then the keys distributed among the nodes (again, assuming a multi-node cluster). I don’t remember the max size off the top of my head, but @Marco_Greco would.
Thanks for the link to those docs; for some reason that didn’t come up in any of my searches. It’s not 100% clear to me how to take advantage of batching, though. This statement:
“When using an SDK in an asynchronous (non-blocking) model, all requests are inherently batched.”
suggests that all I need to do is make sure I’m using CompleteAsync() on the subdoc API and batching will be handled internally?
But then why do we need to manually batch as demonstrated here?
Maximum query size is 64 MB, but you also have to take into account that after parsing a USE KEYS clause that large, you end up with a huge parse tree and an even bigger plan.
The execution layer would then have an evaluated array of keys of probably 8 million entries, so if you go that route and use the max statement size, you could actually blow up memory.