Is there any way to do a partial update inside a transaction?
All I need is to update a single field, so I would like to avoid having to fetch the entire document and do a full update. The documents can be relatively big and performance for this use case is critical. For that reason I would also like to avoid using N1QL.
To clarify, the transaction involves updates to several documents.
There is no support for key-value Sub-Document operations inside transactions at this time. Thanks for raising this though; requests like this help us know where to focus our engineering effort in the future.
It would be worth testing out the performance with N1QL. It won’t be quite as efficient as key-value Sub-Document for reasons including the query parsing cost, but if you use USE KEYS then there is no index lookup required and the query service can fetch the documents directly from the key-value service. And while the query service will be fetching the full documents, this will be intra-cluster.
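As a rough illustration of the USE KEYS approach, here is a minimal sketch that only builds the statement text for a single-field update addressed by document key. The helper name, the keyspace `inventory`, and the field `status` are hypothetical, and how the statement is actually executed inside a transaction depends on your SDK and is not shown.

```python
def build_partial_update(keyspace: str, field: str) -> str:
    """Build a N1QL UPDATE that sets one top-level field on one document by key.

    Hypothetical helper for illustration only; keyspace and field are placeholders.
    """
    # Backticks escape identifiers; $key and $value are named query parameters,
    # so the document key and the new value are supplied at execution time.
    return f"UPDATE `{keyspace}` USE KEYS $key SET `{field}` = $value"


statement = build_partial_update("inventory", "status")
print(statement)
```

Because the document is addressed with USE KEYS rather than a WHERE clause, the statement does not need an index to locate it.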
Hi @graham.pople ,
Thanks for your prompt response.
Follow-up question: say I have 100k ‘elements’ to process. For each element I need to do 1 full document update and 2 partial document updates. It’s important to have atomicity for the 3 updates of every element, but I don’t need atomicity across the entire 100k elements. If the process fails in the middle, it’s OK as long as elements are left in a consistent state: either 0 or 3 updates for every element.
To maximize performance, would it be better to have multiple parallel single-element transactions (100k transactions, each having 3 updates)? Or would it be better to have each transaction update multiple elements (fewer but bigger transactions, e.g. 10k transactions, each having 30 updates)?
Note that each partial document update would be done using N1QL as you suggested. So 1 replace and 2 queries using keys in a transaction.
Great question. Under the hood, each transaction attempt writes three times into a document called an Active Transaction Record (PENDING, COMMITTED, and removal). And each key-value document write (insert, replace, remove) is doubled - once for staging, once for committing. So 100k transactions each doing 1 element (3 updates) will be 100k * (3 ATR writes + 3 staging writes + 3 committing writes) = 900k total writes. While 10k transactions each doing 10 elements (30 updates) would be 10k * (3 ATR writes + 30 staging writes + 30 committing writes) = 630k total writes. So the fixed overhead-per-attempt of the ATR writes clearly steers towards fewer-but-larger transactions.
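The arithmetic above can be reproduced for other batch sizes with a small sketch. The function and constant names are made up for illustration, and it assumes every attempt succeeds first time, the element count divides evenly by the batch size, and every update is counted as one staging write plus one committing write, as in the figures above.

```python
ATR_WRITES_PER_ATTEMPT = 3  # PENDING, COMMITTED, and removal
WRITES_PER_DOCUMENT = 2     # one staging write plus one committing write


def total_writes(elements, elements_per_txn, updates_per_element=3):
    """Count writes for processing all elements, with no retries (assumption)."""
    transactions = elements // elements_per_txn
    docs_per_txn = elements_per_txn * updates_per_element
    writes_per_txn = ATR_WRITES_PER_ATTEMPT + WRITES_PER_DOCUMENT * docs_per_txn
    return transactions * writes_per_txn


print(total_writes(100_000, 1))   # 900000
print(total_writes(100_000, 10))  # 630000
```

As the batch size grows, the total approaches the floor of 2 writes per update (600k here), so the gains from ever-larger transactions diminish.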
But there is a balancing act to achieve. If a transaction conflicts with another, it must roll back and retry, which adds more writes. Smaller transactions will probably be less likely to conflict. So that really depends on your workload - if conflicts are unlikely in your case, you can likely push to bigger transactions.
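To get a feel for that balancing act, here is a toy model, not a description of how the protocol behaves. It assumes each document conflicts independently with a fixed probability, that any conflict forces the whole attempt to be retried, and that a failed attempt costs as many writes as a successful one. All names and the probabilities used are hypothetical.

```python
def expected_writes(elements, elements_per_txn, doc_conflict_prob, updates_per_element=3):
    """Expected total writes under a simplified independent-conflict model."""
    docs_per_txn = elements_per_txn * updates_per_element
    # 3 ATR writes per attempt, plus staging and committing for each document
    writes_per_attempt = 3 + 2 * docs_per_txn
    # An attempt succeeds only if none of its documents conflict (assumption)
    success_prob = (1.0 - doc_conflict_prob) ** docs_per_txn
    # Mean number of attempts until the first success (geometric distribution)
    expected_attempts = 1.0 / success_prob
    transactions = elements / elements_per_txn
    return transactions * writes_per_attempt * expected_attempts


for prob in (0.0, 0.001, 0.05):
    small = expected_writes(100_000, 1, prob)
    large = expected_writes(100_000, 10, prob)
    print(f"conflict prob {prob}: 1 element/txn = {small:,.0f}, 10 elements/txn = {large:,.0f}")
```

Under these assumptions the larger batches win while conflicts are rare and lose once they become common, which is why measuring against the real workload matters more than the model.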
Those are the main performance considerations. The transactions protocol is very well distributed: writes will be distributed pretty evenly across all key-value nodes thanks to auto-sharding, and there are 1,024 Active Transaction Records by default to distribute over, so there aren’t any hotspots there that you need to avoid.
One final thing to note, since you are pushing these updates through N1QL, is that the query service handles statements in serial. E.g. 3 query nodes will be processing at most 3 statements at a time, regardless of how many parallel transactions are taking place. This is another factor that would steer me to recommend fewer-but-larger transactions.
Thanks for the detailed answer. I’ll need to run some tests and see what transaction size gives the best performance.
I realize this is old (I was looking for something else), but - Why not have the application get the document, then, in-memory, do the full update followed by the two partial updates, then replace the document? Even without the transaction, the update will be atomic.