Insert in transactions: moving from staged to new document creation, how does it work?

sri_ram · January 18, 2022, 2:06pm

Hi

For key-value transactions, it is evident that during a replace the new version of the document is held in the xattrs (extended attributes) in the document's metadata. This staged value is moved into the original document after the transaction is committed.

However, when a new document is inserted during a transaction, how does it move from staged to an actual document?
Are these staged new documents held in the Couchbase cluster or in the SDK?

Kindly let me know, or point me to any available documentation that explains how this is handled in Couchbase.

Regards,
Venkat

graham.pople · January 18, 2022, 2:38pm

Hi @sri_ram
That’s correct, for replaces and removes the post-transaction version of the document is held in the xattrs.
For inserts, a hidden document is created for staging, and it is converted to a real document at the commit point. This hidden document handles write-write conflicts if two transactions try to insert the same document concurrently, and it allows the transaction to be rolled back or completed by the async cleanup process if the application cannot finish the transaction (for example, if it crashes).
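To make the two staging paths concrete, here is a toy in-memory model. It is not the Couchbase SDK or its storage format: `ToyDoc`, `ToyBucket`, `stage_replace`, `stage_insert`, `unstage` and `plain_get` are hypothetical names, and the `staged` field merely stands in for the xattrs. It shows a replace keeping the old body visible while the new one is staged, an insert living as a hidden document until commit, and a second transaction's insert of the same key being rejected as a write-write conflict.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy model only: hypothetical names, not the Couchbase SDK API.


@dataclass
class ToyDoc:
    body: Optional[dict]             # committed content; None while an insert is hidden
    staged: Optional[dict] = None    # stand-in for the staged version kept in xattrs
    txn_id: Optional[str] = None     # transaction that owns the staged write
    hidden: bool = False             # True for an insert that is not committed yet


class ToyBucket:
    def __init__(self) -> None:
        self.docs: Dict[str, ToyDoc] = {}

    def stage_replace(self, key: str, new_body: dict, txn_id: str) -> None:
        doc = self.docs[key]
        if doc.txn_id not in (None, txn_id):
            raise RuntimeError(f"write-write conflict on {key}")
        doc.staged, doc.txn_id = new_body, txn_id  # committed body is left untouched

    def stage_insert(self, key: str, new_body: dict, txn_id: str) -> None:
        existing = self.docs.get(key)
        if existing is not None and not existing.hidden:
            raise KeyError(f"{key} already exists")
        if existing is not None and existing.txn_id != txn_id:
            # another transaction's hidden doc reveals the concurrent insert
            raise RuntimeError(f"write-write conflict on {key}")
        self.docs[key] = ToyDoc(body=None, staged=new_body, txn_id=txn_id, hidden=True)

    def unstage(self, key: str) -> None:
        """Commit step: move the staged content into the real document."""
        doc = self.docs[key]
        doc.body, doc.staged, doc.txn_id, doc.hidden = doc.staged, None, None, False

    def plain_get(self, key: str) -> Optional[dict]:
        """Non-transactional read: hidden docs are invisible, staged data is ignored."""
        doc = self.docs.get(key)
        return None if doc is None or doc.hidden else doc.body


bucket = ToyBucket()
bucket.docs["a"] = ToyDoc(body={"v": 1})
bucket.stage_replace("a", {"v": 2}, "T1")
bucket.stage_insert("b", {"v": 9}, "T1")
print(bucket.plain_get("a"), bucket.plain_get("b"))  # {'v': 1} None
bucket.unstage("a")
bucket.unstage("b")
print(bucket.plain_get("a"), bucket.plain_get("b"))  # {'v': 2} {'v': 9}
```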

sri_ram · January 18, 2022, 2:49pm

@graham.pople

Thanks for the quick response. As I understand it from the above, this hidden document also resides on one of the data nodes.

So for a replace we ideally have 3 round trips from client to cluster:
get document → stage (update xattrs) → trigger replacing the document with the xattrs content (if the transaction is successful).

For an insert it would be 2 round trips from client to cluster:
stage (create hidden document) → trigger converting the hidden document to a real document.

Please let me know if my understanding is correct.

graham.pople · January 18, 2022, 3:29pm

Yes, correct on all counts, including that the hidden document is on the data nodes.
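The per-document round-trip counts can be tallied with a trivial sketch. The step names are paraphrases from this thread, not SDK calls, and the tally deliberately leaves out any per-transaction bookkeeping (such as the ATR entry mentioned later in the thread).

```python
from typing import Dict, List

# Per-document client-to-cluster steps as described in this thread (paraphrased).
# Per-transaction bookkeeping, e.g. ATR writes, is deliberately not counted.
PER_DOC_STEPS: Dict[str, List[str]] = {
    "replace": ["get document", "stage new body in xattrs", "unstage (commit)"],
    "insert": ["stage hidden document", "convert hidden document to real document"],
}


def per_document_round_trips(ops: List[str]) -> int:
    """Count per-document round trips for a transaction made of the given ops."""
    return sum(len(PER_DOC_STEPS[op]) for op in ops)


print(per_document_round_trips(["replace"]))        # 3
print(per_document_round_trips(["insert"]))         # 2
print(per_document_round_trips(["insert"] * 1000))  # 2000
```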

sri_ram · January 24, 2022, 7:21am

Hi @graham.pople

Apologies for one more query; I hope this will be the last one on transactions.

Regarding inserts, and the following:

For insert it would be 2 round trips from client to cluster:
stage(create hidden document) → trigger to convert hidden document to real document.

Say a single transaction has 1k insertions. After the transaction is committed, the client triggers converting the hidden documents to real documents. What happens if my client crashes after sending the requests for only 500 documents, and never triggers the conversion of the remaining documents?

Could you explain how Couchbase achieves ACID compliance in this scenario?

Would the documents already moved from hidden to real be available for lookups by other clients?
How would the remaining documents be moved from hidden to real?

Regards,
Venkat

graham.pople · January 24, 2022, 3:58pm

Hi @sri_ram
There is an asynchronous cleanup process responsible for cleaning up any transactions that couldn’t be completed, due to an application crash or any other reason, which you can read up on here. It will find the failed transaction (usually within 60 seconds; this is configurable) and finish committing the remaining 500 documents.

This async cleanup currently runs client-side, so at least one application needs to be running and to have initialised the Transactions object.
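A toy model of the crash-and-cleanup scenario: T2 has reached its commit point (its toy ATR entry says COMMITTED), the client unstages 500 of 1k hidden inserts and crashes, and a later cleanup pass finishes the rest. Everything here is hypothetical (`ToyCluster`, `run_cleanup`, the state strings and the `min_age_s` threshold); the real cleanup lives in the SDK and its timing is configurable. Following the earlier description in this thread, an abandoned transaction that had not reached the commit point is rolled back instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy model only: hypothetical names and states, not the Couchbase SDK API.


@dataclass
class ToyAtrEntry:
    state: str            # toy states: "PENDING", "COMMITTED", "COMPLETED", "ROLLED_BACK"
    doc_keys: List[str]   # documents this transaction staged
    started_at: float     # seconds on a toy clock


@dataclass
class ToyCluster:
    atr: Dict[str, ToyAtrEntry] = field(default_factory=dict)
    hidden: Dict[str, Tuple[str, dict]] = field(default_factory=dict)  # key -> (txn_id, body)
    real: Dict[str, dict] = field(default_factory=dict)                # visible documents

    def unstage(self, key: str) -> None:
        _, body = self.hidden.pop(key)
        self.real[key] = body


def run_cleanup(cluster: ToyCluster, now: float, min_age_s: float = 60.0) -> None:
    """Hypothetical cleanup pass: finish or roll back abandoned transactions."""
    for entry in cluster.atr.values():
        if now - entry.started_at < min_age_s:
            continue  # may still be owned by a live client
        if entry.state == "COMMITTED":
            for key in entry.doc_keys:
                if key in cluster.hidden:  # only the docs the crashed client missed
                    cluster.unstage(key)
            entry.state = "COMPLETED"
        elif entry.state == "PENDING":
            for key in entry.doc_keys:
                cluster.hidden.pop(key, None)  # discard staged inserts
            entry.state = "ROLLED_BACK"


# 1k inserts committed, but the client crashes after unstaging 500 of them.
keys = [f"doc-{i}" for i in range(1000)]
c = ToyCluster()
c.atr["T2"] = ToyAtrEntry(state="COMMITTED", doc_keys=keys, started_at=0.0)
c.hidden = {k: ("T2", {"i": i}) for i, k in enumerate(keys)}
for k in keys[:500]:
    c.unstage(k)
print(len(c.real))                     # 500
run_cleanup(c, now=61.0)
print(len(c.real), c.atr["T2"].state)  # 1000 COMPLETED
```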

In addition, we have Monotonic Atomic View (MAV) reads inside transactions. This means that any transaction T1 reading a document that has staged data from transaction T2, where T2 has reached the commit point, will see the post-transaction version of the document. So T1 will see all 1k inserts as committed once it reads after T2 has committed, regardless of the commit state of individual documents.
This is called Read Atomicity, since it presents an atomic commit at the read point rather than at the write point, which would be too expensive in a distributed system as it would require locking across multiple nodes. Mechanically, whenever T1 finds a document that has staged data from T2, it checks T2’s ATR (Active Transaction Record) entry to see whether T2 has committed.
So our isolation level is really Monotonic Atomic View, which is higher than the Read Committed we state. We just don’t tend to mention MAV in the documentation, as it’s not a widely known concept at present, though we expect that to change. If you’re interested, there is more information on MAV on jepsen.io and in the Bailis paper.
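The MAV read rule (consult the writer's ATR entry only when the document carries staged data) can be sketched as a toy function. The names `ToyDoc`, `transactional_get` and `atr_states` are hypothetical, the model covers only inserts and replaces, and it does not model the "monotonic" part of MAV.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy model only: hypothetical names, not the Couchbase SDK API.


@dataclass
class ToyDoc:
    body: Optional[dict]              # committed body (None for a not-yet-committed insert)
    staged: Optional[dict] = None     # staged body from some transaction
    staged_by: Optional[str] = None   # id of that transaction


def transactional_get(docs: Dict[str, ToyDoc], atr_states: Dict[str, str],
                      key: str) -> Optional[dict]:
    """Toy MAV read: look at the writer's ATR entry only if the doc has staged data."""
    doc = docs.get(key)
    if doc is None:
        return None
    if doc.staged_by is not None and atr_states.get(doc.staged_by) == "COMMITTED":
        return doc.staged  # writer passed its commit point: show post-transaction version
    return doc.body        # no staged data, or writer not committed: show committed body


docs = {
    "k1": ToyDoc(body={"v": 2}),                               # already unstaged
    "k2": ToyDoc(body=None, staged={"v": 2}, staged_by="T2"),  # insert not yet unstaged
}
atr_states = {"T2": "COMMITTED"}
print(transactional_get(docs, atr_states, "k1"), transactional_get(docs, atr_states, "k2"))
# {'v': 2} {'v': 2}
```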

sri_ram · January 24, 2022, 5:53pm

This raises a question about document lookups outside a transaction. If Monotonic Atomic View (MAV) reads apply only inside transactions, does that mean a lookup that is not part of a transaction, on an inserted or replaced document staged by a committed transaction, would return non-committed data? Is my understanding correct?

graham.pople · January 24, 2022, 6:01pm

A regular non-transactional KV or N1QL read does not perform MAV logic, so it returns the document as it currently stands, without T2’s staged changes if they have not yet been unstaged, regardless of the state of T2’s ATR entry.
In other words, transactional reads are at MAV isolation level, and non-transactional reads are at Read Committed. It’s in keeping with our philosophy that you should only pay for what you use, as MAV reads do involve a small cost if the document is discovered to be in a transaction and an ATR entry needs to be looked up.
Note that if you always need MAV reads, a read-only transaction is very cheap as it doesn’t need to create an ATR entry.
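To illustrate the "pay only for what you use" point, this hypothetical reader counts ATR lookups. `plain_get` behaves like a Read Committed read and never consults the ATR; `mav_get` stands in for a read inside a (read-only) transaction and pays for an ATR lookup only when the document has staged data. Names and data shapes are assumptions for illustration, not SDK APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy model only: hypothetical names, not the Couchbase SDK API.


@dataclass
class ToyDoc:
    body: Optional[dict]
    staged: Optional[dict] = None
    staged_by: Optional[str] = None


class ToyReader:
    """Counts ATR lookups to compare the cost of the two read modes."""

    def __init__(self, docs: Dict[str, ToyDoc], atr_states: Dict[str, str]) -> None:
        self.docs, self.atr_states = docs, atr_states
        self.atr_lookups = 0

    def plain_get(self, key: str) -> Optional[dict]:
        # Read Committed: return the current body, never consult the ATR
        doc = self.docs.get(key)
        return None if doc is None else doc.body

    def mav_get(self, key: str) -> Optional[dict]:
        doc = self.docs.get(key)
        if doc is None:
            return None
        if doc.staged_by is None:
            return doc.body  # not in a transaction: no extra work over plain_get
        self.atr_lookups += 1  # the extra cost is paid only here
        if self.atr_states.get(doc.staged_by) == "COMMITTED":
            return doc.staged
        return doc.body


reader = ToyReader(
    docs={"idle": ToyDoc(body={"v": 1}),
          "busy": ToyDoc(body={"v": 1}, staged={"v": 2}, staged_by="T2")},
    atr_states={"T2": "COMMITTED"},
)
reader.mav_get("idle")
print(reader.atr_lookups)  # 0
print(reader.plain_get("busy"), reader.mav_get("busy"), reader.atr_lookups)
# {'v': 1} {'v': 2} 1
```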

sri_ram · January 26, 2022, 4:20pm

Thanks for responding to so many of my queries.

The above explanation makes it clear how a regular non-transactional KV or N1QL read handles non-committed data.

My new query is about inserts where only some of the hidden documents have been converted to real documents. Say, for example, the following steps happen in order:

1. A transaction begins with 100 inserts.
2. The transaction is committed.
3. The client triggers moving the hidden documents to real documents.
4. 90 documents are moved from hidden to real documents.
5. Before the other 10 documents are moved, my client crashes.
6. The cleanup process has not yet been started by any other client.
7. Before cleanup starts, another client does a non-transactional KV lookup on one of the 90 documents.
8. As I understand it, this client would still get the document looked up in step 7, since the transaction is committed and that inserted document has been moved from hidden to real.

Please let me know whether my understanding of step 8 is correct.

graham.pople · January 26, 2022, 4:36pm

Yes that’s correct. A non-transactional read will see any of the 90 documents as committed, but will not see the remaining 10 documents until those are handled by cleanup. That’s the difference between MAV and Read Committed.

But as mentioned, if it is an issue for your use-case then you can always get transactional MAV reads very cheaply with read-only transactions. If the document is not in a transaction, then a transactional read is essentially the same cost as a non-transactional one (the difference is a few bytes on the wire).
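The scenario above can be replayed in a toy model (hypothetical names, not the SDK): 100 hidden inserts from a committed transaction T, 90 of them unstaged before the crash, and no cleanup yet. `plain_get` models a non-transactional read and `mav_get` models a read inside a read-only transaction.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy model only: hypothetical names, not the Couchbase SDK API.


@dataclass
class ToyDoc:
    body: Optional[dict]              # None while the insert is still a hidden document
    staged: Optional[dict] = None
    staged_by: Optional[str] = None


def plain_get(docs: Dict[str, ToyDoc], key: str) -> Optional[dict]:
    doc = docs.get(key)
    return None if doc is None else doc.body


def mav_get(docs: Dict[str, ToyDoc], atr_states: Dict[str, str], key: str) -> Optional[dict]:
    doc = docs.get(key)
    if doc is None:
        return None
    if doc.staged_by is not None and atr_states.get(doc.staged_by) == "COMMITTED":
        return doc.staged
    return doc.body


# Steps 1-2: transaction T with 100 inserts reaches its commit point.
atr_states = {"T": "COMMITTED"}
docs = {f"doc-{i}": ToyDoc(body=None, staged={"i": i}, staged_by="T") for i in range(100)}

# Steps 3-5: the client unstages 90 documents, then crashes.
for i in range(90):
    d = docs[f"doc-{i}"]
    d.body, d.staged, d.staged_by = d.staged, None, None

# Steps 6-8: before cleanup runs, another client reads every document.
visible_plain = sum(plain_get(docs, k) is not None for k in docs)
visible_mav = sum(mav_get(docs, atr_states, k) is not None for k in docs)
print(visible_plain, visible_mav)  # 90 100
```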

And no problem with the questions, keep them coming.
