CouchbaseSourceConnector getting a spurious document mutation event when existing document is updated using transactions

We are using the CouchbaseSourceConnector (4.0.6) under the Kafka Connect framework.

I have observed that when we update an existing Couchbase document inside a transaction, we receive two mutation events. The first event doesn’t contain the updated data. We seem to be getting it only because the CAS value was updated, probably because of the way transactions are handled in Couchbase. The second mutation event has the expected (updated) value in the document contents.

I would like to confirm my understanding that the first mutation event is caused by the way transactions are handled. I did read here that Couchbase updates the extended attributes of the document when staging the updates. Is that the reason why we are observing the document mutation event? Strangely, there was no data in the xattr map/dictionary of the document.

If my understanding is correct, is there any way to turn off receiving this spurious event? We are using Couchbase Server version 6.6.

Note that when I changed the code to not use transactions, I didn’t observe the spurious event. We just got a single mutation event as expected.

TIA.

Hi Pradeep,

What you are describing is the expected behavior.

There’s a couchbase.xattrs config property you can set to true if you want XATTRs to be visible to the connector. (This setting has no effect unless you’re writing a custom SourceHandler and inspecting the XATTRs yourself.)

Is there any way to turn off receiving this spurious event?

Unfortunately not. Hey @graham.pople, what would happen if the connector looked at the XATTRs and ignored documents with staged changes?

  1. Does a rollback remove the XATTR? If so, I wonder if a user would still see a duplicate message in case of a rollback.

  2. If a transaction does not complete, will the staged version eventually get deleted from the XATTRs?

Thanks,
David

Thanks for your reply, David.

I would like to know if there is any indicator in the XATTRs that can help us identify that these events are related to internal book-keeping for transaction handling, and that there is no real update to the document which is visible to an external client. We would like to filter such events. Without this, I think that many clients might have issues in their design if they rely on Couchbase to send events only when there is a real change in the document. By “real change”, I mean changes which are visible to external clients who are querying for the document in Couchbase.

Hi David,

This is what I can see in the xattrs:

{
    "txn": {
        "op": {
            "type": "replace",
            "stgd": {<updated_document_contents>},
            "crc32": "0x63dce684"
        },
        "id": {
            "atmpt": "e7769e8b-a7c7-4326-a70e-8367853559f1",
            "txn": "8e21eb03-9dc9-4fa8-96eb-783fb359fdb7"
        },
        "restore": {
            "CAS": "0x16be2c171c310000",
            "revid": "1",
            "exptime": 0
        },
        "atr": {
            "scp": "_default",
            "coll": "_default",
            "bkt": "mybucket",
            "id": "_txn:atr-893-#251"
        }
    }
}

Which field should I check to decide that this event can be ignored?
Should it be txn.op.stgd or txn.id.txn?
I would also like to know which of these fields clients can rely on as a stable contract, so that our filter stays backward compatible in the future.

Can you please let me know?

Thanks

Hi Pradeep,

Our transactions expert @graham.pople is on vacation this week. Hopefully he can offer some guidance when he returns.

Thanks,
David

Hi Pradeep and David

Catching up with the above questions:

Does a rollback remove the XATTR? If so, I wonder if a user would still see a duplicate message in case of a rollback.

Yes, that’s correct: rollback will remove the “txn” xattr from the document, and this will indeed trigger a DCP message. If the document was a staged insert, you’ll see a DCP delete when it’s staged and another delete when it’s rolled back. If it was a staged replace or remove, you’ll see DCP updates for both instead.
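
To make the rollback cases above easier to reason about in tests, here is a minimal sketch that records them as a lookup table. It only encodes what is stated in this reply (Couchbase Server 6.6 era behaviour). The names `ROLLBACK_DCP_EVENTS` and `rollback_events` are hypothetical and not part of any Couchbase SDK or the Kafka connector, and "mutation" is used here for what the reply calls a DCP update.

```python
from typing import Dict, List

# Assumption: DCP event kinds per document when a staged operation is rolled
# back, taken from the description in this thread. These are protocol
# internals and could change in later versions.
ROLLBACK_DCP_EVENTS: Dict[str, List[str]] = {
    "insert": ["deletion", "deletion"],   # delete when staged, delete on rollback
    "replace": ["mutation", "mutation"],  # update when staged, update on rollback
    "remove": ["mutation", "mutation"],   # update when staged, update on rollback
}


def rollback_events(staged_op: str) -> List[str]:
    """Return the DCP event kinds seen when `staged_op` is staged and then rolled back."""
    if staged_op not in ROLLBACK_DCP_EVENTS:
        raise ValueError(f"unknown staged operation: {staged_op!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(ROLLBACK_DCP_EVENTS[staged_op])


if __name__ == "__main__":
    for op in ("insert", "replace", "remove"):
        print(op, rollback_events(op))
```

Keeping this in one table means a future protocol change only needs to be reflected in one place.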

If a transaction does not complete, will the staged version eventually get deleted from the XATTRs?

Generally if the transaction fails then rollback will happen immediately. In some rare cases (such as application crash) that may not be able to happen. In this case the asynchronous cleanup (initiated when the Transactions object is created) will find this transaction (usually within a minute) and roll it back.
Since we’re at the DCP level and discussing very low-level transaction protocol internals here, there is one more important detail to understand: if the transaction attempt was in the Pending state (i.e. it hadn’t yet reached Committed or Aborted), then cleanup has no knowledge of the documents involved in the attempt. In this situation (to stress, it should be rare), the documents will be left with a “txn” xattr and no DCP event will be received for them. There is no negative impact from this beyond a small amount of ‘wasted’ space that will be recovered when the document is next involved in a transaction.

I would like to know if there is any indicator in the XATTRs that can help us identify that these events are related to internal book-keeping for transaction handling, and that there is no real update to the document which is visible to an external client. We would like to filter such events. Without this, I think that many clients might have issues in their design if they rely on Couchbase to send events only when there is a real change in the document. By “real change”, I mean changes which are visible to external clients who are querying for the document in Couchbase.

Well, an xattr change is a ‘real change’ by that definition, since the xattrs are visible to external clients, e.g. they can be queried with a Sub-Document lookup reading the “txn” xattr. But I do understand the broader request: that the transactional staging changes be hidden somehow from DCP consumers such as the Kafka connector, so that they would not be treated as a genuine change to the document. We do not currently have that functionality.
May I ask more about the use case? Is it from a performance standpoint?

Hi Graham,

Thanks for replying.

I was planning to check for the existence of the ‘txn’ xattr to decide whether to ignore these book-keeping events, but it seems like it will not be able to filter all the book-keeping events (especially for rollback).

rollback will remove the “txn” xattr from the document. And this will indeed trigger a DCP message.

For the rollback scenario, I guess the rollback event will just look like a regular mutation event without any extended attributes. Or do you think that there is a way to differentiate it from a regular mutation event which is triggered when a document is created in Couchbase?

In some rare cases (such as application crash) … if the transaction attempt was at Pending level (e.g. hadn’t yet reached Committed or Aborted), then the asynchronous cleanup … has no knowledge of the documents involved in the attempt. In this situation … the documents will be left with a “txn” xattr and no DCP event will be received for them. There is no negative impact from this beyond a small amount of ‘wasted’ space that will be recovered when the document is next involved in a transaction.

I would like to know whether the next mutation to the document, when it’s not part of a transaction, will have the ‘txn’ xattr cleared atomically when the document is updated, so that we don’t see additional DCP events other than the one for the document update.

Is it from a performance standpoint?

Our main problem is with test automation. We are not able to reliably know how many events we should expect in our assertions, especially because the book-keeping is Couchbase’s internal implementation and can change in the future.

We do not currently have that functionality.

Can I request a feature that marks all events (transactional or otherwise) with a reason-code xattr, together with an API that uses this reason code to tell us the category of a mutation event? Or maybe an alternative solution?

Correct, you’ll hit issues there with rollback, which will look like a normal update. And there are other potential problems. Suppose a document has transactional metadata changes staged, then the application crashes and cleanup is unable to roll back this document due to the Pending scenario described above. You don’t want to accidentally filter out any subsequent modifications to that document just because it has a “txn” xattr. So I don’t believe filtering with the currently available information is viable, at least not without tracking per-document state to try and disambiguate these cases. I think that would get very complex, and would require learning about and embedding a lot of details about current implementation internals that could change down the line.
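
As an illustration of the two failure modes just described, here is a small simulation of a naive "ignore anything carrying a txn xattr" rule. The event dictionaries, the `real_change` flag, and the function name are all hypothetical shapes invented for this sketch; they are not the connector's DocumentEvent or any real DCP structure.

```python
from typing import Any, Dict, List


def naive_is_bookkeeping(event: Dict[str, Any]) -> bool:
    """Naive rule: treat any event whose xattrs contain "txn" as book-keeping."""
    return "txn" in event.get("xattrs", {})


STAGED = {"txn": {"op": {"type": "replace"}}}

# Simulated events; `real_change` marks whether the document body visibly changed.
events: List[Dict[str, Any]] = [
    # Scenario 1: a replace is staged and then rolled back.
    {"label": "stage replace", "real_change": False, "xattrs": STAGED},
    {"label": "rollback", "real_change": False, "xattrs": {}},
    # Scenario 2: an attempt crashes while Pending, leaving a stale "txn" xattr,
    # and a later non-transactional update does not touch that xattr.
    {"label": "stage replace (attempt crashes)", "real_change": False, "xattrs": STAGED},
    {"label": "plain update, stale txn xattr", "real_change": True, "xattrs": STAGED},
    # Control: an ordinary update on a clean document.
    {"label": "plain update, clean document", "real_change": True, "xattrs": {}},
]

# An event is misclassified when the filter's verdict agrees with `real_change`:
# either a real change is dropped, or a book-keeping event leaks through.
misclassified = [e["label"] for e in events if naive_is_bookkeeping(e) == e["real_change"]]

if __name__ == "__main__":
    print(misclassified)
```

In this toy stream the rollback leaks through as if it were a real update, and the genuine update on the document with the stale xattr is wrongly dropped.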

For the rollback scenario, I guess the rollback event will just look like a regular mutation event without any extended attributes. Or do you think that there is a way to differentiate it from a regular mutation event which is triggered when a document is created in Couchbase?

The former is correct, and there’s no way to differentiate it just from the DCP packet.

I would like to know whether the next mutation to the document, when it’s not part of a transaction, will have the ‘txn’ xattr cleared atomically when the document is updated, so that we don’t see additional DCP events other than the one for the document update.

When the transaction commits the document it will atomically remove the xattr.

I think you may be talking about cases like those mentioned above, e.g. a crashed transaction in Pending state, and a non-transactional update later is made to that document? In this case, the non-transactional update will not touch the “txn” xattr. Nothing changes xattrs unless the mutation explicitly does so (though user xattrs are removed when a document is deleted). That’s by design - the xattrs are most often used for metadata purposes, so regular non-metadata changes to the document’s body shouldn’t accidentally modify the metadata.
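
The difference between the two paths can be pictured with a toy in-memory model. This is only a sketch of the behaviour described in this thread, not Couchbase's implementation: the `{"body": ..., "xattrs": ...}` document shape and the function names `plain_update` and `commit` are assumptions made for illustration, and the staged body is read from `txn.op.stgd` as in the xattr dump posted earlier.

```python
import copy
from typing import Any, Dict

Doc = Dict[str, Any]


def plain_update(doc: Doc, new_body: Any) -> Doc:
    """Non-transactional body update: xattrs are carried over untouched."""
    return {"body": new_body, "xattrs": copy.deepcopy(doc["xattrs"])}


def commit(doc: Doc) -> Doc:
    """Transactional commit: promote the staged body and drop "txn" in one step."""
    staged_body = doc["xattrs"]["txn"]["op"]["stgd"]
    remaining = {k: copy.deepcopy(v) for k, v in doc["xattrs"].items() if k != "txn"}
    return {"body": copy.deepcopy(staged_body), "xattrs": remaining}


if __name__ == "__main__":
    staged_doc = {
        "body": {"qty": 1},
        "xattrs": {"txn": {"op": {"type": "replace", "stgd": {"qty": 2}}}},
    }
    # A later plain update changes the body but leaves the stale "txn" xattr behind.
    print(plain_update(staged_doc, {"qty": 5}))
    # A commit replaces the body with the staged contents and removes "txn".
    print(commit(staged_doc))
```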

Our main problem is with test automation. We are not able to reliably know how many events we should expect in our assertions, especially because the book-keeping is Couchbase’s internal implementation and can change in the future.
Can I request a feature that marks all events (transactional or otherwise) with a reason-code xattr, together with an API that uses this reason code to tell us the category of a mutation event? Or maybe an alternative solution?

Ah ok, thanks for clarifying the use-case, which seems reasonable. We do already track this feature request internally, and there has been a good deal of prior discussion on it. Providing a solution isn’t as simple as it perhaps appears, with the interaction between filtering out events and DCP compaction a particularly thorny issue to resolve. So I don’t have a firm commitment or roadmap at this point, but please know that we are tracking it, and I’ll add this use-case and post to that context.

In the interim, I think the only workaround is to code knowledge of the current transactional internals into your tests. While this isn’t ideal, it is relatively safe, as there are certain transactional fundamentals that are extremely unlikely to change at the protocol level. For example, a document will very likely always need two mutations, one for staging and one for committing, unless we fundamentally change how and where we stage changes. And if we do introduce changes at the DCP level, then I think for backwards compatibility they would very likely be opt-in, and your tests would continue to work unchanged.
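
One way to apply this workaround is to keep the assumption about event counts in a single place in the test code. The sketch below is a hypothetical test helper, not connector or SDK code: `Write`, `expected_event_count`, and the two constants are names made up here, and the value 2 reflects only the "one mutation for staging, one for committing" behaviour described in this thread.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption from this thread: a committed transactional write produces two
# DCP events per document (staging + commit); a plain write produces one.
EVENTS_PER_TRANSACTIONAL_WRITE = 2
EVENTS_PER_PLAIN_WRITE = 1


@dataclass(frozen=True)
class Write:
    key: str
    transactional: bool


def expected_event_count(writes: Iterable[Write]) -> int:
    """Number of source events a test should wait for, given the writes it performed."""
    return sum(
        EVENTS_PER_TRANSACTIONAL_WRITE if w.transactional else EVENTS_PER_PLAIN_WRITE
        for w in writes
    )


if __name__ == "__main__":
    writes = [Write("order::1", transactional=True), Write("order::2", transactional=False)]
    print(expected_event_count(writes))  # 3
```

If the protocol ever changes, only the constants need updating rather than every assertion.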

Ok, Graham.

Thanks a lot for your prompt reply. 🙂