Cases when a transaction is not retried but it would succeed

Hello!

I am using the latest Java Couchbase transactions API and I have a couple of questions about it.
Here is a quote from the documentation (See link.):

But there are situations that cannot be resolved, and total failure is indicated to the application via errors. These situations include:

  • Any error thrown by your transaction lambda, either deliberately or through an application logic bug.
  • Attempting to insert a document that already exists.
  • Attempting to remove or replace a document that does not exist.
  • Calling ctx.get() on a document key that does not exist (if the resultant exception is not caught).

Once one of these errors occurs, the current attempt is irrevocably failed (though the transaction may retry the lambda to make a new attempt). It is not possible for the application to catch the failure and continue (with the exception of ctx.get() raising an error). Once a failure has occurred, all other operations tried in this attempt (including commit) will instantly fail.

First of all, I was confused about which operations will be retried in a new attempt, but based on my tests, the case of “Attempting to insert a document that already exists.” is not retried in a second attempt. (And I suspect the same for replace and remove.)

I think this should not be the case or at least we need a way to override this behavior.

Here is a use case:
I need to either create or replace a document in a transaction. In the lambda, I start with a get operation to check whether it exists. Then I select the appropriate operation. If multiple transactions do this at the same time and the document does not exist yet, only one will succeed; the rest will fail completely without any retry. A retry here would be completely fine, since in the second attempt the get operation would return the newly inserted value.

Based on this, I think the assumption that “Attempting to insert a document that already exists.” and “Attempting to remove or replace a document that does not exist.” cannot succeed on a subsequent attempt is not safe to make without analysing the user-provided lambda.

So either the API should retry automatically in these cases, or the user should be able to configure whether the retry should happen.

Hi @horvath-martin
Thanks for raising this. To answer your first question: yes, if you insert a document that already exists, it will cause a fast-fail (no retry).

As for remove and replace: the API requires you to do a ctx.get() before this. This will raise a catchable DocumentNotFoundException if the document doesn’t exist (in the new SDK-integrated version of the API you are using; in the previous transactions library there was a ctx.getOptional() instead for this purpose). If that exception propagates through the lambda then the transaction will fast-fail, but you can catch it and allow the transaction to continue.

So I think that addresses your use-case? You do something like:

cluster.transactions().run(ctx -> {
   try {
      var doc = ctx.get(collection, docId);
      ctx.replace(doc, docContent);
   }
   catch (DocumentNotFoundException e) {
      ctx.insert(collection, docId, docContent);
   }
});

Let me know if that’s not correct and we can iterate further.

By the way, on reading that section of the documentation I can see where some confusion can arise. I’ll get that tidied up.

Hi @graham.pople

Our use case is similar to the code you have provided. (I meant “getOptional”, not just “get”, in my example.)

Here is my problem (race condition on insert):

  • Procedure A calls the transaction → the document does not exist → attempts to insert
  • Procedure B calls the transaction → the document does not exist → attempts to insert
  • Either A or B will succeed with the insertion and commit the transaction
  • The procedure that did not succeed will fast-fail.

In this case the fast-fail is undesirable, since the lambda provided by the user is entirely capable of handling the subsequent attempt.

The desired execution would be:

  • The one Procedure that did not succeed fails, but a new attempt is triggered
  • On the next attempt the document is found and replace is called
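The desired flow above can be illustrated without Couchbase at all. The following is only a sketch under stated assumptions: an in-memory ConcurrentHashMap stands in for the collection, DocumentExistsException is a local stand-in rather than the SDK class, and the attempt method with its betweenReadAndWrite hook is a hypothetical device for forcing the interleaving deterministically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InsertRaceSketch {
    // Local stand-in for the SDK exception of the same name; not the real class.
    static final class DocumentExistsException extends RuntimeException {}

    // In-memory stand-in for a collection.
    static final Map<String, String> store = new ConcurrentHashMap<>();

    // One attempt of the "get, then insert or replace" lambda.
    // betweenReadAndWrite is a hypothetical hook that lets the demo
    // interleave a competing writer after the existence check.
    static String attempt(String id, String content, Runnable betweenReadAndWrite) {
        boolean exists = store.containsKey(id);
        betweenReadAndWrite.run();
        if (exists) {
            store.put(id, content);
            return "replace";
        }
        if (store.putIfAbsent(id, content) != null) {
            throw new DocumentExistsException();
        }
        return "insert";
    }

    // Procedure B loses the race on its first attempt, then makes a new attempt.
    static String demo() {
        store.clear();
        try {
            // Procedure A inserts after B has seen that the document is missing.
            attempt("doc", "from B", () -> store.put("doc", "from A"));
            return "unexpected: first attempt succeeded";
        } catch (DocumentExistsException e) {
            // Desired behaviour: a new attempt, which now finds the document.
            return attempt("doc", "from B", () -> {}) + " -> " + store.get("doc");
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this simulation the second attempt takes the replace branch, which is the outcome the bullets above describe.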

Something like this would solve this:

try {
   ctx.insert(collection, docId, docContent);
} catch (DocumentExistsException e) {
   // Hypothetical exception type that would trigger a new attempt instead of a fast-fail
   throw new ExceptionThatWillNotCauseFastFail(e);
}

But insert throws TransactionFailedException and I think even if I handle that exception the transaction will not retry anymore.

Ah, I understand now.

But insert throws TransactionFailedException and I think even if I handle that exception the transaction will not retry anymore.

Yes that’s correct: internal state has been set at that point ensuring that the transaction fast fails regardless of how that exception is handled.

The good news I can give you is that we do already intend for ctx.insert() to raise a catchable DocumentExistsException, in future. It is not currently on the formal roadmap, but it isn’t a huge piece of work - let me see what I can do.

In the meantime, as a workaround, you can retry the transaction if it failed on a DocumentExistsException. Something like:

try {
   transactions.run(ctx -> {
      /* ... */
   });
}
catch (TransactionCommitAmbiguous err) {
   /* omitted for brevity */
}
catch (TransactionFailedException err) {
   if (err.getCause() instanceof DocumentExistsException) {
       // perform the transaction again
   }
}
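The "perform the transaction again" step could be wrapped in a small bounded retry loop. This is a minimal sketch, not an SDK feature: TransactionFailedException and DocumentExistsException are declared locally as stand-ins so the example compiles without the Couchbase SDK, and runWithRetry and maxAttempts are hypothetical names. In real code the Supplier would wrap the call to transactions.run(...).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class RetryOnDocumentExists {
    // Local stand-ins for the exception types named in the thread,
    // declared here only so the sketch compiles without the Couchbase SDK.
    static class DocumentExistsException extends RuntimeException {}
    static class TransactionFailedException extends RuntimeException {
        TransactionFailedException(Throwable cause) { super(cause); }
    }

    // Hypothetical helper: runs the transaction up to maxAttempts times, retrying only
    // when it fails with a TransactionFailedException caused by DocumentExistsException.
    static <T> T runWithRetry(Supplier<T> transaction, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return transaction.get();
            } catch (TransactionFailedException err) {
                boolean retryable = err.getCause() instanceof DocumentExistsException;
                if (!retryable || attempt >= maxAttempts) {
                    throw err; // give up: different cause, or attempts exhausted
                }
                // Otherwise loop and run the whole transaction again.
            }
        }
    }

    // Simulated transaction: fails once on DocumentExistsException, then succeeds.
    static String demo() {
        AtomicInteger calls = new AtomicInteger();
        return runWithRetry(() -> {
            if (calls.incrementAndGet() == 1) {
                throw new TransactionFailedException(new DocumentExistsException());
            }
            return "committed on attempt " + calls.get();
        }, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Bounding the number of attempts keeps a persistent conflict from looping forever; any failure with a different cause is rethrown unchanged.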

Thanks for the help! I am happy to hear that the DocumentExistsException will eventually be catchable. Based on your proposed workaround I have managed to create a temporary fix as well.


Hi @horvath-martin
After further discussion with the team, I’m now aiming to get this into the next minor Java SDK release (3.4) later this year. We want to do the change in the next minor version as it’s technically a behaviour change. I’ll update here once it’s released.
Note this change, since it’s a feature/improvement, will go into the SDK version of transactions and not the transactions library you are currently using, as that is now in maintenance mode (bug fixes only). It’s only a small amount of work to port to the SDK version though - please see the migration guide for details.