Caveats with transaction cleanup

The doc says it performs a scan to find transaction documents and to clean them up.

Does that mean the user must have permission to scan this collection if it’s not the default? And what if it is the default?

In the case of an application running on lambda functions, there will be no running application to perform the cleanup process.

Would it be possible to have a blocking function to do that? Then a lambda could be spawned with that sole purpose and die when it’s done.

Hi @JesusTheHun

Does that mean the user must have permission to scan this collection if it’s not the default? And what if it is the default?

The way it works (in all SDKs bar .NET) is that, with default settings, the SDK tracks the transactions going through it (that is, through that Cluster instance) and adds the metadata collection successfully used for each transaction to a cleanup set. So yes, the user needs RBAC permissions on that collection, and in all but extremely unusual cases the user will have those permissions, because they have already successfully performed a transaction on that collection with the same SDK/Cluster.
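To make the "cleanup set" idea concrete, here is a toy model in Python. It is not SDK code: `Keyspace` and `CleanupSetModel` are hypothetical names invented for illustration, and the only behaviour modelled is the one described above (a metadata collection is remembered once a transaction has successfully used it).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Keyspace:
    """Hypothetical stand-in for a bucket/scope/collection triple."""
    bucket: str
    scope: str = "_default"
    collection: str = "_default"


@dataclass
class CleanupSetModel:
    """Toy model of the set of metadata collections that cleanup will cover."""
    collections: set = field(default_factory=set)

    def record_transaction(self, metadata_collection: Keyspace, succeeded: bool) -> None:
        # Only a collection that a transaction used successfully is remembered.
        if succeeded:
            self.collections.add(metadata_collection)


model = CleanupSetModel()
model.record_transaction(Keyspace("travel"), succeeded=True)
model.record_transaction(Keyspace("travel"), succeeded=True)  # same collection, stored once
model.record_transaction(Keyspace("orders", "app", "txn_meta"), succeeded=False)
```

After these calls the model holds a single entry, the default collection of `travel`, which is also why the permissions question rarely bites: a collection only gets into the set after the same user has already written to it.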

In the case of an application running on lambda functions, there will be no running application to perform the cleanup process.

That’s correct. Our recommendation to lambda users is to have a dedicated process that performs the cleanup, and to specify to this process the collections that need cleaning up.
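As a rough sketch of what such a dedicated process might look like in Python: it opens one connection, then simply stays alive so the SDK's background cleanup can do its work. The `connect` argument is a hypothetical factory that you would implement yourself with your SDK's cluster and transaction cleanup options; nothing here is an actual SDK call.

```python
import threading
from typing import Any, Callable, Iterable, List, Optional


def run_cleanup_process(
    collections: Iterable[str],
    connect: Callable[[List[str]], Any],
    stop: Optional[threading.Event] = None,
) -> None:
    """Hold one SDK connection open so its background cleanup can run.

    `connect` is a hypothetical factory: it should open a connection whose
    transaction cleanup is configured with the given collections, and return
    an object that has a close() method.
    """
    if stop is None:
        stop = threading.Event()
    cluster = connect(list(collections))
    try:
        # Nothing else to do here: cleanup happens inside the SDK
        # for as long as this process stays alive.
        stop.wait()
    finally:
        cluster.close()
```

In a real deployment you would set the `stop` event from a signal handler (for example on SIGTERM) so the process shuts down cleanly.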

We do have longer-term solutions on the roadmap to better support lambdas (and other tricky cases such as short-lived tests), but I can’t say too much more on that currently.

Would it be possible to have a blocking function to do that? Then a lambda could be spawned with that sole purpose and die when it’s done.

We have considered this option, and it’s not totally off the table, but the issue is that instead of amortising the cleanup scan cost over time, it would run the scan on every lambda invocation.

The longer-term solution alluded to above will be a nicer approach.


Including the range scan permission? Read/write, yes of course, but since the term used in the doc is « scan », I’m wondering.

So have a long-running process that will do nothing else but the cleanup. That’s basically a cleanup app :sweat_smile:

That cleanup lambda invocation could be scheduled on an appropriate interval.
AFAIU cleanup is only needed to remove tiny documents; it’s not blocking anything, is it? Or does it prevent new transactions from being performed on the documents involved in the transaction that needs cleanup?

Including the range scan permission? Read/write, yes of course, but since the term used in the doc is « scan », I’m wondering.

Ah I can see why you may think that, but no, just KV read and write permissions.

So have a long-running process that will do nothing else but the cleanup. That’s basically a cleanup app :sweat_smile:

Yes, you can certainly call it that. We do have longer-term plans to improve the experience in this regard for our lambda users, and this request has come up a few times. So I hope that we will have a better story for you in the future, but for now, a dedicated cleanup app is required.

That cleanup lambda invocation could be scheduled on an appropriate interval.
AFAIU cleanup is only needed to remove tiny documents; it’s not blocking anything, is it? Or does it prevent new transactions from being performed on the documents involved in the transaction that needs cleanup?

Usually cleanup will have nothing to do at all. At the point your transactions().run() call returns, we aim to have completely cleaned up the transaction, including removing any of the metadata associated with it.

Cleanup is for the cases where things go wrong - it’s intended to be a robust ‘backstop’ to cover any corner case, such as if your application crashes, for instance, or the transaction times out mid-commit. So we should be talking about fairly rare situations.

Without wanting to reveal too many internal details (as we need flexibility to make future changes at the protocol level), cleanup currently works by looking for expired transactions via reading a distributed index, and if it finds any, picking up wherever the app left off. Usually it will find these ‘lost’ transactions within 60 seconds of them expiring - that is configurable and can be tuned to be more aggressive, at the cost of a slightly higher rate of reads (to locate the transactions).
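The trade-off in tuning the window can be illustrated with a deliberately simplified Python model. It assumes, purely for illustration, that the index is just a mapping from transaction id to expiry time and that one full check is made per cleanup window; the real protocol is not described here and may work differently.

```python
from typing import Dict, List


def find_expired(index: Dict[str, float], now: float) -> List[str]:
    """Return the ids of transactions whose expiry time has passed."""
    return sorted(txn_id for txn_id, expires_at in index.items() if expires_at <= now)


def checks_per_hour(window_seconds: float) -> float:
    """Index checks per hour, assuming one full check per cleanup window."""
    return 3600.0 / window_seconds


index = {"txn-a": 100.0, "txn-b": 175.0}  # hypothetical id -> expiry time (seconds)
lost = find_expired(index, now=160.0)      # only txn-a has expired at t=160
default_rate = checks_per_hour(60)         # 60-second window
aggressive_rate = checks_per_hour(30)      # halving the window doubles the checks
```

Under these assumptions, a shorter window finds lost transactions sooner, and the number of checks (and so reads) grows in inverse proportion to the window.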

And yes, in the rare cases of a ‘lost’ transaction, and only if that lost transaction had reached the commit point, then any docs it has staged will be locked from transactional writes (not reads) until cleanup finds them. So having cleanup running is a critical part of the infra.

Hopefully that also explains why it’s best to have cleanup running continuously, rather than on a schedule.

I am concerned that by giving all these details I’m implying that users need to worry about them… For most users, the default cleanup settings should work well out-of-the-box, and require no special tuning.

The exception is users of short-lived applications, such as lambdas and tests. They do require the workaround given above, at present.
