Memory leak in Couchbase .NET SDK

@thinh-ng

I’ve dug into this a bit deeper, and here’s what I discovered.

First, the very large chain of CallbackNode instances is due to the way they are stored on the CancellationTokenSource. In order to avoid array resizing costs, CallbackNodes are stored as a linked list. This makes it fast to call CancellationToken.Register to add a callback to be triggered when the token is canceled. Therefore, we can infer that the issue is that a large number of callbacks are being registered on the same CancellationToken.

The next item of interest is, in fact, the link you provided above to CancellationTokenPair. I had originally looked at that and dismissed it, based on my comment there that a Dispose is not required because we’re never setting a timer using CancelAfter. And that is true for a plain CancellationTokenSource. However, digging into the internals a bit more, I found that calling CreateLinkedTokenSource with two tokens that are both cancelable (CanBeCanceled == true, i.e. not CancellationToken.None) doesn’t create a plain CancellationTokenSource. It creates a Linked2CancellationTokenSource, an internal .NET class that adds some additional behaviors.

When this type is created, it registers callbacks on the source tokens (adding CallbackNode instances to the linked list mentioned before). When disposed, it also disposes of those registrations (removing CallbackNode instances from the linked list). So, in summary, I believe you are correct and we DO need to dispose of our CancellationTokenPair in cases where it’s linking two tokens.
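To illustrate the pattern described above, here is a minimal sketch. It is not the SDK's actual code or fix; LinkedTokenDemo and RunOperations are hypothetical names. Each iteration links a long-lived token with a per-operation token and disposes the linked source at the end of the iteration, which is what releases the registrations on the long-lived token.

```csharp
using System;
using System.Threading;

public static class LinkedTokenDemo
{
    // Hypothetical helper: simulates `count` operations that each link the
    // caller's long-lived token with a short-lived per-operation token.
    public static int RunOperations(CancellationToken longLived, int count)
    {
        var completed = 0;
        for (var i = 0; i < count; i++)
        {
            using var perOperation = new CancellationTokenSource();

            // When both tokens can be canceled, the linked source registers a
            // callback on each of them. The `using` declaration disposes the
            // linked source at the end of every loop iteration, which removes
            // those registrations again.
            using var linked = CancellationTokenSource.CreateLinkedTokenSource(
                longLived, perOperation.Token);

            linked.Token.ThrowIfCancellationRequested();
            completed++;
        }
        return completed;
    }
}
```

Without the `using` on the linked source, each iteration would leave a registration behind on the long-lived token for as long as that token's source stays alive.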

I’ve created an issue to track this.

For now, the workaround is to avoid using a long-lived CancellationToken on calls to Key/Value operations on ICouchbaseCollection. You can either not pass the CancellationToken at all, or create your own short-lived CancellationTokenSource for each message on your service bus via using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken); and pass cts.Token instead.
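As a rough sketch of that second option, assuming a per-message handler: MessageHandler, HandleMessageAsync, and the keyValueOperation delegate are hypothetical stand-ins (the delegate represents whatever Key/Value call you make on ICouchbaseCollection), so adapt the shape to your own code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class MessageHandler
{
    // keyValueOperation is a hypothetical stand-in for a Key/Value call on
    // ICouchbaseCollection that accepts a CancellationToken.
    public static async Task HandleMessageAsync(
        Func<CancellationToken, Task> keyValueOperation,
        CancellationToken serviceToken)
    {
        // Short-lived source scoped to this one message. It still observes
        // cancellation of the long-lived service token, but anything that
        // registers on cts.Token is attached to this source rather than to
        // the long-lived one.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(serviceToken);

        await keyValueOperation(cts.Token).ConfigureAwait(false);
    }
}
```

Disposing cts when the handler returns removes its own registration on the service token, so nothing should accumulate there from one message to the next.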

I’m also not 100% sure why you still saw leaks when you didn’t pass the token. It’s possible that some other internal background tasks within the SDK are triggering this leak more slowly. However, I’m not seeing the same symptoms in your later memory dump; that one looks like it may be related to the OrphanTraceListener and OrphanReporter instead. I think we should probably treat that as a separate issue and return to it after the first, more significant issue is resolved.