I’ve been trying to track down a memory leak in a .NET Core 3.1 app of ours that uses the latest versions of the Couchbase SDK and DI packages. I have been looking at this from a variety of angles. Most recently, I took dumps with dotnet-gcdump: at startup, the app has 2MB associated with Couchbase.Core.IO.Connections.MultiplexingConnection (the highest memory use of any object in the application). After the application has been running for an hour, that same object accounts for just over 100MB (by far the highest in the app). Is anyone else experiencing this sort of issue?
@justusweber What size documents are you storing in the Couchbase cluster? For performance reasons, each MultiplexingConnection keeps a buffer. It starts small and grows as necessary to the size of the largest document received (max is 20MB per connection). At least one MultiplexingConnection is created per Couchbase node, and the number per node scales based on demand (there are settings to control the min and max).
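As a rough illustration of the sizing described above, here is a minimal sketch of the upper bound on buffer memory if each connection's buffer grows to the largest document received, capped at 20MB. The node count, connections per node, and document size are hypothetical inputs, not values from this thread or SDK defaults, and the helper name is made up for the example.

```csharp
using System;

public static class BufferEstimate
{
    // Upper bound on buffer memory across all connections, assuming every
    // connection's buffer has grown to the largest document it could receive.
    public static long WorstCaseBytes(int nodes, int maxConnectionsPerNode, long largestDocumentBytes)
    {
        // 20MB per-connection cap, as stated in the post above.
        const long MaxBufferBytes = 20L * 1024 * 1024;
        long perConnection = Math.Min(largestDocumentBytes, MaxBufferBytes);
        return nodes * maxConnectionsPerNode * perConnection;
    }

    public static void Main()
    {
        // Hypothetical cluster: 3 nodes, up to 5 connections each, 1MB documents.
        long bytes = WorstCaseBytes(nodes: 3, maxConnectionsPerNode: 5, largestDocumentBytes: 1L * 1024 * 1024);
        Console.WriteLine($"{bytes / (1024 * 1024)} MB"); // prints "15 MB"
    }
}
```

If the memory attributed to MultiplexingConnection is far above a bound like this for your cluster, buffer growth alone probably does not explain it.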
Thanks @btburnett3. Our documents vary in size, but are consistently less than 1MB. Is there a way to get some insight into what MultiplexingConnection is doing (how many connections, the documents associated with the current cached buffer state, etc.)?
@justusweber You should be able to turn up debug logging to get some details about when it scales connections up or down. If you’re able to take a dotMemory snapshot and send it, I’d be glad to look at it. It can probably tell me some stuff as well.
The MultiplexingConnection is a wrapper around a single Socket that handles sending and receiving using a binary format. The rest of what @btburnett3 said above is true. Based upon what you’re describing, though, it does indeed sound like a possible memory leak.
There really isn’t a way to get a snapshot of the current state, but it’s something that can be added to help troubleshoot issues. I created a ticket for tracking this. We’ll investigate the memory issue, let you know what we find, and create another Jira ticket if needed.
One last thing: if it’s indeed a memory leak, the memory will continue to grow until you get an OutOfMemoryException. Have you experienced this yet? Or does the memory profile eventually flatten out (slope of 0)?
The memory profile does not flatten out. As a test we bumped up the RAM on the VM, and it will clearly trend toward consuming all available memory over time.
The issue appears to be related to ThresholdTraceLogger leaking activities. In a memory dump, there were 7,243,541 instances of Activity+KeyValueListNode retaining a total of almost 500MB of RAM, though there were only 729 Activity objects.
It appears that all of this memory is retained by just one Activity object. Looking at the object in question, it appears to have a collection of tags that repeats values over and over again for last_remote_address, last_operation_id, etc. It feels like we have one activity somewhere that is getting used over and over again, continually getting tags added to it. The OperationName of the root activity is Microsoft.AspNetCore.Hosting.HttpRequestIn, making me think this may be related to the first request triggering bootstrap?
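To illustrate the mechanism suspected above, here is a minimal sketch of how a single long-lived Activity accumulates tag nodes when the same keys are added repeatedly. This is not the SDK's code: the tag names come from the dump described above, while the address value, the loop, and the helper name are made up for the example. It relies on `Activity.AddTag` appending a new entry on every call rather than replacing an existing key.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ActivityTagGrowth
{
    // Adds the same two tag keys once per simulated operation to one shared
    // Activity and returns how many tag entries it ends up holding.
    public static int TagCountAfter(int operations)
    {
        var root = new Activity("Microsoft.AspNetCore.Hosting.HttpRequestIn");
        root.Start();
        try
        {
            for (int i = 0; i < operations; i++)
            {
                // AddTag does not deduplicate keys, so each call adds a node.
                root.AddTag("last_operation_id", i.ToString());
                root.AddTag("last_remote_address", "10.0.0.1:11210"); // hypothetical value
            }
            return root.Tags.Count();
        }
        finally
        {
            root.Stop();
        }
    }

    public static void Main()
    {
        Console.WriteLine(TagCountAfter(1000)); // prints 2000
    }
}
```

If something holds a reference to an activity like this for the life of the process, the tag list never gets collected, which would match millions of KeyValueListNode instances hanging off a single Activity.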
Issue filed: https://issues.couchbase.com/browse/NCBC-2762