Java SDK memory leak with a bad cluster

We found a memory leak in Java SDK 2.1.6. It only happens under specific conditions, namely when the cluster itself is having problems.
Preconditions:

  1. We created cluster_1 with 3 nodes.
  2. The 3rd node has a problem with HDD capacity - the disk is full.
  3. On the client we occasionally get the error com.couchbase.client.java.error.CouchbaseOutOfMemoryException, but there is no memory leak at this point.
  4. We have a Couchbase client (Java SDK) which periodically reconnects to the Couchbase cluster.

Conditions:
  5. We start XDCR replication from an external Couchbase cluster_2 to our problem cluster_1.
  6. On Couchbase cluster_1 we see messages like these:

[14:38:25] - Approaching full disk warning. Usage of disk “F:” on node “172.20.112.93” is around 91%.
[14:38:25] - Approaching full disk warning. Usage of disk “F:” on node “srv3” is around 100%.
[14:38:25] - Hard Out Of Memory Error. Bucket “ufm_2” on node srv3 is full. All memory allocated to this bucket is used for metadata.

  7. And now we can see that the following classes leak instances:
    com.couchbase.client.deps.io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
    com.couchbase.client.core.ResponseEvent
    com.couchbase.client.core.RequestEvent

Example code that models the problem:

package com.ufm.api;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
import org.junit.Test;

public class MemoryLeakTest {

    @Test
    public void testCBMemoryLeak() throws Exception {
        // Endlessly open and close a connection to the problem cluster;
        // heap usage grows over time while the cluster is in the bad state.
        while (true) {
            ConnectionContainer connectionContainer = getConnectionContainer();
            Thread.sleep(200);
            closeConnection(connectionContainer);
            Thread.sleep(200);
        }
    }

    private void closeConnection(ConnectionContainer connectionContainer) {
        connectionContainer.getBucket().close();
        connectionContainer.getCouchbaseCluster().disconnect();
        // Note: the CouchbaseEnvironment created for this connection is not shut down here.
    }

    private ConnectionContainer getConnectionContainer() {
        ConnectionContainer connectionContainer = new ConnectionContainer();

        // A fresh environment is built for every connect/disconnect cycle.
        CouchbaseEnvironment couchbaseEnvironment = DefaultCouchbaseEnvironment.builder()
                .kvTimeout(5000L)
                .build();
        connectionContainer.setCouchbaseEnvironment(couchbaseEnvironment);

        CouchbaseCluster cluster = CouchbaseCluster.create(couchbaseEnvironment, "http://srv3:8091");
        connectionContainer.setCouchbaseCluster(cluster);

        Bucket bucket = cluster.openBucket("ufm_2", "1111");
        connectionContainer.setBucket(bucket);
        return connectionContainer;
    }

    // Simple holder for the environment, cluster and bucket of one connection.
    private class ConnectionContainer {
        private CouchbaseEnvironment couchbaseEnvironment;
        private CouchbaseCluster couchbaseCluster;
        private Bucket bucket;

        public CouchbaseEnvironment getCouchbaseEnvironment() {
            return couchbaseEnvironment;
        }

        public void setCouchbaseEnvironment(CouchbaseEnvironment couchbaseEnvironment) {
            this.couchbaseEnvironment = couchbaseEnvironment;
        }

        public CouchbaseCluster getCouchbaseCluster() {
            return couchbaseCluster;
        }

        public void setCouchbaseCluster(CouchbaseCluster couchbaseCluster) {
            this.couchbaseCluster = couchbaseCluster;
        }

        public Bucket getBucket() {
            return bucket;
        }

        public void setBucket(Bucket bucket) {
            this.bucket = bucket;
        }
    }
}

Hi @dpozhidaev, sorry for the late answer…
This looks similar to something reported in a Netty bug: io.netty.channel.ChannelOutboundBuffer$Entry weird behaviour · Issue #4134 · netty/netty · GitHub

There is a possible workaround, but it would require you to upgrade the SDK from 2.1.6 to 2.2.7 (the latest in the current series), because it requires a newer version of Netty.

The workaround is to disable Netty's object pooling (the recycler) by starting the JVM with the following option:

-Dcom.couchbase.client.deps.io.netty.recycler.maxCapacity=0

This will, however, produce more GC pressure. The upstream Netty bug and its impact on the SDK are tracked in our own ticket, JCBC-951.
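If changing the JVM startup flags is inconvenient, the same property can in principle be set programmatically, but (as far as I know) the shaded Netty recycler reads it in a static initializer, so it has to be set before any Couchbase class is touched. A rough sketch - the class name is just an example, the -D flag remains the safer route:

// Must run before any com.couchbase.client.* class (and hence the shaded
// Netty Recycler) is loaded, otherwise the property is read too late.
public final class DisableNettyRecycler {
    public static void main(String[] args) throws Exception {
        System.setProperty("com.couchbase.client.deps.io.netty.recycler.maxCapacity", "0");
        // ... only now bootstrap the CouchbaseEnvironment / CouchbaseCluster ...
    }
}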

:warning: Keep in mind there have been a few behavioral changes in 2.2.0 (most notably, in the async API no request is triggered until you call subscribe(...) on the Observable). See the release notes for the 2.2.x series.
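For illustration, a rough sketch against a Bucket reference like the one in your test (the key is made up):

import rx.Observable;
import com.couchbase.client.java.document.JsonDocument;

// In 2.2.x the async API returns a cold Observable: nothing is sent to the
// server until subscribe(...) is called on it.
Observable<JsonDocument> pending = bucket.async().get("some-key"); // no request triggered yet
pending.subscribe(doc -> System.out.println(doc.content()));       // the get is sent here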

Thank you.
We moved to the newest SDK (2.2.7), but it didn't help. Thank you for the advice about maxCapacity - I will try it.

@dpozhidaev did setting the capacity help? I saw that there have been some upstream changes to the Netty recycler which we can pick up in later Java SDK releases once they ship…