Couchbase Java SDK - Warning During Initialisation

Hi All,

I am getting the following warning during the initialization of my Spring Boot application.

2025-06-19 11:07:29:366 [main] DEBUG c.c.c.c.d.i.n.u.i.PlatformDependent0 - [MessageId: ] - [TraceId: ] - -Dio.netty.noUnsafe: false
2025-06-19 11:07:29:367 [main] DEBUG c.c.c.c.d.i.n.u.i.PlatformDependent0 - [MessageId: ] - [TraceId: ] - Java version: 21
2025-06-19 11:07:29:369 [main] DEBUG c.c.c.c.d.i.n.u.i.PlatformDependent0 - [MessageId: ] - [TraceId: ] - sun.misc.Unsafe.theUnsafe: available
2025-06-19 11:07:29:370 [main] DEBUG c.c.c.c.d.i.n.u.i.PlatformDependent0 - [MessageId: ] - [TraceId: ] - sun.misc.Unsafe.copyMemory: available
2025-06-19 11:07:29:370 [main] DEBUG c.c.c.c.d.i.n.u.i.PlatformDependent0 - [MessageId: ] - [TraceId: ] - sun.misc.Unsafe.storeFence: available
2025-06-19 11:07:29:371 [main] DEBUG c.c.c.c.d.i.n.u.i.PlatformDependent0 - [MessageId: ] - [TraceId: ] - java.nio.Buffer.address: available

2025-06-19 11:07:29:377 [main] DEBUG c.c.c.c.d.i.n.u.i.PlatformDependent0 - [MessageId: ] - [TraceId: ] - direct buffer constructor: unavailable
java.lang.UnsupportedOperationException: Reflective setAccessible(true) disabled
	at com.couchbase.client.core.deps.io.netty.util.internal.ReflectionUtil.trySetAccessible(ReflectionUtil.java:31)
	at com.couchbase.client.core.deps.io.netty.util.internal.PlatformDependent0$5.run(PlatformDependent0.java:293)
	at java.base/java.security.AccessController.doPrivileged(Unknown Source)


2025-06-19 11:07:29:382 [main] DEBUG c.c.c.c.d.i.n.u.i.PlatformDependent0 - [MessageId: ] - [TraceId: ] - jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
java.lang.IllegalAccessException: class com.couchbase.client.core.deps.io.netty.util.internal.PlatformDependent0$7 cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module @150fc7a7
	at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Unknown Source)
	at java.base/java.lang.reflect.AccessibleObject.checkAccess(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at com.couchbase.client.core.deps.io.netty.util.internal.PlatformDependent0$7.run(PlatformDependent0.java:429)

Using JDK 21
Spring Boot 3.3.5
Couchbase libs

Our server configuration is:

  1. Writes go to a cluster (20 servers) in region A
  2. Reads come from a cluster (20 servers) in region B
  3. XDCR replicates from region A to region B

We are getting a lot of over-threshold (OT) event logs, as we have set the OT limit for KV operations to 50 ms.
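
For reference, a threshold like this is typically set via the ClusterEnvironment's threshold logging tracer config; below is a minimal sketch (builder and method names as I recall them from the Java SDK 3.x API, and the connection string, credentials and host are placeholders - please verify against your SDK version):

import java.time.Duration;

import com.couchbase.client.core.env.ThresholdLoggingTracerConfig;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ClusterOptions;
import com.couchbase.client.java.env.ClusterEnvironment;

public class TracerThresholdExample {
    public static void main(String[] args) {
        // Emit over-threshold reports for KV operations slower than 50 ms (the SDK default is 500 ms)
        ClusterEnvironment env = ClusterEnvironment.builder()
                .thresholdLoggingTracerConfig(ThresholdLoggingTracerConfig.builder()
                        .kvThreshold(Duration.ofMillis(50)))
                .build();

        // Placeholder connection string and credentials
        Cluster cluster = Cluster.connect("couchbase://region-b-node1.org",
                ClusterOptions.clusterOptions("user", "password").environment(env));
    }
}
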
Sample metrics

{
    "kv":
    {
        "top_requests":
        [
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70428,
                "last_remote_socket": "server1.org:11210",
                "last_local_id": "3267275900000001/0000000063FC5FD7",
                "total_dispatch_duration_us": 70428,
                "total_server_duration_us": 23,
                "operation_id": "0x51f04f4",
                "timeout_ms": 2500,
                "last_server_duration_us": 23,
                "total_duration_us": 70452
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70388,
                "last_remote_socket": "server2.org:11210",
                "last_local_id": "3267275900000001/00000000B94BA63B",
                "total_dispatch_duration_us": 70388,
                "total_server_duration_us": 8,
                "operation_id": "0x51f04ff",
                "timeout_ms": 2500,
                "last_server_duration_us": 8,
                "total_duration_us": 70408
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70388,
                "last_remote_socket": "server2.org:11210",
                "last_local_id": "3267275900000001/00000000B94BA63B",
                "total_dispatch_duration_us": 70388,
                "total_server_duration_us": 8,
                "operation_id": "0x51f04fd",
                "timeout_ms": 2500,
                "last_server_duration_us": 8,
                "total_duration_us": 70400
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70370,
                "last_remote_socket": "server2.org:11210",
                "last_local_id": "3267275900000001/00000000B94BA63B",
                "total_dispatch_duration_us": 70370,
                "total_server_duration_us": 15,
                "operation_id": "0x51f04fb",
                "timeout_ms": 2500,
                "last_server_duration_us": 15,
                "total_duration_us": 70391
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70376,
                "last_remote_socket": "server1.org:11210",
                "last_local_id": "3267275900000001/0000000063FC5FD7",
                "total_dispatch_duration_us": 70376,
                "total_server_duration_us": 6,
                "operation_id": "0x51f0504",
                "timeout_ms": 2500,
                "last_server_duration_us": 6,
                "total_duration_us": 70385
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70366,
                "last_remote_socket": "server1.org:11210",
                "last_local_id": "3267275900000001/0000000063FC5FD7",
                "total_dispatch_duration_us": 70366,
                "total_server_duration_us": 8,
                "operation_id": "0x51f0501",
                "timeout_ms": 2500,
                "last_server_duration_us": 8,
                "total_duration_us": 70380
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70329,
                "last_remote_socket": "server3.org:11210",
                "last_local_id": "3267275900000001/000000008BA89B27",
                "total_dispatch_duration_us": 70329,
                "total_server_duration_us": 15,
                "operation_id": "0x51f04f9",
                "timeout_ms": 2500,
                "last_server_duration_us": 15,
                "total_duration_us": 70351
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70308,
                "last_remote_socket": "server4.org:11210",
                "last_local_id": "3267275900000001/0000000058D1B6E3",
                "total_dispatch_duration_us": 70308,
                "total_server_duration_us": 11,
                "operation_id": "0x51f04f7",
                "timeout_ms": 2500,
                "last_server_duration_us": 11,
                "total_duration_us": 70343
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70330,
                "last_remote_socket": "server5.org:11210",
                "last_local_id": "3267275900000001/00000000717D8861",
                "total_dispatch_duration_us": 70330,
                "total_server_duration_us": 32,
                "operation_id": "0x51f04fc",
                "timeout_ms": 2500,
                "last_server_duration_us": 32,
                "total_duration_us": 70342
            },
            {
                "operation_name": "get",
                "last_dispatch_duration_us": 70318,
                "last_remote_socket": "server2.org:11210",
                "last_local_id": "3267275900000001/00000000B94BA63B",
                "total_dispatch_duration_us": 70318,
                "total_server_duration_us": 8,
                "operation_id": "0x51f0510",
                "timeout_ms": 2500,
                "last_server_duration_us": 8,
                "total_duration_us": 70327
            }
        ],
        "total_count": 10
    }
}

Could you please help me identify the root cause?

Thanks

It’s because you’ve set the threshold to a small value. Don’t set it.
I suspect all those OverThreshold events are for the first operations and that it is taking 70ms to establish the connection.

see https://stackoverflow.com/questions/57885828/netty-cannot-access-class-jdk-internal-misc-unsafe
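
For reference, the workaround in that post boils down to granting Netty the reflective access it probes for, via JVM options along these lines (the exact flags are an assumption based on that post and the errors above - verify for your deployment):

java \
  --add-opens java.base/jdk.internal.misc=ALL-UNNAMED \
  --add-opens java.base/java.nio=ALL-UNNAMED \
  -Dio.netty.tryReflectionSetAccessible=true \
  -jar your-spring-boot-app.jar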

@mreiche Thanks for the reply.

  1. Regarding the KV threshold, it is set to 50 ms because our API SLA is 100 ms.

    1. These logs are generated throughout the day, so why would the SDK still be establishing connections? Doesn’t Couchbase keep a reusable pool of connections?
    2. How can I debug this?
  2. Regarding the initialization of my Spring Boot application

    1. I checked the supported-OS doc for the Couchbase SDK: https://docs.couchbase.com/java-sdk/current/project-docs/compatibility.html#alpine-linux-compatibility

    2. It appears that the Netty performance optimization will not be effective on this OS.

      /etc/os-release

      NAME="Alpine Linux"
      ID=alpine
      VERSION_ID=3.21.3
      PRETTY_NAME="Alpine Linux v3.21"
      HOME_URL="https://alpinelinux.org/"
      BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

      java -version

      openjdk version "21.0.7" 2025-04-15 LTS
      OpenJDK Runtime Environment Temurin-21.0.7+6 (build 21.0.7+6-LTS)
      OpenJDK 64-Bit Server VM Temurin-21.0.7+6 (build 21.0.7+6-LTS, mixed mode, sharing)

  3. We wanted to understand why this issue is happening.

    1. Is it due to XDCR, i.e., extra load on region B from the KV writes replicated by XDCR on top of our reads?
    2. Is this latency being introduced by the network?
      • We ran traceroute from the application VM to the Couchbase VM, and it reaches it within one hop.
    3. Is it due to the application end (Couchbase SDK)?
  1. 70ms is within the 100ms SLA.
    The network connections are not created until they are used, so the first operations include the time to create the connections. After creating the cluster, you can call cluster.waitUntilReady() and bucket.waitUntilReady() to create the connections.

Edit: in spring-data-couchbase, in your implementation of AbstractCouchbaseConfiguration, you could override:

	@Override
	@Bean(destroyMethod = "disconnect")
	public Cluster couchbaseCluster(ClusterEnvironment couchbaseClusterEnvironment) {
		Cluster cluster = super.couchbaseCluster(couchbaseClusterEnvironment);
		// Wait for the cluster and bucket connections to be established before startup completes
		cluster.waitUntilReady(Duration.ofSeconds(30));
		cluster.bucket(getBucketName()).waitUntilReady(Duration.ofSeconds(30));
		return cluster;
	}
  1. For the Netty warning - you could follow the instructions in the Stack Overflow post on how to allow it. Or you could ignore it - there might not be a noticeable difference without Netty's native optimizations. It’s not likely the cause of the 70ms response for the first request.

  2. I assume you mean the 70ms for the first requests. There are several things that need to happen. The initial connection to a node in the connection string - this might first require two DNS lookups. Then the actual TCP connection. For TLS connections, the TLS handshake is required. Then authentication. Then the SDK retrieves the cluster configuration so it knows which nodes have which services. Then a select-bucket operation on each of the KV (data) nodes. Then the bucket partition map is retrieved from one of the data nodes so the SDK can determine which node holds each document. Only then can a data operation take place. The connections remain established until the SDK shuts down or the configuration changes, so all of the above does not need to be repeated. There are several requests in your over-threshold report because all the requests made before initialization completes are gated by it. bucket.waitUntilReady() will do all of the initialization, but it will block while doing so. See the sketch below.
    If you are using spring-data-couchbase, the Couchbase SDK is accessible directly from clientConnectFactory().
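
For completeness, a minimal plain-SDK sketch of doing that warm-up once at startup (connection string, credentials and bucket name are placeholders):

import java.time.Duration;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;

public class Warmup {
    public static void main(String[] args) {
        // Placeholders - use your own connection string, credentials and bucket name
        Cluster cluster = Cluster.connect("couchbase://region-b-node1.org", "user", "password");

        // Blocks until the cluster-level connections and configuration are ready
        cluster.waitUntilReady(Duration.ofSeconds(30));

        // Opening the bucket triggers select-bucket and loads the partition map;
        // waitUntilReady blocks until the KV connections are usable
        Bucket bucket = cluster.bucket("my-bucket");
        bucket.waitUntilReady(Duration.ofSeconds(30));

        // ... application traffic ...

        cluster.disconnect();
    }
}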

@akhileshsoni this is probably too broad an issue to resolve in a forum post - this is the kind of thing a Couchbase Support contract is excellent for.

I’ll try to provide some starting points regardless:

Couchbase itself is well-capable of sub-millisecond reads and writes at scale, as long as you have sized your cluster, appservers and network accordingly.

The very first thing to determine is your network latency from the appserver to the cluster. If it’s 25 milliseconds then the absolute minimum RTT you’ll see is 50 milliseconds. You want to be running as close as possible.

Then it’s a question of sizing. You need to look at your server metrics - disk, CPU and network usage. Are you hitting a bottleneck on any of those?
Similarly, look at your appserver metrics. Check garbage collection logs here too, for example as shown below.
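
If GC logging is not already enabled, on JDK 21 it can be turned on with the unified logging flags, for example (log path is a placeholder):

java -Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags -jar your-spring-boot-app.jar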

@mreiche, thanks for this detailed explanation.

However, as I mentioned, we receive these over-threshold logs throughout the day, even though for subsequent requests the same connection should be reused until the SDK shuts it down.
Based on your explanation, it looks like the connections are being re-established more frequently than expected.

So I enabled trace-level logs to check the SDK configuration; I am sharing it here.
We have also taken a TCP dump, but we don’t see any issue on the network side either.

{
  "clientVersion": "3.6.3",
  "coreVersion": "2.6.3",
  "userAgent": "couchbase-java/3.6.3 (Linux 5.14.0-503.23.1.el9_5.x86_64 amd64; OpenJDK 64-Bit Server VM 21.0.7+6-LTS)",
  "maxNumRequestsInRetry": 32768,
  "ioEnvironment": {
    "nativeIoEnabled": true,
    "eventLoopThreadCount": 4,
    "eventLoopGroups": [
      "EpollEventLoopGroup"
    ]
  },
  "ioConfig": {
    "captureTraffic": [ ],
    "mutationTokensEnabled": true,
    "networkResolution": "auto",
    "dnsSrvEnabled": true,
    "tcpKeepAlivesEnabled": true,
    "tcpKeepAliveTimeMs": 60000,
    "configPollIntervalMs": 2500,
    "kvCircuitBreakerConfig": "disabled",
    "queryCircuitBreakerConfig": "disabled",
    "viewCircuitBreakerConfig": "disabled",
    "searchCircuitBreakerConfig": "disabled",
    "analyticsCircuitBreakerConfig": "disabled",
    "managerCircuitBreakerConfig": "disabled",
    "eventingCircuitBreakerConfig": "disabled",
    "backupCircuitBreakerConfig": "disabled",
    "numKvConnections": 1,
    "maxHttpConnections": 12,
    "idleHttpConnectionTimeoutMs": 1000,
    "configIdleRedialTimeoutMs": 300000,
    "memcachedHashingStrategy": "StandardMemcachedHashingStrategy",
    "timerConfig": {
      "numTimers": 1,
      "tickDurationMs": 10,
      "numBuckets": 512
    }
  },
  "compressionConfig": {
    "enabled": true,
    "minRatio": 0.83,
    "minSize": 32
  },
  "securityConfig": {
    "tlsEnabled": false,
    "nativeTlsEnabled": true,
    "hostnameVerificationEnabled": true,
    "trustCertificates": null,
    "trustManagerFactory": null,
    "ciphers": [ ]
  },
  "timeoutConfig": {
    "kvMs": 2500,
    "kvDurableMs": 10000,
    "kvScanMs": 75000,
    "managementMs": 75000,
    "queryMs": 75000,
    "viewMs": 75000,
    "searchMs": 75000,
    "analyticsMs": 75000,
    "connectMs": 10000,
    "disconnectMs": 10000,
    "eventingMs": 75000,
    "backupMs": 75000
  },
  "loggerConfig": {
    "diagnosticContextEnabled": false
  },
  "orphanReporterConfig": {
    "emitIntervalMs": 10000,
    "sampleSize": 10,
    "queueLength": 1024,
    "enabled": true
  },
  "thresholdLoggingTracerConfig": {
    "enabled": true,
    "emitIntervalMs": 30000,
    "sampleSize": 10,
    "queueLength": 1024,
    "kvThresholdMs": 50,
    "queryThresholdMs": 1000,
    "searchThresholdMs": 1000,
    "analyticsThresholdMs": 1000,
    "viewThresholdMs": 1000,
    "transactionsThresholdMs": 5000
  },
  "loggingMeterConfig": {
    "enabled": true,
    "emitIntervalMs": 600000
  },
  "retryStrategy": "BestEffortRetryStrategy",
  "requestTracer": "ThresholdLoggingTracer",
  "meter": "LoggingMeter",
  "numRequestCallbacks": 0,
  "scheduler": "ParallelScheduler",
  "schedulerThreadCount": 8,
  "transactionsConfig": {
    "durabilityLevel": "MAJORITY",
    "timeoutMs": 15000,
    "cleanupConfig": {
      "runLostAttemptsCleanupThread": true,
      "runRegularAttemptsCleanupThread": true,
      "cleanupWindowMs": 60000,
      "cleanupSet": ""
    },
    "numAtrs": 1024,
    "metadataCollection": "none",
    "scanConsistency": "none"
  }
}

That’s not a good assumption. Normally KV connections are held open forever. If you run with debug logging you will see all the SDK activity with respect to connections - when it closes old ones and opens new ones. But in general - unless there are rebalances - a connection is opened once and stays open forever.

So I enabled trace-level logs to check the SDK configuration

Ok - you already have that. Please attach the whole log and I’ll take a look.

the same connection

In the logging, there is an identifier of the form <hex_number>/<hex_number>; the first hex number is the SDK instance id and the second is the connection id. For example, in the last_local_id 3267275900000001/0000000063FC5FD7 above, 3267275900000001 identifies the SDK instance and 0000000063FC5FD7 identifies the connection.

Looking at some of the over-threshold reports you provided - while the operation took only 8 microseconds to be processed by the server, it took 70,388 microseconds from the time the request was dispatched (sent) from the client to the time the response was received. So it looks like that time is mostly “on the network”. (If I had looked more closely earlier, I would have noticed that and not concluded the time was spent waiting for the connection to be established.) What’s interesting is that there are several requests taking almost exactly that time.

            "total_dispatch_duration_us": 70388,
            "total_server_duration_us": 8,
            "total_duration_us": 70400

@mreiche
I’m not able to attach the file; I’m getting this error: