"The remote side disconnected the endpoint unexpectedly" warning in Java SDK 3.0 + Couchbase Server 6.5.1

Hello, I just installed Couchbase 6.5.1 build 6299 + Java client 3.0.4.
Unfortunately, every 5-15 minutes I get the following warning in the Java log:
[cb-events] WARN com.couchbase.endpoint - [com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"DISABLED","coreId":"0x7f2debc400000001","local":"127.0.0.1:52754","remote":"localhost:8093","type":"QUERY"}

I tried to restart the server, close and open the web console - the error is always present.
I haven’t found examples of SDK 3.0 initialization, so, as with version 2.7, I create two global instances, a cluster and a bucket, and access them from different threads (I use Kotlin + Ktor as the framework). Perhaps you can suggest a better approach to initialization and configuration parameters for a server app, but for now I initialize Couchbase as follows:

lateinit var bucket: Bucket
lateinit var cluster: Cluster
...
fun initCouchbase() {
    val env = ClusterEnvironment
            .builder()
            .timeoutConfig(TimeoutConfig.kvTimeout(Duration.ofSeconds(15)).queryTimeout(Duration.ofSeconds(15)).connectTimeout(Duration.ofSeconds(15)))
            .ioConfig(IoConfig.maxHttpConnections(11).numKvConnections(11))
            .requestTracer(com.couchbase.client.core.cnc.tracing.ThresholdRequestTracer.builder(null).queryThreshold(Duration.ofSeconds(12)).build())
            .build()

    cluster = Cluster.connect("localhost", clusterOptions("...", "...").environment(env))
    bucket = cluster.bucket("...")
}

The error appears in the logs even when there are no requests to the database from the server. After this error appears, the server continues to work and correctly executes requests to the database, if they are received. I found no mention of this error in discussions on the forum or elsewhere, so I decided to write here. I would like to figure out if this error can affect the stability and performance of the server.

Sincerely.

Can I get any comments from the support team or other users who are experiencing this problem? I am ready to provide all the necessary logs, but which files would contain helpful information?

Hi Poltar

My guess: refer to https://docs.couchbase.com/java-sdk/current/ref/client-settings.html (Java) or https://docs.couchbase.com/dotnet-sdk/3.0/ref/client-settings.html#io-options (.NET) and search for the string "Circuit Breaker Options". You should be able to add the template to your current ClusterEnvironment.

See the answer from Matt (@ingenthr) below.

Best

Jon

This sounds a lot like the behavior @daschl identified from the change in MB-37032. Effectively, the query service changed its behavior to drop idle connections as mitigation against an attack vector. Since the SDK prior to that is designed to keep the connection open to have the lowest latency when a query is requested, the SDK will constantly reconnect whenever query drops the connection.

Unfortunately, the change wasn’t well identified in advance and it wasn’t found in testing.

The workaround is to lower the idle timeout in the SDK.

For Java 3.x, that’s:

CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder().queryServiceConfig(QueryServiceConfig.create(0, 12, 10)).build();

Similar tune-ables exist for Java 2.x and other SDKs.

There’s a rough plan to adjust to the new cluster behavior tracked under CBD-3366.

Thank you for your answer.
I configured couchbase like this:
val env = ClusterEnvironment
        .builder()
        .timeoutConfig(TimeoutConfig.kvTimeout(Duration.ofSeconds(16)).queryTimeout(Duration.ofSeconds(16)).connectTimeout(Duration.ofSeconds(16)))
        .ioConfig(IoConfig.maxHttpConnections(12).numKvConnections(12).enableDnsSrv(false)
                .kvCircuitBreakerConfig(CircuitBreakerConfig.builder().enabled(true).volumeThreshold(45).errorThresholdPercentage(25).sleepWindow(Duration.ofSeconds(1)).rollingWindow(Duration.ofMinutes(2)))
                .queryCircuitBreakerConfig(CircuitBreakerConfig.builder().enabled(true).volumeThreshold(45).errorThresholdPercentage(25).sleepWindow(Duration.ofSeconds(1)).rollingWindow(Duration.ofMinutes(2)))
                .managerCircuitBreakerConfig(CircuitBreakerConfig.builder().enabled(true).volumeThreshold(45).errorThresholdPercentage(25).sleepWindow(Duration.ofSeconds(1)).rollingWindow(Duration.ofMinutes(2))))
        .ioEnvironment(IoEnvironment.eventLoopThreadCount(12))
        .requestTracer(com.couchbase.client.core.cnc.tracing.ThresholdRequestTracer.builder(null).queryThreshold(Duration.ofSeconds(12)).build())
        .build()

But now this variant of the warning appears in the log:
[cb-events] WARN com.couchbase.endpoint - [com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"CLOSED","coreId":"0xbe7ca8b000000001","local":"127.0.0.1:56635","remote":"localhost:8093","type":"QUERY"}

@ingenthr, thank you, I will try your solution!)

DefaultCouchbaseEnvironment and ClusterEnvironment.queryServiceConfig are unresolved references.
I think DefaultCouchbaseEnvironment is for the 2.x.x SDK.

For 3.0.0 I think you might need
import com.couchbase.client.core.service.QueryServiceConfig;
but I am not sure how to use it.
You can use the following to disable or enable the query circuit breaker. It will still emit log messages, but I imagine that if it is disabled it isn’t doing anything.
IoConfig.queryCircuitBreakerConfig(CircuitBreakerConfig.builder().enabled(false))

@Poltar, as for your issue with the "circuitBreaker":"DISABLED" messages, I also got them in my test setup:

Jun 12, 2020 9:29:32 AM com.couchbase.client.core.cnc.LoggingEventConsumer$JdkLogger warn
WARNING: [com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"DISABLED","coreId":"0x15ed0a3e00000001","local":"192.168.3.249:50781","remote":"192.168.3.150:8093","type":"QUERY"}

The new Java SDK seems to have lowered a key server setting from 30 seconds down to 5 seconds; adjusting idleHttpConnectionTimeout to 4 seconds will eliminate the messages you see.

Note: setting the other item, queryThreshold, to 12 seconds seems to clean up another log message that I see in my specific test jig. You may or may not need this.

ClusterEnvironment env = ClusterEnvironment.builder()
        .ioConfig(IoConfig.idleHttpConnectionTimeout(Duration.ofSeconds(4)))
        .requestTracer(ThresholdRequestTracer.builder(null)
                .queryThreshold(Duration.ofSeconds(12))
                .build())
        .build();

Sorry I can’t provide the internal details on the why and how this works. I only got the messages suppressed/cleaned up with some help from @daschl

Setting the idleHttpConnectionTimeout parameter solved the problem. Some of my queries take more than two seconds (the default value of the queryThreshold parameter, if I’m not mistaken). This is normal and I don’t need a log message about it, so I increased this value.

Thank you! I marked your answer as the solution to the problem)

The solution works for 3.0.4, but the warning happens again quite regularly on 3.0.7. Does anyone know a solution?

Does setting the idleHttpConnectionTimeout parameter solve the problem?

Hi,

I’ve also run into the same issue.
I’ve tried SDK 3.0.4 and 3.0.7, and the issue is still the same:
WARN [com.cou.endpoint] (cb-events) [com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"DISABLED","coreId":"0x1ec6629500000001","local":"127.0.0.1:57170","remote":"localhost:8093","type":"QUERY"}

@blancat setting idleHttpConnectionTimeout to 4 seconds does not seem to solve the problem on 3.0.7 and even the latest 3.0.8

I’m also having this issue when trying to run couchbase locally on SDK 3.0.7 using test containers (v 1.14.3).
@blancat @ingenthr

Note for anyone reading this thread, the WARNing in the logs with:

[com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly

is a generic message. The specific cause can’t be determined from the SDK log. It could be anything from the query process crashing to the remote computer crashing to some kind of network change that closes the TCP connection. To understand more specifically why, the server side logs would need to be examined.

One known cause was the change in the query service’s default idle time that I describe above.

If changing the idleHttpConnectionTimeout doesn’t have an impact, then it’s likely the cause of the server side dropping the connection is something else. The SDK won’t know that cause.

For the most recent posters, can you correlate this to any logs from the query service around the same time?
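
To help with that correlation, the JSON context at the end of the warning can be pulled apart to see which service and node dropped the connection. A rough sketch using only the JDK; the class name EndpointEventContext is made up for illustration, and the regex only handles the flat string fields shown in this thread:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Couchbase SDK.
class EndpointEventContext {
    // Matches simple "key":"value" pairs, which is all the context in this thread contains.
    private static final Pattern FIELD = Pattern.compile("\"(\\w+)\":\"([^\"]*)\"");

    // Returns the context fields of a warning line, or an empty map if the line has no {...} block.
    static Map<String, String> parse(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        int start = logLine.indexOf('{');
        if (start < 0) {
            return fields;
        }
        Matcher m = FIELD.matcher(logLine.substring(start));
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample line copied from the first post in this thread.
        String line = "[com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] "
                + "The remote side disconnected the endpoint unexpectedly "
                + "{\"circuitBreaker\":\"DISABLED\",\"coreId\":\"0x7f2debc400000001\","
                + "\"local\":\"127.0.0.1:52754\",\"remote\":\"localhost:8093\",\"type\":\"QUERY\"}";
        Map<String, String> ctx = parse(line);
        System.out.println(ctx.get("type") + " endpoint to " + ctx.get("remote")
                + " dropped (local " + ctx.get("local") + ")");
    }
}
```

The "remote" and "type" values point at the node and service whose server logs are worth checking around the time of the warning.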

The testcontainers might be a different scenario. If, for example, with testcontainers you’re taking down the cluster before disconnecting the SDK, you’d expect to see this. Maybe you can describe the scenario more fully?

Oh, is there any documentation on how to shut down the SDK?

So for Testcontainers, the first thing I do is set up my container
@Container
final static CouchbaseContainer couchbaseContainer = new CouchbaseContainer(couchbaseServerImg)
        .withBucket(new BucketDefinition(couchbaseBucketName));
and use CouchbaseInitializer to initialize some key values (username, password, bucket, etc.).

Then as a @PostConstruct, I connect to the cluster, then I get the cluster itself & then start adding some data.

Does that flow sound right to you?

Yes, the SDK documentation covers the connection lifecycle. It boils down to calling Cluster.disconnect(), followed by ClusterEnvironment.shutdown() if you created a custom environment.

Does that flow sound right to you?

Sure. You might want to call Cluster.waitUntilReady right after connecting, if you’re not doing that already.
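
For reference, a minimal sketch of that lifecycle in Java, assuming SDK 3.x on the classpath and a reachable cluster. The class name, host, credentials, and timeout values are placeholders, not recommendations:

```java
import com.couchbase.client.core.env.IoConfig;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ClusterOptions;
import com.couchbase.client.java.env.ClusterEnvironment;

import java.time.Duration;

class CouchbaseLifecycleSketch {
    public static void main(String[] args) {
        // A custom environment is created here, so this code is also responsible for shutting it down.
        ClusterEnvironment env = ClusterEnvironment.builder()
                .ioConfig(IoConfig.idleHttpConnectionTimeout(Duration.ofSeconds(4)))
                .build();

        // Placeholder host and credentials.
        Cluster cluster = Cluster.connect(
                "localhost",
                ClusterOptions.clusterOptions("username", "password").environment(env));
        try {
            // Block until the SDK has established its connections (placeholder timeout).
            cluster.waitUntilReady(Duration.ofSeconds(10));

            // ... application work goes here ...
        } finally {
            // Disconnect the cluster first, then shut down the custom environment.
            cluster.disconnect();
            env.shutdown();
        }
    }
}
```

With Testcontainers, the disconnect and shutdown calls would need to run before the container is stopped. Otherwise the warning discussed in this thread is expected.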

Thanks,
David Nault

In Spring Boot, add this to the application.properties file:

spring.couchbase.env.io.idle-http-connection-timeout=4s

FYI: the issue discussed here is fixed in Java SDK 3.0.9 and in Spring Data Couchbase 4.0.5.
