Couchbase failed to insert due to kvTimeout

Pothiq · April 21, 2023, 8:02am

Hello Couchbase Support Team,
We are encountering an issue where insert requests to the cache are failing with a timeout error during high-load situations. The error message is as follows:

"Failed when inserting into cache: InsertRequest, Reason: TIMEOUT {"cancelled":true,"completed":true,"coreId":"0x6dc425ca00000002","idempotent":false,"lastDispatchedTo":"...prod...","reason":"TIMEOUT","requestId":...,"requestType":"InsertRequest","retried":14,"retryReasons":["ENDPOINT_NOT_WRITABLE"],"service":{"bucket":"bucket-disableFlag","collection":"_default","documentId":"1.0.0-disableFlag-id-id",opaque":"0x4e216a0","scope":"_default","type":"kv","vbucket":45},"timeoutMs":2500,"timings":{"encodingMicros":11,"totalMicros":2510048}}"

We have found that increasing the “kvTimeout” setting to 10000 resolves the issue. However, we would like to explore alternative solutions that do not require increasing the timeout value.
Could you please advise on the recommended way, or any alternatives, to improve the situation?
Additionally, we have a question about the “retried”:14 field in the error message. Is the number of retries configurable, and can we reduce it? Also, are there any negative consequences of a high number of retries?

mreiche · April 21, 2023, 4:34pm

Look at earlier messages from the SDK to find out why there were ENDPOINT_NOT_WRITABLE conditions (the com.couchbase.core logger at info, or perhaps debug, level). There may also be some clues in the server logs. A node may be (temporarily) rejecting connections due to heavy load. Adding more nodes may improve performance. These retries don’t have much of a downside: after a short delay, the client just checks the endpoint, sees that it is (still) not available, and schedules another retry.

As improvements are continually being made to the SDKs, it’s beneficial to use the latest version.

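logback.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <!-- Console appender: timestamp, level, logger name, line number, message -->
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d %5p %40.40c:%4L - %m%n</pattern>
        </encoder>
    </appender>

    <root level="warn">
        <appender-ref ref="console"/>
    </root>

    <!-- Raise SDK core logging to info (or debug) to see endpoint state changes -->
    <logger name="com.couchbase.core" level="info"/>

</configuration>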

graham.pople · April 24, 2023, 3:21pm

ENDPOINT_NOT_WRITABLE has a couple of main causes:

The Endpoint (TCP connection, essentially) isn’t connected currently. The SDK logging should show if this is the case.
The network library we use (Netty) is reporting that it is not immediately ready to write to the connection. Since you report it’s happening under high load, this is the most likely cause.

It can be challenging to debug exactly where the bottleneck is. I’d start with checking resource usage on both the cluster and application side, and GC logs on the application side.
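As a quick first check on the application side, before digging into full GC logs, you can sample the JVM's own GC counters around a window of work. This is only a rough sketch using the standard java.lang.management API; the class name GcSnapshot and the one-second sleep standing in for real work are placeholders, and the reported time is accumulated collection time, which for concurrent collectors is not all stop-the-world pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {

    /** Sums the cumulative GC time (ms) across all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means "undefined" for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(1000); // placeholder for a window of real application work
        long after = totalGcMillis();
        // If this is a large fraction of the window, GC is a plausible contributor
        // to requests running into the 2500 ms KV timeout.
        System.out.println("GC time in window: " + (after - before) + " ms");
    }
}
```

If the GC time per window is small, the bottleneck is more likely elsewhere (network, cluster-side load, or too few connections).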

You could try adding more KV connections from the SDK to the cluster, using numKvConnections (see Client Settings in the Couchbase docs).
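For illustration, setting this from the Java SDK 3.x might look roughly like the sketch below. The value 4, the connection string, and the credentials are placeholders rather than recommendations, and the builder style can differ between SDK versions, so check the Client Settings docs for the version you run.

```java
import com.couchbase.client.core.env.IoConfig;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ClusterOptions;
import com.couchbase.client.java.env.ClusterEnvironment;

public class KvConnectionsSketch {
    public static void main(String[] args) {
        // Assumption: 4 is an arbitrary illustrative value, not a tuned recommendation.
        ClusterEnvironment env = ClusterEnvironment.builder()
            .ioConfig(IoConfig.numKvConnections(4))
            .build();

        // Hypothetical connection string and credentials; replace with your own.
        Cluster cluster = Cluster.connect(
            "couchbase://127.0.0.1",
            ClusterOptions.clusterOptions("username", "password").environment(env));

        // ... run the insert workload here and compare timeout rates ...

        cluster.disconnect();
        env.shutdown(); // a custom environment must be shut down by the caller
    }
}
```

It is worth changing one setting at a time and measuring under the same load, since more connections also mean more work for the cluster.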

What Couchbase Server version are you using? Versions 7.0+ make the most efficient use of the connections.

Additionally, we have a question about the “retried”:14 field in the error message. Is the number of retries configurable, and can we reduce it? Also, are there any negative consequences of a high number of retries?

You could use the FailFastRetryStrategy - but I wouldn’t recommend it. In any distributed system, some degree of retry is a necessity; networks are unreliable, and servers can be transiently overloaded.

The more retries, the better the availability for a single request - but the more load is added to the system. As with all things in distributed systems, it’s a tradeoff. The default BestEffortRetryStrategy uses an exponential backoff that aims for a middle ground.
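To get a feel for why a capped exponential backoff keeps the retry count modest even over a multi-second timeout, here is a small standalone sketch. It does not use the SDK; the 1 ms base delay and 500 ms cap are illustrative assumptions, not confirmed SDK defaults, and the class BackoffSketch is made up for this example. It also counts only the waiting time between attempts, not the time each attempt itself takes.

```java
public class BackoffSketch {

    /** Delay before retry number `attempt` (1-based): base * 2^(attempt-1), capped at max. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (baseMillis <= 0 || maxMillis <= 0) {
            throw new IllegalArgumentException("delays must be positive");
        }
        long delay = baseMillis;
        // Double until we reach the requested attempt or hit the cap.
        for (int i = 1; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Counts how many retries fit before the summed delays would exceed the timeout. */
    static int retriesWithin(long timeoutMillis, long baseMillis, long maxMillis) {
        long elapsed = 0;
        int retries = 0;
        while (true) {
            long d = delayMillis(retries + 1, baseMillis, maxMillis);
            if (elapsed + d > timeoutMillis) {
                return retries;
            }
            elapsed += d;
            retries++;
        }
    }

    public static void main(String[] args) {
        // Assumed backoff: 1 ms base, 500 ms cap (illustrative values only).
        System.out.println("2500 ms timeout:  " + retriesWithin(2500, 1, 500) + " retries");
        System.out.println("10000 ms timeout: " + retriesWithin(10000, 1, 500) + " retries");
    }
}
```

Under these assumptions, quadrupling the timeout from 2500 ms to 10000 ms only roughly doubles the number of retries, because once the cap is reached each further retry costs a full cap-length wait. That is the sense in which the extra load from retries stays bounded.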

system · July 23, 2023, 3:22pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
