com.couchbase.client.core.RequestCancelledException: Could not dispatch request, cancelling instead of retrying

Great news! Out of curiosity, what did you set it to? And did you use queryServiceConfig() rather than queryEndpoints? That should allow the SDK to manage it more dynamically.

Hi @ingenthr,

I used QueryServiceConfig, since queryEndpoints is deprecated in 2.5.1. I set the min to 10 and the max to 200; I will probably raise the max to 500 just to be safe.
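
For reference, the environment setup looks roughly like this (a sketch against the 2.5.x API, so the exact builder calls may differ slightly in your version; the class name and hostname are just placeholders):

import com.couchbase.client.core.env.QueryServiceConfig;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

public class QueryPoolConfig {
    public static void main(String[] args) {
        // Keep between 10 and 200 query (N1QL) endpoints per node and let the
        // SDK grow and shrink the pool within those bounds.
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                .queryServiceConfig(QueryServiceConfig.create(10, 200))
                .build();

        CouchbaseCluster cluster = CouchbaseCluster.create(env, "node1.example.com");
    }
}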

Thanks,

K

Actually, I just got a report that we are seeing sporadic warnings like this:

log_level=WARN thread=RxComputationScheduler-8 logger=c.c.c.c.e.AbstractGenericHandler[QueryEndpoint]: Got error while consuming KeepAliveResponse.
java.util.concurrent.TimeoutException: null
at rx.internal.operators.OperatorTimeoutBase$TimeoutSubscriber.onTimeout(OperatorTimeoutBase.java:177)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Though I do not see any RequestCancelledExceptions. Any ideas?

There is a background keepalive that checks the health of the connection when it is otherwise idle. That one wouldn't cause anything in your application, since the consumer of the keepalive is internal to the SDK. We've been doing this from the SDK for a long time, but only recently added the warning logging, if I recall correctly. It was necessary because we found that some cloud-type environments would silently shut connections down if they went idle.

However, I think it may indicate that something is not healthy in your environment. This one is against the QueryEndpoint, where the keepalive is a small HTTP ‘ping’ request.

Does that keepalive timeout correlate with your workload?

Hi @ingenthr,

No, this is actually occurring with minimal load. It may be something related to our environment, so we'll keep an eye on it. However, the RequestCancelledException has reared its ugly head again, though this time it's not the exact same error message:


com.couchbase.client.core.RequestCancelledException: Request cancelled in-flight.
at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroyDown(DefaultChannelPipeline.java:876)
at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroyUp(DefaultChannelPipeline.java:842)
at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroy(DefaultChannelPipeline.java:834)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:162)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:176)
at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at java.lang.Thread.run(Thread.java:745)

Notice that this time it says Request cancelled in-flight instead of Could not dispatch request, cancelling instead of retrying. This is occurring in a module that does not have much traffic running through it. I don't have the trace logs, so I will work on getting them. In the meantime, any suggestions would be very helpful.

Thanks,

K

As covered in the docs, this is most likely related to an IO problem after the operation was dispatched. Chances are your logs (you don't need trace; WARN level or the equivalent should be enough) will show a connection being dropped.
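
If the operations that get cancelled are idempotent, one option is to retry them at the application level. Here is a minimal sketch using the SDK's RxJava retry helpers (the class name, bucket, and document id are placeholders, and the delay and attempt counts are arbitrary):

import java.util.concurrent.TimeUnit;

import com.couchbase.client.core.RequestCancelledException;
import com.couchbase.client.core.time.Delay;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.util.retry.RetryBuilder;

public class CancelledRetry {

    // Retry an idempotent read up to 3 times, with exponential backoff capped
    // at 100 ms, whenever the request is cancelled in-flight.
    static JsonDocument getWithRetry(Bucket bucket, String id) {
        return bucket.async()
                .get(id)
                .retryWhen(RetryBuilder.anyOf(RequestCancelledException.class)
                        .delay(Delay.exponential(TimeUnit.MILLISECONDS, 100))
                        .max(3)
                        .build())
                .toBlocking()
                .singleOrDefault(null);
    }
}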

Since the keepalives are also timing out, is it possible that there is something in the environment that can lead to higher latencies than expected? Is this running across a WAN, or in some kind of lambda execution environment which may be paged out or quiesced between operations? If so, you can raise the default timeouts, or raise the threshold for occasional errors on keepalives. Running across a WAN is not tested (and thus not supported). An execution environment that differs from a regular app server may need some adjustments from the defaults.
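
For example, something along these lines raises the query timeout and relaxes the keepalive checks. This is just a sketch: the numbers are arbitrary, and the keepAliveTimeout/keepAliveErrorThreshold settings may only be exposed in later 2.5.x builds, so check what your SDK version offers:

import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

public class TimeoutTuning {
    public static void main(String[] args) {
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                // Give N1QL queries more head room than the 75s default (value in ms).
                .queryTimeout(120000)
                // Wait longer for a keepalive response before counting it as an error...
                .keepAliveTimeout(5000)
                // ...and allow more keepalive errors before the connection is
                // considered unhealthy.
                .keepAliveErrorThreshold(10)
                .build();
    }
}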