When left running long enough, the connector pod errors on every request while consuming 100% CPU

The connector takes a long time to process queries while its CPU usage sits at 100%, and the logs below are printed repeatedly.

23:01:04.922 [nioEventLoopGroup-5-1] WARN c.c.c.d.b.PersistencePollingHandler - Failed to fetch failover log for 623/0. Closing channel.
com.couchbase.client.core.state.NotConnectedException: Channel became inactive while awaiting response.
at com.couchbase.client.dcp.transport.netty.DcpMessageHandler.channelInactive(DcpMessageHandler.java:163) [dcp-client-0.19.0.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:240) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:226) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:240) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:226) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:379) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:344) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:240) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:226) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1299) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:240) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:226) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:903) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:768) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:464) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) [core-io-1.6.1.jar:?]
at com.couchbase.client.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [core-io-1.6.1.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

Hi Naveen,

The fact that dcp-client-0.19.0.jar appears in the stack trace indicates you’re running an old version of the Elasticsearch connector that doesn’t throttle reconnection attempts. Does upgrading to the latest version of the Elasticsearch connector bring the CPU usage back under control?
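For context, "throttling reconnection attempts" typically means waiting with an exponentially growing delay between retries instead of reconnecting in a tight loop. The sketch below is purely illustrative (the class and parameters are hypothetical, not the connector's actual implementation) and shows why a capped exponential backoff keeps a flapping channel from pinning the CPU:

```java
import java.time.Duration;

/**
 * Illustrative sketch only: capped exponential backoff between
 * reconnection attempts. Without a delay like this, a channel that
 * keeps going inactive triggers an immediate-retry loop that can
 * consume 100% CPU.
 */
public class ReconnectBackoff {
    private final Duration initial;
    private final Duration max;

    public ReconnectBackoff(Duration initial, Duration max) {
        this.initial = initial;
        this.max = max;
    }

    /** Delay before the given retry attempt (0-based), doubling each time up to the cap. */
    public Duration delayFor(int attempt) {
        // Clamp the shift so the multiplication cannot overflow a long.
        long millis = initial.toMillis() << Math.min(attempt, 30);
        return millis >= max.toMillis() ? max : Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        ReconnectBackoff backoff =
                new ReconnectBackoff(Duration.ofMillis(32), Duration.ofSeconds(10));
        for (int attempt = 0; attempt < 12; attempt++) {
            System.out.println("attempt " + attempt
                    + " -> wait " + backoff.delayFor(attempt).toMillis() + " ms");
        }
    }
}
```

With these (made-up) settings, early retries happen quickly but the delay doubles until it plateaus at the 10-second cap, so a persistently failing channel retries a few times a minute instead of thousands of times a second.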

Thanks,
David