Exactly every 30 minutes I get an error message in my logs stating
[<URL HERE>/<IP HERE>:8092][ViewEndpoint]: Could not connect to endpoint, retrying with delay x MILLISECONDS:
about 5-10 times. After that I get the following around 5 times:
[/<IP HERE>:8092][ViewEndpoint]: Could not connect to endpoint, retrying with delay x MILLISECONDS:
The full stack trace for that exception is
java.net.ConnectException: Connection refused: /<IP HERE>:8092
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_31]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[?:1.8.0_31]
at com.couchbase.client.deps.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208) ~[core-io-1.1.3.jar:1.1.3]
at com.couchbase.client.deps.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281) [core-io-1.1.3.jar:1.1.3]
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) [core-io-1.1.3.jar:1.1.3]
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [core-io-1.1.3.jar:1.1.3]
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [core-io-1.1.3.jar:1.1.3]
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [core-io-1.1.3.jar:1.1.3]
at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) [core-io-1.1.3.jar:1.1.3]
at com.couchbase.client.deps.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) [core-io-1.1.3.jar:1.1.3]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_31]
Please note the leading slash (/) before the IP address and the difference between the two error messages: the first contains both the URL and the IP, the second only the IP.
How could this happen?
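For what it's worth, the two formats look like the way java.net.InetSocketAddress prints itself: "hostname/ip:port" when the address object carries a hostname, and "/ip:port" when it was built from the IP alone. I am only assuming that the SDK logs the endpoint address via toString(); I have not confirmed that in the SDK source. A minimal sketch of the formatting, with a hypothetical hostname (node1.example) and IP (10.0.0.1) standing in for my redacted values:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class EndpointAddressFormat {

    // Hypothetical placeholder values; the real host and IP are redacted in the logs above.
    static final String HOST = "node1.example";
    static final byte[] IP = {10, 0, 0, 1};
    static final int VIEW_PORT = 8092;

    // Address that carries both a hostname and an IP (no DNS lookup is performed).
    static String withHostname() throws UnknownHostException {
        InetAddress addr = InetAddress.getByAddress(HOST, IP);
        return new InetSocketAddress(addr, VIEW_PORT).toString();
    }

    // Address built from the raw IP only, so no hostname is attached.
    static String ipOnly() throws UnknownHostException {
        InetAddress addr = InetAddress.getByAddress(IP);
        return new InetSocketAddress(addr, VIEW_PORT).toString();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(withHostname()); // node1.example/10.0.0.1:8092
        System.out.println(ipOnly());       // /10.0.0.1:8092
    }
}
```

If that assumption holds, the first batch of messages would come from endpoints created with the hostname I configured, and the second batch from endpoints created with a bare IP, possibly taken from the cluster's own node list. That part is a guess on my side.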
The cluster consists of 3 servers, freshly set up. I have another cluster replicating to this cluster into one bucket via XDCR and another bucket which is empty. The error messages occur for all 3 servers in the cluster with the repetitions mentioned above. After that the client can connect to the cluster again.
If I am in the web admin console while the exceptions are happening, in the “Data Buckets” view, I can see all buckets being in the state of rebalancing for a short period of time (yellow portion of the green circle). Reloading the web console during this state takes longer than usual.
Besides that, nothing is happening on that cluster other than a client being connected to the empty bucket without accessing it.
Another cluster set up just like this one (using Chef) works without problems.
I am using Couchbase 3.0.2 (1603) Enterprise + Java SDK 2.1.2 (happens with 2.1.3, too).
Googling the exception yields exactly one result: "There is a problem when i using java sdk 2.1.x", which is the topic before this one.