When my application connects to a query node, an authentication error occurs.
Is this expected? Why does the error refer to a data node and not a query node? In Python, both the data node and the query node work fine.
The error message is as follows:
2017-11-16 15:52:12.022 WARN 2189 --- [ cb-io-1-6] c.c.client.core.endpoint.Endpoint : [null][KeyValueEndpoint]: Authentication Failure.
2017-11-16 15:52:12.025 WARN 2189 --- [ cb-io-1-6] c.c.client.core.endpoint.Endpoint : [null][KeyValueEndpoint]: Could not connect to endpoint, retrying with delay 4096 MILLISECONDS:
com.couchbase.client.core.endpoint.kv.AuthenticationException: Bucket not found on Select Bucket command
at com.couchbase.client.core.endpoint.kv.KeyValueSelectBucketHandler.channelRead0(KeyValueSelectBucketHandler.java:154)
at com.couchbase.client.core.endpoint.kv.KeyValueSelectBucketHandler.channelRead0(KeyValueSelectBucketHandler.java:41)
at com.couchbase.client.deps.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at com.couchbase.client.deps.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at com.couchbase.client.deps.io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
at com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
at com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
at com.couchbase.client.deps.io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at com.couchbase.client.deps.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1302)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at com.couchbase.client.deps.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at com.couchbase.client.deps.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:745)
2017-11-16 15:52:12.025 INFO 2189 --- [ cb-io-1-6] c.c.client.core.endpoint.Endpoint : [null][KeyValueEndpoint]: Got notified from Channel as inactive, attempting reconnect.
Hm, the error message printed in the stack trace is actually referring to KV (11210) and not to Query, but it should also be transient. Can you share the full logs maybe?
@notemusic are you bootstrapping in the SDK (the list of nodes) only against nodes which have the KV service enabled? If you have an MDS setup (i.e., nodes running only the query service), they need to be excluded from the bootstrap list.
Also, can you share the full log from startup so we can see what's going on? This one stack trace doesn't help much, unfortunately.
@notemusic no, I'm only saying that in the bootstrap list, when doing CouchbaseCluster.create("…"), you should only pass in nodes with KV on them; the rest is fully transparent.
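To make that concrete, here is a minimal sketch of what such a bootstrap looks like (hostnames are placeholders; the point is that query-only nodes are left out of the `create(...)` call):

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;

public class BootstrapExample {
    public static void main(String[] args) {
        // Pass only nodes that run the KV (data) service. A query-only
        // node such as "query1.example.com" is intentionally omitted here.
        CouchbaseCluster cluster = CouchbaseCluster.create(
                "data1.example.com", "data2.example.com");

        Bucket bucket = cluster.openBucket("myBucket");
        // ... use the bucket; the SDK discovers query nodes from the
        // cluster map automatically, so queries still work.
        cluster.disconnect();
    }
}
```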
With our upgrade to 5.1 (from 4.5.1) we are seeing lots of WARN logs of "Bucket not found on Select Bucket command." We are not sure if these are real problems yet, but they do not occur with 4.5.1, and do in both 5.0.1 and 5.1.0.
We currently have all nodes in the cluster in our bootstrap process. Our code, through a VIP, reads the cluster map and generates the configuration with all the nodes in the cluster. We will be changing that to have only "kv" nodes in the bootstrap, which seems to work. That's a deployment of > 100 services.
But… the question is: why was the decision made to change that?
@unhuman 5.0 changed the way the SDK works against the server through KV since we added RBAC: the authentication is now separate from bucket selection… since you can have more than one bucket accessible with your user creds…
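In the 2.5.x Java SDK that separation is visible in the API: you authenticate against the cluster first (as an RBAC user), and bucket selection happens afterwards. A minimal sketch with placeholder credentials:

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.auth.PasswordAuthenticator;

public class RbacAuthExample {
    public static void main(String[] args) {
        CouchbaseCluster cluster = CouchbaseCluster.create("data1.example.com");

        // Step 1: authenticate as an RBAC user (server 5.0+). This one
        // user may have access to several buckets.
        cluster.authenticate(new PasswordAuthenticator("appUser", "appPassword"));

        // Step 2: select a bucket. This is the "Select Bucket" command
        // that the warning in the logs refers to.
        Bucket bucket = cluster.openBucket("myBucket");
        cluster.disconnect();
    }
}
```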
@daschl Hey - so these logged messages are warnings. Do they affect our ability to interact with the cluster in any way? We're trying to figure out how we upgrade our applications to the fixes (do we need to mandate doing this ASAP, or can we take a more moderate approach?). The affected applications are using the migrated bucket credentials.
If you want 5.x to behave compatibly with 4.x, the guidelines are to create a user whose name is the same as the bucket. With that, you can take a moderate approach and then shift the app servers over to the newer client using the newer authentication approach. See the docs for details!
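In code terms, once a user named exactly like the bucket exists on the 5.x server (with the old bucket password), the unmodified 4.x-style open call keeps working. A sketch with placeholder names:

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;

public class LegacyCompatExample {
    public static void main(String[] args) {
        CouchbaseCluster cluster = CouchbaseCluster.create("data1.example.com");

        // 4.x-style bucket/password open. Against a 5.x server this
        // works as long as an RBAC user named "myBucket" with this
        // password (and access to the bucket) has been created.
        Bucket bucket = cluster.openBucket("myBucket", "bucketPassword");
        cluster.disconnect();
    }
}
```

This is what lets you migrate the server first and move the app servers to the newer authenticate-then-open style at your own pace.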
I'd also recommend against using a VIP that resolves to all of the nodes. There is a subtle thing incorrect there in that the config generated has an entry that the client substitutes for the bootstrap hostname. It'll probably "correct" itself after some time as it updates the config, but it'll be incorrect initially. DNS SRV is our supported way to maintain the list in a single location as you add/remove nodes. Again, see the docs.
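For reference, DNS SRV has to be switched on explicitly in the 2.x Java SDK environment, and the bootstrap list must then contain exactly one hostname (the SRV record name). A sketch, with a placeholder record name:

```java
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

public class DnsSrvExample {
    public static void main(String[] args) {
        // Enable DNS SRV lookups in the environment.
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                .dnsSrvEnabled(true)
                .build();

        // With DNS SRV enabled, exactly one hostname is passed; the SDK
        // resolves the SRV record to find the actual node list.
        CouchbaseCluster cluster = CouchbaseCluster.create(env, "cb.example.com");
        // ... open buckets, do work ...
        cluster.disconnect();
        env.shutdown();
    }
}
```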
Also, to investigate what's happening, you may want to try running sdk-doctor in that env? It may not find anything, but it will validate that common connectivity problems aren't there.
You'll find pre-built binaries on the release page.
Usually the summary at the end is pretty easy to interpret.
@ingenthr We're experiencing this issue with the latest Java SDK, 2.5.8, connecting to the latest Couchbase Server, 5.1.0. This seems to only happen when the server is in the middle of a rebalance. This is a large cluster with > 1TB of data, so the rebalances are long and slow.
WARN [2018-06-01 17:04:03,865] com.couchbase.client.core.endpoint.Endpoint: [KeyValueEndpoint]: Could not connect to endpoint, retrying with delay 4096 MILLISECONDS: com.couchbase.client.core.endpoint.kv.AuthenticationException: Bucket not found on Select Bucket command
WARN [2018-06-01 17:04:03,866] com.couchbase.client.core.endpoint.Endpoint: [KeyValueEndpoint]: Authentication Failure.
WARN [2018-06-01 17:04:03,866] com.couchbase.client.core.endpoint.Endpoint: [KeyValueEndpoint]: Could not connect to endpoint, retrying with delay 4096 MILLISECONDS: com.couchbase.client.core.endpoint.kv.AuthenticationException: Bucket not found on Select Bucket command
WARN [2018-06-01 17:04:03,884] com.couchbase.client.core.endpoint.Endpoint: [KeyValueEndpoint]: Authentication Failure.
WARN [2018-06-01 17:04:03,884] com.couchbase.client.core.endpoint.Endpoint: [KeyValueEndpoint]: Could not connect to endpoint, retrying with delay 4096 MILLISECONDS: com.couchbase.client.core.endpoint.kv.AuthenticationException: Bucket not found on Select Bucket command
We see this over and over again and it doesn't recover.
It does seem like things are still functionally working, but the logs are spamming this over and over again and it's not clear what's going on. Is this a bug in the Java client SDK, or is something else broken here?
It looks like all these AuthenticationException: Bucket not found on Select Bucket command warnings occur because these particular endpoints in the cluster map are the nodes being added to the cluster during the rebalance, and the buckets/data do not exist on those nodes yet.
This seems like a bug in the Java SDK client, in that it shouldn't log these as warnings given that this is expected/normal behavior while adding a new node to an existing cluster (rebalance).
I don't think it is expected, but I will need to check. The bug may be that they are in the configuration (as other than fast-forward) and are taking connections before they're ready to service those connections. There might be nothing we can do other than WARN and retry, since the client doesn't know when it'll be ready, and silently retrying might mask a different issue.
You said this is with 5.1? Let us look into this a bit.
@ingenthr Any update on this issue? Just curious if this is being addressed in a later version of the Java client SDK? We're currently using version 2.5.8.
@bryan the root cause of this is that there is a delay between the time the server tells the SDK that a new node is available and the time the KV engine on that node is able to serve the bucket.
When you say it doesn't recover… do you mean during the rebalance? I was expecting it would at least vanish once the rebalance is done? I'll see if there is a way we can improve that experience so that it doesn't spam the logs. Also, to double-check: actual traffic should not be impacted, right?