bucket.waitUntilReady stuck after a password change on server side

Hello,
I am currently using the Java SDK 3.0.8 and have the following issue.
I have a JUnit test that simulates a change of the login/password on the server side. The idea is to log in again automatically when the password changes as part of a password rotation.
The code is quite simple:

ClusterEnvironment clusterEnvironment = ClusterEnvironment.builder()
    .retryStrategy(FailFastRetryStrategy.INSTANCE)
    .build();

ClusterOptions clusterOptions = ClusterOptions.clusterOptions("MyUser",
    "qwerty").environment(clusterEnvironment);

Cluster clusterTmp = Cluster.connect("couchbase://localhost", clusterOptions);
clusterTmp.waitUntilReady(Duration.ofMillis(1000));

clusterEnvironment.eventBus().subscribe(new ddd(clusterTmp));

Bucket bucket = clusterTmp.bucket("MyBucket");
bucket.waitUntilReady(Duration.ofMillis(1000));
for (int i = 0; i < 30; i++) {
  log.info("i:" + i);
  TimeUnit.SECONDS.sleep(1);
}

At this point, while the loop above is running, I change the password from qwerty to MyPass.
I reset the access to the server with an IP action, and then I continue with:

bucket.waitUntilReady(Duration.ofMillis(1000));

Collection collection = bucket.defaultCollection();
collection.get("DoicInsert1599546421768");

Unfortunately, bucket.waitUntilReady gets stuck and my JUnit test never stops.

In the log we can see:
2020-09-08 14:25:16 1747356 [cb-events] WARN com.couchbase.endpoint - [com.couchbase.endpoint][EndpointConnectionFailedEvent][13ms] Connect attempt 428 failed because of AuthenticationFailureException: Authentication Failure {"channelId":"1A1AF5A300000001/000000008275B2E2","circuitBreaker":"DISABLED","coreId":"0x1a1af5a300000001","local":"127.0.0.1:58050","remote":"localhost:11210","type":"KV"}
com.couchbase.client.core.error.AuthenticationFailureException: Authentication Failure {"channelId":"1A1AF5A300000001/000000008275B2E2","circuitBreaker":"DISABLED","coreId":"0x1a1af5a300000001","local":"127.0.0.1:58050","remote":"localhost:11210","status":"UNKNOWN","type":"KV","xerror":{"ref":"d9f47e2c-87cd-446a-0d07-b15c0b11d446"}}
at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.failConnect(SaslAuthenticationHandler.java:475)
at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.maybeFailConnect(SaslAuthenticationHandler.java:280)
at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.channelRead(SaslAuthenticationHandler.java:239)
at com.couchbase.client.core.io.netty.kv.MemcacheProtocolVerificationHandler.channelRead(MemcacheProtocolVerificationHandler.java:84)
at java.lang.Thread.run(Thread.java:748)

As you can see, the connect attempt counter has already reached 428, which is a bit too much.

My question is: how can I make sure that the wait does not get stuck?

I already tried to create an event consumer with some code that disconnects the bucket, but without success.
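One possible workaround on the test side, independent of the SDK, is to run the blocking call on a separate thread and give up after a hard deadline. The following is a minimal sketch in plain Java 8; `DeadlineGuard` and `callWithDeadline` are hypothetical names, not part of the Couchbase SDK, and the guard only protects the calling thread, it does not fix the underlying reconnect loop.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for tests; not part of the Couchbase SDK. */
final class DeadlineGuard {

    /**
     * Runs a blocking task on a daemon worker thread and stops waiting
     * once the deadline has passed, so the caller is never parked forever.
     */
    static <T> T callWithDeadline(Callable<T> task, Duration deadline)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "deadline-guard");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = worker.submit(task);
        try {
            // Timed wait: throws TimeoutException if the task is still running.
            return future.get(deadline.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; best effort only
            throw e;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

In the test above, the second wait could then be wrapped as `DeadlineGuard.callWithDeadline(() -> { bucket.waitUntilReady(Duration.ofMillis(1000)); return null; }, Duration.ofSeconds(5));`, where the 5 second outer deadline is an arbitrary choice. If the test uses JUnit 4, `@Test(timeout = ...)` gives a similar safety net at the test level.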

NB: the SDK version is 3.0.8 (the latest one).

Thanks for your help.
Vincent

@vdoleans I’m not sure I clearly understand what’s going on in your environment. The reason is that waitUntilReady has a mandatory timeout chained in at the end, so that should fire. I just tried setting a very low timeout on waitUntilReady and got:

Exception in thread "main" com.couchbase.client.core.error.UnambiguousTimeoutException: WaitUntilReady timed out
	at com.couchbase.client.java.AsyncUtils.block(AsyncUtils.java:51)
	at com.couchbase.client.java.Cluster.waitUntilReady(Cluster.java:545)

So are you saying the method never returns? If so, it would be great if you could share debug logs of the bootstrap process so I can take a closer look.
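For readers unfamiliar with the idea of a timeout "chained in at the end" of a future, here is a rough Java 8 illustration. It is not the SDK's actual implementation; `TimeoutChaining` and `withTimeout` are made-up names. The point it shows is that such a guard only fires if it is attached to the very future the caller ends up blocking on.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustration only; this is not how the SDK is implemented internally. */
final class TimeoutChaining {

    // Single daemon timer thread shared by all guarded futures.
    private static final ScheduledExecutorService TIMER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "timeout-timer");
            t.setDaemon(true);
            return t;
        });

    /** Fails the given future with a TimeoutException unless it completes in time. */
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> future, Duration timeout) {
        ScheduledFuture<?> timer = TIMER.schedule(() -> {
            // No effect if the future has already completed.
            future.completeExceptionally(new TimeoutException("timed out after " + timeout));
        }, timeout.toMillis(), TimeUnit.MILLISECONDS);
        // Cancel the pending timer as soon as the future completes either way.
        future.whenComplete((value, error) -> timer.cancel(false));
        return future;
    }
}
```

With a guard like this in place, an untimed `get()` on the returned future cannot block longer than the timeout; on Java 9 and later, `CompletableFuture.orTimeout` offers the same behaviour out of the box.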

Posting on behalf of @vdoleans:
The message before the wait is

2020-09-09 14:25:02 32100 [main] INFO  com.amadeus.datastore.nosql.couchbase.realdb.TestConnection  - i:29
2020-09-09 14:25:05 34931 [cb-events] ERROR com.couchbase.io  - [com.couchbase.io][SaslAuthenticationFailedEvent][16ms] Authentication Failure {"channelId":"C105D93D00000001/000000009D9E7B90","circuitBreaker":"DISABLED","coreId":"0xc105d93d00000001","local":"127.0.0.1:56580","remote":"localhost:11210","status":"UNKNOWN","type":"KV","xerror":{"ref":"5bd87f3d-0051-4a5d-ca2d-dc9fee4caf28"}}
2020-09-09 14:25:05 34931 [cb-events] INFO  com.amadeus.datastore.nosql.couchbase.realdb.TestConnection  - Authentication Failure
2020-09-09 14:25:05 34931 [cb-events] WARN  com.couchbase.endpoint  - [com.couchbase.endpoint][EndpointConnectionFailedEvent][18ms] Connect attempt 12 failed because of AuthenticationFailureException: Authentication Failure {"channelId":"C105D93D00000001/000000009D9E7B90","circuitBreaker":"DISABLED","coreId":"0xc105d93d00000001","local":"127.0.0.1:56580","remote":"localhost:11210","type":"KV"}
com.couchbase.client.core.error.AuthenticationFailureException: Authentication Failure {"channelId":"C105D93D00000001/000000009D9E7B90","circuitBreaker":"DISABLED","coreId":"0xc105d93d00000001","local":"127.0.0.1:56580","remote":"localhost:11210","status":"UNKNOWN","type":"KV","xerror":{"ref":"5bd87f3d-0051-4a5d-ca2d-dc9fee4caf28"}}
               at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.failConnect(SaslAuthenticationHandler.java:475)
               at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.maybeFailConnect(SaslAuthenticationHandler.java:280)
               at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.channelRead(SaslAuthenticationHandler.java:239)
               at com.couchbase.client.core.io.netty.kv.MemcacheProtocolVerificationHandler.channelRead(MemcacheProtocolVerificationHandler.java:84)
               at java.lang.Thread.run(Thread.java:748)

The latest message in my log is:

2020-09-09 14:29:37 306434 [cb-events] ERROR com.couchbase.io  - [com.couchbase.io][SaslAuthenticationFailedEvent][21ms] Authentication Failure {"bucket":"MyBucket","channelId":"C105D93D00000001/00000000523AF4BB","circuitBreaker":"DISABLED","coreId":"0xc105d93d00000001","local":"127.0.0.1:56599","remote":"localhost:11210","status":"UNKNOWN","type":"KV","xerror":{"ref":"f1631cff-be54-4b01-4514-54a9241de5d1"}}
2020-09-09 14:29:37 306434 [cb-events] INFO  com.amadeus.datastore.nosql.couchbase.realdb.TestConnection  - Authentication Failure
2020-09-09 14:29:37 306434 [cb-events] WARN  com.couchbase.endpoint  - [com.couchbase.endpoint][EndpointConnectionFailedEvent][24ms] Connect attempt 78 failed because of AuthenticationFailureException: Authentication Failure {"bucket":"MyBucket","channelId":"C105D93D00000001/00000000523AF4BB","circuitBreaker":"DISABLED","coreId":"0xc105d93d00000001","local":"127.0.0.1:56599","remote":"localhost:11210","type":"KV"}
com.couchbase.client.core.error.AuthenticationFailureException: Authentication Failure {"bucket":"MyBucket","channelId":"C105D93D00000001/00000000523AF4BB","circuitBreaker":"DISABLED","coreId":"0xc105d93d00000001","local":"127.0.0.1:56599","remote":"localhost:11210","status":"UNKNOWN","type":"KV","xerror":{"ref":"f1631cff-be54-4b01-4514-54a9241de5d1"}}
               at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.failConnect(SaslAuthenticationHandler.java:475)
               at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.maybeFailConnect(SaslAuthenticationHandler.java:280)
               at com.couchbase.client.core.io.netty.kv.SaslAuthenticationHandler.channelRead(SaslAuthenticationHandler.java:239)
               at com.couchbase.client.core.io.netty.kv.MemcacheProtocolVerificationHandler.channelRead(MemcacheProtocolVerificationHandler.java:84)
               at java.lang.Thread.run(Thread.java:748) 

We can clearly see a loop starting after this line:

2020-09-09 14:25:02 32100 [main] INFO com.amadeus.datastore.nosql.couchbase.realdb.TestConnection - i:29

The last entry is at 2020-09-09 14:29:37, so the delta is more than 4 minutes, although waitUntilReady was called with a 1000 ms timeout.
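For reference, the delta between those two log timestamps can be checked with a few lines of `java.time`. This is only a small sketch; `LogDelta` is a hypothetical helper name, and the pattern assumes the `yyyy-MM-dd HH:mm:ss` prefix seen in the log lines above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper to measure the gap between two log timestamps. */
final class LogDelta {

    // Matches the timestamp prefix of the log lines, e.g. "2020-09-09 14:25:02".
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static Duration between(String first, String last) {
        return Duration.between(
            LocalDateTime.parse(first, LOG_TIME),
            LocalDateTime.parse(last, LOG_TIME));
    }

    public static void main(String[] args) {
        Duration stuck = between("2020-09-09 14:25:02", "2020-09-09 14:29:37");
        System.out.println(stuck.getSeconds() + " s"); // prints "275 s" (4 min 35 s)
    }
}
```

That is 275 seconds spent inside a call that was given a 1000 ms timeout.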

In addition, I took a thread dump (via jstack -l 11864), in which you can see:

"main" #1 prio=5 os_prio=0 tid=0x0000000003418000 nid=0x3b4c waiting on condition [0x000000000340d000]
   java.lang.Thread.State: WAITING (parking)
               at sun.misc.Unsafe.park(Native Method)
               - parking to wait for  <0x00000007705b8f60> (a java.util.concurrent.CompletableFuture$Signaller)
               at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
               at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
               at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
               at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
               at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
               at com.couchbase.client.java.AsyncUtils.block(AsyncUtils.java:38)
               at com.couchbase.client.java.Bucket.waitUntilReady(Bucket.java:224)

So to me it looks like there is a lock inside the Java driver.

11864.zip (4.1 KB) example.log.zip (13.5 KB)

Hello,
To help the investigation I created a reproducer with a basic JUnit test.
The complete scenario can be found in the document scenario.txt.
I hope it helps you identify the issue.
Vincent
couchbase-client-impl-pass.zip (129.1 KB)

@vdoleans I could reproduce it locally and I think I already have a fix for it. It needs review and some more testing, but you can follow JVMCBC-889.