Issue connecting to the cluster after updating the Java SDK version

We are experiencing an issue when trying to connect to the cluster after updating the Java SDK.

The setup of the system is as follows:

We have a web application that uses the Java SDK and a Couchbase cluster, with a VIP (Virtual IP Address) in between. We realise that isn’t ideal, but we can’t change it immediately since the VIP was mandated by Tech Ops. The VIP is essentially only there to reroute the initial request on application startup. That way we can modify the cluster and still ensure that, when the application starts, it can find the cluster regardless of which nodes are in it and what their IPs are.

Before the issue we used Java SDK version 1.4.4. On startup, the Java SDK would send a request to the VIP on port 8091. Please note that port 8091 is the only port open on the VIP. The VIP would reroute the request to one of the nodes currently in the cluster, and that node would respond to the Java SDK. At that point the Java SDK would discover all the nodes in the cluster and the application would run fine. During uptime, if we added or removed a node, the Java SDK would update automatically and everything would run without issue.

In the last sprint we updated the Java SDK to version 2.1.3. Now, when our application starts, the Java SDK sends its initial request to the VIP on port 11210. Since this port is not open, the request fails and the Java SDK throws an exception:

Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:93)
at com.couchbase.client.java.CouchbaseCluster.openBucket(CouchbaseCluster.java:108)
at com.couchbase.client.java.CouchbaseCluster.openBucket(CouchbaseCluster.java:99)
at com.couchbase.client.java.CouchbaseCluster.openBucket(CouchbaseCluster.java:89)

No further requests are made on any port.

It appears that the order in which ports are used for cluster discovery has changed between versions. Could somebody please confirm or dispute this? Could somebody also advise on how we could resolve the issue? We are trying to understand the client’s behavior: if we opened all of those ports on the VIP, would the client then still function correctly and at full performance?
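Since port 8091 was the only port open on the VIP, a useful first check from the application host is which bootstrap ports the VIP actually accepts. Below is a minimal sketch using only the JDK’s `java.net.Socket`; the class name `BootstrapPortProbe` and the default hostname `couchbase-vip.example.internal` are hypothetical placeholders. A successful TCP connect only shows that the VIP forwards the port; it does not prove the SDK will bootstrap through it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BootstrapPortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable: treat as unreachable.
            return false;
        }
    }

    /** Probes each port on the host and returns port -> reachable, in input order. */
    static Map<Integer, Boolean> probe(String host, List<Integer> ports, int timeoutMs) {
        Map<Integer, Boolean> results = new LinkedHashMap<>();
        for (int port : ports) {
            results.put(port, isReachable(host, port, timeoutMs));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical VIP hostname; pass the real one as the first argument.
        String host = args.length > 0 ? args[0] : "couchbase-vip.example.internal";
        // 8091 = cluster manager REST port (HTTP), 11210 = data service (binary KV) port.
        List<Integer> ports = List.of(8091, 11210);
        probe(host, ports, 2000).forEach((port, ok) ->
                System.out.println(host + ":" + port + (ok ? " reachable" : " NOT reachable")));
    }
}
```

Running it from the application server against the VIP (and, for comparison, against one cluster node directly) shows whether a failure like the `TimeoutException` above lines up with a port the VIP does not forward.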

The issue is happening in our production environment, which we cannot use to test potential solutions since that would interfere with our products.

Would it be possible for you to share the TRACE logs of the 2.1.3 connect process? Maybe also try 2.2.0, just to get a fresh set of logs.

I’d like to see where it gets stuck. The bootstrap process differs slightly between 1.x and 2.x, so something might be going on there. We’ll take it from there!

Thank you for responding.

I will speak with the Dev Lead of the team working on the project and with DevOps, and see what we have in the logs.

One issue is that all logs our applications produce are considered classified, and we will never get clearance to share them on a public forum. If we find any TRACE logs that could be useful, we will need a different way to provide them to you for inspection.

I will let you know of the result.

Any advice on where we should focus while troubleshooting?

Thank you for your time.

@idejanovic you can DM me the logs here so they stay private, or send me a DM and I’ll get in touch with you over email if that is more convenient for you. Also, of course, if you have a support contract, please work it through them; this is the most reliable way to get help.

@daschl

Thanks a lot for your time. I will take this to our DevOps team, since they are the only ones who have access to production logs. If they have something we could provide and I receive clearance to get the logs, I will contact you over DM to agree on the best way to provide them to you.

Thanks again for the help.


Was there a solution here? I have a project that’s pretty much exactly the same, down to the NetScaler. We opened up both ports 8091 and 11210, to no avail. Any information is greatly appreciated!

@eallenvii can you enable TRACE logging and share those logs from the bootstrap process, please?

Working on getting them. We’re running within a Spring Boot application; any guidance on how to enable TRACE logging for Couchbase there?

I think Spring Boot uses a regular logging framework like Log4j or similar. Just bump the log level up to the most verbose setting, FINEST or TRACE depending on the framework :smile:
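In Spring Boot, logger levels can usually be set with the `logging.level.<logger-name>` property, for example `logging.level.com.couchbase=TRACE` in `application.properties`. Outside Spring Boot, if the SDK ends up logging through `java.util.logging` (an assumption here: that no SLF4J or Log4j binding is on the classpath), the level has to be lowered on both the logger and a handler, because JUL’s `ConsoleHandler` only prints INFO and above by default. The sketch below does that for the `com.couchbase` logger namespace; the class name `CouchbaseJulTrace` is hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CouchbaseJulTrace {

    // Keep a strong reference: JUL holds loggers weakly, so an unreferenced
    // configured logger could be garbage-collected and lose its level.
    private static final Logger COUCHBASE_LOGGER = Logger.getLogger("com.couchbase");

    /** Sends every com.couchbase.* record, down to FINEST, to a console handler (stderr). */
    static Logger enableTrace() {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);                 // ConsoleHandler defaults to INFO
        COUCHBASE_LOGGER.setLevel(Level.FINEST);        // child loggers inherit this level
        COUCHBASE_LOGGER.addHandler(handler);
        COUCHBASE_LOGGER.setUseParentHandlers(false);   // avoid duplicate lines via the root handler
        return COUCHBASE_LOGGER;
    }

    public static void main(String[] args) {
        // Call once, before CouchbaseCluster.create(...), so the bootstrap phase is captured.
        enableTrace();
        Logger.getLogger("com.couchbase.client.core").finest("TRACE logging enabled");
    }
}
```

Calling `enableTrace()` more than once would attach a second handler and duplicate every line, so it belongs in one-time startup code.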