Golang SDK 2.1 not reconnecting in case of dead node

Hello!

I’m testing Couchbase as an alternative to my MongoDB cluster. We use Go and Python as our main backend languages.
Right now I’m stuck with the Go driver (v2.1.1) because it doesn’t ignore dead nodes. My test cluster setup:

  • 3x CB nodes (all with default configs)
  • 1 bucket with 1 replica (so 2 copies of the data are spread across the cluster)
  • connection URI: couchbase://node1,node2,node3
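For reference, the hosts in a `couchbase://` connection string are a comma-separated list of seed nodes that the SDK uses to bootstrap. The sketch below uses a hypothetical `parseSeeds` helper (not gocb's real parser, which also handles ports, options and other schemes) to show how the URI above breaks down:

```go
package main

import (
	"fmt"
	"strings"
)

// parseSeeds is a hypothetical helper (not part of gocb) that extracts the
// seed hosts from a simple couchbase:// connection string. It drops any
// "?key=value" options and keeps the hosts in the order they were given.
func parseSeeds(connStr string) ([]string, error) {
	const scheme = "couchbase://"
	if !strings.HasPrefix(connStr, scheme) {
		return nil, fmt.Errorf("unsupported scheme in %q", connStr)
	}
	rest := strings.TrimPrefix(connStr, scheme)
	// Strip query options, if any.
	if i := strings.IndexByte(rest, '?'); i >= 0 {
		rest = rest[:i]
	}
	var seeds []string
	for _, h := range strings.Split(rest, ",") {
		h = strings.TrimSpace(h) // tolerate "node2, node1" style spacing
		if h != "" {
			seeds = append(seeds, h)
		}
	}
	if len(seeds) == 0 {
		return nil, fmt.Errorf("no hosts in %q", connStr)
	}
	return seeds, nil
}

func main() {
	seeds, err := parseSeeds("couchbase://node1,node2,node3")
	if err != nil {
		panic(err)
	}
	fmt.Println(seeds) // [node1 node2 node3]
}
```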

The Go driver stops working when I shut down node1, while the Python driver with the same URI (and essentially the same code) handles reads and writes fine.

The error is: panic: unambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},"OperationID":"Get","Opaque":"0x0","TimeObserved":2500413918,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""}

Full log here https://pastebin.com/mCNyhN6P

Hope to find help here. Thanks

UPD: when I change the node order, e.g. couchbase://node2,node1,node3, requests start to work.

Hi @kirik, how are you shutting down the node? I can see in the logs that the SDK is connecting to the cluster and fetching the cluster configuration correctly, so I’m wondering if there’s a timing issue. Connecting works like this:

  • When you call Connect, the SDK verifies that the options are valid and then asynchronously connects to the nodes listed in the connection string (in parallel). It keeps retrying until it succeeds or is shut down.
  • Once it successfully fetches a cluster config map, it creates additional connections to any nodes that weren’t in the connection string and stops trying to connect to nodes that were in the connection string but aren’t in the map.
  • Any operations received while connecting/setup is happening are placed into a queue until the SDK is ready.

If your timeout is short, it’s possible that your Get is queued but times out in the queue before the SDK is ready. You can try bucket.WaitUntilReady(timeout, options), which blocks until either the connections are ready for use or the timeout is reached.

Hello @chvck !

how are you shutting down the node?
I just powered off the server. I also tried stopping the Couchbase service but got the same behavior.

You can try to use bucket.WaitUntilReady(timeout, options)
I tried this too, with no luck: it dies when the first node is down.

This timeout occurs only when the first node in the connection URI is down (see UPD). I also tried setting RF=2 and shutting down 2 of 3 nodes: everything is fine as long as the first node is up.

Hi @kirik, I’m not able to reproduce this locally. I use a connection string with 3 addresses, and the first in the list is one I’ve made up that doesn’t point to anything. Could you please share some of your code?