If the Couchbase cluster referenced by a bucket is unexpectedly unavailable when an operation is attempted (e.g. as a result of a network failure), the SDK raises a code 16 error through the ‘error’ event attached to each Bucket instance. Requests made after the connection times out raise a code 23 (timeout) error. It appears that even if the Couchbase cluster becomes available again, the Bucket connection will not attempt to reconnect once these errors have been encountered, and there is no way to configure automatic reconnection in the Node.js SDK for this particular scenario.
I’ve seen several examples where others detect the code 16 and/or 23 error and attempt to recover in a variety of ways. It seems easy enough to call ‘openBucket’ again under these circumstances and replace the previous bucket instance with a new one if the connection succeeds. Is this a recommended way to reconnect after a failure? Furthermore, are there any limitations or issues I should be aware of with this approach?
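A minimal sketch of the detect-and-reopen approach described above, in TypeScript. It assumes only what this thread reports: connection loss surfaces as an error whose `code` is 16, and timeouts as code 23. The `open` callback is hypothetical; in a real application it would wrap `openBucket` and call back once the new bucket has connected or failed. The optional `close` hook is likewise a placeholder for however you release the old instance. The in-flight flag stops a burst of failing requests from opening many buckets at once.

```typescript
// Error codes as reported in this thread (assumed values, not read from SDK constants).
const NETWORK_ERROR = 16;
const TIMED_OUT = 23;

interface CodedError {
  code?: number;
}

// Hypothetical opener: in a real app this would wrap openBucket(...) and
// invoke `done` once the new bucket has connected (or failed to).
type OpenFn<B> = (done: (err: Error | null, bucket?: B) => void) => void;

// Hypothetical hook for releasing a replaced bucket instance.
type CloseFn<B> = (old: B) => void;

class ReconnectingBucket<B> {
  private current: B;
  private readonly open: OpenFn<B>;
  private readonly close?: CloseFn<B>;
  private reconnecting = false;
  reconnects = 0;

  constructor(initial: B, open: OpenFn<B>, close?: CloseFn<B>) {
    this.current = initial;
    this.open = open;
    this.close = close;
  }

  // Always read the bucket through this getter so callers pick up the swap.
  get bucket(): B {
    return this.current;
  }

  static isConnectionError(err: CodedError | null | undefined): boolean {
    return err != null && (err.code === NETWORK_ERROR || err.code === TIMED_OUT);
  }

  // Call from the bucket's 'error' listener and from operation callbacks.
  // Returns true if this call started a reconnect attempt.
  handleError(err: CodedError | null | undefined): boolean {
    if (!ReconnectingBucket.isConnectionError(err) || this.reconnecting) {
      return false;
    }
    this.reconnecting = true;
    this.open((openErr, fresh) => {
      this.reconnecting = false;
      if (openErr !== null || fresh === undefined) {
        return; // keep the old instance; the next connection error retries
      }
      const old = this.current;
      this.current = fresh;
      this.reconnects += 1;
      if (this.close) this.close(old);
    });
    return true;
  }
}
```

One limitation of this design: the ‘error’ listener has to be attached to every new instance, and any code that kept a direct reference to the old bucket (rather than going through `bucket`) keeps using the dead connection.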
Interesting. Now that you’ve said you use traffic to drive the reconnect, I’ve modified some of my tests and found that with a reasonable request rate it does seem to reconnect automatically, but the longer the interval between requests, the less reliable the reconnect becomes.
My test environment is as follows:
Client: MacOS 10.10.4, Node.js 4.4.2, Couchbase Node.js SDK 2.2.1
Server: MacOS 10.10.4, Couchbase Server 3.0.2-1603
I am disconnecting the network interface the Couchbase host is on to simulate a network failure. I’ve also tried terminating the Couchbase Server process, with similar results. If I run this script side by side with an interval of 500ms and an interval of 10000ms, the 500ms script consistently reconnects, while the 10000ms script only reconnects sometimes. I’ve left the scripts running for upwards of 20 minutes, and a process that doesn’t reconnect almost immediately never appears to recover.
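The side-by-side test can be approximated with a small probe loop. This is a sketch, not the script from the thread (which isn’t shown): the `probe` function is hypothetical and would wrap a single cheap request, such as a get on a known key, with a Node-style callback. Running two loops with `intervalMs` of 500 and 10000 reproduces the comparison, and `recoveredAt` reports when the first success after the first failure occurred, or `null` if the run never recovered.

```typescript
// Hypothetical single request with a Node-style callback, e.g. a get on a known key.
type Probe = (done: (err: { code?: number } | null) => void) => void;

interface Sample {
  t: number; // milliseconds since the probe started
  ok: boolean;
  code?: number; // error code on failure, e.g. 16 or 23
}

// Issue one request every `intervalMs` and record each outcome in `samples`.
// Returns a function that stops the loop.
function startProbe(probe: Probe, intervalMs: number, samples: Sample[]): () => void {
  const start = Date.now();
  const timer = setInterval(() => {
    probe((err) => {
      const t = Date.now() - start;
      samples.push(err ? { t, ok: false, code: err.code } : { t, ok: true });
    });
  }, intervalMs);
  return () => clearInterval(timer);
}

// Time of the first successful sample after the first failure.
// Returns null if the probe never recovered (or never failed at all).
function recoveredAt(samples: Sample[]): number | null {
  const firstFail = samples.findIndex((s) => !s.ok);
  if (firstFail === -1) return null;
  for (let i = firstFail + 1; i < samples.length; i++) {
    if (samples[i].ok) return samples[i].t;
  }
  return null;
}
```

Recording `code` with each failed sample also shows whether a stuck run is failing with 16s or 23s.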
Is there a certain threshold or limit that needs to be met for the SDK to detect that the cluster is available again? If so, what is that threshold?
Just an update for others seeing similar behaviour:
After escalating this through Couchbase’s support channels, an issue was identified in the C SDK, libcouchbase, that caused this behaviour. Updating to libcouchbase 2.6.2 solved the issue in our test cases. Couchbase Node.js SDK 2.2.2, which packages libcouchbase 2.6.2, has been released.
TL;DR: Upgrade to Couchbase Node.js SDK 2.2.2 or higher to resolve.
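To confirm a deployment has actually picked up the fix, comparing the installed SDK version against the 2.2.2 minimum is enough. This sketch handles plain `major.minor.patch` strings only and does not consider pre-release suffixes; where the version string comes from (for example, the `version` field of the installed package’s `package.json`) is an assumption left to the caller.

```typescript
// Minimum Node.js SDK release that bundles libcouchbase 2.6.2 (per this thread).
const MIN_SDK = "2.2.2";

// Compare dotted numeric versions such as "2.2.1" and "2.10.0".
// Missing components count as 0; pre-release suffixes like "-beta" are not considered.
function isAtLeast(version: string, minimum: string): boolean {
  const a = version.split(".").map((p) => parseInt(p, 10) || 0);
  const b = minimum.split(".").map((p) => parseInt(p, 10) || 0);
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true;
}
```

Comparing numerically per component matters here: a plain string comparison would rank "2.10.0" below "2.2.2".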