Cluster Node Connection Failure Handling

behrad · October 14, 2015, 8:07am

When some nodes of the configured cluster fail, couchnode does not remove them from the client connections, and I see continuous errors even though there are still healthy nodes in my cluster.
It also seems that couchnode does not try to reconnect when a failed node is up and available again.

Are there any plans to support connection reconnect/failover in the module?

There’s a simple implementation of such a scenario in the influx node.js module here:
https://github.com/node-influx/node-influx/blob/master/lib/InfluxRequest.js

mnunberg · October 14, 2015, 3:16pm

Nodes are only removed from the client if they are failed over, meaning that they have been either manually removed from the cluster, or they have been removed via auto-failover (an optional feature allowing the cluster to automatically remove servers it thinks are dead). Otherwise, the client library must assume that any sort of socket or connectivity error is temporary. Note that Failover is not just a colloquial term to indicate a dead server, but corresponds to an actual Couchbase Cluster Management API call (via REST) to indicate that a specific server has been “Failed over”, as failover is a potentially destructive operation with respect to data.

If a socket connection has been broken (ECONNRESET, for example), the client will try to reconnect. It will not try to reconnect on a timeout error, because a timeout does not indicate a broken connection. It indicates that the connection is still open but that, for some reason, the remote host is not responding, often due to resource exhaustion. In such a scenario, it is certainly not a good idea to add more TCP connections to an already-exhausted server.
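
To illustrate the distinction above, here is a minimal sketch of how an application might classify errors. The helper `reactionFor` and the set of codes are hypothetical and not part of couchnode; the codes are ordinary Node.js-style error codes chosen for illustration, and the SDK may report timeouts differently.

```javascript
// Hypothetical helper, not a couchnode API.
// Codes that suggest the socket itself is gone, so reconnecting makes sense.
const BROKEN_SOCKET_CODES = new Set(['ECONNRESET', 'EPIPE', 'ECONNREFUSED']);

function reactionFor(err) {
  if (BROKEN_SOCKET_CODES.has(err.code)) {
    return 'reconnect'; // connection is broken: open a new one
  }
  if (err.code === 'ETIMEDOUT') {
    return 'wait'; // connection still open, host slow: do not pile on sockets
  }
  return 'report'; // anything else: surface it to the caller
}
```

The point is only that a timeout and a reset call for different reactions; the actual decision is made inside the client library.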

It is possible to determine in advance whether a given document ID (key) will be routed to a specific server, so you can write code that checks, for each key, whether it would be routed to a server known to be “dead” (where “dead” is something defined in your application and is broader than Couchbase’s definition of dead, which is failed over). I am not sure whether such an API is exposed in the node.js library, but it is certainly available in the underlying C library.
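
As a rough sketch of the per-key check described above: the functions `serverForKey` and `wouldHitDeadServer` are hypothetical, and the hash-to-vBucket step is a simplified assumption about how keys map to servers, not a copy of the C library's code. In practice the vBucket map and server list come from the cluster configuration; here they are plain arrays passed in by the caller.

```javascript
// Standard bitwise CRC-32 over the UTF-8 bytes of a string.
function crc32(str) {
  let crc = 0xffffffff;
  for (const byte of Buffer.from(str, 'utf8')) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// ASSUMPTION: key -> vBucket via a CRC-32 based hash, then
// vBucket -> server index via the map. Details are illustrative.
function serverForKey(key, vbucketMap, servers) {
  const vbucket = ((crc32(key) >>> 16) & 0x7fff) % vbucketMap.length;
  return servers[vbucketMap[vbucket]];
}

// "Dead" is whatever the application decides, e.g. a Set of host:port strings.
function wouldHitDeadServer(key, vbucketMap, servers, deadServers) {
  return deadServers.has(serverForKey(key, vbucketMap, servers));
}
```

An application could call `wouldHitDeadServer` before issuing an operation and fail fast instead of waiting for a timeout.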

behrad · October 14, 2015, 5:45pm

Maybe I was not clear enough, @mnunberg.
When one of the nodes is temporarily down (not failed over or removed) and the node.js client faces timeouts or connection resets, it could circuit-break that node and pass requests to the other nodes in the cluster. However, I see that the node.js module still sends subsequent requests to all cluster nodes, so some of them get connection errors…
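
For reference, the circuit-breaking idea could be sketched at the application level roughly like this. The class `NodeBreaker`, its option names, and the default thresholds are all hypothetical; nothing here is provided by couchnode.

```javascript
// Hypothetical per-node circuit breaker, kept outside the SDK.
class NodeBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 5000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, handy for testing
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  // True if a request may be sent to this node right now.
  allows() {
    if (this.openedAt === null) return true;
    // After the cooldown, let a trial request through (half-open).
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now(); // open (or re-open) the circuit
    }
  }
}
```

One breaker would be kept per node; what to do while a breaker is open (fail fast, retry later, or something else) is left to the application.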
