Checking if a Couchbase node is down

I am working with Couchbase Server 2.5.1 and the Java client.

Couchbase auto-failover happens after 30 seconds, and Couchbase can automatically fail over only one node until you reset the failover counter. Because of these limitations I need to develop my own failover mechanism. I found the REST commands that allow me to choose a node and fail it over. Now I need a way to determine whether a chosen node is down.

Has anyone faced this problem?
Does anyone have any suggestions?

1 Answer


We did this as well, because it should be possible for half of the cluster (one data center) to go down while the cluster remains active.

I did it by polling "/pools/nodes" every 10 seconds and parsing the response with Jackson.
The important values here are clusterMembership and status.
If one node goes down (Couchbase notices this after a couple of seconds), its status will reflect this, and after 20-30 seconds I initiate a failover of that node by sending the REST request to one of the healthy nodes.
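A minimal sketch of the "wait 20-30 seconds before failing over" step, using only the JDK. It assumes each poll has already been parsed (for example with Jackson) into one NodeState per node, carrying the otpNode, clusterMembership and status fields. The class names, the grace period parameter, and the literal values "active" and "healthy" are assumptions for illustration, so check them against what your server actually returns. The HTTP polling and the failover REST call are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One node entry from a poll result (hypothetical holder class). */
class NodeState {
    final String otpNode;
    final String clusterMembership;
    final String status;

    NodeState(String otpNode, String clusterMembership, String status) {
        this.otpNode = otpNode;
        this.clusterMembership = clusterMembership;
        this.status = status;
    }
}

/** Remembers when each active node was first seen as not healthy. */
class NodeDownTracker {
    private final long graceMillis;
    // otpNode -> time (ms) of the first poll in which the node was not healthy
    private final Map<String, Long> firstSeenDown = new HashMap<>();

    NodeDownTracker(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    /**
     * Feed one poll result. Returns the nodes that have been continuously
     * not healthy for at least the grace period and are still active members.
     */
    List<String> update(List<NodeState> poll, long nowMillis) {
        Set<String> seen = new HashSet<>();
        List<String> due = new ArrayList<>();
        for (NodeState n : poll) {
            seen.add(n.otpNode);
            // Assumed values: only "active" members that are not "healthy" count as down.
            boolean down = "active".equals(n.clusterMembership)
                    && !"healthy".equals(n.status);
            if (!down) {
                firstSeenDown.remove(n.otpNode); // recovered or already failed over
                continue;
            }
            long since = firstSeenDown.computeIfAbsent(n.otpNode, k -> nowMillis);
            if (nowMillis - since >= graceMillis) {
                due.add(n.otpNode);
            }
        }
        // Forget nodes that no longer appear in the response at all.
        firstSeenDown.keySet().retainAll(seen);
        return due;
    }
}
```

With a 10-second polling interval and a 20-second grace period, a node is reported on the third consecutive poll that sees it as not healthy. A single healthy poll in between resets its timer.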
If one more node drops out, I check whether it is in the same data center as the first one (using the new server group feature in 2.5.x). If yes, I fail that one over as well; otherwise I wait for recovery.
It works, but it is not in production yet.
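The "same data center" rule for a second failure could be sketched roughly like this. The class name and the node-to-group map are hypothetical. The map is assumed to be built from the cluster's server group configuration, and loading it is not shown.

```java
import java.util.Map;
import java.util.Set;

/** Decides whether another down node may be failed over (illustrative only). */
class GroupFailoverPolicy {
    // otpNode -> server group name, e.g. one group per data center (assumed input)
    private final Map<String, String> groupByNode;

    GroupFailoverPolicy(Map<String, String> groupByNode) {
        this.groupByNode = groupByNode;
    }

    /**
     * Returns true if downNode belongs to the same server group as every node
     * that has already been failed over. With no earlier failover, any node
     * with a known group is allowed.
     */
    boolean shouldFailOver(String downNode, Set<String> alreadyFailedOver) {
        String group = groupByNode.get(downNode);
        if (group == null) {
            return false; // unknown group: be conservative and wait
        }
        for (String failed : alreadyFailedOver) {
            if (!group.equals(groupByNode.get(failed))) {
                return false; // failures span groups: wait for recovery
            }
        }
        return true;
    }
}
```

Returning false here only means "do not fail over automatically". The node stays in the cluster until it recovers or someone fails it over by hand.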