I am working with Couchbase Server 2.5.1 and the Java client.
Couchbase's auto-failover only kicks in after 30 seconds, and it can only automatically fail over one node until you reset the auto-failover counter. Because of these limitations I need to develop my own failover mechanism. I have found the REST commands that let me choose a node and fail it over; now I need a way to determine whether a chosen node is down.
Has anyone faced this problem?
Does anyone have any suggestions?
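For reference, the manual fail-over mentioned above is a single authenticated POST against the cluster's REST port. The following is a minimal sketch using the JDK 11+ java.net.http client, assuming the /controller/failOver endpoint on port 8091, Basic authentication with the administrator credentials, and an otpNode value taken from the node's "otpNode" field (for example ns_1@10.0.0.2). The class and method names are hypothetical, and the request should be sent to a node that is still healthy:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds (but does not send) a manual fail-over request.
final class FailoverRequests {
    private FailoverRequests() {}

    // healthyHost: a surviving node to send the request to.
    // otpNode: the node to fail over, as reported in its "otpNode" field, e.g. "ns_1@10.0.0.2".
    static HttpRequest failOver(String healthyHost, String user, String password, String otpNode) {
        // Basic auth header built from the administrator credentials.
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        // Form-encoded body naming the node to fail over.
        String body = "otpNode=" + URLEncoder.encode(otpNode, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create("http://" + healthyHost + ":8091/controller/failOver"))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and its status code checked. On the older JDKs common alongside Couchbase 2.5, the same POST can be made with HttpURLConnection.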
We built this as well, because our cluster has to remain active even if half of it (one data center) goes down.
I did it by polling the "/pools/nodes" REST endpoint every 10 seconds and parsing the response with Jackson. The important fields here are clusterMembership and status.
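A sketch of the per-poll check, assuming the parsing step (Jackson, in the setup described here) has already turned each entry of the "nodes" array into a NodeInfo value. NodeInfo and HealthCheck are hypothetical names. The status values ("healthy", "unhealthy", "warmup") and membership values ("active", "inactiveAdded", "inactiveFailed") are the ones the REST API is expected to report, but it is worth confirming them against your server version:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical value type holding the fields read from one /pools/nodes entry.
record NodeInfo(String hostname, String otpNode, String status, String clusterMembership) {}

final class HealthCheck {
    private HealthCheck() {}

    // A node needs attention when it is still an active cluster member
    // but Couchbase reports it as unhealthy.
    static boolean isDown(NodeInfo n) {
        return "active".equals(n.clusterMembership()) && "unhealthy".equals(n.status());
    }

    // Filters one poll's worth of nodes down to the ones that look dead.
    static List<NodeInfo> downNodes(List<NodeInfo> nodes) {
        return nodes.stream().filter(HealthCheck::isDown).collect(Collectors.toList());
    }
}
```

Checking clusterMembership as well as status keeps nodes that have already been failed over (inactiveFailed) from being picked again on the next poll. The 10-second polling itself could be driven by a ScheduledExecutorService with scheduleAtFixedRate.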
If a node goes down (Couchbase notices this within a few seconds), its status reflects it, and after 20-30 seconds I initiate a failover via the REST API of one of the healthy nodes.
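The 20-30 second wait is what stops a short network blip from triggering a failover. One way to implement it is sketched below. The sketch assumes the poller calls observe once per poll, passing the otpNode names currently reported down and the current time in milliseconds. DownTracker is a hypothetical name, and the 25-second grace period in the test is only illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical tracker: remembers when each node was first seen down and reports
// the nodes that have stayed down for at least the grace period.
final class DownTracker {
    private final long gracePeriodMillis;
    private final Map<String, Long> firstSeenDown = new HashMap<>();

    DownTracker(long gracePeriodMillis) {
        this.gracePeriodMillis = gracePeriodMillis;
    }

    // downNow: otpNode names reported down by the current poll.
    // Returns the nodes that have been down continuously for >= gracePeriodMillis.
    Set<String> observe(Set<String> downNow, long nowMillis) {
        // Forget nodes that recovered, so a later outage starts its own clock.
        firstSeenDown.keySet().retainAll(downNow);
        for (String node : downNow) {
            firstSeenDown.putIfAbsent(node, nowMillis);
        }
        Set<String> expired = new TreeSet<>();
        for (Map.Entry<String, Long> e : firstSeenDown.entrySet()) {
            if (nowMillis - e.getValue() >= gracePeriodMillis) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

Passing nowMillis in explicitly (for example from System.currentTimeMillis()) keeps the timing logic testable without sleeping.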
If another node drops out, I check whether it is in the same data center (using the new server group feature in 2.5.x). If it is, I fail it over as well; otherwise I wait for recovery.
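The second-failure rule fits in a small decision function. The sketch below assumes you already know the server group of each node that has been failed over and of the newly failed node. That membership is assumed to come from the server groups REST endpoint (/pools/default/serverGroups). GroupPolicy and its parameters are hypothetical names:

```java
import java.util.Set;

// Hypothetical decision helper for a second (or later) node failure.
final class GroupPolicy {
    private GroupPolicy() {}

    // failedOverGroups: server groups of nodes already failed over.
    // candidateGroup: server group of the node that just went down.
    static boolean shouldFailOver(Set<String> failedOverGroups, String candidateGroup) {
        // First failure: nothing has been failed over yet, so proceed.
        if (failedOverGroups.isEmpty()) {
            return true;
        }
        // Later failures: only proceed if the node is in the same group (data center)
        // as the nodes already lost; otherwise wait for recovery.
        return failedOverGroups.contains(candidateGroup);
    }
}
```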
It works, but it is not in production yet.