Couchbase Server places several restrictions on automatic failover. These restrictions help prevent problems that automatic failover can otherwise cause. For more information about potential issues, see Choosing a Failover Solution.
Disabled by Default: Automatic failover is disabled by default. This prevents Couchbase Server from using automatic failover unless you explicitly enable it.
Minimum Nodes: Automatic failover is only available on clusters of at least three nodes. If two or more nodes go down at the same time within a specified delay period, the automatic failover system will not fail over any nodes.
Required Intervention: Automatic failover will only fail over one node before requiring human intervention. This prevents a chain-reaction failure of all nodes in the cluster.
Failover Delay: A node is failed over only after a delay of at least 30 seconds. You can raise this delay, but the software is hard coded to ping a node that may be down several times before failing it over. This prevents failover of a functioning but slow node and keeps network connection issues from triggering failover. For more information about this setting, see Enabling and Disabling Auto-Failover.
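The restrictions above can be read as a single yes/no check. The following sketch is a toy model only, not Couchbase Server's implementation: the function name `may_auto_failover` and its four arguments are hypothetical, and the timing of the failover delay is not modeled.

```shell
#!/bin/sh
# Toy model of the restrictions listed above. This is NOT Couchbase Server's
# implementation; the function name and its arguments are hypothetical.
# Usage: may_auto_failover ENABLED CLUSTER_NODES NODES_DOWN FAILOVER_COUNT
# Returns 0 (true) when the listed restrictions would permit a failover.
may_auto_failover() {
    enabled="$1"
    cluster_nodes="$2"
    nodes_down="$3"
    failover_count="$4"

    [ "$enabled" = "true" ] || return 1        # disabled unless explicitly enabled
    [ "$cluster_nodes" -ge 3 ] || return 1     # needs a cluster of at least three nodes
    [ "$nodes_down" -eq 1 ] || return 1        # two or more nodes down: no failover
    [ "$failover_count" -eq 0 ] || return 1    # one failover, then human intervention
    return 0
}

# Example: a four-node cluster with one node down and no earlier failover.
if may_auto_failover true 4 1 0; then
    echo "failover permitted"
else
    echo "failover blocked"
fi
```

Changing any single argument in the example (for instance a two-node cluster, or two nodes down) makes the check fail, which mirrors how each restriction on its own is enough to block automatic failover.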
You can use the REST API to configure an email notification that Couchbase Server sends when a node fails and is automatically failed over. For more information, see Enabling and Disabling Email Notifications.
To configure automatic failover through the Administration Web Console, see Section 6.8.2, “Enabling Auto-Failover Settings”. For information on using the REST API, see Section 8.7.8, “Retrieving Auto-Failover Settings”.
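As a rough illustration of the REST API route, the sketch below builds a `curl` command that enables automatic failover with a chosen delay. It assumes the settings endpoint `/settings/autoFailover` accepts the form parameters `enabled` and `timeout`; verify both against the REST API sections referenced above. The helper name `enable_autofailover_cmd` and the variables `CB_HOST`, `CB_USER`, and `CB_PASS` are hypothetical placeholders. The command is printed rather than executed so that it can be reviewed before it is run against a live cluster.

```shell
#!/bin/sh
# Placeholder connection details; these names and values are assumptions.
CB_HOST="${CB_HOST:-localhost:8091}"
CB_USER="${CB_USER:-cluster-username}"
CB_PASS="${CB_PASS:-cluster-password}"

# Hypothetical helper: check the requested delay against the 30-second
# minimum, then print (not run) the curl command that would enable
# automatic failover with that delay.
enable_autofailover_cmd() {
    timeout="$1"
    case "$timeout" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    if [ "$timeout" -lt 30 ]; then
        echo "timeout must be at least 30 seconds" >&2
        return 1
    fi
    printf '%s\n' "curl -i -u ${CB_USER}:${CB_PASS} -d 'enabled=true&timeout=${timeout}' http://${CB_HOST}/settings/autoFailover"
}

# Print the command for a 60-second failover delay.
enable_autofailover_cmd 60
```

A request for a delay shorter than 30 seconds is rejected by the helper before any command is printed, which matches the minimum delay described earlier.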
Once an automatic failover has occurred, the Couchbase Cluster is relying on other nodes to serve replicated data. You should initiate a rebalance to return your cluster to a fully functioning state. For more information, see Section 5.5.4, “Handling a Failover Situation”.
Resetting the Automatic Failover Counter
After a node has been automatically failed over, Couchbase Server increments an internal counter that records the failover. While the counter indicates that a node has been failed over, the server will not automatically fail over any additional nodes in the cluster. This gives you the opportunity to identify and resolve the issue that caused the failover. To re-enable automatic failover in the cluster, you must reset this counter.
You should only reset the automatic failover counter after you resolve the node issue, rebalance, and restore the cluster to a fully functioning state.
You can reset the counter using the REST API:
shell> curl -i -X POST -u cluster-username:cluster-password \
    http://localhost:8091/settings/autoFailover/resetCount
For more information on using this REST API, see Resetting Auto-Failover.
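Before resetting, it can help to confirm that the counter is actually set. The sketch below assumes that the auto-failover settings response is a flat, single-line JSON body containing a numeric `count` field, for example `{"enabled":true,"timeout":30,"count":1}`. That response shape is an assumption to verify against the REST API reference, and the helper names `autofailover_count` and `needs_reset` are hypothetical. The sample body is illustrative, not output captured from a real cluster.

```shell
#!/bin/sh
# Extract the numeric "count" field from a settings response body.
# Assumes flat, single-line JSON; this is not a general JSON parser.
autofailover_count() {
    printf '%s\n' "$1" | sed -n 's/.*"count":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeeds (returns 0) when the counter is present and greater than zero.
needs_reset() {
    count=$(autofailover_count "$1")
    [ -n "$count" ] && [ "$count" -gt 0 ]
}

# Illustrative response body, not captured from a real cluster.
settings='{"enabled":true,"timeout":30,"count":1}'

if needs_reset "$settings"; then
    echo "counter is set: reset it once the cluster is healthy and rebalanced"
else
    echo "counter is clear: automatic failover is not blocked by the counter"
fi
```

The check only reports the state of the counter. It cannot tell whether the underlying node issue has been resolved, so the decision to reset remains a manual one.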