Membase Manual 1.7

5.5.1. Automatic Failover

Automatic Failover was introduced in Membase Server 1.7.1.

If there is a genuine server failure (e.g. a hardware crash) of a node (or small number of nodes) in a cluster, and there is enough headroom in the remaining nodes to handle the additional load, automated failover with an alert can increase system availability. Of course, deciding that a node is down is non-trivial, especially in cloud environments with high variability in network latency.

Because an incorrect failover can itself cause problems, we have placed a number of restrictions on the feature:

There is a minimum 30 second delay before a node will be failed over. This delay can be raised, but the software is hard-coded to perform multiple "pings" of a node that is perceived to be down before failing it over. This prevents a slow node or a flaky network connection from causing an inappropriate failover.

Email alerts can be configured so that, when a node fails, a message is sent both when an automatic failover occurs and when it does not.
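
The delay and repeated-ping restrictions can be illustrated with a small sketch. This is not the server's actual detection code. The class name `NodeMonitor`, the `min_failed_pings` threshold of 3, and the rule that a successful ping clears the failure state are all assumptions made for illustration. Only the 30 second minimum comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional

MIN_TIMEOUT_SECONDS = 30  # minimum delay stated in the manual


@dataclass
class NodeMonitor:
    """Hypothetical tracker for one node; not part of Membase itself."""
    timeout: float = MIN_TIMEOUT_SECONDS
    min_failed_pings: int = 3  # assumed value; the manual only says "multiple"
    first_failure_at: Optional[float] = None
    failed_pings: int = 0

    def __post_init__(self) -> None:
        # The delay may be raised, but never set below the minimum.
        if self.timeout < MIN_TIMEOUT_SECONDS:
            raise ValueError("timeout must be at least 30 seconds")

    def record_ping(self, ok: bool, now: float) -> bool:
        """Record one ping result; return True if failover would be allowed."""
        if ok:
            # Assumption: any successful ping resets the failure window.
            self.first_failure_at = None
            self.failed_pings = 0
            return False
        if self.first_failure_at is None:
            self.first_failure_at = now
        self.failed_pings += 1
        down_long_enough = now - self.first_failure_at >= self.timeout
        return down_long_enough and self.failed_pings >= self.min_failed_pings
```

In this sketch a node that answers even one ping during the window starts over from zero, which is one way a slow node or flaky connection would avoid being failed over.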

To configure the feature, select Settings -> Automatic Failover in the UI, or use the REST API.
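
As a rough illustration of the REST route, the following sketch builds (but does not send) a configuration request. The endpoint path `/settings/autoFailover`, the `enabled` and `timeout` form fields, the admin port 8091, and HTTP Basic authentication are assumptions here, not taken from this page; check them against the REST API section of the manual. The helper name `build_autofailover_request` is hypothetical.

```python
import base64
import urllib.parse
import urllib.request

MIN_TIMEOUT_SECONDS = 30  # minimum delay stated in the manual


def build_autofailover_request(base_url, user, password,
                               enabled=True, timeout=MIN_TIMEOUT_SECONDS):
    """Build a POST request that would enable or disable automatic failover.

    The path and field names are assumed; verify them in the REST API docs.
    """
    if enabled and timeout < MIN_TIMEOUT_SECONDS:
        raise ValueError("timeout must be at least 30 seconds")
    params = {"enabled": "true" if enabled else "false"}
    if enabled:
        params["timeout"] = str(timeout)
    body = urllib.parse.urlencode(params).encode("ascii")
    url = base_url.rstrip("/") + "/settings/autoFailover"  # assumed endpoint
    req = urllib.request.Request(url, data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


# Example: request a 60 second delay on a local node (placeholder credentials).
request = build_autofailover_request(
    "http://localhost:8091", "Administrator", "password", timeout=60)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a live cluster.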

Resetting the Autofailover counter

After a node has been automatically failed over, the administrator must reset the counter in order for the autofailover feature to work again.

This should only be done after restoring the cluster to a healthy and balanced state.
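
A reset could be scripted along the following lines. The endpoint path `/settings/autoFailover/resetCount`, the port, and the use of HTTP Basic authentication are assumptions that should be checked against the REST API section. The helper name and the `cluster_healthy_and_balanced` confirmation flag are hypothetical and exist only to make the precondition above explicit.

```python
import base64
import urllib.request


def build_reset_count_request(base_url, user, password,
                              cluster_healthy_and_balanced):
    """Build a POST request that would reset the autofailover counter.

    The caller must confirm the cluster has been restored first; this
    sketch does not verify cluster state on its own.
    """
    if not cluster_healthy_and_balanced:
        raise RuntimeError(
            "restore the cluster to a healthy, balanced state before resetting")
    # Assumed endpoint; verify against the REST API documentation.
    url = base_url.rstrip("/") + "/settings/autoFailover/resetCount"
    req = urllib.request.Request(url, data=b"", method="POST")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


# Example with placeholder credentials; send with urllib.request.urlopen(request).
request = build_reset_count_request(
    "http://localhost:8091", "Administrator", "password",
    cluster_healthy_and_balanced=True)
```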