When a Couchbase Server node fails, the other nodes in the cluster continue to process requests and provide responses, and you experience no loss of administrative control. A Couchbase SDK that tries to communicate with the failed node receives a message that the requested information cannot be found on that node; the SDK then requests updated cluster information from Couchbase Server and communicates with the nodes that are still active. Because Couchbase Server distributes information across nodes and also stores replica data, information from a failed node still exists in the cluster, and an SDK can access it.
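The routing behavior described above can be sketched in plain Python. Couchbase hashes each key to a vBucket and a cluster map assigns vBuckets to nodes; the constants, function names, and round-robin assignment below are illustrative simplifications, not the real SDK internals (real clusters use 1024 vBuckets).

```python
import zlib

NUM_VBUCKETS = 64  # illustrative; Couchbase clusters use 1024

def vbucket_for(key: str) -> int:
    """Hash a key to a vBucket ID (Couchbase hashes the key with CRC32)."""
    return zlib.crc32(key.encode()) % NUM_VBUCKETS

def make_cluster_map(nodes: list[str]) -> dict[int, str]:
    """Assign each vBucket to a node (round-robin here, for simplicity)."""
    return {vb: nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)}

# Initial three-node cluster.
nodes = ["node-a", "node-b", "node-c"]
cluster_map = make_cluster_map(nodes)

key = "user::1234"
owner = cluster_map[vbucket_for(key)]

# The owning node fails: the SDK fetches a fresh cluster map from any
# surviving node and retries the request against the vBucket's new owner.
nodes.remove(owner)
cluster_map = make_cluster_map(nodes)       # refreshed topology
new_owner = cluster_map[vbucket_for(key)]   # request now routed elsewhere
```

The key point is that the SDK never needs to know in advance which node holds a key; it only needs an up-to-date cluster map, which any surviving node can supply.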
There are two ways to handle possible node failures with Couchbase Server:
Auto-failover: You specify the maximum amount of time a node may be unresponsive before Couchbase Server removes that node from the cluster. For more information, see Couchbase Server Manual, Node Failure.
Manual failover: In this case, a person determines that a node is down and then removes the node from the cluster.
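The auto-failover rule above can be sketched as a simple threshold check. This is an illustrative model, not Couchbase's implementation; the timeout value, heartbeat timestamps, and function name are all placeholders.

```python
AUTO_FAILOVER_TIMEOUT = 120.0  # seconds; placeholder value for illustration

def check_auto_failover(last_heartbeat: dict[str, float],
                        now: float,
                        timeout: float = AUTO_FAILOVER_TIMEOUT) -> list[str]:
    """Return the nodes whose last heartbeat is older than the timeout."""
    return [node for node, seen in last_heartbeat.items()
            if now - seen > timeout]

# node-c has been silent for 130 s, longer than the 120 s timeout,
# so it is the only node selected for failover.
heartbeats = {"node-a": 1000.0, "node-b": 995.0, "node-c": 870.0}
failed = check_auto_failover(heartbeats, now=1000.0)
```

In a manual failover, the same decision is simply made by an administrator instead of a timer.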
In either case, when a node is removed, Couchbase Server automatically redistributes information from that node to all other functioning nodes in the cluster. However, at this point the remaining nodes do not yet have replicas established for the redistributed data. To restore replication, you should perform a rebalance on the cluster. The rebalance will:
Redistribute stored data across remaining nodes in the cluster,
Create replica data for all buckets in the cluster,
Provide SDKs, upon request, with updated information on where data now resides.
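The effect of a rebalance can be modeled with plain dictionaries: after it completes, every vBucket once again has an active copy and a replica copy on two different surviving nodes. This is a hedged sketch of the outcome, not of Couchbase's actual rebalance algorithm; the assignment scheme and names are invented for illustration.

```python
NUM_VBUCKETS = 16  # illustrative; real clusters use 1024

def rebalance(nodes: list[str]) -> dict[int, tuple[str, str]]:
    """Map each vBucket to (active_node, replica_node), always distinct nodes.

    Requires at least two nodes, since a replica on the same node as the
    active copy would provide no protection against node failure.
    """
    assert len(nodes) >= 2
    plan = {}
    for vb in range(NUM_VBUCKETS):
        active = nodes[vb % len(nodes)]
        replica = nodes[(vb + 1) % len(nodes)]
        plan[vb] = (active, replica)
    return plan

survivors = ["node-a", "node-b"]  # cluster after one node was failed over
plan = rebalance(survivors)
# Every vBucket is now fully replicated across the remaining nodes.
```

Note that with only the surviving nodes holding both active and replica copies, each node carries more data than before, which is why capacity planning matters when running with one less node.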
In general, rebalances with Couchbase Server have less of a performance impact than you would expect with a traditional relational database, holding other factors, such as the size of the data set, constant. However, a rebalance increases the overall load and resource utilization of a cluster and leads to some amount of performance loss. Therefore, it is a best practice to perform a rebalance after node failure during the period of lowest application use, if possible. After the rebalance, you can choose one of these options:
Leave the cluster functioning with one less node. Be aware that the cluster must still be able to handle the volume of requests and data with one less node,
If possible, restore the failed node, add it back to the cluster, and then rebalance,
Create a new node to replace the failed node, add it to the cluster, and then rebalance.
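A manual failover and the follow-up rebalance can also be driven through Couchbase Server's administrative REST API, which exposes `/controller/failOver` and `/controller/rebalance` endpoints on the admin port (8091). The sketch below only builds the requests without sending them; the host, node names, and helper names are placeholders, and you should consult the REST API reference for your server version before relying on the exact parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

def failover_request(host: str, otp_node: str) -> Request:
    """Build (but do not send) the POST that fails over one node."""
    data = urlencode({"otpNode": otp_node}).encode()
    return Request(f"http://{host}:8091/controller/failOver",
                   data=data, method="POST")

def rebalance_request(host: str, known: list[str], ejected: list[str]) -> Request:
    """Build the POST that rebalances, ejecting the failed-over node."""
    data = urlencode({"knownNodes": ",".join(known),
                      "ejectedNodes": ",".join(ejected)}).encode()
    return Request(f"http://{host}:8091/controller/rebalance",
                   data=data, method="POST")

# Placeholder host and node names; in practice these come from your cluster.
req = failover_request("cb-admin.example.com", "ns_1@node-c")
# Send with urllib.request.urlopen(req) after adding Basic-auth credentials.
```

In practice you would attach administrator credentials (HTTP Basic auth) and check the response status; both are omitted here to keep the sketch self-contained.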
For more information about this topic, see Couchbase Server Manual, "Handling a Failover Situation."