5.5. Failing Over Nodes

5.5.1. Choosing a Failover Solution
5.5.2. Using Automatic Failover
5.5.3. Initiating a Node Failover
5.5.4. Handling a Failover Situation
5.5.5. Adding Back a Failed Over Node

If a node in a cluster is unable to serve data, you can fail over that node. Failover means that Couchbase Server removes the node from the cluster and makes replicated data on other nodes available for client requests. Because Couchbase Server replicates data within a cluster, the cluster can handle the failure of one or more nodes without affecting your ability to access the stored data. In the event of a node failure, you can manually initiate failover for the node in Web Console and then resolve the underlying issue.

Alternatively, you can configure Couchbase Server to automatically remove a failed node from the cluster and have the cluster operate in a degraded mode. If you choose this automatic option, the workload on the functioning nodes that remain in the cluster will increase. You will still need to address the node failure, return a functioning node to the cluster, and then rebalance the cluster for it to function as it did before the node failure.
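
As an illustration of the automatic option, the following Python sketch builds (but does not send) a request that enables automatic failover through the cluster's REST interface. The endpoint /settings/autoFailover, its `enabled` and `timeout` fields, and port 8091 are assumptions that do not come from this section; check them against the REST API reference for your server version. The function name, host, and credentials are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_auto_failover_request(host, user, password, timeout_s=30, enabled=True):
    """Build (but do not send) a request that turns automatic failover on or off."""
    # Assumption: the REST interface listens on port 8091 and exposes
    # /settings/autoFailover with `enabled` and `timeout` (seconds) form fields.
    body = urllib.parse.urlencode(
        {"enabled": "true" if enabled else "false", "timeout": timeout_s}
    ).encode("ascii")
    request = urllib.request.Request(
        f"http://{host}:8091/settings/autoFailover", data=body, method="POST"
    )
    # HTTP Basic authentication with the administrator credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would submit it. Building the request separately makes it easy to inspect the URL and form body before touching a live cluster.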

Whether you manually fail over a node or have Couchbase Server perform automatic failover, you should determine the underlying cause of the failure. You should then set up functioning nodes, add them to the cluster, and rebalance the cluster. Keep in mind the following guidelines on replacing or adding nodes when you deal with node failure and failover scenarios:

Failover is a distinct operation from removing a node and rebalancing. Typically you remove a functioning node from a cluster for maintenance or other reasons; in contrast, you perform a failover for a node that does not function.

When you remove a functioning node from a cluster, you use Web Console to mark the node for removal, then you rebalance the cluster so that data requests for the node can be handled by other nodes. Because the node you want to remove still functions, it continues to handle data requests until the rebalance completes. After that point, other nodes in the cluster handle those requests. Removing a node and then rebalancing therefore causes no disruption in data service and no loss of data. If you need to remove a functioning node for administration purposes, use the remove and rebalance functionality, not failover. See Performing a Rebalance, Adding a Node to a Cluster.
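
The remove-then-rebalance flow described above can also be expressed as a single REST call. The Python sketch below only builds the form body for such a call. The field names `knownNodes` and `ejectedNodes`, the /controller/rebalance endpoint they would be posted to, and the `ns_1@<ip>` node naming are assumptions that do not come from this section; the function name is hypothetical.

```python
import urllib.parse


def build_rebalance_body(known_nodes, nodes_to_remove):
    """Return the form body for a rebalance that ejects `nodes_to_remove`."""
    # Every node being removed must be one the cluster already knows about.
    unknown = sorted(set(nodes_to_remove) - set(known_nodes))
    if unknown:
        raise ValueError(f"not cluster members: {', '.join(unknown)}")
    # Assumption: /controller/rebalance expects comma-separated node names in
    # the `knownNodes` and `ejectedNodes` form fields.
    return urllib.parse.urlencode(
        {
            "knownNodes": ",".join(known_nodes),
            "ejectedNodes": ",".join(nodes_to_remove),
        }
    )
```

Because the ejected node keeps serving requests until the rebalance completes, this path is the one to prefer for planned maintenance.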

If you fail over a functioning node, you may lose data. Failover immediately removes the node from the cluster, and any data that has not yet been replicated to other nodes may be permanently lost if it has not been persisted to disk.
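
A script that automates these guidelines might guard against failing over a node that still functions. The Python sketch below is one way to encode that rule. It assumes that the cluster reports a per-node status string in which "unhealthy" marks a node that cannot serve data, and that a failover is requested by posting an `otpNode` form field to /controller/failOver; neither detail comes from this section, and both function names are hypothetical.

```python
import urllib.parse


def choose_action(status):
    """Pick the safe operation for taking a node out of the cluster."""
    # Assumption: "unhealthy" is the status reported for a node that cannot
    # serve data. Any other status is treated as a node that still functions.
    return "failover" if status == "unhealthy" else "remove_and_rebalance"


def build_failover_body(otp_node, status):
    """Return the form body for a failover, refusing functioning nodes."""
    if choose_action(status) != "failover":
        # Failing over a functioning node risks losing unreplicated data.
        raise ValueError(f"{otp_node} is {status}; remove and rebalance instead")
    # Assumption: /controller/failOver identifies the node by `otpNode`.
    return urllib.parse.urlencode({"otpNode": otp_node})
```

Refusing by default keeps the destructive path opt-in: the caller has to observe a failed node before a failover request can even be constructed.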

For more information about performing failover see the following resources: