Handling Node Failover from REST API
I'm writing a script to fail over automatically when a node is down, but I can't figure out the correct HTTP call. I've read through the REST API documentation, but couldn't find anything specific to failover.
The closest method I could find is Ejecting a Node from a Cluster:
curl -i -u Administrator:password -d otpNode=10.0.1.11:8091 http://localhost:8091/controller/ejectNode
The value for otpNode is the node's "hostname" attribute, as returned by Getting a Bucket. But I keep getting "400: Server does not exist."
So my questions are:
1. What is the correct value for otpNode?
2. There is also a /controller/failOver endpoint, but there are no details given. If this is what I need, how do I use it?
Many thanks!
(Note: we currently can't use the automatic failover in 1.7.1 because we only have 2 nodes. Automatic failover requires at least 3 nodes).
Aha, thanks! I somehow missed that /pools/default returns the 'otpNode' I want (I was using the 'hostname').
I'm wondering now what the difference is between the `ejectNode` and `failover` operations. Would it be appropriate to first fail over the node, rebalance, and then eject the node?
You may wish to 'failover' without ejecting the node. For instance, you may want to failover to replicas within the cluster, fix the machine, then rebalance to bring it back in. You may also want to eject it, rather than bother with fixing it.
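For the failover step described above, here is a minimal sketch in shell. It assumes that /controller/failOver accepts the same otpNode form field as the ejectNode call in the question. The function names, the CLUSTER_URL and CLUSTER_AUTH variables, and the DRY_RUN switch are hypothetical helpers for illustration, not part of the REST API.

```shell
# Hypothetical settings; adjust to your cluster.
CLUSTER_URL="${CLUSTER_URL:-http://localhost:8091}"
CLUSTER_AUTH="${CLUSTER_AUTH:-Administrator:password}"

# POST an otpNode value to a /controller/<endpoint> URL.
# With DRY_RUN=1 the curl command is printed instead of executed.
post_otp_node() {
  endpoint="$1"
  otp_node="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl -i -u $CLUSTER_AUTH -d otpNode=$otp_node $CLUSTER_URL/controller/$endpoint"
  else
    curl -i -u "$CLUSTER_AUTH" -d "otpNode=$otp_node" "$CLUSTER_URL/controller/$endpoint"
  fi
}

# Assumption: failOver takes the same otpNode form field as ejectNode.
fail_over_node() { post_otp_node failOver "$1"; }
eject_node() { post_otp_node ejectNode "$1"; }
```

Running `DRY_RUN=1 fail_over_node ns_1@10.0.1.11` prints the request without sending it, which is a cheap way to check the otpNode value before touching the cluster. The value ns_1@10.0.1.11 is only an illustration; use whatever /pools/default reports for the node.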
You're on the right path with ejectNode, yes.
The otpNode value should match one of the otpNode entries in the node list returned by the /pools/default URI. The format you have there (host:port) is wrong. Note that the docs include an example in the expected format.
The code that handles this seems to agree that it's not in the list you're trying to eject it from:
https://github.com/membase/ns_server/blob/master/src/menelaus_web.erl#L795
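To see which otpNode values the cluster actually knows about, the /pools/default response can be filtered from the shell. This is a rough sketch that assumes each node entry in the JSON carries an "otpNode" string field; list_otp_nodes is a hypothetical helper name, and the grep/sed matching is deliberately simple (a real JSON parser would be sturdier).

```shell
# Print every otpNode value found in a /pools/default JSON document read from stdin.
# Assumes fields appear as "otpNode":"<value>", with optional spaces around the colon.
list_otp_nodes() {
  grep -o '"otpNode" *: *"[^"]*"' | sed 's/.*: *"\([^"]*\)"$/\1/'
}

# Usage (not executed here):
#   curl -s -u Administrator:password http://localhost:8091/pools/default | list_otp_nodes
```

Each line of output is a value that can be passed as otpNode to ejectNode; passing anything else, such as the hostname with its port, is what produces the "Server does not exist" response.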