We had a node fail, and Couchbase handled the auto-failover but did not rebalance automatically, which is fine. After that, however, a second node failed and was not failed over. I am looking for a way to rebalance automatically after a failover, so that a second node failure would also be auto-failed over and rebalanced.
Our cluster has 28 nodes and 4 buckets; each node has 24 cores and 96 GB of RAM.
We have been discussing this topic, and the exact features related to it, for our next Server version for a while, so please tell us more about your requirements and the behavior you expect. Do you want the option of a 2nd auto-failover, or do you need a setting for any user-specified number of auto-failovers? Should it be adjustable only by the Administrator, or by other user roles as well? Do you need an auto-rebalance after every auto-failover, or only after the 2nd one? Do you need auto-failover and auto-rebalance to always happen together, or to be separate, independently configurable operations? Please let us know any other related details or considerations.
In our next major release (this year) we plan to make the auto-failover quota configurable up to 3, so that no manual intervention is needed to reset it between failovers. Would that meet your requirements here?
The challenge with doing an automatic rebalance after failover is that it ends up recreating data and putting more load onto an already degraded (N-1) cluster, so historically we have strongly recommended against it.
We are indeed having more discussions internally about whether it makes sense to add an option for auto-rebalance after failover, but it would then require you to be aware of a possible “cascading” failure and to size the cluster so that it can cope at reduced capacity.
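To make the sizing point a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not a Couchbase tool or API. The function names are made up for illustration, it assumes load and data spread evenly across the surviving nodes after a rebalance, and the 85% per-node ceiling is only a placeholder you would replace with your own limit.

```python
def load_factor(total_nodes: int, failed_nodes: int) -> float:
    """Relative load on each surviving node once the failed nodes are
    rebalanced out, assuming (hypothetically) a perfectly even spread."""
    if not 0 <= failed_nodes < total_nodes:
        raise ValueError("failed_nodes must be in [0, total_nodes)")
    return total_nodes / (total_nodes - failed_nodes)


def max_safe_utilization(total_nodes: int, tolerated_failures: int,
                         ceiling: float = 0.85) -> float:
    """Highest steady-state per-node utilization that still stays under
    `ceiling` after `tolerated_failures` nodes are lost and rebalanced out.
    The 0.85 default is an illustrative assumption, not a recommendation."""
    return ceiling / load_factor(total_nodes, tolerated_failures)


if __name__ == "__main__":
    nodes = 28  # cluster size from the original post
    for failures in range(1, 4):  # up to 3, matching the planned quota
        factor = load_factor(nodes, failures)
        limit = max_safe_utilization(nodes, failures)
        print(f"{failures} failed: each survivor carries {factor:.3f}x load, "
              f"keep normal utilization under {limit:.1%}")
```

On a 28-node cluster each individual failure adds only a few percent of load per surviving node, but the rebalance itself also costs resources while it runs, which this sketch does not model. That extra work on an already degraded cluster is the part that can trigger the cascading failure mentioned above.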