Details
Description
When a rebalance fails or is stopped by the user, the vbuckets whose transfers were in progress are left in the pending state.
The ns_server janitor runs every few seconds and changes the state of those vbuckets from pending to dead.
If the user then restarts the rebalance less than 5 minutes later, ep-engine tries to reuse the existing tap stream and does not send TAP_VBUCKET_SET when restarting the takeover. Because the vbucket state is now dead, ep-engine does not start the vbucket transfer, and the rebalance gets stuck.
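The sequence described above can be sketched as a toy model. This is not ns_server or ep-engine code: the class, function names, and the exact reuse-window constant are hypothetical, and the assumption that a fresh (non-reused) tap stream does send TAP_VBUCKET_SET is inferred from the description rather than confirmed.

```python
from dataclasses import dataclass

# "5 minutes" comes from the report; the exact constant is an assumption.
TAP_REUSE_WINDOW_S = 5 * 60


@dataclass
class VBucket:
    """Hypothetical stand-in for a vbucket on the receiving node."""
    state: str = "dead"

    def on_tap_vbucket_set(self, state: str) -> None:
        # Models the effect of receiving TAP_VBUCKET_SET.
        self.state = state

    def accepts_transfer(self) -> bool:
        # Per the report, the transfer does not start while the state is dead.
        return self.state == "pending"


def janitor_pass(vb: VBucket) -> None:
    """Models the ns_server janitor: leftover pending vbuckets become dead."""
    if vb.state == "pending":
        vb.state = "dead"


def restart_takeover(vb: VBucket, seconds_since_stop: float) -> bool:
    """Return True if the transfer starts, False if the rebalance gets stuck."""
    reuses_stream = seconds_since_stop < TAP_REUSE_WINDOW_S
    if not reuses_stream:
        # Assumed: a fresh stream announces the state again.
        vb.on_tap_vbucket_set("pending")
    # A reused stream skips TAP_VBUCKET_SET, so the state stays as the janitor left it.
    return vb.accepts_transfer()
```

For example, after `vb.on_tap_vbucket_set("pending")` followed by `janitor_pass(vb)`, calling `restart_takeover(vb, 60)` returns `False` in this model (the stuck case), while `restart_takeover(vb, 600)` returns `True`.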
So this message has nothing at all to do with the rebalance failing. May we ask for logs from the master node? The master node can be identified by looking at the user-visible logs: the server that logs the "rebalance failed" message is the master node for that failed rebalance.