Node failure while rebalancing, won't come back up... data loss?
I had a 6-node cluster with 1 replica, and decided to add 4 new nodes (10 nodes total) and run a rebalance.
While it was rebalancing (between 90% and 99% done), one of the nodes had a disk issue and became unresponsive on port 8091. That node was one of the original 6 and showed 97% done.
Restarting the couchbase-server service was a no-go: the service wouldn't stop even after a few hours, and I saw no disk I/O either. So I reset the machine, and now the cluster sees that node as down.
After the reboot the node won't show up in the cluster...
When I go to the address of that failed node, I see the new node setup page on port 8091, yet the cluster still lists that node as a member in the down state.
How can I rejoin that node to the cluster without losing my data? I'm pretty sure I'm missing replicas for data on that down node...
Thanks for any help or suggestions.
Losing one node is OK since you have a replica. You should be able to fail over this node and then rebalance.
Since your environment is unstable, I suggest you take a backup first.
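For reference, here is a rough sketch of that sequence (backup, then failover, then rebalance) as a small Python helper that only builds and prints the commands so you can review them before running anything. The addresses, credentials, and backup directory are placeholders, and `recovery_commands` is a made-up helper name. The `cbbackup` and `couchbase-cli` invocations follow the older command-line tools; flag spellings differ between Couchbase versions (newer releases use `cbbackupmgr` for backups and may need an extra flag for a hard failover of a node that is down), so check `couchbase-cli failover --help` on your install first.

```python
import shlex


def recovery_commands(cluster, down_node, user, password, backup_dir):
    """Build (but do not run) the backup, failover, and rebalance commands.

    cluster   -- host:port of any healthy node, e.g. "10.0.0.1:8091" (placeholder)
    down_node -- host:port of the node the cluster reports as down (placeholder)
    """
    auth = ["-u", user, "-p", password]

    # 1. Back up what the healthy nodes can still serve.
    backup = ["cbbackup", "http://" + cluster, backup_dir] + auth

    # 2. Fail over the down node so its replicas get promoted to active.
    failover = [
        "couchbase-cli", "failover",
        "-c", cluster,
        "--server-failover=" + down_node,
    ] + auth

    # 3. Rebalance the remaining nodes.
    rebalance = ["couchbase-cli", "rebalance", "-c", cluster] + auth

    return [backup, failover, rebalance]


if __name__ == "__main__":
    # Placeholder values; substitute your own before using the output.
    commands = recovery_commands(
        "10.0.0.1:8091", "10.0.0.3:8091",
        "Administrator", "password", "/backups/pre-failover",
    )
    for cmd in commands:
        # Quote each argument so the printed line is safe to paste into a shell.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing instead of executing is deliberate here: with a cluster in this state it seems safer to read each command, confirm it against your version's help output, and run them one at a time.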