Trying to recover from an outage, rebalancing fails immediately

andreas.o · June 12, 2023, 5:49am

We had a major outage in which two out of three nodes were offline for a while. Once our servers were back online we had major issues because the single node that was left was running out of disk. We finally got some more disk added, which got the cluster back to serving data, but none of the nodes are healthy yet. When we try to rebalance, all we get is:
Rebalance exited with reason {badmatch,
{error,{failed_nodes,['ns_1@-.-.-.14']}}}

We are running: Community Edition 6.0.0 build 1693 on all the nodes.
I have done a cbcollect_info that I can share but I don’t want to attach it here.

We have had issues with the cluster before, but never like this, where the cluster just straight up doesn't seem to want to try a rebalance. I can ping between the nodes with low latency, and nc reports open ports on 8091-8094 and 9100-9105.
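For anyone wanting to repeat the port check from every node in one go, here is a minimal Python sketch. It only attempts a TCP connect, roughly what nc does, so an open port does not prove the service behind it is healthy. The function name `check_ports` and the host passed to it are placeholders, not anything from Couchbase.

```python
import socket

# Port ranges quoted in the post: 8091-8094 and 9100-9105.
PORTS = list(range(8091, 8095)) + list(range(9100, 9106))


def check_ports(host, ports, timeout=2.0):
    """Return {port: bool}, True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Calling `check_ports("<node address>", PORTS)` from each node against the other two would show whether any direction is blocked.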

Would be grateful for any help regarding this.

I found this in reports.log:
exception exit: {{badmatch,
{error,
{setup_replications_failed,
[{'ns_1@-.-.-.12',
{errors,
[{34,999},
{34,936},
{34,919},
{34,823},
{34,320}

And the memcached logs have a lot of closed-stream messages:
2023-06-12T06:57:24.761894+02:00 INFO 875: (Catalog) DCP (Consumer) eq_dcpq:replication:ns_1@-.-.-.12->ns_1@-.-.-.14:Catalog - (vb 1019) Setting stream to dead state, last_seqno is 102814, unAckedBytes is 0, status is The stream closed due to a close stream message.
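To see which replication streams and vBuckets are affected, the messages can be grouped with a short script. This is a rough sketch that assumes every relevant line has the same shape as the one quoted above; the regular expression and the name `dead_streams` are my own and not part of any Couchbase tool.

```python
import re
from collections import defaultdict

# Matches lines shaped like the one quoted above (assumed format).
DEAD_STREAM = re.compile(
    r"eq_dcpq:replication:(?P<src>\S+?)->(?P<dst>\S+?):(?P<bucket>\S+) - "
    r"\(vb (?P<vb>\d+)\) Setting stream to dead state"
)


def dead_streams(lines):
    """Map (source node, destination node, bucket) to a sorted list of vBucket ids."""
    found = defaultdict(set)
    for line in lines:
        match = DEAD_STREAM.search(line)
        if match:
            key = (match["src"], match["dst"], match["bucket"])
            found[key].add(int(match["vb"]))
    return {key: sorted(vbs) for key, vbs in found.items()}
```

Feeding it an open log file, for example `dead_streams(open(path))` with `path` pointing at the memcached log, gives one entry per source, destination and bucket.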

perry · June 12, 2023, 12:45pm

Do you have a backup of the data that you could reload? Did any of the nodes get failed over?

andreas.o · June 12, 2023, 2:32pm

We will attempt to restore the entire cluster from a backup. Thank you.

system · September 10, 2023, 2:33pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
