I am unable to rebalance a production Couchbase 6.5 cluster. I've added a 7.1 node and attempted to rebalance. The first rebalance failed with:
{"completionMessage":"Rebalance stopped by janitor."}
A subsequent rebalance ran for a while and then failed with:
Rebalance exited with reason {service_rebalance_failed,index,
    {agent_died,<29443.456.0>,
     {linked_process_died,<29443.1509.0>,
      {timeout,
       {gen_server,call,
        [<29443.1507.0>,
         {call,"ServiceAPI.GetTaskList",
          #Fun<json_rpc_connection.0.102434519>},
         60000]}}}}}.
The rebalance button is now disabled. Any help is greatly appreciated.
If this is an Enterprise server, please open a case with Customer Support.
Otherwise, look in the server logs for more information about the ‘agent_died’ error.
It might be worthwhile to try adding a 6.5 server to eliminate one variable.
Ensure all the ports are accessible: Couchbase Server Ports | Couchbase Docs
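If you want to script that connectivity check from another node, here is a minimal sketch in Python. The host names are placeholders, and the port list is only an illustrative subset of the ports in that doc (cluster admin, views, query, search, index service admin, data service), not the complete list:

import socket

# Placeholder host list -- replace with your own cluster nodes.
NODES = ["node1.example.com", "node2.example.com"]

# Illustrative subset of Couchbase Server ports; see the ports doc
# linked above for the full list.
PORTS = [8091, 8092, 8093, 8094, 9102, 11210]

for host in NODES:
    for port in PORTS:
        try:
            with socket.create_connection((host, port), timeout=3):
                print(f"{host}:{port} reachable")
        except OSError as exc:
            print(f"{host}:{port} NOT reachable ({exc})")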
Can you please collect logs using cbcollect_info? It will give a complete picture of what is going on. https://docs.couchbase.com/server/current/manage/manage-logging/manage-logging.html shows the different ways to do this.
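For reference, scripting the collection on one node could look something like the sketch below. The binary path assumes a default Linux install, and the output filename and redaction level are just examples:

import subprocess

# Minimal sketch: run cbcollect_info locally with log redaction enabled.
# /opt/couchbase/bin is the default Linux install location; adjust if needed.
CBCOLLECT = "/opt/couchbase/bin/cbcollect_info"

subprocess.run(
    [CBCOLLECT, "--log-redaction-level=partial", "/tmp/node1-collect.zip"],
    check=True,
)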
Thanks for your response and information. This is not an Enterprise server.
-
I’ve collected redacted logs but am still working through our organization’s clearance procedures for distributing them. If cleared, should I use the upload-to-Couchbase feature?
-
Review of the logs around the agent_died event hasn’t yielded much context to identify a root cause so far. Attached at the bottom is a section of the reports.log from a cluster node.
-
We’ve ensured all ports are accessible and there is no firewall between nodes.
Thanks again for your help.
crash-snippet.reports.log.zip (3.4 KB)
Ok. In that file I find what you first posted:
messages: [{'EXIT',<0.26359.955>,
            {linked_process_died,<0.26276.955>,
             {timeout,
              {gen_server,call,
               [<0.26083.955>,
                {call,"ServiceAPI.GetTaskList",
and I search issues.couchbase.com for “linked_process_died ServiceAPI.GetTaskList”. I find an issue there which says the problem is fixed in 7.0.0. So upgrade your existing servers to 7.1, and then add the new node.
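If it helps to confirm which nodes report which version before and after the upgrade, a quick check against the cluster REST API should work. This is only a sketch; the host and credentials are placeholders:

import base64
import json
import urllib.request

# Placeholder host and credentials -- replace with your own.
HOST = "http://node1.example.com:8091"
USER, PASSWORD = "Administrator", "password"

# GET /pools/default lists every node in the cluster along with the
# Couchbase Server version each one reports.
token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
req = urllib.request.Request(
    f"{HOST}/pools/default",
    headers={"Authorization": f"Basic {token}"},
)

with urllib.request.urlopen(req) as resp:
    cluster = json.load(resp)

for node in cluster["nodes"]:
    print(node["hostname"], node["version"], node["status"])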