CE 6.6.0 rebalance fail

I have a cluster of 3 nodes that was previously running Couchbase Server Community Edition 6.0.0, and I upgraded them to 6.6.0 by removing each node, rebalancing, upgrading it, and then adding it back to the cluster.
Starting with the first node I re-added, I keep getting a “rebalance failed” message right after the “rebalance complete” notification. Watching the logs, I can see only this error:

Rebalance exited with reason {service_rebalance_failed,index,
{agent_died,<0.19930.33>,
{lost_connection,shutdown}}}.
Rebalance Operation Id = a42ef499ddccccd2e8c6b9baf89acc11
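
For reference, each remove/upgrade/add-back cycle I describe above amounts to roughly the following against the REST API (a minimal sketch, not the exact commands I ran; the node address and credentials are placeholders for my setup):

# Rough sketch of one remove -> upgrade -> add-back cycle via the
# Couchbase REST API; CLUSTER and AUTH are placeholders.
import time
import requests

CLUSTER = "http://10.0.10.8:8091"  # any node that stays in the cluster
AUTH = ("Administrator", "password")

def otp_nodes():
    # otpNode names look like 'ns_1@10.0.10.4'
    pool = requests.get(CLUSTER + "/pools/default", auth=AUTH).json()
    return [n["otpNode"] for n in pool["nodes"]]

def rebalance(eject=()):
    # Start a rebalance (optionally ejecting nodes) and wait until the
    # progress endpoint reports that no rebalance is running any more.
    requests.post(CLUSTER + "/controller/rebalance", auth=AUTH,
                  data={"knownNodes": ",".join(otp_nodes()),
                        "ejectedNodes": ",".join(eject)}).raise_for_status()
    while True:
        r = requests.get(CLUSTER + "/pools/default/rebalanceProgress",
                         auth=AUTH).json()
        if r.get("status") == "none":  # finished or failed; the UI log says which
            return
        time.sleep(5)

rebalance(eject=["ns_1@10.0.10.9"])  # 1. eject the node to upgrade
# 2. upgrade Couchbase Server on 10.0.10.9 out of band, then:
requests.post(CLUSTER + "/controller/addNode", auth=AUTH,  # 3. add it back
              data={"hostname": "10.0.10.9", "user": AUTH[0],
                    "password": AUTH[1], "services": "kv"}).raise_for_status()
rebalance()  # 4. rebalance the re-added node in -> this is the step that fails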

The servers stayed up, running, and taking traffic, and it did not cause any real issue (I don't use indexes so far), but I'd like to understand why this happens. I got the error for every server, and it never happened before.

Now I keep getting the “Rebalance failed. See logs for detailed reason. You can try again.” toast every time I open the web console, which is frustrating: I keep pressing the rebalance button, but it fails again after just 2 seconds.

Any idea?
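
For anyone who wants to reproduce this outside the console, pressing the button is equivalent to roughly the following REST calls (a minimal sketch with placeholder address and credentials; as far as I can tell, /logs is the same feed the console's Logs page reads):

# Minimal sketch: kick off a rebalance of the current nodes, watch
# /pools/default/rebalanceProgress until it stops, then dump the most
# recent cluster log entries. Credentials are placeholders.
import time
import requests

CLUSTER = "http://10.0.10.8:8091"
AUTH = ("Administrator", "password")

nodes = [n["otpNode"] for n in
         requests.get(CLUSTER + "/pools/default", auth=AUTH).json()["nodes"]]
requests.post(CLUSTER + "/controller/rebalance", auth=AUTH,
              data={"knownNodes": ",".join(nodes), "ejectedNodes": ""})

while True:
    progress = requests.get(CLUSTER + "/pools/default/rebalanceProgress",
                            auth=AUTH).json()
    print(progress)            # per-node progress while it runs
    if progress.get("status") == "none":
        break                  # in my case this happens after ~2 seconds
    time.sleep(1)

# The "Rebalance exited with reason ..." entry shows up here, same as in
# the web console's Logs tab (field names as I see them in the response).
for entry in requests.get(CLUSTER + "/logs", auth=AUTH).json().get("list", [])[-5:]:
    print(entry.get("serverTime"), entry.get("text"))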

Here are some logs. I tried removing 2 nodes and re-adding them, but still no rebalance operation completes correctly, not even when just removing a node…

Rebalance exited with reason {service_rebalance_failed,index,
{agent_died,<22256.15006.2>,
{lost_connection,shutdown}}}.
Rebalance Operation Id = 322dad547d652f544cc5c455f5eaf057
ns_orchestrator 000
ns_1@10.0.10.8

8:23:36 AM 26 Jan, 2021
Bucket "config" loaded on node 'ns_1@10.0.10.4' in 0 seconds.
ns_memcached 000
ns_1@10.0.10.4

8:22:19 AM 26 Jan, 2021
Bucket "config" loaded on node 'ns_1@10.0.10.9' in 0 seconds.
ns_memcached 000
ns_1@10.0.10.9

8:22:15 AM 26 Jan, 2021
Bucket "config" rebalance does not seem to be swap rebalance
ns_vbucket_mover 000
ns_1@10.0.10.8

8:22:08 AM 26 Jan, 2021
Started rebalancing bucket config
ns_rebalancer 000
ns_1@10.0.10.8

8:22:04 AM 26 Jan, 2021
Bucket "sessions" loaded on node 'ns_1@10.0.10.4' in 0 seconds.
ns_memcached 000
ns_1@10.0.10.4

8:20:22 AM 26 Jan, 2021
Bucket "sessions" loaded on node 'ns_1@10.0.10.9' in 0 seconds.
ns_memcached 000
ns_1@10.0.10.9

8:20:19 AM 26 Jan, 2021
Bucket "sessions" rebalance does not seem to be swap rebalance
ns_vbucket_mover 000
ns_1@10.0.10.8

8:20:10 AM 26 Jan, 2021
Started rebalancing bucket sessions
ns_rebalancer 000
ns_1@10.0.10.8

8:20:08 AM 26 Jan, 2021
Bucket "cache" loaded on node 'ns_1@10.0.10.4' in 0 seconds.
ns_memcached 000
ns_1@10.0.10.4

8:19:39 AM 26 Jan, 2021
Bucket "cache" loaded on node 'ns_1@10.0.10.9' in 0 seconds.
ns_memcached 000
ns_1@10.0.10.9

8:19:36 AM 26 Jan, 2021
Bucket "cache" rebalance does not seem to be swap rebalance
ns_vbucket_mover 000
ns_1@10.0.10.8

8:19:26 AM 26 Jan, 2021
Started rebalancing bucket cache
ns_rebalancer 000
ns_1@10.0.10.8

8:19:25 AM 26 Jan, 2021
Starting rebalance, KeepNodes = ['ns_1@10.0.10.4','ns_1@10.0.10.8',
'ns_1@10.0.10.9'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 322dad547d652f544cc5c455f5eaf057
ns_orchestrator 000
ns_1@10.0.10.8

8:19:24 AM 26 Jan, 2021
Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]
memcached_config_mgr 000
ns_1@10.0.10.9

8:19:19 AM 26 Jan, 2021
Node 'ns_1@10.0.10.4' saw that node 'ns_1@10.0.10.9' came up. Tags: []
ns_node_disco 004
ns_1@10.0.10.4

8:19:19 AM 26 Jan, 2021
Node ns_1@10.0.10.9 joined cluster
ns_cluster 003
ns_1@10.0.10.9

8:19:19 AM 26 Jan, 2021
Couchbase Server has started on web port 8091 on node 'ns_1@10.0.10.9'. Version: "6.6.0-7909-community".
menelaus_sup 001
ns_1@10.0.10.9

8:19:19 AM 26 Jan, 2021
Node 'ns_1@10.0.10.8' saw that node 'ns_1@10.0.10.9' came up. Tags: []
ns_node_disco 004
ns_1@10.0.10.8

8:19:05 AM 26 Jan, 2021
Started node add transaction by adding node 'ns_1@10.0.10.9' to nodes_wanted (group: undefined)
ns_cluster 000
ns_1@10.0.10.8

8:19:04 AM 26 Jan, 2021
Node 'ns_1@10.0.10.4' saw that node 'ns_1@10.0.10.9' went down. Details: [{nodedown_reason,
connection_closed}]
ns_node_disco 005
ns_1@10.0.10.4

8:17:51 AM 26 Jan, 2021
Node 'ns_1@10.0.10.4' saw that node 'ns_1@10.0.10.9' came up. Tags: []
ns_node_disco 004
ns_1@10.0.10.4

8:17:48 AM 26 Jan, 2021
Couchbase Server has started on web port 8091 on node 'ns_1@10.0.10.9'. Version: "6.6.0-7909-community".
menelaus_sup 001
ns_1@10.0.10.9

8:17:45 AM 26 Jan, 2021
Node 'ns_1@10.0.10.9' saw that node 'babysitter_of_ns_1@cb.local' came up. Tags: []
ns_node_disco 004
ns_1@10.0.10.9

8:17:45 AM 26 Jan, 2021
Node 'ns_1@10.0.10.9' is leaving cluster.
ns_cluster 001
ns_1@10.0.10.9

8:17:45 AM 26 Jan, 2021
Conflicting configuration changes to field {node,'ns_1@10.0.10.4',uuid}:
[{'_vclock',[{<<"297639b37b24121ca87ef92f2b8d62aa">>,{2,63778864544}}]}|
<<"297639b37b24121ca87ef92f2b8d62aa">>] and
[{'_vclock',[{<<"8437cd269d4c4c0c8107390dfdf345aa">>,{2,63778828879}}]}|
<<"8437cd269d4c4c0c8107390dfdf345aa">>], choosing the former, which looks newer.
ns_config 000
ns_1@10.0.10.9

8:17:45 AM 26 Jan, 2021
Conflicting configuration changes to field {node,'ns_1@10.0.10.4',memcached}:
[{'_vclock',[{<<"297639b37b24121ca87ef92f2b8d62aa">>,{1,63778864544}},
{<<"b0d315ca2d672c779a958912588c84eb">>,{1,63778864320}}]},
{port,11210},
{dedicated_port,11209},
{dedicated_ssl_port,undefined},
{ssl_port,undefined},
{admin_user,"@ns_server"},
{other_users,["@cbq-engine","@projector","@goxdcr","@index","@fts",
"@eventing","@cbas"]},
{admin_pass,"*****"},
{engines,[{membase,[{engine,"c:/Program Files/Couchbase/Server/lib/memcached/ep.so"},
{static_config_string,"failpartialwarmup=false"}]},
{memcached,[{engine,"c:/Program Files/Couchbase/Server/lib/memcached/default_engine.so"},
{static_config_string,"vb0=true"}]}]},
{config_path,"c:/Program Files/Couchbase/Server/var/lib/couchbase/config/memcached.json"},
{audit_file,"c:/Program Files/Couchbase/Server/var/lib/couchbase/config/audit.json"},
{rbac_file,"c:/Progra…
ns_config 000
ns_1@10.0.10.9

8:17:45 AM 26 Jan, 2021
Conflicting configuration changes to field {node,'ns_1@10.0.10.4',
eventing_dir}:
[{'_vclock',[{<<"297639b37b24121ca87ef92f2b8d62aa">>,{1,63778864544}},
{<<"b0d315ca2d672c779a958912588c84eb">>,{1,63778864324}}]},
101,58,47,99,111,117,99,104,98,97,115,101,47,105,110,100,101,120,101,115] and
[{'_vclock',[{<<"2a9dcd7d96de88e604a7479c5ff5bc9a">>,{2,63778828721}},
{<<"8437cd269d4c4c0c8107390dfdf345aa">>,{1,63778828879}}]},
99,58,47,80,114,111,103,114,97,109,32,70,105,108,101,115,47,67,111,117,99,
104,98,97,115,101,47,83,101,114,118,101,114,47,118,97,114,47,108,105,98,47,
99,111,117,99,104,98,97,115,101,47,100,97,116,97], choosing the former, which looks newer.
ns_config 000
ns_1@10.0.10.9

8:17:45 AM 26 Jan, 2021
Conflicting configuration changes to field {node,'ns_1@10.0.10.4',cbas_dirs}:
[{'_vclock',[{<<"297639b37b24121ca87ef92f2b8d62aa">>,{1,63778864544}},
{<<"b0d315ca2d672c779a958912588c84eb">>,{1,63778864324}}]},
"e:/couchbase/indexes"] and
[{'_vclock',[{<<"2a9dcd7d96de88e604a7479c5ff5bc9a">>,{2,63778828721}},
{<<"8437cd269d4c4c0c8107390dfdf345aa">>,{1,63778828879}}]},
"c:/Program Files/Couchbase/Server/var/lib/couchbase/data"], choosing the former, which looks newer.

@simone.barbolini thanks for posting your query in our forums! Can you please collect logs using cbcollect_info and upload them? That will give us a complete picture of what is going on. https://docs.couchbase.com/server/current/manage/manage-logging/manage-logging.html should show how to do this via the different options. Thanks!
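
If it helps, the REST equivalent of the UI's Logs > Collect Information form is roughly the following (a sketch with placeholder address and credentials); alternatively, cbcollect_info can be run directly on each node, e.g. from C:\Program Files\Couchbase\Server\bin on a Windows install:

# Sketch: start cbcollect_info on every node over REST, then check the
# cluster task list for progress. Credentials are placeholders.
import requests

CLUSTER = "http://10.0.10.8:8091"
AUTH = ("Administrator", "password")

# Same action as the web console's "Collect Information" button;
# the resulting zips are written on each node.
requests.post(CLUSTER + "/controller/startLogsCollection", auth=AUTH,
              data={"nodes": "*"}).raise_for_status()

# Collection progress (including per-node output paths) appears among
# the cluster tasks; look for the log-collection task in this listing.
for task in requests.get(CLUSTER + "/pools/default/tasks", auth=AUTH).json():
    print(task.get("type"), task.get("status"))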

Thank you @aruns1987, but I chose to restore those 3 nodes from a backup of the previous version to avoid any problems, so I don't have the logs anymore.
I will maybe try again with the next major release.