Rebalance issues working with a dockerized Couchbase cluster setup

tallinn1960 · September 26, 2022, 3:59pm

My setup is three Docker containers, each running couchbase:latest with different port mappings to the Docker host. Here is the excerpt of my docker-compose.yaml:

cdb1:
  ports:
    - "8091-8096:8091-8096"
    - "11210:11210"

cdb2:
  ports:
    - "11091-11096:8091-8096"
    - "12210:11210"

cdb3:
  ports:
    - "12091-12096:8091-8096"
    - "13210:11210"

The nodes are connected using a user-defined Docker network. All nodes have hostnames, a domain name, and a complete /etc/hosts file listing all three nodes. The cluster is constructed by referring to the nodes by name.

To access those nodes from the Docker host, I followed the manual and configured external access on all three nodes by posting to /node/controller/setupAlternateAddresses/external on each node, providing the hostname 127.0.0.1 and appropriate settings for kv, fts, cbas, n1ql, mgmt, capi and eventingAdminPort.
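For readers who want to reproduce the external-address setup, here is a minimal sketch of how the request for one node could be built. It assumes the usual default in-container ports for each service (mgmt 8091 through eventingAdminPort 8096, kv 11210) and uses PUT, which is the method I believe the Couchbase documentation shows for this endpoint; the helper names `external_ports` and `alternate_address_request` are made up for illustration, and credentials are left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed default in-container ports for the services named in the post.
INTERNAL_PORTS = {
    "mgmt": 8091,
    "capi": 8092,
    "n1ql": 8093,
    "fts": 8094,
    "cbas": 8095,
    "eventingAdminPort": 8096,
    "kv": 11210,
}


def external_ports(mgmt_host_port, kv_host_port):
    """Map each service to its published host port.

    mgmt_host_port is where container port 8091 is published; the other
    809x ports are assumed to be published with the same offset.
    """
    offset = mgmt_host_port - INTERNAL_PORTS["mgmt"]
    ports = {
        name: port + offset
        for name, port in INTERNAL_PORTS.items()
        if name != "kv"
    }
    ports["kv"] = kv_host_port
    return ports


def alternate_address_request(node_mgmt_url, hostname, ports, method="PUT"):
    """Build (but do not send) the alternate-address request for one node.

    An Authorization header with admin credentials still has to be added
    before passing the result to urllib.request.urlopen.
    """
    body = urlencode({"hostname": hostname, **ports}).encode()
    url = f"{node_mgmt_url}/node/controller/setupAlternateAddresses/external"
    return Request(url, data=body, method=method)


# Example for cdb2, which publishes 8091-8096 as 11091-11096 and 11210 as 12210.
cdb2_ports = external_ports(11091, 12210)
cdb2_request = alternate_address_request("http://127.0.0.1:11091", "127.0.0.1", cdb2_ports)
```

The same two calls with (8091, 11210) and (12091, 13210) would cover cdb1 and cdb3.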

This setup works flawlessly until I test failover by stopping a node. Auto failover happens, but when I restart the disabled node, it starts to write this into the logs:

> Service 'backup' exited with status 1. Restarting. Messages:
> 2022-09-26T15:04:31.826Z WARN (REST) (Attempt 2) (GET) Retrying request to endpoint '/pools': which failed due to error: failed to perform request: Get "http://127.0.0.1:11091/pools": dial tcp 127.0.0.1:11091: connect: connection refused
> 2022-09-26T15:04:31.826Z DEBUG (REST) (Attempt 3) (GET) Dispatching request to 'http://127.0.0.1:11091/pools'
> 2022-09-26T15:04:31.826Z ERROR (REST) (Attempt 3) (GET) Failed to perform request to 'http://127.0.0.1:11091/pools': Get "http://127.0.0.1:11091/pools": dial tcp 127.0.0.1:11091: connect: connection refused
> 2022-09-26T15:04:31.826Z WARN (REST) (Attempt 3) (GET) Request to endpoint '/pools' failed due to error: failed to perform request: Get "http://127.0.0.1:11091/pools": dial tcp 127.0.0.1:11091: connect: connection refused
> 2022-09-26T15:04:31.927Z ERROR (Main) Failed to run node {"err": "could not create REST client: failed to get cluster information: failed to get cluster metadata: failed to execute request: failed to execute request: exhausted retry count after 3 retries, last error: failed to perform request: Get \"http://127.0.0.1:11091/pools\": dial tcp 127.0.0.1:11091: connect: connection refused"

This log entry then repeats every minute, and rebalancing the cluster fails at “backup” with the completion message:

"completionMessage": "Rebalance exited with reason {service_rebalance_failed,backup,\n {agent_died,<17875.2285.0>,\n {linked_process_died,<17875.2286.0>,\n {'ns_1@172.24.0.4',\n {no_connection,\"backup-service_api\"}}}}}."

I read the log entry as the recovering node trying to obtain cluster info by accessing its own external interface, which looks like a bug to me.

The workaround so far is to delete the external hostname and port assignments on the recovered node, rebalance the cluster, and then set the external hostname and ports again on the recovered node.
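The order of those three steps can be sketched as a list of REST calls. This is only an illustration under assumptions: it uses DELETE and PUT on the alternate-address endpoint and POST on /controller/rebalance as I understand them from the Couchbase REST documentation, the helper name `workaround_steps` is hypothetical, the node names other than ns_1@172.24.0.4 are placeholders, and any step needed to add the failed-over node back before rebalancing is omitted. Nothing is sent; the function only returns what would be sent.

```python
from urllib.parse import urlencode

ALT_ADDR_PATH = "/node/controller/setupAlternateAddresses/external"


def workaround_steps(otp_nodes, hostname, ports):
    """Return (method, path, body) tuples in the order described above.

    Steps 1 and 3 are meant for the recovered node's management port;
    step 2 can go to any node in the cluster.
    """
    return [
        # 1. Remove the external hostname and ports from the recovered node.
        ("DELETE", ALT_ADDR_PATH, ""),
        # 2. Rebalance with every node kept in the cluster.
        (
            "POST",
            "/controller/rebalance",
            urlencode({"knownNodes": ",".join(otp_nodes), "ejectedNodes": ""}),
        ),
        # 3. Restore the external hostname and ports on the recovered node.
        ("PUT", ALT_ADDR_PATH, urlencode({"hostname": hostname, **ports})),
    ]


# Placeholder node names except ns_1@172.24.0.4, which appears in the rebalance error.
steps = workaround_steps(
    ["ns_1@172.24.0.2", "ns_1@172.24.0.3", "ns_1@172.24.0.4"],
    "127.0.0.1",
    {"mgmt": 11091, "kv": 12210},
)
```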

mreiche · October 4, 2022, 8:11pm

What version are you running? This sounds like https://issues.couchbase.com/browse/MB-41363

Mike

tallinn1960 · October 5, 2022, 5:20pm

Couchbase is Enterprise Edition 7.1.1 build 3175.

mreiche · October 5, 2022, 5:44pm

MB-41363 should be fixed in 7.1.1.
Can you please create an account and a ticket in issues.couchbase.com so someone from the backup team can look at it? Use the project “Couchbase Server (MB)”

tallinn1960 · October 6, 2022, 1:34pm

I was reluctant to open an issue on a JIRA board when I am not on the team. Will do.
