Couchbase Operator

Hi,
I am using the Couchbase Operator to create a Couchbase cluster on Kubernetes. If I set the server size to 1 in the default operator.yaml, the cluster runs fine and the bucket is created. But if I increase the server size to anything greater than 1 (let's say 3), the cluster nodes go into an unready state and the bucket is not created.
I am attaching the operator logs as well as the Couchbase cluster logs.
Looking forward to your assistance.
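For reference, the servers section of my cluster spec looks roughly like this (a sketch rather than the exact file; the size is the value I change between 1 and 3, and the service list here is only illustrative):

servers:
  - size: 3
    name: all_services
    services:
      - data
      - index
      - query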

Cluster Logs:

Operator logs:
time=“2019-02-28T07:07:55Z” level=info msg=“couchbase-operator v1.1.0 (release)” module=main
time=“2019-02-28T07:07:55Z” level=info msg=“Obtaining resource lock” module=main
time=“2019-02-28T07:07:55Z” level=info msg=“Starting event recorder” module=main
time=“2019-02-28T07:07:55Z” level=info msg=“Attempting to be elected the couchbase-operator leader” module=main
time=“2019-02-28T07:07:55Z” level=info msg=“I’m the leader, attempt to start the operator” module=main
time=“2019-02-28T07:07:55Z” level=info msg=“Creating the couchbase-operator controller” module=main
time=“2019-02-28T07:07:55Z” level=info msg=“Event(v1.ObjectReference{Kind:“Endpoints”, Namespace:“cb-create”, Name:“couchbase-operator”, UID:“924cf1d9-3b27-11e9-b2ba-0260905e6e12”, APIVersion:“v1”, ResourceVersion:“42394617”, FieldPath:”"}): type: ‘Normal’ reason: ‘LeaderElection’ couchbase-operator-8c554cbc7-r2dc8 became leader" module=event_recorder
time=“2019-02-28T07:07:55Z” level=info msg=“CRD initialized, listening for events…” module=controller
time=“2019-02-28T07:07:55Z” level=info msg="starting couchbaseclusters controller"
time=“2019-02-28T08:10:40Z” level=info msg=“Watching new cluster” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:40Z” level=info msg=“Janitor process starting” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:40Z” level=info msg=“Setting up client for operator communication with the cluster” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:40Z” level=info msg=“Cluster does not exist so the operator is attempting to create it” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:40Z” level=info msg=“Creating headless service for data nodes” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:40Z” level=info msg=“Creating NodePort UI service (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-ui) for data nodes” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:40Z” level=info msg=“Creating a pod (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000) running Couchbase enterprise-5.5.2” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:49Z” level=warning msg=“node init: failed with error [Post http://cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/node/controller/rename: dial tcp: lookup cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc on 172.20.0.10:53: no such host] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:54Z” level=info msg=“Operator added member (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000) to manage” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:54Z” level=info msg=“Initializing the first node in the cluster” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:54Z” level=warning msg=“node init: failed with error [Post http://cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/node/controller/rename: dial tcp: lookup cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc on 172.20.0.10:53: no such host] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:10:59Z” level=warning msg=“node init: failed with error [Post http://cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/node/controller/rename: dial tcp: lookup cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc on 172.20.0.10:53: no such host] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:04Z” level=warning msg=“node init: failed with error [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/node/controller/rename): [error - Could not resolve the hostname: nxdomain]] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:09Z” level=warning msg=“cluster init: failed with error [Post http://cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/node/controller/setupServices: dial tcp: lookup cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc on 172.20.0.10:53: no such host] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:14Z” level=info msg=“start running…” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info msg=“server config all_services: cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info msg=“Cluster status: balanced” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info msg=“Node status:” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info msg=“┌───────────────────────────────────────────────────┬──────────────┬────────────────┐” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info msg=“│ Server │ Class │ Status │” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info msg=“├───────────────────────────────────────────────────┼──────────────┼────────────────┤” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info msg=“│ cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000 │ all_services │ managed+active │” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info msg=“└───────────────────────────────────────────────────┴──────────────┴────────────────┘” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:22Z” level=info cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:26Z” level=info msg=“Creating a pod (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001) running Couchbase enterprise-5.5.2” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:35Z” level=warning msg=“add node: failed with error [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Prepare join failed. Failed to resolve address for “cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc”. The hostname may be incorrect or not resolvable.]] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:40Z” level=warning msg=“add node: failed with error [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Failed to reach erlang port mapper. Failed to resolve address for “cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc”. The hostname may be incorrect or not resolvable.]] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:45Z” level=warning msg=“add node: failed with error [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Prepare join failed. Failed to resolve address for “cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc”. The hostname may be incorrect or not resolvable.]] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:50Z” level=warning msg=“add node: failed with error [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Failed to reach erlang port mapper. Failed to resolve address for “cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc”. The hostname may be incorrect or not resolvable.]] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:11:55Z” level=warning msg=“add node: failed with error [Server Error 500 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): []] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:02Z” level=info msg=“added member (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001)” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:02Z” level=info msg=“Creating a pod (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002) running Couchbase enterprise-5.5.2” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:15Z” level=warning msg=“node init: failed with error [Post http://cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/node/controller/rename: dial tcp: lookup cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc on 172.20.0.10:53: no such host] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:20Z” level=warning msg=“add node: failed with error [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Prepare join failed. Failed to resolve address for “cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc”. The hostname may be incorrect or not resolvable.]], [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Failed to reach otp port 21101 for node [” “,\n <<“Failed to resolve address for \“cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc\”. The hostname may be incorrect or not resolvable.”>>].ns_1@cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc This can be firewall problem.]] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:25Z” level=warning msg=“add node: failed with error [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Failed to reach erlang port mapper. Failed to resolve address for “cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc”. The hostname may be incorrect or not resolvable.]], [Server Error 500 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): []] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:30Z” level=warning msg=“add node: failed with error [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Failed to reach erlang port mapper. Failed to resolve address for “cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc”. The hostname may be incorrect or not resolvable.]], [Server Error 400 (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc:8091/controller/addNode): [error - Failed to reach erlang port mapper. Failed to resolve address for “cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc”. The hostname may be incorrect or not resolvable.]] …retrying” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:37Z” level=info msg=“added member (cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002)” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:38Z” level=info msg=“Rebalance progress: 0.000000” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:42Z” level=info msg=“Rebalance progress: 40.000000” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:46Z” level=info msg=“Rebalance progress: 40.000000” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:50Z” level=info msg=“Rebalance progress: 40.000000” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:12:54Z” level=info msg=“Rebalance progress: 40.000000” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:13:02Z” level=info msg=“reconcile finished” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:16:02Z” level=error msg="failed to reconcile: Unable to create bucket named - bucket-cd7a368e-4dc7-4951-b6a4-7aeee9f58ca7: still failing after 36 retries: " cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:16:10Z” level=info msg="server config all_services: " cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:16:10Z” level=info msg=“Cluster status: balanced” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:16:10Z” level=info msg=“Node status:” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:16:10Z” level=info msg=“├───────────────────────────────────────────────────┼──────────────┼────────────────┤” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:16:10Z” level=info msg=“│ cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000 │ all_services │ managed+warmup │” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:16:10Z” level=info msg=“│ cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0001 │ all_services │ managed+warmup │” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster
time=“2019-02-28T08:16:10Z” level=info msg=“│ cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0002 │ all_services │ managed+warmup │” cluster-name=cluster-000f387c-1637-4620-8a5f-37c1f42c4160 module=cluster

Thanks and Regards,
Harshit

Hi Harshit,

Those logs are littered with

lookup cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc on 172.20.0.10:53: no such host

So it appears your DNS isn’t working (or only works intermittently). The DNS names themselves look valid (correct character set, label limits, etc.).

I’d consult your cloud provider and see if they can debug the issue, as it may be down to networking or to the DNS service itself.
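If you want to sanity-check DNS yourself first, running a throwaway pod in the same namespace and resolving one of the pod hostnames from your logs usually shows the problem quickly (the busybox image and the kube-dns label below are just common defaults, so adjust for your distribution):

$ kubectl run dns-test --namespace cb-create --rm -it --restart=Never --image=busybox:1.28 -- \
    nslookup cluster-000f387c-1637-4620-8a5f-37c1f42c4160-0000.cluster-000f387c-1637-4620-8a5f-37c1f42c4160.cb-create.svc
$ kubectl get pods --namespace kube-system -l k8s-app=kube-dns

If the lookup fails for a pod that is up, or the DNS pods are crash-looping, that points at the cluster DNS service rather than the operator.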

Hi Simon,
Thanks for replying.
I have also been looking into this issue for about 4-5 days but couldn’t figure out the cause.
Please let me know if the team concerned manages to figure out the issue.

Thanks,
Harshit

@simon.murray -
I have ended up with a similar issue where one of the pods in k8s died after running for a couple of weeks… what could be the reason? I am on 6.5, and the issue below shows in the operator logs:

If I want to get rid of this issue, what are the next steps? I tried editing the cluster YAML and adding new nodes, but it takes no action and shows the same errors. I tried removing the node from the Couchbase UI with a failover and rebalance and that worked, but how can I remove the missing node completely?
Also, my understanding was that the Couchbase Operator should take care of adding a new cluster node if another one is terminated and disappears, and I’m not sure why that didn’t happen in this scenario. Isn’t that the purpose of an operator in k8s?

time=“2020-04-23T13:54:58Z” level=warning msg=“Unable to get cluster state, skiping reconcile loop: still failing after 5 retries: [Get https://cb-revstrat-ilcb-0000.cb-revstrat-ilcb.bi-cb.svc:18091/pools/default: uuid check: unexpected status code ‘401 Unauthorized’ from cb-revstrat-ilcb-0000.cb-revstrat-ilcb.bi-cb.svc:18091], [Get https://cb-revstrat-ilcb-0001.cb-revstrat-ilcb.bi-cb.svc:18091/pools/default: dial tcp: lookup cb-revstrat-ilcb-0001.cb-revstrat-ilcb.bi-cb.svc on 10.43.0.10:53: no such host]” cluster-name=cb-revstrat-ilcb module=cluster
time=“2020-04-23T13:54:58Z” level=error msg=“failed to reconcile: still failing after 5 retries: [Get https://cb-revstrat-ilcb-0000.cb-revstrat-ilcb.bi-cb.svc:18091/pools/default: uuid check: unexpected status code ‘401 Unauthorized’ from cb-revstrat-ilcb-0000.cb-revstrat-ilcb.bi-cb.svc:18091], [Get https://cb-revstrat-ilcb-0001.cb-revstrat-ilcb.bi-cb.svc:18091/pools/default: dial tcp: lookup cb-revstrat-ilcb-0001.cb-revstrat-ilcb.bi-cb.svc on 10.43.0.10:53: no such host]” cluster-name=cb-revstrat-ilcb module=cluster

I am not seeing the operator automatically recreate nodes in the cluster, which it is supposed to do:

https://docs.couchbase.com/operator/current/node-recovery.html

thanks

That looks like the password the operator is using (from spec.authSecret) doesn’t match the one used by Couchbase. Have you changed it? Does the user in the secret exist? Does the password in the secret work manually?
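A quick manual check looks something like this (the secret name is whatever spec.authSecret points at, the namespace and pod name are taken from your logs, and this assumes curl is available in the Couchbase container):

$ kubectl get secret <auth-secret> --namespace bi-cb -o jsonpath='{.data.username}' | base64 -d
$ kubectl get secret <auth-secret> --namespace bi-cb -o jsonpath='{.data.password}' | base64 -d
$ kubectl exec --namespace bi-cb cb-revstrat-ilcb-0000 -- curl -s -u <username>:<password> http://localhost:8091/pools/default

If the last command returns 401 then the credentials in the secret no longer match what Couchbase has.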

Yes, that’s correct, I changed the password from the Couchbase UI, only for the Administrator user.
But this node disappearing from k8s happened (if I remember correctly) 7-8 days after changing the Administrator password, and until then it had been working fine.

I have changed the password to: couchbase. However, when I edit the cluster this is what it shows:
kubectl edit cbc cb-revstrat-ilcb

spec:
  adminConsoleServiceType: NodePort
  adminConsoleServices:
  - data
  authSecret: youthful-mule-cb-revstrat-ilcb

I have no idea why that happened; perhaps a session cookie kept the connection working. One to ask the server team.

Anyway, to fix it, do:

$ echo -n "new_password" | base64 
bmV3X3Bhc3N3b3Jk

Then, with kubectl edit -n my-namespace secret youthful-mule-cb-revstrat-ilcb, do the following:

apiVersion: v1
kind: Secret
metadata:
  name: youthful-mule-cb-revstrat-ilcb
data:
  password: bmV3X3Bhc3N3b3Jk  # << replace me

And restart the operator (because the password is cached)…

$ kubectl scale --namespace my-namespace --replicas=0 deployment/couchbase-operator
$ kubectl scale --namespace my-namespace --replicas=1 deployment/couchbase-operator
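Once it’s back up, it’s worth tailing the operator log to confirm the 401 errors have stopped (deployment name as in the default install):

$ kubectl logs --namespace my-namespace deployment/couchbase-operator -f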

thanks… did that, and for me the base64 password was already the same in kubectl edit before I updated it… however, I restarted the operator and still have the issue below:

time=“2020-04-23T16:46:43Z” level=error msg=“failed to update members: still failing after 5 retries: [Get https://cb-revstrat-ilcb-0000.cb-revstrat-ilcb.bi-cb.svc:18091/pools/default: uuid check: unexpected status code ‘401 Unauthorized’ from cb-revstrat-ilcb-0000.cb-revstrat-ilcb.bi-cb.svc:18091]” cluster-name=cb-revstrat-ilcb module=cluster

When I try to edit the cluster I still see the 0001 node (which was evacuated) there. I have set the size to 1 and removed the 0001 node details, but it still doesn’t help and it comes back with the same configuration:

members:
  index: 1
  unready:
  - Name: cb-revstrat-ilcb-0000
  - Name: cb-revstrat-ilcb-0001
nodePorts:
  cb-revstrat-ilcb-0000:
    adminServicePort: 30659
    adminServicePortTLS: 30351
    analyticsServicePort: 30709
    analyticsServicePortTLS: 30955
    dataServicePort: 30853
    dataServicePortTLS: 30434
    eventingServicePort: 31154
    eventingServicePortTLS: 32424
    indexServicePort: 31430
    indexServicePortTLS: 30320
    queryServicePort: 31736
    queryServicePortTLS: 32263
    searchServicePort: 31640
    searchServicePortTLS: 32433
  cb-revstrat-ilcb-0001:
    adminServicePort: 30640
    adminServicePortTLS: 30599
    analyticsServicePort: 31674
    analyticsServicePortTLS: 31063
    dataServicePort: 30297
    dataServicePortTLS: 31892
    eventingServicePort: 30547
    eventingServicePortTLS: 30020
    indexServicePort: 31758
    indexServicePortTLS: 30647
    queryServicePort: 32703
    queryServicePortTLS: 31852
    searchServicePort: 30759
    searchServicePortTLS: 31544
phase: Running
reason: ""
size: 1

I want to get the operator and cluster back healthy before I increase the cluster size and expand it.
thanks

With all the back and forth I somehow managed to get the cluster up and running, and to have the operator take care of adding the additional nodes.

Thanks, I think the operator restart did help. By the way, I changed the password back to what it was before as the default; maybe that was part of the issue as well.

Let me know what the best practice is for changing the Administrator password.
thanks

Don’t do it, in short :smiley: However, because I’m nice, we have password rotation planned for the 2.1 release, so it’ll happen automatically soon! Hopefully in the next few months.

If you can’t wait: change the password in the UI, change the password in the secret, then restart the operator.
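If you’d rather script the secret update than use kubectl edit, a patch along these lines should also work (secret name and namespace as they appear earlier in this thread; the password value is a placeholder), followed by the same operator restart as above:

$ kubectl patch secret youthful-mule-cb-revstrat-ilcb --namespace bi-cb --type merge -p '{"stringData":{"password":"new_password"}}'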

thats good step to know … thank you … but again mystery remains why password change will remove one node out of cluster suddenly …
thanks