Couchbase OpenShift Operator problem: Couchbase cluster will not come up

  1. We recently applied network policies to our OpenShift project to enable multi-tenancy, per OpenShift's documentation Configuring multitenant network policy - Network policy | Networking | OpenShift Container Platform 4.6.
  2. After doing this, when we created a new Couchbase instance with 4 pods, only one pod was created.
  3. We opened a ticket with Red Hat to diagnose the issue further, as we were not seeing any errors.
  4. While working with Red Hat, we noticed that the Couchbase Operator was installed in the openshift-operator project instead of the inf-auto project where we created the Couchbase cluster instance. I remember selecting inf-auto when I installed it the first time, so this was unexpected.
  5. We removed the Operator and re-installed it in the inf-auto project.
  6. When we tried to create a new Couchbase cluster instance, no pods were created and we saw the following error:

{"level":"info","ts":1619812059.4318697,"logger":"cluster","msg":"Cluster does not exist so the operator is attempting to create it","cluster":"a-couchbase-test/cb-example-test4"}

{"level":"info","ts":1619812059.4931834,"logger":"cluster","msg":"Creating pod","cluster":"a-couchbase-test/cb-example-test4","name":"cb-example-test4-0000","image":""}

{"level":"info","ts":1619812059.515399,"logger":"cluster","msg":"Member creation failed","cluster":"a-couchbase-test/cb-example-test4","name":"cb-example-test4-0000","resource":""}

{"level":"info","ts":1619812059.5357425,"logger":"cluster","msg":"Pod deleted","cluster":"a-couchbase-test/cb-example-test4","name":"cb-example-test4-0000"}

{"level":"info","ts":1619812059.5357752,"logger":"cluster","msg":"Reconciliation failed","cluster":"a-couchbase-test/cb-example-test4","error":"fail to create member's pod (cb-example-test4-0000): pods "cb-example-test4-0000" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.fsGroup: Invalid value: int64{1000}: 1000 is not an allowed group]","stack":"\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/util/k8sutil/k8sutil.go:246\\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/util/k8sutil/pod_util.go:104\*Cluster).createPod\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:489\*Cluster).createMember\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/reconcile.go:299\*Cluster).create\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:289\*Cluster).reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/reconcile.go:117\*Cluster).runReconcile\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:398\*Cluster).Update\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:429\*CouchbaseClusterReconciler).Reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/controller/controller.go:90\*Controller).reconcileHandler\n\t/home/couchbase/go/pkg/mod/\*Controller).processNextWorkItem\n\t/home/couchbase/go/pkg/mod/\*Controller).worker\n\t/home/couchbase/go/pkg/mod/\\n\t/home/couchbase/go/pkg/mod/\\n\t/home/couchbase/go/pkg/mod/\\n\t/home/couchbase/go/pkg/mod/"}

Our OpenShift admin installed the operator for us, since we do not have permission to do so ourselves. The above is their summary; however, we still do not have a working Couchbase cluster. We really do not understand how the network policy change could affect the Couchbase cluster.

That's the hint we need… let me explain. In the distant past, you needed to fill in the fsGroup correctly or persistent volumes wouldn't work. Because most users didn't fill this in, we decided to do it for you with the dynamic admission controller (DAC). On OCP the DAC interrogates the namespace that the cluster lives in and extracts the fsGroup from its annotations, which makes me suspect that the dynamic admission controller isn't working correctly. You can manually set the fsGroup using these instructions: Persistent Volumes | Couchbase Docs
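For anyone hitting the same SCC error, the manual workaround amounts to pinning the fsGroup on the cluster resource yourself. A rough sketch (the GID 1000640000 is a placeholder, not a value from this thread; take a GID from your own namespace's `openshift.io/sa.scc.supplemental-groups` annotation, and see the Couchbase Persistent Volumes docs linked above for the authoritative steps):

```yaml
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example-test4
  namespace: a-couchbase-test
spec:
  # Pod-level security context the Operator applies to each member pod.
  # The fsGroup below is a placeholder: find the allowed range with
  # "oc describe namespace a-couchbase-test" and use a GID from the
  # openshift.io/sa.scc.supplemental-groups annotation.
  securityContext:
    fsGroup: 1000640000
```

Setting this explicitly bypasses the broken DAC defaulting, which is why the pods start once it is in place.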

I manually set the fsGroup, and all the pods came up. Thanks for your help.

I sent the following to our OpenShift admin, questioning why the DAC is not there.
Check the Status of the Operator
You can use the following command to check on the status of the deployments:
$ oc get deployments
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
couchbase-operator             1/1     1            1           8s
couchbase-operator-admission   1/1     1            1           8s
The Operator is ready to deploy CouchbaseCluster resources when both the DAC and Operator deployments are fully ready and available.

root@usapprshilt100:/Automation/projects/openshift #oc project a-couchbase-test
Now using project "a-couchbase-test" on server "".
root@usapprshilt100:/Automation/projects/openshift #
root@usapprshilt100:/Automation/projects/openshift #oc get deployment
couchbase-operator 1/1 1 1 3d3h
root@usapprshilt100:/Automation/projects/openshift #
In the POC, I did not see the Couchbase DAC running; the couchbase-operator-admission deployment is missing.
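Since the DAC can be installed cluster-scoped in any namespace, checking only the current project can miss it. A quick cluster-wide check (assuming you have permission to list deployments across namespaces) would be something like:

```shell
# List every couchbase-related deployment in the cluster; if no
# "couchbase-operator-admission" row appears anywhere, the DAC is not installed.
oc get deployments --all-namespaces | grep couchbase
```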

But our admin mentioned that in our dev environment, when they installed Couchbase for all namespaces, they did not see the DAC running, yet everything worked fine. Now they want to install the Couchbase Operator only for our namespace, and this is where the problem appears: pods are not coming up. So when the operator is installed for all namespaces, you do not need the DAC?

No, the DAC always needs to be installed. We recommend it be run in the default cluster-scoped mode, in which case you only need one installed, in any namespace.

When we install it from the GUI, according to our OpenShift admin, after he clicks install, the DAC is not installed. I could try to install it using the YAML files per the instructions in the Operator documentation; however, I think my permissions as admin of the namespace are not sufficient to finish the installation. It will still need the OpenShift cluster-admin role to install it?

That's correct, you need to install the DAC manually; it is not installed alongside the Operator from the OpenShift UI.
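For reference, a rough sketch of the manual DAC install, assuming you have downloaded the Autonomous Operator package (which ships the `cbopcfg` tool) and hold a role allowed to create the cluster-scoped RBAC it needs; the namespace here is just an example, and the exact commands are in the Couchbase install docs:

```shell
# Generate the admission controller (DAC) manifests from the operator
# package and create them. Any single namespace will do when the DAC
# runs in its default cluster-scoped mode.
bin/cbopcfg generate admission --namespace inf-auto | oc create -f -

# Verify the DAC deployment came up.
oc get deployments -n inf-auto
```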

Hello guys, I think I have the same issue as @lukq.
After I installed the operator, only one instance (pod) of the cluster started running, and it kept restarting. Checking events in descending order, this is what we had:

94s         Warning   MemberCreationFailed   couchbasecluster/couchbase-cluster   New member couchbase-cluster-0000 creation failed
89s         Normal    ServiceCreated         couchbasecluster/couchbase-cluster   Service for admin console `couchbase-cluster-ui` was created

Also, I noticed that there is no Deployment or StatefulSet; it seems the operator manages pods directly.
Maybe it is because we are running a very recent version of the operator:

kind: Subscription
metadata:
  labels: ""
  annotations: "4"
  name: couchbase-enterprise-certified
  namespace: openshift-operators
spec:
  channel: 2.3.2
  installPlanApproval: Automatic
  name: couchbase-enterprise-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
  startingCSV: couchbase-operator.v2.3.2-1

The cluster YAML comes by default with the Operator UI in the OCP console; we just copied it and renamed the resource … and this is what we had (pasted below with its original formatting partially lost):

kind: CouchbaseCluster
  name: couchbase-cluster
  namespace: databases-ntr-dev
    clusterName: couchbase-cluster
    dataServiceMemoryQuota: 256Mi
    indexServiceMemoryQuota: 256Mi
    searchServiceMemoryQuota: 256Mi
    eventingServiceMemoryQuota: 256Mi
    analyticsServiceMemoryQuota: 1Gi
    indexStorageSetting: memory_optimized
    autoFailoverTimeout: 120s
    autoFailoverMaxCount: 3
    autoFailoverOnDataDiskIssues: true
    autoFailoverOnDataDiskIssuesTimePeriod: 120s
    autoFailoverServerGroup: false
  upgradeStrategy: RollingUpgrade
  hibernate: false
  hibernationStrategy: Immediate
  recoveryPolicy: PrioritizeDataIntegrity
    adminSecret: couchbase-cluster-auth
      managed: true
          cluster: couchbase-cluster
    managed: false
        cluster: couchbase-cluster
    image: >-
    managed: false
    serviceAccountName: couchbase-backup
        cluster: couchbase-cluster
      enabled: false
    exposeAdminConsole: true
      - data
      - xdcr
    exposedFeatureServiceType: NodePort
    # adminConsoleServiceType: NodePort
        type: ClusterIP
    managed: true
        cluster: couchbase-cluster
  logRetentionTime: 604800s
  logRetentionCount: 20
  enablePreviewScaling: false
    - size: 3
      name: all_services
        - data
        - index
        - query
        - search
        - eventing
        - analytics

Is it the same root cause? I mean, the missing admission controller (DAC)?