Couchbase-operator-1.2 upgrade

Hi,

I’m using Terraform to deploy to k8s. I had it all working with the operator, the cluster, and Sync Gateway.

I see the new operator 1.2 is out, so today I was working on updating to it. I noticed a couple of changes:

  1. There is now an admission controller pod
  2. The operator no longer uses a clusterrole and instead uses a standard role

I have the admission controller working now, but the old operator script used to have a volume/volume mount that mounted a secret with a username/password. The new script removes that:

apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  name: couchbase-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: couchbase-operator
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: couchbase-operator
    spec:
      containers:
      - args:
        - --pod-create-timeout=10m
        - --create-crd=false
        command:
        - couchbase-operator
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: couchbase/operator:1.2.0
        name: couchbase-operator
        ports:
        - containerPort: 8080
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /readyz
            port: http
          initialDelaySeconds: 3
          periodSeconds: 3
        resources: {}
      serviceAccountName: couchbase-operator
status: {}

Is this correct? When I spawn the operator like this, the pod fails to start and the logs indicate:

$ kc logs couchbase-operator-admission-7c49f757d-z8dxm 
I0508 20:57:50.499645       1 admission.go:300] couchbase-operator-admission 1.2.0 (release)
$ kc logs couchbase-operator-676b4d94c4-dh6h9
time="2019-05-08T20:58:31Z" level=info msg="couchbase-operator v1.2.0 (release)" module=main
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

goroutine 1 [running]:
github.com/couchbase/couchbase-operator/pkg/util/k8sutil.MustNewKubeClient(0xc0003862a0, 0x23)
	/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/util/k8sutil/k8sutil.go:69 +0x6a
main.main()
	/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/cmd/operator/main.go:93 +0x5e1

My old script used to mount this secret, but I removed that because it is gone from the new YAML files (note: this is a Terraform HCL script):

resource "kubernetes_deployment" "couchbase-operator" {
  metadata {
    name = "couchbase-operator"

    labels = {
      app = "couchbase-operator"
    }
  }

  spec {
    # we don't want more than one replica
    replicas = 1

    selector {
      match_labels {
        app = "couchbase-operator"
      }
    }

    template {
      metadata {
        labels {
          app = "couchbase-operator"
        }
      }

      spec {
        service_account_name = "${var.cb-operator-service-account-name}"

        container {
          name    = "couchbase-operator"
          image   = "${var.cb-operator-image}"
          command = ["couchbase-operator"]
          args    = "${var.cb-operator-args}"

          env {
            name = "MY_POD_NAMESPACE"

            value_from {
              field_ref {
                field_path = "metadata.namespace"
              }
            }
          }

          env {
            name = "MY_POD_NAME"

            value_from {
              field_ref {
                field_path = "metadata.name"
              }
            }
          }

          port {
            name           = "readiness-port"
            container_port = 8080
          }

          # must explicitly mount with terraform/k8s provider
          # https://github.com/kubernetes/kubernetes/issues/27973#issuecomment-463903176
          volume_mount {
            mount_path = "/var/run/secrets/kubernetes.io/serviceaccount"
            name       = "${kubernetes_service_account.couchbase-operator.default_secret_name}"
            read_only  = true
          }

          readiness_probe {
            http_get {
              path = "/readyz"
              port = "readiness-port"
            }

            initial_delay_seconds = 3
            period_seconds        = 3
            failure_threshold     = 19
          }
        }

        volume {
          name = "${kubernetes_service_account.couchbase-operator.default_secret_name}"

          secret {
            secret_name = "${kubernetes_service_account.couchbase-operator.default_secret_name}"
          }
        }
      }
    }
  }
}

Hi Davis,

The volume you saw with the previous operator was auto-mounted by Kubernetes on creation. Your updated spec looks correct, and the token should be auto-mounted here as well. I would need some more information about the steps you took with kubectl to understand what’s happening, but is it possible that the couchbase-operator service account doesn’t exist, or that it hasn’t been updated with the new roles for the 1.2 operator? I suggest going through the steps to set up RBAC first (https://docs.couchbase.com/operator/1.2/install-kubernetes.html#create-a-service-account), then upgrading your operator deployment.

The missing token should not need to be manually mounted; Kubernetes will automatically mount it for you when your spec declares a service account via serviceAccountName: couchbase-operator.
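
For reference, here is a minimal sketch of what the service account and role binding could look like in Terraform (resource names are illustrative, and this assumes a kubernetes provider version that supports kubernetes_role_binding; the role itself comes from the RBAC docs linked above):

resource "kubernetes_service_account" "couchbase-operator" {
  metadata {
    name = "couchbase-operator"
  }
}

# Bind the couchbase-operator role (created per the RBAC docs) to the service account.
resource "kubernetes_role_binding" "couchbase-operator" {
  metadata {
    name = "couchbase-operator"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = "couchbase-operator"
  }

  subject {
    kind      = "ServiceAccount"
    name      = "couchbase-operator"
    namespace = "default"
  }
}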

Hi Tommie. The service account is defined and exists. I just deleted my whole minikube cluster, rebuilt it, and started from scratch to ensure I had a clean environment; same problem, though. Here’s more information.

So, after executing terraform apply I get the two pods for the admission controller and the operator; the former is happy, the latter is not:

$ kc get pods --watch
NAME                                           READY   STATUS    RESTARTS   AGE
couchbase-operator-676b4d94c4-jf2kc            0/1     Error     0          8s
couchbase-operator-admission-7c49f757d-ds754   1/1     Running   0          9s
couchbase-operator-676b4d94c4-jf2kc            0/1     Error     1          8s
couchbase-operator-676b4d94c4-jf2kc            0/1     CrashLoopBackOff   1          9s

Looking at logs (same problem):

$ kc logs couchbase-operator-676b4d94c4-jf2kc 
time="2019-05-09T13:56:07Z" level=info msg="couchbase-operator v1.2.0 (release)" module=main
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

goroutine 1 [running]:
github.com/couchbase/couchbase-operator/pkg/util/k8sutil.MustNewKubeClient(0xc000354150, 0x23)
	/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/util/k8sutil/k8sutil.go:69 +0x6a
main.main()
	/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/cmd/operator/main.go:93 +0x5e1

Let’s check if the service account is there:

$ kc get serviceaccounts 
NAME                           SECRETS   AGE
couchbase-operator             1         25s
couchbase-operator-admission   1         25s
default                        1         16m

Inspecting the couchbase-operator service account:

$ kc describe serviceaccounts couchbase-operator
Name:                couchbase-operator
Namespace:           default
Labels:              app=couchbase
Annotations:         <none>
Image pull secrets:  <none>
Mountable secrets:   couchbase-operator-token-8hsl2
Tokens:              couchbase-operator-token-8hsl2
Events:              <none>

Let’s look at its token:

$ kc get secret
NAME                                       TYPE                                  DATA   AGE
cb-operator-auth                           Opaque                                2      44s
cb-sync-gateway-auth                       Opaque                                2      44s
couchbase-operator-admission               Opaque                                2      44s
couchbase-operator-admission-token-9vf5q   kubernetes.io/service-account-token   3      44s
couchbase-operator-token-8hsl2             kubernetes.io/service-account-token   3      44s
default-token-j4fpn                        kubernetes.io/service-account-token   3      16m

$ kc describe secret couchbase-operator-token-8hsl2 
Name:         couchbase-operator-token-8hsl2
Namespace:    default
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: couchbase-operator
              kubernetes.io/service-account.uid: be2b61b1-7261-11e9-ba1a-000c292869c0

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1066 bytes
namespace:  7 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJ [TRUNCATED]

The problem is that the mount is missing. Let’s inspect the operator config. First, here’s the deployment config (note that it specifies the service account couchbase-operator):

$ kc describe deploy couchbase-operator
Name:                   couchbase-operator
Namespace:              default
CreationTimestamp:      Thu, 09 May 2019 09:52:55 -0400
Labels:                 app=couchbase-operator
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=couchbase-operator
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=couchbase-operator
  Service Account:  couchbase-operator
  Containers:
   couchbase-operator:
    Image:      couchbase/operator:1.2.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      couchbase-operator
    Args:
      --create-crd=false
    Readiness:  http-get http://:readiness-port/readyz delay=3s timeout=1s period=3s #success=1 #failure=19
    Environment:
      MY_POD_NAMESPACE:   (v1:metadata.namespace)
      MY_POD_NAME:        (v1:metadata.name)
    Mounts:              <none>
  Volumes:               <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   couchbase-operator-676b4d94c4 (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  9m10s  deployment-controller  Scaled up replica set couchbase-operator-676b4d94c4 to 1

Now here is the pod config it generated. Note that there are no mounts, so how is it expected to read that token?

$ kc describe pod couchbase-operator-676b4d94c4-jf2kc
Name:               couchbase-operator-676b4d94c4-jf2kc
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               minikube/172.16.129.128
Start Time:         Thu, 09 May 2019 09:52:55 -0400
Labels:             app=couchbase-operator
                    pod-template-hash=676b4d94c4
Annotations:        <none>
Status:             Running
IP:                 172.17.0.5
Controlled By:      ReplicaSet/couchbase-operator-676b4d94c4
Containers:
  couchbase-operator:
    Container ID:  docker://a27395f10d3aa6cdef5e67aeab212c3318f6abe60b5a040b7d292efde502efa8
    Image:         couchbase/operator:1.2.0
    Image ID:      docker-pullable://couchbase/operator@sha256:8f19438ae209402c07658c0020de005187a8d4d99ac48c7246afd1498762ded9
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      couchbase-operator
    Args:
      --create-crd=false
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Thu, 09 May 2019 09:58:59 -0400
      Finished:     Thu, 09 May 2019 09:58:59 -0400
    Ready:          False
    Restart Count:  6
    Readiness:      http-get http://:readiness-port/readyz delay=3s timeout=1s period=3s #success=1 #failure=19
    Environment:
      MY_POD_NAMESPACE:  default (v1:metadata.namespace)
      MY_POD_NAME:       couchbase-operator-676b4d94c4-jf2kc (v1:metadata.name)
    Mounts:              <none>
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:            <none>
QoS Class:          BestEffort
Node-Selectors:     <none>
Tolerations:        node.kubernetes.io/not-ready:NoExecute for 300s
                    node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m38s                  default-scheduler  Successfully assigned default/couchbase-operator-676b4d94c4-jf2kc to minikube
  Normal   Pulling    6m38s                  kubelet, minikube  Pulling image "couchbase/operator:1.2.0"
  Normal   Pulled     6m31s                  kubelet, minikube  Successfully pulled image "couchbase/operator:1.2.0"
  Normal   Created    4m59s (x5 over 6m31s)  kubelet, minikube  Created container couchbase-operator
  Normal   Started    4m59s (x5 over 6m31s)  kubelet, minikube  Started container couchbase-operator
  Normal   Pulled     4m59s (x4 over 6m31s)  kubelet, minikube  Container image "couchbase/operator:1.2.0" already present on machine
  Warning  BackOff    87s (x25 over 6m30s)   kubelet, minikube  Back-off restarting failed container

It is possible I’m missing the role binding; looking into that now. I do create the role, though:

$ kc get role
NAME                 AGE
couchbase-operator   14m

$ kc describe role
Name:         couchbase-operator
Labels:       app=couchbase
Annotations:  <none>
PolicyRule:
  Resources                        Non-Resource URLs  Resource Names  Verbs
  ---------                        -----------------  --------------  -----
  poddisruptionbudgets.policy      []                 []              [create get delete]
  events                           []                 []              [create patch]
  pods/exec                        []                 []              [create]
  secrets                          []                 []              [get]
  endpoints                        []                 []              [list watch create update get delete]
  persistentvolumeclaims           []                 []              [list watch create update get delete]
  pods                             []                 []              [list watch create update get delete]
  services                         []                 []              [list watch create update get delete]
  couchbaseclusters.couchbase.com  []                 []              [list watch update get]

No, the role binding looks OK to me:

$ kc get rolebindings.rbac.authorization.k8s.io 
NAME                 AGE
couchbase-operator   15m

$ kc describe rolebindings.rbac.authorization.k8s.io 
Name:         couchbase-operator
Labels:       app=couchbase
Annotations:  <none>
Role:
  Kind:  Role
  Name:  couchbase-operator
Subjects:
  Kind            Name                Namespace
  ----            ----                ---------
  ServiceAccount  couchbase-operator  default

$ kubectl create rolebinding couchbase-operator --role couchbase-operator --serviceaccount default:couchbase-operator
Error from server (AlreadyExists): rolebindings.rbac.authorization.k8s.io "couchbase-operator" already exists

Yeah, the role/role binding is not the problem.

Dayum, I’ve been bitten by this twice already. If you use the Terraform kubernetes provider, you must explicitly mount the service account token volume in the spec.
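
For anyone else hitting this: the workaround is the volume/volume_mount pair already shown in my script above. A minimal sketch, assuming a kubernetes_service_account resource named couchbase-operator:

# The provider hardcodes automountServiceAccountToken to false, so mount the
# service account's token secret by hand inside the container block.
volume_mount {
  mount_path = "/var/run/secrets/kubernetes.io/serviceaccount"
  name       = "${kubernetes_service_account.couchbase-operator.default_secret_name}"
  read_only  = true
}

# ...and declare the matching volume at the pod spec level.
volume {
  name = "${kubernetes_service_account.couchbase-operator.default_secret_name}"

  secret {
    secret_name = "${kubernetes_service_account.couchbase-operator.default_secret_name}"
  }
}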

Ah, good find! Good to know that in the Terraform kubernetes provider “AutomountServiceAccountToken is hardcoded to false”. We’ll get this documented, thanks!

@tommie, quick follow-up question: can you tell me why I now seem to get the same error for the admission controller? I can explicitly mount the same volume in that container, but I wasn’t aware that it needed it. It seems to be trying to load the token as well; is that expected?

$ kc logs couchbase-operator-admission-7c49f757d-9v4gw 
I0509 16:21:12.878693       1 admission.go:300] couchbase-operator-admission 1.2.0 (release)
F0509 16:21:13.709976       1 admission.go:66] open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory


$ kc describe deployment couchbase-operator-admission
Name:                   couchbase-operator-admission
Namespace:              default
CreationTimestamp:      Thu, 09 May 2019 12:19:30 -0400
Labels:                 app=couchbase-operator-admission
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=couchbase-operator-admission
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=couchbase-operator-admission
  Service Account:  couchbase-operator-admission
  Containers:
   couchbase-operator-admission:
    Image:      couchbase/admission-controller:1.2.0
    Port:       8443/TCP
    Host Port:  0/TCP
    Command:
      couchbase-operator-admission
    Args:
      --logtostderr
      --stderrthreshold
      0
      --tls-cert-file
      /var/run/secrets/couchbase.com/couchbase-operator-admission/tls-cert-file
      --tls-private-key-file
      /var/run/secrets/couchbase.com/couchbase-operator-admission/tls-private-key-file
    Environment:  <none>
    Mounts:
      /var/run/secrets/couchbase.com/couchbase-operator-admission from couchbase-operator-admission (ro)
  Volumes:
   couchbase-operator-admission:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  couchbase-operator-admission
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      False   MinimumReplicasUnavailable
OldReplicaSets:  <none>
NewReplicaSet:   couchbase-operator-admission-7c49f757d (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  2m25s  deployment-controller  Scaled up replica set couchbase-operator-admission-7c49f757d to 1

Yes, the admission controller works in a similar way, but it uses a different service account because the bindings are different. Are you mounting the secret from the couchbase-operator-admission account, i.e. couchbase-operator-admission-token-9vf5q?

Here’s my full deployment Terraform script for both of them. This is basically your YAML transposed into the Terraform kubernetes provider.

As noted above, I must manually specify the volumes/volume mounts because the provider doesn’t handle it. It just seems bizarre to me to load the same secret in both; maybe I’ve got it wrong, but the configuration below seems to be working.

My only issue now is that something seems to be wrong with the readiness probe for the operator. I’m looking into it:

Events:
  Type     Reason     Age                      From               Message
  ----     ------     ----                     ----               -------
  Normal   Scheduled  9m29s                    default-scheduler  Successfully assigned default/couchbase-operator-5c458bb9cb-c2pg4 to minikube
  Normal   Pulled     9m28s                    kubelet, minikube  Container image "couchbase/operator:1.2.0" already present on machine
  Normal   Created    9m28s                    kubelet, minikube  Created container couchbase-operator
  Normal   Started    9m28s                    kubelet, minikube  Started container couchbase-operator
  Warning  Unhealthy  4m27s (x100 over 9m24s)  kubelet, minikube  Readiness probe errored: strconv.Atoi: parsing "readiness-port": invalid syntax

#
# Deployment of Couchbase Autonomous Admission Controller
#
resource "kubernetes_deployment" "couchbase-operator-admission" {
  metadata {
    name = "couchbase-operator-admission"

    labels = {
      app = "couchbase-operator-admission"
    }
  }

  spec {
    # we don't want more than one replica
    replicas = 1

    selector {
      match_labels {
        app = "couchbase-operator-admission"
      }
    }

    template {
      metadata {
        labels {
          app = "couchbase-operator-admission"
        }
      }

      spec {
        service_account_name = "couchbase-operator-admission"

        container {
          name    = "couchbase-operator-admission"
          image   = "couchbase/admission-controller:1.2.0"
          command = ["couchbase-operator-admission"]

          args = ["--logtostderr",
            "--stderrthreshold",
            "0",
            "--tls-cert-file",
            "/var/run/secrets/couchbase.com/couchbase-operator-admission/tls-cert-file",
            "--tls-private-key-file",
            "/var/run/secrets/couchbase.com/couchbase-operator-admission/tls-private-key-file",
          ]

          port {
            name           = "https"
            container_port = 8443
          }

          # must explicitly mount with terraform/k8s provider
          # https://github.com/kubernetes/kubernetes/issues/27973#issuecomment-463903176
          volume_mount {
            mount_path = "/var/run/secrets/couchbase.com/couchbase-operator-admission"
            name       = "couchbase-operator-admission"
            read_only  = true
          }

          # must explicitly mount with terraform/k8s provider
          # https://github.com/kubernetes/kubernetes/issues/27973#issuecomment-463903176
          volume_mount {
            mount_path = "/var/run/secrets/kubernetes.io/serviceaccount"
            name       = "${kubernetes_service_account.couchbase-operator.default_secret_name}"
            read_only  = true
          }
        }

        volume {
          name = "couchbase-operator-admission"

          secret {
            secret_name = "couchbase-operator-admission"
          }
        }

        volume {
          name = "${kubernetes_service_account.couchbase-operator.default_secret_name}"

          secret {
            secret_name = "${kubernetes_service_account.couchbase-operator.default_secret_name}"
          }
        }
      }
    }
  }
}

#
# Deployment of Couchbase Autonomous Operator
#
resource "kubernetes_deployment" "couchbase-operator" {
  metadata {
    name = "couchbase-operator"

    labels = {
      app = "couchbase-operator"
    }
  }

  spec {
    # we don't want more than one replica
    replicas = 1

    selector {
      match_labels {
        app = "couchbase-operator"
      }
    }

    template {
      metadata {
        labels {
          app = "couchbase-operator"
        }
      }

      spec {
        service_account_name = "${kubernetes_service_account.couchbase-operator.metadata.0.name}"

        container {
          name    = "couchbase-operator"
          image   = "${var.cb-operator-image}"
          command = ["couchbase-operator"]
          args    = "${var.cb-operator-args}"

          env {
            name = "MY_POD_NAMESPACE"

            value_from {
              field_ref {
                field_path = "metadata.namespace"
              }
            }
          }

          env {
            name = "MY_POD_NAME"

            value_from {
              field_ref {
                field_path = "metadata.name"
              }
            }
          }

          port {
            name           = "http"
            container_port = 8080
          }

          # must explicitly mount with terraform/k8s provider
          # https://github.com/kubernetes/kubernetes/issues/27973#issuecomment-463903176
          volume_mount {
            mount_path = "/var/run/secrets/kubernetes.io/serviceaccount"
            name       = "${kubernetes_service_account.couchbase-operator.default_secret_name}"
            read_only  = true
          }

          readiness_probe {
            http_get {
              path = "/readyz"
              port = "readiness-port"
            }

            initial_delay_seconds = 3
            period_seconds        = 3
            failure_threshold     = 19
          }
        }

        volume {
          name = "${kubernetes_service_account.couchbase-operator.default_secret_name}"

          secret {
            secret_name = "${kubernetes_service_account.couchbase-operator.default_secret_name}"
          }
        }
      }
    }
  }
}

The readiness error should go away if you change readiness-port to http in the probe.

Also, with the secret on the admission controller, it’s a different token there; you have to specify the service account for the admission controller instead of the operator, i.e.
name = "${kubernetes_service_account.couchbase-admission-controller.default_secret_name}"
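
Putting both fixes together, the relevant pieces of the script above would look something like this (a sketch; couchbase-admission-controller stands in for whatever your admission service account resource is actually named):

# Operator container: the probe must reference the port name declared on the
# container ("http"), not the old "readiness-port" name.
readiness_probe {
  http_get {
    path = "/readyz"
    port = "http"
  }

  initial_delay_seconds = 3
  period_seconds        = 3
  failure_threshold     = 19
}

# Admission controller container: mount the admission service account's token,
# not the operator's. The pod-level volume must use the same name.
volume_mount {
  mount_path = "/var/run/secrets/kubernetes.io/serviceaccount"
  name       = "${kubernetes_service_account.couchbase-admission-controller.default_secret_name}"
  read_only  = true
}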

Good catch, sir. I will update tomorrow, but I believe you are right; yes, I already fixed the readiness probe earlier, and your fix is absolutely correct. Thanks again.