Issue with file permissions setting up Couchbase Backup using Operator

I am trying to set up Couchbase cluster backup using the Operator on Google Cloud.

It seems that jobs created by the operator have issues with file permissions:

Found 2 pods, using pod/ds-couchbase-backup-full-27681645-shmjf
Traceback (most recent call last):
  File "/usr/local/bin/backup.py", line 1213, in <module>
    Backup(context).run()
  File "/usr/local/bin/backup.py", line 378, in run
    self._setup_logging()
  File "/usr/local/bin/backup.py", line 1123, in _setup_logging
    os.makedirs(self.context.log_path, exist_ok=True)
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/data/scriptlogs'

I found in the documentation (CouchbaseCluster Resource | Couchbase Docs) that it may be necessary to set a security context.

But for some reason the image the operator uses for backup (couchbase/operator-backup:1.3.0) has a different ID for the couchbase user than the image used to run cluster nodes (couchbase/server:7.1.1): 8453 instead of 1000.

Can you suggest how to fix this issue?

It seems that the issue with disk permissions was solved in image couchbase/operator-backup:1.3.1.

But there is another one: the backup pods have the TLS secret mounted, but no CA certificate (which, according to the Configure TLS | Couchbase Docs tutorial, lives in another k8s secret), and therefore cbbackupmgr returns this error:

2022-08-19T10:21:39.425+00:00 (Cmd) Error backing up cluster: open /var/run/secrets/couchbase.com/tls-mount/ca.crt: no such file or directory

Any ideas how to solve this?

UPD: the newer image wasn’t the solution. While looking for backup logs and creating a job manually as described in Configure Automated Backup and Restore | Couchbase Docs, I managed to run the backup by adding a security context:

...
      securityContext:
        fsGroup: 8453

After this, the backup job itself also runs successfully even though it doesn’t have a security context configured.
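
To see why an fsGroup can help here, the classic Unix permission check can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it ignores ACLs, capabilities and root's bypass, the function name is hypothetical, and the ownership and mode values in the examples are illustrative guesses rather than values read from a real volume. Only the IDs 1000 and 8453 come from the images discussed above.

```python
import stat


def can_write_dir(owner_uid, owner_gid, mode, uid, gids):
    """Simplified POSIX check: may a process (uid, gids) create entries in a directory?

    Creating a subdirectory needs write + search (execute) permission on the parent.
    Ignores ACLs, capabilities and the root bypass.
    """
    if uid == owner_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (mode & need) == need


# Assumed case 1: volume owned by 1000:1000 with mode 0755, backup process runs as 8453.
print(can_write_dir(1000, 1000, 0o755, uid=8453, gids={8453}))

# Assumed case 2: volume group-owned by 8453 and group-writable, as an fsGroup would arrange.
print(can_write_dir(1000, 8453, 0o2775, uid=8453, gids={8453}))
```

In the first case the process only gets the "other" bits and cannot create /data/scriptlogs; in the second it matches the group and can.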


In case anyone has the same issue - manually adding ca.crt to the couchbase-server-tls secret helped to solve this problem.
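
For anyone wondering what "manually adding ca.crt" might look like: values under a Secret's `data` field are base64-encoded, so one option is to build a small JSON merge patch. The sketch below is only an illustration; the function name and the placeholder certificate are hypothetical, while the `ca.crt` key and the couchbase-server-tls secret name are taken from this thread.

```python
import base64
import json


def build_ca_patch(ca_pem: bytes) -> str:
    """Return a JSON merge patch that adds a ca.crt key to a Secret's data."""
    encoded = base64.b64encode(ca_pem).decode("ascii")
    return json.dumps({"data": {"ca.crt": encoded}})


# Placeholder bytes only; in practice you would read your real CA certificate file.
example_patch = build_ca_patch(
    b"-----BEGIN CERTIFICATE-----\n(placeholder)\n-----END CERTIFICATE-----\n"
)
print(example_patch)
```

The resulting JSON could then be applied with something like `kubectl patch secret couchbase-server-tls --type merge -p "$(cat patch.json)"`, but check this against your own cluster before relying on it.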


Thanks for your investigation and for sharing your knowledge.
I’m running into the same problem.
Operator-backup 1.3.1 has couchbase:x:8453:8453::/home/couchbase:/bin/false
while 1.1.0 had x:1000:1000

Server in 7.1.3 has couchbase:x:1000:1000::/home/couchbase:/bin/sh
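
A quick way to compare the two images is to parse the passwd entries quoted above. The helper name below is hypothetical; the two input lines are copied from this post.

```python
def parse_passwd_line(line):
    """Split one /etc/passwd entry into (name, uid, gid)."""
    name, _password, uid, gid, _gecos, _home, _shell = line.strip().split(":")
    return name, int(uid), int(gid)


backup_image = "couchbase:x:8453:8453::/home/couchbase:/bin/false"  # operator-backup 1.3.1
server_image = "couchbase:x:1000:1000::/home/couchbase:/bin/sh"     # server 7.1.3

_, backup_uid, backup_gid = parse_passwd_line(backup_image)
_, server_uid, server_gid = parse_passwd_line(server_image)

# Report the mismatch that leads to the permission problem on a shared volume.
if (backup_uid, backup_gid) != (server_uid, server_gid):
    print(f"uid/gid mismatch: backup {backup_uid}:{backup_gid}, "
          f"server {server_uid}:{server_gid}")
```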

How did you change the securityContext group at the cluster level?
For me, adding it inside the spec makes no difference, and the server pod returns this message:
“groups: cannot find name for group ID 8453”

Can you clarify a bit? Thank you.

Update
I managed to run the backup with runAsUser: 0.
/data is owned by root in the Docker image, but I can’t find scriptlogs there.
Update 2
I changed the ownership of /data/backups and /data/scriptlogs to couchbase and everything started working. No changes to the Kubernetes definition. It’s all about Linux permissions on the folders not being set properly by the new backup.py script in versions >= 1.2.0.
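
The ownership change described in Update 2 could be sketched roughly as follows. This is an assumption-laden illustration, not what the backup image does: the function name is hypothetical, and the target ID 8453 is simply the couchbase user from the operator-backup image mentioned earlier. Changing ownership to a different user normally requires root.

```python
import os


def prepare_backup_dirs(mount, uid, gid, names=("backups", "scriptlogs")):
    """Create the given subdirectories under mount and hand them to uid:gid.

    os.chown to a different user normally requires root privileges.
    """
    prepared = []
    for name in names:
        path = os.path.join(mount, name)
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        os.chown(path, uid, gid)
        prepared.append(path)
    return prepared


# Hypothetical usage, run as root against the mounted backup volume:
# prepare_backup_dirs("/data", 8453, 8453)
```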

I’m digging into the backup.py scripts from 1.1.0 and 1.3.1, extracted from the Docker images.
Because I’m a new user I can’t upload any files.

I found that the 1.1.0 script has a mk_dir step for /data/scriptlogs and /data/backups in case they don’t exist, while 1.3.1 does not handle this case, so I believe this is why it fails.

Extract from the 1.1.0 script, found in /opt/couchbase/bin/

It has the mount locations under /data in both versions, of course (lines 41-44):

MOUNT_LOCATION = os.path.join("/data")

BACKUPS_LOCATION = os.path.join(MOUNT_LOCATION, "backups")

LOGS_LOCATION = os.path.join(MOUNT_LOCATION, "scriptlogs")

STAGING_LOCATION = os.path.join(MOUNT_LOCATION, "staging")

And the function that creates the directory (lines 305-308):

def create_local_archive(context):
    """
    Creates a local archive if required, and initializes a repository
    if one is required.
    """

    if context.args.mode == MODE_RESTORE:
        return

    if context.args.s3_bucket:
        if context.args.config:
            config_repo(context)
        return

    archive_created = False
    if not os.access(BACKUPS_LOCATION, os.F_OK):
        archive_created = mk_dir(BACKUPS_LOCATION)

    # remove any lock leftover by a dangling cbbackupmgr process
    logging.debug("Removing stale lock file")
    if os.path.exists(os.path.join(BACKUPS_LOCATION, "lock.lk")):
        os.remove(os.path.join(BACKUPS_LOCATION, "lock.lk"))

    # if archive directory was created e.g. an incremental was scheduled
    # first, or we're forcing a new one, configure it.
    if archive_created or context.args.config:
        logging.info("Performing config as backup archive was just created")
        config_repo(context)

Extract from the 1.3.1 script, found in /usr/local/bin/ (lines 41-44):

MOUNT_LOCATION = os.path.join("/data")

BACKUPS_LOCATION = os.path.join(MOUNT_LOCATION, "backups")

LOGS_LOCATION = os.path.join(MOUNT_LOCATION, "scriptlogs")

STAGING_LOCATION = os.path.join(MOUNT_LOCATION, "staging")

And the initialization of the backups and logs locations (lines 112-125):

    def __init__(self, **kwargs):
        """
        Initialialize defaults that cannot go wrong.
        Don't put any calls in here, they cannot be mocked during initialization.
        """
        self.log_path = LOGS_LOCATION
        if 'log_path' in kwargs:
            self.log_path = kwargs['log_path']

        self.archive = BACKUPS_LOCATION
        if 'archive' in kwargs:
            self.archive = kwargs['archive']

        self.timestamp = datetime.now()
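
Note that the `__init__` above only records the paths. Judging by the traceback earlier in this thread, the directory is created later in `_setup_logging` via `os.makedirs(..., exist_ok=True)`, and that call is what fails. A hypothetical wrapper like the one below (not part of backup.py, the name is made up) would make such a failure easier to diagnose by reporting who is running and who owns the parent directory.

```python
import os


def make_log_dir(path):
    """Create path like os.makedirs(path, exist_ok=True), but with a clearer error.

    Assumes the parent directory (e.g. the /data mount) already exists.
    """
    try:
        os.makedirs(path, exist_ok=True)
    except PermissionError as exc:
        parent = os.path.dirname(path.rstrip("/")) or "/"
        st = os.stat(parent)
        raise PermissionError(
            f"cannot create {path}: running as {os.getuid()}:{os.getgid()}, "
            f"but {parent} is owned by {st.st_uid}:{st.st_gid} "
            f"with mode {oct(st.st_mode & 0o7777)}"
        ) from exc
    return path
```

Seeing the running uid next to the owner of /data in the error message points directly at the 8453 versus 1000 mismatch discussed in this thread.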

Here it is almost May 2023 and this issue still exists?

Traceback (most recent call last):
  File "/usr/local/bin/backup.py", line 1243, in <module>
    Backup(context).run()
  File "/usr/local/bin/backup.py", line 379, in run
    self._setup_logging()
  File "/usr/local/bin/backup.py", line 1151, in _setup_logging
    os.makedirs(self.context.log_path, exist_ok=True)
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/data/scriptlogs'

I’m using Couchbase operator 2.4.0 and image: couchbase/operator-backup:1.3.2

This seems like an issue that shouldn’t still exist, but it sure does. I followed the directions at https://docs.couchbase.com/operator/current/howto-backup.html

I poked around a bit more and also re-read this “issue” here as well.

issue related to CouchbaseBackups (see backup-restore/backup.yaml) Issue with file permissions setting up Couchbase Backup using Operator
https://docs.couchbase.com/operator/current/howto-backup.html#overview (See important note there !!! HINT: this should probably be setup in ALL CouchbaseCluster Examples !!!)
Persistent Volumes | Couchbase Docs

In case you’re newish to all of this, what they’re all talking about is the fsGroup you set when you define the ‘CouchbaseCluster’ YAML for Kubernetes (couchbasecluster.spec.securityContext.fsGroup),
like this (more or less…):

apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-test
  namespace: couchbase  
spec:
  backup:
    managed: true 
    image: couchbase/operator-backup:1.3.2 
    serviceAccountName: couchbase-backup
  securityContext:
    # issue related to CouchbaseBackups (see backup-restore/backup.yaml) https://www.couchbase.com/forums/t/issue-with-file-permissions-setting-up-couchbase-backup-using-operator/34290
    # https://docs.couchbase.com/operator/current/howto-backup.html#overview (See important note there)
    # https://docs.couchbase.com/operator/current/concept-persistent-volumes.html#using-storage-classes
    fsGroup: 1000 # 8453
  # ... (snip - fill in the rest of your cluster spec below) ...

This seems to get past the previous error messages. I’m posting it just in case someone lands here and is a little lost as to what’s being talked about.

Additionally, the Docker image DOES use this on line 21 (at least for v1.3.2):

USER 8453

https://hub.docker.com/layers/couchbase/operator-backup/1.3.2/images/sha256-6b9e09159210f05036b741627de5fac5f91a3c2b010b9812afb08653b1bfcdfd?context=explore