The cbbackup tool is a flexible backup command that enables you to back up both local data and remote nodes and clusters, in different combinations of your data:
Single bucket on a single node
All the buckets on a single node
Single bucket from an entire cluster
All the buckets from an entire cluster
Backups can be performed either locally, by copying the files directly on a single node, or remotely by connecting to the cluster and then streaming the data from the cluster to your backup location. Backups can be performed either on a live running node or cluster, or on an offline node.
The cbbackup command stores data in a format that allows for easy restoration. When restoring, using cbrestore, you can restore back to a cluster of any configuration. The source and destination clusters do not need to match if you used cbbackup to store the information.
The cbbackup command copies the data from the source definition to a destination backup directory. The backup file format is unique to Couchbase and enables you to restore all or part of the backed-up data when restoring the information to a cluster. Data can be selected by key (using a regular expression) or by vBucket ID, which selects all the data stored in that vBucket. You can also copy the source data from one bucket into a bucket of a different name on the cluster to which you are restoring the data.
The cbbackup command takes the following arguments:
cbbackup [options] [source] [backup_dir]
Where the arguments are as described below:
[options]
One or more options for the backup process. These are used to configure username and password information for connecting to the cluster, backup type selection, and bucket selection. For a full list of the supported arguments, see Section 7.8, “cbbackup Tool”.
The primary options select what will be backed up by cbbackup, including:
--single-node
Only back up the single node identified by the source specification.
--bucket-source or -b
Back up only the specified bucket.
[source]
The source for the data, either a local data directory reference, or a remote node/cluster specification:
Local Directory Reference
A local directory specification is defined as a URL using the couchstore-files protocol.
For example:
couchstore-files:///opt/couchbase/var/lib/couchbase/data/default
Using this method, you are backing up only the specified bucket data on a single node. To back up an entire bucket's data across a cluster, or all the data on a single node, you must use the cluster node specification. This method does not back up the design documents defined within the bucket.
cluster node
A node or node within a cluster, specified as a URL to the node or cluster service. For example:
http://HOST:8091
Or, for distinction, you can use the couchbase protocol prefix:
couchbase://HOST:8091
The administrator name and password can also be combined with both forms of the URL for authentication. If you have named data buckets other than the default bucket that you want to back up, you will need to specify an administrative name and password for the bucket:
couchbase://Administrator:password@HOST:8091
The combination of additional options specifies whether the supplied URL refers to the entire cluster, a single node, or a single bucket (node or cluster). The node and cluster can be remote or local.
This method also backs up the design documents used to define views and indexes.
[backup_dir]
The directory where the backup data files will be stored on the node on which the cbbackup is executed. This must be an absolute, explicit, directory, as the files will be stored directly within the specified directory; no additional directory structure is created to differentiate between the different components of the data backup.
The directory that you specify for the backup should either not exist, or exist and be empty with no other files. If the directory does not exist, it will be created, but only if the parent directory already exists.
The backup directory is always created on the local node, even if you are backing up a remote node or cluster. The backup files are stored locally in the backup directory specified.
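The directory rules above can be checked before a backup is started. The following is a minimal sketch, not part of cbbackup itself: check_backup_dir is a hypothetical helper name, and the checks simply mirror the requirements described in the text (absolute path, missing or empty directory, existing parent).

```shell
#!/bin/sh
# check_backup_dir DIR: hypothetical helper, not part of cbbackup.
# Returns 0 when DIR looks like a usable backup target, 1 otherwise.
check_backup_dir() {
    dir=$1
    case $dir in
        /*) ;;  # absolute path, as the text requires
        *)  echo "not an absolute path: $dir" >&2; return 1 ;;
    esac
    if [ -e "$dir" ]; then
        # An existing target must be a directory with no entries at all.
        [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
        if [ -n "$(ls -A "$dir")" ]; then
            echo "directory is not empty: $dir" >&2
            return 1
        fi
    else
        # A missing target is acceptable only if its parent already exists.
        parent=$(dirname "$dir")
        [ -d "$parent" ] || { echo "parent does not exist: $parent" >&2; return 1; }
    fi
    return 0
}
```

A wrapper script could then guard the backup with something like check_backup_dir /backups/backup-20120501 && cbbackup ..., so that an unsuitable target is reported before any data is transferred.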
Backups can take place on a live, running cluster or node.
Using this basic structure, you can back up a number of different combinations of data from your source cluster. Examples of the different combinations are provided below:
Backup all nodes and all buckets
To back up an entire cluster, consisting of all the buckets and all the node data:
shell> cbbackup http://HOST:8091 /backups/backup-20120501 \
    -u Administrator -p password
  [####################] 100.0% (231726/231718 msgs)
bucket: default, msgs transferred...
       :                total |       last |    per sec
 batch :                 5298 |       5298 |      617.1
 byte  :             10247683 |   10247683 |  1193705.5
 msg   :               231726 |     231726 |    26992.7
done
  [####################] 100.0% (11458/11458 msgs)
bucket: loggin, msgs transferred...
       :                total |       last |    per sec
 batch :                 5943 |       5943 |    15731.0
 byte  :             11474121 |   11474121 | 30371673.5
 msg   :                   84 |         84 |   643701.2
done
When backing up multiple buckets, a progress report and a summary of the information transferred are listed for each bucket backed up. The msgs count shows the number of documents backed up. The byte count shows the overall size of the document data.
The source specification in this case is the URL of one of the nodes in the cluster. The backup process will stream data directly from each node in order to create the backup content. The initial node is only used to obtain the cluster topology so that the data can be backed up.
A backup created in this way enables you to choose during restoration how you want to restore the information. You can choose to restore the entire dataset, or a single bucket, or a filtered selection of that information onto a cluster of any size or configuration.
Backup all nodes, single bucket
To back up all the data for a single bucket, containing all of the information from the entire cluster:
shell> cbbackup http://HOST:8091 /backups/backup-20120501 \
    -u Administrator -p password \
    -b default
  [####################] 100.0% (231726/231718 msgs)
bucket: default, msgs transferred...
       :                total |       last |    per sec
 batch :                 5294 |       5294 |      617.0
 byte  :             10247683 |   10247683 |  1194346.7
 msg   :               231726 |     231726 |    27007.2
done
The -b option specifies the name of the bucket that you want to back up. If the bucket is a named bucket, you will need to provide the administrative name and password for that bucket.
To back up an entire cluster, you will need to run the same operation on each bucket within the cluster.
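Running the same operation for each bucket can be scripted. The sketch below only prints the commands it would run (a dry run) so that they can be reviewed first. The bucket names, the backup root, and the function name print_bucket_backups are assumptions made for illustration; the cbbackup options are the ones shown in the example above.

```shell
#!/bin/sh
# Hypothetical values: replace with your own host, backup root, and buckets.
SOURCE="http://HOST:8091"
BACKUP_ROOT="/backups/backup-20120501"
BUCKETS="default logging"

# print_bucket_backups: print one cbbackup command per bucket (dry run only).
print_bucket_backups() {
    for bucket in $BUCKETS; do
        # Each bucket gets its own target directory, because the backup
        # directory is expected to be missing or empty. BACKUP_ROOT itself
        # must already exist so that the per-bucket directory can be created.
        echo "cbbackup $SOURCE $BACKUP_ROOT/$bucket -u Administrator -p password -b $bucket"
    done
}

print_bucket_backups
```

Once the printed commands look right, the echo can be removed so that each command is executed instead of printed.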
Backup single node, all buckets
To back up all of the data stored on a single node across all of the different buckets:
shell> cbbackup http://HOST:8091 /backups/backup-20120501 \
    -u Administrator -p password \
    --single-node
Using this method, the source specification must specify the node that you want to back up. To back up an entire cluster using this method, you should back up each node individually.
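Backing up each node individually amounts to repeating the --single-node command once per node. As a rough sketch, the loop below prints one command per node without running anything. The node names HOST1 to HOST3, the backup root, and the function name print_node_backups are placeholders, not values taken from a real cluster.

```shell
#!/bin/sh
# Hypothetical values: replace with the nodes of your own cluster.
NODES="HOST1 HOST2 HOST3"
BACKUP_ROOT="/backups/backup-20120501"

# print_node_backups: print one --single-node cbbackup command per node.
print_node_backups() {
    for node in $NODES; do
        # The source URL names the node being backed up, and each node
        # writes to its own directory under BACKUP_ROOT.
        echo "cbbackup http://$node:8091 $BACKUP_ROOT/$node -u Administrator -p password --single-node"
    done
}

print_node_backups
```

Keeping one directory per node makes it clear which node each set of backup files came from.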
Backup single node, single bucket
To back up the data from a single bucket on a single node:
shell> cbbackup http://HOST:8091 /backups/backup-20120501 \
    -u Administrator -p password \
    --single-node \
    -b default
Using this method, the source specification must be the node that you want to back up.
Backup single node, single bucket; backup files stored on same node
To back up a single node and bucket, with the files stored on the same node as the source data, there are two methods available. One uses a node specification, the other uses a file store specification. Using the node specification:
shell> ssh USER@HOST
remote-shell> sudo su - couchbase
remote-shell> cbbackup http://127.0.0.1:8091 /mnt/backup-20120501 \
    -u Administrator -p password \
    --single-node \
    -b default
This method backs up the cluster data of a single bucket on the local node, storing the backup data in the local filesystem.
Using a file store reference (in place of a node reference) is faster because the data files can be copied directly from the source directory to the backup directory:
shell> ssh USER@HOST
remote-shell> sudo su - couchbase
remote-shell> cbbackup couchstore-files:///opt/couchbase/var/lib/couchbase/data/default /mnt/backup-20120501
To back up the entire cluster using this method, you will need to back up each node, and each bucket, individually.
Choosing the right backup solution will depend on your requirements and your expected method for restoring the data to the cluster.
The cbbackup command includes support for filtering the keys that are backed up into the database files you create. This can be useful if you want to back up only a portion of your dataset, or you want to move part of your dataset to a different bucket.
The filter is specified as a regular expression and is applied on the client side, within the cbbackup tool. For example, to back up information from a bucket where the keys have a prefix of 'object':
shell> cbbackup http://HOST:8091 /backups/backup-20120501 \
    -u Administrator -p password \
    -b default \
    -k '^object.*'
The above will copy only the keys matching the specified prefix into the backup file. When the data is restored, only those keys that were recorded in the backup file will be restored.
The regular expression match is performed client side. This means that the entire bucket contents must be accessed by the cbbackup command and then discarded if the regular expression does not match.
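To see which keys a pattern such as '^object.*' would select, it can be tried against a few sample keys before running a backup. The sketch below uses grep -E as a stand-in for the matching done inside cbbackup; the regular expression engines are not the same, so this is only an approximation that holds for simple anchored patterns like this one. The key names and the function name filter_keys are invented for illustration.

```shell
#!/bin/sh
# filter_keys PATTERN: read keys on stdin, one per line, and print only the
# keys that match PATTERN. grep -E stands in for cbbackup's own matching.
filter_keys() {
    grep -E -- "$1"
}

# Sample keys (hypothetical). Only the keys starting with "object" are kept;
# "my_object" is dropped because the pattern is anchored with ^.
printf '%s\n' object_1 object_2 user_1 my_object | filter_keys '^object.*'
# prints:
#   object_1
#   object_2
```

As in cbbackup, every key is read and then either kept or discarded, which is why a filtered backup still has to access the entire bucket.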
Key-based regular expressions can also be used when restoring data. You can backup an entire bucket and restore selected keys during the restore process using cbrestore. For more information, see Section 5.6.2.2, “Restoring using cbrestore tool”.
You can also back up data either by using cbbackup and specifying the local directory where the data is stored, or by copying the data files directly using cp, tar, or similar.
For example, using cbbackup:
shell> cbbackup \
    couchstore-files:///opt/couchbase/var/lib/couchbase/data/default \
    /mnt/backup-20120501
The same backup operation using cp:
shell> cp -R /opt/couchbase/var/lib/couchbase/data/default \
    /mnt/copy-20120501
The limitation of backing up information in this way is that the data can only be restored to offline nodes in an identical cluster configuration, and where an identical vBucket map is in operation (you should also copy the config.dat configuration file from each node).
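A file-level copy that also keeps config.dat alongside the data might look like the sketch below. The helper name copy_node_files and the layout of the destination directory are assumptions made for this example. The location of config.dat is passed in as an argument because it depends on the installation, and the path shown in the usage comment is an assumption, not something stated in this section.

```shell
#!/bin/sh
# copy_node_files DATA_DIR CONFIG_FILE DEST: hypothetical helper.
# Copies a bucket data directory and the node's config.dat into DEST.
copy_node_files() {
    data_dir=$1
    config_file=$2
    dest=$3
    # DEST is expected to be new, so that the data lands directly in DEST/data.
    mkdir -p "$dest" || return 1
    cp -R "$data_dir" "$dest/data" || return 1
    cp "$config_file" "$dest/config.dat"
}

# Example call (paths are assumptions; adjust them to your installation):
#   copy_node_files /opt/couchbase/var/lib/couchbase/data/default \
#       /opt/couchbase/var/lib/couchbase/config/config.dat \
#       /mnt/copy-20120501
```

The same call would be repeated on each node, since both the data files and config.dat are specific to the node they were copied from.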