Database Change Protocol (DCP) is the core replication protocol for version 3.0 and is used to connect nodes and clusters across geo distributed data centers. In this post, we’ll dive deep into its workings and how it enables higher availability, performance and scale with Couchbase Server 3.0.
Lets start with Why replication is such a critical part of any database: Here is “why”:
- Protecting Data: Replication is used to protect against failures – by maintaining the local and cross data center replica feeds. How fast a database can replicate the data to replicas decide the data-loss window. The less efficient a database in replication the larger the data loss window!
- Providing Fresher Indexes: Replication is also the means of updating views in Couchbase. The incremental map/reduce index that is used to answer queries needs a fast feed to stay fresh and up to date. Especially queries that need a consistent view of data cannot be answered if the replication feed isn’t blazingly fast and efficient!
So what makes DCP special and different from competing replication protocols? There are 4 key properties:
- Ordering: DCP orders mutations. This is important to be able to record where things were left off.
- Restart-able: DCP optimizes restarts after short or long lasting failures.
- Consistent: DCP is able to efficiently produce a consistent snapshot.
- High Performance: DCP is memory based. It streams changes eagerly as long as clients can keep up.
There are 3 pieces that make up the majority of magic of DCP: vbucket UUID, sequence numbers and failover log. I’ll assume you know a few concerpts like vbuckets already. (you can find more information here about vbuckets). Lets take a look at what they are:
A vbucket UUID and a mutation sequence number uniquely identify each mutation in Couchbase Server. Lets explain these concepts; vbuckets represent shards within Couchbase Server. Each vbucket has a unique UUID assigned to it. Each mutation gets assigned a sequence number. The sequence number is a monotonically increasing number and is scoped to each vbucket. The magic of consistency and granular resume-ability is achieved through the failover log. Each vbucket also maintains a failover log. The failover log records multiple pairs of vbucket UUID and sequence number. First entry marks the beginning of time and each new entry marks a failover. Master and replica vbuckets in the system maintain the same failover log. When a failure happen – lets say the master vbucket fail for example – an replica vbucket gets updated to an master vbucket. The new master vbucket failover log entry records the vbucket UUID and last sequence number of the new master vbucket and replicates that to replica vbuckets to mark the occasion. These basic mechanics enable a whole bunch of magic. In the next section, lets take a look at some of the top operations enhanced in 3.0 with DCP and explain how these mechanics make it happen.
3.0 can get you 100x efficiency in storage of backups. Here is how: With 3.0, full backups can be complemented with incremental backups for greater storage efficiency. Incremental backups simply backup only the data that has changed since the last full, cumulative or incremental backup. One can take a full snapshot and create a chain of incremental or cumulative backups, instead of taking full snapshots every time. Vbucket UUID, sequence number and failover log all provide the metadata necessary to be able to keep things consistent between the chain of backups.
Views and Incremental Map/Reduce Processing
Low latency view queries is key to a snappy application and we see 50x improvement in latencies of view queries with 3.0. Map/Reduce has been around for a while within Hadoop world but Couchbase provides incremental processing of map/reduce to allow pre-calculation of your queries in a view. This is how Couchbase Server gets you low latency queries. The incremental processing engine is fed by a DCP stream. As mutations arrive in memory, they are streamed for processing over to design documents and the views within them. The streaming improves the freshness of the indexes by over 50x compared to previous versions. Many queries require consistency. Couchbase developer has options on consistency of the view – One can query what the index has processed at any point (stale=ok) OR can ask to query the view that contains all mutation up to the point of the query (stale=false). DCP shines especially for the latter type queries as it has the ability to stream changes blazingly fast into the view processor.
Delta Recovery for Faster Rebalance
If you are bringing a node back into the cluster that was failed over due to an issue, you no longer need to rewind back hours or days and resend changes to the node. With the vbucket UUID, sequence number and failover log, DCP has the ability to restart with great granularity from where it left off. If your incoming node has just been absent from the cluster for a few mins, it simply can recover by simply receiving the missing mutations for the recent few mins with delta node recovery option. We have seen 100s improvement in rebalance times with delta node recovery, especially when the data size is massive per node.
Durability Guarantee with “ReplicateTo”
Couchbase Server has built in consistency – you can “read your own write” without issues even when you write mutations to memory. However, many developers need durability guarantees for their applications so they can survive node failures without losing data. Some depend on the classic disk persistence but for better availability and recoverability, many developers in Couchbase choose replication as a guarantee for durability of data. That is done through the ReplicateTo method in the native couchbase SDKs. ReplicateTo ensure the acknowledgement of your mutation to return AFTER it has been replicated to 1 or more other replicas. With DCP’s high-performance streaming replication, you can replicate mutation to a replica up to 180x faster. I have seen latencies within 1-2 ms with “ReplicateTo” in my tests on the cloud. However, these absolute numbers depend on the quality of the infrastructure and HW I am running on so take them with a grain of salt.
Cross Data Center Replication
XDCR – cross data center replication – depends on DCP as well. Streaming replication directly from memory to another data center means we can protect your data better! Imagine a bi directional replication between 2 clusters. If cluster #1 fails, the amount of mutations lost will be determined by the largest replication latency. Fast replication means under a regional disaster such as this one, the data loss exposure is minimized! We have seen 4x improvement in replication latency with XDCR in Couchbase Server 3.0.
XDCR is also prone to network errors and hiccups and it is important to be able to recover from these issues fast and get back to health! DCP metadata allows us to quickly pick up from where it left off under communication failures.
These are just a few examples of how DCP amplifies the capabilities within Couchbase Server. There is a ton more… In fact there are 200 more features in Couchbase Server 3.0 and most are there because of the strong foundation DCP provides.
If you like to dive deeper into the failure handling and mechanics behind DCPs fast presumably, you can view the detailed talk covering DCP at Couchbase Connector see implementation details here at the github site.
nice article. Quick q thought: where is the failover log stored?
Hi failover log is stored per vbucket in the vbucket.