In an internal discussion around some high write volumes we have coming down the pipe, an interesting idea came up. As we currently understand the replication process, a document is sent from the client SDK to whatever node it hashes to in the cluster map. The document sits in the write-queue of that node until it is replicated to the other node(s) and flushed to disk.
In a way, this creates a single point of failure for a short period of time while the document sits in the write-queue of the primary node. Has the idea ever been floated to have the client SDK be responsible for writing the replica data as well? That would give you the performance of not waiting for a persistTo or replicateTo call, with the enhanced durability of knowing your primary & replica data are written independently.
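For context, the key-to-node mapping mentioned above can be sketched roughly as follows. This is a simplified stand-in, not Couchbase's exact algorithm: in reality each key hashes to one of 1024 vBuckets, and the cluster map assigns each vBucket to an owning node.

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase's default vBucket count

def vbucket_for_key(key: bytes) -> int:
    # Simplified stand-in for the real hash: CRC32 of the key,
    # folded onto the vBucket space.
    return zlib.crc32(key) % NUM_VBUCKETS

def node_for_key(key: bytes, vbucket_map: list) -> str:
    # vbucket_map[i] names the node that currently owns vBucket i;
    # the SDK consults this map to pick the target node for a write.
    return vbucket_map[vbucket_for_key(key)]
```

The point is that every key deterministically maps to exactly one owning node, which is why the write-queue on that node is briefly a single point of failure.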
So basically this would break the C (consistency) in the CAP pair that Couchbase chooses (CP).
The issue is that you’re essentially trusting the client to write the same thing to both the active and the replica(s), and, more to the point, to guarantee that if a failure occurs the replica operation completes if-and-only-if the active write has already completed.
There are systems which do this - what you’re talking about is essentially quorum writes - and they do have greater availability, at the cost of some level of consistency (or you have to wait for all updates to complete).
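As a sketch of what a quorum write looks like (hypothetical names, not Couchbase's protocol): the client sends the write to all N replicas and acknowledges success once W of them have accepted it. The remaining replicas catch up asynchronously, which is exactly where the consistency gap comes from.

```python
def quorum_write(replicas, key, value, w):
    """Write `value` to every replica; succeed once `w` acks arrive.

    `replicas` is a list of dicts standing in for replica nodes;
    a real system would issue network RPCs and handle timeouts.
    """
    acks = 0
    for replica in replicas:
        try:
            replica[key] = value  # stand-in for a network write
            acks += 1
        except Exception:
            continue  # a failed replica doesn't abort the whole write
    # Quorum semantics: the write "succeeds" with w acks even if
    # some replicas missed it - readers may see stale data there.
    return acks >= w
```

With W < N you get availability under partial failure, but a read served by a replica that missed the write returns stale data unless reads also go to a quorum.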
The model chosen by Couchbase isn’t really trying to prevent a single point of failure at the document level - at least not a temporary failure - as we explicitly designate a single node as the “owner” of every key. However, in the event of that node failing, it can be failed over to designate a new owner for that key (i.e. we promote a replica).
Thanks drigby, that is helpful. One question my developer had: he said the documentation wasn’t super clear on the difference between the persistTo and replicateTo options.
They both seem to involve waiting until a write is complete on multiple nodes, but he wasn’t clear on whether they were both waiting until the document was flushed to disk or not. The question being: does one of the options only wait until the document is copied to the write-queue of the specified nodes (faster, in-memory redundancy), and the other waits until the document is actually flushed to disk on the specified nodes (slower, but safest because it’s persisted to disk)?
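That is essentially the distinction the docs draw: replicateTo polls until the mutation has reached the replica node(s) in memory (their write-queue / managed cache), while persistTo polls until the mutation has actually been written to disk on the node(s). A toy model of the two acknowledgment levels (hypothetical names, not the SDK's internals):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    memory: dict = field(default_factory=dict)  # write-queue / managed cache
    disk: dict = field(default_factory=dict)    # persisted storage

    def accept(self, key, value):
        self.memory[key] = value  # fast: mutation lands in memory only

    def flush(self):
        self.disk.update(self.memory)  # slow: flusher writes to disk

def replicate_to(nodes, key, n):
    # replicateTo-style check: mutation in memory on at least n nodes
    return sum(key in node.memory for node in nodes) >= n

def persist_to(nodes, key, n):
    # persistTo-style check: mutation on disk on at least n nodes
    return sum(key in node.disk for node in nodes) >= n
```

So replicateTo gives you faster, in-memory redundancy (safe against a single node crash, but not against simultaneous power loss), while persistTo is slower but survives the nodes restarting, since the data is on disk.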