Append seems to replicate the whole doc across nodes

zfedor · January 2, 2016, 3:07am

I am doing lots of small appends to Couchbase documents (binary) until they don’t reach the size of about 50Kb and then I start a new document. Reaching the 50Kb takes about 5000 appends to the given document.
Looking at the usage stats, as the document is growing and getting closer to 50Kb the replication between the Couchbase nodes start taking up more and more bandwidth and when the process rolls over the document and starts writing into a new document and appending to that then the bandwidth usage drops and starts creeping up again as the document’s size grows.

This tells me that at each append the whole document is being replicated across the nodes (that seems the be the only explanation why the same append takes up more and more bandwidth as the document size grows).
This sounds like a bad design - making appends operations very non-optimal (having the whole document replicated at each append).

Would be my understanding incorrect? Any way to get around this, to make this more efficient?

Thanks

cihangirb · January 2, 2016, 5:07am

you are right. with existing versions of couchbase server, we do replicate the full document. There is obvious benefit in network efficiency to replicate less data. However it isn’t free to be smart about replicated differences between full vs delta changes between all sorts of updates. My question is, would be ok trading network efficiency for higher latency? or higher cpu utilization?

zfedor · January 2, 2016, 2:45pm

Thanks for confirming, that is what I thought.

Yes, I understand that there could be some tradeoffs, although in an append-only document type that might not necessarily be the case. For example you can take a look at how Apache Kafka runs an append-only data structure to keep appending to a “topic” in a distributed cluster without rewriting the whole topic at each append on each cluster causing high I/O and high inter-node network traffic.

In either case, I am not saying that Couchbase necessarily needs to implement what Kafka is doing, as for that we can just use Kafka. I was just looking for a confirmation for the behavior I was seeing, so I can design around it (by making smaller, “caching” appending docs which would then roll up to bigger “storage” append docs, which would get append to in bulk).

Again, thanks for confirming it! Now I know how to work around this.

Ps.: Maybe it would be beneficial for others to mention this in the “raw append” part of the documentation (http://developer.couchbase.com/documentation/server/4.0/developer-guide/raw-append-prepend.html), which is even mentioning the raw append to store logs in Couchbase documents and considers this “efficient”. Maybe a note here saying that this is only “efficient” for the client (nor read of the whole doc and then write), not necessarily for the nodes (where there is a whole doc read and write). There is a note about increasing document size here, but making the nodes’ I/O and network traffic penalty clear might help others to understand it better too.

Topic		Replies	Views
Append / performance Couchbase Server	0	1259	May 20, 2016
Couchbase Sync Gateway Data Usage Couchbase Lite	4	100	September 5, 2024
Is it possible to scale single document using couchbase? Couchbase Server	3	1854	October 13, 2014
CBlite size, bandwidth and replication considerations for 1M docs, 100.000 of them updated every 10 sec Mobile	38	6437	November 20, 2015
Document not being replicated Couchbase Server	6	1165	October 18, 2018

Append seems to replicate the whole doc across nodes

Related topics