How do durable writes affect the performance of the CLUSTER?

Enterprise 6.5

Hi, we are starting to take a closer look at our durability story and I have an important question:

I understand that a durable write affects client-side performance: a write request takes longer because it waits for the data to persist to the requisite nodes. What I don't understand is how asking for durable writes affects the performance of the cluster itself.

To be clear, I see two possible ways the server could respond to a durable write request:

  1. the server’s behaviour with respect to the mutation is exactly the same, and the only change is that the client waits for things to run their course
  2. requesting a durable write changes the server’s behaviour (something akin to forcing a flush to disk)

If it’s number 2, then conceivably requesting many durable writes could completely change (probably for the worse) the cluster’s write profile. Would something like monitoring durable write speed via diagnostic probes then actually degrade performance?

Can you point me to documents explaining how in-memory data is flushed to disk under normal circumstances, so that I can reason about how things might change after introducing a large number of durable write requests?

This is really important to us, and I appreciate any guidance you can offer.

Durable writes do change the underlying server behavior, because they are based on a new Synchronous Write protocol. You can think of it as similar to the Raft consensus protocol, where a quorum of replicas must acknowledge the write before it is considered complete.
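To make the quorum idea concrete, here is a minimal sketch of how a primary might track acknowledgements for a single durable write. It is a generic majority-quorum model, not Couchbase's actual implementation: the class name `SyncWriteTracker`, the node names, and the assumption that the active copy counts toward the majority are all illustrative.

```python
def majority(copies: int) -> int:
    """Smallest number of copies that forms a strict majority."""
    return copies // 2 + 1


class SyncWriteTracker:
    """Hypothetical per-write tracker on the active (primary) node.

    Assumption: the active copy counts toward the majority once it has
    applied the write locally.
    """

    def __init__(self, replicas: int):
        self.copies = replicas + 1          # active copy plus replicas
        self.required = majority(self.copies)
        self.acked = {"active"}             # active has applied the write
        self.ack_messages = 0               # extra network traffic for this write

    def ack(self, node: str) -> bool:
        """Record a replica acknowledgement; return True once committed."""
        self.ack_messages += 1
        self.acked.add(node)
        return self.committed

    @property
    def committed(self) -> bool:
        return len(self.acked) >= self.required


tracker = SyncWriteTracker(replicas=2)      # 3 copies, majority is 2
print(tracker.committed)                    # False: only the active has it
print(tracker.ack("replica-1"))             # True: 2 of 3 copies acknowledged
```

Each acknowledgement in this model is one extra message from a replica back to the primary, which is the source of the additional network traffic described below. A regular (eventually consistent) write would skip the tracker entirely and return to the client as soon as the active copy has the mutation.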

So, net-net, the server does more work for durable writes than for regular, eventually consistent writes. This results in a few areas of additional resource consumption:

  • There are more network messages, because replica nodes send acknowledgements back to the primary node.

  • There is more CPU consumption to run the Sync Write protocol.

  • If you tune the disk_writer threads to increase the number of writer threads (and hence reduce response time for durable writes), you will use more CPU and IO for the same workload than with eventually consistent writes. This is because write batches get smaller, so the fixed cost of each flush is spread over fewer mutations.
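The batching effect in the last point can be illustrated with a toy cost model. It assumes (purely for illustration) that every flush batch pays a fixed overhead, such as one fsync, plus a per-mutation cost; the function names and cost units are hypothetical and do not come from Couchbase's storage engine.

```python
import math


def flushes_needed(mutations: int, batch_size: int) -> int:
    """Number of flush batches needed to persist `mutations` items."""
    return math.ceil(mutations / batch_size)


def total_io_cost(mutations: int, batch_size: int,
                  fixed_cost_per_flush: int, cost_per_item: int) -> int:
    """Toy model: each flush pays a fixed overhead plus a per-item cost."""
    flushes = flushes_needed(mutations, batch_size)
    return flushes * fixed_cost_per_flush + mutations * cost_per_item


# Same 10,000 mutations, large vs. small batches (arbitrary cost units).
print(total_io_cost(10_000, 1_000, fixed_cost_per_flush=100, cost_per_item=1))
print(total_io_cost(10_000, 50, fixed_cost_per_flush=100, cost_per_item=1))
```

Under these assumptions the large batches cost 11,000 units and the small batches cost 30,000 units for identical data. The per-item work is unchanged; only the number of fixed-cost flushes grows.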

The following documentation article may be helpful to you:
