Description
When performing updates against the database, we're able to get throughput of about 160,000 ops/second, with throughput to disk of about 21,000 ops/second (92% cache hit rate).
However, when we try to do either synchronous writes to disk or synchronous replication, the server fails. Either we get 100% error rates, or if we throttle input and add auto-retries, the throughput drops to tens of operations per second. Given the disk is happily doing over 20k ops/sec, we'd expect to be able to process traffic at that volume even with synchronous disk writes.
We're calling the update() method in the Java client with PersistTo.ONE (or ReplicateTo.ONE). This is done through YCSB.
Is this the correct approach? Is there something else we should be doing? What numbers should we be expecting?
Attached is the console screenshot. You'll see that some of the keys are getting requests at a few thousand per second, but virtually all of those requests are failing, giving a true throughput of close to zero.
On a four-node cluster under low load, the writes complete successfully, but the overall rate is only hundreds of ops/sec (in cases where asynchronous operations can do more than 60k writes/sec).
On a four-node cluster under high load, the writes return errors, as described above.
Could you comment on why observe polling is so slow? Is this an experimental feature? When should it be used?
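To frame the "what numbers should we be expecting" question, here is a rough back-of-the-envelope sketch based on Little's Law (throughput = requests in flight / per-request latency). It does not use the Couchbase client at all. The class name, the method names, and every latency and concurrency value in it are hypothetical placeholders rather than measurements from our cluster. The only figure taken from this report is the roughly 21,000 ops/second disk rate.

```java
public class SyncWriteEstimate {

    // Little's Law: upper bound on throughput when each of `inFlight`
    // callers blocks for `latencyMillis` per synchronous write.
    static double maxOpsPerSecond(int inFlight, double latencyMillis) {
        return inFlight * 1000.0 / latencyMillis;
    }

    // Inverse: how many writes must be in flight at once to sustain
    // `targetOpsPerSecond` at the given per-write latency.
    static long requiredInFlight(double targetOpsPerSecond, double latencyMillis) {
        return (long) Math.ceil(targetOpsPerSecond * latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical per-write latencies (ms) for a durable write,
        // including any wait on observe polling. Not measured values.
        double[] latenciesMillis = {1.0, 10.0, 100.0};
        // Hypothetical numbers of blocking client threads.
        int[] inFlightCounts = {1, 16, 128};

        for (double latency : latenciesMillis) {
            for (int inFlight : inFlightCounts) {
                System.out.printf("latency=%.0f ms, in flight=%d -> at most %.0f ops/sec%n",
                        latency, inFlight, maxOpsPerSecond(inFlight, latency));
            }
            // 21,000 ops/sec is the disk write rate quoted in this report.
            System.out.printf("latency=%.0f ms -> need %d in flight for 21000 ops/sec%n",
                    latency, requiredInFlight(21000.0, latency));
        }
    }
}
```

If the per-write latency under PersistTo.ONE turns out to be dominated by the observe polling interval rather than by the disk, a bound like this would at least be consistent with seeing tens or hundreds of ops/second from a small number of blocking threads. That is an assumption on our side and would need to be confirmed against measured latencies.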