Increasing the drain rate from the disk queue?
Is there any way to increase the drain rate from the disk queue? For example, I've got several million items in the write queue, and I see it draining at about 25k items a second. From the sar data, disk utilization on the disk array looks pretty low, about 15%. So why isn't it draining any faster?
sar output:
Average:  DEV            tps  rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:  dev8-0       14.98      0.00     294.75     19.67      0.00   0.29   0.04   0.07
Average:  dev8-16    2795.85     12.23  323854.35    115.84      1.99   0.71   0.05  14.35
Average:  dev253-0     35.91      0.00     287.31      8.00      0.02   0.48   0.01   0.05
Average:  dev253-1      0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
Average:  dev253-2  40465.81     12.23  323714.55      8.00    131.67   3.25   0.00  14.39
Average:  dev253-3      0.93      0.00       7.44      8.00      0.00   0.32   0.11   0.01
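For reference, the wr_sec/s column can be turned into a byte rate. The sketch below is only an illustration: it assumes sar is counting 512-byte sectors, and the function names are made up for this example. It converts the dev8-16 write rate to MiB/s and divides it by the observed drain rate of about 25k items a second to get a rough figure for bytes written per drained item.

```python
SECTOR_BYTES = 512  # assumption: sar reports rd_sec/s and wr_sec/s in 512-byte sectors


def write_throughput_mib(wr_sec_per_s, sector_bytes=SECTOR_BYTES):
    """Convert a sar wr_sec/s value into MiB written per second."""
    return wr_sec_per_s * sector_bytes / (1024 * 1024)


def bytes_written_per_item(wr_sec_per_s, items_per_s, sector_bytes=SECTOR_BYTES):
    """Rough bytes hitting the disk for each item drained from the queue."""
    return wr_sec_per_s * sector_bytes / items_per_s


# Values taken from the dev8-16 row and the ~25k items/s drain rate quoted above.
dev8_16_mib = write_throughput_mib(323854.35)
per_item = bytes_written_per_item(323854.35, 25_000)

print(f"dev8-16 write throughput: {dev8_16_mib:.1f} MiB/s")
print(f"approx. bytes written per drained item: {per_item:.0f}")
```

This only restates what the sar row already says in different units; it does not explain why the drain rate is capped.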
On a related note, how do you size RAM to account for the disk queue when documents already in memory are being updated (not new documents), since they go onto the write queue first?
Assume each document is 1K. If I have 100k updates coming in per second and the drain rate is 25k per second, 75k updates per second need to be buffered. From the additional memory usage statistics, it does not look like I'm using 75k * 1K of additional memory; it's much less. By my rough calculation, each updated document in the write queue costs about 100 bytes or so. So evidently the full 1K record is not stored in the write queue; perhaps it holds a pointer to the actual data in cache plus some housekeeping structures? What's the right way to calculate this? (The Couchbase sizing guide did not appear to account for disk IOPS, drain rate, etc. when lots of updates are happening.)
Thanks !
Thanks, Mike. So I guess that means there is no way to add additional flusher threads (the way we can, for example, increase the number of Couchbase memcached worker threads, which defaults to 4)?
About the sizing, what I'm trying to understand is this: if updates arrive at 100,000 per second and the flusher thread only removes items at 25,000 per second, how much additional RAM do I need to allocate to handle these update spikes so that I can still keep my entire working set in memory?
So for example, if I expect this update spike to last 60 minutes (3,600 seconds), then I need to allocate:
(100,000 - 25,000) * x bytes * 3600
to handle memory allocation for the write queues.
I'm trying to see what 'x' might be.
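As a rough illustration, the formula above can be written as a small function. This is only a sketch: the function name is hypothetical, and the value used for x (100 bytes) is just the rough per-item estimate mentioned earlier in the thread, not a documented Couchbase figure. It also treats every queued update as a separate entry, so it is a worst-case figure.

```python
def spike_buffer_bytes(update_rate, drain_rate, bytes_per_queued_item, duration_s):
    """Worst-case extra memory for a write-queue backlog during an update spike.

    Counts every incoming update as its own queue entry.
    """
    backlog_per_second = max(update_rate - drain_rate, 0)
    return backlog_per_second * bytes_per_queued_item * duration_s


# Assumption: x = 100 bytes per queued item (the poster's rough estimate).
x = 100
total = spike_buffer_bytes(
    update_rate=100_000,   # updates per second
    drain_rate=25_000,     # items drained per second
    bytes_per_queued_item=x,
    duration_s=60 * 60,    # a 60-minute spike
)
print(f"{total / 1e9:.1f} GB")
```

With these inputs the formula gives 27.0 GB; a different x scales the result linearly.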
There is not a way to add an additional flusher, but we are in talks to add more writer threads for 2.1 in order to increase disk throughput. Also, since you seem to understand our architecture well: we have four memcached worker threads per server instance and four dispatcher threads for each bucket. The dispatcher threads consist of one read-write thread, two read threads, and one non-IO thread. The memcached worker threads are only responsible for handling incoming requests.
Let me give you another example that will help you understand why you likely won't need a lot of extra memory for all of those updates. Imagine you have a single key that you are updating 100,000 times per second. It would seem that you would need a lot of space in Couchbase to hold all of those updates, and that with a drain rate of 25,000 items per second you would incur 75,000 items of overhead every second, forever. This, of course, would be unsustainable for any cluster. In Couchbase, however, we don't have this problem because we de-duplicate all of our incoming requests before we persist them. For example, let's say the flusher comes and persists all of my items so that my disk write queue is 0. If I now update a key 5 times before the flusher runs again, we will only persist that item once, and at all times only one value for that key will be in memory, because we de-duplicate.
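The de-duplication described above can be sketched with a toy model. This is not Couchbase's implementation; the class and attribute names are invented for illustration. It keeps one in-memory value per key plus a set of dirty keys, so repeated updates to the same key overwrite the value and leave a single pending write.

```python
class DedupWriteQueue:
    """Toy model of a de-duplicating write queue (illustrative only)."""

    def __init__(self):
        self.cache = {}       # key -> latest value held in memory
        self.dirty = set()    # keys waiting to be persisted
        self.disk = {}        # stand-in for persisted data
        self.disk_writes = 0  # number of items actually written

    def set(self, key, value):
        # Overwrite the in-memory value; the key is queued at most once.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Persist only the latest value of each dirty key.
        flushed = len(self.dirty)
        for key in self.dirty:
            self.disk[key] = self.cache[key]
            self.disk_writes += 1
        self.dirty.clear()
        return flushed


queue = DedupWriteQueue()
for version in range(5):
    queue.set("user::1", f"value-{version}")

print(len(queue.dirty))       # pending writes for the key
print(queue.flush())          # items persisted by this flush
print(queue.disk["user::1"])  # value that reached disk
```

In this model, five updates to one key produce one queued entry and one disk write, and only the last value is persisted.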
There is, however, a scenario in which your server would tell you it was temporarily out of memory. This would be the case if your items took up more space on disk than you had available memory and you then updated every item in the cluster. Depending on your cluster's size, you could get a tmp_oom error.
So to help you with sizing can you describe your workload?
Improving the drain rate from the disk queue is a must.
I know 2.0 has only just launched, and it's amazing, but I'm waiting for 2.1 to see this important improvement.
Just to give an update we are working on this issue as a top priority and are working to get it into a release as soon as possible.
Is the temporary workaround to add more nodes to the cluster? Seems like the per node write queue limit is ~25k entries a second.
That would be one workaround. Another thing you can try is creating a few buckets; four would be good if possible. This helps because the current architecture creates four dispatcher threads for each bucket you create, which means one extra write thread per bucket. This might help you get better disk performance too.
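As a back-of-the-envelope sketch of either workaround, the number of write threads (one per node or one per bucket) needed to keep up with a given update rate can be estimated as below. The function name is hypothetical, and the estimate assumes that drain rate scales linearly with the number of writers, which nothing in this thread confirms; a shared disk could become the limit first.

```python
import math


def writers_needed(update_rate, drain_rate_per_writer):
    """Writers required to keep up, assuming linear scaling (an assumption)."""
    return math.ceil(update_rate / drain_rate_per_writer)


# Figures quoted in this thread: 100k updates/s against ~25k drained/s per writer.
print(writers_needed(100_000, 25_000))
```

De-duplication of repeated updates to the same key would lower the real requirement, so treat this as an upper bound under the stated assumption.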
My only concern is that it appears that this drain rate directly affects how fast a view is updated. According to the docs:
--------
All views within Couchbase operate as follows:
Views are updated when the document data is persisted to disk. There is a delay between creating or updating the document, and the document being updated within the view.
--------
The thread starter pointed out a maximum drain rate of about 25k per second. By comparison, a Google search suggests that MySQL can perform roughly 28K to 35K writes per second on average, depending on your configuration.
I'm comparing drain rate and write rate because it seems that a view and its index are not completely updated until the document is drained from the write queue and written to disk.
In general, a Couchbase cluster would, in the long run, outscale any single MySQL/Postgres instance.
You are correct about the impact on indexing and, as I mentioned above, we are currently in the process of getting better disk utilization by adding more IO threads. We will release a new version with this improvement, as well as others, as soon as we can. Clustering multiple nodes is one way to work around this problem for the time being.
There is no way to tell Couchbase to write to disk faster. We strive to make Couchbase as close to being IO bound as possible, but due to various factors this doesn't end up being the case and we will continue to work on performance improvements to make disk drain faster.
Before I get into sizing guidelines I want to explain why you don't see a lot of memory being used for updates. If you have a 1k key value pair and you update it with another key value pair then we de-duplicate this key in memory. This means we overwrite the old key value pair and then mark the item dirty so that the net memory used will be unchanged. If you add new items or your items are getting bigger then you will notice more memory overhead until the disk queue drains.
I don't personally deal with sizing, but the guidelines that we post are the same guidelines we recommend to all of our customers. I would recommend starting off with the recommendation, and if you find that they don't seem to be working for you, then make the necessary changes. Please let us know if the guidelines are inaccurate, since we are always working to make our recommendations to users better.