Couchbase stops ejecting keys to disk, hits quota, and rejects new creates
I have a problem where a single-node Couchbase 1.8.1 cluster works well for some time. For example, I ran a load test for about an hour at 250-400 creates per second, and the disk write queue kept up well. However, as the bucket began to use up its RAM quota, it became slower. At times the RAM usage went up and down, hovering around the high water mark.
Eventually, however, it exceeded the high water mark and never came back down. The disk write queue dropped to zero. Evictions cycled from 0 to 600 over the course of the test, then went to zero, where they stayed. Temp OOM errors climbed sharply. Our software is configured to perform exponential backoff, following the guidance on your site, and discards a write attempt after a certain wait.
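To make the client-side behaviour concrete, here is a minimal sketch of the kind of retry loop described above, assuming a capped exponential backoff with a total wait budget. All names (`create_with_backoff`, `TemporaryFailure`, `base_delay`, `max_delay`, `max_total_wait`) are hypothetical and are not part of the Couchbase SDK; `create` stands in for whatever add/set call the client actually makes.

```python
import time


class TemporaryFailure(Exception):
    """Hypothetical stand-in for the client library's Temp OOM error."""


def create_with_backoff(create, key, value, base_delay=0.05, max_delay=2.0,
                        max_total_wait=10.0, sleep=time.sleep):
    """Retry create(key, value) on TemporaryFailure with exponential backoff.

    Returns True if the write succeeded, or False if it was discarded because
    the next pause would push the total wait past max_total_wait seconds.
    """
    delay = base_delay
    waited = 0.0
    while True:
        try:
            create(key, value)
            return True
        except TemporaryFailure:
            # Discard the write if the next pause would exceed the wait budget.
            if waited + delay > max_total_wait:
                return False
            sleep(delay)
            waited += delay
            # Double the pause for the next attempt, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

With a loop like this, backoff only helps if the server frees memory within the wait budget. Once the bucket stops ejecting, every write eventually exhausts its budget and is dropped, which matches the loss of trace data described here.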
When the load was removed, memory usage stayed high and only very, very slowly started to drain. After 20 minutes, ejections went back up to several hundred per second. They then dropped off to a few dozen per second, then to fewer than 25 per second, and finally fell to zero again.
I'm trying to understand why Couchbase hits this wall. It doesn't appear to be I/O throughput, and there is very little reading of data during the load test. The system itself copes with the load very well, and Couchbase appears to keep up with the persistence load fine. It simply fails to eject keys fast enough at some point, and then appears to stop trying altogether.
We are using it to capture real-time trace information, so it's very important that Couchbase be able to drain its bucket quickly enough that we don't lose large sections of trace data.
Any guidance would be appreciated.