We are experiencing Sync Gateway crashes (out-of-memory errors) while running tests on the following configuration:
Single Sync Gateway 1.2.1-4 (dual-core VM, 4 GB RAM)
3-node Couchbase cluster
Test details:
Insertion of about 5000 documents (300 KB each), each document being associated with its own channel.
=> Memory consumption increases continuously until Sync Gateway crashes, reporting an out-of-memory error.
SG database configuration is based on default values.
It appears that when changing the rev_cache_size value to ‘1’ (instead of default value ‘5000’), SG memory consumption does not exceed 10% of the available RAM.
(a) Should we consider the SG crashes "normal" given our configuration and context?
(b) Are there any rules or guidelines for tuning the rev_cache_size parameter for a given use case and hardware capacity?
Given that you avoid memory issues when decreasing the revision cache size, this looks like a straightforward case of the revision cache using all available memory when full, made worse in this case by the relatively large size of your documents.
The revision cache is an in-memory cache of the most recently accessed document revisions, and is a performance optimization to avoid repeated Couchbase bucket retrieval for these documents. As you reduce the revision cache size you’ll reduce your memory consumption, but typically increase replication latency and potentially CPU usage.
Tuning the revision cache size depends on a few things, including your expected document throughput - one of the key benefits of the revision cache is keeping recently modified documents in memory, as typically clients will be attempting to immediately replicate those documents (particularly connected clients running continuous replications). At minimum you’ll probably want updates to remain in the cache for several seconds, to ensure they get pulled by connected clients before expiring out of the cache.
As a back-of-the-envelope calculation, that means that you probably want your revision cache size to be at least
(target ops/second) * (5 seconds) to minimize Couchbase bucket retrieval. At the end of the day, though, there are enough variables (your document size, channel distribution, your cluster specs, latency between SG and Couchbase Server) that the type of testing you’re doing is the right approach - experiment with tuning the rev_cache_size until you find a level that meets your performance goals (and potentially increase your hardware if needed).
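To make the rule of thumb above concrete, here is a minimal Python sketch. The function names and the 5-second default are only illustrative (they are not part of Sync Gateway), and the memory figure counts document bodies alone, ignoring per-revision metadata such as revision history and channel information.

```python
import math


def min_rev_cache_size(target_ops_per_second: float, retention_seconds: float = 5.0) -> int:
    """Rule-of-thumb lower bound: (target ops/second) * (seconds to keep updates cached)."""
    if target_ops_per_second < 0 or retention_seconds < 0:
        raise ValueError("inputs must be non-negative")
    # Round up so a fractional result never undersizes the cache; keep at least 1 entry.
    return max(1, math.ceil(target_ops_per_second * retention_seconds))


def cache_bodies_mb(cache_size: int, avg_doc_size_kb: float) -> float:
    """Approximate memory (MB) held by document bodies alone when the cache is full.

    Assumption: metadata and Go runtime overhead are NOT included, so real usage
    will be higher than this figure.
    """
    return cache_size * avg_doc_size_kb / 1024.0


if __name__ == "__main__":
    size = min_rev_cache_size(target_ops_per_second=100)
    print("suggested minimum rev_cache_size:", size)
    print("document bodies alone (MB):", cache_bodies_mb(size, avg_doc_size_kb=300))
```

Treat the result as a starting point for testing rather than a recommended setting; the measured behaviour on your own cluster should decide the final value.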
Thanks for your quick feedback.
There is still one point that is not really clear to me.
Effectively disabling the revision cache (rev_cache_size set to 1) leads to memory consumption of about 10% of the available RAM.
When activated, expected memory usage should be something like:
10% (basic footprint) + (number of docs * (doc size + cache overhead))
Assuming the cache overhead is not significant with regard to the doc size, we should have:
4096 MB / 10 + (5000 * 300 KB) = 409 MB + 1464 MB = 1873 MB
=> Why does the SG end up consuming all of the available 4 GB of memory?
As additional information, we ran the same test on an 8 GB VM, with the same result (it ends with an SG crash caused by an out-of-memory error).
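The estimate above can be written out as a short Python sketch. The function and parameter names are hypothetical, and the cache overhead is assumed to be zero, exactly as in the calculation in this post.

```python
def expected_sg_memory_mb(total_ram_mb: float = 4096.0,
                          baseline_fraction: float = 0.10,
                          doc_count: int = 5000,
                          doc_size_kb: float = 300.0) -> float:
    """Naive estimate: baseline footprint + document bodies held in the revision cache.

    Assumption: cache overhead per document is ignored (taken as 0).
    """
    baseline_mb = total_ram_mb * baseline_fraction   # footprint measured with the cache at size 1
    cached_docs_mb = doc_count * doc_size_kb / 1024.0  # KB -> MB
    return baseline_mb + cached_docs_mb


if __name__ == "__main__":
    # Roughly 1874 MB; the 1873 MB figure above truncates each term before adding.
    print(round(expected_sg_memory_mb(), 2))
```

Whatever the exact rounding, the estimate stays well below 4 GB, which is why the observed out-of-memory crash is surprising.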
I was wondering the same thing. Initially I’d assumed that the additional overhead of storing document metadata (revision history, channel information) in the revision cache was accounting for hitting the 4GB limit.
That seems less likely if you’re running into the same thing with an 8GB VM, though.
If you’re able to generate a heap profile (some instructions and links are here: https://github.com/couchbase/sync_gateway/wiki/Profiling), you could post that to a new issue on GitHub for additional investigation.
I created a new Sync Gateway issue on GitHub [issue #1934] (it appears I am not allowed to insert a link in this post).
An attached zip file includes 4 heap profiles captured during a new test (with default rev_cache_size value).