We have a 4-node cluster with 24GB RAM per node, of which 18GB is allocated to Couchbase, with zero replication. The cluster holds approximately 10M records, with ~2.5M items/hour coming in and old items expiring. RAM usage, which is ~72GB across the cluster, fills up every ~12 days, and I need to restart the cluster to fix this. After a restart the RAM usage is back to ~20GB.
Can someone please help me understand the reason for this?
FYI: Auto-Compaction is set to a 40% fragmentation level and the Metadata Purge Interval is set to 1 day, which we reduced to 2 hours. But it didn't help.
What do you mean by “full”? Couchbase by design will attempt to use as much of the Server Quota as you allocate to it for caching recently-accessed data. You mention you have set an 18GB Server Quota, and 4x that is indeed 72GB.
How much of the Server Quota have you allocated to Buckets? If you’ve allocated all of it, then Couchbase using 72GB to cache data is exactly expected.
Note also that there will be some additional usage over the Server Quota for general cluster management, but 6GB (24GB-18GB) is probably sufficient in most cases.
Currently we have sufficient free memory available, as we restarted the node 4 days ago:
```
             total       used       free     shared    buffers     cached
Mem:         23934      15908       8026          0        227       4282
-/+ buffers/cache:      11398      12536
Swap:         4095         12       4083

ps -o rss,vsz 19419
```
The main issue is the difference between these two Couchbase metrics, and we don't have any replicas:

vb_active_itm_memory -> 3GB
ep_kv_size -> 10.5GB

In a few days this ep_kv_size will reach 15GB, and then the node will die…
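One way to watch that gap is to pull both counters from `cbstats` output and subtract. A minimal sketch, assuming a saved stats dump; the byte values below are hypothetical, chosen to mirror the 3GB / 10.5GB figures above:

```python
# Hypothetical snippet of `cbstats <host>:11210 all` output (values are
# illustrative, matching the numbers quoted in this thread).
sample = """\
 ep_kv_size:             11274289152
 vb_active_itm_memory:   3221225472
"""

def parse_stats(text):
    """Parse 'name: value' stat lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(":")
        if value.strip().isdigit():
            stats[key] = int(value.strip())
    return stats

stats = parse_stats(sample)
# ep_kv_size covers keys + values + per-item metadata; with no replicas,
# the gap versus vb_active_itm_memory is largely resident metadata.
gap_gb = (stats["ep_kv_size"] - stats["vb_active_itm_memory"]) / 1024**3
print(f"gap between ep_kv_size and vb_active_itm_memory: {gap_gb:.1f} GB")
# prints: gap between ep_kv_size and vb_active_itm_memory: 7.5 GB
```

Tracking this gap over time (rather than the two numbers individually) makes it easier to see whether metadata growth, not item data, is what eventually exhausts the quota.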
```
MALLOC:    11324902704 (10800.3 MiB) Bytes in use by application
MALLOC: +   1119019008 ( 1067.2 MiB) Bytes in page heap freelist
MALLOC: +    292244776 (  278.7 MiB) Bytes in central cache freelist
MALLOC: +      8535392 (    8.1 MiB) Bytes in transfer cache freelist
MALLOC: +     28149832 (   26.8 MiB) Bytes in thread cache freelists
MALLOC: +     29814944 (   28.4 MiB) Bytes in malloc metadata
```
Ok, so this means that Couchbase's kv-engine (`memcached`) on that one node has requested 10.54GB, and adding on memory allocator overhead it's actually using 11.97GB of RAM. That's not an unusual amount of overhead with TCMalloc.
I assume these are from when your cluster is "good"; can you re-post them when you're actually seeing the unexpectedly high usage?
Additionally, you might want to start testing the 4.0 RC; it has a different memory allocator (jemalloc) and a defragmenter (on Linux), which should reduce memory-allocator overhead.