To understand how your cluster is working, and whether it is working effectively, you should monitor a number of statistics that help you identify and diagnose problems. Key statistics include:
The 'watermark' determines when it is necessary to start freeing up memory. Some important statistics related to watermarks are:
High WaterMark (ep_mem_high_wat)
The system starts ejecting values from memory when this watermark is reached. An ejected value must be fetched from disk the next time it is accessed, before it can be returned to the client.
Low WaterMark (ep_mem_low_wat)
The system takes no action when memory use rises to this watermark. It is the 'goal' that the system works down to once it has started ejecting data because the high watermark was reached.
Memory Used (mem_used)
The current size of memory used. If mem_used reaches the RAM quota, you will get OOM_ERROR. Keep mem_used below ep_mem_high_wat, which is the mark at which data is ejected from memory.
Disk Write Queue Size (ep_queue_size)
The size of the queue that has data waiting to be written to the disk.
Cache Hits (get_hits)
The number of requests served from memory. As a rule of thumb, this should be at least 90% of the total requests.
Cache Misses (get_misses)
Ideally this should be low, and certainly lower than get_hits. Increasing or high values mean that data that your application expects to be stored is not in memory.
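The rules of thumb above can be turned into simple checks. The following is a minimal sketch, not part of the server tooling: the function names and sample values are hypothetical, and it assumes that "total requests" means get_hits plus get_misses.

```python
def cache_hit_ratio(get_hits: int, get_misses: int) -> float:
    """Fraction of get requests served from memory (0.0 if there were none).

    Assumption: total requests = get_hits + get_misses.
    """
    total = get_hits + get_misses
    return get_hits / total if total else 0.0


def memory_state(mem_used: int, low_wat: int, high_wat: int, quota: int) -> str:
    """Classify mem_used against the watermarks and the RAM quota."""
    if mem_used >= quota:
        return "at quota: expect OOM_ERROR"
    if mem_used >= high_wat:
        return "above high watermark: values are being ejected"
    if mem_used > low_wat:
        return "between watermarks"
    return "below low watermark"


# Hypothetical sample values, not taken from a real cluster.
ratio = cache_hit_ratio(get_hits=9500, get_misses=500)
print(f"hit ratio: {ratio:.1%} (rule of thumb: at least 90%)")
print(memory_state(mem_used=800, low_wat=600, high_wat=750, quota=1000))
```

A check like this is only as good as the sampling behind it. A single reading says little, so compare values over time to spot a rising get_misses or a mem_used that stays above the high watermark.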
You can find values for these important stats with the following command:
shell> cbstats IP:11210 all | \
    egrep "todo|ep_queue_size|_eject|mem|max_data|hits|misses"
This will output the following statistics:
ep_flusher_todo:
ep_max_data_size:
ep_mem_high_wat:
ep_mem_low_wat:
ep_num_eject_failures:
ep_num_value_ejects:
ep_queue_size:
mem_used:
get_misses:
get_hits:
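If you want to process these statistics in a script, the output can be parsed into a dictionary. This is a minimal sketch that assumes each line has the form "name: value" with an integer value; the function name parse_cbstats and the sample text are hypothetical, so check the output of your cbstats version before relying on it.

```python
def parse_cbstats(text: str) -> dict:
    """Parse 'name: value' lines into a dict of integer statistics.

    Assumption: one statistic per line, name and value separated by
    the first colon. Lines without an integer value are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


# Hypothetical sample output, not taken from a real cluster.
sample = """\
 ep_mem_high_wat:   750
 ep_mem_low_wat:    600
 ep_queue_size:     12
 mem_used:          640
 get_hits:          9500
 get_misses:        500
"""

stats = parse_cbstats(sample)
print(stats["mem_used"] < stats["ep_mem_high_wat"])
```

Keeping the parser separate from the checks means the same dictionary can feed a threshold test, a log line, or an external monitoring system.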
Make sure that you monitor the disk space, CPU usage and swapping on all your nodes, using the standard monitoring tools.