No matter what we work on or what system we operate, we need the right set of tools to get the job done. Airplane pilots need the right instruments in the cockpit (displays for altitude, heading, speed, fuel flow, etc.) to monitor and control the aircraft.
What do all the controls in an airline cockpit do?
The same is true of databases: it is important to have visibility into all aspects of the database so that you can configure, monitor, diagnose and fix any problems. The Couchbase Server admin web console (shown in the figure below) provides a centralized view of metrics across the cluster. The web console also lets you drill down into the metrics to see how well a particular server in the cluster is functioning and whether any areas need attention.
Here are some useful links for monitoring a Couchbase cluster:
- Cluster overview stats give you a quick overview of your cluster health, including RAM, disk usage and activity.
- Individual bucket stats show additional detailed information per bucket.
Some of the key stats to monitor include “operations per second”, “resident item ratio”, “cache miss ratio” and “disk write queue”.
The operations per second stat tells you the overall data throughput on your cluster. You can drill down and see the load each server is handling as well.
The resident item ratio shows the percentage of active documents that reside in memory. Typically you want your working set (the actively accessed documents) to be in memory for low latencies and a responsive user experience.
The cache miss ratio shows the percentage of reads per second to this bucket which need to be served from disk rather than RAM. If you have a low resident item ratio and a fairly high cache miss ratio, adding more nodes or allocating more RAM to the bucket may be needed based on your latency requirements.
The disk write queue shows the number of items that have mutated in memory but have not yet been persisted to disk. If your disk write queues are very high (millions of items), your cluster may not be sized correctly.
- vBucket stats provide information for all virtual buckets, or shards, in the cluster. By default, Couchbase Server always has 1024 shards, and these are distributed across the cluster.
- Couchbase Server uses disk queues to manage items that are in RAM and are waiting to be persisted to disk. Disk queue stats display information for data being placed into the disk queue.
The Couchbase Server admin console allows you to drill down and get metrics for a particular server node. For example, in the figure below, the graph shows the total number of items in the disk queue on a particular server node (nirvana.server.2).
- Couchbase Server uses TAP queues for replication and rebalancing. The TAP queue statistics show you information about the TAP queue activity.
- Outgoing XDCR stats show the cross datacenter replication operations from the current cluster (the source cluster) to a destination cluster.
- Incoming XDCR stats show the cross datacenter replication requests that arrive at the current cluster from a remote cluster.
- View stats show information about individual view design documents within a selected bucket. The view design documents store the MapReduce functions used to index and query data in Couchbase Server.
- For memcached buckets, a separate suite of memcached-specific statistics is captured. This helps you understand the utilization rates for RAM-based storage.
These metrics are also available through the REST API for integration with external monitoring systems.
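As a rough illustration of what such an integration could look like, here is a minimal Python sketch. It assumes the per-bucket stats endpoint is `/pools/default/buckets/<bucket>/stats` on port 8091, that the response holds its time series under `op.samples`, and that the four key stats are named `ops`, `vb_active_resident_items_ratio`, `ep_cache_miss_rate` and `disk_write_queue`. Check those names against the REST API documentation for your server version. The thresholds in `needs_attention` are made-up illustrative values, not Couchbase recommendations.

```python
import base64
import json
import urllib.request

# Stat names assumed to appear in the bucket stats JSON; verify them
# against your server version before relying on this.
KEY_STATS = (
    "ops",
    "vb_active_resident_items_ratio",
    "ep_cache_miss_rate",
    "disk_write_queue",
)


def stats_url(host, bucket, port=8091):
    """Build the (assumed) per-bucket stats URL for one cluster node."""
    return f"http://{host}:{port}/pools/default/buckets/{bucket}/stats"


def fetch_bucket_stats(host, bucket, user, password, port=8091):
    """Fetch the raw stats document using HTTP basic authentication."""
    request = urllib.request.Request(stats_url(host, bucket, port))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def latest_values(stats_doc, names=KEY_STATS):
    """Return the most recent sample for each requested stat, if present."""
    samples = stats_doc.get("op", {}).get("samples", {})
    return {name: samples[name][-1] for name in names if samples.get(name)}


def needs_attention(latest, min_resident=50.0, max_miss=5.0, max_queue=1_000_000):
    """Flag the conditions described above; thresholds are illustrative only."""
    warnings = []
    resident = latest.get("vb_active_resident_items_ratio")
    miss = latest.get("ep_cache_miss_rate")
    if resident is not None and miss is not None:
        if resident < min_resident and miss > max_miss:
            warnings.append("low resident ratio with high cache miss ratio")
    if latest.get("disk_write_queue", 0) > max_queue:
        warnings.append("disk write queue is very large")
    return warnings
```

A monitoring job could call `fetch_bucket_stats` on a schedule, pass the result through `latest_values`, and forward anything returned by `needs_attention` to whatever external system you already use.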
And just as an airplane has a warning system, Couchbase Server can notify and alert you so that you can check on the health of your cluster. Some of these alerts include:
- IP Address Changes: If the IP address of a Couchbase Server in your cluster changes, you will be warned that the address is no longer available. You should check the IP address on the server, and update your clients or server configuration.
- Metadata Overhead: Indicates that a bucket is now using more than 50% of the allocated RAM for storing metadata and keys, reducing the amount of RAM available for data values. This is a helpful indicator that you may need to add nodes to your cluster.
- Disk Usage: Indicates that the available disk space used for persistent storage has reached at least 90% of capacity. This is a signal that you may need to add more disks to your cluster.
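To make the two threshold-based alerts concrete, here is a small Python sketch of the checks as described above. It only illustrates the arithmetic and is not how Couchbase Server implements its alerts; the function name and parameters are hypothetical.

```python
def check_alerts(metadata_bytes, bucket_ram_bytes, disk_used_bytes, disk_total_bytes):
    """Return the names of the alerts whose thresholds are crossed.

    Hypothetical helper: integer arithmetic is used so the 50% and 90%
    boundaries are compared exactly.
    """
    alerts = []
    # Metadata overhead: more than 50% of the bucket's allocated RAM.
    if metadata_bytes * 2 > bucket_ram_bytes:
        alerts.append("metadata overhead")
    # Disk usage: at least 90% of the available disk capacity.
    if disk_used_bytes * 100 >= disk_total_bytes * 90:
        alerts.append("disk usage")
    return alerts
```

For example, a bucket with 1000 bytes of RAM and 600 bytes of metadata crosses the first threshold, while a disk that is exactly 90% full crosses the second.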
I hope this information gets you started with maneuvering Couchbase Server, so you can test drive it on staging and go full throttle in production.