RAM is usually the most critical sizing parameter. It's also the one that can have the biggest impact on performance and stability.
Before deciding how much memory the cluster will need, we should understand the concept of a 'working set.' The 'working set' is the data that your application actively uses at any point in time. Ideally, your entire working set should live in memory.
It is very important to size your Couchbase cluster for both the working set and the total amount of data you expect.
The goal is to size the RAM available to Couchbase so that all your document IDs, the document ID metadata, and the working set values fit, with memory usage sitting just below the point at which Couchbase starts ejecting values from memory (the High Water Mark).
How much memory and disk space per node you will need depends on several different variables, which are defined below:
Calculations are per bucket
The calculations below are per bucket and must be summed across all buckets in the cluster. If all your buckets have the same configuration, you can treat your total data as a single bucket; there is no per-bucket overhead that needs to be considered.
Table 4.1. Deployment — Sizing — Input Variables
|documents_num||The total number of documents you expect in your working set|
|ID_size||The average size of document IDs|
|value_size||The average size of values|
|number_of_replicas||The number of copies of the original data you want to keep|
|working_set_percentage||The percentage of your data you want in memory|
|per_node_ram_quota||How much RAM can be assigned to Couchbase|
Use the following items to calculate how much memory you need:
Table 4.2. Deployment — Sizing — Constants
|Metadata per document (metadata_per_document)||This is the amount of memory that Couchbase needs to store metadata per document. Prior to Couchbase 2.1, metadata used 64 bytes. As of Couchbase 2.1, metadata uses 56 bytes. All the metadata needs to live in memory while a node is running and serving data.|
|SSD or Spinning||SSDs give better I/O performance.|
|headroom [a]||Since SSDs are faster than spinning (traditional) hard disks, you should set aside 25% of memory for SSDs and 30% of memory for spinning hard disks.|
|High Water Mark (high_water_mark)||By default, the high water mark for a node's RAM is set at 70%.|
[a] The cluster needs additional overhead to store metadata. That space is called the headroom. This requires approximately 25-30% more space than the raw RAM requirements for your dataset.
This is a rough guideline to size your cluster:
|no_of_copies||= 1 + number_of_replicas|
|total_metadata [a]||= (documents_num) * (metadata_per_document + ID_size) * (no_of_copies)|
|total_dataset||= (documents_num) * (value_size) * (no_of_copies)|
|working_set||= (total_dataset) * (working_set_percentage)|
|Cluster RAM quota required||= (total_metadata + working_set) * (1 + headroom)/(high_water_mark)|
|number of nodes||= Cluster RAM quota required/per_node_ram_quota|
[a] All the document IDs and their metadata need to live in memory.
You will need at least the number of replicas + 1 nodes regardless of your data size.
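To make the arithmetic above concrete, here is a minimal Python sketch of the same calculation. The function names (cluster_ram_quota, nodes_required) and their defaults are illustrative only, not part of any Couchbase API; they assume the Couchbase 2.1 metadata size, SSD headroom, and the 70% high water mark described above.

```python
import math

def cluster_ram_quota(documents_num, id_size, value_size,
                      number_of_replicas, working_set_percentage,
                      metadata_per_document=56,  # bytes per document as of Couchbase 2.1
                      headroom=0.25,             # 25% for SSD, 30% for spinning disks
                      high_water_mark=0.70):     # default high water mark
    """Estimate the RAM quota, in bytes, required for a single bucket.

    working_set_percentage is a fraction, e.g. 0.20 for 20%.
    """
    no_of_copies = 1 + number_of_replicas
    # Every document ID and its metadata must stay in memory.
    total_metadata = documents_num * (metadata_per_document + id_size) * no_of_copies
    total_dataset = documents_num * value_size * no_of_copies
    working_set = total_dataset * working_set_percentage
    # Add headroom and keep usage below the high water mark.
    return (total_metadata + working_set) * (1 + headroom) / high_water_mark

def nodes_required(cluster_quota, per_node_ram_quota, number_of_replicas):
    """Nodes needed to hold the RAM quota, never fewer than replicas + 1."""
    return max(math.ceil(cluster_quota / per_node_ram_quota),
               number_of_replicas + 1)
```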
Here is a sample sizing calculation:
Table 4.3. Deployment — Sizing — Input Variables
|documents_num||1,000,000|
|ID_size||100 bytes|
|value_size||10,000 bytes|
|number_of_replicas||1|
|working_set_percentage||20%|
Table 4.4. Deployment — Sizing — Constants
|Type of Storage||SSD|
|metadata_per_document||56 bytes for 2.1, 64 bytes for 2.0.x|
|headroom||25% (SSD)|
|high_water_mark||70%|
Table 4.5. Deployment — Sizing — Variable Calculations
|no_of_copies||= 2 [a]|
|total_metadata||= 1,000,000 * (100 + 56) * (2) = 312,000,000|
|total_dataset||= 1,000,000 * (10,000) * (2) = 20,000,000,000|
|working_set||= 20,000,000,000 * (0.2) = 4,000,000,000|
|Cluster RAM quota required||= (312,000,000 + 4,000,000,000) * (1+0.25)/(0.7) = 7,700,000,000|
[a] 1 for original and 1 for replica
For example, if you have 8 GB machines and you want to use 6 GB for Couchbase:
number of nodes = Cluster RAM quota required/per_node_ram_quota = 7.7 GB/6 GB = 1.3, which rounds up to 2 nodes
You will not be able to allocate all of a machine's RAM to per_node_ram_quota, because other programs may also be running on the machine.
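Using the hypothetical helper functions sketched earlier with the sample inputs from Tables 4.3 and 4.4 reproduces the same numbers:

```python
# Uses cluster_ram_quota() and nodes_required() from the sketch above.
quota = cluster_ram_quota(documents_num=1_000_000,
                          id_size=100,
                          value_size=10_000,
                          number_of_replicas=1,
                          working_set_percentage=0.20)
print(quota)  # 7,700,000,000 bytes, roughly 7.7 GB

# 6 GB per-node RAM quota on 8 GB machines, one replica:
print(nodes_required(quota, per_node_ram_quota=6 * 10**9, number_of_replicas=1))  # 2
```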