RAM is usually the most critical sizing parameter. It's also the one that can have the biggest impact on performance and stability.
Before we can decide how much memory we will need for the cluster, we should understand the concept of a 'working set'. The working set at any point in time is the data that your application actively uses. Ideally, you want your entire working set to live in memory.
It is very important that a Couchbase cluster is sized in accordance with the working set size and total data you expect.
The goal is to size the RAM available to Couchbase so that all your document IDs, the metadata for those document IDs, and the working set values fit into memory in your cluster, just below the point at which Couchbase will start evicting values to disk (the High Water Mark).
How much memory and disk space per node you will need depends on several different variables, defined below.
Calculations are per bucket
The calculations below are per-bucket calculations and need to be summed across all buckets. If all your buckets have the same configuration, you can treat your total data as a single bucket; there is no per-bucket overhead that needs to be considered.
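A minimal Python sketch of the per-bucket rule, assuming the per-bucket formula from the sizing guideline table in this section and SSD defaults (headroom 0.25, high water mark 0.70, 56 bytes of metadata per document). The function `bucket_ram_quota` and the bucket names and sizes are hypothetical, chosen only for illustration. It sums the per-bucket requirements and checks that two identically configured buckets need the same RAM as one bucket holding all of their documents.

```python
import math


def bucket_ram_quota(documents_num: int, id_size: int, value_size: int,
                     number_of_replicas: int, working_set_percentage: float,
                     metadata_per_document: int = 56, headroom: float = 0.25,
                     high_water_mark: float = 0.70) -> float:
    """RAM quota (bytes) one bucket needs; defaults assume Couchbase 2.0.2+ on SSD."""
    no_of_copies = 1 + number_of_replicas
    total_metadata = documents_num * (metadata_per_document + id_size) * no_of_copies
    working_set = documents_num * value_size * no_of_copies * working_set_percentage
    return (total_metadata + working_set) * (1 + headroom) / high_water_mark


# Hypothetical buckets with different configurations:
# (documents_num, id_size, value_size, number_of_replicas, working_set_percentage)
buckets = {
    "users": (2_000_000, 40, 2_000, 1, 0.50),
    "sessions": (5_000_000, 36, 500, 1, 1.00),
}

# The cluster must hold the sum of the per-bucket requirements.
cluster_quota = sum(bucket_ram_quota(*params) for params in buckets.values())
print(f"Cluster RAM quota: {cluster_quota / 1e9:.2f} GB")

# Two buckets with the same configuration need the same RAM as one bucket holding
# all of their documents, because the formula has no fixed per-bucket term.
two_buckets = 2 * bucket_ram_quota(1_000_000, 100, 10_000, 1, 0.20)
one_bucket = bucket_ram_quota(2_000_000, 100, 10_000, 1, 0.20)
print(math.isclose(two_buckets, one_bucket))
```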
Table 4.1. Deployment — Sizing — Input Variables
| Variable | Description |
|---|---|
| documents_num | The total number of documents you expect to store |
| ID_size | The average size of document IDs, in bytes |
| value_size | The average size of values, in bytes |
| number_of_replicas | The number of copies of the original data you want to keep |
| working_set_percentage | The percentage of your data you want in memory |
| per_node_ram_quota | The amount of RAM on each node that can be assigned to Couchbase |
The following items are used in calculating the memory required and are assumed to be constants.
Table 4.2. Deployment — Sizing — Constants
| Constant | Description |
|---|---|
| Metadata per document (metadata_per_document) | This is the space that Couchbase needs to keep metadata per document. Prior to Couchbase 2.0.2, it was 64 bytes; as of Couchbase 2.0.2, metadata uses 56 bytes of memory per document. All the metadata for documents needs to live in memory while a node is running and serving data. |
| SSD or Spinning | SSDs give better I/O performance. |
| headroom [a] | Typically 25% (0.25) for SSDs and 30% (0.30) for spinning (traditional) hard disks, as SSDs are faster than spinning disks. |
| High Water Mark (high_water_mark) | By default, it is set at 70% of the memory allocated to the node. |
[a] The headroom is the additional overhead required by the cluster to store metadata about the information being stored. This requires approximately 25-30% more space than the raw RAM requirements for your dataset.
The following is a rough guideline for sizing your cluster:
| Variable | Calculation |
|---|---|
| no_of_copies | 1 + number_of_replicas |
| total_metadata [a] | (documents_num) * (metadata_per_document + ID_size) * (no_of_copies) |
| total_dataset | (documents_num) * (value_size) * (no_of_copies) |
| working_set | total_dataset * (working_set_percentage) |
| Cluster RAM quota required | (total_metadata + working_set) * (1 + headroom) / (high_water_mark) |
| number of nodes | Cluster RAM quota required / per_node_ram_quota, rounded up |
[a] The metadata for all documents needs to live in memory, not just the metadata for documents in the working set.
You will need at least number_of_replicas + 1 nodes, irrespective of your data size.
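The guideline table and the minimum-node rule can be sketched in Python as follows. This is an illustration under stated assumptions, not an official Couchbase tool: `BucketSizing`, `cluster_ram_quota`, and `number_of_nodes` are hypothetical names, all sizes are in bytes, and the headroom and high water mark values are the typical figures from the constants table.

```python
import math
from dataclasses import dataclass

# Headroom by storage type and default high water mark, as given in the constants table.
HEADROOM = {"ssd": 0.25, "spinning": 0.30}
DEFAULT_HIGH_WATER_MARK = 0.70


@dataclass
class BucketSizing:
    """Input variables for one bucket (sizes in bytes)."""
    documents_num: int
    id_size: int
    value_size: int
    number_of_replicas: int
    working_set_percentage: float    # fraction, e.g. 0.20 for 20%
    metadata_per_document: int = 56  # bytes as of Couchbase 2.0.2; 64 before


def cluster_ram_quota(b: BucketSizing, storage: str = "ssd",
                      high_water_mark: float = DEFAULT_HIGH_WATER_MARK) -> float:
    """Cluster RAM quota required (bytes) for one bucket."""
    no_of_copies = 1 + b.number_of_replicas
    # Metadata and IDs for *all* documents must stay in memory.
    total_metadata = b.documents_num * (b.metadata_per_document + b.id_size) * no_of_copies
    total_dataset = b.documents_num * b.value_size * no_of_copies
    working_set = total_dataset * b.working_set_percentage
    return (total_metadata + working_set) * (1 + HEADROOM[storage]) / high_water_mark


def number_of_nodes(quota: float, per_node_ram_quota: float,
                    number_of_replicas: int) -> int:
    """Round up to whole nodes, with a floor of number_of_replicas + 1."""
    by_ram = math.ceil(quota / per_node_ram_quota)
    return max(by_ram, number_of_replicas + 1)


# Usage with the input values from the example calculation in this section.
example = BucketSizing(documents_num=1_000_000, id_size=100, value_size=10_000,
                       number_of_replicas=1, working_set_percentage=0.20,
                       metadata_per_document=120)
quota = cluster_ram_quota(example, storage="ssd")
print(f"{quota:,.0f} bytes -> {number_of_nodes(quota, 6e9, example.number_of_replicas)} nodes")
```

The `max(...)` in `number_of_nodes` applies the replica floor, so even a small dataset is spread over at least `number_of_replicas + 1` nodes.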
Example sizing calculation
Table 4.3. Deployment — Sizing — Input Variables
| Input Variable | Value |
|---|---|
| documents_num | 1,000,000 |
| ID_size | 100 bytes |
| value_size | 10,000 bytes |
| number_of_replicas | 1 |
| working_set_percentage | 20% |
Table 4.4. Deployment — Sizing — Constants
| Constant | Value |
|---|---|
| Type of Storage | SSD |
| headroom | 25% |
| metadata_per_document | 120 bytes |
| high_water_mark | 70% |
Table 4.5. Deployment — Sizing — Variable Calculations
| Variable | Calculation |
|---|---|
| no_of_copies | = 2 [a] |
| total_metadata | = 1,000,000 * (100 + 120) * (2) = 440,000,000 |
| total_dataset | = 1,000,000 * (10,000) * (2) = 20,000,000,000 |
| working_set | = 20,000,000,000 * (0.2) = 4,000,000,000 |
| Cluster RAM quota required | = (440,000,000 + 4,000,000,000) * (1 + 0.25) / (0.7) ≈ 7,928,571,429 |
[a] One for the original and one for the replica.
For example, if you have 8 GB machines and you want to use 6 GB of each for Couchbase:
number of nodes = Cluster RAM quota required / per_node_ram_quota = 7.9 GB / 6 GB ≈ 1.3, rounded up to 2 nodes (which also meets the minimum of number_of_replicas + 1 = 2 nodes)
RAM quota
You will not be able to allocate all of your machine's RAM to per_node_ram_quota, as other programs may also be running on the machine.