Membase Manual 1.7

4.5.1. Sizing Guidelines

Working Set

Before deciding how much memory the cluster needs, you should understand the concept of a 'working set'. The working set at any point in time is the data that your application actively uses. Ideally, your entire working set should live in memory.

Memory quota

It is very important that a Membase cluster is sized in accordance with the working set size and total data you expect.

The goal is to size the RAM available to Membase so that all your keys, the key metadata, and the working set values fit into memory in your cluster, just below the point at which Membase starts evicting values to disk (the High Water Mark).

How much memory and disk space per node you will need depends on several different variables, defined below.

Calculations are per bucket

The calculations below are per bucket and need to be summed across all buckets. If all your buckets have the same configuration, you can treat your total data as a single bucket; there is no per-bucket overhead to consider.
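As a minimal sketch of the per-bucket rule, the following applies the RAM quota formula from Table 4.13 to each bucket and sums the results. The function name `bucket_ram_quota` and the bucket sizes are hypothetical, chosen only for illustration; the default constants are the values listed in Table 4.12 for SSD storage.

```python
def bucket_ram_quota(keys_num, key_size, value_size, number_of_replicas,
                     working_set_percentage, headroom_percentage=0.25,
                     high_water_mark_percentage=0.70, metadata_per_key=120):
    """RAM quota in bytes for one bucket, following Table 4.13."""
    no_of_copies = 1 + number_of_replicas
    total_metadata = keys_num * (metadata_per_key + key_size) * no_of_copies
    working_set = keys_num * value_size * no_of_copies * working_set_percentage
    return ((total_metadata + working_set)
            * (1 + headroom_percentage) / high_water_mark_percentage)


# Hypothetical buckets with different configurations: size each one, then sum.
buckets = [
    dict(keys_num=400_000, key_size=100, value_size=10_000,
         number_of_replicas=1, working_set_percentage=0.20),
    dict(keys_num=2_000_000, key_size=40, value_size=500,
         number_of_replicas=2, working_set_percentage=0.50),
]
cluster_total = sum(bucket_ram_quota(**b) for b in buckets)
print(f"Cluster RAM quota across buckets: {cluster_total:,.0f} bytes")
```

Because the formula is linear in keys_num, two buckets with identical settings give the same total as one bucket holding the combined number of keys, which is why identically configured buckets can be treated as one.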

Inputs

Table 4.11. Input Variables

| Variable | Description |
| --- | --- |
| keys_num | The total number of keys you expect in your working set |
| key_size | The average size of keys |
| value_size | The average size of values |
| number_of_replicas | Number of copies of the original data you want to keep |
| working_set_percentage | The percentage of your data you want in memory |
| per_node_ram_quota | How much RAM can be assigned to Membase |

Constants

The following items are used in calculating the memory required and are treated as constants.

Table 4.12. Constants

| Constant | Description |
| --- | --- |
| Metadata per key (metadata_per_key) | The space that Membase needs to keep metadata per key: 120 bytes. All the keys and their metadata need to live in memory at all times |
| SSD or Spinning | SSDs give better I/O performance |
| headroom_percentage | Typically 25% for SSD and 30% for spinning disks, as SSDs are faster than spinning disks |
| High Water Mark percentage (high_water_mark_percentage) | By default, it is set at 70% of the memory allocated to the node |

The working set size is the portion of your total data that you want in memory. The following calculations are a rough guideline for sizing your cluster:

Table 4.13. Variables

| Variable | Calculation | Comments |
| --- | --- | --- |
| no_of_copies | = 1 + number_of_replicas | |
| total_metadata | = (keys_num) * (metadata_per_key + key_size) * (no_of_copies) | All the keys need to live in memory |
| total_dataset | = (keys_num) * (value_size) * (no_of_copies) | |
| working_set | = total_dataset * (working_set_percentage) | |
| Cluster RAM quota required | = (total_metadata + working_set) * (1 + headroom_percentage) / (high_water_mark_percentage) | |
| number of nodes | = Cluster RAM quota required / per_node_ram_quota | |
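The calculations in Table 4.13 can be expressed as a short function. This is a sketch rather than an official tool: the function name `cluster_ram_quota` is an assumption, sizes are taken to be in bytes, and percentages are passed as fractions (20% as 0.20). The defaults are the SSD headroom, the default High Water Mark, and the per-key metadata size from Table 4.12.

```python
def cluster_ram_quota(keys_num, key_size, value_size, number_of_replicas,
                      working_set_percentage, headroom_percentage=0.25,
                      high_water_mark_percentage=0.70, metadata_per_key=120):
    """Return the cluster RAM quota required (bytes) for a single bucket."""
    # One copy for the original data plus one per replica.
    no_of_copies = 1 + number_of_replicas
    # Every key and its metadata must stay in memory.
    total_metadata = keys_num * (metadata_per_key + key_size) * no_of_copies
    # All values, including replica copies.
    total_dataset = keys_num * value_size * no_of_copies
    # Only the working-set share of the values needs to fit in RAM.
    working_set = total_dataset * working_set_percentage
    # Add headroom, then scale so the result sits at the High Water Mark.
    return ((total_metadata + working_set)
            * (1 + headroom_percentage) / high_water_mark_percentage)
```

For spinning disks, pass headroom_percentage=0.30 as suggested in Table 4.12.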


Number of Nodes

You will need at least number_of_replicas + 1 nodes, irrespective of your data size.
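A minimal sketch of how the node count might be derived, combining the RAM-based estimate with the minimum of number_of_replicas + 1. The function name `number_of_nodes` is hypothetical, and rounding the RAM-based estimate up to a whole node is an assumption (you cannot deploy a fraction of a node).

```python
import math


def number_of_nodes(cluster_ram_quota, per_node_ram_quota, number_of_replicas):
    """Return the node count: RAM-based estimate, floored at replicas + 1."""
    # Round up: a fractional node still requires a whole machine.
    nodes_for_ram = math.ceil(cluster_ram_quota / per_node_ram_quota)
    # Each replica must live on a different node than the original.
    minimum_nodes = number_of_replicas + 1
    return max(nodes_for_ram, minimum_nodes)


# A small data set with 2 replicas still needs 3 nodes.
print(number_of_nodes(1 * 10**9, 6 * 10**9, number_of_replicas=2))  # prints 3
```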

Example sizing calculation

Table 4.14. Input Variables

| Input Variable | Value |
| --- | --- |
| keys_num | 1,000,000 |
| key_size | 100 bytes |
| value_size | 10,000 bytes |
| number_of_replicas | 1 |
| working_set_percentage | 20% |

Table 4.15. Constants

| Constant | Value |
| --- | --- |
| Type of Storage | SSD |
| headroom_percentage | 25% |
| metadata_per_key | 120 bytes |
| high_water_mark_percentage | 70% |

Table 4.16. Variable Calculations

| Variable | Calculation | Description |
| --- | --- | --- |
| no_of_copies | = 2 | 1 for original and 1 for replica |
| total_metadata | = 1,000,000 * (100 + 120) * (2) = 440,000,000 | |
| total_dataset | = 1,000,000 * (10,000) * (2) = 20,000,000,000 | |
| working_set | = 20,000,000,000 * (0.2) = 4,000,000,000 | |
| Cluster RAM quota required | = (440,000,000 + 4,000,000,000) * (1 + 0.25) / (0.7) ≈ 7,928,571,429 | About 7.9 GB |

If you have 8 GB machines and want to use 6 GB of each for Membase:

number of nodes = Cluster RAM quota required / per_node_ram_quota = 7.9 GB / 6 GB ≈ 1.3, which rounds up to 2 nodes
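The example can be checked with a few lines of arithmetic. This sketch assumes that sizes are in bytes, that 1 GB means 10^9 bytes (which matches the 7.9 GB figure above), and that the node count is rounded up and never falls below number_of_replicas + 1.

```python
import math

# Inputs and constants from Tables 4.14 and 4.15.
keys_num = 1_000_000
key_size = 100
value_size = 10_000
number_of_replicas = 1
working_set_percentage = 0.20
headroom_percentage = 0.25          # SSD
metadata_per_key = 120
high_water_mark_percentage = 0.70
per_node_ram_quota = 6 * 10**9      # 6 GB, assuming decimal gigabytes

no_of_copies = 1 + number_of_replicas
total_metadata = keys_num * (metadata_per_key + key_size) * no_of_copies
total_dataset = keys_num * value_size * no_of_copies
working_set = total_dataset * working_set_percentage
cluster_ram_quota = ((total_metadata + working_set)
                     * (1 + headroom_percentage) / high_water_mark_percentage)
nodes = max(math.ceil(cluster_ram_quota / per_node_ram_quota),
            number_of_replicas + 1)

print(f"total_metadata    = {total_metadata:,}")          # 440,000,000
print(f"total_dataset     = {total_dataset:,}")           # 20,000,000,000
print(f"working_set       = {working_set:,.0f}")          # 4,000,000,000
print(f"cluster_ram_quota = {cluster_ram_quota:,.0f}")    # 7,928,571,429
print(f"number of nodes   = {nodes}")                     # 2
```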

RAM quota

You cannot allocate all of your machine's RAM to per_node_ram_quota, as there may be other programs running on the machine.

Disk space

Disk space is required to persist data. How much disk space you should plan for depends on how your data grows. You will also want to store backup data on the system. A good guideline is to plan for at least 130% of the total data you expect: 100% of this is for data backup and 30% for overhead during file maintenance.
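The 130% guideline can be sketched as a one-line calculation. The function name `disk_space_required` is hypothetical, and whether "total data" should include replica copies is an assumption you need to settle for your own deployment; the example below uses a total of 20,000,000,000 bytes purely for illustration.

```python
def disk_space_required(total_data_bytes, backup_fraction=1.00,
                        maintenance_fraction=0.30):
    """Return a minimum disk estimate in bytes: 100% backup + 30% overhead."""
    return total_data_bytes * (backup_fraction + maintenance_fraction)


total_data = 20_000_000_000  # illustrative value, in bytes
print(f"Plan for at least {disk_space_required(total_data):,.0f} bytes of disk")
```

Since the guideline is a minimum, leave additional room if you expect your data to grow.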