A vBucket is defined as the owner of a subset of the key space of a Couchbase cluster. These vBuckets are used to allow information to be distributed effectively across the cluster. The vBucket system is used both for distributing data, and for supporting replicas (copies of bucket data) on more than one node.
Clients access the information stored in a bucket by communicating directly with the node response for the corresponding vBucket. This direct access enables clients to communicate with the node storing the data, rather than using a proxy or redistribution architecture. The result is abstracting the physical toplogy from the logical partitioning of data. This architecture is what gives Coucbase Server the elasticity.
This architecture differs from the method used by memcached, which uses client-side key hashes to determine the server from a defined list. This requires active management of the list of servers, and specific hashing algorithms such as Ketama to cope with changes to the topology. The structure is also more flexible and able to cope with changes than the typical sharding arrangement used in an RDBMS environment.
vBuckets are not a user-accessible component, but they are a critical component of Couchbase Server and are vital to the availability support and the elastic nature.
Every document ID belongs to a vBucket. A mapping function is used to calculate the vBucket in which a given document belongs. In Couchbase Server, that mapping function is a hashing function that takes a document ID as input and outputs a vBucket identifier. Once the vBucket identifier has been computed, a table is consulted to lookup the server that "hosts" that vBucket. The table contains one row per vBucket, pairing the vBucket to its hosting server. A server appearing in this table can be (and usually is) responsible for multiple vBuckets.
The diagram below shows how the Key to Server mapping (vBucket
map) works. There are three servers in the cluster. A client
wants to look up (get) the value of KEY. The
client first hashes the key to calculate the vBucket which owns
KEY. In this example, the hash resolves to vBucket 8
(vB8) By examining the vBucket map, the
client determines Server C hosts vB8. The get
operation is sent directly to Server C.
After some period of time, there is a need to add a server to the cluster. A new node, Server D is added to the cluster and the vBucket Map is updated.
The vBucket map is updated during the rebalance operation; the updated map is then sent the cluster to all the cluster participants, including the other nodes, any connected "smart" clients, and the Moxi proxy service.
Within the new four-node cluster model, when a client again
wants to get the value of KEY, the hashing
algorithm will still resolve to vBucket 8
(vB8). The new vBucket map however now maps
that vBucket to Server D. The client now communicates directly
with Server D to obtain the information.