vBucket: controlling the hashing algorithm
Hi, I have been trying to wrap my head around how vBuckets work and, more importantly, how we can manage them.
The hashing function used by Couchbase Server to map document IDs to vBuckets is configurable - both the hashing algorithm and the output space (i.e. the total number of vBuckets output by the function). Naturally, if the number of vBuckets in the output space of the hash function is changed, then the table which maps vBuckets to Servers must be resized.
The good thing is that, of course, we do not have to worry about where the data goes. However, we are trying to figure out how to change the algorithm so that a given user's data set always goes to only one of the n servers, instead of being spread across the whole cluster. We figure this should help us limit the potential impact of a node going down, i.e. it would affect a fixed percentage of our users instead of (potentially) impacting all of them.
Is there any way this could be done at all, or am I misunderstanding the level of control we have over hashing for vBuckets?
The clients have full control over which vBucket an item goes to. This means you can use any hashing algorithm you like, or you can come up with your own scheme. Having your own scheme, however, would require you to modify our clients, since they only support popular hashing algorithms.
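To make the idea concrete, here is a rough sketch of what such a custom scheme could look like. It is not taken from any Couchbase client: the key format (`user::doc`), the vBucket count, the node names, and the helpers `user_of`, `vbucket_for`, and `server_for` are all made up for illustration, and CRC32 is just a convenient stand-in hash. The point is only that hashing the user part of the key, rather than the whole key, sends every document of a user to the same vBucket and therefore to the same server.

```python
import zlib

# Assumptions for illustration only: vBucket count and node names are hypothetical.
NUM_VBUCKETS = 1024
SERVERS = ["node-a", "node-b", "node-c", "node-d"]

# Toy vBucket -> server table: contiguous ranges of vBuckets per server.
VBUCKET_MAP = [SERVERS[vb * len(SERVERS) // NUM_VBUCKETS]
               for vb in range(NUM_VBUCKETS)]


def user_of(key, sep="::"):
    """Return the user prefix of a key such as 'alice::orders'.

    A key without the separator is treated as its own prefix.
    """
    return key.split(sep, 1)[0]


def vbucket_for(key):
    """Hash only the user prefix, so all of a user's keys share a vBucket."""
    digest = zlib.crc32(user_of(key).encode("utf-8")) & 0xFFFFFFFF
    return digest % NUM_VBUCKETS


def server_for(key):
    """Look up the server that owns the key's vBucket in the toy table."""
    return VBUCKET_MAP[vbucket_for(key)]


if __name__ == "__main__":
    for key in ("alice::profile", "alice::orders", "bob::profile"):
        print(key, vbucket_for(key), server_for(key))
```

One trade-off to keep in mind with a scheme like this: because every document of a user lands in a single vBucket, a few very large users can make the distribution across servers uneven, which plain per-key hashing would avoid.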