What happens when a node in the cluster goes down?
One cluster with two nodes running a bucket named test. The bucket type is Couchbase. The bucket is set up to have one replica.
I upload 10K documents to the bucket. The cluster now divides the data across the nodes. Node 1: 5K active / 5K replica Node 2: 5K active / 5K replica.
If you remove one node and rebalance the replica of the still active node moves all documents from replica to active. Now the active node has 10K active and 0K replica. This is perfect behavior.
Now lets say I have both my nodes working with node 1 having 5K active / 5K replica and node 2 having 5K active / 5K replica).
Clients are now polling data from the cluster at lets say a rate of 2K gets per second. Then suddenly node 2 goes down. Now when the clients are polling data they will only get a result when the document is part of the 5K active on node 1. I would expect that the clients should get their data but with a little delay since it now needs to also get data from node one's replica documents.
What happens now (at least when using the C# API) is that when one node goes down half of the data is unavailable to the clients until a rebalance is done. If you have lets say 1M documents you will have about 500K documents unavailable to the clients until rebalance is done.
Shouldn't the cluster always return the same result as long as the data exist in active or replica?
Instead of a rebalance when a node goes down, what you really want to do first is failover. Rebalance will accomplish the same thing in the end, but failover will be much quicker.
When a node goes down, it's portion of the dataset becomes unavailable. All the other nodes will continue serving their data. At some point, there needs to be an explicit trigger to the rest of the cluster to make the replica vbuckets active for those that went down. This trigger can be either manual (by clicking the button), automatic (by using our auto-failover feature) or programmatic (by an external script triggering it). Manual will take as long as it takes for an admin to get to, automatic will take a minimum of 30 seconds (to ensure that the node is actually down) and programmatic can happen as quickly as an external system determines that the node is down. Once triggered, the failover is immediate. http://www.couchbase.org/wiki/display/membase/Failover+Best+Practices
We don't currently allow reading from replicas in order to enforce the very strong consistency that memcached/Membase/Couchbase provide.
Hope that helps explain things.