Best practices for using Membase on Amazon EC2
We're looking to deploy a 4-6 node Membase cluster on Amazon EC2 and would like to know the best practices and guidelines beyond what's covered in the "Using Membase in the Cloud" article.
Use of EBS Volumes
We're using Windows Server 2008 for the nodes, which, as I understand it, is backed by and runs directly from an attached EBS volume. Am I correct to assume that we won't need to create dedicated EBS volumes to persist the data beyond the lifetime of the instances, as long as we don't terminate the instances manually or allow the auto-scaler to scale down the cluster (both of which would delete the attached EBS volume)?
Determining the number of replicas
Is there a recommended setting for the number of replicas?
If I set the number of replicas to 3 for each of, say, 10 Membase buckets in a 6-node cluster, how will those replicas be distributed among the 6 available nodes, given that there's no way to select the replica nodes from the web console?
Will the same nodes hold the replicas for all 10 buckets, or will each bucket have a different set of replica nodes?
If one of the nodes holding a replica for one of the buckets is removed from the cluster, will another node take over responsibility for that replica, thereby maintaining the number of replicas in the cluster? Or will the number of replicas for that bucket be reduced by 1?
Conversely, what happens when the removed node is added back into the cluster?
Is it possible to increase the number of replicas for an existing bucket?
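To make the distribution question concrete, here's a toy sketch of how we currently picture per-vbucket replica placement. To be clear, this is our own round-robin model, not Membase's actual vbucket map algorithm, and `assign_vbuckets` is a made-up name:

```python
# Toy illustration of per-vbucket replica placement (NOT Membase's real
# algorithm): each vbucket gets one active copy plus `replicas` replica
# copies, all on distinct nodes, spread round-robin across the cluster.

def assign_vbuckets(num_nodes, num_vbuckets, replicas):
    if replicas >= num_nodes:
        raise ValueError("need more nodes than replica copies")
    vbucket_map = []
    for vb in range(num_vbuckets):
        # Active copy and its replicas land on consecutive, distinct nodes.
        chain = [(vb + i) % num_nodes for i in range(replicas + 1)]
        vbucket_map.append(chain)
    return vbucket_map

# 6 nodes, 16 vbuckets (real clusters use many more), 3 replicas per item.
vb_map = assign_vbuckets(num_nodes=6, num_vbuckets=16, replicas=3)
print(vb_map[0])  # [0, 1, 2, 3]: active on node 0, replicas on nodes 1-3
```

If placement works per vbucket like this, then every node would end up holding a mix of active and replica data for every bucket, rather than whole nodes being designated as "replica nodes" for a bucket. Is that roughly right?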
Removing a node
Assuming all our buckets are Membase buckets, persisted and replicated across multiple nodes: based on what I read in another thread, the best way to remove a node from the cluster is to 'remove' the server rather than just 'fail over', because removal allows any pending replication to complete.
Is there anything else we should be aware of when scaling down a cluster?
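To check our understanding of the remove-versus-failover distinction, here's a toy model of what we think happens to a single vbucket's replica chain in each case. The function names and the chain model are our own invention, not Membase internals:

```python
# Toy contrast (not Membase code) between failing over a node and
# removing it with a rebalance, for one vbucket's replica chain.

def fail_over(chain, dead_node):
    # Failover just drops the dead node: a replica gets promoted to
    # active, and the chain stays one copy short until a later
    # rebalance repairs it.
    return [n for n in chain if n != dead_node]

def remove_and_rebalance(chain, removed_node, remaining_nodes):
    # Removal followed by a rebalance rebuilds the full chain on the
    # nodes that are left, keeping the replica count intact.
    new_chain = [n for n in chain if n != removed_node]
    for n in remaining_nodes:
        if len(new_chain) == len(chain):
            break
        if n not in new_chain:
            new_chain.append(n)
    return new_chain

chain = [0, 1, 2, 3]        # active on node 0, replicas on nodes 1-3
print(fail_over(chain, 2))  # [0, 1, 3] -> one replica short
print(remove_and_rebalance(chain, 2, [0, 1, 3, 4, 5]))  # [0, 1, 3, 4]
```

If this picture is accurate, it would explain why a graceful remove is preferred when the node is still healthy.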
Backup and Restore
I understand that right now there's no easy way to back up the cluster beyond copying all the .sqlite files from every node. But if all the buckets are replicated, doesn't that mean I only need to keep one copy of the data for each replicated bucket?
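For what it's worth, here's the kind of per-node copy we had in mind (a minimal sketch; the function name and directory arguments are our own, and in practice it would run against each node's actual data directory):

```python
import os
import shutil
import time

def backup_sqlite_files(data_dir, backup_root):
    """Copy every .sqlite file from one node's data directory into a
    timestamped backup directory. Purely illustrative."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_root, stamp)
    os.makedirs(dest, exist_ok=True)
    for name in os.listdir(data_dir):
        if name.endswith(".sqlite"):
            shutil.copy2(os.path.join(data_dir, name), dest)
    return dest
```

The question is whether we'd need to run something like this on all 6 nodes, or whether replication means a copy from a subset of nodes would already cover every bucket.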
Sorry this ended up being such a long question, and many thanks in advance for any insight you might be able to offer!