I am not an expert on EC2, but in general, all nodes of a cluster should be in the same data center, because cluster replication should be as fast as possible.
I do not know whether anybody has experience with the topology you describe.
Does anybody have experience running a Couchbase cluster across different zones within the same region? I.e., is it OK, in terms of latency and performance, to run a 3-node cluster across eu-west-1a, 1b and 1c (as opposed to having all 3 nodes in 1b)? The guide says to set them up in different regions, which makes sense as latency is high across regions, but it says nothing about whether there are any cross-zone issues.
I'd appreciate any thoughts on this.
We have exactly the same cluster you've described. Latency between 2 different zones in eu-west is approximately twice as high as within the same zone: ~0.4-0.5 ms within a zone versus ~0.9-1 ms between 2 zones.
Here are some ping stats from our running cluster. Note that there are occasional ping spikes of up to ~50 ms, which push the average up:
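To make "roughly twice as high" concrete, here is a back-of-the-envelope calculation. Taking the midpoint of each quoted range is my own simplification; the ranges themselves are the ones reported in this reply. The difference works out to about 0.5 ms of extra round-trip time per cross-zone hop, which is the ~500µs overhead mentioned later in the thread.

```python
# Latency ranges quoted in this reply, in milliseconds (eu-west).
SAME_ZONE_MS = (0.4, 0.5)
CROSS_ZONE_MS = (0.9, 1.0)

def midpoint(r):
    """Midpoint of a (low, high) range; a simplifying assumption."""
    return (r[0] + r[1]) / 2

same = midpoint(SAME_ZONE_MS)    # ~0.45 ms
cross = midpoint(CROSS_ZONE_MS)  # ~0.95 ms

ratio = cross / same                   # roughly 2x
overhead_us = (cross - same) * 1000    # extra round-trip time, in microseconds

print(f"cross/same ratio: {ratio:.1f}x")
print(f"extra RTT per cross-zone hop: ~{overhead_us:.0f} us")
```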
B2A: rtt min/avg/max/mdev = 0.734/4.631/41.800/8.644 ms
B2B-1: rtt min/avg/max/mdev = 0.414/3.701/23.573/6.559 ms
B2B-2: rtt min/avg/max/mdev = 0.337/0.597/4.140/0.667 ms
B2C: rtt min/avg/max/mdev = 0.901/4.489/44.032/9.523 ms
B2B-2 is a ping from a Couchbase cluster node to a non-Couchbase node in zone b.
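Because the averages above are inflated by spikes, the min values are a better proxy for baseline latency. A minimal sketch, assuming the summary lines come from Linux iputils `ping` (which prints `rtt min/avg/max/mdev = ... ms`), that extracts the four numbers and compares avg against min; `parse_rtt` is a hypothetical helper, not part of any tool.

```python
import re

# Matches the summary line printed by Linux iputils ping, e.g.
# "rtt min/avg/max/mdev = 0.734/4.631/41.800/8.644 ms"
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)

def parse_rtt(line):
    """Return a dict with min/avg/max/mdev (in ms) from a ping summary line."""
    m = RTT_RE.search(line)
    if m is None:
        raise ValueError(f"not a ping summary line: {line!r}")
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, (float(g) for g in m.groups())))

# The stats quoted above, keyed by the labels used in the thread.
SAMPLES = {
    "B2A": "rtt min/avg/max/mdev = 0.734/4.631/41.800/8.644 ms",
    "B2B-1": "rtt min/avg/max/mdev = 0.414/3.701/23.573/6.559 ms",
    "B2B-2": "rtt min/avg/max/mdev = 0.337/0.597/4.140/0.667 ms",
    "B2C": "rtt min/avg/max/mdev = 0.901/4.489/44.032/9.523 ms",
}

for label, line in SAMPLES.items():
    s = parse_rtt(line)
    # A large avg/min ratio means occasional spikes dominate the mean.
    print(f"{label:6} min={s['min']:.3f} ms  avg/min={s['avg'] / s['min']:.1f}x")
```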
Also, we use spot instances, and the spot price can differ between zones, so if the price in one zone jumps much higher, we won't lose all our nodes at the same time.
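The placement argument can be sketched as follows, under the simplifying assumption that a price spike in one zone reclaims every spot instance in that zone at once. The node names and both helpers are hypothetical; the point is only that spreading 3 nodes over 3 AZs caps the worst-case loss at one node.

```python
from collections import Counter

def spread_round_robin(nodes, zones):
    """Assign nodes to zones in round-robin order (hypothetical placement helper)."""
    return {node: zones[i % len(zones)] for i, node in enumerate(nodes)}

def worst_case_loss(placement):
    """Most nodes lost if every instance in a single zone is reclaimed at once."""
    return max(Counter(placement.values()).values())

nodes = ["cb1", "cb2", "cb3"]
spread = spread_round_robin(nodes, ["eu-west-1a", "eu-west-1b", "eu-west-1c"])
packed = spread_round_robin(nodes, ["eu-west-1b"])

print("spread across 3 AZs, worst case lost:", worst_case_loss(spread))
print("all in one AZ, worst case lost:", worst_case_loss(packed))
```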
Quick answer: Yes, it’s ok to have a 3-node cluster running in eu-west-1a and 1b and 1c.
Same observation as Skellla regarding latency across AZs (in the EU, at least).
We ran a 3-node Couchbase cluster with each node in a different AZ (EU 1a, 1b, 1c) for 3 months and had no performance or stability problems.
The cluster held 4 buckets with a few million items.
Each bucket had 1 replica configured, and we averaged about 400 ops/sec on Couchbase, with peaks of up to 5K ops/sec.
We had a roughly 50/50 get/set ratio, and the vb_active_num and vb_replica_num stats always reported the same number of items.
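A rough estimate of how much replication traffic those figures imply, assuming every set is copied to each replica once and gets generate no replication traffic. With one node per AZ, the replica of a vBucket always lives on another node, so every one of these copies crosses a zone boundary. The function name is hypothetical.

```python
def replica_writes_per_sec(ops_per_sec, set_fraction, replicas):
    """Replica copies shipped to other nodes each second.

    Assumes every set is replicated `replicas` times and gets do not
    generate replication traffic.
    """
    return ops_per_sec * set_fraction * replicas

# Figures quoted above: ~400 ops/s average, 5K ops/s peak,
# ~50/50 get/set, 1 replica per bucket.
average = replica_writes_per_sec(400, 0.5, 1)
peak = replica_writes_per_sec(5000, 0.5, 1)

# With one node per AZ, all of these copies go to a node in another AZ.
print(f"cross-AZ replica writes/s: average ~{average:.0f}, peak ~{peak:.0f}")
```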
Our application could afford this ~500µs latency overhead, and we were very happy and comfortable splitting both Couchbase nodes and application servers across AZs (with ELB spreading the HTTP traffic across all app nodes/AZs).
PS: We are still very happy with the flexibility and agility AWS provides and all the great services they offer, but once we had some insight into the traffic we would have to handle, the memory/cost ratio on EC2 led us to buy some cheap servers full of RAM and move to a more conventional hosting provider.
Edit: Couchbase is “eventually persisted”, both from memory to disk and from active vBuckets to replica vBuckets.
The way we see it: “don't use Couchbase” if you can't afford to lose a single bit of data in any situation (server crash, power outage, network failure, etc.), because Couchbase acknowledges the write to your app as soon as it hits memory, before it is persisted to disk or replicated to another node.
If adding ~500µs to the “eventually persisted” replica is not acceptable, maybe Couchbase is not a good fit.
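The "acknowledge on memory, replicate later" behaviour described in the edit can be illustrated with a toy model. This is a hypothetical simplification for illustration only, not Couchbase's implementation or API: a write is acknowledged once it is in memory, and a crash before the pending queue is drained loses writes that the client already saw as successful.

```python
from collections import deque

class ToyNode:
    """Toy 'ack on memory' store. NOT Couchbase's implementation.

    A write is acknowledged as soon as it is in memory; copying it to the
    replica (standing in for disk or another node) happens later, in drain().
    """

    def __init__(self):
        self.memory = {}
        self.replica = {}       # stands in for the replica node / disk
        self.pending = deque()  # writes acknowledged but not yet replicated

    def set(self, key, value):
        self.memory[key] = value
        self.pending.append((key, value))
        return "ok"             # acknowledged before replication

    def drain(self, n=None):
        """Replicate up to n pending writes (all of them if n is None)."""
        count = len(self.pending) if n is None else min(n, len(self.pending))
        for _ in range(count):
            key, value = self.pending.popleft()
            self.replica[key] = value

    def crash(self):
        """Lose memory and the unreplicated queue; return what survives."""
        self.memory.clear()
        self.pending.clear()
        return dict(self.replica)

node = ToyNode()
node.set("a", 1)
node.set("b", 2)
node.drain(1)              # only "a" has been replicated so far
survivors = node.crash()
print(survivors)           # {'a': 1}: "b" was acknowledged but is gone
```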