Failover limitations (30 secs, 3 nodes)
Hello. I've read http://www.couchbase.org/wiki/display/membase/Failover+Best+Practices and I'm a bit confused about the 30-second and 3-node limitations. I understand the reasons for them, but the limits look too strict, and unfortunately they make it impossible to use Membase (Couchbase) for our project. Can we expect these limits to be removed in future releases? Is it possible to get some kind of patch to override them if we buy commercial support?
For some projects a 30-second failover time is fine; for others it is not. I understand it is set this high because you are afraid of "false positive" failure detection.
Why don't you just make this parameter configurable? Let end users decide their own trade-off between the maximum service outage time and the rate of false-positive failure detection.
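To make the trade-off concrete, here is a minimal sketch of a heartbeat-based failure detector with a configurable timeout. It is not Membase's actual detector; the class name `HeartbeatFailureDetector` and its methods are hypothetical, chosen only to illustrate how one timeout value controls both outage length and false-positive risk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatFailureDetector:
    """Hypothetical sketch: a node is declared failed once no heartbeat
    has arrived for `timeout` seconds.

    `timeout` is the knob the post asks to expose. A small value shortens
    the outage before failover, but makes it more likely that a brief pause
    (GC, network hiccup) is mistaken for a dead node.
    """
    timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the most recent time we heard from `node`.
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> List[str]:
        # A node is suspected failed if its last heartbeat is older than the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)
```

With `timeout=30.0`, a node that goes quiet for 5 seconds and then recovers is never flagged; with `timeout=2.0`, the same pause triggers a failover, which is a false positive if the node comes back.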
Another limitation is the minimum of 3 nodes in the cluster. First of all, split brain is quite rare if the cluster is located in one subnet, especially if DualLAN is used between the hosts. Second, why are you so afraid of inconsistency? Why not just provide a mechanism for merging the two data sets using a conflict resolution policy, as is done in Oracle TimesTen?
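One common conflict resolution policy is last-write-wins. The sketch below shows what such a merge could look like after a split brain heals. It is an assumed illustration, not the policy Oracle TimesTen or Couchbase actually uses, and the entry layout `(value, timestamp, node_id)` is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical entry layout: key -> (value, timestamp, node_id).
Entry = Tuple[str, float, str]


def merge_last_write_wins(left: Dict[str, Entry], right: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge two diverged copies of a data set using last-write-wins.

    For a key present on both sides, keep the entry with the newer timestamp;
    break exact ties by node_id so every node reaches the same result no matter
    which side it merges from. Keys present on only one side are kept as-is.
    """
    merged: Dict[str, Entry] = dict(left)
    for key, entry in right.items():
        if key not in merged:
            merged[key] = entry
            continue
        current = merged[key]
        # Compare (timestamp, node_id) so the choice is deterministic on ties.
        if (entry[1], entry[2]) > (current[1], current[2]):
            merged[key] = entry
    return merged
```

Note that last-write-wins silently discards the losing write, and this sketch does not handle deletes (a key deleted on one side simply reappears from the other). Those lost or resurrected updates are exactly the kind of inconsistency a merge policy has to accept.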
Due to the points above, we are thinking of moving from Membase to a "repcached + keepalived" bundle. In that setup we have pairs of mutually replicated hosts; each pair has one virtual IP, assigned to the first host of the pair and controlled by keepalived. The application configures its repcached client with the set of virtual IPs. This solution doesn't have either of the two limitations above. Its only drawback is the lack of persistence, which is not a requirement for the current project.
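The client side of that layout can be sketched as follows: the application only knows the virtual IPs and maps each key to one of them, while keepalived handles moving a VIP to the surviving host of its pair. The class `VipSelector` and the addresses in the usage line are hypothetical, and simple modulo hashing is used for brevity; real memcached-protocol clients often use consistent hashing instead.

```python
import zlib
from typing import List


class VipSelector:
    """Hypothetical sketch: pick the virtual IP that owns a key.

    Each VIP fronts one pair of mutually replicating hosts. If the active
    host of a pair dies, keepalived moves the VIP to its partner, so the
    client never needs to know about individual hosts.
    """

    def __init__(self, vips: List[str]) -> None:
        if not vips:
            raise ValueError("at least one virtual IP is required")
        self.vips = list(vips)

    def server_for(self, key: str) -> str:
        # crc32 is stable across processes (unlike Python's built-in hash()),
        # so every client maps the same key to the same pair.
        return self.vips[zlib.crc32(key.encode("utf-8")) % len(self.vips)]
```

For example, `VipSelector(["10.0.0.100:11211", "10.0.0.101:11211"]).server_for("user:42")` always returns the same address for that key. One consequence of modulo hashing is that adding a new pair remaps most keys, which is why consistent hashing is the usual choice when the set of servers changes.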
(Sorry for splitting the message into two parts; otherwise the spam filters would have blocked me.)
Your points are good, and we plan to enhance the auto-failover in future releases.
Look at it from another angle though.
Dual network cards reduce the likelihood of split brain, I agree, but then we would need to tell people to use dual network cards and to build for network reliability. Couchbase is trying to make reasonably complex things (clustering, auto-failover) simple. Adding requirements beyond the system simply having a network connection makes it less simple, and the repcached deployment you describe isn't as simple either. Making a UI widget that makes the failover time tunable is relatively simple. Making sure the admin understands the consequences of setting it to 2 seconds before they do so is really, really hard. We've always aimed to keep user data consistent, which makes for an easier system to program against.
Also, three nodes is hardly a large number. The cost structure is usually better with a larger number of smaller-memory nodes anyway, so I really don't see three nodes as a limitation.
Will we enhance things? Yep. Is it ideal for all situations right now? Possibly not. Does it meet our goal of making it really simple to set up a highly reliable cluster right now? Absolutely.
The time to declare failure and the node limitations are really pretty good and comparable with high-end HA clustered RDBMSs. The three-node limitation is there because with a net-split and two nodes, each node could become its own cluster, leading to inconsistencies across the whole data set.
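The general principle behind a three-node minimum can be illustrated with a strict-majority quorum check. This is an assumed illustration of the standard technique, not a description of Couchbase's internal logic; `has_quorum` is a hypothetical helper.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """Hypothetical helper: return True if a partition may keep acting as the cluster.

    A partition needs a strict majority of the configured nodes (counting
    itself). At most one side of any network split can hold a strict
    majority, so two sides can never both carry on independently.
    """
    return reachable_nodes > cluster_size // 2
```

With two nodes split 1/1, `has_quorum(1, 2)` is `False` on both sides: enforcing quorum means neither side can take over, and ignoring it means both do, which is the split-brain case described above. With three nodes split 2/1, `has_quorum(2, 3)` is `True` and `has_quorum(1, 3)` is `False`, so exactly one side keeps serving.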
You should probably contact us directly over the phone so we can talk through your specific project to see if there's a simple approach. We've worked with a number of projects with very high availability requirements.