Sudden server crash: how to avoid data loss?
Hi, I just started playing with Couchbase, and I very much like what I see and how simple it is to set up a cluster.
I am having trouble understanding what happens, and what I should do, if one of the nodes crashes suddenly and neither the node nor its disk can be brought back up. (It is very clear what to do when adding or removing nodes with rebalancing, but I found less information on the sudden, involuntary loss of a node.) I read the following:
"Each vbucket is then assigned to given server within the cluster. With 256 vbuckets and one server they would all be on the same server. With 2 servers, they would be shared equally across the two with 128 buckets each, four servers would be assigned 64 each and so on."
Say I have three nodes in a cluster. Will the asynchronous disk write persist the same data to the disks of ALL 3 nodes (since it seems each node holds different data in RAM)? If yes, I can see that the 2 remaining nodes could serve what node 3 had, just from disk instead of RAM (at least for the first request for each item, I assume).
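To make the layout concrete, here is a toy Python sketch of how I understand the quoted passage: a key is hashed to one of 256 vBuckets, and each vBucket is owned by exactly one server, so with three nodes each one holds roughly a third of the data. The CRC32-based hash, the round-robin placement and the node names are my own assumptions for illustration, not necessarily what Couchbase does internally.

```python
from collections import Counter
import zlib

NUM_VBUCKETS = 256  # count taken from the quoted passage


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    # Assumed CRC32-based key hash; a real client library may differ.
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def assign_vbuckets(servers: list[str], num_vbuckets: int = NUM_VBUCKETS) -> dict[int, str]:
    # Hypothetical round-robin placement: vBucket i goes to servers[i % N].
    return {vb: servers[vb % len(servers)] for vb in range(num_vbuckets)}


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        servers = [f"node{i}" for i in range(1, n + 1)]
        share = Counter(assign_vbuckets(servers).values())
        print(n, dict(share))  # 256; 128 each; 86/85/85; 64 each
    # Without replicas, each key is owned by exactly one server.
    servers = ["node1", "node2", "node3"]
    vb = vbucket_for_key("user::42")
    print("user::42 -> vBucket", vb, "on", assign_vbuckets(servers)[vb])
```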
If not, how can I avoid losing data (we write data every second that we cannot afford to lose)? By using replicas? I ask because I also found this:
"When a machine in the cluster has crashed, the leader will detect that and notify member machines in the cluster that all vBuckets hosted in the crashed machine is dead. After getting this signal, machines hosting the corresponding vBucket replica will set the vBucket status as “active”. The vBucket/server map is updated and eventually propagated to the client lib."
But as I read the failover chapter, the replica is only used once a failover (automatic, manual, or programmatic) has actually taken place, and, as I read it, with 3+ nodes I should have one replica per bucket, right? And afterwards I should of course rebalance.
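Here is a toy Python sketch of how I read the quoted failover step together with the one-replica setup: each vBucket has a chain whose first node holds the active copy and whose next node holds the replica, and failing over a node drops it from every chain, promoting the replica where needed. The round-robin layout and the `build_map`/`fail_over` helpers are hypothetical, not Couchbase's API. If the model is roughly right, each item is written to 2 of the 3 nodes (not all 3), and a single crash loses nothing only when there is at least one replica.

```python
from collections import Counter


def build_map(servers: list[str], num_vbuckets: int = 256, num_replicas: int = 1) -> dict[int, list[str]]:
    """Hypothetical layout: chain[0] is the active copy, chain[1:] are replicas."""
    n = len(servers)
    if num_replicas >= n:
        raise ValueError("each copy of a vBucket needs its own node")
    return {
        vb: [servers[(vb + i) % n] for i in range(num_replicas + 1)]
        for vb in range(num_vbuckets)
    }


def fail_over(vb_map: dict[int, list[str]], failed: str) -> tuple[dict[int, list[str]], list[int]]:
    """Drop `failed` from every chain, promoting the next copy to active.

    Returns the new map and the vBuckets that had no surviving copy.
    """
    new_map: dict[int, list[str]] = {}
    lost: list[int] = []
    for vb, chain in vb_map.items():
        survivors = [node for node in chain if node != failed]
        if survivors:
            new_map[vb] = survivors
        else:
            lost.append(vb)
    return new_map, lost


if __name__ == "__main__":
    servers = ["node1", "node2", "node3"]
    with_replica = build_map(servers, num_replicas=1)
    # Copies per node (active + replica): roughly 2/3 of all vBuckets each.
    print(Counter(node for chain in with_replica.values() for node in chain))
    print(len(fail_over(with_replica, "node3")[1]))  # 0 vBuckets lost

    no_replica = build_map(servers, num_replicas=0)
    print(len(fail_over(no_replica, "node3")[1]))  # 85 vBuckets lost
```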
To sum up: what should my setup be so that the cluster tolerates a node crash without losing any data at all, with the remaining nodes instantly serving everything until a replacement has been launched and rebalanced? Will this happen automatically with a regular 3-node cluster and 1 replica, or does a failover have to happen before the data from the 3rd node can be accessed? (As I read it, there are several things to consider carefully before enabling auto-failover.)
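To make the last question concrete, here is a toy model of the two states I am asking about. It assumes (my assumption, and exactly what I would like confirmed) that clients read only from the active copy of each vBucket until a failover updates the map; the tiny map and the `readable_vbuckets` helper are hypothetical.

```python
def readable_vbuckets(vb_map: dict[int, list[str]], down: str, failed_over: bool) -> set[int]:
    """vBuckets a client can read if it only talks to the active copy (chain[0]).

    Before failover the map still names `down` as active for some vBuckets,
    so those reads fail; failover drops `down` and promotes the next copy.
    """
    readable = set()
    for vb, chain in vb_map.items():
        if failed_over:
            chain = [node for node in chain if node != down]
        if chain and chain[0] != down:
            readable.add(vb)
    return readable


if __name__ == "__main__":
    # Hypothetical map: 4 vBuckets on 3 nodes, 1 replica each.
    vb_map = {
        0: ["node1", "node2"],
        1: ["node2", "node3"],
        2: ["node3", "node1"],
        3: ["node1", "node3"],
    }
    print(sorted(readable_vbuckets(vb_map, "node1", failed_over=False)))  # [1, 2]
    print(sorted(readable_vbuckets(vb_map, "node1", failed_over=True)))   # [0, 1, 2, 3]
```

In this model, the vBuckets whose active copy was on the crashed node stay unreadable between the crash and the failover, even though their replicas exist.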
I would greatly appreciate any comments on this.