Server crash etc.
Hi, I just started playing with Couchbase and I very much like what I see and how simple it is to set up a cluster etc.
I am having trouble understanding what happens, and what I should do, if one of the nodes crashes suddenly and cannot be brought back up, disk included (it is very clear what to do when adding/removing nodes with rebalancing, but I found less info on the sudden involuntary removal of a node). I read the following:
"Each vbucket is then assigned to given server within the cluster. With 256 vbuckets and one server they would all be on the same server. With 2 servers, they would be shared equally across the two with 128 buckets each, four servers would be assigned 64 each and so on."
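To make the quoted numbers concrete, here is a minimal sketch of spreading 256 vBuckets across a cluster. The round-robin rule and the function names are assumptions for illustration only; in a real cluster the cluster manager decides the actual vBucket map.

```python
from collections import Counter

NUM_VBUCKETS = 256  # figure taken from the quoted passage


def assign_vbuckets(num_servers, num_vbuckets=NUM_VBUCKETS):
    """Map each vBucket id to a server index, round-robin (illustrative only)."""
    return {vb: vb % num_servers for vb in range(num_vbuckets)}


def vbuckets_per_server(num_servers):
    """Count how many vBuckets each server ends up with."""
    counts = Counter(assign_vbuckets(num_servers).values())
    return [counts[s] for s in range(num_servers)]


for n in (1, 2, 4):
    print(n, "server(s):", vbuckets_per_server(n))
```

With one, two and four servers this gives 256, 128 and 64 vBuckets per server, matching the quote. With three servers the split cannot be perfectly even (86/85/85 under this rule).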
Say I have three nodes in a cluster, will the asynchronous disk write, write the same data to the disks on ALL 3 nodes (as it seems they will have different data in RAM)? If yes, I can see that the 2 remaining nodes could serve what node 3 had, just from disk instead of RAM (at least the first request to it I assume).
If no, how can I avoid losing data (we have frequent writes to a disk every second with data we will not want to lose)? Using the replica? Because I also found this:
"When a machine in the cluster has crashed, the leader will detect that and notify member machines in the cluster that all vBuckets hosted in the crashed machine is dead. After getting this signal, machines hosting the corresponding vBucket replica will set the vBucket status as “active”. The vBucket/server map is updated and eventually propagated to the client lib."
But as I read the failover chapter, this would require me to have the auto-failover (or manual, or programmatic) to work and actually use the replica (and I should have one replica per bucket as I read it for 3+ nodes), right? And then later I should of course rebalance.
Basically, to sum up, I am looking into what I should do, setup wise, to be able to tolerate node crash without losing any data at all and having the rest of the nodes, instantly, serving everything until a replacement has been launched and rebalanced. Will this happen automatically if I have a regular 3 node cluster with 1 replica, or do I need to failover before the data on the 3rd node can be accessed (and as I read it there are several things to carefully consider before enabling auto failover).
Greatly appreciate any comment on this,
Elfar
>> Say I have three nodes in a cluster, will the asynchronous disk write, write the same data to the disks on ALL 3 nodes (as it seems they will have different data in RAM)?
This depends on the number of replicas you want on a given bucket. If you have one replica then the data will be written on two of the servers (one active, one replica). If you have two replicas then the data will be written to all three of the servers (one active, two replicas).
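As a rough illustration of the replica counts above, the sketch below builds a toy vBucket map and checks how many distinct servers hold each vBucket. The ring-style placement and all names are assumptions made for the example, not how the cluster manager actually places replicas.

```python
NUM_VBUCKETS = 256  # figure from the passage quoted earlier in the thread


def replica_chains(num_servers, num_replicas, num_vbuckets=NUM_VBUCKETS):
    """Return {vbucket: [active, replica, ...]} using a ring layout (an assumption)."""
    if num_replicas >= num_servers:
        raise ValueError("need more servers than replicas")
    return {
        vb: [(vb + i) % num_servers for i in range(num_replicas + 1)]
        for vb in range(num_vbuckets)
    }


def copies_per_vbucket(num_servers, num_replicas):
    """Set of distinct-server counts observed across all vBuckets."""
    chains = replica_chains(num_servers, num_replicas)
    return {len(set(chain)) for chain in chains.values()}


print(copies_per_vbucket(3, 1))  # one replica: each vBucket lives on 2 servers
print(copies_per_vbucket(3, 2))  # two replicas: each vBucket lives on all 3 servers
```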
>> But as I read the failover chapter, this would require me to have the auto-failover (or manual, or programmatic) to work and actually use the replica (and I should have one replica per bucket as I read it for 3+ nodes), right?
No. We originally didn't have an auto-failover feature, which meant that failing over a server was a manual process. This means that if one of your servers goes down, all data on that server is unavailable until you click the failover button in the web UI. People didn't like this, so we added auto-failover to do it automatically. In short, you will always be able to fail over to a replica whether or not you have auto-failover enabled.
>> Will this happen automatically if I have a regular 3 node cluster with 1 replica, or do I need to failover before the data on the 3rd node can be accessed?
Data on a crashed node is unavailable until you fail over.
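A toy simulation may help picture this. It assumes a 3-node cluster with 1 replica and a simple ring placement; the map layout and function names are hypothetical, and real failover is performed by the cluster, not by client code like this.

```python
NUM_VBUCKETS = 256


def build_map(num_servers, num_replicas):
    """{vbucket: [active, replica, ...]} with ring placement (an assumption)."""
    return {
        vb: [(vb + i) % num_servers for i in range(num_replicas + 1)]
        for vb in range(NUM_VBUCKETS)
    }


def available(vb_map, down):
    """Count vBuckets whose active copy (first entry) sits on a live server."""
    return sum(1 for chain in vb_map.values() if chain and chain[0] not in down)


def fail_over(vb_map, node):
    """Remove the node from every chain; the next replica, if any, becomes active."""
    return {vb: [s for s in chain if s != node] for vb, chain in vb_map.items()}


vb_map = build_map(num_servers=3, num_replicas=1)
print("before failover:", available(vb_map, down={1}), "of", NUM_VBUCKETS)
after = fail_over(vb_map, node=1)
print("after failover: ", available(after, down={1}), "of", NUM_VBUCKETS)
```

Before the failover, the vBuckets whose active copy was on the crashed node cannot be served even though a replica exists elsewhere. After it, the replicas are promoted and every vBucket is reachable again.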
>> Basically, to sum up, I am looking into what I should do, setup wise, to be able to tolerate node crash without losing any data at all and having the rest of the nodes, instantly, serving everything until a replacement has been launched and rebalanced.
Here is what I would do. Set up your 3-node cluster with 2 replicas. I would use two because, if one of your servers crashes and you fail over, there is a window of time in which a second server crash could leave you in trouble with only one replica. You won't lose data, but you might have some downtime.
If you truly need to make sure there is no way anything is lost, then I would recommend using our new observe command, which has been added in our 2.0 release. This command lets you specify higher levels of consistency and durability by having your client wait until data has been replicated or persisted to a given number of nodes. You will have to make trade-offs here between performance and consistency, so spend some time finding the right mix.
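The idea of waiting until data has been replicated or persisted can be sketched as a polling loop. This is not the SDK's observe API: `wait_until_durable`, its parameters, and the `get_status` callback are all hypothetical names invented for this illustration.

```python
import time


def wait_until_durable(get_status, replicate_to=1, persist_to=0,
                       timeout=1.0, interval=0.01):
    """Poll until enough copies exist, or give up after `timeout` seconds.

    get_status is a hypothetical callback returning a tuple
    (replicated_count, persisted_count) for the key being written.
    """
    deadline = time.monotonic() + timeout
    while True:
        replicated, persisted = get_status()
        if replicated >= replicate_to and persisted >= persist_to:
            return True   # durability requirement met
        if time.monotonic() >= deadline:
            return False  # caller decides whether to retry or report an error
        time.sleep(interval)


def make_fake_status(steps):
    """Test double: yields each status in turn, then repeats the last one."""
    it = iter(steps)
    last = steps[-1]
    return lambda: next(it, last)


# The write reaches one replica on the second poll.
ok = wait_until_durable(make_fake_status([(0, 0), (1, 0)]), replicate_to=1)
print("durable:", ok)
```

The trade-off mentioned above shows up directly: a higher `replicate_to` or `persist_to` means each write blocks longer before the client treats it as safe.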
>> So was the warning popup wrong and comes up because I killed the process instead of closing it properly, or could I actually have risked losing data? And in that case, how can I avoid that from being the case? And will I always have to do a failover before I can access the data again?
The pop up warning might have been because the process crashed before being able to replicate some of the data that was just set on that node. Imagine it got into memory and returned success to the client, but wasn't replicated and wasn't persisted. The observe command will help you with this if that data is important. Can you try shutting down a node cleanly and seeing if the pop up message comes up again? If it does I will forward your question to our cluster management team.
Let me know if you have any other questions.
Cheers, really appreciate your feedback here, very useful.
I think I will start with 2 powerful nodes and do some testing with that setup, killing and stopping and failing over etc. Regarding the warning popup, I doubt that the process crashed before being able to replicate some of the data as I had only inserted like 5-6 elements and it was several minutes before I killed the process on a server that had nothing else running. Will let you know if I see any unexpected behavior during my testing there. Will definitely check out observe command, though with 2 nodes and one replica I guess I should be good, though I guess I might risk, if server A crashes before persisting the data to replica that I might be unlucky to lose something there worst case? (unless I use observe there at some performance cost...)
-elfar
One quick note. Observe will not be available until at least the 2.0 beta, which is expected to be released in the coming months. Also, the only possible way you could lose something is if your server crashed before data could be replicated to another server. Persistence is not required for durability unless your active and replica servers crash.
Thanks for the input. Yes, I was only planning on checking "observe" out; I feel quite comfortable using 1.8 and 2 powerful nodes with one replica to start with, and failing over manually. I am already using it in a production setup, but one where I fall back to files, as I am not yet comfortable relying 100% on Couchbase: I had a few connection failures and timeouts while using get/set. I suspect they were network/firewall related and not Couchbase itself, though; I had some very strange issues allowing access between AWS EC2 accounts, but it seems to run perfectly now.
If you're having timeout issues and they aren't network or firewall related, then they are likely either client bugs or just issues with the way you are using the clients. Whatever the issue, definitely post any problems here so we can help you out or, if the problem is in the client, get it fixed.
A quick follow-up here describing my concerns about moving to Couchbase. I started a 3-node cluster with 1 replica backup. I then killed the Couchbase process running on node number 2, after which I could no longer get much of the data:
Warning: Couchbase::get(): Failed to get a value from server: Connection failure
I use the IP of the first node to connect to the cluster, and the PHP SDK, as I read that it would automatically pick up the cluster structure and use that.
Then, I wondered if I would need to do a failover to get this working again, but when I click failover I get this text:
"Attention – There are not replica (backup) copies of all data on this node! Failing over the node now will irrecoverably lose that data when the incomplete replica is activated and this node is removed from the cluster. If the node might come back online, it is recommended to wait. Check this box if you want to failover the node, despite the resulting data loss"
I definitely do not wish to lose data, but I tried accepting since I was just testing, and I actually could access all my keys after doing that. So was the warning popup wrong and comes up because I killed the process instead of closing it properly, or could I actually have risked losing data? And in that case, how can I avoid that from being the case? And will I always have to do a failover before I can access the data again?
Appreciate any feedback, thanks
-elfar