Fast failover is one of the many improvements that come with the release of Couchbase Server 5.0 (now available for download).
Failover is one of the important concepts to understand when it comes to distributed databases. The CAP theorem states that a distributed database can’t be both available and consistent all of the time. Couchbase Server’s architecture is designed to be always consistent and partition tolerant. With fast failover, Couchbase Server is closing the gap on high availability.
In this blog post, I’m going to demonstrate failover in action. I’ll be using Docker to create a cluster of 3 Couchbase nodes on my local machine.
You can follow along with the code sample in this blog post: it is available on GitHub.
Fast failover overview
You’ll need a bit of setup and preparation.
First, create a Couchbase Server cluster with at least 3 nodes. There are a number of ways to do this, including Vagrant, virtual machines, actual machines, Azure, and more.
I chose to use Docker. I blogged about how to create a Couchbase Cluster on Docker and access it with a .NET Core application (don’t forget the bridge network!). So, I just followed those same instructions again. The only difference is that I used a console application instead of an ASP.NET application (which you can read more about later in this post).
I used the Couchbase Server 5.0.0-beta2 image from Docker Hub, but by the time you read this, an official release of Couchbase Server 5.0 should be available in the official Couchbase repository on Docker Hub.
Next, I created a bucket called “mybucket”. Make sure to enable replicas to create additional cop(ies) of data within the same cluster.
After that, create a user (I called mine “myuser”) with at least Data Writer and Data Reader permissions for “mybucket”. If you aren’t familiar yet with the Couchbase Server Role-based Access Control (RBAC), start with this blog post on Authentication with RBAC and .NET.
Finally, turn on automatic fast failover. From the Couchbase Console, go to Settings, and then Auto-Failover. Check the box to “Enable auto-failover”. As of version 5.0, you can set the Timeout value to as low as 5 (seconds). Previously, the value had to be at least 30 seconds.
There is a reason that auto-failover is off by default. Please review the full documentation on automatic failover to make sure that it’s the right fit for you.
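If you would rather script this step than click through the console, the same setting can also be changed through the cluster’s REST API. Here is a minimal sketch, assuming the admin port is reachable at localhost:8091 and using placeholder administrator credentials (neither comes from this post). The class and method names are hypothetical, and you should check the endpoint parameters against the REST documentation for your server version.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Couchbase SDK.
public static class AutoFailoverSettings
{
    // Builds the form body for POST /settings/autoFailover.
    public static FormUrlEncodedContent BuildForm(bool enabled, int timeoutSeconds)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("enabled", enabled ? "true" : "false"),
            new KeyValuePair<string, string>("timeout",
                timeoutSeconds.ToString(CultureInfo.InvariantCulture))
        });
    }

    // Assumption: baseUrl, adminUser and adminPassword are placeholders
    // for your own cluster's admin endpoint and credentials.
    public static async Task<bool> ApplyAsync(
        string baseUrl, string adminUser, string adminPassword, int timeoutSeconds)
    {
        using (var http = new HttpClient { BaseAddress = new Uri(baseUrl) })
        {
            // The admin REST API uses HTTP basic authentication.
            var token = Convert.ToBase64String(
                Encoding.UTF8.GetBytes($"{adminUser}:{adminPassword}"));
            http.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Basic", token);

            var response = await http.PostAsync(
                "/settings/autoFailover", BuildForm(true, timeoutSeconds));
            return response.IsSuccessStatusCode;
        }
    }
}
```

A call such as `await AutoFailoverSettings.ApplyAsync("http://localhost:8091", "Administrator", "password", 5)` would then request the 5 second timeout described above.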
.NET Example
Now that you have a 3-node cluster running inside of your Docker host, it’s time to write a demonstration application. I decided to write a console application that would continuously perform reads against Couchbase. At some point, I will “pull the plug” on one of the nodes to show automatic fast failover in action.
Connecting to the cluster
After creating a new .NET Core console application in Visual Studio, I added the Couchbase .NET SDK (currently version 2.5.1) using NuGet.
Then, I created a configuration to connect to the 3-node cluster, authenticate to “myuser”, and open up “mybucket”.
```csharp
// Namespaces used: Couchbase, Couchbase.Authentication, Couchbase.Configuration.Client
var clientConfig = new ClientConfiguration
{
    Servers = new List<Uri>
    {
        new Uri("http://172.17.0.2"),
        new Uri("http://172.17.0.3"),
        new Uri("http://172.17.0.4")
    }
};
var cluster = new Cluster(clientConfig);

var credentials = new PasswordAuthenticator("myuser", "password");
cluster.Authenticate(credentials);

_bucket = cluster.OpenBucket("mybucket");
```
Those IP addresses are the addresses that are internal to the Docker host. This .NET Core application will also be running inside the Docker host, where those IP addresses will resolve. From outside the Docker host, only “localhost:8091” will resolve (assuming you are following the tutorial I linked to earlier). If you are not using Docker, put in the IP addresses of the Azure machines, the VMs, etc. instead.
Next, PasswordAuthenticator is used to ensure bucket access.
Finally, get a bucket object using OpenBucket.
Setting up documents
For this demonstration, I want to set up a bunch of documents that I will later be reading from, repeatedly. First, I wrote a loop to create some arbitrary number of documents that each have a key like “documentKey[num]” (e.g. “documentKey0”, “documentKey1”, etc).
```csharp
var docKeys = new List<string>();
for (var i = 0; i < numDocuments; i++)
{
    var key = "documentKey" + i;
    docKeys.Add(key);
    _bucket.Upsert(key, new { name = "Document" + i });
}
```
In my code, numDocuments is set to 50. But if you are following along, feel free to set it to another number and see what happens.
Reading documents
So now there are 50 documents with well-known keys. The rest of the program will be continuously looping. Each loop iteration will attempt to retrieve all 50 documents.
```csharp
var iteration = 0;
while (true)
{
    Console.WriteLine($"Getting {numDocuments} documents [{iteration++}]");
    foreach (var docKey in docKeys)
    {
        var result = _bucket.Get<dynamic>(docKey);
        if (terse)
            ShowResultTerse(result, docKey);
        else
            ShowResult(result, docKey);
    }
    Console.WriteLine();
    Thread.Sleep(2000);
}
```
Notice that there’s a loop within the loop. The inner loop will run 50 times to perform a Get on each document. ShowResult will then output what’s going on to the console. (ShowResultTerse does the same thing, just in a much more compact fashion. ShowResult is below, but later screenshots will be using ShowResultTerse.)
```csharp
private static void ShowResult(IOperationResult<dynamic> result, string id)
{
    // happy path, document was found
    if (result.Success)
    {
        Console.WriteLine("Result: success");
        return;
    }

    // error, possibly node down
    // show error, try to get replica
    Console.WriteLine($"Result: unsuccessful {result.Message}");
    Console.WriteLine("\tAttempting to get replica.");
    var replica = _bucket.GetFromReplica<dynamic>(id);

    // happy path for replica, it was found
    if (replica.Success)
    {
        Console.WriteLine("\tReplica result: success");
        return;
    }

    // error! replication may not be configured
    // or it's possible something catastrophic happened
    // this should be rare, but definitely want to log it
    // maybe retry and/or escalate
    // in this example, it's just logged to console
    Console.WriteLine($"\tReplica result unsuccessful: {replica.Message}");
}
```
The comments will help you follow along, but ShowResult does three checks:
- Was the read successful? If so, output that. Done! Otherwise…
- Try to get a replica (from another node). Was THAT successful? If so, output that. Done! Otherwise…
- The application was unable to read the document or one of its replicas. In this example, that’s going to be very rare. In reality, it could mean that the document doesn’t exist, or replication isn’t configured correctly, or something else has gone wrong.
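ShowResultTerse isn’t listed in this post, but the same three checks can be boiled down to a single character per document. Here is a rough sketch of that idea, not the actual implementation from the sample: the TerseOutput class and its Marker method are hypothetical names, and the “F” marker for “both reads failed” is my own assumption (only “S” and “R” show up in the output later in this post).

```csharp
using System;

// Hypothetical helper: maps the three checks to one character.
public static class TerseOutput
{
    // 'S' = primary read succeeded
    // 'R' = primary failed, replica read succeeded
    // 'F' = both failed (assumed marker, not from the original sample)
    // tryReplica is a delegate so the replica read only happens
    // when the primary read has already failed.
    public static char Marker(bool primarySucceeded, Func<bool> tryReplica)
    {
        if (primarySucceeded)
            return 'S';

        return tryReplica() ? 'R' : 'F';
    }
}
```

Inside the read loop, this could be wired up as `Console.Write(TerseOutput.Marker(result.Success, () => _bucket.GetFromReplica<dynamic>(docKey).Success));`. Passing the replica read as a delegate keeps the helper free of any SDK dependency, so it can be tried without a running cluster.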
So, you’re ready to run the application. If you’re using Docker, don’t forget to run this application in Docker (which is easy to do from Visual Studio). (Also make sure to connect the .NET Core application container to the Docker bridge network).
Pull the plug!
Before pulling the plug on one of the nodes, let’s take a look at what the “normal” output is when running the above .NET Core application.
In the below GIF, you’ll see:
- A three node Couchbase Server cluster
- Switch over to Visual Studio
- Build and start the Docker container with CTRL+F5
- The (terse) console output of the Docker container
(I’ve sped up the animation a bit). Notice that “S” is being shown 50 times. This means that each document was (s)uccessfully retrieved.
Next, let’s show fast failover in action. I’m going to “pull the plug” on one of the nodes. With Docker, I can execute docker stop db2, for example.
There is a lot to keep track of at one time, so I’ve created a short video that demonstrates what’s going on.
Video: https://www.youtube.com/watch?v=KbU5eG2R9XU
What you’re seeing in that video is:
- Normal operation (all “S” for success)
- A node being stopped (with Docker)
- Couchbase detecting a node being down.
- Couchbase initiating fast failover to activate replicas.
- During that failover period, it’s no longer all “S”. There are some “R” for replicas (which are read only) in there too.
- When the failover is complete, the results go back to all “S” again.
The goal of fast failover is to reduce the period of time where not all documents are entirely available.
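If you want to put a rough number on that period in your own environment, the read loop can keep track of when it first sees a non-“S” result and when the results return to all “S”. Here is a minimal sketch of such a tracker. DegradedWindowTracker is a hypothetical class, not something from the sample code or the SDK, and the timestamps are passed in so that it can be exercised without a cluster.

```csharp
using System;

// Hypothetical helper: measures how long reads were not all served
// by the primary copies.
public sealed class DegradedWindowTracker
{
    private DateTime? _degradedSince;

    // Length of the most recently closed degraded window.
    public TimeSpan LastWindow { get; private set; } = TimeSpan.Zero;

    // Call once per outer-loop iteration.
    // Returns true when a degraded window has just ended.
    public bool Record(bool allPrimaryReadsSucceeded, DateTime now)
    {
        if (!allPrimaryReadsSucceeded)
        {
            // Remember only the start of the window.
            if (_degradedSince == null)
                _degradedSince = now;
            return false;
        }

        if (_degradedSince == null)
            return false; // normal operation, nothing to close

        LastWindow = now - _degradedSince.Value;
        _degradedSince = null;
        return true;
    }
}
```

In the read loop, you would call Record at the end of each iteration with DateTime.UtcNow and print LastWindow whenever it returns true. Because the loop in this post sleeps for 2 seconds between iterations, the measurement is only accurate to within a couple of seconds. A shorter sleep would give a finer reading.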
Summary
Couchbase Server 5.0 has improved failover with a “fast failover” option that can be useful for environments with solid networking in place.
This blog post shows off a console app that’s meant to demonstrate fast failover. It’s not a very useful app outside of that, but you can take the principles and apply them to an ASP.NET or ASP.NET Core website.
Check out Couchbase Server 5.0 today for this and other great new features.
Special thanks to Jeff Morris and the SDK team for helping out with this blog post!
Here are some links for more information on fast failover:
- The SDK RFC for fast failover. This document covers .NET, but also Java, libcouchbase, and Go.
- Automatic failover documentation for Couchbase Server.
- Source code for the .NET Core console app used in this blog post (GitHub)
- Two JIRA tickets for .NET fast failover: NCBC-1366 and NCBC-1388.
If you have questions or comments on failover, make sure to check out the Couchbase forums.
Please leave your questions and comments on all things .NET and Couchbase or find me on Twitter @mgroves.