Distributed Multi-Document ACID Transactions

ACID Transactions are a must when you have strict data consistency requirements in your application. The costs of running transactions on distributed systems can rapidly create bottlenecks at scale. In this article, we will give you an overview of some of the challenges faced by NoSQL and NewSQL databases. Then, we’ll deep dive into how Couchbase implemented a scalable distributed transaction model with no central coordination and no single point of failure. Additionally, I will also give a short overview of how the support for transactions in N1QL looks like on Couchbase 7.0.

Some minor details were omitted for the sake of simplicity.

Relational vs NewSQL vs NoSQL Transactions

Before I start explaining how Couchbase implemented support for transactions, I need to first explain the inherent characteristics of atomicity in relational and NoSQL databases (using semi-structured data models like JSON):

Atomicity in RDBMS

Let’s say you need to save a new user in the database. Naturally, as it has many other tables associated with it, inserting a user will also require inserts on many other tables:

Transactions on Relational

Because the relational model forces you to store everything on “boxes” and split your data into small pieces, adding a new user should always run inside of a transactional context. Otherwise, if one of your inserts fail, your user will end up half-saved. Notice how an RDBMS relies heavily on transactions, as applications are far more complex than when the relational model was originally designed back in the seventies.

Luckily, as these databases are designed to run on a single node, you can use a central transaction coordinator to commit the data at once without any performance impact.

Atomicity in NewSQL

On the NewSQL side (distributed relational), things are a little bit more complicated. As most of these databases reuse the relational model, your entity’s data (or aggregate root) tends to be spread throughout multiple nodes.

atomicity in newsql

In the image above, if we need to load the user into memory, we would need to first get the user from Server 1, then load the association between users and roles on Server 2 and finally load the target role from Server 3. This simple operation requires data to travel at least twice over the network, which will ultimately limit your read performance. In a real-world scenario, a user has many more tables associated with it. This is why distributed relational is not yet practical when you need to read/write as fast as possible.

You can try to minimize the issues above by limiting the size of your cluster, by relying heavily on indexes to track all relationships, or by some sharding techniques to keep all related data in the same node (which is difficult to implement in practice). The last two approaches, even when well implemented, will consume significant resources from the database to be managed properly.

ACID transactions in NewSQL databases require more coordination than in NoSQL, as the data related to an entity is split into multiple tables which might live in different nodes. The relational model, as we use today, requires transactions for the majority of the writes, updates, and cascading deletes. The extra coordination that the NewSQL architecture requires comes at a cost of reduced throughput for applications requiring low latency operations.

Atomicity in Document Databases

The use of semi-structured data like JSON can drastically reduce the number of “cross-node joins”, therefore delivering better read/write performance without the need to rely too much on indexing. This was one of the key insights of the Dynamo Paper (first published ~13 years ago) which was the catalyst to create the NoSQL databases as we know them today.

Another interesting characteristic of a semi-structured data model is that it is less transactional, as you could fit the whole user data in a single document:

atomicity nosql

As you can see in the image above, the user preferences and roles can easily fit inside a “User Document”, so there is no need for a transaction to insert or update a user as the operation is atomic. We insert the document or the whole operation fails. The same is valid for many other common use cases: shopping carts, products, tree structures, and aggregate roots in general.

In the majority of the applications using document databases, 90% of the transactional operations will fall in this single document category. But… what about the other 10%? Well, for those we will need multi-document transaction support, which has been added in Couchbase since version 6.5 and is the main focus of this article.

Here is a presentation about transactions that was delivered at Couchbase Connect 2020. Matt Ingenthron gives you an explanation of when and why you might need multi-document ACID transactions:

Watch the full version at https://www.youtube.com/watch?v=2fsZVe2cT3M&ab_channel=Couchbase

Video Transcript

Click to read the full transcription.

Let’s talk about how this is applied to a fictitious but maybe kind of realistic example of a document model for a Couchbase system. So we’ve raised some money, and we’re going to build a massively multiplayer online role-playing game (an MMORPG), with players and monsters. So we need a data model.

We are going to have players who fight monsters and then based on that fight, win or lose, they’re going to. If they win, they’re going to get a weapon, if they lose, uh, they lose some hit points so nobody can die. You always can come back to life, you can always find another day. But your players fight monsters, and we get our version 1.0 built. Great! Okay, Got our funding, got our 1.0 built.

The problem is, we forgot to do the massively multiplayer part. There’s no collaborative play. I can’t have multiple players fight the same monster. So I need to fix that, right? So let’s roll out a new version. So players are going to continue to fight monsters and win weapons.

So we roll out our version two 2.0, and in version 2.0 players can fight monsters together. I can coordinate with my friends, we could go find a monster, and we can kill that monster.

But we left a bug in there. It’s possible for multiple players to deliver the death blow and the reason that’s a problem is the players of the game figure it out. Instead of battling a monster together to death, what they do is, they, in this massively multiplayer world, they’ll battle it till close to death, and then a bunch of players will gather up together, and they all deliver the death blow or many will deliver the death blow at the same time.

The problem is, they win items, they win multiple items, and because items have rarity, and if you don’t have a certain amount of rareness for an item, the gameplay is not very interesting. This bug has allowed too many items to exist in the world, and the players are just spending time hacking the game, and then they get bored and leave. We need to keep the gameplay interesting.

So let’s think about this. How can we fix this? I think what we’ll need to do is probably introduce a fix. We’ll still allow players and monsters, multiple players to fight a monster. But what we’re gonna do is take one of those tricks that we have, we’re gonna take Couchbase CAS Operations.

With the CAS operation what happens is now multiple players are fighting that monster, lowering its hit points until it gets down to zero. But only one of those players is going to be able to deliver the death blow to that monster and win an item.

So the way this works is if two players are trying to deliver that death blow, the application server that’s handling the request is going to try to modify the document. It has to pick up a little piece of opaque information that we call the CAS (that’s for Check And Set), and so that means that if that opaque does not match, then the documents have already been modified, and so you need to retry that operation. In the scenario that two actors within the system are trying to grab that document at the same time, what we want is one to succeed and one to fail, and the CAS will give us that in a very efficient way.

We pull out that trick, we introduce CAS operations, the bug is fixed, gameplay is now much more interesting and 2.1 does really well, so that’s great.

So now let’s try a , uh, we wanna move on, we wanna make things more interesting. Imagine now that I introduce another feature: “Players can still fight monsters together, but they have to do it outside the city. So you have to be outside the city wall where the players are, and if you go into the city, you will do trade in a bizarre”

This works really well at first, but then players figure something out.

So imagine player1, I have to retrieve the document for player1. Then I have to retrieve the document for player2. Then I have to move the sword from player1 to player2, that’s very easy to do in the application logic, and then I go to store that change back to the system with the CAS operation, and then I go to store the other change back, right? Sounds like it’ll be great. Except there’s a bug.

The bug here is that my players can start a trade and then disconnect, and then items that might we that we might want to be rare are not going to be rare. They can be duplicated within the system.

In massively multiplayer online role-playing games, it’s called a Dup Bug. If you were to go to Google and search search for a “Dup Bug”, you’ll find lots of scenarios.

Here’s one from just a couple of weeks ago where Final Fantasy Crystal Chronicles on switch had to be patched because of a Dup bug. And then there was one just a few days before that, this is from a blog where a game blogger showing people how to use this dup glitch to retrieve… to get additional items within the game.

So we need to fix this bug. How are we going to do that? Well, so we’re going to reach into our Couchbase bag of tricks.

We’re going to introduce Couchbase transactions. The gameplay is almost exactly the same. But what are we gonna do with Couchbase transactions? And so this is the one slide where I’m going to talk a little bit about the code.

Multi-Document Distributed ACID Transactions in Couchbase

Now that you understand how transactions behave in different data models, it is time to deep dive into how we have implemented it at Couchbase and what led to our design choices. First, let’s go through the syntax:

transactions.run((ctx) -> {
        // get the account documents for userA and UserB 	
        TransactionJsonDocument userA = ctx.getOrError(collection, “userA”);
        JsonObject userAContent = userA.contentAsObject();
        int userABalance = userAContent.getInt(“account_balance”);
        TransactionJsonDocument userB = ctx.getOrError(collection, “Beth”); 
        JsonObject userBContent = userB.contentAsObject();
        int userBBalance = userBContent.getInt(“account_balance”);

        // if userB has sufficient funds, make the transfer
        if (userBBalance > transferAmount) {
                userAContent.put(“account_balance”, userABalance + transferAmount);
                ctx.replace(userA, userAContent);
                userBContent.put(“account_balance”, userBBalance – transferAmount);
                ctx.replace(userB, userBContent);
        }
        else throw new InsufficientFunds();  
});

transactions.run((ctx) -> {

// get the account documents for userA and UserB

TransactionJsonDocument userA = ctx.getOrError(collection, “userA”);

JsonObject userAContent = userA.contentAsObject();

int userABalance = userAContent.getInt(“account_balance”);

TransactionJsonDocument userB = ctx.getOrError(collection, “Beth”);

JsonObject userBContent = userB.contentAsObject();

int userBBalance = userBContent.getInt(“account_balance”);

// if userB has sufficient funds, make the transfer

if (userBBalance > transferAmount) {

userAContent.put(“account_balance”, userABalance + transferAmount);

ctx.replace(userA, userAContent);

userBContent.put(“account_balance”, userBBalance – transferAmount);

ctx.replace(userB, userBContent);

}

else throw new InsufficientFunds();

});

The Java example above is the classic example of how to transfer money between two clients. Notice that we decided to use a lambda function to express the transaction. Proper error handling can be challenging in this scenario and wrapping your transaction with an anonymous function allows the Couchbase Java SDK to do that work for you (i.e retry if something fails).

When we first launched support for transactions we were trying to avoid verbosity. This was how one competitor’s transaction syntax used to look:

Lately, it seems like handling transactions inside lambda functions is becoming the norm for NoSQL databases.

For those who were expecting it to be similar to the relational syntax for transactions (e.g. BEGIN/COMMIT/ROLLBACK SQL commands), keep reading: you can also run transactions through N1QL! Now, let’s try to understand what is happening under the hood.

Couchbase Architecture Review

For those unfamiliar with Couchbase’s architecture, I need to quickly explain 4 important concepts before going any further:

Couchbase is highly scalable, you can easily go from 1 to 100 nodes in a single cluster with minimal effort
JSON Documents have a “Meta” space called xAttr where you can store metadata about your document.
Inside each Bucket (similar to a schema in RDBMS), Couchbase automatically distributed the data into 1024 shards called vBuckets. The sharding is totally transparent to the developer, and we also take care of the sharding strategy. Our sharding algorithm (CRC32) essentially guarantees that documents will be evenly distributed between these vBuckets and no resharding is ever needed. The vBuckets are evenly distributed between the nodes of your cluster(e.g. if you have a cluster of 4 nodes, each node contains 256 vBuckets).

The client’s SDK stores a copy of the cluster map, which is a hashmap of vBuckets and the node responsible for them. By hashing the document’s key, the SDK can find in which vBucket the document should be located. And thanks to the cluster map it can talk directly to the node responsible for the document during save/delete/update operations.

bucket to server mapping

The design choices above allow Couchbase to have a masterless architecture (also referred to as master/master) instead of the traditional master/slave used in other NoSQL databases. There are a number of advantages of this kind of architecture, but the ones relevant for us now are the following:

The SDK saves “one network hop” during insert/update/delete operations as it knows where a given document is located (In the master/slave architecture you have to ask the master where the document is).
The database itself has no central coordinator, therefore, no single point of failure. In practice, the client acts indirectly as a lightweight coordinator, as it knows exactly which node in the cluster to talk to.

Distributed Transactions without a Central Coordinator

On Couchbase’s architecture, each client is responsible for the coordination of its own transactions. Naturally, everything is done under-the-hood on the SDK level. To put it simply, if you have 100 instances of your application running transactions, then you have potentially ~100 coordinators. These coordinators add next-to-no overhead to your application, and you will soon understand why.

If we reuse the money transfer example shown in our code example and assume that the 2 documents involved in this transaction live in two different nodes, from a 1,000-foot view the transaction follows these steps:

distribute transaction flow

Each vBucket has a single document responsible for the transaction log called Active Transaction Record (ATR). The ATR can be easily identified by the _txn:atr- id prefix. Before the first document mutation ( ctx.replace(userA, userAContent) in this case) a new entry is added in the ATR in the same vBucket with the transaction id and the “Pending” status. Just one ATR is used per transaction.
The transaction Id and the content of the first mutation, ctx.replace(userA, userAContent), is staged in the xAttrs of the first document (“userA”).
The transaction Id and the content of the second mutation, ctx.replace(userB, userBContent), is staged in the xAttrs of the second document “userB”.
The transaction is marked as “Committed” in the ATR. We also leverage this call to update the list of document ids involved in the transaction.
Document “userA” is unstaged (removed from xAttrs and replaces the document body)
Document “userB” is unstaged (removed from xAttrs and replaces the document body)
The transaction is marked as “Completed” and removed from the ATR

Note that this implementation is not limited by scopes, collections, or shards (vBuckets). In fact, you can even execute transactions across multiple buckets. As long as you have enough permissions, any document inside your cluster can be part of a transaction.

At this point, I assume that you have many questions about all the potential failure scenarios. Let’s try to cover the most important topics here. Feel free to leave comments and I will try to update the article accordingly.

Handling Isolation – Monotonic Atomic View

Jepsen has a brilliant graph that explains the most important consistency models for databases:

Consistency Models

Couchbase has support for the Read Committed/Monotonic Atomic View” consistency models. But how good is that? Well, Read Committed is the default choice in Postgres, MySQL, MariaDB and many other databases out there; if you never changed that option, that’s what you are using right now.

Read Committed guarantees that your application can’t read uncommitted data, which is what you likely expect from your database, but the interesting part here is how the commit process actually happens. In relational databases, quite often, there is a coordination between the new versions of the rows changed in a transaction to take over their previous ones all at the same time. This is commonly referred to as write-point commit. In order for that to happen, Multiversion Concurrency Control(MVCC) is required. This is problematic because of all the baggage that comes with it, not to mention how expensive it gets (in terms of performance) to be implemented in an ACID distributed database where fast reads/writes are key.

Another disadvantage of write-point commits is that you might spend valuable time synchronizing your commit but … no other thread reads it right after, wasting all effort spent with synchronization. That is when Monotonic Atomic View(MAV) comes into play. It was first described in the Highly Available Transactions: Virtues and Limitations paper and had a great influence on our design.

With MAV we can provide an atomic commit at the read-point instead, which leads to a significant improvement in performance. Let’s see how it works in practice:

Repeatable Reads and Monotonic Atomic Views

In our transaction example, there is a fraction of time after Step 4 where we have set the transaction in the ATR as “Committed” but we haven’t unstaged the data of the documents involved in the transaction yet. So what happens if another client tries to read the data during this interval?

transaction failure scenario

Internally, if by any chance the SDK finds a document that has staged content in it, it will also read the ATR to get the transaction state. If the state is “Committed”, it will return the staged version instead. Boom! No need to synchronize writes if you can simply solve it WHEN it happens on read time.

Durability in a Distributed Database

One of the most important jobs of a database is to ensure that what is written stays written. Even in the event of a node failure, no data should be lost. This is achieved in Couchbase through two features: Bucket Replicas and Durability in the SDK.

During your bucket creation you can configure how many replicas (backups) of each document you want (two is the most common choice). This option allows you to lose N number of nodes without implying any potential data loss.

Atomicity-new SQL

Couchbase is configured by default to always take the fastest approach, so as soon as your data arrives on the server, an acknowledgment will be sent back to the client saying that your write was successful and all data replication will be handled under the hood. However, if your server fails before it gets the chance to replicate the data (we are talking about microseconds to a few milliseconds as the replication is made memory-to-memory) you might naturally lose your change. This might be fine for some low-value data, but .. hey! This totally violates the “durability” in ACID. That is why we allow you to specify your durability requirements:

durabilty options

The MAJORITY (default option in the transaction’s library) in the code above means that the mutation must be replicated to (that is, held in the memory allocated to the bucket on) a majority of the Data Service nodes. The other options are: majorityAndPersistActive, and persistToMajority. Please refer to the official documentation on durability requirements to better understand how it works. This feature can also be used outside of a transaction in case you need to pessimistically guarantee that a document has been saved.

What happens if something fails during a transaction?

You can configure how long your transaction should last before it is rolled back. The default value is 15 seconds. Within this timeframe, if there are concurrency or node issues, we will use a combination of wait and retry until the transaction reaches this time.

If the client managing the transaction suddenly disconnects, it might leave some staged content on the document’s metadata. However, other clients trying to modify the same document can recognize that the staged content can be overwritten as it is part of a transaction that already expired.

Additionally, the transaction library will periodically run cleanups to remove non-active transactions from the ATRs to keep it as small as possible.

Distributed SQL Transactions with N1QL

N1QL is a query language that implements the SQL++ spec, it is compatible with SQL92 but designed for structured and flexible JSON documents. Learn more in this interactive N1QL tutorial.

The distributed transaction solution that we discussed so far is great for application-level transactions. But, sometimes we need to run ad hoc data changes. Or, due to the number of documents involved in the operation, manipulating them in the application memory becomes an expensive operation (e.g. adding 10 credits to all users’ accounts). In Couchbase Server 7.0, you can run transactions through N1QL with virtually the same SQL syntax as most relational databases:

N1QL transactions have already been properly introduced at Couchbase Transactions with N1QL and “Use cases and Best Practices for Distributed Transactions through N1QL”, so I won’t deep dive on this topic. From a thousand-foot view, the transaction is managed by the query service. Since Couchbase is modular, you can increase your transaction throughput by scaling up or out your nodes running the query service.

You can use both the N1QL and lambda transactions together, but using the transaction library only is preferred whenever possible

Conclusion: The Best NoSQL for Transactions

Couchbase already has had support for atomic single document operations and Optimistic and Pessimistic Locking for a long time. Last year, we introduced fast transactional support regardless of buckets, collections, scopes, or shards. With Couchbase 7.0 you can even use the same traditional relational transaction syntax. The combination of all these features makes Couchbase the best NoSQL for transactions at scale.

Low Cost – Pay for what you use

The total transaction overhead is simply the number of document mutations + 3 (ATR marked as Pending, Committed, and Completed). For instance, running the money transfer example in a transactional context will cost you up to 5 additional calls to the database.

The transaction library is a layer on top of the SDK, we added support for transactions with zero impact on performance (something that other players can’t easily claim). This could only be achieved thanks to our solid architecture.

No Central Transaction Manager or Central Coordinator

Given Couchbase’s masterless architecture and the fact that clients are in charge of managing their own transactions, our implementation has no central coordination and no single point of failure. Even during a node failure, transactions that are not touching documents in the faulty server can still be completed successfully. In fact, with an appropriate max transaction time, even if you touch documents in a node that is failing, Couchbase’s Node Failover can be fast enough to isolate the faulty server and promote a new node before your transaction expires

No Internal Global Clock and no MVCC

Some transaction implementations require the concept of a global clock, which can be expensive to maintain without dedicated hardware in a distributed environment. In some cases, it can even bring the database down if the clock skew is higher than ~250 milliseconds.

With Multiversion Concurrency Control (MVCC) and global clocks, you can technically achieve higher levels of consistency. On the flip side, it will most likely impact the overall performance of the database. Couchbase supports the Read Committed consistency model, which is the same one that relational databases support by default.

Flexibility

You can use the durability options inside and outside of a transactional context, use optimistic and pessimistic locking to avoid potential concurrency issues, and use transactions in the SDK and/or via N1QL. There is a lot of flexibility for you to build any kind of application on top of Couchbase and fine-tune the performance according to your business needs.

What is next?

Here are a few links for you to learn more about what we just discussed and to get started with Couchbase:

Learn more about transactions	https://www.couchbase.com/transactions
Java SDK transactions	https://docs.couchbase.com/java-sdk/current/howtos/distributed-acid-transactions-from-the-sdk.html
.NET SDK transactions	https://docs.couchbase.com/dotnet-sdk/current/howtos/distributed-acid-transactions-from-the-sdk.html
C++ SDK transactions	https://docs.couchbase.com/cxx-txns/current/distributed-acid-transactions-from-the-sdk.html
N1QL Transactions	https://www.couchbase.com/blog/couchbase-transactions-with-n1ql/
Download Couchbase	https://www.couchbase.com/downloads
What’s New in Couchbase 7.0?	https://docs.couchbase.com/server/7.0/introduction/whats-new.html

Platform

Services

Self-Managed

Capabilities

By Use Case

By Industry

Popular Docs

Quickstart

Resource Center

About

Partnerships

How we implemented Distributed Multi-document ACID Transactions in Couchbase

What We Learned Evaluating Agent Memory:The Results (Part 2)

What We Learned Evaluating Agent Memory:The Setup (Part 1)

Building a Test Matrix Pipeline for Couchbase Autonomous Operator

App Development Cost: A Complete Pricing Guide and Breakdown

Azure Key Vault for Credentials

Ready to get Started with Couchbase Capella?

Start building

Use Capella free

Get in touch