Couchbase Architecture

Distributed Databases: An Overview

4 MIN DE LEITURA

A distributed database stores data across multiple servers (nodes) connected by a network, managed as a single logical database. Distributed architecture enables horizontal scaling (adding commodity nodes to increase capacity) and high availability through automatic data replication. There are two primary distribution architectures: primary/secondary (master/slave), where one node owns all writes, and shared-nothing, where data is partitioned across all nodes with no single point of bottleneck. Couchbase uses a shared-nothing architecture with automatic vBucket-based data distribution, making it a natural fit for applications that require elastic scalability and always-on availability.

Primary/Secondary vs. Shared-Nothing Architecture

DimensionPrimary/Secondary (Master/Slave)Shared-Nothing (Masterless)
Data ownershipAll data on primary; secondaries are read replicasData partitioned across all nodes, no single owner
Write scalabilityLimited: all writes go to one primaryLinear: writes distributed across all nodes
Read scalabilityGood, if reads can be served from secondariesExcellent: reads distributed across all nodes

Horizontal Scaling

A single database server for a small set of applications and data has historically worked well. However, when exposed to a large, public user base, the only way to increase the capacity of these servers is to upgrade them to a more expensive server.

To improve capacity, move the database software to another single machine with more memory, more disk space, and more processors. This is called “vertical scaling.” The drawback to this approach is that it may require downtime. There’s also a ceiling on the performance that can be obtained from a single machine.

Unfortunately, many databases, especially relational databases (RDBMS), are not designed to be distributed and clustered.

However, distributed databases are created from the start to support elastic scalability. Need to add more resources to handle more load? Install the database software on one or more additional machines and add it to the cluster.

Then, add inexpensive commodity machines to the cluster when necessary. You can also remove them and scale down if you no longer need them.

Primary/Secondary Architecture

In a primary/secondary architecture, there is a designated “primary” server. This server stores all the data and handles all data requests. There are one or more “secondary” servers. These servers will receive data updates from the primary in order to stay in sync and store a complete replica of the data.

If the primary server goes offline (crashes), the remaining servers (and/or coordination servers) appoint one of the secondary servers to be the new primary.

Architects use this pattern to provide high availability to traditional, non-distributed databases. However, this architecture doesn’t do much to address the issue of increased load. In order to accomplish that, sharding must be used.

Shared-Nothing Distributed Databases

Shared-nothing architecture involves splitting the data into partitions, usually called “shards.” Each shard lives on an individual server (node) in the cluster. For example, if there are 300 records and 3 nodes, each node would (ideally) store 100 records. Each additional node could further partition the data and continue spreading out the load as necessary.

The cluster will also replicate shards between the nodes to maintain high availability. For instance, if Node 1 contains Active Shard A, then Node 2 would contain a Replica Shard A, and so on.

Then, if Node 3 goes offline, the cluster promotes the replicas from Shard C to Active status in order to keep the distributed database cluster online (as a whole).

The nature of relational databases is to store individual rows of data together in a tightly coupled table. This makes distributed SQL databases difficult. This is why organizations often choose NoSQL, where clustering, high availability, and replication are critical. NoSQL trades strictly coupled data that cannot exist outside a table in exchange for independent data that can exist in any given shard in a cluster.

Distributed Database Examples

Depending on the distributed database that you use, sharding may be completely automatic or require considerable effort to plan and maintain.

Let’s look at two examples of popular NoSQL distributed databases and how they differ:

How Couchbase Distributes Data: vBuckets

Couchbase uses a concept called vBuckets (virtual buckets) to distribute data across nodes in a shared-nothing cluster. A bucket is divided into 1024 vBuckets by default. Each vBucket is assigned to a node – both as an active vBucket (the primary copy) and as one or more replica vBuckets on other nodes. The Cluster Manager maintains a vBucket map that every SDK client uses to route requests directly to the correct node, eliminating the need for a central router or proxy.

Share this article

Author

Matthew D. Groves is a guy who loves to code. It doesn’t matter if it’s C#, jQuery, or PHP: he’ll submit pull requests for anything. He has been coding professionally ever since he wrote a QuickBASIC point-of-sale app for his parent’s pizza shop back in the 90s. He currently works as a Senior Product Marketing Manager for Couchbase. His free time is spent with his family, watching the Reds, and getting involved in the developer community. He is the author of AOP in .NET, Pro Microservices in .NET, a Pluralsight author, and a Microsoft MVP.

Deixe um comentário

Ready to get Started with Couchbase Capella?

Start building

Check out our developer portal to explore NoSQL, browse resources, and get started with tutorials.

Use Capella free

Get hands-on with Couchbase in just a few clicks. Capella DBaaS is the easiest and fastest way to get started.

Get in touch

Want to learn more about Couchbase offerings? Let us help.