CosmosDB is Microsoft’s NoSQL offering that’s exclusive to Microsoft Azure. It used to be called DocumentDB, but they changed the name and added some interesting new features. Let’s go a little deeper on it and explore its strategy, documentation, what developers have been talking about, and how it compares to Couchbase Capella.
One Database to rule them all?
Microsoft claims that CosmosDB is a NoSQL database able to do literally everything: it is a document database, a column store, a key-value store, and a graph database, all achieved through an abstraction of the data format called atom-record-sequence (ARS).
Let’s look at how data is organized according to each model. First, you have to choose the API you would like to use (SQL, MongoDB, Azure Table, Cassandra, or Gremlin) and stick with it, because it can’t be changed later. Behind the scenes, though, the data appears to be stored in a custom JSON format.
CosmosDB is trying to compete with all of the major NoSQL databases, which may be a risky strategy. For one, this approach may limit the features that CosmosDB can ultimately offer: every API sits on a single common denominator, and none of them can stray too far from it. Also, APIs like MongoDB and Cassandra are not defined or planned by Microsoft. This means that Microsoft will always be catching up to the latest releases and will ultimately never achieve 100% compatibility. Microsoft maintains documentation about which MongoDB features are supported and which are not (and the same for Cassandra). An all-in-one solution like CosmosDB might be good for simple applications with few functionality demands, but all those abstractions come with a cost and will ultimately hurt simplicity, performance, and feature coverage.
Couchbase vs CosmosDB – Comparing Apples with “Apples”
This comparison focuses on the scenarios where it makes sense to compare the two technologies (for example, Couchbase is not a graph database, so a graph comparison wouldn’t make sense).
One other important note: Couchbase Capella is Couchbase’s DBaaS (database-as-a-service) offering, available in AWS and GCP (soon to be in Azure too). It is basically a managed version of Couchbase Server, which is still available for download, so they are very similar. Unless otherwise stated, the “Couchbase” column applies to both Capella and Server.
|Feature||CosmosDB||Couchbase|
|Licensing||Proprietary, closed-source but free-tier is available.||Free trial available for Capella, Couchbase Community and Enterprise available for download, BSL|
|Data Integrity||Five options are available in configuration:|||
|Scalability||Highly scalable||Highly scalable|
|Mobile||No plans for CosmosDB for mobile or devices or any offline support|||
|Deployment||Azure only, fully managed only. There is a development version available (currently Windows only).||Can be deployed anywhere, including Azure, on-premises, Kubernetes, Docker, VM, bare-metal. Couchbase Capella offers a fully managed DBaaS|
|Locking||Optimistic and pessimistic locking available||Optimistic and pessimistic locking available|
|Backup & Restore||Continuous backup mode for 30 days; Periodic backup mode (default)||Automatic backup and restore service with configurable backup wizard; Continuous backup available using XDCR|
|Querying||Based on which mode is chosen. Example 1: SQL API is an extremely limited subset of standard SQL. Example 2: MongoDB API is a non-100% subset of Mongo API||Full SQL implementation called SQL++ (with JOIN, aggregate, CTE, window functions, CRUD operations, etc), previously known as “N1QL”|
|Data Center Replication||Push-button global master-master replication between supported Azure data centers||XDCR allows any combination of unidirectional and bidirectional replication between any Couchbase deployment, including data filtering|
|Speed/performance||More speed and performance is only obtained by increasing RUs, which will often be prohibitively expensive||Memory-first read and write operations. Built-in caching layer. Can be tuned by increasing memory, disk, or adding a new node. Memory-optimized indexes available|
|Sharding / partitioning||Partition key(s) must be created and managed manually, requiring a dedicated expert to set and design correctly in order to reach performance/scale goals||Sharding is completely automatic|
|Architecture||Unknown / proprietary||Every node is a master in Couchbase, making most efficient use of resources|
|Supported SDKs||.NET (primary, most feature complete); (Others through Mongo/Cassandra)||.NET; C / C++|
Success in the Real World
This side-by-side comparison may favor Couchbase, but what about the real-world experiences of an organization that was using CosmosDB and switched to using Couchbase?
Facet Digital cut their database costs by 50%, and improved their performance by 100x by switching to Couchbase Capella.
How was it possible?
- Faster deployment time
- Easy search integration
- Faster indexing
- Better DevOps automation (CI/CD index definitions)
- Familiar and complete SQL syntax
CosmosDB has a unique vision, but as a natural consequence of building something focusing on multiple fields at once, CosmosDB’s support for all of your desired features can be uneven.
One of the most prominent features is the ability to choose among multiple consistency levels weaker than strong consistency: Bounded Staleness, Session, Consistent Prefix, and Eventual. The fact that Session is the default says a lot about the recommended way to use CosmosDB. It could mean that CosmosDB might not be the best solution if you need strong data consistency (and perhaps Microsoft would want to steer you back towards their flagship SQL Server database).
Being memory-first is one of the reasons why Couchbase is so fast. CosmosDB has an integrated cache (currently in preview), but, as with search, it’s a separate product that must be added on. Couchbase has been memory-first since its inception.
With CosmosDB, all fields are indexed in their Global Secondary Indexes (GSI). It seems like overkill. It may be easier to specify which fields to index than specifying which fields not to index. As soon as your JSON gets much bigger than a handful of properties (and especially when nesting JSON objects), these indexes are definitely going to be overkill, with the costs passed on by default. Too many indexes means too many RUs which means too many dollars.
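To get a feel for how quickly "index everything" grows with nesting, the following minimal Python sketch counts the distinct property paths in a JSON document and compares that with an explicit list of fields to index. The sample document, the `leaf_paths` helper, and the selected fields are illustrative assumptions; this is not how CosmosDB or Couchbase actually builds an index, only a way to see the difference in scale.

```python
import json


def leaf_paths(value, prefix=""):
    """Yield the path of every scalar leaf in a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, f"{prefix}/{key}")
    elif isinstance(value, list):
        for child in value:
            # All elements of an array share one wildcard path segment.
            yield from leaf_paths(child, f"{prefix}/[]")
    else:
        yield prefix


# Hypothetical order document, used only for illustration.
doc = json.loads("""
{
  "id": "order-1001",
  "customer": {"name": "Ada", "email": "ada@example.com",
               "address": {"city": "London", "zip": "N1"}},
  "items": [{"sku": "A1", "qty": 2, "price": 9.5},
            {"sku": "B7", "qty": 1, "price": 20.0}],
  "status": "shipped"
}
""")

# Every distinct path would be indexed under an index-everything default.
all_paths = sorted(set(leaf_paths(doc)))

# Fields the application actually filters on (an assumption for this sketch).
selected_paths = ["/customer/email", "/status"]

print(len(all_paths), "paths when every field is indexed")
print(len(selected_paths), "paths when fields are chosen explicitly")
```

Even this small document has several times more indexable paths than the application queries on, and the gap widens with each additional nested object.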
Sharding seems to be one of the trickiest things in CosmosDB. Partitions are moved automatically among nodes, but you still have to specify a partition key. One drawback of this approach is that each partition is indivisible, with a maximum size of 10 GB. If you pick a bad partition key, many frequently accessed documents might end up in the same partition, and the throughput of your reads and writes is then capped by the capacity of the node where that partition is stored.
The partition key is also immutable, so in order to change it, you must copy all of your data to another collection. In Couchbase, documents are distributed evenly across vBuckets, which avoids this problem and also increases your read/write performance.
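The difference between a skewed partition key and hashing on the document ID can be sketched in a few lines of Python. This is a simplified simulation, not either product's real placement logic: the workload, the `partition_for` and `hottest_share` helpers, and the use of a plain `crc32 % 1024` are assumptions made for illustration (Couchbase does hash document IDs with a CRC32-based function into a fixed number of vBuckets, typically 1024, but the exact formula differs from the one below).

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 1024  # matches Couchbase's usual vBucket count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with CRC32 (a simplified stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hottest_share(keys) -> float:
    """Return the fraction of all requests that hit the busiest partition."""
    counts = Counter(partition_for(key) for key in keys)
    return max(counts.values()) / sum(counts.values())


# Hypothetical workload: 100,000 requests, 90% of them for country "US".
requests = [(f"doc-{i}", "US" if i % 10 else "DE") for i in range(100_000)]

# Partitioning on a low-cardinality, skewed field creates a hot partition.
by_country = hottest_share(country for _, country in requests)

# Partitioning on the hashed document ID spreads the same load out.
by_doc_id = hottest_share(doc_id for doc_id, _ in requests)

print(f"busiest partition, keyed by country: {by_country:.1%} of requests")
print(f"busiest partition, keyed by doc ID:  {by_doc_id:.1%} of requests")
```

With the skewed key, at least 90% of the traffic is served by a single partition no matter how many nodes are added, while hashing on the document ID leaves no partition with more than a small fraction of the load.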
With CosmosDB, scaling up is done only by increasing Request Units (RUs). The challenge with this approach is that RUs are not a very good predictor of query performance, and they make it even harder to boost one specific behavior, such as increasing only the write capacity. For some use cases, a team may need a person working on RUs full-time to tune and maintain the queries properly.
Microsoft has put a lot of effort into trying to make RUs easier to understand, but it’s common for developers to underestimate their RUs (see here or here) and end up stuck with a bill much higher than expected. With Couchbase, scaling up is very flexible: it can be done by vertical and/or horizontal scaling, by running specific services according to the node hardware, by keeping indexes in memory, and so on.
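As a rough illustration of why RU estimates are easy to get wrong, provisioned throughput can be thought of as a weighted sum of operation rates. The Python sketch below shows only that arithmetic. The function name is hypothetical, and the per-operation costs are assumptions: about 1 RU for a 1 KB point read is the figure Microsoft documents, while the write and query costs are placeholders that you would replace with request charges measured on your own workload.

```python
def required_rus_per_second(reads_per_s, writes_per_s, queries_per_s,
                            ru_per_read=1.0, ru_per_write=5.0,
                            ru_per_query=10.0):
    """Estimate RU/s as a weighted sum of operation rates.

    The default costs are illustrative assumptions, not measured values.
    """
    return (reads_per_s * ru_per_read
            + writes_per_s * ru_per_write
            + queries_per_s * ru_per_query)


# Hypothetical workload: 50/50 reads and writes plus a handful of queries.
baseline = required_rus_per_second(reads_per_s=500, writes_per_s=500,
                                   queries_per_s=5)

# Doubling only the writes raises the throughput that must be provisioned.
more_writes = required_rus_per_second(reads_per_s=500, writes_per_s=1000,
                                      queries_per_s=5)

# The same workload if each query turns out to cost 50 RUs instead of 10.
pricier_queries = required_rus_per_second(reads_per_s=500, writes_per_s=500,
                                          queries_per_s=5, ru_per_query=50.0)

print(baseline, more_writes, pricier_queries)
```

The estimate is only as good as the per-operation costs fed into it, which is the hard part: a query that costs a few times more than assumed shifts the total noticeably, and there is no separate knob for write capacity alone.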
CosmosDB also provides a cool push-button global data distribution that makes it really simple to replicate data in multiple data centers across the world. However, it can also be easily achieved in a matter of minutes in Couchbase Server without the limitation of running only in Azure.
Benchmarking is difficult because of CosmosDB’s RU model, but a third-party benchmark using the YCSB approach shows Couchbase Capella’s clear advantage in throughput and latency.
CosmosDB’s pricing is attractive if you have a small database with few reads/writes per second, but anything above that can cost a lot. CosmosDB’s price calculator shows that a 50/50 mix of reads and writes, plus a handful of queries per second, can add up to thousands of dollars per month. The calculator is helpful, but it’s somewhat unreliable because RUs are difficult to predict (as mentioned earlier). It also does not consider the consistency model you are going to use, so you have to add a few extra dollars to this number for strong consistency.
Couchbase Capella pricing is much more predictable, and will often be lower cost, especially for larger, mission-critical use cases.