Comparing Couchbase vs CosmosDB

Microsoft has generated a lot of buzz since the launch of CosmosDB. It is basically a rebranding of Azure DocumentDB with some cool new features. Let’s go a little deeper into it and explore its strategy, its documentation, what developers have been saying about it, and how it compares with Couchbase Server.

One Database to rule them all?

In simple words, Microsoft claims that CosmosDB is a NoSQL database able to do literally everything: it is a document database, a columnar store, a key-value store and a graph database, all achieved thanks to an abstraction of the data format called atom-record-sequence (ARS).

A good sign of Microsoft’s work is how data is organized differently according to each model. First, you have to choose the API you would like to use (SQL, MongoDB API, Microsoft Azure Table, Cassandra or Gremlin) and stick with it, as it can’t be changed later. Currently, you can still try to access some models through the DocumentDB API, which is what gave me some hints about how CosmosDB internally uses a decorated JSON format to store its data.

It looks like Microsoft wants to compete with most of the NoSQL databases out there, which is a really risky strategy, as we might have passed the golden era of a single database solution for everything. There are huge benefits in choosing specialized storage, and this is the path most applications are following right now with the rise of polyglot persistence. An all-in-one solution like CosmosDB might be good for low-demand applications, but all those abstractions come at a cost: they ultimately hurt simplicity and performance and limit the available features.

Couchbase vs CosmosDB – Comparing Apples with “Apples”

I will limit my comparison to scenarios where it makes sense to compare both technologies. The table below shows some of the differences side by side:

Feature	CosmosDB	Couchbase Server
Licensing	Proprietary	Open-source Community and Enterprise editions
Type	Key-value store; document database; graph database; columnar storage	Key-value store; full document database; managed cache; mobile database
Model	Up to 2 MB document size; atom-record-sequence (ARS)	Up to 20 MB document size; JSON documents; supports BLOBs
Search	Azure Search	Internal full-text search engine; Elasticsearch
Indexing	All fields are indexed by default using GSI; indexes can also be customized by the user	Unlimited number of global and secondary indexes; any field can be added to an index; memory-optimized indexes
Data Integrity	Strongly consistent; bounded staleness; session (default); consistent prefix; eventually consistent	Strongly consistent; eventually consistent between regions
Scalability	Highly scalable	Highly scalable
Mobile	Not on their priority list at the moment	Couchbase Lite and Sync Gateway
Deployment	Azure only, fully managed; development version available only on Windows	Can be deployed anywhere; full Kubernetes/OpenShift support will be introduced in the next release
Locking	Optimistic locking; pessimistic locking	Optimistic locking; pessimistic locking
Backup & Restore	Automatically done every 4 hours, but only the latest 2 snapshots are stored; for backups older than 12 hours you currently have to use their Database Migration Tool	Merge, incremental and full backup; restore with automatic conflict resolution
Querying	Querying depends on the chosen API, and each API has different limitations	SQL-like querying with N1QL; Map/Reduce views
Data Center Replication	Push-button global bidirectional replication	Unidirectional and bidirectional; allows data filtering; replication between differently sized clusters
Throttling	Only by increasing RUs	Memory-first reads/writes; cache layer is transparent; tuning is done by increasing memory or disk or by adding a new node; memory-optimized indexes
Administration Interface	Feature-rich administration interface; provides a few endpoints to automate some tasks; some features aren’t well documented	Flexible and feature-rich administration interface; the same version can be used in production and development; powerful Query Workbench; all implemented using RESTful APIs
Sharding	Partitioning is done automatically, but you are required to pick a partition key	Sharding is done automatically under the covers
Security	Provided by normal Azure security measures	Role-based access control; audit logs for management tasks; encryption at rest
Architecture	Unknown	All nodes are master
Integrations	Spark, Hadoop; other Azure services; MongoDB connector; Cassandra; Gremlin	Spark, Hadoop; Kafka; Cloudera/Databricks/Hortonworks; Datadog; Power BI/Tableau/Talend; many others
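
The "Optimistic locking" entry in the Locking row can be illustrated with a small compare-and-swap (CAS) loop. This is only a sketch of the general technique: the in-memory store, its method names and the add_points helper are all hypothetical and are not the API of either database.

```python
import itertools


class CasMismatch(Exception):
    """Raised when the caller's CAS value no longer matches the stored one."""


class InMemoryStore:
    """Toy key-value store illustrating optimistic locking; not a real client."""

    def __init__(self):
        self._data = {}                      # key -> (cas, value)
        self._next_cas = itertools.count(1)  # monotonically increasing CAS values

    def get(self, key):
        """Return (cas, value) for the key."""
        return self._data[key]

    def upsert(self, key, value):
        """Unconditional write; returns the new CAS."""
        cas = next(self._next_cas)
        self._data[key] = (cas, value)
        return cas

    def replace(self, key, value, cas):
        """Write only if nobody changed the document since `cas` was read."""
        current_cas, _ = self._data[key]
        if current_cas != cas:
            raise CasMismatch(key)
        return self.upsert(key, value)


def add_points(store, key, points, max_retries=5):
    """Read-modify-write loop that retries on a CAS mismatch."""
    for _ in range(max_retries):
        cas, doc = store.get(key)
        updated = {**doc, "points": doc["points"] + points}
        try:
            return store.replace(key, updated, cas)
        except CasMismatch:
            continue  # someone else wrote first; reload and try again
    raise RuntimeError("too many concurrent updates")
```

No lock is held between the read and the write; a conflicting writer is detected only at write time, which is why the caller needs a retry loop.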

Conclusion

I think this is the very first article comparing CosmosDB with another database. It took me a good amount of time to go through a lot of documentation, developer feedback, and some webinars.

My feeling, in general, is that CosmosDB has a great vision but is still immature in some aspects. Documentation and backups, for instance, are not among its strengths, which is a natural consequence of building something that focuses on multiple fields at once. Microsoft’s database also brings a lot of innovations; one of the most prominent is its set of relaxed consistency levels: Bounded Staleness, Session, Consistent Prefix and Eventual.

The fact that Session is the default consistency level says a lot about the recommended way to use CosmosDB. It also hints that it might not be the best solution if you need strong data consistency.
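
To make the idea of session-level consistency more concrete, here is a toy model of the read-your-own-writes guarantee usually associated with it. It is a sketch under my own assumptions: the Replica and Session classes and the version-token mechanism are hypothetical illustrations, not a description of CosmosDB internals.

```python
class Replica:
    """Toy replica that applies writes in order and may lag behind."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


class Session:
    """Client session that remembers the version of its own latest write."""

    def __init__(self):
        self.token = 0

    def write(self, primary, key, value):
        primary.apply(primary.version + 1, key, value)
        self.token = primary.version

    def read(self, replicas, key):
        # Serve the read from the first replica that has caught up to our token.
        for replica in replicas:
            if replica.version >= self.token:
                return replica.data.get(key)
        raise RuntimeError("no replica has caught up with this session yet")
```

A session that has just written skips replicas that have not yet seen its write, while a different session (token 0) may still be served stale data from a lagging replica. That gap is the difference between session and strong consistency.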

I could not find any mention of caching mechanisms in CosmosDB, so I am assuming that caching is not a major part of the database. The problem is that caching is crucial for good performance in strongly consistent databases; being memory-first is one of the reasons why Couchbase Server is blazing fast.

CosmosDB does not provide memory-optimized indexes, and by default all fields are indexed in its Global Secondary Indexes (GSI). That sounds like overkill to me, as I still think it is easier to specify which fields I want indexed than which fields I don’t. Of course, you don’t necessarily need to remove those fields from the index, but don’t forget that you are being charged for it.
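
The difference between the two approaches can be sketched as follows. The first structure follows the general shape of a CosmosDB indexing policy (index everything, then exclude paths); the second is a N1QL statement that creates a single index explicitly. The field, bucket and index names are made up for illustration, so treat both as assumptions rather than copy-paste configuration.

```python
import json

# Opt-out style: every path is indexed unless it is explicitly excluded.
# "description" is a hypothetical field name.
opt_out_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/description/*"}],
}

# Opt-in style: no secondary index exists until one is created on chosen fields.
# The bucket `users` and the index name are hypothetical.
opt_in_statement = "CREATE INDEX idx_users_email ON `users`(email)"

print(json.dumps(opt_out_policy, indent=2))
print(opt_in_statement)
```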

Sharding currently seems to be one of the trickiest things in CosmosDB. Partitions are moved automatically among nodes, but you still have to specify a partition key. The drawback of this approach is that each partition is indivisible, with a maximum size of 10 GB. If you pick a bad partition key, a lot of frequently accessed documents might end up in the same partition, which limits your read/write throughput to the capacity of the node where that partition is stored.

The partition key is also immutable, so in order to change it you have to copy all your data to another collection. In Couchbase, we transparently distribute your documents evenly between vBuckets to avoid this problem and to increase your read/write performance.
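
The effect of a poorly chosen partition key can be shown with a small simulation. This is a sketch only: the partition count, the sample documents and the use of CRC32 as a stand-in hash are my assumptions for illustration, not the actual placement algorithm of either database.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # illustrative; real systems use many more


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribution(keys):
    """Count how many documents land in each partition."""
    return Counter(partition_for(k) for k in keys)


# 10,000 hypothetical orders, 90% of them from a single country.
orders = [
    {"id": f"order::{i}", "country": "US" if i % 10 else f"C{i % 7}"}
    for i in range(10_000)
]

by_country = distribution(o["country"] for o in orders)  # low-cardinality key
by_id = distribution(o["id"] for o in orders)            # high-cardinality key

print("largest partition share, keyed by country:", max(by_country.values()) / len(orders))
print("largest partition share, keyed by id:     ", max(by_id.values()) / len(orders))
```

Keying by country sends at least 90% of the documents to a single partition, whereas hashing the document ID spreads them across all partitions. The second case is the behaviour you get when the database hashes document keys for you.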

Currently, throttling is done only by increasing Request Units (RUs), which is a common standard for fully managed databases (on DynamoDB, for instance, throttling is done by increasing read/write capacity units). The challenge with this approach is that it is not a very good predictor of query performance, and it makes it even harder to boost one specific behavior, such as increasing only the write capacity.

Microsoft has put a lot of effort into making RU provisioning easy to understand, but I have found many comments from developers who underestimated their RUs (like here or here) and ended up with a bill much higher than expected. In general, the provisioning pattern I have seen in CosmosDB is mostly based on trial and error. On Couchbase, throttling is very flexible: it can be done by vertical or horizontal scaling, by running specific services according to the node hardware, by keeping indexes in memory, etc.

Microsoft is also clearly trying to convince MongoDB’s users to migrate to CosmosDB. They even provide a fairly compatible connector to make the migration easier. The problem is that the main reason some users are willing to migrate to other databases is MongoDB’s scalability and performance issues. We know this very well because many of those users end up migrating to Couchbase Server, and CosmosDB performance does not seem to be a big plus, at least not for a reasonable cost.

Microsoft does provide a limited local version for development, but so far it runs only on Windows machines.

CosmosDB also provides a cool push-button global data distribution that makes it really simple to replicate data to multiple locations around the world. However, this is not something you do daily, so it hardly requires such simplicity, and the same result can be achieved in a matter of minutes in Couchbase Server without the limitation of running in a single cloud.

In summary, I agree with CosmosDB’s point of view that eventual consistency is too broad a definition. Their new consistency models let developers choose the level of consistency their application tolerates.

The reasons to use it are nearly the same as the ones mentioned in my article about DynamoDB. The main difference, of course, is that CosmosDB is much more flexible than DynamoDB. Right now it is an average multi-purpose database for applications demanding average performance with strong consistency. It also integrates easily with some features of Azure Functions.

CosmosDB still lacks famous use cases/clients, but it has the potential to stand out in applications with eventual consistency, as that seems to be its main focus. But when it comes to strongly consistent applications with medium to high demand, Couchbase Server is by far the better choice, from both the price and the performance points of view.

It is hard to come up with a fair benchmark between those two databases, as it’s unclear, for instance, how many servers are running when you provision 30,000 RUs in CosmosDB, so the easiest way to predict their expected performance is through their architecture and features.

Pretty much like DynamoDB, CosmosDB pricing is attractive if you have a small database with few reads/writes per second. But anything above that will cost you a good amount of money: 200,000 documents of 45 KB, with 4 writes/sec and 40 reads/sec, will cost at least US$ 2,500.

Their calculator does not consider the consistency model you are going to use, so you have to add a few extra dollars to this number for strong consistency. In this setup, the CosmosDB cost is at least double what you would spend to run Couchbase EE on Amazon Web Services with our recommended architecture (which is capable of handling more than that).
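
Because Request Units measure throughput rather than money, it helps to separate the two when estimating a bill. The sketch below assumes provisioned throughput is billed per 100 RU/s per hour and ignores storage and any other charges. The function name and the hourly rate are placeholders of mine, not quoted prices, so check the current Azure pricing page before relying on any number it prints.

```python
HOURS_PER_MONTH = 730  # common billing approximation (8,760 hours / 12)


def monthly_throughput_cost(ru_per_second: int,
                            price_per_100_ru_hour: float,
                            hours: int = HOURS_PER_MONTH) -> float:
    """Monthly cost of provisioned throughput, ignoring storage and egress."""
    units_of_100 = ru_per_second / 100
    return units_of_100 * price_per_100_ru_hour * hours


# ASSUMPTION: placeholder rate, not a quoted price.
ASSUMED_RATE = 0.008  # US$ per 100 RU/s per hour

for rus in (1_000, 30_000):
    cost = monthly_throughput_cost(rus, ASSUMED_RATE)
    print(f"{rus:>6} RU/s -> US$ {cost:,.2f} per month")
```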

As I mentioned at the beginning of the article, there are a lot of advantages in choosing specialized storage for each specific purpose, and Couchbase Server really excels at delivering high performance with strong consistency.

If you have any questions, feel free to tweet me at @deniswsrosa

1 Comment

defusenik

November 14, 2021 at 10:02 am

Really inaccurate on the pricing front for Cosmos DB. 40 reads/sec and 40 writes/sec at 45kb docs requires nearly 2500 RUs (request units) – they are not dollars! They are a throughput measure. Cost for that workload is around $130 / month.

The key advantage, like every PaaS service, is you don’t have to manage a bunch of VMs yourself and set every aspect of backups/replication/tuning/patching/etc. Also you’re not hit with a license cost as well as VM cost for your cluster. You only pay for the throughput you use. Depends if you want a service that requires minimal management, or if you want to tweak everything yourself.
