You may have heard that MongoDB has issues with scaling out. You may have heard that Viber is replacing MongoDB with Couchbase Server. Have you heard that MongoHQ has to scale up?
I agree with MongoHQ on one thing. It’s easier to scale up. While scaling up may be the only solution for MongoDB, it is not the right solution. What happens when there is no more room to scale up? I think that MongoHQ would prefer to scale out, but they realized that MongoDB was not engineered for it.
It results in a loss of functionality…
In MongoDB, for instance, horizontal scales means losing out on unique secondary indexes (one of the few schema constraints MongoDB currently supports) for a given collection.
It requires difficult decisions…
In MongoDB, this choice is the “shard key decision” that you will hear experienced MongoDB users talk about with undertones of dread.
It is complex to configure and maintain…
But once you’ve got your sharded database into production, then there’s the underlying reality that your system has more “moving parts” in it than might be ideal because more moving parts mean more things can go wrong.
It results in a loss of performance…
In a sharded setup, the network connectivity between the Mongo router, config servers, and each shard influences overall performance.
Couchbase Server, on the other hand, was engineered to scale out. It does not result in a loss of functionality. It does not require difficult decisions. It is not complex to configure and maintain. It does not result in a loss of performance.
- Couchbase Server does not require manual configuration of sharding. It is automatic.
- Couchbase Server does not have a lot of moving parts. There is one node type.
- When Couchbase Server is scaled out, the performance is increased.
I think the section on oplog tailling was rather troublesome. I think it highlights the fact that MongoDB was not engineered to scale out. It turns out that administrators and developers rely on a log file for insight and integration. The problem with scaling out MongoDB is that is requires developers to tail multiple log files. There is a log file per node.
Couchbase Server, on the other hand, does not require administrators and developers to tail a log file for insight and integration. Administrators can monitor Couchbase Server from any node. Developers can integrate with Elasticsearch or Apache Hadoop without having to interact with individual Couchbase Server nodes. The topology is transparent. That’s the benefit of a distributed system with a shared nothing architecture.
It’s worth pointing out that Couchbase Server supports cross data center replication. It allows the data in a cluster to be replicated to a different cluster. In fact, this is how we integrate with Elasticsearch. It is as a different cluster that Couchbase Server can replicate data to. That it is an Elasticsearch cluster is irrelevant. It is simply a different cluster to replicate data to.
Why We Start Scaling Vertically (Google Web Cache)
Note: MongoHQ as removed the original post and is now redirecting to a new one.
Note: This includes an overview of the moving parts within a MongoDB deployment that is scaled out.