MongoDB Rules Single Node Deployments, Fails to Scale

Shane Johnson, Director, Product Marketing, Couchbase

April 1, 2015


MongoDB published an independent benchmark comparing the performance of MongoDB, Apache Cassandra, and Couchbase Server in single-node deployments to counter the one we published comparing the performance of MongoDB and Couchbase Server in 9-node deployments. MongoDB performs well when it 1) is limited to a single node, 2) doesn’t store a lot of data, and 3) doesn’t support a lot of users. This is MongoDB’s sweet spot.

MongoDB raised awareness of NoSQL databases by making it easy for developers to build a proof of concept or small application. However, MongoDB can’t meet the rigorous demands of production deployments. Couchbase Server, on the other hand, shines when deployed as a distributed database. It scales with ease to store more data, support more users, and provide higher throughput and lower latency access to data.

1) Single Node Benchmark Fails to Address Scalability Requirements

If you want to see how well a database will perform with a small data set and a few users, benchmark it with a single node deployment. If you want to see how well it will perform in a production environment with a large data set and many users, benchmark it with a clustered deployment.

It’s important not only to measure performance at scale, but also to measure it while meeting enterprise requirements such as high availability.

Why didn’t MongoDB compare the performance of distributed deployments? Well, it’s difficult for MongoDB to scale beyond a single node.

As noted by InformationWeek, scaling is not linear. Adding nodes to a MongoDB replica set will not increase write performance, because every write is still executed by a single node: the primary. The same is true for MongoDB shards, where every write is still executed by the primary node of its shard. If there are three shards with three nodes per shard, all writes will be executed by just the three primary nodes.
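To make the write-scaling arithmetic concrete, here is a minimal counting model, assuming (as the paragraph states) that only the primary of each replica set executes writes. The `Shard` class and its fields are hypothetical illustrations, not part of any MongoDB driver or API.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical replica set: one primary plus some number of secondaries."""
    secondaries: int

    @property
    def nodes(self) -> int:
        return 1 + self.secondaries

    @property
    def write_capable(self) -> int:
        # Assumption from the text: only the primary executes writes.
        return 1


def total_nodes(shards: list[Shard]) -> int:
    return sum(s.nodes for s in shards)


def write_capable_nodes(shards: list[Shard]) -> int:
    # Secondaries add read/failover capacity in this model, but no write capacity.
    return sum(s.write_capable for s in shards)


single_replica_set = [Shard(secondaries=2)]              # 3 nodes, 1 primary
bigger_replica_set = [Shard(secondaries=4)]              # 5 nodes, still 1 primary
three_shards = [Shard(secondaries=2) for _ in range(3)]  # 9 nodes, 3 primaries

for name, cluster in [("3-node replica set", single_replica_set),
                      ("5-node replica set", bigger_replica_set),
                      ("3 shards x 3 nodes", three_shards)]:
    print(f"{name}: {total_nodes(cluster)} nodes total, "
          f"{write_capable_nodes(cluster)} write-capable")
# 3-node replica set: 3 nodes total, 1 write-capable
# 5-node replica set: 5 nodes total, 1 write-capable
# 3 shards x 3 nodes: 9 nodes total, 3 write-capable
```

In this model, growing a replica set from three to five nodes leaves write capacity unchanged; only adding shards adds primaries.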

2) Benchmark Applied a Different Write Scenario for Couchbase Server – Not an Apples-to-Apples Comparison

MongoDB performed one operation per write – the update. However, MongoDB inadvertently had Couchbase Server perform two operations per write: one read, one update. This limited the write performance of Couchbase Server.

MongoDB (with WiredTiger) and Couchbase Server both use document-level locking. If two clients update the same document at the same time, one of them will fail and will have to retry the update. This is the case for both MongoDB and Couchbase Server. It was the write scenario MongoDB used for itself in this benchmark, but not for Couchbase Server.

Another write scenario is when you need to ensure a client can’t update a document if it’s already been updated by a different client. In this write scenario, Couchbase Server supports compare-and-swap, while MongoDB recommends the “Update Document if Current” pattern. This was the write scenario for Couchbase Server, but not for MongoDB.
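The two approaches can be contrasted with a small in-memory sketch. `ToyStore`, `CasMismatch`, and the `version` field are hypothetical names for illustration; neither the Couchbase SDK nor the MongoDB driver is used here. `replace` models compare-and-swap (the write succeeds only if the CAS value the client read is still current), while `update_if_current` models the “Update Document if Current” pattern (the write succeeds only if fields the client read still hold the values it saw). A real server performs each check and write atomically; this single-threaded sketch only shows the decision logic.

```python
import itertools
from copy import deepcopy


class CasMismatch(Exception):
    """Raised when a replace is attempted with a stale CAS value."""


class ToyStore:
    """Hypothetical in-memory store; not the Couchbase SDK or the MongoDB driver."""

    def __init__(self):
        self._docs = {}                      # key -> (document, cas)
        self._next_cas = itertools.count(1)  # monotonically increasing CAS values

    def insert(self, key, doc):
        self._docs[key] = (deepcopy(doc), next(self._next_cas))

    def get(self, key):
        doc, cas = self._docs[key]
        return deepcopy(doc), cas

    def replace(self, key, doc, cas):
        """Compare-and-swap: write only if the CAS value the client read is still current."""
        if cas != self._docs[key][1]:
            raise CasMismatch(key)
        new_cas = next(self._next_cas)
        self._docs[key] = (deepcopy(doc), new_cas)
        return new_cas

    def update_if_current(self, key, expected, changes):
        """'Update Document if Current': apply `changes` only if every field in
        `expected` still holds the value the client read. Returns True on success."""
        doc, _ = self._docs[key]
        if any(doc.get(name) != value for name, value in expected.items()):
            return False
        self._docs[key] = ({**doc, **changes}, next(self._next_cas))
        return True


store = ToyStore()
store.insert("doc1", {"qty": 5, "version": 1})

# Compare-and-swap: two clients read the same CAS value; only the first write wins.
doc_a, cas_a = store.get("doc1")
doc_b, cas_b = store.get("doc1")
doc_a["qty"] = 4
store.replace("doc1", doc_a, cas_a)
doc_b["qty"] = 3
try:
    store.replace("doc1", doc_b, cas_b)
    second_write_ok = True
except CasMismatch:
    second_write_ok = False

# Update Document if Current: the second writer's expected version is stale.
ok_first = store.update_if_current("doc1", {"version": 1}, {"qty": 2, "version": 2})
ok_stale = store.update_if_current("doc1", {"version": 1}, {"qty": 1, "version": 2})

print(second_write_ok, ok_first, ok_stale, store.get("doc1")[0])
# False True False {'qty': 2, 'version': 2}
```

Both checks reject a stale writer; the difference is whether the guard is an opaque server-assigned CAS value or application fields the client chose to match on.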

Why would MongoDB have Couchbase Server perform compare-and-swap, but not implement its own “Update Document if Current” pattern?

3) Benchmark Utilized Outdated Couchbase Client, Rather than Current Client

MongoDB chose to use an outdated client library released in 2013 for Couchbase Server, which limited the performance of Couchbase Server. We released a new client library last September, built on Netty and RxJava, followed by minor releases in February and March.

Why would MongoDB benchmark Couchbase Server with an outdated client library but benchmark itself with its latest client library?

4) Single Node Durability versus Distributed Database Durability

The point of durability is to ensure data is not lost when a server fails. Because this benchmark was performed with single-node deployments, data could only be made durable by writing it to disk. That is the same limitation traditional relational databases have.

Today, distributed databases rely on a modern approach to durability that distributes the risk of data loss: they replicate data to multiple nodes. Couchbase Server is unique in that, while it writes to disk like a conventional database, it also uses faster memory-to-memory replication between nodes. The data is not only durable, it’s highly available. It can be replicated to nodes on different servers, different racks, or different data centers.
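A toy model of replication-based durability follows. It assumes a write is acknowledged only after a configurable number of replica nodes hold a copy in memory, and it treats a failed node as unreachable. `Node`, `write`, `replicate_to`, and `read_after_failover` are hypothetical names; disk persistence is deliberately left out, so this is not a model of Couchbase Server’s actual replication protocol.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical node holding documents in memory."""
    name: str
    memory: dict = field(default_factory=dict)
    alive: bool = True


def write(active: Node, replicas: list[Node], key: str, value: str,
          replicate_to: int) -> bool:
    """Store on the active node, copy memory-to-memory to live replicas, and
    acknowledge only if at least `replicate_to` replicas hold the value."""
    active.memory[key] = value
    copies = 0
    for replica in replicas:
        if replica.alive:
            replica.memory[key] = value
            copies += 1
    return copies >= replicate_to


def read_after_failover(nodes: list[Node], key: str) -> Optional[str]:
    """Return the value from any surviving node, or None if every copy was lost."""
    for node in nodes:
        if node.alive and key in node.memory:
            return node.memory[key]
    return None


# Single node: the only in-memory copy lives on the node that fails.
solo = Node("solo")
write(solo, [], "user::1", "Ada", replicate_to=0)
solo.alive = False
single_node_result = read_after_failover([solo], "user::1")

# Three nodes: the write reaches two replicas before it is acknowledged.
a, b, c = Node("a"), Node("b"), Node("c")
acked = write(a, [b, c], "user::1", "Ada", replicate_to=2)
a.alive = False
cluster_result = read_after_failover([a, b, c], "user::1")

print(single_node_result, acked, cluster_result)   # None True Ada
```

The single-node case can only survive a failure by persisting to disk, which is the limitation described above; the replicated case keeps the value readable from a surviving node.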

That being said, if MongoDB had used the latest client library, Couchbase Server write performance would have been at least 10x higher. The two-year-old client library (1.1.8) waited a minimum of 100ms before checking whether the write had been written to disk. A later release (1.4.x) reduced this to 10ms, and the latest release (2.x) reduces it to 10µs. That’s why you should always benchmark databases with their latest client libraries, not two-year-old ones.
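The effect of the polling interval can be estimated with simple arithmetic. Assuming a single client thread issues durable writes one at a time and each write waits at least one polling interval (ignoring actual disk and network latency), the interval alone caps that thread’s throughput at one write per interval. This is a back-of-the-envelope ceiling, not a measured result; the function name is hypothetical.

```python
# Minimum wait per durable write before the client first checks persistence,
# per client library version, in microseconds (figures from the paragraph above).
POLL_INTERVAL_US = {
    "1.1.8": 100_000,   # 100 ms
    "1.4.x": 10_000,    # 10 ms
    "2.x": 10,          # 10 µs
}


def max_sequential_durable_writes_per_sec(poll_interval_us: int) -> int:
    """Upper bound for one thread issuing durable writes back to back,
    assuming each write costs at least one polling interval and nothing else."""
    return 1_000_000 // poll_interval_us


for version, interval in POLL_INTERVAL_US.items():
    ceiling = max_sequential_durable_writes_per_sec(interval)
    print(f"{version}: at most {ceiling:,} durable writes/s per thread")
# 1.1.8: at most 10 durable writes/s per thread
# 1.4.x: at most 100 durable writes/s per thread
# 2.x: at most 100,000 durable writes/s per thread
```

Under these assumptions the ceiling rises from 10 to 100 writes per second between 1.1.8 and 1.4.x, the 10x factor cited above; real throughput also depends on concurrency and disk speed.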

MongoDB Rules Single Node Deployments

MongoDB is well suited to a proof of concept or small application that has a small data set and a handful of users. Couchbase Server is better suited to applications with more data, more users, and higher throughput / lower latency requirements, which benefit from a distributed deployment. In fact, Couchbase Server is often selected to power mission-critical applications (small or large, consumer or enterprise, social or gaming) where traditional relational databases fail to provide the required scalability or performance.


FYI – We benchmarked MongoDB and Couchbase Server with 9-node deployments.


Posted in: Uncategorized

Author

Posted by Shane Johnson, Director, Product Marketing, Couchbase

Shane K Johnson was the Director of Product Marketing at Couchbase. Prior to Couchbase, he held various roles in development and evangelism, with a background in Java and distributed systems. He has consulted with organizations in the financial, retail, telecommunications, and media industries to draft and implement architectures that relied on distributed systems for data and analysis.


16 responses

Ownee

April 1, 2015 at 14:41

It’s marketing to do a performance test where the opponent will always lose, but to respond without a performance test is a bit like shouting “No, we are better! We promise! Don’t trust them!”

1. shanekj
  
  April 1, 2015 at 16:13
  
  I’m not sure I understand. What did you mean by responding without a performance test?
  
2. J Chris Anderson
  
  April 1, 2015 at 16:54
  
  For the record, the benchmark MongoDB promoted (that this blog post is a reply to) was run to counter an earlier benchmark with larger (9 node) clusters. The benchmark is available here: https://news.avalonconsult.com/…
  
  The TL;DR is that for highly concurrent workloads on a nine-node cluster, Couchbase Server handles far more traffic than MongoDB without response times slowing down. This is more representative of the production workloads our customers care about than a single-node drag race would be.
  
Akmal Chaudhri

April 1, 2015 at 15:25

Good points Shane but I am reminded of something Mike Stonebraker said many years ago: “… any person who designs a benchmark is in a ‘no win’ situation, i.e. he can only be criticized. External observers will find fault with the benchmark as artificial or incomplete in one way or another. Vendors who do poorly on the benchmark will criticize it unmercifully.” What would be really nice to see, IMHO, are real customer benchmarks rather than just more YCSB numbers. But having worked on database performance benchmarks in the past, I know the difficulties involved.

1. shanekj
  
  April 1, 2015 at 16:12
  
  Ralph, how was it deceitful?
  
2. shanekj
  
  April 1, 2015 at 16:16
  
  Well, Bud Light continues to be the most popular beer in the country. However, it’s not a good beer.
  
3. Akmal Chaudhri
  
  April 1, 2015 at 16:56
  
  Ralph, I don’t work for Couchbase. I can’t comment on the recent Avalon benchmark report, as I have not read it yet. Be careful with DB-engines.com: it is a popularity rating that includes web mentions and searches, and it says nothing about installation numbers.
  
Opsy

April 1, 2015 at 16:26

YCSB updates one random field out of ten during updates.

MongoDB has an update that selectively updates a single field (just like a SQL UPDATE statement); Couchbase doesn’t have that.

So if you don’t first read the document, you would overwrite the existing ten fields with one new one (which is what the Thumbtack benchmark you published last year did). Your own Avalon benchmark read the document in order to replace one of the fields with a new value, but then replaced the existing document, overwriting any updates that happened in the meantime. Using CAS is the correct way to actually preserve all the updates. In MongoDB, as in an RDBMS, CAS is needed only when you are updating a document over a long period of time (a human editing a document, for example).
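The lost-update problem described in this comment can be reproduced with plain dictionaries standing in for a database. The record shape (`field0` to `field9`), the key `"k"`, and the helper functions are hypothetical. Case 1 is a read followed by a full-document replace with no CAS check, case 2 is a field-level update in the spirit of `$set`, and case 3 is a read-modify-write guarded by a CAS value that re-reads and retries on conflict.

```python
from copy import deepcopy


def fresh_record():
    # Hypothetical YCSB-style record: ten fields named field0..field9.
    return {f"field{i}": "old" for i in range(10)}


# Case 1: both clients read, modify different fields, then replace the whole
# document with no CAS check. The last writer wins.
db = {"k": fresh_record()}
copy_1, copy_2 = deepcopy(db["k"]), deepcopy(db["k"])
copy_1["field6"] = "client-1"
copy_2["field3"] = "client-2"
db["k"] = copy_1
db["k"] = copy_2                      # silently discards client 1's change
case_1 = (db["k"]["field6"], db["k"]["field3"])

# Case 2: field-level updates in the spirit of $set / SQL UPDATE; no read needed.
db = {"k": fresh_record()}
db["k"]["field6"] = "client-1"
db["k"]["field3"] = "client-2"
case_2 = (db["k"]["field6"], db["k"]["field3"])

# Case 3: read-modify-write guarded by a CAS value; a stale writer re-reads and retries.
db, cas = {"k": fresh_record()}, {"k": 1}


def replace_if_cas(key, doc, expected_cas):
    """Write only if nobody has written since `expected_cas` was read."""
    if cas[key] != expected_cas:
        return False
    db[key], cas[key] = deepcopy(doc), cas[key] + 1
    return True


def update_field_with_cas(key, field_name, value, snapshot):
    doc, seen_cas = snapshot
    doc[field_name] = value
    while not replace_if_cas(key, doc, seen_cas):
        doc, seen_cas = deepcopy(db[key]), cas[key]   # re-read the current version
        doc[field_name] = value


snap_1 = (deepcopy(db["k"]), cas["k"])   # both clients read the same version
snap_2 = (deepcopy(db["k"]), cas["k"])
update_field_with_cas("k", "field6", "client-1", snap_1)
update_field_with_cas("k", "field3", "client-2", snap_2)   # conflict detected, retried
case_3 = (db["k"]["field6"], db["k"]["field3"])

print(case_1)   # ('old', 'client-2')
print(case_2)   # ('client-1', 'client-2')
print(case_3)   # ('client-1', 'client-2')
```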

1. shanekj
  
  April 1, 2015 at 16:38
  
  To be honest, I think the $set command is useful when the write scenario includes updates to different fields in the same document by multiple clients in the same window who do not read the document first.
  
  1. opsy
    
    April 1, 2015 at 17:30
    
    You are apparently not familiar with YCSB. The update only provides one field with a new value. If you don’t read the full document, how do you update it in Couchbase?
    
    What is Couchbase equivalent of UPDATE ycsbtable SET field6=newvalue WHERE primarykey=9999?
    
    1. shanekj
      
      April 1, 2015 at 18:09
      
      You are correct. Like I said, I think the $set operation is useful for applications that update specific fields without ever reading the document first. For other applications, that may or may not be the case. For example, user profiles. If a user wants to update their profile, the application reads it first. It’s displayed, and the user edits it. With Couchbase Server, the application could modify the document it read based on user edits and then update it. With MongoDB, the application could use the $set command. In this context, the application would read the document first whether it was from MongoDB or Couchbase Server.
      
      1. Opsy
        
        April 3, 2015 at 20:42
        
        Your own benchmark did two operations: it read the document and then overwrote it, because there is no other way to update one field. And your own benchmark ignored conflicts with other threads.
        
        Your code is wrong.
        https://github.com/kruthar/YCS…
      2. shanekj
        
        April 4, 2015 at 0:11
        
        You don’t have to read the document first.
        
        The goal of YCSB is to measure the performance of an update operation. In a real-world application, the document would already have been read. For example, to populate an edit form. You measure read performance to estimate how long it will take to display the form. When the form is submitted, you modify the document and update it. You measure the update performance to estimate how long it will take to display the confirmation. You can create a document, generate the data for its fields with YCSB, and perform an update with it. Problem solved.
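A minimal sketch of the approach described above: build the complete replacement document on the client and write it in one operation, with no prior read. The ten 100-character fields named `field0` to `field9` are an assumption meant to resemble a YCSB-style record, and `generate_record` and `blind_full_update` are hypothetical helpers writing to a plain dict rather than to any real database client.

```python
import random
import string

FIELD_COUNT = 10      # assumed YCSB-style record shape
FIELD_LENGTH = 100


def generate_record(rng: random.Random) -> dict:
    """Build a complete replacement document locally; nothing is read from the store."""
    alphabet = string.ascii_letters + string.digits
    return {
        f"field{i}": "".join(rng.choice(alphabet) for _ in range(FIELD_LENGTH))
        for i in range(FIELD_COUNT)
    }


def blind_full_update(store: dict, key: str, record: dict) -> None:
    """Overwrite the whole document in a single write (one operation, no prior read)."""
    store[key] = record


store = {"user9999": {f"field{i}": "old" for i in range(FIELD_COUNT)}}
rng = random.Random(42)          # seeded so the generated record is reproducible
new_record = generate_record(rng)
blind_full_update(store, "user9999", new_record)
```

The trade-off raised earlier in this thread still applies: a blind full write replaces every field, so it is equivalent to a single-field update only when no other client modifies the same document concurrently.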
        
        The Avalon benchmark didn’t perform compare-and-swap operations for MongoDB or Couchbase Server. A partial update does not provide the same guarantee that a CAS operation does. CAS is an optional form of optimistic concurrency control that prevents a client from updating a document if it is unaware of previous updates.
        
        For example, the account for a credit card holder. First, a client checks the payment field to see if one has been made. It has not. Next, while the first client is checking the payment field, a second client updates the payment field to “received” because one was just processed. Finally, the first client updates the status field to “late” because it’s not aware of the update performed by the second client. While partial updates are nice to have, they would not solve this problem. They would allow for these two clients to update different fields at the same time, but the result would be invalid. That’s why we have CAS.
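The credit card scenario can be walked through with a hypothetical in-memory `Bucket` that tags each document with a CAS value. The key `account::42`, the field values, and the `Bucket` and `CasMismatch` classes are illustrative only, not the Couchbase SDK. Client 1’s stale write is rejected, so it re-reads the account, sees the payment, and does not mark it late.

```python
class CasMismatch(Exception):
    """Raised when a write is attempted with a stale CAS value."""


class Bucket:
    """Hypothetical in-memory store that versions each document with a CAS value."""

    def __init__(self):
        self.docs = {}   # key -> (document, cas)

    def get(self, key):
        doc, cas = self.docs[key]
        return dict(doc), cas

    def replace(self, key, doc, cas):
        _, current_cas = self.docs[key]
        if cas != current_cas:
            raise CasMismatch(key)
        self.docs[key] = (dict(doc), current_cas + 1)


bucket = Bucket()
bucket.docs["account::42"] = ({"payment": "none", "status": "open"}, 1)

# Client 1 reads the account and sees that no payment has been made.
acct_1, cas_1 = bucket.get("account::42")

# Meanwhile, client 2 records the payment.
acct_2, cas_2 = bucket.get("account::42")
acct_2["payment"] = "received"
bucket.replace("account::42", acct_2, cas_2)

# Client 1 tries to mark the account late based on its stale read.
acct_1["status"] = "late"
try:
    bucket.replace("account::42", acct_1, cas_1)
except CasMismatch:
    # Re-read and re-check the business rule before writing again.
    acct_1, cas_1 = bucket.get("account::42")
    if acct_1["payment"] == "none":
        acct_1["status"] = "late"
        bucket.replace("account::42", acct_1, cas_1)

print(bucket.get("account::42")[0])   # {'payment': 'received', 'status': 'open'}
```

Without the CAS check, client 1’s write would land and leave an account that is both paid and marked late, which is the invalid state described above.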
    2. shanekj
      
      April 1, 2015 at 18:29
      
      By the way, for Couchbase Server 4.0 it is this:
      https://docs.couchbase.com/deve…
      
      UPDATE ycsbtable USE KEYS "9999" SET field6 = "newvalue"
      
guest

April 3, 2015 at 20:02

At least their benchmark included the code used for everyone to see. If you ran your own benchmark, show your code.

1. shanekj
  
  April 3, 2015 at 23:32
  
  It should have included the URL for the GitHub repository. Here it is:
  https://github.com/kruthar/cou…
  