20% slower compared to MongoDB


We have SQL Server 2008 R2 in production and we are evaluating NoSQL database performance, comparing MongoDB and Couchbase. We have the same setup for the MongoDB and Couchbase servers.
• We are not getting even the same performance with Couchbase as we got with MongoDB; Couchbase is 20% slower than MongoDB. Reading through all the available white papers, we noticed claims that Couchbase is much faster than MongoDB.
• The migration (150 GB of data) from SQL Server to Couchbase is very, very slow using C# code and JSON format. Migration into MongoDB took 6 hours, but into Couchbase it took 3 days.
• Couchbase Server is taking 84 GB of space in one bucket to store 150 GB of data (from SQL Server), but MongoDB took only 16 GB.

Below are the configurations.

SAN – 500 GB
RAM – 20 GB
OS – Windows Server 2008 R2 (64 bit)

It would be great if someone from the Couchbase technical team could assist us.

First, could you share the server and SDK versions, and the hardware details for the Couchbase Server cluster?

Regarding 20% slower: what is your workload for the comparison? Could you share how you are generating the workload and what exactly you are measuring - latency, throughput, or both?

Regarding insert performance: if you can share your code for the data upload, I can look to see if we can optimize it. The difference suggests there may be some other inefficiency in the path.

Regarding the storage size difference: the storage optimizations of the two products are different. Couchbase uses append-only writes, so the file size may depend on how much activity was ongoing and occurred after compaction. Even after compaction, storage in Couchbase Server has optimizations that build storage-level indexes that differ from MongoDB's, so there can be a difference in storage sizes.

Just to be sure - are you using the same JSON document data model for both Couchbase and MongoDB? The two products tend to favor different document models, especially when it comes to embedding vs. referencing. JOINs are something we expect regularly, so we favor referencing, while MongoDB does not, so it may make sense to look at the modelling you have done as well.


Version: Couchbase SDK 2.2, product version 2.2.7

H/W details
SAN – 500 GB
RAM – 20 GB
OS – Windows Server 2008 R2 (64 bit)

Regarding 20% slower :
We have done some prototype work to compare SQL performance with NoSQL databases.
In our application we have a very heavy data process that is shifted into the NoSQL database (MongoDB and Couchbase).
The MongoDB and Couchbase servers both have the same configuration as I have mentioned above.
The only difference is that data inserted into MongoDB was in BSON format, while in Couchbase it was in JSON format. Otherwise there is no other difference in terms of code.

But during testing we found that
MongoDB gives a 20% improvement compared to the SQL database, but with Couchbase we do not get any performance improvement.

Regarding insert performance: in terms of code, we are using the code below to insert data into Couchbase, where cEff is a class that contains our data. This class has approximately 35 fields (of all types, e.g. date, int, double).

JavaScriptSerializer json = new JavaScriptSerializer();
string s = json.Serialize(cEff);
var upsert = bucket.Upsert<String>(idvalue.ToString(), s);

funny topic :slight_smile:

MongoDB gives a 20% improvement compared to the SQL database, but with Couchbase we do not get any performance improvement.

An emotional off-topic question: who do you think is stronger, the world boxing champion or the world weightlifting champion?

  1. Try simple thread-pooling, if the Couchbase SDK you are using allows nothing more than bucket.upsert()
  2. If the Couchbase SDK you are using allows async queries, use async queries with "bulk"-like insertion
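To illustrate the second point, here is a minimal sketch of bulk async insertion with bounded concurrency, assuming the Couchbase .NET SDK 2.x (`bucket.UpsertAsync`); the `items` parameter, the `BulkUpsertAsync` name, and the concurrency limit of 32 are all illustrative, not from the thread:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Couchbase.Core;

public static class BulkLoader
{
    // Sketch: upsert many documents concurrently, but cap the number
    // of in-flight operations so the client isn't overwhelmed.
    public static async Task BulkUpsertAsync(
        IBucket bucket,
        IEnumerable<KeyValuePair<string, object>> items,
        int maxConcurrency = 32)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();   // wait for a free slot
                try
                {
                    var result = await bucket.UpsertAsync(item.Key, item.Value);
                    if (!result.Success)
                        Console.WriteLine($"Upsert failed for {item.Key}: {result.Message}");
                }
                finally
                {
                    gate.Release();       // free the slot for the next item
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

This keeps many operations in flight (which Couchbase handles well) without the memory cost of starting all of them at once.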

@CouchBase2, it sounds like you are doing a single-node performance comparison with no replication or HA? Is that the configuration you expect to run in?

@cihangirb, yes, we are using a single-node comparison with no replication, because we are using the same configuration as for MongoDB. I feel that if you are claiming that Couchbase is much faster than MongoDB, then it should give better performance with the same configuration.
We are new to this area; maybe I have made some mistake in the configuration. For that reason we want your help. It would be great if someone from the Couchbase technical team could assist us.

@egrep, it's not a funny topic. We are using the same logic for MongoDB and we are getting better performance. I believe that if you are comparing with another technology, the logic should be the same for both.

First of all, I'm not part of the Couchbase team, just a user like you.


  1. Did you try http://developer.couchbase.com/documentation-archive ? All your answers are most likely there. Yes, it's time-consuming, but that's the price.
  2. If you want to "test single-threaded insertion into one server with your setup, data set, and code", then you already have your result: use Mongo (20% is cool), and don't ask anyone, since you are pretty sure of your tests, eh?
  3. Want a shortcut from someone who knows more, without reading the docs? Well, nothing to say here; you've got my previous advice, and perhaps someone else can give you more.

P.S. And there is no justice in the world, yeah.

We can certainly look into the configuration - can you share a cbcollect_info? I'd like to look into your bucket settings and your cluster settings as well.

For ingest, we like parallelisation, so Couchbase Server works better if you can increase concurrency. Is your ingest purely inserting new data, or are you also updating existing data?

I should note this: by default, we don't optimize for a single-node deployment of Couchbase Server, because the majority of customers deploy with HA and clustering across multiple nodes. There are a large number of optimizations you can do for a single-node deployment that aren't available in a distributed system, so the performance of a single-node deployment is not representative of a clustered deployment. I'd highly encourage you to test the deployment you actually expect to run.

I worked on SQL Server for many years, and we optimized that single monolithic engine on a single node greatly; on a single node a relational database can still beat a NoSQL database, but the scale boundaries stay limited to the single-node architecture. With Couchbase Server, the architecture assumes we run across nodes and across the network, and we optimize operations based on this assumption. Tuning for a single-node deployment requires a different approach.

The result of cbcollect_info is a zip file that is 150 MB in size. Let us know how we can send it to you for your analysis. We are sending a few screenshots; let us know if you can figure out anything that is not correct.

Screenshot of bucket that we are using in our code:

Data Buckets Screenshot:


We are also getting the error below:

Service ‘goxdcr’ exited with status 1. Restarting. Messages: MetadataService 2016-07-17T13:44:56.893+05:30 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get CBAuth database is stale: last reason: dial tcp ConnectEx tcp: No connection could be made because the target machine actively refused it., num_of_retry=3
MetadataService 2016-07-17T13:44:56.893+05:30 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get CBAuth database is stale: last reason: dial tcp ConnectEx tcp: No connection could be made because the target machine actively refused it., num_of_retry=4
RemoteClusterService 2016-07-17T13:44:56.893+05:30 [ERROR] Failed to get all entries, err=metakv failed for max number of retries = 5
Error starting remote cluster service. err=metakv failed for max number of retries = 5
[goport] 2016/07/17 13:44:56 c:/Program Files/Couchbase/Server/bin/goxdcr.exe terminated: exit status 1

@cihangirb and @egrep , any update on this?


Cihan is out on vacation and I’m following up on this while he’s out.
I have read the whole email thread.
I understand you have reported several issues in this email thread:

  1. MongoDB gives you 20% more throughput than SQL Server, but Couchbase didn't bring any performance improvement.
  2. Long migration time
  3. Disk size difference
  4. XDCR error message

I don’t think these issues are related and I would like to focus on issue #1 first.

I’m sorry but I may have to ask some duplicate/clarification questions.

  1. Which version of Couchbase are you using? CE or EE?
  2. I noticed that you only have 20 GB of RAM per node, but you set 8 GB for the data service and 12 GB for the indexer. Are you using the indexing service? I'm afraid there may be memory pressure.
  3. One thing Cihan asked that I'm not seeing an answer to is the actual workload you are stressing the system with. Is it read-heavy or write-heavy? Do you need persistence for writes? For MongoDB, are you using journaled writes?
  4. For reads, what's the cache miss ratio for Couchbase vs. MongoDB?
  5. For MongoDB, I also would like to understand your deployment architecture. I assume there's only one shard in your deployment; are you deploying active/replica sets on the same node? How many mongod nodes do you have?
  6. Another thing that would help is a CPU utilization comparison between Couchbase and Mongo. Can you please provide this to us?

Let me know if you have any questions.


@CouchBase2 -

Looking at this code, there is no need to serialize manually and then specify the string type as the generic parameter here; the SDK handles all serialization internally:

var upsert = bucket.Upsert<cEff>("thekey", new cEff{...});

Additionally, you can take advantage of the async/await methods to leverage the thread pool instead of doing this synchronously, something like this:

var tasks = new List<Task<IOperationResult<cEff>>>();
foreach (var x in listOfcEffs)
    tasks.Add(bucket.UpsertAsync(x.Id.ToString(), x));
var results = await Task.WhenAll(tasks);

You will likely want to partition listOfcEffs to improve GC behavior and memory usage.
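One way to do that partitioning is a simple fixed-size batching loop; this is a sketch only, the batch size of 1000 is arbitrary, and `listOfcEffs`/`cEff`/`bucket` come from the snippet above:

```csharp
// Process the list in fixed-size batches so only one batch of tasks
// (and its result objects) is alive at a time.
const int batchSize = 1000;
for (int offset = 0; offset < listOfcEffs.Count; offset += batchSize)
{
    var batch = listOfcEffs.Skip(offset).Take(batchSize);
    var tasks = batch
        .Select(x => bucket.UpsertAsync(x.Id.ToString(), x))
        .ToList();
    var results = await Task.WhenAll(tasks);
    // Optionally inspect results[i].Success here before starting the next batch.
}
```

Bounding each batch keeps memory flat regardless of how large the source list is.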



I'm a bit off topic, but I'm curious to know whether you have definitely decided to move away from relational databases (i.e. SQL Server) and, if so, what the rationale is?