Couchbase for all (new couchdb involved)

Larry3 · September 28, 2013, 6:30am

I am a bit lost these days about which tool to choose.

I really want to jump into the NoSQL era.

It is late 2013 and BigCouch has joined CouchDB.
Couchbase is pretty fast.

My thoughts:

CouchDB is better suited to all sorts of data mining and archiving.

It leans on disk.

Couchbase is better suited to storing login credentials, calculation summaries and all sorts of data that needs to be retrieved fast.

It leans on RAM.

Replication, clustering and auto-healing work equally well in the two products right now.

I want the simplest stack.
Could I use only Couchbase + ElasticSearch for everything? Isn’t it a good solution for data mining too?
What happens if the data cannot fit in memory? Say we have 15 TB of data and 4 servers with 128 GB of RAM each. How will that be handled in the long run? Is there a disk access strategy to relieve the RAM? (A real hybrid approach: some commands would rely only on disk, others on memory.)

All the blogs and forums refer to the former version of CouchDB (before Cloudant’s BigCouch made the jump) and so are totally outdated.

Please, a bit of “light” would help me

Thanks,

househippo · September 28, 2013, 4:50pm

Couchbase is a great solution for fast data. It is so fast because it keeps all the keys in memory and tries to keep as much of the data in memory as it can. Let's say you set up all 4 servers with no replicas and 100GB on each dedicated to 1 bucket, with a working set of 80% of memory: 4 x 100GB x 0.8 = 320GB. That is about 2% (320GB / 15TB) of the data in memory.
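
The arithmetic above can be checked with a small sketch. The function name and its parameters are made up for illustration, and it assumes decimal units (1 TB = 1000 GB) and no replicas, as in the example.

```python
def resident_fraction(nodes, bucket_gb_per_node, working_set_ratio, dataset_tb):
    """Return (GB of data that can stay in RAM, fraction of the whole dataset)."""
    cluster_quota_gb = nodes * bucket_gb_per_node        # total bucket RAM across the cluster
    resident_gb = cluster_quota_gb * working_set_ratio   # share of that RAM holding document data
    dataset_gb = dataset_tb * 1000                       # assumption: 1 TB = 1000 GB
    return resident_gb, resident_gb / dataset_gb


resident_gb, fraction = resident_fraction(
    nodes=4, bucket_gb_per_node=100, working_set_ratio=0.80, dataset_tb=15
)
print(f"{resident_gb:.0f} GB resident, {fraction:.1%} of the dataset")
```

With these inputs it prints 320 GB and about 2.1%. Changing `dataset_tb` to 45 shows how the resident share shrinks as the data grows.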

Larry3 · September 28, 2013, 5:23pm

So,
With 2%, we are going nowhere, right?

How long does a disk seek take? What if the requested data is never the same because of many different users, so the server RAM cannot keep up?

Say in a few months we have 45TB; we cannot afford 50TB of RAM!

househippo · September 28, 2013, 10:49pm

So having four 3TB HDs set up as RAID 1+0 (RAID 10) gives you 4x read / 2x write. A good spinning disk does about 120MB/sec, so that is 480MB/sec read and 240MB/sec write.
No matter whether you use a wide-column NoSQL store (Cassandra) or a key=>value store (Couchbase), realtime querying on 15TB of data will probably require 2-5x that size in indexing. Using ElasticSearch on 15TB is not easy either.
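
A rough sketch of the disk numbers above. The helper names are made up, the 4x read / 2x write multipliers are the rule of thumb quoted here rather than a measurement, and the 3.75 TB share per server (15 TB split evenly over 4 servers) is an assumption.

```python
def raid10_throughput(disks, mb_per_sec_per_disk):
    """Rule-of-thumb RAID 10 throughput: reads use every disk, writes use half (mirrors)."""
    read = disks * mb_per_sec_per_disk
    write = (disks // 2) * mb_per_sec_per_disk
    return read, write


def sequential_read_hours(data_tb, read_mb_per_sec):
    """Hours to stream data_tb from disk once, assuming 1 TB = 1,000,000 MB."""
    return data_tb * 1_000_000 / read_mb_per_sec / 3600


read_mb, write_mb = raid10_throughput(disks=4, mb_per_sec_per_disk=120)
hours = sequential_read_hours(data_tb=3.75, read_mb_per_sec=read_mb)
print(f"read {read_mb} MB/s, write {write_mb} MB/s, one full pass takes {hours:.1f} h")
```

This only covers sequential streaming. Random reads on spinning disks are dominated by seek time and will be much slower than these figures suggest.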

If your data doubles in months rather than years, you need to look into Hadoop.
Hortonworks, Cloudera, and MapR are the big players. You can then load the output of the map/reduce jobs into HBase, Cassandra, or Couchbase and query it there.

Larry3 · September 29, 2013, 9:16am

Nice,

thanks for the directions.

What about CouchDB + ElasticSearch?
It seems to be a simpler stack…

What do you think about it?

househippo · September 29, 2013, 3:09pm

CouchDB + ElasticSearch are a great combo. The problems you are going to run into are clustering CouchDB, and speed.
CLUSTERING -
You can use CouchDB Lounge http://guide.couchdb.org/draft/clustering.html to cluster, but you are stuck in the middle of the transition of BigCouch into CouchDB. So soon you will not need Lounge.
SPEED -
CouchDB with ElasticSearch is great because you write to any CouchDB node and it magically appears in ElasticSearch for easy querying. But getting your data out of ElasticSearch can be very painful (i.e. slow). Why slow? ElasticSearch is a GREAT indexing engine, not a fast database. The better way to do it is Couchbase + ElasticSearch: query ElasticSearch and bring back only the KEYS of the documents, which is not very painful, then do a GETBULK(“keys from ES here”) against Couchbase and get the data in a millisecond or faster.
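
Here is a minimal sketch of that "search for keys, then bulk get" pattern. It does not use the real ElasticSearch or Couchbase clients: `ToySearchIndex`, `ToyKeyValueStore` and `search_documents` are in-memory stand-ins invented for illustration, and the document keys are hypothetical.

```python
from collections import defaultdict


class ToySearchIndex:
    """Stand-in for ElasticSearch: maps each word to the keys of matching documents."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, key, text):
        for word in text.lower().split():
            self._postings[word].add(key)

    def search_keys(self, word):
        # Return only document keys, never the document bodies.
        return sorted(self._postings.get(word.lower(), set()))


class ToyKeyValueStore:
    """Stand-in for Couchbase: a key -> document map with a bulk read."""

    def __init__(self):
        self._docs = {}

    def set(self, key, doc):
        self._docs[key] = doc

    def get_bulk(self, keys):
        # One call fetches every requested key that exists.
        return {key: self._docs[key] for key in keys if key in self._docs}


def search_documents(index, store, word):
    keys = index.search_keys(word)   # step 1: the index answers with keys only
    return store.get_bulk(keys)      # step 2: the key-value store returns the documents


index, store = ToySearchIndex(), ToyKeyValueStore()
documents = {
    "doc::1": {"title": "Couchbase sizing notes"},
    "doc::2": {"title": "CouchDB replication"},
    "doc::3": {"title": "Couchbase and ElasticSearch"},
}
for key, doc in documents.items():
    store.set(key, doc)
    index.index(key, doc["title"])

results = search_documents(index, store, "couchbase")
print(sorted(results))
```

The point of the split is that the search layer only ships small keys over the network, while the full documents come from the store that is built for fast key lookups.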

Larry3 · September 29, 2013, 3:58pm

Yes, it may be a better choice performance-wise, but we hit the same problem as before: not enough RAM to hold the whole “package” of big data.

Don’t we?

househippo · September 30, 2013, 12:21am

It depends. My next question would be: out of the 15TB, how many documents will you have? 15TB / (# of documents) = YY(GB) or YY(KB) per document.
Do all parts of the data need to be searchable?
How fast do you want a response back from a query? E.g. 300ms
How many queries/sec will you be doing at peak? E.g. 300/sec
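
To show why the document count matters, here is a sizing sketch. The helper names, the 3 billion documents and the 40-byte keys are made-up inputs, and the 56 bytes of per-document metadata is an assumed figure that should be checked against the Couchbase sizing documentation for your version. Replicas are ignored.

```python
def average_document_kb(dataset_tb, document_count):
    """Average document size in KB, assuming 1 TB = 1,000,000,000 KB."""
    return dataset_tb * 1_000_000_000 / document_count


def key_metadata_gb(document_count, avg_key_bytes, metadata_bytes_per_doc=56):
    """RAM in GB needed just to hold every key plus its metadata.

    metadata_bytes_per_doc is an assumption, not a confirmed Couchbase value.
    """
    return document_count * (avg_key_bytes + metadata_bytes_per_doc) / 1e9


doc_count = 3_000_000_000  # hypothetical number of documents
doc_kb = average_document_kb(dataset_tb=15, document_count=doc_count)
keys_gb = key_metadata_gb(document_count=doc_count, avg_key_bytes=40)
print(f"about {doc_kb:.0f} KB per document, about {keys_gb:.0f} GB of RAM for keys alone")
```

With these made-up inputs the keys and metadata alone need roughly 288 GB, so many small documents can use up the RAM before any document body is cached.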

Larry3 · September 30, 2013, 5:55am

Hmm,

Out of 45TB, we can say that roughly 15TB will need fast access. That amount will keep growing, but more and more slowly, since the other documents do not need to be retrieved quickly. For those, CouchDB applies, I think.
You can picture the ratio as an inverse exponential, decreasing with time: the more documents there are, the lower the ratio “important docs / overall docs”.

1500 req/sec and latency as low as possible :-). The load is distributed over 10 servers, so it is 150 req/sec/server but 1500 req/sec overall.
Approximately, of course. And increasing.

househippo · October 1, 2013, 3:45am

With your requests being only 1500/sec, you can go with CouchDB + ElasticSearch. I would recommend you use Couchbase as your caching layer, as a memcached replacement. To speed up ElasticSearch, watch this video: http://www.elasticsearch.org/videos/scaling-massive-elasticsearch-clusters/. Good luck with your project.

Larry3 · October 1, 2013, 11:55am

Thanks,

I will look into it !

Bye!
