Do I have to double my server requirements if I'm using ElasticSearch with Couchbase?

I'm looking to develop a large-scale web application.

I am going to be using Couchbase as the main data store, primarily because of its ease of use and also because of Couchbase Lite, which will be instrumental for mobile later in the product lifecycle.

The app will also include a publicly available search engine, and for its backend I want to use ElasticSearch. However, I'm not sure how this will affect my hardware requirements.

I plan to use 4 Xeon-class dedicated servers, each with 32 GB of RAM, for Couchbase. Do I need more servers to run ElasticSearch, or can I run it on the same servers as Couchbase?

Thanks.

1 Answer


It's best to treat ES and CB as two different databases and run them on different machines.

The main thing that makes ES different is that, since it is built on Lucene, the index segments it writes are immutable: new writes produce new segments that are later merged, which is disk-heavy work. So having SSDs is great. As far as specs go, you don't need as much memory and CPU for ES. I would really think about the right number of shards to begin with, as you cannot change the number of primary shards of an index later.
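Because the primary shard count is fixed once an index exists, it helps to set it explicitly at creation time instead of relying on defaults. The following is a minimal Python sketch that only builds the JSON body for an index-creation request (an HTTP `PUT` to `/<index name>`) using the `settings.number_of_shards` and `settings.number_of_replicas` keys; the helper name, the index name `search_index` and the shard counts are illustrative assumptions, and sending the request with an HTTP client is left out.

```python
import json


def index_creation_body(primary_shards: int, replicas: int) -> str:
    """Build the JSON body for creating an Elasticsearch index (hypothetical helper).

    number_of_shards cannot be changed after the index is created;
    number_of_replicas can be updated later through the index settings.
    """
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    body = {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }
    return json.dumps(body, indent=2)


# Body for a PUT to /search_index (hypothetical index name)
print(index_creation_body(primary_shards=5, replicas=1))
```

Keeping this in a small function makes the shard count a deliberate, reviewed decision in your deployment code rather than something inherited from whatever the cluster default happens to be.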

Thanks a lot. Do you have any specific guidance on selecting the right number of shards? Based on some information I found using Google, the default appears to be 5 shards and 1 replica. I'm still trying to get my head around the concept of shards, but say I have a 4-node ElasticSearch cluster: is there some kind of rule I can use to determine how many shards and replicas I should have based on the current (and expected future) number of nodes?
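The arithmetic behind this question can be sketched as follows. Each primary shard has `1 + replicas` copies, so an index with 5 primaries and 1 replica has 10 shard copies to spread over the cluster, and Elasticsearch will not place a replica on the same node as its primary. The helper below is a hypothetical illustration, not a sizing rule; its per-node figures assume every copy is assigned and that the allocator balances copies as evenly as possible, which is a simplification.

```python
import math


def shard_layout(nodes: int, primaries: int, replicas: int) -> dict:
    """Back-of-the-envelope shard arithmetic for one index (hypothetical helper)."""
    if nodes < 1 or primaries < 1 or replicas < 0:
        raise ValueError("need at least one node and one primary shard")
    # Every primary shard plus each of its replicas is a separate shard copy.
    total_copies = primaries * (1 + replicas)
    return {
        "total_copies": total_copies,
        # Assumes copies are balanced as evenly as possible across nodes.
        "min_per_node": total_copies // nodes,
        "max_per_node": math.ceil(total_copies / nodes),
        # A replica is never allocated on the same node as its primary,
        # so all copies can be assigned only with at least 1 + replicas nodes.
        "fully_assignable": nodes >= 1 + replicas,
        # Beyond this many nodes, extra nodes hold no copy of this index.
        "max_useful_nodes": total_copies,
    }


# Default of 5 primaries and 1 replica on a 4-node cluster
print(shard_layout(nodes=4, primaries=5, replicas=1))
```

With those defaults on 4 nodes this gives 10 copies, 2 or 3 per node, and the index can spread across at most 10 nodes; an eleventh node would hold none of its shards unless you add replicas or create more indices. That upper bound is the part worth checking against your expected future cluster size.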

Let the man himself, Shay Banon (creator of ES), go over sizing and shards: http://www.elasticsearch.org/videos/big-data-search-and-analytics/

Ultimately you will have to experiment to find what works best for your use case, so test a few models and see which performs best. *ADVICE about CB: keep 4 or fewer buckets in your Couchbase cluster, and make one of them a memcached bucket that caches the results of repetitive ES queries. Give a TTL of 1 second to the queries where you feel users should always get fresh results.
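The caching advice amounts to the cache-aside pattern: check the cache first, query ES only on a miss, and store the result with a short TTL. The sketch below does not use the Couchbase SDK; `TTLCache` is a hypothetical in-memory stand-in for the memcached bucket, `cached_search` and the `es:` key prefix are illustrative names, and `search` is whatever function calls ES in your code. The query text is hashed into the key because memcached keys are limited in length.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a Couchbase memcached bucket.

    Each entry expires `ttl` seconds after it is written, like a memcached TTL.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: List[str], ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cached_search(query: str, cache: TTLCache,
                  search: Callable[[str], List[str]], ttl: float) -> List[str]:
    """Cache-aside lookup: serve repeated queries from the cache, else ask ES."""
    # Hash the query so arbitrary search text fits memcached's key limits.
    key = "es:" + hashlib.sha1(query.encode("utf-8")).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit
    results = search(query)  # in real code: a request to the ES _search API
    cache.set(key, results, ttl)
    return results


# Usage with a fake clock and a fake search function standing in for ES.
now = [0.0]
calls: List[str] = []


def fake_search(query: str) -> List[str]:
    calls.append(query)
    return ["doc-1", "doc-2"]


cache = TTLCache(clock=lambda: now[0])
cached_search("couchbase lite", cache, fake_search, ttl=1.0)  # miss: asks ES
cached_search("couchbase lite", cache, fake_search, ttl=1.0)  # served from cache
now[0] = 1.5  # move past the 1-second TTL
cached_search("couchbase lite", cache, fake_search, ttl=1.0)  # expired: asks ES
print(len(calls))  # 2
```

Injecting the clock keeps the expiry logic testable; in production the memcached bucket itself enforces the TTL, so only the `cached_search` shape would carry over.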

Thanks a lot. Much appreciated. I will do a lot more research on this to find the best approach.