Membase performance question
Hi,
We want to use Membase in our project.
The project needs to support:
1. Total record count: 10 - 30 million
2. Average record size: 50KB
3. Concurrent connections: 10K - 20K
4. A locking mechanism that blocks a record while one thread is changing it and releases the lock once the change is done.
Has anybody had similar requirements and tested them?
Does anybody have results from a similar setup?
Thanks,
Olexii
Thanks Perry,
But why would it take 2TB of memory?
How does Membase allocate memory, and how much memory does one item of size 10KB, 20KB, 50KB, etc. need?
Thanks,
Olexii
My comment about 2TB was assuming that you need your entire dataset to be available at very high performance which requires it to be cached in RAM. 10 million items of size 50k equals about 500GB. I didn't know how large your keys were, so I assumed something around 50 bytes. Membase requires about 150 bytes of RAM per item (for storing metadata). I also assumed that you would want at least 1 replica, which would also need to be cached in RAM for the best performance. That's just over 1TB and then I added in some necessary headroom for queues and such to be run in normal production.
2TB was actually rounded up; the figure is really closer to 1.8TB. I realize that makes a big difference; I was just using round numbers.
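For anyone who wants to redo this arithmetic with their own numbers, here is a rough sketch in Python. The function name and the headroom factor are made up for illustration; the factor of 1.8 is only an assumption chosen to land near the 1.8TB figure above, not an official Membase sizing rule. It also assumes 50k means 50,000 bytes.

```python
def ram_estimate_bytes(items, value_bytes, key_bytes=50, metadata_bytes=150, replicas=1):
    """Rough RAM needed to keep every item (and its replicas) cached.

    key_bytes and metadata_bytes default to the figures quoted in this thread.
    """
    per_item = value_bytes + key_bytes + metadata_bytes
    return items * per_item * (1 + replicas)


ITEMS = 10_000_000
VALUE_BYTES = 50_000  # assumption: "50k" read as 50,000 bytes

active_only = ram_estimate_bytes(ITEMS, VALUE_BYTES, replicas=0)
with_replica = ram_estimate_bytes(ITEMS, VALUE_BYTES, replicas=1)

# Hypothetical multiplier for queues and other production overhead.
HEADROOM_FACTOR = 1.8
with_headroom = with_replica * HEADROOM_FACTOR

for label, size in [
    ("active data only", active_only),
    ("with 1 replica", with_replica),
    ("with headroom", with_headroom),
]:
    print(f"{label}: {size / 1e12:.2f} TB")
```

With these inputs the active data comes to roughly 0.5TB and doubles to just over 1TB with one replica. Changing ITEMS to 30 million or VALUE_BYTES to 10,000 or 20,000 shows how the total scales for the other cases asked about.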
The other factor that can dramatically reduce the amount of RAM required is whether or not you need ALL of your data to live in RAM at all times. Many of our customers require <1ms latency to their entire dataset. However, if you can deal with higher latencies (depends on the disks you're using, usually around 10ms) then you can reduce the amount of RAM and leave some data stored on disk.
You can find sizing guidelines here: http://www.couchbase.org/products/membase/1-7-beta
Perry
Olexii, I'll let the community respond with any specific experiences.
- From a sizing perspective, you'll want to understand how much of your dataset needs to be cached in RAM. If it's all of it, you'll need ~2TB of RAM (10m items at 50k each), but depending on the performance needs of your application and how much data is in use at any one time, it would be possible to go with less.
- When you say 10k-20k concurrent connections, does this refer to incoming "customer" connections to your application or connections to the Membase cluster? I assume it's the former, since you would need a very large Web/App tier to get 20k connections to Membase. Each Membase node can support up to 10k connections (these are TCP connections from the clients talking to Membase), and that number can be reliably raised if necessary.
- I would recommend using the CAS operation to handle the locking. It will prevent the application from actually blocking on any request, but it will also prevent one thread from overwriting the data of another.
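To illustrate the CAS idea, here is a minimal sketch in Python. It uses a toy in-memory store (ToyCasStore is hypothetical, not a Membase client) whose gets/cas methods mimic the usual memcached-style pattern: read the value together with a version token, then write only if the token is unchanged. The helper update_with_cas and its retry limit are also just assumptions for the example.

```python
import threading


class ToyCasStore:
    """Toy in-memory stand-in for a CAS-capable store (hypothetical)."""

    def __init__(self):
        self._data = {}                 # key -> (value, version)
        self._guard = threading.Lock()  # makes each gets/cas call atomic

    def gets(self, key):
        """Return (value, version); a missing key reads as (None, 0)."""
        with self._guard:
            return self._data.get(key, (None, 0))

    def cas(self, key, new_value, expected_version):
        """Store new_value only if the version has not changed since the read."""
        with self._guard:
            _, current = self._data.get(key, (None, 0))
            if current != expected_version:
                return False            # someone else wrote first
            self._data[key] = (new_value, current + 1)
            return True


def update_with_cas(store, key, modify, max_retries=10):
    """Read-modify-write loop: retry on conflict instead of holding a lock."""
    for _ in range(max_retries):
        value, version = store.gets(key)
        if store.cas(key, modify(value), version):
            return True
    return False


store = ToyCasStore()
store.cas("record", {"balance": 10}, 0)  # create the record

# Two writers read the same version of the record.
value_a, version_a = store.gets("record")
value_b, version_b = store.gets("record")

first_ok = store.cas("record", {"balance": 15}, version_a)  # accepted
stale_ok = store.cas("record", {"balance": 7}, version_b)   # rejected as stale

# The losing writer re-reads and retries instead of overwriting.
retried_ok = update_with_cas(store, "record", lambda v: {"balance": v["balance"] - 3})
final_value, final_version = store.gets("record")
print(first_ok, stale_ok, retried_ok, final_value)
```

The second writer's stale update is refused rather than blocked, so no thread waits on a lock and no write silently overwrites another. With a real client you would use its own gets/cas equivalents in place of the toy class.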
Perry