System Limitations?

Mibble · June 23, 2013, 11:08pm

I would like to know whether there are any system limitations, i.e.:
Maximum number of files: currently I need to know whether it can handle more than 11 million.
Maximum storage size: can this go beyond 64 TB?
Is this similar to robocopy? (I mention it because most people know what robocopy is.)
If it will do this, great! Where do I buy it?
If not, do you know of a product that will?

ChrisP · July 3, 2013, 4:04pm

I am trying out Couchbase for a large project requiring trillions of records. To test it, I created a bucket and tried to load it with 100 million records to see how it performs. This is on Windows with a 2 GB memory allocation (so I can reach memory exhaustion sooner).

When I got to 20 million records, it gave me a “Hard Out Of Memory Error. Bucket “…” on node 127.0.0.1 is full. All memory allocated to this bucket is used for metadata.”

Looking into this, I found the following in the documentation: "This memory quota is sometimes referred to as the amount of cache memory. That amount of memory will always store the index to the entire working set. By doing so, we ensure most items are quickly fetched and checks for the existence of items is always fast."
http://www.couchbase.com/docs/membase-manual-1.7/membase-architecture-diskstorage.html

In this case my fixed-length 20-byte key (SHA1) takes up 100 bytes as an index entry. That means I would be limited to approx. 150 million records per EC2 large instance.

Can anyone confirm or deny this? I am really interested in the product, but this seems like a massive limitation to scaling.

amuradyan · July 9, 2013, 7:53pm

I am very interested in getting an answer to this question too.

My calculations seem to be more severe than ChrisP’s. Taking an EC2 Large instance as a base and using the calculation provided in the CB2 manual, I get the following:

Preliminary data:

An EC2 Large instance has 7.5 GB of memory, and only up to some 5.5 GB may be dedicated to a bucket. So we use 5.5 GB.
Suppose we don’t have a replica (no_of_copies = 1 + number_of_replicas (0) = 1).
Document key length is 20 bytes (ID_size = 20).
Metadata is 120 bytes per document (this is the space Couchbase needs to keep metadata for each document). All the document keys and their metadata need to live in memory at all times and should take no more than 50% of the memory dedicated to a bucket.
Suppose the intended number of documents is 100M.
We do not even take into account the total on-disk size of the documents.

(documents_num) * (metadata_per_document + ID_size) * (no_of_copies)

100,000,000 * (120 + 20) * 1 = 14,000,000,000 bytes ≈ 13 GB
13 GB * 2 (metadata may use only 50% of memory) = 26 GB (memory needed to have a 100M-document bucket operating)
26 GB / 5.5 GB = 4.73, rounded up to 5 EC2 Large instances

And if we want to have a replica, then that number doubles.
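
The arithmetic above can be sketched as a small Python function. This is only an illustration of the rough estimate in this post, not an official sizing tool: the function name `nodes_needed` and its parameter names are made up for the example, and the 120-byte metadata size, 50% ceiling and 5.5 GB per-node quota are the assumptions stated above.

```python
import math

def nodes_needed(documents_num, id_size, metadata_per_document=120,
                 number_of_replicas=0, bucket_quota_gb=5.5,
                 metadata_fraction=0.5):
    """Rough node count for keeping all keys and metadata in RAM.

    Hypothetical helper: the defaults mirror the assumptions in the
    post (120 bytes of metadata per document, metadata limited to 50%
    of the bucket quota, 5.5 GB usable per EC2 Large instance).
    """
    no_of_copies = 1 + number_of_replicas
    # Bytes of keys plus metadata across all copies of the data.
    metadata_bytes = documents_num * (metadata_per_document + id_size) * no_of_copies
    metadata_gb = metadata_bytes / 2**30
    # Metadata may only fill a fraction of the quota, so scale up.
    required_gb = metadata_gb / metadata_fraction
    return metadata_gb, required_gb, math.ceil(required_gb / bucket_quota_gb)

# 100M documents, 20-byte keys, no replica
meta_gb, total_gb, nodes = nodes_needed(100_000_000, 20)
print(f"{meta_gb:.1f} GB metadata, {total_gb:.1f} GB quota, {nodes} nodes")

# Same data set with one replica
print(nodes_needed(100_000_000, 20, number_of_replicas=1)[2], "nodes")
```

With no replica this gives 5 nodes, and with one replica it gives 10, matching the numbers above.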

These calculations are very rough, but doing them the proper, more complicated way will not change the result significantly.

If I am right (I hope somebody will make me happy and prove me wrong), that will cost a lot.

tgrall · September 9, 2013, 5:58pm

Hello,

I have to answer your question in two ways…

First of all, about the limitations:
There’s no technical issue in Couchbase with handling 64 TB of data, or more.
As you may know, Couchbase is a completely shared-nothing architecture with no SPOF and no single node where all the data needs to be stored. That said, you have to keep in mind that the current release architecture (2.1.x and older) requires that all keys reside in memory (distributed across all the nodes). Even though this is something we are planning to improve, it does put a hardware limitation on the amount of RAM required across the entire cluster when storing very large numbers of items. This is why we usually view Couchbase as an interactive, online, operational database more than an analytics/batch processing or purely archival store.

Secondly, about the use case itself:
Based on your question, it does not look to me like Couchbase is a good fit for your requirement. To me, you are looking for a tool to move files, not something to store data (or I do not understand why you are talking about robocopy).

Regards
Tug
@tgrall