Some questions
Hello,
I am considering using Membase instead of Cassandra, and I have some questions about Membase. I hope somebody can help.
1. Memory => Disk => Memory
Will Membase push data to disk on a first-in, first-out basis, or does it keep request counters that track how often each item is used?
Currently, data eviction is random. We will most likely use LRU when we implement an eviction algorithm.
If so, how will it weigh one heavily requested value of 1 MB against 100 moderately requested values of 10 KB each?
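To make the difference between random eviction and LRU from the answer above concrete, here is a minimal sketch of both policies on a toy in-memory cache. It is not Membase code: the class names are made up for illustration, capacity is counted in items rather than bytes, and it only models which key leaves memory.

```python
import random
from collections import OrderedDict


class RandomEvictionCache:
    """Toy cache that evicts an arbitrary resident key when full."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = {}
        self.rng = rng or random.Random()

    def set(self, key, value):
        # Only a new key can push the cache over capacity.
        if key not in self.items and len(self.items) >= self.capacity:
            victim = self.rng.choice(list(self.items))
            del self.items[victim]
        self.items[key] = value

    def get(self, key):
        return self.items.get(key)


class LRUCache:
    """Toy cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry
        self.items[key] = value

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a read marks the key as recently used
        return self.items[key]
```

With LRU, a key that is read often keeps moving to the "recent" end and survives; with random eviction, a hot key is as likely to be dropped as a cold one.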
2. Are reads handled from the slave-servers first to relieve the master servers?
Or will the data be delivered from a random node (master/slave, doesn't matter)?
Membase supports replication of your data. The active data and its replicas are spread throughout the cluster. The active data and its replica are never stored on the same host. When a host containing active data dies, the replicas on other hosts in the cluster are promoted to active.
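The placement and promotion rules described above can be sketched roughly as follows. This is only an illustration under assumptions: one replica per vBucket, a simple "next host" placement rule, and hypothetical function names. The real cluster manager decides placement on its own terms.

```python
def place_vbuckets(hosts, num_vbuckets):
    """Give each vBucket an active host and a replica host that differ."""
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to keep a replica elsewhere")
    placement = {}
    for vb in range(num_vbuckets):
        active = hosts[vb % len(hosts)]
        replica = hosts[(vb + 1) % len(hosts)]  # next host, so never the same one
        placement[vb] = {"active": active, "replica": replica}
    return placement


def fail_over(placement, dead_host):
    """Promote the replica of every vBucket whose active copy was on dead_host."""
    for roles in placement.values():
        if roles["active"] == dead_host:
            roles["active"] = roles["replica"]
            roles["replica"] = None  # no replica left in this sketch
        elif roles["replica"] == dead_host:
            roles["replica"] = None
    return placement
```

For example, with hosts `["a", "b", "c"]`, vBucket 0 is active on `a` with its replica on `b`; after `fail_over(placement, "a")` it is active on `b`.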
3. More servers => more read and more write throughput
Is it correct that, because vBuckets are used, a different server can be the responsible master for each key?
I am sorry, I did not understand your question. Can you please rephrase it and post again? Thanks.
So the more servers I add, the more concurrent writes I can handle? And the higher the replication ratio, the more reads I can handle?
The replication ratio does not affect how many reads you can handle per second, because data is read only from the active copy and never from the replicas. (Each server hosts both active and replica vBuckets.) But if you have more nodes, you will get better throughput, because your data will be spread across more nodes, which spreads the I/O.
4. Enterprise Edition, are two versions free?
I read that you can use two enterprise editions for free. So when I add a third one, will I need three licenses or one?
You will need three licenses.
5. Best practice for storing list values / How are values stored on disk
In one scenario I want to log many things on a per-user basis simultaneously. Each added entry is about 100 bytes. I see the following options:
a) Use one key per user and append the data. But what about the 1MB/20MB limit on what you could/should store per key?
b) Use a counter for the current value and add a new entry with key for every log, using the user-id + counter-value as key.
c) Use a new key per hour/day or whatever.
How you store data will depend entirely on your usage pattern and the size of your values. Are all the values related to the user needed in the application at the same time? You may want to divide your data into frequently used and not-so-frequently used. You should look at http://wiki.membase.org/display/membase/Membase+Best+Practices
If you give me more details about your data and usage, I can work with you to find the best way to design your keys.
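As a rough illustration of options (b) and (c) from the question, here is a sketch of the two key schemes. A plain dict stands in for the cluster, and the key format `log:<user>:...` and the function names are invented for this example. A real client would use its own get/set calls and an atomic increment for the counter.

```python
from datetime import datetime

store = {}  # stand-in for the cluster


def log_with_counter(user_id, entry):
    """Option (b): one small value per log entry, keyed by user id + counter."""
    counter_key = f"log:{user_id}:count"
    n = store.get(counter_key, 0) + 1  # a real client would increment atomically
    store[counter_key] = n
    store[f"log:{user_id}:{n}"] = entry
    return n


def log_with_time_bucket(user_id, entry, when=None):
    """Option (c): append to one value per user per hour."""
    when = when or datetime.now()
    key = f"log:{user_id}:{when:%Y%m%d%H}"
    store[key] = store.get(key, b"") + entry + b"\n"
    return key


def read_counter_log(user_id):
    """Read back every entry written by log_with_counter, oldest first."""
    n = store.get(f"log:{user_id}:count", 0)
    return [store[f"log:{user_id}:{i}"] for i in range(1, n + 1)]
```

Option (b) keeps every value tiny but needs one read per entry; option (c) keeps each value bounded by an hour's worth of entries, which keeps it away from the per-key size limit as long as the hourly volume is modest.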
What is best for speed? When adding a new value to one key, how is it stored on disk? Near the other values, or will I need many random seeks to get all the values? Or will the whole value be deleted and inserted anew on disk (I think that is how Cassandra does it)?
There is no way to guarantee that a given key will be on the same host as another key. If you are talking about the same key being reset with a new value, then it is very likely that the key will live on the same host, but there is no guarantee. All the keys are mapped to vBuckets. The vBuckets are mapped to physical machines, but that mapping may change as you add or remove servers.
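The two-level mapping described above (key to vBucket, vBucket to machine) can be sketched like this. The hash function, the vBucket count of 1024, and the round-robin server map are assumptions made for illustration; they are not taken from the Membase source.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed for illustration


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a key to a vBucket id; this depends only on the key."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Assign vBuckets to servers round-robin (a stand-in for the real cluster map)."""
    return [servers[vb % len(servers)] for vb in range(num_vbuckets)]


def server_for(key, vbucket_map):
    """Look up the server that currently owns the key's vBucket."""
    return vbucket_map[vbucket_for(key, len(vbucket_map))]
```

The point of the indirection is that a key's vBucket never changes, while `build_map(["s1", "s2"])` and `build_map(["s1", "s2", "s3"])` may place that same vBucket on different servers. That is why a key usually stays on one host, but not guaranteed.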
6. How persistent is persistent?
Do you use a kind of write log, maybe one that I can enable for just a specific bucket? Let's say I store a value, get the ok. How sure is it that the value is stored to disk if the computer crashes right after that?
Membase writes the data to disk asynchronously. By the time an asynchronous disk write fails, the application has already been told "ok". Disk writes are done on a best-effort basis, and the application is taking some level of risk that the data may not actually be durable.
But it should be noted that if the disk write failure is transient (vs. a disk crash), it will be retried by the server. There is a chance of data loss but it is limited to the chance of a disk crash or power failure while the data is being written to the disk.
You should look into the TAP protocol, which lets you observe from the outside the data changes going on within a Membase server.
http://wiki.membase.org/display/membase/TAP+Protocol
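To illustrate why an "ok" does not yet mean "on disk", here is a toy write-behind store that mimics the behaviour described in this answer: the write is acknowledged from memory, persisted later, and retried on a transient failure. The class and method names are hypothetical and the "disk" is just a dict.

```python
from collections import deque


class WriteBehindStore:
    """Acknowledges a set() from memory and persists it later."""

    def __init__(self):
        self.memory = {}
        self.disk = {}          # stand-in for the on-disk store
        self.pending = deque()  # keys waiting to be written to disk

    def set(self, key, value):
        self.memory[key] = value
        self.pending.append(key)
        return "ok"  # returned before anything reaches disk

    def flush_one(self, disk_write_ok=True):
        """Try to persist the oldest pending key; requeue it on a transient failure."""
        if not self.pending:
            return False
        key = self.pending.popleft()
        if not disk_write_ok:
            self.pending.append(key)  # retry later
            return False
        self.disk[key] = self.memory[key]
        return True

    def crash(self):
        """Simulate a power failure: memory and the pending queue are gone."""
        self.memory.clear()
        self.pending.clear()
```

A value that was acknowledged but still sits in `pending` when `crash()` is called never reaches `disk`, which is the window of risk the answer describes.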
Thank you for all your help!
Silas
PS: When I join both posts into one, I can't post it because it gets rejected as spam!
Thank you for all your answers!
So the replica vBuckets are only used for high availability? They will not increase read speed, apart from the fact that I need more servers and so the reads/writes will go to different servers.
Yes, the replica vBuckets are used only for high availability.
Is there a way to know when a value has been written to disk, or at least replicated once to another server? Then I could wait for that with the very critical objects that I need stored on disk (about 0.001% of all values; this way I could use only your server and would not need anything additional).
You can use TAP to capture the stream of reads and writes to/from a node.
Do you have any benchmarks showing how many I/O operations (reads and writes) can be handled per second?
Unfortunately, at this time we do not have performance benchmarks on reads and writes but we have tested that with most of your data in memory, the performance is as good as with memcached.
Which is better to use: AMD Hypertransport 3.0 or the Intel QuickPath 6.4GBit interface?
We don't have data comparing the performance on these two types of servers.
Would it be possible to install Membase on the same machine as the web server, or will it compete for CPU/RAM with IIS (on Windows)?
Theoretically, if you have enough RAM and CPU time, it should not matter. However, most of our customers have installed Membase on a server separate from IIS. Also, you may want to keep some design issues in mind if you plan on using Enyim (the ASP.NET client). Several users have seen a spike in CPU usage when they were not disposing of the client instance. The MembaseClient has a background thread for configuration management, and this must be stopped by disposing of the client.
What do you think is the performance impact (if there is one)?
PS: Your best practice link is broken. You need to remove the last space from the link.
Thanks much for letting us know!
Thank you again for your answers.
The background to my questions is that I am trying to consolidate everything and want to avoid an RDBMS.
But as I understand it, there are some things I should not store only in Membase. Let's say I want to store the orders of my customers. I cannot lose those. Other things, like some log files, are not so critical. If I lose some ... nobody will notice.
So for orders I would want a way to be sure that they are logged to disk. But for other data it is not so important.
If you could provide some kind of write-ahead log for some values (on demand), it would be perfect.
We are working on a "synchronous write" feature. It allows one to specify that a disk write must happen before a write is acknowledged as succeeding. Until that feature is shipped, there is no way to confirm that a write occurred. We will hopefully have this feature in the next 6 months.
Regarding storage I have the following questions:
When replication is active, will the replicated value be stored on disk on every node or just on the master node?
The replica data is spread across all the nodes in the cluster.
When can I expect the write to disk to happen? Only when the memory is full, or from time to time? How many milliseconds of delay will I have?
The storage to disk happens as soon as it can be done. This is an asynchronous request.
Your documentation says the disk space should be about 130% of the memory size. So I assume you think of Membase more as memcache with a hard disk backup?
I was thinking more along these lines: I can put everything in, and the software takes care that it is saved, replicated, quickly accessible and highly available. So by adding more servers I will get more space. I thought of machines with 256 GB RAM and 2-3 TB of disk space, storing everything in memcache without caring about other databases, SAN and so on (your software will replicate, take care of H/A and so on).
Please let me know if you have more questions.
bhawana
Regarding the "synchronous write" feature, I would be very happy to see it in one of the next releases. This is a really great feature. With it, Membase can be used as a real alternative to a database. Today I see it more as memcache with a disk backup. When you implement it, maybe you could also make it possible to set the number of replicas that need to have received the data before the write is acknowledged.
Maybe you could also implement a way to load data from the replica nodes. This could speed up the loading of data.
Is it possible to get rid of the 1MB value size limit (I think you have 20MB, but you said it could change)? When just adding data to a list, I always need to be sure that it is not more than 1 MB.
Thanks Silas.
What do you mean you have to make sure it is less than 1MB? We support object sizes up to 20MB. Is that not large enough for you?