Replication - In Memory or on disk? And can persistence be disabled?
Hi,
I am looking at using Couchbase to implement my cache. I was attracted by its replication/persistence, as I need these for handling session data.
However, I was wondering if buckets are replicated onto other nodes in memory or onto disk? I would presume they are replicated onto disk to ensure maximum RAM availability? I cannot find clarification on this. I definitely don't want my cache capacity halved by a 2GB cache taking up 4GB of RAM!
Also, can persistence be disabled? I am potentially looking at porting another project to Couchbase, and for that one I would want persistence disabled but replication enabled.
Thanks,
Chris
Hi Mikew,
Thanks for your reply.
So replication means the amount of RAM needed is:
number of copies to be kept * projected cache size ?
Thanks,
Chris
Your assertion here is generally true. We don't strictly require the number of replica items in the cache to equal the number of active items. In practice the split is usually around 60% active and 40% replica.
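As a rough illustration of the arithmetic being discussed, here is a minimal Python sketch. The function names and the default 0.6 fraction are assumptions made for this example (the fraction is taken from the "around 60% active and 40% replica" figure above); nothing here is a Couchbase API, and real memory use will vary.

```python
def full_residency_ram_mb(cache_size_mb: float, replicas: int) -> float:
    """Upper bound: RAM needed if the active copy AND every replica
    copy of the whole data set were held in memory at once."""
    return cache_size_mb * (1 + replicas)


def split_quota_mb(quota_mb: float, active_fraction: float = 0.6):
    """Rough split of a fixed memory quota between active and replica
    items, assuming the ~60/40 ratio mentioned in the reply above."""
    active = quota_mb * active_fraction
    return active, quota_mb - active


if __name__ == "__main__":
    # The worst case from the original question: 2GB of data, one replica.
    print(full_residency_ram_mb(2048, replicas=1))
    # How a 4GB quota might be divided under the assumed 60/40 split.
    print(split_quota_mb(4096))
```

The first function is the "number of copies * cache size" estimate; the second shows that with a fixed quota the replicas take a share of the memory rather than doubling it.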
Thanks Mike.
Are there any plans to implement storing the replicas on hard disk? Or would this have a significant effect on performance? The way I imagine it working well is that replicas are sent to other Couchbase nodes, which save the data to hard disk until a node goes down, at which point that data is read off the hard disk and loaded into memory.
I suppose this could cause problems if the replica would take the backup node over its memory allowance, but perhaps there is a way around this?
This would be a great addition as hard disk space is very cheap compared to RAM. At the moment I don't think I can afford to have enough RAM to allow replication.
Thanks,
Chris
I think there is still some confusion here about how we utilize our cache, so I'll give a quick overview and then answer your questions. Let's say you have a brand new two-node cluster that has no items in it. When you start your application and your cluster begins receiving new items, they are first put into the cache. Each item that is received will then be persisted by the active vbucket on one node and also replicated to a replica vbucket on the other node, where it will also be persisted.
Now imagine that you run your application until you reach your cache limit, then stop the application and let your cluster finish all of its persistence and replication tasks. At this point you will see that all of the active and replica items are in memory and also persisted on disk. If one of your nodes goes down at this point, you can fail over and see no performance drop, because none of the items that were in the cache have to be read from disk. This is why we cache the replica items.
So now we start the application again and write more new items. At this point our disk utilization is growing larger than our memory utilization so the Couchbase cluster will begin evicting both active and replica items that have not been used recently to make room in the cache for the new items. So now we will have some items that live only on disk and some that live both on disk and in memory. At all times Couchbase does everything possible to keep the items that are currently being used in memory and we also try to make sure the active and replica caches have the same items in case a node has to be failed over.
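To make the overview above a bit more concrete, here is a toy Python model of it. This is only a sketch under loud assumptions: `ToyNode` and `replicated_write` are hypothetical names invented for this example, persistence is shown as immediate (in the real server it happens in the background), and eviction is plain least-recently-used, which is a stand-in and not a description of Couchbase's actual eviction policy.

```python
from collections import OrderedDict


class ToyNode:
    """Toy node: every item is persisted to 'disk', and only the most
    recently used items stay in the in-memory cache."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity  # max items held in memory
        self.memory = OrderedDict()             # key -> value, oldest first
        self.disk = {}                          # stands in for on-disk storage

    def set(self, key, value):
        self.disk[key] = value   # persist (simplified to happen immediately)
        self._cache(key, value)  # keep a copy in memory

    def get(self, key):
        if key in self.memory:             # served from the cache
            self.memory.move_to_end(key)   # mark as recently used
            return self.memory[key]
        value = self.disk[key]             # not resident: read from disk
        self._cache(key, value)            # bring it back into the cache
        return value

    def _cache(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)  # evict least recently used


def replicated_write(active, replica, key, value):
    """Write to the active node and copy the item to the replica node,
    so both nodes cache it and persist it."""
    active.set(key, value)
    replica.set(key, value)


if __name__ == "__main__":
    node_a, node_b = ToyNode(2), ToyNode(2)
    for k in "abc":
        replicated_write(node_a, node_b, k, k.upper())
    print(list(node_a.memory), sorted(node_a.disk))
```

After three writes into nodes that can each hold two items in memory, the oldest item lives only on disk while the other two live both on disk and in memory, on the active and the replica node alike.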
>> Are there any plans to implement storing the replicas on hard disk?
Yes. We already do this. See above.
>> I suppose this could cause problems if the replica would take the backup node over its memory allowance, but perhaps there is a way around this?
We limit the amount of memory that replica items can use. So this shouldn't be an issue.
>> This would be a great addition as hard disk space is very cheap compared to RAM. At the moment I don't think I can afford to have enough RAM to allow replication.
When I run tests on my laptop I only use 512MB of memory and will often have 2GB of disk space used. For the best performance you should have as much memory as possible, but it is not required. We recommend that you have enough memory to fit your working set at any given time. For example, if you use 4GB of data at your application's peak time, then you should have at least 4GB of memory for your cache.
Ahh I see, a useful explanation!
I'm going to give it a test run, and see how it performs on the resources I currently have available.
Thanks,
Chris
1. All items are replicated between nodes. I think what you're specifically asking is whether we keep replica items in memory, and the answer is yes. The reason is that this solves the cold cache issue. If one of your nodes goes down and you need to fail over that node, you won't see a performance drop when some of the replica items become active items.
2. Persistence cannot be disabled at the moment. We provide a memcached bucket, but this means that you won't have items replicated between nodes.