Membase Blocking on Key Eviction?
We've been using Memcached for a while and recently started testing Membase in AWS. We're testing a single instance of Membase 1.6.0 on a large EC2 instance with 5GB RAM, 750GB disk (Linux FC8).
We've noticed that SQLite seems to block on eviction purges on an hourly basis when expiryPagerSleeptime wakes up. Although this was expected (because SQLite uses database level locking), I didn't expect that Membase would block as well.
In this case, it seems that while SQLite is deleting old keys, Membase operations per second fall to zero or near zero for several minutes. After the eviction process has finished, the Membase server quickly recovers. I would have anticipated that reads from Membase RAM would still proceed while SQLite was locked but this doesn't seem to be the case.
My impression from the docs was that Membase was asynchronous and would continue to serve reads from RAM. I would appreciate any help or suggestions to prevent Membase from blocking on key evictions. This is a serious issue for us because it seems to take about 4 minutes for this eviction process to finish and for the backlog in the disk queue to clear. That means every hour, Membase is effectively offline for 4 minutes.
EDIT: I should also mention that this happens once the data is larger than RAM (and it's increasing in size on the disk). We didn't notice any issues with key eviction when the data was just in RAM (presumably because key eviction in RAM happens too quickly to be noticeable).
No, the process wasn't swapping. At the time, we had loaded about 3 million objects into the single instance. Each object is about 15kB in size. We had been in the process of loading 22 million objects in.
If I understand you correctly, the problem we're having would occur frequently once the size of the data is much greater than RAM. Assuming that only around 300,000 of our 22 million objects could fit in RAM at any time (about 1%), there would be a high likelihood of having to go to disk for any query. If most of the queries were going to disk at the same time while the key expiry process is running, then Membase would be inaccessible. Is this correct?
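As a rough check on those numbers, here is a minimal sketch of the arithmetic. It assumes decimal units and no per-item memory overhead (both are assumptions; real overhead would lower the resident count), and the function name is just for illustration.

```python
# Back-of-the-envelope estimate using the figures quoted in this thread.
OBJECT_SIZE_BYTES = 15 * 1000   # "about 15kB" per object
RAM_BYTES = 5 * 1000**3         # 5GB instance; ignores per-item overhead (assumption)
TOTAL_OBJECTS = 22_000_000      # size of the full data set being loaded


def resident_fraction(ram_bytes, object_size, total_objects):
    """Return (objects that fit in RAM, fraction of the data set resident)."""
    fit = min(ram_bytes // object_size, total_objects)
    return fit, fit / total_objects


fit, frac = resident_fraction(RAM_BYTES, OBJECT_SIZE_BYTES, TOTAL_OBJECTS)
print(f"{fit:,} objects fit in RAM ({frac:.1%} of the data set)")
```

That works out to roughly 333,000 resident objects, about 1.5% of the data set, which is in line with the estimate above. Under a uniform access pattern (also an assumption), nearly every read would miss RAM.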
What are the possible hacks / fixes that we can use while you're reviewing the situation? I've thought about two possibilities:
1) Switch from SQLite to BerkeleyDB: Given that SQLite locks the whole database, we might benefit from the finer-grained (page-level) locking that BDB has. With fewer pages of the DB locked at any time, Membase might not completely freeze up.
2) Don't do key expiry every hour: Is there a way to prevent the key expiry process from waking up and running every hour? We have sufficient disk space to accumulate about a month of data before we have to purge old keys. We could take advantage of Membase's clustering to rotate servers out of the cluster, purge them, and then have them rejoin once cleaned.
What do you think about #1 and #2? Would you recommend any other solutions?
Thank you.
In my initial tests, I didn't find BDB to be significantly faster than SQLite. The issue isn't so much about the DB being blocked, it's about your connection being blocked waiting for disk. Disk requests are about 1,000 times slower than memory in testing. You get seek bound fast.
The expiry pager sleep time is configurable at the low end, but that may not be exposed to you. Most important to me in the short term is to find out under what circumstances it won't be very fast. It actually does very little other than flip through memory.
We recompiled with BDB but it still suffers from the same problem. At the same time every hour, Membase is still offline for four minutes. The way you answered and suggested a patch was coming suggested that you've been able to reproduce this issue. Is that the case? If not, I'll give you more details to help you do so.
I don't think that our app is putting too much load on the disk. We tried reducing the load but that didn't seem to matter. Even with 150 ops / sec and hardly any disk fetches (~3 / sec) coming from our app, Membase still freezes. During this time when Membase is frozen, we do see disk utilization spike to 100%, and the disk queue size grow.
Does the expiry process clean intermittently when there is no other work being done, or does it wake up and run at 100% until finished? If it isn't throttled, do you think that might work? We would appreciate working with you guys to solve this.
EDIT: this might be a silly question (as I don't understand completely how Membase works) but I would assume that there is a timestamp that the system searches to expire old keys. Is it possible that this field isn't indexed, or the DB isn't choosing the correct index? When I thought about it more, the 100% disk utilization felt a bit like a big table scan. I know from experience in MySQL that sometimes the DB doesn't choose the right index (and you have to force it). Could that be what's happening here? Is there a way to confirm that there is an index on that field and whether the DB is choosing the correct index?
We recompiled with BDB but it still suffers from the same problem. At the same time every hour, Membase is still offline for four minutes. The way you answered and suggested a patch was coming suggested that you've been able to reproduce this issue. Is that the case? If not, I'll give you more details to help you do so.
Well, specifically I've not seen it do that other than in cases where the process is paged out.
How did you decide this is expiry pager running? Do you actually see that running on the dispatcher for this bucket? Can you show timing stats?
I don't think that our app is putting too much load on the disk. We tried reducing the load but that didn't seem to matter. Even with 150 ops / sec and hardly any disk fetches (~3 / sec) coming from our app, Membase still freezes. During this time when Membase is frozen, we do see disk utilization spike to 100%, and the disk queue size grow.
This is interesting. The expiry pager doesn't do any disk IO itself, but may cause disk IO after it completes. If you're scheduling background fetches during this, they'll be blocked (but no other operations will be). We're planning to do some testing with parallel disk access, but I don't expect that to be ready within the next week.
Does the expiry process clean intermittently when there is no other work being done, or does it wake up and run at 100% until finished? If it isn't throttled, do you think that might work? We would appreciate working with you guys to solve this.
Sort of -- it's a low priority task.
EDIT: this might be a silly question (as I don't understand completely how Membase works) but I would assume that there is a timestamp that the system searches to expire old keys. Is it possible that this field isn't indexed, or the DB isn't choosing the correct index? When I thought about it more, the 100% disk utilization felt a bit like a big table scan. I know from experience in MySQL that sometimes the DB doesn't choose the right index (and you have to force it). Could that be what's happening here? Is there a way to confirm that there is an index on that field and whether the DB is choosing the correct index?
There is no disk access during expiry pager. There is no index on expiry -- or key for that matter. The only on-disk index is by rowid -- basically unrelated to any user data. Everything else is in memory.
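For the general question of how to confirm which indexes a SQLite file has and which plan a query gets, a minimal sketch using Python's standard `sqlite3` module follows. The table `kv` and its columns are a hypothetical stand-in, not Membase's actual on-disk layout, and `describe_indexes` is just an illustrative helper.

```python
import sqlite3


def describe_indexes(conn, table, where_sql):
    """List a table's indexes and the plan SQLite picks for a sample query."""
    # PRAGMA index_list returns one row per index; column 1 is the index name.
    indexes = [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM {table} WHERE {where_sql}")]
    return indexes, plan


# Hypothetical stand-in schema; NOT Membase's real on-disk layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT, v BLOB, exptime INTEGER)")

indexes, plan = describe_indexes(conn, "kv", "exptime < 100")
print(indexes)  # empty: no secondary indexes on this table
print(plan)     # a SCAN entry here means a full table scan
```

Pointing the same helper at a real database file (via `sqlite3.connect(path)`) with a real table name would show whether a query on a given column can use an index or has to scan.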
(this forum reimplements email kind of poorly, so this is my best approximation of quoting)
Four minutes is a rather long time there. How many items do you have? Is the process swapping?
Items resident in memory will be served quickly even while other requests are blocked for disk reads. The problem is that, statistically, it's very common to have all resources waiting for the slowest part. You very likely have all of your connections tied up waiting for a few keys to come back from disk.
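To illustrate why a few disk-bound keys can dominate, here is a deliberately simplified sketch. It assumes each connection handles requests one at a time and uses the roughly 1,000:1 disk-to-memory cost ratio mentioned elsewhere in this thread; the miss ratios and the function name are illustrative assumptions, not measurements.

```python
def disk_wait_share(miss_ratio, disk_cost=1000.0, mem_cost=1.0):
    """Fraction of total request time spent on disk-bound requests.

    miss_ratio: fraction of requests that need a disk read (assumed).
    disk_cost / mem_cost: relative latencies; 1000:1 is the ratio
    quoted in this thread, not a measured value for any one system.
    """
    disk_time = miss_ratio * disk_cost
    mem_time = (1.0 - miss_ratio) * mem_cost
    return disk_time / (disk_time + mem_time)


for miss in (0.001, 0.01, 0.02):
    share = disk_wait_share(miss)
    print(f"miss ratio {miss:.1%}: {share:.0%} of time waiting on disk")
```

With the roughly 2% miss ratio implied by about 3 disk fetches out of 150 ops / sec, this model puts about 95% of request time behind the disk, so most connections would be found waiting on disk at any given moment.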
We're hoping to make it a bit easier to keep clients from getting themselves into such situations very shortly.