disk free space recovery / garbage collection
I'm thinking about moving some stuff we currently store on S3 to a membase setup. We already use memcache, so I can pretty easily take a layer out of my architecture. However, I really need to understand how disk space is used and freed by membase.
Specifically, I understand that membase uses a lazy garbage collector and that objects only get removed when they are accessed after their expiry time. So if I have a lot of orphan objects (things that were created and have since expired, but that nobody is touching), how does membase recover that disk space? Do I have to do some sort of patrol-read of objects to force GC? If membase is running out of disk space, does it do active GC?
It's not at all clear from the docs how this works.
That makes perfect sense. If the scan is in memory, does it pick up objects that are only on disk, or is the metadata always in memory?
Metadata is always in memory so the software can pick up all objects whether they are cached in memory or only on disk.
Perry
I just had another question to add to this: do the Delete and Flush commands actively remove items from memory and disk, or are the items in effect considered expired and cleaned up asynchronously as you have described above?
Hi,
Delete and Flush both follow implementation pathways separate from the expiration handling, so they're not handled like asynchronously expired items. In-memory metadata for a Deleted item or a bucket Flush All, for example, is updated immediately (synchronously), so clients will see a consistent view of items.
There are queues, though, for persistence and replication, so the actual disk writes and updates of remote replica nodes may happen later, asynchronously, as the queues are processed.
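To make the ordering concrete, here is a minimal sketch of a synchronous in-memory delete followed by a queued disk write. It is a toy model only, not Membase code: `ToyStore`, `persist_queue`, and `drain` are hypothetical names, and a plain dict stands in for the disk.

```python
from collections import deque


class ToyStore:
    """Toy model (not Membase code): in-memory view plus a persistence queue."""

    def __init__(self):
        self.memory = {}              # what clients see
        self.disk = {}                # stand-in for the on-disk store
        self.persist_queue = deque()  # pending disk operations

    def set(self, key, value):
        self.memory[key] = value
        self.persist_queue.append(("set", key, value))

    def delete(self, key):
        # The in-memory view changes immediately, so readers never see the item again.
        self.memory.pop(key, None)
        # The disk update is only queued here; it is applied later by drain().
        self.persist_queue.append(("delete", key, None))

    def get(self, key):
        return self.memory.get(key)

    def drain(self):
        """Apply queued operations to the disk stand-in, in order."""
        while self.persist_queue:
            op, key, value = self.persist_queue.popleft()
            if op == "set":
                self.disk[key] = value
            else:
                self.disk.pop(key, None)
```

In this model, a `get` right after `delete` returns `None` even though the item is still in `disk` until `drain()` runs, which is the gap between the client-visible state and the persisted state described above.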
Cheers,
Steve
Your understanding of "lazy expiration" matches how memcached works, and is similar to how Membase works. On top of that, we've added an hourly process that does a (very quick) scan of memory and identifies any expired objects. It then asynchronously removes them from both memory and disk.
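As a rough illustration of the two mechanisms, the sketch below combines expire-on-access with a periodic sweep over in-memory metadata. It is a toy model under stated assumptions, not the Membase implementation: `ToyCache`, `sweep`, and the injectable `clock` are hypothetical, and the sweep is called by hand rather than on an hourly timer.

```python
import time


class ToyCache:
    """Toy model (not Membase code): lazy expiration plus a periodic sweep."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._meta = {}  # key -> expiry timestamp (metadata kept in memory)
        self._data = {}  # key -> value (stand-in for memory and disk storage)

    def set(self, key, value, ttl):
        self._meta[key] = self._clock() + ttl
        self._data[key] = value

    def get(self, key):
        expiry = self._meta.get(key)
        if expiry is None:
            return None
        if expiry <= self._clock():
            # Lazy expiration: the item is only removed because it was accessed.
            self._remove(key)
            return None
        return self._data[key]

    def sweep(self):
        """Scan metadata only and remove every expired item, touched or not."""
        now = self._clock()
        expired = [k for k, exp in self._meta.items() if exp <= now]
        for key in expired:
            self._remove(key)
        return len(expired)

    def stored_items(self):
        return len(self._data)

    def _remove(self, key):
        self._meta.pop(key, None)
        self._data.pop(key, None)
```

With only `get`, an expired item that nobody reads would stay in `_data` forever; `sweep` is what reclaims those orphans, and it can find them because it walks the metadata rather than the values.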
With the current setup (using sqlite underneath), the disk space won't actually shrink, but it will create "holes" in the database that can then be filled by new data, so you shouldn't see the disk space continuously grow.
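The "holes" behaviour can be observed in plain SQLite, independent of Membase. The sketch below uses Python's standard `sqlite3` module on a temporary file; the table name `kv`, the row count, and the payload size are arbitrary choices for illustration, and it assumes auto-vacuum is off (the usual SQLite default, set explicitly here).

```python
import os
import sqlite3
import tempfile


def sqlite_hole_demo(rows=2000, payload_size=1024):
    """Return (page_count, freelist_count) after insert, after delete, after refill."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        # Without auto-vacuum, freed pages stay in the file on a free list.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
        payload = b"x" * payload_size

        def fill():
            conn.executemany(
                "INSERT INTO kv (k, v) VALUES (?, ?)",
                ((i, payload) for i in range(rows)),
            )
            conn.commit()

        def stats():
            pages = conn.execute("PRAGMA page_count").fetchone()[0]
            free = conn.execute("PRAGMA freelist_count").fetchone()[0]
            return pages, free

        fill()
        after_insert = stats()

        conn.execute("DELETE FROM kv")
        conn.commit()
        after_delete = stats()   # file keeps its size; free list grows

        fill()
        after_refill = stats()   # new rows reuse pages from the free list

        conn.close()
        return after_insert, after_delete, after_refill
    finally:
        os.remove(path)
```

Comparing the three tuples should show the page count staying put after the delete while the free-page count jumps, and then the free-page count dropping again once new data is written into those pages.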
Does that make sense?
Perry