How can I tell Couchbase to purge old data that has not expired yet?
I need to clean out my data set and I want to get rid of old data that has not expired yet.
Thanks for the reply
I don't want to delete all my items
My items have an expiration date. Is there any way to tell Couchbase to delete items that are scheduled to expire after a certain date?
There isn't a command that allows you to do this in Couchbase at the moment.
This really needs to happen. I can't just keep throwing hardware at Couchbase.
I understand that items are removed when they expire, but there has to be another way to control the number of items in Couchbase.
What is your use case for needing a command like this? Can you also explain what hardware you are adding to your cluster? And why did you set such a high expiration time in the first place?
We have a very diverse set of data that we are caching. Our document set is growing very large because of increased traffic. We want to delete items in Couchbase that are x days old or, ideally, haven't been accessed in x days. Currently we have to keep throwing more hardware at Couchbase in order to avoid OOM errors and high cache miss ratios.
If we limited the amount of disk storage for each node, would this help us? Would Couchbase then evict data that was old and inactive but not expired?
Your old and inactive data will be evicted to disk if it has not been used recently. We currently require that every item keeps its metadata in memory, and the metadata for an evicted item takes 72 bytes plus the key size, so the number of items you have does carry a permanent memory overhead. When memory usage approaches 80%, we evict 15% of the items to disk, which makes room for newly used items. Can you describe your workload and cluster setup? How many items? Number of gets/sets per second? Memory allocated for your bucket? Number of servers and number of replicas?
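To make the permanent overhead concrete, here is a minimal sketch of the arithmetic described above. The 72-byte figure comes from the reply; the function name, the `copies` parameter, and the example numbers are illustrative assumptions, not part of any Couchbase tool.

```python
# Per-item metadata that stays in memory even after the value is evicted
# to disk, as quoted in the reply above.
METADATA_BYTES = 72


def resident_metadata_bytes(num_items: int, avg_key_bytes: int, copies: int = 1) -> int:
    """Estimate memory that cannot be freed by evicting values to disk.

    `copies` is a hypothetical knob for counting replicas as well as the
    active copy; leave it at 1 to count active items only.
    """
    return num_items * (METADATA_BYTES + avg_key_bytes) * copies


# Illustrative numbers only: 1,000,000 items with 50-byte keys.
overhead = resident_metadata_bytes(1_000_000, 50)
print(overhead)  # 122000000 bytes
```

Because the result grows linearly with both item count and key size, shorter keys or fewer items are the only ways to shrink this part of the footprint.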
Number of items: 65,000,000
Overhead (bytes): 150
Average key size (bytes): 200
Average object size (bytes): 1024
Data needed (GiB): 83.18
Working set (%): 30.00%
Working set memory needed (GiB): 24.95
Nodes: 8
Size per node: 4 GB
Total: 32 GB
Replicas: 1
Avg ops/sec: 3k
Avg gets/sec: 2.7k
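The sizing figures above can be reproduced with a few lines of arithmetic. This is a sketch that assumes "data needed" is simply items × (overhead + key size + object size) converted to GiB; that formula is inferred from the posted numbers, and it does not account for the replica copy.

```python
GIB = 2**30

# Figures as posted in the thread.
num_items = 65_000_000
overhead_bytes = 150
avg_key_bytes = 200
avg_object_bytes = 1024
working_set_fraction = 0.30
nodes = 8
ram_per_node_gb = 4

# Assumed formula: every item costs overhead + key + object bytes.
bytes_per_item = overhead_bytes + avg_key_bytes + avg_object_bytes
data_needed_gib = num_items * bytes_per_item / GIB
working_set_gib = data_needed_gib * working_set_fraction
total_ram_gb = nodes * ram_per_node_gb

print(f"data needed: {data_needed_gib:.2f} GiB")   # data needed: 83.18 GiB
print(f"working set: {working_set_gib:.2f} GiB")   # working set: 24.95 GiB
print(f"cluster RAM: {total_ram_gb} GB")           # cluster RAM: 32 GB
```

Under these assumptions the working set (about 24.95 GiB) is below the 32 GB of cluster memory, which matches the totals posted above.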
It looks like your setup is ok and if your working set does fit into memory then the cache misses should only happen when newly used data is cycling into Couchbase. Also, removing old items is unlikely to vastly improve performance since the memory reclaimed would be relatively small. The only way to improve performance here would be to add more memory, but as you mentioned your working set already fits into memory so this also probably isn't necessary at the moment.
Also, how high is your cache miss ratio?
Cache miss ratio is approx 3
We are very frustrated that we need to schedule downtime every time we have to rebalance, whether after adding or removing a node. All our boxes have SSDs and the process still takes over 30 minutes, and even after that there is still a huge disk write queue.
It would be awesome if Couchbase 1.8.1 had a community edition, since it fixes so many rebalancing issues.
When is that planned?
Without an expiration time you will have to do this manually by deleting all unwanted items. If you don't need any of the data in your cluster then you can do a flush_all.
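A manual cleanup along these lines could be sketched as follows. This assumes the application keeps its own index of keys and last-access timestamps, since the thread confirms there is no built-in command for this. `KeyValueClient`, `InMemoryClient`, and `purge_stale` are hypothetical names for illustration, not part of any Couchbase SDK; a real client would be any object exposing a `delete(key)` method.

```python
import time
from typing import Dict, List, Optional, Protocol


class KeyValueClient(Protocol):
    """Hypothetical minimal interface; substitute your real client object."""

    def delete(self, key: str) -> None: ...


class InMemoryClient:
    """Stand-in client used only to demonstrate the sketch."""

    def __init__(self) -> None:
        self.store: Dict[str, object] = {}

    def delete(self, key: str) -> None:
        self.store.pop(key, None)


def purge_stale(
    client: KeyValueClient,
    last_access: Dict[str, float],
    max_age_days: float,
    now: Optional[float] = None,
) -> List[str]:
    """Delete every key whose last access is older than max_age_days.

    `last_access` maps key -> Unix timestamp and is maintained by the
    application (an assumption of this sketch).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = [key for key, ts in last_access.items() if ts < cutoff]
    for key in stale:
        client.delete(key)
        del last_access[key]
    return stale
```

For example, with an index of `{"a": 0.0, "b": 864000.0}` and `now=864000.0`, calling `purge_stale(client, index, max_age_days=5)` deletes only `"a"`. Running such a job in small batches would limit its impact on live traffic.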