Like many others I accumulate data over time.
Let’s say that I want to delete all logs older than 6 months on the start of every month.
One option is to keep every log in a bucket specific for that month, e.g. 2014May. On the beginning of November 2014, I just flush/delete the entire bucket, will all of its millions documents.
Best practice instructs me to keep bucket count to bare minimum so that this option is not recommended.
What should I do ?
Hi @itay, why not keep all data in a single bucket and add an expiry date to your data that will expire them at a certain time.
Well that sounds cool
I wasn’t aware of it. I thought it was only related to expiry from memory.
If we have to use the expiry date then it should be set at every document level. What if we want to change the expiry time from 6 months to 9 months, we end up updating all the documents again. Wouldn’t it cause performance issues while trying to delete millions of documents?
Compaction is run in background, so it shouldn’t affect frontend traffic. The disk IO could be contended for, but otherwise it shouldn’t cause performance issues. Compaction is done in segments as well.