I want to design a model based on expiry. If I set a 30-day expiry and have 100 million documents, is there significant additional overhead? Can I use the expiry feature without worry?
Generally, you have nothing to worry about. When the expiry pager runs, it writes a small record (a tombstone) to indicate the document is gone, and that record is later cleaned up by compaction. That’s out of the critical path for the app. The same thing happens if the app tries to access a document that has already expired.
Note that keeping that record around for a while is important if you’re using XDCR.
The expiry pager walks over all of the data anyway, and it’s outside the critical path and scheduled at low priority. The only thing to keep in mind is your retention period. If, for instance, you have 100M documents, half of them expire, and the next day you add another 50M, you’ll still be using disk space for 150M records, because each expired document leaves a small tombstone record behind until compaction removes it.
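To make that arithmetic concrete, here is a rough sketch in Python. It only does the bookkeeping from the example above; the function names and the byte sizes are made up for illustration and are not Couchbase internals or measured values.

```python
def records_on_disk(live_docs: int, tombstones: int) -> int:
    """Records still occupying disk before compaction purges the tombstones."""
    return live_docs + tombstones


def estimated_bytes(live_docs: int, tombstones: int,
                    avg_doc_bytes: int, avg_tombstone_bytes: int) -> int:
    """Very rough disk estimate; both average sizes are assumptions you supply."""
    return live_docs * avg_doc_bytes + tombstones * avg_tombstone_bytes


initial_docs = 100_000_000
expired = initial_docs // 2          # half of the documents expire
added_next_day = 50_000_000          # new documents written the next day

live_docs = (initial_docs - expired) + added_next_day
tombstones = expired                 # one tombstone per expired document

total_records = records_on_disk(live_docs, tombstones)
print(total_records)  # 150000000

# Placeholder sizes (hypothetical): 1 KiB per document, 64 bytes per tombstone.
rough_bytes = estimated_bytes(live_docs, tombstones, 1024, 64)
print(rough_bytes)
```

A tombstone is much smaller than the document it replaces, so the record count overstates the actual space used; plug in your own average sizes to get a feel for it.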
So the way to think about it is that the “overhead” is the background walk over the items plus the disk space the tombstones hold until they are compacted away. Other than that, there is no additional overhead. Hope that helps.
cc/ @tai_tran or @drigby to bring in others to review-- I’ve not worked on this part of the system in quite some time