I have a write-heavy 5.x cluster where compaction runs pretty much constantly, 24/7. The compaction configuration is the default: a 30% fragmentation threshold and 1 compactor thread.
Running constantly, it takes a toll on the cluster, as can be seen in occasional write drops and increased fetch latency. I wanted to try shutting compaction off at peak hours, so I first tried shutting it off for a single hour (there’s enough disk space to shut compaction off for about 8 hours).
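A headroom figure like "about 8 hours" can be sanity-checked with some simple arithmetic. The sketch below is only an illustration: it assumes append-only storage whose files grow by roughly the write volume while compaction is off, and the function name, the reserve parameter, and all numbers are hypothetical rather than measurements from this cluster.

```python
def hours_of_headroom(free_disk_gb: float,
                      file_growth_gb_per_hour: float,
                      reserve_gb: float = 0.0) -> float:
    """Estimate how long compaction can stay off before disk space runs short.

    reserve_gb is space deliberately kept free, for example for the
    compactor's own working files once it is switched back on (assumption).
    """
    if file_growth_gb_per_hour <= 0:
        raise ValueError("file growth rate must be positive")
    usable_gb = free_disk_gb - reserve_gb
    if usable_gb <= 0:
        return 0.0
    return usable_gb / file_growth_gb_per_hour


# Hypothetical numbers: 400 GB free, files growing 40 GB/hour, 80 GB kept in reserve.
print(hours_of_headroom(400.0, 40.0, reserve_gb=80.0))  # 8.0
```

Keeping a reserve in the estimate matters if the compactor needs free space to write its output before the old file is removed, which is an assumption worth verifying for the storage engine in use.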
During that hour, the cluster showed great performance. However, when the hour ended, there was a spike in drops and latency, worse than the “usual” level with constantly running compaction.
I’m wondering what might be the cause and how I might be able to fix it. I realize there’s “more work” for the compaction process after not having run for an hour, but the fragmentation isn’t off the charts at that point, and in any case there’s only a single compactor thread working, so why should the amount of fragmentation matter for performance? I’d assume compaction would simply take longer to run, and overall be more efficient, since fewer compaction iterations took place.
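To put rough numbers on "not off the charts", here is a toy model of how fragmentation grows while compaction is paused. It assumes fragmentation is measured as stale bytes over total file size and that stale data accumulates at a constant rate; the function names and every figure are made up for illustration and are not taken from this cluster.

```python
def fragmentation_pct(live_gb: float, stale_gb: float) -> float:
    """Fragmentation as the share of the file occupied by stale data."""
    total_gb = live_gb + stale_gb
    return 0.0 if total_gb == 0 else 100.0 * stale_gb / total_gb


def backlog_after_pause(live_gb: float,
                        stale_at_pause_gb: float,
                        stale_rate_gb_per_hour: float,
                        hours_paused: float) -> tuple[float, float]:
    """Return (stale GB, fragmentation %) once the pause ends."""
    stale_gb = stale_at_pause_gb + stale_rate_gb_per_hour * hours_paused
    return stale_gb, fragmentation_pct(live_gb, stale_gb)


# Hypothetical starting point: exactly at a 30% threshold when compaction is paused.
print(fragmentation_pct(70.0, 30.0))  # 30.0

# One hour of pause with 10 GB/hour of newly stale data (made-up rate).
stale_gb, frag = backlog_after_pause(70.0, 30.0, 10.0, 1.0)
print(stale_gb, round(frag, 1))  # 40.0 36.4
```

In this toy model a one-hour pause moves fragmentation from 30% to about 36%, which matches the modest increase described above. The model says nothing about the I/O cost of the catch-up run itself, so it does not explain the spike on its own.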
I’d like to be able to disable compaction at times without having to pay for it with that extra spike when it comes back on. Any suggestions will be much appreciated.