We have an older version of Couchbase on a Windows Server:
Version: 4.6.5-4742 Enterprise Edition (build-4742)
We store our app sessions there, and because the connector is old we haven't been able to upgrade to a newer version.
We recently had to upgrade our infrastructure. Sadly, the people maintaining the previous infrastructure quit, and we had no metrics on how the previous Couchbase Server was performing or what IOPS limits it had.
So when we started to hit big loads, we started to get a lot of issues with our sessions, with users not being able to log in or being kicked out. I haven't worked much with Couchbase before, but from reading the limited app logs it seemed the issue was Couchbase. I started reading the Couchbase logs but couldn't see any problems per se; I just noticed that the system seemed to be compacting constantly. And when we checked our metrics, we saw the hard drives of the Couchbase Server running at their IOPS limit. It was so bad that Couchbase apparently couldn't write some of its logs at times, because there were some time periods where it didn't log any misses.
So we decided to double the IOPS limit, since we had no information about how the server had performed in the past or what limits it had. We did know that the workload was going to be write-heavy for a long while, since we flushed several buckets holding gigabytes of data when we migrated. Additionally, we moved compaction to run only during the night.
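To make the compaction change concrete: as far as I understand the REST API documentation, a night-only window corresponds to the cluster-wide auto-compaction settings sent to `/controller/setAutoCompaction` on port 8091. Below is a minimal sketch of such a payload. The helper name, the 00:00 to 06:00 window, and the 30% fragmentation thresholds are placeholders I made up for illustration, not our actual values, and the sketch only builds the form body without sending anything.

```python
from urllib.parse import urlencode


def build_compaction_window_payload(from_hour, to_hour,
                                    db_threshold_pct=30,
                                    view_threshold_pct=30,
                                    abort_outside=True):
    """Build form fields for POST /controller/setAutoCompaction.

    Hypothetical helper: all default values are placeholders.
    """
    if not (0 <= from_hour <= 23 and 0 <= to_hour <= 23):
        raise ValueError("hours must be in 0..23")
    return {
        # Fragmentation level at which compaction becomes eligible to run.
        "databaseFragmentationThreshold[percentage]": db_threshold_pct,
        "viewFragmentationThreshold[percentage]": view_threshold_pct,
        "parallelDBAndViewCompaction": "false",
        # Window in which compaction is allowed to start.
        "allowedTimePeriod[fromHour]": from_hour,
        "allowedTimePeriod[fromMinute]": 0,
        "allowedTimePeriod[toHour]": to_hour,
        "allowedTimePeriod[toMinute]": 0,
        # If true, a compaction still running when the window ends is aborted.
        "allowedTimePeriod[abortOutside]": "true" if abort_outside else "false",
    }


# Placeholder window: midnight to 06:00.
payload = build_compaction_window_payload(0, 6)
body = urlencode(payload)  # form-encoded body for the POST request
print(body)
```

The part I am unsure about is `abortOutside`: with a short window and a lot of fragmented data, I assume compaction could be cut off every night before it finishes.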
Since then the issue hasn't come back, but Couchbase seems to be hitting the IOPS limit again. What intrigues me is that I never saw an error anywhere, yet the system couldn't find the user sessions in Couchbase.
Does this behavior sound reasonable to you? Also, could there be any bad consequences to having compaction run only during the night? I am sorry if these questions seem rather basic; I don't usually work on Couchbase tuning and maintenance, so I am a bit at a loss here.