Hi All -
We have a concerning issue that has crept up recently. When our CB cluster is under heavy(ish) load, some automation that calls queries via cbq hang indefinitely without error. The queries themselves aren’t complicated, but do often result in disk reads (e.g. larger population of data that isn’t always cache-resident).
Our frustration is that there’s no way the query should actually take that long. When the cluster is under light load, the queries return in a matter of seconds. But when the cluster is under pressure, the query appears to hang indefinitely as we’ve let the process sit for over a day.
So the assumption is that something is timing-out inside of the query engine, but we’re not seeing the error bubble back up to CBQ. We’re trapping for a non-zero error code in our call to cbq, so ideally if it times-out cbq would throw a non-zero error code.
Is there any way to discover more details about what’s failing?