Cbq-based queries die/hang indefinitely

Hi All -

We have a concerning issue that has crept up recently. When our CB cluster is under heavy(ish) load, some automation that calls queries via cbq hang indefinitely without error. The queries themselves aren’t complicated, but do often result in disk reads (e.g. larger population of data that isn’t always cache-resident).

Our frustration is that there’s no way the query should actually take that long. When the cluster is under light load, the queries return in a matter of seconds. But when the cluster is under pressure, the query appears to hang indefinitely as we’ve let the process sit for over a day.

So the assumption is that something is timing-out inside of the query engine, but we’re not seeing the error bubble back up to CBQ. We’re trapping for a non-zero error code in our call to cbq, so ideally if it times-out cbq would throw a non-zero error code.

Is there any way to discover more details about what’s failing?


Also, we’ve monitored the query.log file on the node that we’re executing the query against and haven’t seen anything interesting there.

What is CB version? when hang/die is the query is doing fetch(non covered query)? At that time can u do covered query and see if that works.

Couchbase 4.5. Looks like we were looking in the wrong spot. We looked at the indexer log instead of the query log, and saw some errors. We ended up killing the indexer on all nodes, and everything is working again. If the issue creeps up again, I’ll send out an update.