N1QL: Out of memory error. Port 8093 becomes unresponsive

Hi, we are running Couchbase 4.6.2 on an AWS instance with Ubuntu 14.04. The instance has 4 CPUs and 16 GB of memory, and it runs the Data and Query services for the cluster. A second instance with 2 CPUs and 8 GB of RAM handles indexing. We are using Memory-Optimized Global Secondary Indexes and have 8 secondary indexes and one primary index. The record count is close to 600 (fairly small).

This problem occurs when we run concurrent queries to fetch data. At first the queries get very slow, then we get this error: "An unknown N1QL error occurred. This is usually related to an out-of-memory condition."

We run about 5 queries at a time to display data in a dashboard. Most of our queries are similar and look like this:

statement: 'SELECT records.*, meta(records).id AS _id, _sync.rev as _rev, null as _sync FROM `ri-qa` records WHERE type = "_ri_workflow" AND meta(records).id NOT LIKE "_sync:%"'

statement: 'SELECT records.*, meta(records).id AS _id, _sync.rev as _rev, null as _sync FROM `ri-qa` records WHERE type = "_ri_user" AND role IN [ "SUPERVISOR", "OPERATOR" ] AND meta(records).id NOT LIKE "_sync:%"'

Sometimes it works and we can run 5 of these queries at once, but when we refresh once or twice N1QL crashes. The only solution is to reboot the Data server. Sometimes it also returns empty responses, which is odd.

We tried to query through the Admin GUI and it gives this error:
{ "status": "\r\n504 Gateway Time-out\r\n\r\n504 Gateway Time-out\r\nnginx/1.4.6 (Ubuntu)\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n", "status_detail": "The query workbench only supports queries running for 300 seconds. Use cbq from the command-line for longer running queries. Certain DML queries, such as index creation, will continue in the background despite the user interface timeout."}

We also checked port 8093 and it fails to connect. It has become really difficult to find the root cause of this issue. If someone could help us figure out or debug this, we would appreciate it a lot.

How much memory did you allocate for the data node? Check top and see how much memory the cbq-engine process is using. What queries do you run, and what are the index definitions? You could also add another node, move the query service to that node, and see whether the problem persists.
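For anyone who wants to track the cbq-engine memory over time rather than watching top by hand, here is a rough Python sketch. It assumes a Linux host with the standard `ps` command; the function names `parse_rss_kib` and `cbq_engine_rss_mib` are made up for this example and are not part of any Couchbase tooling.

```python
import subprocess


def parse_rss_kib(ps_output, process_name="cbq-engine"):
    """Sum the resident set size (KiB) of all processes with the given name.

    Expects lines of the form "<rss> <command name>", as printed by
    `ps -e -o rss= -o comm=`.
    """
    total = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == process_name:
            total += int(parts[0])
    return total


def cbq_engine_rss_mib():
    """Return the current resident memory of cbq-engine in MiB (0 if not running)."""
    # Separate -o options with an empty header avoid printing a header row.
    out = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rss_kib(out) / 1024
```

Calling `cbq_engine_rss_mib()` every few seconds while the dashboard refreshes should show whether memory climbs with each batch of concurrent queries.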

  1. Change the NOT LIKE predicate to NOT LIKE "\_sync:%", and use pretty=false.
  2. Can you run your queries single-threaded for some time, say 10 minutes?
  3. What is the plan for these queries?
  4. Observe the memory usage on the query node.
  5. If you have the enterprise license, please call support.
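A minimal sketch of steps 1 to 3 in Python, using only the standard library against the query service REST endpoint (`/query/service` on port 8093, with the `statement` and `pretty` parameters). The host name, the timeout, and the helper names `build_query_request` and `run_sequentially` are assumptions for illustration; authentication is omitted and would need to be added if the bucket is password protected.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(statement, host="localhost", explain=False):
    """Build a POST request for the query service with pretty=false.

    With explain=True the statement is prefixed with EXPLAIN so the
    service returns the query plan instead of the results.
    """
    if explain:
        statement = "EXPLAIN " + statement
    body = urllib.parse.urlencode(
        {"statement": statement, "pretty": "false"}
    ).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:8093/query/service",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_sequentially(statements, host="localhost", timeout_s=75):
    """Run the statements one after another (no concurrency) and collect the JSON replies."""
    results = []
    for statement in statements:
        request = build_query_request(statement, host)
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            results.append(json.loads(response.read()))
    return results
```

Running the dashboard queries through `run_sequentially` for a while, and once with `explain=True`, gives both the single-threaded baseline and the plans asked for above.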
The escape requires a double backslash (\\):
NOT LIKE "\\_sync:%"
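To show why the escape matters: in a LIKE pattern an unescaped `_` matches any single character, so `_sync:%` also excludes IDs such as `xsync:...`. The Python sketch below is a simplified model of LIKE matching with backslash escapes, written only to illustrate the difference; it is not how Couchbase implements LIKE, and `like_to_regex` and `like` are hypothetical helper names. The patterns are shown as they look after string parsing, so the `\\_` in the query text corresponds to a single backslash here.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression.

    %  matches any run of characters, _ matches exactly one character,
    and a backslash makes the following character literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: take the next one literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

With this model, `like("xsync:abc", "_sync:%")` is true because `_` acts as a wildcard, while `like("xsync:abc", r"\_sync:%")` is false and only IDs that really start with `_sync:` match.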

The data node has 10 GB allocated. We upgraded the indexing node to 16 GB. What we don't understand is that there are barely 400 documents in the database and N1QL still hangs. Adding another node for the query service requires us to blow away the current data node.

  1. We will change it and check. The same pattern in a different application, with 3 services and Sync Gateway running on the same node and 350k+ records on UAT, has never had such issues.
  2. We did that. There are no hang-ups. But in a few places it is mandatory to run multiple queries, and in production users will use the app simultaneously.
  3. We are calling only indexing now. We still see issues.
  4. There is no separate query node installed. Data and query are on the same node.
  5. We reached out to support. Support pointed to the single-node configuration as the issue. Now we have two nodes, and Sync Gateway runs on a different node.

Can you please send just the support case #? Thanks.

Hey Keshav. The support ticket number is #18916. We have uploaded the logs as well.