Subscriber not consuming all the data from Observable

vsr1 · January 31, 2018, 11:04pm

You can't use a projection alias in the HAVING clause (it is allowed in ORDER BY), so try this:

SELECT m.token,  ids
FROM temp m
WHERE m.appId='foo'
AND array_count(m.subscriptions) > 0
AND m.token > "0"
GROUP BY m.token
LETTING ids = ARRAY_AGG(meta(m).id)
having ARRAY_LENGTH(ids) >1
limit 10;

Because this is an aggregate query, the aggregation has to complete first (i.e. all ids are collected) before HAVING and LIMIT are applied.
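To illustrate why LIMIT 10 does not make such a query cheap, here is a minimal Python sketch. It is not Couchbase code: the rows and the function name `group_having_limit` are made up for illustration, and it only mimics the order of operations (group, then HAVING, then LIMIT).

```python
from collections import defaultdict

def group_having_limit(rows, min_len=2, limit=10):
    """Mimic GROUP BY token, ARRAY_AGG(id), HAVING and LIMIT on (token, id) pairs.

    Hypothetical helper: it returns the limited result plus the number of
    input rows that had to be read to produce it.
    """
    groups = defaultdict(list)
    rows_read = 0
    for token, doc_id in rows:
        rows_read += 1
        groups[token].append(doc_id)  # every id is kept until grouping ends

    # HAVING can only be evaluated once every group is complete.
    kept = [(token, ids) for token, ids in groups.items() if len(ids) >= min_len]

    # LIMIT trims the final result; it does not reduce the work done above.
    return kept[:limit], rows_read

# Made-up input: 1000 documents spread over 50 tokens.
rows = [("t%d" % (i % 50), "doc%d" % i) for i in range(1000)]
result, rows_read = group_having_limit(rows)
print(len(result), rows_read)
```

Only 10 groups come back, but all 1000 rows were read and all 1000 ids were held in memory along the way.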

If you want to exclude tokens with more than 100 ids, you can use something like this:

   SELECT m.token,  ids
    FROM temp m
    WHERE m.appId='foo'
    AND array_count(m.subscriptions) > 0
    AND m.token > "0"
    GROUP BY m.token
    LETTING ids = ARRAY_AGG(meta(m).id)
    having ARRAY_LENGTH(ids)  BETWEEN 1 AND 100
    limit 10;

Or you can slice the array down to at most 100 entries with ids[0:LEAST(100, ARRAY_LENGTH(ids))]:

   SELECT m.token,  ids[0:LEAST(100,ARRAY_LENGTH(ids))] AS ids
    FROM temp m
    WHERE m.appId='foo'
    AND array_count(m.subscriptions) > 0
    AND m.token > "0"
    GROUP BY m.token
    LETTING ids = ARRAY_AGG(meta(m).id)
    having ARRAY_LENGTH(ids) >1
    limit 10;

k_reid · February 8, 2018, 6:01pm

Hi @vsr1,

I've been running the query below in cbq for testing, but I am now getting an error message. It was working fine last week:

select count(*) 
FROM (
SELECT m.token,ids FROM mobile m WHERE m.appId='foo'
AND array_count(m.subscriptions) > 0
GROUP BY m.token
LETTING ids = ARRAY_AGG(meta(m).id) 
HAVING ARRAY_LENGTH(ids) > 1) as s;

But now I see this in cbq:
N1QL: Query nodes not responding

And this in the errors.log:

[ns_server:error,2018-02-08T17:54:30.458Z,ns_1@archecb-ch2h-01s.foo.net:<0.32408.5471>:janitor_agent:query_states_details:211]Failed to query vbucket states from some nodes:
[{'ns_1@archecb-ch2h-03s.foo.net',timeout}]
[ns_server:error,2018-02-08T17:54:35.965Z,ns_1@archecb-ch2h-01s.foo.net:<0.31451.5471>:janitor_agent:query_states_details:211]Failed to query vbucket states from some nodes:
[{'ns_1@archecb-ch2h-03s.foo.net',timeout}]
[ns_server:error,2018-02-08T17:54:38.227Z,ns_1@archecb-ch2h-01s.foo.net:index_status_keeper_worker<0.1405.0>:index_rest:get_json:42]Request to (indexer) http://127.0.0.1:9102/getIndexStatus failed: {error,
                                                                   timeout}
[ns_server:error,2018-02-08T17:54:40.970Z,ns_1@archecb-ch2h-01s.foo.net:<0.30539.5471>:janitor_agent:query_states_details:211]Failed to query vbucket states from some nodes:
[{'ns_1@archecb-ch2h-03s.foo.net',timeout}]

Any suggestions would be helpful.

Thanks,

-K

k_reid · February 12, 2018, 5:04pm

Hello @vsr1 @daschl @ingenthr,

Can one of you help me out with the error above? Things were working fine, then one of the nodes went down and I had to do a rebalance. The node is back up, but since then the query above has not been responding in cbq or in my Java program. I see this in the Logs tab on the console:

Service 'query' exited with status 1. Restarting. Messages: 2018-02-12T16:54:14.583+00:00 [Info] index 10668927296735143595 has 1 replicas
2018-02-12T16:54:14.583+00:00 [Info] index 6418219675364789612 has 1 replicas
2018-02-12T16:54:14.583+00:00 [Info] index 9370900211264850920 has 1 replicas
2018-02-12T16:54:14.583+00:00 [Info] client load stats {"9370900211264850920": 1.17152253575e+11}
[goport] 2018/02/12 16:54:31 /opt/couchbase/bin/cbq-engine terminated: signal: killed	ns_log 000	ns_1@archecb-ch2h-04s.capsps.comcast.net	4:54:31 PM Mon Feb 12, 2018
Service 'query' exited with status 1. Restarting. Messages: 2018-02-11T20:10:37.742+00:00 [Info] index 15647356407600608026 has 1 replicas
2018-02-11T20:10:37.742+00:00 [Info] index 11651197799130783944 has 1 replicas
2018-02-11T20:10:37.742+00:00 [Info] index 10668927296735143595 has 1 replicas
2018-02-11T20:10:37.742+00:00 [Info] client load stats {"9370900211264850920": 1.07294478189e+11}
[goport] 2018/02/11 20:11:15 /opt/couchbase/bin/cbq-engine terminated: signal: killed	ns_log 000	ns_1@archecb-ch2h-01s.capsps.comcast.net	8:11:15 PM Sun Feb 11, 2018
Service 'query' exited with status 1. Restarting. Messages: 2018-02-11T20:03:23.533+00:00 [Info] index 9370900211264850920 has 1 replicas
2018-02-11T20:03:23.533+00:00 [Info] index 10326453561398343091 has 1 replicas
2018-02-11T20:03:23.533+00:00 [Info] index 3517955350158520960 has 1 replicas
2018-02-11T20:03:23.536+00:00 [Info] client load stats {"9370900211264850920": 1.72848254382e+11}
[goport] 2018/02/11 20:04:02 /opt/couchbase/bin/cbq-engine terminated: signal: killed	ns_log 000	ns_1@archecb-ch2h-02s.capsps.comcast.net	8:04:02 PM Sun Feb 11, 2018
Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@archecb-ch2h-02s.capsps.comcast.net	8:03:59 PM Sun Feb 11, 2018

Your help is much appreciated.

-K

marcog · February 12, 2018, 5:40pm

Is there anything relevant in the query log?

k_reid · February 12, 2018, 7:17pm

Hello @marcog,

I do see this in the query.log:

2018-02-12T17:29:33.991+00:00 [Error] [GsiScanClient:"fooserver.foo.net:9101"] Range(04a29540-0e1e-4757-a77f-971b4d729c4b) response failed `Index scan timed out`

What would be the quickest way to resolve this? I have the liberty to restart/delete anything I need.

Thanks,

K

marcog · February 12, 2018, 7:53pm

Hi @k_reid,
This is not what I was looking for. ns_server indicates that something killed the query node, so I was hoping the query node log would provide more clues.
One thing that comes to mind is that the OS out-of-memory (OOM) killer might have decided to kill the query node because of a lack of memory.
Let me know if you find anything to the effect that the query node has been killed.

k_reid · February 12, 2018, 8:52pm

Sorry I don’t see anything useful. It’s like finding a needle in a haystack. Can you provide anything specific to look for? I just ran the query again and see this in one of the logs:

2018-02-12T20:47:10.860+00:00 [Info] GSIC[default/mobile-1518468250194795633] logstats "mobile" {"gsi_scan_count":1,"gsi_scan_duration":132006654493,"gsi_throttle_duration":13336911958,"gsi_prime_duration":40261507,"gsi_blocked_duration":159444330417,"gsi_totalbackfills":0}
2018-02-12T20:48:10.864+00:00 [Info] connected with 4 indexers
2018-02-12T20:48:10.864+00:00 [Info] index 16317699630974350819 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 11651197799130783944 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 10668927296735143595 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 6418219675364789612 has 2 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 9370900211264850920 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 15647356407600608026 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 10326453561398343091 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 3517955350158520960 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 14742960093105506384 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 8540189097684174466 has 2 replicas
2018-02-12T20:48:10.864+00:00 [Info] index 2107522573269552969 has 1 replicas
2018-02-12T20:48:10.864+00:00 [Info] client load stats {"9370900211264850920": 1.32006529682e+11}
2018-02-12T20:49:10.859+00:00 [Info] connected with 4 indexers
2018-02-12T20:49:10.859+00:00 [Info] index 8540189097684174466 has 2 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 2107522573269552969 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 10326453561398343091 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 3517955350158520960 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 14742960093105506384 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 6418219675364789612 has 2 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 9370900211264850920 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 15647356407600608026 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 16317699630974350819 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 11651197799130783944 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] index 10668927296735143595 has 1 replicas
2018-02-12T20:49:10.859+00:00 [Info] client load stats {"9370900211264850920": 1.32006529682e+11}
[goport] 2018/02/12 20:49:38 /opt/couchbase/bin/cbq-engine terminated: signal: killed
2018-02-12T20:49:38.682+00:00 [Info] GSI client: removing old file /tmp/scan-backfill15552228831973 last-modified @ 2018-02-12 17:15:30.457754387 +0000 UTC
_time=2018-02-12T20:49:38.694+00:00 _level=INFO _msg= Initialization of cbauth succeeded
_time=2018-02-12T20:49:38.783+00:00 _level=INFO _msg=New store created with url http://127.0.0.1:8091
_time=2018-02-12T20:49:38.784+00:00 _level=INFO _msg=pollEOF: About to start stdin polling
_time=2018-02-12T20:49:38.791+00:00 _level=INFO _msg=cbq-engine started request-cap=1024 request-size-cap=67108864 max-concurrency=4 loglevel=INFO servicers=32 plus-servicers=128 pipeline-cap=512 version=1.6.0 datastore=http://127.0.0.1:8091 pipeline-batch=16 timeout=0
_time=2018-02-12T20:49:38.792+00:00 _level=INFO _msg=HttpEndpoint: Listen Address=[::]:8093

marcog · February 12, 2018, 9:06pm

Yes, the log clearly shows that the N1QL service is being killed by an external agent.
If you look at your OS log around 2018/02/12 20:49:38, you may find that the OOM killer has killed N1QL. (Sorry, I can't tell you where the log is because I can't find any information about your OS in the thread; /var/log/messages is as good a guess as any.)
If that’s the case, you are running out of memory.
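A small Python sketch of that log check, in case it saves some scrolling. The pattern is an assumption: the kernel's wording differs between versions, so treat "Out of memory" and "oom-killer" as starting points rather than the exact text. The helper names `find_oom_lines` and `scan_file` are made up for illustration, and reading /var/log/messages usually needs root.

```python
import re

# Assumption: OOM kills are logged with "Out of memory" or "oom-killer"
# somewhere in the line. Adjust the pattern to match your kernel's wording.
OOM_PATTERN = re.compile(r"out of memory|oom[-_ ]killer", re.IGNORECASE)

def find_oom_lines(lines, process_name="cbq-engine"):
    """Return (line, mentions_process) for every line that looks like an OOM event."""
    hits = []
    for line in lines:
        if OOM_PATTERN.search(line):
            text = line.rstrip("\n")
            hits.append((text, process_name in text))
    return hits

def scan_file(path="/var/log/messages", process_name="cbq-engine"):
    """Scan an OS log file; the default path is only a guess for this system."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle, process_name)
```

Calling `scan_file()` on the node where cbq-engine was killed and looking at the entries near the time of the kill should show whether the query service was the victim.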

k_reid · February 12, 2018, 9:06pm

One thing I noticed is that other queries are working fine. Like this one:

select RAW meta(m).id
FROM mobile m
WHERE m.appId='foo'
AND array_count(m.subscriptions) > 0
ORDER BY m.appId, meta(m).id
LIMIT 10000

k_reid · February 12, 2018, 11:10pm

Hi @marcog,

I do indeed see an out-of-memory error in the /var/log/messages log. Is there anything that can be done about this? Here is the query again:

select count(*) 
FROM (
SELECT m.token,ids FROM mobile m WHERE m.appId='foo'
AND array_count(m.subscriptions) > 0
GROUP BY m.token
LETTING ids = ARRAY_AGG(meta(m).id) 
HAVING ARRAY_LENGTH(ids) > 1) as s;

The embedded query in the FROM clause has the same issue. The explain plan is above.

Thanks,

-K

vsr1 · February 12, 2018, 11:35pm

select count(*) 
FROM (
SELECT  cnt FROM mobile m WHERE m.appId='foo'
AND array_count(m.subscriptions) > 0
GROUP BY m.token
LETTING cnt =COUNT(1) 
HAVING cnt > 1) as s;
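The query above swaps ARRAY_AGG(meta(m).id) for COUNT(1), so each group keeps a single counter instead of an array of every document id. A rough Python sketch of the difference follows. It is not Couchbase code: the rows and helper names are invented for illustration, and the "held" figures only count stored items rather than real memory use.

```python
from collections import Counter, defaultdict

def duplicates_via_array_agg(rows):
    """Collect every id per token, then count tokens with more than one id."""
    groups = defaultdict(list)
    for token, doc_id in rows:
        groups[token].append(doc_id)
    ids_held = sum(len(ids) for ids in groups.values())
    duplicates = sum(1 for ids in groups.values() if len(ids) > 1)
    return duplicates, ids_held

def duplicates_via_count(rows):
    """Keep one integer per token, then count tokens seen more than once."""
    counts = Counter(token for token, _ in rows)
    duplicates = sum(1 for c in counts.values() if c > 1)
    return duplicates, len(counts)

# Made-up input: 1000 documents over 600 tokens, 400 of which repeat.
rows = [("t%d" % (i % 600), "doc%d" % i) for i in range(1000)]
print(duplicates_via_array_agg(rows))
print(duplicates_via_count(rows))
```

Both approaches give the same number of duplicated tokens, but the first holds one entry per document while the second holds one counter per token.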

k_reid · February 13, 2018, 6:36pm

Hi @marcog,

It seems like there are some other underlying issues that are causing problems. I've attempted to start from scratch by deleting the buckets and the cluster. Now I have a single node, and when I try to create a bucket I keep seeing the errors below repeated over and over. The same thing happens when I attempt this on each of the individual nodes:

Service 'memcached' exited with status 134. Restarting. Messages: 2018-02-13T18:31:28.011932Z WARNING     /opt/couchbase/bin/memcached() [0x436b52]
2018-02-13T18:31:28.011938Z WARNING     /srv/couchbase/bin/../lib/libplatform.so.0.1.0(_ZN9Couchbase6Thread12thread_entryEv+0x1b) [0x7fa2c50fe9db]
2018-02-13T18:31:28.011943Z WARNING     /srv/couchbase/bin/../lib/libplatform.so.0.1.0() [0x7fa2c50fa6ea]
2018-02-13T18:31:28.011951Z WARNING     /lib64/libpthread.so.0() [0x7fa2c4cdce25]
2018-02-13T18:31:28.011978Z WARNING     /lib64/libc.so.6(clone+0x6d) [0x7fa2c308934d]	ns_log 000	ns_1@fooserver.net	6:31:28 PM Tue Feb 13, 2018
Control connection to memcached on 'ns_1@fooserver.net' disconnected: {badmatch,

In errors.log:

[ns_server:error,2018-02-13T18:33:23.418Z,ns_1@fooserver.net:<0.22627.315>:janitor_agent:query_states_details:211]Failed to query vbucket states from some nodes:
[{'ns_1@fooserver.net',warming_up}]
[stats:error,2018-02-13T18:33:23.431Z,ns_1@fooserver.net:<0.13752.315>:base_stats_collector:handle_info:109](Collector: global_stats_collector) Exception in stats collector: {error,
                                                                   {badmatch,
                                                                    {error,
                                                                     couldnt_connect_to_memcached}},
                                                                   [{global_stats_collector,
                                                                     grab_stats,
                                                                     1,
                                                                     [{file,
                                                                       "src/global_stats_collector.erl"},
                                                                      {line,
                                                                       35}]},
                                                                    {base_stats_collector,
                                                                     handle_info,
                                                                     2,
                                                                     [{file,
                                                                       "src/base_stats_collector.erl"},
                                                                      {line,
                                                                       89}]},
                                                                    {gen_server,
                                                                     handle_msg,
                                                                     5,
                                                                     [{file,
                                                                       "gen_server.erl"},
                                                                      {line,
                                                                       604}]},
                                                                    {proc_lib,
                                                                     init_p_do_apply,
                                                                     3,
                                                                     [{file,
                                                                       "proc_lib.erl"},
                                                                      {line,
                                                                       239}]}]}

In memcached.log*.txt:

2018-02-13T18:32:13.110919Z NOTICE 56: HELO [couchbase-java-client/2.5.1 (git: 2.5.1, core: 1.5.1) (Linux/3.10.0-693.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_91-8u91-b14-1~bpo8+1-b14)] TCP NODELAY [ 96.118.211.42:47972 - 96.118.149.218:11210 ]
2018-02-13T18:32:13.273678Z WARNING Breakpad caught crash in memcached. Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/36b54a7a-2a22-aa36-7d48ac24-2adf228d.dmp before terminating.
2018-02-13T18:32:13.273761Z WARNING Stack backtrace of crashed thread:
2018-02-13T18:32:13.273867Z WARNING     /opt/couchbase/bin/memcached() [0x423184]
2018-02-13T18:32:13.273896Z WARNING     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3d4) [0x457df4]
2018-02-13T18:32:13.273907Z WARNING     /opt/couchbase/bin/memcached() [0x457ff5]
2018-02-13T18:32:13.273918Z WARNING     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x97) [0x458137]
2018-02-13T18:32:13.273934Z WARNING     /lib64/libpthread.so.0() [0x7fcf642475e0]
2018-02-13T18:32:13.273975Z WARNING     /lib64/libc.so.6(gsignal+0x37) [0x7fcf625291f7]
2018-02-13T18:32:13.274007Z WARNING     /lib64/libc.so.6(abort+0x148) [0x7fcf6252a8e8]
2018-02-13T18:32:13.274063Z WARNING     /lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165) [0x7fcf62e2fac5]
2018-02-13T18:32:13.274083Z WARNING     /opt/couchbase/bin/memcached() [0x425e7c]
2018-02-13T18:32:13.274115Z WARNING     /lib64/libstdc++.so.6() [0x7fcf62e2da36]
2018-02-13T18:32:13.274148Z WARNING     /lib64/libstdc++.so.6() [0x7fcf62e2da63]
2018-02-13T18:32:13.274182Z WARNING     /lib64/libstdc++.so.6() [0x7fcf62e2dc83]
2018-02-13T18:32:13.274197Z WARNING     /srv/couchbase/bin/../lib/libcbsasl.so.1.1.1() [0x7fcf64c7a931]
2018-02-13T18:32:13.274206Z WARNING     /srv/couchbase/bin/../lib/libcbsasl.so.1.1.1() [0x7fcf64c88fdc]
2018-02-13T18:32:13.274215Z WARNING     /srv/couchbase/bin/../lib/libcbsasl.so.1.1.1() [0x7fcf64c8bb11]
2018-02-13T18:32:13.274226Z WARNING     /opt/couchbase/bin/memcached() [0x449490]
2018-02-13T18:32:13.274236Z WARNING     /opt/couchbase/bin/memcached() [0x436b52]
2018-02-13T18:32:13.274247Z WARNING     /srv/couchbase/bin/../lib/libplatform.so.0.1.0(_ZN9Couchbase6Thread12thread_entryEv+0x1b) [0x7fcf646619db]
2018-02-13T18:32:13.274261Z WARNING     /srv/couchbase/bin/../lib/libplatform.so.0.1.0() [0x7fcf6465d6ea]
2018-02-13T18:32:13.274273Z WARNING     /lib64/libpthread.so.0() [0x7fcf6423fe25]
2018-02-13T18:32:13.274312Z WARNING     /lib64/libc.so.6(clone+0x6d) [0x7fcf625ec34d]

Your help is appreciated.

marcog · February 14, 2018, 8:32am

Looks like memcached is crashing: the backtrace shows an abort() coming out of libcbsasl, which matches exit status 134 (SIGABRT). @drigby?

drigby · February 14, 2018, 9:43am

Looks like you've hit MB-21659. This is fixed in 4.6.0 and later.
