[MB-7199] Couchbase server can't handle hundreds of view queries with unlimited number of results at the same time Created: 15/Nov/12  Updated: 11/Mar/14

Status: Open
Project: Couchbase Server
Component/s: ns_server, view-engine
Affects Version/s: 2.0, 2.0.1, 2.1.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Chisheng Hong (Inactive) Assignee: Ravi Mayuram
Resolution: Unresolved Votes: 0
Labels: 2.0-release-notes, scrubbed, system-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos5.6, build 1952 with 16 schedulers

Attachments: File erl_crash.dump.tgz     File logs.tgz     PNG File queries2.png    
Triage: Untriaged

 Description   
Cluster: 6 nodes
10.6.2.37
10.6.2.38
10.6.2.39
10.6.2.40
10.6.2.42
10.6.2.43

Build # 2.0.0-1952 with 16 erlang schedulers
each node with a 390GB SSD drive and 32GB RAM

Two buckets created: sasl and default. Start loading items at 8K creates per second into each bucket. Then insert a ddoc with 2 views into each bucket. Then have 4 clients query the views at 120 reads per second.

I don't put any limit on query results, and the queries are issued without waiting for the previous ones to finish returning their results:
    capiUrl = "http://%s:%s/couchBase/" % (cfg.COUCHBASE_IP, cfg.COUCHBASE_PORT)
    url = capiUrl + '%s/_design/%s/_%s/%s' % (bucket,
                                                design_doc_name, type_,
                                                view_name)
    headers = {'Content-Type': 'application/json',
               'Authorization': 'Basic %s' % authorization,
               'Accept': '*/*'}
     req = urllib2.Request(url, headers = headers)
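
(For reference, a rough sketch of how such requests could be fired concurrently without waiting on earlier responses; the fire_query helper and the thread count here are illustrative assumptions, not the exact test harness:)

    import threading
    import urllib2

    def fire_query(url, headers):
        # Illustrative helper, not the actual harness: send one view query and
        # read back the full, unlimited result set.
        resp = urllib2.urlopen(urllib2.Request(url, headers=headers))
        resp.read()

    # Each query runs on its own thread, so new queries are issued without
    # waiting for the previous ones to finish returning their results.
    for _ in range(120):
        threading.Thread(target=fire_query, args=(url, headers)).start()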

Then the UI becomes unresponsive.
Pay attention to the following stats:

The first is the Erlang scheduler run-queue stats on one of the nodes while the queries are running:

(ns_1@10.6.2.37)5> F = fun (R) -> io:format("~p ~p~n", [latency:ts(now()), erlang:statistics(run_queues)]), timer:sleep(100), R(R) end.
#Fun<erl_eval.6.80247286>
1353032384137 {11,104,2,0,8,11,0,0,0,0,0,0,0,0,0,0}
1353032384293 {4,65,103,7,2,20,0,0,0,0,0,0,0,0,0,0}
1353032384425 {3,7,4,25,21,3,0,0,0,0,0,0,0,0,0,0}
1353032384553 {23,17,50,6,6,0,0,0,0,0,0,0,0,0,0,0}
1353032384672 {16,28,92,15,65,42,0,0,0,0,0,0,0,0,0,0}
1353032384795 {6,4,47,15,1,0,0,0,0,0,0,0,0,0,0,0}
1353032384919 {1,11,86,59,56,55,0,0,0,0,0,0,0,0,0,0}
1353032385081 {54,49,30,44,33,11,0,0,0,0,0,0,0,0,0,0}
1353032385221 {15,47,10,45,9,31,0,0,0,0,0,0,0,0,0,0}
1353032385355 {46,2,72,89,28,4,0,0,0,0,0,0,0,0,0,0}
1353032385468 {11,1,8,26,0,2,0,0,0,0,0,0,0,0,0,0}
1353032385610 {7,23,7,14,20,13,0,0,0,0,0,0,0,0,0,0}
1353032385765 {7,85,11,16,0,12,0,0,0,0,0,0,0,0,0,0}
1353032385905 {9,29,28,2,3,26,0,0,0,0,0,0,0,0,0,0}
1353032386068 {48,112,142,31,12,25,0,0,0,0,0,0,0,0,0,0}
1353032386222 {11,40,28,36,5,9,0,0,0,0,0,0,0,0,0,0}
1353032386356 {64,53,4,5,7,34,0,0,0,0,0,0,0,0,0,0}
1353032386560 {0,2,45,2,0,89,0,0,0,0,0,0,0,0,0,0}
1353032386700 {50,18,83,4,0,35,0,0,0,0,0,0,0,0,0,0}
1353032386837 {0,18,3,2,17,4,0,0,0,0,0,0,0,0,0,0}
1353032386984 {2,10,11,6,0,4,0,0,0,0,0,0,0,0,0,0}
1353032387105 {1,5,12,2,0,64,0,0,0,0,0,0,0,0,0,0}
1353032387231 {22,67,58,5,19,7,0,0,0,0,0,0,0,0,0,0}
1353032387337 {17,1,38,33,7,1,0,0,0,0,0,0,0,0,0,0}
1353032387469 {5,5,48,27,2,18,0,0,0,0,0,0,0,0,0,0}
1353032387598 {2,50,47,88,41,8,0,0,0,0,0,0,0,0,0,0}
1353032387746 {2,55,16,35,1,12,0,0,0,0,0,0,0,0,0,0}
1353032387897 {3,29,98,0,5,19,0,0,0,0,0,0,0,0,0,0}
1353032388021 {29,50,147,0,5,3,0,0,0,0,0,0,0,0,0,0}
1353032388146 {15,3,30,3,46,2,0,0,0,0,0,0,0,0,0,0}
1353032388277 {53,8,50,1,10,14,0,0,0,0,0,0,0,0,0,0}
1353032388402 {2,19,45,0,6,2,0,0,0,0,0,0,0,0,0,0}
1353032388594 {17,123,2,0,29,4,0,0,0,0,0,0,0,0,0,0}
1353032388734 {35,92,0,3,40,70,0,0,0,0,0,0,0,0,0,0}
1353032388873 {2,10,22,5,18,17,0,0,0,0,0,0,0,0,0,0}
1353032389008 {112,84,15,0,1,0,0,0,0,0,0,0,0,0,0,0}
1353032389133 {102,57,0,25,3,23,0,0,0,0,0,0,0,0,0,0}
1353032389257 {44,55,28,5,36,49,0,0,0,0,0,0,0,0,0,0}
1353032389379 {4,40,3,48,2,48,0,0,0,0,0,0,0,0,0,0}
1353032389549 {24,161,24,38,16,21,0,0,0,0,0,0,0,0,0,0}
1353032389686 {54,25,12,23,7,98,0,0,0,0,0,0,0,0,0,0}
1353032389804 {79,33,20,2,3,46,0,0,0,0,0,0,0,0,0,0}
1353032389950 {90,0,25,13,45,56,0,0,0,0,0,0,0,0,0,0}
1353032390101 {59,10,17,1,37,54,0,0,0,0,0,0,0,0,0,0}

The second is the top output for beam.smp:
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  676 couchbas 20 0 26.0g 24g 5128 S 663.9 77.7 369:01.85 beam.smp

24GB of memory usage, and the CPU% stays above 350%.

 Comments   
Comment by Chisheng Hong (Inactive) [ 15/Nov/12 ]
Ran cbcollect_info for every node:
https://s3.amazonaws.com/bugdb/jira/MB-7199/11-15/10.6.2.38.zip
https://s3.amazonaws.com/bugdb/jira/MB-7199/11-15/10.6.2.39.zip
https://s3.amazonaws.com/bugdb/jira/MB-7199/11-15/10.6.2.40.zip
https://s3.amazonaws.com/bugdb/jira/MB-7199/11-15/10.6.2.42.zip
https://s3.amazonaws.com/bugdb/jira/MB-7199/11-15/10.6.2.43.zip

Comment by Tommie McAfee [ 16/Nov/12 ]
Same issue on a different cluster with SSDs at build 1952:
Went from a 35k ops/sec KV workload to 2k ops/sec and an unresponsive UI when 50 queries per second were applied. Only querying a subset of documents (about 800k).
Load ratio: -create 10 --get 75 --update 15 --delete 1

In the attached screenshot, note the direct correlation between ops/sec and view reads.

In atop, beam.smp is at 5GB and seems to grow without bound. To replay the samples, ssh root@10.6.2.66 (pwd: couchbase) and run:
atop -b 12:40 -r /tmp/atop-nodeMB-7199.log

Also, in a previous run beam had to be killed by the OS:
> Nov 15 10:11:48 pine-11803 kernel: Out of memory: Kill process 32588 (beam.smp) score 962 or sacrifice child
Comment by Aleksey Kondratenko [ 16/Nov/12 ]
Folks, please try testing with default settings. Let's confirm it's not just a code regression.
Comment by Aleksey Kondratenko [ 16/Nov/12 ]
Passing this formally to Aliaksey, who is handling this anyway.

We've started looking at the partially ready crash dump, and there are signs of a real leak of a lot of binaries in the index merger.
Comment by Thuan Nguyen [ 16/Nov/12 ]
Integrated in github-couchdb-preview #538 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/538/])
    MB-7199 Don't buffer socket data on cleanup (Revision 31560d74c3bbe8c019186923a9db3468a8197ab8)

     Result = SUCCESS
Farshid Ghods :
Files :
* src/couch_index_merger/src/couch_index_merger.erl
Comment by Aliaksey Artamonau [ 16/Nov/12 ]
We would like to know as many details as possible about the queries you run. In particular, which query parameters are used, and whether you read the response completely or just drop the connection on the floor.
Comment by Chisheng Hong (Inactive) [ 16/Nov/12 ]
I still hit the same situation with the default 128 schedulers under the same scenario.
Comment by Steve Yen [ 19/Nov/12 ]
bug-scrub - want same test on build 1953 (which has new fix) - look for crashes/cores
Comment by Farshid Ghods (Inactive) [ 19/Nov/12 ]
1 - run the same test without changing the limit to N, on the new build with the same code, after the cluster has the same number of items and the same number of nodes
2 - what is the timeout for that call?
3 - see if it still crashes
Comment by Chisheng Hong (Inactive) [ 19/Nov/12 ]
I ran the same test case from the bug description with 1953 (16:16) using mcsoda, and issued queries with os.system("curl -X GET 'http://%s%s:8092/%s/_design/d1/_view/v1?stale=ok'" % (bucket_info, vm_ips[0], bucket)). The same thing happened after I issued this unlimited-results query more than 1K times (I kick off those queries without caring whether the previous query has finished returning its results). Memory usage for beam.smp climbs above 2.5GB very easily. But if we put a limit on the query, this does not happen.
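
(For comparison, a rough sketch of the bounded variant that did not show the problem; the limit=100 value is only an illustrative choice, not the value used in the test:)

    import os

    # Same curl query as above, but with a limit parameter so only a bounded
    # number of rows is returned per request.
    os.system("curl -X GET 'http://%s%s:8092/%s/_design/d1/_view/v1?stale=ok&limit=100'"
              % (bucket_info, vm_ips[0], bucket))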
Comment by Filipe Manana [ 19/Nov/12 ]
Where's the crashdump file?
Comment by Filipe Manana [ 19/Nov/12 ]
My findings after analyzing the crash dump file (~300MB):

When the crash dump was produced, the memory used by the Erlang VM was roughly:

$ grep -n OldHeap: erl_crash.dump.11-19-2012-13\:16\:17.5856 | cut -d ' ' -f 2 | perl -ne '$sum = $sum + $_; print $sum, "\n";' | tail -n 1
152453922

152453922 * 8 = ~1.2Gb

$ grep -n Stack+heap: erl_crash.dump.11-19-2012-13\:16\:17.5856 | cut -d ' ' -f 2 | perl -ne '$sum = $sum + $_; print $sum, "\n";' | tail -n 1
53623918

53623918 * 8 = ~428Mb
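
(The same arithmetic can be scripted; below is only a sketch that sums the word counts over the =proc sections of the dump and converts them to bytes, assuming an 8-byte word size on a 64-bit VM:)

    import re
    import sys

    WORD_SIZE = 8  # bytes per word on a 64-bit Erlang VM

    def heap_totals(dump_path):
        # Sum the 'OldHeap:' and 'Stack+heap:' fields (reported in words) across
        # all processes in an erl_crash.dump and convert them to bytes.
        totals = {'OldHeap': 0, 'Stack+heap': 0}
        with open(dump_path) as dump:
            for line in dump:
                match = re.match(r'(OldHeap|Stack\+heap): (\d+)', line)
                if match:
                    totals[match.group(1)] += int(match.group(2)) * WORD_SIZE
        return totals

    if __name__ == '__main__':
        for field, total in heap_totals(sys.argv[1]).items():
            print '%s: ~%.1f MB' % (field, total / (1024.0 * 1024.0))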

Erlang processes with the biggest old heaps:

$ grep -n OldHeap: erl_crash.dump.11-19-2012-13\:16\:17.5856 | sort -k 3 -n -t:
(...)
23058:OldHeap: 2629425
10824:OldHeap: 3286780
227045:OldHeap: 8024355

(values are in words, therefore to get byte values multiply by 8)

These 3 processes are:

Mochiweb server:

=proc:<0.17887.7>
State: Waiting
Spawned as: proc_lib:init_p/5
Spawned by: <0.1003.0>
Started: Mon Nov 19 15:51:37 2012
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [#Port<0.218061>, <0.1003.0>]
Reductions: 3278012
Stack+heap: 1346269
OldHeap: 8024355
Heap unused: 192102
OldHeap unused: 6083229
Program counter: 0x00007f6829c7b928 (mochiweb_http:request/2 + 88)


ns_server stats reader for bucket default

=proc:<0.5381.0>
State: Waiting
Name: 'stats_reader-default'
Spawned as: proc_lib:init_p/5
Spawned by: <0.5353.0>
Started: Mon Nov 19 13:30:16 2012
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [<0.5353.0>]
Reductions: 10535894
Stack+heap: 1682835
OldHeap: 3286780
Heap unused: 125117
OldHeap unused: 2875168
Program counter: 0x00007f683c076840 (gen_server:loop/6 + 256)


ns_server stats reader for bucket saslbucket

=proc:<0.7758.0>
State: Waiting
Name: 'stats_reader-saslbucket'
Spawned as: proc_lib:init_p/5
Spawned by: <0.7730.0>
Started: Mon Nov 19 13:30:40 2012
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [<0.7730.0>]
Reductions: 5999001
Stack+heap: 1346269
OldHeap: 2629425
Heap unused: 392854
OldHeap unused: 2629326
Program counter: 0x00007f683c076840 (gen_server:loop/6 + 256)



Erlang processes with biggest stack+heap:

$ grep -n Stack+heap: erl_crash.dump.11-19-2012-13\:16\:17.5856 | sort -k 3 -n -t:
(....)
227044:Stack+heap: 1346269
23057:Stack+heap: 1346269
10823:Stack+heap: 1682835

ns_server's stats reader for default bucket:

=proc:<0.5381.0>
State: Waiting
Name: 'stats_reader-default'
Spawned as: proc_lib:init_p/5
Spawned by: <0.5353.0>
Started: Mon Nov 19 13:30:16 2012
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [<0.5353.0>]
Reductions: 10535894
Stack+heap: 1682835
OldHeap: 3286780
Heap unused: 125117
OldHeap unused: 2875168
Program counter: 0x00007f683c076840 (gen_server:loop/6 + 256)


ns_server's stats reader for saslbucket:

=proc:<0.7758.0>
State: Waiting
Name: 'stats_reader-saslbucket'
Spawned as: proc_lib:init_p/5
Spawned by: <0.7730.0>
Started: Mon Nov 19 13:30:40 2012
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [<0.7730.0>]
Reductions: 5999001
Stack+heap: 1346269
OldHeap: 2629425
Heap unused: 392854
OldHeap unused: 2629326
Program counter: 0x00007f683c076840 (gen_server:loop/6 + 256)


Mochiweb's server:

=proc:<0.17887.7>
State: Waiting
Spawned as: proc_lib:init_p/5
Spawned by: <0.1003.0>
Started: Mon Nov 19 15:51:37 2012
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [#Port<0.218061>, <0.1003.0>]
Reductions: 3278012
Stack+heap: 1346269
OldHeap: 8024355
Heap unused: 192102
OldHeap unused: 6083229
Program counter: 0x00007f6829c7b928 (mochiweb_http:request/2 + 88)


It seems the garbage collector isn't doing its job well. View-related processes don't seem to be using big quantities of memory (nor does the connection pool itself).

Finally, looking at the error_logger process message box, there are a ton of EBADF errors on file, socket and port operations:

E.g.:

{notify,{error,noproc,
               {emulator,"~s~n",["erts_poll_wait() failed: ebadf (9)\n"]}}}

{notify,{error,noproc,
               {emulator,"~s~n",
                         ["Bad output fd in erts_poll()! fd=42, port=#Port<0.7463>, driver=spawn, name=/opt/couchbase/bin/sigar_port\n"]}}}

{notify,{error,noproc,
               {emulator,"~s~n",
                         ["Bad input fd in erts_poll()! fd=78, port=#Port<0.9684>, driver=tcp_inet, name=tcp_inet\n"]}}}

{notify,
    {error_report,<0.56.0>,
        {<0.32244.4>,crash_report,
         [[{initial_call,
               {couch_file,spawn_writer,['Argument__1','Argument__2']}},
           {pid,<0.32244.4>},
           {registered_name,[]},
           {error_info,
               {error,
                   {badmatch,{error,ebadf}},
                   [{couch_file,write_blocks,3},
                    {couch_file,writer_collect_chunks,5},
                    {proc_lib,init_p_do_apply,3}]}},
           {ancestors,
               [<0.32241.4>,<0.14053.0>,
                <0.14044.0>,<0.14043.0>]},
           {messages,[]},
           {links,[<0.32241.4>,#Port<0.163328>]},
           {dictionary,[]},
           {trap_exit,true},
           {status,running},
           {heap_size,610},
           {stack_size,24},
           {reductions,49502568}],
          []]}}}

{notify,{error,<0.56.0>,
               {emulator,"~s~n",
                         ["Error in process <0.4736.7> on node 'ns_1@10.6.2.37' with exit value: {{badmatch,{error,ebadf}},[{couch_btree,get_node,2},{couch_btree,modify_node,8},{couch_btree,modify_kpnode,10},{couch_btree,modify_node,8},{couch_btree,modify_kpnode,10},{couch_btree,modify_node,8},{couch_btree,modify_kpnode... \n"]}}}


The view merger's connection pool has a max size of 10,000 connections, but at crash dump time it only had 2684 connections open (I should probably decrease the limit to 5K or less).

At startup, couchbase-server.sh sets ulimit -n to 10240 and doesn't set ERL_MAX_PORTS, which means it gets the default of 1024 (according to erl -man erlang).
Perhaps we're reaching the port limit (each raw file descriptor opened uses 1 port). However, this still doesn't explain the EBADF errors; I would expect EMFILE instead.
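
(One quick way to sanity-check the descriptor theory, assuming Linux /proc is available; the PID is the beam.smp PID from the top output earlier in this ticket, used purely for illustration:)

    import os

    def open_fd_count(pid):
        # Count the file descriptors currently open by a process (Linux only).
        return len(os.listdir('/proc/%d/fd' % pid))

    # Compare against ulimit -n (10240) and the connection pool size.
    print open_fd_count(676)  # 676 = beam.smp PID from the top output above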

Alk, any reason to not set ERL_MAX_PORTS to a higher value?


Comment by Filipe Manana [ 19/Nov/12 ]
Alk, see question above.
Comment by Filipe Manana [ 19/Nov/12 ]
After talking with Alk: apparently ERL_MAX_PORTS has no effect on Unixes, as the VM assumes the maximum allowed by the system (obtained via sysconf, etc.).

I think it's worth re-trying this test with +A 16 instead of +S N:N.

Chisheng can you try it?
Also, please provide the server logs next time the crash happens.

thanks
Comment by Steve Yen [ 20/Nov/12 ]
moved per bug-scrub to 2.0.1
Comment by Filipe Manana [ 29/Nov/12 ]
Now that I am back home, I tried this on a 4-node cluster on my own hardware.
I created a view with 10M rows, and then spawned 5000 curl processes like this:

$ for i in `seq 1 5000`; do curl -s http://localhost:9500/default/_design/test/_view/view1 & done

After a short period, all view queries start returning an error, because there is an ns_config call timeout
in the HTTP handler before query execution reaches the view engine:

[couchdb:error,2012-11-29T17:16:34.030,n_0@192.168.1.88:<0.2954.1>:couch_log:error:42]Uncaught error in HTTP request: {exit,
                                 {timeout,{gen_server,call,[ns_config,get]}}}

Stacktrace: [{diag_handler,diagnosing_timeouts,1,
                           [{file,"src/diag_handler.erl"},{line,375}]},
             {menelaus_auth,check_auth,1,
                            [{file,"src/menelaus_auth.erl"},{line,121}]},
             {menelaus_auth,bucket_auth_fun,1,
                            [{file,"src/menelaus_auth.erl"},{line,168}]},
             {menelaus_auth,is_bucket_accessible,2,
                            [{file,"src/menelaus_auth.erl"},{line,68}]},
             {capi_frontend,do_db_req,2,
                            [{file,"src/capi_frontend.erl"},{line,53}]},
             {httpd,handle_request,6,
                          [{file,"couch_httpd.erl"},{line,222}]},
             {mochiweb_http,headers,5,[{file,"mochiweb_http.erl"},{line,136}]},
             {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]


This seems to cause timeout_diag_logger to start dumping lots of stuff.
At this point the Erlang node that received the queries was using over 3GB of memory, at which point I killed it to get
an erl_crash.dump (attached here).

Looking at that crash dump, it seems the processes which are using very high amounts of memory are mostly logger processes
and a few other ns_server processes:

$ IFS="\n"; for m in `grep 'OldHeap:' erl_crash.dump | sort -n -k 2 | tail -n 15`; do egrep $m -A 3 -B 12 erl_crash.dump | grep '^Name' ; done
Name: standard_error
Name: 'sink-disk_default'
Name: 'sink-disk_error'
Name: 'sink-disk_couchdb'
Name: 'sink-disk_debug'
Name: 'sink-stderr'
Name: timeout_diag_logger
Name: ns_config_events
Name: ns_config
Name: ns_port_memcached
Name: 'stats_reader-default'

Server logs attached here.
I haven't seen the EBADF errors, however (which happened on the QE cluster), maybe because my max open files limit (59000)
is likely higher than on those machines.

Aliaksey, do you think something needs to be tweaked in the logger?
Thanks
Comment by Filipe Manana [ 29/Nov/12 ]
Aliaksey, see comment above.
Thanks
Comment by Aliaksey Artamonau [ 29/Nov/12 ]
We considered hibernating the logger processes, but it's not clear what the performance impact would be. From what you described, it seems the main problem is again those random timeouts.
Comment by Filipe Manana [ 29/Nov/12 ]
Agreed, the main cause is the ns_config call timeouts in the ns_server view query HTTP handler.
With so many timeouts, the logger processes and mb_master are the processes with the biggest
message queue lengths:

=proc:<0.167.0>
State: Scheduled
Name: 'sink-stderr'
Spawned as: proc_lib:init_p/5
Spawned by: <0.33.0>
Started: Thu Nov 29 17:07:15 2012
Message queue length: 3431

=proc:<0.78.0>
State: Scheduled
Name: 'sink-disk_debug'
Spawned as: proc_lib:init_p/5
Spawned by: <0.33.0>
Started: Thu Nov 29 17:07:15 2012
Message queue length: 405

=proc:<0.66.0>
State: Waiting
Name: 'sink-disk_error'
Spawned as: proc_lib:init_p/5
Spawned by: <0.33.0>
Started: Thu Nov 29 17:07:15 2012
Message queue length: 186

=proc:<0.787.0>
State: Waiting
Name: mb_master
Spawned as: proc_lib:init_p/5
Spawned by: <0.371.0>
Started: Thu Nov 29 17:07:31 2012
Message queue length: 149

=proc:<0.627.0>
State: Waiting
Spawned as: proc_lib:init_p/5
Spawned by: <0.580.0>
Started: Thu Nov 29 17:07:30 2012
Message queue length: 84
Comment by kzeller [ 05/Dec/12 ]
Added to RN as: Be aware that if you attempt hundreds of simultaneous queries with an unlimited
  number of results, Couchbase Server may fail. For instance, 10 million
  results queried simultaneously will cause the server to fail. Instead, you should
  specify a reasonable limit on results when you query; otherwise the
  server will stall and crash due to excessive memory usage.
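
(As an illustration of that guidance, a sketch reusing the url/headers built in the description, with an arbitrary example limit of 100 rows:)

    import urllib2

    # Cap the result set instead of requesting every row.
    limited_url = url + '?limit=100'
    urllib2.urlopen(urllib2.Request(limited_url, headers=headers)).read()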
Comment by Maria McDuff (Inactive) [ 27/Mar/13 ]
moving to 2.1
Comment by Aleksey Kondratenko [ 20/Jun/13 ]
Not sure what to do about this one. But I've created MB-8501 to do something in ns_server so that it defends itself from this condition and other similar conditions.