[MB-12048] View engine 2.5 to 3.0 index file upgrade script Created: 22/Aug/14  Updated: 23/Aug/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sarath Lakshman Assignee: Volker Mische
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt: start-finish
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
View engine 2.5 index files are not compatible with 3.0 index files, so an index rebuild is required for 3.0.
We need a method that renames index files to new compatible filenames (with signature) and appends a new header.
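A minimal sketch of the renaming-plus-header step described above, assuming a hypothetical index directory layout and a hypothetical helper that computes the design-document signature; the real filename scheme and 3.0 header format are defined by the view engine and are not reproduced here:

import os
import shutil

# Hypothetical index location; the actual layout is defined by the view engine.
INDEX_DIR = "/opt/couchbase/var/lib/couchbase/data/@indexes/default"

def compute_signature(ddoc_definition):
    # Placeholder: the real signature is derived from the design document.
    raise NotImplementedError

def upgrade_index_file(old_path, ddoc_definition, new_header_bytes):
    sig = compute_signature(ddoc_definition)
    # Hypothetical 3.0-style name that embeds the signature.
    new_path = os.path.join(os.path.dirname(old_path), "main_%s.view.1" % sig)
    shutil.move(old_path, new_path)
    # Append the new header so 3.0 can open the file without a full rebuild.
    with open(new_path, "ab") as f:
        f.write(new_header_bytes)
    return new_path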

 Comments   
Comment by Volker Mische [ 22/Aug/14 ]
I created a hacky script that still involves some manual steps, though I think it works in general.

Though I'm hitting a bigger issue with DCP. It expects that you send the correct partition/vBucket version with your request whenever you don't start indexing from scratch. The problem is that this information is persisted on disk, hence in the index header. When we do an offline upgrade from 2.x to 3.0 we don't know which partition versions that server might have, hence we can't save the correct one.

Currently the only sane way I see is changing the DCP semantics to make it possible to resume from a certain seq number while sending {0, 0} as the partition version (currently you need to send the correct partition version if you want to resume).
Comment by Sriram Melkote [ 22/Aug/14 ]
Folks - we need to do this as an automatic feature (i.e., fully automated without needing any manual steps) or not at all. Let's talk with EP engine folks to spec this.
Comment by Sarath Lakshman [ 22/Aug/14 ]
I am guessing it can be done as part of installation postscript which checks the current version and performs upgrade of existing files.
Eg. RPM and Deb has a way to specify post scripts.
Comment by Volker Mische [ 22/Aug/14 ]
Sarath, yes, that would be a way.
Comment by Volker Mische [ 22/Aug/14 ]
Siri, I misunderstood your comment, as you probably did mine. The manual steps are only needed at the moment to verify that my idea works. The final result will be a script that can be run without any manual steps.

My misunderstanding was that I thought you were talking about an "online upgrade", but you didn't actually say that.
Comment by Sriram Melkote [ 22/Aug/14 ]
Ketaki, can we add a test to detect this situation? The test must fail until we fix this issue.
Comment by Ketaki Gangal [ 22/Aug/14 ]
Hi Siri,

We run automation tests which do offline upgrades from 2.X to 3.X; what these tests don't check is whether the index is rebuilt or not.
https://github.com/couchbase/testrunner/blob/master/conf/py-newupgrade.conf#L39

I'll update the tests to add a check for index-rebuild verification.

Sarath: Can you provide details on how to check whether indexes are rebuilt or not?


Comment by Sarath Lakshman [ 23/Aug/14 ]
I can think of a very easy way: soon after Couchbase is up following warmup, run a stale=false query with a timeout. If the index was preserved it should return almost immediately; since an index rebuild would be going on otherwise, it would take much longer.
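A rough illustration of that check, assuming a hypothetical bucket and design document and the standard view query port (8092); if the index survived the upgrade, the stale=false query should return quickly, while an in-progress rebuild will push it past the timeout:

import time
import requests

# Hypothetical node, bucket, and design doc names.
VIEW_URL = "http://127.0.0.1:8092/default/_design/ddoc1/_view/view1"
TIMEOUT_SECS = 30

start = time.time()
try:
    r = requests.get(VIEW_URL, params={"stale": "false", "limit": 1},
                     timeout=TIMEOUT_SECS)
    elapsed = time.time() - start
    # A quick response suggests the index was preserved across the upgrade.
    print("stale=false returned %d in %.1fs" % (r.status_code, elapsed))
except requests.exceptions.Timeout:
    # A timeout suggests an index rebuild is still running.
    print("stale=false query timed out after %ds" % TIMEOUT_SECS)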




[MB-11955] [windows] could not re-create default bucket after deleting default bucket Created: 13/Aug/14  Updated: 22/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Thuan Nguyen Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows server 2008 R2 64-bit

Attachments: Zip Archive 172.23.107.124-8132014-1238-diag.zip     Zip Archive 172.23.107.125-8132014-1239-diag.zip     Zip Archive 172.23.107.126-8132014-1241-diag.zip    
Triage: Untriaged
Operating System: Windows 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: Link to manifest file of this build (I don't see manifest file for windows)

http://latestbuilds.hq.couchbase.com/couchbase-server-enterprise_x86_64_3.0.0-1144-rel.rpm.manifest.xml
Is this a Regression?: Yes

 Description   
Ran the sanity test on Windows 2008 R2 64-bit on build 3.0.0-1144; there were a lot of failed tests.
Checking the error log, the tests failed because the default bucket could not be created.

[2014-08-13 11:54:31,069] - [rest_client:1524] INFO - http://172.23.107.124:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=200&proxyPort=12211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=
[2014-08-13 11:54:31,078] - [rest_client:747] ERROR - http://172.23.107.124:8091/pools/default/buckets

Sanity tests passed in build 3.0.0-1139

 Comments   
Comment by Abhinav Dangeti [ 15/Aug/14 ]
Hey Tony, can I get one windows vm for testing here?
Comment by Thuan Nguyen [ 15/Aug/14 ]
I gave Sriram some physical Windows servers.
Can you check with him whether he still needs the server with IP 10.17.1.166?
Comment by Abhinav Dangeti [ 15/Aug/14 ]
Bucket deletion doesn't complete, which is why you cannot recreate the bucket.
Comment by Thuan Nguyen [ 15/Aug/14 ]
This bug does not happen in build 3.0.0-1139 and earlier builds, as shown in
http://qa.sc.couchbase.com/job/CouchbaseServer-SanityTest-4Node-Windows_2008_x64/
Comment by Abhinav Dangeti [ 15/Aug/14 ]
Changes between 1140 and 1142 to be precise.
Comment by Abhinav Dangeti [ 18/Aug/14 ]
Seeing multiple mcCouch connection failures in the logs:

Mon Aug 18 11:28:22.741780 Pacific Daylight Time 3: (default) Trying to connect to mccouch: "127.0.0.1:11213"
Mon Aug 18 11:28:22.741780 Pacific Daylight Time 3: (default) Connected to mccouch: "127.0.0.1:11213"
Mon Aug 18 11:28:22.748780 Pacific Daylight Time 3: (default) Failed to read from mccouch for select_bucket: "The operation completed successfully.

"
Mon Aug 18 11:28:22.748780 Pacific Daylight Time 3: (default) Resetting connection to mccouch, lastReceivedCommand = select_bucket lastSentCommand = select_bucket currentCommand =unknown
Mon Aug 18 11:28:22.748780 Pacific Daylight Time 3: (default) Trying to connect to mccouch: "127.0.0.1:11213"
Mon Aug 18 11:28:22.748780 Pacific Daylight Time 3: (default) Connected to mccouch: "127.0.0.1:11213"
Mon Aug 18 11:28:22.758780 Pacific Daylight Time 3: (No Engine) Bucket default registered with low priority
Mon Aug 18 11:28:22.758780 Pacific Daylight Time 3: (No Engine) Spawning zu readers, zu writers, zu auxIO, zu nonIO threads
Mon Aug 18 11:28:22.761781 Pacific Daylight Time 3: (default) metadata loaded in 1000 usec
Mon Aug 18 11:28:22.761781 Pacific Daylight Time 3: (default) Enough number of items loaded to enable traffic
Mon Aug 18 11:28:22.761781 Pacific Daylight Time 3: (default) warmup completed in 1000 usec
Mon Aug 18 11:28:23.839842 Pacific Daylight Time 3: (default) Failed to read from mccouch for notify_vbucket_update: "The operation completed successfully.

"
Mon Aug 18 11:28:23.839842 Pacific Daylight Time 3: (default) Resetting connection to mccouch, lastReceivedCommand = notify_vbucket_update lastSentCommand = notify_vbucket_update currentCommand =unknown
Mon Aug 18 11:28:23.839842 Pacific Daylight Time 3: (default) Trying to connect to mccouch: "127.0.0.1:11213"
Mon Aug 18 11:28:23.839842 Pacific Daylight Time 3: (default) Connected to mccouch: "127.0.0.1:11213"
Mon Aug 18 11:28:23.888845 Pacific Daylight Time 3: (default) Failed to read from mccouch for notify_vbucket_update: "The operation completed successfully.

"

...
Comment by Abhinav Dangeti [ 22/Aug/14 ]
These failures are very similar to the ones seen in MB-11948.
Fix: http://review.couchbase.org/#/c/40865/




[MB-11501] [System tests with DGM] Rebalance-in exited with reason wait_seqno_persisted_failed (segmentation fault) Created: 21/Jun/14  Updated: 22/Aug/14  Resolved: 23/Jun/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Andrei Baranouski Assignee: Andrei Baranouski
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-848

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
system test info before rebalance:
4 buckets:
AbRegNum: 500MB ram quota, ~20% resident ratio
RevAB: 4500MB ram quota, ~70% resident ratio
MsgsCalls: 300MB ram quota, ~70% resident ratio
UserInfo: 300MB ram quota, ~100% resident ratio
 
3 nodes in the cluster:
172.23.105.22, 172.23.105.157, 172.23.105.158
UniXDCR replication with other cluster: 172.23.105.159

Starting rebalance, KeepNodes = ['ns_1@172.23.105.22','ns_1@172.23.105.157',
'ns_1@172.23.105.158','ns_1@172.23.105.156',
'ns_1@172.23.105.160'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes

The rebalance was stuck for a long time at ~1% progress, then failed with wait_seqno_persisted_failed.
I don't see any crashes on the VMs.

 

Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.13953.427>,
{wait_seqno_persisted_failed,"RevAB",849,
17733,
[{'ns_1@172.23.105.157',
{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[{'janitor_agent-RevAB',
'ns_1@172.23.105.157'},
{if_rebalance,<0.9603.440>,
{wait_seqno_persisted,849,17733}},
infinity]}}}}]}}}
ns_orchestrator002 ns_1@172.23.105.158 12:07:10 - Sat Jun 21, 2014
<0.9744.443> exited with {unexpected_exit,
{'EXIT',<0.13953.427>,
{wait_seqno_persisted_failed,"RevAB",849,17733,
[{'ns_1@172.23.105.157',
{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[{'janitor_agent-RevAB',
'ns_1@172.23.105.157'},
{if_rebalance,<0.9603.440>,
{wait_seqno_persisted,849,17733}},
infinity]}}}}]}}} ns_vbucket_mover000 ns_1@172.23.105.158 12:07:10 - Sat Jun 21, 2014
Bucket "AbRegNums" loaded on node 'ns_1@172.23.105.157' in 27 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:58 - Sat Jun 21, 2014
Bucket "MsgsCalls" loaded on node 'ns_1@172.23.105.157' in 3 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:35 - Sat Jun 21, 2014
Bucket "UserInfo" loaded on node 'ns_1@172.23.105.157' in 28 seconds. ns_memcached000 ns_1@172.23.105.157 12:06:31 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
{error,
closed}},
[{mc_client_binary,
stats_recv,
4,
[{file,
"src/mc_client_binary.erl"},
{line,
163}]},
{mc_client_binary,
stats,
4,
[{file,
"src/mc_client_binary.erl"},
{line,
411}]},
{ns_memcached,
handle_info,
2,
[{file,
"src/ns_memcached.erl"},
{line,
725}]},
{gen_server,
handle_msg,
5,
[{file,
"gen_server.erl"},
{line,
604}]},
{ns_memcached,
init,
1,
[{file,
"src/ns_memcached.erl"},
{line,
170}]},
{gen_server,
init_it,
6,
[{file,
"gen_server.erl"},
{line,
304}]},
{proc_lib,
init_p_do_apply,
3,
[{file,
"proc_lib.erl"},
{line,
239}]}]} (repeated 2 times) ns_memcached000 ns_1@172.23.105.157 12:06:13 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
{error,
closed}},
[{mc_client_binary,
cmd_vocal_recv,
5,
[{file,
"src/mc_client_binary.erl"},
{line,
149}]},
{mc_client_binary,
select_bucket,
2,
[{file,
"src/mc_client_binary.erl"},
{line,
344}]},
{ns_memcached,
ensure_bucket,
2,
[{file,
"src/ns_memcached.erl"},
{line,
1280}]},
{ns_memcached,
handle_info,
2,
[{file,
"src/ns_memcached.erl"},
{line,
750}]},
{gen_server,
handle_msg,
5,
[{file,
"gen_server.erl"},
{line,
604}]},
{ns_memcached,
init,
1,
[{file,
"src/ns_memcached.erl"},
{line,
170}]},
{gen_server,
init_it,
6,
[{file,
"gen_server.erl"},
{line,
304}]},
{proc_lib,
init_p_do_apply,
3,
[{file,
"proc_lib.erl"},
{line,
239}]}]} (repeated 3 times) ns_memcached000 ns_1@172.23.105.157 12:06:13 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {badmatch,
{error,
closed}} ns_memcached000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
{error,
closed}},
[{mc_client_binary,
stats_recv,
4,
[{file,
"src/mc_client_binary.erl"},
{line,
163}]},
{mc_client_binary,
stats,
4,
[{file,
"src/mc_client_binary.erl"},
{line,
411}]},
{ns_memcached,
handle_info,
2,
[{file,
"src/ns_memcached.erl"},
{line,
725}]},
{gen_server,
handle_msg,
5,
[{file,
"gen_server.erl"},
{line,
604}]},
{ns_memcached,
init,
1,
[{file,
"src/ns_memcached.erl"},
{line,
170}]},
{gen_server,
init_it,
6,
[{file,
"gen_server.erl"},
{line,
304}]},
{proc_lib,
init_p_do_apply,
3,
[{file,
"proc_lib.erl"},
{line,
239}]}]} ns_memcached000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 139. Restarting. Messages: Sat Jun 21 12:05:50.696179 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 363) Stream closing, 0 items sent from disk, 0 items sent from memory, 894 was last seqno sent
Sat Jun 21 12:05:50.696196 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 363) stream created with start seqno 894 and end seqno 894
Sat Jun 21 12:05:50.698921 PDT 3: (AbRegNums) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:AbRegNums - (vb 363) stream created with start seqno 894 and end seqno 0
Sat Jun 21 12:05:50.699524 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 424) Stream closing, 0 items sent from disk, 0 items sent from memory, 920 was last seqno sent
Sat Jun 21 12:05:50.699544 PDT 3: (AbRegNums) UPR (Producer) eq_uprq:xdcr:AbRegNums-e2e70d5f12fab94482239b9abac8afd7 - (vb 424) stream created with start seqno 920 and end seqno 920 ns_log000 ns_1@172.23.105.157 12:05:56 - Sat Jun 21, 2014
Bucket "AbRegNums" loaded on node 'ns_1@172.23.105.157' in 36 seconds. ns_memcached000 ns_1@172.23.105.157 12:05:46 - Sat Jun 21, 2014
Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 139. Restarting. Messages: Sat Jun 21 12:04:53.361372 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 162) stream created with start seqno 17580 and end seqno 0
Sat Jun 21 12:04:53.371897 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 381) stream created with start seqno 17966 and end seqno 0
Sat Jun 21 12:04:53.401541 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 98) stream created with start seqno 17760 and end seqno 0
Sat Jun 21 12:04:53.454580 PDT 3: (RevAB) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@172.23.105.157:RevAB - (vb 166) stream created with start seqno 17704 and end seqno 0
Sat Jun 21 12:04:53.743529 PDT 3: (RevAB) Notified the timeout on checkpoint persistence for vbucket 921, cookie 0x663d500 ns_log000 ns_1@172.23.105.157 12:05:05 - Sat Jun 21, 2014
Control connection to memcached on 'ns_1@172.23.105.157' disconnected: {{badmatch,
{error,
closed}},
[{mc_client_binary,
cmd_vocal_recv,
5,
[{file,
"src/mc_client_binary.erl"},
{line,
149}]},
{mc_client_binary,
select_bucket,
2,
[{file,
"src/mc_client_binary.erl"},
{line,
344}]},
{ns_memcached,
ensure_bucket,
2,
[{file,
"src/ns_memcached.erl"},
{line,
1280}]},
{ns_memcached,
handle_info,
2,
[{file,
"src/ns_memcached.erl"},
{line,
750}]},
{gen_server,
handle_msg,
5,
[{file,
"gen_server.erl"},
{line,
604}]},
{ns_memcached,
init,
1,
[{file,
"src/ns_memcached.erl"},
{line,
170}]},
{gen_server,
init_it,
6,
[{file,
"gen_server.erl"},
{line,
304}]},
{proc_lib,
init_p_do_apply,
3,
[{file,
"proc_lib.erl"},
{line,
239}]}]} ns_memcached000 ns_1@172.23.105.157 12:05:05 - Sat Jun 21, 2014
Bucket "RevAB" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.158 11:38:27 - Sat Jun 21, 2014
Started rebalancing bucket RevAB ns_rebalancer000 ns_1@172.23.105.158 11:38:24 - Sat Jun 21, 2014
Starting rebalance, KeepNodes = ['ns_1@172.23.105.22','ns_1@172.23.105.157',
'ns_1@172.23.105.158','ns_1@172.23.105.156',
'ns_1@172.23.105.160'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
ns_orchestrator004 ns_1@172.23.105.158 11:38:23 - Sat Jun 21, 2014

 Comments   
Comment by Andrei Baranouski [ 21/Jun/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-11501/3ad6ee7f/172.23.105.156-6212014-1239-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11501/3ad6ee7f/172.23.105.157-6212014-1228-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11501/3ad6ee7f/172.23.105.158-6212014-1219-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11501/3ad6ee7f/172.23.105.160-6212014-1247-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11501/3ad6ee7f/172.23.105.22-6212014-1253-diag.zip
Comment by Aleksey Kondratenko [ 21/Jun/14 ]
It failed due to {badmatch, closed}. As you might know, that is often an indication of a memcached crash, and indeed you can see memcached dying with status code 139, which is signal 11, i.e. a segmentation fault.

Assigning this to the ep-engine folks, but please "process" the core dump first, as usual in such cases.
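For reference, a quick way to confirm that mapping from the exit status to the signal (illustrative only):

import signal

status = 139
if status > 128:
    sig = status - 128                       # shell-style 128 + signal-number encoding
    print(sig, signal.Signals(sig).name)     # 11 SIGSEGV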
Comment by Andrei Baranouski [ 21/Jun/14 ]
Thanks Alk for the review.


root@172.23.105.157

gdb /opt/couchbase/bin/memcached /data/core.memcached.14993
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/couchbase/bin/memcached...done.
[New Thread 15000]
[New Thread 15004]
[New Thread 14993]
[New Thread 15005]
[New Thread 14998]
[New Thread 14996]
[New Thread 14999]
[New Thread 14997]
[New Thread 15007]
[New Thread 15009]
[New Thread 14995]
[New Thread 14994]
[New Thread 15003]
[New Thread 15002]
[New Thread 15001]
[New Thread 15008]
[New Thread 15006]

warning: .dynamic section for "/lib64/libgcc_s.so.1" is not at the expected address (wrong library or version mismatch?)
Reading symbols from /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0
Reading symbols from /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1...done.
Loaded symbols for /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1
Reading symbols from /opt/couchbase/bin/../lib/libplatform.so.0.1.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/libplatform.so.0.1.0
Reading symbols from /opt/couchbase/bin/../lib/libcJSON.so.1.0.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/libcJSON.so.1.0.0
Reading symbols from /opt/couchbase/bin/../lib/libJSON_checker.so...done.
Loaded symbols for /opt/couchbase/bin/../lib/libJSON_checker.so
Reading symbols from /opt/couchbase/bin/../lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/bin/../lib/libsnappy.so.1
Reading symbols from /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4...done.
Loaded symbols for /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4
Reading symbols from /opt/couchbase/bin/../lib/libevent_core-2.0.so.5...done.
Loaded symbols for /opt/couchbase/bin/../lib/libevent_core-2.0.so.5
Reading symbols from /usr/lib64/libssl.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libssl.so.6
Reading symbols from /usr/lib64/libcrypto.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libcrypto.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/ep.so
Reading symbols from /opt/couchbase/lib/libcouchstore.so...done.
Loaded symbols for /opt/couchbase/lib/libcouchstore.so
Reading symbols from /opt/couchbase/lib/libdirutils.so.0.1.0...done.
Loaded symbols for /opt/couchbase/lib/libdirutils.so.0.1.0
Reading symbols from /opt/couchbase/lib/libv8.so...done.
Loaded symbols for /opt/couchbase/lib/libv8.so
Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicui18n.so.44
Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44
Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicudata.so.44
Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
Program terminated with signal 11, Segmentation fault.
#0 cbsasl_server_step (conn=0x0, input=0x6093c00d "", inputlen=10, output=0x7f1eb810abf0, outputlen=0x7f1eb810abfc) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/cbsasl/server.c:116
116 /home/buildbot/centos-5-x64-300-builder/build/build/memcached/cbsasl/server.c: No such file or directory.
in /home/buildbot/centos-5-x64-300-builder/build/build/memcached/cbsasl/server.c
Missing separate debuginfos, use: debuginfo-install couchbase-server-3.0.0-848.x86_64
(gdb) t a a bt

Thread 17 (Thread 0x7f1eb1cc7700 (LWP 15006)):
#0 0x00007f1ebe638054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f1ebe633388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f1ebe633257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1ebf87d9f9 in cb_mutex_enter (mutex=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x00007f1eb603ab0d in Mutex::acquire (this=0x2baf080) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/mutex.cc:31
#5 0x00007f1eb606363e in lock (this=0x2baf088, task=..., waketime=..., taskType=@0xffffffffffffffff, now=...) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/locks.h:66
#6 LockHolder (this=0x2baf088, task=..., waketime=..., taskType=@0xffffffffffffffff, now=...) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/locks.h:44
#7 TaskQueue::fetchNextTask(ExTask &, timeval &, ._241 &, timeval) (this=0x2baf088, task=..., waketime=..., taskType=@0xffffffffffffffff, now=...)
    at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:64
#8 0x00007f1eb602a9a0 in ExecutorPool::nextTask (this=0x745d8e0, t=..., tick=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:146
#9 0x00007f1eb603be5d in ExecutorThread::run (this=0x2b9a480) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:77
#10 0x00007f1eb603c416 in launch_executor_thread (arg=0x2baf088) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#11 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46cd0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#12 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f1eb08c5700 (LWP 15008)):
#0 0x00007f1ebe638054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f1ebe633388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f1ebe633257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1ebf87d9f9 in cb_mutex_enter (mutex=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x00007f1eb603ab0d in Mutex::acquire (this=0x2baf080) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/mutex.cc:31
#5 0x00007f1eb60626c5 in lock (this=0x2baf080, task=..., curTaskType=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/locks.h:66
#6 LockHolder (this=0x2baf080, task=..., curTaskType=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/locks.h:44
#7 TaskQueue::reschedule(ExTask &, ._241 &) (this=0x2baf080, task=..., curTaskType=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:167
#8 0x00007f1eb603bfef in ExecutorThread::run (this=0x2b9a5a0) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:108
#9 0x00007f1eb603c416 in launch_executor_thread (arg=0x2baf088) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#10 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46d00) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#11 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f1eb770a700 (LWP 15001)):
#0 0x00007f1ebd7d3f03 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f1ebee13376 in epoll_dispatch (base=0x7406f00, tv=<value optimized out>) at epoll.c:404
#2 0x00007f1ebedfec44 in event_base_loop (base=0x7406f00, flags=<value optimized out>) at event.c:1558
#3 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b461c0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f1eb6d09700 (LWP 15002)):
#0 0x00007f1ebd7d3f03 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f1ebee13376 in epoll_dispatch (base=0x7407180, tv=<value optimized out>) at epoll.c:404
#2 0x00007f1ebedfec44 in event_base_loop (base=0x7407180, flags=<value optimized out>) at event.c:1558
#3 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b461d0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f1eb3aca700 (LWP 15003)):
#0 0x00007f1ebd797b8d in nanosleep () from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
#1 0x00007f1ebd7ccd64 in usleep () from /lib64/libc.so.6
#2 0x00007f1eb60396e5 in updateStatsThread (arg=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/memory_tracker.cc:36
#3 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b462e0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f1ebc132700 (LWP 14994)):
#0 0x00007f1ebd7c660d in read () from /lib64/libc.so.6
#1 0x00007f1ebd75cf68 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x00007f1ebd75ea6e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x00007f1ebd75314a in _IO_getline_info_internal () from /lib64/libc.so.6
#4 0x00007f1ebd751fa9 in fgets () from /lib64/libc.so.6
#5 0x00007f1ebc1338b1 in check_stdin_thread (arg=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/extensions/daemon/stdin_check.c:38
#6 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b460e0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#7 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f1ebb51d700 (LWP 14995)):
#0 0x00007f1ebe6357bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1ebf87d8eb in cb_cond_timedwait (cond=0x7f1ebb731e60, mutex=0x7f1ebb731e20, ms=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f1ebb521548 in logger_thead_main (arg=0x2b86900) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/extensions/loggers/file_logger.c:372
#3 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46080) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f1eafec4700 (LWP 15009)):
#0 0x00007f1ebe638054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f1ebe633388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f1ebe633257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1ebf87d9f9 in cb_mutex_enter (mutex=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x00007f1eb603ab0d in Mutex::acquire (this=0x2baf080) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/mutex.cc:31
#5 0x00007f1eb606363e in lock (this=0x2baf088, task=..., waketime=..., taskType=@0xffffffffffffffff, now=...) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/locks.h:66
#6 LockHolder (this=0x2baf088, task=..., waketime=..., taskType=@0xffffffffffffffff, now=...) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/locks.h:44
#7 TaskQueue::fetchNextTask(ExTask &, timeval &, ._241 &, timeval) (this=0x2baf088, task=..., waketime=..., taskType=@0xffffffffffffffff, now=...)
    at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:64
#8 0x00007f1eb602a9a0 in ExecutorPool::nextTask (this=0x745d8e0, t=..., tick=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:146
#9 0x00007f1eb603be5d in ExecutorThread::run (this=0x2b9a630) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:77
#10 0x00007f1eb603c416 in launch_executor_thread (arg=0x2baf088) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#11 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46d10) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#12 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f1eb12c6700 (LWP 15007)):
#0 0x00007f1eb606628e in pop_heap<std::_Deque_iterator<SingleThreadedRCPtr<GlobalTask>, SingleThreadedRCPtr<GlobalTask>&, SingleThreadedRCPtr<GlobalTask>*>, CompareByDueDate> (this=0x2baf160)
    at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/atomic.h:109
#1 std::priority_queue<SingleThreadedRCPtr<GlobalTask>, std::deque<SingleThreadedRCPtr<GlobalTask>, std::allocator<SingleThreadedRCPtr<GlobalTask> > >, CompareByDueDate>::pop (this=0x2baf160)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_queue.h:456
#2 0x00007f1eb60631ed in TaskQueue::moveReadyTasks (this=0x2baf080, tv=...) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:115
#3 0x00007f1eb606366d in TaskQueue::fetchNextTask(ExTask &, timeval &, ._241 &, timeval) (this=0x2baf080, task=..., waketime=..., taskType=@0x2b9a558, now=...)
    at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:70
#4 0x00007f1eb602a9a0 in ExecutorPool::nextTask (this=0x745d8e0, t=..., tick=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:146
---Type <return> to continue, or q <return> to quit---
#5 0x00007f1eb603be5d in ExecutorThread::run (this=0x2b9a510) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:77
#6 0x00007f1eb603c416 in launch_executor_thread (arg=0x7f1eb12c5b30) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#7 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46cf0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#8 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f1eb9f0e700 (LWP 14997)):
#0 0x00007f1ebd7d3f03 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f1ebee13376 in epoll_dispatch (base=0x7406500, tv=<value optimized out>) at epoll.c:404
#2 0x00007f1ebedfec44 in event_base_loop (base=0x7406500, flags=<value optimized out>) at event.c:1558
#3 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46180) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f1eb8b0c700 (LWP 14999)):
#0 0x00007f1ebd7d3f03 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f1ebee13376 in epoll_dispatch (base=0x7406a00, tv=<value optimized out>) at epoll.c:404
#2 0x00007f1ebedfec44 in event_base_loop (base=0x7406a00, flags=<value optimized out>) at event.c:1558
#3 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b461a0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f1eba90f700 (LWP 14996)):
#0 0x00007f1ebd7d3f03 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f1ebee13376 in epoll_dispatch (base=0x7406280, tv=<value optimized out>) at epoll.c:404
#2 0x00007f1ebedfec44 in event_base_loop (base=0x7406280, flags=<value optimized out>) at event.c:1558
#3 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46170) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f1eb950d700 (LWP 14998)):
#0 0x00007f1ebd7d3f03 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f1ebee13376 in epoll_dispatch (base=0x7406780, tv=<value optimized out>) at epoll.c:404
#2 0x00007f1ebedfec44 in event_base_loop (base=0x7406780, flags=<value optimized out>) at event.c:1558
#3 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46190) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f1eb26c8700 (LWP 15005)):


#0 ObjectRegistry::memoryAllocated (mem=1050039744) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/objectregistry.cc:192
#1 0x00007f1ebf036576 in MallocHook::InvokeNewHookSlow (p=0x608b8d80, s=41) at src/malloc_hook.cc:514

#2 0x00007f1ebf03b8f3 in InvokeNewHook (size=41) at src/malloc_hook-inl.h:154
#3 tc_new (size=41) at src/tcmalloc.cc:1622
#4 0x00007f1ebdfb43c9 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) ()
   from /usr/lib64/libstdc++.so.6
#5 0x00007f1ebdfb5daa in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, unsigned long) () from /usr/lib64/libstdc++.so.6
#6 0x00007f1ebdfb5f6c in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) ()
   from /usr/lib64/libstdc++.so.6


#7 0x00007f1eb6024f3e in Item::Item (this=0x5adcb590, k=0x5b529c38, nk=16, fl=<value optimized out>, exp=<value optimized out>, dta=0x362a2d50, nb=13, ext_meta=0x7f1eb26c74f4 "", ext_len=1 '\001',
    theCas=6719750077747650, i=15636, vbid=508, sno=1, nru_value=2 '\002') at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/item.h:334
#8 0x00007f1eb6097f30 in CouchKVStore::recordDbDump (db=0x1ec18680, docinfo=0x5b529bf0, ctx=<value optimized out>)
---Type <return> to continue, or q <return> to quit---
    at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/couch-kvstore/couch-kvstore.cc:1665
#9 0x00007f1eb5da3320 in lookup_callback (rq=<value optimized out>, k=0x7f1eb26c75c0, v=0x7f1eb26c75b0) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/couch_db.cc:725
#10 0x00007f1eb5da1b07 in btree_lookup_inner (rq=0x7f1eb26c7840, diskpos=<value optimized out>, current=0, end=1) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/btree_read.cc:103
#11 0x00007f1eb5da1c5c in btree_lookup_inner (rq=0x7f1eb26c7840, diskpos=<value optimized out>, current=0, end=1) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/btree_read.cc:71
#12 0x00007f1eb5da1c5c in btree_lookup_inner (rq=0x7f1eb26c7840, diskpos=<value optimized out>, current=0, end=1) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/btree_read.cc:71
#13 0x00007f1eb5da1c5c in btree_lookup_inner (rq=0x7f1eb26c7840, diskpos=<value optimized out>, current=0, end=1) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/btree_read.cc:71
#14 0x00007f1eb5da26b9 in couchstore_changes_since (db=0x1ec18680, since=<value optimized out>, options=<value optimized out>, callback=<value optimized out>, ctx=<value optimized out>)
    at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/couch_db.cc:768
#15 0x00007f1eb609ccbb in CouchKVStore::loadDB (this=0x8118680, cb=std::tr1::shared_ptr (count -125632512) 0x7f1eb26c7b80, cl=std::tr1::shared_ptr (count 1524444096) 0x7f1eb26c7b70,
    sr=std::tr1::shared_ptr (count 206502592) 0x7f1eb26c7b60, keysOnly=false, vbid=508, startSeqno=0, options=4)
    at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/couch-kvstore/couch-kvstore.cc:1261
#16 0x00007f1eb609e264 in CouchKVStore::dump (this=0x8118680, vbids=std::vector of length 167, capacity 256 = {...}, cb=std::tr1::shared_ptr (count 1) 0x7f1eb26c7c70,
    cl=std::tr1::shared_ptr (count 266044744) 0x7f1eb26c7c60) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/couch-kvstore/couch-kvstore.cc:1092
#17 0x00007f1eb6083591 in Warmup::loadDataforShard (this=0x74423c0, shardId=0) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/warmup.cc:778
#18 0x00007f1eb6089a17 in WarmupLoadingData::run (this=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/warmup.h:377
#19 0x00007f1eb603bef1 in ExecutorThread::run (this=0x2b9a3f0) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:94
#20 0x00007f1eb603c416 in launch_executor_thread (arg=0x6) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#21 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46ce0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#22 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f1ec00947e0 (LWP 14993)):
#0 0x00007f1ebd7d3f03 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f1ebee13376 in epoll_dispatch (base=0x7406000, tv=<value optimized out>) at epoll.c:404
#2 0x00007f1ebedfec44 in event_base_loop (base=0x7406000, flags=<value optimized out>) at event.c:1558
#3 0x000000000040f1c9 in main (argc=<value optimized out>, argv=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:8795

Thread 2 (Thread 0x7f1eb30c9700 (LWP 15004)):
#0 0x00007f1ebdfce676 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib64/libstdc++.so.6
#1 0x00007f1eb609801e in release (db=0x2b57b00, docinfo=0x5d441f80, ctx=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/boost_shared_ptr.h:151
#2 ~shared_count (db=0x2b57b00, docinfo=0x5d441f80, ctx=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/boost_shared_ptr.h:277
#3 ~shared_ptr (db=0x2b57b00, docinfo=0x5d441f80, ctx=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/boost_shared_ptr.h:486
#4 CouchKVStore::recordDbDump (db=0x2b57b00, docinfo=0x5d441f80, ctx=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/couch-kvstore/couch-kvstore.cc:1679
#5 0x00007f1eb5da3320 in lookup_callback (rq=<value optimized out>, k=0x7f1eb30c85c0, v=0x7f1eb30c85b0) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/couch_db.cc:725
#6 0x00007f1eb5da1b07 in btree_lookup_inner (rq=0x7f1eb30c8840, diskpos=<value optimized out>, current=0, end=1) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/btree_read.cc:103
#7 0x00007f1eb5da1c5c in btree_lookup_inner (rq=0x7f1eb30c8840, diskpos=<value optimized out>, current=0, end=1) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/btree_read.cc:71
#8 0x00007f1eb5da1c5c in btree_lookup_inner (rq=0x7f1eb30c8840, diskpos=<value optimized out>, current=0, end=1) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/btree_read.cc:71
#9 0x00007f1eb5da1c5c in btree_lookup_inner (rq=0x7f1eb30c8840, diskpos=<value optimized out>, current=0, end=1) at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/btree_read.cc:71
#10 0x00007f1eb5da26b9 in couchstore_changes_since (db=0x2b57b00, since=<value optimized out>, options=<value optimized out>, callback=<value optimized out>, ctx=<value optimized out>)
    at /home/buildbot/centos-5-x64-300-builder/build/build/couchstore/src/couch_db.cc:768
#11 0x00007f1eb609ccbb in CouchKVStore::loadDB (this=0x8119380, cb=std::tr1::shared_ptr (count -125632512) 0x7f1eb30c8b80, cl=std::tr1::shared_ptr (count 549226688) 0x7f1eb30c8b70,
    sr=std::tr1::shared_ptr (count 1554547728) 0x7f1eb30c8b60, keysOnly=false, vbid=185, startSeqno=0, options=4)
    at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/couch-kvstore/couch-kvstore.cc:1261
#12 0x00007f1eb609e264 in CouchKVStore::dump (this=0x8119380, vbids=std::vector of length 166, capacity 256 = {...}, cb=std::tr1::shared_ptr (count 1) 0x7f1eb30c8c70,
    cl=std::tr1::shared_ptr (count 266044744) 0x7f1eb30c8c60) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/couch-kvstore/couch-kvstore.cc:1092
#13 0x00007f1eb6083591 in Warmup::loadDataforShard (this=0x74423c0, shardId=1) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/warmup.cc:778
#14 0x00007f1eb6089a17 in WarmupLoadingData::run (this=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/warmup.h:377
#15 0x00007f1eb603bef1 in ExecutorThread::run (this=0x2b9a360) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:94
#16 0x00007f1eb603c416 in launch_executor_thread (arg=0x5ca88468) at /home/buildbot/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#17 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b46cc0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#18 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#19 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f1eb810b700 (LWP 15000)):
#0 cbsasl_server_step (conn=0x0, input=0x6093c00d "", inputlen=10, output=0x7f1eb810abf0, outputlen=0x7f1eb810abfc) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/cbsasl/server.c:116
#1 0x000000000041740c in process_bin_complete_sasl_auth (c=0x72ba000) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:2081
#2 0x0000000000417bd5 in complete_nread (c=0x72ba000) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:5775
#3 conn_nread (c=0x72ba000) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:6997
#4 0x0000000000409ddd in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x72ba000) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:7270
#5 0x00007f1ebedfed3c in event_process_active_single_queue (base=0x7406c80, flags=<value optimized out>) at event.c:1308
#6 event_process_active (base=0x7406c80, flags=<value optimized out>) at event.c:1375
#7 event_base_loop (base=0x7406c80, flags=<value optimized out>) at event.c:1572
#8 0x00007f1ebf87db6f in platform_thread_wrap (arg=0x2b461b0) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#9 0x00007f1ebe631851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f1ebd7d390d in clone () from /lib64/libc.so.6
Comment by Trond Norbye [ 23/Jun/14 ]
Are the core dump and memcached installation still available? I'd like to debug the core file myself.
Comment by Trond Norbye [ 23/Jun/14 ]
I just looked at this core dump. The connection's sasl_data is set to NULL... Should be an easy fix.
Comment by Trond Norbye [ 23/Jun/14 ]
http://review.couchbase.org/#/c/38689/
Comment by Andrei Baranouski [ 23/Jun/14 ]
Yes, it's available on 172.23.105.157 at /data/core.memcached.14993, but apparently you've already found it.




[MB-10917] CBBackup/CBRestore needs to support compression Created: 21/Apr/14  Updated: 22/Aug/14  Resolved: 04/Jun/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Don Pinto Assignee: Ashvinder Singh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   
Customer X wants to back up terabytes of data and wants our backup tool to support compression.

Today's options are:
1. Use 3rd-party backup tools on the archive
2. Use storage-level compression tools
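As a stop-gap illustrating option 1, a small sketch of compressing an existing cbbackup output directory with standard tooling (the backup path is hypothetical):

import tarfile

# Hypothetical directory produced by cbbackup.
BACKUP_DIR = "/backups/cluster-2014-04-21"
ARCHIVE = BACKUP_DIR + ".tar.gz"

# Compress the whole backup tree; for terabyte-scale backups a faster codec
# driven outside Python may be preferable.
with tarfile.open(ARCHIVE, "w:gz") as tar:
    tar.add(BACKUP_DIR, arcname="cluster-2014-04-21")
print("wrote %s" % ARCHIVE)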



 Comments   
Comment by Anil Kumar [ 04/Jun/14 ]
Fixed in 3.0.




[MB-11985] Couchbase Server 2.x.x fails to start on a server that was upgraded from 2.x to 3.0.0 and then had Couchbase Server 3.0.0 uninstalled Created: 18/Aug/14  Updated: 22/Aug/14  Resolved: 22/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 2.5.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Thuan Nguyen Assignee: Bin Cui
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows server 2008 R2 64-bit

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
1. Install Couchbase Server 2.x.x on a Windows Server 2008 R2 64-bit machine.
2. Upgrade Couchbase Server to 3.0.0-1159.
3. Uninstall this Couchbase Server 3.0.0.
4. Install any Couchbase Server 2.x.x on this server; Couchbase Server 2.x.x will not start.

 Comments   
Comment by Thuan Nguyen [ 18/Aug/14 ]
This will block all windows upgrade jobs from 2.x.x to 3.0.0
Comment by Bin Cui [ 22/Aug/14 ]
When 3.0 is uninstalled, somehow the uninstaller doesn't remove the "CouchbaseServer" service, and that blocks 2.0 from registering the service name correctly.
Comment by Bin Cui [ 22/Aug/14 ]
In fact, after Couchbase Server 3.0 is uninstalled, there is still a "Couchbase Moxi Service" in the services console.
Comment by Chris Hillery [ 22/Aug/14 ]
I haven't the first idea how to address this, and I'm pretty sure it will involve modifying InstallShield configurations. Bin is the only person who has the access and knowledge to do this, so I'm assigning this back to him. If there's something I'm missing that I can actually help with, please give me specifics.
Comment by Bin Cui [ 22/Aug/14 ]
The problem is caused by different versions of Erlang:
in 2.5 we bundled Erlang 5.8.5, and in 3.0 we bundled Erlang 5.10.4.
For whatever reason, the older Erlang version is not able to register CouchbaseServer as a Windows service if the registry key was created by the newer version.

Please remove the following registry key and the problem should be solved:

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Ericsson\Erlang\ErlSrv
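If it helps to script that cleanup, here is a hedged sketch using Python's standard winreg module from an elevated (Administrator) prompt; winreg.DeleteKey is not recursive, so subkeys are removed first. The same thing can of course be done with regedit or reg delete.

import winreg

KEY_PATH = r"SOFTWARE\Wow6432Node\Ericsson\Erlang\ErlSrv"

def delete_key_recursive(root, path):
    # Remove subkeys first, since winreg.DeleteKey only deletes empty keys.
    with winreg.OpenKey(root, path, 0, winreg.KEY_ALL_ACCESS) as key:
        while True:
            try:
                child = winreg.EnumKey(key, 0)
            except OSError:
                break
            delete_key_recursive(root, path + "\\" + child)
    winreg.DeleteKey(root, path)

delete_key_recursive(winreg.HKEY_LOCAL_MACHINE, KEY_PATH)
print("Removed HKLM\\" + KEY_PATH)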
Comment by Thuan Nguyen [ 22/Aug/14 ]
So when we uninstall 3.0.0, we need to remove that registry key.




[MB-11156] Able to fail over all nodes from the cluster using couchbase-cli Created: 19/May/14  Updated: 22/Aug/14  Resolved: 07/Jul/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server, tools, UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Andrei Baranouski Assignee: Parag Agarwal
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-692

Attachments: PNG File failover_all_nodes.png     PNG File MB-1156.png     PNG File Screen Shot 2014-07-07 at 7.10.50 PM.png    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
I will try to give more detailed steps to reproduce, but I hit this while implementing tests for failover, delta recovery, rebalance operations, etc. using couchbase-cli.

I also filed a separate UI ticket for the issue: MB-11155

https://s3.amazonaws.com/bugdb/jira/MB-11155/47461e2d/10.3.4.144-5192014-529-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11155/47461e2d/10.3.4.146-5192014-530-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11155/47461e2d/10.3.4.147-5192014-533-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11155/47461e2d/10.3.4.148-5192014-532-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11155/47461e2d/10.3.4.149-5192014-534-diag.zip

 Comments   
Comment by Steve Yen [ 05/Jun/14 ]
> so, will try to give more detailed steps to reproduce...

Hi Andrei,
Any chance (best case!) you might be able to give couchbase-cli command line? (Perhaps it's a couchbase-cli bug?)
Comment by Andrei Baranouski [ 05/Jun/14 ]
I'm not sure it's entirely a couchbase-cli issue; I believe ns_server should not allow this either.

steps:

2 nodes in the cluster: 10.3.4.144 & 10.3.4.145

[root@localhost bin]# ./couchbase-cli failover --cluster 10.3.4.144:8091 --server-failover=10.3.4.145:8091 -u Administrator -p password
INFO: graceful failover .
SUCCESS: failover ns_1@10.3.4.145
[root@localhost bin]# ./couchbase-cli failover --cluster 10.3.4.144:8091 --server-failover=10.3.4.144:8091 -u Administrator -p password
INFO: graceful failover .
SUCCESS: failover ns_1@10.3.4.144

please, see screenshot after these steps
Comment by Steve Yen [ 05/Jun/14 ]
Thanks Andrei.

Hi Bin,
Can you take a quick peek at this on the chance that CLI is mis-reporting some error result as a success?

Otherwise, this bug is probably ns-server related (missing some error case, perhaps?)

Thanks,
steve

Comment by Bin Cui [ 09/Jun/14 ]
Andrei,

Can you try out the latest build? I cannot reproduce the problem by following your test steps:
1. Create a two-node cluster.
2. Fail over the first one. Returns success as expected.
3. Fail over the second one. Returns an error as expected.

-bash-3.2$ ./couchbase-cli failover --cluster 10.5.2.133 --server-failover=10.5
.2.133:8091 -u Administrator -p 123456
INFO: graceful failover . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . .
SUCCESS: failover ns_1@10.5.2.133
-bash-3.2$ ./couchbase-cli failover --cluster 10.6.2.91 --server-failover=10.6.
2.91:8091 -u Administrator -p 123456
ERROR: unable to failover ns_1@10.6.2.91 (400) Bad Request
ERROR: command: failover: 10.6.2.91:8091, No JSON object could be decoded
-bash-3.2$ ./couchbase-cli failover --cluster 10.6.2.91:8091 --server-failover=
10.6.2.91:8091 -u Administrator -p 123456
ERROR: unable to failover ns_1@10.6.2.91 (400) Bad Request
ERROR: command: failover: 10.6.2.91:8091, No JSON object could be decoded
Comment by Andrei Baranouski [ 10/Jun/14 ]

there are 2 cases:
a) no buckets in the cluster:
[root@localhost bin]# ./couchbase-cli failover --cluster 10.3.4.144:8091 --server-failover=10.3.4.145:8091 -u Administrator -p password
INFO: graceful failover
SUCCESS: failover ns_1@10.3.4.145

root@localhost bin]# ./couchbase-cli failover --cluster 10.3.4.144:8091 --server-failover=10.3.4.144:8091 -u Administrator -p password
INFO: graceful failover
SUCCESS: failover ns_1@10.3.4.144

b) there is a bucket in the cluster

[root@localhost bin]# ./couchbase-cli failover --cluster 10.3.4.148:8091 --server-failover=10.3.4.148:8091 -u Administrator -p password
INFO: graceful failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
SUCCESS: failover ns_1@10.3.4.148
[root@localhost bin]# ./couchbase-cli failover --cluster 10.3.4.149:8091 --server-failover=10.3.4.149:8091 -u Administrator -p password
ERROR: unable to failover ns_1@10.3.4.149 (400) Bad Request
ERROR: command: failover: 10.3.4.149:8091, No JSON object could be decoded

that's fine (expected), but with --force:
[root@localhost bin]# ./couchbase-cli failover --cluster 10.3.4.149:8091 --server-failover=10.3.4.149:8091 -u Administrator -p password --force
SUCCESS: failover ns_1@10.3.4.149


So, for the case with buckets we have to apply "--force" to get the same picture as in MB-11155.



Comment by Bin Cui [ 10/Jun/14 ]
Again, assigning to the ns_server team to check whether the test results are in line with expectations.
Comment by Aleksey Kondratenko [ 10/Jun/14 ]
Certainly not a bug in the CLI. Error checking and validation are supposed to be done in ns_server. And I think I've seen a duplicate of this bug somewhere.
Comment by Anil Kumar [ 19/Jun/14 ]
Triage - June 19 2014 Alk, Parag, Anil
Comment by Aliaksey Artamonau [ 30/Jun/14 ]
http://review.couchbase.org/38906
Comment by Andrei Baranouski [ 07/Jul/14 ]
build 3.0.0-918

[root@centos-64-x64 bin]# ./couchbase-cli failover --cluster 172.23.105.156:8091 --server-failover=172.23.105.156:8091 -u Administrator -p password
INFO: graceful failover
SUCCESS: failover ns_1@127.0.0.1

http://www.couchbase.com/issues/secure/attachment/21172/Screen%20Shot%202014-07-07%20at%207.10.50%20PM.png
Comment by Aliaksey Artamonau [ 07/Jul/14 ]
http://review.couchbase.org/39182




[MB-11775] Rebalance-stop is slow -- takes multiple attempts to stop rebalance Created: 21/Jul/14  Updated: 22/Aug/14  Resolved: 23/Jul/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Ketaki Gangal Assignee: Ketaki Gangal
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-973-rel
Centos 6.4

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Setup:

1. Cluster 7 nodes, 2 buckets, 1 design doc X 2 views
2. Load 120M, 99M items on both the buckets, dgm state of 70% active resident.
3. Do a graceful failover on 1 node
4. Choose delta recovery, add back node and rebalance

5. I tried to stop the rebalance a number of times (about 10) --- unsuccessful on a number of attempts.
Rebalance eventually failed with the reason "Rebalance exited with reason stop" --- rebalance stop is not working as expected.

- Attaching logs



 Comments   
Comment by Ketaki Gangal [ 21/Jul/14 ]
Logs https://s3.amazonaws.com/bugdb/MB-11775/11775.tar
Comment by Aleksey Kondratenko [ 21/Jul/14 ]
I'll need more diagnostics to figure out this case. I've added some in this commit (still pending review and merge): http://review.couchbase.org/39625

Please retest after this commit is merged so that I can see what makes rebalance stop slow.
Comment by Aleksey Kondratenko [ 21/Jul/14 ]
referenced commit is now in. So test as soon as you get next build
Comment by Ketaki Gangal [ 22/Jul/14 ]
Tested with build which contains the above commit - build 3.0.0-999-rel.

Seeing the same behaviour, where it takes several attempts to stop rebalance.

Logs at https://s3.amazonaws.com/bugdb/MB-11775/11775-2.tar
Comment by Aleksey Kondratenko [ 22/Jul/14 ]
Uploaded probable fix:

http://review.couchbase.org/39694

With this fix (assuming I am right about the cause of the slowness) we'll be able to stop even if some node is stuck somewhere in janitor_agent, which could in turn be due to the view engine. That would mean that the original slowness would (maybe) be visible elsewhere, possibly in a harder-to-debug way.

So in order to diagnose _that_ I need you to capture diag or collectinfo from just one node _immediately_ after you send stop and it is slow. If this is done correctly I'll be able to see what is causing that slowness in the first place. Note that it needs to be done on a build prior to the rebalance stop fix that I've referred to above.
Comment by Aleksey Kondratenko [ 22/Jul/14 ]
Merged. So rebalance stop should not be slow anymore. But see above for some additional investigation that we should do.
Comment by Aleksey Kondratenko [ 22/Jul/14 ]
Reverted for now.
Comment by Aleksey Kondratenko [ 23/Jul/14 ]
Merged hopefully more correct fix: http://review.couchbase.org/39756




[MB-11701] UI graceful option should be greyed out when there are no replicas. Created: 11/Jul/14  Updated: 22/Aug/14  Resolved: 24/Jul/14

Status: Closed
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: 3.0, 3.0-Beta
Security Level: Public

Type: Bug Priority: Major
Reporter: Patrick Varley Assignee: Pavel Blagodov
Resolution: Fixed Votes: 1
Labels: failover
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2 buckets
beer-sample bucket 1 replica
XDCR bucket 0 replica

Attachments: PNG File Failover.png    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: No

 Description   
When a bucket has no replicas it cannot be gracefully failed over.

In the UI we hide the graceful button, which I believe is bad UI design; instead we should grey it out and explain that graceful failover is not available without the required replica vBuckets.

 Comments   
Comment by Anil Kumar [ 18/Jul/14 ]
Pavel - Instead of 'hiding', let's grey out the Graceful Fail Over option.

( ) Graceful Fail Over (default) [Grey out]
(*) Hard Fail Over ...............

Attention – The graceful failover option is not available, either because the node is unreachable or because replica vBuckets cannot be activated gracefully.

Warning --
Comment by Pavel Blagodov [ 24/Jul/14 ]
http://review.couchbase.org/39604
Comment by Parag Agarwal [ 22/Aug/14 ]
Works; have added tests for the same.




[MB-11910] rebalance request fails with 500 (was: UI:: Removal Node+Failover and Delta Recovery Add Back leads to Rebalance failure without the reason for failure) Created: 08/Aug/14  Updated: 22/Aug/14  Resolved: 11/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Parag Agarwal
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Build 1105.

1. Create 3 node cluster
2. Fail over a node and mark it for add-back with the delta recovery type
3. Mark another node for removal
4. Rebalance

Rebalance fails without any message or reason being given in the UI logs. We need to make sure that a proper error message is displayed.


 Comments   
Comment by Parag Agarwal [ 08/Aug/14 ]
Saw the same issue when we fail over a node instead of selecting it for removal.
Comment by Aleksey Kondratenko [ 08/Aug/14 ]
Tried to reproduce using the 3 ways mentioned above but saw an error message instead of a silent failure.
Comment by Aleksey Kondratenko [ 08/Aug/14 ]
https://s3.amazonaws.com/cb-customers/alk/11/collectinfo-2014-08-08T234506-ns_1%4010.6.2.144.zip
Comment by Aleksey Kondratenko [ 11/Aug/14 ]
http://review.couchbase.org/40501




[MB-12037] ns_server may lose replicas on stopped rebalance/graceful failover (was: {DCP} : Delta Recovery Impossible after re-try of graceful failover since in first attempt failed) Created: 21/Aug/14  Updated: 22/Aug/14  Resolved: 22/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 10.6.2.144-10.6.2.150
centos 6x
1174

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8202014-2226-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8202014-2227-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8202014-2227-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8202014-2227-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8202014-2227-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8202014-2228-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8202014-2228-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/1174.log.tar.gz
Is this a Regression?: Unknown

 Description   
Scenario
1. Create a 7 Node cluster
2. Create default bucket with 100 K items
3. Graceful failover a node
4. Kill memcached of another node during graceful failover
5. Graceful failover the same node in step 3
6. Add-back the node with delta recovery
7. Hit Rebalance

In step 7, rebalance fails for delta recovery, saying delta recovery is not possible, although we see that the nodes in the cluster are in a healthy state.

We see the following warning:: "Fail Over Warning: Rebalance required, some data is not currently replicated!"

It seems the delta recovery will not work in this condition unless we rebalance the cluster. Also, I was able to cancel the delta recovery and do a full recovery.

Opening the bug to follow-up on the issue. Attaching logs and data files



 Comments   
Comment by Aleksey Kondratenko [ 21/Aug/14 ]
I was able to reproduce it easily. There's indeed something wrong with restarting graceful failover which impacts delta recovery.
Comment by Aleksey Kondratenko [ 21/Aug/14 ]
And it predictably happens with any stop/restart of graceful failover.
Comment by Parag Agarwal [ 21/Aug/14 ]
When the warning is showing ("Rebalance required, some data is not currently replicated!"), do we not expect delta recovery to succeed, and is this the correct behavior? Asking since we will have to document it as well.

Comment by Aleksey Kondratenko [ 21/Aug/14 ]
The warning has nothing to do with that. And the warning is valid: midway into graceful failover you're indeed not balanced.
Comment by Aleksey Kondratenko [ 21/Aug/14 ]
manifest updated here: http://review.couchbase.org/40811

fix merged here: http://review.couchbase.org/40803
Comment by Parag Agarwal [ 22/Aug/14 ]
Tested
Comment by Parag Agarwal [ 22/Aug/14 ]
Test Run:: http://qa.hq.northscale.net/job/centos_x64--02_01--Rebalance-In/6/console
Comment by Parag Agarwal [ 22/Aug/14 ]
Saw the issue again for the following scenario with build 1186:

 Scenario
1. Create a 7 Node cluster
2. Create default bucket with 200 K items
3. Graceful failover a node
4. Kill memcached of another node during graceful failover
5. Graceful failover the same node in step 3
6. Add-back the node with delta recovery
7. Hit Rebalance

We see the following warning:: "Fail Over Warning: Rebalance required, some data is not currently replicated!"

In step 7, rebalance fails for delta recovery, saying delta recovery is not possible, although we see that the nodes in the cluster are in a healthy state. This is true when we have 200K items, vs. 100K items where it passes.

I am attaching the logs for you to analyze, since the above warning comes in both cases; I am not sure about the internal state of the system that stops the add-back delta recovery.

Test fails for 2k items
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1436-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-1437-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-1439-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-1440-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1441-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1442-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1444-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1446-couch.tar.gz

Test passes for 1 K Items

https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1458-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-150-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-151-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-153-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-154-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-156-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-157-diag.zip
Comment by Parag Agarwal [ 22/Aug/14 ]
I think the logs were not uploaded, will add them again
Comment by Parag Agarwal [ 22/Aug/14 ]
fixed the logs
Comment by Aleksey Kondratenko [ 22/Aug/14 ]
Please open a new ticket for the new instance of the issue.
Comment by Parag Agarwal [ 22/Aug/14 ]
http://www.couchbase.com/issues/browse/MB-12055




[MB-12055] {DCP} : Delta Recovery Impossible after re-try of graceful failover since in first attempt failed Created: 22/Aug/14  Updated: 22/Aug/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 1186, centos 6x, 10.6.2.144-10.6.2.160

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
build 1186

 Scenario
1. Create a 7 Node cluster
2. Create default bucket with 200 K items
3. Graceful failover a node
4. Kill memcached of another node during graceful failover
5. Graceful failover the same node in step 3
6. Add-back the node with delta recovery
7. Hit Rebalance

We see the following warning:: "Fail Over Warning: Rebalance required, some data is not currently replicated!"

In step 7, rebalance fails for delta recovery, saying delta recovery is not possible, although we see that the nodes in the cluster are in a healthy state. This is true when we have 200K items, vs. 100K items where it passes.

I am attaching the logs for you to analyze, since the above warning comes in both cases; I am not sure about the internal state of the system that stops the add-back delta recovery.

Test fails for 2k items
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1436-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-1437-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-1439-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-1440-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1441-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1442-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1444-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1446-couch.tar.gz

Test passes for 1 K Items

https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1458-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-150-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-151-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-153-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-154-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-156-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-157-diag.zip




[MB-12041] Disabling access.log on multiple buckets results in node failing to become available Created: 21/Aug/14  Updated: 22/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Brent Woodruff Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
On review of a customer ticket today during support discussions, this particular issue was brought up. It is unclear from subsequent discussions in that ticket whether this issue was addressed and fixed.

Steps to reproduce:

* Initialize a Couchbase node with more than one bucket

* Disable the access.log on *both* buckets using the following command for each bucket:

wget -O- --user=Administrator --password=password --post-data='ns_bucket:update_bucket_props("bucket1", [{extra_config_string, "alog_path="}]).' http://localhost:8091/diag/eval

wget -O- --user=Administrator --password=password --post-data='ns_bucket:update_bucket_props("bucket2", [{extra_config_string, "alog_path="}]).' http://localhost:8091/diag/eval

where 'bucket1' and 'bucket2' are the bucket names.

* Restart the node and observe the following errors in the logs:

memcached<0.89.0>: WARNING: Found duplicate entry for "alog_path"
memcached<0.89.0>: Unsupported key: <^A>

* Note that the node remains pending and never becomes available

 Comments   
Comment by Abhinav Dangeti [ 21/Aug/14 ]
I don't see the node failing to become available.

Started couchbase server with 2 buckets:
  1 Fri Aug 22 10:36:25.628702 PDT 3: (default) Trying to connect to mccouch: "127.0.0.1:13000"
  2 Fri Aug 22 10:36:25.628978 PDT 3: (default) Connected to mccouch: "127.0.0.1:13000"
  3 Fri Aug 22 10:36:25.644502 PDT 3: (No Engine) Bucket default registered with low priority
  4 Fri Aug 22 10:36:25.644528 PDT 3: (No Engine) Spawning 4 readers, 4 writers, 1 auxIO, 1 nonIO threads
  5 Fri Aug 22 10:36:25.646178 PDT 3: (default) metadata loaded in 982 usec
  6 Fri Aug 22 10:36:25.646205 PDT 3: (default) Enough number of items loaded to enable traffic
  7 Fri Aug 22 10:36:25.646559 PDT 3: (default) warmup completed in 1052 usec
  8 Fri Aug 22 10:36:33.495128 PDT 3: (default) Shutting down tap connections!
  9 Fri Aug 22 10:36:33.495174 PDT 3: (default) Shutting down dcp connections!
 10 Fri Aug 22 10:36:33.496244 PDT 3: (No Engine) Unregistering last bucket default
 11 Fri Aug 22 10:36:41.791797 PDT 3: (bucket1) Trying to connect to mccouch: "127.0.0.1:13000"
 12 Fri Aug 22 10:36:41.791932 PDT 3: (bucket1) Connected to mccouch: "127.0.0.1:13000"
 13 Fri Aug 22 10:36:41.800241 PDT 3: (No Engine) Bucket bucket1 registered with low priority
 14 Fri Aug 22 10:36:41.800273 PDT 3: (No Engine) Spawning 4 readers, 4 writers, 1 auxIO, 1 nonIO threads
 15 Fri Aug 22 10:36:41.801437 PDT 3: (bucket1) metadata loaded in 719 usec
 16 Fri Aug 22 10:36:41.801450 PDT 3: (bucket1) Enough number of items loaded to enable traffic
 17 Fri Aug 22 10:36:41.801593 PDT 3: (bucket1) warmup completed in 761 usec
 18 Fri Aug 22 10:36:46.922063 PDT 3: (bucket2) Trying to connect to mccouch: "127.0.0.1:13000"
 19 Fri Aug 22 10:36:46.922191 PDT 3: (bucket2) Connected to mccouch: "127.0.0.1:13000"
 20 Fri Aug 22 10:36:46.931024 PDT 3: (No Engine) Bucket bucket2 registered with low priority
 21 Fri Aug 22 10:36:46.932154 PDT 3: (bucket2) metadata loaded in 715 usec
 22 Fri Aug 22 10:36:46.932170 PDT 3: (bucket2) Enough number of items loaded to enable traffic
 23 Fri Aug 22 10:36:46.932314 PDT 3: (bucket2) warmup completed in 776 usec

Loaded 1000 items in each, and restarted the node after setting the alog_path to NULL in the same way as mentioned above.
  1 Fri Aug 22 10:38:08.372050 PDT 3: (bucket2) Trying to connect to mccouch: "127.0.0.1:13000"
  2 Fri Aug 22 10:38:08.372307 PDT 3: (bucket2) Connected to mccouch: "127.0.0.1:13000"
  3 Fri Aug 22 10:38:08.382418 PDT 3: (No Engine) Bucket bucket2 registered with low priority
  4 Fri Aug 22 10:38:08.382445 PDT 3: (No Engine) Spawning 4 readers, 4 writers, 1 auxIO, 1 nonIO threads
  5 Fri Aug 22 10:38:08.434024 PDT 3: (bucket1) Trying to connect to mccouch: "127.0.0.1:13000"
  6 Fri Aug 22 10:38:08.434205 PDT 3: (bucket1) Connected to mccouch: "127.0.0.1:13000"
  7 Fri Aug 22 10:38:08.445064 PDT 3: (No Engine) Bucket bucket1 registered with low priority
  8 Fri Aug 22 10:38:08.481732 PDT 3: (bucket2) metadata loaded in 98 ms
  9 Fri Aug 22 10:38:08.507847 PDT 3: (bucket2) warmup completed in 124 ms
 10 Fri Aug 22 10:38:08.540342 PDT 3: (bucket1) metadata loaded in 92 ms
 11 Fri Aug 22 10:38:08.553951 PDT 3: (bucket1) warmup completed in 106 ms

[10:37:46] abhinav: ~/Documents/couchbase30/ep-engine $ ./management/cbstats localhost:12000 all -b bucket1 | grep alog
 ep_alog_block_size: 4096
 ep_alog_path:
 ep_alog_sleep_time: 1440
 ep_alog_task_time: 10
[10:38:50] abhinav: ~/Documents/couchbase30/ep-engine $ ./management/cbstats localhost:12000 all -b bucket2 | grep alog
 ep_alog_block_size: 4096
 ep_alog_path:
 ep_alog_sleep_time: 1440
 ep_alog_task_time: 10

I do see the duplicate entry warning, but I'm guessing that is because we set alog_path again after initializing it to the default value, in which case it would be overwritten.
Comment by Abhinav Dangeti [ 22/Aug/14 ]
I tried your scenario with the latest 3.0 and then with 2.5.1, and noted similar behavior.
Can you point me to the build with which you saw this issue?




[MB-10092] Changing watermark thresholds doesn't work in percentages Created: 31/Jan/14  Updated: 22/Aug/14  Resolved: 31/Jan/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Minor
Reporter: kay Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: any

Issue Links:
Duplicate
duplicates MB-10096 cbepctl doesn't support setting certa... Open
Operating System: Centos 64-bit

 Description   
When I start Couchbase and set the watermark limits in percent during warm-up, the warm-up stops and the limits become 90 and 94 bytes. Is it a bug?

The commands I use: /opt/couchbase/bin/cbepctl localhost:11210 -b default set flush_param mem_low_wat 90
/opt/couchbase/bin/cbepctl localhost:11210 -b default set flush_param mem_high_wat 94

P.S. When I set the limits in bytes, everything works fine.
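
For reference, a minimal sketch of converting the intended percentages into absolute bytes (an illustration only; it assumes a per-node bucket quota, and the 1 GiB figure below is a placeholder for the real quota):

bucket_quota_bytes = 1 * 1024 * 1024 * 1024          # assumed 1 GiB per-node quota
mem_low_wat_bytes = int(bucket_quota_bytes * 0.90)   # 90% expressed in bytes
mem_high_wat_bytes = int(bucket_quota_bytes * 0.94)  # 94% expressed in bytes
# Pass these values to the same cbepctl flush_param commands shown above.
print(mem_low_wat_bytes, mem_high_wat_bytes)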




[MB-7963] ep_item_flush_expired not matching up with vb_active_expired Created: 25/Mar/13  Updated: 22/Aug/14  Resolved: 02/Apr/13

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Perry Krug
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Load 100 items (or 1000, or 100000) with an expiration time of 1 second. You will see the ep_item_flush_expired stat increase (because we are expiring items when they get written to disk) but not vb_active_expired.

 Comments   
Comment by Mike Wiederhold [ 02/Apr/13 ]
Perry,

ep_item_flush_expired is the number of times an item is not flushed due to the expiry of the item. This means that if we see that an item is going to expire soon, we choose not to persist it and increment this stat. The item is not actually expired though and still lives in memory.

When the expiry pager runs you will see that the items actually get expired, and then the vb_active_expired stat is increased.

Please try running the test again with the expiry pager set to 30 seconds. Once the expiry pager runs, you should see that the two stats are roughly equal.
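
A minimal sketch of that check (assuming the exp_pager_stime flush param and the default tool locations; host, bucket and paths are illustrative):

import subprocess
import time

CBEPCTL = "/opt/couchbase/bin/cbepctl"
CBSTATS = "/opt/couchbase/bin/cbstats"
HOST = "localhost:11210"

# Run the expiry pager every 30 seconds instead of the default interval.
subprocess.check_call([CBEPCTL, HOST, "-b", "default",
                       "set", "flush_param", "exp_pager_stime", "30"])

time.sleep(60)  # allow at least one pager cycle

# Compare the two stats after the pager has run.
out = subprocess.check_output([CBSTATS, HOST, "all", "-b", "default"]).decode()
stats = {}
for line in out.splitlines():
    if ":" in line:
        key, value = line.split(":", 1)
        stats[key.strip()] = value.strip()
print(stats.get("ep_item_flush_expired"), stats.get("vb_active_expired"))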
Comment by Perry Krug [ 02/Apr/13 ]
Thanks, that was my misunderstanding of the statistic.




[MB-11458] View engine fails to parse noop request from upr server Created: 18/Jun/14  Updated: 22/Aug/14  Resolved: 23/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Sarath Lakshman Assignee: Sarath Lakshman
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates MB-11464 Initial indexing of 20M docs gets stuck Closed
Relates to
relates to MB-11462 [System Test] Initial Indexing is stu... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Sometimes I am receiving two response messages for a stream request, with statuses 0 and 34.

EP-Engine log:

Jun 18 13:49:03.360729 UTC 3: (default) UPR (Producer) eq_uprq:mapreduce_view: default _design/dev_test (prod/main) - (vb 778) stream created with start seqno 7 and end seqno 20
Wed Jun 18 13:49:03.367437 UTC 3: (default) UPR (Producer) eq_uprq:mapreduce_view: default _design/dev_test (prod/main) - (vb 778) Sending disk snapshot with start seqno 7 and end seqno 42
Wed Jun 18 13:49:03.367732 UTC 3: (default) UPR (Producer) eq_uprq:mapreduce_view: default _design/dev_test (prod/main) - (vb 778) Backfill complete, 35 items read from disk, last seqno read: 42
Wed Jun 18 13:49:03.368616 UTC 3: (default) UPR (Producer) eq_uprq:mapreduce_view: default _design/dev_test (prod/main) - (vb 778) Stream closing, 35 items sent from disk, 0 items sent from memory, 42 was last seqno sent
Wed Jun 18 13:49:05.371994 UTC 3: (default) UPR (Producer) eq_uprq:mapreduce_view: default _design/dev_test (prod/main) - (vb 778) Stream request failed because the snap start seqno (0) <= start seqno (7) <= snap end seqno (0) is required


View engine log with debug message to print response:
chdb:info,2014-06-18T13:49:03.358,n_0@127.0.0.1:<0.960.0>:couch_log:info:39]set view `default`, main (prod) group `_design/dev_test`: received a snapshot marker (on-disk) for partition 777 from sequence 6 to 38
[couchdb:info,2014-06-18T13:49:03.361,n_0@127.0.0.1:<0.870.0>:couch_log:info:39]Stream created for request id 20081 with status 0
[couchdb:info,2014-06-18T13:49:05.372,n_0@127.0.0.1:<0.1019.0>:couch_log:info:39]Stream created for request id 20081 with status 34

Started noticing this after trying a test with this patch:
http://review.couchbase.org/#/c/38402/

Test:
1. Start couchbase
2. Create a simple view
3. Insert 1000 items and wait for indexing
4. Start inserting another 1M items

One of the upr streams that is created will have this problem

I can provide a live server with this problem upon request for debugging.


 Comments   
Comment by Sarath Lakshman [ 18/Jun/14 ]
You can easily reproduce this problem by applying the following patches:
http://review.couchbase.org/#/c/38402
http://review.couchbase.org/#/c/38401

and run the following test with vbuckets=1024

view.viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in,num_nodes_to_add=1,skip_rebalance=true,docs-per-day=1
Comment by Pavel Paulau [ 18/Jun/14 ]
Raising to test blocker due to MB-11464.
Comment by Mike Wiederhold [ 18/Jun/14 ]
I don't see much evidence that ep-engine is sending a stream request response twice. Here are some reasons why.

First off, we don't defer a response when we receive a stream request. What I mean by this is that it is not possible for a connection to send anything else once a stream request is received. This means that the request is received and a response is always sent immediately; nothing is sent in between. I'll double-check this with Trond, but I'm fairly confident that this is the case. Also, the two responses you are receiving come 2 seconds apart, which is a fairly long time.

Second, it appears that the data in the stream requests is different. For example, the first one succeeds and likely contains a snap_start_seqno and snap_end_seqno of 7. The second one contains a snap_start_seqno and snap_end_seqno of 0. We process the parameters as is from the stream request message, so it is not possible for those to change based on time. As a result it seems more likely that a second stream request is sent with different parameters.

I talked with Sarath and he will get a packet trace to determine if ep-engine is sending two responses. I'm going to take a look at the other issues that were filed as duplicates in order to make sure that I didn't miss anything.
Comment by Sarath Lakshman [ 19/Jun/14 ]
Fixed.

commit b6625b75de6079eea5aa36e67c58752fcbbd9fee
Author: Sarath Lakshman <sarathlakshman@slynux.com>
Date: Thu Jun 19 00:43:32 2014 +0530

    MB-11458 Fix parsing of noop message from server

    The upr server sends noop request and the upr client should
    reply as response message. Currently the message parser expects
    noop message from server to be response type. This assumption is
    wrong.

    Renamed no_op to noop to make it consistent in the upr client
    implementation.

    Change-Id: Id55ec6f483e340f2a51c1c8effe9e10e10479c3a
    Reviewed-on: http://review.couchbase.org/38421
    Tested-by: buildbot <build@couchbase.com>
    Reviewed-by: Volker Mische <volker.mische@gmail.com>
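
For reference, a minimal sketch of the semantics the fix implements, assuming standard memcached binary framing (magic 0x80 for requests, 0x81 for responses) and the UPR/DCP noop opcode 0x5c; this is an illustration, not the view engine's Erlang code:

import struct

REQ_MAGIC, RES_MAGIC, NOOP_OPCODE = 0x80, 0x81, 0x5c

def handle_packet(header, send):
    # header is the fixed 24-byte memcached binary header; send writes bytes back to the server.
    magic, opcode = header[0], header[1]
    opaque = struct.unpack_from(">I", header, 12)[0]
    if magic == REQ_MAGIC and opcode == NOOP_OPCODE:
        # The server sends noop as a *request*; the client must answer with a
        # *response* carrying the same opaque and a success status.
        send(struct.pack(">BBHBBHIIQ", RES_MAGIC, NOOP_OPCODE, 0, 0, 0, 0, 0, opaque, 0))

The old parser only accepted a noop that arrived as a response (magic 0x81), which is why the server-initiated request tripped it up.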




[MB-11475] make simple-test isn't passing on i386 Created: 18/Jun/14  Updated: 22/Aug/14  Resolved: 25/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Aleksey Kondratenko Assignee: Aleksey Kondratenko
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Yes

 Description   
SUBJ. It was passing some days ago.

I'm making the bold assumption that it's caused by ep-engine because it's the only component where 32-bitness might make a difference and which was changed recently.

 Comments   
Comment by Aleksey Kondratenko [ 23/Jun/14 ]
Might be fixed. I'll rerun shortly.
Comment by Aleksey Kondratenko [ 23/Jun/14 ]
fixed already indeed




[MB-11424] vbuckettool shows incorrect vbucket id Created: 13/Jun/14  Updated: 22/Aug/14  Resolved: 25/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Trond Norbye
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-791-rel

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-11424/dca3be89/172.27.33.10-6132014-2024-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11424/dca3be89/172.27.33.11-6132014-2029-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11424/dca3be89/172.27.33.12-6132014-2026-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11424/dca3be89/172.27.33.13-6132014-2031-diag.zip
Is this a Regression?: Unknown

 Description   
I have a 4-node cluster, 1 bucket.
I expect to have vBucket id 0 for key 'vbuckettool-0', with the master on .10 and the replica on .11:

[root@kiwi-r109 bin]# ./couch_dbdump ../var/lib/couchbase/data/default/0.couch.1
Dumping "../var/lib/couchbase/data/default/0.couch.1":
Doc seq: 1
     id: vbuckettool-0
     rev: 1
     content_meta: 128
     size (on disk): 76
     size: 71
     data: (snappy) {"mutated": 0, "age": 0, "_id": "vbuckettool-0", "first_name": "james"}

Total docs: 1
[root@kiwi-r109 bin]# ./couch_dbdump ../var/lib/couchbase/data/default/281.couch.1
Dumping "../var/lib/couchbase/data/default/281.couch.1":
Doc seq: 1
     id: vbuckettool-281
     rev: 1
     content_meta: 128
     size (on disk): 82
     size: 73
     data: (snappy) {"mutated": 0, "age": 1, "_id": "vbuckettool-281", "first_name": "james"}

Total docs: 1
[root@kiwi-r109 bin]#

but I get 281:
[root@kiwi-r109 tools]# curl http://localhost:8091/pools/default/buckets/default |./vbuckettool - vbuckettool-0
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 12019 100 12019 0 0 680k 0 --:--:-- --:--:-- --:--:-- 11.4M
key: vbuckettool-0 master: 172.27.33.11:11210 vBucketId: 281 couchApiBase: http://172.27.33.11:8092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910 replicas: 172.27.33.10:11210


[root@kiwi-r109 bin]# curl http://localhost:8091/pools/default/buckets/default
{"name":"default","bucketType":"membase","authType":"sasl","saslPassword":"","proxyPort":0,"replicaIndex":true,"uri":"/pools/default/buckets/default?bucket_uuid=ebdbccd734abcf9d7eb3a3eb0b94a910","streamingUri":"/pools/default/bucketsStreaming/default?bucket_uuid=ebdbccd734abcf9d7eb3a3eb0b94a910","localRandomKeyUri":"/pools/default/buckets/default/localRandomKey","controllers":{"flush":"/pools/default/buckets/default/controller/doFlush","compactAll":"/pools/default/buckets/default/controller/compactBucket","compactDB":"/pools/default/buckets/default/controller/compactDatabases","purgeDeletes":"/pools/default/buckets/default/controller/unsafePurgeBucket","startRecovery":"/pools/default/buckets/default/controller/startRecovery"},"nodes":[{"couchApiBaseHTTPS":"https://172.27.33.13:18092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910","couchApiBase":"http://172.27.33.13:8092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910","systemStats":{"cpu_utilization_rate":0,"swap_total":4959834112,"swap_used":77824,"mem_total":4294967296,"mem_free":3701747712},"interestingStats":{"cmd_get":0,"couch_docs_actual_disk_size":10049718,"couch_docs_data_size":8865280,"couch_views_actual_disk_size":0,"couch_views_data_size":0,"curr_items":256,"curr_items_tot":512,"ep_bg_fetched":0,"get_hits":0,"mem_used":17736008,"ops":0,"vb_replica_curr_items":256},"uptime":"171783","memoryTotal":4294967296,"memoryFree":3701747712,"mcdMemoryReserved":3276,"mcdMemoryAllocated":3276,"replication":1,"clusterMembership":"active","recoveryType":"none","status":"healthy","otpNode":"ns_1@172.27.33.13","hostname":"172.27.33.13:8091","clusterCompatibility":196608,"version":"3.0.0-791-rel-enterprise","os":"x86_64-unknown-linux-gnu","ports":{"sslProxy":11214,"httpsMgmt":18091,"httpsCAPI":18092,"proxy":11211,"direct":11210}},{"couchApiBaseHTTPS":"https://172.27.33.12:18092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910","couchApiBase":"http://172.27.33.12:8092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910","systemStats":{"cpu_utilization_rate":1.246882793017456,"swap_total":4959834112,"swap_used":77824,"mem_total":4294967296,"mem_free":3690610688},"interestingStats":{"cmd_get":0,"couch_docs_actual_disk_size":10057534,"couch_docs_data_size":8873472,"couch_views_actual_disk_size":0,"couch_views_data_size":0,"curr_items":256,"curr_items_tot":512,"ep_bg_fetched":0,"get_hits":0,"mem_used":17736328,"ops":0,"vb_replica_curr_items":256},"uptime":"171828","memoryTotal":4294967296,"memoryFree":3690610688,"mcdMemoryReserved":3276,"mcdMemoryAllocated":3276,"replication":1,"clusterMembership":"active","recoveryType":"none","status":"healthy","otpNode":"ns_1@172.27.33.12","hostname":"172.27.33.12:8091","clusterCompatibility":196608,"version":"3.0.0-791-rel-enterprise","os":"x86_64-unknown-linux-gnu","ports":{"sslProxy":11214,"httpsMgmt":18091,"httpsCAPI":18092,"proxy":11211,"direct":11210}},{"couchApiBaseHTTPS":"https://172.27.33.11:18092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910","couchApiBase":"http://172.27.33.11:8092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910","systemStats":{"cpu_utilization_rate":0.7556675062972292,"swap_total":4959834112,"swap_used":77824,"mem_total":4294967296,"mem_free":3703681024},"interestingStats":{"cmd_get":0,"couch_docs_actual_disk_size":10006060,"couch_docs_data_size":8824320,"couch_views_actual_disk_size":0,"couch_views_data_size":0,"curr_items":256,"curr_items_tot":512,"ep_bg_fetched":0,"get_hits":0,"mem_used":17736072,"ops":0,"vb_replica_curr_items":256},"uptime":"171786","memoryTotal":4294967296,"memoryFree":3703681024,"mcdMemoryReserve
d":3276,"mcdMemoryAllocated":3276,"replication":1,"clusterMembership":"active","recoveryType":"none","status":"healthy","otpNode":"ns_1@172.27.33.11","hostname":"172.27.33.11:8091","clusterCompatibility":196608,"version":"3.0.0-791-rel-enterprise","os":"x86_64-unknown-linux-gnu","ports":{"sslProxy":11214,"httpsMgmt":18091,"httpsCAPI":18092,"proxy":11211,"direct":11210}},{"couchApiBaseHTTPS":"https://172.27.33.10:18092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910","couchApiBase":"http://172.27.33.10:8092/default%2Bebdbccd734abcf9d7eb3a3eb0b94a910","systemStats":{"cpu_utilization_rate":3.365384615384615,"swap_total":4959834112,"swap_used":77824,"mem_total":4294967296,"mem_free":3593519104},"interestingStats":{"cmd_get":0,"couch_docs_actual_disk_size":10530020,"couch_docs_data_size":9348608,"couch_views_actual_disk_size":0,"couch_views_data_size":0,"curr_items":256,"curr_items_tot":512,"ep_bg_fetched":0,"get_hits":0,"mem_used":21905128,"ops":0,"vb_replica_curr_items":256},"uptime":"171764","memoryTotal":4294967296,"memoryFree":3593519104,"mcdMemoryReserved":3276,"mcdMemoryAllocated":3276,"replication":1,"clusterMembership":"active","recoveryType":"none","status":"healthy","otpNode":"ns_1@172.27.33.10","thisNode":true,"hostname":"172.27.33.10:8091","clusterCompatibility":196608,"version":"3.0.0-791-rel-enterprise","os":"x86_64-unknown-linux-gnu","ports":{"sslProxy":11214,"httpsMgmt":18091,"httpsCAPI":18092,"proxy":11211,"direct":11210}}],"stats":{"uri":"/pools/default/buckets/default/stats","directoryURI":"/pools/default/buckets/default/statsDirectory","nodeStatsListURI":"/pools/default/buckets/default/nodes"},"ddocs":{"uri":"/pools/default/buckets/default/ddocs"},"nodeLocator":"vbucket","fastWarmupSettings":false,"autoCompactionSettings":false,"uuid":"ebdbccd734abcf9d7eb3a3eb0b94a910","vBucketServerMap":{"hashAlgorithm":"CRC","numReplicas":1,"serverList":["172.27.33.10:11210","172.27.33.11:11210","172.27.33.12:11210","172.27.33.13:11210"],"vBucketMap":[[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[0,3],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[
1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[1,3],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[2,3],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,0],[3,1],[3,1
],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,1],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2],[3,2]]},"replicaNumber":1,"threadsNumber":3,"quota":{"ram":9160359936,"rawRAM":2290089984},"basicStats":{"quotaPercentUsed":0.8199845478211567,"opsPerSec":0,"diskFetches":0,"itemCount":1024,"diskUsed":40643332,"dataUsed":35911680,"memUsed":75113536},"evictionPolicy":"valueOnly","bucketCapabilitiesVer":"","bucketCapabilities":["cbhello","datatype","touch","couchapi","cccp"]}

 Comments   
Comment by Bin Cui [ 17/Jun/14 ]
vbuckettool is managed by the ep-engine team.
Comment by David Liao [ 24/Jun/14 ]
Is this libvbucket related?
Comment by Trond Norbye [ 25/Jun/14 ]
How did you decide what the expected vBucket should be? Is this purely based upon a client inserting the key? If so, which client are you using?
Comment by Iryna Mironava [ 25/Jun/14 ]
I am using MemcachedClient and load like client.set(key, 0, 0, value, my_vb_id).
Also, after that I check the vBucket using ./couch_dbdump and my item is there. Am I doing something wrong?
Comment by Trond Norbye [ 25/Jun/14 ]
How do you calculate the my_vb_id you store there? Please note that vbuckettool does not try to look up the document in the cluster and print out where it is located; it just calculates where the object is _supposed_ to be according to the cluster topology, assuming the clients store the item correctly.

I've verified with the Java client and libcouchbase (but the latter uses the same code as vbuckettool) that they want to use vbucket 281 for this key.
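
For illustration, a minimal sketch of that calculation, assuming the default CRC32-based mapping advertised as "hashAlgorithm":"CRC" in the bucket config above (not the actual vbuckettool source):

import zlib

def vbucket_for_key(key, num_vbuckets=1024):
    # CRC32 of the key, upper 16 bits masked to 15 bits, modulo the vBucket count.
    crc = zlib.crc32(key.encode("utf-8")) & 0xffffffff
    return ((crc >> 16) & 0x7fff) % num_vbuckets

# Expected to agree with the vbuckettool output above (vBucketId 281) for a 1024-vBucket bucket.
print(vbucket_for_key("vbuckettool-0"))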

Comment by Iryna Mironava [ 25/Jun/14 ]
So if I choose the vBucket myself, vbuckettool will not show the correct vBucket, am I right? Then I am closing the bug.
Comment by Trond Norbye [ 25/Jun/14 ]
Yes, this isn't a bug.




[MB-11364] xdcr replicator crashes because there's some data on the wire after upr_stream_end Created: 09/Jun/14  Updated: 22/Aug/14  Resolved: 09/Jul/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aliaksey Artamonau Assignee: Sriram Ganesan
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
[error_logger:error,2014-06-05T12:23:10.023,ns_1@10.1.3.93:error_logger<0.6.0>:ale_error_logger_handler:do_log:207]
=========================CRASH REPORT=========================
  crasher:
    initial call: erlang:apply/2
    pid: <0.15660.7>
    registered_name: []
    exception error: no case clause matching
                     {req,{upr_packet,85,0,702,0,4210752250,0,
                                      <<0,0,0,2>>,
                                      <<>>,<<>>},
                          <<128,86,0,0,20,0,2,190,0,0,0,20,250,250,250,250,0,0,
                            0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19,0,0,0,
                            1>>,
                          28}
      in function xdcr_upr_streamer:do_start/11 (src/xdcr_upr_streamer.erl, line 187)
      in call from xdcr_upr_streamer:stream_vbucket_inner/9 (src/xdcr_upr_streamer.erl, line 250)
    ancestors: [<0.15654.7>]
    messages: []
    links: [<0.15654.7>,#Port<0.19750>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 46422
    stack_size: 27
    reductions: 7239
  neighbours:

See this log https://s3.amazonaws.com/bugdb/jira/MB-11344/7c278e29/10.1.3.93-652014-1416-diag.zip.

 Comments   
Comment by Chiyoung Seo [ 26/Jun/14 ]
Sriram,

If you need more details, please work with Aliaksey A who filed this issue.
Comment by Aliaksey Artamonau [ 26/Jun/14 ]
Just to be clear. I haven't encountered this issue myself. I just saw it in MB-11344 logs. And since it wasn't directly related to that ticket I opened a new one.
Comment by Sriram Ganesan [ 01/Jul/14 ]
I am not sure how to interpret the following numbers in the crash report:

{upr_packet,85,0,702,0,4210752250,0, <<0,0,0,2>>,
                                      <<>>,<<>>},
                          <<128,86,0,0,20,0,2,190,0,0,0,20,250,250,250,250,0,0,
                            0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19,0,0,0,
                            1>>,
                          28}

Assuming that this is the suspect packet being received, can you confirm whether "85" is the opcode of the packet that you are receiving after the stream end? Also, if you can add details as to what the other numbers mean, that would help diagnose the problem better.
Comment by Aliaksey Artamonau [ 01/Jul/14 ]
<<128,86,0,0,20,0,2,190,0,0,0,20,250,250,250,250,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19,0,0,0,1>> is the extra data that was received. It's Erlang notation for a stream of bytes; each number is decimal. So it's a request packet with the UPR_SNAPSHOT_MARKER opcode.
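
For illustration, a minimal sketch that decodes those bytes, assuming the standard 24-byte memcached binary header layout followed by the 20-byte snapshot-marker extras (the layout is an assumption about the framing, not something taken from the crash report itself):

import struct

# The 44 bytes from the crash report above.
data = bytes([128, 86, 0, 0, 20, 0, 2, 190, 0, 0, 0, 20, 250, 250, 250, 250,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 19, 0, 0, 0, 1])

# Header: magic, opcode, key length, extras length, datatype, vbucket, body length, opaque, cas.
magic, opcode, keylen, extlen, datatype, vbucket, bodylen, opaque, cas = \
    struct.unpack_from(">BBHBBHIIQ", data, 0)
print(hex(magic), hex(opcode), vbucket, opaque)  # 0x80 (request), 0x56 (snapshot marker), 702, 4210752250

# Snapshot-marker extras: start seqno, end seqno, flags.
snap_start, snap_end, flags = struct.unpack_from(">QQI", data, 24)
print(snap_start, snap_end, flags)  # 0 19 1

The vbucket (702) and opaque (4210752250) match the upr_packet record quoted above, consistent with a snapshot-marker request arriving on the same stream.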
Comment by Sriram Ganesan [ 02/Jul/14 ]
I tried running the test in MB-11344 a few times using build 3.0.0-914 in a 5-node CentOS 5.8 cluster and was unable to reproduce the same crash report with the extras in the upr_packet. If this issue pops up in any of the later builds or if there is a consistently reproducible test case, please reopen this bug.




[MB-11352] {UPR}:: Flow control with Rebalance-in does not satisfy condition :: max_unacked_bytes == 0 Created: 07/Jun/14  Updated: 22/Aug/14  Resolved: 16/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Parag Agarwal
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive upr_flow_control_rebalanacein.tar.gz    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
version: 3.0.0-790, centos-64 bit

1. Create 3 node cluster '10.5.2.13, 10.5.2.14, 10.5.2.15'
2. Create default bucket
3. Switch on upr flow control for the bucket
4. Send data (100K) and mutations to bucket
5. Rebalance-in 1 node (10.3.121.63) in the cluster
6. Check max_unacked_bytes == 0

Fails at Step 6

Test Run:: http://qa.hq.northscale.net/job/centos_x64--02_04--Rebalance_tests_UPR-P0/214/consoleText

Test Case:: testrunner -i /tmp/centos-64-2.0-basic-rebalance-tests-P0.ini -t rebalance.rebalancein.RebalanceInTests.rebalance_in_after_ops,nodes_in=1,nodes_init=3,replicas=1,items=100000,enable_flow_control=True,GROUP=IN

Discussed the bug with Mike, so assigning it to him

This might happen with other tests as well, i.e., it is not specific to rebalance-in.

Attaching logs.


 Comments   
Comment by Parag Agarwal [ 07/Jun/14 ]
Issue is not related to rebalance-in

./testrunner -i ~/ini/palm.ini -t analysis.clusterinfoanalysis.DataAnalysisTests.test_data_analysis_disk_memory_comparison_all,items=50000,upr=True,vbuckets=128,replicas=1,enable_flow_control=True,verify_max_unacked_bytes=True

This test case too fails.

2014-06-07 14:16:00 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_upr_max_unacked_bytes 524288 == 0 expected on '10.6.2.147:8091', default bucket
2014-06-07 14:16:00 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_upr_max_unacked_bytes 524288 == 0 expected on '10.6.2.148:8091', default bucket
2014-06-07 14:16:00 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_upr_max_unacked_bytes 524288 == 0 expected on '10.6.2.150:8091', default bucket
2014-06-07 14:16:00 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_upr_max_unacked_bytes 524288 == 0 expected on '10.6.2.149:8091', default bucket
Comment by Mike Wiederhold [ 09/Jun/14 ]
Lowering this to critical because we haven't enabled this feature in standard builds.
Comment by Mike Wiederhold [ 09/Jun/14 ]
Please re-test once the following changes are in a build.

http://review.couchbase.org/#/c/38030/
http://review.couchbase.org/#/c/38031/
Comment by Parag Agarwal [ 10/Jun/14 ]
Didn't pass with 797

http://qa.hq.northscale.net/job/centos_x64--02_04--Rebalance_In_out_UPR-P0/14/consoleText
Comment by Mike Wiederhold [ 10/Jun/14 ]
We found that the wrong stat was being grabbed by the test to check that we acked all of the bytes. Parag will fix this and retest.
Comment by Parag Agarwal [ 10/Jun/14 ]
Fixed the test cases

http://review.couchbase.org/#/c/38110/
Comment by Parag Agarwal [ 10/Jun/14 ]
Issue was with our test verification. Fixed

http://review.couchbase.org/#/c/38110/




[MB-11348] vbucket not being taken-over during rebalance Created: 06/Jun/14  Updated: 22/Aug/14  Resolved: 11/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Tommie McAfee Assignee: Tommie McAfee
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive collectinfo-2014-06-06T204835-n_0@10.0.0.105.zip     Zip Archive collectinfo-2014-06-06T204835-n_1@10.0.0.105.zip     GZip Archive ns_logs.tar.gz    
Triage: Untriaged
Is this a Regression?: Yes

 Description   
* started with a 4 node cluster
* failover + rebalance out 2
* attempted to rebalance out 1 more
       - hangs

ns_server is trying to move all vbuckets to node_0 but vb5 isn't being taken over:

vb_0: active
 vb_1: active
 vb_2: active
 vb_3: active
 vb_4: active
 vb_5: replica
 vb_6: active
 vb_7: active

according to the upr_stats vb5 takeover flags = 0:

 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_flags: 0
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_items_ready: false
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_opaque: 3
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_snap_end_seqno: 10
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_snap_start_seqno: 0
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_start_seqno: 10
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_state: reading
  
*** I'm not sure why items_ready = false


the other streams however were takeover streams:

 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_2_flags: 1
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_7_flags: 1

the producer seems to suggest the items were sent already:
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_flags: 0
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_items_ready: false
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_last_sent_seqno: 10
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_memory: 10
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_opaque: 3
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_snap_end_seqno: 0
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_snap_start_seqno: 0
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_start_seqno: 0
 eq_uprq:replication:n_1@10.0.0.105->n_0@10.0.0.105:default:stream_5_state: in-memory


also noticed vb5 has an open checkpoint with 0 items if that is of any significance…
 
 vb_5:last_closed_checkpoint_id: 1
 vb_5:num_checkpoint_items: 1
 vb_5:num_checkpoints: 1
 vb_5:num_items_for_persistence: 1
 vb_5:num_open_checkpoint_items: 0
 vb_5:num_tap_cursors: 0
 vb_5:open_checkpoint_id: 2
 vb_5:persisted_checkpoint_id: 1
 vb_5:state: replica
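
For context, a minimal sketch of how the flags values above are usually read, assuming the UPR/DCP stream-request takeover flag is bit 0x01 (an assumption about the protocol, not taken from these stats):

TAKEOVER_FLAG = 0x01  # assumed UPR/DCP takeover bit in the stream-request flags field

def is_takeover_stream(flags):
    # Streams opened for vbucket takeover should carry this bit.
    return bool(flags & TAKEOVER_FLAG)

print(is_takeover_stream(1), is_takeover_stream(0))  # True for stream_2/stream_7, False for stream_5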



 Comments   
Comment by Tommie McAfee [ 06/Jun/14 ]
repro with pyupr:

./cluster_run -n4

./pyupr -h 10.0.0.105:9000 10.0.0.105:9001 10.0.0.105:9002 10.0.0.105:9003 -b dev -s reb -o test_add_stream_during_failover

(it fails for reasons unrelated to this bug, but hangs in teardown)
Comment by Aliaksey Artamonau [ 06/Jun/14 ]
Could you please always upload diags, not just logs? It makes our lives much easier.
Comment by Aliaksey Artamonau [ 06/Jun/14 ]
[rebalance:debug,2014-06-06T16:40:16.538,n_0@10.0.0.105:<0.1792.0>:janitor_agent:handle_call:747]Going to wait for persistence of seqno 10 in vbucket 6
[rebalance:debug,2014-06-06T16:40:27.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:40:58.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:41:29.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:42:00.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:42:31.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:43:02.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:43:33.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:44:04.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:44:35.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:45:06.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:45:37.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:46:08.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:46:39.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:47:10.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:47:41.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:48:12.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
[rebalance:debug,2014-06-06T16:48:43.539,n_0@10.0.0.105:<0.1792.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
Comment by Tommie McAfee [ 10/Jun/14 ]

I suspect the underlying issue has to do with these rollback responses. 'etmpfail' errors then occur after the requested seqnos are not received:


data/n_1/logs/memcached.log.0.txt:Tue Jun 10 11:21:00.964305 EDT 3: (default) UPR (Producer) eq_uprq:replication:n_1@192.168.17.95->n_0@192.168.17.95:default - (vb 5) Stream request failed because a rollback to seqno 0 is required (start seqno 10, vb_uuid 4675300462675623936, snapStartSeqno 10, snapEndSeqno 10)
data/n_1/logs/memcached.log.0.txt:Tue Jun 10 11:21:01.176016 EDT 3: (default) UPR (Producer) eq_uprq:replication:n_1@192.168.17.95->n_0@192.168.17.95:default - (vb 6) Stream request failed because a rollback to seqno 0 is required (start seqno 10, vb_uuid 4675300462675623936, snapStartSeqno 10, snapEndSeqno 10)
data/n_1/logs/memcached.log.0.txt:Tue Jun 10 11:21:01.180773 EDT 3: (default) UPR (Producer) eq_uprq:replication:n_1@192.168.17.95->n_0@192.168.17.95:default - (vb 7) Stream request failed because a rollback to seqno 0 is required (start seqno 10, vb_uuid 4675300462675623936, snapStartSeqno 10, snapEndSeqno 10)
data/n_0/logs/memcached.log.0.txt:Tue Jun 10 11:21:00.969006 EDT 3: (default) UPR (Producer) eq_uprq:replication:n_0@192.168.17.95->n_1@192.168.17.95:default - (vb 4) Stream request failed because a rollback to seqno 0 is required (start seqno 10, vb_uuid 59579079898245, snapStartSeqno 10, snapEndSeqno 10)
data/n_0/logs/memcached.log.0.txt:Tue Jun 10 11:21:00.981704 EDT 3: (default) UPR (Producer) eq_uprq:replication:n_0@192.168.17.95->n_3@192.168.17.95:default - (vb 4) Stream request failed because a rollback to seqno 0 is required (start seqno 10, vb_uuid 59579079898245, snapStartSeqno 10, snapEndSeqno 10)


then etmpfails....


logs/n_0/debug.log:[rebalance:debug,2014-06-10T11:21:12.850,n_0@192.168.17.95:<0.1558.0>:janitor_agent:do_wait_seqno_persisted:1004]Got etmpfail waiting for seq no persistence. Will try again
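
For context, a minimal sketch (illustrative only; it assumes a pyupr-style client whose stream_req signature and response-dict keys may differ from the real one) of what a UPR consumer is expected to do with a rollback response like the ones above: re-issue the stream request from the seqno the producer asks for (0 in these logs).

ROLLBACK = 0x23   # PROTOCOL_BINARY_RESPONSE_ROLLBACK

def open_stream(upr_client, vb, vb_uuid, start_seqno, end_seqno, snap_start, snap_end):
    # first attempt with the seqnos the consumer last knew about
    resp = upr_client.stream_req(vb, 0, start_seqno, end_seqno, vb_uuid,
                                 snap_start, snap_end)
    if resp['status'] == ROLLBACK:
        # the producer's failover log no longer covers start_seqno, so restart
        # the stream from the rollback point it returned ('rollback' key is an
        # assumption about the client's response dict)
        seqno = resp['rollback']
        resp = upr_client.stream_req(vb, 0, seqno, end_seqno, vb_uuid,
                                     seqno, seqno)
    return resp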


Comment by Mike Wiederhold [ 10/Jun/14 ]
Tommie,

I removed the consumer stream creation and the test passed for me. I'm not sure why you're doing this and would actually expect bad things to happen if someone did this. Let's discuss more over Skype tomorrow. Below is what I changed.

Mike-Wiederholds-MacBook-Pro:pyupr mikewied$ git diff
diff --git a/unit.py b/unit.py
index 062eec7..9558ad9 100644
--- a/unit.py
+++ b/unit.py
@@ -2118,10 +2118,10 @@ class RebTestCase(ParametrizedTestCase):
                 self.mcd_client.set('key' + str(i), 0, 0, 'value', vb)
 
         # send add_stream request to node1 replica vbuckets
-        self.mcd_reset(active_vbs[0])
-        for vb in replica_vbs:
-            response = self.upr_client.add_stream(vb, 0)
-            assert response['status'] == SUCCESS
+        #self.mcd_reset(active_vbs[0])
+        #for vb in replica_vbs:
+        #    response = self.upr_client.add_stream(vb, 0)
+        #    assert response['status'] == SUCCESS
 
         for host in self.hosts[2:]:
             assert self.rest_client.failover(host)
@@ -2140,14 +2140,15 @@ class RebTestCase(ParametrizedTestCase):
             "Got upr_count = {0}, expected = {1}".format(upr_count, 3)
 
         # check consumer persisted and high_seqno are correct
-        for vb in replica_vbs:
-            key = 'eq_uprq:mystream:stream_%s_start_seqno' % vb
-            assert key in stats, "Stream %s missing from stats" % vb
+        #for vb in replica_vbs:
+        #    key = 'eq_uprq:mystream:stream_%s_start_seqno' % vb
+        #    assert key in stats, "Stream %s missing from stats" % vb
 
-            start_seqno = stats[key]
-            assert int(start_seqno) == doc_count,\
-                "Expected seqno=%s got=%s" % (doc_count, start_seqno)
+        #    start_seqno = stats[key]
+        #    assert int(start_seqno) == doc_count,\
+        #        "Expected seqno=%s got=%s" % (doc_count, start_seqno)
 
+        print "replicas time"
         # verify data can be streamed
         self.upr_client.open_producer("producerstream")
         for vb in replica_vbs:
Mike-Wiederholds-MacBook-Pro:pyupr mikewied$
Comment by Tommie McAfee [ 11/Jun/14 ]
Thanks Mike, I removed the consumer streams and the test passes.

http://review.couchbase.org/#/c/38147/




[MB-11331] Issue with failover log generating new request with the same seqno Created: 05/Jun/14  Updated: 22/Aug/14  Resolved: 23/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt: start-finish
is triggered by MB-11085 XDCR checkpointing : ep-engine does n... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
From Alk,


I spotted this:

MB-11085: Always create a new failover entry on unclean shutdowns
    
    In the past we wouldn't generate a new failover entry if the high
    seqno number on disk was the same after a crash. This is incorrect
    because it is possible that the server did receive mutations and
    replicated them without persisting them before the crash. If this
    happens the consumers of upr streams will not roll back their data
    properly because the failover entry will not change on the server.
    
    Change-Id: I8c6bab504f0be3298e1e888dbe6f3fac9c3fa905
    Reviewed-on: http://review.couchbase.org/37670
    Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
    Tested-by: Michael Wiederhold <mike@couchbase.com>

And tried its behavior in practice. It looks like it has reverted to the old behavior where it would silently overwrite the last failover-history entry uuid if the last seqno equals the failover-entry seqno.

Thinking about this more I believe it might be fine. But it has interesting consequences.

If I understand failover-history entry seqno as "seqno _just before_ start of new failover 'era'" then it appears perfectly fine to do that.

However I think I'll need to change my code to accommodate that. And some other upr consumers might have to as well. This is because my checkpointing code always assumes that the latest seqno "belongs" to the latest failover-history entry. Which is clearly not the case when last seqno = seqno-of-last-failover-history-entry. In the latter case the seqno actually belongs to the _previous_ entry.

I can adapt my code. Or we can add a simple tweak to upr where it'll create an empty "bubble" seqno when it starts a new failover history entry. In that case you will never have a situation where, on restart, your last seqno = last-failover-history-entry-seqno. And there's no problem.
I'm pretty sure that this corner case affects not just xdcr. And I'm willing to bet that nobody handles it right yet. So we need to resolve this case asap.
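
To make the corner case concrete, a small worked sketch (illustrative only, using the "seqno just before the start of the new era" reading above):

# failover log entries, newest first: (vb_uuid, seqno just before that era began)
failover_log = [
    ('uuid_B', 10),   # entry created on the unclean restart
    ('uuid_A', 0),    # original era
]
last_seqno = 10       # high seqno on disk; equals the newest entry's seqno

# naive assumption: the latest seqno always belongs to the newest entry
naive_owner = failover_log[0]

def owning_entry(log, seqno):
    # a mutation belongs to the first (newest) era whose starting seqno is strictly below it
    for uuid, start in log:
        if seqno > start:
            return (uuid, start)
    return log[-1]

print naive_owner                             # ('uuid_B', 10) -- what the naive code assumes
print owning_entry(failover_log, last_seqno)  # ('uuid_A', 0)  -- seqno 10 really belongs to the previous era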

 Comments   
Comment by Venu Uppalapati [ 10/Jun/14 ]
This has an impact on the rollback function as well. For the case where failover happens twice without any seqno change in between, we currently overwrite the previous vb uuid. This will cause the rollback function to always ask the client to roll back to seqno 0 if the client does not have the new vb uuid.
Comment by Mike Wiederhold [ 23/Jun/14 ]
Duplicate of MB-11085.




[MB-11258] UPR streams getting stuck Created: 29/May/14  Updated: 22/Aug/14  Resolved: 02/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sarath Lakshman Assignee: David Liao
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive logs.tar.gz    
Issue Links:
Duplicate
is duplicated by MB-11276 ep-engine stops sending mutations thr... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When I run the following test with 1024 vbuckets, the view engine log says that it is waiting for mutations from ep-engine indefinitely.

$ NODES=1 TEST=view.viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in,num_nodes_to_add=1,skip_rebalance=true,docs-per-day=1 make any-test

[couchdb:error,2014-05-29T15:10:36.205,n_0@127.0.0.1:<0.898.0>:couch_log:error:42]upr client (<0.898.0>): Obtaining mutation from server timed out after 60.0 seconds [RequestId 9818 PartId 299]. Waiting...
[couchdb:error,2014-05-29T15:10:36.283,n_0@127.0.0.1:<0.778.0>:couch_log:error:42]upr client (<0.778.0>): Obtaining mutation from server timed out after 60.0 seconds [RequestId 12080 PartId 406]. Waiting...


I tried checking out an older ep-engine commit, 86b88cd6e90baf6dafbabffa1ba7bc42379cc5b2 and I do not see upr streams getting stuck.

 Comments   
Comment by Chiyoung Seo [ 02/Jun/14 ]
https://www.couchbase.com/issues/browse/MB-11276




[MB-10471] Unable to take anykind of backup when using build 3.0.0 release 433 enterprise edition Created: 14/Mar/14  Updated: 22/Aug/14  Resolved: 26/Mar/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Ashvinder Singh Assignee: Ashvinder Singh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 64bit

Triage: Triaged
Is this a Regression?: No

 Description   
The 'cbbackup' tool never returns when invoked for backup and it appears the tool is stuck.

Steps to reproduce:
- Install build 3.0.0 enterprise edition (build-433)
- Generate some data, example: /opt/couchbase/bin/cbworkloadgen -i 1000 --prefix=x1
- Take backup using "/opt/couchbase/bin/cbbackup http://localhost:8091 /backup"

After the third step cbbackup never completes, no output is seen on stdout and no backup folder is created.

Furthermore, I tested the 'cbbackup' tool from the 433 build against the older Couchbase Server 3.0 release 403 and I could take full, incremental and accumulative backups.
It appears that this may be a server issue.
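
If it helps to automate detecting this, a minimal sketch (not part of the ticket; the paths and the 300-second deadline are arbitrary) that runs cbbackup with a deadline and kills it if it never exits:

import subprocess, threading

def run_with_timeout(cmd, seconds):
    # start the process and arm a timer that kills it if it overruns the deadline
    proc = subprocess.Popen(cmd)
    timer = threading.Timer(seconds, proc.kill)
    timer.start()
    try:
        return proc.wait()
    finally:
        timer.cancel()

rc = run_with_timeout(['/opt/couchbase/bin/cbbackup',
                       'http://localhost:8091', '/backup'], 300)
if rc != 0:
    print 'cbbackup did not exit cleanly (rc=%s); likely hung and was killed' % rc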

 Comments   
Comment by Mike Wiederhold [ 14/Mar/14 ]
Appears to be an ep-engine issue. I will investigate it further.
Comment by Mike Wiederhold [ 26/Mar/14 ]
This issue is about the backup tool hanging and I resolved that issue. Please note that I noticed a separate issue after fixing this where incremental backup receives slightly more items than expected. I filed MB-10654 to track this issue and marked it as a blocker.




[MB-10445] Unit test failure: Producer stream request (disk only) (231) Created: 12/Mar/14  Updated: 22/Aug/14  Resolved: 16/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: .master
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Trond Norbye Assignee: Trond Norbye
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
I killed the test after 20 CPU minutes




[MB-10444] Unit test failure: test async vbucket destroy restart (180) Created: 12/Mar/14  Updated: 22/Aug/14  Resolved: 16/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: .master
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Trond Norbye Assignee: Trond Norbye
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Comments   
Comment by Trond Norbye [ 12/Mar/14 ]
I killed the test after spending 1 hour of CPU time




[MB-10879] Rebalance fails sporadically on employee dataset test (make simple-test) Created: 17/Apr/14  Updated: 22/Aug/14  Resolved: 23/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Mike Wiederhold Assignee: Sarath Lakshman
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive logs.tar.gz    
Issue Links:
Duplicate
duplicates MB-10514 During rebalance, UPR stream gets stu... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Fails on the employee dataset test with 64 vbuckets. It doesn't happen frequently; I see it most often on CentOS.

 Comments   
Comment by Mike Wiederhold [ 22/Apr/14 ]
This looks view engine related. See the error message below:

[rebalance:info,2014-04-22T17:58:19.648,n_0@10.5.2.33:<0.3650.0>:ebucketmigrator_srv:process_upstream:976]TAP stream is not doing backfill
[rebalance:info,2014-04-22T17:58:19.653,n_0@10.5.2.33:<0.3650.0>:ebucketmigrator_srv:terminate:727]Skipping close ack for successfull takover

[rebalance:info,2014-04-22T17:58:19.663,n_0@10.5.2.33:<0.3654.0>:janitor_agent:set_vbucket_state:387]Doing vbucket 127 state change: {'n_2@127.0.0.1',active,undefined,undefined}
[ns_server:warn,2014-04-22T17:58:19.693,n_0@10.5.2.33:capi_set_view_manager-default<0.852.0>:capi_set_view_manager:handle_info:302]Remote server node {'capi_ddoc_replication_srv-default','n_2@127.0.0.1'} process down: killed
[ns_server:error,2014-04-22T17:58:19.716,n_0@10.5.2.33:<0.2631.0>:ns_single_vbucket_mover:spawn_and_wait:107]Got unexpected exit signal {'EXIT',<0.2692.0>,
                            {{{{badmatch,
                                {error,
                                 {{function_clause,
                                   [{couch_set_view_group,handle_info,
                                     [{'DOWN',#Ref<13706.0.0.20276>,process,
                                       <13706.1550.0>,normal},
                                      {state,
                                       {"/home/jenkins/couchbase/ns_server/data/n_2/data",
                                        <<"default">>,
                                        {set_view_group,
                                         <<223,177,167,252,146,233,92,11,210,
                                           66,6,189,181,169,123,106>>,
                                         nil,<<"default">>,
                                         <<"_design/test_view-65b578a">>,[],
                                         [{set_view,0,
                                           <<"function (doc) { if(doc.job_title !== undefined) { var myregexp = new RegExp(\"^Senior \"); if(doc.job_title.match(myregexp)){ emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}">>,
                                           undefined,
                                           {mapreduce_view,
                                            [<<"test_view-65b578a">>],
                                            nil,[],[]}}],
                                         nil,nil,
                                         {set_view_index_header,2,0,0,0,0,[],
                                          nil,[],false,[],nil,[],[]},
                                         main,nil,nil,nil,[],mapreduce_view,
                                         ".view",prod,
                                         couch_set_view_stats_prod,0,nil}},
                                       <13706.1394.0>,
                                       {set_view_group,
                                        <<223,177,167,252,146,233,92,11,210,66,
                                          6,189,181,169,123,106>>,
                                        <13706.1345.0>,<<"default">>,
                                        <<"_design/test_view-65b578a">>,[],
                                        [{set_view,0,
                                          <<"function (doc) { if(doc.job_title !== undefined) { var myregexp = new RegExp(\"^Senior \"); if(doc.job_title.match(myregexp)){ emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}">>,
                                          #Ref<13706.0.0.14417>,
                                          {mapreduce_view,
                                           [<<"test_view-65b578a">>],
                                           {btree,<13706.1345.0>,nil,
                                            identity,identity,
                                            #Fun<mapreduce_view.14.84993316>,
                                            #Fun<mapreduce_view.13.84993316>,
                                            7168,6144,true},
                                           [],[]}}],
                                        {btree,<13706.1345.0>,nil,identity,
                                         identity,
                                         #Fun<couch_btree.1.46073246>,
                                         #Fun<couch_set_view_group.15.12362153>,
                                         7168,6144,true},
                                        <13706.1349.0>,
                                        {set_view_index_header,2,128,
                                         9903520314283042199192993792,
                                         340282366911015600335977731165779918848,
                                         0,
                                         [{84,0},
                                          {85,0},
                                          {86,0},
                                          {87,0},
                                          {88,0},
                                          {89,0},
                                          {90,0},
                                          {91,0},
                                          {92,0},
                                          {93,0},
                                          {94,0},
                                          {95,0},
                                          {96,0},
                                          {97,0},
                                          {98,0},
                                          {99,0},
                                          {100,0},
                                          {101,0},
                                          {102,0},
                                          {103,0},
                                          {104,0},
                                          {105,0},
                                          {106,0},
                                          {107,0},
                                          {108,0},
                                          {109,0},
                                          {110,0},
                                          {111,0},
                                          {112,0},
                                          {113,0},
                                          {114,0},
                                          {115,0},
                                          {116,0},
                                          {117,0},
                                          {118,0},
                                          {119,0},
                                          {120,0},
                                          {121,0},
                                          {122,0},
                                          {123,0},
                                          {124,0},
                                          {125,0},
                                          {126,0},
                                          {127,0}],
                                         nil,
                                         [nil],
                                         true,[],nil,[],
                                         [{84,[{0,0}]},
                                          {85,[{0,0}]},
                                          {86,[{0,0}]},
                                          {87,[{0,0}]},
                                          {88,[{0,0}]},
                                          {89,[{0,0}]},
                                          {90,[{0,0}]},
                                          {91,[{0,0}]},
                                          {92,[{0,0}]},
                                          {93,[{0,0}]},
                                          {94,[{0,0}]},
                                          {95,[{0,0}]},
                                          {96,[{0,0}]},
                                          {97,[{0,0}]},
                                          {98,[{0,0}]},
                                          {99,[{0,0}]},
                                          {100,[{0,0}]},
                                          {101,[{0,0}]},
                                          {102,[{0,0}]},
                                          {103,[{0,0}]},
                                          {104,[{0,0}]},
                                          {105,[{0,0}]},
                                          {106,[{0,0}]},
                                          {107,[{0,0}]},
                                          {108,[{0,0}]},
                                          {109,[{0,0}]},
                                          {110,[{0,0}]},
                                          {111,[{0,0}]},
                                          {112,[{0,0}]},
                                          {113,[{0,0}]},
                                          {114,[{0,0}]},
                                          {115,[{0,0}]},
                                          {116,[{0,0}]},
                                          {117,[{0,0}]},
                                          {118,[{0,0}]},
                                          {119,[{0,0}]},
                                          {120,[{0,0}]},
                                          {121,[{0,0}]},
                                          {122,[{0,0}]},
                                          {123,[{0,0}]},
                                          {124,[{0,0}]},
                                          {125,[{0,0}]},
                                          {126,[{0,0}]},
                                          {127,[{0,0}]}]},
                                        main,nil,<13706.1394.0>,nil,
                                        "/home/jenkins/couchbase/ns_server/data/n_2/data/@indexes/default/main_dfb1a7fc92e95c0bd24206bdb5a97b6a.view.1",
                                        mapreduce_view,".view",prod,
                                        couch_set_view_stats_prod,188416,
                                        <13706.1350.0>},
                                       nil,false,not_running,nil,nil,nil,0,[],
                                       nil,false,undefined,true,true,[],[],
                                       {dict,0,16,16,8,80,48,
                                        {[],[],[],[],[],[],[],[],[],[],[],[],
                                         [],[],[],[]},
                                        {{[],[],[],[],[],[],[],[],[],[],[],[],
                                          [],[],[],[]}}},
                                       nil,3000}],
                                     [{file,
                                       "/home/jenkins/couchbase/couchdb/src/couch_set_view/src/couch_set_view_group.erl"},
                                      {line,990}]},
                                    {gen_server,handle_msg,5,
                                     [{file,"gen_server.erl"},{line,597}]},
                                    {proc_lib,init_p_do_apply,3,
                                     [{file,"proc_lib.erl"},{line,227}]}]},
                                  {gen_server,call,
                                   [<13706.1342.0>,
                                    {monitor_partition_update,85,
                                     #Ref<13706.0.0.53221>,<13706.2594.0>},
                                    infinity]}}}},
                               [{capi_set_view_manager,handle_call,3,
                                 [{file,"src/capi_set_view_manager.erl"},
                                  {line,217}]},
                                {gen_server,handle_msg,5,
                                 [{file,"gen_server.erl"},{line,578}]},
                                {gen_server,init_it,6,
                                 [{file,"gen_server.erl"},{line,297}]},
                                {proc_lib,init_p_do_apply,3,
                                 [{file,"proc_lib.erl"},{line,227}]}]},
                              {gen_server,call,
                               ['capi_set_view_manager-default',
                                {wait_index_updated,103},
                                infinity]}},
                             {gen_server,call,
                              [{'janitor_agent-default','n_2@127.0.0.1'},
                               {if_rebalance,<0.2564.0>,
                                {wait_index_updated,124}},
                               infinity]}}}
Comment by Sarath Lakshman [ 23/Apr/14 ]
Is this test run with TAP replication enabled?
Comment by Mike Wiederhold [ 23/Apr/14 ]
Yes, this is with tap replication.
Comment by Sarath Lakshman [ 23/Apr/14 ]
The UPR stream still gets stuck with TAP replication (MB-10514). This is dependent on MB-10514. You can see get_stream_event timeouts in the logs.
Comment by Mike Wiederhold [ 23/Apr/14 ]
Per Sarath's comments I'm closing this as a duplicate of MB-10514.




[MB-10443] Unit test failure: test vbucket compact (174) Created: 12/Mar/14  Updated: 22/Aug/14  Resolved: 16/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: .master
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Trond Norbye Assignee: Trond Norbye
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown




[MB-10336] item count inconsistencies after expired items purged Created: 03/Mar/14  Updated: 22/Aug/14  Resolved: 12/Mar/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Tommie McAfee Assignee: Tommie McAfee
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive dgmexpire.zip    
Triage: Triaged
Is this a Regression?: Yes

 Description   
1) Created a 2GB bucket
2) Loaded items with a 15s TTL for about 10 mins.
       * active items were at 50% at the end of the load
 
The compactor ran and the doc count dropped from 4 million to 12k. I waited a minute, ran the compactor again, and the doc count remained 12k.

checked cbstats:
 curr_items: 122317
 vb_active_meta_data_memory: 500388

Checked if docs were still on disk with couch_dbinfo but all the vbuckets return 'no documents'. Tried restarting Couchbase to see if the docs were in memory but the item count was still 12k.

Querying docs
http://localhost:8091/pools/default/buckets/default/docs
...
{"id":"0540DAF5-51_1000294","key":"0540DAF5-51_1000294","value":{"rev":"21-0039f8a3e2c028275314ded900000000"}},
{"id":"0540DAF5-51_1000476","key":"0540DAF5-51_1000476","value":{"rev":"21-0039f8a3e399644f5314ded900000000"}},
{"id":"0540DAF5-51_1002821","key":"0540DAF5-51_1002821","value":{"rev":"21-0039f8a423c58e0a5314deda00000000"}},
{"id":"0540DAF5-51_1003125","key":"0540DAF5-51_1003125","value":{"rev":"21-0039f8a4317dacd25314deda00000000"}},
{"id":"0540DAF5-51_1003657","key":"0540DAF5-51_1003657","value":{"rev":"21-0039f8a437c1d89a5314deda00000000"}},
{"id":"0540DAF5-51_1005215","key":"0540DAF5-51_1005215","value":{"rev":"21-0039f8a458b8ed925314dedb00000000"}},

However, trying to fetch one of these keys with an mc client I get:

couchbase.exceptions.NotFoundError: <Key=u'0540DAF5-51_1000294', RC=0xD[No such key], Operational Error, Results=1, C Source=(src/multiresult.c,286)>
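
As a cross-check, a minimal sketch (not from this ticket; it assumes the 2.x-era Python couchbase SDK, the default bucket and no REST credentials, so adjust as needed) that lists keys from the docs endpoint and tries to fetch each one directly:

import requests
from couchbase import Couchbase
from couchbase.exceptions import NotFoundError

cb = Couchbase.connect(host='localhost', port=8091, bucket='default')
resp = requests.get('http://localhost:8091/pools/default/buckets/default/docs',
                    params={'limit': 100})
for row in resp.json().get('rows', []):
    key = row['id']
    try:
        cb.get(key)
    except NotFoundError:
        # listed by the docs endpoint but no longer fetchable via memcached
        print 'stale key:', key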

Attached the log of this run as well.

-Tommie







 Comments   
Comment by Tommie McAfee [ 03/Mar/14 ]
FYI, I ran this scenario against 2.5 (build-1062) and initially the expired items remained. I think it was waiting for the expiry pager to run, but after restarting couchbase-server the item count goes to 0.

Comment by Sundar Sridharan [ 05/Mar/14 ]
hi Tommie, could you also elaborate on which client/script you used to reproduce this issue?
thanks
Comment by Tommie McAfee [ 07/Mar/14 ]
Using systest loader in standalone mode: https://github.com/couchbase/testrunner/tree/master/pysystests

python cbsystest.py run workload --standalone --hosts 127.0.0.1:9000 --create 100 --expire 100 --ops 40000


However I just tried to repro today and wasn't able to...
Comment by Maria McDuff (Inactive) [ 11/Mar/14 ]
Tommie,

please keep an eye on this and re-open if it persists again.




[MB-10236] Items are not deleted completely from the bucket Created: 17/Feb/14  Updated: 22/Aug/14  Resolved: 01/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Kirill Safonov Assignee: David Liao
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 6.3 / 6.4 64 bit

Issue Links:
Relates to
Triage: Untriaged
Operating System: Centos 64-bit

 Description   
We've got a Couchbase cluster running on 6 nodes (2.2.0 community edition, build-821). One of the buckets is used for temporary data, i.e. almost every created item is deleted within the next couple of hours. Currently the bucket seems to have lots of stale objects in it:
1) cbbackup tool reports status like "###################] 44005.4% (4923329/11188 msgs)"
(note: 11188 is the number that is reported by web UI)
2) A Java client TAP dump reports lots of keys that are really missing in the database (which the Java client get operation returns null for)

Bucket config: type: couchbase, Per node RAM quota: 300 Mb, No replicas, No flush, default auto-compaction, 3 reader/writer workers

 Comments   
Comment by David Liao [ 10/Mar/14 ]
can we have a retest and save the logs?
Comment by David Liao [ 01/Apr/14 ]
can't reproduce.




[MB-10233] ep-engine fails to initialize bucket with 2 vbuckets Created: 17/Feb/14  Updated: 22/Aug/14  Resolved: 23/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Aleksey Kondratenko Assignee: Abhinav Dangeti
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged

 Description   
Appears to be a regression. I did successful tests with 2 vbuckets just last Friday. Today I'm getting this:

{bad_return_value,
                      {stop,
                       {ensure_bucket_failed,
                        {error,
                         {bucket_create_error,
                          {memcached_error,not_stored,
                           <<"Failed to initialize instance. Error code: 0\n">>}}}}}},

Supporting 2 vbuckets is handy for some mega-bucket cases as well as XDCR testing.

 Comments   
Comment by Maria McDuff (Inactive) [ 08/Apr/14 ]
Venu,

Can you please check if this issue is still happening with the latest 3.0 build?
Comment by Abhinav Dangeti [ 16/Apr/14 ]
The number of vbuckets should at least match the number of shards; therefore the minimum allowed vbucket count in 3.0 is going to be 4.




[MB-10081] getMeta unit test failed sporadically Created: 30/Jan/14  Updated: 22/Aug/14  Resolved: 06/Feb/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Chiyoung Seo Assignee: David Liao
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
[chiyoung@cen-0411 ep-engine]$ export EP_TEST_NUM=189

[chiyoung@cen-0411 ep-engine]$ ../memcached/engine_testapp -E ep.so -T ep_testsuite.so -e

"flushall_enabled=true;ht_size=13;ht_locks=7"
Running [0000/0001]: mb-4314 (couchstore).../home/chiyoung/couchbase/cmake/ep-engine/tests/ep_testsuite.cc:434 Test failed: `Expected get_meta call to be successful' (ret == ENGINE_SUCCESS)
 DIED
# Passed 0 of 1 tests

 Comments   
Comment by Chiyoung Seo [ 03/Feb/14 ]
Found this issue is caused by the same issue as https://www.couchbase.com/issues/browse/MB-9950




[MB-10038] ep-engine dies with segmentation fault on warmup Created: 27/Jan/14  Updated: 22/Aug/14  Resolved: 28/Feb/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Abhinav Dangeti
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
The bucket has a few million items and I'm pretty sure a few previous warmups worked fine. But this time (perhaps just luck, or maybe there's some disk state that's damaged) I'm getting continuous crashes:

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
Core was generated by `/root/src/altoros/moxi/repo30/install/bin/memcached.orig -u root -X /root/src/a'.
Program terminated with signal 11, Segmentation fault.
#0 0xf29ae56a in cJSON_GetObjectItem (object=0x0, string=0xf2b0187b "state") at /root/src/altoros/moxi/repo30/cmake/libvbucket/src/cJSON.c:452
452 cJSON *cJSON_GetObjectItem(cJSON *object,const char *string) {cJSON *c=object->child; while (c && cJSON_strcasecmp(c->string,string)) c=c->next; return c;}
(gdb) bt
#0 0xf29ae56a in cJSON_GetObjectItem (object=0x0, string=0xf2b0187b "state") at /root/src/altoros/moxi/repo30/cmake/libvbucket/src/cJSON.c:452
#1 0xf2acf29c in CouchKVStore::readVBState (db=0x8e72480, vbId=0, vbState=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/couch-kvstore/couch-kvstore.cc:1812
#2 0xf2ac8cf3 in CouchKVStore::listPersistedVbuckets (this=0xbfae1a0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/couch-kvstore/couch-kvstore.cc:551
#3 0xf2a3ac4e in EventuallyPersistentStore::loadVBucketState (this=0x92a42c0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/ep.cc:2636
#4 0xf2ab5604 in Warmup::initialize (this=0xbf6c1b0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/warmup.cc:439
#5 0xf2ab8230 in WarmupInitialize::run (this=0xbf4edc0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/warmup.h:220
#6 0xf2a7bc6a in ExecutorThread::run (this=0xbf4e550) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:93
#7 0xf2a7b71c in launch_executor_thread (arg=0xbf4e550) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#8 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e340) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#9 0xf76bccf1 in start_thread (arg=0xeeb4fb40) at pthread_create.c:311
#10 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131
(gdb) thread app all bt

Thread 27 (Thread 0xe9344b40 (LWP 22038)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76bee24 in _L_lock_770 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bec63 in __GI___pthread_mutex_lock (mutex=0xbf68a34) at pthread_mutex_lock.c:64
#4 0xf7774194 in cb_mutex_enter (mutex=0xbf68a34) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:85
#5 0xf2a7af04 in Mutex::acquire (this=0xbf68a30) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/mutex.cc:32
#6 0xf2a1a659 in LockHolder::lock (this=0xe9344198) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:66
#7 0xf2a1a606 in LockHolder::LockHolder (this=0xe9344198, m=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:44
#8 0xf2a6ddec in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:153
#9 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=1 '\001') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#10 0xf2a7bb08 in ExecutorThread::run (this=0xbf4eaa0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#11 0xf2a7b71c in launch_executor_thread (arg=0xbf4eaa0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#12 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e3c0) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#13 0xf76bccf1 in start_thread (arg=0xe9344b40) at pthread_create.c:311
#14 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 26 (Thread 0xec34ab40 (LWP 22032)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76c62f0 in _L_cond_lock_773 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76c6061 in __pthread_mutex_cond_lock (mutex=0xbf68a34) at ../nptl/pthread_mutex_lock.c:64
#4 0xf76c0c7c in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:364
#5 0xf7774442 in cb_cond_timedwait (cond=0xbf68a54, mutex=0xbf68a34, ms=2000) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:156
#6 0xf2a6ff21 in SyncObject::wait (this=0xbf68a30, tv=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/syncobject.h:74
#7 0xf2a6def7 in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:168
#8 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=1 '\001') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#9 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e7d0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#10 0xf2a7b71c in launch_executor_thread (arg=0xbf4e7d0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#11 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e318) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#12 0xf76bccf1 in start_thread (arg=0xec34ab40) at pthread_create.c:311
#13 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 25 (Thread 0xee34eb40 (LWP 22028)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76bee24 in _L_lock_770 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bec63 in __GI___pthread_mutex_lock (mutex=0xbf68a34) at pthread_mutex_lock.c:64
#4 0xf7774194 in cb_mutex_enter (mutex=0xbf68a34) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:85
#5 0xf2a7af04 in Mutex::acquire (this=0xbf68a30) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/mutex.cc:32
#6 0xf2a1a659 in LockHolder::lock (this=0xee34e198) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:66
#7 0xf2a1a606 in LockHolder::LockHolder (this=0xee34e198, m=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:44
#8 0xf2a6ddec in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:153
#9 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=2 '\002') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#10 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e500) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#11 0xf2a7b71c in launch_executor_thread (arg=0xbf4e500) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#12 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e338) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#13 0xf76bccf1 in start_thread (arg=0xee34eb40) at pthread_create.c:311
#14 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 24 (Thread 0xecb4bb40 (LWP 22031)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76c62f0 in _L_cond_lock_773 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76c6061 in __pthread_mutex_cond_lock (mutex=0xbf68a34) at ../nptl/pthread_mutex_lock.c:64
#4 0xf76c0c7c in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:364
#5 0xf7774442 in cb_cond_timedwait (cond=0xbf68a54, mutex=0xbf68a34, ms=2000) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:156
#6 0xf2a6ff21 in SyncObject::wait (this=0xbf68a30, tv=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/syncobject.h:74
#7 0xf2a6def7 in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:168
#8 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=1 '\001') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
---Type <return> to continue, or q <return> to quit---
#9 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e820) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#10 0xf2a7b71c in launch_executor_thread (arg=0xbf4e820) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#11 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e320) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#12 0xf76bccf1 in start_thread (arg=0xecb4bb40) at pthread_create.c:311
#13 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 23 (Thread 0xf4b71b40 (LWP 22019)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75e8656 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#2 0xf76f20d0 in ?? () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#3 0xf76dbb93 in event_base_loop () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#4 0x0805ec59 in worker_libevent (arg=0xbf1e9a4) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/thread.c:285
#5 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e0f0) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#6 0xf76bccf1 in start_thread (arg=0xf4b71b40) at pthread_create.c:311
#7 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 22 (Thread 0xed34cb40 (LWP 22030)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76c62f0 in _L_cond_lock_773 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76c6061 in __pthread_mutex_cond_lock (mutex=0xbf68a34) at ../nptl/pthread_mutex_lock.c:64
#4 0xf76c0c7c in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:364
#5 0xf7774442 in cb_cond_timedwait (cond=0xbf68a54, mutex=0xbf68a34, ms=2000) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:156
#6 0xf2a6ff21 in SyncObject::wait (this=0xbf68a30, tv=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/syncobject.h:74
#7 0xf2a6def7 in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:168
#8 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=1 '\001') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#9 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e870) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#10 0xf2a7b71c in launch_executor_thread (arg=0xbf4e870) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#11 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e328) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#12 0xf76bccf1 in start_thread (arg=0xed34cb40) at pthread_create.c:311
#13 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 21 (Thread 0xebb49b40 (LWP 22033)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76bee24 in _L_lock_770 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bec63 in __GI___pthread_mutex_lock (mutex=0xbf68ad0) at pthread_mutex_lock.c:64
#4 0xf7774194 in cb_mutex_enter (mutex=0xbf68ad0) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:85
#5 0xf2a7af04 in Mutex::acquire (this=0xbf68acc) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/mutex.cc:32
#6 0xf2a1a659 in LockHolder::lock (this=0xebb49154) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:66
#7 0xf2a1a606 in LockHolder::LockHolder (this=0xebb49154, m=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:44
#8 0xf2a6e72a in ExecutorPool::snooze (this=0xbf68a20, taskId=8, tosleep=2) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:281
#9 0xf2a1ccf4 in BgFetcher::run (this=0xbf84230, tid=8) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/bgfetcher.cc:157
#10 0xf2a9e72a in BgFetcherTask::run (this=0xbf4e8c0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/tasks.cc:103
#11 0xf2a7bc6a in ExecutorThread::run (this=0xbf4e780) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:93
#12 0xf2a7b71c in launch_executor_thread (arg=0xbf4e780) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#13 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e310) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#14 0xf76bccf1 in start_thread (arg=0xebb49b40) at pthread_create.c:311
#15 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 20 (Thread 0xf4370b40 (LWP 22020)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75e8656 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#2 0xf76f20d0 in ?? () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#3 0xf76dbb93 in event_base_loop () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#4 0x0805ec59 in worker_libevent (arg=0xbf1ea30) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/thread.c:285
#5 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e0e8) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#6 0xf76bccf1 in start_thread (arg=0xf4370b40) at pthread_create.c:311
#7 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 19 (Thread 0xea346b40 (LWP 22036)):
---Type <return> to continue, or q <return> to quit---
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76bee24 in _L_lock_770 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bec63 in __GI___pthread_mutex_lock (mutex=0xbf68a34) at pthread_mutex_lock.c:64
#4 0xf7774194 in cb_mutex_enter (mutex=0xbf68a34) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:85
#5 0xf2a7af04 in Mutex::acquire (this=0xbf68a30) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/mutex.cc:32
#6 0xf2a1a659 in LockHolder::lock (this=0xea346198) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:66
#7 0xf2a1a606 in LockHolder::LockHolder (this=0xea346198, m=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:44
#8 0xf2a6ddec in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:153
#9 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=1 '\001') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#10 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e690) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#11 0xf2a7b71c in launch_executor_thread (arg=0xbf4e690) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#12 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e3d0) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#13 0xf76bccf1 in start_thread (arg=0xea346b40) at pthread_create.c:311
#14 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 18 (Thread 0xe9b45b40 (LWP 22037)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76bee24 in _L_lock_770 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bec63 in __GI___pthread_mutex_lock (mutex=0xbf700c4) at pthread_mutex_lock.c:64
#4 0xf7774194 in cb_mutex_enter (mutex=0xbf700c4) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:85
#5 0xf2a7af04 in Mutex::acquire (this=0xbf700c0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/mutex.cc:32
#6 0xf2a1a659 in LockHolder::lock (this=0xe9b451e8) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:66
#7 0xf2a1a606 in LockHolder::LockHolder (this=0xe9b451e8, m=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:44
#8 0xf2a9fb8d in TaskQueue::reschedule (this=0xbf700c0, task=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/taskqueue.cc:139
#9 0xf2a7bcf1 in ExecutorThread::run (this=0xbf4eaf0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:100
#10 0xf2a7b71c in launch_executor_thread (arg=0xbf4eaf0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#11 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e3c8) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#12 0xf76bccf1 in start_thread (arg=0xe9b45b40) at pthread_create.c:311
#13 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 17 (Thread 0xf3b6fb40 (LWP 22021)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75e8656 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#2 0xf76f20d0 in ?? () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#3 0xf76dbb93 in event_base_loop () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#4 0x0805ec59 in worker_libevent (arg=0xbf1eabc) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/thread.c:285
#5 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e0e0) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#6 0xf76bccf1 in start_thread (arg=0xf3b6fb40) at pthread_create.c:311
#7 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 16 (Thread 0xf0b53b40 (LWP 22023)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75b1896 in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#2 0xf75e080d in usleep (useconds=250000) at ../sysdeps/unix/sysv/linux/usleep.c:32
#3 0xf2a79dc5 in updateStatsThread (arg=0x8e62500) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/memory_tracker.cc:36
#4 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e188) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#5 0xf76bccf1 in start_thread (arg=0xf0b53b40) at pthread_create.c:311
#6 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 15 (Thread 0xedb4db40 (LWP 22029)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76bee24 in _L_lock_770 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bec63 in __GI___pthread_mutex_lock (mutex=0xbf68a34) at pthread_mutex_lock.c:64
#4 0xf7774194 in cb_mutex_enter (mutex=0xbf68a34) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:85
#5 0xf2a7af04 in Mutex::acquire (this=0xbf68a30) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/mutex.cc:32
#6 0xf2a1a659 in LockHolder::lock (this=0xedb4d198) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:66
#7 0xf2a1a606 in LockHolder::LockHolder (this=0xedb4d198, m=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:44
#8 0xf2a6ddec in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:153
---Type <return> to continue, or q <return> to quit---
#9 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=2 '\002') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#10 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e4b0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#11 0xf2a7b71c in launch_executor_thread (arg=0xbf4e4b0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#12 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e330) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#13 0xf76bccf1 in start_thread (arg=0xedb4db40) at pthread_create.c:311
#14 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 14 (Thread 0xf336eb40 (LWP 22022)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75e8656 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#2 0xf76f20d0 in ?? () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#3 0xf76dbb93 in event_base_loop () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#4 0x0805ec59 in worker_libevent (arg=0xbf1eb48) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/thread.c:285
#5 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e0d8) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#6 0xf76bccf1 in start_thread (arg=0xf336eb40) at pthread_create.c:311
#7 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 13 (Thread 0xefb51b40 (LWP 22025)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76bee24 in _L_lock_770 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bec63 in __GI___pthread_mutex_lock (mutex=0xbf700c4) at pthread_mutex_lock.c:64
#4 0xf7774194 in cb_mutex_enter (mutex=0xbf700c4) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:85
#5 0xf2a7af04 in Mutex::acquire (this=0xbf700c0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/mutex.cc:32
#6 0xf2a1a659 in LockHolder::lock (this=0xefb511e8) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:66
#7 0xf2a1a606 in LockHolder::LockHolder (this=0xefb511e8, m=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:44
#8 0xf2a9fb8d in TaskQueue::reschedule (this=0xbf700c0, task=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/taskqueue.cc:139
#9 0xf2a7bcf1 in ExecutorThread::run (this=0xbf4e5f0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:100
#10 0xf2a7b71c in launch_executor_thread (arg=0xbf4e5f0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#11 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e358) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#12 0xf76bccf1 in start_thread (arg=0xefb51b40) at pthread_create.c:311
#13 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 12 (Thread 0xe8b43b40 (LWP 22039)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c0ad3 in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:124
#2 0xf7774442 in cb_cond_timedwait (cond=0xbf68a54, mutex=0xbf68a34, ms=189) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:156
#3 0xf2a6ff21 in SyncObject::wait (this=0xbf68a30, tv=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/syncobject.h:74
#4 0xf2a6df11 in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:170
#5 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=4 '\004') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#6 0xf2a7bb08 in ExecutorThread::run (this=0xbf4ea50) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#7 0xf2a7b71c in launch_executor_thread (arg=0xbf4ea50) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#8 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e3b8) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#9 0xf76bccf1 in start_thread (arg=0xe8b43b40) at pthread_create.c:311
#10 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 11 (Thread 0xf0352b40 (LWP 22024)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76c62f0 in _L_cond_lock_773 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76c6061 in __pthread_mutex_cond_lock (mutex=0xbf68a34) at ../nptl/pthread_mutex_lock.c:64
#4 0xf76c0c7c in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:364
#5 0xf7774442 in cb_cond_timedwait (cond=0xbf68a54, mutex=0xbf68a34, ms=2000) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:156
#6 0xf2a6ff21 in SyncObject::wait (this=0xbf68a30, tv=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/syncobject.h:74
#7 0xf2a6def7 in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:168
#8 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=2 '\002') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#9 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e640) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#10 0xf2a7b71c in launch_executor_thread (arg=0xbf4e640) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#11 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e350) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#12 0xf76bccf1 in start_thread (arg=0xf0352b40) at pthread_create.c:311
#13 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 10 (Thread 0xeab47b40 (LWP 22035)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c3462 in __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:386
#2 0xf76c0d20 in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:510
#3 0xf7774442 in cb_cond_timedwait (cond=0xbf68a54, mutex=0xbf68a34, ms=2000) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:156
#4 0xf2a6ff21 in SyncObject::wait (this=0xbf68a30, tv=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/syncobject.h:74
#5 0xf2a6def7 in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:168
#6 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=1 '\001') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#7 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e6e0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#8 0xf2a7b71c in launch_executor_thread (arg=0xbf4e6e0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#9 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e300) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#10 0xf76bccf1 in start_thread (arg=0xeab47b40) at pthread_create.c:311
#11 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 9 (Thread 0xef350b40 (LWP 22026)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76c62f0 in _L_cond_lock_773 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76c6061 in __pthread_mutex_cond_lock (mutex=0xbf68a34) at ../nptl/pthread_mutex_lock.c:64
#4 0xf76c0c7c in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:364
#5 0xf7774442 in cb_cond_timedwait (cond=0xbf68a54, mutex=0xbf68a34, ms=2000) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:156
#6 0xf2a6ff21 in SyncObject::wait (this=0xbf68a30, tv=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/syncobject.h:74
#7 0xf2a6def7 in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:168
#8 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=4 '\004') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#9 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e5a0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#10 0xf2a7b71c in launch_executor_thread (arg=0xbf4e5a0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#11 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e348) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#12 0xf76bccf1 in start_thread (arg=0xef350b40) at pthread_create.c:311
#13 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 8 (Thread 0xeb348b40 (LWP 22034)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c33a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:144
#2 0xf76bee24 in _L_lock_770 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bec63 in __GI___pthread_mutex_lock (mutex=0xbf68a34) at pthread_mutex_lock.c:64
#4 0xf7774194 in cb_mutex_enter (mutex=0xbf68a34) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:85
#5 0xf2a7af04 in Mutex::acquire (this=0xbf68a30) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/mutex.cc:32
#6 0xf2a1a659 in LockHolder::lock (this=0xeb348198) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:66
#7 0xf2a1a606 in LockHolder::LockHolder (this=0xeb348198, m=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/locks.h:44
#8 0xf2a6ddec in ExecutorPool::trySleep (this=0xbf68a20, t=..., now=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:153
#9 0xf2a6dd89 in ExecutorPool::nextTask (this=0xbf68a20, t=..., tick=1 '\001') at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorpool.cc:144
#10 0xf2a7bb08 in ExecutorThread::run (this=0xbf4e730) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:77
#11 0xf2a7b71c in launch_executor_thread (arg=0xbf4e730) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#12 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e308) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#13 0xf76bccf1 in start_thread (arg=0xeb348b40) at pthread_create.c:311
#14 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 7 (Thread 0xf5372b40 (LWP 22018)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75e8656 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#2 0xf76f20d0 in ?? () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#3 0xf76dbb93 in event_base_loop () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#4 0x0805ec59 in worker_libevent (arg=0xbf1e918) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/thread.c:285
#5 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e0f8) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#6 0xf76bccf1 in start_thread (arg=0xf5372b40) at pthread_create.c:311
#7 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 6 (Thread 0xf6374b40 (LWP 22016)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c07ab in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:187
#2 0xf77742de in cb_cond_wait (cond=0xf6373aac, mutex=0xf6373a8c) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:119
#3 0xf2a3c864 in SyncObject::wait (this=0xf6373a88) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/syncobject.h:39
#4 0xf2a40c10 in WarmupWaitListener::wait (this=0xf6373a7c) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/ep.cc:301
#5 0xf2a30d66 in EventuallyPersistentStore::initialize (this=0x92a42c0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/ep.cc:331
#6 0xf2a503e1 in EventuallyPersistentEngine::initialize (this=0xbf7c000,
    config=0xbf5603a "ht_size=3079;ht_locks=5;tap_noop_interval=20;max_txn_size=10000;max_size=943718400;tap_keepalive=300;dbname=/root/src/altoros/moxi/ns_server/data/n_1/data/default;allow_data_loss_during_shutdown=true;"...)
    at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/ep_engine.cc:1792
#7 0xf2a4b534 in EvpInitialize (handle=0xbf7c000,
    config_str=0xbf5603a "ht_size=3079;ht_locks=5;tap_noop_interval=20;max_txn_size=10000;max_size=943718400;tap_keepalive=300;dbname=/root/src/altoros/moxi/ns_server/data/n_1/data/default;allow_data_loss_during_shutdown=true;"...)
    at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/ep_engine.cc:129
#8 0xf7746e6a in create_bucket_UNLOCKED (e=0xf77517c0 <bucket_engine>, bucket_name=0x8e5e120 "default", path=0xbf56000 "/root/src/altoros/moxi/repo30/install/lib/memcached/ep.so",
    config=0xbf5603a "ht_size=3079;ht_locks=5;tap_noop_interval=20;max_txn_size=10000;max_size=943718400;tap_keepalive=300;dbname=/root/src/altoros/moxi/ns_server/data/n_1/data/default;allow_data_loss_during_shutdown=true;"...,
    e_out=0x0, msg=0xf6373c4c "", msglen=1024) at /root/src/altoros/moxi/repo30/cmake/memcached/engines/bucket_engine/bucket_engine.c:812
#9 0xf774b156 in handle_create_bucket (handle=0xf77517c0 <bucket_engine>, cookie=0xbec8000, request=0xbf0e800, response=0x8050b2b <binary_response_handler>)
    at /root/src/altoros/moxi/repo30/cmake/memcached/engines/bucket_engine/bucket_engine.c:2676
#10 0xf774c232 in bucket_unknown_command (handle=0xf77517c0 <bucket_engine>, cookie=0xbec8000, request=0xbf0e800, response=0x8050b2b <binary_response_handler>)
    at /root/src/altoros/moxi/repo30/cmake/memcached/engines/bucket_engine/bucket_engine.c:3016
#11 0x08051ab9 in default_unknown_command (descriptor=0x0, handle=0xf77517c0 <bucket_engine>, cookie=0xbec8000, request=0xbf0e800, response=0x8050b2b <binary_response_handler>)
    at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/memcached.c:2704
#12 0x08051b7a in process_bin_unknown_packet (c=0xbec8000) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/memcached.c:2737
#13 0x08055923 in process_bin_packet (c=0xbec8000) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/memcached.c:4397
#14 0x08056e35 in complete_nread (c=0xbec8000) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/memcached.c:4969
#15 0x08059896 in conn_nread (c=0xbec8000) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/memcached.c:5842
#16 0x0805a315 in event_handler (fd=47, which=2, arg=0xbec8000) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/memcached.c:6098
#17 0xf76dbe52 in event_base_loop () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#18 0x0805ec59 in worker_libevent (arg=0xbf1e800) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/thread.c:285
#19 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e108) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#20 0xf76bccf1 in start_thread (arg=0xf6374b40) at pthread_create.c:311
#21 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 5 (Thread 0xf73abb40 (LWP 22014)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75d74bb in read () at ../sysdeps/unix/syscall-template.S:81
#2 0xf756a52b in _IO_new_file_underflow (fp=0xf76a4ac0 <_IO_2_1_stdin_>) at fileops.c:613
#3 0xf756b459 in __GI__IO_default_uflow (fp=0xf76a4ac0 <_IO_2_1_stdin_>) at genops.c:436
#4 0xf756b28b in __GI___uflow (fp=fp@entry=0xf76a4ac0 <_IO_2_1_stdin_>) at genops.c:390
#5 0xf755ee8a in __GI__IO_getline_info (fp=fp@entry=0xf76a4ac0 <_IO_2_1_stdin_>, buf=buf@entry=0xf73ab2a0 "", n=n@entry=79, delim=delim@entry=10, extract_delim=extract_delim@entry=1, eof=eof@entry=0x0) at iogetline.c:69
#6 0xf755efc7 in __GI__IO_getline (fp=fp@entry=0xf76a4ac0 <_IO_2_1_stdin_>, buf=buf@entry=0xf73ab2a0 "", n=n@entry=79, delim=delim@entry=10, extract_delim=extract_delim@entry=1) at iogetline.c:38
#7 0xf755decf in _IO_fgets (buf=0xf73ab2a0 "", n=80, fp=0xf76a4ac0 <_IO_2_1_stdin_>) at iofgets.c:56
#8 0xf77708b8 in check_stdin_thread (arg=0x805c552 <shutdown_server>) at /root/src/altoros/moxi/repo30/cmake/memcached/extensions/daemon/stdin_check.c:38
#9 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e018) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#10 0xf76bccf1 in start_thread (arg=0xf73abb40) at pthread_create.c:311
#11 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 4 (Thread 0xf6baab40 (LWP 22015)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf76c3462 in __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:386
#2 0xf76bfc57 in _L_unlock_670 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76bfb9a in __pthread_mutex_unlock_usercnt (mutex=0xf776f83c <mutex>, decr=0) at pthread_mutex_unlock.c:52
#4 0xf76c0a83 in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:89
#5 0xf7774442 in cb_cond_timedwait (cond=0xf776f860 <cond>, mutex=0xf776f83c <mutex>, ms=19000) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:156
#6 0xf776d74a in logger_thead_main (arg=0x8e62120) at /root/src/altoros/moxi/repo30/cmake/memcached/extensions/loggers/file_logger.c:363
#7 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e010) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#8 0xf76bccf1 in start_thread (arg=0xf6baab40) at pthread_create.c:311
#9 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 3 (Thread 0xf73ad700 (LWP 22012)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75e8656 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#2 0xf76f20d0 in ?? () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#3 0xf76dbb93 in event_base_loop () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#4 0x0805dfe4 in main (argc=21, argv=0xffe5b974) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/memcached.c:8050

Thread 2 (Thread 0xf5b73b40 (LWP 22017)):
#0 0xf7785430 in __kernel_vsyscall ()
#1 0xf75e8656 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#2 0xf76f20d0 in ?? () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#3 0xf76dbb93 in event_base_loop () from /usr/lib/i386-linux-gnu/libevent_core-2.0.so.5
#4 0x0805ec59 in worker_libevent (arg=0xbf1e88c) at /root/src/altoros/moxi/repo30/cmake/memcached/daemon/thread.c:285
#5 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e100) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#6 0xf76bccf1 in start_thread (arg=0xf5b73b40) at pthread_create.c:311
#7 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131

Thread 1 (Thread 0xeeb4fb40 (LWP 22027)):
#0 0xf29ae56a in cJSON_GetObjectItem (object=0x0, string=0xf2b0187b "state") at /root/src/altoros/moxi/repo30/cmake/libvbucket/src/cJSON.c:452
#1 0xf2acf29c in CouchKVStore::readVBState (db=0x8e72480, vbId=0, vbState=...) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/couch-kvstore/couch-kvstore.cc:1812
#2 0xf2ac8cf3 in CouchKVStore::listPersistedVbuckets (this=0xbfae1a0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/couch-kvstore/couch-kvstore.cc:551
#3 0xf2a3ac4e in EventuallyPersistentStore::loadVBucketState (this=0x92a42c0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/ep.cc:2636
#4 0xf2ab5604 in Warmup::initialize (this=0xbf6c1b0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/warmup.cc:439
#5 0xf2ab8230 in WarmupInitialize::run (this=0xbf4edc0) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/warmup.h:220
#6 0xf2a7bc6a in ExecutorThread::run (this=0xbf4e550) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:93
#7 0xf2a7b71c in launch_executor_thread (arg=0xbf4e550) at /root/src/altoros/moxi/repo30/cmake/ep-engine/src/executorthread.cc:33
#8 0xf7773fc3 in platform_thread_wrap (arg=0x8e5e340) at /root/src/altoros/moxi/repo30/cmake/platform/src/cb_pthreads.c:19
#9 0xf76bccf1 in start_thread (arg=0xeeb4fb40) at pthread_create.c:311
#10 0xf75e7c3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:131
(gdb)


 Comments   
Comment by Chiyoung Seo [ 27/Jan/14 ]
Mike,

Seems like a regression from the change that you merged today:

http://review.couchbase.org/#/c/32821/
Comment by Abhinav Dangeti [ 17/Feb/14 ]
I believe the fix for this is already in:
http://review.couchbase.org/#/c/33050/




[MB-9939] ep worker stats unit test sporadically fails on centos (description is truncated) Created: 16/Jan/14  Updated: 22/Aug/14  Resolved: 06/Feb/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Minor
Reporter: Mike Wiederhold Assignee: Venu Uppalapati
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Centos 64-bit

 Description   
It looks like this test is failing sometimes because the task description is getting truncated. See some of the debugging I printed out below.

Also we need to add the following task to the list:

tasklist.insert("Tap connection notifier");
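
For context, a minimal sketch of the kind of membership check the failing assertion makes (assumed shape only, not the actual ep_testsuite.cc code; the task names here are illustrative): the worker's reported task description must match one of the known task names, so a name missing from the list, or a truncated description such as "Wa", makes the lookup fail.

#include <cassert>
#include <set>
#include <string>

int main() {
    std::set<std::string> tasklist;
    tasklist.insert("Running a flusher loop");
    tasklist.insert("Warmup - initialize");
    tasklist.insert("Snapshotting vbucket states");
    tasklist.insert("Tap connection notifier");   // the entry mentioned above

    // Pretend this value came back from the worker_N:task stat.
    std::string worker_0_task = "Warmup - initialize";

    // A truncated description such as "Wa" or "War" would not be found,
    // and the check would fail as shown in the output below.
    assert(tasklist.find(worker_0_task) != tasklist.end());
    return 0;
}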

nonio_worker_7:log:0:runtime : 22
nonio_worker_7:log:0:starttime : 0
nonio_worker_7:log:0:task : Running a flusher loop: shard 0
nonio_worker_7:log:10:runtime : 17
nonio_worker_7:log:10:starttime : 0
nonio_worker_7:log:10:task : Running a flusher loop: shard 0
nonio_worker_7:log:11:runtime : 14
nonio_worker_7:log:11:starttime : 0
nonio_worker_7:log:11:task : Running a flusher loop: shard 0
nonio_worker_7:log:12:runtime : 46
nonio_worker_7:log:12:starttime : 0
nonio_worker_7:log:12:task : Running a flusher loop: shard 0
nonio_worker_7:log:13:runtime : 57
nonio_worker_7:log:13:starttime : 0
nonio_worker_7:log:13:task : Running a flusher loop: shard 0
nonio_worker_7:log:14:runtime : 10
nonio_worker_7:log:14:starttime : 0
nonio_worker_7:log:14:task : Running a flusher loop: shard 0
nonio_worker_7:log:15:runtime : 14
nonio_worker_7:log:15:starttime : 0
nonio_worker_7:log:15:task : Running a flusher loop: shard 0
nonio_worker_7:log:16:runtime : 13
nonio_worker_7:log:16:starttime : 0
nonio_worker_7:log:16:task : Running a flusher loop: shard 0
nonio_worker_7:log:17:runtime : 13
nonio_worker_7:log:17:starttime : 0
nonio_worker_7:log:17:task : Running a flusher loop: shard 0
nonio_worker_7:log:18:runtime : 26
nonio_worker_7:log:18:starttime : 0
nonio_worker_7:log:18:task : Running a flusher loop: shard 0
nonio_worker_7:log:19:runtime : 12
nonio_worker_7:log:19:starttime : 0
nonio_worker_7:log:19:task : Running a flusher loop: shard 0
nonio_worker_7:log:1:runtime : 8
nonio_worker_7:log:1:starttime : 0
nonio_worker_7:log:1:task : Running a flusher loop: shard 0
nonio_worker_7:log:2:runtime : 33
nonio_worker_7:log:2:starttime : 0
nonio_worker_7:log:2:task : Running a flusher loop: shard 0
nonio_worker_7:log:3:runtime : 31
nonio_worker_7:log:3:starttime : 0
nonio_worker_7:log:3:task : Running a flusher loop: shard 0
nonio_worker_7:log:4:runtime : 54
nonio_worker_7:log:4:starttime : 0
nonio_worker_7:log:4:task : Running a flusher loop: shard 0
nonio_worker_7:log:5:runtime : 14
nonio_worker_7:log:5:starttime : 0
nonio_worker_7:log:5:task : Running a flusher loop: shard 0
nonio_worker_7:log:6:runtime : 9
nonio_worker_7:log:6:starttime : 0
nonio_worker_7:log:6:task : Running a flusher loop: shard 0
nonio_worker_7:log:7:runtime : 51
nonio_worker_7:log:7:starttime : 0
nonio_worker_7:log:7:task : Running a flusher loop: shard 0
nonio_worker_7:log:8:runtime : 19
nonio_worker_7:log:8:starttime : 0
nonio_worker_7:log:8:task : Running a flusher loop: shard 0
nonio_worker_7:log:9:runtime : 10
nonio_worker_7:log:9:starttime : 0
nonio_worker_7:log:9:task : Running a flusher loop: shard 0
nonio_worker_7:runtime : 16172
nonio_worker_7:state : running
nonio_worker_7:task : Not currently running any task
reader_worker_0:log:0:runtime : 7
reader_worker_0:log:0:starttime : 0
reader_worker_0:log:0:task : Batching background fetch
reader_worker_0:log:1:runtime : 4
reader_worker_0:log:1:starttime : 0
reader_worker_0:log:1:task : Batching background fetch
reader_worker_0:log:2:runtime : 532148
reader_worker_0:log:2:starttime : 0
reader_worker_0:log:2:task : Warmup - initialize
reader_worker_0:log:3:runtime : 14
reader_worker_0:log:3:starttime : 0
reader_worker_0:log:3:task : Warmup - estimate database item count
reader_worker_0:log:4:runtime : 93
reader_worker_0:log:4:starttime : 0
reader_worker_0:log:4:task : Wa
reader_worker_0:log:5:runtime : 18
reader_worker_0:log:5:starttime : 0
reader_worker_0:log:5:task : Running a flusher loop: shard 0
reader_worker_0:log:6:runtime : 14
reader_worker_0:log:6:starttime : 0
reader_worker_0:log:6:task :
reader_worker_0:log:7:runtime : 5
reader_worker_0:log:7:starttime : 0
reader_worker_0:log:7:task : Wa
reader_worker_0:log:8:runtime : 4
reader_worker_0:log:8:starttime : 0
reader_worker_0:log:8:task : W
reader_worker_0:runtime : 3464
reader_worker_0:slow:0:runtime : 532148
reader_worker_0:slow:0:starttime : 0
reader_worker_0:slow:0:task : Warmup - initialize
reader_worker_0:state : running
reader_worker_0:task : War
reader_worker_1:log:0:runtime : 21
reader_worker_1:log:0:starttime : 0
reader_worker_1:log:0:task : Running a flusher loop: shard 0
reader_worker_1:log:10:runtime : 98
reader_worker_1:log:10:starttime : 0
reader_worker_1:log:10:task : Running a flusher loop: shard 0
reader_worker_1:log:11:runtime : 17
reader_worker_1:log:11:starttime : 0
reader_worker_1:log:11:task : Running a flusher loop: shard 0
reader_worker_1:log:12:runtime : 20
reader_worker_1:log:12:starttime : 0
reader_worker_1:log:12:task : Running a flusher loop: shard 0
reader_worker_1:log:13:runtime : 53
reader_worker_1:log:13:starttime : 0
reader_worker_1:log:13:task : Running a flusher loop: shard 0
reader_worker_1:log:14:runtime : 83
reader_worker_1:log:14:starttime : 0
reader_worker_1:log:14:task : Running a flusher loop: shard 0
reader_worker_1:log:15:runtime : 25
reader_worker_1:log:15:starttime : 0
reader_worker_1:log:15:task : Running a flusher loop: shard 0
reader_worker_1:log:16:runtime : 20
reader_worker_1:log:16:starttime : 0
reader_worker_1:log:16:task : Running a flusher loop: shard 0
reader_worker_1:log:17:runtime : 175
reader_worker_1:log:17:starttime : 0
reader_worker_1:log:17:task : Warmup - estimate database item count
reader_worker_1:log:18:runtime : 79
reader_worker_1:log:18:starttime : 0
reader_worker_1:log:18:task : Running a flusher loop: shard 0
reader_worker_1:log:19:runtime : 17
reader_worker_1:log:19:starttime : 0
reader_worker_1:log:19:task : Running a flusher loop: shard 0
reader_worker_1:log:1:runtime : 27
reader_worker_1:log:1:starttime : 0
reader_worker_1:log:1:task : Running a flusher loop: shard 0
reader_worker_1:log:2:runtime : 12
reader_worker_1:log:2:starttime : 0
reader_worker_1:log:2:task : Running a flusher loop: shard 0
reader_worker_1:log:3:runtime : 28
reader_worker_1:log:3:starttime : 0
reader_worker_1:log:3:task : Running a flusher loop: shard 0
reader_worker_1:log:4:runtime : 21
reader_worker_1:log:4:starttime : 0
reader_worker_1:log:4:task : Running a flusher loop: shard 0
reader_worker_1:log:5:runtime : 34
reader_worker_1:log:5:starttime : 0
reader_worker_1:log:5:task : Running a flusher loop: shard 0
reader_worker_1:log:6:runtime : 17
reader_worker_1:log:6:starttime : 0
reader_worker_1:log:6:task : Running a flusher loop: shard 0
reader_worker_1:log:7:runtime : 32
reader_worker_1:log:7:starttime : 0
reader_worker_1:log:7:task : Running a flusher loop: shard 0
reader_worker_1:log:8:runtime : 44
reader_worker_1:log:8:starttime : 0
reader_worker_1:log:8:task : Running a flusher loop: shard 0
reader_worker_1:log:9:runtime : 21
reader_worker_1:log:9:starttime : 0
reader_worker_1:log:9:task : Running a flusher loop: shard 0
reader_worker_1:runtime : 368
reader_worker_1:state : running
reader_worker_1:task : Snapshotting vbucket states for the shard: 2
reader_worker_2:log:0:runtime : 47
reader_worker_2:log:0:starttime : 0
reader_worker_2:log:0:task : Running a flusher loop: shard 0
reader_worker_2:log:10:runtime : 17
reader_worker_2:log:10:starttime : 0
reader_worker_2:log:10:task : Running a flusher loop: shard 0
reader_worker_2:log:11:runtime : 18
reader_worker_2:log:11:starttime : 0
reader_worker_2:log:11:task : Running a flusher loop: shard 0
reader_worker_2:log:12:runtime : 38
reader_worker_2:log:12:starttime : 0
reader_worker_2:log:12:task : Running a flusher loop: shard 0
reader_worker_2:log:13:runtime : 26
reader_worker_2:log:13:starttime : 0
reader_worker_2:log:13:task : Running a flusher loop: shard 0
reader_worker_2:log:14:runtime : 23
reader_worker_2:log:14:starttime : 0
reader_worker_2:log:14:task : Running a flusher loop: shard 0
reader_worker_2:log:15:runtime : 16
reader_worker_2:log:15:starttime : 0
reader_worker_2:log:15:task : Running a flusher loop: shard 0
reader_worker_2:log:16:runtime : 12
reader_worker_2:log:16:starttime : 0
reader_worker_2:log:16:task : Running a flusher loop: shard 0
reader_worker_2:log:17:runtime : 19
reader_worker_2:log:17:starttime : 0
reader_worker_2:log:17:task : Running a flusher loop: shard 0
reader_worker_2:log:18:runtime : 39
reader_worker_2:log:18:starttime : 0
reader_worker_2:log:18:task : Running a flusher loop: shard 0
reader_worker_2:log:19:runtime : 10
reader_worker_2:log:19:starttime : 0
reader_worker_2:log:19:task : W
reader_worker_2:log:1:runtime : 19
reader_worker_2:log:1:starttime : 0
reader_worker_2:log:1:task : Running a flusher loop: shard 0
reader_worker_2:log:2:runtime : 94
reader_worker_2:log:2:starttime : 0
reader_worker_2:log:2:task : Running a flusher loop: shard 0
reader_worker_2:log:3:runtime : 14
reader_worker_2:log:3:starttime : 0
reader_worker_2:log:3:task : Running a flusher loop: shard 0
reader_worker_2:log:4:runtime : 26
reader_worker_2:log:4:starttime : 0
reader_worker_2:log:4:task : Running a flusher loop: shard 0
reader_worker_2:log:5:runtime : 28
reader_worker_2:log:5:starttime : 0
reader_worker_2:log:5:task : Running a flusher loop: shard 0
reader_worker_2:log:6:runtime : 9
reader_worker_2:log:6:starttime : 0
reader_worker_2:log:6:task : Running a flusher loop: shard 0
reader_worker_2:log:7:runtime : 19
reader_worker_2:log:7:starttime : 0
reader_worker_2:log:7:task : Running a flusher loop: shard 0
reader_worker_2:log:8:runtime : 24
reader_worker_2:log:8:starttime : 0
reader_worker_2:log:8:task : Running a flusher loop: shard 0
reader_worker_2:log:9:runtime : 12
reader_worker_2:log:9:starttime : 0
reader_worker_2:log:9:task : Running a flusher loop: shard 0
reader_worker_2:runtime : 3287
reader_worker_2:state : running
reader_worker_2:task : Tap connection notifier
writer_worker_3:log:0:runtime : 1486
writer_worker_3:log:0:starttime : 0
writer_worker_3:log:0:task : Running a flusher loop: shard 0
writer_worker_3:log:10:runtime : 38
writer_worker_3:log:10:starttime : 0
writer_worker_3:log:10:task : Running a flusher loop: shard 0
writer_worker_3:log:11:runtime : 7
writer_worker_3:log:11:starttime : 0
writer_worker_3:log:11:task : Running a flusher loop: shard 0
writer_worker_3:log:12:runtime : 6
writer_worker_3:log:12:starttime : 0
writer_worker_3:log:12:task : Running a flusher loop: shard 0
writer_worker_3:log:13:runtime : 7
writer_worker_3:log:13:starttime : 0
writer_worker_3:log:13:task : Running a flusher loop: shard 0
writer_worker_3:log:14:runtime : 8
writer_worker_3:log:14:starttime : 0
writer_worker_3:log:14:task : Running a flusher loop: shard 0
writer_worker_3:log:15:runtime : 12
writer_worker_3:log:15:starttime : 0
writer_worker_3:log:15:task : Running a flusher loop: shard 0
writer_worker_3:log:16:runtime : 9
writer_worker_3:log:16:starttime : 0
writer_worker_3:log:16:task : Running a flusher loop: shard 0
writer_worker_3:log:17:runtime : 17
writer_worker_3:log:17:starttime : 0
writer_worker_3:log:17:task : Running a flusher loop: shard 0
writer_worker_3:log:18:runtime : 108
writer_worker_3:log:18:starttime : 0
writer_worker_3:log:18:task : Running a flusher loop: shard 0
writer_worker_3:log:19:runtime : 67
writer_worker_3:log:19:starttime : 0
writer_worker_3:log:19:task : Running a flusher loop: shard 0
writer_worker_3:log:1:runtime : 11
writer_worker_3:log:1:starttime : 0
writer_worker_3:log:1:task : Running a flusher loop: shard 0
writer_worker_3:log:2:runtime : 7
writer_worker_3:log:2:starttime : 0
writer_worker_3:log:2:task : Running a flusher loop: shard 0
writer_worker_3:log:3:runtime : 6
writer_worker_3:log:3:starttime : 0
writer_worker_3:log:3:task : Running a flusher loop: shard 0
writer_worker_3:log:4:runtime : 7
writer_worker_3:log:4:starttime : 0
writer_worker_3:log:4:task : Running a flusher loop: shard 0
writer_worker_3:log:5:runtime : 7
writer_worker_3:log:5:starttime : 0
writer_worker_3:log:5:task : Running a flusher loop: shard 0
writer_worker_3:log:6:runtime : 11
writer_worker_3:log:6:starttime : 0
writer_worker_3:log:6:task : Running a flusher loop: shard 0
writer_worker_3:log:7:runtime : 8
writer_worker_3:log:7:starttime : 0
writer_worker_3:log:7:task : Running a flusher loop: shard 0
writer_worker_3:log:8:runtime : 7
writer_worker_3:log:8:starttime : 0
writer_worker_3:log:8:task : Running a flusher loop: shard 0
writer_worker_3:log:9:runtime : 7
writer_worker_3:log:9:starttime : 0
writer_worker_3:log:9:task : Running a flusher loop: shard 0
writer_worker_3:runtime : 2097
writer_worker_3:state : running
writer_worker_3:task : Snapshotting vbucket states for the shard: 0
writer_worker_4:log:0:runtime : 20
writer_worker_4:log:0:starttime : 0
writer_worker_4:log:0:task : Running a flusher loop: shard 0
writer_worker_4:log:10:runtime : 27
writer_worker_4:log:10:starttime : 0
writer_worker_4:log:10:task : Running a flusher loop: shard 0
writer_worker_4:log:11:runtime : 26
writer_worker_4:log:11:starttime : 0
writer_worker_4:log:11:task : Running a flusher loop: shard 0
writer_worker_4:log:12:runtime : 38
writer_worker_4:log:12:starttime : 0
writer_worker_4:log:12:task : Running a flusher loop: shard 0
writer_worker_4:log:13:runtime : 9
writer_worker_4:log:13:starttime : 0
writer_worker_4:log:13:task : Running a flusher loop: shard 0
writer_worker_4:log:14:runtime : 22
writer_worker_4:log:14:starttime : 0
writer_worker_4:log:14:task : Running a flusher loop: shard 0
writer_worker_4:log:15:runtime : 13
writer_worker_4:log:15:starttime : 0
writer_worker_4:log:15:task : Running a flusher loop: shard 0
writer_worker_4:log:16:runtime : 19
writer_worker_4:log:16:starttime : 0
writer_worker_4:log:16:task : Running a flusher loop: shard 0
writer_worker_4:log:17:runtime : 24
writer_worker_4:log:17:starttime : 0
writer_worker_4:log:17:task : Running a flusher loop: shard 0
writer_worker_4:log:18:runtime : 194
writer_worker_4:log:18:starttime : 0
writer_worker_4:log:18:task : Running a flusher loop: shard 0
writer_worker_4:log:19:runtime : 870
writer_worker_4:log:19:starttime : 0
writer_worker_4:log:19:task : Running a flusher loop: shard 0
writer_worker_4:log:1:runtime : 15
writer_worker_4:log:1:starttime : 0
writer_worker_4:log:1:task : Running a flusher loop: shard 0
writer_worker_4:log:2:runtime : 11
writer_worker_4:log:2:starttime : 0
writer_worker_4:log:2:task : Running a flusher loop: shard 0
writer_worker_4:log:3:runtime : 66
writer_worker_4:log:3:starttime : 0
writer_worker_4:log:3:task : Running a flusher loop: shard 0
writer_worker_4:log:4:runtime : 37
writer_worker_4:log:4:starttime : 0
writer_worker_4:log:4:task : Running a flusher loop: shard 0
writer_worker_4:log:5:runtime : 17
writer_worker_4:log:5:starttime : 0
writer_worker_4:log:5:task : Running a flusher loop: shard 0
writer_worker_4:log:6:runtime : 17
writer_worker_4:log:6:starttime : 0
writer_worker_4:log:6:task : Running a flusher loop: shard 0
writer_worker_4:log:7:runtime : 7
writer_worker_4:log:7:starttime : 0
writer_worker_4:log:7:task : Running a flusher loop: shard 0
writer_worker_4:log:8:runtime : 33
writer_worker_4:log:8:starttime : 0
writer_worker_4:log:8:task : Running a flusher loop: shard 0
writer_worker_4:log:9:runtime : 17
writer_worker_4:log:9:starttime : 0
writer_worker_4:log:9:task : Running a flusher loop: shard 0
writer_worker_4:runtime : 1492
writer_worker_4:state : running
writer_worker_4:task : Snapshotting vbucket states for the shard: 3
writer_worker_5:log:0:runtime : 11
writer_worker_5:log:0:starttime : 0
writer_worker_5:log:0:task : Running a flusher loop: shard 0
writer_worker_5:log:10:runtime : 11
writer_worker_5:log:10:starttime : 0
writer_worker_5:log:10:task : Running a flusher loop: shard 0
writer_worker_5:log:11:runtime : 17
writer_worker_5:log:11:starttime : 0
writer_worker_5:log:11:task : Running a flusher loop: shard 0
writer_worker_5:log:12:runtime : 19
writer_worker_5:log:12:starttime : 0
writer_worker_5:log:12:task : Running a flusher loop: shard 0
writer_worker_5:log:13:runtime : 57
writer_worker_5:log:13:starttime : 0
writer_worker_5:log:13:task : Running a flusher loop: shard 0
writer_worker_5:log:14:runtime : 13
writer_worker_5:log:14:starttime : 0
writer_worker_5:log:14:task : Running a flusher loop: shard 0
writer_worker_5:log:15:runtime : 18
writer_worker_5:log:15:starttime : 0
writer_worker_5:log:15:task : Warmup - estimate database item count
writer_worker_5:log:16:runtime : 1519
writer_worker_5:log:16:starttime : 0
writer_worker_5:log:16:task : Warmup - check for access log
writer_worker_5:log:17:runtime : 283
writer_worker_5:log:17:starttime : 0
writer_worker_5:log:17:task : Running a flusher loop: shard 0
writer_worker_5:log:18:runtime : 19
writer_worker_5:log:18:starttime : 0
writer_worker_5:log:18:task : Running a flusher loop: shard 0
writer_worker_5:log:19:runtime : 102
writer_worker_5:log:19:starttime : 0
writer_worker_5:log:19:task : Running a flusher loop: shard 0
writer_worker_5:log:1:runtime : 28
writer_worker_5:log:1:starttime : 0
writer_worker_5:log:1:task : Running a flusher loop: shard 0
writer_worker_5:log:2:runtime : 9
writer_worker_5:log:2:starttime : 0
writer_worker_5:log:2:task : Running a flusher loop: shard 0
writer_worker_5:log:3:runtime : 7
writer_worker_5:log:3:starttime : 0
writer_worker_5:log:3:task : Running a flusher loop: shard 0
writer_worker_5:log:4:runtime : 25
writer_worker_5:log:4:starttime : 0
writer_worker_5:log:4:task : Running a flusher loop: shard 0
writer_worker_5:log:5:runtime : 36
writer_worker_5:log:5:starttime : 0
writer_worker_5:log:5:task : Running a flusher loop: shard 0
writer_worker_5:log:6:runtime : 29
writer_worker_5:log:6:starttime : 0
writer_worker_5:log:6:task : Running a flusher loop: shard 0
writer_worker_5:log:7:runtime : 17
writer_worker_5:log:7:starttime : 0
writer_worker_5:log:7:task : Running a flusher loop: shard 0
writer_worker_5:log:8:runtime : 57
writer_worker_5:log:8:starttime : 0
writer_worker_5:log:8:task : Running a flusher loop: shard 0
writer_worker_5:log:9:runtime : 27
writer_worker_5:log:9:starttime : 0
writer_worker_5:log:9:task : Running a flusher loop: shard 0
writer_worker_5:runtime : 2005
writer_worker_5:state : running
writer_worker_5:task : Updating stat snapshot on disk
Task 0: 'War'
/home/jenkins/couchbase/cmake/ep-engine/tests/ep_testsuite.cc:4919 Test failed: `worker_0's Current task incorrect' (tasklist.find(worker_0_task)!=tasklist.end())
 DIED


 Comments   
Comment by Sundar Sridharan [ 21/Jan/14 ]
This is a known issue. It happens because the stats collection thread does not grab locks, so an ephemeral task like sharded warmup may not last long enough for its description to be fully copied over.
The impact is relatively minor and limited to unit-test failures, so I'm lowering the priority while I continue to look for a low-impact fix.
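
A minimal sketch of the synchronization being discussed (an illustration only, not the actual ep-engine code or fix): if the stats reader copies the description while holding the same mutex the worker takes when setting it, the copy can never come out truncated; the trade-off is exactly the lock contention on the stats path mentioned above.

#include <mutex>
#include <string>

class WorkerSketch {
public:
    // Worker side: update the currently running task's description.
    void setTaskDescription(const std::string &desc) {
        std::lock_guard<std::mutex> lh(mutex_);
        taskDesc_ = desc;
    }

    // Stats side: the copy is made while the lock is held, so it is never
    // observed half-written even if the task is ephemeral.
    std::string getTaskDescription() const {
        std::lock_guard<std::mutex> lh(mutex_);
        return taskDesc_;
    }

private:
    mutable std::mutex mutex_;
    std::string taskDesc_{"Not currently running any task"};
};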
Comment by Sundar Sridharan [ 28/Jan/14 ]
Fix ready at
http://review.couchbase.org/#/c/32887
Comment by Sundar Sridharan [ 28/Jan/14 ]
Fix is merged
Comment by Mike Wiederhold [ 30/Jan/14 ]
I'm still seeing this issue. I initially noticed that the reason was that you didn't add tasklist.insert("Tap connection notifier"); to the list of tasks, but after I added it I started to see the test crash sometimes during test runs on OS X. Below is the stack trace of the relevant thread.

Thread 1 (core thread 0):
#0 0x00007fff88f5e405 in std::string::_Rep::_M_dispose ()
#1 0x00007fff88f5f124 in std::string::assign ()
#2 0x000000010106ac31 in TaskLogEntry::operator= (this=0x1040027b0) at tasklogentry.h:33
#3 0x000000010106aca2 in std::__copy<false, std::random_access_iterator_tag>::copy<TaskLogEntry*, TaskLogEntry*> (__first=0x100813528, __last=0x100813540, __result=0x1040027b0) at stl_algobase.h:283
#4 0x000000010106ace6 in std::__copy_aux<TaskLogEntry*, TaskLogEntry*> (__first=0x100813408, __last=0x100813540, __result=0x104002690) at stl_algobase.h:315
#5 0x000000010106ad15 in std::__copy_normal<false, true>::__copy_n<TaskLogEntry*, __gnu_cxx::__normal_iterator<TaskLogEntry*, std::vector<TaskLogEntry, std::allocator<TaskLogEntry> > > > (__first=0x100813408, __last=0x100813540, __result={_M_current = 0x104002690}) at stl_algobase.h:358
#6 0x000000010106ad59 in std::copy<TaskLogEntry*, __gnu_cxx::__normal_iterator<TaskLogEntry*, std::vector<TaskLogEntry, std::allocator<TaskLogEntry> > > > (__first=0x100813408, __last=0x100813540, __result={_M_current = 0x104002690}) at stl_algobase.h:401
#7 0x000000010106c13a in RingBuffer<TaskLogEntry>::contents (this=0x100132ee8) at ringbuffer.h:84
#8 0x000000010106c197 in TaskQueue::getLog (this=0x100132d90) at taskqueue.h:53
#9 0x0000000101066f73 in ExecutorPool::doWorkerStat (this=0x10012e340, engine=0x103000c00, cookie=0x104000ee0, add_stat=0x1002475ec <add_stats>) at /Users/mikewied/membase-30/cmake/ep-engine/src/executorpool.cc:551
#10 0x000000010104279d in EventuallyPersistentEngine::doDispatcherStats (this=0x103000c00, cookie=0x104000ee0, add_stat=0x1002475ec <add_stats>) at /Users/mikewied/membase-30/cmake/ep-engine/src/ep_engine.cc:3921
#11 0x000000010104c838 in EventuallyPersistentEngine::getStats (this=0x103000c00, cookie=0x104000ee0, stat_key=0x10025fb33 "dispatcher", nkey=10, add_stat=0x1002475ec <add_stats>) at /Users/mikewied/membase-30/cmake/ep-engine/src/ep_engine.cc:4097
#12 0x000000010104d530 in EvpGetStats (handle=0x103000c00, cookie=0x104000ee0, stat_key=0x10025fb33 "dispatcher", nkey=10, add_stat=0x1002475ec <add_stats>) at /Users/mikewied/membase-30/cmake/ep-engine/src/ep_engine.cc:206
#13 0x0000000100002bd3 in mock_get_stats (handle=0x1000083a0, cookie=0x0, stat_key=0x10025fb33 "dispatcher", nkey=10, add_stat=0x1002475ec <add_stats>) at /Users/mikewied/membase-30/cmake/memcached/programs/engine_testapp.c:193
warning: .o file "/Users/mikewied/membase-30/cmake/ep-engine/CMakeFiles/ep_testsuite.dir/tests/ep_testsuite.cc.o" more recent than executable timestamp in "[memory object "/Users/mikewied/membase-30/cmake/ep-engine/ep_testsuite.dylib" at 0x100200000]"
warning: Couldn't open object file '/Users/mikewied/membase-30/cmake/ep-engine/CMakeFiles/ep_testsuite.dir/tests/ep_testsuite.cc.o'
#14 0x000000010021aecf in test_worker_stats ()
#15 0x0000000100004989 in execute_test (test={name = 0x1001007b0 "ep worker stats (couchstore)", tfun = 0x10021ae85 <test_worker_stats>, test_setup = 0x10022e0cd <test_setup>, test_teardown = 0x100203e70 <teardown>, cfg = 0x100400000 "max_num_workers=4;max_threads=8;backend=couchdb;couch_response_timeout=3000;couch_port=56927", prepare = 0x10020942f <prepare>, cleanup = 0x1002093cb <cleanup>}, engine=0x7fff5fbff8f2 "ep.so", default_cfg=0x7fff5fbff90e "flushall_enabled=true;ht_size=13;ht_locks=7") at /Users/mikewied/membase-30/cmake/memcached/programs/engine_testapp.c:999
#16 0x000000010000521e in main (argc=10, argv=0x7fff5fbff780) at /Users/mikewied/membase-30/cmake/memcached/programs/engine_testapp.c:1257
#17 0x00000001000025c8 in start ()
Comment by Sundar Sridharan [ 03/Feb/14 ]
We are hitting a latent issue that has now surfaced because of an unreliable read on the ring buffer. I will have a fix for this shortly.
thanks
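
For illustration, a minimal sketch of the safe-snapshot pattern (an assumption about the general shape of such a fix, not the actual change in http://review.couchbase.org/#/c/33153/): the stats path copies the whole task log while holding the queue's mutex, so no entry can be overwritten halfway through the copy.

#include <cstdint>
#include <mutex>
#include <string>
#include <vector>

struct TaskLogEntrySketch {
    std::string name;
    uint64_t runtimeMicros = 0;
};

class TaskQueueSketch {
public:
    // Worker side: record a completed task.
    void logTask(const TaskLogEntrySketch &entry) {
        std::lock_guard<std::mutex> lh(mutex_);
        log_.push_back(entry);
    }

    // Stats side: snapshot the log under the same lock.
    std::vector<TaskLogEntrySketch> getLog() const {
        std::lock_guard<std::mutex> lh(mutex_);
        return log_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<TaskLogEntrySketch> log_;
};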
Comment by Chiyoung Seo [ 05/Feb/14 ]
http://review.couchbase.org/#/c/33153/
Comment by Maria McDuff (Inactive) [ 28/Mar/14 ]
Venu,

Can you verify that the ep worker test is now consistently passing on CentOS?




[MB-9865] The upr add stream command needs to be able to timeout Created: 08/Jan/14  Updated: 22/Aug/14  Resolved: 12/Feb/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Abhinav Dangeti
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Comments   
Comment by Abhinav Dangeti [ 11/Feb/14 ]
For now, we don't think this will be needed: once the request reaches the producer from the consumer, the producer returns a yes or no almost immediately after looking up its failover table.




[MB-9237] create bucket with 3 replica cause couchbase server hang Created: 08/Oct/13  Updated: 22/Aug/14  Resolved: 10/Oct/13

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thuan Nguyen Assignee: Thuan Nguyen
Resolution: Fixed Votes: 0
Labels: regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: ubuntu 12.04 64bit

Operating System: Ubuntu 64-bit

 Description   
Install Couchbase Server 3.0.0-25 on one Ubuntu 12.04 64-bit node
Create buckets with 1 replica ==> ok
Create buckets with 2 replicas ==> ok
Create a bucket with 3 replicas ==> failed
The Couchbase server hang made all other buckets hang as well.

The server is still hung; its IP is 10.3.2.199 (Sundar has credentials to access this server).
Link to the manifest file of this build: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_3.0.0-25-rel.deb.manifest.xml


Below is backtrace from memcached of this node

GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>.
Attaching to process 3782
Reading symbols from /opt/couchbase/bin/memcached...done.
Reading symbols from /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0
Reading symbols from /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1...done.
Loaded symbols for /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1
Reading symbols from /opt/couchbase/bin/../lib/libplatform.so.0.1.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/libplatform.so.0.1.0
Reading symbols from /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4...done.
Loaded symbols for /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4
Reading symbols from /opt/couchbase/bin/../lib/libevent_core-2.0.so.5...done.
Loaded symbols for /opt/couchbase/bin/../lib/libevent_core-2.0.so.5
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f45c12f0700 (LWP 3925)]
[New Thread 0x7f45c1af1700 (LWP 3924)]
[New Thread 0x7f45c22f2700 (LWP 3923)]
[New Thread 0x7f45c2af3700 (LWP 3922)]
[New Thread 0x7f45c32f4700 (LWP 3921)]
[New Thread 0x7f45c3af5700 (LWP 3920)]
[New Thread 0x7f45c42f6700 (LWP 3919)]
[New Thread 0x7f45c4af7700 (LWP 3918)]
[New Thread 0x7f45c52f8700 (LWP 3917)]
[New Thread 0x7f45c5af9700 (LWP 3916)]
[New Thread 0x7f45c62fa700 (LWP 3915)]
[New Thread 0x7f45c6afb700 (LWP 3895)]
[New Thread 0x7f45c72fc700 (LWP 3894)]
[New Thread 0x7f45c7afd700 (LWP 3893)]
[New Thread 0x7f45c82fe700 (LWP 3892)]
[New Thread 0x7f45c8aff700 (LWP 3891)]
[New Thread 0x7f45c9300700 (LWP 3890)]
[New Thread 0x7f45c9b01700 (LWP 3889)]
[New Thread 0x7f45ca302700 (LWP 3888)]
[New Thread 0x7f45cab03700 (LWP 3887)]
[New Thread 0x7f45cb304700 (LWP 3886)]
[New Thread 0x7f45cbb05700 (LWP 3885)]
[New Thread 0x7f45cc306700 (LWP 3864)]
[New Thread 0x7f45ccb07700 (LWP 3863)]
[New Thread 0x7f45cd308700 (LWP 3862)]
[New Thread 0x7f45cdb09700 (LWP 3861)]
[New Thread 0x7f45ce30a700 (LWP 3860)]
[New Thread 0x7f45ceb0b700 (LWP 3859)]
[New Thread 0x7f45cf30c700 (LWP 3858)]
[New Thread 0x7f45cfb0d700 (LWP 3857)]
[New Thread 0x7f45d030e700 (LWP 3856)]
[New Thread 0x7f45d0b0f700 (LWP 3855)]
[New Thread 0x7f45d1310700 (LWP 3854)]
[New Thread 0x7f45d1b11700 (LWP 3833)]
[New Thread 0x7f45d2312700 (LWP 3832)]
[New Thread 0x7f45d2b13700 (LWP 3831)]
[New Thread 0x7f45d3314700 (LWP 3830)]
[New Thread 0x7f45d3b15700 (LWP 3829)]
[New Thread 0x7f45d4316700 (LWP 3828)]
[New Thread 0x7f45d4b17700 (LWP 3827)]
[New Thread 0x7f45d5318700 (LWP 3826)]
[New Thread 0x7f45d5b19700 (LWP 3825)]
[New Thread 0x7f45d631a700 (LWP 3824)]
[New Thread 0x7f45d6b1b700 (LWP 3823)]
[New Thread 0x7f45d9bd5700 (LWP 3822)]
[New Thread 0x7f45da3d6700 (LWP 3797)]
[New Thread 0x7f45dabd7700 (LWP 3796)]
[New Thread 0x7f45db3d8700 (LWP 3795)]
[New Thread 0x7f45dbbd9700 (LWP 3794)]
[New Thread 0x7f45dc3da700 (LWP 3793)]
[New Thread 0x7f45dcde8700 (LWP 3791)]
[New Thread 0x7f45dda04700 (LWP 3790)]
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libstdc++.so.6
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1
Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/ep.so
Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done.
Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
Reading symbols from /opt/couchbase/lib/libv8.so...done.
Loaded symbols for /opt/couchbase/lib/libv8.so
Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44
Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicudata.so.44
Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicui18n.so.44
Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done.
Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
0x00007f45de50d363 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) thread apply all bt

Thread 53 (Thread 0x7f45dda04700 (LWP 3790)):
#0 0x00007f45de4ff8cd in read () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f45de493ff8 in _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f45de49503e in _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f45de48918a in _IO_getline_info () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f45de48806b in fgets () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f45dda05aa1 in fgets (__stream=<optimized out>, __n=<optimized out>, __s=<optimized out>) at /usr/include/bits/stdio2.h:255
#6 check_stdin_thread (arg=<optimized out>) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/extensions/daemon/stdin_check.c:38
#7 0x00007f45df2679bf in platform_thread_wrap (arg=0xec0020) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:18
#8 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000000000 in ?? ()

Thread 52 (Thread 0x7f45dcde8700 (LWP 3791)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45df267a63 in cb_cond_timedwait (cond=0x7f45dd203200, mutex=0x7f45dd2031c0, ms=<optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:133
#2 0x00007f45dd00204d in logger_thead_main (arg=<optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/extensions/loggers/file_logger.c:363
#3 0x00007f45df2679bf in platform_thread_wrap (arg=0xec0010) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:18
#4 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 51 (Thread 0x7f45dc3da700 (LWP 3793)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45df267819 in cb_mutex_enter (mutex=0x7f45dc5e73c0) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:79
#4 0x00007f45dc3dd8a0 in lock_engines () at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:301
#5 find_bucket (name=0xec1ad0 "bucket-2") at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:639
#6 0x00007f45dc3e1e3a in handle_select_bucket (response=<optimized out>, request=<optimized out>, cookie=<optimized out>, handle=<optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:2778
#7 bucket_unknown_command (handle=0x7f45dc5e7220, cookie=0x569fb80, request=0x56a3800, response=0x4078e0 <binary_response_handler>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:2918
#8 0x00000000004142dc in process_bin_unknown_packet (c=<optimized out>) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:2611
#9 process_bin_packet (c=<optimized out>) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:4055
#10 complete_nread (c=<optimized out>) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:4620
#11 conn_nread (c=0x569fb80) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:5493
#12 0x00000000004076cd in event_handler (fd=<optimized out>, which=<optimized out>, arg=0x569fb80)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:5749
#13 0x00007f45dee0eafc in event_process_active_single_queue (activeq=<optimized out>, base=<optimized out>) at event.c:1308
#14 event_process_active (base=<optimized out>) at event.c:1375
#15 event_base_loop (base=0x569e500, flags=<optimized out>) at event.c:1572
#16 0x00007f45df2679bf in platform_thread_wrap (arg=0xec0140) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:18
#17 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 50 (Thread 0x7f45dbbd9700 (LWP 3794)):
#0 0x00007f45de50d363 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f45dee235a6 in epoll_dispatch (base=0x569e280, tv=<optimized out>) at epoll.c:404
#2 0x00007f45dee0ea04 in event_base_loop (base=0x569e280, flags=<optimized out>) at event.c:1558
#3 0x00007f45df2679bf in platform_thread_wrap (arg=0xec0130) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:18
#4 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 49 (Thread 0x7f45db3d8700 (LWP 3795)):
#0 ExecutorPool::startWorkers (this=<optimized out>, workload=...) at src/scheduler.cc:550
#1 0x00007f45d913c9c1 in ExecutorPool::registerBucket (this=0x578ba00, engine=0x572b200) at src/scheduler.cc:528
#2 0x00007f45d910e43c in EventuallyPersistentStore::EventuallyPersistentStore (this=0xbef9200, theEngine=...) at src/ep.cc:153
#3 0x00007f45d9118f6f in EventuallyPersistentEngine::initialize (this=0x572b200, config=<optimized out>) at src/ep_engine.cc:1355
#4 0x00007f45d9119256 in EvpInitialize (handle=0x572b200,
    config_str=0x56f1823 "ht_size=3079;ht_locks=5;tap_noop_interval=20;max_txn_size=10000;max_size=524288000;tap_keepalive=300;dbname=/opt/couchbase/var/lib/couchbase/data/bucket-3;allow_data_loss_during_shutdown=true;backend="...) at src/ep_engine.cc:127
#5 0x00007f45dc3e183a in create_bucket_UNLOCKED (e=<optimized out>, bucket_name=0xec05d0 "bucket-3", path=0x56f1800 "/opt/couchbase/lib/memcached/ep.so",
    config=<optimized out>, e_out=<optimized out>, msg=0x7f45db3d77f0 "", msglen=1024)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:781
#6 0x00007f45dc3e1aae in handle_create_bucket (handle=0x7f45dc5e7220, cookie=0x564a500, request=<optimized out>, response=0x4078e0 <binary_response_handler>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:2567
#7 0x00007f45dc3e1f69 in bucket_unknown_command (handle=0x7f45dc5e7220, cookie=0x564a500, request=0x5670800, response=0x4078e0 <binary_response_handler>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:2906
#8 0x00000000004142dc in process_bin_unknown_packet (c=<optimized out>) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:2611
#9 process_bin_packet (c=<optimized out>) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:4055
#10 complete_nread (c=<optimized out>) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:4620
#11 conn_nread (c=0x564a500) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:5493
#12 0x00000000004076cd in event_handler (fd=<optimized out>, which=<optimized out>, arg=0x564a500)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:5749
#13 0x00007f45dee0eafc in event_process_active_single_queue (activeq=<optimized out>, base=<optimized out>) at event.c:1308
#14 event_process_active (base=<optimized out>) at event.c:1375
#15 event_base_loop (base=0x569e000, flags=<optimized out>) at event.c:1572
#16 0x00007f45df2679bf in platform_thread_wrap (arg=0xec0120) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:18
#17 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 48 (Thread 0x7f45dabd7700 (LWP 3796)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45df267819 in cb_mutex_enter (mutex=0x7f45dc5e73c0) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:79
#4 0x00007f45dc3dd8a0 in lock_engines () at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:301
#5 find_bucket (name=0xebe040 "default") at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:639
#6 0x00007f45dc3e2a72 in handle_connect (cookie=0x564a780, type=128, event_data=0x0, cb_data=0xffffffffffffffff)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/engines/bucket_engine/bucket_engine.c:1201
#7 0x0000000000408713 in perform_callbacks (c=<optimized out>, data=<optimized out>, type=<optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:242
#8 conn_new (sfd=68, parent_port=11209, init_state=<optimized out>, event_flags=18, read_buffer_size=<optimized out>, base=0x56f3b80, timeout=0x0)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:920
#9 0x00000000004153d9 in thread_libevent_process (fd=<optimized out>, which=<optimized out>, arg=<optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/thread.c:320
#10 0x00007f45dee0eafc in event_process_active_single_queue (activeq=<optimized out>, base=<optimized out>) at event.c:1308
#11 event_process_active (base=<optimized out>) at event.c:1375
#12 event_base_loop (base=0x56f3b80, flags=<optimized out>) at event.c:1572
#13 0x00007f45df2679bf in platform_thread_wrap (arg=0xec0110) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:18
#14 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()

Thread 47 (Thread 0x7f45da3d6700 (LWP 3797)):
#0 0x00007f45de50d363 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f45dee235a6 in epoll_dispatch (base=0x56f3900, tv=<optimized out>) at epoll.c:404
#2 0x00007f45dee0ea04 in event_base_loop (base=0x56f3900, flags=<optimized out>) at event.c:1558
#3 0x00007f45df2679bf in platform_thread_wrap (arg=0xec0100) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/platform/src/cb_pthreads.c:18
#4 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 46 (Thread 0x7f45d9bd5700 (LWP 3822)):
#0 0x00007f45de4d884d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f45de506784 in usleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f45d9137bd5 in updateStatsThread (arg=<optimized out>) at src/memory_tracker.cc:36
#3 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 45 (Thread 0x7f45d6b1b700 (LWP 3823)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5710480, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b7540) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 44 (Thread 0x7f45d631a700 (LWP 3824)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5710120, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b74a0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 43 (Thread 0x7f45d5b19700 (LWP 3825)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x57105a0, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b7720) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 42 (Thread 0x7f45d5318700 (LWP 3826)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=60) at src/scheduler.cc:437
#7 0x00007f45d915f114 in StatSnap::run (this=<optimized out>) at src/tasks.cc:96
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b7680) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 41 (Thread 0x7f45d4b17700 (LWP 3827)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5711440, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b75e0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 40 (Thread 0x7f45d4316700 (LWP 3828)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5710c60, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b79a0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 39 (Thread 0x7f45d3b15700 (LWP 3829)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5711320, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b7900) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 38 (Thread 0x7f45d3314700 (LWP 3830)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5710360, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b7860) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 37 (Thread 0x7f45d2b13700 (LWP 3831)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x57110e0, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x56b77c0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 36 (Thread 0x7f45d2312700 (LWP 3832)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45d91186e2 in wait (tv=..., this=<optimized out>) at src/syncobject.h:77
#2 wait (secs=<optimized out>, this=<optimized out>) at src/syncobject.h:93
#3 wait (previousCounter=<optimized out>, howlong=<optimized out>, this=<optimized out>) at src/tapconnmap.h:225
#4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5728000) at src/ep_engine.cc:3383
#5 0x00007f45d91187d3 in EvpNotifyPendingConns (arg=0x5728000) at src/ep_engine.cc:1139
#6 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x0000000000000000 in ?? ()

Thread 35 (Thread 0x7f45d1b11700 (LWP 3833)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45d90fbfb8 in wait (tv=..., this=<optimized out>) at src/syncobject.h:77
#2 IdleTask::run (this=0xed03f0, d=...) at src/dispatcher.cc:344
#3 0x00007f45d90feb6a in Dispatcher::run (this=0x5708c40) at src/dispatcher.cc:186
#4 0x00007f45d90ff33d in launch_dispatcher_thread (arg=0x5708c94) at src/dispatcher.cc:30
#5 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000000000000000 in ?? ()

Thread 34 (Thread 0x7f45d1310700 (LWP 3854)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x57bde00, tid=2047) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x6370b40) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 33 (Thread 0x7f45d0b0f700 (LWP 3855)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5711680, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x6370aa0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 32 (Thread 0x7f45d030e700 (LWP 3856)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x57117a0, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x6370a00) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 31 (Thread 0x7f45cfb0d700 (LWP 3857)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x57bcdc0, tid=1070) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x6370960) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 30 (Thread 0x7f45cf30c700 (LWP 3858)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0xecaf00, tid=10) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x63708c0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 29 (Thread 0x7f45ceb0b700 (LWP 3859)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x9c93e00, tid=3011) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x6370820) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 28 (Thread 0x7f45ce30a700 (LWP 3860)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x9c93cc0, tid=3010) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x6370780) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 27 (Thread 0x7f45cdb09700 (LWP 3861)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0xecb040, tid=11) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x63706e0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 26 (Thread 0x7f45cd308700 (LWP 3862)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x57bc000, tid=2051) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x6370640) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 25 (Thread 0x7f45ccb07700 (LWP 3863)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45d91186e2 in wait (tv=..., this=<optimized out>) at src/syncobject.h:77
#2 wait (secs=<optimized out>, this=<optimized out>) at src/syncobject.h:93
#3 wait (previousCounter=<optimized out>, howlong=<optimized out>, this=<optimized out>) at src/tapconnmap.h:225
#4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5728a00) at src/ep_engine.cc:3383
#5 0x00007f45d91187d3 in EvpNotifyPendingConns (arg=0x5728a00) at src/ep_engine.cc:1139
#6 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x0000000000000000 in ?? ()

Thread 24 (Thread 0x7f45cc306700 (LWP 3864)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45d90fbfb8 in wait (tv=..., this=<optimized out>) at src/syncobject.h:77
#2 IdleTask::run (this=0x5806480, d=...) at src/dispatcher.cc:344
#3 0x00007f45d90feb6a in Dispatcher::run (this=0x5709a40) at src/dispatcher.cc:186
#4 0x00007f45d90ff33d in launch_dispatcher_thread (arg=0x5709a94) at src/dispatcher.cc:30
#5 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000000000000000 in ?? ()

Thread 23 (Thread 0x7f45cbb05700 (LWP 3885)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x57bdb80, tid=2050) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x791c3c0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 22 (Thread 0x7f45cb304700 (LWP 3886)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=60) at src/scheduler.cc:437
#7 0x00007f45d915f114 in StatSnap::run (this=<optimized out>) at src/tasks.cc:96
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x791c320) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 21 (Thread 0x7f45cab03700 (LWP 3887)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x57bca00, tid=1067) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x791c280) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 20 (Thread 0x7f45ca302700 (LWP 3888)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5710ea0, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x791c1e0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 19 (Thread 0x7f45c9b01700 (LWP 3889)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5710b40, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x791c140) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 18 (Thread 0x7f45c9300700 (LWP 3890)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x9c93b80, tid=3008) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x791c0a0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 17 (Thread 0x7f45c8aff700 (LWP 3891)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5710d80, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x791c000) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 16 (Thread 0x7f45c82fe700 (LWP 3892)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0xecb400, tid=9) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x9871680) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 15 (Thread 0x7f45c7afd700 (LWP 3893)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x57106c0, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0x98715e0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 14 (Thread 0x7f45c72fc700 (LWP 3894)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45d91186e2 in wait (tv=..., this=<optimized out>) at src/syncobject.h:77
#2 wait (secs=<optimized out>, this=<optimized out>) at src/syncobject.h:93
#3 wait (previousCounter=<optimized out>, howlong=<optimized out>, this=<optimized out>) at src/tapconnmap.h:225
#4 EventuallyPersistentEngine::notifyPendingConnections (this=0x5729e00) at src/ep_engine.cc:3383
#5 0x00007f45d91187d3 in EvpNotifyPendingConns (arg=0x5729e00) at src/ep_engine.cc:1139
#6 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x0000000000000000 in ?? ()

Thread 13 (Thread 0x7f45c6afb700 (LWP 3895)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45d90fbfb8 in wait (tv=..., this=<optimized out>) at src/syncobject.h:77
#2 IdleTask::run (this=0x5804d80, d=...) at src/dispatcher.cc:344
#3 0x00007f45d90feb6a in Dispatcher::run (this=0x7a6ca80) at src/dispatcher.cc:186
#4 0x00007f45d90ff33d in launch_dispatcher_thread (arg=0x7a6cad4) at src/dispatcher.cc:30
#5 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000000000000000 in ?? ()

Thread 12 (Thread 0x7f45c62fa700 (LWP 3915)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x57bc8c0, tid=1071) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd065a0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 11 (Thread 0x7f45c5af9700 (LWP 3916)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=60) at src/scheduler.cc:437
#7 0x00007f45d915f114 in StatSnap::run (this=<optimized out>) at src/tasks.cc:96
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd06500) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7f45c52f8700 (LWP 3917)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x57bdcc0, tid=2048) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd06460) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f45c4af7700 (LWP 3918)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=60) at src/scheduler.cc:437
#7 0x00007f45d915f114 in StatSnap::run (this=<optimized out>) at src/tasks.cc:96
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd063c0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f45c42f6700 (LWP 3919)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x9c93a40, tid=3012) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd06320) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f45c3af5700 (LWP 3920)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5711200, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd06280) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f45c32f4700 (LWP 3921)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=1) at src/scheduler.cc:437
#7 0x00007f45d90f0729 in BgFetcher::run (this=0x5711560, tid=<optimized out>) at src/bgfetcher.cc:163
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd061e0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f45c2af3700 (LWP 3922)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0xecb2c0, tid=13) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd06140) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f45c22f2700 (LWP 3923)):
#0 0x00007f45debf289c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45debee065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f45debedeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f45d9138aea in Mutex::acquire (this=0x578bb10) at src/mutex.cc:80
#4 0x00007f45d913a458 in lock (this=<optimized out>) at src/locks.h:66
#5 LockHolder (m=..., this=<optimized out>) at src/locks.h:44
#6 ExecutorPool::snooze (this=0x578ba00, taskId=128, tosleep=0.20000000000000001) at src/scheduler.cc:437
#7 0x00007f45d913186e in Flusher::step (this=0x57bcc80, tid=1069) at src/flusher.cc:165
#8 0x00007f45d913e31b in ExecutorThread::run (this=0xbd060a0) at src/scheduler.cc:97
#9 0x00007f45d913e81d in launch_executor_thread (arg=0x578bb18) at src/scheduler.cc:38
#10 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f45c1af1700 (LWP 3924)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45d91186e2 in wait (tv=..., this=<optimized out>) at src/syncobject.h:77
#2 wait (secs=<optimized out>, this=<optimized out>) at src/syncobject.h:93
#3 wait (previousCounter=<optimized out>, howlong=<optimized out>, this=<optimized out>) at src/tapconnmap.h:225
#4 EventuallyPersistentEngine::notifyPendingConnections (this=0x572a800) at src/ep_engine.cc:3383
#5 0x00007f45d91187d3 in EvpNotifyPendingConns (arg=0x572a800) at src/ep_engine.cc:1139
#6 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f45c12f0700 (LWP 3925)):
#0 0x00007f45debf00fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f45d90fbfb8 in wait (tv=..., this=<optimized out>) at src/syncobject.h:77
#2 IdleTask::run (this=0x9bdcab0, d=...) at src/dispatcher.cc:344
#3 0x00007f45d90feb6a in Dispatcher::run (this=0xbf82a80) at src/dispatcher.cc:186
#4 0x00007f45d90ff33d in launch_dispatcher_thread (arg=0xbf82ad4) at src/dispatcher.cc:30
#5 0x00007f45debebe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f45de50cccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f45dfa82740 (LWP 3782)):
#0 0x00007f45de50d363 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f45dee235a6 in epoll_dispatch (base=0x569e780, tv=<optimized out>) at epoll.c:404
#2 0x00007f45dee0ea04 in event_base_loop (base=0x569e780, flags=<optimized out>) at event.c:1558
#3 0x000000000040af4d in main (argc=<optimized out>, argv=<optimized out>) at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/cmake/memcached/daemon/memcached.c:7667
(gdb)


 Comments   
Comment by Sundar Sridharan [ 08/Oct/13 ]
Issue found due to change in shard size reconfiguration...
Fix available at
http://review.couchbase.org/#/c/29396
thanks
Comment by Thuan Nguyen [ 08/Oct/13 ]
Thumbs up to Sundar for the quick fix.
I will test again when a build with this fix is available.




[MB-5653] memcached stucks in shutdown code Created: 22/Jun/12  Updated: 22/Aug/14  Resolved: 18/Jan/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0-beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Aleksey Kondratenko Assignee: Trond Norbye
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Was running a cluster_run cluster and stopped it with Ctrl-C (which is not delivered to memcached). All memcached processes died normally except one, which seems to be hung. Here's the backtrace:


(gdb) thread app all bt

Thread 2 (Thread 0xf298ab70 (LWP 4846)):
#0 0xf772f430 in __kernel_vsyscall ()
#1 0xf767af02 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xf767639b in _L_lock_728 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#3 0xf76761c1 in __pthread_mutex_lock (mutex=mutex@entry=0xc90bb64) at pthread_mutex_lock.c:61
#4 0x0804d9a4 in release_cookie (cookie=0xc8da8d8) at daemon/memcached.c:6691
#5 0xf771fc12 in bucket_engine_release_cookie (cookie=0xc8da8d8) at bucket_engine.c:2519
#6 0xf4343fa4 in EventuallyPersistentEngine::releaseCookie (this=0xc93a4c8, cookie=0xc8da8d8) at ep_engine.cc:1192
#7 0xf437d04b in TapConnection::releaseReference (this=0xf1800a10, force=false) at tapconnection.cc:65
#8 0xf438f72b in TapConnectionReaperCallback::callback (this=0xc90bd38) at tapconnmap.cc:29
#9 0xf4326855 in Task::run (this=0xca3dc40, d=..., t=...) at dispatcher.hh:139
#10 0xf4324713 in Dispatcher::run (this=0xc95ea08) at dispatcher.cc:123
#11 0xf43261ea in launch_dispatcher_thread (arg=0xc95ea08) at dispatcher.cc:28
#12 0xf7673c39 in start_thread (arg=0xf298ab70) at pthread_create.c:304
#13 0xf75e127e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Thread 1 (Thread 0xf750f6c0 (LWP 4793)):
#0 0xf772f430 in __kernel_vsyscall ()
#1 0xf7674e65 in pthread_join (threadid=4070091632, thread_return=0x0) at pthread_join.c:89
#2 0xf43203d0 in Dispatcher::stop (this=0xc95ea08, force=false) at dispatcher.cc:162
#3 0xf432c8f7 in EventuallyPersistentStore::~EventuallyPersistentStore (this=0xc94b9b0, __in_chrg=<optimized out>) at ep.cc:676
#4 0xf43595b4 in EventuallyPersistentEngine::~EventuallyPersistentEngine (this=0xc93a4c8, __in_chrg=<optimized out>) at ep_engine.h:511
#5 0xf4347fc5 in EvpDestroy (handle=0xc93a4c8, force=false) at ep_engine.cc:125
#6 0xf771c5fa in bucket_shutdown_engine (key=0xc93a4b8, nkey=7, val=0xc90c150, nval=0, args=0x0) at bucket_engine.c:1289
#7 0xf7721c59 in genhash_iter (h=0xc8f5330, iterfunc=iterfunc@entry=0xf771c5a0 <bucket_shutdown_engine>, arg=arg@entry=0x0) at genhash.c:275
#8 0xf7720ae2 in bucket_destroy (handle=0xf77253e0, force=<optimized out>) at bucket_engine.c:1326
#9 bucket_destroy (handle=0xf77253e0, force=false) at bucket_engine.c:1306
#10 0x0804be30 in main (argc=19, argv=0xffeb7ab4) at daemon/memcached.c:7921

Both threads seem to be stuck and unable to proceed

 Comments   
Comment by Aleksey Kondratenko [ 22/Jun/12 ]
Added bucket-engine as well, since the deadlock could be in there instead.
Comment by Trond Norbye [ 03/Jul/12 ]
Are these the only threads on the system? From what I can see, thread 1 is waiting for thread 2 to terminate, and thread 2 is trying to lock the thread the connection is bound to.
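[Editorial sketch] The deadlock shape described above can be illustrated with a minimal pthreads program. This is not the memcached source; the names conn_lock and dispatcher() are hypothetical, chosen only to mirror the backtrace: the shutting-down thread joins a worker while still holding a mutex the worker must take before it can exit.

    /* Editorial sketch, not memcached code: main joins a worker while holding
     * a mutex the worker needs, so both threads block forever. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *dispatcher(void *arg)
    {
        (void)arg;
        /* Analogous to release_cookie() taking the connection's thread lock:
         * blocks forever because the lock holder is waiting in pthread_join. */
        pthread_mutex_lock(&conn_lock);
        puts("dispatcher: acquired lock (never reached in the deadlock case)");
        pthread_mutex_unlock(&conn_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_mutex_lock(&conn_lock);   /* lock left held across shutdown */
        pthread_create(&tid, NULL, dispatcher, NULL);
        pthread_join(tid, NULL);          /* like Dispatcher::stop(): waits for
                                             the worker, which never returns */
        pthread_mutex_unlock(&conn_lock);
        return 0;
    }

Built with -pthread, this program hangs in exactly the same shape: one thread in pthread_join, the other in pthread_mutex_lock.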
Comment by Aleksey Kondratenko [ 03/Jul/12 ]
Apparently yes, those are only threads left
Comment by Trond Norbye [ 04/Jul/12 ]
I assume you don't have access to the process anymore? It would be interesting to see the "cookie" there. The locking of the thread instance happens within the event dispatcher callback, and it should never be locked when the thread using the instance isn't "running" in the callback. At this time all of the worker threads are released, so none of the threads should be "locked".
Comment by Aleksey Kondratenko [ 04/Jul/12 ]
Correct. I don't have access anymore.
Comment by Peter Wansch (Inactive) [ 01/Aug/12 ]
We cannot reproduce this bug. If it occurs again, we need a coredump. Keeping this bug open for now.
Comment by Karan Kumar (Inactive) [ 10/Sep/12 ]
Not sure if this is the same, but on 2.0.0-1697-rel we are hitting something quite similar: memcached is not idle, it just seems to be consuming CPU, with no disk activity, etc.

https://friendpaste.com/5gFQw9wPBFOgNjue64HfM2



Comment by Aleksey Kondratenko [ 10/Sep/12 ]
My understanding (looking at the backtrace) is that it's a very different bug. Mine looks like some deadlock, but this case is (IMHO) clearly different.
Comment by Karan Kumar (Inactive) [ 10/Sep/12 ]
We are going to file a separate bug.
Comment by Trond Norbye [ 11/Sep/12 ]
Karan: Which thread is consuming the CPU? I would "guess" the one in flushOneDelOrSet (it's traversing all of the items in the cache...)
Comment by Karan Kumar (Inactive) [ 11/Sep/12 ]
Not sure. How can I get per-thread CPU consumption?
Comment by Trond Norbye [ 11/Sep/12 ]
I believe you can see that by pressing H in top on Linux.
Comment by Peter Wansch (Inactive) [ 08/Oct/12 ]
Trond, do you have a patch in gerrit for this?
Comment by Karan Kumar (Inactive) [ 17/Oct/12 ]
Hitting the issue again on build 1862
https://friendpaste.com/2RoA2uswrK1WgTtjDNrQei
Comment by Mike Wiederhold [ 18/Jan/14 ]
Closing since this is over 1 year old. If this is still a valid issue please reopen.




[MB-7253] vbuckettool help/usage needs improvement Created: 22/Nov/12  Updated: 22/Aug/14  Resolved: 20/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Andrei Baranouski Assignee: Mike Wiederhold
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: No

 Description   
key WRONG_KEY doesn't exist in bucket

[root@cen-2708 tools]# curl http://localhost:8091/pools/default/buckets/default | ./vbuckettool - WRONG_KEY
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 9837 100 9837 0 0 12047 0 --:--:-- --:--:-- --:--:-- 1601k
key: WRONG_KEY master: 10.3.121.66:11210 vBucketId: 739 couchApiBase: http://10.3.121.66:8092/default replicas: 10.3.121.63:11210



description:

./vbuckettool
vbuckettool mapfile key0 [key1 ... [keyN]]

  The vbuckettool expects a vBucketServerMap JSON mapfile, and
  will print the vBucketId and servers each key should live on.
  You may use '-' instead for the filename to specify stdin.

  Examples:
    ./vbuckettool file.json some_key another_key

    curl http://HOST:8091/pools/default/buckets/default | \
       ./vbuckettool - some_key another_key



 Comments   
Comment by Andrei Baranouski [ 22/Nov/12 ]
If I pass a non-existent key, will the key still be mapped to a vbucket?
If so, it would be good to add that information to the usage section.
Comment by Steve Yen [ 22/Nov/12 ]
vbuckettool doesn't store/retrieve keys. It only hashes the key and computes which vbucket the key/item would have been assigned to, to help with diagnosis.
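For reference, the mapping is just a hash of the key; a minimal Python sketch of the idea (assuming the usual CRC32-based vBucket hashing; the exact shift/mask used by libvbucket may differ):

    import zlib

    def vbucket_for_key(key, num_vbuckets=1024):
        # Hash the key with CRC32 and mask down to the (power-of-two)
        # number of vBuckets; this is a sketch of the idea, not a
        # verbatim copy of libvbucket's hash function.
        crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
        return (crc >> 16) & (num_vbuckets - 1)

    print(vbucket_for_key("WRONG_KEY"))  # prints a vBucket id whether or not the key exists

This is why a non-existent key such as WRONG_KEY still produces a vBucketId in the output above.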
Comment by Farshid Ghods (Inactive) [ 26/Nov/12 ]
Steve,

Can you please add more documentation (usage) to this tool?
A user who is unfamiliar with the tools might think that it also checks whether the key exists on the active or replica vbucket.
Comment by Steve Yen [ 10/Dec/12 ]
Bin,
This now looks like a usage string fix. The vbuckettool source code is in the libvbucket project, I think. Time to fix some C code!
Thanks,
Steve
Comment by Maria McDuff (Inactive) [ 10/Oct/13 ]
Bin,

Any chance of fixing this in 3.0? If not, please indicate it for deferral.
Please advise.
Comment by Anil Kumar [ 17/Jun/14 ]
The tool needs some minor additions to its usage info.
Comment by Mike Wiederhold [ 20/Aug/14 ]
This tool no longer exists in 3.0.




[MB-4460] seeing tap client activity even though the client is not using a TAP client Created: 21/Nov/11  Updated: 22/Aug/14  Resolved: 28/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 1.7.2, 2.0-beta, 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Farshid Ghods (Inactive) Assignee: Maria McDuff (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Clients TAP queue.png     PNG File Screen Shot 2014-01-30 at 4.37.29 PM.png     GZip Archive tap stats.tar.gz    
Triage: Untriaged

 Comments   
Comment by Perry Krug [ 21/Nov/11 ]
Screenshots from 2 different customers attached.
Comment by Chiyoung Seo [ 01/Jun/12 ]
Just for redistributing the bugs to the couchbase bucket team
Comment by Perry Krug [ 06/Oct/12 ]
FYI, this is still showing up in 2.0
Comment by Xiaoqin Ma (Inactive) [ 25/Mar/13 ]
@Perry
Can you give more details about what they are doing and the setup of the cluster?
Comment by Perry Krug [ 25/Mar/13 ]
Xiaoqin, there is nothing special about this setup: a few nodes and a bit of load. I've seen this myself in many different situations across many different versions. I'm presuming there is something incorrect about the way ns_server is monitoring these TAP stats to categorize them as "client", since there has never been a client TAP connection in any of these scenarios.

Comment by Maria McDuff (Inactive) [ 06/Apr/13 ]
Andrei, please verify. Mike thinks you have a test that can repro this.
Comment by Thuan Nguyen [ 10/Apr/13 ]
Integrated in github-ep-engine-2-0 #483 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/483/])
    MB-4460: Don't aggregate a disconnected tap stream (Revision 572b5a2c7cf9adb1b5c42b8f0ede6817433ebbce)

     Result = SUCCESS
Mike Wiederhold :
Files :
* src/ep_engine.cc
Comment by Andrei Baranouski [ 22/Apr/13 ]
Checked it manually; the TAP client stats equal 0 when the user doesn't use TAP connections.

Maria, we don't have any automation tests that check tap stats like ep_tap_user_qlen
Comment by Maria McDuff (Inactive) [ 24/Apr/13 ]
CBQE-1245.
Comment by Andrei Baranouski [ 29/Apr/13 ]
Checked it manually; we need to create automation test CBQE-1245.
Comment by Perry Krug [ 30/Jan/14 ]
I'm seeing this again on 2.5... new screenshot attached.
Comment by Mike Wiederhold [ 28/Apr/14 ]
In 3.0 we will use UPR as the replication protocol. Since this is a cosmetic issue, I am closing it as won't fix.




[MB-11976] memory usage differs by 1.5 percent when we recreate the buckets and load the same data Created: 16/Aug/14  Updated: 22/Aug/14  Resolved: 22/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Andrei Baranouski Assignee: Abhinav Dangeti
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1118-rel

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
Hi Abhinav,

I know that you wrote this test, and so I'm going straight to you )

http://qa.hq.northscale.net/job/centos_x64--51_01--mem_sanity-P1/8/consoleFull

./testrunner -i /tmp/centos_x64--51_01--mem_sanity-P1.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,GROUP=ALL -t memorysanitytests.MemorySanity.repetitive_create_delete,items=200000,repetition_count=5

This test creates a bucket, adds an initial front end load,
        checks memory stats, deletes the bucket, and recreates the
        same scenario repetitively for the specified number of times,
        and checks after the last repetition if the memory usage is
        the same that was at the end of the very first front end load.

2014-08-15 23:36:48 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.1.2.58%2Cns_1%4010.1.2.56%2Cns_1%4010.1.2.57%2Cns_1%4010.1.2.55
2014-08-15 23:36:58 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] http://10.1.2.55:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=2069&proxyPort=11211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=None
2014-08-15 23:38:20 | INFO | MainProcess | load_gen_task | [task.has_next] Batch create documents done #: 200000 with exp:0
2014-08-15 23:38:26 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Initial max_data_size of bucket 'default': 8678014976
2014-08-15 23:38:26 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] initial memory consumption of bucket 'default' with load: 350739600

remove bucket default ...
2014-08-15 23:38:42 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] http://10.1.2.55:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=2069&proxyPort=11211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=None
2014-08-15 23:40:01 | INFO | MainProcess | load_gen_task | [task.has_next] Batch create documents done #: 200000 with exp:0
2014-08-15 23:40:07 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 1 = 349445216, Difference from initial snapshot: -1294384

remove bucket default ...
2014-08-15 23:40:22 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] http://10.1.2.55:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=2069&proxyPort=11211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=None
2014-08-15 23:41:48 | INFO | MainProcess | load_gen_task | [task.has_next] Batch create documents done #: 200000 with exp:0
2014-08-15 23:41:54 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 2 = 343476928, Difference from initial snapshot: -7262672

remove bucket default ...
2014-08-15 23:42:09 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] http://10.1.2.55:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=2069&proxyPort=11211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=None
2014-08-15 23:43:33 | INFO | MainProcess | load_gen_task | [task.has_next] Batch create documents done #: 200000 with exp:0
2014-08-15 23:43:39 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 3 = 359316944, Difference from initial snapshot: 8577344

remove bucket default ...
2014-08-15 23:43:55 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] http://10.1.2.55:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=2069&proxyPort=11211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=None
2014-08-15 23:45:15 | INFO | MainProcess | load_gen_task | [task.has_next] Batch create documents done #: 200000 with exp:0
2014-08-15 23:45:21 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 4 = 359671952, Difference from initial snapshot: 8932352

remove bucket default ...
2014-08-15 23:45:37 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] http://10.1.2.55:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=2069&proxyPort=11211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=None
2014-08-15 23:46:59 | INFO | MainProcess | load_gen_task | [task.has_next] Batch create documents done #: 200000 with exp:0
2014-08-15 23:47:15 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] default :: Initial: 350739600 :: Now: 355466704 :: Difference: 4727104

In short:
in the initial attempt we got 350739600 bytes (334.49 MB) of mem_used,
in the 5th attempt it was 355466704 bytes (338.99 MB),
so the difference here is about 4.5 MB.

In our tests the accepted difference is only 100000 bytes (~0.1 MB),
and it seems this was always enough (3.0.0-767-rel http://qa.hq.northscale.net/job/centos_x64--51_01--mem_sanity-P1/2/) because I had not seen this issue before
https://github.com/couchbase/testrunner/blob/master/pytests/memorysanitytests.py#L38
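For reference, the check the test performs is roughly the following (a sketch only; load_bucket, delete_bucket and bucket_mem_used are hypothetical stand-ins for the testrunner code linked above):

    ALLOWED_DIFF = 100000  # bytes (~0.1 MB), the tolerance the test currently accepts

    def repetitive_create_delete(cluster, items=200000, repetition_count=5):
        load_bucket(cluster, "default", items)          # hypothetical: create bucket + front-end load
        initial = bucket_mem_used(cluster, "default")   # hypothetical: read the mem_used stat
        for attempt in range(1, repetition_count + 1):
            delete_bucket(cluster, "default")           # hypothetical: drop the bucket
            load_bucket(cluster, "default", items)
            now = bucket_mem_used(cluster, "default")
            print("attempt %d: difference from initial snapshot = %d" % (attempt, now - initial))
        assert abs(bucket_mem_used(cluster, "default") - initial) <= ALLOWED_DIFF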


 Comments   
Comment by Andrei Baranouski [ 16/Aug/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-11976/f8e52844/10.1.2.55-8152014-2348-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11976/f8e52844/10.1.2.56-8152014-2349-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11976/f8e52844/10.1.2.57-8152014-2349-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11976/f8e52844/10.1.2.58-8152014-2350-diag.zip
Comment by Abhinav Dangeti [ 21/Aug/14 ]
Hey Andrei,

In 3.0, to support features like rollback provided by UPR/DCP, the checkpoints hold on to previous item instances for slightly longer, causing a slight memory usage increase; once they do let the references go, the memory usage drops back down to the initial usage.

I verified the results with this test runner code change:
http://review.couchbase.org/#/c/40797

2014-08-21 10:23:09 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 15 secs. ...
2014-08-21 10:23:24 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Initial max_data_size of bucket 'default': 9162457088
2014-08-21 10:23:24 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] initial memory consumption of bucket 'default' with load: 162832728

2014-08-21 10:24:25 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 15 secs. ...
2014-08-21 10:24:40 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 1 = 162832728, Difference from initial snapshot: 0

2014-08-21 10:25:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 15 secs. ...
2014-08-21 10:25:47 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 2 = 162832984, Difference from initial snapshot: 256

2014-08-21 10:26:38 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 15 secs. ...
2014-08-21 10:26:53 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 3 = 162833240, Difference from initial snapshot: 512

2014-08-21 10:27:43 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 15 secs. ...
2014-08-21 10:27:59 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 4 = 162833016, Difference from initial snapshot: 288

2014-08-21 10:28:49 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 15 secs. ...
2014-08-21 10:29:05 | INFO | MainProcess | test_thread | [memorysanitytests.repetitive_create_delete] Memory used after attempt 5 = 162833112, Difference from initial snapshot: 384
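In other words, the test runner change above just lets the checkpoints release their references before sampling; conceptually something like this (a sketch with a hypothetical stats helper, not the actual patch):

    import time

    SETTLE_SECS = 15  # matches the "sleep for 15 secs" entries in the log above

    def mem_used_after_settle(cluster, bucket="default"):
        # Give the DCP/UPR checkpoints time to let go of previous item
        # instances before reading mem_used, so repeated runs compare
        # like-for-like values.
        time.sleep(SETTLE_SECS)
        return bucket_mem_used(cluster, bucket)  # hypothetical stats helper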




[MB-12042] Memcached crashes with Segmentation fault during bucket deletion Created: 21/Aug/14  Updated: 22/Aug/14  Resolved: 22/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Meenakshi Goel Assignee: Meenakshi Goel
Resolution: Fixed Votes: 0
Labels: rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1174-rel, Debian 7.0

Attachments: Text File crash_log.txt     Text File test_log.txt    
Triage: Triaged
Is this a Regression?: Yes

 Description   
Jenkins Ref Link:
http://qa.hq.northscale.net/job/debian_x64--90_04--observe_tests-P0/1/consoleFull

Test Failed:
./testrunner -i yourfile.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,GROUP=ALL -t observe.observetest.ObserveTests.test_observe_basic_data_load_delete,items=100,sasl_buckets=1,standard_buckets=1,rebalance=out,GROUP=P1

1. Note that the failed test will not deterministically reproduce the crash.
2. The same test and some other tests were found to fail on CentOS and Ubuntu, but no core dumps were found.
http://qa.hq.northscale.net/job/centos_x64--44_04--observe-P0/125/console

[user:info,2014-08-21T5:17:57.340,ns_1@10.3.5.154:<0.750.0>:ns_orchestrator:handle_info:483]Rebalance exited with reason {unexpected_exit,
                              {'EXIT',<0.32255.122>,
                               {{{badmatch,{error,closed}},
                                 {gen_server,call,
                                  ['ns_memcached-standard_bucket0',
                                   {set_vbucket,610,replica},
                                   180000]}},
                                {gen_server,call,
                                 [{'janitor_agent-standard_bucket0',
                                   'ns_1@10.3.5.154'},
                                  {if_rebalance,<0.14541.122>,
                                   {dcp_takeover,'ns_1@10.3.5.155',634}},
                                  infinity]}}}}

[error_logger:error,2014-08-21T5:17:57.340,ns_1@10.3.5.154:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
  crasher:
    initial call: ns_single_vbucket_mover:-set_initial_vbucket_state/6-fun-0-/0
    pid: <0.32266.122>
    registered_name: []
    exception error: {bulk_set_vbucket_state_failed,
                      [{'ns_1@10.3.5.154',
                        {'EXIT',
                         {{{badmatch,{error,closed}},
                           {gen_server,call,
                            ['ns_memcached-standard_bucket0',
                             {set_vbucket,610,replica},
                             180000]}},
                          {gen_server,call,
                           [{'janitor_agent-standard_bucket0',
                             'ns_1@10.3.5.154'},
                            {if_rebalance,<0.14541.122>,
                             {update_vbucket_state,610,replica,passive,
                              'ns_1@10.3.5.155'}},
                            infinity]}}}}]}
      in function janitor_agent:bulk_set_vbucket_state/4 (src/janitor_agent.erl, line 411)
    ancestors: [<0.32268.122>,<0.14541.122>,<0.14465.122>,<0.750.0>,
                  mb_master_sup,mb_master,ns_server_sup,ns_server_cluster_sup,
                  <0.60.0>]
    messages: []
    links: [<0.32268.122>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 2586
    stack_size: 27
    reductions: 3407
  neighbours:
[ns_server:info,2014-08-21T5:17:57.344,ns_1@10.3.5.154:<0.32684.122>:diag_handler:log_all_tap_and_checkpoint_stats:125]logging tap & checkpoint stats
[error_logger:error,2014-08-21T5:17:57.342,ns_1@10.3.5.154:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
  crasher:
    initial call: ns_memcached:init/1
    pid: <0.18706.119>
    registered_name: []
    exception exit: {badmatch,{error,closed}}
      in function gen_server:init_it/6 (gen_server.erl, line 328)
    ancestors: ['single_bucket_sup-default',<0.18679.119>]
    messages: []
    links: [<0.18720.119>,<0.18726.119>,<0.18744.119>,<0.279.0>,
                  <0.18696.119>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 75113
    stack_size: 27
    reductions: 632988
  neighbours:
    neighbour: [{pid,<0.18744.119>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,['Argument__1','Argument__2']}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,['ns_memcached-default',
                              'single_bucket_sup-default',<0.18679.119>]},
                  {messages,[]},
                  {links,[<0.18706.119>,#Port<0.174708>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,6772},
                  {stack_size,23},
                  {reductions,525127}]
    neighbour: [{pid,<0.18726.119>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,['Argument__1','Argument__2']}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,['ns_memcached-default',
                              'single_bucket_sup-default',<0.18679.119>]},
                  {messages,[]},
                  {links,[<0.18706.119>,#Port<0.174706>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,28690},
                  {stack_size,23},
                  {reductions,3329}]
                  {reductions,3329}]
    neighbour: [{pid,<0.18720.119>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,['Argument__1','Argument__2']}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,['ns_memcached-default',
                              'single_bucket_sup-default',<0.18679.119>]},
                  {messages,[]},
                  {links,[<0.18706.119>,#Port<0.174713>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,46422},
                  {stack_size,23},
                  {reductions,2151481}]

Please refer attached crash_log.txt.
Core is available at 10.3.5.155:/tmp/backup_crash/21_08_2014_05_45
Uploading Logs

 Comments   
Comment by Meenakshi Goel [ 21/Aug/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12042/f9ad56ee/10.3.5.154-8212014-520-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12042/c7d38635/10.3.5.155-8212014-522-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12042/fd4f484f/10.3.2.146-8212014-524-diag.zip
Comment by Anil Kumar [ 21/Aug/14 ]
Venu please look at it and work with Sundar to verify the test case.
Comment by Sundar Sridharan [ 21/Aug/14 ]
Looks like the crash is in the warmup callback in the midst of bucket creation, due to a missing vbucket entry. Can we have the list of exact steps for manual reproduction? Thanks.
Comment by Venu Uppalapati [ 21/Aug/14 ]
The test case invokes rebalance of nodes. Investigating if there is race between bucket creation and rebalance.
Comment by Meenakshi Goel [ 22/Aug/14 ]
Steps to Reproduce:
1. Create 3 buckets default, sasl_bucket and standard_bucket
2. Rebalance-in 2 nodes
3. Load data 100 items
4. Wait for item to get persist on disk
5. Run re-balance out and Observe in parallel
6. Create View and Run query on view with stale=false

While querying on standard_bucket, the test fails with a timeout error even though there are only 100 items and the timeout is 120 seconds.
In tearDown the steps below occur, during which the crash happens after the rebalance failure (please note that, as mentioned in the description, the crash doesn't happen every time):
1. Memcached is being killed using killall -9 memcached
2. Stop Rebalance
3. Bucket deletion

Attached is the test output. test_log.txt
Comment by Sundar Sridharan [ 22/Aug/14 ]
Does not seem to reproduce with cluster_run easily.
Comment by Sundar Sridharan [ 22/Aug/14 ]
Fix ready for merge at http://review.couchbase.org/#/c/40849/. Thanks.
Comment by Sundar Sridharan [ 22/Aug/14 ]
The fix is merged. Can you please verify it? Thanks.




[MB-11973] [System test] Seeing odd messages on couchdb.logs, where startSeqNo> endSeqNo, expecting a rollback for partition X. Created: 15/Aug/14  Updated: 22/Aug/14  Resolved: 22/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Ketaki Gangal Assignee: Mike Wiederhold
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 6.4
6 Nodes, 2 Buckets, 1 ddoc X 2 Views
3.0.0-1163-rel

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
1. Load items ~ 700M, 630M on both the buckets, 10% dgm
2. During initial indexing / loading phase, I see a number of log messages which expect rollbacks on partitions due to startSeqNo > endSeqNo.

Not sure why the above should happen, given that there is no rebalance/failover/add-new-node activity.

Logs from the cluster https://s3.amazonaws.com/bugdb/11973/bug_logs.tar

 Comments   
Comment by Sriram Melkote [ 15/Aug/14 ]
Moving to critical: even though there's no correctness problem, if we did indeed roll back after indexing 65k mutations, there would be a big performance penalty.
Comment by Mike Wiederhold [ 15/Aug/14 ]
I don't know why this was assigned to me. These log messages need to be looked at on the view engine side first.
Comment by Ketaki Gangal [ 15/Aug/14 ]
These have been looked at by the view team; it looks like there is an issue with the endSeqNo that the view component receives from ep-engine, likely seqno resets.

Will wait for the view team to add more details on this, however.


Comment by Sarath Lakshman [ 18/Aug/14 ]
Siri, I am currently looking into this bug
Comment by Sarath Lakshman [ 18/Aug/14 ]
One possible problem on the view engine side is that we cache seqno stats every 300ms. I will check if this is due to the low stats-cache update frequency.
Comment by Sarath Lakshman [ 18/Aug/14 ]
Looks like it's a problem due to stats caching. Since it is a lazily updated async cache, a cache update is triggered only when the cache TTL has expired, and it returns the old cached value without waiting to fetch the latest. For replica vbuckets, the only time the cache is updated is by the every-5-seconds trigger. At each 5-second trigger it reads old seqnos from the cache; only the next cache reader sees the asynchronously updated value. But the next reader comes only after another 5 seconds, since there are no queries consuming the seqnos. So the updater is always started with seqnos that are around 5 seconds old.
Comment by Sarath Lakshman [ 18/Aug/14 ]
Just confirmed from the code that we don't use cached seqnos for the 5-second trigger. I also realized that we don't use cached seqnos for anything other than stale=update_after queries. Hence this doesn't look like a view engine problem.
Comment by Sarath Lakshman [ 18/Aug/14 ]
Looks like there is a problem in EP-Engine for replica vbuckets. I read the log files and found that EP-Engine silently seems to roll back to a seqno lower than it previously had for replica vbuckets. I do not see this problem for any active vbuckets.

Example from view engine logs:

The view engine received items from a snapshot for vbucket 386 up to seqno 518299 and wrote the data to the index.
The next time the updater tried to index the next set of items, we received high_seqno=518222 from the vbucket-seqno stats.

[couchdb:info,2014-08-15T8:50:18.745,ns_1@10.6.2.166:<0.26830.22>:couch_log:info:41]set view `saslbucket`, replica (prod) group `_design/ddoc2`: received a snapshot marker (on-disk) for partition 386 from sequence 518052 to 518299
[couchdb:info,2014-08-15T8:50:27.357,ns_1@10.6.2.166:<0.26990.22>:couch_log:info:41]dcp client (<0.1755.0>): Expecting a rollback for partition 386. Found start_seqno > end_seqno (518299 > 518222).


I see this pattern only for replica vbuckets. At least from the logs of the node that I investigated, I do not see any such pattern for any of the active vbuckets.

Is it something to do with replication in EP-Engine?
Comment by Sarath Lakshman [ 18/Aug/14 ]
I just double-checked that we grab the latest seqno stats (no caching) before starting the updater and use those seqnos as end seqnos.
Comment by Raju Suravarjjala [ 18/Aug/14 ]
Triage: Not a blocker for 3.0 RC1
Comment by Chiyoung Seo [ 18/Aug/14 ]
Mike,

It seems to me that this issue was caused by the recent change that was made in the ep-engine:

http://review.couchbase.org/#/c/40346/

The above commit adapts the replica vbucket so that it should use its closed checkpoint's end seqno as its high seqno for the UPR stream. Did you communicate the above change to the view team?
Comment by Mike Wiederhold [ 18/Aug/14 ]
I'll look into this. The changes made on the ep-engine side shouldn't have affected anyone since nothing changed externally.
Comment by Mike Wiederhold [ 20/Aug/14 ]
http://review.couchbase.org/#/c/40765/




[MB-10463] First deletion is ignored in stream request Created: 14/Mar/14  Updated: 22/Aug/14  Resolved: 14/Mar/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Volker Mische Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When a stream starts from a certain start seqno and the next item is a deletion, that deletion is ignored. Here's an example.

 - insert a: seq 1
 - insert b: seq 2
 - insert c: seq 3
 - delete a: seq 4
 - delete b: seq 5
 - insert d: seq 6
 - stream request, start seq 3

Expected:
 - delete a: seq 4
 - delete b: seq 5
 - insert d: seq 6

But it will miss the seq 4:
 - delete b: seq 5
 - insert d: seq 6

This was introduced by commit e53c6065884a9bbb26314a4b6e7e5e4459ec16ce [1][2]. When it is reverted, everything is fine.

[1]: https://github.com/couchbase/ep-engine/commit/e53c6065884a9bbb26314a4b6e7e5e4459ec16ce
[2]: http://review.couchbase.org/34413
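A minimal check for the expected behaviour could look like this (a sketch only; the event objects and their fields are hypothetical stand-ins for whatever DCP/UPR test client is used):

    def check_first_deletion_not_skipped(stream_events):
        # stream_events: events returned for a stream request with start seq 3
        # against the data set above (inserts a, b, c, d; deletes a, b).
        expected = [("deletion", "a", 4),
                    ("deletion", "b", 5),
                    ("mutation", "d", 6)]
        got = [(e.type, e.key, e.by_seqno) for e in stream_events]
        assert got == expected, "first deletion (seq 4) must not be dropped: %s" % (got,)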

 Comments   
Comment by Chiyoung Seo [ 14/Mar/14 ]
The fix is in gerrit for review:

http://review.couchbase.org/#/c/34485/
Comment by Mike Wiederhold [ 14/Mar/14 ]
Should be fixed by this change.




[MB-7483] refine definition/use of ep_max_data_size and ep_max_size Created: 03/Jan/13  Updated: 22/Aug/14  Resolved: 10/Aug/13

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Jin Lim Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This is very minor, but it appears that ep_max_data_size and ep_max_size represent basically the same value. However, ep_max_size has a value-change listener that allows its value to be changed dynamically. Either remove one of the stats if it is found to be redundant, or re-define each stat and its use.

Also, it is probably a good idea to revisit our stats.org and check whether there are any more redundant and intermingled cases like this among the other stats.

 Comments   
Comment by Mike Wiederhold [ 10/Jan/13 ]
http://review.couchbase.org/#/c/23849/
Comment by Thuan Nguyen [ 27/Mar/13 ]
Integrated in github-ep-engine-2-0 #481 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/481/])
    MB-7483: Remove duplicate stats holding bucket size (Revision 7e84fc9ce1488722e055a496382af302f655b417)
Revert "MB-7483: Remove duplicate stats holding bucket size" (Revision 2d14e9a818699e22b61e8b837280d1fafb3120bc)

     Result = SUCCESS
Mike Wiederhold :
Files :
* src/ep_engine.cc
* tests/ep_testsuite.cc
* docs/stats.org

Mike Wiederhold :
Files :
* tests/ep_testsuite.cc
* src/ep_engine.cc
* docs/stats.org
Comment by Maria McDuff (Inactive) [ 06/Apr/13 ]
Shashank, please verify. Thanks.
To verify (per Mike): telnet in and get stats. You should make sure there are no duplicate stats.
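A quick way to script that check (a sketch, assuming testrunner's mc_bin_client.MemcachedClient and its stats() call; the stat names come from the description above):

    from mc_bin_client import MemcachedClient  # testrunner's binary-protocol client

    def check_no_duplicate_quota_stat(host="127.0.0.1", port=11210):
        client = MemcachedClient(host, port)
        stats = client.stats()
        # After the fix only ep_max_size should remain; before it, both
        # stats reported the same bucket quota value.
        assert "ep_max_size" in stats
        assert "ep_max_data_size" not in stats, \
            "duplicate stat still present: %s" % stats.get("ep_max_data_size")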
Comment by Shashank Gupta [ 10/Apr/13 ]
Collected the stats for a single bucket and both are showing the same results.

 ep_max_size: 1073741824

 ep_max_data_size: 1073741824

Used the build 2.0.2-761-rel.
Comment by Deepkaran Salooja [ 10/Apr/13 ]
Looking at the Gerrit change, ep_max_data_size should no longer be there.

Mike, can you please confirm if both these stats should still be there?
Comment by Mike Wiederhold [ 10/Apr/13 ]
This change was reverted. I will fix it in 2.1.
Comment by Thuan Nguyen [ 20/Jun/13 ]
Integrated in windows32_sanity_P0 #39 (See [http://qa.hq.northscale.net/job/windows32_sanity_P0/39/])
    MB-7483: Use ep_max_size not ep_max_data_size (Revision 964807150c70ded44ef645b7de83065804effdce)

     Result = UNSTABLE
Michael Wiederhold :
Files :
* pytests/memorysanitytests.py
Comment by Thuan Nguyen [ 21/Jun/13 ]
Integrated in windows32_view_P0 #22 (See [http://qa.hq.northscale.net/job/windows32_view_P0/22/])
    MB-7483: Use ep_max_size not ep_max_data_size (Revision 964807150c70ded44ef645b7de83065804effdce)

     Result = UNSTABLE
Michael Wiederhold :
Files :
* pytests/memorysanitytests.py
Comment by Thuan Nguyen [ 21/Jun/13 ]
Integrated in windows32_rebalance-kv #33 (See [http://qa.hq.northscale.net/job/windows32_rebalance-kv/33/])
    MB-7483: Use ep_max_size not ep_max_data_size (Revision 964807150c70ded44ef645b7de83065804effdce)

     Result = UNSTABLE
Michael Wiederhold :
Files :
* pytests/memorysanitytests.py
Comment by Thuan Nguyen [ 02/Jul/13 ]
Integrated in windows_cas_P1 #66 (See [http://qa.hq.northscale.net/job/windows_cas_P1/66/])
    MB-7483: Use ep_max_size not ep_max_data_size (Revision 964807150c70ded44ef645b7de83065804effdce)

     Result = SUCCESS
Michael Wiederhold :
Files :
* pytests/memorysanitytests.py




[MB-10514] During rebalance, UPR stream gets stuck after sending a snapshot marker and does not send any further mutations for that stream. Created: 20/Mar/14  Updated: 22/Aug/14  Resolved: 25/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Sarath Lakshman Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive couchdb_logs.tar.gz     Text File couchdb_upr_client_inf_timeout.patch     GZip Archive logs.tar.gz     GZip Archive memc_logs.tar.gz     Text File ops.txt     Text File streams.txt     File upr_incoming.pcapng    
Issue Links:
Dependency
blocks MB-10490 Simple-test Rebalance failure with ba... Resolved
blocks MB-10548 Views tests failing with error "vbuck... Closed
blocks MB-10730 Rebalance exited with reason "bulk_se... Closed
Duplicate
duplicates MB-10490 Simple-test Rebalance failure with ba... Resolved
is duplicated by MB-10879 Rebalance fails sporadically on emplo... Closed
is duplicated by MB-10910 Rebalance with views after failover f... Closed
Gantt: start-finish
is triggering MB-10908 beam.smp RSS grows to 50GB during del... Closed
Relates to
relates to MB-10772 During rebalance, getting timeout for... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Related to view bug ticket, MB-10490

For views, we open a single connection and reuse that connection for all tasks, such as gathering stats and streaming mutations. We use this connection for requesting only one stream at a time, but it is simultaneously used for querying stats.

Scenario:
1. Create a Couchbase node with 1024 vbuckets and insert 10240 documents (no duplicates)
2. Create a default view and publish it
3. Create another Couchbase node with 1024 vbuckets
4. Add the second node to the cluster and rebalance

On the second node, when building the index, the view engine requests a stream for each vbucket and tries to read mutations. It is observed that after a few streams, or even on the first stream, the stream open request (seq 0 - x) succeeds with a failover log. Then, instead of receiving mutations 0 to x followed by a stream_end, I receive a snapshot_marker and that's it. No more mutations come for that stream and it gets stuck.

Please apply the attached couchdb patch (to set the Erlang UPR client timeout to infinity) before reproducing it.

Also attached are some debug logs with comments. Please refer to streams.txt for the sequence of operations and ops.txt for the responses coming from the server.

A packet trace of UPR (port 12002) from a repro run other than the one corresponding to the debug logs is also attached.


 Comments   
Comment by Sriram Melkote [ 27/Mar/14 ]
Nimish, Pratap - please wait for final confirmation from Mike that the bug has been fixed before retrying.
Comment by Nimish Gupta [ 01/Apr/14 ]
Still seeing the issue, so reopening it. Getting UPR stream timeout in the logs.

[ns_server:debug,2014-04-01T14:42:08.664,n_1@127.0.0.1:<0.20498.0>:capi_set_view_manager:do_wait_index_updated:640]Got unexpected message from ddoc monitoring. Assuming that's shutdown: {updater_err, {timeout,
                                                                         {gen_server,
                                                                          call,
                                                                          [<0.15423.0>,
                                                                           {get_stream_event,
                                                                            57377}]}}}

I have revision="042873d5e22703bfae31e17869c5721f52fb6b3e" in ep-engine.
Comment by Ketaki Gangal [ 02/Apr/14 ]
Seeing failures with latest build simple-test view-rebalance - 3.0.0-537-rel

http://qa.sc.couchbase.com/view/3.0.0/job/centos_x64--00_01--simple-test-P0/204/consoleFull
Comment by Mike Wiederhold [ 07/Apr/14 ]
http://review.couchbase.org/#/c/35243/
Comment by Sarath Lakshman [ 09/Apr/14 ]
This problem still exists. get_stream_event timeout error messages are logged in the couchdb logs if you run the above repro steps.
Volker just mentioned a potential fix, http://review.couchbase.org/#/c/35486/, which seems to fix the problem.

Please close this bug once the fix is merged.
Comment by Sarath Lakshman [ 09/Apr/14 ]
With http://review.couchbase.org/#/c/35486/, it doesn't happen with the steps mentioned in the bug description. But I can see it happening on the third node with the following test.

NODES=4 TEST=rebalance.rebalancein.RebalanceInTests.incremental_rebalance_in_with_queries,blob_generator=False,items=2000,is_dev_ddoc=False,max_verify=2000 make any-test.

I am using tap replication (COUCHBASE_REPL_TYPE=tap).


Saraths-MacBook-Pro:couchbase sarath$ cd ns_server/logs/
Saraths-MacBook-Pro:logs sarath$ grep get_stream_ -R .
./n_3/couchdb.1:error: {timeout,{gen_server,call,[<0.977.0>,{get_stream_event,957}]}}
./n_3/couchdb.1: {get_stream_event,

Currently we have a 5-second timeout. Could that be a legitimate amount of time for backfilling from ep-engine? I can try increasing it.
Comment by Sriram Melkote [ 10/Apr/14 ]
Mike will back this out and add it back to ensure it is not causing other regressions. Sarath will increase the timeout to give ep-engine more time. Also, with UPR replication, this is not occurring.
Comment by Mike Wiederhold [ 11/Apr/14 ]
From further testing it looks like my fix doesn't cause any significant issues, but it does trade one set of sporadic failures for another set. When this change is merged I see two problems.

1. Not all items are streamed to view engine (MB-10846 Mike to look at this since it is probably an ep-engine issue)
2. The "active nor passive" partition issue in the view engine (MB-10815, which is assigned to Sarath)

I will continue to look at the ep-engine issue and we can figure out whether or not we want to merge my change before fixing MB-10815 when I'm done fixing the ep-engine problems.
Comment by Maria McDuff (Inactive) [ 11/Apr/14 ]
Mike,

Why is this assigned to Tommie?
Is this resolved and ready for QE to test? Please confirm.
Comment by Mike Wiederhold [ 11/Apr/14 ]
Tommie assigned it to himself. I do not know.
Comment by Sriram Melkote [ 14/Apr/14 ]
Tommie, let's wait for confirmation from Mike that all planned changes with respect to stream continuity are merged before taking the bug to verification step
Comment by Tommie McAfee [ 14/Apr/14 ]
Sure, may have done this by mistake.
Comment by Sarath Lakshman [ 14/Apr/14 ]
Mike,

MB-10815 seems to be a problem around ns_server interaction. So you can ignore that problem.
Comment by Mike Wiederhold [ 14/Apr/14 ]
The ep-engine side fix for this is here: http://review.couchbase.org/35708

We will wait until ns_server and view engine have fixes for the remaining problems ready before merging this.
Comment by Sarath Lakshman [ 15/Apr/14 ]
It is happening with this patch as well with TAP replication (I haven't tried UPR replication).
Please see the attached logs.tar.gz

Following is the config that I am using:
diff --git a/scripts/start_cluster_and_run_tests.sh b/scripts/start_cluster_and_run_tests.sh
index ec573dc..4867ee7 100755
--- a/scripts/start_cluster_and_run_tests.sh
+++ b/scripts/start_cluster_and_run_tests.sh
@@ -72,7 +72,8 @@ else
    make dataclean
    make
 fi
-COUCHBASE_NUM_VBUCKETS=64 python ./cluster_run --nodes=$servers_count &> $wd/cluster_run.log &
+
+COUCHBASE_REPL_TYPE=tap COUCHBASE_NUM_VBUCKETS=1024 python ./cluster_run --nodes=$servers_count --loglevel=info &> $wd/cluster_run.log &
 pid=$!
 popd
 python ./testrunner $conf -i $ini $test_params 2>&1 -p makefile=True | tee make_test.log


Test:
NODES=3 TEST=rebalance.rebalancein.RebalanceInTests.incremental_rebalance_in_with_queries,blob_generator=False,items=2000,is_dev_ddoc=False,max_verify=2000,get-logs=True,get-cbcollect-info=True make any-test

You may get hit by other ns_server-related exceptions; in that case you'll have to try your luck in the next run :)
Comment by Mike Wiederhold [ 15/Apr/14 ]
There are no memcached logs in that tar file. Unfortunately they are in a different location. I'll look at this issue once we get some of the other things merged since I think I saw this happen very sporadically and it does not affect a make simple-test test case at the moment.
Comment by Sarath Lakshman [ 15/Apr/14 ]
Sorry that I missed the memcached logs.
Attaching logs from a different test run.
Comment by Mike Wiederhold [ 15/Apr/14 ]

I merged two changes to fix this:

http://review.couchbase.org/#/c/35708/
http://review.couchbase.org/#/c/35748/

Please reopen if you see this problem again.
Comment by Mike Wiederhold [ 16/Apr/14 ]
Appears to still be some sporadic issue. I'll look into it tomorrow.
Comment by Sarath Lakshman [ 18/Apr/14 ]
Now, with UPR replication it works fine. TAP replication still has a problem.
Comment by Mike Wiederhold [ 25/Apr/14 ]
The fix was merged. Do not reopen this bug for any reason. If you see another stuck-connection issue, file a separate bug.




[MB-11276] ep-engine stops sending mutations through UPR Created: 02/Jun/14  Updated: 22/Aug/14  Resolved: 02/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Volker Mische Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File ep-engine-hang.conf     File stream_request_hang.py    
Issue Links:
Dependency
blocks MB-11105 View indexing gets stuck during rebal... Resolved
blocks MB-11251 host not reachable in viewmerge test Closed
Duplicate
duplicates MB-11258 UPR streams getting stuck Closed
Gantt: start-finish
is triggering MB-11289 XDCR: Items stuck in replication queu... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Sometimes ep-engine doesn't send mutations and just hangs on a stream request.

Here's how to reproduce the issue:

1. Run a single node cluster with 4 vBuckets: COUCHBASE_NUM_VBUCKETS=4 ./cluster_run -n 1
2. Start testrunner with the attached ep-engine-hang.conf config file: ./testrunner -i b/resources/dev-single-node.ini -c conf/ep-engine-hang.conf
3. Wait until the test hangs. The last line in the testrunner output you would see should be something like: [2014-06-02 17:29:06,901] - [rest_client:481] INFO - index query url: http://127.0.0.1:9500/default/_design/test2/_view/redview_stats?stale=false&on_error=stop
4. Watch the ns_server output for something like: [couchdb:error,2014-06-02T17:30:06.906,n_0@127.0.0.1:<0.943.0>:couch_log:error:42]upr client (<0.943.0>): Obtaining mutation from server timed out after 60.0 seconds [RequestId 28 PartId 0]. Waiting..
  You could also wait for >60 secs and grep the log files for 'upr client'
5. Now kill the testrunner test, keep the cluster running
6. Run the attached stream_request_hang.py script that uses pyupr to make a stream request. The script will hang although it should return a mutation.

 Comments   
Comment by Tommie McAfee [ 02/Jun/14 ]
I suspect relation to MB-11255

vb_uuid = 0 and most likely mutations are persisted by the time this request is made.
Comment by Meenakshi Goel [ 02/Jun/14 ]
Not sure if related to https://www.couchbase.com/issues/browse/MB-11258?
Comment by Volker Mische [ 02/Jun/14 ]
It looks like MB-11276 and MB-11258 are duplicates; I'll still leave both open, as it's good to have two ways to reproduce the issue.
Comment by Chiyoung Seo [ 02/Jun/14 ]
All,

Please do not create multiple tickets for the same issue. It disturbs the engine team a lot if we keep creating the same ticket over and over. If you're sure that it is the same issue, please close the ticket as duplicate.
Comment by Ketaki Gangal [ 02/Jun/14 ]
Most QE view tests hit this issue, seeing multiple server timeouts for the UPR client [specifically the make-simple tests on any recent builds].
Comment by Sundar Sridharan [ 02/Jun/14 ]
upragg stats show two items are stuck remaining to be sent...
:total:items_remaining: 2
but in upr stats none of the streams show any items ready
stream_0_items_ready: false


Comment by Mike Wiederhold [ 02/Jun/14 ]
http://review.couchbase.org/#/c/37739/
Comment by Ketaki Gangal [ 03/Jun/14 ]
Current set of tests run okay with the above fix, build 3.0.0-767.
http://qa.sc.couchbase.com/job/centos_x64--00_01--qe-sanity-P0/472/




[MB-12054] [windows] [2.5.1] cluster hang when flush beer-sample bucket Created: 22/Aug/14  Updated: 22/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Thuan Nguyen Assignee: Raju Suravarjjala
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows server 2008 R2

Attachments: Zip Archive 172.23.107.124-8222014-1546-diag.zip     Zip Archive 172.23.107.125-8222014-1547-diag.zip     Zip Archive 172.23.107.126-8222014-1548-diag.zip     Zip Archive 172.23.107.127-8222014-1549-diag.zip    
Triage: Triaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Install Couchbase Server 2.5.1 on 4 Windows Server 2008 R2 64-bit nodes
Create a cluster of 4 nodes
Create the beer-sample bucket
Enable flush in the bucket settings.
Flush the beer-sample bucket. The cluster became hung.




[MB-11971] [make-simple] Sporadic failures on make-simple-upr/tap on warmup-tests Created: 15/Aug/14  Updated: 22/Aug/14  Resolved: 19/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Ketaki Gangal Assignee: Abhinav Dangeti
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Yes

 Description   
http://factory.couchbase.com/job/make-simple-github-upr/758/console

The warmup tests fail with a KeyError while retrieving stats.

Abhinav, could you take a look?

This particular test has started failing intermittently since Aug 14.

 Comments   
Comment by Abhinav Dangeti [ 15/Aug/14 ]
Ketaki, would it be possible to get the logs from this test?
I'm not able to reproduce.
Comment by Chris Hillery [ 15/Aug/14 ]
You can see the output of the failed tests on the Jenkins jobs:

http://factory.couchbase.com/job/make-simple-github-tap/
http://factory.couchbase.com/job/make-simple-github-upr/

If you want server logs, that's a bit harder since the job re-runs on the same workspace every time. I could probably extend the job to save the logs if you tell me exactly what commands would achieve that...
Comment by Abhinav Dangeti [ 15/Aug/14 ]
Test issue: you're trying to get ep_warmup_time while ep_warmup_thread is still running.

Here's the proof:
2014-08-15 13:04:35 | INFO | MainProcess | test_thread | [memcapable._do_warmup] ep_warmup_thread directly after kill_memcached: running
2014-08-15 13:04:35 | ERROR | MainProcess | test_thread | [memcapable._do_warmup] 'ep_warmup_time' was not found in stats:{'ep_item_eviction_policy': 'value_only', 'ep_backfill_mem_threshold': '95', 'ep_dbname': '/root/couchbase30/ns_server/data/n_0/data/default', 'cmd_set': '0', 'vb_replica_queue_memory': '0', 'vb_replica_queue_pending': '0', 'ep_tap_throttle_cap_pcnt': '10', 'libevent': '1.4.13-stable', 'rejected_conns': '0', 'ep_vbucket_del': '0', 'ep_chk_remover_stime': '5', 'ep_dcp_enable_noop': '1', 'ep_tap_bg_fetched': '0', 'ep_meta_data_disk': '0', 'ep_exp_pager_stime': '3600', 'ep_blob_overhead': '621593', 'ep_oom_errors': '0', 'ep_num_value_ejects': '0', 'ep_tap_throttle_threshold': '90', 'cas_hits': '0', 'bytes_written': '26688', 'ep_vb_total': '64', 'ep_diskqueue_drain': '0', 'listen_disabled_num': '0', 'ep_chk_period': '5', 'ep_tap_ack_window_size': '10', 'ep_tap_backoff_period': '5', 'vb_pending_queue_drain': '0', 'ep_max_item_size': '20971520', 'vb_pending_ht_memory': '0', 'ep_warmup_thread': 'running' ....

As you can see in the stats, ep_warmup_time is not present while ep_warmup_thread is still reported as running, which is expected.

Here's your test runner fix:
http://review.couchbase.org/#/c/40661
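The gist of the fix is to wait for warmup to finish before asking for ep_warmup_time; roughly (a sketch with a hypothetical stats callable, not the actual review above):

    import time

    def wait_for_warmup_time(get_warmup_stats, timeout=60, poll_interval=0.5):
        # get_warmup_stats: hypothetical callable returning the warmup stats dict.
        # ep_warmup_time only shows up once ep_warmup_thread reports "complete",
        # so poll for that state instead of reading the stat immediately.
        deadline = time.time() + timeout
        while time.time() < deadline:
            stats = get_warmup_stats()
            if stats.get("ep_warmup_thread") == "complete":
                return int(stats["ep_warmup_time"])
            time.sleep(poll_interval)
        raise AssertionError("warmup did not complete within %s seconds" % timeout)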




[MB-10509] ep-engine mangles last entry in failover log on unclean shutdown even when everything is safely on disk Created: 19/Mar/14  Updated: 22/Aug/14  Resolved: 27/Mar/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Venu Uppalapati
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This is what I get after saving a number of docs into vbucket other and successfully waiting for them to hit disk.

Before shutdown:

root@beta:~/src/altoros/moxi/repo30# rlwrap ../send-diag-eval lh:9000
xdcr_upr_streamer:get_failover_log("other", 7).
RestClient.post "lh:9000/diag/eval", "xdcr_upr_streamer:get_failover_log(\"other\", 7).\n", "Accept"=>"*/*; q=0.5, application/xml", "Accept-Encoding"=>"gzip, deflate", "Content-Length"=>"48"
# => 200 OK | 76 bytes:[{13226830787664969728,0},{2722187756529715712,48},{4817340327337429639,96}]


[{13226830787664969728,0},{2722187756529715712,48},{4817340327337429639,96}]


After Ctrl-C (unclean shutdown) and restart:

root@beta:~/src/altoros/moxi/repo30# rlwrap ../send-diag-eval lh:9000
xdcr_upr_streamer:get_failover_log("other", 7).
RestClient.post "lh:9000/diag/eval", "xdcr_upr_streamer:get_failover_log(\"other\", 7).\n", "Accept"=>"*/*; q=0.5, application/xml", "Accept-Encoding"=>"gzip, deflate", "Content-Length"=>"48"
# => 200 OK | 76 bytes:[{13226830787664969728,0},{2722187756529715712,48},{4817340327337430016,96}]


[{13226830787664969728,0},{2722187756529715712,48},{4817340327337430016,96}]


Note how the last entry's UUID has changed.

Currently that's causing my work-in-progress XDCR checkpointing to fail to resume from an XDCR checkpoint without good reason.

 Comments   
Comment by Abhinav Dangeti [ 26/Mar/14 ]
http://review.couchbase.org/#/c/34971




[MB-9131] ep_engine pollutes log files on bucket creation Created: 16/Sep/13  Updated: 22/Aug/14  Resolved: 15/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.2.0
Fix Version/s: 3.0, 3.0-Beta
Security Level: Public

Type: Bug Priority: Minor
Reporter: Artem Stemkovski Assignee: Venu Uppalapati
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
With these messages

Mon Sep 16 16:17:05.462500 PDT 3: (test) Warning: couchstore_open_db failed, name=/Users/artem/Work/couchbase/ns_server/data/n_1/data/test/975.couch.1 option=2 rev=1 error=no such file [none]
Mon Sep 16 16:17:05.462510 PDT 3: (test) Warning: failed to open database file, name=/Users/artem/Work/couchbase/ns_server/data/n_1/data/test/975.couch.1

This happens because during the warmup stage the vbucket files are not created yet and the following method is invoked: CouchKVStore::listPersistedVbuckets

There are a couple of strange things I see in this method:
1. dbFileRevMapPopulated is always false. It can never become true.
2. discoverDbFiles checks which vbucket files actually exist, but the following code ignores this information and blindly runs through all possible ids, trying to open the db even for nonexistent files, which results in bogus error messages in the log.

 Comments   
Comment by Anil Kumar [ 29/Jul/14 ]
Triage : Anil, Wayne .. July 29th
Comment by David Liao [ 30/Jul/14 ]
fix in review:
http://review.couchbase.org/#/c/40078/
Comment by Chiyoung Seo [ 01/Aug/14 ]
Merged.




[MB-9950] test set meta conflict resolution fails sporadically on centos Created: 17/Jan/14  Updated: 22/Aug/14  Resolved: 06/Feb/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Venu Uppalapati
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Centos 64-bit

 Description   
Running [0209/0253]: test set meta conflict resolution (couchstore).../home/jenkins/couchbase/cmake/ep-engine/tests/ep_testsuite.cc:540 Test failed: `Expected to be able to store with meta' (h1->unknown_command(h, NULL, pkt, add_response) == ENGINE_SUCCESS)
 DIED

 Comments   
Comment by Chiyoung Seo [ 03/Feb/14 ]
http://review.couchbase.org/#/c/33158/




[MB-10014] ep-engine crashed due to Assertion `bySeqno != 0' failed Created: 26/Jan/14  Updated: 22/Aug/14  Resolved: 06/Feb/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Andrei Baranouski Assignee: Venu Uppalapati
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [manifests] $ git rev-parse HEAD
Added a project: cmake/cbsasl at revision: 5c7f8470a4fdab529e6366bc5f64264544000dd4
Added a project: cmake/couchbase-cli at revision: 2d3b8a2c6c6c5cc47ef09c2ea5d1b382b00245a8
Added a project: cmake/couchbase-examples at revision: c04d52124a29c51eb8d32bedebf0057c23063666
Added a project: cmake/couchbase-python-client at revision: 536cfb9fd7b4372ac2f6316f1e3939c0557b72a5
Added a project: couchdb at revision: 29c10e1f41012ead660e768d23a522cdf6e2dc3d
Added a project: couchdbx-app at revision: 817d29fff823115a58f506b984ec9cea10a12ad9
Added a project: cmake/couchstore at revision: daea5fa0e2216ab6ae2122155d80845f125bdef7
Added a project: cmake/ep-engine at revision: 05c3b79a1fd020fecbaa36ba5cf62561b165125c
Added a project: geocouch at revision: a8ba56cd6382d3323278d04ef4a7fad609981a15
Added a project: cmake/healthchecker at revision: 76ec10d05d630d34321d47d78262e71e31256af5
Added a project: cmake/libconflate at revision: 288e0c695f381ece9f1525ca32d7809f30aba1a5
Added a project: cmake/libmemcached at revision: 77ed12c9f2b16940bb27315e130c6764aa9cdb2a
Added a project: cmake/libvbucket at revision: 8d2c92d5ed7c50aa957dbe1a56e42141196d39cd
Added a project: cmake/memcached at revision: ce9a18d02b56904285dc0d7db42272b95e071be3
Added a project: cmake/moxi at revision: c53cc2bc6071b6ef57790a087b6f482eeef3e5df
Added a project: ns_server at revision: 28d0ea57ee57b5bb4a214e3f7badf414a6ba744c
Added a project: cmake/platform at revision: efb25dad82f1a8f53e27bc4cb07007cfc60ffe07
Added a project: cmake/portsigar at revision: cf60238e2a511999ac42721ce3e2d0b490777f34
Added a project: cmake/sigar at revision: 3b4d083633c85a33e3672be34b9681706459ce37
Added a project: testrunner at revision: 8e22b89705223b8ff05703d92f390c0b23895b15
Added a project: tlm at revision: 690e108e95763a0ad0670c0bee052b765e23109a
Manifest at revision: 8a12952962c3a45e53744f1293486130ec937508
[testrunner-gerrit-master] $ /bin/bash /tmp/hudson8653493837990870449.sh

Triage: Untriaged
Operating System: Centos 64-bit

 Description   
http://factory.couchbase.com/job/testrunner-gerrit-master/226/consoleFull

http://factory.couchbase.com/job/testrunner-gerrit-master/226/artifact/testrunner/cluster_run.log

in our test's output we see
Traceback (most recent call last):
  File "pytests/rebalance/rebalancein.py", line 47, in rebalance_in_with_ops
    task.result()
  File "lib/tasks/future.py", line 160, in result
    return self.__get_result()
  File "lib/tasks/future.py", line 112, in __get_result
    raise self._exception
MemcachedError: Memcached error #7 'Not my vbucket': Connection reset with error: [Errno 32] Broken pipe

but in fact, an Erlang crash occurs on one of the nodes


[error_logger:error,2014-01-26T7:32:06.022,n_0@10.3.2.199:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: erlang:apply/2
    pid: <0.614.0>
    registered_name: []
    exception error: no match of right hand side value {error,closed}
      in function mc_client_binary:cmd_binary_vocal_recv/5 (src/mc_client_binary.erl, line 123)
      in call from mc_client_binary:set_vbucket/3 (src/mc_client_binary.erl, line 336)
      in call from ns_memcached:do_handle_call/3 (src/ns_memcached.erl, line 526)
      in call from ns_memcached:worker_loop/3 (src/ns_memcached.erl, line 189)
    ancestors: ['ns_memcached-default','single_bucket_sup-default',
                  <0.578.0>]
    messages: []
    links: [<0.595.0>,#Port<0.10095>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 75025
    stack_size: 24
    reductions: 5768556
  neighbours:

[user:info,2014-01-26T7:32:06.023,n_0@10.3.2.199:ns_memcached-default<0.595.0>:ns_memcached:terminate:752]Control connection to memcached on 'n_0@10.3.2.199' disconnected: {badmatch,
                                                                   {error,
                                                                    closed}}
[error_logger:error,2014-01-26T7:32:06.024,n_0@10.3.2.199:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: erlang:apply/2
    pid: <0.611.0>
    registered_name: []
    exception error: no match of right hand side value {error,closed}
      in function mc_binary:quick_stats_recv/3 (src/mc_binary.erl, line 53)
      in call from mc_binary:quick_stats_loop/5 (src/mc_binary.erl, line 153)
      in call from mc_binary:quick_stats/5 (src/mc_binary.erl, line 138)
      in call from ns_memcached:do_handle_call/3 (src/ns_memcached.erl, line 535)
      in call from ns_memcached:worker_loop/3 (src/ns_memcached.erl, line 189)
    ancestors: ['ns_memcached-default','single_bucket_sup-default',
                  <0.578.0>]
    messages: []
    links: [<0.595.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 2584
    stack_size: 24
    reductions: 2070
  neighbours:


 Comments   
Comment by Aleksey Kondratenko [ 26/Jan/14 ]
Without logs there's not much I can do. But a socket being closed like that is usually caused by memcached dying, which will be visible in the Logs section of the UI, for example.
Comment by Andrei Baranouski [ 27/Jan/14 ]
Hi Alk,
are the above-mentioned logs useless to you?
http://factory.couchbase.com/job/testrunner-gerrit-master/226/artifact/testrunner/cluster_run.log

If they aren't enough, we'll have to ask Phil to attach ns_server/logs as buildbot artifacts.

Comment by Aleksey Kondratenko [ 27/Jan/14 ]
Hi Andrei. First, why are you using cluster_run at all rather than our normal packages?

I was not aware of that cluster_run.log. It looks like the stderr of cluster_run, so it's quite useful. But it's _less_ useful than /diag, which is itself a bit less useful than cbcollectinfo.

So I'd prefer cbcollectinfo. When that's impossible, I need /diag. And _only_ if that is also impossible am I OK dealing with other logs.
Comment by Aleksey Kondratenko [ 27/Jan/14 ]
By looking at cluster_run.log I saw this just before connection was found to be closed:

memcached<0.86.0>: memcached: /home/jenkins/jenkins/workspace/testrunner-gerrit-master/cmake/ep-engine/src/item.h:282: Item::Item(const void*, size_t, size_t, uint32_t, time_t, uint8_t*, uint8_t, uint64_t, int64_t, uint16_t): Assertion `bySeqno != 0' failed.


So indeed it's caused by memcached crash.
Comment by Chiyoung Seo [ 27/Jan/14 ]
Seems like a regression from the data field support.
Comment by Abhinav Dangeti [ 03/Feb/14 ]
http://review.couchbase.org/#/c/33147/




[MB-9736] Check to make sure the opaque is valid before a producer sends a message Created: 12/Dec/13  Updated: 22/Aug/14  Resolved: 18/Jan/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-9892] When doing an add stream we need to add the latest failover log entry to the request Created: 10/Jan/14  Updated: 22/Aug/14  Resolved: 30/Jan/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
See the following lines.

uint64_t vbucket_uuid = 0; // Why UUID is set to zero?
uint64_t high_seqno = start_seqno;
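
For illustration only, a minimal Python sketch of the intended behaviour: take the UUID from the latest failover log entry rather than hard-coding zero. The helper below is hypothetical (not ep-engine code) and assumes the newest entry is listed first, as in the failover_log output shown under MB-10967 later in this document.

def latest_failover_entry(failover_log):
    # failover_log is a list of (vbucket_uuid, seqno) pairs, newest first (assumed)
    if not failover_log:
        return (0, 0)          # no history known yet
    return failover_log[0]     # newest entry

# Example with a failover log like the one in MB-10967:
log = [(68675663887800, 1222), (55655994587054, 0)]
vbucket_uuid, entry_seqno = latest_failover_entry(log)
print(vbucket_uuid, entry_seqno)   # 68675663887800 1222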




[MB-9921] UPR stream doesn't return value Created: 15/Jan/14  Updated: 22/Aug/14  Resolved: 18/Jan/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Volker Mische Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: upr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
When doing a Stream Request in UPR, the result doesn't contain any values; it's just empty.

 Comments   
Comment by Volker Mische [ 15/Jan/14 ]
A failing test is available here: https://github.com/mikewied/pyupr/pull/10
Comment by Chiyoung Seo [ 15/Jan/14 ]
Mike,

Please do the initial investigation on this issue.

Chiyoung
Comment by Volker Mische [ 16/Jan/14 ]
http://review.couchbase.org/32496 fixes the issue.




[MB-10351] UPR: stream_close returns success even after stream_end is received by the consumer Created: 04/Mar/14  Updated: 22/Aug/14  Resolved: 27/Mar/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Sarath Lakshman Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: ep-engine, upr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive stream_close.pcap.zip    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
If a stream for a vbucket is requested for a range of seqnos (x to y), and the consumer has received documents for all of the requested sequence numbers as well as the stream_end message, then a subsequent stream_close request should make ep-engine return stream-not-found rather than success.

Steps to reproduce:
1. Load beer sample
2. Request vbucket 5 stream for seq number 0 to 3 and read all mutations up to stream_end.
3. Request stream_close for vbucket 5 and you will receive stream_close OK.

Please find the packet trace attached (memcached port 12000)
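
A minimal Python sketch of the check the steps above describe, written against a hypothetical UPR client; the method names and status values are assumptions for illustration, not the pyupr API:

STREAM_END_OPCODE = 0x55   # assumed opcode of the stream_end message
SUCCESS = 0x00

def close_after_stream_end(client, vbucket=5, start_seqno=0, end_seqno=3):
    # Step 2: drain the short stream up to and including stream_end.
    stream = client.stream_request(vbucket, start_seqno, end_seqno)
    while True:
        msg = stream.next_message()
        if msg["opcode"] == STREAM_END_OPCODE:
            break
    # Step 3: closing the already-ended stream should not report plain success.
    response = client.close_stream(vbucket)
    assert response["status"] != SUCCESS, \
        "close_stream on an already-ended stream returned success"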


 Comments   
Comment by Abhinav Dangeti [ 26/Mar/14 ]
http://review.couchbase.org/#/c/34915/




[MB-10644] upr rebalance occasionally causes an item count mismatch Created: 25/Mar/14  Updated: 22/Aug/14  Resolved: 27/Mar/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Mike Wiederhold Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown




[MB-10772] During rebalance, getting timeout for the UPR stream. Created: 07/Apr/14  Updated: 22/Aug/14  Resolved: 08/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Nimish Gupta Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File couchdb.4    
Issue Links:
Relates to
relates to MB-10514 During rebalance, UPR stream gets stu... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Reproduction Steps:

1. Start the couchbase server with 1024 vbuckets.
2. Please run the following command in the testrunner directory:

./testrunner -i b/resources/dev-4-nodes.ini get-delays=True,get-cbcollect-info=True -t view.viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in,num_nodes_to_add=1,skip_rebalance=true,docs-per-day=1

3. In the server logs, we see that a timeout happens for the streams:

 [ns_server:debug,2014-04-07T12:07:55.919,n_1@127.0.0.1:<0.4618.0>:capi_set_view_manager:do_wait_index_updated:636]Got unexpected message from ddoc monitoring. Assuming that's shutdown: {updater_error,
                                                                        {timeout, {gen_server,
                                                                          call, [<0.4356.0>,
                                                                           {get_stream_event,
                                                                            141}]}}}

This timeout is happening for the upr stream.

This is related to http://www.couchbase.com/issues/browse/MB-10514.

 Comments   
Comment by Mike Wiederhold [ 07/Apr/14 ]
Volker,

Can you provide some information on what the exact problem is here? For example, what is the timeout and why exactly is it occurring? What is it waiting for? I cannot really say that this is an ep-engine issue without knowing what the view engine is trying to do.
Comment by Mike Wiederhold [ 07/Apr/14 ]
Also, Nimish, please upload logs for this issue and also let me know what build this was run against.
Comment by Nimish Gupta [ 08/Apr/14 ]
Adding the logs. The last ep-engine commit ID is fa8295f624132676890821ff5f48ab882c92f8b8.
Comment by Mike Wiederhold [ 08/Apr/14 ]
http://review.couchbase.org/#/c/35441/
Comment by Mike Wiederhold [ 08/Apr/14 ]
The fix was merged.




[MB-10967] recvd snapshot twice during failure scenario Created: 25/Apr/14  Updated: 22/Aug/14  Resolved: 20/May/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Tommie McAfee Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: upr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive ns_logs.tar.gz     File repro_script.tar    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I wrote a script to perform the failure scenario described here: https://github.com/couchbaselabs/cbupr/blob/master/failure-scenarios.md

The gist of it: https://gist.github.com/tahmmee/11301024 (also attached)

At the end there is an attempt to stream 10k items.
However, what I'm observing is that a snapshot is sent and, after it, the mutations are repeated starting at seqno 1.


Here is the stream response ->

{'status': 0, 'body': '', 'opcode': 80}
{'status': 0, 'opcode': 83, 'failover_log': [(68675663887800, 1222), (55655994587054, 0)]}
{'value': 'val', 'lock_time': 0, 'opcode': 87, 'expiration': 0, 'key': 'key17', 'flags': 0, 'nru': 2, 'vbucket': 0, 'by_seqno': 1, 'rev_seqno': 1}
...

{'value': 'val', 'lock_time': 0, 'opcode': 87, 'expiration': 0, 'key': 'key9976', 'flags': 0, 'nru': 2, 'vbucket': 0, 'by_seqno': 1224, 'rev_seqno': 1}
{'value': 'val', 'lock_time': 0, 'opcode': 87, 'expiration': 0, 'key': 'key9982', 'flags': 0, 'nru': 2, 'vbucket': 0, 'by_seqno': 1225, 'rev_seqno': 1}

...
{'vbucket': 0, 'opcode': 86}
{'value': 'val', 'lock_time': 0, 'opcode': 87, 'expiration': 0, 'key': 'key17', 'flags': 0, 'nru': 2, 'vbucket': 0, 'by_seqno': 1, 'rev_seqno': 1}
{'value': 'val', 'lock_time': 0, 'opcode': 87, 'expiration': 0, 'key': 'key21', 'flags': 0, 'nru': 2, 'vbucket': 0, 'by_seqno': 2, 'rev_seqno': 1}
{'value': 'val', 'lock_time': 0, 'opcode': 87, 'expiration': 0, 'key': 'key24', 'flags': 0, 'nru': 2, 'vbucket': 0, 'by_seqno': 3, 'rev_seqno': 1}
...


The last seqno sent is 1225, and the logs show this is where backfill completed:

[ns_server:info,2014-04-25T15:53:54.574,babysitter_of_n_1@127.0.0.1:<0.85.0>:ns_port_server:log:169]memcached<0.85.0>: Fri Apr 25 15:53:54.372982 EDT 3: (default) Scheduling backfill for vb 0 (0 to 1226)
memcached<0.85.0>: Fri Apr 25 15:53:54.373173 EDT 3: (default) UPR (Producer) eq_uprq:failuerscenario1 - Stream created for vbucket 0
memcached<0.85.0>: Fri Apr 25 15:53:54.377299 EDT 3: (default) UPR (Producer) eq_uprq:failuerscenario1 - Backfill complete for vb 0, last seqno read: 1225



I've attached the script with its deps; it can be unpacked and reproduced in cluster_run:
./cluster_run -n4
 python uprfailurescenario.py
...
AssertionError: ERROR: Out of order response on vbucket 0: {'value': 'val', 'lock_time': 0, 'opcode': 87, 'expiration': 0, 'key': 'key5', 'flags': 0, 'nru': 2, 'vbucket': 0, 'by_seqno': 1, 'rev_seqno': 1}
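
For reference, a simplified Python sketch of the ordering check that raises the error above (the real check lives in the attached uprfailurescenario.py; the dict fields match the decoded mutations printed earlier):

def check_stream_order(mutations):
    last_seqno = 0
    for msg in mutations:                 # each msg carries 'vbucket' and 'by_seqno'
        if msg["by_seqno"] <= last_seqno:
            raise AssertionError(
                "ERROR: Out of order response on vbucket %d: %r"
                % (msg["vbucket"], msg))
        last_seqno = msg["by_seqno"]

# In the failing run the stream restarts at by_seqno 1 after reaching 1225,
# so the check trips on the first repeated mutation.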









 Comments   
Comment by Tommie McAfee [ 25/Apr/14 ]
Oops, what's happening is that my end_seqno is greater than high_seqno and I'm getting the snapshot back twice, which I think is due to the UPR client I am using. While debugging I see no real evidence that the server itself is actually sending the same seqno.

Comment by Tommie McAfee [ 25/Apr/14 ]
Update: this only happens when there are entries in the failover table. (In my last comment, when I closed this, I hadn't done the node crash.)

The entire snapshot is definitely being sent twice when end_seqno > high_seqno after this failure scenario.

I noticed that, with lastSentSeqno always less than end_seqno, the producer transitioned from backfillPhase() into memoryPhase() attempting to get more mutations, but a call to nextCheckpointItem() reset lastSentSeqno to 1. So mutations are sent again until the in-memory phase reaches end_seqno_.

Not sure if this is by design, although it may confuse some clients or cause them to be inefficient when recovering mutations.

-Tommie
Comment by Chiyoung Seo [ 15/May/14 ]
Sriram,

Can you please work with Tommie to see if this issue still exists or not?
Comment by Sriram Ganesan [ 15/May/14 ]
I am trying to run the script that was uploaded and I am getting the following error:

{'status': 4, 'err_msg': 'Invalid arguments', 'opcode': 83}
Traceback (most recent call last):
  File "uprfailurescenario.py", line 74, in <module>
    stream_mutations()
  File "uprfailurescenario.py", line 64, in stream_mutations
    while stream.has_message():
AttributeError: 'NoneType' object has no attribute 'has_message'

It looks like the stream request failed with invalid arguments. I am running with the latest repo, though. Maybe the script needs to be updated?
Comment by Sriram Ganesan [ 20/May/14 ]
Tommie

I updated the uprfailurescenario script and ran it; as of the following ep-engine commits, the snapshot isn't being sent twice and I don't see the AssertionError anymore. Please update the ticket if you are still seeing this issue.

commit 489727929a819e58f234db82fdf7074fdd78ab5a
Author: Mike Wiederhold <mike@couchbase.com>
Date: Tue May 20 16:40:41 2014 -0700

    Fix build breakage due to missing function definition
    
    Change-Id: If0448fd7a60bec99e591b7de85173f2a641540cf
    Reviewed-on: http://review.couchbase.org/37359
    Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
    Tested-by: abhinav dangeti <abhinav@couchbase.com>

commit f0dc93487dc5a42f69c45697671301681c845a6b
Author: Mike Wiederhold <mike@couchbase.com>
Date: Tue May 20 15:30:29 2014 -0700

    Set the current byseqno in the checkpoint manager for backfilled items
    
    If we don't do this then it can cause a sequence number to be re-used
    once new items arrive after a backfill.
    
    Change-Id: I2bdcde7bf64c4280ea1457e7ab2c6572d07132c0
    Reviewed-on: http://review.couchbase.org/37350
    Reviewed-by: Sriram Ganesan <sriram@couchbase.com>
    Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
    Reviewed-by: abhinav dangeti <abhinav@couchbase.com>
    Tested-by: Chiyoung Seo <chiyoung@couchbase.com>







[MB-10928] Upgrade to UPR fails because add_stream command returns rollback response Created: 22/Apr/14  Updated: 22/Aug/14  Resolved: 04/May/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aliaksey Artamonau Assignee: Thuan Nguyen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File debug.1    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
[ns_server:debug,2014-04-22T13:53:41.410,n_2@127.0.0.1:upr_consumer_conn-default-n_3@127.0.0.1<0.12530.0>:upr_proxy:handle_packet:118]Proxy packet: RESPONSE: 0x51 (upr_add_stream) vbucket = 0 opaque = 0x20 status = 0x23 (rollback)
81 51 00 00
04 00 00 23
00 00 00 04
00 00 00 20
00 00 00 00
00 00 00 00
00 00 00 01
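
For readers decoding the dump by hand: the first 24 bytes form a standard memcached binary-protocol response header. The small Python sketch below unpacks them and confirms the status (0x23, rollback) and opaque (0x20) reported in the log line above; the trailing 4-byte body is left undecoded here.

import struct

raw = bytes.fromhex(
    "81510000" "04000023" "00000004" "00000020"
    "00000000" "00000000" "00000001")

# magic, opcode, key length, extras length, data type, status,
# total body length, opaque, CAS
magic, opcode, keylen, extlen, datatype, status, bodylen, opaque, cas = \
    struct.unpack(">BBHBBHIIQ", raw[:24])

print(hex(opcode), hex(status), hex(opaque), bodylen)
# 0x51 0x23 0x20 4  -> upr_add_stream response with rollback status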


 Comments   
Comment by Artem Stemkovski [ 23/Apr/14 ]
Looks like the same issue as MB-10936.
Comment by Abhinav Dangeti [ 01/May/14 ]
http://review.couchbase.org/#/c/36572/
Comment by Parag Agarwal [ 14/May/14 ]
This should be verified by Tony, not Venu.




[MB-11263] Add retry logic for temporarily failed mutations Created: 29/May/14  Updated: 22/Aug/14  Resolved: 17/Jun/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sriram Ganesan Assignee: Sriram Ganesan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When we process mutations on the UPR consumer side, we batch-process up to 10 mutations. If any of those mutations fail due to temporary out-of-memory conditions, we need to retry the failed mutations; otherwise this simply results in data loss, which is undesirable.
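
For illustration, a minimal Python sketch of the retry behaviour being requested; the engine object, apply_mutation call, and status constant are hypothetical stand-ins, not ep-engine's API:

import time

ETMPFAIL = 0x86  # memcached "temporary failure" status, used here for illustration

def apply_batch(engine, mutations, max_retries=10, backoff_secs=0.1):
    # Apply a batch of consumer-side mutations, retrying any that fail
    # temporarily (e.g. out of memory) instead of silently dropping them.
    pending = list(mutations)
    for attempt in range(max_retries):
        failed = [m for m in pending if engine.apply_mutation(m) == ETMPFAIL]
        if not failed:
            return
        pending = failed
        time.sleep(backoff_secs * (attempt + 1))  # simple linear backoff
    raise RuntimeError("%d mutations still failing after retries" % len(pending))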

 Comments   
Comment by Cihan Biyikoglu [ 06/Jun/14 ]
Hi Siri, should this be under improvements? It may be small in size, but it improves behavior; it does not sound like a bug.
Comment by Sriram Ganesan [ 06/Jun/14 ]
Hello Cihan

This seemed like a bug to me purely because, with the current code, we could lose mutations on the consumer side under OOM conditions, which would result in data loss.
Comment by Cihan Biyikoglu [ 06/Jun/14 ]
OK, I'm looking for non-must-have items we can move off 3.0. Would this have large customer impact? Can it be done in 3.0.1?
Thanks
Comment by Sriram Ganesan [ 07/Jun/14 ]
As mentioned, the impact would be data loss under OOM conditions on the UPR consumer side. I am not sure whether that counts as a "large" impact; you may want to consult either Mike Wiederhold or Chiyoung Seo for a more accurate assessment. Thanks.
Comment by Sriram Ganesan [ 17/Jun/14 ]
Review- http://review.couchbase.org/#/c/37999/




[MB-11970] vb_active_perc_mem_resident reports 0% when there are 0 items Created: 15/Aug/14  Updated: 22/Aug/14  Resolved: 20/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Bryce Jasmer Assignee: Mike Wiederhold
Resolution: Fixed Votes: 0
Labels: rc2, stats
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I can argue both ways (0% or 100%) for what vb_active_perc_mem_resident should report when there are no items in the vbucket, but the safer choice for this condition seems to be reporting that 100% of the items are resident. Reporting 0% suggests a bad situation in which everything has been flushed to disk. Since there isn't anything there at all, there is no error: everything that should be in memory is in memory; there simply isn't anything to put in memory.
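
Expressed as a small Python sketch (a simplification, not the actual ep-engine/ns_server stat code), the requested behaviour is:

def resident_percentage(curr_items, non_resident_items):
    # With zero items report 100% rather than 0%: nothing has been ejected,
    # so everything that should be in memory is in memory.
    if curr_items == 0:
        return 100.0
    return 100.0 * (curr_items - non_resident_items) / curr_items

assert resident_percentage(0, 0) == 100.0
assert resident_percentage(1000, 250) == 75.0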


 Comments   
Comment by Mike Wiederhold [ 20/Aug/14 ]
http://review.couchbase.org/#/c/40756/
http://review.couchbase.org/#/c/40759/
Comment by Anil Kumar [ 20/Aug/14 ]
Minor stats issue "approved" to be included for RC2.
Comment by Aleksey Kondratenko [ 20/Aug/14 ]
Hm. I was not aware of ep-engine side change. ns_server change is merged and is now part of 3.0 manifest. http://review.couchbase.org/40761
Comment by Mike Wiederhold [ 20/Aug/14 ]
I'm backporting the ep-engine change, but this issue concerns the UI, and the UI does not use that ep-engine stat, so the ep-engine change is not strictly necessary for resolving this issue.




[MB-12046] CBTransfer showing missing items when items are present after topology change with graceful failover+ full recovery with nodes crashing Created: 21/Aug/14  Updated: 22/Aug/14  Resolved: 22/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Bin Cui
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1184

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.144-8212014-1657-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.144-8212014-176-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.145-8212014-1658-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.145-8212014-176-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.146-8212014-1659-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.146-8212014-176-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.147-8212014-171-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.147-8212014-176-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.148-8212014-172-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.148-8212014-177-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.149-8212014-173-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.149-8212014-177-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.150-8212014-174-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.150-8212014-177-couch.tar.gz
Is this a Regression?: Unknown

 Description   
Scenario

1. Create 7 node cluster
2. Create default bucket and add 100K items
3. Graceful failover 1 node
4. During Graceful failover, kill memcached of 3 other nodes, this fails graceful failover
5. Restart Graceful failover and let it run to completion
6. Full recover the failed over node and rebalance
7. During rebalance, kill memcached of 3 other nodes, this fails rebalance
8. Restart Rebalance and run it to completion

After step 8, we collect data using cbtransfer and compare it to what we loaded in step 2. We see missing keys.

Note that there are no mutations running from step 3 to step 8. We always read from the couch store after the queues have been drained and replication is complete. Also, before running cbtransfer, we verified item counts as well as the data items themselves.

This seems like a bug in cbtransfer.
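
Conceptually, the missing-key report below is a set difference between the keys loaded in step 2 and the keys present in the cbtransfer output collected after step 8. A rough Python sketch (file names and formats are hypothetical; the real verification lives in testrunner):

def find_missing_keys(loaded_keys_path, cbtransfer_dump_path):
    with open(loaded_keys_path) as f:
        loaded = set(line.strip() for line in f if line.strip())
    with open(cbtransfer_dump_path) as f:
        dumped = set(line.split(",")[0] for line in f if line.strip())
    return sorted(loaded - dumped)

# e.g. find_missing_keys("loaded_keys.txt", "cbtransfer_keys.csv") would list
# keys such as the ones below.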

Missing keys

failover97727
 failover96541
 failover19942
 failover