[MB-10012] cbrecovery hangs in the multi-instance case Created: 24/Jan/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: test-execution
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Venu Uppalapati Assignee: Ashvinder Singh
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive cbrecovery1.zip     Zip Archive cbrecovery2.zip     Zip Archive cbrecovery3.zip     Zip Archive cbrecovery4.zip     Zip Archive cbrecovery_source1.zip     Zip Archive cbrecovery_source2.zip     Zip Archive cbrecovery_source3.zip     Zip Archive cbrecovery_source4.zip     PNG File recovery.png    
Issue Links:
Relates to
Triage: Triaged
Operating System: Centos 64-bit

 Description   
2.5.0-1055

During verification of MB-9967 I performed the same steps:
source cluster: 3 nodes, 4 buckets
destination cluster: 3 nodes, 1 bucket
failed over 2 nodes on the destination cluster (without rebalance)

cbrecovery hangs on:

[root@centos-64-x64 ~]# /opt/couchbase/bin/cbrecovery http://172.23.105.158:8091 http://172.23.105.159:8091 -u Administrator -U Administrator -p password -P password -b RevAB -B RevAB -v
Missing vbuckets to be recovered:[{"node": "ns_1@172.23.105.159", "vbuckets": [513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023]}]
2014-01-22 01:27:59,304: mt cbrecovery...
2014-01-22 01:27:59,304: mt source : http://172.23.105.158:8091
2014-01-22 01:27:59,305: mt sink : http://172.23.105.159:8091
2014-01-22 01:27:59,305: mt opts : {'username': '<xxx>', 'username_destination': 'Administrator', 'verbose': 1, 'dry_run': False, 'extra': {'max_retry': 10.0, 'rehash': 0.0, 'data_only': 1.0, 'nmv_retry': 1.0, 'conflict_resolve': 1.0, 'cbb_max_mb': 100000.0, 'try_xwm': 1.0, 'batch_max_bytes': 400000.0, 'report_full': 2000.0, 'batch_max_size': 1000.0, 'report': 5.0, 'design_doc_only': 0.0, 'recv_min_bytes': 4096.0}, 'bucket_destination': 'RevAB', 'vbucket_list': '{"172.23.105.159": [513]}', 'threads': 4, 'password_destination': 'password', 'key': None, 'password': '<xxx>', 'id': None, 'bucket_source': 'RevAB'}
2014-01-22 01:27:59,491: mt bucket: RevAB
2014-01-22 01:27:59,558: w0 source : http://172.23.105.158:8091(RevAB@172.23.105.156:8091)
2014-01-22 01:27:59,559: w0 sink : http://172.23.105.159:8091(RevAB@172.23.105.156:8091)
2014-01-22 01:27:59,559: w0 : total | last | per sec
2014-01-22 01:27:59,559: w0 batch : 1 | 1 | 15.7
2014-01-22 01:27:59,559: w0 byte : 0 | 0 | 0.0
2014-01-22 01:27:59,559: w0 msg : 0 | 0 | 0.0
2014-01-22 01:27:59,697: s1 warning: received NOT_MY_VBUCKET; perhaps the cluster is/was rebalancing; vbucket_id: 513, key: RAB_111001636418, spec: http://172.23.105.159:8091, host:port: 172.23.105.159:11210
2014-01-22 01:27:59,719: s1 warning: received NOT_MY_VBUCKET; perhaps the cluster is/was rebalancing; vbucket_id: 513, key: RAB_111001636418, spec: http://172.23.105.159:8091, host:port: 172.23.105.159:11210
2014-01-22 01:27:59,724: w2 source : http://172.23.105.158:8091(RevAB@172.23.105.158:8091)
2014-01-22 01:27:59,724: w2 sink : http://172.23.105.159:8091(RevAB@172.23.105.158:8091)
2014-01-22 01:27:59,727: w2 : total | last | per sec
2014-01-22 01:27:59,728: w2 batch : 1 | 1 | 64.0
2014-01-22 01:27:59,728: w2 byte : 0 | 0 | 0.0
2014-01-22 01:27:59,728: w2 msg : 0 | 0 | 0.0
2014-01-22 01:27:59,738: s1 warning: received NOT_MY_VBUCKET; perhaps the cluster is/was rebalancing; vbucket_id: 513, key: RAB_111001636418, spec: http://172.23.105.159:8091, host:port: 172.23.105.159:11210
2014-01-22 01:27:59,757: s1 warning: received NOT_MY_VBUCKET; perhaps the cluster is/was rebalancing; vbucket_id: 513, key: RAB_111001636418, spec: http://172.23.105.159:8091, host:port: 172.23.105.159:11210



 Comments   
Comment by Anil Kumar [ 04/Jun/14 ]
Triage - June 04 2014: Bin, Ashvinder, Venu, Tony
Comment by Cihan Biyikoglu [ 27/Aug/14 ]
does this need to be considered for 3.0 or is this a test issue only?




[MB-9632] diag / master events captured in log file Created: 22/Nov/13  Updated: 27/Aug/14

Status: Reopened
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.2.0, 2.5.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Task Priority: Blocker
Reporter: Steve Yen Assignee: Ravi Mayuram
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The information available in the diag / master events REST stream should be captured in a log (ALE?) file and hence be available to cbcollect_info and to later analysis tools.
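As the comments below note, this stream is only useful if something captures it continuously for the whole run. A minimal sketch of such a capture client in Python, assuming the stream is exposed at /diag/masterEvents on port 8091 and that admin credentials are required (the endpoint path and credentials are assumptions, not taken from this ticket):

import requests  # assumes the 'requests' library is available on the capture host

def capture_master_events(host, out_path="master_events.log",
                          user="Administrator", password="password"):
    """Append the diag/masterEvents stream to a file for later analysis."""
    url = "http://%s:8091/diag/masterEvents" % host
    resp = requests.get(url, auth=(user, password), stream=True)
    resp.raise_for_status()
    with open(out_path, "ab") as out:
        # The stream is assumed to emit one JSON event per line; write each as it arrives.
        for line in resp.iter_lines():
            if line:
                out.write(line + b"\n")
                out.flush()

if __name__ == "__main__":
    capture_master_events("127.0.0.1")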

 Comments   
Comment by Aleksey Kondratenko [ 22/Nov/13 ]
It is already available in collectinfo
Comment by Dustin Sallings (Inactive) [ 26/Nov/13 ]
If it's only available in collectinfo, then it's not available at all. We lose most of the useful information if we don't run an http client to capture it continually throughout the entire course of a test.
Comment by Aleksey Kondratenko [ 26/Nov/13 ]
Feel free to submit a patch with exact behavior you need
Comment by Cihan Biyikoglu [ 27/Aug/14 ]
is this still relevant?




[MB-12085] query with key=value requires inclusive_end=true Created: 27/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Tommie McAfee Assignee: Volker Mische
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Yes

 Description   
In 3.0 the view editor generates URLs with inclusive_end=false. For some reason this causes key=value queries to return empty results. In 2.5 the parameter was left unset.

1. Load the beer-sample bucket

2. Create a view with this map function:

function (doc, meta) {
   emit(doc.state, doc.name);
}

3. In the filter results drop-down, set:
key = "Alaska"

Generates:
http://172.23.106.53:8092/beer-sample/_design/dev_ddoc1/_view/view1?stale=false&inclusive_end=false&key=%22Alaska%22&connection_timeout=60000&limit=10&skip=0

Need to manually override the generated inclusive_end=false (i.e. set inclusive_end=true) to get results back.

2.5 Generates:
http://172.23.106.53:8092/beer-sample/_design/dev_ddoc1/_view/view1?stale=false&key=%22Alaska%22&connection_timeout=60000&limit=10&skip=0
 

If I hadn't looked at the URL generated in 2.5, I'm not sure I would have gotten this to work.
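A quick way to confirm the behaviour outside the UI is to issue the two queries directly. A minimal sketch in Python 2 (host, bucket, and design-document names are taken from the URLs above; the assumption is that the view is readable without authentication):

import json
import urllib2

BASE = ("http://172.23.106.53:8092/beer-sample/_design/dev_ddoc1/_view/view1"
        "?stale=false&key=%22Alaska%22&limit=10&skip=0")

# Same key=value query with the parameter unset (2.5 behaviour) and with
# inclusive_end=false (what the 3.0 editor generates); per this report the
# latter unexpectedly returns no rows.
for label, extra in (("param unset", ""), ("inclusive_end=false", "&inclusive_end=false")):
    rows = json.load(urllib2.urlopen(BASE + extra)).get("rows", [])
    print("%s -> %d rows" % (label, len(rows)))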




[MB-12079] Cannot edit documents with textual or numeric data. Created: 27/Aug/14  Updated: 27/Aug/14

Status: Reopened
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Task Priority: Minor
Reporter: Brett Lawson Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
When attempting to create or modify a document in the Web UI whose value is a bare string or a number, errors occur that prevent you from saving the document.

 Comments   
Comment by Matt Ingenthron [ 27/Aug/14 ]
Related is MB-12078. Looks like the validator we're using in the console is wrong.
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
It is not "validator is wrong". It's deliberate choice to refuse editing such values.
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
See MB-9208
Comment by Matt Ingenthron [ 27/Aug/14 ]
It looks like things have changed for the better and the console is a bit out of sync. Now views do work with numbers such as "1" which may be incr/decr'd too. So, what is said in MB-9208 isn't valid any more. Brett verified this.

At this stage you can insert "1234" and incr it, but you can't insert it through the console because it says "JSON should represent an object": http://puu.sh/b9H8i.png

Do you see a downside to syncing up the console to what views/view-engine actually do these days?

Comment by Aleksey Kondratenko [ 27/Aug/14 ]
Have you tried arrays?




[MB-12063] KV+XDCR System test : Between expiration and purging, getMeta() retrieves revID as 1 for deleted key from Source, same deleted key from Destination returns 2. Created: 25/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1, 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 6.x , build 3.0.0-1174-rel

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
--> Before expiration and after purging, all metadata match between source and destination clusters.
--> However, after expiration, something causes a deleted key at the source (!!!) to have a seqno of 1. The seqno at the destination is, however, 2, as expected.
--> The data below is for uni-directional XDCR (C1 -> C2).
--> Not a recent regression; I have seen it once before in 3.0.0-9xx. Catching this bug depends entirely on when I run the validation script after the system test is completed. Expiration is usually set to 1 day and the tombstone purge interval is 3 days on both source and destination. Once tombstones are purged, I don't see this mismatch. So I don't have a live cluster.

{'C1_location:': u'172.23.105.44', 'vb': 90, 'C2_node': u'172.23.105.54', 'C1_key_count': 19919, 'C2_key_count': 19919, 'missing_keys': 0}
RevID or CAS mismatch -
  172.23.105.44(C1): key:65ABEE18-153_100061 metadata:{'deleted': 1, 'seqno': 1, 'cas': 1902841111553483, 'flags': 0, 'expiration': 1408646731}
  172.23.105.54(C2): key:65ABEE18-153_100061 metadata:{'deleted': 1, 'seqno': 2, 'cas': 1902841111553484, 'flags': 0, 'expiration': 1408646731}
 RevID or CAS mismatch -
  172.23.105.44(C1): key:65ABEE18-153_100683 metadata:{'deleted': 1, 'seqno': 1, 'cas': 1902841111336520, 'flags': 0, 'expiration': 1408646731}
  172.23.105.54(C2): key:65ABEE18-153_100683 metadata:{'deleted': 1, 'seqno': 2, 'cas': 1902841111336521, 'flags': 0, 'expiration': 1408646731}
RevID or CAS mismatch -
  172.23.105.44(C1): key:65ABEE18-153_100713 metadata:{'deleted': 1, 'seqno': 1, 'cas': 1902841111837669, 'flags': 0, 'expiration': 1408646731}
  172.23.105.54(C2): key:65ABEE18-153_100713 metadata:{'deleted': 1, 'seqno': 2, 'cas': 1902841111837670, 'flags': 0, 'expiration': 1408646731}
 RevID or CAS mismatch -
  172.23.105.44(C1): key:65ABEE18-153_103240 metadata:{'deleted': 1, 'seqno': 1, 'cas': 1902843752129235, 'flags': 0, 'expiration': 1408646733}
  172.23.105.54(C2): key:65ABEE18-153_103240 metadata:{'deleted': 1, 'seqno': 2, 'cas': 1902843752129236, 'flags': 0, 'expiration': 1408646733}
 RevID or CAS mismatch -
  172.23.105.44(C1): key:65ABEE18-153_105170 metadata:{'deleted': 1, 'seqno': 1, 'cas': 1902847773405994, 'flags': 0, 'expiration': 1408646737}
  172.23.105.54(C2): key:65ABEE18-153_105170 metadata:{'deleted': 1, 'seqno': 2, 'cas': 1902847773405995, 'flags': 0, 'expiration': 1408646737}

Please let me know what/if you need in particular to diagnose this issue. Thanks!
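For reference, the metadata above can be pulled with a getMeta call against each cluster. A minimal sketch, assuming testrunner's mc_bin_client and that getMeta returns (deleted, flags, expiration, seqno, cas); both the client and the return layout are assumptions about the validation script, not part of this report:

from mc_bin_client import MemcachedClient  # assumption: testrunner's binary memcached client

def meta_for(host, key, bucket="standardbucket"):
    # Assumes getMeta returns (deleted, flags, expiration, seqno, cas) and that the
    # request is sent to the node that owns the key's vbucket (otherwise NOT_MY_VBUCKET).
    client = MemcachedClient(host, 11210)
    client.sasl_auth_plain(bucket, "")
    deleted, flags, exp, seqno, cas = client.getMeta(key)
    client.close()
    return {"deleted": deleted, "seqno": seqno, "cas": cas,
            "flags": flags, "expiration": exp}

key = "65ABEE18-153_100061"
src = meta_for("172.23.105.44", key)   # C1 node owning vb 90
dst = meta_for("172.23.105.54", key)   # C2 node owning vb 90
if (src["seqno"], src["cas"]) != (dst["seqno"], dst["cas"]):
    print("RevID or CAS mismatch -\n  C1: %s\n  C2: %s" % (src, dst))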




 Comments   
Comment by Aruna Piravi [ 25/Aug/14 ]
Worth mentioning that if the same key gets recreated at the source with a seqno <= the seqno of the same key at the destination, the create will/may not get propagated.
Comment by Aruna Piravi [ 25/Aug/14 ]
We have many xdcr functional tests with expiration, after which we compare revIDs, even of deleted items. We did not hit this particular bug. Venu did some unit tests and could not catch it either.

I'd like to perform the same system test on 2.5.1 to determine if this is a regression. Again, the system test itself runs for 12-15 hrs, we keep loading throughout the test, items keep expiring for the next 24 hrs, the expiry pager runs 3 days from the start of the test, and the validation script runs for half a day, so it's all about timing.

Will get you the data files and start the system test.

Comment by Aruna Piravi [ 25/Aug/14 ]
http://172.23.105.44:8091/index.html
http://172.23.105.54:8091/index.html
Comment by Aruna Piravi [ 25/Aug/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12063/stdbucket.rtf --> keys that have this mismatch for bucket 'standardbucket'.

For a quick look -
https://s3.amazonaws.com/bugdb/jira/MB-12063/44_couch.tar (source)
https://s3.amazonaws.com/bugdb/jira/MB-12063/54_couch.tar (dest)
Comment by Aruna Piravi [ 25/Aug/14 ]
Also seeing some keys where revID is greater at source than dest. Pls look at vbuckets 15 and 28.

RevID or CAS mismatch -
  172.23.105.44(C1): key:6B67A321-142_4666321 metadata:{'deleted': 1, 'seqno': 4, 'cas': 13175771830561215, 'flags': 0, 'expiration': 1408649286}
  172.23.105.54(C2): key:6B67A321-142_4666321 metadata:{'deleted': 1, 'seqno': 3, 'cas': 13175771830561214, 'flags': 0, 'expiration': 1408649286}
RevID or CAS mismatch -
  172.23.105.44(C1): key:6B67A321-142_4666453 metadata:{'deleted': 1, 'seqno': 4, 'cas': 13175771778347790, 'flags': 0, 'expiration': 1408649286}
  172.23.105.54(C2): key:6B67A321-142_4666453 metadata:{'deleted': 1, 'seqno': 3, 'cas': 13175771778347789, 'flags': 0, 'expiration': 1408649286}
 RevID or CAS mismatch -
  172.23.105.44(C1): key:6B67A321-142_4674099 metadata:{'deleted': 1, 'seqno': 4, 'cas': 13175775983769867, 'flags': 0, 'expiration': 1408649290}
  172.23.105.54(C2): key:6B67A321-142_4674099 metadata:{'deleted': 1, 'seqno': 3, 'cas': 13175775983769866, 'flags': 0, 'expiration': 1408649290}
 RevID or CAS mismatch -
  172.23.105.44(C1): key:6B67A321-142_4674109 metadata:{'deleted': 1, 'seqno': 4, 'cas': 13175775977754745, 'flags': 0, 'expiration': 1408649290}
  172.23.105.54(C2): key:6B67A321-142_4674109 metadata:{'deleted': 1, 'seqno': 3, 'cas': 13175775977754744, 'flags': 0, 'expiration': 1408649290}
RevID or CAS mismatch -
  172.23.105.44(C1): key:6B67A321-142_4677328 metadata:{'deleted': 1, 'seqno': 4, 'cas': 13175778211679600, 'flags': 0, 'expiration': 1408649293}
  172.23.105.54(C2): key:6B67A321-142_4677328 metadata:{'deleted': 1, 'seqno': 3, 'cas': 13175778211679599, 'flags': 0, 'expiration': 1408649293}
Comment by Aruna Piravi [ 25/Aug/14 ]
cbcollect :

https://s3.amazonaws.com/bugdb/jira/MB-12063/source.tar
https://s3.amazonaws.com/bugdb/jira/MB-12063/dest.tar

Note: .44 is down now; it was up until this morning but ran out of disk space. The above script ran 2 days ago.

I can start system test on 2.5.1 tomorrow morning if required. Pls let me know, thanks.
Comment by Wayne Siu [ 26/Aug/14 ]
Reviewed with Cihan. Potentially a RC2 candidate.
If the fix is contained and is ready by this week.
Need Dev's risk assessment.
Comment by Mike Wiederhold [ 26/Aug/14 ]
The logs have rolled over so I can't see exactly what happened. I think the reason the cluster got into this state is that the expiry pager was run on one side of the cluster and not the other. This can easily explain why the destination would have a different sequence number than the source. This happens because expiring a key will increment the rev sequence number, and unfortunately I don't think there is anything we can do about this.

I am assuming that the same issue is happening when the source cluster's seqno is incremented but the destination seqno is not. In this case we should see the delete replicated to the other side, and I cannot determine whether or not this happened because the logs have rolled over. I'm also not sure if you just checked those keys while keys were being propagated to the destination node. All of the traffic could have stopped, but the expiry pager might have kicked in during the verification phase.

I'm seeing a log message that I need to investigate further so I will leave this assigned to me.
Comment by Aruna Piravi [ 26/Aug/14 ]
 > I think the reason that the cluster got into this state is because the expiry pager was run on one side of the cluster and not the other.

That's possible, but what can explain seeing 'deleted': 1 with 'seqno': 1? My understanding is that any key with the deleted flag = 1 needs to have a revID of at least 2. The revID for any doc can be 1 only at the time of creation. If expired or deleted, the revID should be incremented, right?
Comment by Aruna Piravi [ 27/Aug/14 ]
Also, this is C1 -> C2 (uni-xdcr). So seeing something like

172.23.105.44(C1): key:65ABEE18-153_100061 metadata:{'deleted': 1, 'seqno': 1, 'cas': 1902841111553483, 'flags': 0, 'expiration': 1408646731}
172.23.105.54(C2): key:65ABEE18-153_100061 metadata:{'deleted': 1, 'seqno': 2, 'cas': 1902841111553484, 'flags': 0, 'expiration': 1408646731}

would mean the expiry pager ran at C1 ('deleted': 1) but did not increment the revID, which is clearly a bug.
Comment by Cihan Biyikoglu [ 27/Aug/14 ]
is there an ETA on the resolution? if this won't resolve in the next day, we need to push this one out.
thanks
Comment by Abhinav Dangeti [ 27/Aug/14 ]
Mike's already submitted the fixes:
http://review.couchbase.org/#/c/40996/
http://review.couchbase.org/#/c/40997/




[MB-12010] XDCR@next release - Parts : XMEM Nozzle Created: 19/Aug/14  Updated: 27/Aug/14

Status: In Progress
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: techdebt-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: sprint1_xdcr
Remaining Estimate: 40h
Time Spent: Not Specified
Original Estimate: 40h

Epic Link: XDCR next release




[MB-7761] Move operations stats out of memcached and into the engines Created: 15/Feb/13  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 1.8.1, 2.0, 2.5.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Improvement Priority: Major
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: supportability
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by MB-11986 Stats for every operations. (prepend ... Resolved
Relates to
relates to MB-8793 Prepare spec on stats updates Open
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-5011 gat (get and touch) operation not rep... Technical task Open Mike Wiederhold  
MB-6121 More operation stats please Technical task Open Mike Wiederhold  
MB-7419 Disk reads for append/prepend/incr/de... Technical task Open Mike Wiederhold  
MB-7711 UI: Getandlock doesn't show up in any... Technical task Closed Mike Wiederhold  
MB-7807 aggregate all kinds of ops in ops/sec... Technical task Open Anil Kumar  
MB-8183 getAndTouch (and touch) operations ar... Technical task Resolved Aleksey Kondratenko  
MB-10377 getl and cas not reported in the GUI ... Technical task Open Aleksey Kondratenko  
MB-11655 Stats: Getandlock doesn't show up in ... Technical task Open Mike Wiederhold  

 Description   
Stats have increasingly been an issue to deal with since they are half done in memcached and half done in ep-engine. Memcached should simply handle connections and not really care about or track anything operation-related. This stuff should happen in the engines, and memcached should just ask for it when it needs the info.

 Comments   
Comment by Tug Grall (Inactive) [ 01/May/13 ]
Just to be sure they are linked. I'll let the engineering team choose how to deal with this JIRA.
Comment by Perry Krug [ 07/Jul/14 ]
Raising awareness of this broad supportability issue, which sometimes makes it hard for the field and customers to accurately understand their Couchbase traffic.




[MB-10262] corrupted key in data file rolls backwards to an earlier version or disappears without detection Created: 19/Feb/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.2.0
Fix Version/s: 3.0.1, 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Matt Ingenthron Assignee: Ruth Harris
Resolution: Fixed Votes: 0
Labels: corrupt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Observed on Mac OS X, but presumed to affect all versions

Triage: Triaged
Flagged:
Release Note
Is this a Regression?: Yes

 Description   
After shutting down Couchbase Server, intentionally corrupting one recently stored key, then starting up the server and trying to read said key, an older version of that key is returned. The corruption wasn't logged (that I could find).

Note, the actual component here is couchstore.

Steps to reproduce:
1) Add a new document to a given bucket. Call the key something known, like "corruptme"
2) Edit the document once (so you'll have two versions of it)
3) Shut down the server
4) grep for that string in the vbucket data files
5) Edit the vbucket file for the given key. Change "corruptme" to "corruptm3"
6) Start the server
7) Perform a get for the given key (with cbc or the like)

Expected behavior: either the right key is returned (assumes replicated metadata) or an error is returned.

Observed behavior: the old version of the key is returned.
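Step 5 above can be scripted. A minimal sketch in Python that performs a same-length, in-place edit of the vbucket file (the data path is an illustrative assumption, and a real test might target the last occurrence, i.e. the most recent copy of the key):

# Sketch of step 5: flip one byte of the stored key in place, keeping the same
# length so couchstore file offsets stay valid. Run only while the server is stopped.
path = "/opt/couchbase/var/lib/couchbase/data/default/0.couch.1"  # illustrative path

with open(path, "r+b") as f:
    data = f.read()
    offset = data.find(b"corruptme")
    if offset < 0:
        raise SystemExit("key not found in %s" % path)
    f.seek(offset)
    f.write(b"corruptm3")  # same-length replacement, i.e. silent bit-rot style corruption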


The probability of encountering this goes up dramatically in environments where there are many nodes, disks.

Related reading:
http://static.googleusercontent.com/media/research.google.com/en/us/archive/disk_failures.pdf

 Comments   
Comment by Anil Kumar [ 17/Jul/14 ]
Triage - Chiyoung, Anil, Venu, Wayne .. July 17th
Comment by Chiyoung Seo [ 30/Jul/14 ]
This is not a regression in 3.0, but has been there since 2.0 release. Given the 3.0 release schedule and the list of 3.0 blockers that we have as of this time, we will fix this issue in 3.0.1.
Comment by Sundar Sridharan [ 01/Aug/14 ]
Matt, I do not see this behavior in a 3.0 cluster. If I shut down the server and corrupt a key, the subsequent warmup of the bucket returns fewer items. Just wondering how it is that you still see a corrupted key after warmup? thanks
Comment by Matt Ingenthron [ 01/Aug/14 ]
I don't "see a corrupted key" but rather corruption occurs, the item goes back to an older version when retrieved, and there's not even a warning in a log file. Another case I didn't check for was if the modified item was in the long past and there are good items after it.

Have you carried out the reproduction steps? Is there something that isn't clear?
Comment by Sundar Sridharan [ 01/Aug/14 ]
Matt, here are my steps and results; please help correct me if I am missing something...
1) Disable couchstore compression so the keys and values are visible also disable compaction.
2) Set key "corruptme" with value "test"
3) Set key "corruptme" with value "test" after a few seconds so there is a second commit writing the key again.
4) Shut down the server
5) grep for that string in the vbucket data files
6) Edit the vbucket file for the given key. Change "corruptme" to "corruptm3"
7) Start the server (warmup refuses to load key corruptm3)
8) Perform a get for the given key (with cbc or the like) (fails as there is no such key)
thanks
Comment by Matt Ingenthron [ 01/Aug/14 ]
Since the record should be checksummed, then either at the time of warmup or at the time of retrieving the corrupted entry, the checksum should not match. I would expect either an error, or a log message and a retrieval of an old value, or both. As it exists currently, if a bit changes for whatever reason, the corruption passes by silently. This could lead to very confusing and incorrect behavior in applications.
Comment by Sundar Sridharan [ 01/Aug/14 ]
We do log an error.. for example on my machine at warmup I see the following...
Fri Aug 1 14:14:39.886305 PDT 3: (default) couchstore_changes_since failed, error=checksum fail [none]
Fri Aug 1 14:14:39.886408 PDT 3: (default) couchstore_changes_since failed, error=checksum fail [none]
Do you believe this might not be sufficient?
thanks
Comment by Matt Ingenthron [ 04/Aug/14 ]
That definitely seems like different behavior from when I tried this. I think it's probably up to you, your team, and PM to decide whether or not this is sufficient. From an application perspective, an item could have been stored, and then when it is later read back an older version of it comes up, with only an error log message.

Comment by Chiyoung Seo [ 27/Aug/14 ]
As Sundar mentioned above, we log an error message when a given key is corrupted in a database file, and return its older version to the application if one exists. I think that's enough at this time, but it should be noted in the 3.0 release notes.

Ruth,

Please feel free to grab me if you need more information.




[MB-12046] cbtransfer shows missing items when items are present, after topology change with graceful failover + full recovery with nodes crashing Created: 21/Aug/14  Updated: 27/Aug/14  Resolved: 22/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Bin Cui
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1184

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.144-8212014-1657-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.144-8212014-176-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.145-8212014-1658-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.145-8212014-176-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.146-8212014-1659-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.146-8212014-176-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.147-8212014-171-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.147-8212014-176-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.148-8212014-172-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.148-8212014-177-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.149-8212014-173-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.149-8212014-177-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.150-8212014-174-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12046/10.6.2.150-8212014-177-couch.tar.gz
Is this a Regression?: Unknown

 Description   
Scenario

1. Create 7 node cluster
2. Create default bucket and add 100K items
3. Graceful failover 1 node
4. During Graceful failover, kill memcached of 3 other nodes, this fails graceful failover
5. Restart Graceful failover and let it run to completion
6. Full recover the failed over node and rebalance
7. During rebalance, kill memcached of 3 other nodes, this fails rebalance
8. Restart Rebalance and run it to completion

After step 8, we collect data using cbtransfer and compare it to the data collected in step 2. We see missing keys.

Note that there are no mutations running from step 3 to step 8. We always read from couchstore after the queues have been drained and replication is complete. Also, before running cbtransfer, we verified item counts and verified data items as well.

This seems like a bug in cbtransfer.
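For reference, the comparison is essentially a key-set diff between the two cbtransfer exports (step 2 vs. after step 8). A minimal sketch, assuming both runs used a csv: destination and that the document key is the first CSV column:

import csv

def keys_from_csv(path):
    # Assumes a cbtransfer "csv:" export where the document key is the first column.
    with open(path) as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if present
        return set(row[0] for row in reader if row)

before = keys_from_csv("step2_export.csv")   # export taken at step 2
after = keys_from_csv("step8_export.csv")    # export taken after step 8
missing = sorted(before - after)
print("%d missing keys" % len(missing))
for key in missing[:20]:
    print(" %s" % key)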

Missing keys

failover97727
 failover96541
 failover19942
 failover72566
 failover98994
 failover21107
 failover17597
 failover47535
 failover58469
 failover47247
 failover79250
 failover95182
 failover48606
 failover885
 failover98366
 failover72214
 failover24016
 failover74124
 failover51288
 failover41177
 failover47925
 failover19220
 failover6008
 failover40281
 failover94916
 failover20361
 failover29410
 failover29800
 failover61528
 failover90103
 failover73072
 failover17817
 failover46753
 failover27955
 failover91997
 failover25502
 failover99672
 failover32149
 failover19552
 failover34279
 failover26723
 failover16113
 failover79522
 failover96951
 failover11737
 failover15332
 failover70253
 failover78036
 failover20413
 failover45200
 failover13192
 failover14154
 failover31368
 failover88099
 failover44684
 failover49460
 failover25882
 failover62699
 failover12486
 failover81678
 failover23632
 failover15850
 failover27237
 failover505
 failover11045
 failover49312
 failover94496
 failover95760
 failover24186
 failover10941
 failover84769
 failover72976
 failover77295
 failover20993
 failover15440
 failover12516
 failover277
 failover38589
 failover92636
 failover22844
 failover72384
 failover73700
 failover95012
 failover82459
 failover22326
 failover87548
 failover87958
 failover98414
 failover78744
 failover23140
 failover27545
 failover10223
 failover61938
 failover3119
 failover14626
 failover79932
 failover74656
 failover12896
 failover41605
 failover93322
 failover42424
 failover92144
 failover99090
 failover94274
 failover91365
 failover77867
 failover44066
 failover24764
 failover40311
 failover38809
 failover20803
 failover68259
 failover54209
 failover76163
 failover70931
 failover6198
 failover44714
 failover18734
 failover51318
 failover43642
 failover98584
 failover49870
 failover43130
 failover82849
 failover41795
 failover28104
 failover13002
 failover10551
 failover76611
 failover91807
 failover17407
 failover22454
 failover91587
 failover67618
 failover29580
 failover16661
 failover915
 failover90671
 failover495
 failover12906
 failover9449
 failover42356
 failover97055
 failover98804
 failover77477
 failover29990
 failover40463
 failover42834
 failover45390
 failover29362
 failover9859
 failover57028
 failover38419
 failover77305
 failover17987
 failover71035
 failover76781
 failover45962
 failover71747
 failover75820
 failover18046
 failover91417
 failover20583
 failover12264
 failover25492
 failover64439
 failover94886
 failover70521
 failover28676
 failover48796
 failover45572
 failover16083
 failover25912
 failover49282
 failover64829
 failover96233
 failover26051
 failover38999
 failover99100
 failover3089
 failover48174
 failover5229
 failover21097
 failover93450
 failover37058
 failover25270
 failover46021
 failover93840
 failover62709
 failover28094
 failover40873
 failover13770
 failover58879
 failover90093
 failover88109
 failover73690
 failover67788
 failover17375
 failover94506
 failover75342
 failover54399
 failover52139
 failover75430
 failover21675a

 Comments   
Comment by Parag Agarwal [ 22/Aug/14 ]
does not repro with 1186, marking it for closure




[MB-11411] Warmup with an access log always sets the loaded document's rev-id to 1. Created: 12/Jun/14  Updated: 27/Aug/14  Resolved: 25/Jun/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.5.0, 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Jim Walker Assignee: Jim Walker
Resolution: Fixed Votes: 0
Labels: customer, hotfix, warmup, xdcr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt: start-finish
Relates to

 Description   
Note this is fixed upstream (3.0/master).

Identified that any restart with an access.log pushes the code down a path which creates "Item" objects with a sequence number of 1.

This is due to the constructor not being given the seq number and it defaulting to 1.

This happens when there's a server restart, e.g. an offline upgrade or just a node reboot. The effect of this problem on XDCR is not pretty (mutations are likely to be lost on the XDCR target).

If the user performs online/swap-rebalance and always swaps out crashed nodes and rebalances back, i.e. nodes always get their data from partner node/replicas, they're probably ok...

Raising this as an MB even though it's fixed upstream as I'm speculating that we'd like to track the back-port of this fix into 2.5.2 or a HOTFIX.

Patch to be uploaded shortly.

 Comments   
Comment by Dave Rigby [ 12/Jun/14 ]
Note this appears to originate in the original XDCR support code - see https://github.com/couchbase/ep-engine/commit/22e08f12#diff-0ba617db78bf72a148917b4a9205336eR265 (Line 265). It's evolved / changed since then, but the construction without the revID has remained...
Comment by Jim Walker [ 12/Jun/14 ]
Patch http://review.couchbase.org/#/c/38177/
Comment by Jim Walker [ 12/Jun/14 ]
Re: "Affects 2.0" ... the fetchDoc code path with the incorrect constructor usage is in 2.0 but access.log isn't.
Comment by Jim Walker [ 12/Jun/14 ]
Patch for MB-11411 branch

http://review.couchbase.org/38196
Comment by Jim Walker [ 12/Jun/14 ]
Re: hotfix - we need RedHat 5.8 x86-64.
Comment by Venu Uppalapati [ 12/Jun/14 ]
Verified that this issue does not happen in 3.0 (build 799). Warmup using access logs does not reset the document rev id to 1.
Comment by Phil Labee [ 12/Jun/14 ]
2.5.1 builds use manifests identical to the 2.5.1 release, with the exception that rel-2.5.1.xml uses branch "MB-11411" for ep-engine.
Comment by Phil Labee [ 13/Jun/14 ]
build 2.5.1-1094 contains all changes on MB-11411 branch, up to and including: f9b9a8948cc6d6489b4f6b0fe4569be39c0cf456

From the build log:

   ep-engine MB-11411...
   git remote update
   Fetching origin
   git fetch --tags
   git reset --hard origin/MB-11411 || git reset --hard MB-11411
   HEAD is now at f9b9a89 MB-11411 Warmup with an access log always sets the loaded document's rev-id to 1.
Comment by Wayne Siu [ 17/Jun/14 ]
Update:
Regression test passed on centos-5 and ubuntu-12.


Comment by Mike Wiederhold [ 23/Jun/14 ]
Moved the fix version to 2.5.1. Note that this is a hot fix for that version. This is already fixed for 3.0.
Comment by Dave Rigby [ 24/Jun/14 ]
@Mike: I think setting the fixed version to 2.5.1 is misleading - while it *may* be fixed in the 2.5.1 branch, it isn't fixed in the release and hence when people are searching for bugs they may be hitting they could incorrectly think this isn't a problem in 2.5.1

Changing the fixed version back to 3.0.
Comment by Cihan Biyikoglu [ 27/Jun/14 ]
Article detailing the issue on KB: http://support.couchbase.com/entries/46382384-XDCR-may-not-replicate-some-mutations




[MB-12001] Stats Issue - Number of documents is misleading during rebalance with delta recovery Created: 18/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-1169

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630 (24 vCPU)
Memory = 64 GB
Disk = RAID 10 HDD

Attachments: PNG File curr_items.png    
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/ares-dev/45/artifact/
Is this a Regression?: No

 Description   
1 of 4 nodes is being re-added after failover.
500M x 2KB items, 10K mixed ops/sec.

Steps:
1. Failover one of nodes.
2. Add it back.
3. Enabled delta recovery.
4. Sleep 20 minutes.
5. Rebalance cluster.

When rebalance starts, the reported curr_items is significantly greater than the actual number of documents.
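For reference, steps 1-3 and 5 roughly correspond to these cluster REST calls. This is a sketch only, assuming the 3.0 endpoints /controller/failOver, /controller/setRecoveryType (recoveryType=delta) and /controller/rebalance, with placeholder addresses:

import requests  # sketch only; endpoint and parameter names are assumptions about the 3.0 REST API

HOST = "http://172.23.96.100:8091"      # orchestrator node (placeholder address)
AUTH = ("Administrator", "password")
OTP = "ns_1@172.23.96.101"              # node being failed over and recovered (placeholder)

# Step 1: fail over one of the nodes.
requests.post(HOST + "/controller/failOver", auth=AUTH, data={"otpNode": OTP})

# Steps 2-3: add it back and mark it for delta recovery.
requests.post(HOST + "/controller/setRecoveryType", auth=AUTH,
              data={"otpNode": OTP, "recoveryType": "delta"})

# Step 5: rebalance; knownNodes lists every node, ejectedNodes stays empty.
known = ",".join(n["otpNode"] for n in
                 requests.get(HOST + "/pools/default", auth=AUTH).json()["nodes"])
requests.post(HOST + "/controller/rebalance", auth=AUTH,
              data={"knownNodes": known, "ejectedNodes": ""})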

 Comments   
Comment by Anil Kumar [ 19/Aug/14 ]
Stats Issue for new 3.0 feature 'delta node recovery'




[MB-11984] Intra-cluster replication slows down during intensive ejection Created: 18/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-1166

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2680 v2 (40 vCPU)
Memory = 256 GB
Disk = RAID 10 SSD

Attachments: PNG File ejections_and_replication_queue.png    
Issue Links:
Relates to
relates to MB-11642 Intra-replication falling far behind ... Reopened
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.17.zip
https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.18.zip
https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.19.zip
https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.20.zip
https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.21.zip
https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.22.zip
https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.23.zip
https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.24.zip
https://s3.amazonaws.com/bugdb/jira/MB-11984/172.23.100.25.zip
Is this a Regression?: Yes

 Description   
Following up MB-11642.

I noticed that the higher replication queue (10-100K items) is caused by intensive ejection (by intensive I mean 1-2M items ejected per second).

My setup:
9 nodes, 1 bucket, 2 replicas
50K inserts/sec (~1KB docs)

 Comments   
Comment by Raju Suravarjjala [ 18/Aug/14 ]
Triage: Moving to 3.0.1 as this is due to millions of documents
Comment by Sundar Sridharan [ 19/Aug/14 ]
Still trying to reproduce; it does not occur with a lower set ops/sec.




[MB-11999] Resident ratio of active items drops from 3% to 0.06% during rebalance with delta recovery Created: 18/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-1169

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630 (24 vCPU)
Memory = 64 GB
Disk = RAID 10 HDD

Attachments: PNG File vb_active_resident_items_ratio.png     PNG File vb_replica_resident_items_ratio.png    
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/ares-dev/45/artifact/
Is this a Regression?: No

 Description   
1 of 4 nodes is being re-added after failover.
500M x 2KB items, 10K mixed ops/sec.

Steps:
1. Failover one of nodes.
2. Add it back.
3. Enabled delta recovery.
4. Sleep 20 minutes.
5. Rebalance cluster.

Most importantly it happens due to excessive memory usage.




[MB-11998] Working set is screwed up during rebalance with delta recovery (>95% cache miss rate) Created: 18/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-1169

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630 (24 vCPU)
Memory = 64 GB
Disk = RAID 10 HDD

Attachments: PNG File cache_miss_rate.png    
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/ares-dev/45/artifact/
Is this a Regression?: No

 Description   
1 of 4 nodes is being re-added after failover.
500M x 2KB items, 10K mixed ops/sec.

Steps:
1. Failover one of nodes.
2. Add it back.
3. Enabled delta recovery.
4. Sleep 20 minutes.
5. Rebalance cluster.




[MB-11804] [Windows] Memcached error #132 'Internal error': Internal error for vbucket... when set key to bucket Created: 23/Jul/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Thuan Nguyen Assignee: Sriram Ganesan
Resolution: Unresolved Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 2008 R2 64-bit

Attachments: Zip Archive 172.23.107.124-7232014-1631-diag.zip     Zip Archive 172.23.107.125-7232014-1633-diag.zip     Zip Archive 172.23.107.126-7232014-1634-diag.zip     Zip Archive 172.23.107.127-7232014-1635-diag.zip    
Triage: Untriaged
Operating System: Windows 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: Link to manifest file of this build from centos build. http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_3.0.0-999-rel.rpm.manifest.xml
Is this a Regression?: Yes

 Description   
Ran the warmup test in build 3.0.0-999 on 4 Windows 2008 R2 64-bit nodes:
python testrunner.py -i ../../ini/4-w-sanity-new.ini -t warmupcluster.WarmUpClusterTest.test_warmUpCluster,num_of_docs=100

The test failed when it loaded keys into the default bucket. This test passed on both CentOS 6.4 and Ubuntu 12.04 64-bit.


 Comments   
Comment by Sriram Ganesan [ 06/Aug/14 ]
The error here seems similar to one of the issues that was fixed a while ago, MB-9990.
Comment by Sriram Ganesan [ 06/Aug/14 ]
I am getting the following error in a 2 node windows cluster

Memcached error #132 'Internal error': Internal error for vbucket :589 to mc 10.2.1.65:11211

and AFAIK, 11211 is the moxi port.




[MB-12038] build go programs in ns_server instead of shipping pre-built binaries (was: [windows] F-Secure flagging binary) Created: 21/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Sriram Melkote Assignee: Wayne Siu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
A popular virus scanner, F-Secure, is deleting a file installed by us, generate_cert.exe.

We need to analyze why this is happening.

See: https://groups.google.com/forum/#!topic/couchbase/E3QvNolCknQ


 Comments   
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
What do you want me to do? Those crappy antiviruses are known to have false positives occasionally.
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
And btw our build machines _are_ clean.
Comment by Sriram Melkote [ 27/Aug/14 ]
You could compile it with the latest Go on Windows, and the resulting binary does not trip any virus scanners.
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
I don't compile anything on Windows. These binaries are built on a GNU/Linux machine using the Go cross compiler. I can rebuild using a more recent Go, but it's unclear if it's really going to help.
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
And surely, if the build folks are ready to build our Go stuff in ns_server, I'm fine with it if some fan of cmake can make it happen.
Comment by Sriram Melkote [ 27/Aug/14 ]
Yes, let's ask the build team to add the ability to compile Go code. We'll need it soon enough, independent of this bug anyway.




[MB-10960] Add Debian package to 3.0 release - Debian 7.0 Created: 24/Apr/14  Updated: 27/Aug/14  Resolved: 08/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Sriram Melkote Assignee: Phil Labee
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
depends on MB-11872 vbucket.cc compilation failure on Debian Closed
Duplicate
Relates to
relates to MB-11872 vbucket.cc compilation failure on Debian Closed

 Description   
Debian is consistently in the top 3 distributions in server market, by almost every count. For example, below ranks the distribution of the top 10 million web servers:

http://w3techs.com/technologies/overview/operating_system/all

You can look at various other surveys, and you'll see the message is the same. Debian is pretty much at the top for servers. Yet, we don't ship packages for it. This is quite hard to understand because we're already building .deb for Ubuntu, and it takes only a few minor changes to make it compatible with Debian/Stable.

While I don't track customer requests, I've anecdotally seen them requesting the exact same thing in unambiguous terms.


 Comments   
Comment by Sriram Melkote [ 14/May/14 ]
Eric, Ubuntu and Debian are different distributions.
Comment by Jacob Lundberg [ 10/Jul/14 ]
Could somebody please add my user (jacoblundberg) to view CBSE-1140 if that is where this work will be done? This is an important request for CollegeNET and I want to be able to view the status.
Comment by Brent Woodruff [ 10/Jul/14 ]
Hi Jacob. I am not familiar with that ticket (CBSE-1140) firsthand, so perhaps someone else would be better for discussing that issue with you. However, I wanted to let you know that all CBSE tickets are internal to Couchbase. We will not be able to grant you access to view that ticket's contents.

MB tickets such as this one are public, and updates to the status of Couchbase's support of the Debian platform for Couchbase Server will appear here.

I recommend that you communicate with Couchbase Support via email or the web if you have a question about the status of work you are expecting to be completed, which has either a related Couchbase Support ticket or a CBSE ticket.

Support email: support@couchbase.com
Support portal: http://support.couchbase.com
Comment by Anil Kumar [ 11/Jul/14 ]
Here are the details for adding support to Debian -
- Support current stable distribution of Debian is version 7
- Only 64bit

Comment by Phil Labee [ 14/Jul/14 ]
The status quo is that 2.x does not build on Debian and it does not install cleanly on Debian. This is a new platform for us, which means new build infrastructure and new installer files.

We do currently support Ubuntu, which produces a *.deb file for installation. So what we have may be close, but since this is a new platform it is unclear how much work will be required.
Comment by Sriram Melkote [ 14/Jul/14 ]
OK - I've removed my suggestion to split it into two tasks. Let's treat this as a new platform as suggested.
Comment by Anil Kumar [ 15/Jul/14 ]
Ceej/Phil - As mentioned before Debian is a new platform we will start supporting from 3.0.

- Support Debian 7
- Only 64bit

Comment by Phil Labee [ 23/Jul/14 ]
I need 4 VMs with Debian 7.0. Each needs:

    4 Gig RAM
    4 CPUs
    100 Gig of disk space

These machines may be used for building the server, or for running smoke tests like in CBIT-956

Please take a snapshot of each after configuring. I'm going to install a build environment but we may need to roll back in case we need to re-purpose any of these hosts.
Comment by Phil Labee [ 30/Jul/14 ]
setup debian-7-x64-builder ( 172.23.113.41, sierra-s22705.sc.couchbase.com ) using:

wget http://ftp.de.debian.org/debian/pool/main/b/buildbot-slave/buildbot-slave_0.8.6p1.orig.tar.gz
tar xvf buildbot-slave_0.8.6p1.orig.tar.gz
pushd buildbot-slave-0.8.6p1
sudo python setup.py install
popd

# build tool-chain

sudo apt-get install build-essential devscripts ruby-dev libncurses5-dev libssl-dev git-core libtool python-setuptools python-dev debhelper

wget http://ftpmirror.gnu.org/autoconf/autoconf-2.69.tar.gz
tar -xzf autoconf-2.69.tar.gz
pushd autoconf-2.69
./configure --prefix=/usr/local && make && sudo make install
popd

wget http://ftpmirror.gnu.org/automake/automake-1.11.1.tar.gz
tar -xzf automake-1.11.1.tar.gz
pushd automake-1.11.1
./configure --prefix=/usr/local && make && sudo make install
popd

wget http://ftpmirror.gnu.org/libtool/libtool-2.4.2.tar.gz
tar -xzf libtool-2.4.2.tar.gz
pushd libtool-2.4.2
./configure --prefix=/usr/local && make && sudo make install
popd

wget http://www.cmake.org/files/v2.8/cmake-2.8.12.2-Linux-i386.sh
chmod 755 cmake-2.8.12.2-Linux-i386.sh
sudo ./cmake-2.8.12.2-Linux-i386.sh --prefix=/usr/local
Comment by Phil Labee [ 30/Jul/14 ]
Had to install buildbot (again) after installing python-setuptools.
Comment by Phil Labee [ 30/Jul/14 ]
wget http://production.cf.rubygems.org/rubygems/rubygems-2.4.1.tgz
tar xvf rubygems-2.4.1.tgz
pushd rubygems-2.4.1
sudo ruby setup.rb
popd

sudo gem1.8 install rake
Comment by Chris Hillery [ 30/Jul/14 ]
Why not "apt-get install rubygems"?
Comment by Phil Labee [ 30/Jul/14 ]
=> sudo apt-get install rubygems
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package rubygems
Comment by Chris Hillery [ 30/Jul/14 ]
Hmmm, it should be there... Debian 7 is "wheezy":

https://packages.debian.org/search?keywords=rubygems

You might need to check /etc/apt/sources.list and /etc/apt/sources.list.d. (Or you might just need to "apt-get update", depending on how the Debian base image was created.)
Comment by Sriram Melkote [ 31/Jul/14 ]
Yes - apt-get should be able to install automake, autoconf, rubygems, libtool, cmake etc

We should only have to build when we know the version supplied with debian doesn't work for us.

Here's an example sources.list:

  deb http://security.debian.org/ wheezy/updates main non-free contrib
  deb http://http.debian.net/debian wheezy main non-free contrib
  deb http://http.debian.net/debian wheezy-updates main non-free contrib
  deb http://http.debian.net/debian wheezy-backports main non-free contrib
Comment by Phil Labee [ 31/Jul/14 ]
rubygems was not available from apt-get. Maybe the sources.list file is incomplete.

We require specific versions of automake, autoconf, libtool, and cmake, so these were installed as described.
Comment by Phil Labee [ 04/Aug/14 ]
Same error on both debian build environments:

    3.0.0 (VM image)

    http://builds.hq.northscale.net:8010/builders/debian-7-x64-300-builder/builds/38


    master (docker container)

    http://builds.hq.northscale.net:8010/builders/debian-7-x64-master-builder/builds/13
Comment by Phil Labee [ 04/Aug/14 ]
VM debian-7-x64-builder now using gcc, g++ now at 4.4.7

=> sh
$ gcc --version
gcc (Debian 4.4.7-2) 4.4.7

$ g++ --version
g++ (Debian 4.4.7-2) 4.4.7

Fails due to missing dependencies. From:
    
    http://builds.hq.northscale.net:8010/builders/debian-7-x64-300-builder/builds/39/steps/couchbase-server%20make%20enterprise%20/logs/stdio

In file included from src/module.c:24:
src/connection.h:26:20: error: Python.h: No such file or directory
src/connection.h:27:22: error: pythread.h: No such file or directory
src/connection.h:28:26: error: structmember.h: No such file or directory
Comment by Phil Labee [ 04/Aug/14 ]
successful ubuntu-1204-x64 builds use 4.6.3, will try build with that
Comment by Phil Labee [ 04/Aug/14 ]
The compile failure has been identified as an error in the code.

Will switch to GCC 4.7.2, which is the latest and is the default for this platform.
Comment by Chris Hillery [ 04/Aug/14 ]
MB-11872 is tracking the compile error.
Comment by Chris Hillery [ 05/Aug/14 ]
Fix for MB-11872. Latest builds show:

In file included from src/module.c:24:0:
src/connection.h:26:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
make: *** [pysqlite2] Error 1

I believe this is because that step is explicitly using python2.6 to build, but the python-dev package that is installed on Debian is for python 2.7. The quick solution is to install the python2.6-dev package as well, so I'll do that on the builddocker to test. It would be better to not call python2.6 explicitly, which I believe we can do also, but it's a bit more work.
Comment by Chris Hillery [ 06/Aug/14 ]
Adding python2.6-dev worked. I was unable to do this on the debian7 VM, for some reason:

E: Unable to locate package python2.6-dev
E: Couldn't find any package by regex 'python2.6-dev'

I think /etc/apt/sources.list must need to be extended; on my Docker container, it is:

deb http://ftp.us.debian.org/debian wheezy main
deb http://ftp.us.debian.org/debian wheezy-updates main
deb http://security.debian.org wheezy/updates main
Comment by Chris Hillery [ 06/Aug/14 ]
Next error is in server-deb.rb:

./server-deb.rb /opt/couchbase couchbase-server couchbase server 1*
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- util (LoadError)
from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
from ./server-deb.rb:6:in `<main>'
make: *** [package-deb] Error 1


I am investigating.
Comment by Chris Hillery [ 06/Aug/14 ]
Fix: http://review.couchbase.org/#/c/40322/

I also fixed our buildbot config so the resulting .deb files will have "debian7" in the name: http://review.couchbase.org/#/c/40321/

And we have our first build! Unfortunately I forgot to reconfigure buildbot with my change, so the .deb file is misnamed (and possibly overwritten) in latestbuilds, but the build did succeed: http://builds.hq.northscale.net:8010/builders/debian-7-x64-master-builder/builds/20

I'll fire off another one so we actually have a .deb file to play with.
Comment by Chris Hillery [ 06/Aug/14 ]
http://latestbuilds.hq.couchbase.com/couchbase-server-enterprise_debian7_x86_64_0.0.0-1680-rel.deb

Huzzah! It even shows up on http://latestbuilds.hq.couchbase.com/ 's index..

Comment by Volker Mische [ 06/Aug/14 ]
Just want to let you know that it installed fine on my system (my dev box is on Debian).
Comment by Jacob Lundberg [ 08/Aug/14 ]

We have installed the version from August 7th on our development system and it passes our unit tests.
Comment by Chris Hillery [ 08/Aug/14 ]
Great!

I've just merged a change onto the 3.0.0 branch of voltron that should fix the Debian 7 3.0.0 builds. Once we've confirmed that build is working, I think we can close this ticket.
Comment by Phil Labee [ 08/Aug/14 ]
The same docker container is being used for 3.0.0 and master builds. The 3.0.0 build is failing with:

  ./server-deb.rb /opt/couchbase couchbase-server couchbase server 1*
  /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- util (LoadError)
     from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from ./server-deb.rb:6:in `<main>'
  make: *** [package-deb] Error 1
Comment by Chris Hillery [ 08/Aug/14 ]
That's the voltron 3.0.0 change I mentioned. I guess there hasn't been a 3.0.0 build since I merged it. I'll go ahead and force one to make sure it's fixed.
Comment by Chris Hillery [ 08/Aug/14 ]
http://builds.hq.northscale.net:8010/builders/debian-7-x64-300-builder/builds/59

Looks like it's working, so resolving this bug!
Comment by Chad Kreimendahl [ 27/Aug/14 ]
Where would we find the package for this? Is it the existing ubuntu one?
Comment by Chad Kreimendahl [ 27/Aug/14 ]
Just attempted an install with beta2 from the Ubuntu package and got my answer (no). It appears the primary issue is libc6 version dependencies; wheezy (Debian 7) is on 2.13.
Comment by Sriram Melkote [ 27/Aug/14 ]
Yes, Debian and Ubuntu are separate packages.

For example:
http://latestbuilds.hq.couchbase.com/couchbase-server-community_debian7_x86_64_3.0.0-1195-rel.deb




[MB-10307] Bucket priority on Admin UI Created: 26/Feb/14  Updated: 27/Aug/14  Resolved: 15/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Ketaki Gangal Assignee: Ketaki Gangal
Resolution: Fixed Votes: 0
Labels: ns_server-story
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2014-02-26 at 1.46.57 PM.png    

 Description   
The new UI contains a bucket priority setting in place of the previous MRW code.

1. It would be nice to have some explanation on the UI itself of what the prioritization implies.
2. Options for bucket priority are listed as 2 to 8; this should be changed to 1 to X, since the number of buckets can range from 1 to any value (10 is typical, or higher), and the 2-to-8 range does not tie in with that.
3. For a single created bucket, my current bucket priority is 3, which is incorrect since it is the only bucket present on the cluster. [see attached screenshot]




 Comments   
Comment by Dipti Borkar [ 26/Feb/14 ]
Anil, can you define this better?
thanks
Comment by Venu Uppalapati [ 26/Feb/14 ]
It would be good to have a Priority tab in the Settings screen in the UI. When there are more than a few buckets, it will provide an easy way for the User to look at the priorities of each bucket and also change them if needed.
Comment by Pavel Paulau [ 27/Feb/14 ]
Note that ep_engine supports only two priorities ("Low" and "High"), and the current UI/API will be changed in any case.
Comment by Venu Uppalapati [ 27/Feb/14 ]
This is what I see in /cmake/ep-engine/configuration.json, range 1-8:

"max_num_workers": {
    "default": "4",
    "descr": "Bucket Priority relative to other buckets",
    "dynamic": false,
    "type": "size_t",
    "validator": {
        "range": {
            "max": 8,
            "min": 1
        }
    }
},
When were the changes made to make it only High/Low priority? Also, each task has an individual priority assigned inside /cmake/ep-engine/src/priority.cc.
Comment by Pavel Paulau [ 27/Feb/14 ]
See https://github.com/couchbase/ep-engine/blob/master/src/workload.h#L44
Comment by Sundar Sridharan [ 28/Feb/14 ]
Venu, this was done when we added the automatic thread count feature.
As Pavel mentioned above, perhaps a more meaningful UI would be a simple HIGH/LOW radio button for bucket priority that internally sets "max_num_workers" to either 3 or 8, so that when a customer has a very large number of buckets (say more than 10), they can mark some of them as lower priority.
thanks
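To make the mapping concrete, here is a sketch of how the underlying setting could be poked directly today (this assumes max_num_workers is accepted through extra_config_string in the same way other ep-engine settings are passed via /diag/eval elsewhere in this export; the proper UI/REST interface is what this ticket asks for):

  wget -O- --user=Administrator --password=password --post-data='ns_bucket:update_bucket_props("default", [{extra_config_string, "max_num_workers=3"}]).' http://localhost:8091/diag/eval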
Comment by Anil Kumar [ 09/Apr/14 ]
Alk/Pavel -

Sundar provided details on the ep_engine API and the values to pass.

I have provided a mockup for the Admin UI.
Comment by Aleksey Kondratenko [ 15/Apr/14 ]
http://review.couchbase.org/35677
Comment by Ketaki Gangal [ 27/Aug/14 ]
3.0.0-1174-rel works as expected.




[MB-12084] Create 3.0.0 chef-based rightscale template for EE and CE Created: 27/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: cloud
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Anil Kumar Assignee: Wei-Li Liu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Need this before 3.0 GA




[MB-12082] Marketplace AMI - Enterprise Edition and Community Edition - provide AMI id to PM Created: 27/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: cloud
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Anil Kumar Assignee: Wei-Li Liu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Need AMIs before 3.0.0 GA




[MB-12083] Create 3.0.0 legacy rightscale templates for Enterprise and Community Edition (non-chef) Created: 27/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: cloud
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Anil Kumar Assignee: Wei-Li Liu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We need this before 3.0 GA




[MB-12041] Disabling access.log on multiple buckets results in node failing to become available Created: 21/Aug/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Brent Woodruff Assignee: Abhinav Dangeti
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2014-08-26 at 8.53.15 AM.png     PNG File Screen Shot 2014-08-26 at 8.53.29 AM.png     PNG File Screen Shot 2014-08-26 at 8.58.37 AM.png     PNG File Screen Shot 2014-08-26 at 8.58.43 AM.png    
Issue Links:
Dependency
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This issue was brought up today while reviewing a customer ticket during support discussions. It is unclear from the subsequent discussion in that ticket whether the issue was addressed and fixed.

Steps to reproduce:

* Initialize a Couchbase node with more than one bucket

* Disable the access.log on *both* buckets using the following command for each bucket:

wget -O- --user=Administrator --password=password --post-data='ns_bucket:update_bucket_props("bucket1", [{extra_config_string, "alog_path="}]).' http://localhost:8091/diag/eval

wget -O- --user=Administrator --password=password --post-data='ns_bucket:update_bucket_props("bucket2", [{extra_config_string, "alog_path="}]).' http://localhost:8091/diag/eval

where 'bucket1' and 'bucket2' are the bucket names (a quick verification sketch follows these steps).

* Restart the node and observe the following errors in the logs:

memcached<0.89.0>: WARNING: Found duplicate entry for "alog_path"
memcached<0.89.0>: Unsupported key: <^A>

* Note that the node remains pending and never becomes available
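As a quick check that the setting actually took effect on each bucket (the same cbstats pattern used in the comments below), something like:

  ./cbstats localhost:11210 config -b bucket1 | grep alog_path
  ./cbstats localhost:11210 config -b bucket2 | grep alog_path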

 Comments   
Comment by Abhinav Dangeti [ 21/Aug/14 ]
I don't see the node failing to become available.

Started couchbase server with 2 buckets:
  1 Fri Aug 22 10:36:25.628702 PDT 3: (default) Trying to connect to mccouch: "127.0.0.1:13000"
  2 Fri Aug 22 10:36:25.628978 PDT 3: (default) Connected to mccouch: "127.0.0.1:13000"
  3 Fri Aug 22 10:36:25.644502 PDT 3: (No Engine) Bucket default registered with low priority
  4 Fri Aug 22 10:36:25.644528 PDT 3: (No Engine) Spawning 4 readers, 4 writers, 1 auxIO, 1 nonIO threads
  5 Fri Aug 22 10:36:25.646178 PDT 3: (default) metadata loaded in 982 usec
  6 Fri Aug 22 10:36:25.646205 PDT 3: (default) Enough number of items loaded to enable traffic
  7 Fri Aug 22 10:36:25.646559 PDT 3: (default) warmup completed in 1052 usec
  8 Fri Aug 22 10:36:33.495128 PDT 3: (default) Shutting down tap connections!
  9 Fri Aug 22 10:36:33.495174 PDT 3: (default) Shutting down dcp connections!
 10 Fri Aug 22 10:36:33.496244 PDT 3: (No Engine) Unregistering last bucket default
 11 Fri Aug 22 10:36:41.791797 PDT 3: (bucket1) Trying to connect to mccouch: "127.0.0.1:13000"
 12 Fri Aug 22 10:36:41.791932 PDT 3: (bucket1) Connected to mccouch: "127.0.0.1:13000"
 13 Fri Aug 22 10:36:41.800241 PDT 3: (No Engine) Bucket bucket1 registered with low priority
 14 Fri Aug 22 10:36:41.800273 PDT 3: (No Engine) Spawning 4 readers, 4 writers, 1 auxIO, 1 nonIO threads
 15 Fri Aug 22 10:36:41.801437 PDT 3: (bucket1) metadata loaded in 719 usec
 16 Fri Aug 22 10:36:41.801450 PDT 3: (bucket1) Enough number of items loaded to enable traffic
 17 Fri Aug 22 10:36:41.801593 PDT 3: (bucket1) warmup completed in 761 usec
 18 Fri Aug 22 10:36:46.922063 PDT 3: (bucket2) Trying to connect to mccouch: "127.0.0.1:13000"
 19 Fri Aug 22 10:36:46.922191 PDT 3: (bucket2) Connected to mccouch: "127.0.0.1:13000"
 20 Fri Aug 22 10:36:46.931024 PDT 3: (No Engine) Bucket bucket2 registered with low priority
 21 Fri Aug 22 10:36:46.932154 PDT 3: (bucket2) metadata loaded in 715 usec
 22 Fri Aug 22 10:36:46.932170 PDT 3: (bucket2) Enough number of items loaded to enable traffic
 23 Fri Aug 22 10:36:46.932314 PDT 3: (bucket2) warmup completed in 776 usec

Loaded 1000 items into each bucket and restarted the node, after setting alog_path to NULL in the same way as mentioned above.
  1 Fri Aug 22 10:38:08.372050 PDT 3: (bucket2) Trying to connect to mccouch: "127.0.0.1:13000"
  2 Fri Aug 22 10:38:08.372307 PDT 3: (bucket2) Connected to mccouch: "127.0.0.1:13000"
  3 Fri Aug 22 10:38:08.382418 PDT 3: (No Engine) Bucket bucket2 registered with low priority
  4 Fri Aug 22 10:38:08.382445 PDT 3: (No Engine) Spawning 4 readers, 4 writers, 1 auxIO, 1 nonIO threads
  5 Fri Aug 22 10:38:08.434024 PDT 3: (bucket1) Trying to connect to mccouch: "127.0.0.1:13000"
  6 Fri Aug 22 10:38:08.434205 PDT 3: (bucket1) Connected to mccouch: "127.0.0.1:13000"
  7 Fri Aug 22 10:38:08.445064 PDT 3: (No Engine) Bucket bucket1 registered with low priority
  8 Fri Aug 22 10:38:08.481732 PDT 3: (bucket2) metadata loaded in 98 ms
  9 Fri Aug 22 10:38:08.507847 PDT 3: (bucket2) warmup completed in 124 ms
 10 Fri Aug 22 10:38:08.540342 PDT 3: (bucket1) metadata loaded in 92 ms
 11 Fri Aug 22 10:38:08.553951 PDT 3: (bucket1) warmup completed in 106 ms

[10:37:46] abhinav: ~/Documents/couchbase30/ep-engine $ ./management/cbstats localhost:12000 all -b bucket1 | grep alog
 ep_alog_block_size: 4096
 ep_alog_path:
 ep_alog_sleep_time: 1440
 ep_alog_task_time: 10
[10:38:50] abhinav: ~/Documents/couchbase30/ep-engine $ ./management/cbstats localhost:12000 all -b bucket2 | grep alog
 ep_alog_block_size: 4096
 ep_alog_path:
 ep_alog_sleep_time: 1440
 ep_alog_task_time: 10

I do see the duplicate-entry warning, but I'm guessing that is because we set alog_path again after initializing it to the default value, in which case the later value would overwrite the earlier one.
Comment by Abhinav Dangeti [ 22/Aug/14 ]
I tried your scenario with the latest 3.0 and then with 2.5.1, and noted similar behavior.
Can you point me to the build with which you saw this issue, or perhaps the logs from when you hit it?
Comment by Abhinav Dangeti [ 25/Aug/14 ]
I also merged a change that will let the user enable/disable access log generation during run time:
http://review.couchbase.org/#/c/40884/
Comment by Brent Woodruff [ 26/Aug/14 ]
Hi Abhinav,

Apologies for the delay in getting back to you regarding a build that exhibits this issue. I believe it was the 2.5.1 release, since that was what the customer had just upgraded to in the originating issue. I was running that release on my Mac testing the commands provided by engineering to disable the access.log. I was able to reproduce this issue again today.

Attached screenshots of before and after these commands:

$./cbstats localhost:11210 config -b default |grep alog_path
 ep_alog_path: /Users/brent/Library/Application Support/Couchbase/var/lib/couchdb/default/access.log

$./cbstats localhost:11210 config -b beer-sample |grep alog_path
 ep_alog_path: /Users/brent/Library/Application Support/Couchbase/var/lib/couchdb/beer-sample/access.log

$ wget -O- --user=Administrator --password=couchbase --post-data='ns_bucket:update_bucket_props("default", [{extra_config_string, "alog_path="}]).' http://localhost:8091/diag/eval
# wget output removed

$ wget -O- --user=Administrator --password=couchbase --post-data='ns_bucket:update_bucket_props("beer-sample", [{extra_config_string, "alog_path="}]).' http://localhost:8091/diag/eval
# wget output removed

# Restarted Couchbase

$./cbstats localhost:11210 config -b default |grep alog_path
 ep_alog_path:

$./cbstats localhost:11210 config -b beer-sample |grep alog_path
# no output
Comment by Brent Woodruff [ 26/Aug/14 ]
Note: In the first screenshot showing the buckets, both buckets were available; I just did not think to capture the green indicator, since my goal was only to show that more than one bucket was configured.
Comment by Abhinav Dangeti [ 26/Aug/14 ]
Brent, I did not see this issue when I used a Couchbase instance started by cluster_run. However, I do see the issue with a Mac build for some reason.
Note that this issue is not caused by having multiple buckets in the cluster; I see it with a single bucket as well.

The duplicate entry warning showed just once under cluster_run. With the build, however, the babysitter logs are flooded with those messages:

37899 memcached<0.89.0>: WARNING: Found duplicate entry for "alog_path"
37900 memcached<0.89.0>: Unsupported key: <Vû^F^A>
37901 memcached<0.89.0>: WARNING: Found duplicate entry for "alog_path"
37902 memcached<0.89.0>: Unsupported key: <<86>^C^G^A>
37903 memcached<0.89.0>: WARNING: Found duplicate entry for "alog_path"
37904 memcached<0.89.0>: Unsupported key: <öê^F^A>
37905 memcached<0.89.0>: WARNING: Found duplicate entry for "alog_path"
37906 memcached<0.89.0>: Unsupported key: <&ó^F^A>
37907 memcached<0.89.0>: WARNING: Found duplicate entry for "alog_path"
37908 memcached<0.89.0>: Unsupported key: <Vû^F^A>
...
Comment by Abhinav Dangeti [ 26/Aug/14 ]
However, this issue already seems to be resolved in 3.0. Verified with 3.0.0-1175-rel.
Comment by Abhinav Dangeti [ 27/Aug/14 ]
Please re-open if you see this with 3.0




[MB-12005] vbucket-seqno stats getting timed out during Views DGM test Created: 19/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Meenakshi Goel Assignee: Sriram Melkote
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1166-rel

Triage: Triaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Yes

 Description   
Test to Reproduce:
./testrunner -i yourfile.ini -t view.createdeleteview.CreateDeleteViewTests.pending_removal_with_ddoc_ops,ddoc_ops=update,test_with_view=True,num_ddocs=3,num_views_per_ddoc=3,items=200000,nodes_out=1,active
_resident_threshold=10,dgm_run=True,eviction_policy=fullEviction,skip_cleanup=true

Steps to Reproduce:
1. Setup a 5-node cluster
2. Rebalance in all nodes
3. Load bucket to achieve dgm 10%
4. Failover 1 node
5. Create Views and perform ddoc update operations
6. Test exits with error during ddoc validation

2014-08-18 04:00:37 | INFO | MainProcess | Cluster_Thread | [rest_client._query] index query url: http://10.3.5.90:8092/default/_design/ddoc_test1/_view/views0?stale=false&connection_timeout=60000&full_set=true
2014-08-18 04:15:37 | ERROR | MainProcess | Cluster_Thread | [rest_client._http_request] socket error while connecting to http://10.3.5.90:8092/default/_design/ddoc_test1/_view/views0?stale=false&connection_timeout=60000&full_set=true error timed out
2014-08-18 04:15:37 | ERROR | MainProcess | Cluster_Thread | [task.execute] Unexpected Exception Caught
ERROR
[('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('lib/tasks/taskmanager.py', 31, 'run', 'task.step(self)'), ('lib/tasks/task.py', 56, 'step', 'self.execute(task_manager)'), ('lib/tasks/task.py', 1525, 'execute', 'self.set_exception(e)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
Mon Aug 18 04:15:37 2014
[('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('testrunner.py', 262, 'run', '**self._Thread__kwargs)'), ('/usr/lib/python2.7/unittest/runner.py', 151, 'run', 'test(result)'), ('/usr/lib/python2.7/unittest/case.py', 391, '__call__', 'return self.run(*args, **kwds)'), ('/usr/lib/python2.7/unittest/case.py', 327, 'run', 'testMethod()'), ('pytests/view/createdeleteview.py', 626, 'pending_removal_with_ddoc_ops', 'self._verify_ddoc_data_all_buckets()'), ('pytests/view/createdeleteview.py', 274, '_verify_ddoc_data_all_buckets', 'result = self.cluster.query_view(self.master, ddoc_name, view.name, query, num_items, bucket)'), ('lib/couchbase/cluster.py', 464, 'query_view', 'return _task.result(timeout)'), ('lib/tasks/future.py', 160, 'result', 'return self.__get_result()'), ('lib/tasks/future.py', 111, '__get_result', 'print traceback.extract_stack()')]
2014-08-18 04:15:37 | WARNING | MainProcess | test_thread | [basetestcase.tearDown] CLEANUP WAS SKIPPED

======================================================================
ERROR: pending_removal_with_ddoc_ops (view.createdeleteview.CreateDeleteViewTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pytests/view/createdeleteview.py", line 626, in pending_removal_with_ddoc_ops
    self._verify_ddoc_data_all_buckets()
  File "pytests/view/createdeleteview.py", line 274, in _verify_ddoc_data_all_buckets
    result = self.cluster.query_view(self.master, ddoc_name, view.name, query, num_items, bucket)
  File "lib/couchbase/cluster.py", line 464, in query_view
    return _task.result(timeout)
  File "lib/tasks/future.py", line 160, in result
    return self.__get_result()
  File "lib/tasks/future.py", line 112, in __get_result
    raise self._exception
ServerUnavailableException: unable to reach the host @ 10.3.5.90

Logs:
[couchdb:error,2014-08-18T4:39:41.963,ns_1@10.3.5.90:<0.12717.6>:couch_log:error:44]Set view `default`, replica group `_design/ddoc_test0`, doc loader error
error: {timeout,{gen_server,call,
                                 [<0.12658.6>,
                                  {add_stream,706,0,0,5637,6},
                                  60000]}}
stacktrace: [{gen_server,call,3,[{file,"gen_server.erl"},{line,188}]},
             {couch_dcp_client,enum_docs_since,8,
                 [{file,
                      "/home/buildbot/buildbot_slave/ubuntu-1004-x64-300-builder/build/build/couchdb/src/couch_dcp/src/couch_dcp_client.erl"},
                  {line,246}]},
             {couch_set_view_updater,'-load_changes/8-fun-2-',12,
                 [{file,
                      "/home/buildbot/buildbot_slave/ubuntu-1004-x64-300-builder/build/build/couchdb/src/couch_set_view/src/couch_set_view_updater.erl"},
                  {line,516}]},
             {lists,foldl,3,[{file,"lists.erl"},{line,1248}]},
             {couch_set_view_updater,load_changes,8,
                 [{file,
                      "/home/buildbot/buildbot_slave/ubuntu-1004-x64-300-builder/build/build/couchdb/src/couch_set_view/src/couch_set_view_updater.erl"},
                  {line,589}]},
             {couch_set_view_updater,'-update/8-fun-3-',14,
                 [{file,
                      "/home/buildbot/buildbot_slave/ubuntu-1004-x64-300-builder/build/build/couchdb/src/couch_set_view/src/couch_set_view_updater.erl"},
                  {line,281}]}]

[couchdb:error,2014-08-18T6:46:34.997,ns_1@10.3.5.90:<0.12648.6>:couch_log:error:44]dcp client (<0.12660.6>): vbucket-seqno stats timed out after 2.0 seconds. Waiting...
[couchdb:error,2014-08-18T6:46:39.608,ns_1@10.3.5.90:<0.21856.6>:couch_log:error:44]dcp client (<0.21861.6>): vbucket-seqno stats timed out after 2.0 seconds. Waiting...
[couchdb:error,2014-08-18T6:46:42.611,ns_1@10.3.5.90:<0.21856.6>:couch_log:error:44]dcp client (<0.21861.6>): vbucket-seqno stats timed out after 2.0 seconds. Waiting...

*Observed stacktraces and crashes in logs. Uploading logs.

Live Cluster:
1:10.3.5.90
2:10.3.5.91
3:10.3.5.92
4:10.3.5.93
5:10.3.4.75

 Comments   
Comment by Meenakshi Goel [ 19/Aug/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12005/f806d72b/10.3.5.90-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12005/8f3bae64/10.3.5.91-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12005/4fc16d8a/10.3.5.92-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12005/16116a7e/10.3.4.75-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12005/a3e503f6/10.3.5.93-diag.zip
Comment by Sarath Lakshman [ 19/Aug/14 ]
I believe we tried a toy build with a separate connection for stats before, in MB-11706. So I don't think I have much information about this problem.
Comment by Nimish Gupta [ 22/Aug/14 ]
I have a toy build with the change for a separate connection (http://latestbuilds.hq.couchbase.com/couchbase-server-community_ubunt12-3.0.0-toy-nimish-x86_64_3.0.0-704-toy.deb). Meenakshi, please run the test with this build.
Comment by Meenakshi Goel [ 22/Aug/14 ]
Started test with the toy build.
Comment by Nimish Gupta [ 25/Aug/14 ]
I don't see any crash in couchdb after using the separate connection. The crashes are in the ns_server logs and look to be due to rebalance and failover, which were present in earlier logs as well. The separate connection has reduced the number of stats timeout messages: previously there were 714 stats timeout messages, which has now come down to 113.
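For anyone re-checking those numbers, the counts can be reproduced by grepping the couchdb logs for the timeout message quoted in the description (a sketch; the exact log file name and path depend on the install):

  grep -c 'vbucket-seqno stats timed out' /opt/couchbase/var/lib/couchbase/logs/couchdb.log*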
Comment by Sriram Melkote [ 25/Aug/14 ]
Wayne, please help us obtain a toy build with mctimings compiled.
Comment by Cihan Biyikoglu [ 26/Aug/14 ]
Hi team, could you explain why this is critical? We are trying to shut the gates for 3.0 and want to make sure we keep only things that have a high impact on the release.
Comment by Ketaki Gangal [ 27/Aug/14 ]
This causes a number of view-test timeouts. While these can be worked around by adjusting test timeouts, in my opinion that is not the right way for the code/query timings to work.
As mentioned earlier, there are a large number of stat calls being made here.

This should already manifest in performance tests, but with functional tests there is a large increase in query runtime due to the above.


Comment by Meenakshi Goel [ 27/Aug/14 ]
Started test with toy build http://qa.sc.couchbase.com/job/ubuntu_x64--65_02--view_query_extended-P1/171/console




[MB-12081] Remove counting mutations introduced for MB-11589 Created: 27/Aug/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Sriram Melkote Assignee: Volker Mische
Resolution: Fixed Votes: 0
Labels: RC2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-11589 Sliding endseqno during initial index... Open
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We are currently counting the number of mutations requested vs. received. This was diagnostic code used to get closer to resolving MB-11589. However, that bug has been deferred to 3.0.1, so the diagnostic code must be removed to avoid performance and logging overhead in the release build.

http://review.couchbase.org/40790

 Comments   
Comment by Wayne Siu [ 27/Aug/14 ]
Reviewed with PM/Cihan. Approved for RC2.
Comment by Cihan Biyikoglu [ 27/Aug/14 ]
approved for RC2.




[MB-11948] [Windows]: Simple-test broken - Rebalance exited with reason {unexpected_exit..{dcp_wait_for_data_move_failed,"default",254,..wrong_rebalancer_pid}}} Created: 13/Aug/14  Updated: 27/Aug/14  Resolved: 25/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Meenakshi Goel Assignee: Meenakshi Goel
Resolution: Fixed Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1143-rel

Attachments: Zip Archive ep_persistence_failures.zip    
Issue Links:
Duplicate
is duplicated by MB-11981 [3.0.0-1166-Windows] items are stucke... Resolved
Triage: Triaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Jenkins Ref Link:
http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/68/console
http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/66/consoleFull

Test to Reproduce:
./testrunner -i <yourfile>.ini -t rebalance.rebalancein.RebalanceInTests.rebalance_in_with_ops,nodes_in=3,replicas=1,items=50000,doc_ops=create;update;delete

Logs:
[ns_server:error,2014-08-13T7:46:46.007,ns_1@10.3.3.213:janitor_agent-default<0.18311.0>:janitor_agent:handle_call:639]Rebalance call failed due to the wrong rebalancer pid <0.18169.0>. Should be undefined.
[ns_server:error,2014-08-13T7:46:46.007,ns_1@10.3.3.213:<0.18299.0>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.18312.0>,
                               {dcp_wait_for_data_move_failed,"default",254,
                                   'ns_1@10.3.3.213',
                                   ['ns_1@10.1.2.66'],
                                   wrong_rebalancer_pid}}
[ns_server:error,2014-08-13T7:46:46.007,ns_1@10.3.3.213:<0.18299.0>:misc:sync_shutdown_many_i_am_trapping_exits:1430]Shutdown of the following failed: [{<0.18312.0>,
                                    {dcp_wait_for_data_move_failed,"default",
                                     254,'ns_1@10.3.3.213',
                                     ['ns_1@10.1.2.66'],
                                     wrong_rebalancer_pid}}]
[ns_server:error,2014-08-13T7:46:46.007,ns_1@10.3.3.213:<0.18245.0>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.18289.0>,
                            {bulk_set_vbucket_state_failed,
                             [{'ns_1@10.3.3.213',
                               {'EXIT',
                                {{{{case_clause,
                                    {error,
                                     {{{badmatch,
                                        {error,
                                         {{badmatch,{error,enobufs}},
                                          [{mc_replication,connect,1,
                                            [{file,"src/mc_replication.erl"},
                                             {line,30}]},
                                           {mc_replication,connect,1,
                                            [{file,"src/mc_replication.erl"},
                                             {line,49}]},
                                           {dcp_proxy,connect,4,
                                            [{file,"src/dcp_proxy.erl"},
                                             {line,179}]},
                                           {dcp_proxy,maybe_connect,1,
                                            [{file,"src/dcp_proxy.erl"},
                                             {line,166}]},
                                           {dcp_consumer_conn,init,2,
                                            [{file,
                                              "src/dcp_consumer_conn.erl"},
                                             {line,55}]},
                                           {dcp_proxy,init,1,
                                            [{file,"src/dcp_proxy.erl"},
                                             {line,48}]},
                                           {gen_server,init_it,6,
                                            [{file,"gen_server.erl"},
                                             {line,304}]},
                                           {proc_lib,init_p_do_apply,3,
                                            [{file,"proc_lib.erl"},
                                             {line,239}]}]}}},
                                       [{dcp_replicator,init,1,
                                         [{file,"src/dcp_replicator.erl"},
                                          {line,47}]},
                                        {gen_server,init_it,6,
                                         [{file,"gen_server.erl"},{line,304}]},
                                        {proc_lib,init_p_do_apply,3,
                                         [{file,"proc_lib.erl"},{line,239}]}]},
                                      {child,undefined,'ns_1@10.1.2.66',
                                       {dcp_replicator,start_link,
                                        ['ns_1@10.1.2.66',"default"]},
                                       temporary,60000,worker,
                                       [dcp_replicator]}}}},
                                   [{dcp_sup,start_replicator,2,
                                     [{file,"src/dcp_sup.erl"},{line,78}]},
                                    {dcp_sup,
                                     '-set_desired_replications/2-lc$^2/1-2-',
                                     2,
                                     [{file,"src/dcp_sup.erl"},{line,55}]},
                                    {dcp_sup,set_desired_replications,2,
                                     [{file,"src/dcp_sup.erl"},{line,55}]},
                                    {replication_manager,handle_call,3,
                                     [{file,"src/replication_manager.erl"},
                                      {line,130}]},
                                    {gen_server,handle_msg,5,
                                     [{file,"gen_server.erl"},{line,585}]},
                                    {proc_lib,init_p_do_apply,3,
                                     [{file,"proc_lib.erl"},{line,239}]}]},
                                  {gen_server,call,
                                   ['replication_manager-default',
                                    {change_vbucket_replication,255,
                                     'ns_1@10.1.2.66'},
                                    infinity]}},
                                 {gen_server,call,
                                  [{'janitor_agent-default','ns_1@10.3.3.213'},
                                   {if_rebalance,<0.18169.0>,
                                    {update_vbucket_state,255,replica,
                                     undefined,'ns_1@10.1.2.66'}},
                                   infinity]}}}}]}}

Uploading Logs.

Live Cluster
1:10.3.3.213
2:10.3.2.21
3:10.3.2.23
4:10.1.2.66

 Comments   
Comment by Meenakshi Goel [ 13/Aug/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-11948/f806d72b/10.3.3.213-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11948/11dd43ca/10.3.2.21-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11948/75af6ef6/10.1.2.66-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11948/9dc45b16/10.3.2.23-diag.zip
Comment by Aleksey Kondratenko [ 13/Aug/14 ]
You dug out the right error message.

ENOBUFS is mapped from WSAENOBUFS, and googling for it led me to http://support.microsoft.com/kb/196271, which reminds me of the registry setting that the installer used to apply. It looks like you still need to do it.
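For reference, the registry tweak that KB article describes is along these lines (MaxUserPort is the value from the KB; TcpTimedWaitDelay is a commonly paired tweak; the numbers are illustrative and a reboot is needed for them to take effect):

  reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 65534 /f
  reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpTimedWaitDelay /t REG_DWORD /d 30 /f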
Comment by Sriram Melkote [ 13/Aug/14 ]
I think we support only Windows 2008, and they fixed that issue; it has 16k ports by default:
http://support.microsoft.com/kb/929851

Alk, any idea why we initiate so many outgoing connections?
Comment by Aleksey Kondratenko [ 13/Aug/14 ]
We shouldn't. And I don't know what "many" is.
Comment by Aleksey Kondratenko [ 14/Aug/14 ]
Rebalance is stuck waiting for seqno persistence. It's something in ep-engine.
Comment by Aleksey Kondratenko [ 14/Aug/14 ]
And memcached logs are full of this:

Thu Aug 14 10:51:05.630859 Pacific Daylight Time 3: (default) Warning: failed to open database file for vBucket = 1013 rev = 1
Thu Aug 14 10:51:05.633789 Pacific Daylight Time 3: (default) Warning: couchstore_open_db failed, name=c:/Program Files/Couchbase/Server/var/lib/couchbase/data/default/246.couch.1 option=2 rev=1 error=no such file [errno = 0: 'The operation completed successfully.

So indeed persistence is not working.
Comment by Chiyoung Seo [ 14/Aug/14 ]
Sriram,

I think this issue was caused by the file path name issue on windows that we discussed the other day.
Comment by Sriram Ganesan [ 15/Aug/14 ]
From bug MB-11934, Meenakshi had run the simple test on build 3.0.0-1130-rel (http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/65/consoleFull) and it looks like all the tests passed except for warmup. So, it looks like persistence was okay at that point. Did this manifest in any build earlier than 3.0.0-1143-rel? It might help pinpoint the check-ins that caused the regression.
Comment by Meenakshi Goel [ 18/Aug/14 ]
My last successful run was with 3.0.0-1137-rel; after that I picked up 3.0.0-1143-rel, in which I observed this issue.
Also, the sanity tests seem to have worked fine up to 3.0.0-1139-rel:
http://qa.sc.couchbase.com/job/CouchbaseServer-SanityTest-4Node-Windows_2012_x64/213/consoleFull
Comment by Sriram Ganesan [ 18/Aug/14 ]
Also observing a lot of couch notifier errors

memcached<0.81.0>: Wed Aug 13 07:45:39.978617 Pacific Daylight Time 3: (default) Resetting connection to mccouch, lastReceivedCommand = select_bucket lastSentCommand = select_bucket currentCommand =unknown
memcached<0.81.0>: Wed Aug 13 07:45:39.979593 Pacific Daylight Time 3: (default) Trying to connect to mccouch: "127.0.0.1:11213"
memcached<0.81.0>: Wed Aug 13 07:45:39.979593 Pacific Daylight Time 3: (default) Connected to mccouch: "127.0.0.1:11213"
memcached<0.81.0>: Wed Aug 13 07:45:39.980570 Pacific Daylight Time 3: (default) Failed to read from mccouch for select_bucket: "The operation completed successfully.

Before all those, there is an error from moxi

[ns_server:info,2014-08-13T7:45:38.812,babysitter_of_ns_1@127.0.0.1:<0.79.0>:ns_port_server:log:169]moxi<0.79.0>: 2014-08-13 07:45:40: (C:\Jenkins\workspace\cs_300_win6408\couchbase\moxi\src\agent_config.c.721) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: No vBuckets available; service maybe still initializing

Comment by Meenakshi Goel [ 21/Aug/14 ]
Please let me know if the cluster is no longer required. Thanks.
Comment by Sriram Ganesan [ 21/Aug/14 ]
I was able to simulate the same problem in a much simpler warmup test (using a toy windows builder with some custom logs). We see the read failures in couch notifier when invoking recv() from the mccouch connection. The connection keeps getting reset and eventually we can't even connect to mccouch.

Thu Aug 21 16:26:06.005468 Pacific Daylight Time 3: (default) Failed to connect to: "127.0.0.1:11213"
Thu Aug 21 16:26:06.005468 Pacific Daylight Time 3: (default) Failed to connect to: "127.0.0.1:11213"
Thu Aug 21 16:26:06.012468 Pacific Daylight Time 3: (default) Failed to connect to: "127.0.0.1:11213"
Thu Aug 21 16:26:06.012468 Pacific Daylight Time 3: (default) Failed to connect to: "127.0.0.1:11213"
Thu Aug 21 16:26:06.019468 Pacific Daylight Time 3: (default) Failed to connect to: "127.0.0.1:11213"

We have only started seeing this problem from build 3.0.0-1142 onwards. On the couch notifier side, recv() returns -1 but WSAGetLastError() returns 0, which is not one of the errors mentioned at http://msdn.microsoft.com/en-us/library/windows/desktop/ms740121(v=vs.85).aspx. To triage this better, it would be good to know what requests mccouch is receiving and what responses it is providing, in order to explain the bizarre behavior on the couch notifier side. The test can be run using the following command on a couple of Windows boxes:

./testrunner -i ./dev.ini -t "memcapable.WarmUpMemcachedTest.do_warmup_10k"
Comment by Sriram Ganesan [ 21/Aug/14 ]
Uploading logs from one of the nodes in the warmup test that was run. Let me know if you need more details.
Comment by Sriram Ganesan [ 21/Aug/14 ]
Assigning to ns_server to find out more details on the activity on the mccouch side.
Comment by Aleksey Kondratenko [ 21/Aug/14 ]
From our side we see this:

[ns_server:debug,2014-08-21T16:26:06.006,ns_1@127.0.0.1:<0.418.0>:mc_tcp_listener:accept_loop:31]Got new connection
[ns_server:debug,2014-08-21T16:26:06.006,ns_1@127.0.0.1:<0.22291.1>:mc_connection:handle_select_bucket:131]Got select bucket default
[ns_server:debug,2014-08-21T16:26:06.006,ns_1@127.0.0.1:<0.418.0>:mc_tcp_listener:accept_loop:33]Passed connection to mc_conn_sup: <0.22291.1>
[ns_server:debug,2014-08-21T16:26:06.006,ns_1@127.0.0.1:<0.22291.1>:mc_connection:handle_select_bucket:133]Sent reply on select bucket
[ns_server:info,2014-08-21T16:26:06.006,ns_1@127.0.0.1:<0.22291.1>:mc_connection:run_loop:162]mccouch connection was normally closed

I.e. our side sees normal connection closure.

The eventual inability of the ep-engine side to connect is likely caused by exhausting the ephemeral port space with TIME_WAIT sockets.
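A quick way to check that theory on the Windows node (a sketch):

  netstat -ano | find /c "TIME_WAIT"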
Comment by Sriram Ganesan [ 25/Aug/14 ]
http://review.couchbase.org/#/c/40865/

Merged in a fix for handling the persistence failures in ep-engine. Please verify the fix and update the ticket if there are any more issues.
Comment by Ketaki Gangal [ 26/Aug/14 ]
Hi Sriram,

Can you provide the exact build number for this - we can test on that build.
Comment by Sriram Ganesan [ 26/Aug/14 ]
Hello Ketaki

The 3.0.0 builds seem to be pointing to the 3.0 branch of ep-engine. I believe you will need 3.0.1 builds to verify the above change, as the 3.0.1 release manifest https://github.com/membase/manifest/blob/master/rel-3.0.1.xml points to the ep-engine master branch where this change was checked in, but I don't see any 3.0.1 Windows builds at http://latestbuilds.hq.couchbase.com. You might want to ping the build folks about generating those builds.

Thanks
Sriram
Comment by Meenakshi Goel [ 27/Aug/14 ]
Verified with 3.0.1-1206-rel
http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/74/console




[MB-12075] Couchbase Server constantly shutting down and restarting after upgrade from 2.1.1-766 Created: 26/Aug/14  Updated: 27/Aug/14  Resolved: 26/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu 64 bit, 3.0.0-1174-rel

Issue Links:
Gantt: start-finish
is triggering MB-12065 [Offline Upgrade 2.1.1-766 -> 3.0.0-1... Resolved
is triggering MB-12066 [Offline Upgrade 2.1.1.766 -> 3.0.0-1... Resolved
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Pls look at http://10.3.3.240:8091/index.html#sec=servers, node 10.3.3.239 in particular.

The node is unstable; Couchbase Server is restarting every 4 seconds, as you can see from the GUI log.

--> This happened after the cluster was upgraded from 2.1.1.766 to 3.0.0-1174.
--> After offline upgrade, source cluster was forced to use encrypted xdcr.
--> No replication seems to have happened after this change and node .239 entered a bad state.
--> Seen only on Ubuntu
--> This is probably the reason for MB-12066 (yes, there is data loss: 1000 fewer items in each bucket on C2).

Cbcollect
-------------

[C1]
10.3.3.239 : https://s3.amazonaws.com/bugdb/jira/MB-12066/d30b70e7/10.3.3.239-8252014-2346-couch.tar.gz
10.3.3.240 : https://s3.amazonaws.com/bugdb/jira/MB-12066/4b079374/10.3.3.240-8252014-2340-diag.zip

[C2]
10.3.3.218 : https://s3.amazonaws.com/bugdb/jira/MB-12066/e4f38323/10.3.3.218-8252014-2343-diag.zip
10.3.3.225 : https://s3.amazonaws.com/bugdb/jira/MB-12066/6f010183/10.3.3.225-8252014-2345-diag.zip


 Comments   
Comment by Aleksey Kondratenko [ 26/Aug/14 ]
This indeed happens because of an ns_server bug, and it does indeed cause XDCR to malfunction.
Comment by Aleksey Kondratenko [ 26/Aug/14 ]
I've already fixed this on the live cluster. Will post the fix in a few hours.
Comment by Aruna Piravi [ 26/Aug/14 ]
Thanks, I can now see all keys replicated. Will close 12066 as duplicate of this bug.
Comment by Cihan Biyikoglu [ 26/Aug/14 ]
Approved for RC2 if we can get the fix by Wednesday EOD.
thanks
Comment by Aleksey Kondratenko [ 26/Aug/14 ]
manifest bumped here: http://review.couchbase.org/40954

actual fixes are:

* http://review.couchbase.org/40953
* http://review.couchbase.org/40949

Comment by Aruna Piravi [ 27/Aug/14 ]
Alk, I see 140 lines of code changed/added. Do you foresee any risks of regression at this point?
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
No. The second patch is large because I had to refactor the remote_cluster_info facility a bit. I tested it thoroughly, plus it's quite a deterministic piece of code. If we have any regressions, they will be caught as part of the usual XDCR testing.




[MB-12080] unable to build cbq-engine Created: 27/Aug/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: None
Security Level: Public

Type: Task Priority: Test Blocker
Reporter: Iryna Mironava Assignee: Gerald Sangudi
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
output is:
[root@grape-001 query]# ./build.sh
cd parser/n1ql
nex...
goyacc...
go build...
cd server/main
go build -o cbq-engine...
# github.com/couchbaselabs/query/accounting/logger_retriever
../../accounting/logger_retriever/logger_retriever.go:73: undefined: logger.LogLevel
cd shell
go build -o cbq...
cd tutorial
go build

The shell builds fine, but cbq-engine does not.




[MB-11589] Sliding endseqno during initial index build or upr reading from disk snapshot results in longer stale=false query latency and index startup time Created: 28/Jun/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Sarath Lakshman Assignee: Sriram Melkote
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks MB-11920 DCP based rebalance with views doesn'... Closed
Relates to
relates to MB-11919 3-5x increase in index size during re... Open
relates to MB-12081 Remove counting mutations introduced ... Resolved
relates to MB-11918 Latency of stale=update_after queries... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We will have to decide whether to fix this depending on the development cycles we have left for 3.0.

 Comments   
Comment by Anil Kumar [ 17/Jul/14 ]
Triage - July 17

Currently investigating; we will decide depending on the scope of changes needed.
Comment by Anil Kumar [ 30/Jul/14 ]
Triage : Anil, Wayne .. July 29th

Raising this issue to "Critical"; this needs to be fixed by RC.
Comment by Sriram Melkote [ 31/Jul/14 ]
The issue is that we'll have to change the view DCP client to stream all 1024 vbuckets in parallel, or we'll need an enhancement in ep-engine to stop streaming at the point requested. Neither is a simple change; the reason it's in 3.0 is that Dipti had requested we try to optimize query performance. I'll leave it at Major, as I don't want to commit to fixing this in RC; also, the product works with reasonable performance without this fix, so it's not a must-have for RC.
Comment by Sriram Melkote [ 31/Jul/14 ]
Mike noted that even streaming all vbuckets in parallel (which was perhaps possible to do in 3.0) won't directly solve the issue as the backfills are scheduled one at a time. ep-engine could hold onto smaller snapshots but that's not something we can consider in 3.0 - so net effect is that we'll have to revisit this in 3.0.1 to design a proper solution.
Comment by Sriram Melkote [ 12/Aug/14 ]
Bringing back to 3.0 as this is the root cause of MB-11920 and MB-11918
Comment by Anil Kumar [ 13/Aug/14 ]
Deferring this to 3.0.1 since it is out of scope for 3.0.




[MB-12048] View engine 2.5 to 3.0 index file upgrade Created: 22/Aug/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sarath Lakshman Assignee: Ketaki Gangal
Resolution: Fixed Votes: 0
Labels: RC2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt: start-finish
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
View engine 2.5 index files are not compatible with 3.0 index files; hence an index rebuild is required for 3.0.
We need a method that renames index files to new compatible filenames (with signature) and appends a new header.

 Comments   
Comment by Volker Mische [ 22/Aug/14 ]
I created a hacky script that still involves some manual steps, though I think it works in general.

However, I'm hitting a bigger issue with DCP. It expects that you send the correct partition/vBucket version with your request whenever you don't start indexing from scratch. The problem is that this information is persisted on disk, hence in the index header. When we offline upgrade from 2.x to 3.0 we don't know which partition version that server might have, hence we can't save the correct one.

Currently the only sane way I see is changing the DCP semantics and making it possible to resume from a certain seq number by sending {0, 0} as the partition version (currently you need to send the correct partition version if you want to resume).
Comment by Sriram Melkote [ 22/Aug/14 ]
Folks - we need to do this as an automatic feature (i.e., fully automated without needing any manual steps) or not at all. Let's talk with EP engine folks to spec this.
Comment by Sarath Lakshman [ 22/Aug/14 ]
I am guessing it can be done as part of an installation post-script that checks the current version and performs the upgrade of existing files.
E.g., RPM and deb both have a way to specify post-install scripts.
Comment by Volker Mische [ 22/Aug/14 ]
Sarath, yes, that would be a way.
Comment by Volker Mische [ 22/Aug/14 ]
Siri, I misunderstood your comment, as you probably did mine. The manual steps are only needed at the moment to verify that my idea works. The final result will be a script that can be run without any manual steps.

My misunderstanding was that I thought you were talking about an "online upgrade", but you didn't really say that.
Comment by Sriram Melkote [ 22/Aug/14 ]
Ketaki, can we add a test to detect this situation? The test must fail until we fix this issue.
Comment by Ketaki Gangal [ 22/Aug/14 ]
Hi Siri,

We run automation tests which do offline upgrades from 2.X to 3.X; what these tests don't check is whether the index is rebuilt or not.
https://github.com/couchbase/testrunner/blob/master/conf/py-newupgrade.conf#L39

I'll update the tests to add a check for index-rebuild verification.

Sarath: Can you provide details on how to check whether indexes are rebuilt or not?


Comment by Sarath Lakshman [ 23/Aug/14 ]
I can think of a very easy way: run a stale=false query with a timeout soon after Couchbase is up following warmup. If the index was preserved, the stale=false query should return almost immediately; if an index rebuild is going on, it will take much longer.
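A minimal sketch of that check, reusing the query URL pattern from the other view tickets in this export (node, ddoc and view names are placeholders):

  time curl -s 'http://<node>:8092/default/_design/<ddoc>/_view/<view>?stale=false&limit=1' > /dev/null

If the 2.5 index files were carried over successfully, this should return quickly; if a full rebuild is in progress, it will take far longer or time out.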
Comment by Sriram Melkote [ 25/Aug/14 ]
Product Managers, please note:

This is an UPGRADE step. We'll need to discuss how we'll handle upgrade logic in the product before we decide on closing this issue.

+Anil, Ilam, Cihan
Comment by Cihan Biyikoglu [ 25/Aug/14 ]
Let's pick this up on the daily syncup tomorrow.
thanks
-cihan
Comment by Volker Mische [ 26/Aug/14 ]
I think I found a nice solution. Now it's an online upgrade:

http://review.couchbase.org/40914
http://review.couchbase.org/40915
http://review.couchbase.org/40916
Comment by Sriram Melkote [ 27/Aug/14 ]
This has been discussed earlier this week and is a 3.0 approved exception.




[MB-12065] [Offline Upgrade 2.1.1-766 -> 3.0.0-1174] replica items mismatch after upgrade Created: 26/Aug/14  Updated: 27/Aug/14  Resolved: 26/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sangharsh Agarwal Assignee: Abhinav Dangeti
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu 64 bit (12.04)

Issue Links:
Gantt: start-finish
is triggered by MB-12075 Couchbase Server constantly shutting ... Resolved
Triage: Untriaged
Operating System: Ubuntu 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: [Source]
10.3.3.239 : https://s3.amazonaws.com/bugdb/jira/MB-12065/40f0a9b2/10.3.3.239-8222014-2216-diag.zip
10.3.3.239 : https://s3.amazonaws.com/bugdb/jira/MB-12065/7584aa98/10.3.3.239-8222014-2222-couch.tar.gz
10.3.3.239 : https://s3.amazonaws.com/bugdb/jira/MB-12065/a15d63ee/10.3.3.239-diag.txt
10.3.3.240 : https://s3.amazonaws.com/bugdb/jira/MB-12065/802987e6/10.3.3.240-8222014-2222-couch.tar.gz
10.3.3.240 : https://s3.amazonaws.com/bugdb/jira/MB-12065/c555bf6e/10.3.3.240-diag.txt
10.3.3.240 : https://s3.amazonaws.com/bugdb/jira/MB-12065/c7c31c83/10.3.3.240-8222014-2214-diag.zip


[Destination]
10.3.3.199 : https://s3.amazonaws.com/bugdb/jira/MB-12065/85abaa8a/10.3.3.199-8222014-2220-diag.zip
10.3.3.199 : https://s3.amazonaws.com/bugdb/jira/MB-12065/99f3fb96/10.3.3.199-diag.txt.gz
10.3.3.199 : https://s3.amazonaws.com/bugdb/jira/MB-12065/c11ea857/10.3.3.199-8222014-2222-couch.tar.gz
10.3.3.218 : https://s3.amazonaws.com/bugdb/jira/MB-12065/27bf3a49/10.3.3.218-8222014-2219-diag.zip
10.3.3.218 : https://s3.amazonaws.com/bugdb/jira/MB-12065/402915cc/10.3.3.218-8222014-2222-couch.tar.gz
10.3.3.218 : https://s3.amazonaws.com/bugdb/jira/MB-12065/851c2f72/10.3.3.218-diag.txt.gz
Is this a Regression?: Unknown

 Description   
http://qa.hq.northscale.net/job/ubuntu_x64--36_01--XDCR_upgrade-P1/36/consoleFull

[Test]
./testrunner -i ubuntu_x64--36_01--XDCR_upgrade-P1.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True,upgrade_version=3.0.0-1174-rel,initial_vbuckets=1024 -t xdcr.upgradeXDCR.UpgradeTests.offline_cluster_upgrade,initial_version=2.1.1-766-rel,sdata=False,bucket_topology=default:1>2;bucket0:1><2,upgrade_nodes=dest;src,use_encryption_after_upgrade=src


[Test Steps]
1. Create 2-2 Nodes Source and Destination Cluster.

Source Nodes 10.3.3.240 (Master), 10.3.3.239
Source Nodes 10.3.3.218 (Master), 10.3.3.199


2. Setup CAPI Mode XDCR

bucket: bucket0 (BiXDCR)
bucket: default (UniXDCR)

3. Load 1000 items on bucket0 bucket on the both the cluster.
4. Load 1000 items on default bucket on Source cluster.
5. Offline upgrade both the cluster with 3.0.0-1174-rel.
6. Modify XDCR Settings to use SSL on Source Cluster Only*****.
7. Load 1000 Items on bucket0 and default bucket on Source.
8. Verify items.

Replica items mismatch on Source Cluster itself.

[2014-08-22 22:11:14,388] - [task:459] WARNING - Not Ready: vb_replica_curr_items 1500 == 2000 expected on '10.3.3.240:8091''10.3.3.239:8091', default bucket
[2014-08-22 22:11:16,410] - [task:459] WARNING - Not Ready: vb_replica_curr_items 2500 == 3000 expected on '10.3.3.240:8091''10.3.3.239:8091', bucket0 bucket
[2014-08-22 22:11:19,432] - [task:459] WARNING - Not Ready: vb_replica_curr_items 1500 == 2000 expected on '10.3.3.240:8091''10.3.3.239:8091', default bucket





 Comments   
Comment by Sangharsh Agarwal [ 26/Aug/14 ]
Some updates from the test Logs:

1. After the upgrade, we see errors indicating that the test was not able to connect to 10.3.3.239:

[2014-08-22 22:08:44,291] - [data_helper:295] INFO - creating direct client 10.3.3.239:11210 bucket0
[2014-08-22 22:08:44,440] - [data_helper:295] INFO - creating direct client 10.3.3.240:11210 bucket0
[2014-08-22 22:08:45,351] - [task:772] INFO - Batch create documents done #: 0 with exp:0
[2014-08-22 22:08:45,691] - [task:772] INFO - Batch create documents done #: 1000 with exp:0
[2014-08-22 22:08:46,955] - [data_helper:295] INFO - creating direct client 10.3.3.239:11210 default
[2014-08-22 22:08:46,970] - [rest_client:750] ERROR - socket error while connecting to http://10.3.3.239:8091/pools/default/buckets/default?basic_stats=true error [Errno 111] Connection refused
[2014-08-22 22:08:47,974] - [rest_client:750] ERROR - socket error while connecting to http://10.3.3.239:8091/pools/default/buckets/default?basic_stats=true error [Errno 111] Connection refused
[2014-08-22 22:08:49,172] - [data_helper:295] INFO - creating direct client 10.3.3.240:11210 default
[2014-08-22 22:08:49,356] - [task:772] INFO - Batch create documents done #: 0 with exp:0
[2014-08-22 22:08:49,753] - [task:772] INFO - Batch create documents done #: 1000 with exp:0


2. The test passed on CentOS, too.
Comment by Sangharsh Agarwal [ 26/Aug/14 ]
There are a lot of log entries showing buckets shutting down here:

2014-08-22 22:08:30.085 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "bucket0" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:08:30.085 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "default" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:08:32.731 ns_memcached:0:info:message(ns_1@10.3.3.239) - Shutting down bucket "bucket0" on 'ns_1@10.3.3.239' for server shutdown
2014-08-22 22:08:32.862 ns_memcached:0:info:message(ns_1@10.3.3.239) - Shutting down bucket "default" on 'ns_1@10.3.3.239' for server shutdown
2014-08-22 22:08:32.915 mb_master:0:warning:message(ns_1@10.3.3.239) - Somebody thinks we're master. Not forcing mastership takover over ourselves
2014-08-22 22:08:32.962 menelaus_sup:1:info:web start ok(ns_1@10.3.3.239) - Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.239'. Version: "3.0.0-1174-rel-enterprise".
2014-08-22 22:08:32.997 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "default" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:08:33.035 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "bucket0" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:08:35.795 ns_memcached:0:info:message(ns_1@10.3.3.239) - Shutting down bucket "bucket0" on 'ns_1@10.3.3.239' for server shutdown
2014-08-22 22:08:35.925 ns_memcached:0:info:message(ns_1@10.3.3.239) - Shutting down bucket "default" on 'ns_1@10.3.3.239' for server shutdown
2014-08-22 22:08:35.992 mb_master:0:warning:message(ns_1@10.3.3.239) - Somebody thinks we're master. Not forcing mastership takover over ourselves
2014-08-22 22:08:36.033 menelaus_sup:1:info:web start ok(ns_1@10.3.3.239) - Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.239'. Version: "3.0.0-1174-rel-enterprise".
2014-08-22 22:08:36.069 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "default" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:08:36.073 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "bucket0" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:08:39.216 ns_memcached:0:info:message(ns_1@10.3.3.239) - Shutting down bucket "bucket0" on 'ns_1@10.3.3.239' for server shutdown
2014-08-22 22:08:39.292 ns_memcached:0:info:message(ns_1@10.3.3.239) - Shutting down bucket "default" on 'ns_1@10.3.3.239' for server shutdown
2014-08-22 22:08:39.368 mb_master:0:warning:message(ns_1@10.3.3.239) - Somebody thinks we're master. Not forcing mastership takover over ourselves
2014-08-22 22:08:39.404 menelaus_sup:1:info:web start ok(ns_1@10.3.3.239) - Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.239'. Version: "3.0.0-1174-rel-enterprise".



[ns_server.info.log]

[user:info,2014-08-22T22:08:00.713,ns_1@10.3.3.239:ns_memcached-default<0.12208.0>:ns_memcached:terminate:783]Shutting down bucket "default" on 'ns_1@10.3.3.239' for server shutdown
[ns_server:info,2014-08-22T22:08:00.769,ns_1@10.3.3.239:ns_memcached-default<0.12208.0>:ns_memcached:terminate:795]This bucket shutdown is not due to bucket deletion or reconfiguration. Doing nothing
[ns_server:info,2014-08-22T22:08:01.801,ns_1@10.3.3.239:ns_server_sup<0.13612.0>:dir_size:start_link:49]Starting quick version of dir_size with program name: i386-linux-godu
[ns_server:info,2014-08-22T22:08:01.816,ns_1@10.3.3.239:ns_config_rep<0.13631.0>:ns_config_rep:do_pull:343]Pulling config from: 'ns_1@10.3.3.240'

[user:warn,2014-08-22T22:08:01.838,ns_1@10.3.3.239:ns_server_sup<0.13612.0>:mb_master:check_master_takeover_needed:153]Somebody thinks we're master. Not forcing mastership takover over ourselves
[ns_server:info,2014-08-22T22:08:01.864,ns_1@10.3.3.239:ns_doctor<0.13655.0>:ns_doctor:update_status:235]The following buckets became ready on node 'ns_1@10.3.3.240': ["bucket0",
                                                               "default"]
[user:info,2014-08-22T22:08:01.879,ns_1@10.3.3.239:ns_server_sup<0.13612.0>:menelaus_sup:start_link:44]Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.239'. Version: "3.0.0-1174-rel-enterprise".
[ns_server:info,2014-08-22T22:08:01.880,ns_1@10.3.3.239:<0.13739.0>:mc_tcp_listener:init:24]mccouch is listening on port 11213
[ns_server:info,2014-08-22T22:08:01.882,ns_1@10.3.3.239:<0.13743.0>:ns_memcached_log_rotator:init:28]Starting log rotator on "/opt/couchbase/var/lib/couchbase/logs"/"memcached.log"* with an initial period of 39003ms
[ns_server:info,2014-08-22T22:08:01.900,ns_1@10.3.3.239:<0.13807.0>:compaction_new_daemon:spawn_scheduled_kv_compactor:468]Start compaction of vbuckets for bucket default with config:
[{database_fragmentation_threshold,{30,undefined}},
 {view_fragmentation_threshold,{30,undefined}}]
[ns_server:info,2014-08-22T22:08:01.903,ns_1@10.3.3.239:janitor_agent-default<0.13805.0>:janitor_agent:read_flush_counter:1048]Loading flushseq failed: {error,enoent}. Assuming it's equal to global config.
[ns_server:info,2014-08-22T22:08:01.904,ns_1@10.3.3.239:ns_memcached-default<0.13784.0>:ns_memcached:handle_cast:675]Main ns_memcached connection established: {ok,#Port<0.9250>}
[ns_server:info,2014-08-22T22:08:01.904,ns_1@10.3.3.239:janitor_agent-bucket0<0.13811.0>:janitor_agent:read_flush_counter:1048]Loading flushseq failed: {error,enoent}. Assuming it's equal to global config.

Comment by Sangharsh Agarwal [ 26/Aug/14 ]
Trying to reproduce this issue to give you a live cluster.
Comment by Sangharsh Agarwal [ 26/Aug/14 ]
There is one more issue logged, MB-12066. Not sure if this issue is similar to MB-12066, as in that bug only replica items are mismatched.
Comment by Abhinav Dangeti [ 26/Aug/14 ]
I see from the diags of 10.3.3.240 that the node is continuously restarting after the upgrade:

2014-08-22 22:14:52.811 menelaus_sup:1:info:web start ok(ns_1@10.3.3.239) - Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.239'. Version: "3.0.0-1174-rel-enterprise".
2014-08-22 22:14:52.860 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "default" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:14:52.885 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "bucket0" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:14:54.853 ns_memcached:0:info:message(ns_1@10.3.3.240) - Shutting down bucket "bucket0" on 'ns_1@10.3.3.240' for server shutdown
2014-08-22 22:14:55.256 ns_memcached:0:info:message(ns_1@10.3.3.240) - Shutting down bucket "default" on 'ns_1@10.3.3.240' for server shutdown
2014-08-22 22:14:56.166 ns_memcached:0:info:message(ns_1@10.3.3.239) - Shutting down bucket "bucket0" on 'ns_1@10.3.3.239' for server shutdown
2014-08-22 22:14:56.452 menelaus_sup:1:info:web start ok(ns_1@10.3.3.240) - Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.240'. Version: "3.0.0-1174-rel-enterprise".
2014-08-22 22:14:56.509 ns_memcached:0:info:message(ns_1@10.3.3.240) - Bucket "default" loaded on node 'ns_1@10.3.3.240' in 0 seconds.
2014-08-22 22:14:56.510 ns_memcached:0:info:message(ns_1@10.3.3.240) - Bucket "bucket0" loaded on node 'ns_1@10.3.3.240' in 0 seconds.
2014-08-22 22:14:56.538 ns_memcached:0:info:message(ns_1@10.3.3.239) - Shutting down bucket "default" on 'ns_1@10.3.3.239' for server shutdown
2014-08-22 22:14:57.705 menelaus_sup:1:info:web start ok(ns_1@10.3.3.239) - Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.239'. Version: "3.0.0-1174-rel-enterprise".
2014-08-22 22:14:57.782 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "bucket0" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:14:57.819 ns_memcached:0:info:message(ns_1@10.3.3.239) - Bucket "default" loaded on node 'ns_1@10.3.3.239' in 0 seconds.
2014-08-22 22:15:00.409 ns_memcached:0:info:message(ns_1@10.3.3.240) - Shutting down bucket "bucket0" on 'ns_1@10.3.3.240' for server shutdown
2014-08-22 22:15:00.693 ns_memcached:0:info:message(ns_1@10.3.3.240) - Shutting down bucket "default" on 'ns_1@10.3.3.240' for server shutdown
2014-08-22 22:15:00.917 menelaus_sup:1:info:web start ok(ns_1@10.3.3.240) - Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.240'. Version: "3.0.0-1174-rel-enterprise".
2014-08-22 22:15:00.972 ns_memcached:0:info:message(ns_1@10.3.3.240) - Bucket "default" loaded on node 'ns_1@10.3.3.240' in 0 seconds.

This can cause replicators to break, so I'm going to mark this as a duplicate of MB-12075.




[MB-11470] fdb_seek can't seek backwards Created: 18/Jun/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: 2.5.1
Fix Version/s: bug-backlog
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Jens Alfke Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
If fdb_seek is called with a key that collates before its next key, it appears to have no effect. In other words, it can't seek backwards.

Couchbase Lite does need to do random-access seeks when a view query is given an array of keys. The keys can be given in any order, and the results need to be returned in the same order.

Currently I'm working around this by detecting when the seek would be backwards, and instead closing the iterator and opening a new one. I don't know how much more expensive this is; if it would be a lot faster to allow the iterator to seek backwards, then this might be worth implementing.
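
For illustration, a minimal sketch of that workaround as Python-style pseudocode over a hypothetical iterator wrapper (kvstore.iterator(), it.seek(), it.get() and it.close() are stand-ins for the real ForestDB C calls; only the control flow is the point):

# Hypothetical wrapper around a forward-only ForestDB iterator.
class OrderedSeeker:
    def __init__(self, kvstore):
        self.kvstore = kvstore
        self.it = kvstore.iterator()   # assumed: opens an iterator at the start
        self.last_key = None

    def seek(self, key):
        # If the requested key collates before the last one we sought to,
        # a forward-only iterator cannot go back: recreate it instead.
        if self.last_key is not None and key < self.last_key:
            self.it.close()
            self.it = self.kvstore.iterator()
        self.it.seek(key)              # forward (or fresh) seek
        self.last_key = key
        return self.it.get()           # assumed: returns the current document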




[MB-11071] Support multiple KV instances in a single database file Created: 08/May/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: feature-backlog
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Chiyoung Seo Assignee: Jung-Sang Ahn
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
In some use cases, applications need to create lots of KV instances and read/write items from/to those instances. If we create a separate database file for each KV instance, there will be lots of files to keep open, which causes too many file descriptors to be used; otherwise we have to open/close each file repeatedly, which incurs lots of additional overhead (e.g., open/close system calls, re-reading the database header, re-reading B+tree nodes, etc.).

To address this issue, we need to support multiple KV instances in a single database file. This will allow us to manage our own buffer cache more efficiently and increase the write batch size significantly, which will improve write throughput considerably.

In addition, this will help us improve the ep-engine flusher write throughput when we switch from Couchstore to ForestDB, because we can put multiple vBuckets in a single ForestDB file (i.e., a single shard file).





[MB-11620] Database encryption Created: 02/Jul/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: 2.5.1
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Jens Alfke Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: security
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
A number of current/potential Couchbase Lite developers want to be able to encrypt their databases. (This is a requirement in some areas like healthcare, and strongly desired by financial institutions.) We'd like to have a way to do this with ForestDB.

The desired behavior is that a symmetric key would be provided when opening the database, and all file I/O would be encrypted using that key (transparently, without affecting access via the public API).




[MB-11847] Warmup stats ep_warmup_estimated_value_count returns "unknown" Created: 29/Jul/14  Updated: 27/Aug/14  Resolved: 15/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Minor
Reporter: Venu Uppalapati Assignee: Abhinav Dangeti
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 973

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Warmup stats ep_warmup_estimated_value_count returns "unknown"
ep_warmup_estimated_key_count: 99999
ep_warmup_estimated_value_count: unknown


 Comments   
Comment by Abhinav Dangeti [ 30/Jul/14 ]
I'm pretty sure the ep_warmup_estimated_value_count was unknown when ep_warmup_value_count was zero, i.e. no values were actually loaded post warmup.
Comment by Venu Uppalapati [ 30/Jul/14 ]
This happens even when ep_warmup_value_count is non-zero, for example:
ep_warmup_estimated_key_count: 99999
ep_warmup_estimated_value_count: unknown
ep_warmup_value_count: 76077
We will likely see this in a DGM scenario.
Comment by Abhinav Dangeti [ 01/Aug/14 ]
http://review.couchbase.org/#/c/40200/
Comment by Abhinav Dangeti [ 01/Aug/14 ]
Merged.
Comment by Venu Uppalapati [ 05/Aug/14 ]
In build 1105 I see ep_warmup_estimated_value_count is 0. The bucket under test has the full eviction policy.
./cbstats localhost:11210 raw warmup
ep_warmup: enabled
 ep_warmup_dups: 0
 ep_warmup_estimate_time: 36551
 ep_warmup_estimated_key_count: 99999
 ep_warmup_estimated_value_count: 0
 ep_warmup_item_expired: 0
 ep_warmup_key_count: 99999
 ep_warmup_keys_time: 130188
 ep_warmup_min_item_threshold: 100
 ep_warmup_min_memory_threshold: 100
 ep_warmup_oom: 0
 ep_warmup_state: done
 ep_warmup_thread: complete
 ep_warmup_time: 458606
 ep_warmup_value_count: 99999
Comment by Abhinav Dangeti [ 05/Aug/14 ]
http://review.couchbase.org/#/c/40304/
Comment by Abhinav Dangeti [ 05/Aug/14 ]
Merged.
Comment by Venu Uppalapati [ 27/Aug/14 ]
verified in full eviction mode




[MB-11884] warmup_min_items_threshold is not honored under metadata eviction policy Created: 05/Aug/14  Updated: 27/Aug/14  Resolved: 15/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Venu Uppalapati Assignee: Abhinav Dangeti
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Steps to reproduce:
1) Configure a bucket with the value eviction policy and set warmup_min_items_threshold to 10% using ns_server's REST API.
2) Restart the node to trigger warmup and observe the stats below:
curr_items_tot:99999
ep_warmup_value_count:10004
3) Change the eviction policy to metadata eviction; the bucket warmup runs again.
The stats are now as follows:
curr_items_tot:99999
ep_warmup_value_count:99999

complete warmup stats for second warmup:
ep_warmup: enabled
 ep_warmup_dups: 0
 ep_warmup_estimate_time: 54464
 ep_warmup_estimated_key_count: 99999
 ep_warmup_estimated_value_count: 0
 ep_warmup_item_expired: 0
 ep_warmup_key_count: 99999
 ep_warmup_keys_time: 101946
 ep_warmup_min_item_threshold: 10
 ep_warmup_min_memory_threshold: 100
 ep_warmup_oom: 0
 ep_warmup_state: done
 ep_warmup_thread: complete
 ep_warmup_time: 357686
 ep_warmup_value_count: 99999


 Comments   
Comment by Abhinav Dangeti [ 05/Aug/14 ]
http://review.couchbase.org/#/c/40308
Comment by Abhinav Dangeti [ 05/Aug/14 ]
Merged.
Comment by Venu Uppalapati [ 27/Aug/14 ]
verified




[MB-11867] A method to find the position of a key in a view. Created: 01/Aug/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.5.1, 3.0.1
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Patrick Varley Assignee: Nimish Gupta
Resolution: Done Votes: 0
Labels: community
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This came up in the #couchbase IRC channel today:

If you have a game where players are in different zones (US, EU and Asia), there is no simple way to find a player's position based on score in a given zone.

 Comments   
Comment by Patrick Varley [ 04/Aug/14 ]
Using a reduce and a count, this is possible:

http://blog.couchbase.com/using-map-and-reduce-view-ranking
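
For example, a rough sketch of that approach, assuming a view whose map function emits [zone, score] with the built-in _count reduce (the host, bucket, design document and view names here are made up):

import json
import urllib.parse
import urllib.request

def player_rank(host, bucket, zone, score):
    # Count players in the zone whose score is >= the given one; with
    # unique scores this count is exactly the player's position (1 = top).
    params = urllib.parse.urlencode({
        "startkey": json.dumps([zone, score]),
        "endkey": json.dumps([zone, {}]),   # {} collates after any number
        "reduce": "true",
    })
    url = "http://%s:8092/%s/_design/ranking/_view/by_zone_score?%s" % (
        host, bucket, params)
    with urllib.request.urlopen(url) as resp:
        rows = json.loads(resp.read())["rows"]
    return rows[0]["value"] if rows else 0

With this query, tied players share the lowest rank of their group; splitting ties would need an extra lookup on the documents themselves.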




[MB-11187] V8 crashes on memory allocation errors, closes erlang on some indexing loads Created: 22/May/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.2.0, 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Brent Woodruff Assignee: Wayne Siu
Resolution: Unresolved Votes: 0
Labels: hotfix
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
On some indexing workloads, V8 can experience issues allocating memory. This, in turn, will cause Erlang to close, resulting in the node becoming pending while the babysitter restarts the main Erlang VM.

This can occur even when there is sufficient memory available on the node. The node does not need to experience out-of-memory for this to happen.

To diagnose if this is occurring, it's possible to check logs for a few messages.

babysitter log:

====
[ns_server:info,2014-05-16T20:46:44.417,babysitter_of_ns_1@127.0.0.1:<0.71.0>:ns_port_server:log:168]ns_server<0.71.0>:
ns_server<0.71.0>: #
ns_server<0.71.0>: # Fatal error in CALL_AND_RETRY_2
ns_server<0.71.0>: # Allocation failed - process out of memory
ns_server<0.71.0>: #
ns_server<0.71.0>:

[ns_server:info,2014-05-16T20:46:44.744,babysitter_of_ns_1@127.0.0.1:<0.71.0>:ns_port_server:log:168]ns_server<0.71.0>: /opt/couchbase/lib/erlang/lib/os_mon-2.2.7/priv/bin/memsup: Erlang has closed.Erlang has closed
ns_server<0.71.0>:

[ns_server:info,2014-05-16T20:46:44.745,babysitter_of_ns_1@127.0.0.1:<0.70.0>:supervisor_cushion:handle_info:58]Cushion managed supervisor for ns_server failed: {abnormal,134}
[error_logger:error,2014-05-16T20:46:44.745,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.71.0> terminating
** Last message in was {#Port<0.2943>,{exit_status,134}}
** When Server state == {state,#Port<0.2943>,ns_server,
                               {[[],
                                 "/opt/couchbase/lib/erlang/lib/os_mon-2.2.7/priv/bin/memsup: Erlang has closed.Erlang has closed ",
                                 [],"#",
                                 "# Allocation failed - process out of memory",
                                 "# Fatal error in CALL_AND_RETRY_2","#",[],
                                 "working as port","working as port",
                                 "Apache CouchDB has started. Time to relax.",
                                 "Apache CouchDB 1.2.0a-01dda76-git (LogLevel=info) is starting.",
                                 empty],
                                [empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty,empty,empty,empty,empty,
                                 empty,empty,empty]},
                               {ok,{1400273204926,#Ref<0.0.0.18660>}},
                               [[],
                                "/opt/couchbase/lib/erlang/lib/os_mon-2.2.7/priv/bin/memsup: Erlang has closed.Erlang has closed "],
                               0,true}
** Reason for termination ==
** {abnormal,134}
====

One can check for the latest occurrence of this error (or for all occurrences by removing the tail -1) with this command:

$ /opt/couchbase/bin/cbbrowse_logs babysitter | awk '/\]ns_server<.*>: $/,/# Allocation failed - process out of memory/' | grep -E -o '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]*:[0-9]{2}:[0-9]{2}' | tail -1
2014-05-16T20:46:44


 Comments   
Comment by Sarath Lakshman [ 23/May/14 ]
Backported changes for v8 version upgrade.

Following changes are under review:
http://review.couchbase.org/#/c/37506/
http://review.couchbase.org/#/c/37507/
Comment by Sarath Lakshman [ 23/May/14 ]
Merged backported changes.
Comment by Sarath Lakshman [ 23/May/14 ]
I have sent a mail to build team for providing the build with the following changes:

couchdb commit (https://github.com/couchbase/couchdb/commit/505c2278b34eb7a47843d5017d101e98aa856d6a)

v8 version change (http://review.couchbase.org/#/c/32781/2/override-3.0.0.xml)
Comment by Sriram Melkote [ 29/May/14 ]
Wayne - it's good to have some QE coverage for this as it's shipping to a production customer. Can Ketaki or Meenakshi please validate and also do some additional sanity testing on this build? Thanks!
Comment by Sarath Lakshman [ 29/May/14 ]
Brent, could you help us verify this patch using the reproducible setup you have?
Comment by Sarath Lakshman [ 29/May/14 ]
No. We should wait for the QE team to certify this fix before rolling it out to the customer.
Comment by Wayne Siu [ 29/May/14 ]
Brent,
We still need to run the regression tests on the hotfix. I'll update the ticket with an ETA on Monday.
In the meantime, if the customer could help verify the hotfix, that would be OK, with the expectation that regression tests are still in progress.
Comment by Wayne Siu [ 09/Jun/14 ]
Brent,
We can provide the Ubuntu 10.04 package. We'll run a quick sanity check on the binary and update the ticket here later. Will shoot for later today.
Comment by Wayne Siu [ 12/Jun/14 ]
The 10.04 package passed the sanity tests.
Comment by Wayne Siu [ 20/Jun/14 ]
Brent,
Please let us know if we could close this ticket.
Comment by Brent Woodruff [ 20/Jun/14 ]
I believe it would be ok to close this MB. The backporting work has been completed, the builds required have been made and tested, and the updated files were provided.
Comment by Wayne Siu [ 31/Jul/14 ]
Brent,
Let us know if there is any open item at this time.
Comment by Brent Woodruff [ 01/Aug/14 ]
I don't have any outstanding items regarding this ticket, thanks.
Comment by Volker Mische [ 27/Aug/14 ]
Wayne, I think this one is ready to be closed.




[MB-11808] GeoSpatial in 3.0 Created: 24/Jul/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: ns_server, UI, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sriram Melkote Assignee: Volker Mische
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We must hide the GeoSpatial-related UI elements in the 3.0 release, as we have not completed the task of moving the GeoSpatial features over to UPR.

We should use the simplest way to hide the elements (like a "display:none" attribute), because we fully expect to resurface this in 3.0.1.


 Comments   
Comment by Sriram Melkote [ 24/Jul/14 ]
In the 3.0 release meeting, it was fairly clear that we won't be able to add Geo support for 3.0 due to the release being in Beta phase now and heading to code freeze soon. So, we should plan for it in 3.0.1 - updating description to reflect this.
Comment by Volker Mische [ 27/Aug/14 ]
No longer needed, spatial views are in 3.0 now :)




[MB-9720] kickoff indexing on view creation Created: 11/Dec/13  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.2.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Jon Adams Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We have data, and then we programmatically create views via CouchDB directly, and then our API hits the views with stale=true, which produces zero results (which makes sense because the data hasn't been indexed yet).

It seems interesting that in this state (data exists, views just created, a request made to a view that has never indexed and returns zero results) CB does not detect that the view has never been indexed and kick off an index build at least once. Note: I'm not saying the client should trigger the index (e.g. update_after), as it has specified that stale is OK. But it seems like the server should detect that it has never indexed any data at all for a view a client has attempted to use, and assume it should probably index something. Thoughts?

 Comments   
Comment by Perry Krug [ 12/Dec/13 ]
I'm not sure I would agree in the general case, Jon. Certainly there is merit to doing this, but the downside would be forcing an index build to take place beyond the control of the user. Especially on very large datasets, it may be desirable to actually delay the indexing itself to prevent undue load on the system. It doesn't sound too onerous for the external user/client/application to upload a view and then trigger an index update (likely with stale=update_after) as part of the workflow, which gives everyone the control to decide when or when not to do it.

Would be interested in hearing more about your application and what you've been doing with CouchDB...feel free to email me on the side at perry@couchbase.com
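
For illustration, a minimal sketch of that upload-then-query workflow over the view HTTP interface, assuming a made-up host and design document/view name:

import json
import urllib.request

DDOC_URL = "http://10.0.0.1:8092/default/_design/dev_example"

# 1) Upload the design document (the map function is just a string here).
ddoc = {"views": {"by_type": {"map":
        "function (doc, meta) { emit(doc.type, null); }"}}}
req = urllib.request.Request(DDOC_URL, data=json.dumps(ddoc).encode(),
                             headers={"Content-Type": "application/json"},
                             method="PUT")
urllib.request.urlopen(req)

# 2) Query with stale=update_after: current (possibly empty) results come
#    back immediately and an index update is scheduled afterwards.
with urllib.request.urlopen(DDOC_URL + "/_view/by_type?stale=update_after") as r:
    print(json.loads(r.read()))
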
Comment by Volker Mische [ 27/Aug/14 ]
When you publish your view as a production view and you have more than 5000 documents (that's the default value), then the index will be created by the automatic updater.




[MB-9784] too large emit value returns no response Created: 20/Dec/13  Updated: 27/Aug/14  Due: 20/Jun/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.5.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Minor
Reporter: Iryna Mironava Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.5.0-1011-rel
<manifest><remote name="couchbase" fetch="git://github.com/couchbase/"/><remote name="membase" fetch="git://github.com/membase/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="db49bf5d4e601c5994f8bd7f61ca6cff6840af5d"><copyfile src="Makefile.top" dest="Makefile"/></project><project name="bucket_engine" path="bucket_engine" revision="0a3a9df0a55d759b5b78a3a7d001a97a4d35af1c"/><project name="cbsasl" path="cbsasl" revision="578523010d4efaa9fed1a32880c67bfb03c20728"/><project name="couchbase-cli" path="couchbase-cli" revision="b8c21c7462e3b45cc0c69259547613a4c45b6be3" remote="couchbase"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="f14c0f53b633b5313eca1ef64b0f241330cf02c4"/><project name="couchdb" path="couchdb" revision="01dda76eab9edb6b64490c524ccdaf8e5a8b655b"/><project name="couchdbx-app" path="couchdbx-app" revision="cc4fe0884faeebbb36a45fcdf6a072d736b0ca5d"/><project name="couchstore" path="couchstore" revision="30f8f0872ef28f95765a7cad4b2e45e32b95dff8"/><project name="ep-engine" path="ep-engine" revision="7c2254cd57a987d087c092897fa35bc2e4833039"/><project name="geocouch" path="geocouch" revision="000096996e57b2193ea8dde87e078e653a7d7b80"/><project name="healthchecker" path="healthchecker" revision="829e18598bfef537263c0cf014420d499a467a7d"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ea579a523ca3af872c292b1e33d800e3649a8892" remote="membase"/><project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/><project name="memcached" path="memcached" revision="639cd3ee86d7a72f1a00d01c51fc49b5966f7f2d" remote="membase"/><project name="moxi" path="moxi" revision="2b5a228f58fcfd1a836d6ad9a8f279b4f0ebfe80"/><project name="ns_server" path="ns_server" revision="7f17d27b0b971710c93b3fe1ef553fec83ae1e17"/><project name="portsigar" path="portsigar" revision="2204847c85a3ccaecb2bb300306baf64824b2597"/><project name="sigar" path="sigar" revision="a402af5b6a30ea8e5e7220818208e2601cb6caba"/><project name="testrunner" path="testrunner" revision="67b9b374b552312b6addf3fd4a3c17891450eb7b"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="674fcd94a8a0a3595f64e13762ba3a6529e09926" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/><!--
  <project name="voltron" path="voltron" revision="20c2d314ad110bd4d301c44f12c22c7ae6365596" />
  --></manifest>

Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-9784/447a45ae/10.3.121.110-12202013-53-diag.zip

 Description   
A development view with a too-large emitted value returns an empty response for development-subset queries, and an empty result for the full set.
If the query is performed from the UI, it redirects to the login page and you need to log in to the application again.

iryna@ubuntu:~/couchbase/testrunner$ curl -v 'http://10.3.121.110:8092/default/_design/dev_large'
curl: /usr/local/lib/libcurl.so.4: no version information available (required by curl)
* About to connect() to 10.3.121.110 port 8092 (#0)
* Trying 10.3.121.110... connected
* Connected to 10.3.121.110 (10.3.121.110) port 8092 (#0)
> GET /default/_design/dev_large HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.21.4 OpenSSL/1.0.1 zlib/1.2.3.4
> Host: 10.3.121.110:8092
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Couchbase-Meta: {"id":"_design/dev_large","rev":"2-a9936869","type":"json"}
< Server: MochiWeb/1.0 (Any of you quaids got a smint?)
< Date: Fri, 20 Dec 2013 12:52:19 GMT
< Content-Type: application/json
< Content-Length: 196
< Cache-Control: must-revalidate
<
* Connection #0 to host 10.3.121.110 left intact
* Closing connection #0
{"views":{"large":{"map":"function (doc, meta) {\n var val_test = 'test';\n while (val_test.length < 700000000) {\n val_test = val_test.concat(val_test);\n }\n emit(meta.id, val_test);}"}}}


iryna@ubuntu:~/couchbase/testrunner$ curl 'http://10.3.121.110:8092/default/_design/dev_large/_view/large?stale=false&connection_timeout=60000&limit=10&skip=0'
curl: /usr/local/lib/libcurl.so.4: no version information available (required by curl)
curl: (52) Empty reply from server
iryna@ubuntu:~/couchbase/testrunner$ curl -v 'http://10.3.121.110:8092/default/_design/dev_large/_view/large?full_set=true&connection_timeout=60000&limit=10&skip=0'
curl: /usr/local/lib/libcurl.so.4: no version information available (required by curl)
* About to connect() to 10.3.121.110 port 8092 (#0)
* Trying 10.3.121.110... connected
* Connected to 10.3.121.110 (10.3.121.110) port 8092 (#0)
> GET /default/_design/dev_large/_view/large?full_set=true&connection_timeout=60000&limit=10&skip=0 HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.21.4 OpenSSL/1.0.1 zlib/1.2.3.4
> Host: 10.3.121.110:8092
> Accept: */*
>
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Server: MochiWeb/1.0 (Any of you quaids got a smint?)
< Date: Fri, 20 Dec 2013 13:02:10 GMT
< Content-Type: text/plain;charset=utf-8
< Cache-Control: must-revalidate
<
{"total_rows":0,"rows":[
]
}
* Connection #0 to host 10.3.121.110 left intact
* Closing connection #0
iryna@ubuntu:~/couchbase/testrunner$


 Comments   
Comment by Filipe Manana [ 20/Dec/13 ]
Similar answer to the other ticket.

This message is only implemented for production views in 2.5 and below. Your query example is doing it for a dev view (and not full_set=true).
For 3.0 that's not the case anymore.

About the UI issue, I don't know, not my area.
Comment by Volker Mische [ 24/Jun/14 ]
The test uses such a big emit that it crashes v8 on my machine (the emitted value is > 1GB). With one zero less, I get the correct mapreduce error in the logs and the result is empty as expected. I think the issue can be closed.
Comment by Volker Mische [ 27/Aug/14 ]
As per the previous comment: it works as expected unless you have insanely large emits that crash V8. Hence I'm closing it as won't fix.




[MB-9160] Add `include_ids` parameter to view query api with reduce support Created: 22/Sep/13  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.1.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Jon Adams Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
It would be awesome if there was a simple `include_ids` parameter for use in the context of views with complex/custom map/reduce. The ID is included when not reducing, but it would be really powerful to be able to get a collection of the document IDs that a reduced result is made up of. Thoughts?

 Comments   
Comment by Filipe Manana [ 23/Sep/13 ]
Can you give a practical example where that would be useful?
thanks
Comment by Jon Adams [ 23/Sep/13 ]
Let's say you have documents for repositories, with a languages object that has the language as the key and the number of lines as the value.

{
  "name": "couchycouch",
  "age": 3,
  "languages": {
    "Erlang": 392,
    "JavaScript": 185
  }
}

And you wrote a view with a compound key [age, language], where the value emitted is the number of lines; for simplicity we use one of the built-in reduce functions.

Now, we could query the view and get the reduced results, which is great. I am able to group results by age and/or language. However, if one result is particularly interesting and I'd like to understand which docs made up that result, it would be nice to hit that same query with a new parameter to also include (as an array?) the document IDs for all docs that matched or made up that particular row of the reduced result. For example, it's nice that I know I have over 9,000 lines of Perl that are 10 years old, but which documents made up that result (without having to write a new view)?
Comment by Filipe Manana [ 23/Sep/13 ]
Sorry but that's not possible.

Keeping track of the document IDs that contributed to a reduction value would explode the data structures (B+trees) as it means the IDs would have to be stored together with every full or partial reduce value stored in a B+Tree node.

If you want to learn more about the data structures, see http://guide.couchdb.org/draft/views.html; it applies to Couchbase too.
Comment by Jon Adams [ 23/Sep/13 ]
nooooooooooooo
Comment by Jon Adams [ 23/Sep/13 ]
;) Thanks for the link. I'll just write some views.
Comment by Volker Mische [ 27/Aug/14 ]
As Filipe mentions in a comment, it's not possible. The original bug reporter agreed and said he'll use normal views instead.




[MB-10138] Fix spatial views Created: 06/Feb/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Volker Mische Assignee: Volker Mische
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
Currently the spatial views are broken on 3.0. Fix them.

 Comments   
Comment by Sriram Melkote [ 01/Apr/14 ]
We can bring it back to 3.0 if we make good progress. With the currently discussed timelines, it appears unlikely, and so we should plan this for the dot release immediately after 3.0.
Comment by Volker Mische [ 27/Aug/14 ]
Spatial views work again on current 3.0.




[MB-8746] Set view based spatial views Created: 01/Aug/13  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Volker Mische Assignee: Volker Mische
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   
Make the spatial views work like the mapreduce set views (aka b-superstar).

 Comments   
Comment by Volker Mische [ 01/Aug/13 ]
I closed the old CBD based issue in favor of creating this new (clearer) one.
Comment by Sriram Melkote [ 15/Apr/14 ]
As 3.0 is feature complete, we will need to address this in the next minor release
Comment by Volker Mische [ 15/Apr/14 ]
We'll see how long 3.0 will be delayed. I got the OK from Ravi ages ago to consider it a bug in order to get it in. Though I'm not even sure if I can meet that deadline.
Comment by Volker Mische [ 27/Aug/14 ]
3.0 now uses set view based spatial views.




[MB-5733] Stale does not mean what it should Created: 28/Jun/12  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.0-developer-preview-4
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: d3fault Assignee: Dipti Borkar
Resolution: Fixed Votes: 0
Labels: usability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: n/a

Triage: Untriaged

 Description   
Views are updated asynchronously. There is no atomic guarantee that after I insert an item, it will show up in the affected views if I query them immediately afterwards. Currently, it will only show up once the value hits the hard disk. Stale=false currently means "among all the values that have already hit the disk, do not give me stale data". I have been reading your forums and a lot of users seem to be getting confused by this.

I would suggest one of the two things:
1) Make stale=false force-fetch everything in the working set. I don't see why you have to wait until the value hits the disk before you are able to include it in a view... but it probably has to do with something internal. Perhaps a redesign? Also: what happens if I use a memcached bucket? Can you even use views? There is no persist stage with memcached buckets, so when would you be able to query the view? (I have no current interest in using memcached buckets but am just wondering.)
2) Stop using the word stale altogether. Change it to something along the lines of "not outdated but already persisted data" (that is what your current implementation of stale means).

 Comments   
Comment by d3fault [ 28/Jun/12 ]
Also, I have heard about 'observe' already. While I think it will help the problem somewhat and definitely is handy functionality, a clarification of the word 'stale' will go a lot further in terms of improving usability.
Comment by Dipti Borkar [ 28/Jun/12 ]
thank you for the feedback d3fault
Comment by d3fault [ 01/Feb/13 ]
I would like to remove #1 from the two suggestions. After more thought, I *think* it is impossible to perform because of the CAP theorem requirements. Go with #2 :-P
Comment by Volker Mische [ 27/Aug/14 ]
In 3.0 stale=false also includes the in-memory items.




[MB-9667] 2.2 EE node does not stay running on Windows 7 Created: 03/Dec/13  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 2.2.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Task Priority: Major
Reporter: Mel Boulos Assignee: Mel Boulos
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7, 2.2 EE


 Description   
I'm working with Ubisoft, who are trying to install 2.2 EE on Windows 7. This is a single-node install on a desktop. The install completes, but they are saying the node does not stay running and they are not able to maintain a connection to the admin console. I had them run cbcollect; this is what I see in the logs. I've attached the logs, just in case you want to see the entire log.

Diag.log
** This error prints throughout the entire log. According to my research this is expected when a new bucket is created; once the babysitter restarts moxi, it should have the new bucket map. It doesn't look like the bucket map is updated, so it keeps retrying.

mb_master:0:info:message(ns_1@127.0.0.1) - I'm the only node, so I'm the master.
2013-11-27 10:18:11.250 menelaus_sup:1:info:web start ok(ns_1@127.0.0.1) - Couchbase Server has started on web port 8091 on node 'ns_1@127.0.0.1'.
2013-11-27 10:18:11.266 ns_log:0:info:message(ns_1@127.0.0.1) - Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: WARNING: curl error: transfer closed with outstanding read data remaining from: http://127.0.0.1:8091/pools/default/saslBucketsStreaming
EOL on stdin. Exiting
2013-11-27 10:18:11.578

ns_server.debug.log
** I initially thought their disk was full. They verified they have 90 GB free.

=========================CRASH REPORT=========================
  crasher:
    initial call: disksup:init/1
    pid: <0.22663.0>
    registered_name: disksup
    exception exit: {badarith,[{disksup,check_disks_win32,2},
                               {disksup,check_disks_win32,2},
                               {disksup,handle_info,2},
                               {gen_server,handle_msg,5},
                               {proc_lib,init_p_do_apply,3}]}
      in function gen_server:terminate/6
    ancestors: [os_mon_sup,<0.22561.0>]
    messages: [{'$gen_call',{<0.22549.0>,#Ref<0.0.3.192839>},
                               get_disk_data}]
    links: [<0.22562.0>]
    dictionary: [{{disk_almost_full,"D:\\"},set}]
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 730

ns_server.babysitting.log
[ns_server:info,2013-11-28T10:03:24.713,babysitter_of_ns_1@127.0.0.1:<0.84.0>:ns_port_server:log:168]memcached<0.84.0>: Thu Nov 28 10:03:24.510621 Est 3: (default) Failed to read from mccouch: "Unknown error"
memcached<0.84.0>: Thu Nov 28 10:03:24.510621 Est 3: (default) Resetting connection to mccouch, lastReceivedCommand = notify_vbucket_update lastSentCommand = notify_vbucket_update currentCommand =unknown
memcached<0.84.0>: Thu Nov 28 10:03:24.510621 Est 3: (default) Trying to connect to mccouch: "127.0.0.1:11213"

[ns_server:info,2013-11-28T10:03:24.728,babysitter_of_ns_1@127.0.0.1:<0.75.0>:ns_port_server:log:168]ns_server<0.75.0>: win32sysinfo:Erlang has closed.fatal error: runtime: cannot reserve arena virtual address space
ns_server<0.75.0>: in message_loop
ns_server<0.75.0>: win32sysinfo:Erlang has closed.in message_loop
ns_server<0.75.0>: win32sysinfo:Erlang has closed.fatal error: runtime: cannot reserve arena virtual address space
ns_server<0.75.0>: in message_loop
ns_server<0.75.0>: win32sysinfo:Erlang has closed.fatal error: runtime: cannot reserve arena virtual address space
ns_server<0.75.0>: in message_loop
ns_server<0.75.0>: win32sysinfo:Erlang has closed.fatal error: runtime: cannot reserve arena virtual address space


 Comments   
Comment by Bin Cui [ 03/Dec/13 ]
" win32sysinfo:Erlang has closed.fatal error: runtime: cannot reserve arena virtual address space"

By default, the installer will put everything under C:\Program Files. I suspect that they don't have much space on the C drive. Maybe they can try the D drive or find another machine and see what happens.
Comment by Michael Catanzariti [ 15/Dec/13 ]
Hi Bin Cui,

As mentioned to Mel, my C drive is far from being full (252 GB free out of 399 GB).




Allow for dynamic change of the number of connections (MB-11066)

[MB-11924] Add 'max_connections' setting to UI / REST. Created: 11/Aug/14  Updated: 27/Aug/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Technical task Priority: Major
Reporter: Dave Rigby Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We want to be able to change the maximum number of connections in memcached dynamically. We have MB-11066 (parent of this) to add the support in memcached to allow this, but we also need an interface for the user to do this.

I'll leave it up to ns_server to determine the best way to show this in the UI (but one suggestion is to add it to the "internalSettings" window).

For the ns_server -> memcached interface, we have added a new pair of binary protocol messages - IOCTL_GET / IOCTL_SET (see http://review.couchbase.org/#/c/39608/ ).

For the maximum connections setting, the proposal for IOCTL_GET / IOCTL_SET is:

key: max_conns_on_port_XXXX

    XXXX is the port number to change, e.g. "max_conns_on_port_11210"

IOCTL_GET: returns a nul-terminated string (in the response body) specifying the current connection limit.

IOCTL_SET: the value (body) is a nul-terminated string specifying the new connection limit.

    responses:

- PROTOCOL_BINARY_RESPONSE_SUCCESS - New value accepted, limit has been changed.
- PROTOCOL_BINARY_RESPONSE_KEY_ENOENT - Invalid key
- PROTOCOL_BINARY_RESPONSE_EACCESS - User is not authenticated as admin.
- PROTOCOL_BINARY_RESPONSE_EINVAL - Value was not in the correct format (string, convertible to non-zero, non-negative).
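
For reference, a rough sketch of what an IOCTL_SET request for this key could look like on the wire, built by hand in Python. The 24-byte request header layout is the standard memcached binary protocol one; the opcode value below is only a placeholder, since the real number is assigned by the memcached change under review, and a real client would also authenticate as admin first:

import socket
import struct

IOCTL_SET_OPCODE = 0xFB   # placeholder opcode, not the real assigned value

def ioctl_set(sock, port, new_limit):
    key = ("max_conns_on_port_%d" % port).encode()
    value = (str(new_limit) + "\x00").encode()   # nul-terminated string value
    body = key + value
    # magic, opcode, key len, extras len, data type, vbucket,
    # total body len, opaque, CAS
    header = struct.pack(">BBHBBHIIQ", 0x80, IOCTL_SET_OPCODE, len(key),
                         0, 0, 0, len(body), 0, 0)
    sock.sendall(header + body)
    resp = sock.recv(24)                         # response header
    return struct.unpack(">H", resp[6:8])[0]     # 0x0000 = SUCCESS

sock = socket.create_connection(("127.0.0.1", 11210))
print(ioctl_set(sock, 11210, 20000))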


 Comments   
Comment by Aleksey Kondratenko [ 11/Aug/14 ]
There's a very generic ticket somewhere about a "settings for everything UI"; we can make it part of that work. In which case, expect it done in the next decade :)

But we can also implement it in a less pretty way, e.g. a diag-eval-able config entry or internal settings.
Comment by Dave Rigby [ 12/Aug/14 ]
I think internal settings (and I assume implicitly a REST endpoint) should be fine.

Ideally, upon changing the maximum, ns_server would both send the IOCTL_SET message to all memcached instances *and* update the memcached.json config file so the change persists across restarts.

Comment by Dave Rigby [ 27/Aug/14 ]
@Alk: The memcached changes have been merged to the 3.0.1 branch - see http://review.couchbase.org/#/c/40751/ Please let me know if you need any more info to be able to implement this.




[MB-11066] Allow for dynamic change of the number of connections Created: 07/May/14  Updated: 27/Aug/14

Status: In Progress
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.0, 2.5.1, 3.0
Fix Version/s: techdebt-backlog, 3.0.1
Security Level: Public

Type: Improvement Priority: Major
Reporter: Trond Norbye Assignee: Dave Rigby
Resolution: Unresolved Votes: 0
Labels: #memcached
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-11064 Increase default memcached connection... Open
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-11924 Add 'max_connections' setting to UI /... Technical task Open Aleksey Kondratenko  
MB-11949 Document which memcached.json setting... Technical task Resolved Trond Norbye  

 Description   
The number of connections should not be a preconfigured value, but something we can change "on the fly". There is no good reason why it is currently kept in a fixed array today (except the possibility to easily dump all connection objects in a debugger).

 Comments   
Comment by Dave Rigby [ 02/Jul/14 ]
http://review.couchbase.org/#/c/39018/
Comment by Wayne Siu [ 23/Jul/14 ]
Fix merged to 3.0.1.
Comment by Dave Rigby [ 24/Jul/14 ]
This isn't complete - the patch merged to 3.0.1 (ab7d44a) is only the groundwork to support this; there isn't yet code committed to actually allow the user to change the connection count.
Comment by Matt Ingenthron [ 11/Aug/14 ]
Related to this, should we better define the behavior when the number of connections is exhausted? Right now, IIRC, the connection will be established and immediately dropped, which makes it hard for a client library or another part of the system to distinguish between network problems and a server that has exhausted its connections.
Comment by Dave Rigby [ 11/Aug/14 ]
@Matt: That's a good point, but essentially orthogonal to this issue - can you create a separate MB to track that request?
Comment by Matt Ingenthron [ 11/Aug/14 ]
It's not fully orthogonal and I've filed it before, but since I can't find it I'll file it now.
Comment by Trond Norbye [ 11/Aug/14 ]
It's been a while since I checked, but I believe that we at one point stopped accepting clients when we hit the limit and then started accepting again when we fell below a watermark (but it could be that we reverted that; I'd have to check).

BUT this is another thing that we should have in our system: the UI should give more warnings when bad things are about to happen (or have happened).
Comment by Trond Norbye [ 11/Aug/14 ]
I just looked at the file and the logic I described only happens for EMFILE...
Comment by Matt Ingenthron [ 11/Aug/14 ]
I opened MB-11926 to cover the "reliable, understandable behavior when hit max connections" side of this.
Comment by Dave Rigby [ 12/Aug/14 ]
memcached changes: http://review.couchbase.org/#/c/40531/
Comment by Dave Rigby [ 13/Aug/14 ]
As discussed with Alk and agreed with Trond, we are going with an alternative mechanism: we will add two new binary protocol commands, CONFIG_VERIFY and CONFIG_RELOAD.

* CONFIG_VERIFY passes a proposed new config to memcached in the body, which memcached will verify and return either SUCCESS or EINVAL if the config is not valid (for example attempting to change settings which cannot be changed dynamically).

* CONFIG_RELOAD instructs memcached to reload the memcached.json file from disk. Any modified parameters which are permitted to be changed dynamically will be updated.


Comment by Dave Rigby [ 20/Aug/14 ]
memcached changes (new approach):

http://review.couchbase.org/#/c/40749 - MB-11066: Refactor config parsing code to allow parse at arbitrary time
http://review.couchbase.org/#/c/40750 - MB-11066: Unit tests for config_parse
http://review.couchbase.org/#/c/40751 - MB-11066: Add support for dynamically reloading memcached JSON config
Comment by Dave Rigby [ 27/Aug/14 ]
memcached changes merged.




[MB-12078] JSON detection may not be correct at views Created: 27/Aug/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.5.1, 3.0.1, 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Matt Ingenthron Assignee: Sriram Melkote
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Brett Lawson and I were reviewing something for encoding for client libraries and in the process we checked what JSON specs consider legal. There's this interesting bit:

http://tools.ietf.org/html/rfc7159#section-2

   JSON text is a sequence of tokens. The set of tokens includes six
   structural characters, strings, numbers, and three literal names.

   A JSON text is a serialized value. Note that certain previous
   specifications of JSON constrained a JSON text to be an object or an
   array. Implementations that generate only objects or arrays where a
   JSON text is called for will be interoperable in the sense that all
   implementations will accept these as conforming JSON texts.

This actually means that the numeric values stored via the memcached protocol are valid JSON, though the view engine doesn't treat them that way. I believe they're detected as non-JSON at the view engine. I'm not sure if this is still the case with 3.0, but I thought I should file this, since the revelation that a sequence of digits is valid JSON may trigger some thoughts (or unit tests).
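
A quick way to see the distinction (for example, Python's json module follows the newer RFC and accepts a bare scalar):

import json

# A bare number is a valid JSON text under RFC 7159...
print(json.loads("42"))          # -> 42

# ...whereas older definitions required the top level to be an object or array.
print(json.loads('{"n": 42}'))   # -> {'n': 42}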

 Comments   
Comment by Brett Lawson [ 27/Aug/14 ]
View-engine actually handles this properly, although the document editor does not. A ticket has been opened on that instead.




[MB-11722] Remove the couch notifier Created: 14/Jul/14  Updated: 26/Aug/14  Resolved: 26/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Abhinav Dangeti
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: Related to https://www.couchbase.com/issues/browse/MB-11935
Is this a Regression?: Unknown

 Comments   
Comment by Abhinav Dangeti [ 01/Aug/14 ]
ns_server: http://review.couchbase.org/#/c/38834/
ep-engine: http://review.couchbase.org/#/c/40195/

Unit tests and make simple-test pass for now.
Setting up a toy build.
Comment by Abhinav Dangeti [ 01/Aug/14 ]
ToyBuild: http://latestbuilds.hq.couchbase.com/couchbase-server-community_cent58-3.0.0-toy-couchstore-x86_64_3.0.0-noMcCouch-toy.rpm
Comment by Chiyoung Seo [ 26/Aug/14 ]
The above changes were merged into the master branch.




[MB-12057] apparent deadlock in ep-engine/bucket-engine (was: node_in is in pending state/ unable to restart cb service there/Rebalance exited with reason {not_all_nodes_are_ready_yet ) Created: 24/Aug/14  Updated: 26/Aug/14  Resolved: 26/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Andrei Baranouski Assignee: Andrei Baranouski
Resolution: Fixed Votes: 0
Labels: rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1174

Triage: Triaged
Is this a Regression?: Unknown

 Description   
steps:
1) Run data load for ~12 hours on the source cluster http://172.23.105.156/
2) Then start replication for all 4 buckets to the destination nodes (172.23.105.159, 172.23.105.160, 172.23.105.206)
3) Almost immediately after step 2, add 172.23.105.207 to the destination cluster and rebalance


Rebalance exited with reason {not_all_nodes_are_ready_yet,
['ns_1@172.23.105.207']}
ns_orchestrator002 ns_1@172.23.105.159 10:49:49 - Sun Aug 24, 2014
Started rebalancing bucket UserInfo ns_rebalancer000 ns_1@172.23.105.159 10:48:49 - Sun Aug 24, 2014
Starting rebalance, KeepNodes = ['ns_1@172.23.105.159','ns_1@172.23.105.160',
'ns_1@172.23.105.206','ns_1@172.23.105.207'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
ns_orchestrator004 ns_1@172.23.105.159 10:48:49 - Sun Aug 24, 2014
Control connection to memcached on 'ns_1@172.23.105.207' disconnected: {{badmatch,
{error,
timeout}},
[{mc_client_binary,
cmd_vocal_recv,
5,
[{file,
"src/mc_client_binary.erl"},
{line,
151}]},
{mc_client_binary,
select_bucket,
2,
[{file,
"src/mc_client_binary.erl"},
{line,
346}]},
{ns_memcached,
ensure_bucket,
2,
[{file,
"src/ns_memcached.erl"},
{line,
1269}]},
{ns_memcached,
handle_info,
2,
[{file,
"src/ns_memcached.erl"},
{line,
744}]},
{gen_server,
handle_msg,
5,
[{file,
"gen_server.erl"},
{line,
604}]},
{ns_memcached,
init,
1,
[{file,
"src/ns_memcached.erl"},
{line,
171}]},
{gen_server,
init_it,
6,
[{file,
"gen_server.erl"},
{line,
304}]},
{proc_lib,
init_p_do_apply,
3,
[{file,
"proc_lib.erl"},
{line,
239}]}]} (repeated 1 times) ns_memcached000 ns_1@172.23.105.207 10:44:42 - Sun Aug 24, 2014
Control connection to memcached on 'ns_1@172.23.105.207' disconnected: {{badmatch,
{error,
timeout}},
[{mc_client_binary,
cmd_vocal_recv,
5,
[{file,
"src/mc_client_binary.erl"},
{line,
151}]},
{mc_client_binary,
select_bucket,
2,
[{file,
"src/mc_client_binary.erl"},
{line,
346}]},
{ns_memcached,
ensure_bucket,
2,
[{file,
"src/ns_memcached.erl"},
{line,
1269}]},
{ns_memcached,
handle_info,
2,
[{file,
"src/ns_memcached.erl"},
{line,
744}]},
{gen_server,
handle_msg,
5,
[{file,
"gen_server.erl"},
{line,
604}]},
{ns_memcached,
init,
1,
[{file,
"src/ns_memcached.erl"},
{line,
171}]},
{gen_server,
init_it,
6,
[{file,
"gen_server.erl"},
{line,
304}]},
{proc_lib,
init_p_do_apply,
3,
[{file,
"proc_lib.erl"},
{line,
239}]}]} ns_memcached000 ns_1@172.23.105.207 10:43:56 - Sun Aug 24, 2014
Rebalance exited with reason {not_all_nodes_are_ready_yet,
['ns_1@172.23.105.207']}


Trying to restart the 172.23.105.207 node (the firewall is turned off there):

[root@centos-64-x64 logs]# /etc/init.d/couchbase-server status
couchbase-server is running
[root@centos-64-x64 logs]# /etc/init.d/couchbase-server restart
Stopping couchbase-server
^C
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
a
                                                           [ OK ]
Starting couchbase-server [ OK ]
[root@centos-64-x64 logs]# /etc/init.d/couchbase-server status
couchbase-server is running
[root@centos-64-x64 logs]# /etc/init.d/couchbase-server stop
Stopping couchbase-serverNOTE: shutdown failed
{badrpc,nodedown}
                                                           [FAILED]
[root@centos-64-x64 logs]# /etc/init.d/couchbase-server start
couchbase-server is already started [WARNING]


The cluster will be available for a few hours.
 

 Comments   
Comment by Andrei Baranouski [ 24/Aug/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12057/de39e575/172.23.105.159-8242014-1113-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12057/de39e575/172.23.105.160-8242014-1116-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12057/de39e575/172.23.105.206-8242014-1119-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12057/de39e575/172.23.105.207-8242014-1122-diag.zip
Comment by Aleksey Kondratenko [ 24/Aug/14 ]
Grabbing backtraces from the memcached process on the bad node might be very handy.
Comment by Andrei Baranouski [ 24/Aug/14 ]
[root@centos-64-x64 ~]# ps -ef| grep couch
root 3795 3469 0 12:17 pts/2 00:00:00 grep couch
498 27065 1 0 Aug20 ? 00:00:14 /opt/couchbase/lib/erlang/erts-5.10.4/bin/epmd -daemon
498 27102 1 0 Aug20 ? 00:01:05 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -kernel inet_dist_listen_min 21100 inet_dist_listen_max 21299 error_logger false -sasl sasl_error_logger false -hidden -name babysitter_of_ns_1@127.0.0.1 -setcookie nocookie -noshell -noinput -noshell -noinput -run ns_babysitter_bootstrap -- -couch_ini /opt/couchbase/etc/couchdb/default.ini /opt/couchbase/etc/couchdb/default.d/capi.ini /opt/couchbase/etc/couchdb/default.d/geocouch.ini /opt/couchbase/etc/couchdb/local.ini -ns_babysitter cookiefile "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie" -ns_server config_path "/opt/couchbase/etc/couchbase/static_config" -ns_server pidfile "/opt/couchbase/var/lib/couchbase/couchbase-server.pid" -ns_server cookiefile "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie-ns-server" -ns_server enable_mlockall false
498 27136 27102 3 Aug20 ? 02:52:05 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -setcookie nocookie -kernel inet_dist_listen_min 21100 inet_dist_listen_max 21299 error_logger false -sasl sasl_error_logger false -nouser -run child_erlang child_start ns_bootstrap -- -smp enable -couch_ini /opt/couchbase/etc/couchdb/default.ini /opt/couchbase/etc/couchdb/default.d/capi.ini /opt/couchbase/etc/couchdb/default.d/geocouch.ini /opt/couchbase/etc/couchdb/local.ini
498 27171 27136 0 Aug20 ? 00:00:15 /opt/couchbase/lib/erlang/lib/os_mon-2.2.14/priv/bin/memsup
498 27172 27136 0 Aug20 ? 00:00:00 /opt/couchbase/lib/erlang/lib/os_mon-2.2.14/priv/bin/cpu_sup
498 27228 27102 0 Aug20 ? 00:17:42 /opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcached.json
498 31489 27136 0 10:36 ? 00:00:15 /opt/couchbase/lib/ns_server/erlang/lib/ns_server/priv/i386-linux-godu
[root@centos-64-x64 ~]# gdb -p 27102
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 27102
Reading symbols from /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp...done.
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib64/libncurses.so.5
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 27130]
[New LWP 27129]
[New LWP 27128]
[New LWP 27127]
[New LWP 27126]
[New LWP 27125]
[New LWP 27124]
[New LWP 27123]
[New LWP 27122]
[New LWP 27121]
[New LWP 27120]
[New LWP 27119]
[New LWP 27118]
[New LWP 27117]
[New LWP 27116]
[New LWP 27115]
[New LWP 27114]
[New LWP 27113]
[New LWP 27112]
[New LWP 27111]
[New LWP 27110]
[New LWP 27109]
[New LWP 27108]
[New LWP 27107]
[New LWP 27106]
[New LWP 27105]
[New LWP 27104]
[New LWP 27103]
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libtinfo.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib64/libtinfo.so.5
0x00007f4f4c9614f3 in select () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install couchbase-server-3.0.0-1174.x86_64
(gdb) thread app all bt

Thread 29 (Thread 0x7f4f4af7f700 (LWP 27103)):
#0 0x00007f4f4ce2954d in read () from /lib64/libpthread.so.0
#1 0x0000000000550641 in signal_dispatcher_thread_func (unused=<value optimized out>) at sys/unix/sys.c:2916
#2 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178974f0) at pthread/ethread.c:106
#3 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7f4f4a0ff700 (LWP 27104)):
#0 0x00007f4f4ce2643c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005a7699 in ethr_cond_wait (cnd=<value optimized out>, mtx=<value optimized out>) at common/ethr_mutex.c:1368
#2 0x0000000000463e3f in erts_cnd_wait (unused=<value optimized out>) at beam/erl_threads.h:1788
#3 erts_smp_cnd_wait (unused=<value optimized out>) at beam/erl_smp.h:938
#4 sys_msg_dispatcher_func (unused=<value optimized out>) at beam/erl_trace.c:3286
#5 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975d0) at pthread/ethread.c:106
#6 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 27 (Thread 0x7f4f4a57e700 (LWP 27105)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80138) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80138) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4bec9c40) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4bec9c40) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4bec9c40) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 26 (Thread 0x7f4f496fe700 (LWP 27106)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80178) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80178) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4bec9d80) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4bec9d80) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4bec9d80) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 25 (Thread 0x7f4f496dc700 (LWP 27107)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af801b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af801b8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4bec9ec0) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4bec9ec0) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4bec9ec0) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 24 (Thread 0x7f4f496ba700 (LWP 27108)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af801f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af801f8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4beca000) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4beca000) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4beca000) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7f4f49698700 (LWP 27109)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80238) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80238) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4beca140) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4beca140) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4beca140) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f4f49676700 (LWP 27110)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80278) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80278) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4beca280) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4beca280) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4beca280) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f4f49654700 (LWP 27111)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af802b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af802b8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4beca3c0) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4beca3c0) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4beca3c0) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f4f49632700 (LWP 27112)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af802f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af802f8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4beca500) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4beca500) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4beca500) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

---Type <return> to continue, or q <return> to quit---
Thread 19 (Thread 0x7f4f49610700 (LWP 27113)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80338) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80338) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4beca640) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4beca640) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4beca640) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f4f495ee700 (LWP 27114)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80378) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80378) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4beca780) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4beca780) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4beca780) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f4f495cc700 (LWP 27115)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af803b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af803b8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4beca8c0) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4beca8c0) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4beca8c0) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f4f495aa700 (LWP 27116)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af803f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af803f8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4becaa00) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4becaa00) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4becaa00) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f4f49588700 (LWP 27117)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80438) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80438) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4becab40) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4becab40) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4becab40) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f4f49566700 (LWP 27118)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80478) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80478) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4becac80) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4becac80) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4becac80) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f4f49544700 (LWP 27119)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af804b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af804b8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4becadc0) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4becadc0) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4becadc0) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f4f49522700 (LWP 27120)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af804f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af804f8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f4f4becaf00) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f4f4becaf00) at beam/erl_async.c:371
#5 async_main (arg=0x7f4f4becaf00) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff178975b0) at pthread/ethread.c:106
#7 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f4f4d913700 (LWP 27121)):
#0 0x00007f4f4ce2a09d in waitpid () from /lib64/libpthread.so.0
#1 0x000000000054f158 in child_waiter (unused=<value optimized out>) at sys/unix/sys.c:2840
#2 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897520) at pthread/ethread.c:106
#3 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f4f48ebf700 (LWP 27122)):
#0 0x00007f4f4c95f253 in poll () from /lib64/libc.so.6
#1 0x000000000055a971 in check_fd_events (ps=0x7f4f4bdd4910, pr=0x7f4f48ebe320, len=0x7f4f48ebeb3c, utvp=<value optimized out>) at sys/common/erl_poll.c:2071
#2 erts_poll_wait_nkp (ps=0x7f4f4bdd4910, pr=0x7f4f48ebe320, len=0x7f4f48ebeb3c, utvp=<value optimized out>) at sys/common/erl_poll.c:2184
#3 0x000000000055d5c3 in erts_check_io_nkp (do_wait=<value optimized out>) at sys/common/erl_check_io.c:1183

#4 0x00000000004989d6 in scheduler_wait (fcalls=<value optimized out>, esdp=0x7f4f4a442340, rq=0x7f4f4a440b40) at beam/erl_process.c:2533
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f4f4a442340) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#9 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#10 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f4f484be700 (LWP 27123)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af805b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af805b8) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f4f4a44c600, rq=0x7f4f4a440cc0) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f4f4a44c600, rq=0x7f4f4a440cc0) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f4f4a44c600) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#9 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f4f47abd700 (LWP 27124)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af805f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af805f8) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f4f4a4568c0, rq=0x7f4f4a440e40) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f4f4a4568c0, rq=0x7f4f4a440e40) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f4f4a4568c0) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#9 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f4f470bc700 (LWP 27125)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80638) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80638) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f4f4a460b80, rq=0x7f4f4a440fc0) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f4f4a460b80, rq=0x7f4f4a440fc0) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f4f4a460b80) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#9 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f4f466bb700 (LWP 27126)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80678) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80678) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f4f4a46ae40, rq=0x7f4f4a441140) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f4f4a46ae40, rq=0x7f4f4a441140) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f4f4a46ae40) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#9 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4f4c96890d in clone () from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---

Thread 5 (Thread 0x7f4f45cba700 (LWP 27127)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af806b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af806b8) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f4f4a475100, rq=0x7f4f4a4412c0) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f4f4a475100, rq=0x7f4f4a4412c0) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f4f4a475100) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#9 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f4f452b9700 (LWP 27128)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af806f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af806f8) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f4f4a47f3c0, rq=0x7f4f4a441440) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f4f4a47f3c0, rq=0x7f4f4a441440) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f4f4a47f3c0) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#9 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f4f448b8700 (LWP 27129)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80738) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80738) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f4f4a489680, rq=0x7f4f4a4415c0) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f4f4a489680, rq=0x7f4f4a4415c0) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f4f4a489680) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#9 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f4f43eb7700 (LWP 27130)):
#0 0x00007f4f4c9652d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f4f4af80778) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f4f4af80778) at pthread/ethr_event.c:218
#3 0x0000000000498c8e in erts_tse_wait (unused=<value optimized out>) at beam/erl_threads.h:2710
#4 aux_thread (unused=<value optimized out>) at beam/erl_process.c:2272
#5 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fff17897600) at pthread/ethread.c:106
#6 0x00007f4f4ce22851 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f4f4c96890d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f4f4dafa700 (LWP 27102)):
#0 0x00007f4f4c9614f3 in select () from /lib64/libc.so.6
#1 0x0000000000550a70 in erts_sys_main_thread () at sys/unix/sys.c:3059
---Type <return> to continue, or q <return> to quit---
#2 0x00000000004490b9 in erl_start (argc=56, argv=<value optimized out>) at beam/erl_init.c:1775
#3 0x00000000004273a9 in main (argc=<value optimized out>, argv=<value optimized out>) at sys/unix/erl_main.c:29
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)
(gdb) q
A debugging session is active.

Inferior 1 [process 27102] will be detached.

Quit anyway? (y or n) Y
Detaching from program: /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp, process 27102
[root@centos-64-x64 ~]# gdb -p 27136
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 27136
Reading symbols from /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp...done.

warning: .dynamic section for "/lib64/libgcc_s.so.1" is not at the expected address (wrong library or version mismatch?)
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib64/libncurses.so.5
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 27174]
[New LWP 27167]
[New LWP 27166]
[New LWP 27165]
[New LWP 27164]
[New LWP 27163]
[New LWP 27162]
[New LWP 27161]
[New LWP 27160]
[New LWP 27159]
[New LWP 27158]
[New LWP 27157]
[New LWP 27156]
[New LWP 27155]
[New LWP 27154]
[New LWP 27153]
[New LWP 27152]
[New LWP 27151]
[New LWP 27150]
[New LWP 27149]
[New LWP 27148]
[New LWP 27147]
[New LWP 27146]
[New LWP 27145]
[New LWP 27144]
[New LWP 27143]
[New LWP 27142]
[New LWP 27141]
[New LWP 27140]
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libtinfo.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib64/libtinfo.so.5
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /opt/couchbase/lib/couchdb/erlang/lib/mapreduce-1.0/priv/mapreduce_nif.so...done.
Loaded symbols for /opt/couchbase/lib/couchdb/erlang/lib/mapreduce-1.0/priv/mapreduce_nif.so
Reading symbols from /opt/couchbase/lib/libv8.so...done.
Loaded symbols for /opt/couchbase/lib/libv8.so
Reading symbols from /opt/couchbase/lib/erlang/lib/crypto-3.2/priv/lib/crypto.so...done.
Loaded symbols for /opt/couchbase/lib/erlang/lib/crypto-3.2/priv/lib/crypto.so
Reading symbols from /usr/lib64/libcrypto.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libcrypto.so.6
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /opt/couchbase/lib/erlang/lib/crypto-3.2/priv/lib/crypto_callback.so...done.
Loaded symbols for /opt/couchbase/lib/erlang/lib/crypto-3.2/priv/lib/crypto_callback.so
Reading symbols from /opt/couchbase/lib/couchdb/erlang/lib/ejson-0.1.0/priv/ejson.so...done.
Loaded symbols for /opt/couchbase/lib/couchdb/erlang/lib/ejson-0.1.0/priv/ejson.so
Reading symbols from /opt/couchbase/lib/couchdb/erlang/lib/snappy-1.0.4/priv/snappy_nif.so...done.
Loaded symbols for /opt/couchbase/lib/couchdb/erlang/lib/snappy-1.0.4/priv/snappy_nif.so
Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
Reading symbols from /opt/couchbase/lib/couchdb/erlang/lib/couch-1.2.0a-961ad59-git/priv/lib/couch_icu_driver.so...done.
Loaded symbols for /opt/couchbase/lib/couchdb/erlang/lib/couch-1.2.0a-961ad59-git/priv/lib/couch_icu_driver.so
Reading symbols from /opt/couchbase/lib/libicui18n.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicui18n.so.44
Reading symbols from /opt/couchbase/lib/libicuuc.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44
Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicudata.so.44
0x00007f7bfcfdc4f3 in select () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install couchbase-server-3.0.0-1174.x86_64
(gdb) thread app all bt

Thread 30 (Thread 0x7f7bfb5ff700 (LWP 27140)):
#0 0x00007f7bfd4a454d in read () from /lib64/libpthread.so.0
#1 0x0000000000550641 in signal_dispatcher_thread_func (unused=<value optimized out>) at sys/unix/sys.c:2916
#2 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e070) at pthread/ethread.c:106
#3 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 29 (Thread 0x7f7bfa37f700 (LWP 27141)):
#0 0x00007f7bfd4a143c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000005a7699 in ethr_cond_wait (cnd=<value optimized out>, mtx=<value optimized out>) at common/ethr_mutex.c:1368
#2 0x0000000000463e3f in erts_cnd_wait (unused=<value optimized out>) at beam/erl_threads.h:1788
#3 erts_smp_cnd_wait (unused=<value optimized out>) at beam/erl_smp.h:938
#4 sys_msg_dispatcher_func (unused=<value optimized out>) at beam/erl_trace.c:3286
#5 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e150) at pthread/ethread.c:106
#6 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7f7bfabfe700 (LWP 27142)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600138) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600138) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb2380) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb2380) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb2380) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 27 (Thread 0x7f7bf997e700 (LWP 27143)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600178) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600178) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb24c0) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb24c0) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb24c0) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 26 (Thread 0x7f7bf973f700 (LWP 27144)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6001b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6001b8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb2600) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb2600) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb2600) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 25 (Thread 0x7f7bf971d700 (LWP 27145)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6001f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6001f8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb2740) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb2740) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb2740) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 24 (Thread 0x7f7bf96fb700 (LWP 27146)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600238) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600238) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb2880) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb2880) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb2880) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7f7bf96d9700 (LWP 27147)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600278) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600278) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb29c0) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb29c0) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb29c0) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f7bf96b7700 (LWP 27148)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6002b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6002b8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb2b00) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb2b00) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb2b00) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f7bf9695700 (LWP 27149)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6002f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6002f8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb2c40) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb2c40) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb2c40) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

---Type <return> to continue, or q <return> to quit---
Thread 20 (Thread 0x7f7bf9673700 (LWP 27150)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600338) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600338) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb2d80) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb2d80) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb2d80) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f7bf9651700 (LWP 27151)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600378) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600378) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb2ec0) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb2ec0) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb2ec0) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f7bf962f700 (LWP 27152)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6003b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6003b8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb3000) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb3000) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb3000) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f7bf960d700 (LWP 27153)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6003f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6003f8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb3140) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb3140) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb3140) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f7bf95eb700 (LWP 27154)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600438) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600438) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb3280) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb3280) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb3280) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f7bf95c9700 (LWP 27155)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600478) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600478) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb33c0) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb33c0) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb33c0) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f7bf95a7700 (LWP 27156)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6004b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6004b8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb3500) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb3500) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb3500) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f7bf9585700 (LWP 27157)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6004f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6004f8) at pthread/ethr_event.c:218
#3 0x00000000004f847a in erts_tse_wait (arg=0x7f7bfabb3640) at beam/erl_threads.h:2710
#4 async_get (arg=0x7f7bfabb3640) at beam/erl_async.c:371
#5 async_main (arg=0x7f7bfabb3640) at beam/erl_async.c:492
#6 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e130) at pthread/ethread.c:106
#7 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f7bfdf8e700 (LWP 27158)):
#0 0x00007f7bfd4a509d in waitpid () from /lib64/libpthread.so.0
#1 0x000000000054f158 in child_waiter (unused=<value optimized out>) at sys/unix/sys.c:2840
#2 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e0a0) at pthread/ethread.c:106
#3 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f7bf90ff700 (LWP 27159)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600578) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600578) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f7bfc4e3680, rq=0x7f7bfc4e1e40) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f7bfc4e3680, rq=0x7f7bfc4e1e40) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f7bfc4e3680) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#9 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#10 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f7bf86fe700 (LWP 27160)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6005b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6005b8) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f7bfc4ed940, rq=0x7f7bfc4e1fc0) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f7bfc4ed940, rq=0x7f7bfc4e1fc0) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f7bfc4ed940) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#9 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f7bf7cfd700 (LWP 27161)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6005f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6005f8) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f7bfc4f7c00, rq=0x7f7bfc4e2140) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f7bfc4f7c00, rq=0x7f7bfc4e2140) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f7bfc4f7c00) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#9 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f7bf72fc700 (LWP 27162)):
#0 0x00007f7bfcfe3f03 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000555b6c in check_fd_events (ps=0x7f7bfc454910, pr=0x7f7bf72fb320, len=0x7f7bf72fbb3c, utvp=0x7f7bf72fbb20) at sys/common/erl_poll.c:2023
#2 erts_poll_wait_kp (ps=0x7f7bfc454910, pr=0x7f7bf72fb320, len=0x7f7bf72fbb3c, utvp=0x7f7bf72fbb20) at sys/common/erl_poll.c:2184
#3 0x00000000005589b3 in erts_check_io_kp (do_wait=<value optimized out>) at sys/common/erl_check_io.c:1183
#4 0x00000000004989d6 in scheduler_wait (fcalls=<value optimized out>, esdp=0x7f7bfc501ec0, rq=0x7f7bfc4e22c0) at beam/erl_process.c:2533
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f7bfc501ec0) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#9 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f7bf68fb700 (LWP 27163)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600678) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600678) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f7bfc50c180, rq=0x7f7bfc4e2440) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f7bfc50c180, rq=0x7f7bfc4e2440) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f7bfc50c180) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#9 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---

Thread 6 (Thread 0x7f7bf5efa700 (LWP 27164)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6006b8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6006b8) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f7bfc516440, rq=0x7f7bfc4e25c0) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f7bfc516440, rq=0x7f7bfc4e25c0) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f7bfc516440) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#9 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f7bf54f9700 (LWP 27165)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb6006f8) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb6006f8) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f7bfc520700, rq=0x7f7bfc4e2740) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f7bfc520700, rq=0x7f7bfc4e2740) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f7bfc520700) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#9 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f7bf4af8700 (LWP 27166)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600738) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600738) at pthread/ethr_event.c:218
#3 0x0000000000498345 in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f7bfc52a9c0, rq=0x7f7bfc4e28c0) at beam/erl_threads.h:2710
#4 scheduler_wait (fcalls=<value optimized out>, esdp=0x7f7bfc52a9c0, rq=0x7f7bfc4e28c0) at beam/erl_process.c:2354
#5 0x000000000049e7b3 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:7017
#6 0x00000000005311d0 in process_main () at beam/beam_emu.c:1198
#7 0x0000000000493f94 in sched_thread_func (vesdp=0x7f7bfc52a9c0) at beam/erl_process.c:5801
#8 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#9 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f7bf40f7700 (LWP 27167)):
#0 0x00007f7bfcfe02d9 in syscall () from /lib64/libc.so.6
#1 0x00000000005a9c25 in wait__ (e=0x7f7bfb600778) at pthread/ethr_event.c:92
#2 ethr_event_wait (e=0x7f7bfb600778) at pthread/ethr_event.c:218
#3 0x0000000000498c8e in erts_tse_wait (unused=<value optimized out>) at beam/erl_threads.h:2710
#4 aux_thread (unused=<value optimized out>) at beam/erl_process.c:2272
#5 0x00000000005a95d6 in thr_wrapper (vtwd=0x7fffebc4e180) at pthread/ethread.c:106
#6 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f7ba97aa700 (LWP 27174)):
#0 0x00007f7bfd4a4d2d in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f7baa112c21 in terminatorLoop (args=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/couchdb/src/mapreduce/mapreduce_nif.cc:480
---Type <return> to continue, or q <return> to quit---
#2 0x00000000005a95d6 in thr_wrapper (vtwd=0x7f7bf90fead0) at pthread/ethread.c:106
#3 0x00007f7bfd49d851 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f7bfcfe390d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f7bfe175700 (LWP 27136)):
#0 0x00007f7bfcfdc4f3 in select () from /lib64/libc.so.6
#1 0x0000000000550a70 in erts_sys_main_thread () at sys/unix/sys.c:3059
#2 0x00000000004490b9 in erl_start (argc=46, argv=<value optimized out>) at beam/erl_init.c:1775
#3 0x00000000004273a9 in main (argc=<value optimized out>, argv=<value optimized out>) at sys/unix/erl_main.c:29
Comment by Aleksey Kondratenko [ 24/Aug/14 ]
beam's backtraces are irrelevant
Comment by Andrei Baranouski [ 24/Aug/14 ]
gdb -p 27228
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 27228
Reading symbols from /opt/couchbase/bin/memcached...done.

warning: .dynamic section for "/lib64/libgcc_s.so.1" is not at the expected address (wrong library or version mismatch?)
Reading symbols from /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0
Reading symbols from /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1...done.
Loaded symbols for /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1
Reading symbols from /opt/couchbase/bin/../lib/libplatform.so.0.1.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/libplatform.so.0.1.0
Reading symbols from /opt/couchbase/bin/../lib/libcJSON.so.1.0.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/libcJSON.so.1.0.0
Reading symbols from /opt/couchbase/bin/../lib/libJSON_checker.so...done.
Loaded symbols for /opt/couchbase/bin/../lib/libJSON_checker.so
Reading symbols from /opt/couchbase/bin/../lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/bin/../lib/libsnappy.so.1
Reading symbols from /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4...done.
Loaded symbols for /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4
Reading symbols from /opt/couchbase/bin/../lib/libevent_core-2.0.so.5...done.
Loaded symbols for /opt/couchbase/bin/../lib/libevent_core-2.0.so.5
Reading symbols from /usr/lib64/libssl.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libssl.so.6
Reading symbols from /usr/lib64/libcrypto.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libcrypto.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 31578]
[New LWP 31577]
[New LWP 31576]
[New LWP 31575]
[New LWP 31574]
[New LWP 31573]
[New LWP 31572]
[New LWP 31571]
[New LWP 31570]
[New LWP 31569]
[New LWP 31568]
[New LWP 27271]
[New LWP 27270]
[New LWP 27269]
[New LWP 27268]
[New LWP 27267]
[New LWP 27266]
[New LWP 27265]
[New LWP 27242]
[New LWP 27241]
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/ep.so
Reading symbols from /opt/couchbase/lib/libcouchstore.so...done.
Loaded symbols for /opt/couchbase/lib/libcouchstore.so
Reading symbols from /opt/couchbase/lib/libdirutils.so.0.1.0...done.
Loaded symbols for /opt/couchbase/lib/libdirutils.so.0.1.0
Reading symbols from /opt/couchbase/lib/libv8.so...done.
Loaded symbols for /opt/couchbase/lib/libv8.so
Reading symbols from /opt/couchbase/lib/libicui18n.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicui18n.so.44
Reading symbols from /opt/couchbase/lib/libicuuc.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44
Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicudata.so.44
0x00007f78968df0ad in pthread_join () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install couchbase-server-3.0.0-1174.x86_64
(gdb) thread app all bt

Thread 21 (Thread 0x7f78943df700 (LWP 27241)):
#0 0x00007f7895a7360d in read () from /lib64/libc.so.6
#1 0x00007f7895a09f68 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x00007f7895a0ba6e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x00007f7895a0014a in _IO_getline_info_internal () from /lib64/libc.so.6
#4 0x00007f78959fefa9 in fgets () from /lib64/libc.so.6
#5 0x00007f78943e08b1 in check_stdin_thread (arg=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/extensions/daemon/stdin_check.c:38
#6 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce0e0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#7 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f78937db700 (LWP 27242)):
#0 0x00007f78968e27bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7897b2a9eb in cb_cond_timedwait (cond=0x7f78939de6e0, mutex=0x7f78939de6a0, ms=<value optimized out>)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f78937ddd53 in logger_thead_main (arg=0x140e900) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/extensions/loggers/file_logger.c:342
#3 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce080) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f7892bce700 (LWP 27265)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x00007f7892bd2db0 in lock_engines (name=0x13d23f0 "RevAB") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:356
#5 find_bucket (name=0x13d23f0 "RevAB") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:715
#6 0x00007f7892bd45d0 in handle_select_bucket (handle=0x7f7892dda840, cookie=0x5b51600, request=0x5b85000, response=0x40d2a0 <binary_response_handler>)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:3054
#7 bucket_unknown_command (handle=0x7f7892dda840, cookie=0x5b51600, request=0x5b85000, response=0x40d2a0 <binary_response_handler>)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:3206
#8 0x0000000000418701 in process_bin_unknown_packet (c=0x5b51600) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:2616
#9 process_bin_packet (c=0x5b51600) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:5414
#10 complete_nread (c=0x5b51600) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:5818
#11 conn_nread (c=0x5b51600) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:7032
#12 0x000000000040b4bd in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x5b51600)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:7305
#13 0x00007f78970abd3c in event_process_active_single_queue (base=0x5ce4280, flags=<value optimized out>) at event.c:1308
#14 event_process_active (base=0x5ce4280, flags=<value optimized out>) at event.c:1375
#15 event_base_loop (base=0x5ce4280, flags=<value optimized out>) at event.c:1572
#16 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce140) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#17 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f78921cd700 (LWP 27266)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x00007f7892bd2db0 in lock_engines (name=0x13d20a8 "default") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:356
#5 find_bucket (name=0x13d20a8 "default") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:715
#6 0x00007f7892bd3e02 in handle_connect (cookie=0x5be5b00, type=<value optimized out>, event_data=<value optimized out>, cb_data=0x7f7892dda840)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:1291
#7 0x000000000040dcaf in perform_callbacks (sfd=124, parent_port=11210, init_state=0x415390 <conn_new_cmd>, event_flags=18, read_buffer_size=<value optimized out>, base=0x5ce4500, timeout=0x0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:248
#8 conn_new (sfd=124, parent_port=11210, init_state=0x415390 <conn_new_cmd>, event_flags=18, read_buffer_size=<value optimized out>, base=0x5ce4500, timeout=0x0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:915
#9 0x00000000004198ea in thread_libevent_process (fd=<value optimized out>, which=<value optimized out>, arg=0x5cb2af0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:312
#10 0x00007f78970abd3c in event_process_active_single_queue (base=0x5ce4500, flags=<value optimized out>) at event.c:1308
#11 event_process_active (base=0x5ce4500, flags=<value optimized out>) at event.c:1375
#12 event_base_loop (base=0x5ce4500, flags=<value optimized out>) at event.c:1572
#13 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce150) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#14 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f78917cc700 (LWP 27267)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x00007f7892bd2db0 in lock_engines (name=0x13d20a8 "default") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:356
#5 find_bucket (name=0x13d20a8 "default") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:715
#6 0x00007f7892bd3e02 in handle_connect (cookie=0x5c76300, type=<value optimized out>, event_data=<value optimized out>, cb_data=0x7f7892dda840)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:1291
#7 0x000000000040dcaf in perform_callbacks (sfd=126, parent_port=11210, init_state=0x415390 <conn_new_cmd>, event_flags=18, read_buffer_size=<value optimized out>, base=0x5ce4780, timeout=0x0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:248
#8 conn_new (sfd=126, parent_port=11210, init_state=0x415390 <conn_new_cmd>, event_flags=18, read_buffer_size=<value optimized out>, base=0x5ce4780, timeout=0x0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:915
#9 0x00000000004198ea in thread_libevent_process (fd=<value optimized out>, which=<value optimized out>, arg=0x5cb2be0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:312
#10 0x00007f78970abd3c in event_process_active_single_queue (base=0x5ce4780, flags=<value optimized out>) at event.c:1308
#11 event_process_active (base=0x5ce4780, flags=<value optimized out>) at event.c:1375
#12 event_base_loop (base=0x5ce4780, flags=<value optimized out>) at event.c:1572
#13 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce160) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#14 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f7890dcb700 (LWP 27268)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x00007f7892bd2db0 in lock_engines (name=0x1052b980 "UserInfo") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:356
#5 find_bucket (name=0x1052b980 "UserInfo") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:715
#6 0x00007f7892bd45d0 in handle_select_bucket (handle=0x7f7892dda840, cookie=0x5c79000, request=0x5ca3000, response=0x40d2a0 <binary_response_handler>)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:3054
#7 bucket_unknown_command (handle=0x7f7892dda840, cookie=0x5c79000, request=0x5ca3000, response=0x40d2a0 <binary_response_handler>)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:3206
#8 0x0000000000418701 in process_bin_unknown_packet (c=0x5c79000) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:2616
#9 process_bin_packet (c=0x5c79000) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:5414
#10 complete_nread (c=0x5c79000) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:5818
#11 conn_nread (c=0x5c79000) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:7032
#12 0x000000000040b4bd in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x5c79000)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:7305
#13 0x00007f78970abd3c in event_process_active_single_queue (base=0x5ce4a00, flags=<value optimized out>) at event.c:1308
#14 event_process_active (base=0x5ce4a00, flags=<value optimized out>) at event.c:1375
#15 event_base_loop (base=0x5ce4a00, flags=<value optimized out>) at event.c:1572
#16 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce170) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#17 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f78903ca700 (LWP 27269)):
#0 0x00007f78968e243c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f788d8b671d in wait (this=0xd525b00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/syncobject.h:39
#2 EventuallyPersistentStore::initialize (this=0xd525b00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep.cc:346
#3 0x00007f788d8c342c in EventuallyPersistentEngine::initialize (this=0x21072a00, config=<value optimized out>)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep_engine.cc:2011
#4 0x00007f788d8c3786 in EvpInitialize (handle=0x21072a00,
    config_str=0x1c106003 "ht_size=3079;ht_locks=5;tap_noop_interval=20;max_size=314572800;tap_keepalive=300;dbname=/data/MsgsCalls;allow_data_loss_during_shutdown=true;backend=couchdb;couch_bucket=MsgsCalls;couch_port=11213;ma"...) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep_engine.cc:135
#5 0x00007f7892bd325b in create_bucket_UNLOCKED (e=0x7f7892dda840, bucket_name=0x1052c840 "MsgsCalls", path=0x1c105fe0 "/opt/couchbase/lib/memcached/ep.so",
    config=0x1c106003 "ht_size=3079;ht_locks=5;tap_noop_interval=20;max_size=314572800;tap_keepalive=300;dbname=/data/MsgsCalls;allow_data_loss_during_shutdown=true;backend=couchdb;couch_bucket=MsgsCalls;couch_port=11213;ma"..., e_out=0x0, msg=0x7f78903c9730 "", msglen=1024) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:857
#6 0x00007f7892bd3496 in handle_create_bucket (handle=0x7f7892dda840, cookie=0x5b4fe00, request=<value optimized out>, response=0x40d2a0 <binary_response_handler>)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:2841
#7 0x00007f7892bd4979 in bucket_unknown_command (handle=0x7f7892dda840, cookie=0x5b4fe00, request=0x5b6f000, response=0x40d2a0 <binary_response_handler>)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:3193
#8 0x0000000000418701 in process_bin_unknown_packet (c=0x5b4fe00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:2616
#9 process_bin_packet (c=0x5b4fe00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:5414
#10 complete_nread (c=0x5b4fe00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:5818
#11 conn_nread (c=0x5b4fe00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:7032
#12 0x000000000040b4bd in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x5b4fe00)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:7305
#13 0x00007f78970abd3c in event_process_active_single_queue (base=0x5ce4c80, flags=<value optimized out>) at event.c:1308
#14 event_process_active (base=0x5ce4c80, flags=<value optimized out>) at event.c:1375
#15 event_base_loop (base=0x5ce4c80, flags=<value optimized out>) at event.c:1572
#16 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce180) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#17 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f788f9c9700 (LWP 27270)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x00007f7892bd2db0 in lock_engines (name=0x13d20a8 "default") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:356
#5 find_bucket (name=0x13d20a8 "default") at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:715
#6 0x00007f7892bd3e02 in handle_connect (cookie=0x5a30400, type=<value optimized out>, event_data=<value optimized out>, cb_data=0x7f7892dda840)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:1291
#7 0x000000000040dcaf in perform_callbacks (sfd=91, parent_port=11209, init_state=0x415390 <conn_new_cmd>, event_flags=18, read_buffer_size=<value optimized out>, base=0x5ce4f00, timeout=0x0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:248
#8 conn_new (sfd=91, parent_port=11209, init_state=0x415390 <conn_new_cmd>, event_flags=18, read_buffer_size=<value optimized out>, base=0x5ce4f00, timeout=0x0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:915
#9 0x00000000004198ea in thread_libevent_process (fd=<value optimized out>, which=<value optimized out>, arg=0x5cb2eb0)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:312
#10 0x00007f78970abd3c in event_process_active_single_queue (base=0x5ce4f00, flags=<value optimized out>) at event.c:1308
#11 event_process_active (base=0x5ce4f00, flags=<value optimized out>) at event.c:1375
#12 event_base_loop (base=0x5ce4f00, flags=<value optimized out>) at event.c:1572
#13 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce190) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#14 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f788efc8700 (LWP 27271)):
#0 0x00007f7895a80f03 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f78970c0376 in epoll_dispatch (base=0x5ce5180, tv=<value optimized out>) at epoll.c:404
#2 0x00007f78970abc44 in event_base_loop (base=0x5ce5180, flags=<value optimized out>) at event.c:1558
#3 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce1a0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f788e5c7700 (LWP 31568)):
#0 0x00007f7895a44b8d in nanosleep () from /lib64/libc.so.6
#1 0x00007f7895a79d64 in usleep () from /lib64/libc.so.6
#2 0x00007f788d8f40b5 in updateStatsThread (arg=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/memory_tracker.cc:36
#3 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ce240) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f788b31f700 (LWP 31569)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x000000000041a236 in notify_io_complete (cookie=0x5c27800, status=ENGINE_SUCCESS) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:459
#5 0x00007f788d8ba536 in EventuallyPersistentEngine::notifyIOComplete(const void *, ._101) (this=0x621e000, cookie=0x5c27800, status=ENGINE_SUCCESS)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep_engine.h:478
#6 0x00007f788d8a43e3 in EventuallyPersistentStore::completeBGFetchMulti (this=0x5d4f8c0, vbId=<value optimized out>, fetchedItems=std::vector of length 4, capacity 4 = {...},
    startTime=1593905183966731) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep.cc:1620
#7 0x00007f788d88f0a2 in BgFetcher::doFetch (this=0x5d31ba0, vbId=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/bgfetcher.cc:90
#8 0x00007f788d88f817 in BgFetcher::run (this=0x5d31ba0, task=0x81161c0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/bgfetcher.cc:156
#9 0x00007f788d91b603 in BgFetcherTask::run (this=0xfffffffffffffe00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/tasks.cc:89
#10 0x00007f788d8f5b3a in ExecutorThread::run (this=0x5eba000) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:110
#11 0x00007f788d8f6296 in launch_executor_thread (arg=0x5cb2e68) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#12 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cec50) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#13 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f788a91e700 (LWP 31570)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x000000000041a236 in notify_io_complete (cookie=0x5b03200, status=ENGINE_SUCCESS) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:459
#5 0x00007f788d8ba536 in EventuallyPersistentEngine::notifyIOComplete(const void *, ._101) (this=0x621e000, cookie=0x5b03200, status=ENGINE_SUCCESS)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep_engine.h:478
#6 0x00007f788d8a43e3 in EventuallyPersistentStore::completeBGFetchMulti (this=0x5d4f8c0, vbId=<value optimized out>, fetchedItems=std::vector of length 4, capacity 4 = {...},
    startTime=1593905182650801) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep.cc:1620
#7 0x00007f788d88f0a2 in BgFetcher::doFetch (this=0x5d32560, vbId=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/bgfetcher.cc:90
#8 0x00007f788d88f817 in BgFetcher::run (this=0x5d32560, task=0x7ccbc50) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/bgfetcher.cc:156
#9 0x00007f788d91b603 in BgFetcherTask::run (this=0xfffffffffffffe00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/tasks.cc:89
#10 0x00007f788d8f5b3a in ExecutorThread::run (this=0x5eba0e0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:110
#11 0x00007f788d8f6296 in launch_executor_thread (arg=0x5cb2e68) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#12 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cec70) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#13 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f7889f1d700 (LWP 31571)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x000000000041a236 in notify_io_complete (cookie=0x5b03500, status=ENGINE_SUCCESS) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:459
#5 0x00007f788d8ba536 in EventuallyPersistentEngine::notifyIOComplete(const void *, ._101) (this=0x621e000, cookie=0x5b03500, status=ENGINE_SUCCESS)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep_engine.h:478
#6 0x00007f788d8a43e3 in EventuallyPersistentStore::completeBGFetchMulti (this=0x5d4f8c0, vbId=<value optimized out>, fetchedItems=std::vector of length 1, capacity 1 = {...},
    startTime=1593905182702662) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep.cc:1620
#7 0x00007f788d88f0a2 in BgFetcher::doFetch (this=0x5d305b0, vbId=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/bgfetcher.cc:90
#8 0x00007f788d88f817 in BgFetcher::run (this=0x5d305b0, task=0x8115d60) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/bgfetcher.cc:156
#9 0x00007f788d91b603 in BgFetcherTask::run (this=0xfffffffffffffe00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/tasks.cc:89
#10 0x00007f788d8f5b3a in ExecutorThread::run (this=0x5eba1c0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:110
#11 0x00007f788d8f6296 in launch_executor_thread (arg=0x5cb2e68) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#12 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cec60) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#13 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f788951c700 (LWP 31572)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x000000000041a236 in notify_io_complete (cookie=0x5b05300, status=ENGINE_SUCCESS) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:459
#5 0x00007f788d8ba536 in EventuallyPersistentEngine::notifyIOComplete(const void *, ._101) (this=0x621e000, cookie=0x5b05300, status=ENGINE_SUCCESS)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep_engine.h:478
#6 0x00007f788d8a43e3 in EventuallyPersistentStore::completeBGFetchMulti (this=0x5d4f8c0, vbId=<value optimized out>, fetchedItems=std::vector of length 4, capacity 4 = {...},
    startTime=1593905193178279) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep.cc:1620
#7 0x00007f788d88f0a2 in BgFetcher::doFetch (this=0x5d32630, vbId=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/bgfetcher.cc:90
#8 0x00007f788d88f817 in BgFetcher::run (this=0x5d32630, task=0x7cc8730) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/bgfetcher.cc:156
#9 0x00007f788d91b603 in BgFetcherTask::run (this=0xfffffffffffffe00) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/tasks.cc:89
#10 0x00007f788d8f5b3a in ExecutorThread::run (this=0x5eba2a0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:110
#11 0x00007f788d8f6296 in launch_executor_thread (arg=0x5cb2e68) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#12 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cec80) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#13 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f7888b1b700 (LWP 31573)):
#0 0x00007f78968e27bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7897b2a9eb in cb_cond_timedwait (cond=0x14370c0, mutex=0x1437088, ms=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f788d91d7b5 in TaskQueue::_doSleep (this=0x1437080, t=...) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/syncobject.h:74
#3 0x00007f788d92056f in TaskQueue::_fetchNextTask (this=0x1437080, t=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:98
#4 0x00007f788d92090e in TaskQueue::fetchNextTask (this=0x1437080, thread=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:142
#5 0x00007f788d8e397c in ExecutorPool::_nextTask (this=0x5eb1840, t=..., tick=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:214
#6 0x00007f788d8e3a4e in ExecutorPool::nextTask (this=0x5eb1840, t=..., tick=236 '\354') at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:229
#7 0x00007f788d8f597b in ExecutorThread::run (this=0x5eba380) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:78
#8 0x00007f788d8f6296 in launch_executor_thread (arg=0x14370c4) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#9 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cec90) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#10 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f788811a700 (LWP 31574)):
#0 0x00007f78968e27bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7897b2a9eb in cb_cond_timedwait (cond=0x14370c0, mutex=0x1437088, ms=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f788d91d7b5 in TaskQueue::_doSleep (this=0x1437080, t=...) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/syncobject.h:74
#3 0x00007f788d92056f in TaskQueue::_fetchNextTask (this=0x1437080, t=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:98
#4 0x00007f788d92090e in TaskQueue::fetchNextTask (this=0x1437080, thread=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:142
#5 0x00007f788d8e397c in ExecutorPool::_nextTask (this=0x5eb1840, t=..., tick=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:214
#6 0x00007f788d8e3a4e in ExecutorPool::nextTask (this=0x5eb1840, t=..., tick=114 'r') at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:229
#7 0x00007f788d8f597b in ExecutorThread::run (this=0x5eba460) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:78
#8 0x00007f788d8f6296 in launch_executor_thread (arg=0x14370c4) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#9 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13ceca0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#10 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f7887719700 (LWP 31575)):
#0 0x00007f78968e27bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7897b2a9eb in cb_cond_timedwait (cond=0x14370c0, mutex=0x1437088, ms=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f788d91d7b5 in TaskQueue::_doSleep (this=0x1437080, t=...) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/syncobject.h:74
#3 0x00007f788d92056f in TaskQueue::_fetchNextTask (this=0x1437080, t=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:98
#4 0x00007f788d92090e in TaskQueue::fetchNextTask (this=0x1437080, thread=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:142
#5 0x00007f788d8e397c in ExecutorPool::_nextTask (this=0x5eb1840, t=..., tick=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:214
#6 0x00007f788d8e3a4e in ExecutorPool::nextTask (this=0x5eb1840, t=..., tick=10 '\n') at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:229
#7 0x00007f788d8f597b in ExecutorThread::run (this=0x5eba540) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:78
#8 0x00007f788d8f6296 in launch_executor_thread (arg=0x14370c4) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#9 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cecb0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#10 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f7886d18700 (LWP 31576)):
#0 0x00007f78968e27bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7897b2a9eb in cb_cond_timedwait (cond=0x14370c0, mutex=0x1437088, ms=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f788d91d7b5 in TaskQueue::_doSleep (this=0x1437080, t=...) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/syncobject.h:74
#3 0x00007f788d92056f in TaskQueue::_fetchNextTask (this=0x1437080, t=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:98
#4 0x00007f788d92090e in TaskQueue::fetchNextTask (this=0x1437080, thread=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:142
#5 0x00007f788d8e397c in ExecutorPool::_nextTask (this=0x5eb1840, t=..., tick=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:214
#6 0x00007f788d8e3a4e in ExecutorPool::nextTask (this=0x5eb1840, t=..., tick=187 '\273') at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:229
#7 0x00007f788d8f597b in ExecutorThread::run (this=0x5eba620) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:78
#8 0x00007f788d8f6296 in launch_executor_thread (arg=0x14370c4) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#9 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cecc0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#10 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f7886317700 (LWP 31577)):
#0 0x00007f78968e27bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f7897b2a9eb in cb_cond_timedwait (cond=0x1437380, mutex=0x1437348, ms=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f788d91d7b5 in TaskQueue::_doSleep (this=0x1437340, t=...) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/syncobject.h:74
#3 0x00007f788d92056f in TaskQueue::_fetchNextTask (this=0x1437340, t=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:98
#4 0x00007f788d92090e in TaskQueue::fetchNextTask (this=0x1437340, thread=..., toSleep=true) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/taskqueue.cc:142
#5 0x00007f788d8e397c in ExecutorPool::_nextTask (this=0x5eb1840, t=..., tick=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:214
#6 0x00007f788d8e3a4e in ExecutorPool::nextTask (this=0x5eb1840, t=..., tick=191 '\277') at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorpool.cc:229
#7 0x00007f788d8f597b in ExecutorThread::run (this=0x5eba700) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:78
#8 0x00007f788d8f6296 in launch_executor_thread (arg=0x1437384) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#9 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cecd0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#10 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f7885916700 (LWP 31578)):
#0 0x00007f78968e5054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f78968e0388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f78968e0257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f7897b2aaf9 in cb_mutex_enter (mutex=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:85
#4 0x000000000041a236 in notify_io_complete (cookie=0x5c77500, status=ENGINE_SUCCESS) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:459
#5 0x00007f788d8ba536 in EventuallyPersistentEngine::notifyIOComplete(const void *, ._101) (this=0x5d2c000, cookie=0x5c77500, status=ENGINE_SUCCESS)
    at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/ep_engine.h:478
#6 0x00007f788d912c26 in UprConnMap::manageConnections (this=0x5d40000) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/tapconnmap.cc:1107
#7 0x00007f788d91887f in ConnManager::run (this=0x1433310) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/tapconnmap.cc:151
#8 0x00007f788d8f5b3a in ExecutorThread::run (this=0x5eba7e0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:110
#9 0x00007f788d8f6296 in launch_executor_thread (arg=0x5cb2aa8) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/executorthread.cc:34
#10 0x00007f7897b2a7ea in platform_thread_wrap (arg=0x13cece0) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#11 0x00007f78968de851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f7895a8090d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f78983427e0 (LWP 27228)):
#0 0x00007f78968df0ad in pthread_join () from /lib64/libpthread.so.0
#1 0x0000000000419f74 in threads_shutdown () at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/thread.c:659
#2 0x000000000040f5f5 in main (argc=<value optimized out>, argv=<value optimized out>) at /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:8762
Comment by Aleksey Kondratenko [ 24/Aug/14 ]
memcached backtraces look dead-lock-ful
Comment by Mike Wiederhold [ 24/Aug/14 ]
It looks like all of the executor threads are running background fetches which are blocked from completing due to bucket creation. The bucket creation is waiting on warmup to complete, but it is not able to since there is no available executor thread to run it.
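To make the failure mode concrete, here is a minimal, hypothetical Python sketch of the pattern described above (illustrative only, not Couchbase code): a fixed-size worker pool deadlocks when a running task waits for work that can only execute on that same, already saturated pool.

from concurrent.futures import ThreadPoolExecutor, TimeoutError
import threading

pool = ThreadPoolExecutor(max_workers=1)   # stands in for the reader/executor threads
warmup_done = threading.Event()            # stands in for warmup completion

def create_bucket():
    # Conceptually holds the "engines lock" and waits for warmup to finish,
    # but warmup itself needs a free worker from the same exhausted pool.
    pool.submit(warmup_done.set)           # cannot run: the pool's only worker is busy right here
    warmup_done.wait(timeout=2)            # blocked waiting for work that cannot be scheduled (bounded for the demo)
    return warmup_done.is_set()

future = pool.submit(create_bucket)
try:
    print("bucket created:", future.result(timeout=5))   # prints False: classic pool-exhaustion deadlock
except TimeoutError:
    print("deadlocked")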
Comment by Sundar Sridharan [ 25/Aug/14 ]
Thanks Mike, you are right about the lack of reader threads. There is a quick fix for this issue, but instead of merging that, I wish to understand why one bucket's bgfetch completion notification should get blocked on a new bucket creation in memcached. Will try to update once I understand this better. thanks
Comment by Andrei Baranouski [ 25/Aug/14 ]
The possible root of the problem lies in http://www.couchbase.com/issues/browse/MB-12056
I have been experimenting with that issue on this cluster.
Comment by Sundar Sridharan [ 25/Aug/14 ]
Andrei, I looked at MB-12056; at first glance, at least, they appear to be separate issues.
Comment by Sundar Sridharan [ 25/Aug/14 ]
Discussed this with Trond; the root cause is that the front-end thread is blocked by an ep-engine background thread while it waits for warmup during the initialize() phase of bucket creation.
The fix is to change this behavior to make bucket creation non-blocking to avoid the deadlock seen here.
thanks
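A minimal sketch, assuming the non-blocking shape of the fix described above, of what this looks like in outline: creation schedules warmup on a background pool and returns immediately (EWOULDBLOCK-style), and completion is signalled later via a callback. This is illustrative Python only, not the actual memcached/ep-engine change; all names are placeholders.

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)

def create_bucket_async(name, on_ready):
    def warmup():
        # ... load data, rebuild hash tables, etc. ...
        on_ready(name)
    pool.submit(warmup)
    return "EWOULDBLOCK"          # caller is told to come back later instead of blocking

print(create_bucket_async("RevAB", lambda n: print(n, "warmed up")))
pool.shutdown(wait=True)          # demo only: wait so the callback fires before exit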
Comment by Sundar Sridharan [ 26/Aug/14 ]
fix uploaded for review at http://review.couchbase.org/#/c/40892 thanks
Comment by Wayne Siu [ 26/Aug/14 ]
Reviewed with PM/Cihan, this ticket is approved for rc2.
Comment by Sundar Sridharan [ 26/Aug/14 ]
The fix has been merged as commit 46df358fadbd1f2b57996ad5546702b0e66731ad. Thanks.




[MB-12030] Show (existing) new setting for XDCR i.e Number of Workers Created: 19/Aug/14  Updated: 26/Aug/14  Resolved: 20/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Anil Kumar Assignee: Aruna Piravi
Resolution: Fixed Votes: 0
Labels: rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged
Is this a Regression?: Unknown

 Description   
Expose the (existing) setting "Number of Workers" in the XDCR settings UI. This setting is useful for tuning XDCR replication performance.



 Comments   
Comment by Anil Kumar [ 19/Aug/14 ]
This is 'approved' for merging to the 3.0 branch and will be included in the RC2 build.
Comment by Aleksey Kondratenko [ 20/Aug/14 ]
manifest update here: http://review.couchbase.org/40758
actual change here: http://review.couchbase.org/40736
Comment by Aruna Piravi [ 26/Aug/14 ]
Hi Anil, have we documented this already? thanks.




[MB-11903] XDCR: New stat "%utilization" does not have a brief description on UI Created: 07/Aug/14  Updated: 26/Aug/14  Resolved: 20/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Aruna Piravi Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: RC2, rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2014-08-07 at 1.43.48 PM.png    
Triage: Untriaged
Is this a Regression?: No

 Description   
See the attached screenshot. Mousing over the stat should display a brief description of it.

 Comments   
Comment by Aruna Piravi [ 07/Aug/14 ]
Please also review the description of the batches/sec stat.
Comment by Anil Kumar [ 13/Aug/14 ]
Pavel - Please make text changes for stats ...

1. change the stat names to match with other stats
    - meta batches/sec to "meta batches per sec"
    - docs batches/sec to "doc batches per sec"
    - wakeups/sec to "wakeups per sec"
    - batches/sec to "XDCR vbucket replicators per sec"
    - % utilization change to "% time spent with vbucket replicators"

2. change the description
   - batches/sec ("XDCR vbucket replicators per sec" ) - Rate at which XDCR vbucket replicators replicates batches per sec
   - % utilization change ("% time spent with vbucket replicators") - Percentage of time spent with vbucket replicators * 100 / (max_workers_count * <time passed since last sample>)
Comment by Aleksey Kondratenko [ 14/Aug/14 ]
This is too long. Cannot do
Comment by Anil Kumar [ 18/Aug/14 ]
Triage - Not blocking 3.0 RC
Comment by Raju Suravarjjala [ 19/Aug/14 ]
Triage: Anil: Can you please shorten the text for Alk? He said he can change it as soon as you provide a brief description.
Comment by Anil Kumar [ 19/Aug/14 ]
Alk - Please make text changes for stats with shorter names ...

1. change the stat names to match with other stats
    - meta batches/sec to "meta batches per sec"
    - docs batches/sec to "doc batches per sec"
    - wakeups/sec to "wakeups per sec"
    - batches/sec to "XDCR vb reps per sec"
    - % utilization change to "% time spent vb reps"

2. change the description
   - batches/sec ("XDCR vb reps per sec") - Rate at which XDCR vbucket replicators replicates batches per sec
   - % utilization change ("% time spent vb reps") - Percentage of time spent with vbucket replicators * 100 / (max_workers_count * <time passed since last sample>)
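A small worked example of the quoted "% time spent vb reps" formula (the function and argument names below are placeholders taken from the formula's wording, not real stat keys): with 4 workers and a 1-second sampling interval, 2 seconds of cumulative replicator busy time works out to 50% utilization.

def pct_time_spent_vb_reps(busy_time_s, max_workers_count, sample_interval_s):
    # busy_time_s: time spent with vbucket replicators since the last sample
    return busy_time_s * 100.0 / (max_workers_count * sample_interval_s)

print(pct_time_spent_vb_reps(2.0, 4, 1.0))   # -> 50.0 (%)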
Comment by Aleksey Kondratenko [ 20/Aug/14 ]
http://review.couchbase.org/40760
Comment by Aleksey Kondratenko [ 20/Aug/14 ]
manifest updated here http://review.couchbase.org/40761
Comment by Aleksey Kondratenko [ 20/Aug/14 ]
For the record: I'm not a fan of changing the familiar name "% utilization" to something far less comprehensible.




[MB-12037] ns_server may lose replicas on stopped rebalance/graceful failover (was: {DCP} : Delta Recovery Impossible after re-try of graceful failover since in first attempt failed) Created: 21/Aug/14  Updated: 26/Aug/14  Resolved: 26/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: RC2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 10.6.2.144-10.6.2.150
centos 6x
1174

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8202014-2226-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8202014-2227-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8202014-2227-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8202014-2227-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8202014-2227-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8202014-2228-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8202014-2228-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/1174.log.tar.gz
Is this a Regression?: Unknown

 Description   
Scenario
1. Create a 7 Node cluster
2. Create default bucket with 100 K items
3. Graceful failover a node
4. Kill memcached of another node during graceful failover
5. Graceful failover the same node in step 3
6. Add-back the node with delta recovery
7. Hit Rebalance

In Step 7, rebalance fails for delta recovery and reports that delta recovery is not possible, even though the nodes in the cluster are in a healthy state.

We see the following warning: "Fail Over Warning: Rebalance required, some data is not currently replicated!"

It seems that delta recovery will not work in this condition unless we rebalance the cluster. I was also able to cancel the delta recovery and do a full recovery instead.

Opening the bug to follow up on the issue. Attaching logs and data files.
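For reference, a rough Python sketch of how steps 3-7 could be driven against the ns_server REST API, assuming the 3.0 endpoints /controller/startGracefulFailover, /controller/setRecoveryType and /controller/rebalance; the node name, address and credentials below are illustrative, and step 4 (killing memcached on another node) is done out of band, e.g. over ssh.

import requests

AUTH = ("Administrator", "password")
BASE = "http://10.6.2.144:8091"
NODE = "ns_1@10.6.2.150"   # the node being gracefully failed over (illustrative)

def post(path, data):
    r = requests.post(BASE + path, auth=AUTH, data=data)
    print(path, r.status_code, r.text[:80])
    return r

# step 3 (and again for step 5 after the first attempt is interrupted)
post("/controller/startGracefulFailover", {"otpNode": NODE})
# step 4: kill memcached on a different node while the failover is running (out of band)
# step 6: add the node back with delta recovery
post("/controller/setRecoveryType", {"otpNode": NODE, "recoveryType": "delta"})
# step 7: rebalance with all nodes known and none ejected
nodes = requests.get(BASE + "/pools/default", auth=AUTH).json()["nodes"]
known = ",".join(n["otpNode"] for n in nodes)
post("/controller/rebalance", {"knownNodes": known, "ejectedNodes": ""})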



 Comments   
Comment by Aleksey Kondratenko [ 21/Aug/14 ]
I was able to reproduce it easily. There's indeed something wrong with restarting graceful failover which impacts delta recovery.
Comment by Aleksey Kondratenko [ 21/Aug/14 ]
And it predictably happens with any stop/restart of graceful failover.
Comment by Parag Agarwal [ 21/Aug/14 ]
When the warning ("Rebalance required, some data is not currently replicated!") is showing, do we expect delta recovery to fail, and is that the correct behavior? Asking since we will have to document it as well.

Comment by Aleksey Kondratenko [ 21/Aug/14 ]
The warning has nothing to do with that, and the warning is valid. Midway into a graceful failover you're indeed not balanced.
Comment by Aleksey Kondratenko [ 21/Aug/14 ]
manifest updated here: http://review.couchbase.org/40811

fix merged here: http://review.couchbase.org/40803
Comment by Parag Agarwal [ 22/Aug/14 ]
Tested
Comment by Parag Agarwal [ 22/Aug/14 ]
Test Run:: http://qa.hq.northscale.net/job/centos_x64--02_01--Rebalance-In/6/console
Comment by Parag Agarwal [ 22/Aug/14 ]
Saw the issue again for the following scenario with build 1186.

 Scenario
1. Create a 7 Node cluster
2. Create default bucket with 200 K items
3. Graceful failover a node
4. Kill memcached of another node during graceful failover
5. Graceful failover the same node in step 3
6. Add-back the node with delta recovery
7. Hit Rebalance

We see the following warning: "Fail Over Warning: Rebalance required, some data is not currently replicated!"

In Step 7, rebalance fails for delta recovery and reports that delta recovery is not possible, even though the nodes in the cluster are in a healthy state. This happens with 200K items, whereas with 100K items it passes.

I am attaching the logs for you to analyze, since the above warning appears in both cases and I am not sure about the internal state of the system that blocks the add-back delta recovery.

Test fails for 2k items
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1436-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-1437-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-1439-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-1440-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1441-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1442-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1444-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1446-couch.tar.gz

Test passes for 1 K Items

https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1458-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-150-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-151-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-153-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-154-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-156-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-157-diag.zip
Comment by Parag Agarwal [ 22/Aug/14 ]
I think the logs were not uploaded, will add them again
Comment by Parag Agarwal [ 22/Aug/14 ]
fixed the logs
Comment by Aleksey Kondratenko [ 22/Aug/14 ]
Please open a new ticket for the new instance of the issue.
Comment by Parag Agarwal [ 22/Aug/14 ]
http://www.couchbase.com/issues/browse/MB-12055
Comment by Wayne Siu [ 26/Aug/14 ]
Reviewed with PM/Cihan. Approved for RC2.




[MB-11996] Clicking on Removing Read-Only User does not give a meaningful message in the popup dialog Created: 18/Aug/14  Updated: 26/Aug/14  Resolved: 20/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Pavel Blagodov
Resolution: Fixed Votes: 0
Labels: RC2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: hostname: 10.3.2.43
Centos 5.8 but this is a generic issue

Attachments: PNG File Screen Shot 2014-08-18 at 4.55.35 PM.png    
Triage: Untriaged
Is this a Regression?: No

 Description   
Go to http://10.3.2.43:8091/index.html
Login as Administrator
Password: password
Go to Settings and Click on on Account management
Create a read only user (raju_read_only)
Try to remove the read-only user. You will see a popup dialog titled "Removing" that asks "Do you want to Remove it?"
The popup message does not mention which user is being removed.

Please see the attached screenshot

 Comments   
Comment by Anil Kumar [ 19/Aug/14 ]
Pavel - We can take this minor UI change for RC2.

Message - "Are you sure you want to remove <read-only username>?
Comment by Pavel Blagodov [ 20/Aug/14 ]
http://review.couchbase.org/40747
Comment by Aleksey Kondratenko [ 20/Aug/14 ]
http://review.couchbase.org/40747




[MB-11973] [System test] Seeing odd messages on couchdb.logs, where startSeqNo> endSeqNo, expecting a rollback for partition X. Created: 15/Aug/14  Updated: 26/Aug/14  Resolved: 22/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Ketaki Gangal Assignee: Mike Wiederhold
Resolution: Fixed Votes: 0
Labels: RC2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 6.4
6 Nodes, 2 Buckets, 1 ddoc X 2 Views
3.0.0-1163-rel

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
1. Load items ~ 700M, 630M on both the buckets, 10% dgm
2. During initial indexing / loading phase, I see a number of log messages which expect rollbacks on partitions due to startSeqNo > endSeqNo.

Not sure why the above should happen, given that there is no rebalance/failover/add-node activity.

Logs from the cluster https://s3.amazonaws.com/bugdb/11973/bug_logs.tar

 Comments   
Comment by Sriram Melkote [ 15/Aug/14 ]
Moving to Critical because, even if there is no functional problem, a rollback after indexing 65k mutations would carry a big performance penalty.
Comment by Mike Wiederhold [ 15/Aug/14 ]
I don't know why this was assigned to me. These log messages need to be looked at on the view engine side first.
Comment by Ketaki Gangal [ 15/Aug/14 ]
The view team has looked at these; it looks like there is an issue with the endSeqNo that the view component receives from ep-engine, likely due to seqno resets.

Will wait for the view team to add more details on this, however.


Comment by Sarath Lakshman [ 18/Aug/14 ]
Siri, I am currently looking into this bug
Comment by Sarath Lakshman [ 18/Aug/14 ]
One possible problem on the view engine side is that we cache seqno stats every 300ms. I will check whether this is due to the low stats-cache update frequency.
Comment by Sarath Lakshman [ 18/Aug/14 ]
Looks like it's a problem due to stats caching. Since it is a lazily updated async cache, a read triggers a cache update only when the cache TTL has expired, and it returns the old cached value without waiting for the latest fetch. For replica vbuckets, the only time the cache gets updated is via the 5-second trigger. At each 5-second trigger it reads the old seqnos from the cache; only the next cache reader sees the asynchronously updated value. But the next reader comes another 5 seconds later, since there are no queries consuming these seqnos. So the updater is always started with seqnos that are around 5 seconds old.
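A minimal sketch of the lazily-updated async cache behaviour described above (illustrative Python, not the actual view-engine cache): a read past the TTL kicks off a background refresh but still returns the stale value, so a reader that only arrives every few seconds always sees data that is roughly one refresh interval old.

import threading, time

class LazyAsyncCache:
    def __init__(self, fetch, ttl):
        self.fetch, self.ttl = fetch, ttl
        self.value, self.stamp = fetch(), time.time()

    def get(self):
        if time.time() - self.stamp > self.ttl:
            threading.Thread(target=self._refresh).start()   # refresh asynchronously
        return self.value                                     # returned without waiting for the refresh

    def _refresh(self):
        self.value, self.stamp = self.fetch(), time.time()

seqno = {"n": 0}
def fetch_seqnos():
    seqno["n"] += 100          # stand-in for fetching vbucket-seqno stats
    return seqno["n"]

cache = LazyAsyncCache(fetch_seqnos, ttl=0.3)
time.sleep(0.5); print(cache.get())   # stale value (100) returned, refresh kicked off
time.sleep(0.1); print(cache.get())   # only this later reader sees the refreshed value (200)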
Comment by Sarath Lakshman [ 18/Aug/14 ]
Just confirmed from the code that we don't use cached seqnos for the 5-second trigger. I also realized that we don't use cached seqnos for anything other than stale=update_after queries. Hence this doesn't look like a view engine problem.
Comment by Sarath Lakshman [ 18/Aug/14 ]
Looks like there is a problem in EP-Engine for replica vbuckets. I read the log files and found that EP-Engine seems to silently roll back to a seqno lower than the one it previously had for replica vbuckets. I do not see this problem for any active vbuckets.

Example from view engine logs:

The view engine received items from a snapshot for vbucket 386 up to seqno 518299 and wrote the data to the index.
The next time the updater tried to index the next set of items, we received high_seqno=518222 from the vbucket-seqno stats.

[couchdb:info,2014-08-15T8:50:18.745,ns_1@10.6.2.166:<0.26830.22>:couch_log:info:41]set view `saslbucket`, replica (prod) group `_design/ddoc2`: received a snapshot marker (on-disk) for partition 386 from sequence 518052 to 518299
[couchdb:info,2014-08-15T8:50:27.357,ns_1@10.6.2.166:<0.26990.22>:couch_log:info:41]dcp client (<0.1755.0>): Expecting a rollback for partition 386. Found start_seqno > end_seqno (518299 > 518222).


I see this pattern only for replica vbuckets. At least from the logs of the node I investigated, I do not see any such pattern for any of the active vbuckets.

Is it something to do with replication in ep-engine?
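
For reference, the condition behind the "Expecting a rollback" message is the one illustrated below: the stream is requested from the last indexed seqno (start) up to the high seqno just read from the vbucket-seqno stats (end), and start > end is what produces the log line. This is only a minimal Python sketch of that check; the function and variable names are hypothetical, not the actual view-engine DCP client code.
{code}
# Minimal sketch of the check described above; names are hypothetical and this
# is not the actual view-engine DCP client code.
def plan_stream_request(last_indexed_seqno, stats_high_seqno, partition):
    start_seqno = last_indexed_seqno   # where the index left off
    end_seqno = stats_high_seqno       # high seqno from vbucket-seqno stats
    if start_seqno > end_seqno:
        # The stats report a lower high seqno than what was already indexed,
        # so the client expects the producer to ask for a rollback.
        return ("expect_rollback",
                "partition %d: start_seqno > end_seqno (%d > %d)"
                % (partition, start_seqno, end_seqno))
    return ("stream", (start_seqno, end_seqno))

# The numbers from the log excerpt above reproduce the reported message:
print(plan_stream_request(518299, 518222, 386))
{code}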
Comment by Sarath Lakshman [ 18/Aug/14 ]
I just double-checked that we grab the latest seqno stats (no caching) before starting the updater and use those seqnos as the end seqnos.
Comment by Raju Suravarjjala [ 18/Aug/14 ]
Triage: Not a blocker for 3.0 RC1
Comment by Chiyoung Seo [ 18/Aug/14 ]
Mike,

It seems to me that this issue was caused by the recent change that was made in the ep-engine:

http://review.couchbase.org/#/c/40346/

The above commit changes the replica vbucket so that it uses its closed checkpoint's end seqno as its high seqno for the UPR stream. Did you communicate the above change to the view team?
Comment by Mike Wiederhold [ 18/Aug/14 ]
I'll look into this. The changes made on the ep-engine side shouldn't have affected anyone since nothing changed externally.
Comment by Mike Wiederhold [ 20/Aug/14 ]
http://review.couchbase.org/#/c/40765/




[MB-12055] ns_janitor may lose replicas of nearly completed vbucket moves (was: {DCP} : Delta Recovery Impossible after re-try of graceful failover since in first attempt failed) Created: 22/Aug/14  Updated: 26/Aug/14  Resolved: 26/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Parag Agarwal Assignee: Parag Agarwal
Resolution: Fixed Votes: 0
Labels: rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 1186, centos 6x, 10.6.2.144-10.6.2.160

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
build 1186

 Scenario
1. Create a 7 Node cluster
2. Create default bucket with 200 K items
3. Graceful failover a node
4. Kill memcached of another node during graceful failover
5. Graceful failover the same node in step 3
6. Add-back the node with delta recovery
7. Hit Rebalance

We see the following warning: "Fail Over Warning: Rebalance required, some data is not currently replicated!"

In step 7, rebalance fails for delta recovery, saying delta recovery is not possible, even though the nodes in the cluster are in a healthy state. This happens with 200K items, whereas with 100K items it passes.

I am attaching the logs for you to analyze, since the above warning appears in both cases. I am not sure about the internal state of the system that prevents the add-back delta recovery.

Test fails for 2k items
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1436-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-1437-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-1439-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-1445-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-1440-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1441-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1442-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1446-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1444-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1446-couch.tar.gz

Test passes for 1 K Items

https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-1458-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.144-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-150-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.145-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-151-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.146-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-153-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.147-8222014-159-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.148-8222014-154-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.149-8222014-156-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-1510-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12037/10.6.2.150-8222014-157-diag.zip

 Comments   
Comment by Aleksey Kondratenko [ 25/Aug/14 ]
http://review.couchbase.org/40886
Comment by Aleksey Kondratenko [ 25/Aug/14 ]
manifest updated here: http://review.couchbase.org/40888
Comment by Parag Agarwal [ 25/Aug/14 ]
The issue is fixed for the scenario mentioned. Also tested it by killing 3 nodes, stopping 3 nodes, and stopping graceful failover.
Comment by Wayne Siu [ 26/Aug/14 ]
Reopening it for proper tagging (RC2).




[MB-11995] No way to see the username of the Read only User Created: 18/Aug/14  Updated: 26/Aug/14  Resolved: 20/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Pavel Blagodov
Resolution: Fixed Votes: 0
Labels: rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: hostname: 10.3.2.43
Centos 5.8 but this is a generic issue

Attachments: PNG File Screen Shot 2014-08-18 at 4.35.07 PM.png    
Triage: Untriaged
Is this a Regression?: No

 Description   
Go to http://10.3.2.43:8091/index.html
Login as Administrator
Password: password
Go to Settings and click on Account Management
Create a read only user (raju_read_only)
Log out and log back in after a while. You will not be able to see the name of the read-only user (please clear your cache if the username is displayed).
Please see the screenshot below

 Comments   
Comment by Anil Kumar [ 19/Aug/14 ]
Minor UI fix; we can take it for RC2.

The read-only username which got created needs to be shown.

" <read-only username> is existing user. "
Comment by Pavel Blagodov [ 20/Aug/14 ]
http://review.couchbase.org/40747
Comment by Aleksey Kondratenko [ 20/Aug/14 ]
http://review.couchbase.org/40747




[MB-11811] [Tools] Change UPR to DCP for tools Created: 24/Jul/14  Updated: 26/Aug/14  Resolved: 25/Jul/14

Status: Resolved
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Bin Cui Assignee: Ashvinder Singh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Bin Cui [ 24/Jul/14 ]
http://review.couchbase.org/#/c/39814/




[MB-11852] [System test] Memcached crashes during initial loading Created: 30/Jul/14  Updated: 26/Aug/14  Resolved: 31/Jul/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Ketaki Gangal Assignee: Ketaki Gangal
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1050-rel

Attachments: File stack_10.6.2.172.rtf    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
1. Setup a cluster , 2 Buckets, 1 ddoc X 2 Views
2. Load 120M, 107M items on the buckets.

- Seeing a number of memcached crashes across a couple of nodes.
This is the first time the test has failed in the initial phases.

Attached stack trace from one of the nodes.
Attaching collect_info.

 

 Comments   
Comment by Mike Wiederhold [ 30/Jul/14 ]
http://review.couchbase.org/#/c/40017/
http://review.couchbase.org/#/c/40019/




[MB-10456] [Tools] Support delta recovery for rebalance Created: 13/Mar/14  Updated: 26/Aug/14  Resolved: 21/Mar/14

Status: Closed
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Bin Cui Assignee: Thuan Nguyen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-9979 Delta node recovery after failover: a... Resolved

 Description   
Adding CLI support for delta-node recovery after failover.

 Comments   
Comment by Parag Agarwal [ 26/Aug/14 ]
This is fixed. Closing it since I have tested this feature.




[MB-10383] memcached process consumes high amounts of cpu Created: 06/Mar/14  Updated: 26/Aug/14  Resolved: 23/May/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket, performance
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Brett Lawson Assignee: Venu Uppalapati
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Mac OS X 10.9.1 x64
Couchbase Server 3.0.0-355

Triage: Triaged
Operating System: MacOSX 64-bit
Is this a Regression?: Unknown

 Description   
When using Couchbase Server 3.0, even while under no load, the memcached process will frequently consume a significant amount of CPU (30%+). This affects overall system performance and makes for a remarkably frustrating development experience.

 Comments   
Comment by Sundar Sridharan [ 20/May/14 ]
Fix uploaded for review at http://review.couchbase.org/37367 thanks
Comment by Sundar Sridharan [ 22/May/14 ]
Fix has been merged - avoid busy looping of threads when there is serialization. Please help close the issue once the fix is verified. Thanks.
Comment by Venu Uppalapati [ 26/Aug/14 ]
Here is a representative sample of idle-time Couchbase (RC1 build) CPU usage on a single-core VM:
Cpu(s): 0.3%us, 0.2%sy, 0.0%ni, 99.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.1%st




[MB-12069] [windows] Dependency management Created: 26/Aug/14  Updated: 26/Aug/14  Resolved: 26/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sriram Melkote Assignee: Chris Hillery
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The windows 3rd party binary dependencies should be moved from Google Drive to Amazon S3 or some other suitable location that allows access without a special client.

It would be good if the depot were versioned. If not, we should at least have a directory structure that mirrors the manifest file names of our various builds (e.g., rel-3.0.0).




[MB-10034] Implement minimal all_docs replacement for 3.0 suitable only for documents UI Created: 27/Jan/14  Updated: 26/Aug/14  Resolved: 25/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Anil Kumar Assignee: Venu Uppalapati
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
depends on MB-10117 [couchbase_bucket]: Implement minima... Closed
depends on MB-10118 [ns_server] Implement minimal all_do... Closed

 Description   
Implement all_docs replacement for 3.0

 Comments   
Comment by Dipti Borkar [ 27/Jan/14 ]
Is this ns_server or something the indexing team will be helping with?
Adding Siri.
Comment by Dipti Borkar [ 27/Jan/14 ]
Adding Chiyoung as well. In the 3.0 timeframe, it might be best to use an API on top of the id index in Couchstore.

The document editing UI and N1QL will be the consumers.
Comment by Aleksey Kondratenko [ 27/Jan/14 ]
Dipti, there's a misunderstanding.

_This_ ticket was created with the assumption that we will do an ep-engine-based all_docs. We've talked with the N1QL folks and they are fine with the all_docs replacement being _not usable_ to them in 3.0. AFAIK they agreed to simply create a "secondary" primary key index.

Here's the email I sent to Mike that reflects our near-agreement:

Date: Thu, 23 Jan 2014 10:35:36 -0800
Subject: Finalized requirements for all-docs replacement for ns_server
From: Aliaksey Kandratsenka <alkondratenko@gmail.com>
To: Mike Wiederhold <mike@couchbase.com>
Cc: mgmt_dev <mgmt_dev@couchbase.com>

Hi.

So the UI actually does support the descending option. So unless we're
willing to drop it (I think that'll be perfectly fine), ep-engine needs to
handle it too.

So we're getting something like this:

* new ep-engine command. I'll call it gladiolus (just in this document) for
lack of any other name.

* gladiolus command gets key, vbucket, limit, and direction (forward or
backward).

* it then opens the corresponding vbucket file (the vbucket must be in
active state) and walks its by-id tree, in the given direction, starting
with ids >= <given key>.

* and returns at most <limit> ids (and not full documents). There's no need
to return cas or any other metadata. Just ids

* And it needs to skip deleted docs.

* expect limit to be at most 1000 or so. For the default vbucket count it'll
usually be 2 or 3.
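
As an illustration of the requirements listed in the email (not part of the original email, and not the actual ep-engine or Couchstore API), here is a minimal Python sketch of the per-vbucket key-range walk. "gladiolus" is the placeholder name used above, and the in-memory sorted list stands in for the on-disk by-id tree.
{code}
# Illustrative sketch only. "gladiolus" is the placeholder name from the email
# above; the sorted (doc_id, deleted) list stands in for the Couchstore by-id
# tree, and none of this is the real ep-engine API.
import bisect

def gladiolus(by_id_tree, start_key, limit, forward=True, vbucket_active=True):
    """Return at most `limit` non-deleted doc ids from one vbucket, starting
    at ids >= start_key (ids <= start_key when walking backward)."""
    if not vbucket_active:
        raise RuntimeError("vbucket must be in active state")
    keys = [k for k, _ in by_id_tree]
    if forward:
        walk = by_id_tree[bisect.bisect_left(keys, start_key):]
    else:
        walk = reversed(by_id_tree[:bisect.bisect_right(keys, start_key)])
    result = []
    for doc_id, deleted in walk:
        if deleted:            # skip deleted docs
            continue
        result.append(doc_id)  # ids only: no CAS, metadata, or document bodies
        if len(result) >= limit:
            break
    return result

# Example: per-vbucket limit of 3, forward walk starting at ids >= "doc_10".
vb_tree = [("doc_%02d" % i, i % 7 == 0) for i in range(25)]
print(gladiolus(vb_tree, "doc_10", 3))   # ['doc_10', 'doc_11', 'doc_12']
{code}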
Comment by Mike Wiederhold [ 27/Jan/14 ]
Ep-engine needs to do the initial coding so I'll make sure this gets done either by me or someone else.
Comment by Sriram Melkote [ 28/Jan/14 ]
Query will in the interim use the result of CREATE PRIMARY INDEX USING VIEW mechanism to do full bucket scan. When secondary index is available, it will switch to using those. Given that the new feature being coded into ep-engine is intended for narrow use by ns_server UI only (and not query or any other potential all_docs use case we may not know), it is best not to term it as all_docs replacement.
Comment by Aleksey Kondratenko [ 28/Jan/14 ]
Updated subject according to Sriram's suggestion
Comment by Michael Nitschinger [ 11/Feb/14 ]
So is this something we can utilize for clients? or is it purely internal
Comment by Dipti Borkar [ 11/Feb/14 ]
This is bare-minimum infrastructure to get the doc UI working. Ultimately, the indexing team will build a high-performance primary index which the clients can use.
So I would be cautious about adding any other consumers of this API.
Comment by Aleksey Kondratenko [ 11/Feb/14 ]
>> So is this something we can utilize for clients? or is it purely internal

Purely internal. And it's going to be dead slow. Expect something like 10 seconds per request.
Comment by Aleksey Kondratenko [ 11/Feb/14 ]
>> So is this something we can utilize for clients? or is it purely internal

I've seen that weird exchange on the SDK list recently and I believe there is serious conflation of at least 3 different use cases. It's really important to understand exactly what is needed for each feature request. So:

* UI needs range queries with _short limits_ on (ordered) set of keys

* the current implementation of N1QL uses _all_docs to _stream_ all keys of a bucket in _sorted order_. That's _streaming_ versus returning a short list of keys in order.

* UPR gives you an efficient and _resumable_ way to stream all keys or all docs, but _without any specific key order_.

Those are very very different use cases. We cannot do 1 efficiently. We can do 2 semi-efficiently (but 3.0 is going to the code). And we can do 3 very efficiently.

I believe that "I need something to find all docs for my hadoop/BI/warehouse/whatever integration" use-case is covered by upr. I'm not saying that it's going to be completely public for 3.0 (that's a separate topic AFAIK, but my knowledge is a bit second-hand). But in my understanding upr features and free-ness from key ordering requirements is right fit for most client "changes-feed/list of docs" requirements.
Comment by Aleksey Kondratenko [ 19/Feb/14 ]
(spotted Mike adding couchbase-bucket as component here)

Mike, your ep-engine colleagues have _already_ hijacked our original ticket for that. Please don't hijack a second one. :)
Comment by Chiyoung Seo [ 24/Apr/14 ]
Marking it as resolved, as we have already merged both the ns_server and ep-engine changes.
Comment by Anil Kumar [ 19/Aug/14 ]
Venu - Can you please close this ticket once you've verified it.
Comment by Venu Uppalapati [ 26/Aug/14 ]
From the ns_server side, is there a restriction on how big the allowable range is? I do not see any specification regarding this, although the ep-engine spec lays out the implementation details for the ALL KEYS request/response structure.
Comment by Aleksey Kondratenko [ 26/Aug/14 ]
There's no spec for ns_server because the UI is our spec. We'll fetch up to 1000 docs overall. We might fetch much less than that per vbucket. Whether we do that, and how much, is an implementation detail.
Comment by Venu Uppalapati [ 26/Aug/14 ]
1) Verified range get through the UI.
2) Logged MB-12077 for a UI bug.
3) Verified no more than 1000 docs are fetched for a given range query.




[MB-12077] Add 'Update/Fetch' button to UI docs query window. Created: 26/Aug/14  Updated: 26/Aug/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Venu Uppalapati Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2014-08-26 at 3.56.27 PM.png    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Refer to the attached screenshot.
Steps to reproduce:
1) On the Bucket->Documents screen, click the Documents filter to open the range query window.
2) There is a Reset button to clear user input and a Close button that closes the range query window and fetches the docs.
3) It is unintuitive to have to close the window to fetch the docs. If the user wants to try a new range, the window has to be closed and reopened every time.
4) There should be an Update/Fetch button that lets the user get results without closing the window.




[MB-12076] Internal moxi misconfiguration Created: 22/Aug/14  Updated: 26/Aug/14

Status: Open
Project: Couchbase Server
Component/s: moxi
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: kay Assignee: Sergey Avseyev
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 6.5

Attachments: Text File normal.log     Text File problem.log    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
I have a 4-server cluster with four buckets. One of them is the default couchbase bucket with replica=1.

On one server, moxi's behavior is very strange; the third server's moxi lives its own life.
I telneted to moxi's port 11211 and tried to set test keys. The key appeared only on that server, not on the whole cluster. Also, the Couchbase monitoring tool doesn't show any activity on the cluster.

I've noticed that the problem moxi process listens on only three TCP ports:
{code}
netstat -nlpt | grep 30070
tcp 0 0 0.0.0.0:11211 0.0.0.0:* LISTEN 30070/moxi
tcp 0 0 :::11211 :::* LISTEN 30070/moxi
tcp 0 0 :::6696 :::* LISTEN 30070/moxi
{code}

Other servers' moxies have four listen ports:
{code}
netstat -nltp | grep 2577
tcp 0 0 0.0.0.0:11211 0.0.0.0:* LISTEN 2577/moxi
tcp 0 0 0.0.0.0:60593 0.0.0.0:* LISTEN 2577/moxi
tcp 0 0 :::11211 :::* LISTEN 2577/moxi
tcp 0 0 :::18347 :::* LISTEN 2577/moxi

netstat -nlpt | grep 23001
tcp 0 0 0.0.0.0:11211 0.0.0.0:* LISTEN 23001/moxi
tcp 0 0 0.0.0.0:11339 0.0.0.0:* LISTEN 23001/moxi
tcp 0 0 :::11211 :::* LISTEN 23001/moxi
tcp 0 0 :::5191 :::* LISTEN 23001/moxi

netstat -nlpt | grep 31535
tcp 0 0 0.0.0.0:11211 0.0.0.0:* LISTEN 31535/moxi
tcp 0 0 0.0.0.0:33578 0.0.0.0:* LISTEN 31535/moxi
tcp 0 0 :::11211 :::* LISTEN 31535/moxi
tcp 0 0 :::53475 :::* LISTEN 31535/moxi
{code}

So it seems that moxi on the problem server was not able to listen on one TCP port.

I've attached debug logs for the two servers: the problem server and a normal one.

The problem process is still running. Please let me know which logs you need for further investigation.
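
For what it's worth, the key test described above can be scripted. The sketch below is only an assumption about how the check was done (hostnames and the key name are placeholders): it sets a key through one node's moxi on port 11211 using the plain memcached ASCII protocol and reads it back through another node's.
{code}
# Sketch of the check described above: set a key via one node's moxi (port
# 11211, memcached ASCII protocol) and read it back via another node's.
# Hostnames and the key are placeholders.
import socket

def memcached_cmd(host, command, port=11211):
    """Send one ASCII-protocol command and return the first response chunk."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(command.encode("ascii"))
        return s.recv(4096).decode("ascii", "replace")

value = "hello"
set_cmd = "set probe_key 0 0 %d\r\n%s\r\n" % (len(value), value)
print(memcached_cmd("node1.example.com", set_cmd))              # expect STORED
print(memcached_cmd("node3.example.com", "get probe_key\r\n"))  # expect VALUE ... END
{code}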

 Comments   
Comment by kay [ 22/Aug/14 ]
I use couchbase-server-2.5.1-1083.x86_64
Comment by kay [ 22/Aug/14 ]
Please change the subproject to moxi for this issue.




[MB-11970] vb_active_perc_mem_resident reports 0% when there are 0 items Created: 15/Aug/14  Updated: 26/Aug/14  Resolved: 20/Aug/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Bryce Jasmer Assignee: Mike Wiederhold
Resolution: Fixed Votes: 0
Labels: rc2, stats
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I can argue both ways (0% or 100%) for what vb_active_perc_mem_resident should report when there are no items in the vbucket, but the safer choice for this condition seems to be to report that 100% of the items are resident. Reporting 0% suggests a bad situation in which everything has been flushed to disk; but since there isn't anything there at all, there is no error: everything that should be in memory is in memory, there just isn't anything to put in memory.
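
A one-line way to express the behavior being asked for (the names below are hypothetical, not the actual ep-engine/ns_server stat code): treat an empty vbucket as fully resident instead of 0% resident.
{code}
# Sketch only; names are hypothetical and this is not the actual stat code.
def active_resident_percentage(num_items, num_non_resident):
    if num_items == 0:
        return 100.0    # nothing to page in, so report everything as resident
    return 100.0 * (num_items - num_non_resident) / num_items

print(active_resident_percentage(0, 0))       # 100.0 rather than 0.0
print(active_resident_percentage(1000, 250))  # 75.0
{code}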


 Comments   
Comment by Mike Wiederhold [ 20/Aug/14 ]
http://review.couchbase.org/#/c/40756/
http://review.couchbase.org/#/c/40759/
Comment by Anil Kumar [ 20/Aug/14 ]
Minor stats issue "approved" to be included for RC2.
Comment by Aleksey Kondratenko [ 20/Aug/14 ]
Hm. I was not aware of ep-engine side change. ns_server change is merged and is now part of 3.0 manifest. http://review.couchbase.org/40761
Comment by Mike Wiederhold [ 20/Aug/14 ]
I'm backporting the ep-engine one, but this is related to the UI so the ep-engine change is not necessary for resolving this issue since it is not used by the UI.
Comment by Wayne Siu [ 26/Aug/14 ]
Reviewed by PM/Cihan, this ticket is approved for RC2.




[MB-12032] [Debian 7] Storage age about 10-15% more for high priority bucket only as opposed to low priority bucket only. Created: 19/Aug/14  Updated: 26/Aug/14  Resolved: 20/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Venu Uppalapati Assignee: Venu Uppalapati
Resolution: Fixed Votes: 0
Labels: RC2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged
Is this a Regression?: No

 Description   
Steps to reproduce:
1) Create a low priority bucket and insert 100k items.
2) Get the storage age statistics.
3) Delete the low priority bucket and create a high priority bucket; insert 100k items.
4) Get the storage age statistics.
5) Upon repeating this test, the storage age with only a high priority bucket is consistently 10-15% higher than with only a low priority bucket.

 Comments   
Comment by Sundar Sridharan [ 19/Aug/14 ]
fix uploaded for review at http://review.couchbase.org/#/c/40741 thanks
Comment by Sundar Sridharan [ 20/Aug/14 ]
Fix has been merged
Comment by Wayne Siu [ 26/Aug/14 ]
Reviewed by PM/Cihan, this ticket is approved for RC2.




[MB-12036] cbbackup throws error and exits with exception Created: 20/Aug/14  Updated: 26/Aug/14  Resolved: 20/Aug/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Ashvinder Singh Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: rc2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: All OSes

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Tested with build: 3.0.0-1174.rel

Setup: 3-node cluster with XDCR to another 3-node cluster.

cbbackup fails when trying to take a full backup of the cluster.
Command executed: /opt/couchbase/bin/cbbackup http://Administrator:password@172.23.106.71:8091 /tmp/backup -m full

Output from cbbackup:

vb_937:abs_high_seqno 0
vb_756:abs_high_seqno 0
vb_982:purge_seqno 0
vb_813:high_seqno 0
vb_921:high_seqno 0
vb_1008:uuid 54486033539771
vb_976:uuid 21079925921752


Exception in thread w2:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/opt/couchbase/lib/python/pump.py", line 286, in run_worker
    source_map, sink_map, self.ctl, curx).run()
  File "/opt/couchbase/lib/python/pump.py", line 344, in run
    rv_batch, batch = self.source.provide_batch()
  File "/opt/couchbase/lib/python/pump_dcp.py", line 113, in provide_batch
    rv, dcp_conn = self.get_dcp_conn()
  File "/opt/couchbase/lib/python/pump_dcp.py", line 395, in get_dcp_conn
    self.setup_dcp_streams()
  File "/opt/couchbase/lib/python/pump_dcp.py", line 475, in setup_dcp_streams
    if int(vbid) not in self.node_vbucket_map:
ValueError: invalid literal for int() with base 10: '751:abs'


The last build in which cbbackup was working: 3.0.0-1157. cbbackup was broken as of build 3.0.0-1158.
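
One plausible reading of the traceback (an assumption, not the actual pump_dcp.py code): the vbucket id is being extracted from a per-vbucket stats key such as 'vb_751:abs_high_seqno' by splitting on '_' and taking the second field, which yields '751:abs'. The sketch below only demonstrates that failure mode and a parse that extracts just the numeric vbucket id.
{code}
# Sketch only: the split below reproduces the '751:abs' value seen in the
# traceback; it is an assumption about the failure mode, not the actual
# pump_dcp.py code.
import re

def naive_vbid(stat_key):
    return stat_key.split('_')[1]          # 'vb_751:abs_high_seqno' -> '751:abs'

def safe_vbid(stat_key):
    m = re.match(r'vb_(\d+)', stat_key)    # take only the digits after 'vb_'
    return int(m.group(1)) if m else None

key = 'vb_751:abs_high_seqno'
print(naive_vbid(key))   # '751:abs' -> int() raises ValueError
print(safe_vbid(key))    # 751
{code}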

 Comments   
Comment by Bin Cui [ 20/Aug/14 ]
http://review.couchbase.org/#/c/40767/
Comment by Wayne Siu [ 26/Aug/14 ]
Reviewed by PM/Cihan, this ticket is approved for RC2.




[MB-11894] Documentation does not feel smooth between pages with a scroll bar and pages without. Created: 06/Aug/14  Updated: 26/Aug/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: MacOSX 64-bit
Is this a Regression?: Unknown

 Description   
This might be me nitpicking. When you flick between these two links it does not feel smooth:

http://docs.couchbase.com/prebuilt/couchbase-manual-3.0/Features/features.html
http://docs.couchbase.com/prebuilt/couchbase-manual-3.0/Features/bcp.html

Which are sequence sections on the left hand bar; "Couchbase 3.0 Features" to "Database Change Protocol"

 Comments   
Comment by Amy Kurtzman [ 26/Aug/14 ]
Yes, it might be nitpicking. :-)

Can you clarify what the problem is? I'm not really sure what you mean. Are you commenting on the physical function of the page or the textual content of the page?

The last sentence is unclear:
"Which are sequence sections on the left hand bar; "Couchbase 3.0 Features" to "Database Change Protocol"
Is that a question?

Also please keep in mind that this section will be revised for the GA release.
Comment by Patrick Varley [ 26/Aug/14 ]
"""
The last sentence is unclear:
"Which are sequence sections on the left hand bar; "Couchbase 3.0 Features" to "Database Change Protocol"
Is that a question?
"""
It was more an indication that it might be noticed easily, as the chapters are beside each other.

It looks like it only happens on my bigger display because the "features.html" page is longer and as a result Chrome puts a scroll bar in, which then causes everything to shift over to the left. I will record it when I'm in the office tomorrow.




[MB-12074] {Windows}:: Rebalance-in hangs Created: 26/Aug/14  Updated: 26/Aug/14