[MB-11573] replica items count mismatch on source cluster Created: 27/Jun/14  Updated: 11/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Sangharsh Agarwal Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-855

Ubuntu 12.04

Issue Links:
Duplicate
is duplicated by MB-11593 Active and replica items count mismat... Resolved
Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: [Source]
10.3.3.144 : https://s3.amazonaws.com/bugdb/jira/MB-11573/357ca392/10.3.3.144-6232014-1234-diag.zip
10.3.3.144 : https://s3.amazonaws.com/bugdb/jira/MB-11573/7f58a87c/10.3.3.144-diag.txt.gz
10.3.3.144 : https://s3.amazonaws.com/bugdb/jira/MB-11573/b7ffa0dc/10.3.3.144-6232014-122-couch.tar.gz
10.3.3.146 : https://s3.amazonaws.com/bugdb/jira/MB-11573/501d443d/10.3.3.146-diag.txt.gz
10.3.3.146 : https://s3.amazonaws.com/bugdb/jira/MB-11573/b43882dd/10.3.3.146-6232014-122-couch.tar.gz
10.3.3.146 : https://s3.amazonaws.com/bugdb/jira/MB-11573/cf026514/10.3.3.146-6232014-1231-diag.zip
10.3.3.147 : https://s3.amazonaws.com/bugdb/jira/MB-11573/07012cdf/10.3.3.147-6232014-122-couch.tar.gz
10.3.3.147 : https://s3.amazonaws.com/bugdb/jira/MB-11573/66b29090/10.3.3.147-6232014-1238-diag.zip
10.3.3.147 : https://s3.amazonaws.com/bugdb/jira/MB-11573/cba12a00/10.3.3.147-diag.txt.gz

[Destination]
10.3.3.142 : https://s3.amazonaws.com/bugdb/jira/MB-11573/68cbcc2c/10.3.3.142-6232014-1230-diag.zip
10.3.3.142 : https://s3.amazonaws.com/bugdb/jira/MB-11573/cd5167da/10.3.3.142-diag.txt.gz
10.3.3.142 : https://s3.amazonaws.com/bugdb/jira/MB-11573/d75b9f56/10.3.3.142-6232014-122-couch.tar.gz
10.3.3.143 : https://s3.amazonaws.com/bugdb/jira/MB-11573/4882073a/10.3.3.143-diag.txt.gz
10.3.3.143 : https://s3.amazonaws.com/bugdb/jira/MB-11573/92b171e4/10.3.3.143-6232014-1226-diag.zip
10.3.3.143 : https://s3.amazonaws.com/bugdb/jira/MB-11573/b04e77a9/10.3.3.143-6232014-122-couch.tar.gz
10.3.3.145 : https://s3.amazonaws.com/bugdb/jira/MB-11573/68cd72b2/10.3.3.145-diag.txt.gz
10.3.3.145 : https://s3.amazonaws.com/bugdb/jira/MB-11573/c4455351/10.3.3.145-6232014-122-couch.tar.gz
10.3.3.145 : https://s3.amazonaws.com/bugdb/jira/MB-11573/d7e4cc1f/10.3.3.145-6232014-1228-diag.zip
10.3.3.148 : https://s3.amazonaws.com/bugdb/jira/MB-11573/297c9ed8/10.3.3.148-6232014-1237-diag.zip
10.3.3.148 : https://s3.amazonaws.com/bugdb/jira/MB-11573/44a7d424/10.3.3.148-6232014-123-couch.tar.gz
10.3.3.148 : https://s3.amazonaws.com/bugdb/jira/MB-11573/d2e4f811/10.3.3.148-diag.txt.gz
Is this a Regression?: Unknown

 Description   
http://qa.hq.northscale.net/job/ubuntu_x64--01_02--rebalanceXDCR-P0/17/consoleFull

[Test]
./testrunner -i ubuntu_x64--01_02--rebalanceXDCR-P0.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True -t xdcr.rebalanceXDCR.Rebalance.async_rebalance_in,items=100000,rdirection=bidirection,ctopology=chain,doc-ops=update-delete,doc-ops-dest=update-delete,expires=60,rebalance=destination,num_rebalance=1,GROUP=P1


[Test Error]
[2014-06-23 11:46:00,302] - [task:440] WARNING - Not Ready: vb_replica_curr_items 80001 == 80000 expected on '10.3.3.146:8091''10.3.3.144:8091''10.3.3.147:8091', default bucket
[2014-06-23 11:46:05,339] - [task:440] WARNING - Not Ready: vb_replica_curr_items 80001 == 80000 expected on '10.3.3.146:8091''10.3.3.144:8091''10.3.3.147:8091', default bucket
[2014-06-23 11:46:10,379] - [task:440] WARNING - Not Ready: vb_replica_curr_items 80001 == 80000 expected on '10.3.3.146:8091''10.3.3.144:8091''10.3.3.147:8091', default bucket
[2014-06-23 11:46:15,433] - [task:440] WARNING - Not Ready: vb_replica_curr_items 80001 == 80000 expected on '10.3.3.146:8091''10.3.3.144:8091''10.3.3.147:8091', default bucket
[2014-06-23 11:46:20,463] - [task:440] WARNING - Not Ready: vb_replica_curr_items 80001 == 80000 expected on '10.3.3.146:8091''10.3.3.144:8091''10.3.3.147:8091', default bucket
[2014-06-23 11:46:25,498] - [task:440] WARNING - Not Ready: vb_replica_curr_items 80001 == 80000 expected on '10.3.3.146:8091''10.3.3.144:8091''10.3.3.147:8091', default bucket
[2014-06-23 11:46:31,528] - [task:440] WARNING - Not Ready: vb_replica_curr_items 80001 == 80000 expected on '10.3.3.146:8091''10.3.3.144:8091''10.3.3.147:8091', default bucket
[2014-06-23 11:46:36,566] - [task:440] WARNING - Not Ready: vb_replica_curr_items 80001 == 80000 expected on '10.3.3.146:8091''10.3.3.144:8091''10.3.3.147:8091', default bucket


[Test Steps]
1. Set up 3-node Source and Destination clusters.
2. Set up CAPI-mode replication between the default buckets.
3. Load 1M items on each cluster.
4. Add one node to the Destination cluster.
5. Perform 30% updates and 30% deletes on each cluster. Update items with an expiration time of 60 seconds.
6. Wait for the 60-second expiration time.
7. Expect 80000 items on each side.
8. One extra replica item was found on the Source side.

I ran this test twice on the same cluster but couldn't reproduce it on my own. Please see if you find anything suspicious in the logs.
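For context, the "Not Ready" warnings above come from a stat-polling wait in the test harness. A minimal sketch of that kind of check, using a hypothetical get_stat helper rather than the actual testrunner task:

import time

def wait_for_stat(get_stat, servers, bucket, stat, expected, timeout=300, interval=5):
    # get_stat(server, bucket, stat) is a hypothetical helper returning the integer
    # value of one memcached stat (e.g. vb_replica_curr_items) on a single node.
    deadline = time.time() + timeout
    while time.time() < deadline:
        total = sum(get_stat(s, bucket, stat) for s in servers)
        if total == expected:
            return True
        print("WARNING - Not Ready: %s %s == %s expected on %s, %s bucket"
              % (stat, total, expected, " ".join(servers), bucket))
        time.sleep(interval)
    return False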

 Comments   
Comment by Sangharsh Agarwal [ 27/Jun/14 ]
One more test failed with this error in the same job.
Comment by Abhinav Dangeti [ 27/Jun/14 ]
I couldn't reproduce it either with the test case you pointed out.
If you were able to reproduce it in one of the Jenkins jobs or by yourself, I'd appreciate it if you could point me to the cluster in that state.
Comment by Sangharsh Agarwal [ 30/Jun/14 ]
Abhinav,
   I am trying to reproduce it. As an update, this issue is occurring in various jobs on the latest build 3.0.0-884; approximately 5 tests have failed. If you need logs for those executions, please let me know and I can provide them now.
Comment by Abhinav Dangeti [ 30/Jun/14 ]
Sangharsh, I need the live cluster to debug this. Since neither of us is able to reproduce this issue, please keep re-running the test or monitoring the Jenkins job so that you can get a cluster into this state.
Comment by Sangharsh Agarwal [ 01/Jul/14 ]
Abhinav,
   The bug is reproduced on build 3.0.0-884.

[Test Logs]
https://friendpaste.com/5YBBomzEpWMeGiksM8dlHx

[Source]
10.5.2.231
10.5.2.232
10.5.2.233
10.5.2.234 -> Added node during test

[Destination]
10.5.2.228
10.5.2.229
10.5.2.230
10.3.5.68 -> Added node during test.

Cluster is Live for debugging.



[Test Error]
2014-07-01 00:02:05 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 80002 == 80000 expected on '10.5.2.232:8091''10.5.2.231:8091''10.5.2.234:8091''10.5.2.233:8091', default bucket
2014-07-01 00:02:11 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 80002 == 80000 expected on '10.5.2.232:8091''10.5.2.231:8091''10.5.2.234:8091''10.5.2.233:8091', default bucket
2014-07-01 00:02:18 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 80002 == 80000 expected on '10.5.2.232:8091''10.5.2.231:8091''10.5.2.234:8091''10.5.2.233:8091', default bucket
2014-07-01 00:02:23 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 80002 == 80000 expected on '10.5.2.232:8091''10.5.2.231:8091''10.5.2.234:8091''10.5.2.233:8091', default bucket
2014-07-01 00:02:29 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 80002 == 80000 expected on '10.5.2.232:8091''10.5.2.231:8091''10.5.2.234:8091''10.5.2.233:8091', default bucket
2014-07-01 00:02:35 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 80002 == 80000 expected on '10.5.2.232:8091''10.5.2.231:8091''10.5.2.234:8091''10.5.2.233:8091', default bucket
Comment by Abhinav Dangeti [ 01/Jul/14 ]
Thanks sangharsh, I'll let you know once I'm done.
Replica vbuckets 770 & 801 each have one delete fewer than their active counterparts.
Comment by Aruna Piravi [ 01/Jul/14 ]
Hit a similar problem in system tests where there is a difference of 1 item in bi-XDCR when both clusters are compared. Total items: ~100M. Will attach cbcollect info. Chiyoung thinks the root cause could be the same, so attaching logs to this MB.

Live clusters available for investigation: http://172.23.105.44:8091/ http://172.23.105.54:8091/
Comment by Aruna Piravi [ 01/Jul/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-11573/C1.tar
https://s3.amazonaws.com/bugdb/jira/MB-11573/C2.tar

Please let me know if you need the clusters. Thanks.
Comment by Sundar Sridharan [ 01/Jul/14 ]
In Aruna's cluster we see more of this issue…
$ ./cbvdiff 172.23.105.54:11210,172.23.105.55:11210,172.23.105.57:11210,172.23.105.58:11210,172.23.105.60:11210,172.23.105.61:11210,172.23.105.62:11210,172.23.105.63:11210 -b standardbucket
VBucket 42: active count 96859 != 96860 replica count

VBucket 50: active count 96923 != 96924 replica count

VBucket 94: active count 96918 != 96919 replica count

VBucket 196: active count 96791 != 96792 replica count

VBucket 391: active count 96911 != 96912 replica count

VBucket 418: active count 97009 != 97010 replica count

VBucket 427: active count 96772 != 96773 replica count

VBucket 488: active count 96717 != 96718 replica count

VBucket 787: active count 96729 != 96730 replica count

Active item count = 99136544
---
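For reference, the per-vbucket comparison that cbvdiff prints can be approximated offline once per-node, per-vbucket item counts have been collected (how they are collected -- cbstats, couch_dbinfo, etc. -- is outside this sketch):

from collections import defaultdict

def diff_vbucket_counts(active_per_node, replica_per_node):
    # active_per_node / replica_per_node: lists of {vbucket_id: item_count} dicts,
    # one per node; counts are summed across nodes before comparing.
    active, replica = defaultdict(int), defaultdict(int)
    for counts in active_per_node:
        for vb, n in counts.items():
            active[vb] += n
    for counts in replica_per_node:
        for vb, n in counts.items():
            replica[vb] += n
    for vb in sorted(set(active) | set(replica)):
        if active[vb] != replica[vb]:
            print("VBucket %d: active count %d != %d replica count"
                  % (vb, active[vb], replica[vb]))
    print("Active item count = %d" % sum(active.values()))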
Comment by Sundar Sridharan [ 01/Jul/14 ]
On node 10.5.2.234 we see one item in the vb_replica write queue for vbucket 770 that does not decrement to zero.
Comment by Sundar Sridharan [ 02/Jul/14 ]
Since this issue seems to be associated with deleteWithMeta not being replicated to the replica node, I have created a new toy build that logs deletes on the replica.
Could you please help reproduce this with the toy build
couchbase-server-community_cent58-3.0.0-toy-sundar-x86_64_3.0.0.rpm?
Thanks in advance.
Comment by Sangharsh Agarwal [ 03/Jul/14 ]
Can you please merge your changes? I will verify with the updated RPM.
Comment by Sundar Sridharan [ 03/Jul/14 ]
Sangharsh, I cannot merge these changes because we do not want to log these per-document messages at any log level; otherwise they would easily mask other important messages.
The toy build for this is http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_cent58-3.0.0-toy-sundar-x86_64_3.0.0-701-toy.rpm

If this cannot be done, please let me know. Thanks.
Comment by Sangharsh Agarwal [ 07/Jul/14 ]
The test cases are running; I will update you once they are finished.
Comment by Sangharsh Agarwal [ 07/Jul/14 ]
Sundar, XDCR encryption is disabled in the Community edition, so I cannot run the tests on this toy build. Can you please provide a toy build on the Enterprise edition or for Ubuntu (Debian package)?
Comment by Sundar Sridharan [ 07/Jul/14 ]
Sangharsh, it looks like we do not have Ubuntu toy builders at the moment due to infrastructure issues, but we believe this problem should be reproducible on CentOS as well. Could you please help reproduce this issue on CentOS machines? Thanks.
Comment by Sundar Sridharan [ 07/Jul/14 ]
Sangharsh, there are a few CentOS machines that I would like to use to see if I can reproduce the issue on my own as well. Could you please share the contents of the file ubuntu_x64--01_02--rebalanceXDCR-P0.ini mentioned in the command:
./testrunner -i ubuntu_x64--01_02--rebalanceXDCR-P0.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True -t xdcr.rebalanceXDCR.Rebalance.async_rebalance_in,items=100000,rdirection=bidirection,ctopology=chain,doc-ops=update-delete,doc-ops-dest=update-delete,expires=60,rebalance=destination,num_rebalance=1,GROUP=P1
Also, please let me know if there are any specific test instructions beyond the above.
Thanks in advance.
Comment by Sangharsh Agarwal [ 08/Jul/14 ]
Sundar,
    Sorry if I was not clear in my last comment:

1. This bug occurred on Ubuntu VMs initially and is still occurring on Ubuntu.
2. On build 3.0.0-918, this issue only occurred with Rebalance + XDCR SSL, not with plain XDCR, and SSL is not enabled in this community toy build.

You cannot verify the bug if the test is run without the above.

Additionally, ubuntu_x64--01_02--rebalanceXDCR-P0.ini contains the Ubuntu VMs.

Anyway, the CentOS VMs are listed below; please go ahead with verification.

[global]
username:root
password:couchbase
port:8091

[cluster1]
1:_1
2:_2
3:_3


[cluster2]
4:_4
5:_5
6:_6

[servers]
1:_1
2:_2
3:_3
4:_4
5:_5
6:_6
7:_7
8:_8

[_1]
ip:10.5.2.228

[_2]
ip:10.5.2.229

[_3]
ip:10.5.2.230

[_4]
ip:10.5.2.231

[_5]
ip:10.5.2.232

[_6]
ip:10.5.2.233

[_7]
ip:10.5.2.234

[_8]
ip:10.3.5.68

[membase]
rest_username:Administrator
rest_password:password
Comment by Sangharsh Agarwal [ 08/Jul/14 ]
Sundar,
    I am able to reproduce the bug with the toy build as well:

[Test Log]
test.log : https://s3.amazonaws.com/bugdb/jira/MB-11573/6f3ccc42/test.log

[Server Logs]

[Source]
10.5.2.231 : https://s3.amazonaws.com/bugdb/jira/MB-11573/3fa17d4c/10.5.2.231-782014-623-diag.zip
10.5.2.231 : https://s3.amazonaws.com/bugdb/jira/MB-11573/5116e212/10.5.2.231-diag.txt.gz
10.5.2.232 : https://s3.amazonaws.com/bugdb/jira/MB-11573/05bbeb62/10.5.2.232-diag.txt.gz
10.5.2.232 : https://s3.amazonaws.com/bugdb/jira/MB-11573/15b11b01/10.5.2.232-782014-67-couch.tar.gz
10.5.2.232 : https://s3.amazonaws.com/bugdb/jira/MB-11573/d388f019/10.5.2.232-782014-621-diag.zip
10.5.2.233 : https://s3.amazonaws.com/bugdb/jira/MB-11573/8b59760a/10.5.2.233-diag.txt.gz
10.5.2.233 : https://s3.amazonaws.com/bugdb/jira/MB-11573/bf4a6786/10.5.2.233-782014-626-diag.zip

[Destination]
10.5.2.228 : https://s3.amazonaws.com/bugdb/jira/MB-11573/0c3e02f3/10.5.2.228-782014-67-couch.tar.gz
10.5.2.228 : https://s3.amazonaws.com/bugdb/jira/MB-11573/cac82b8b/10.5.2.228-782014-616-diag.zip
10.5.2.228 : https://s3.amazonaws.com/bugdb/jira/MB-11573/e24d1ab4/10.5.2.228-diag.txt.gz
10.5.2.229 : https://s3.amazonaws.com/bugdb/jira/MB-11573/307bc81f/10.5.2.229-782014-67-couch.tar.gz
10.5.2.229 : https://s3.amazonaws.com/bugdb/jira/MB-11573/ac67e5f4/10.5.2.229-782014-619-diag.zip
10.5.2.229 : https://s3.amazonaws.com/bugdb/jira/MB-11573/cbcbada6/10.5.2.229-diag.txt.gz
10.5.2.230 : https://s3.amazonaws.com/bugdb/jira/MB-11573/0ee89b4a/10.5.2.230-782014-618-diag.zip
10.5.2.230 : https://s3.amazonaws.com/bugdb/jira/MB-11573/66d42064/10.5.2.230-782014-67-couch.tar.gz
10.5.2.230 : https://s3.amazonaws.com/bugdb/jira/MB-11573/7cbbe047/10.5.2.230-diag.txt.gz


[Test Steps]
1. Set up 3-node Source and Destination clusters.
2. Set up CAPI-mode bi-directional XDCR.
3. Load 1M items on each cluster asynchronously.
4. Rebalance out 2 nodes from the Source cluster during the data load.
5. After the rebalance, verify the items on each cluster. The test failed because the Destination cluster has 2 fewer items than the Source.

[2014-07-08 06:00:44,001] - [task:440] WARNING - Not Ready: vb_replica_curr_items 199998 == 200000 expected on '10.5.2.228:8091''10.5.2.230:8091''10.5.2.229:8091', default bucket
[2014-07-08 06:00:47,033] - [task:440] WARNING - Not Ready: curr_items 199998 == 200000 expected on '10.5.2.228:8091''10.5.2.230:8091''10.5.2.229:8091', default bucket
[2014-07-08 06:00:48,062] - [task:440] WARNING - Not Ready: vb_active_curr_items 199998 == 200000 expected on '10.5.2.228:8091''10.5.2.230:8091''10.5.2.229:8091', default bucket


Comment by Sangharsh Agarwal [ 08/Jul/14 ]
I am re-running the test to leave a live cluster for you to investigate.
Comment by Sundar Sridharan [ 08/Jul/14 ]
This is interesting; it looks like the symptoms here are quite different from the initial ones mentioned in the bug. There is no mismatch between active and replica items on the destination cluster.
./cbvdiff 10.5.2.228:11210,10.5.2.230:11210,10.5.2.229:11210
Active item count = 199999
Comment by Sangharsh Agarwal [ 08/Jul/14 ]
I logged https://www.couchbase.com/issues/browse/MB-11593 for this issue too, but it was marked as a duplicate of this one.
Comment by Sangharsh Agarwal [ 08/Jul/14 ]
The cluster is live for investigation. You can use the VMs to investigate or to run the test.
Comment by Sundar Sridharan [ 08/Jul/14 ]
Thanks Sangharsh, just to confirm: 10.5.2.231 and 10.5.2.233 were the nodes from the source cluster that were rebalanced out, right?
I am on the cluster right now. Please let me know if you need it back.
Comment by Sundar Sridharan [ 08/Jul/14 ]
vbucket 303 has 212 items on the source and only 211 items on the destination.
The id loadOne5836 is present on 10.5.2.232 (src) but not on destination node 10.5.2.228.
Comment by Sangharsh Agarwal [ 08/Jul/14 ]
Yes, 10.5.2.231 and 10.5.2.233 were rebalanced out.
Comment by Sundar Sridharan [ 08/Jul/14 ]
Sangharsh, were all keys with prefix loadOne inserted into cluster 1 (10.5.2.231, 10.5.2.232, 10.5.2.233), while all keys with prefix loadTwo were inserted into cluster 2 (10.5.2.228, 10.5.2.229, 10.5.2.230)?
Also, could you please tell us whether the workload overlaps the key space across the source and destination clusters (which would mean loadOne keys can be inserted into both cluster 1 and cluster 2)?
Comment by Aruna Piravi [ 08/Jul/14 ]
Sangharsh is not available at this time, so I'm answering Sundar's question.

> Sangharsh, so were all keys with prefix loadOne were inserted into cluster 1 comprising of 10.5.2.231, 10.5.2.232 and 10.5.2.233 while all keys with prefix loadTwo were inserted into cluster 2 comprising of 10.5.2.228, 10.5.2.229, 10.5.2.230?
Yes, you are correct. loadOne* goes to all servers listed under [cluster1] in .ini, loadTwo* gets loaded to servers listed under [cluster2].

>Also could you please tell us if the workload overlaps the key space across the source and destination clusters (which means loadOne keys can be inserted both into cluster 1 as well as cluster 2) ?
I checked the code; we are not doing updates/deletes on overlapping key spaces in this test.
Comment by Sundar Sridharan [ 08/Jul/14 ]
It looks like on the Producer the start seqno has skipped one item:
memcached.log.14.txt:19313:Tue Jul 8 10:18:29.198224 PDT 3: (default) UPR (Producer) eq_uprq:xdcr:default-6098641df836bdbfff9953ad74a05bbe - (vb 303) stream created with start seqno 0 and end seqno 21
memcached.log.14.txt:21205:Tue Jul 8 10:18:35.825112 PDT 3: (default) UPR (Producer) eq_uprq:xdcr:default-0412be9c4abad09bd0892f2d827d7f5f - (vb 303) stream created with start seqno 22 and end seqno 24 <<<<<<<<<<<<<<<<---------------------!!
memcached.log.14.txt:23783:Tue Jul 8 10:18:49.391231 PDT 3: (default) UPR (Producer) eq_uprq:xdcr:default-21d7493b0123d12d6ac5723df80b154f - (vb 303) stream created with start seqno 24 and end seqno 26
memcached.log.14.txt:43257:Tue Jul 8 10:20:07.665858 PDT 3: (default) UPR (Producer) eq_uprq:xdcr:default-886c34d2cd6b30b88c77ededf205e086 - (vb 303) stream created with start seqno 26 and end seqno 105

And we see that the missing item (loadOne5836) also has the seqno 21
Doc seq: 21
     id: loadOne5836
     rev: 1
     content_meta: 131
     size (on disk): 40
     cas: 731155696354949, expiry: 0, flags: 0
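A quick way to spot this kind of gap is to scan the memcached log for the "stream created with start seqno X and end seqno Y" lines and check, per vbucket, that each new stream starts where the previous one ended (as the later streams for vb 303 do). A rough sketch, assuming the log line format stays as shown above:

import re
import sys

STREAM_RE = re.compile(r"\(vb (\d+)\) stream created with start seqno (\d+) and end seqno (\d+)")

def find_seqno_gaps(log_path):
    last_end = {}
    for line in open(log_path):
        m = STREAM_RE.search(line)
        if not m:
            continue
        vb, start, end = (int(x) for x in m.groups())
        # a stream that starts beyond the previous end for this vbucket skips seqnos
        if vb in last_end and start > last_end[vb]:
            print("vb %d: previous stream ended at seqno %d but the next one starts at %d"
                  % (vb, last_end[vb], start))
        last_end[vb] = max(last_end.get(vb, 0), end)

if __name__ == "__main__":
    find_seqno_gaps(sys.argv[1])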
Comment by Sundar Sridharan [ 08/Jul/14 ]
fix uploaded at http://review.couchbase.org/#/c/39224/
Comment by Chiyoung Seo [ 08/Jul/14 ]
The fix was merged. Please retest it when the new build is ready.
Comment by Sangharsh Agarwal [ 09/Jul/14 ]
The issue occurred again on the latest build, 3.0.0-942:

[Jenkin]
http://qa.hq.northscale.net/job/ubuntu_x64--01_02--rebalanceXDCR-P0/23/consoleFull

[Test]
./testrunner -i ubuntu_x64--01_02--rebalanceXDCR-P0.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True -t xdcr.rebalanceXDCR.Rebalance.async_rebalance_in,items=100000,rdirection=unidirection,ctopology=chain,doc-ops=update-delete,expires=60,rebalance=source-destination,num_rebalance=1,GROUP=P1


[Test Logs]
[2014-07-08 23:58:42,637] - [task:456] WARNING - Not Ready: vb_replica_curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:44,677] - [task:456] WARNING - Not Ready: curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:46,718] - [task:456] WARNING - Not Ready: vb_active_curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:47,743] - [task:456] WARNING - Not Ready: vb_replica_curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:49,784] - [task:456] WARNING - Not Ready: curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:51,827] - [task:456] WARNING - Not Ready: vb_active_curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:52,873] - [task:456] WARNING - Not Ready: vb_replica_curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:54,913] - [task:456] WARNING - Not Ready: curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:56,952] - [task:456] WARNING - Not Ready: vb_active_curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket
[2014-07-08 23:58:57,994] - [task:456] WARNING - Not Ready: vb_replica_curr_items 54999 == 40000 expected on '10.3.3.143:8091''10.3.3.145:8091''10.3.3.142:8091''10.3.3.149:8091', default bucket

Metadata mismatches were found too, which shows that many deletions were not replicated to the destination cluster:

[2014-07-09 00:05:17,634] - [xdcrbasetests:1255] INFO - Verifying RevIds for 10.3.3.146 -> 10.3.3.143, bucket: default
[2014-07-09 00:05:17,950] - [data_helper:289] INFO - creating direct client 10.3.3.144:11210 default
[2014-07-09 00:05:18,421] - [data_helper:289] INFO - creating direct client 10.3.3.146:11210 default
[2014-07-09 00:05:18,823] - [data_helper:289] INFO - creating direct client 10.3.3.148:11210 default
[2014-07-09 00:05:19,226] - [data_helper:289] INFO - creating direct client 10.3.3.147:11210 default
[2014-07-09 00:05:19,876] - [data_helper:289] INFO - creating direct client 10.3.3.142:11210 default
[2014-07-09 00:05:20,316] - [data_helper:289] INFO - creating direct client 10.3.3.143:11210 default
[2014-07-09 00:05:20,771] - [data_helper:289] INFO - creating direct client 10.3.3.149:11210 default
[2014-07-09 00:05:21,188] - [data_helper:289] INFO - creating direct client 10.3.3.145:11210 default
[2014-07-09 00:06:30,464] - [task:1161] INFO - RevId Verification : 40000 existing items have been verified
[2014-07-09 00:06:30,478] - [task:1220] ERROR - ===== Verifying rev_ids failed for key: loadOne80026 =====
[2014-07-09 00:06:30,478] - [task:1221] ERROR - deleted mismatch: Source deleted:1, Destination deleted:0, Error Count:1
[2014-07-09 00:06:30,478] - [task:1221] ERROR - seqno mismatch: Source seqno:2, Destination seqno:1, Error Count:2
[2014-07-09 00:06:30,478] - [task:1221] ERROR - cas mismatch: Source cas:10784420840980343, Destination cas:10784420840980342, Error Count:3
[2014-07-09 00:06:30,479] - [task:1222] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 10784420840980343, 'flags': 0, 'expiration': 1404888661}
[2014-07-09 00:06:30,479] - [task:1223] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 10784420840980342, 'flags': 0, 'expiration': 0}
[2014-07-09 00:06:30,487] - [task:1220] ERROR - ===== Verifying rev_ids failed for key: loadOne6230 =====
[2014-07-09 00:06:30,488] - [task:1221] ERROR - deleted mismatch: Source deleted:1, Destination deleted:0, Error Count:4
[2014-07-09 00:06:30,488] - [task:1221] ERROR - seqno mismatch: Source seqno:3, Destination seqno:1, Error Count:5
[2014-07-09 00:06:30,488] - [task:1221] ERROR - cas mismatch: Source cas:10784931188024077, Destination cas:10784393526456264, Error Count:6
[2014-07-09 00:06:30,489] - [task:1222] ERROR - Source meta data: {'deleted': 1, 'seqno': 3, 'cas': 10784931188024077, 'flags': 0, 'expiration': 1404889014}
[2014-07-09 00:06:30,489] - [task:1223] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 10784393526456264, 'flags': 0, 'expiration': 0}
[2014-07-09 00:06:30,494] - [task:1220] ERROR - ===== Verifying rev_ids failed for key: loadOne77329 =====
[2014-07-09 00:06:30,494] - [task:1221] ERROR - deleted mismatch: Source deleted:1, Destination deleted:0, Error Count:7
[2014-07-09 00:06:30,495] - [task:1221] ERROR - seqno mismatch: Source seqno:2, Destination seqno:1, Error Count:8
[2014-07-09 00:06:30,495] - [task:1221] ERROR - cas mismatch: Source cas:10784419815115230, Destination cas:10784419815115229, Error Count:9
[2014-07-09 00:06:30,495] - [task:1222] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 10784419815115230, 'flags': 0, 'expiration': 1404888644}
[2014-07-09 00:06:30,495] - [task:1223] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 10784419815115229, 'flags': 0, 'expiration': 0}
[2014-07-09 00:06:30,508] - [task:1220] ERROR - ===== Verifying rev_ids failed for key: loadOne90011 =====
[2014-07-09 00:06:30,509] - [task:1221] ERROR - deleted mismatch: Source deleted:1, Destination deleted:0, Error Count:10
[2014-07-09 00:06:30,509] - [task:1221] ERROR - seqno mismatch: Source seqno:2, Destination seqno:1, Error Count:11
[2014-07-09 00:06:30,509] - [task:1221] ERROR - cas mismatch: Source cas:10784424284736794, Destination cas:10784424284736793, Error Count:12
[2014-07-09 00:06:30,509] - [task:1222] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 10784424284736794, 'flags': 0, 'expiration': 1404888730}
[2014-07-09 00:06:30,510] - [task:1223] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 10784424284736793, 'flags': 0, 'expiration': 0}
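The rev-id verification above boils down to a field-by-field comparison of source and destination metadata for each key. A minimal sketch of that comparison, taking already-fetched metadata dictionaries as input (fetching them, e.g. via a getMeta call, is left out of this sketch):

def compare_rev_meta(key, src_meta, dest_meta, fields=("deleted", "seqno", "cas")):
    # src_meta / dest_meta look like the dicts logged above, e.g.
    # {'deleted': 1, 'seqno': 2, 'cas': ..., 'flags': 0, 'expiration': ...}
    mismatches = ["%s mismatch: Source %s:%s, Destination %s:%s"
                  % (f, f, src_meta.get(f), f, dest_meta.get(f))
                  for f in fields if src_meta.get(f) != dest_meta.get(f)]
    if mismatches:
        print("===== Verifying rev_ids failed for key: %s =====" % key)
        for msg in mismatches:
            print(msg)
        print("Source meta data: %s" % src_meta)
        print("Dest meta data: %s" % dest_meta)
    return not mismatches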


[Test Steps]
1. Set up 3-node Source and Destination clusters.
2. Set up uni-directional CAPI-mode XDCR from Source to Destination.
3. Load 1M items on the Source.
4. Add 1 node to the Source cluster and 1 node to the Destination cluster.
5. Update 30% of the items (with an expiration time of 60 seconds) and delete 30% of the items on the Source.
6. Verify the items. Fewer items were found on the Destination cluster.

[Server Logs]

[Source]
10.3.3.144 : https://s3.amazonaws.com/bugdb/jira/MB-11573/2956573a/10.3.3.144-diag.txt.gz
10.3.3.144 : https://s3.amazonaws.com/bugdb/jira/MB-11573/bc5a1183/10.3.3.144-792014-022-diag.zip
10.3.3.144 : https://s3.amazonaws.com/bugdb/jira/MB-11573/e54806e5/10.3.3.144-792014-06-couch.tar.gz
10.3.3.146 : https://s3.amazonaws.com/bugdb/jira/MB-11573/86ea590a/10.3.3.146-792014-019-diag.zip
10.3.3.146 : https://s3.amazonaws.com/bugdb/jira/MB-11573/95f08cd8/10.3.3.146-diag.txt.gz
10.3.3.146 : https://s3.amazonaws.com/bugdb/jira/MB-11573/bd342640/10.3.3.146-792014-06-couch.tar.gz
10.3.3.148 : https://s3.amazonaws.com/bugdb/jira/MB-11573/26c6ec6e/10.3.3.148-792014-06-couch.tar.gz
10.3.3.148 : https://s3.amazonaws.com/bugdb/jira/MB-11573/545c1e7c/10.3.3.148-792014-023-diag.zip
10.3.3.148 : https://s3.amazonaws.com/bugdb/jira/MB-11573/eb81965a/10.3.3.148-diag.txt.gz
10.3.3.147 : https://s3.amazonaws.com/bugdb/jira/MB-11573/53ab8e05/10.3.3.147-792014-024-diag.zip
10.3.3.147 : https://s3.amazonaws.com/bugdb/jira/MB-11573/c0a49871/10.3.3.147-diag.txt.gz
10.3.3.147 : https://s3.amazonaws.com/bugdb/jira/MB-11573/e05fb7f2/10.3.3.147-792014-06-couch.tar.gz

10.3.3.148 -> Added node at Source.

[Destination]
10.3.3.142 : https://s3.amazonaws.com/bugdb/jira/MB-11573/29a36a28/10.3.3.142-792014-06-couch.tar.gz
10.3.3.142 : https://s3.amazonaws.com/bugdb/jira/MB-11573/60d75961/10.3.3.142-diag.txt.gz
10.3.3.142 : https://s3.amazonaws.com/bugdb/jira/MB-11573/85967976/10.3.3.142-792014-018-diag.zip
10.3.3.143 : https://s3.amazonaws.com/bugdb/jira/MB-11573/704327af/10.3.3.143-792014-06-couch.tar.gz
10.3.3.143 : https://s3.amazonaws.com/bugdb/jira/MB-11573/7aeb0fe5/10.3.3.143-diag.txt.gz
10.3.3.143 : https://s3.amazonaws.com/bugdb/jira/MB-11573/81482a1f/10.3.3.143-792014-015-diag.zip
10.3.3.149 : https://s3.amazonaws.com/bugdb/jira/MB-11573/b88f45cf/10.3.3.149-792014-07-couch.tar.gz
10.3.3.149 : https://s3.amazonaws.com/bugdb/jira/MB-11573/d070e995/10.3.3.149-diag.txt.gz
10.3.3.149 : https://s3.amazonaws.com/bugdb/jira/MB-11573/e9d83bfb/10.3.3.149-792014-026-diag.zip
10.3.3.145 : https://s3.amazonaws.com/bugdb/jira/MB-11573/49a3f167/10.3.3.145-diag.txt.gz
10.3.3.145 : https://s3.amazonaws.com/bugdb/jira/MB-11573/666f3c03/10.3.3.145-792014-06-couch.tar.gz
10.3.3.145 : https://s3.amazonaws.com/bugdb/jira/MB-11573/c4ddf1ca/10.3.3.145-792014-017-diag.zip

10.3.3.149 -> Added node at destination
Comment by Sundar Sridharan [ 09/Jul/14 ]
Sangharsh, could you please try to reproduce this issue with the toy build couchbase-server-community_cent58-3.0.0-toy-sundar-x86_64_3.0.0-702-toy.rpm, which contains the latest ep-engine fixes along with the logging for deleted items on the replica?
Also, it would be great if you could leave the cluster running when the issue reproduces.
Thanks
Comment by Aruna Piravi [ 10/Jul/14 ]
I could not reproduce this mismatch on Sundar's toy build. Active and replica items match. In any case, this problem is not consistent, so this run may not be of much help.

Arunas-MacBook-Pro:testrunner apiravi$ ./scripts/ssh.py -i bixdcr.ini "/opt/couchbase/bin/cbstats localhost:11210 all|grep vb_replica_curr_items"
10.3.4.189
 vb_replica_curr_items: 26636

10.3.4.186
 vb_replica_curr_items: 40000

10.3.4.187
 vb_replica_curr_items: 40000

10.3.4.190
 vb_replica_curr_items: 26660

10.3.4.188
 vb_replica_curr_items: 26704

Arunas-MacBook-Pro:testrunner apiravi$ ./scripts/ssh.py -i bixdcr.ini "/opt/couchbase/bin/cbstats localhost:11210 all|grep vb_active_curr_items"
10.3.4.186
 vb_active_curr_items: 40000

10.3.4.187
 vb_active_curr_items: 40000

10.3.4.189
 vb_active_curr_items: 26644

10.3.4.188
 vb_active_curr_items: 26727

10.3.4.190
 vb_active_curr_items: 26629

Arunas-MacBook-Pro:testrunner apiravi$
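As a quick sanity check on output like the above: with one replica configured, the cluster-wide vb_replica_curr_items total should equal the vb_active_curr_items total. A small sketch of that summation over already-parsed per-node stats (parsing the cbstats output itself is assumed done elsewhere):

def check_replica_totals(stats_per_node, num_replicas=1):
    # stats_per_node: {node_ip: {'vb_active_curr_items': int, 'vb_replica_curr_items': int}}
    active = sum(s["vb_active_curr_items"] for s in stats_per_node.values())
    replica = sum(s["vb_replica_curr_items"] for s in stats_per_node.values())
    ok = replica == active * num_replicas
    print("active=%d replica=%d expected_replica=%d -> %s"
          % (active, replica, active * num_replicas, "OK" if ok else "MISMATCH"))
    return ok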
Comment by Sangharsh Agarwal [ 10/Jul/14 ]
Yes, this bug doesn't occur every time.

I have started a rebalance job with this toy build: http://qa.hq.northscale.net/job/centos_x64--107_01--rebalanceXDCR-P1/13/consoleFull.


Comment by Sundar Sridharan [ 11/Jul/14 ]
Sangharsh, if the issue does not reproduce with the toy build, please feel free to use the latest daily build. It is only important that the cluster be left as-is if the issue reoccurs. Thanks.




[MB-11672] Missing items in index after rebalance (Intermittent failure) Created: 08/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Sarath Lakshman Assignee: Meenakshi Goel
Resolution: Unresolved Votes: 0
Labels: releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates MB-11641 {UPR}:: Reading from views timing out... Closed
Relates to
relates to MB-11371 Corruption in PartitionVersions struc... Resolved
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Test:
NODES=4 TEST=rebalance.rebalanceinout.RebalanceInOutTests.measure_time_index_during_rebalance,items=200000,data_perc_add=30,nodes_init=3,nodes_in=1,skip_cleanup=True,nodes_out=1,num_ddocs=2,num_views=2,max_verify=50000,value_size=1024,GROUP=IN_OUT make any-test

About once in every three or four runs, the view query results contain fewer items than the expected number.
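One way to check for this by hand is to query each view with stale=false and compare the returned row count against the expected document count. A minimal sketch against the standard CAPI view endpoint on port 8092 (the ddoc/view names below are placeholders for whatever the test creates):

import json
import urllib.request

def view_row_count(host, bucket, ddoc, view, port=8092):
    url = ("http://%s:%d/%s/_design/%s/_view/%s?stale=false&limit=0"
           % (host, port, bucket, ddoc, view))
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    # total_rows reflects the full index size even with limit=0
    return body.get("total_rows", len(body.get("rows", [])))

# e.g. compare against the expected item count after rebalance:
# assert view_row_count("172.23.107.24", "default", "ddoc0", "view0") == 200000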

Logs:
https://s3.amazonaws.com/bugdb/jira/MB-11371/f9ad56ee/172.23.107.24-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11371/07e24114/172.23.107.25-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11371/a9c9a36d/172.23.107.26-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11371/2517f70b/172.23.107.27-diag.zip

 Comments   
Comment by Parag Agarwal [ 08/Jul/14 ]
What is the output of the test run?
Comment by Parag Agarwal [ 08/Jul/14 ]
Sarath: Did you hit this issue while verifying https://www.couchbase.com/issues/browse/MB-11641?
Comment by Parag Agarwal [ 08/Jul/14 ]
Saw this in build 935.
Comment by Sriram Melkote [ 10/Jul/14 ]
Sarath mentioned that on today's codebase we're not hitting it; it's not clear if it's just reduced in frequency or was fixed by recent changes. Will update again.
Comment by Sarath Lakshman [ 10/Jul/14 ]
I spent a lot of time trying to reproduce this on the latest code. Meenakshi, can you see if it can be reproduced with the latest build?




[MB-11632] MemcachedError: Memcached error #7 'Not my vbucket': Created: 03/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Sangharsh Agarwal Assignee: Sangharsh Agarwal
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-918

Issue Links:
Duplicate
duplicates MB-11345 {UPR}: MemcachedError: Memcached erro... Resolved
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: 10.3.5.68 : https://s3.amazonaws.com/bugdb/jira/MB-11632/100dfc8a/10.3.5.68-722014-751-diag.zip
10.3.5.68 : https://s3.amazonaws.com/bugdb/jira/MB-11632/237dba2b/10.3.5.68-diag.txt.gz
10.5.2.228 : https://s3.amazonaws.com/bugdb/jira/MB-11632/24138e56/10.5.2.228-722014-736-diag.zip
10.5.2.228 : https://s3.amazonaws.com/bugdb/jira/MB-11632/c8b1fc6a/10.5.2.228-diag.txt.gz
10.5.2.229 : https://s3.amazonaws.com/bugdb/jira/MB-11632/5ad9bbe6/10.5.2.229-diag.txt.gz
10.5.2.229 : https://s3.amazonaws.com/bugdb/jira/MB-11632/b27540e8/10.5.2.229-722014-739-diag.zip
10.5.2.230 : https://s3.amazonaws.com/bugdb/jira/MB-11632/430d06d3/10.5.2.230-diag.txt.gz
10.5.2.230 : https://s3.amazonaws.com/bugdb/jira/MB-11632/4b889db9/10.5.2.230-722014-741-diag.zip
10.5.2.231 : https://s3.amazonaws.com/bugdb/jira/MB-11632/55491803/10.5.2.231-diag.txt.gz
10.5.2.231 : https://s3.amazonaws.com/bugdb/jira/MB-11632/c67b87d4/10.5.2.231-722014-743-diag.zip
10.5.2.232 : https://s3.amazonaws.com/bugdb/jira/MB-11632/844d3c91/10.5.2.232-722014-747-diag.zip
10.5.2.232 : https://s3.amazonaws.com/bugdb/jira/MB-11632/c34de89d/10.5.2.232-diag.txt.gz
10.5.2.233 : https://s3.amazonaws.com/bugdb/jira/MB-11632/acff04e6/10.5.2.233-diag.txt.gz
10.5.2.233 : https://s3.amazonaws.com/bugdb/jira/MB-11632/fabedc0b/10.5.2.233-722014-749-diag.zip
10.5.2.234 : https://s3.amazonaws.com/bugdb/jira/MB-11632/2f79e1e6/10.5.2.234-722014-750-diag.zip
10.5.2.234 : https://s3.amazonaws.com/bugdb/jira/MB-11632/4ff680d5/10.5.2.234-diag.txt.gz
test.logging.conf : https://s3.amazonaws.com/bugdb/jira/MB-11632/7645b00a/test.logging.conf
test.log : https://s3.amazonaws.com/bugdb/jira/MB-11632/42a3e2e9/test.log
Is this a Regression?: Unknown

 Description   
http://qa.hq.northscale.net/job/centos_x64--107_01--rebalanceXDCR-P1/10/consoleFull

[Test]
./testrunner -i centos_x64--107_01--rebalanceXDCR-P1.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True -t xdcr.rebalanceXDCR.Rebalance.async_rebalance_out,items=100000,rdirection=bidirection,async_load=True,ctopology=chain,expires=60,rebalance=source,num_rebalance=2,max_verify=10000,GROUP=P1


[Operation Failed]
Starting rebalance-out nodes:['10.5.2.229', '10.5.2.230'] at source cluster 10.5.2.228


[Test Error]
ERROR: async_rebalance_out (xdcr.rebalanceXDCR.Rebalance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pytests/xdcr/rebalanceXDCR.py", line 101, in async_rebalance_out
    load_task.result()
  File "lib/tasks/future.py", line 153, in result
    return self.__get_result()
  File "lib/tasks/future.py", line 112, in __get_result
    raise self._exception
MemcachedError: Memcached error #7 'Not my vbucket': {"rev":3759,"name":"default","uri":"/pools/default/buckets/default?bucket_uuid=e6b476e6e03693dbe324ca738a5ec615","streamingUri":"/pools/default/bucketsStreaming/default?bucket_uuid=e6b476e6e03693dbe324ca738a5ec615","nodes":[{"couchApiBase":"http://10.5.2.228:8092/default%2Be6b476e6e03693dbe324ca738a5ec615","hostname":"10.5.2.228:8091","ports":{"proxy":11211,"direct":11210}},{"couchApiBase":"http://10.5.2.229:8092/default%2Be6b476e6e03693dbe324ca738a5ec615","hostname":"10.5.2.229:8091","ports":{"proxy":11211,"direct":11210}},{"couchApiBase":"http://10.5.2.230:8092/default%2Be6b476e6e03693dbe324ca738a5ec615","hostname":"10.5.2.230:8091","ports":{"proxy":11211,"direct":11210}}],"nodesExt":[{"services":{"mgmt":8091,"capi":8092,"moxi":11211,"kv":11210,"kvSSL":11207,"capiSSL":18092,"mgmtSSL":18091},"hostname":"10.5.2.228"},{"services":{"mgmt":8091,"capi":8092,"moxi":11211,"kv":11210,"kvSSL":11207,"capiSSL":18092,"mgmtSSL":18091},"hostname":"10.5.2.229"},{"services":{"mgmt":8091,"capi":8092,"moxi":11211,"kv":11210,"kvSSL":11207,"capiSSL":18092,"mgmtSSL":18091},"hostname":"10.5.2.230"}],"nodeLocator":"vbucket","uuid":"e6b476e6e03693dbe324ca738a5ec615","ddocs":{"uri":"/pools/default/buckets/default/ddocs"},"vBucketServerMap":{"hashAlgorithm":"CRC","numReplicas":1,"serverList":["10.5.2.228:11210","10.5.2.229:11210","10.5.2.230:11210"],"vBucketMap":[[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,1],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[0,2],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0]
,[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,0],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[1,2],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,0],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],
[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[2,1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1]],"vBucketMapForward":[[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[
0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],
[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1],[0,-1]]},"bucketCapabilitiesVer":"","bucketCapabilities":["cbhello","touch","couchapi","cccp"]} for vbucket :970 to mc 10.5.2.228:11210

----------------------------------------------------------------------
Ran 1 test in 402.901s



[Test Steps]
1. Load data asynchronously during the rebalance-out operation. Bi-directional XDCR is also going on in parallel.

The test failed with a 'Not my vbucket' error.

 Comments   
Comment by Aleksey Kondratenko [ 03/Jul/14 ]
I'm not seeing any errors in the logs, and you should be aware that not-my-vbucket during rebalance is expected; clients must be able to handle it.
Comment by Aleksey Kondratenko [ 03/Jul/14 ]
Actually, I found a rebalance failure, but it's not due to not-my-vbucket.
Comment by Sangharsh Agarwal [ 07/Jul/14 ]
Alk, what rebalance failure did you find in the logs?
Comment by Aleksey Kondratenko [ 07/Jul/14 ]
It looked like the logs of an unrelated cluster.
Comment by Sangharsh Agarwal [ 08/Jul/14 ]
Please look at the attached logs between the timestamps below.

[2014-07-02 07:05:46,631] - [xdcrbasetests:633] INFO - Starting rebalance-out nodes:['10.5.2.229', '10.5.2.230'] at source cluster 10.5.2.228
[2014-07-02 07:05:47,419] - [rest_client:1087] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.5.2.229%2Cns_1%4010.5.2.230&user=Administrator&knownNodes=ns_1%4010.5.2.230%2Cns_1%4010.5.2.229%2Cns_1%4010.5.2.228
[2014-07-02 07:05:47,440] - [rest_client:1091] INFO - rebalance operation started
[2014-07-02 07:05:47,451] - [rest_client:1208] INFO - rebalance percentage : 0 %
[2014-07-02 07:05:57,471] - [rest_client:1208] INFO - rebalance percentage : 3.54246188327 %
[2014-07-02 07:06:07,496] - [rest_client:1208] INFO - rebalance percentage : 7.81784691481 %
[2014-07-02 07:06:17,514] - [rest_client:1208] INFO - rebalance percentage : 9.89446250156 %
[('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('lib/tasks/task.py', 525, 'run', 'self.next()'), ('lib/tasks/task.py', 682, 'next', 'self._unlocked_create(partition, key, value, is_base64_value=is_base64_value)'), ('lib/tasks/task.py', 554, '_unlocked_create', 'self.set_exception(error)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
Wed Jul 2 07:06:21 2014
[2014-07-02 07:06:27,534] - [rest_client:1208] INFO - rebalance percentage : 12.8261550946 %
[2014-07-02 07:06:37,551] - [rest_client:1208] INFO - rebalance percentage : 15.3913861135 %
[2014-07-02 07:06:47,566] - [rest_client:1208] INFO - rebalance percentage : 16.9793862681 %
[2014-07-02 07:06:57,582] - [rest_client:1208] INFO - rebalance percentage : 19.422463429 %
[2014-07-02 07:07:07,599] - [rest_client:1208] INFO - rebalance percentage : 20.6440020094 %
[2014-07-02 07:07:17,616] - [rest_client:1208] INFO - rebalance percentage : 22.232002164 %
[2014-07-02 07:07:27,633] - [rest_client:1208] INFO - rebalance percentage : 23.8200023186 %
[2014-07-02 07:07:37,649] - [rest_client:1208] INFO - rebalance percentage : 25.5301563312 %
[2014-07-02 07:07:47,668] - [rest_client:1208] INFO - rebalance percentage : 26.2630794794 %
[2014-07-02 07:07:57,684] - [rest_client:1208] INFO - rebalance percentage : 27.6067719179 %
[2014-07-02 07:08:07,707] - [rest_client:1208] INFO - rebalance percentage : 29.6833875047 %
[2014-07-02 07:08:17,734] - [rest_client:1208] INFO - rebalance percentage : 31.2713876592 %
[2014-07-02 07:08:27,773] - [rest_client:1208] INFO - rebalance percentage : 32.9815416719 %
[2014-07-02 07:08:37,802] - [rest_client:1208] INFO - rebalance percentage : 36.7683112712 %
[2014-07-02 07:08:47,826] - [rest_client:1208] INFO - rebalance percentage : 40.7993885867 %
[2014-07-02 07:08:57,849] - [rest_client:1208] INFO - rebalance percentage : 45.9298506245 %
[2014-07-02 07:09:07,885] - [rest_client:1208] INFO - rebalance percentage : 51.9153896687 %
[2014-07-02 07:09:17,911] - [rest_client:1208] INFO - rebalance percentage : 57.7787748548 %
[2014-07-02 07:09:27,936] - [rest_client:1208] INFO - rebalance percentage : 64.008621615 %
[2014-07-02 07:09:37,955] - [rest_client:1208] INFO - rebalance percentage : 70.6049299494 %
[2014-07-02 07:09:47,980] - [rest_client:1208] INFO - rebalance percentage : 77.3233921418 %
[2014-07-02 07:09:58,011] - [rest_client:1208] INFO - rebalance percentage : 82.9661561537 %
[2014-07-02 07:10:08,046] - [rest_client:1208] INFO - rebalance percentage : 90.8247925817 %
[2014-07-02 07:10:18,066] - [rest_client:1208] INFO - rebalance percentage : 100 %
[2014-07-02 07:10:28,397] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:29,402] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:30,406] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:31,410] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:32,414] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:33,417] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:34,421] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:35,424] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:36,428] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.229:8091/pools error [Errno 111] Connection refused
[2014-07-02 07:10:37,436] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.230:8091/nodes/self error [Errno 111] Connection refused
[2014-07-02 07:10:38,440] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.230:8091/nodes/self error [Errno 111] Connection refused
[2014-07-02 07:10:39,444] - [rest_client:750] ERROR - socket error while connecting to http://10.5.2.230:8091/nodes/self error [Errno 111] Connection refused
[2014-07-02 07:10:40,456] - [task:392] INFO - rebalancing was completed with progress: 100% in 293.015354872 sec

Comment by Aleksey Kondratenko [ 08/Jul/14 ]
Logs indicate that node 229 was leaving the cluster at this time.

Naturally, trying to bootstrap from a node that is out of the cluster and is restarting itself is pointless.
Comment by Sangharsh Agarwal [ 08/Jul/14 ]
This test has been failing continuously for the past several builds:

http://qa.hq.northscale.net/job/centos_x64--107_01--rebalanceXDCR-P1/10/consoleFull -> 918
http://qa.hq.northscale.net/job/centos_x64--107_01--rebalanceXDCR-P1/9/consoleFull -> 884
http://qa.hq.northscale.net/job/centos_x64--107_01--rebalanceXDCR-P1/8/consoleFull -> 855
http://qa.hq.northscale.net/job/centos_x64--107_01--rebalanceXDCR-P1/7/consoleFull -> 814

Comment by Sangharsh Agarwal [ 08/Jul/14 ]
Changed the summary: I observed that the rebalance completed 100%, though data writing failed with this error. My mistake. Passing this to the couchbase-bucket team.
Comment by David Liao [ 09/Jul/14 ]
It looks like an issue in the testing script.

The error (not_my_vbucket) was returned when loading data during the rebalance_out operation as indicated by the log:

[2014-07-02 07:06:17,514] - [rest_client:1208] INFO - rebalance percentage : 9.89446250156 %
[('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('lib/tasks/task.py', 525, 'run', 'self.next()'), ('lib/tasks/task.py', 682, 'next', 'self._unlocked_create(partition, key, value, is_base64_value=is_base64_value)'), ('lib/tasks/task.py', 554, '_unlocked_create', 'self.set_exception(error)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
Wed Jul 2 07:06:21 2014
[2014-07-02 07:06:27,534] - [rest_client:1208] INFO - rebalance percentage : 12.8261550946 %

The test script should handle this error instead of raising an exception and treating it as a failure.
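A minimal sketch of the kind of handling being suggested, assuming a client in the style of testrunner's mc_bin_client and a numeric status attribute on the raised error (both assumptions; this is not the actual testrunner loader code):

import time

ERR_NOT_MY_VBUCKET = 0x07  # memcached protocol status for "not my vbucket"

def set_with_retry(client, key, value, exp=0, flags=0, retries=10, delay=1.0):
    """Absorb NOT_MY_VBUCKET while vbuckets are moving instead of failing the test."""
    for attempt in range(retries):
        try:
            return client.set(key, exp, flags, value)
        except Exception as e:
            # Illustrative: we assume the error object carries a numeric `status`.
            if getattr(e, "status", None) == ERR_NOT_MY_VBUCKET:
                # vbucket moved mid-rebalance: back off and try again
                # (a real client would also refresh its vbucket map here).
                time.sleep(delay)
                continue
            raise
    raise RuntimeError("set(%s) still failing with NOT_MY_VBUCKET after %d retries"
                       % (key, retries))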
Comment by Sangharsh Agarwal [ 10/Jul/14 ]
Chiyoung,
   Can you please look into this? It seems to me to be a product degradation rather than only a test script issue. The bug has been reproducing continuously over the last several builds.
Comment by Sangharsh Agarwal [ 10/Jul/14 ]
David, are we going to tell every customer to change their existing applications to handle such errors in a new way because of 3.0? If we are noting this in the release notes somewhere, we can close this bug.
Comment by Mike Wiederhold [ 10/Jul/14 ]
This probably has to do with how we send the set vbucket state message. Currently we don't wait for all of the items sent to the replica to be processed before sending the set vbucket state message. As a result, if the replica has a lot of mutations queued ahead of the set vbucket state message, there may be a period of time where there is no active copy of the vbucket in the cluster. This is different from how we do this with TAP, so we need to improve this area.
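A rough sketch of the ordering being described, with hypothetical producer/replica objects standing in for the ep-engine takeover logic (illustrative only, not ep-engine code): the state change should only be sent once the replica has drained the mutations streamed ahead of it.

import time

def takeover_vbucket(producer, replica, vbucket, poll_interval=0.1):
    """Illustrative takeover ordering: promote the replica only after it has
    processed everything streamed to it. `producer` and `replica` are
    hypothetical stand-ins, not real ep-engine APIs."""
    last_sent_seqno = producer.stream_remaining_mutations(vbucket)

    # Current behaviour per the comment above: set_vbucket_state(active) is sent
    # immediately, so the cluster can briefly have no active copy of the vbucket
    # while the replica is still draining its queue.
    #
    # Proposed behaviour: block until the replica's high seqno catches up.
    while replica.get_high_seqno(vbucket) < last_sent_seqno:
        time.sleep(poll_interval)

    producer.send_set_vbucket_state(vbucket, "active")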
Comment by David Liao [ 10/Jul/14 ]
The test script starts the "loading data" phase before the "rebalancing out" operation. This is the primary suspect for the cause of the error.

The QE team should provide detailed information about the timestamps of the related critical events, such as exactly when this error occurred and on which node, so that it can be correlated with the timing of other operations to help the dev team investigate. Also, QE should make sure the test script can actually handle this error correctly, like other SDKs such as libcouchbase do.

Comment by Chiyoung Seo [ 10/Jul/14 ]
Mike,

That sounds like a critical issue. We shouldn't send the set_vbucket_state(active) message to the replica (i.e., the new master) until the replica has processed all (or almost all) of the items from the current master.

Please create a new ticket for that issue.
Comment by Chiyoung Seo [ 10/Jul/14 ]
Sangharsh, Aruna, Wayne,

NOT_MY_VBUCKET responses during rebalance in/out are not errors; they are expected because vbuckets are moving during that period.

All of our official SDKs can handle a NOT_MY_VBUCKET response and retry the operation.

Can you please confirm whether NOT_MY_VBUCKET was received during the rebalance period? If so, please don't consider it a test failure.
Comment by Aruna Piravi [ 10/Jul/14 ]
Yes, we understand NOT_MY_VBUCKET is returned while the server is rebalancing, and we do handle those exceptions in the test. I think this exception comes immediately after rebalance is 100% complete. We have an incoming data load during rebalance, so with XDCR we do get not_my_vbucket errors, which are caught and handled by the testrunner loader while the server is rebalancing.

If we did not handle this exception, the test would stop right here, at 12% rebalance, and report an error.

[2014-07-02 07:06:17,514] - [rest_client:1208] INFO - rebalance percentage : 9.89446250156 %
[('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('lib/tasks/task.py', 525, 'run', 'self.next()'), ('lib/tasks/task.py', 682, 'next', 'self._unlocked_create(partition, key, value, is_base64_value=is_base64_value)'), ('lib/tasks/task.py', 554, '_unlocked_create', 'self.set_exception(error)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
Wed Jul 2 07:06:21 2014
[2014-07-02 07:06:27,534] - [rest_client:1208] INFO - rebalance percentage : 12.8261550946 %

Sangharsh,
In bugs like these, can we provide the following info:
1. Was this problem seen in the last release (2.5.0, 2.5.1)?
2. Proof that the exception was indeed thrown soon after rebalance (add a timestamp for the error?). From the test logs we know the time period in which the server was rebalancing, so this info is crucial. Another way to establish this with confidence is to look at the product logs: we can see when rebalance ended on the server that returned the error, and when the last not_my_vbucket was seen in the xdcr.error log on the source. A small sketch of this correlation follows below.
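For point 2, a minimal sketch of the correlation being asked for, assuming a simplified timestamp format (the real log lines use formats like "[2014-07-02 07:06:21,...]" and "Wed Jul 2 07:06:21 2014"); the rebalance start time below is approximate, derived from the 293 s duration reported at completion:

from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed, simplified timestamp format

def error_during_rebalance(error_ts, rebalance_start_ts, rebalance_end_ts):
    """Return True if the not_my_vbucket error fell inside the rebalance window."""
    err = datetime.strptime(error_ts, FMT)
    start = datetime.strptime(rebalance_start_ts, FMT)
    end = datetime.strptime(rebalance_end_ts, FMT)
    return start <= err <= end

# Using the timestamps quoted in this ticket: the error was logged at 07:06:21
# and rebalance did not reach 100% until 07:10:18 (start time is approximate).
print(error_during_rebalance("2014-07-02 07:06:21",
                             "2014-07-02 07:05:47",
                             "2014-07-02 07:10:18"))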
Comment by Chiyoung Seo [ 10/Jul/14 ]
Please note that even after the rebalance is complete, the client can still get a NOT_MY_VBUCKET error if the vbucket map on the client side is not up to date. In this case, the client should update the map and retry.
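A minimal illustration of that client-side pattern, assuming a hypothetical client object with a refresh_vbucket_map() helper (official SDKs such as libcouchbase do the equivalent internally):

ERR_NOT_MY_VBUCKET = 0x07  # memcached protocol status

def get_with_map_refresh(client, key, max_refreshes=3):
    """Even after rebalance completes, a cached vbucket map can be stale;
    on NOT_MY_VBUCKET the client should re-fetch the map and retry."""
    for _ in range(max_refreshes + 1):
        try:
            return client.get(key)
        except Exception as e:
            if getattr(e, "status", None) != ERR_NOT_MY_VBUCKET:
                raise
            # Hypothetical helper: pull the current bucket config / vbucket map
            # from the cluster so the next attempt routes to the right node.
            client.refresh_vbucket_map()
    raise RuntimeError("vbucket map still stale after %d refreshes" % max_refreshes)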
Comment by Wayne Siu [ 10/Jul/14 ]
Hi Sangharsh,
If you can reproduce the issue, please try to "freeze" the cluster and notify the Dev team, as they will need to debug on a live cluster if possible.




[MB-11554] cbrecovery mismatch in item count after rebalance Created: 26/Jun/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Ashvinder Singh Assignee: David Liao
Resolution: Unresolved Votes: 0
Labels: releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: All OS

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
Test run on build 3.0.0-866
Steps to reproduce:
- Set up two clusters, src-A and dest-B (each with three nodes and one bucket), with XDCR
- Set up two floating nodes on dest-B
- Create 80K items and ensure all items are replicated to the dest-B cluster
- Fail over two nodes at dest-B
- Add one node to the dest-B cluster
- Run cbrecovery from src-A to dest-B
- Ensure cbrecovery completes successfully
- Do a rebalance on the dest-B cluster
- Ensure the rebalance completes
- Do mutations on the src-A cluster
- Verify metadata on the dest-B cluster matches the src-A cluster
Bug: Metadata does not match between the src-A and dest-B clusters.

Source meta data: {'deleted': 0, 'seqno': 1, 'cas': 6811709964931922, 'flags': 0, 'expiration': 0}
[2014-06-25 01:10:42,636] - [task:1206] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 0, 'flags': 0, 'expiration': 0}
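For reference, a small sketch of the comparison conceptually being made here, using the exact dicts quoted above (the helper is illustrative, not the testrunner code); the mismatch is in the cas field, which is 0 on the destination:

META_FIELDS = ("deleted", "seqno", "cas", "flags", "expiration")

def diff_meta(src_meta, dest_meta):
    """Return the metadata fields that differ between source and destination."""
    return {f: (src_meta[f], dest_meta[f])
            for f in META_FIELDS if src_meta[f] != dest_meta[f]}

src = {'deleted': 0, 'seqno': 1, 'cas': 6811709964931922, 'flags': 0, 'expiration': 0}
dest = {'deleted': 0, 'seqno': 1, 'cas': 0, 'flags': 0, 'expiration': 0}
print(diff_meta(src, dest))   # -> {'cas': (6811709964931922, 0)}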

Found using jenkins job: http://qa.sc.couchbase.com/view/All/job/ubuntu_x64--38_01--cbrecovery-P1/44/consoleFull


 Comments   
Comment by Bin Cui [ 10/Jul/14 ]
I wonder if ep_engine would, by any chance, change the cas field?
Comment by Ashvinder Singh [ 10/Jul/14 ]

Attaching testrunner logs with cbrecovery commented out of the script.

>>>>>>>>>>>>>

/testrunner -i my.ini -t cbRecoverytests.cbrecovery.cbrecover_multiple_autofailover_swapout_reb_routine,items=1000,rdirection=bidirection,ctopology=chain,failover=destination,fail_count=2,add_count=2,doc-ops=update,expires=90,max_verify=10000

Test Input params:
{'add_count': '2', 'max_verify': '10000', 'doc-ops': 'update', 'items': '1000', 'failover': 'destination', 'expires': '90', 'conf_file': 'conf/py-cbrecoverytests.conf', 'num_nodes': 8, 'cluster_name': 'my', 'ctopology': 'chain', 'fail_count': '2', 'rdirection': 'bidirection', 'ini': 'my.ini', 'case_number': 1, 'logs_folder': '/Users/ashvinder/mygit/testrunner/logs/testrunner-14-Jul-10_15-44-20/test_1', 'spec': 'py-cbrecoverytests'}
Run before suite setup for cbRecoverytests.cbrecovery.cbrecover_multiple_autofailover_swapout_reb_routine
cbrecover_multiple_autofailover_swapout_reb_routine (cbRecoverytests.cbrecovery) ... 2014-07-10 15:44:20 | INFO | MainProcess | test_thread | [xdcrbasetests._init_parameters] Initializing input parameters started...
2014-07-10 15:44:20 | INFO | MainProcess | test_thread | [xdcrbasetests._init_parameters] Initializing input parameters completed.
2014-07-10 15:44:20 | INFO | MainProcess | test_thread | [xdcrbasetests.setUp] ============== XDCRbasetests setup is started for test #1 cbrecover_multiple_autofailover_swapout_reb_routine==============
2014-07-10 15:44:21 | INFO | MainProcess | test_thread | [rest_client.remove_remote_cluster] removing remote cluster name:cluster1
2014-07-10 15:44:21 | INFO | MainProcess | test_thread | [rest_client.remove_remote_cluster] removing remote cluster name:cluster0
2014-07-10 15:44:22 | INFO | MainProcess | test_thread | [xdcrbasetests._do_cleanup] cleanup cluster1: [ip:172.23.106.71 port:8091 ssh_username:root, ip:172.23.106.72 port:8091 ssh_username:root, ip:172.23.106.73 port:8091 ssh_username:root]
2014-07-10 15:44:22 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [u'default'] on 172.23.106.71
2014-07-10 15:44:22 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] remove bucket default ...
2014-07-10 15:44:25 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleted bucket : default from 172.23.106.71
2014-07-10 15:44:25 | INFO | MainProcess | test_thread | [bucket_helper.wait_for_bucket_deletion] waiting for bucket deletion to complete....
2014-07-10 15:44:26 | INFO | MainProcess | test_thread | [rest_client.bucket_exists] existing buckets : []
2014-07-10 15:44:26 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] rebalancing all nodes in order to remove nodes
2014-07-10 15:44:26 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=ns_1%40172.23.106.72%2Cns_1%40172.23.106.73&user=Administrator&knownNodes=ns_1%40172.23.106.71%2Cns_1%40172.23.106.72%2Cns_1%40172.23.106.73
2014-07-10 15:44:26 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance operation started
2014-07-10 15:44:31 | INFO | MainProcess | test_thread | [rest_client.monitorRebalance] rebalance progress took 5.16095399857 seconds
2014-07-10 15:44:31 | INFO | MainProcess | test_thread | [rest_client.monitorRebalance] sleep for 5.16095399857 seconds after rebalance...
2014-07-10 15:44:36 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.72:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 15:44:38 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.73:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 15:44:39 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.73:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 15:44:40 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.73:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 15:44:41 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.73:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 15:44:42 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.73:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 15:44:43 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] removed all the nodes from cluster associated with ip:172.23.106.71 port:8091 ssh_username:root ? [(u'ns_1@172.23.106.72', 8091), (u'ns_1@172.23.106.73', 8091)]
2014-07-10 15:44:43 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 172.23.106.71:8091
2014-07-10 15:44:43 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 172.23.106.71:8091 is running
2014-07-10 15:44:43 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [] on 172.23.106.72
2014-07-10 15:44:44 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 172.23.106.72:8091
2014-07-10 15:44:44 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 172.23.106.72:8091 is running
2014-07-10 15:44:44 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [] on 172.23.106.73
2014-07-10 15:44:45 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 172.23.106.73:8091
2014-07-10 15:44:45 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 172.23.106.73:8091 is running
2014-07-10 15:44:45 | INFO | MainProcess | test_thread | [xdcrbasetests._do_cleanup] cleanup cluster2: [ip:10.3.121.106 port:8091 ssh_username:root, ip:10.3.121.103 port:8091 ssh_username:root, ip:10.3.5.43 port:8091 ssh_username:root]
2014-07-10 15:44:45 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [u'default'] on 10.3.121.106
2014-07-10 15:44:45 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] remove bucket default ...
2014-07-10 15:44:53 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleted bucket : default from 10.3.121.106
2014-07-10 15:44:53 | INFO | MainProcess | test_thread | [bucket_helper.wait_for_bucket_deletion] waiting for bucket deletion to complete....
2014-07-10 15:44:53 | INFO | MainProcess | test_thread | [rest_client.bucket_exists] existing buckets : []
2014-07-10 15:44:53 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] rebalancing all nodes in order to remove nodes
2014-07-10 15:44:53 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=ns_1%4010.3.5.43%2Cns_1%4010.3.121.103&user=Administrator&knownNodes=ns_1%4010.3.121.106%2Cns_1%4010.3.5.43%2Cns_1%4010.3.121.103
2014-07-10 15:44:53 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance operation started
2014-07-10 15:44:53 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:45:03 | INFO | MainProcess | test_thread | [rest_client.monitorRebalance] rebalance progress took 10.0290470123 seconds
2014-07-10 15:45:03 | INFO | MainProcess | test_thread | [rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] removed all the nodes from cluster associated with ip:10.3.121.106 port:8091 ssh_username:root ? [(u'ns_1@10.3.5.43', 8091), (u'ns_1@10.3.121.103', 8091)]
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 10.3.121.106:8091
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 10.3.121.106:8091 is running
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [] on 10.3.121.103
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 10.3.121.103:8091
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 10.3.121.103:8091 is running
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [] on 10.3.5.43
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 10.3.5.43:8091
2014-07-10 15:45:18 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 10.3.5.43:8091 is running
2014-07-10 15:45:19 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster] settings/web params on 172.23.106.71:8091:username=Administrator&password=password&port=8091
2014-07-10 15:45:19 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster_memoryQuota] pools/default params : memoryQuota=2069&username=Administrator&password=password
2014-07-10 15:45:19 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster] settings/web params on 172.23.106.72:8091:username=Administrator&password=password&port=8091
2014-07-10 15:45:19 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster_memoryQuota] pools/default params : memoryQuota=2069&username=Administrator&password=password
2014-07-10 15:45:19 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster] settings/web params on 172.23.106.73:8091:username=Administrator&password=password&port=8091
2014-07-10 15:45:19 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster_memoryQuota] pools/default params : memoryQuota=2069&username=Administrator&password=password
2014-07-10 15:45:21 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 172.23.106.72:8091 to cluster
2014-07-10 15:45:21 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @172.23.106.72:8091 to this cluster @172.23.106.71:8091
2014-07-10 15:45:26 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 172.23.106.73:8091 to cluster
2014-07-10 15:45:26 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @172.23.106.73:8091 to this cluster @172.23.106.71:8091
2014-07-10 15:45:37 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%40172.23.106.71%2Cns_1%40172.23.106.72%2Cns_1%40172.23.106.73
2014-07-10 15:45:37 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance operation started
2014-07-10 15:45:37 | INFO | MainProcess | Cluster_Thread | [task.check] rebalancing was completed with progress: 100% in 0.0825810432434 sec
2014-07-10 15:45:39 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] http://172.23.106.71:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=2069&proxyPort=11211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=None
2014-07-10 15:45:39 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] 0.0812261104584 seconds to create bucket default
2014-07-10 15:45:39 | INFO | MainProcess | Cluster_Thread | [bucket_helper.wait_for_memcached] waiting for memcached bucket : default in 172.23.106.71 to accept set ops
2014-07-10 15:45:41 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:45:42 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:45:43 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:45:44 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:45:45 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:45:46 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:46:26 | INFO | MainProcess | Cluster_Thread | [task.check] bucket 'default' was created with per node RAM quota: 2069
2014-07-10 15:46:26 | INFO | MainProcess | test_thread | [rest_client.diag_eval] /diag/eval status on 172.23.106.71:8091: True content: ok command: ale:set_loglevel(xdcr_trace, debug).
2014-07-10 15:46:26 | INFO | MainProcess | test_thread | [rest_client.diag_eval] /diag/eval status on 172.23.106.72:8091: True content: ok command: ale:set_loglevel(xdcr_trace, debug).
2014-07-10 15:46:26 | INFO | MainProcess | test_thread | [rest_client.diag_eval] /diag/eval status on 172.23.106.73:8091: True content: ok command: ale:set_loglevel(xdcr_trace, debug).
2014-07-10 15:46:27 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster] settings/web params on 10.3.121.106:8091:username=Administrator&password=password&port=8091
2014-07-10 15:46:27 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster_memoryQuota] pools/default params : memoryQuota=262&username=Administrator&password=password
2014-07-10 15:46:27 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster] settings/web params on 10.3.121.103:8091:username=Administrator&password=password&port=8091
2014-07-10 15:46:27 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster_memoryQuota] pools/default params : memoryQuota=262&username=Administrator&password=password
2014-07-10 15:46:27 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster] settings/web params on 10.3.5.43:8091:username=Administrator&password=password&port=8091
2014-07-10 15:46:27 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster_memoryQuota] pools/default params : memoryQuota=262&username=Administrator&password=password
2014-07-10 15:46:28 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 10.3.121.103:8091 to cluster
2014-07-10 15:46:28 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @10.3.121.103:8091 to this cluster @10.3.121.106:8091
2014-07-10 15:46:31 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 10.3.5.43:8091 to cluster
2014-07-10 15:46:31 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @10.3.5.43:8091 to this cluster @10.3.121.106:8091
2014-07-10 15:46:36 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.121.106%2Cns_1%4010.3.5.43%2Cns_1%4010.3.121.103
2014-07-10 15:46:36 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance operation started
2014-07-10 15:46:36 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:46:46 | INFO | MainProcess | Cluster_Thread | [task.check] rebalancing was completed with progress: 100% in 10.0222330093 sec
2014-07-10 15:46:47 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] http://10.3.121.106:8091/pools/default/buckets with param: bucketType=membase&evictionPolicy=valueOnly&threadsNumber=3&ramQuotaMB=262&proxyPort=11211&authType=sasl&name=default&flushEnabled=1&replicaNumber=1&replicaIndex=1&saslPassword=None
2014-07-10 15:46:47 | INFO | MainProcess | Cluster_Thread | [rest_client.create_bucket] 0.0290520191193 seconds to create bucket default
2014-07-10 15:46:47 | INFO | MainProcess | Cluster_Thread | [bucket_helper.wait_for_memcached] waiting for memcached bucket : default in 10.3.121.106 to accept set ops
2014-07-10 15:46:50 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.121.103:11210 default
2014-07-10 15:46:50 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 15:46:50 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.5.43:11210 default
2014-07-10 15:46:51 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.121.103:11210 default
2014-07-10 15:46:51 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 15:46:51 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.5.43:11210 default
2014-07-10 15:46:54 | INFO | MainProcess | Cluster_Thread | [task.check] bucket 'default' was created with per node RAM quota: 262
2014-07-10 15:46:54 | INFO | MainProcess | test_thread | [rest_client.diag_eval] /diag/eval status on 10.3.121.106:8091: True content: ok command: ale:set_loglevel(xdcr_trace, debug).
2014-07-10 15:46:54 | INFO | MainProcess | test_thread | [rest_client.diag_eval] /diag/eval status on 10.3.121.103:8091: True content: ok command: ale:set_loglevel(xdcr_trace, debug).
2014-07-10 15:46:54 | INFO | MainProcess | test_thread | [rest_client.diag_eval] /diag/eval status on 10.3.5.43:8091: True content: ok command: ale:set_loglevel(xdcr_trace, debug).
2014-07-10 15:46:54 | INFO | MainProcess | test_thread | [xdcrbasetests.set_xdcr_param] Setting xdcrFailureRestartInterval to 1 ..
2014-07-10 15:46:54 | INFO | MainProcess | test_thread | [rest_client.set_internalSetting] Update internal setting xdcrFailureRestartInterval=1
2014-07-10 15:46:54 | INFO | MainProcess | test_thread | [rest_client.set_internalSetting] Update internal setting xdcrFailureRestartInterval=1
2014-07-10 15:46:54 | INFO | MainProcess | test_thread | [xdcrbasetests.set_xdcr_param] Setting xdcrCheckpointInterval to 60 ..
2014-07-10 15:46:55 | INFO | MainProcess | test_thread | [rest_client.set_internalSetting] Update internal setting xdcrCheckpointInterval=60
2014-07-10 15:46:55 | INFO | MainProcess | test_thread | [rest_client.set_internalSetting] Update internal setting xdcrCheckpointInterval=60
2014-07-10 15:46:55 | INFO | MainProcess | test_thread | [rest_client.add_remote_cluster] adding remote cluster hostname:10.3.121.106:8091 with username:password Administrator:password name:cluster1 to source node: 172.23.106.71:8091
2014-07-10 15:46:55 | INFO | MainProcess | test_thread | [rest_client.start_replication] starting continuous replication type:capi from default to default in the remote cluster cluster1
2014-07-10 15:46:55 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 5 secs. ...
2014-07-10 15:47:00 | INFO | MainProcess | test_thread | [rest_client.add_remote_cluster] adding remote cluster hostname:172.23.106.71:8091 with username:password Administrator:password name:cluster0 to source node: 10.3.121.106:8091
2014-07-10 15:47:01 | INFO | MainProcess | test_thread | [rest_client.start_replication] starting continuous replication type:capi from default to default in the remote cluster cluster0
2014-07-10 15:47:01 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 5 secs. ...
2014-07-10 15:47:06 | INFO | MainProcess | test_thread | [xdcrbasetests.setUp] ============== XDCRbasetests setup is finished for test #1 cbrecover_multiple_autofailover_swapout_reb_routine ==============
2014-07-10 15:47:06 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 172.23.106.71 with username : root password : couchbase ssh_key:
2014-07-10 15:47:07 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 172.23.106.71
2014-07-10 15:47:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: sudo cat /proc/cpuinfo
2014-07-10 15:47:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: df -Th
2014-07-10 15:47:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: sudo cat /proc/meminfo
2014-07-10 15:47:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: hostname
2014-07-10 15:47:09 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:09 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: hostname -d
2014-07-10 15:47:09 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:09 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: iptables -F
2014-07-10 15:47:09 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:09 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 172.23.106.72 with username : root password : couchbase ssh_key:
2014-07-10 15:47:10 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 172.23.106.72
2014-07-10 15:47:11 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: sudo cat /proc/cpuinfo
2014-07-10 15:47:11 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:11 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: df -Th
2014-07-10 15:47:11 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:11 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: sudo cat /proc/meminfo
2014-07-10 15:47:12 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:12 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: hostname
2014-07-10 15:47:12 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:12 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: hostname -d
2014-07-10 15:47:12 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:12 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: iptables -F
2014-07-10 15:47:12 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:12 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 172.23.106.73 with username : root password : couchbase ssh_key:
2014-07-10 15:47:13 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 172.23.106.73
2014-07-10 15:47:14 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: sudo cat /proc/cpuinfo
2014-07-10 15:47:14 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:14 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: df -Th
2014-07-10 15:47:14 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:14 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: sudo cat /proc/meminfo
2014-07-10 15:47:14 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:14 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: hostname
2014-07-10 15:47:15 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:15 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: hostname -d
2014-07-10 15:47:15 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:15 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: iptables -F
2014-07-10 15:47:15 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:15 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.121.106 with username : root password : couchbase ssh_key:
2014-07-10 15:47:16 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.121.106
2014-07-10 15:47:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: sudo cat /proc/cpuinfo
2014-07-10 15:47:19 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:19 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: df -Th
2014-07-10 15:47:19 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:19 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: sudo cat /proc/meminfo
2014-07-10 15:47:20 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:20 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: hostname
2014-07-10 15:47:20 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:20 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: hostname -d
2014-07-10 15:47:20 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:20 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: iptables -F
2014-07-10 15:47:20 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:20 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.121.103 with username : root password : couchbase ssh_key:
2014-07-10 15:47:21 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.121.103
2014-07-10 15:47:23 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: sudo cat /proc/cpuinfo
2014-07-10 15:47:23 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:23 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: df -Th
2014-07-10 15:47:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: sudo cat /proc/meminfo
2014-07-10 15:47:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: hostname
2014-07-10 15:47:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: hostname -d
2014-07-10 15:47:25 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:25 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: iptables -F
2014-07-10 15:47:25 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:25 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.5.43 with username : root password : couchbase ssh_key:
2014-07-10 15:47:26 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.5.43
2014-07-10 15:47:28 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: sudo cat /proc/cpuinfo
2014-07-10 15:47:29 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:29 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: df -Th
2014-07-10 15:47:29 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:29 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: sudo cat /proc/meminfo
2014-07-10 15:47:29 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:29 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: hostname
2014-07-10 15:47:29 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:29 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: hostname -d
2014-07-10 15:47:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: iptables -F
2014-07-10 15:47:30 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:30 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 172.23.106.74 with username : root password : couchbase ssh_key:
2014-07-10 15:47:31 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 172.23.106.74
2014-07-10 15:47:32 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: sudo cat /proc/cpuinfo
2014-07-10 15:47:32 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:32 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: df -Th
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: sudo cat /proc/meminfo
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: hostname
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: hostname -d
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: iptables -F
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:33 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.5.44 with username : root password : couchbase ssh_key:
2014-07-10 15:47:34 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.5.44
2014-07-10 15:47:34 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: sudo cat /proc/cpuinfo
2014-07-10 15:47:35 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:35 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: df -Th
2014-07-10 15:47:35 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:35 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: sudo cat /proc/meminfo
2014-07-10 15:47:36 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:36 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: hostname
2014-07-10 15:47:36 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:36 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: hostname -d
2014-07-10 15:47:36 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:36 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: iptables -F
2014-07-10 15:47:36 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:47:37 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:47:38 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:47:39 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:47:40 | INFO | MainProcess | load_gen_task | [task.has_next] Batch create documents done #: 0 with exp:0
2014-07-10 15:47:41 | INFO | MainProcess | load_gen_task | [task.has_next] Batch create documents done #: 1000 with exp:0
2014-07-10 15:47:42 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:47:43 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:47:43 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:48:18 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'172.23.106.71', 'password': 'password', 'port': u'8091'}
2014-07-10 15:48:18 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:48:18 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.71', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:48:18 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.71', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (1425714222, 0, '')
2014-07-10 15:48:18 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'172.23.106.72', 'password': 'password', 'port': u'8091'}
2014-07-10 15:48:18 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:48:19 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.72', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:48:19 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.72', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (1483122082, 0, '')
2014-07-10 15:48:19 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'172.23.106.73', 'password': 'password', 'port': u'8091'}
2014-07-10 15:48:19 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.73', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.73', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (788117479, 0, '')
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [xdcrbasetests._expiry_pager] wait for expiry pager to run on all these nodes
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'10.3.121.103', 'password': 'password', 'port': u'8091'}
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.121.103:11210 default
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.121.103', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.121.103', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (2624631019, 0, '')
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'10.3.121.106', 'password': 'password', 'port': u'8091'}
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.121.106', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.121.106', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (333823049, 0, '')
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'10.3.5.43', 'password': 'password', 'port': u'8091'}
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.5.43:11210 default
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.5.43', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.5.43', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (1537498275, 0, '')
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [xdcrbasetests._expiry_pager] wait for expiry pager to run on all these nodes
2014-07-10 15:48:20 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 15 secs. ...
2014-07-10 15:48:36 | INFO | MainProcess | test_thread | [xdcrbasetests._wait_for_replication_to_catchup] Replication caught up for bucket default: 1000
2014-07-10 15:48:37 | INFO | MainProcess | test_thread | [xdcrbasetests._wait_for_replication_to_catchup] Replication caught up for bucket default: 1000
2014-07-10 15:48:37 | INFO | MainProcess | test_thread | [rest_client.update_autofailover_settings] settings/autoFailover params : enabled=true&timeout=30
2014-07-10 15:48:37 | INFO | MainProcess | test_thread | [cbRecoverytests.cbrecover_multiple_autofailover_swapout_reb_routine] Triggering stop_server over 2 nodes on destination ..
2014-07-10 15:48:37 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.121.103 with username : root password : couchbase ssh_key:
2014-07-10 15:48:37 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.121.103
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: sudo cat /proc/cpuinfo
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: df -Th
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: sudo cat /proc/meminfo
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: hostname
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: hostname -d
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:48:38 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: /etc/init.d/couchbase-server stop
2014-07-10 15:48:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 15:48:41 | INFO | MainProcess | test_thread | [remote_util.log_command_output] * Stopped couchbase-server
2014-07-10 15:48:41 | INFO | MainProcess | test_thread | [cbRecoverytests.get_failover_count] 'clusterMembership' for node ns_1@10.3.121.103 is active
2014-07-10 15:48:41 | INFO | MainProcess | test_thread | [cbRecoverytests.get_failover_count] 'clusterMembership' for node ns_1@10.3.121.106 is active
2014-07-10 15:48:41 | INFO | MainProcess | test_thread | [cbRecoverytests.get_failover_count] 'clusterMembership' for node ns_1@10.3.5.43 is active
2014-07-10 15:48:41 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 30 secs. ...
2014-07-10 15:49:11 | INFO | MainProcess | test_thread | [cbRecoverytests.get_failover_count] 'clusterMembership' for node ns_1@10.3.121.103 is inactiveFailed
2014-07-10 15:49:11 | INFO | MainProcess | test_thread | [cbRecoverytests.get_failover_count] 'clusterMembership' for node ns_1@10.3.121.106 is active
2014-07-10 15:49:11 | INFO | MainProcess | test_thread | [cbRecoverytests.get_failover_count] 'clusterMembership' for node ns_1@10.3.5.43 is active
2014-07-10 15:49:11 | INFO | MainProcess | test_thread | [cbRecoverytests.wait_for_failover_or_assert] 1 node(s) failed over as expected
2014-07-10 15:49:11 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 10 secs. ...
2014-07-10 15:49:23 | INFO | MainProcess | test_thread | [rest_client.fail_over] fail_over node ns_1@10.3.5.43 successful
2014-07-10 15:49:23 | INFO | MainProcess | test_thread | [rest_client.add_node] adding remote node @172.23.106.74:8091 to this cluster @10.3.121.106:8091
2014-07-10 15:49:34 | INFO | MainProcess | test_thread | [rest_client.add_node] adding remote node @10.3.5.44:8091 to this cluster @10.3.121.106:8091
2014-07-10 15:49:44 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 15 secs. ...
2014-07-10 15:49:59 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.121.106%2Cns_1%4010.3.5.43%2Cns_1%4010.3.121.103%2Cns_1%40172.23.106.74%2Cns_1%4010.3.5.44
2014-07-10 15:49:59 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance operation started
2014-07-10 15:49:59 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:01 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:03 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:05 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:07 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:09 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:11 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:13 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:15 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:17 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:19 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:21 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:23 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:25 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:27 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:29 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:31 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:33 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:36 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:38 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:40 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 15:50:42 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0.522365196078 %
2014-07-10 15:50:44 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 1.3059129902 %
2014-07-10 15:50:46 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 2.22005208333 %
2014-07-10 15:50:48 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 2.61182598039 %
2014-07-10 15:50:50 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 3.78714767157 %
2014-07-10 15:50:52 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 5.09306066176 %
2014-07-10 15:50:54 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 5.74601715686 %
2014-07-10 15:50:56 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 6.26838235294 %
2014-07-10 15:50:58 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 7.05193014706 %
2014-07-10 15:51:00 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 7.05193014706 %
2014-07-10 15:51:02 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 7.83547794118 %
2014-07-10 15:51:04 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 7.83547794118 %
2014-07-10 15:51:06 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 8.35784313725 %
2014-07-10 15:51:08 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 8.55353860294 %
2014-07-10 15:51:10 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 9.07590379902 %
2014-07-10 15:51:12 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 10.3818167892 %
2014-07-10 15:51:14 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 11.5571384804 %
2014-07-10 15:51:16 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 12.4712775735 %
2014-07-10 15:51:18 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 13.5160079657 %
2014-07-10 15:51:20 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 14.1689644608 %
2014-07-10 15:51:22 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 15.0831035539 %
2014-07-10 15:51:24 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 15.474877451 %
2014-07-10 15:51:26 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 15.9972426471 %
2014-07-10 15:51:28 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 16.2584252451 %
2014-07-10 15:51:30 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 16.5196078431 %
2014-07-10 15:51:33 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 16.9113817402 %
2014-07-10 15:51:36 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 17.4337469363 %
2014-07-10 15:51:38 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 18.3478860294 %
2014-07-10 15:51:40 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 19.5232077206 %
2014-07-10 15:51:42 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 20.3067555147 %
2014-07-10 15:51:44 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 20.9597120098 %
2014-07-10 15:51:46 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 22.6573988971 %
2014-07-10 15:51:48 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 23.3103553922 %
2014-07-10 15:51:50 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 23.7021292892 %
2014-07-10 15:51:52 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 24.7468596814 %
2014-07-10 15:51:54 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 25.1386335784 %
2014-07-10 15:51:56 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 25.9221813725 %
2014-07-10 15:51:58 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 28.4034160539 %
2014-07-10 15:52:00 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 30.3622855392 %
2014-07-10 15:52:02 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 32.4517463235 %
2014-07-10 15:52:04 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 32.4517463235 %
2014-07-10 15:52:06 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 33.2352941176 %
2014-07-10 15:52:08 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 34.8023897059 %
2014-07-10 15:52:10 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 37.0224417892 %
2014-07-10 15:52:12 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 39.1770067402 %
2014-07-10 15:52:14 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 40.8746936275 %
2014-07-10 15:52:16 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 41.2664675245 %
2014-07-10 15:52:18 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 41.7888327206 %
2014-07-10 15:52:20 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 41.7888327206 %
2014-07-10 15:52:22 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 43.6171109069 %
2014-07-10 15:52:24 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 44.2030484069 %
2014-07-10 15:52:26 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 45.4404105392 %
2014-07-10 15:52:28 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 47.1354166667 %
2014-07-10 15:52:30 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 48.046875 %
2014-07-10 15:52:32 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 48.9583333333 %
2014-07-10 15:52:34 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 50.1302083333 %
2014-07-10 15:52:36 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 51.171875 %
2014-07-10 15:52:38 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 52.1484375 %
2014-07-10 15:52:40 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 53.1901041667 %
2014-07-10 15:52:42 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 54.2317708333 %
2014-07-10 15:52:44 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 55.3385416667 %
2014-07-10 15:52:46 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 56.4453125 %
2014-07-10 15:52:48 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 57.9427083333 %
2014-07-10 15:52:50 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 59.5703125 %
2014-07-10 15:52:52 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 61.1328125 %
2014-07-10 15:52:54 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 62.6953125 %
2014-07-10 15:52:56 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 64.3880208333 %
2014-07-10 15:52:58 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 65.9505208333 %
2014-07-10 15:53:00 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 66.2760416667 %
2014-07-10 15:53:02 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 66.3411458333 %
2014-07-10 15:53:06 | INFO | MainProcess | test_thread | [rest_client.rebalance_reached] rebalance reached >100% in 187.513608932 seconds
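(Editor's note, for reproducing the progress polling above: a minimal sketch of a harness-style loop against the REST API. The /pools/default/rebalanceProgress endpoint, the Administrator/password credentials and the per-node averaging are assumptions modeled on the log, not taken from the actual rest_client code.)

# Minimal sketch, assuming Administrator/password and the standard 8091 admin port;
# not the actual rest_client implementation.
import time
import requests

def wait_for_rebalance(host, user="Administrator", password="password", poll=2):
    url = "http://{0}:8091/pools/default/rebalanceProgress".format(host)
    while True:
        data = requests.get(url, auth=(user, password)).json()
        if data.get("status") == "none":
            return  # no rebalance in flight any more
        # average per-node progress into a single percentage, as the log lines above show
        per_node = [v["progress"] for v in data.values() if isinstance(v, dict)]
        print("rebalance percentage : {0} %".format(100.0 * sum(per_node) / max(len(per_node), 1)))
        time.sleep(poll)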
2014-07-10 15:53:11 | INFO | MainProcess | test_thread | [rest_client.update_autofailover_settings] settings/autoFailover params : enabled=false&timeout=30
2014-07-10 15:53:12 | INFO | MainProcess | test_thread | [cbRecoverytests.common_tearDown_verification] vbucket_map differs from earlier
2014-07-10 15:53:12 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 30 secs. ...
2014-07-10 15:53:42 | INFO | MainProcess | test_thread | [xdcrbasetests.merge_buckets] merge buckets 172.23.106.71->10.3.121.106, bidirection:False
2014-07-10 15:53:42 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 90 secs. Waiting for expiration of updated items ...
2014-07-10 15:55:13 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'172.23.106.71', 'password': 'password', 'port': u'8091'}
2014-07-10 15:55:13 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:55:13 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.71', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:55:13 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.71', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (176936681, 0, '')
2014-07-10 15:55:14 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'172.23.106.72', 'password': 'password', 'port': u'8091'}
2014-07-10 15:55:14 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:55:14 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.72', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:55:14 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.72', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (3351476380, 0, '')
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'172.23.106.73', 'password': 'password', 'port': u'8091'}
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.73', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.73', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (3991819421, 0, '')
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [xdcrbasetests._expiry_pager] wait for expiry pager to run on all these nodes
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'10.3.121.106', 'password': 'password', 'port': u'8091'}
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.121.106', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.121.106', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (3725148002, 0, '')
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'10.3.5.44', 'password': 'password', 'port': u'8091'}
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.5.44:11210 default
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.5.44', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:55:15 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'10.3.5.44', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (535767650, 0, '')
2014-07-10 15:55:16 | INFO | MainProcess | test_thread | [data_helper.direct_client] dict:{'username': 'Administrator', 'ip': u'172.23.106.74', 'password': 'password', 'port': u'8091'}
2014-07-10 15:55:16 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.74:11210 default
2014-07-10 15:55:16 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.74', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10 on default
setting param: exp_pager_stime 10
2014-07-10 15:55:16 | INFO | MainProcess | test_thread | [cluster_helper.flushctl_set_per_node] Setting flush param on server {'username': 'Administrator', 'ip': u'172.23.106.74', 'password': 'password', 'port': u'8091'}, exp_pager_stime to 10, result: (2211377594, 0, '')
2014-07-10 15:55:16 | INFO | MainProcess | test_thread | [xdcrbasetests._expiry_pager] wait for expiry pager to run on all these nodes
2014-07-10 15:55:16 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 10 secs. ...
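(Editor's note: the per-node exp_pager_stime change logged above is equivalent to a cbepctl call; this is a minimal sketch only, with the install path and the empty bucket password as assumptions.)

# Minimal sketch, assuming the default /opt/couchbase install path; equivalent to
# what cluster_helper.flushctl_set_per_node does over the binary protocol.
import subprocess

CBEPCTL = "/opt/couchbase/bin/cbepctl"  # assumed install location

def set_exp_pager_stime(node_ip, seconds=10, bucket="default", bucket_password=""):
    # lowers the expiry pager interval so expired/deleted items are purged quickly
    subprocess.check_call([
        CBEPCTL, "{0}:11210".format(node_ip),
        "-b", bucket, "-p", bucket_password,
        "set", "flush_param", "exp_pager_stime", str(seconds),
    ])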
2014-07-10 15:55:26 | INFO | MainProcess | test_thread | [xdcrbasetests.verify_xdcr_stats] Verify xdcr replication stats at Source Cluster : 172.23.106.71
2014-07-10 15:55:26 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:55:27 | INFO | MainProcess | Cluster_Thread | [task.check] Saw ep_queue_size 0 == 0 expected on '172.23.106.71:8091',default bucket
2014-07-10 15:55:27 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:55:28 | INFO | MainProcess | Cluster_Thread | [task.check] Saw ep_queue_size 0 == 0 expected on '172.23.106.72:8091',default bucket
2014-07-10 15:55:28 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:55:28 | INFO | MainProcess | Cluster_Thread | [task.check] Saw ep_queue_size 0 == 0 expected on '172.23.106.73:8091',default bucket
2014-07-10 15:55:28 | INFO | MainProcess | test_thread | [xdcrbasetests.verify_xdcr_stats] Verify xdcr replication stats at Destination Cluster : 10.3.121.106
2014-07-10 15:55:29 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 15:55:29 | INFO | MainProcess | Cluster_Thread | [task.check] Saw ep_queue_size 0 == 0 expected on '10.3.121.106:8091',default bucket
2014-07-10 15:55:30 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.74:11210 default
2014-07-10 15:55:31 | INFO | MainProcess | Cluster_Thread | [task.check] Saw ep_queue_size 0 == 0 expected on '172.23.106.74:8091',default bucket
2014-07-10 15:55:31 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.5.44:11210 default
2014-07-10 15:55:31 | INFO | MainProcess | Cluster_Thread | [task.check] Saw ep_queue_size 0 == 0 expected on '10.3.5.44:8091',default bucket
2014-07-10 15:55:31 | INFO | MainProcess | test_thread | [xdcrbasetests.__wait_for_outbound_mutations_zero] Waiting for Outbound mutation to be zero on cluster node: 10.3.121.106
2014-07-10 15:55:32 | INFO | MainProcess | test_thread | [xdcrbasetests.__wait_for_outbound_mutations_zero] Current outbound mutations on cluster node: 10.3.121.106 for bucket default is 0
2014-07-10 15:55:33 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:55:34 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:55:35 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:55:35 | INFO | MainProcess | Cluster_Thread | [task.check] Saw curr_items 700 == 700 expected on '172.23.106.71:8091''172.23.106.72:8091''172.23.106.73:8091',default bucket
2014-07-10 15:55:35 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:55:36 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:55:37 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:55:37 | INFO | MainProcess | Cluster_Thread | [task.check] Saw vb_active_curr_items 700 == 700 expected on '172.23.106.71:8091''172.23.106.72:8091''172.23.106.73:8091',default bucket
2014-07-10 15:55:38 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:55:39 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:55:39 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:55:40 | INFO | MainProcess | Cluster_Thread | [task.check] Saw vb_replica_curr_items 700 == 700 expected on '172.23.106.71:8091''172.23.106.72:8091''172.23.106.73:8091',default bucket
2014-07-10 15:55:40 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 15:55:42 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 15:55:43 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 15:55:43 | INFO | MainProcess | test_thread | [task.__init__] 1000 items will be verified on default bucket
2014-07-10 15:55:44 | INFO | MainProcess | load_gen_task | [task.has_next] 0 items were verified
2014-07-10 15:55:44 | INFO | MainProcess | load_gen_task | [data_helper.getMulti] Cannot import concurrent module. Data for each server will be fetched sequentially
2014-07-10 15:55:55 | INFO | MainProcess | load_gen_task | [task.has_next] 1000 items were verified in 12.09790802 sec. The average number of ops: 82.6589125587 per second
2014-07-10 15:55:55 | INFO | MainProcess | test_thread | [xdcrbasetests.__wait_for_outbound_mutations_zero] Waiting for Outbound mutation to be zero on cluster node: 172.23.106.71
2014-07-10 15:55:56 | INFO | MainProcess | test_thread | [xdcrbasetests.__wait_for_outbound_mutations_zero] Current outbound mutations on cluster node: 172.23.106.71 for bucket default is 0
2014-07-10 15:55:57 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 15:55:57 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.74:11210 default
2014-07-10 15:55:58 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.5.44:11210 default
2014-07-10 15:55:58 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 15:55:58 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 15:55:58 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.74:11210 default
2014-07-10 15:55:59 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.5.44:11210 default
2014-07-10 15:55:59 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 15:55:59 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 15:55:59 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 172.23.106.74:11210 default
2014-07-10 15:56:01 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.5.44:11210 default
2014-07-10 15:56:01 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
[... the same three warnings repeat every ~5 seconds from 15:56:04 through 16:01:31: curr_items, vb_active_curr_items and vb_replica_curr_items all stay stuck at 549 against the expected 700 on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket ...]
2014-07-10 16:01:34 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:34 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:37 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:40 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:40 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:42 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:45 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:45 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:47 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:50 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:51 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:53 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:56 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:56 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:01:58 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:01 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:01 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:03 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:06 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:06 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:09 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:12 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:12 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:14 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:17 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:17 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:19 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:22 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:22 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:24 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:28 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:28 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:30 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:33 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:33 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
2014-07-10 16:02:35 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 549 == 700 expected on '10.3.121.106:8091''172.23.106.74:8091''10.3.5.44:8091', default bucket
[('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py', 781, '__bootstrap', 'self.__bootstrap_inner()'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py', 808, '__bootstrap_inner', 'self.run()'), ('./testrunner.py', 262, 'run', '**self._Thread__kwargs)'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/runner.py', 151, 'run', 'test(result)'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py', 395, '__call__', 'return self.run(*args, **kwds)'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py', 331, 'run', 'testMethod()'), ('pytests/cbRecoverytests.py', 520, 'cbrecover_multiple_autofailover_swapout_reb_routine', 'self.common_tearDown_verification()'), ('pytests/cbRecoverytests.py', 251, 'common_tearDown_verification', 'self.verify_results(verify_src=True)'), ('pytests/xdcr/xdcrbasetests.py', 1428, 'verify_results', 'self.verify_xdcr_stats(src_nodes, dest_nodes, verify_src)'), ('pytests/xdcr/xdcrbasetests.py', 1401, 'verify_xdcr_stats', 'self._verify_item_count(self.dest_master, dest_nodes, timeout=timeout)'), ('pytests/xdcr/xdcrbasetests.py', 1334, '_verify_item_count', 'task.result(timeout)'), ('lib/tasks/future.py', 162, 'result', 'self.set_exception(TimeoutError())'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
Thu Jul 10 16:02:36 2014
2014-07-10 16:02:36 | INFO | MainProcess | test_thread | [xdcrbasetests._verify_revIds] Verifying RevIds for 172.23.106.71 -> 10.3.121.106, bucket: default
2014-07-10 16:02:37 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.71:11210 default
2014-07-10 16:02:37 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.72:11210 default
2014-07-10 16:02:38 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.73:11210 default
2014-07-10 16:02:40 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.121.106:11210 default
2014-07-10 16:02:40 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.5.44:11210 default
2014-07-10 16:02:40 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 172.23.106.74:11210 default
2014-07-10 16:03:07 | INFO | MainProcess | load_gen_task | [task.has_next] RevId Verification : -241 deleted items have been verified
2014-07-10 16:03:07 | ERROR | MainProcess | test_thread | [xdcrbasetests._verify_revIds] 100 keys not found on 10.3.121.106:[('key: loadOne737', 'vbucket: 309'), ('key: loadOne899', 'vbucket: 830'), ('key: loadOne997', 'vbucket: 324'), ('key: loadOne962', 'vbucket: 694'), ('key: loadOne685', 'vbucket: 269'), ('key: loadOne988', 'vbucket: 224'), ('key: loadOne639', 'vbucket: 847'), ('key: loadOne648', 'vbucket: 777'), ('key: loadOne423', 'vbucket: 261'), ('key: loadOne913', 'vbucket: 752'), ('key: loadOne325', 'vbucket: 297'), ('key: loadOne759', 'vbucket: 215'), ('key: loadOne591', 'vbucket: 317'), ('key: loadOne945', 'vbucket: 740'), ('key: loadOne373', 'vbucket: 317'), ('key: loadOne854', 'vbucket: 314'), ('key: loadOne802', 'vbucket: 302'), ('key: loadOne322', 'vbucket: 845'), ('key: loadOne717', 'vbucket: 771'), ('key: loadOne589', 'vbucket: 765'), ('key: loadOne805', 'vbucket: 842'), ('key: loadOne741', 'vbucket: 791'), ('key: loadOne933', 'vbucket: 198'), ('key: loadOne403', 'vbucket: 819'), ('key: loadOne990', 'vbucket: 800'), ('key: loadOne881', 'vbucket: 254'), ('key: loadOne730', 'vbucket: 849'), ('key: loadOne965', 'vbucket: 210'), ('key: loadOne455', 'vbucket: 807'), ('key: loadOne668', 'vbucket: 319'), ('key: loadOne822', 'vbucket: 792'), ('key: loadOne353', 'vbucket: 779'), ('key: loadOne874', 'vbucket: 780'), ('key: loadOne779', 'vbucket: 737'), ('key: loadOne342', 'vbucket: 279'), ('key: loadOne444', 'vbucket: 315'), ('key: loadOne890', 'vbucket: 738'), ('key: loadOne679', 'vbucket: 803'), ('key: loadOne768', 'vbucket: 253'), ('key: loadOne865', 'vbucket: 272'), ('key: loadOne922', 'vbucket: 730'), ('key: loadOne721', 'vbucket: 333'), ('key: loadOne412', 'vbucket: 303'), ('key: loadOne981', 'vbucket: 316'), ('key: loadOne833', 'vbucket: 260'), ('key: loadOne974', 'vbucket: 718'), ('key: loadOne365', 'vbucket: 325'), ('key: loadOne587', 'vbucket: 325'), ('key: loadOne750', 'vbucket: 267'), ('key: loadOne719', 'vbucket: 187'), ('key: loadOne842', 'vbucket: 322'), ('key: loadOne598', 'vbucket: 225'), ('key: loadOne333', 'vbucket: 337'), ('key: loadOne706', 'vbucket: 287'), ('key: loadOne813', 'vbucket: 818'), ('key: loadOne362', 'vbucket: 801'), ('key: loadOne659', 'vbucket: 277'), ('key: loadOne902', 'vbucket: 236'), ('key: loadOne845', 'vbucket: 806'), ('key: loadOne432', 'vbucket: 793'), ('key: loadOne580', 'vbucket: 801'), ('key: loadOne334', 'vbucket: 821'), ('key: loadOne748', 'vbucket: 715'), ('key: loadOne954', 'vbucket: 248'), ('key: loadOne999', 'vbucket: 764'), ('key: loadOne925', 'vbucket: 190'), ('key: loadOne415', 'vbucket: 843'), ('key: loadOne726', 'vbucket: 809'), ('key: loadOne628', 'vbucket: 339'), ('key: loadOne888', 'vbucket: 290'), ('key: loadOne694', 'vbucket: 785'), ('key: loadOne548', 'vbucket: 335'), ('key: loadOne340', 'vbucket: 793'), ('key: loadOne446', 'vbucket: 821'), ('key: loadOne892', 'vbucket: 236'), ('key: loadOne867', 'vbucket: 798'), ('key: loadOne920', 'vbucket: 212'), ('key: loadOne723', 'vbucket: 835'), ('key: loadOne410', 'vbucket: 801'), ('key: loadOne983', 'vbucket: 818'), ('key: loadOne976', 'vbucket: 192'), ('key: loadOne831', 'vbucket: 778'), ('key: loadOne878', 'vbucket: 698'), ('key: loadOne367', 'vbucket: 843'), ('key: loadOne428', 'vbucket: 727'), ('key: loadOne752', 'vbucket: 773'), ('key: loadOne585', 'vbucket: 843'), ('key: loadOne840', 'vbucket: 844'), ('key: loadOne539', 'vbucket: 265'), ('key: loadOne378', 'vbucket: 751'), ('key: loadOne918', 'vbucket: 290'), ('key: loadOne704', 'vbucket: 785'), ('key: loadOne809', 
'vbucket: 764'), ('key: loadOne811', 'vbucket: 316'), ('key: loadOne360', 'vbucket: 303'), ('key: loadOne582', 'vbucket: 303'), ('key: loadOne847', 'vbucket: 296'), ('key: loadOne949', 'vbucket: 850'), ('key: loadOne430', 'vbucket: 279'), ('key: loadOne336', 'vbucket: 315')]
FAIL
2014-07-10 16:03:07 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.121.103 with username : root password : couchbase ssh_key:
2014-07-10 16:03:07 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.121.103
2014-07-10 16:03:07 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: sudo cat /proc/cpuinfo
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: df -Th
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: sudo cat /proc/meminfo
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: hostname
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: hostname -d
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:08 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.103: /etc/init.d/couchbase-server start
2014-07-10 16:03:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:17 | INFO | MainProcess | test_thread | [remote_util.log_command_output] * Started couchbase-server
2014-07-10 16:03:17 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.5.43 with username : root password : couchbase ssh_key:
2014-07-10 16:03:17 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.5.43
2014-07-10 16:03:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: sudo cat /proc/cpuinfo
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: df -Th
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: sudo cat /proc/meminfo
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: hostname
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: hostname -d
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.43: /etc/init.d/couchbase-server start
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [remote_util.log_command_output] * couchbase-server is already started
2014-07-10 16:03:18 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 20 secs. ...
2014-07-10 16:03:38 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 172.23.106.71 with username : root password : couchbase ssh_key:
2014-07-10 16:03:39 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 172.23.106.71
Collecting data files from 172.23.106.71

2014-07-10 16:03:40 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: sudo cat /proc/cpuinfo
2014-07-10 16:03:40 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:40 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: df -Th
2014-07-10 16:03:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: sudo cat /proc/meminfo
2014-07-10 16:03:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: hostname
2014-07-10 16:03:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: hostname -d
2014-07-10 16:03:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:41 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: tar -zcvf 172.23.106.71-7102014-163-couch.tar.gz '/opt/couchbase/var/lib/couchbase/data' >/dev/null 2>&1
2014-07-10 16:03:42 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully


2014-07-10 16:03:45 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.71: rm -f /root/172.23.106.71-7102014-163-couch.tar.gz
2014-07-10 16:03:45 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:45 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 172.23.106.72 with username : root password : couchbase ssh_key:
2014-07-10 16:03:46 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 172.23.106.72
Collecting data files from 172.23.106.72

2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: sudo cat /proc/cpuinfo
2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: df -Th
2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: sudo cat /proc/meminfo
2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: hostname
2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:47 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: hostname -d
2014-07-10 16:03:48 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:48 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: tar -zcvf 172.23.106.72-7102014-163-couch.tar.gz '/opt/couchbase/var/lib/couchbase/data' >/dev/null 2>&1
2014-07-10 16:03:48 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully


2014-07-10 16:03:51 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.72: rm -f /root/172.23.106.72-7102014-163-couch.tar.gz
2014-07-10 16:03:51 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:51 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 172.23.106.73 with username : root password : couchbase ssh_key:
2014-07-10 16:03:52 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 172.23.106.73
Collecting data files from 172.23.106.73

2014-07-10 16:03:54 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: sudo cat /proc/cpuinfo
2014-07-10 16:03:54 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:54 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: df -Th
2014-07-10 16:03:54 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:54 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: sudo cat /proc/meminfo
2014-07-10 16:03:55 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:55 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: hostname
2014-07-10 16:03:55 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:55 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: hostname -d
2014-07-10 16:03:55 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:55 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: tar -zcvf 172.23.106.73-7102014-163-couch.tar.gz '/opt/couchbase/var/lib/couchbase/data' >/dev/null 2>&1
2014-07-10 16:03:56 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully


2014-07-10 16:03:59 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.73: rm -f /root/172.23.106.73-7102014-163-couch.tar.gz
2014-07-10 16:03:59 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:03:59 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.121.106 with username : root password : couchbase ssh_key:
2014-07-10 16:04:00 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.121.106
Collecting data files from 10.3.121.106

2014-07-10 16:04:02 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: sudo cat /proc/cpuinfo
2014-07-10 16:04:07 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:07 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: df -Th
2014-07-10 16:04:07 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:07 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: sudo cat /proc/meminfo
2014-07-10 16:04:13 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:13 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: hostname
2014-07-10 16:04:13 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:13 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: hostname -d
2014-07-10 16:04:13 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:13 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: tar -zcvf 10.3.121.106-7102014-164-couch.tar.gz '/opt/couchbase/var/lib/couchbase/data' >/dev/null 2>&1
2014-07-10 16:04:14 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully


2014-07-10 16:04:15 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.121.106: rm -f /root/10.3.121.106-7102014-164-couch.tar.gz
2014-07-10 16:04:15 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:15 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 172.23.106.74 with username : root password : couchbase ssh_key:
2014-07-10 16:04:15 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 172.23.106.74
Collecting data files from 172.23.106.74

2014-07-10 16:04:16 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: sudo cat /proc/cpuinfo
2014-07-10 16:04:16 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:16 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: df -Th
2014-07-10 16:04:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: sudo cat /proc/meminfo
2014-07-10 16:04:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: hostname
2014-07-10 16:04:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: hostname -d
2014-07-10 16:04:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:17 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: tar -zcvf 172.23.106.74-7102014-164-couch.tar.gz '/opt/couchbase/var/lib/couchbase/data' >/dev/null 2>&1
2014-07-10 16:04:18 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully


2014-07-10 16:04:19 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 172.23.106.74: rm -f /root/172.23.106.74-7102014-164-couch.tar.gz
2014-07-10 16:04:19 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:19 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.3.5.44 with username : root password : couchbase ssh_key:
2014-07-10 16:04:20 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.3.5.44
Collecting data files from 10.3.5.44

2014-07-10 16:04:21 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: sudo cat /proc/cpuinfo
2014-07-10 16:04:21 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:21 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: df -Th
2014-07-10 16:04:21 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:21 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: sudo cat /proc/meminfo
2014-07-10 16:04:22 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:22 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: hostname
2014-07-10 16:04:22 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:22 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: hostname -d
2014-07-10 16:04:27 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:27 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: tar -zcvf 10.3.5.44-7102014-164-couch.tar.gz '/opt/couchbase/var/lib/couchbase/data' >/dev/null 2>&1
2014-07-10 16:04:28 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully


2014-07-10 16:04:28 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.3.5.44: rm -f /root/10.3.5.44-7102014-164-couch.tar.gz
2014-07-10 16:04:28 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
2014-07-10 16:04:28 | INFO | MainProcess | test_thread | [xdcrbasetests.tearDown] ============== XDCRbasetests cleanup is started for test #1 cbrecover_multiple_autofailover_swapout_reb_routine ==============
2014-07-10 16:04:28 | INFO | MainProcess | test_thread | [rest_client.remove_remote_cluster] removing remote cluster name:cluster1
2014-07-10 16:04:28 | INFO | MainProcess | test_thread | [rest_client.remove_remote_cluster] removing remote cluster name:cluster0
2014-07-10 16:04:29 | INFO | MainProcess | test_thread | [xdcrbasetests._do_cleanup] cleanup cluster1: [ip:172.23.106.71 port:8091 ssh_username:root, ip:172.23.106.72 port:8091 ssh_username:root, ip:172.23.106.73 port:8091 ssh_username:root]
2014-07-10 16:04:29 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [u'default'] on 172.23.106.71
2014-07-10 16:04:29 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] remove bucket default ...
2014-07-10 16:04:40 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleted bucket : default from 172.23.106.71
2014-07-10 16:04:40 | INFO | MainProcess | test_thread | [bucket_helper.wait_for_bucket_deletion] waiting for bucket deletion to complete....
2014-07-10 16:04:40 | INFO | MainProcess | test_thread | [rest_client.bucket_exists] existing buckets : []
2014-07-10 16:04:40 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] rebalancing all nodes in order to remove nodes
2014-07-10 16:04:40 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=ns_1%40172.23.106.72%2Cns_1%40172.23.106.73&user=Administrator&knownNodes=ns_1%40172.23.106.71%2Cns_1%40172.23.106.72%2Cns_1%40172.23.106.73
2014-07-10 16:04:40 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance operation started
2014-07-10 16:04:45 | INFO | MainProcess | test_thread | [rest_client.monitorRebalance] rebalance progress took 5.14277410507 seconds
2014-07-10 16:04:45 | INFO | MainProcess | test_thread | [rest_client.monitorRebalance] sleep for 5.14277410507 seconds after rebalance...
2014-07-10 16:04:53 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.72:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 16:04:54 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.72:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 16:04:55 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.72:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 16:04:56 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.72:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 16:04:57 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.73:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 16:04:58 | ERROR | MainProcess | test_thread | [rest_client._http_request] socket error while connecting to http://172.23.106.73:8091/nodes/self error [Errno 61] Connection refused
2014-07-10 16:04:59 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] removed all the nodes from cluster associated with ip:172.23.106.71 port:8091 ssh_username:root ? [(u'ns_1@172.23.106.72', 8091), (u'ns_1@172.23.106.73', 8091)]
2014-07-10 16:04:59 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 172.23.106.71:8091
2014-07-10 16:05:00 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 172.23.106.71:8091 is running
2014-07-10 16:05:00 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [] on 172.23.106.72
2014-07-10 16:05:00 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 172.23.106.72:8091
2014-07-10 16:05:00 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 172.23.106.72:8091 is running
2014-07-10 16:05:00 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [] on 172.23.106.73
2014-07-10 16:05:01 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 172.23.106.73:8091
2014-07-10 16:05:01 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 172.23.106.73:8091 is running
2014-07-10 16:05:01 | INFO | MainProcess | test_thread | [xdcrbasetests._do_cleanup] cleanup cluster2: [ip:10.3.121.106 port:8091 ssh_username:root, ip:10.3.121.103 port:8091 ssh_username:root, ip:10.3.5.43 port:8091 ssh_username:root]
2014-07-10 16:05:01 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [u'default'] on 10.3.121.106
2014-07-10 16:05:01 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] remove bucket default ...
2014-07-10 16:05:04 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleted bucket : default from 10.3.121.106
2014-07-10 16:05:04 | INFO | MainProcess | test_thread | [bucket_helper.wait_for_bucket_deletion] waiting for bucket deletion to complete....
2014-07-10 16:05:04 | INFO | MainProcess | test_thread | [rest_client.bucket_exists] existing buckets : []
2014-07-10 16:05:04 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] rebalancing all nodes in order to remove nodes
2014-07-10 16:05:04 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=ns_1%40172.23.106.74%2Cns_1%4010.3.5.44&user=Administrator&knownNodes=ns_1%4010.3.121.106%2Cns_1%40172.23.106.74%2Cns_1%4010.3.5.44
2014-07-10 16:05:04 | INFO | MainProcess | test_thread | [rest_client.rebalance] rebalance operation started
2014-07-10 16:05:04 | INFO | MainProcess | test_thread | [rest_client._rebalance_progress] rebalance percentage : 0 %
2014-07-10 16:05:14 | INFO | MainProcess | test_thread | [rest_client.monitorRebalance] rebalance progress took 10.0199129581 seconds
2014-07-10 16:05:14 | INFO | MainProcess | test_thread | [rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [cluster_helper.cleanup_cluster] removed all the nodes from cluster associated with ip:10.3.121.106 port:8091 ssh_username:root ? [(u'ns_1@172.23.106.74', 8091), (u'ns_1@10.3.5.44', 8091)]
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 10.3.121.106:8091
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 10.3.121.106:8091 is running
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [] on 10.3.121.103
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 10.3.121.103:8091
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 10.3.121.103:8091 is running
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [bucket_helper.delete_all_buckets_or_assert] deleting existing buckets [] on 10.3.5.43
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] waiting for ns_server @ 10.3.5.43:8091
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [cluster_helper.wait_for_ns_servers_or_assert] ns_server @ 10.3.5.43:8091 is running
2014-07-10 16:05:24 | INFO | MainProcess | test_thread | [xdcrbasetests.tearDown] ============== XDCRbasetests cleanup is finished for test #1 cbrecover_multiple_autofailover_swapout_reb_routine ==============
Cluster instance shutdown with force

======================================================================
FAIL: cbrecover_multiple_autofailover_swapout_reb_routine (cbRecoverytests.cbrecovery)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pytests/cbRecoverytests.py", line 520, in cbrecover_multiple_autofailover_swapout_reb_routine
    self.common_tearDown_verification()
  File "pytests/cbRecoverytests.py", line 251, in common_tearDown_verification
    self.verify_results(verify_src=True)
  File "pytests/xdcr/xdcrbasetests.py", line 1428, in verify_results
    self.verify_xdcr_stats(src_nodes, dest_nodes, verify_src)
  File "pytests/xdcr/xdcrbasetests.py", line 1412, in verify_xdcr_stats
    self.fail("Mismatches on Meta Information on xdcr-replicated items!")
AssertionError: Mismatches on Meta Information on xdcr-replicated items!

----------------------------------------------------------------------
Ran 1 test in 1263.856s

FAILED (failures=1)
summary so far suite cbRecoverytests.cbrecovery , pass 0 , fail 1
failures so far...
cbRecoverytests.cbrecovery.cbrecover_multiple_autofailover_swapout_reb_routine
testrunner logs, diags and results are available under /Users/ashvinder/mygit/testrunner/logs/testrunner-14-Jul-10_15-44-20/test_1
Run after suite setup for cbRecoverytests.cbrecovery.cbrecover_multiple_autofailover_swapout_reb_routine
Comment by David Liao [ 10/Jul/14 ]
The new failure is not due to a metadata mismatch but to an item count mismatch, so the cbrecovery step can't be skipped.
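For illustration only (not part of the ticket or the test framework): a minimal sketch, assuming default Administrator credentials and one of the source nodes listed above, of how the active and replica item counts can be compared via the per-bucket stats REST endpoint. With one replica configured, the two counts should converge once replication and rebalance settle; a persistent gap is the kind of mismatch reported here.

import requests

# Hypothetical values -- node address and credentials are assumptions.
BASE = "http://10.3.121.106:8091"
AUTH = ("Administrator", "password")

# /pools/default/buckets/<bucket>/stats returns recent samples per stat.
samples = requests.get(BASE + "/pools/default/buckets/default/stats",
                       auth=AUTH).json()["op"]["samples"]

active = samples["curr_items"][-1]               # latest active item count
replica = samples["vb_replica_curr_items"][-1]   # latest replica item count

# A persistent difference between these two numbers after the cluster has
# settled corresponds to the "Not Ready" warnings in the test log above.
print("active=%d replica=%d" % (active, replica))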




[MB-11685] windows uninstall failed to remove files Created: 10/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thuan Nguyen Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 2008 R2 64bit

Attachments: PNG File ss_2014-07-10_at_5.48.41 PM.png    
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Install Couchbase Server 3.0.0-936 on Windows Server 2008 R2 64-bit.
Uninstall Couchbase Server 3.0.0-936. The uninstall process finished, but it did not delete the Couchbase Server files under c:/Program Files/Couchbase/Server.

Install Couchbase Server 3.0.0-949 on Windows Server 2008 R2 64-bit.
Uninstall Couchbase Server 3.0.0-949. Got the same issue: none of the files are deleted.
IP of the Windows server: 10.1.2.92
The VM is available for debugging.





[MB-8685] add UI help text to clarify flush for memcached (was: Flushing memcached bucket does not set Item Count to 0) Created: 23/Jul/13  Updated: 10/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.1.0, 2.2.0, 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Brent Woodruff Assignee: Pavel Blagodov
Resolution: Unresolved Votes: 0
Labels: ns_server-story
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
When a memcached bucket is flushed through the UI, the Item Count does not become 0.

Steps to reproduce:
* Create a memcached bucket with Flush enabled.
* Put data in the memcached bucket so that Item Count in the UI is > 0.
* Using the UI, flush the memcached bucket.
* Note that Item Count is > 0.
* Note that if Gets are performed on the data, the Item Count will be reduced.

My understanding is that when a flush is run on a memcached bucket, data is not actually removed; instead, all items are marked as expired. Since the data is not actually removed, it may be technically correct for the Item Count to remain non-zero, but it is confusing to report this in the UI.

The suggestion is that either the Item Count be set to 0 following a successful flush of a memcached bucket, since the items are 'effectively' removed, or that a help note / explanation be linked to or otherwise provided following the flush of a memcached bucket.
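A small illustrative sketch of the behaviour described above (not part of this ticket), using the python-memcached client against a memcached-type bucket; the host, port and key are assumptions and would need to match the cluster's memcached bucket proxy port.

import memcache

# Assumed host/port for a memcached-type bucket; adjust for your cluster.
mc = memcache.Client(["10.3.121.106:11211"])

mc.set("example_key", "value")
print(mc.get_stats())          # curr_items > 0 before the flush

mc.flush_all()                 # what the UI "Flush" button effectively does
                               # for a memcached bucket

print(mc.get("example_key"))   # None: the item is expired and not retrievable
print(mc.get_stats())          # curr_items may still be > 0 -- the behaviour
                               # this ticket asks the UI help text to explain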

 Comments   
Comment by Aleksey Kondratenko [ 23/Jul/13 ]
Brent, I know this is a bit unexpected, especially in comparison to the couchbase bucket type.

However, that's actually how memcached behaves on flush. If you send flush to memcached (ours or upstream), it will merely expire all items by a clever trick, so the memory and items stats will not drop to 0. But all GETs will return nothing, and memory will be released on an as-needed basis.
Comment by Brent Woodruff [ 24/Jul/13 ]
Can we 'fix' this by adding some help or a tip to the UI? Or changing the 'Flush' button for memcached buckets to 'Expire all items'?
Comment by Aleksey Kondratenko [ 24/Jul/13 ]
That looks like a decent idea. Assigning to our PM to finalize the decision and prioritize.
Comment by Anil Kumar [ 24/Jul/13 ]
Yes, I agree that's a good idea. I'll track this for the next release cycle.
Comment by Perry Krug [ 25/Jul/13 ]
Honestly, I don't agree with part of that approach. I realize there is a bit of a problem/confusion here for the user, but I don't think that changing from "flush" to "expire" will solve that. We need to keep in mind that "flush" is a very common term within the memcached (and Couchbase) community, and that our REST API, CLI and client SDKs refer to "flush". Changing everything would be a big pain and make a number of APIs incompatible, and not changing everything would put different terms in different places for the same functionality. Either way will still result in questions to support...which is exactly what we're trying to address here.

If I may propose: upon clicking the "flush" button, the popup should just have a note that says "Note: flushing memcached buckets does not immediately clear all items or memory; rather, it invalidates all data, which will then be removed through eviction and/or further application access". And then also add the right descriptions to our documentation.
Comment by Anil Kumar [ 10/Feb/14 ]
We will add a "What's this?" link with a description explaining what Flush means in the case of Couchbase and Memcached buckets.

Comment by Aleksey Kondratenko [ 10/Feb/14 ]
Anil, please add the exact text message and a better description of the UI change (where to add this text and how it will look).
Comment by Maria McDuff (Inactive) [ 19/May/14 ]
Raising to Major for 3.0 - assigning to Anil to provide Alk's request.
Comment by Anil Kumar [ 04/Jun/14 ]
Triage - June 04 2014 Alk, Wayne, Parag, Anil
Comment by Ruth Harris [ 24/Jun/14 ]
Just to clarify, what happens when flush is enabled via the UI?
    couchbase-cli bucket-edit --enable-flush=[0|1]?
    then couchbase-cli bucket_flush?


Could we clarify the REST commands and process?
The REST documentation mentions using either the UI or the CLI; however, it recommends using cbepctl flush_param rather than bucket-create/edit --enable-flush.

Also, could someone address MB-10985? It's related:
Per www.couchbase.com/issues/browse/MB-10985, the CLI flushall_enabled parameter is supposed to be deprecated.
See http://docs.couchbase.com/couchbase-manual-2.5/cb-cli/#enabling-flush-of-data-buckets---will-be-deprecated
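For reference, a hedged sketch of the REST calls that correspond to the CLI steps Ruth mentions (enable flush on the bucket, then issue the flush). The endpoint paths are the standard ns_server ones; the host, credentials and the exact parameter set accepted by a bucket edit are assumptions here, not confirmed by this ticket.

import requests

# Hypothetical values -- host/credentials assumed; a real bucket edit may
# require re-posting the bucket's existing settings (e.g. ramQuotaMB) too.
BASE = "http://10.3.121.106:8091"
AUTH = ("Administrator", "password")

# Step 1: enable the flush capability on the bucket (flushEnabled=1),
# the REST counterpart of `couchbase-cli bucket-edit --enable-flush=1`.
requests.post(BASE + "/pools/default/buckets/default",
              auth=AUTH, data={"flushEnabled": 1}).raise_for_status()

# Step 2: flush the bucket, the counterpart of `couchbase-cli bucket_flush`.
requests.post(BASE + "/pools/default/buckets/default/controller/doFlush",
              auth=AUTH).raise_for_status()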
Comment by Ruth Harris [ 09/Jul/14 ]
UI help text for "What's this?"

Enable flush allows you to flush data from a bucket. When flushed, items in memcached buckets are flagged for removal and subsequently removed; items in couchbase buckets are immediately removed.

Comment by Anil Kumar [ 09/Jul/14 ]
Pavel - please check Ruth's comment for the help text.
Comment by Brent Woodruff [ 10/Jul/14 ]
I'd like to chime in and say that I don't believe the currently proposed text solves the issue because it contains some ambiguous wording. "flagged for removal and subsequently removed" still sounds like the end user should expect the item count to become zero upon flushing a memcached bucket. The technical term "expired" needs to be in there and it needs to be explicit that no change in the item count for memcached buckets is expected.

Suggested text:

"Enable flush allows you to flush data from a bucket. If the bucket is a couchbase bucket, then all items are immediately removed and the item count immediately becomes zero. If the bucket is a memcached bucket, then all items are immediately marked as expired and, while the item count will not immediately become zero, the expired items cannot be retrieved and will be removed as normal operations on the memcached bucket continue."
Comment by Anil Kumar [ 10/Jul/14 ]
Brent - we can always go back to our documentation to read more about it. The goal here isn't to replace the documentation with the help text; it's a quick reference.

I have created separate ticket MB-11684 to fix that in our documentation.

For the help text let's go with the short version - "Enable flush allows you to flush data from a bucket. When flushed, items in memcached buckets are flagged for removal and subsequently removed; items in couchbase buckets are immediately removed."




[MB-11684] Clarify FLUSH on Couchbase bucket vs Memcached bucket Created: 10/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Clarify FLUSH on Couchbase bucket vs Memcached bucket -

"Enable flush allows you to flush data from a bucket. If the bucket is a couchbase bucket, then all items are immediately removed and the item count immediately becomes zero. If the bucket is a memcached bucket, then all items are immediately marked as expired and, while the item count will not immediately become zero, the expired items cannot be retrieved and will be removed as normal operations on the memcached bucket continue."




[MB-11629] Memcached crashed during rebalance Created: 03/Jul/14  Updated: 10/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sangharsh Agarwal Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-918
ubuntu 12.04, 64 bit

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
http://qa.hq.northscale.net/job/ubuntu_x64--01_02--rebalanceXDCR-P0/19/consoleFull


[Core Logs]
Basic crash dump analysis of /tmp//core.memcached.21508.

Please send the file to support@couchbase.com

--------------------------------------------------------------------------------
File information:
-rwxr-xr-x 1 couchbase couchbase 4958595 2014-07-01 19:00 /opt/couchbase/bin/memcached
6cd323a6609b29186b45436c840e7580 /opt/couchbase/bin/memcached
-rw------- 1 couchbase couchbase 332505088 2014-07-02 17:38 /tmp//core.memcached.21508
5b2434f7bc783b86c7d165249c783519 /tmp//core.memcached.21508
--------------------------------------------------------------------------------
Core file callstacks:
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/couchbase/bin/memcached...done.
[New Thread 21511]
[New Thread 21512]
[New Thread 23321]
[New Thread 25659]
[New Thread 29947]
[New Thread 21508]
[New Thread 29948]
[New Thread 21509]
[New Thread 21510]
[New Thread 29934]
[New Thread 29950]
[New Thread 27817]
[New Thread 21514]
[New Thread 29949]
[New Thread 21515]
[New Thread 21513]

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0
Reading symbols from /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1...done.
Loaded symbols for /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1
Reading symbols from /opt/couchbase/bin/../lib/libplatform.so.0.1.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/libplatform.so.0.1.0
Reading symbols from /opt/couchbase/bin/../lib/libcJSON.so.1.0.0...done.
Loaded symbols for /opt/couchbase/bin/../lib/libcJSON.so.1.0.0
Reading symbols from /opt/couchbase/bin/../lib/libJSON_checker.so...done.
Loaded symbols for /opt/couchbase/bin/../lib/libJSON_checker.so
Reading symbols from /opt/couchbase/bin/../lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/bin/../lib/libsnappy.so.1
Reading symbols from /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4...done.
Loaded symbols for /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4
Reading symbols from /opt/couchbase/bin/../lib/libevent_core-2.0.so.5...done.
Loaded symbols for /opt/couchbase/bin/../lib/libevent_core-2.0.so.5
Reading symbols from /lib/libssl.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /lib/libssl.so.0.9.8
Reading symbols from /lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.0.9.8
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/ep.so
Reading symbols from /opt/couchbase/lib/libcouchstore.so...done.
Loaded symbols for /opt/couchbase/lib/libcouchstore.so
Reading symbols from /opt/couchbase/lib/libdirutils.so.0.1.0...done.
Loaded symbols for /opt/couchbase/lib/libdirutils.so.0.1.0
Reading symbols from /opt/couchbase/lib/libv8.so...done.
Loaded symbols for /opt/couchbase/lib/libv8.so
Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicui18n.so.44
Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44
Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicudata.so.44
Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
Program terminated with signal 6, Aborted.
#0 0x00007f4c6455ca75 in raise () from /lib/libc.so.6

Thread 16 (Thread 21513):
#0 0x00007f4c646102d3 in epoll_wait () from /lib/libc.so.6
#1 0x00007f4c65c875a6 in epoll_dispatch (base=0x6312780,
    tv=<value optimized out>) at epoll.c:404
#2 0x00007f4c65c72a04 in event_base_loop (base=0x6312780,
    flags=<value optimized out>) at event.c:1558
#3 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5e110)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 15 (Thread 21515):
#0 0x00007f4c646102d3 in epoll_wait () from /lib/libc.so.6
#1 0x00007f4c65c875a6 in epoll_dispatch (base=0x6312c80,
    tv=<value optimized out>) at epoll.c:404
#2 0x00007f4c65c72a04 in event_base_loop (base=0x6312c80,
    flags=<value optimized out>) at event.c:1558
#3 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5e130)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 14 (Thread 29949):
#0 0x00007f4c65471bc9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f4c666f21cb in cb_cond_timedwait (cond=0x64c8058, mutex=0x64c8020,
    ms=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f4c5fa2c5ff in SyncObject::wait (this=0x64c8018,
    tv=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/syncobject.h:74
#3 0x00007f4c5fa27cfc in ExecutorPool::trySleep (this=0x64c8000, t=...,
    now=...)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:190
#4 0x00007f4c5fa28066 in ExecutorPool::_nextTask (this=0x64c8000, t=...,
    tick=8 '\b')
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:155
#5 0x00007f4c5fa280bf in ExecutorPool::nextTask (this=0x64c8000, t=...,
    tick=98 'b')
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:165
#6 0x00007f4c5fa39362 in ExecutorThread::run (this=0xb338760)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorthread.cc:77
#7 0x00007f4c5fa397ad in launch_executor_thread (arg=0x64c805c)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#8 0x00007f4c666f20df in platform_thread_wrap (arg=0x9045ef0)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#9 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#10 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 13 (Thread 21514):
#0 0x00007f4c646102d3 in epoll_wait () from /lib/libc.so.6
#1 0x00007f4c65c875a6 in epoll_dispatch (base=0x6312a00,
    tv=<value optimized out>) at epoll.c:404
#2 0x00007f4c65c72a04 in event_base_loop (base=0x6312a00,
    flags=<value optimized out>) at event.c:1558
#3 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5e120)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 12 (Thread 27817):
#0 0x00007f4c6547185c in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f4c62d00034 in engine_shutdown_thread (arg=0x6325180)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:1610
#2 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5f700)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#3 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#4 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 11 (Thread 29950):
#0 0x00007f4c65471bc9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f4c666f21cb in cb_cond_timedwait (cond=0x64c8058, mutex=0x64c8020,
    ms=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f4c5fa2c5ff in SyncObject::wait (this=0x64c8018,
    tv=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/syncobject.h:74
#3 0x00007f4c5fa27cfc in ExecutorPool::trySleep (this=0x64c8000, t=...,
    now=...)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:190
#4 0x00007f4c5fa28066 in ExecutorPool::_nextTask (this=0x64c8000, t=...,
    tick=16 '\020')
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:155
#5 0x00007f4c5fa280bf in ExecutorPool::nextTask (this=0x64c8000, t=...,
    tick=97 'a')
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:165
#6 0x00007f4c5fa39362 in ExecutorThread::run (this=0xb338260)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorthread.cc:77
#7 0x00007f4c5fa397ad in launch_executor_thread (arg=0x64c805c)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#8 0x00007f4c666f20df in platform_thread_wrap (arg=0x9045920)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#9 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#10 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 10 (Thread 29934):
#0 0x00007f4c6547185c in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f4c62d00034 in engine_shutdown_thread (arg=0xa0a3e20)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:1610
#2 0x00007f4c666f20df in platform_thread_wrap (arg=0xafe2e90)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#3 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#4 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 9 (Thread 21510):
#0 0x00007f4c65471bc9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f4c666f21cb in cb_cond_timedwait (cond=0x7f4c6390e240,
    mutex=0x7f4c6390e200, ms=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f4c6370d1e8 in logger_thead_main (arg=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/extensions/loggers/file_logger.c:372
#3 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5e070)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 8 (Thread 21509):
#0 0x00007f4c64601a6d in read () from /lib/libc.so.6
#1 0x00007f4c6459c598 in _IO_file_underflow () from /lib/libc.so.6
#2 0x00007f4c6459e13e in _IO_default_uflow () from /lib/libc.so.6
#3 0x00007f4c6459268e in _IO_getline_info () from /lib/libc.so.6
#4 0x00007f4c64591579 in fgets () from /lib/libc.so.6
#5 0x00007f4c64110a91 in fgets (arg=<value optimized out>)
    at /usr/include/bits/stdio2.h:255
#6 check_stdin_thread (arg=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/extensions/daemon/stdin_check.c:38
#7 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5e060)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#8 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#9 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()

Thread 7 (Thread 29948):
#0 0x00007f4c645af645 in mempcpy () from /lib/libc.so.6
#1 0x00007f4c6459ef8e in _IO_default_xsputn () from /lib/libc.so.6
#2 0x00007f4c6456ea10 in vfprintf () from /lib/libc.so.6
#3 0x00007f4c64626d30 in __vsnprintf_chk () from /lib/libc.so.6
#4 0x00007f4c64626c6a in __snprintf_chk () from /lib/libc.so.6
#5 0x00007f4c5f9e6bc5 in snprintf (this=0x6ec5698,
    add_stat=0x4099e0 <append_stats>, cookie=0x62a5b00)
    at /usr/include/bits/stdio2.h:66
#6 CheckpointManager::addStats (this=0x6ec5698,
    add_stat=0x4099e0 <append_stats>, cookie=0x62a5b00)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/checkpoint.cc:1560
#7 0x00007f4c5fa24de3 in StatCheckpointVisitor::addCheckpointStat(void const*, void (*)(char const*, unsigned short, char const*, unsigned int, void const*), EventuallyPersistentStore*, RCPtr<VBucket>&) ()
   from /opt/couchbase/lib/memcached/ep.so
#8 0x00007f4c5fa24eb8 in StatCheckpointVisitor::visitBucket(RCPtr<VBucket>&)
    () from /opt/couchbase/lib/memcached/ep.so
#9 0x00007f4c5f9f0ffc in EventuallyPersistentStore::visit (
    this=<value optimized out>, visitor=...)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/ep.cc:2784
#10 0x00007f4c5fa26f68 in StatCheckpointTask::run() ()
   from /opt/couchbase/lib/memcached/ep.so
#11 0x00007f4c5fa39401 in ExecutorThread::run (this=0xb338440)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorthread.cc:95
#12 0x00007f4c5fa397ad in launch_executor_thread (arg=0x7f4c5c476998)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#13 0x00007f4c666f20df in platform_thread_wrap (arg=0x9045e20)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#14 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#15 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()

Thread 6 (Thread 21508):
#0 0x00007f4c646102d3 in epoll_wait () from /lib/libc.so.6
#1 0x00007f4c65c875a6 in epoll_dispatch (base=0x6312000,
    tv=<value optimized out>) at epoll.c:404
#2 0x00007f4c65c72a04 in event_base_loop (base=0x6312000,
    flags=<value optimized out>) at event.c:1558
#3 0x00000000004108d4 in main (argc=<value optimized out>,
    argv=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/daemon/memcached.c:8768

Thread 5 (Thread 29947):
#0 0x00007f4c65471bc9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f4c666f21cb in cb_cond_timedwait (cond=0x64c8058, mutex=0x64c8020,
    ms=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007f4c5fa2c5ff in SyncObject::wait (this=0x64c8018,
    tv=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/syncobject.h:74
#3 0x00007f4c5fa27cfc in ExecutorPool::trySleep (this=0x64c8000, t=...,
    now=...)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:190
#4 0x00007f4c5fa28066 in ExecutorPool::_nextTask (this=0x64c8000, t=...,
    tick=0 '\000')
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:155
#5 0x00007f4c5fa280bf in ExecutorPool::nextTask (this=0x64c8000, t=...,
    tick=99 'c')
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorpool.cc:165
#6 0x00007f4c5fa39362 in ExecutorThread::run (this=0xb337d60)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorthread.cc:77
#7 0x00007f4c5fa397ad in launch_executor_thread (arg=0x64c805c)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/executorthread.cc:33
#8 0x00007f4c666f20df in platform_thread_wrap (arg=0x9045c10)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#9 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#10 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 4 (Thread 25659):
#0 0x00007f4c6547185c in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f4c62d00034 in engine_shutdown_thread (arg=0x63242a0)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:1610
#2 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5ee40)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#3 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#4 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 3 (Thread 23321):
#0 0x00007f4c645d369d in nanosleep () from /lib/libc.so.6
#1 0x00007f4c64608df4 in usleep () from /lib/libc.so.6
#2 0x00007f4c5fa37905 in updateStatsThread (arg=<value optimized out>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/memory_tracker.cc:36
#3 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5e1f0)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 2 (Thread 21512):
#0 0x00007f4c646102d3 in epoll_wait () from /lib/libc.so.6
#1 0x00007f4c65c875a6 in epoll_dispatch (base=0x6312500,
    tv=<value optimized out>) at epoll.c:404
#2 0x00007f4c65c72a04 in event_base_loop (base=0x6312500,
    flags=<value optimized out>) at event.c:1558
#3 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5e100)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 1 (Thread 21511):
#0 0x00007f4c6455ca75 in raise () from /lib/libc.so.6
#1 0x00007f4c645605c0 in abort () from /lib/libc.so.6
#2 0x00007f4c64555941 in __assert_fail () from /lib/libc.so.6
#3 0x00000000004083ec in decrement_session_ctr ()
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/daemon/memcached.c:7731
#4 0x00007f4c5fa1f0be in EventuallyPersistentEngine::decrementSessionCtr (
    this=0x6360800)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/ep_engine.h:510
#5 0x00007f4c5fa1df0b in processUnknownCommand (h=0x6360800,
    cookie=0x617d800, request=0x6258000,
    response=0x40cb20 <binary_response_handler>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/ep_engine.cc:1193
#6 0x00007f4c5fa1ee0c in EvpUnknownCommand (handle=0x6360800,
    cookie=0x617d800, request=0x6258000,
    response=0x40cb20 <binary_response_handler>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/ep-engine/src/ep_engine.cc:1260
#7 0x00007f4c62d02bb3 in bucket_unknown_command (handle=0x7f4c62f09220,
    cookie=0x617d800, request=0x6258000,
    response=0x40cb20 <binary_response_handler>)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/engines/bucket_engine/bucket_engine.c:3215
#8 0x00000000004191ca in process_bin_unknown_packet (c=0x617d800)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/daemon/memcached.c:2663
#9 process_bin_packet (c=0x617d800)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/daemon/memcached.c:5392
#10 complete_nread (c=0x617d800)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/daemon/memcached.c:5796
#11 conn_nread (c=0x617d800)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/daemon/memcached.c:7003
#12 0x000000000040c73d in event_handler (fd=<value optimized out>,
    which=<value optimized out>, arg=0x617d800)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/memcached/daemon/memcached.c:7276
#13 0x00007f4c65c72afc in event_process_active_single_queue (base=0x6312280,
    flags=<value optimized out>) at event.c:1308
#14 event_process_active (base=0x6312280, flags=<value optimized out>)
    at event.c:1375
#15 event_base_loop (base=0x6312280, flags=<value optimized out>)
    at event.c:1572
#16 0x00007f4c666f20df in platform_thread_wrap (arg=0x1a5e0f0)
    at /home/buildbot/ubuntu-1004-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#17 0x00007f4c6546c9ca in start_thread () from /lib/libpthread.so.0
#18 0x00007f4c6460fcdd in clone () from /lib/libc.so.6
#19 0x0000000000000000 in ?? ()
--------------------------------------------------------------------------------
Module information:
/opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0:
-rw-r--r-- 1 couchbase couchbase 415163 2014-07-01 19:01 /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0
b5f73dc290899e88fded946fde6a5244 /opt/couchbase/bin/../lib/memcached/libmcd_util.so.1.0.0
/opt/couchbase/bin/../lib/libcbsasl.so.1.1.1:
-rw-r--r-- 1 couchbase couchbase 412706 2014-07-01 19:01 /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1
bf540b44de8c56f1bcab31c97edf825d /opt/couchbase/bin/../lib/libcbsasl.so.1.1.1
/opt/couchbase/bin/../lib/libplatform.so.0.1.0:
-rw-r--r-- 1 couchbase couchbase 556937 2014-07-01 19:01 /opt/couchbase/bin/../lib/libplatform.so.0.1.0
852ed58c53d2c9e69cef21d27e08370b /opt/couchbase/bin/../lib/libplatform.so.0.1.0
/opt/couchbase/bin/../lib/libcJSON.so.1.0.0:
-rw-r--r-- 1 couchbase couchbase 95332 2014-07-01 19:01 /opt/couchbase/bin/../lib/libcJSON.so.1.0.0
6e1bac6b7025efd048042f0da34841f5 /opt/couchbase/bin/../lib/libcJSON.so.1.0.0
/opt/couchbase/bin/../lib/libJSON_checker.so:
-rw-r--r-- 1 couchbase couchbase 35648 2014-07-01 19:01 /opt/couchbase/bin/../lib/libJSON_checker.so
1fd90948b5c1feb47b290e8d1714f4ac /opt/couchbase/bin/../lib/libJSON_checker.so
/opt/couchbase/bin/../lib/libsnappy.so.1:
lrwxrwxrwx 1 couchbase couchbase 18 2014-07-01 22:35 /opt/couchbase/bin/../lib/libsnappy.so.1 -> libsnappy.so.1.1.2
7c50c2a147ab8247b5a2b61a38604ccc /opt/couchbase/bin/../lib/libsnappy.so.1
/opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4:
lrwxrwxrwx 1 couchbase couchbase 28 2014-07-01 22:35 /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4 -> libtcmalloc_minimal.so.4.2.1
a0669ae75ee5ae5352dd7c2704adf766 /opt/couchbase/bin/../lib/libtcmalloc_minimal.so.4
/opt/couchbase/bin/../lib/libevent_core-2.0.so.5:
lrwxrwxrwx 1 couchbase couchbase 26 2014-07-01 22:35 /opt/couchbase/bin/../lib/libevent_core-2.0.so.5 -> libevent_core-2.0.so.5.1.0
7c23d736254e9dc999fcad4782e986e5 /opt/couchbase/bin/../lib/libevent_core-2.0.so.5
/lib/libssl.so.0.9.8:
-rw-r--r-- 1 root root 333904 2012-04-24 08:29 /lib/libssl.so.0.9.8
d32b70676a28d4b191a859f5621d489d /lib/libssl.so.0.9.8
/lib/libcrypto.so.0.9.8:
-rw-r--r-- 1 root root 1622304 2012-04-24 08:29 /lib/libcrypto.so.0.9.8
34885593167472209511acfacef5a962 /lib/libcrypto.so.0.9.8
/lib/libpthread.so.0:
lrwxrwxrwx 1 root root 20 2012-05-24 11:03 /lib/libpthread.so.0 -> libpthread-2.11.1.so
f636a70138b6fbae3d067d560725a238 /lib/libpthread.so.0
/lib/libdl.so.2:
lrwxrwxrwx 1 root root 15 2012-05-24 11:03 /lib/libdl.so.2 -> libdl-2.11.1.so
b000e5ace83cb232171a8c973c443299 /lib/libdl.so.2
/lib/librt.so.1:
lrwxrwxrwx 1 root root 15 2012-05-24 11:03 /lib/librt.so.1 -> librt-2.11.1.so
28d8634d49bbe02e93af23a7adc632e8 /lib/librt.so.1
/usr/lib/libstdc++.so.6:
lrwxrwxrwx 1 root root 19 2012-05-24 11:03 /usr/lib/libstdc++.so.6 -> libstdc++.so.6.0.13
778aaa89a71cdc1f55b5986a9a9e3499 /usr/lib/libstdc++.so.6
/lib/libm.so.6:
lrwxrwxrwx 1 root root 14 2012-05-24 11:03 /lib/libm.so.6 -> libm-2.11.1.so
77385c4ce7ee521e4c38fb9cc13f534e /lib/libm.so.6
/lib/libgcc_s.so.1:
-rw-r--r-- 1 root root 92552 2012-03-08 20:17 /lib/libgcc_s.so.1
64bedbbb11f8452c9811c7d60c56b49a /lib/libgcc_s.so.1
/lib/libc.so.6:
lrwxrwxrwx 1 root root 14 2012-05-24 11:03 /lib/libc.so.6 -> libc-2.11.1.so
940dc8606a7975374e0be61e44c12fa8 /lib/libc.so.6
/lib/libz.so.1:
lrwxrwxrwx 1 root root 15 2012-05-24 10:41 /lib/libz.so.1 -> libz.so.1.2.3.3
6abd7af4f2752f371b0ecb7cc601c3ac /lib/libz.so.1
/lib64/ld-linux-x86-64.so.2:
lrwxrwxrwx 1 root root 12 2012-05-24 11:03 /lib64/ld-linux-x86-64.so.2 -> ld-2.11.1.so
c024251e1af3963fbb3fcef25d58410e /lib64/ld-linux-x86-64.so.2
/opt/couchbase/lib/memcached/stdin_term_handler.so:
-rw-r--r-- 1 couchbase couchbase 119503 2014-07-01 19:01 /opt/couchbase/lib/memcached/stdin_term_handler.so
2ed71f63866e3ab453c975ecdd722b3b /opt/couchbase/lib/memcached/stdin_term_handler.so
/opt/couchbase/lib/memcached/file_logger.so:
-rw-r--r-- 1 couchbase couchbase 143927 2014-07-01 19:01 /opt/couchbase/lib/memcached/file_logger.so
77dea259b81310f73ea9798dd8f60e5d /opt/couchbase/lib/memcached/file_logger.so
/opt/couchbase/lib/memcached/bucket_engine.so:
-rw-r--r-- 1 couchbase couchbase 421096 2014-07-01 19:01 /opt/couchbase/lib/memcached/bucket_engine.so
f7d424c7f5a853fd5f84b6cffcfb1b38 /opt/couchbase/lib/memcached/bucket_engine.so
/opt/couchbase/lib/memcached/ep.so:
-rw-r--r-- 1 couchbase couchbase 18673956 2014-07-01 19:01 /opt/couchbase/lib/memcached/ep.so
c30a26c281056fd381a09425633277fd /opt/couchbase/lib/memcached/ep.so
/opt/couchbase/lib/libcouchstore.so:
-rw-r--r-- 1 couchbase couchbase 3933395 2014-07-01 19:01 /opt/couchbase/lib/libcouchstore.so
66dd2e0d770234cddd60446cfd95f867 /opt/couchbase/lib/libcouchstore.so
/opt/couchbase/lib/libdirutils.so.0.1.0:
-rw-r--r-- 1 couchbase couchbase 181445 2014-07-01 19:01 /opt/couchbase/lib/libdirutils.so.0.1.0
de8c58cc7f1bc7a989e66becfefb2db2 /opt/couchbase/lib/libdirutils.so.0.1.0
/opt/couchbase/lib/libv8.so:
-rw-r--r-- 1 couchbase couchbase 116909487 2014-07-01 19:01 /opt/couchbase/lib/libv8.so
02be0eea832a546d2493cf530b4535ad /opt/couchbase/lib/libv8.so
/opt/couchbase/lib/libicui18n.so.44:
lrwxrwxrwx 1 couchbase couchbase 18 2014-07-01 22:35 /opt/couchbase/lib/libicui18n.so.44 -> libicui18n.so.44.0
c7b26e773e42456459912c214e35c571 /opt/couchbase/lib/libicui18n.so.44
/opt/couchbase/lib/libicuuc.so.44:
lrwxrwxrwx 1 couchbase couchbase 16 2014-07-01 22:35 /opt/couchbase/lib/libicuuc.so.44 -> libicuuc.so.44.0
5183a8c242bef1c814ad4eb30308f779 /opt/couchbase/lib/libicuuc.so.44
/opt/couchbase/lib/libicudata.so.44:
lrwxrwxrwx 1 couchbase couchbase 18 2014-07-01 22:35 /opt/couchbase/lib/libicudata.so.44 -> libicudata.so.44.0
93c3bc2bf90a238425388d76543168a0 /opt/couchbase/lib/libicudata.so.44

 Comments   
Comment by Sangharsh Agarwal [ 03/Jul/14 ]
I think MB-11599 is wrongly marked as a duplicate of MB-11572. Sorry if I missed something, but as per my observation the traces attached to MB-11572 were different from those in MB-11599.
Comment by Sangharsh Agarwal [ 03/Jul/14 ]
[Server Log]
10.3.3.142 : https://s3.amazonaws.com/bugdb/jira/MB-11629/349add7c/10.3.3.142-diag.txt.gz
10.3.3.142 : https://s3.amazonaws.com/bugdb/jira/MB-11629/5d2c12b5/10.3.3.142-722014-1814-diag.zip
10.3.3.143 : https://s3.amazonaws.com/bugdb/jira/MB-11629/b1f77fa5/10.3.3.143-722014-188-diag.zip
10.3.3.143 : https://s3.amazonaws.com/bugdb/jira/MB-11629/c89011f9/10.3.3.143-diag.txt.gz
10.3.3.144 : https://s3.amazonaws.com/bugdb/jira/MB-11629/8b1a9770/10.3.3.144-722014-1820-diag.zip
10.3.3.144 : https://s3.amazonaws.com/bugdb/jira/MB-11629/bee710d9/10.3.3.144-diag.txt.gz
10.3.3.145 : https://s3.amazonaws.com/bugdb/jira/MB-11629/3aa0c1d1/10.3.3.145-722014-1811-diag.zip
10.3.3.145 : https://s3.amazonaws.com/bugdb/jira/MB-11629/8877a22f/10.3.3.145-diag.txt.gz
10.3.3.146 : https://s3.amazonaws.com/bugdb/jira/MB-11629/e751ec21/10.3.3.146-diag.txt.gz
10.3.3.146 : https://s3.amazonaws.com/bugdb/jira/MB-11629/f7009b5f/10.3.3.146-722014-1817-diag.zip
10.3.3.147 : https://s3.amazonaws.com/bugdb/jira/MB-11629/862ceaf1/10.3.3.147-722014-1825-diag.zip
10.3.3.147 : https://s3.amazonaws.com/bugdb/jira/MB-11629/e029a84d/10.3.3.147-diag.txt.gz
10.3.3.148 : https://s3.amazonaws.com/bugdb/jira/MB-11629/8c91f598/10.3.3.148-722014-1822-diag.zip
10.3.3.148 : https://s3.amazonaws.com/bugdb/jira/MB-11629/d20eacee/10.3.3.148-diag.txt.gz
10.3.3.149 : https://s3.amazonaws.com/bugdb/jira/MB-11629/8d75f3f5/10.3.3.149-722014-1827-diag.zip
10.3.3.149 : https://s3.amazonaws.com/bugdb/jira/MB-11629/9bf52b14/10.3.3.149-diag.txt.gz

[Core Logs]
core : https://s3.amazonaws.com/bugdb/jira/MB-11629/ea944e30/core-10.3.3.146-0.log
Comment by Ketaki Gangal [ 03/Jul/14 ]
Seeing a (likely) similar crash on rebalance in the System Views test. I don't have cores with the current run; will update once I do.
Comment by Abhinav Dangeti [ 07/Jul/14 ]
Sangharsh, can you try your scenario on CentOS machines and let me know if you're able to reproduce this issue?
If that is the case, I have some toy builds that contain additional logging and some alternate logic, and we can give those a try.
Comment by Sangharsh Agarwal [ 08/Jul/14 ]
This bug occurred on Ubuntu VMs. The test passed on CentOS.
Comment by Abhinav Dangeti [ 08/Jul/14 ]
Please re-test once we have a build with this change:
http://review.couchbase.org/#/c/39212/
Comment by Abhinav Dangeti [ 08/Jul/14 ]
Re-open if the issue still persists.
Comment by Sangharsh Agarwal [ 10/Jul/14 ]
The issue re-occurred on a build 3.0.0-941 execution.

[Job]
http://qa.hq.northscale.net/job/centos_x64--107_01--rebalanceXDCR-P1/11/consoleFull

[Test]
./testrunner -i centos_x64--107_01--rebalanceXDCR-P1.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True -t xdcr.rebalanceXDCR.Rebalance.swap_rebalance_out_master,items=100000,rdirection=bidirection,ctopology=chain,doc-ops=update-delete,doc-ops-dest=update-delete,rebalance=source-destination,GROUP=P1


[Test Error]
[2014-07-09 10:31:10,870] - [rest_client:1216] INFO - rebalance percentage : 97.8005865103 %
[2014-07-09 10:31:11,313] - [rest_client:1200] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
[2014-07-09 10:31:11,408] - [rest_client:2007] INFO - Latest logs from UI on 10.5.2.228:
[2014-07-09 10:31:11,409] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.5.2.228\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-07-09T10:31:09.308Z', u'module': u'ns_memcached', u'tstamp': 1404927069308, u'type': u'info'}
[2014-07-09 10:31:11,409] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 2, u'text': u'Rebalance exited with reason {unexpected_exit,\n {\'EXIT\',<0.17115.99>,\n {bulk_set_vbucket_state_failed,\n [{\'ns_1@10.5.2.229\',\n {\'EXIT\',\n {{{{case_clause,\n {error,\n {{{badmatch,\n {error,\n {{badmatch,\n {memcached_error,key_enoent,\n <<"Engine not found">>}},\n [{mc_replication,connect,1,\n [{file,\n "src/mc_replication.erl"},\n {line,50}]},\n {upr_proxy,connect,4,\n [{file,"src/upr_proxy.erl"},\n {line,177}]},\n {upr_proxy,maybe_connect,1,\n [{file,"src/upr_proxy.erl"},\n {line,164}]},\n {upr_producer_conn,init,2,\n [{file,\n "src/upr_producer_conn.erl"},\n {line,30}]},\n {upr_proxy,init,1,\n [{file,"src/upr_proxy.erl"},\n {line,48}]},\n {gen_server,init_it,6,\n [{file,"gen_server.erl"},\n {line,304}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},\n {line,239}]}]}}},\n [{upr_replicator,init,1,\n [{file,"src/upr_replicator.erl"},\n {line,48}]},\n {gen_server,init_it,6,\n [{file,"gen_server.erl"},\n {line,304}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},\n {line,239}]}]},\n {child,undefined,\'ns_1@10.5.2.228\',\n {upr_replicator,start_link,\n [\'ns_1@10.5.2.228\',"default"]},\n temporary,60000,worker,\n [upr_replicator]}}}},\n [{upr_sup,start_replicator,2,\n [{file,"src/upr_sup.erl"},{line,78}]},\n {upr_sup,\n \'-set_desired_replications/2-lc$^2/1-2-\',\n 2,\n [{file,"src/upr_sup.erl"},{line,55}]},\n {upr_sup,set_desired_replications,2,\n [{file,"src/upr_sup.erl"},{line,55}]},\n {replication_manager,handle_call,3,\n [{file,"src/replication_manager.erl"},\n {line,130}]},\n {gen_server,handle_msg,5,\n [{file,"gen_server.erl"},{line,585}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,239}]}]},\n {gen_server,call,\n [\'replication_manager-default\',\n {change_vbucket_replication,118,\n \'ns_1@10.5.2.228\'},\n infinity]}},\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@10.5.2.229\'},\n {if_rebalance,<0.27856.98>,\n {update_vbucket_state,119,replica,\n undefined,\'ns_1@10.5.2.234\'}},\n infinity]}}}}]}}}\n', u'shortText': u'message', u'serverTime': u'2014-07-09T10:31:06.026Z', u'module': u'ns_orchestrator', u'tstamp': 1404927066026, u'type': u'info'}
[2014-07-09 10:31:11,410] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u'<0.17318.99> exited with {unexpected_exit,\n {\'EXIT\',<0.17115.99>,\n {bulk_set_vbucket_state_failed,\n [{\'ns_1@10.5.2.229\',\n {\'EXIT\',\n {{{{case_clause,\n {error,\n {{{badmatch,\n {error,\n {{badmatch,\n {memcached_error,key_enoent,\n <<"Engine not found">>}},\n [{mc_replication,connect,1,\n [{file,"src/mc_replication.erl"},\n {line,50}]},\n {upr_proxy,connect,4,\n [{file,"src/upr_proxy.erl"},\n {line,177}]},\n {upr_proxy,maybe_connect,1,\n [{file,"src/upr_proxy.erl"},\n {line,164}]},\n {upr_producer_conn,init,2,\n [{file,"src/upr_producer_conn.erl"},\n {line,30}]},\n {upr_proxy,init,1,\n [{file,"src/upr_proxy.erl"},\n {line,48}]},\n {gen_server,init_it,6,\n [{file,"gen_server.erl"},\n {line,304}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},\n {line,239}]}]}}},\n [{upr_replicator,init,1,\n [{file,"src/upr_replicator.erl"},\n {line,48}]},\n {gen_server,init_it,6,\n [{file,"gen_server.erl"},{line,304}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,239}]}]},\n {child,undefined,\'ns_1@10.5.2.228\',\n {upr_replicator,start_link,\n [\'ns_1@10.5.2.228\',"default"]},\n temporary,60000,worker,\n [upr_replicator]}}}},\n [{upr_sup,start_replicator,2,\n [{file,"src/upr_sup.erl"},{line,78}]},\n {upr_sup,\n \'-set_desired_replications/2-lc$^2/1-2-\',\n 2,\n [{file,"src/upr_sup.erl"},{line,55}]},\n {upr_sup,set_desired_replications,2,\n [{file,"src/upr_sup.erl"},{line,55}]},\n {replication_manager,handle_call,3,\n [{file,"src/replication_manager.erl"},\n {line,130}]},\n {gen_server,handle_msg,5,\n [{file,"gen_server.erl"},{line,585}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,239}]}]},\n {gen_server,call,\n [\'replication_manager-default\',\n {change_vbucket_replication,118,\n \'ns_1@10.5.2.228\'},\n infinity]}},\n {gen_server,call,\n [{\'janitor_agent-default\',\'ns_1@10.5.2.229\'},\n {if_rebalance,<0.27856.98>,\n {update_vbucket_state,119,replica,\n undefined,\'ns_1@10.5.2.234\'}},\n infinity]}}}}]}}}', u'shortText': u'message', u'serverTime': u'2014-07-09T10:31:06.005Z', u'module': u'ns_vbucket_mover', u'tstamp': 1404927066005, u'type': u'critical'}
[2014-07-09 10:31:11,411] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@10.5.2.228' disconnected: {badmatch,\n {error,\n einval}}", u'shortText': u'message', u'serverTime': u'2014-07-09T10:31:05.111Z', u'module': u'ns_memcached', u'tstamp': 1404927065111, u'type': u'info'}
[2014-07-09 10:31:11,411] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u"Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 134. Restarting. Messages: Wed Jul 9 10:31:03.731560 PDT 3: (default) UPR (Producer) eq_uprq:xdcr:default-ffc3287d8a27b709905167cd9eff75c6 - (vb 94) Stream closing, 0 items sent from disk, 179 items sent from memory, 199 was last seqno sent\nWed Jul 9 10:31:04.610874 PDT 3: (default) UPR (Notifier) eq_uprq:xdcr:notifier:ns_1@10.5.2.228:default - (vb 62) stream created with start seqno 184 and end seqno 0\nWed Jul 9 10:31:04.611285 PDT 3: (default) UPR (Producer) eq_uprq:xdcr:default-ffc3287d8a27b709905167cd9eff75c6 - (vb 64) stream created with start seqno 1 and end seqno 175\nWed Jul 9 10:31:04.613383 PDT 3: (default) UPR (Producer) eq_uprq:xdcr:default-ffc3287d8a27b709905167cd9eff75c6 - (vb 64) Stream closing, 0 items sent from disk, 174 items sent from memory, 175 was last seqno sent\nmemcached: /buildbot/build_slave/centos-5-x64-300-builder/build/build/memcached/daemon/memcached.c:7731: decrement_session_ctr: Assertion `session_cas.ctr != 0' failed.", u'shortText': u'message', u'serverTime': u'2014-07-09T10:31:05.055Z', u'module': u'ns_log', u'tstamp': 1404927065055, u'type': u'info'}
[2014-07-09 10:31:11,412] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u'Bucket "default" rebalance appears to be swap rebalance', u'shortText': u'message', u'serverTime': u'2014-07-09T10:27:43.285Z', u'module': u'ns_vbucket_mover', u'tstamp': 1404926863285, u'type': u'info'}
[2014-07-09 10:31:11,412] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.234', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.5.2.234\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-07-09T10:27:42.851Z', u'module': u'ns_memcached', u'tstamp': 1404926862851, u'type': u'info'}
[2014-07-09 10:31:11,413] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2014-07-09T10:27:41.973Z', u'module': u'ns_rebalancer', u'tstamp': 1404926861973, u'type': u'info'}
[2014-07-09 10:31:11,413] - [rest_client:2008] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@10.5.2.230','ns_1@10.5.2.229',\n 'ns_1@10.5.2.234'], EjectNodes = ['ns_1@10.5.2.228'], Failed over and being ejected nodes = []; no delta recovery nodes\n", u'shortText': u'message', u'serverTime': u'2014-07-09T10:27:41.890Z', u'module': u'ns_orchestrator', u'tstamp': 1404926861890, u'type': u'info'}


[Core File]
core : https://s3.amazonaws.com/bugdb/jira/MB-11629/79759a4c/core-10.5.2.228-0.log


[Logs]
10.5.2.228 : https://s3.amazonaws.com/bugdb/jira/MB-11629/1814cebe/10.5.2.228-diag.txt.gz
10.5.2.228 : https://s3.amazonaws.com/bugdb/jira/MB-11629/2c964a9b/10.5.2.228-792014-1053-diag.zip
10.5.2.229 : https://s3.amazonaws.com/bugdb/jira/MB-11629/78d3f41e/10.5.2.229-792014-1057-diag.zip
10.5.2.229 : https://s3.amazonaws.com/bugdb/jira/MB-11629/afee0bc6/10.5.2.229-diag.txt.gz
10.5.2.230 : https://s3.amazonaws.com/bugdb/jira/MB-11629/ac873377/10.5.2.230-diag.txt.gz
10.5.2.230 : https://s3.amazonaws.com/bugdb/jira/MB-11629/f7b9a08e/10.5.2.230-792014-1055-diag.zip
10.5.2.234 : https://s3.amazonaws.com/bugdb/jira/MB-11629/28b59b04/10.5.2.234-diag.txt.gz
10.5.2.234 : https://s3.amazonaws.com/bugdb/jira/MB-11629/e9dba800/10.5.2.234-792014-114-diag.zip


10.5.2.231 : https://s3.amazonaws.com/bugdb/jira/MB-11629/96c9d310/10.5.2.231-792014-112-diag.zip
10.5.2.231 : https://s3.amazonaws.com/bugdb/jira/MB-11629/fc3d9d94/10.5.2.231-diag.txt.gz
10.5.2.232 : https://s3.amazonaws.com/bugdb/jira/MB-11629/3c75c843/10.5.2.232-792014-1059-diag.zip
10.5.2.232 : https://s3.amazonaws.com/bugdb/jira/MB-11629/94fe0c9a/10.5.2.232-diag.txt.gz
10.5.2.233 : https://s3.amazonaws.com/bugdb/jira/MB-11629/577ed73f/10.5.2.233-diag.txt.gz
10.5.2.233 : https://s3.amazonaws.com/bugdb/jira/MB-11629/de5f568f/10.5.2.233-792014-115-diag.zip
10.3.5.68 : https://s3.amazonaws.com/bugdb/jira/MB-11629/12754a72/10.3.5.68-792014-117-diag.zip
10.3.5.68 : https://s3.amazonaws.com/bugdb/jira/MB-11629/a5609191/10.3.5.68-diag.txt.gz
Comment by Abhinav Dangeti [ 10/Jul/14 ]
http://review.couchbase.org/#/c/39297/, http://review.couchbase.org/#/c/39296/

Re-test once we have a build with these changes merged.




[MB-11683] Document the incremental rebalancing operation in 3.0 Created: 10/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#rebalancing - make changes to this section in 3.0 to include the incremental rebalance operation.




[MB-11681] ns_server.stats.log does not scale with nodes or buckets. Created: 10/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Patrick Varley Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 1
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Large number of nodes.


 Description   
ns_server.stats.log contains the statuses of every node in the cluster from ns_doctor:

[ns_doctor:debug,2014-07-10T0:58:24.510,ns_1@cb-01.lan:ns_doctor<0.3026.0>:ns_doctor:handle_info:167]Current node statuses:

As the number of nodes increases, the ns_doctor output gets bigger, and as a result we lose the historical data that mortimer uses from stats_collector:

[stats:debug,2014-07-10T5:25:03.967,ns_1@cb-01.lan:<0.5614.0>:stats_collector:log_stats:136](at {{2014,7,10},{5,25,3}} (1404984303962)) Stats for bucket "Default":

This means that on a large cluster we only get a few hours of data that mortimer can use. We also have this problem when there are a lot of buckets.

It might be worth putting the doctor information into a different file (ns_server.doctor.log) and maybe having a stats file per bucket (ns_server.stats-<BUCKET NAME>.log).

It would be good to get input from other people in the field team.
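
To make the scaling concrete, here is a back-of-envelope Python sketch (illustrative only): under a fixed rotation cap, the hours of stats history shrink roughly inversely with bucket count. The 200MB per-file cap is the figure discussed in the comments below; the per-bucket write rate is an assumed placeholder, not a measured number.

# Back-of-envelope stats retention estimate; illustrative only.
# The 200MB per-file cap comes from the discussion in the comments;
# the per-bucket write rate is an assumed placeholder, not a measured figure.
FILE_CAP_MB = 200
ASSUMED_MB_PER_BUCKET_PER_HOUR = 3.0

def retention_hours(num_buckets, rotated_files=1):
    """Hours of stats history that fit under the rotation cap."""
    total_mb = FILE_CAP_MB * rotated_files
    return total_mb / (ASSUMED_MB_PER_BUCKET_PER_HOUR * num_buckets)

for buckets in (1, 2, 4, 8):
    print("%d bucket(s): ~%.0f hours of stats history" % (buckets, retention_hours(buckets)))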

 Comments   
Comment by Aleksey Kondratenko [ 10/Jul/14 ]
Yes it does, just like all of our logging.

And I cannot see how separate log files can help, given that we're not supposed to eat an unlimited amount of space for logs.
Comment by Patrick Varley [ 10/Jul/14 ]
Well, it is not unlimited; we still have the 200MB limit per file. In this case we would increase the disk space used by logging by 200MB, plus an extra 200MB per bucket. Disk space is cheap and the logs compress well.

Anyway, it was a suggestion; I do not care how you fix it, but for large clusters I would like to see at least 5 to 7 days' worth of stats and not the 8 hours we have now!
Comment by Aleksey Kondratenko [ 10/Jul/14 ]
Seeing your point now. If we scale our log requirements by bucket count, it indeed might be reasonable.
Comment by Aleksey Kondratenko [ 10/Jul/14 ]
We'll think about this, but we cannot promise anything in the short to mid term. Meanwhile, you can always edit the log rotation settings and increase the log preservation period sufficiently for whatever bucket count you have.




[MB-9494] Support Windows Server 2012 R2 in Production Created: 07/Nov/13  Updated: 10/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Anil Kumar Assignee: Thuan Nguyen
Resolution: Unresolved Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
Triage: Triaged

 Description   
We need to support Windows Server 2012 R2 in production for 3.0.

 Comments   
Comment by Maria McDuff (Inactive) [ 14/Mar/14 ]
FYI, still waiting for the Windows build...
Comment by Wayne Siu [ 02/Jul/14 ]
Windows builds have been available for testing.
Comment by Anil Kumar [ 02/Jul/14 ]
Awesome.
Comment by Wayne Siu [ 02/Jul/14 ]
Let's wait for Tony to confirm before we close the ticket.
Comment by Anil Kumar [ 02/Jul/14 ]
Okay, I saw it was resolved so I went ahead and closed it.
Comment by Thuan Nguyen [ 10/Jul/14 ]
We have a sanity job for Windows 2012 R2 running in Santa Clara.
I will find more VMs to run the P0 and P1 jobs.




[MB-11602] KV+XDCR System test : Rebalance gets temporarily stuck but eventually proceeds to completion Created: 30/Jun/14  Updated: 10/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS

Attachments: Text File masterEvents.txt    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Build
--------
3.0.0-900 (xdcr on upr, internal replication on upr)

Steps
--------
1. Load on both clusters till vb_active_resident_items_ratio < 50.
2. Setup bi-xdcr on "standardbucket", uni-xdcr on "standardbucket1"
3. Access phase with 50% gets, 50% deletes for 3 hours.
4. Rebalance-out one node (.47) at C1.
5. Rebalance-in same node at C1.
 
Problem
-------------
During rebalance-in, right after 41.9%, the rebalance did not progress (the REST call indicating progress showed no increase) for a little more than 5 mins. As a result the test times out as shown. This has never been the case in previous runs of the same test against 2.2.0, 2.5.0 or 2.5.1.

[2014-06-30 14:17:56,782: ERROR/MainProcess] Running Phase: rebalance_in_one_source (Rebalance-in-1)
[2014-06-30 14:18:01,901: ERROR/MainProcess] Started workload workload_37e0ed9
[2014-06-30 14:18:01,930: ERROR/MainProcess] kill task workload_19da4d6
[2014-06-30 14:18:01,931: ERROR/MainProcess] {'update_perc': 22, 'indexed_keys': [], 'del_perc': 3, 'postcondition_handler': None, 'create_perc': 3, 'bucket': 'standardbucket', 'exp_perc': 2, 'miss_queue': None, 'ops_per_sec': 3000, 'consume_queue': None, 'postconditions': None, 'template': 'default', 'ttl': 3000, 'cc_queues': ['std1ph5keys'], 'preconditions': None, 'password': '', 'get_perc': 70, 'miss_perc': 5, 'wait': None}
[2014-06-30 14:18:02,005: ERROR/MainProcess] start task sent to 1 consumers
[2014-06-30 14:18:03,909: ERROR/MainProcess] Started workload workload_a2f6b8a
[2014-06-30 14:18:03,930: ERROR/MainProcess] kill task workload_f42122d
[2014-06-30 14:18:03,931: ERROR/MainProcess] {'update_perc': 22, 'indexed_keys': [], 'del_perc': 3, 'postcondition_handler': None, 'create_perc': 3, 'bucket': 'standardbucket1', 'exp_perc': 2, 'miss_queue': None, 'ops_per_sec': 3000, 'consume_queue': None, 'postconditions': None, 'template': 'default', 'ttl': 3000, 'cc_queues': ['std2ph5keys'], 'preconditions': None, 'password': '', 'get_perc': 70, 'miss_perc': 5, 'wait': None}
[2014-06-30 14:18:03,996: ERROR/MainProcess] start task sent to 1 consumers
[2014-06-30 14:18:05,917: ERROR/MainProcess] Started workload workload_28f856f
[2014-06-30 14:18:05,949: ERROR/MainProcess] kill task workload_640273e
[2014-06-30 14:18:05,950: ERROR/MainProcess] {'update_perc': 22, 'indexed_keys': [], 'del_perc': 3, 'postcondition_handler': None, 'create_perc': 3, 'bucket': 'saslbucket', 'exp_perc': 2, 'miss_queue': None, 'ops_per_sec': 3000, 'consume_queue': None, 'postconditions': None, 'template': 'default', 'ttl': 3000, 'cc_queues': ['saslph5keys'], 'preconditions': None, 'password': 'password', 'get_perc': 70, 'miss_perc': 5, 'wait': None}
[2014-06-30 14:18:06,039: ERROR/MainProcess] start task sent to 1 consumers
[2014-06-30 14:27:50,156: ERROR/MainProcess] apparently rebalance progress code in infinite loop: 41.942552351
[2014-06-30 14:27:52,158: ERROR/MainProcess] Stopping workload workload_37e0ed9
[2014-06-30 14:27:52,184: ERROR/MainProcess] kill task workload_37e0ed9
[2014-06-30 14:27:52,189: ERROR/MainProcess] Stopping workload workload_28f856f
[2014-06-30 14:27:52,226: ERROR/MainProcess] kill task workload_28f856f
[2014-06-30 14:27:54,229: ERROR/MainProcess] Stopping workload workload_a2f6b8a
[2014-06-30 14:27:54,260: ERROR/MainProcess] kill task workload_a2f6b8a
[2014-06-30 14:28:03,270: ERROR/MainProcess]
 
To continue testing, I'm increasing the timeout value to 15 mins. Please check if this has to do with rebalance performance.

Attaching cbcollect info.
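
For reference, the "apparently rebalance progress code in infinite loop" message in the output above comes from this kind of stall detection in the test harness. Below is a minimal Python sketch of such a watchdog; it is not the actual framework code, and both the /pools/default/rebalanceProgress endpoint and the response shape assumed here (a "status" field plus per-node entries carrying a "progress" fraction) should be treated as assumptions.

# Minimal rebalance stall watchdog; a sketch, not the actual test framework code.
# The REST endpoint and the response shape (a "status" field plus per-node
# entries carrying a "progress" fraction) are assumptions.
import base64
import json
import time
import urllib.request

def fetch_progress(host, user="Administrator", password="password"):
    req = urllib.request.Request("http://%s:8091/pools/default/rebalanceProgress" % host)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())

def average_progress(status):
    nodes = [v["progress"] for v in status.values() if isinstance(v, dict)]
    return 100.0 * sum(nodes) / len(nodes) if nodes else 0.0

def wait_for_rebalance(host, stall_timeout=300, poll=10):
    last, last_change = None, time.time()
    while True:
        status = fetch_progress(host)
        if status.get("status") == "none":   # no rebalance running any more: done or failed
            return
        progress = average_progress(status)
        if progress != last:
            last, last_change = progress, time.time()
        elif time.time() - last_change > stall_timeout:
            raise RuntimeError("apparently rebalance progress stuck at %.2f%%" % progress)
        time.sleep(poll)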

 Comments   
Comment by Aruna Piravi [ 30/Jun/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-11602//C1.tar - cluster where rebalance happened.
Comment by Aleksey Kondratenko [ 30/Jun/14 ]
I'm not seeing any stuck rebalance at the time the logs were captured.
Comment by Aruna Piravi [ 30/Jun/14 ]
Collecting logs takes ~20 mins. Rebalance is stuck only for 5 mins. How do I get you logs for this short interval?
Comment by Aruna Piravi [ 30/Jun/14 ]
Pavel has recommended getting master events at that time. I can do that.
Comment by Aleksey Kondratenko [ 30/Jun/14 ]
You can just get me /diag which should be enough to understand what's going on.

And yes, if the rebalance gets unstuck and it's just a perf thing, then master events are a must.
Comment by Aruna Piravi [ 01/Jul/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-11602/diag.tar - please see the last rebalance (add back after failover), hung at 50.1021961296% for over 15 mins.


[2014-06-30 22:46:44,450: ERROR/MainProcess] apparently rebalance progress code in infinite loop: 50.1021961296
[2014-06-30 22:46:46,453: ERROR/MainProcess] Stopping workload workload_8ffbdca
[2014-06-30 22:46:46,481: ERROR/MainProcess] kill task workload_8ffbdca
[2014-06-30 22:46:46,483: ERROR/MainProcess] Stopping workload workload_8738728
[2014-06-30 22:46:46,501: ERROR/MainProcess] kill task workload_8738728
[2014-06-30 22:46:48,502: ERROR/MainProcess] Stopping workload workload_b922333
[2014-06-30 22:46:48,523: ERROR/MainProcess] kill task workload_b922333
[2014-06-30 22:46:57,529: ERROR/MainProcess]

[2014-06-30 22:46:57,530: ERROR/MainProcess] ###### Test Complete! ######
Comment by Aruna Piravi [ 01/Jul/14 ]
Attaching masterEvents.txt
Comment by Aleksey Kondratenko [ 01/Jul/14 ]
I don't know how you're doing it, but please next time send me verbatim copies of text files, without this UTF-16 nonsense, and no RTFs either. It's not a big deal for me, but it's something that is supposedly easy for you to avoid.
Comment by Aleksey Kondratenko [ 01/Jul/14 ]
Longest gap I have is this:

{"diff":405.06090211868286,"vbucket":421,"type":"takeoverEnded","ts":1404189263.521296,"oldMaster":"172.23.105.48:11209","node":"172.23.105.45:11209","bucket":"standardbucket"}

It means the upr takeover gets very slow. I.e., the corresponding takeover start event is here:

{"diff":5.698204040527344e-05,"vbucket":421,"type":"takeoverStarted","ts":1404188803.844852,"oldMaster":"172.23.105.48:11209","node":"172.23.105.45:11209","bucket":"standardbucket"}

So the whole takeover for vbucket 421 took 459 seconds. That looks like some upr problem, given that everything is built with the assumption that takeover is a well sub-second operation, particularly because we're backfilling the future master well in advance.
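
For reference, the 459-second figure can be recomputed from the attached masterEvents.txt with a short Python sketch like the one below (assuming one JSON event per line, as in the entries quoted above):

# Pair takeoverStarted/takeoverEnded events per (bucket, vbucket) and report
# the slowest takeovers. Assumes one JSON object per line, as in the master
# events quoted above.
import json
import sys

started = {}
durations = {}
with open(sys.argv[1]) as f:                     # e.g. masterEvents.txt
    for line in f:
        ev = json.loads(line)
        key = (ev.get("bucket"), ev.get("vbucket"))
        if ev.get("type") == "takeoverStarted":
            started[key] = ev["ts"]
        elif ev.get("type") == "takeoverEnded" and key in started:
            secs = ev["ts"] - started.pop(key)
            durations[key] = max(durations.get(key, 0.0), secs)

for (bucket, vb), secs in sorted(durations.items(), key=lambda kv: -kv[1])[:10]:
    print("%s vb %s: takeover took %.1f s" % (bucket, vb, secs))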

Comment by Pavel Paulau [ 01/Jul/14 ]
Just a side note: an occasionally slow takeover is not uncommon at all. I believe Mike is working on a related issue (MB-11474).
Comment by Chiyoung Seo [ 01/Jul/14 ]
Mike,

When you investigate MB-11474, please see if this issue is related or duplicate of MB-11474.
Comment by Mike Wiederhold [ 07/Jul/14 ]
My guess is that this issue is caused by memory overhead, but that shouldn't actually matter in this case. We don't currently do anything to make sure takeover is as quick as possible so I think that's the first thing that I need to resolve before looking into this specific issue in more detail. Note that it's not a big deal that the takeover stream is around for as long as it is. If the other node is overloaded then this is expected, but the state transitions should happen as quickly as possible.
Comment by Aruna Piravi [ 07/Jul/14 ]
If this is of any help: we have never seen takeovers take this much time in previous releases (2.2.0, 2.5.0 or 2.5.1) for the same test. Five-minute progress timeouts have always been a part of the framework, and this is the first time we're seeing this problem.
Comment by Aruna Piravi [ 07/Jul/14 ]
Btw, this seems fixed along with MB-11474. Closing now. Will reopen if seen again. I tested 918 and 928.
Comment by Aruna Piravi [ 07/Jul/14 ]
I'm sorry, I guess I jumped the gun. I reverted the progress timeout to 5 mins and I still see this issue on 928.


[2014-07-07 16:38:59,729: ERROR/MainProcess] apparently rebalance progress code in infinite loop: 55.3185219006
[2014-07-07 16:39:01,733: ERROR/MainProcess] Stopping workload workload_2bcfae9
[2014-07-07 16:39:02,488: ERROR/MainProcess] kill task workload_2bcfae9
[2014-07-07 16:39:02,490: ERROR/MainProcess] Stopping workload workload_7c0efe0
[2014-07-07 16:39:02,541: ERROR/MainProcess] kill task workload_7c0efe0
[2014-07-07 16:39:04,545: ERROR/MainProcess] Stopping workload workload_28760bd
[2014-07-07 16:39:04,563: ERROR/MainProcess] kill task workload_28760bd
[2014-07-07 16:39:13,569: ERROR/MainProcess] 

[2014-07-07 16:39:13,570: ERROR/MainProcess] ###### Test Complete! ######

Comment by Mike Wiederhold [ 07/Jul/14 ]
Can you attach the logs from the latest run?
Comment by Aruna Piravi [ 08/Jul/14 ]
I can. The cluster is available too. Which would be more helpful to you?
Comment by Mike Wiederhold [ 08/Jul/14 ]
Can you post the ip address of one of the nodes so I can take a look at the cluster?
Comment by Aruna Piravi [ 08/Jul/14 ]
http://172.23.105.44:8091/
Comment by Mike Wiederhold [ 09/Jul/14 ]
Aruna,

I'm not going to be able to spend a lot of time on this issue over the next two days. If you need the cluster back feel free to take it, but I will need to reproduce it again once I can really sit down and investigate what's going on. Is this issue easy to reproduce?
Comment by Aruna Piravi [ 10/Jul/14 ]
And you have it again on build 945 - http://172.23.105.44:8091/ . Please use the cluster and let me know when done. Thanks.




[MB-10960] Couchbase does not support Debian platform Created: 24/Apr/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Sriram Melkote Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   
Debian is consistently in the top 3 distributions in the server market, by almost every count. For example, the link below ranks the distributions across the top 10 million web servers:

http://w3techs.com/technologies/overview/operating_system/all

You can look at various other surveys, and you'll see the message is the same. Debian is pretty much at the top for servers. Yet, we don't ship packages for it. This is quite hard to understand because we're already building .deb for Ubuntu, and it takes only a few minor changes to make it compatible with Debian/Stable.

While I don't track customer requests, I've anecdotally seen them requesting the exact same thing in unambiguous terms.

 Comments   
Comment by Sriram Melkote [ 14/May/14 ]
Eric, Ubuntu and Debian are different distributions.
Comment by Jacob Lundberg [ 10/Jul/14 ]
Could somebody please add my user (jacoblundberg) to view CBSE-1140 if that is where this work will be done? This is an important request for CollegeNET and I want to be able to view the status.
Comment by Brent Woodruff [ 10/Jul/14 ]
Hi Jacob. I am not familiar with that ticket (CBSE-1140) firsthand, so perhaps someone else would be better for discussing that issue with you. However, I wanted to let you know that all CBSE tickets are internal to Couchbase. We will not be able to grant you access to view that ticket's contents.

MB tickets such as this one are public, and updates to the status of Couchbase's support of the Debian platform for Couchbase Server will appear here.

I recommend that you communicate with Couchbase Support via email or the web if you have a question about the status of work you are expecting to be completed, which has either a related Couchbase Support ticket or a CBSE ticket.

Support email: support@couchbase.com
Support portal: http://support.couchbase.com




[MB-11299] Upr replica streams cannot send items from partial snapshots Created: 03/Jun/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
If items are sent from a replica vbucket and those items are from a partial snapshot then we might get holes in our data.
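
As a toy illustration of how a partial snapshot turns into a hole (purely illustrative Python, not ep-engine code): mutations within a snapshot are de-duplicated, so a replica that holds only the first part of a snapshot can be missing the only surviving copy of a key's latest update.

# Toy illustration of the "holes" problem; purely illustrative, not ep-engine code.

# Mutation history on the active node for one vbucket: (seqno, key, value).
history = [(1, "k1", "a"), (2, "k2", "b"), (3, "k1", "c")]

# Within a snapshot, older versions of a key are de-duplicated away, so the
# snapshot covering seqnos 1..3 contains only the latest item per key.
snapshot = {}
for seqno, key, value in history:
    snapshot[key] = (seqno, value)               # k1@1 is dropped in favour of k1@3

# A replica that received only part of that snapshot (say, up to seqno 2):
partial_replica = {k: v for k, v in snapshot.items() if v[0] <= 2}

print(sorted(snapshot.items()))          # [('k1', (3, 'c')), ('k2', (2, 'b'))]
print(sorted(partial_replica.items()))   # [('k2', (2, 'b'))]  (no version of k1 at all)

# Streaming from the partial replica as if its data covered the whole snapshot
# would leave the consumer with no copy of k1 at all: a hole in the data.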

 Comments   
Comment by Aleksey Kondratenko [ 02/Jul/14 ]
Raised to blocker. Data loss in xdcr or views is super critical IMO
Comment by Mike Wiederhold [ 10/Jul/14 ]
I agree with Alk on the severity of this issue, but I do want to note that seeing this problem will be rare. I'm planning to work on it soon, but I need to get another issue resolved first before I address this problem.




[MB-11597] KV+XDCR System test: Mutation replication rate for uni-xdcr is almost zero (900K items remaining) while another bi-xdcr to the same cluster is ~10k ops/sec Created: 30/Jun/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication, performance
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Aruna Piravi Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 6.x 8*8 clusters. Each node : 15GB RAM, 450Gb HDD

Attachments: PNG File Screen Shot 2014-06-30 at 11.27.42 AM.png     PNG File Screen Shot 2014-06-30 at 11.31.54 AM.png    
Triage: Untriaged
Is this a Regression?: No

 Description   
Build
--------
3.0.0-900 (xdcr on upr, internal replication on upr)

Clusters
-----------
Source : http://172.23.105.44:8091/
Destination : http://172.23.105.54:8091/
There's currently a test running on this cluster. You can take a look if required.

Steps
--------
1. Load on both clusters till vb_active_resident_items_ratio < 50.
2. Setup bi-xdcr on "standardbucket", uni-xdcr on "standardbucket1"
3. Access phase with 50% gets, 50% deletes, running now for an hour.


Problem
-------------
See screenshot. Mutation replication rate is uneven for uni and bi-xdcr.
Bucket under discussion: standardbucket1 (has uni-xdcr) has ~900K items remaining. Mutation replication is almost 0.
Another bucket, standardbucket, has bi-xdcr (to the same cluster) but its mutation replication rate is ~10k ops/sec; there's data moving, as can be seen from the upr queue stats.

Bucket capacity:5GB for both standard buckets.

Bucket priority
-----------------------
Both standardbucket and standardbucket1 have high priority.

Attaching cbcollect.

 Comments   
Comment by Aruna Piravi [ 30/Jun/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-11597//C1.tar
https://s3.amazonaws.com/bugdb/jira/MB-11597/C2.tar

Please see if these logs help. When I was doing cbcollect, the mutation replication rate picked up. If the logs don't help, I can point to the cluster as soon as I see it again. Thanks.
Comment by Aruna Piravi [ 30/Jun/14 ]
Lowering to Critical as the mutation replication rate got better after a few minutes.
Comment by Aleksey Kondratenko [ 30/Jun/14 ]
Not seeing anything notable in logs.
Comment by Aruna Piravi [ 07/Jul/14 ]
Reproduced again; Alk looked at the cluster. Pausing and resuming replication helped bring the slower replication up to speed. We also noticed many unacked bytes in the UPR stream on a node where outbound mutations were 0.

New set of logs -
https://s3.amazonaws.com/bugdb/jira/MB-11597/C1.tar
https://s3.amazonaws.com/bugdb/jira/MB-11597/C2.tar




[MB-9874] Couchstore drop and reopen of file handle fails on windows Created: 09/Jan/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1, 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Trond Norbye Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: Windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows


 Description   
The unit test doing couchstore_drop_file and couchstore_reopen_file fails due to COUCHSTORE_READ_ERROR when it tries to reopen the file.

The commit http://review.couchbase.org/#/c/31767/ disabled the test to allow the rest of the unit tests to be executed.




[MB-11215] Make it possible to start UPR stream request with snapshot start/end == 0 Created: 27/May/14  Updated: 10/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Volker Mische Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Currently you need to supply the start sequence for the snapshot start and snapshot end when you are not doing a partial request. It is cleaner to supply 0 instead.

The transport spec already mentions this (it was easier to leave it in rather than removing it and putting it back later).
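
For illustration, a rough sketch (Python pseudocode, not actual client code) of the two conventions for a non-partial stream request; the field names follow the transport spec referenced in the comments below, while the helper and values are hypothetical:

    # Hypothetical illustration: resuming a stream at 'high_seqno' when not
    # inside a partial snapshot.
    def full_stream_request(vbucket, high_seqno, vb_uuid):
        # Current requirement: snapshot start/end must echo the start seqno.
        current = {
            "vbucket": vbucket,
            "start_seqno": high_seqno,
            "end_seqno": 0xFFFFFFFFFFFFFFFF,  # assumed "no upper bound" marker
            "vbucket_uuid": vb_uuid,
            "snapshot_start_seqno": high_seqno,
            "snapshot_end_seqno": high_seqno,
        }
        # Cleaner form proposed here: supply 0 for both snapshot fields.
        proposed = dict(current, snapshot_start_seqno=0, snapshot_end_seqno=0)
        return current, proposed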

 Comments   
Comment by Mike Wiederhold [ 30/Jun/14 ]
Given that we are so close to the release, and that this change is not necessary to make the protocol correct, I am closing this as won't fix. If there are any major objections to this please let me know; otherwise we can reconsider making this change in a future release.
Comment by Volker Mische [ 30/Jun/14 ]
I find it way cleaner from the protocol perspective and would really like to see it. I would leave it open and assign a future Fix version (though I think it would be nice to have the protocol clean in 3.0, but I can see the point that we currently have bigger problems).
Comment by Volker Mische [ 10/Jul/14 ]
The spec [1] still describes the behaviour of using "0". Either fix the code or the spec :)

[1] https://github.com/couchbaselabs/cbupr/blob/master/transport-spec.md#stream-request-opcode-0x53
Comment by Chiyoung Seo [ 10/Jul/14 ]
Mike,

Please correct the UPR spec and close this ticket.




[MB-11678] SELECT COUNT(*) FROM BUCKET makes a full scan without WHERE condition Created: 09/Jul/14  Updated: 10/Jul/14  Due: 28/Jul/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Improvement Priority: Major
Reporter: fredericb Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Using N1QL, when executing
SELECT COUNT(*) FROM Bucket
It produces a full scan of the bucket even though there is no WHERE condition, in which case the result should be returned quickly.




[MB-11682] return function in "Writing custom reduce functions" is confusing Created: 10/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Minor
Reporter: Volker Mische Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The Admin docs for Couchbase Server, in the "Writing custom reduce functions" chapter [1], are a bit confusing. They talk about "calling the return() function". Normally in JavaScript (and in almost all of the examples in our docs) we use the return statement. This should be fixed to be consistent, else it leads to confusion (we just had a user on IRC confused by this).

[1]: http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/index.html#re-writing-the-built-in-reduce-functions




[MB-10496] Investigate other possible memory allocators that provide better fragmentation management Created: 18/Mar/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.0, 3.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Task Priority: Critical
Reporter: Chiyoung Seo Assignee: Dave Rigby
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Duplicate

 Description   
As tcmalloc incurs significant memory fragmentation for particular load patterns (e.g., append/prepend operations), we need to investigate other options that have much less fragmentation overhead for those load patterns.

 Comments   
Comment by Matt Ingenthron [ 19/Mar/14 ]
I'm not an expert in this area any more, but I would say that my history with allocators is that there is often a tradeoff between performance aspects and space efficiency. My own personal opinion is that it may be better to not be tied to any one memory allocator, but rather have the right abstractions so we can use one or more.

I can certainly say that the initial tc_malloc integration was perhaps a bit hasty, driven by Sharon. The problem we were trying to solve at the time was actually a glibstdc++ bug on CentOS 5.2. It could have been fixed by upgrading to CentOS 5.3, but for a variety of reasons we were trying to find another workaround or solution. tc_malloc was integrated for that.

It was then that I introduced the lib/ directory and changed the compilation to set the RPATH. The reason I did this is I was trying to avoid our shipping tc_malloc, as at the time Ubuntu didn't include it since there were bugs. That gave me enough pause to think we may not want to be the first people to use tc_malloc in this particular way.

In particular, there's no reason to believe tc_malloc is best for windows. It may also not be best for platforms like mac OS and solaris/smartOS (in case we ever get there).
Comment by Matt Ingenthron [ 19/Mar/14 ]
By the way, those comments are just all history in case it's useful. Please discount or ignore it as appropriate. ;)
Comment by Chiyoung Seo [ 19/Mar/14 ]
Thanks Matt for the good comments. As you mentioned, we plan to support more than one memory allocator, so that users can choose the allocator based on their OS and workload patterns. I know that there are several open source projects; I think we can start by investigating them first, and then develop our own allocator if necessary.
Comment by Matt Ingenthron [ 19/Mar/14 ]
No worries. You'll need a benchmark or two to evaluate things. Even then, some people will probably prefer something space efficient versus time efficient, but we won't be able to support everything, etc. If it were me, I'd look to support the OS shipped advanced allocator and maybe one other, as long as they met my test criteria of course.
Comment by Dave Rigby [ 24/Mar/14 ]
Adding some possible candidates:

* jemalloc (https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919) - Not used personally, but I know some of the guys at FB who use it. Reportedly has good fragmentation properties.
Comment by Chiyoung Seo [ 06/May/14 ]
Trond will build a quick prototype based on a slab allocator on top of the existing slabber, to see if that shows better fragmentation management. He will share his ideas and initial results later.
Comment by Steve Yen [ 28/May/14 ]
Hi Trond,
Any latest thoughts / news on this?
-- steve
Comment by Aleksey Kondratenko [ 09/Jul/14 ]
There's also some recently open-sourced work by the Aerospike folks to track jemalloc allocations, apparently similar to how we're doing it with tcmalloc.
Comment by Dave Rigby [ 10/Jul/14 ]
@Alk: You got a link? I scanned back in aerospike's github fork (https://github.com/aerospike/jemalloc) for the last ~2 years but didn't see anything likely in there...
Comment by Aleksey Kondratenko [ 10/Jul/14 ]
It is a sibling project (mentioned in their server's README): https://github.com/aerospike/asmalloc




[MB-11642] Intra-replication falling far behind under moderate-heavy workload Created: 03/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, DCP
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630 (24 vCPU)
Memory = 64 GB
Disk = RAID 10 HDD

Attachments: PNG File ep_upr_replica_items_remaining.png     PNG File latency_observe.png    
Issue Links:
Relates to
relates to MB-11640 DCP Prioritization Open
Triage: Untriaged
Is this a Regression?: Yes

 Description   
Running the "standard sales" demo that puts a 50/50 workload of about 80k ops/sec across 4 nodes of m1.xlarge, 1 bucket 1 replica.

The "intra-cluster replication" value grows into the many k's.

This is a value that our users look rather closely at to determine the "safety" of their replication status. A reasonable number on 2.x has always been below 1k but I think we need to reproduce and set appropriate baselines for ourselves with 3.0.

Assigning to Pavel as it falls into the performance area and we would likely be best served if this behavior was reproduced and tracked.

 Comments   
Comment by Pavel Paulau [ 03/Jul/14 ]
Well, I underestimated your definition of moderate-heavy.)

I'm seeing a similar issue when the load is about 20-30K sets/sec. I will create a regular test and provide all the information required for debugging.
Comment by Pavel Paulau [ 09/Jul/14 ]
Just wanted to double check, you can drain 10K documents/sec with both 2.5 and 3.0 builds, is that right?

UPDATE: actually 20K/sec because of replica.
Comment by Pavel Paulau [ 10/Jul/14 ]
In addition to replication queue (see attached screenshot) I measured replicateTo=1.

On average it looks better in 3.0 but there are quite frequent lags as well. Seems to be a regression.

Logs for build 3.0.0-943:
http://ci.sc.couchbase.com/job/ares/308/artifact/

My workload:
-- 4 nodes
-- 1 bucket
-- 40M x 1KB docs (non-DGM)
-- 70K mixed ops/sec (50% reads, 50% updates)




[MB-11671] couchbase.log needs alternatives to net-tools - ifconfig / netstat output Created: 08/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Ian McCloy Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Major Linux distros (Fedora 18, RHEL 7 and CentOS 7) stopped installing the legacy net-tools (ifconfig, netstat, route, hostname) by default.

ifconfig shows the rx/tx of each network device, and we don't capture this elsewhere; it would be helpful to capture the output of 'cat /proc/net/dev' and 'ip -s link'.

netstat shows all open network connections; as it is no longer shipped by default, an alternative we could capture is the output of the 'ss' command.
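
A minimal Python sketch of the intended fallback logic (the real collector in ns_server is Erlang; command names are taken from the description above and the exact flags are illustrative):

    import subprocess

    # Preferred legacy net-tools command first, iproute2/ss equivalent second.
    NETWORK_COMMANDS = [
        (["ifconfig", "-a"], ["ip", "-s", "link"]),   # per-interface rx/tx
        (["netstat", "-an"], ["ss", "-an"]),          # open network connections
        (["cat", "/proc/net/dev"], None),             # always present on Linux
    ]

    def capture(cmd):
        try:
            return subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        except (OSError, subprocess.CalledProcessError):
            return None

    def collect_network_info():
        """Capture each command's output, falling back when net-tools is absent."""
        results = {}
        for primary, fallback in NETWORK_COMMANDS:
            out = capture(primary)
            if out is None and fallback is not None:
                out = capture(fallback)
            results[" ".join(primary)] = out
        return results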

 Comments   
Comment by Aleksey Kondratenko [ 08/Jul/14 ]
Thanks for raising it. I was not aware. BTW best way to address stuff like that is to contribute a patch.
Comment by Ian McCloy [ 09/Jul/14 ]
Fair point, I'll push a patch to gerrit.
Comment by Ian McCloy [ 10/Jul/14 ]
Code added to http://review.couchbase.org/#/c/39267/ Thanks !




[MB-11675] 40-50% performance degradation on append-heavy workload compared to 2.5.1 Created: 09/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Dave Rigby Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: OS X Mavericks 10.9.3
CB server 3.0.0-918 (http://packages.northscale.com/latestbuilds/3.0.0/couchbase-server-enterprise_x86_64_3.0.0-918-rel.zip)
Haswell MacBook Pro (16GB RAM)

Attachments: PNG File CB 2.5.1 revAB_sim.png     PNG File CB 3.0.0-918 revAB_sim.png     Zip Archive MB-11675.trace.zip     Zip Archive perf_report_result.zip     Zip Archive revAB_sim_v2.zip     Zip Archive revAB_sim.zip    

 Description   
When running an append-heavy workload (modelling a social network address book, see below) the performance of CB has dropped from ~100K ops down to 50K ops compared to 2.5.1-1083 on OS X.

Edit: I see a similar (but slightly smaller, around 40%) degradation on Linux (Ubuntu 14.04) - see comment below for details.

== Workload ==

revAB_sim - generates a model social network, then builds a representation of this in Couchbase. Keys are a set of phone numbers, values are lists of phone books which contain that phone number. (See attachment).

Configured for 8 client threads, 100,000 people (documents).

To run:

* pip install networkx
* Check revAB_sim.py for correct host, port, etc
* time ./revAB_sim.py
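
The attached revAB_sim.zip is the actual workload; as a rough illustration of its shape only (the graph model and parameters below are assumptions, not taken from the attachment), the reverse-address-book documents can be derived like this:

    import networkx
    from collections import defaultdict

    def build_reverse_address_book(people=100000, contacts_per_person=10):
        """Generate a model social network and invert it: key = phone number,
        value = list of phone books that contain that number."""
        graph = networkx.barabasi_albert_graph(people, contacts_per_person)
        reverse = defaultdict(list)
        for owner, contact in graph.edges():
            # Each edge means the two people appear in each other's phone book;
            # in the real workload every append here becomes an append to
            # Couchbase, which is what makes the workload append-heavy.
            reverse[contact].append(owner)
            reverse[owner].append(contact)
        return reverse

    if __name__ == "__main__":
        docs = build_reverse_address_book(1000, 5)  # small run for illustration
        print(len(docs), "documents generated")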

== Cluster ==

1 node, default bucket set to 1024MB quota.

== Runtimes for workload to complete ==


## CB-2.5.1-1083:

~107K op/s. Timings for workload (3 samples):

real 2m28.536s
real 2m28.820s
real 2m31.586s


## CB-3.0.0-918

~54K op/s. Timings for workload:

real 5m23.728s
real 5m22.129s
real 5m24.947s


 Comments   
Comment by Pavel Paulau [ 09/Jul/14 ]
I'm just curious, what does consume all CPU resources?
Comment by Dave Rigby [ 09/Jul/14 ]
I haven't had chance to profile it yet; certainly in both instances (fast / slow) the CPU is at 100% between the client workload and server.
Comment by Pavel Paulau [ 09/Jul/14 ]
Is memcached top consumer? or beam.smp? or client?
Comment by Dave Rigby [ 09/Jul/14 ]
memcached highest (as expected). From the 3.0.0 package (which I still have installed):

PID COMMAND %CPU TIME #TH #WQ #PORT #MREG MEM RPRVT PURG CMPRS VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW
34046 memcached 476.9 01:34.84 17/7 0 36 419 278M+ 277M+ 0B 0B 348M 2742M 34046 33801 running 501 73397+ 160 67 26 13304643+ 879+ 4070244+
34326 Python 93.4 00:18.57 9/1 0 25 418 293M+ 293M+ 0B 0B 386M 2755M 34326 1366 running 501 77745+ 399 70 28 15441263+ 629 5754198+
0 kernel_task 71.8 00:14.29 95/9 0 2 949 1174M+ 30M 0B 0B 295M 15G 0 0 running 0 42409 0 57335763+ 52435352+ 0 0 278127194+
...
32800 beam.smp 8.5 00:05.61 30/4 0 49 330 155M- 152M- 0B 0B 345M- 2748M- 32800 32793 running 501 255057+ 468 149 30 6824071+ 1862753+ 1623911+


Python is the workload generator.

I shall try to collect an Instruments profile of 3.0 and 2.5.1 to compare...
Comment by Dave Rigby [ 09/Jul/14 ]
Instruments profile of two runs:

Run 1: 3.0.0 (slow)
Run 2: 2.5.1 (fast)

I can look into the differences tomorrow if no-one else gets there first.


Comment by Dave Rigby [ 10/Jul/14 ]
Running on Linux (Ubuntu 14.04), 24 core Xeon, I see a similar effect, but the magnitude is not as bad - 40% performance drop.

100,000 documents with 4 worker threads, same bucket size (1024MB). (Note: worker threads were dropped to 4 as I couldn't get the Python SDK to reliably connect with 8 threads at the same time.)

## CB-3.0.0 (source build):

    83k op/s
    real 3m26.785s

## CB-2.5.1 (source build):

    133K op/s
    real 2m4.276s


Edit: Attached updated zip file as: revAB_sim_v2.zip
Comment by Dave Rigby [ 10/Jul/14 ]
Attaching the output of `perf report` for both 2.5.1 and 3.0.0 - perf_report_result.zip

There's nothing obvious jumping out at me; it looks like quite a bit has changed in ep_engine between the two.




[MB-11052] Moxi 2.5 Install Failing on Debian 7 Created: 05/May/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: moxi
Affects Version/s: 2.5.0
Fix Version/s: bug-backlog, 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Jeff Dillon Assignee: Steve Yen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Triage: Untriaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Unknown

 Description   
A test was performed to upgrade Debian 7 servers from Moxi 1.8.1 to 2.5. This failed because the Moxi 2.5 packages for OpenSSL 1.0 are linked against glibc 2.14 and 2.15 but Debian 7 uses glibc 2.13. This could be corrected without any issue for Ubuntu compatibility by adding 2.13 to the list of supported linkage targets.

Here is the output of ldd:

root@devel:~# ldd /opt/moxi/bin/../bin/moxi.actual
/opt/moxi/bin/../bin/moxi.actual: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.15' not found (required by /opt/moxi/bin/../bin/moxi.actual)
/opt/moxi/bin/../bin/moxi.actual: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found (required by /opt/moxi/bin/../bin/moxi.actual)
linux-vdso.so.1 => (0x00007fff803ff000)
libconflate.so.0 => not found
libvbucket.so.1 => not found
libmemcached.so.6 => not found
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fece58e2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fece5660000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fece5443000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fece50b8000)
/lib64/ld-linux-x86-64.so.2 (0x00007fece5af9000)

 Comments   
Comment by Aleksey Kondratenko [ 10/Jun/14 ]
Assuming it's the Ubuntu 12 package that doesn't work. If so, then the Ubuntu 10 package should work. You might have to get OpenSSL 0.9.8 for that, however. The other option is to upgrade glibc to the version from testing.

It doesn't look like couchbase inc will ever bother to specifically support debian 7.
Comment by Jacob Lundberg [ 10/Jun/14 ]
> It doesn't look like couchbase inc will ever bother to specifically support debian 7.

I can't tell if you are a Couchbase employee or not; is this an official statement from Couchbase?

I think you will find few competent IT operations will be willing to maintain OpenSSL 0.9.8 in their production environment going forward. 1.0 is bad enough; just look at all the vulnerabilities lately.
Comment by Aleksey Kondratenko [ 10/Jun/14 ]
I am Couchbase Inc employee. But my statement is not official statement of Couchbase.

I don't disagree with your opinion on badness of old openssl. And BTW there should be no reason for moxi package to depend on openssl at all. And your ldd output confirms it.
Comment by Jacob Lundberg [ 10/Jun/14 ]
> And BTW there should be no reason for moxi package to depend on openssl at all.

Yes now I see it does not (the name is quite misleading). It might make more sense to name the 10.04 one "good for all" and the 12.04 one "pointless unnecessary glibc dependencies". We will use the 10.04 package on our Debian 7 systems.

A few comments about not supporting Debian -- it is the basis for Ubuntu so supporting one when you support the other is not hard. A lot of shops use Debian, although in many cases they may not complain here because they just rebuild the package themselves. We are a Couchbase server customer running on Debian. We need Moxi to provide connection pooling for LAMP applications because connection pooling in mod_perl is prohibitively difficult. It will be appreciated if Couchbase will consider Debian installations when creating future packages of Moxi so we and others do not have to go to the effort of building and packaging our own versions.
Comment by Sriram Melkote [ 10/Jul/14 ]
Jacob - I've been asking for Debian packages:
http://www.couchbase.com/issues/browse/MB-10960
I'm going to escalate this.




[MB-11339] [Windows] {UPR}: View fragmentation is not increasing from 1, causing test to hang Created: 06/Jun/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sangharsh Agarwal Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-771
Windows-2012 (64 bit)

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
[Jenkins]
http://qa.sc.couchbase.com/job/win_2012_x64--105_01--uniXDCR_biXDCR-P0/6/consoleFull

[Test]
./testrunner -i win_2012_x64--105_01--uniXDCR_biXDCR-P0.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False -t xdcr.biXDCR.bidirectional.replication_with_ddoc_compaction,items=20000,rdirection=bidirection,GROUP=P0


The test had been running for the last 24-26 hours. I have aborted the tests. A live cluster is available for diagnosis.


[global]
username:Administrator
password:Membase123
port:8091

[cluster1]
1:_1
2:_2

[cluster2]
1:_3
2:_4

[servers]
1:_1
2:_2
3:_3
4:_4

[_1]
ip:172.23.107.28

[_2]
ip:172.23.107.29

[_3]
ip:172.23.107.30

[_4]
ip:172.23.107.31


[membase]
rest_username:Administrator
rest_password:password



[Test logs]

[2014-06-05 22:33:46,787] - [task:2009] INFO - dev_ddoc1: current amount of fragmentation = 1
[2014-06-05 22:33:46,800] - [rest_client:481] INFO - index query url: http://172.23.107.30:8092/default/_design/dev_ddoc1/_view/default3?full_set=true&stale=false

[2014-06-05 22:33:47,878] - [rest_client:481] INFO - index query url: http://172.23.107.28:8092/default/_design/dev_ddoc1/_view/default4?full_set=true&stale=false

[2014-06-05 22:33:50,062] - [task:2009] INFO - dev_ddoc1: current amount of fragmentation = 1
[2014-06-05 22:33:50,073] - [rest_client:481] INFO - index query url: http://172.23.107.30:8092/default/_design/dev_ddoc1/_view/default4?full_set=true&stale=false
[2014-06-05 22:33:51,315] - [data_helper:289] INFO - creating direct client 172.23.107.28:11210 default

[2014-06-05 22:33:51,448] - [data_helper:289] INFO - creating direct client 172.23.107.29:11210 default

[2014-06-05 22:33:53,294] - [task:2009] INFO - dev_ddoc1: current amount of fragmentation = 1
[2014-06-05 22:33:53,295] - [task:745] INFO - Batch update documents done #: 0 with exp:0

[2014-06-05 22:33:54,684] - [task:745] INFO - Batch update documents done #: 6000 with exp:0

[2014-06-05 22:33:56,575] - [task:2009] INFO - dev_ddoc1: current amount of fragmentation = 1
[2014-06-05 22:33:56,584] - [rest_client:481] INFO - index query url: http://172.23.107.28:8092/default/_design/dev_ddoc1/_view/default0?full_set=true&stale=false

[2014-06-05 22:33:57,344] - [rest_client:481] INFO - index query url: http://172.23.107.30:8092/default/_design/dev_ddoc1/_view/default0?full_set=true&stale=false

[2014-06-05 22:34:00,236] - [task:2009] INFO - dev_ddoc1: current amount of fragmentation = 1
[2014-06-05 22:34:00,249] - [rest_client:481] INFO - index query url: http://172.23.107.28:8092/default/_design/dev_ddoc1/_view/default1?full_set=true&stale=false

[2014-06-05 22:34:01,872] - [rest_client:481] INFO - index query url: http://172.23.107.30:8092/default/_design/dev_ddoc1/_view/default1?full_set=true&stale=false

[2014-06-05 22:34:03,671] - [task:2009] INFO - dev_ddoc1: current amount of fragmentation = 1
[2014-06-05 22:34:03,681] - [rest_client:481] INFO - index query url: http://172.23.107.28:8092/default/_design/dev_ddoc1/_view/default2?full_set=true&stale=false

[2014-06-05 22:34:05,410] - [rest_client:481] INFO - index query url: http://172.23.107.30:8092/default/_design/dev_ddoc1/_view/default2?full_set=true&stale=false


[Test Steps]
1. Create 2-2 node Source and Destination cluster.
2. Setup CAPI mode bidirectional xdcr source <---> destination for default bucket.
3. Load 20K items on each cluster.
4. Create 5 views:- default0, default1, default2, default3, defaul4 and development design doc: dev_ddoc1.
5. Disable compaction.
6. Keep updating 30% of the items on Source (i.e. 6000) and query each view (on both source and destination) until view fragmentation on the Source node reaches 80. -> Failed here: fragmentation has been 1 for the last 20-24 hours.

Approximately 5232 updates were performed on the 6000 items in the last 20-24 hours.


Source Nodes: 172.23.107.28 and 172.23.107.29
Destination Nodes: 172.23.107.30, 172.23.107.31



Cluster is live for investigation.

 Comments   
Comment by Nimish Gupta [ 09/Jun/14 ]
It looks to me like a config issue.

[ns_server:info,2014-06-09T5:07:18.273,ns_1@172.23.107.28:<0.21855.1458>:compaction_daemon:spawn_bucket_compactor:600]Compacting bucket default with config:
[{view_fragmentation_threshold,{undefined,undefined}},
 {database_fragmentation_threshold,{undefined,undefined}},
 {parallel_db_and_view_compaction,false}]

The compaction values are undefined, so the fragmentation-ratio compaction is not triggering. It seems that ns_server is not able to read the config for the autocompaction value, so maybe the ns_server config path on Windows is not correct.
Comment by Sriram Melkote [ 09/Jun/14 ]
Alk, can you please help? Looking at ns_config_default, it would appear that view_fragmentation_threshold should not take on undefined as a value?
Comment by Aleksey Kondratenko [ 09/Jun/14 ]
Will look. But notably the ticket subject mentions that the stat is not increasing. That might indicate either: a) better index updating, so that we're not fragmenting it at all, b) something stuck in the view update code, or c) broken stats somehow.
Comment by Aleksey Kondratenko [ 09/Jun/14 ]
undefined is not an error there. Somebody explicitly disabled all compaction for the default bucket, and undefined is used to mark that.

It doesn't explain why view fragmentation doesn't grow.
Comment by Sangharsh Agarwal [ 10/Jun/14 ]
Sri, compaction was disabled in the test for the default bucket with the config below:

        new_config = {"viewFragmntThresholdPercentage" : None,
                      "dbFragmentThresholdPercentage" : None,
                      "dbFragmentThreshold" : None,
                      "viewFragmntThreshold" : None}
Comment by Nimish Gupta [ 10/Jun/14 ]
From the btree stats (http://172.23.107.28:8092/_set_view/default/_design/dev_ddoc1/_btree_stats), it looks like the fragmentation ratio is correct (1). Sangharsh, are you updating the items with the same value? If that is the case, the value will not be inserted into the view btree, and there will be no fragmentation.
Comment by Sangharsh Agarwal [ 10/Jun/14 ]
No, test is updating with different value each time. Test passes in 2.5.1.
Comment by Nimish Gupta [ 10/Jun/14 ]
Sangharsh, does this test pass on linux ?
Comment by Sangharsh Agarwal [ 10/Jun/14 ]
Yes, it is passing on Linux with 3.0.0.
Comment by Filipe Manana [ 10/Jun/14 ]
> Sangharsh, are you updating the items with the same value ? If this is the case, value will not be inserted into the view btree, and there will not be fragmentation.

That is not true in 3.0. In 2.5 and below, where the Erlang couch_btree module is used to update btrees, that is true, but not in 3.0, where we use couchstore's btree module for that.

Further, regardless of which btree module we're using, after each incremental update a header is written to the file, which certainly increases the file's size and fragmentation.
Comment by Nimish Gupta [ 10/Jun/14 ]
Sangharsh, Could you please run this test on linux and attach the output of btree stats here ?
Comment by Sriram Melkote [ 13/Jun/14 ]
Sangharsh, can you please ping Nimish and confirm we can restart this test?
Comment by Sangharsh Agarwal [ 20/Jun/14 ]
Sri, Currently I running tests on Linux VMs, I will resume testing on Windows on Monday. Will ping Nimish then.
Comment by Sriram Melkote [ 07/Jul/14 ]
I think we have enough information on this thread to reproduce the bug. Please don't reassign until we discover something that changes the status quo and we need more input.




[MB-11668] Document the limits on document size for indexing Created: 08/Jul/14  Updated: 10/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation, view-engine
Affects Version/s: 2.5.0, 2.5.1, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Dave Rigby Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
As introduced by MB-9467, there are two limits on the size of documents during indexing:

1) indexer_max_doc_size - documents larger than this value are skipped by the
indexer. A message is logged (with document ID, its size, bucket name, view name, etc)
when such a document is encountered. A value of 0 means no limit (as before). The current
default value is 1048576 bytes (1 MB). This is already a very large value; such large
documents take a long time to process, slowing down rebalance, etc.

2) max_kv_size_per_doc - maximum total size (in bytes) of KV pairs that can be emitted for
a single document for a single view. When this limit is exceeded, a message is logged (with
document ID, its size, bucket name, view name, etc). A value of 0 means no limit (as before).
The current default value is 1048576 bytes (1 MB), which is already too large a value and
makes everything far from efficient.

There is no mention of these anywhere in the documentation at present, and so they can be confusing to users who find that certain (large) documents are inexplicably not indexed.

I note there is an outstanding 3.0 bug (MB-9713) to add REST endpoints for these - currently you have to use a magic diag/eval to change them - but we should at least mention their existence and default values even if the REST API isn't ready yet.

- -
We probably should also document this:
3) function_timeout - maximum time mapreduce functions can take for any one document.
If it's taking longer than this, the function invocation is aborted. The default limit is 10 seconds.
Setting it to 0 will disable it (not recommended).

 Comments   
Comment by Dave Rigby [ 08/Jul/14 ]
See for example http://stackoverflow.com/questions/24609638/couchbase-view-index-update-hanging-after-adding-file-over-1-4mb where someone is hitting exactly this.




[MB-11680] Graceful failover cannot be continued after we stop it Created: 09/Jul/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: No

 Description   
Any OS, version 918 and above

1. Create 3 node cluster
2. Add 1 default bucket with replica=1
3. Add 500K items
4. Wait till all queues drain and replication is complete
5. Graceful failover one node
6. When Step 5 is executing stop graceful failover using UI button
7. Start Graceful failover again

Step 7 never happens, since after we stop a graceful failover we are only allowed to rebalance. By this I mean that continuing the graceful failover is not possible - only a rebalance is offered. We will have to pick the node again and graceful failover it, as in Step 5.

https://s3.amazonaws.com/bugdb/jira/MB-11680/log_failover_issue.tar.gz, logs are for version 943
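
The same sequence can also be driven over REST instead of the UI; a hedged sketch (the endpoint names are assumed from the standard ns_server REST API, and the addresses/otpNode value are illustrative):

    import requests

    BASE = "http://10.0.0.1:8091"          # any node of the 3-node cluster (illustrative)
    AUTH = ("Administrator", "password")
    NODE = "ns_1@10.0.0.3"                 # node chosen in step 5 (illustrative)

    # Step 5: start graceful failover of one node.
    requests.post(BASE + "/controller/startGracefulFailover",
                  data={"otpNode": NODE}, auth=AUTH)

    # Step 6: stop it while it is running (assumed to be the same stop
    # endpoint the UI button uses for rebalance).
    requests.post(BASE + "/controller/stopRebalance", auth=AUTH)

    # Step 7: try to start the graceful failover again; per this report the
    # cluster only offers a rebalance at this point.
    r = requests.post(BASE + "/controller/startGracefulFailover",
                      data={"otpNode": NODE}, auth=AUTH)
    print(r.status_code, r.text)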




[MB-10932] Disable force-respawn of memcached upon certain configuration changes Created: 22/Apr/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: feature-backlog, 3.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Given the addition of the GIO and bucket priority in 3.0, I would like to ask that we disable ns_server's forced respawn of the memcached process when the following items are changed (and therefore allow them to be changed if they cannot currently):
-GIO thread count
-Bucket IO priority
-memcached connections
-memcached worker threads
-ejection policy

These items are very frequently tuned by the field, support and customers, and it would be extremely helpful to allow the new values to be "set" but not applied until either memcached restarts or a new node is rebalanced in.

 Comments   
Comment by Cihan Biyikoglu [ 29/May/14 ]
Abhinav, can this be done easily? or does it already work this way?
thanks
Comment by Abhinav Dangeti [ 29/May/14 ]
No, memcached does respawn when the mentioned parameters are changed.

Once 3.0 stabilizes, we can change the setting for the first 4 parameters.
We, however do not plan to change the setting for ejection policy.
Comment by Perry Krug [ 29/May/14 ]
As per hallway discussion with Alk, this will be accomplished by an unsupported flag setting through /diag/eval that will disable restart of memcached when bucket or memcached config changes are made. Those changes can then be made (ignoring warnings of restart), rolling rebalances or rolling restarts performed, and the flag unset to return to normal behavior.

This will be only used by the support team for making unsupported changes to a cluster.
Comment by Perry Krug [ 09/Jul/14 ]
Alk, I can't remember if you had a chance to get this in yet, could you post the unsupported path for setting/unsetting this if you did? Thanks!




[MB-11674] REST Api Fails to Update replicaIndex Created: 08/Jul/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: RESTful-APIs
Affects Version/s: 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Jeff Dillon Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Attempting to set replicaIndex via the REST API has no effect, although it should work according to the documentation:

http://docs.couchbase.com/couchbase-manual-2.5/cb-rest-api/#modifying-bucket-parameters

To reproduce:

* Make the following curl call:

curl -v -X POST -u Administrator:password -d replicaIndex=1 http://host:8091/pools/default/buckets/beer-sample

200/OK

* Note the setting does not take effect by making the following call:

curl http://host:8091/pools/default/buckets/beer-sample

{"name":"beer-sample","bucketType":"membase","authType":"sasl","saslPassword":"","proxyPort":0,"replicaIndex":false,"uri":"/pools/default/buckets/beer-sample...

ASK: Determine why replicaIndex is not able to be set
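
The same reproduction in Python, mirroring the curl calls above (host and credentials are placeholders, as in the description):

    import requests

    BASE = "http://host:8091"
    AUTH = ("Administrator", "password")
    BUCKET = "/pools/default/buckets/beer-sample"

    # POST replicaIndex=1; the server answers 200/OK...
    resp = requests.post(BASE + BUCKET, data={"replicaIndex": 1}, auth=AUTH)
    print(resp.status_code)

    # ...but reading the bucket back still shows "replicaIndex": false.
    print(requests.get(BASE + BUCKET, auth=AUTH).json()["replicaIndex"])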

 Comments   
Comment by Jeff Dillon [ 09/Jul/14 ]
Also, might there be a workaround? The "index replicas" checkbox looks to be disabled in the Admin UI.
Comment by Aleksey Kondratenko [ 09/Jul/14 ]
We don't allow the index replicas setting to be changed at runtime because we currently cannot track and apply this change at runtime for existing indexes.

Some /diag/eval magic + re-creation of indexes is likely possible.

But please note that the replica indexes feature is currently somewhat of a joke. Specifically, because we don't control its up-to-dateness during rebalance in the same way as we do for main indexes, it can very easily be arbitrarily stale and unhelpful.




[MB-11679] Docs: Swap rebalance doesn't require an even number of nodes Created: 09/Jul/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0, 2.5.1, 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This link: http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/index.html#swap-rebalance (and the corresponding ones in previous versions)

states that a swap rebalance requires an "even number of nodes", which isn't correct. Perhaps it's supposed to be an "equal" number of nodes, but that section then seems a bit redundant, so maybe a little reworking is in order.




[MB-10921] Possibly file descriptor leak? Created: 22/Apr/14  Updated: 09/Jul/14

Status: In Progress
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Trond Norbye Assignee: Ketaki Gangal
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I ran df and du on that server and noticed similar figures (full console log at the end of this email). du reports 68GB on /var/opt/couchbase, whereas df reports ~140GB of disk usage. The lsof command shows that there are several files which have been deleted but are still opened by beam.smp. Those files are in /var/opt/couchbase/.delete/, and their total size amounts to the “Other Data” (roughly 70GB).
 
I’ve never noticed that before, yet recently we started playing with CB views. I wonder if that can be related. Also note that at the time I did the investigation, there had been no activity on the cluster for several hours: no get/set on the buckets, and no compaction or indexing was ongoing.
Are you aware of this problem? What can we do about it?

beam.smp 55872 couchbase 19u REG 8,17 16136013537 4849671 /var/opt/couchbase/.delete/babe701b000ce862e58ca2edd1b8098b (deleted)
beam.smp 55872 couchbase 34u REG 8,17 1029765807 4849668 /var/opt/couchbase/.delete/5c6df85a423263523471f6e20d82ce07 (deleted)
beam.smp 55872 couchbase 51u REG 8,17 1063802330 4849728 /var/opt/couchbase/.delete/c2a11ea6f3e70f8d222ceae9ed482b13 (deleted)
beam.smp 55872 couchbase 55u REG 8,17 403075242 4849667 /var/opt/couchbase/.delete/6af0b53325bf4f2cd1df34b476ee4bb6 (deleted)
beam.smp 55872 couchbase 56r REG 8,17 403075242 4849667 /var/opt/couchbase/.delete/6af0b53325bf4f2cd1df34b476ee4bb6 (deleted)
beam.smp 55872 couchbase 57u REG 8,17 861075170 4849666 /var/opt/couchbase/.delete/72a08b8a613198cd3a340ae15690b7f1 (deleted)
beam.smp 55872 couchbase 58r REG 8,17 861075170 4849666 /var/opt/couchbase/.delete/72a08b8a613198cd3a340ae15690b7f1 (deleted)
beam.smp 55872 couchbase 59r REG 8,17 1029765807 4849668 /var/opt/couchbase/.delete/5c6df85a423263523471f6e20d82ce07 (deleted)
beam.smp 55872 couchbase 60r REG 8,17 896931996 4849672 /var/opt/couchbase/.delete/3b1b7aae4af60e9e720ad0f0d3c0182c (deleted)
beam.smp 55872 couchbase 63r REG 8,17 976476432 4849766 /var/opt/couchbase/.delete/6f5736b1ed9ba232084ee7f0aa5bd011 (deleted)
beam.smp 55872 couchbase 66u REG 8,17 18656904860 4849675 /var/opt/couchbase/.delete/fcaf4193727374b471c990a017a20800 (deleted)
beam.smp 55872 couchbase 67u REG 8,17 662227221 4849726 /var/opt/couchbase/.delete/4e7bbc192f20def5d99447b431591076 (deleted)
beam.smp 55872 couchbase 70u REG 8,17 896931996 4849672 /var/opt/couchbase/.delete/3b1b7aae4af60e9e720ad0f0d3c0182c (deleted)
beam.smp 55872 couchbase 74r REG 8,17 662227221 4849726 /var/opt/couchbase/.delete/4e7bbc192f20def5d99447b431591076 (deleted)
beam.smp 55872 couchbase 75u REG 8,17 1896522981 4849670 /var/opt/couchbase/.delete/3ce0c5999854691fe8e3dacc39fa20dd (deleted)
beam.smp 55872 couchbase 81u REG 8,17 976476432 4849766 /var/opt/couchbase/.delete/6f5736b1ed9ba232084ee7f0aa5bd011 (deleted)
beam.smp 55872 couchbase 82r REG 8,17 1063802330 4849728 /var/opt/couchbase/.delete/c2a11ea6f3e70f8d222ceae9ed482b13 (deleted)
beam.smp 55872 couchbase 83u REG 8,17 1263063280 4849673 /var/opt/couchbase/.delete/e06facd62f73b20505d2fdeab5f66faa (deleted)
beam.smp 55872 couchbase 85u REG 8,17 1000218613 4849767 /var/opt/couchbase/.delete/0c4fb6d5cd7d65a4bae915a4626ccc2b (deleted)
beam.smp 55872 couchbase 87r REG 8,17 1000218613 4849767 /var/opt/couchbase/.delete/0c4fb6d5cd7d65a4bae915a4626ccc2b (deleted)
beam.smp 55872 couchbase 90u REG 8,17 830450260 4849841 /var/opt/couchbase/.delete/7ac46b314e4e30f81cdf0cd664bb174a (deleted)
beam.smp 55872 couchbase 95r REG 8,17 1263063280 4849673 /var/opt/couchbase/.delete/e06facd62f73b20505d2fdeab5f66faa (deleted)
beam.smp 55872 couchbase 96r REG 8,17 1896522981 4849670 /var/opt/couchbase/.delete/3ce0c5999854691fe8e3dacc39fa20dd (deleted)
beam.smp 55872 couchbase 97u REG 8,17 1400132620 4849719 /var/opt/couchbase/.delete/e8eaade7b2ee5ba7a3115f712eba623e (deleted)
beam.smp 55872 couchbase 103r REG 8,17 16136013537 4849671 /var/opt/couchbase/.delete/babe701b000ce862e58ca2edd1b8098b (deleted)
beam.smp 55872 couchbase 104u REG 8,17 1254021993 4849695 /var/opt/couchbase/.delete/f77992cdae28194411b825fa52c560cd (deleted)
beam.smp 55872 couchbase 105r REG 8,17 1254021993 4849695 /var/opt/couchbase/.delete/f77992cdae28194411b825fa52c560cd (deleted)
beam.smp 55872 couchbase 106r REG 8,17 1400132620 4849719 /var/opt/couchbase/.delete/e8eaade7b2ee5ba7a3115f712eba623e (deleted)
beam.smp 55872 couchbase 108u REG 8,17 1371453421 4849793 /var/opt/couchbase/.delete/9b8b199920075102e52742c49233c57c (deleted)
beam.smp 55872 couchbase 109r REG 8,17 1371453421 4849793 /var/opt/couchbase/.delete/9b8b199920075102e52742c49233c57c (deleted)
beam.smp 55872 couchbase 111r REG 8,17 18656904860 4849675 /var/opt/couchbase/.delete/fcaf4193727374b471c990a017a20800 (deleted)
beam.smp 55872 couchbase 115u REG 8,17 16442158432 4849708 /var/opt/couchbase/.delete/2b70b084bd9d0a1790de9b3ee6c78f69 (deleted)
beam.smp 55872 couchbase 116r REG 8,17 16442158432 4849708 /var/opt/couchbase/.delete/2b70b084bd9d0a1790de9b3ee6c78f69 (deleted)
beam.smp 55872 couchbase 151r REG 8,17 830450260 4849841 /var/opt/couchbase/.delete/7ac46b314e4e30f81cdf0cd664bb174a (deleted)
beam.smp 55872 couchbase 181u REG 8,17 770014022 4849751 /var/opt/couchbase/.delete/d35ac74521ae4c1d455c60240e1c41e1 (deleted)
beam.smp 55872 couchbase 182r REG 8,17 770014022 4849751 /var/opt/couchbase/.delete/d35ac74521ae4c1d455c60240e1c41e1 (deleted)
beam.smp 55872 couchbase 184u REG 8,17 775017865 4849786 /var/opt/couchbase/.delete/2a85b841a373ee149290b0ec906aae55 (deleted)
beam.smp 55872 couchbase 185r REG 8,17 775017865 4849786 /var/opt/couchbase/.delete/2a85b841a373ee149290b0ec906aae55 (deleted)
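
To quantify the "Other Data" from a listing like the one above, a small sketch that sums the sizes of deleted-but-still-open files under .delete (column positions assume the lsof output format shown above; this is not part of any product tooling):

    import subprocess

    def deleted_but_open_bytes(path="/var/opt/couchbase/.delete"):
        """Sum sizes of deleted files still held open under 'path', once per inode."""
        out = subprocess.check_output(["lsof"]).decode("utf-8", "replace")
        sizes = {}
        for line in out.splitlines():
            if "(deleted)" not in line or path not in line:
                continue
            fields = line.split()
            if len(fields) < 9:
                continue
            size, inode = fields[6], fields[7]   # SIZE/OFF and NODE columns
            if size.isdigit():
                sizes[inode] = int(size)
        return sum(sizes.values())

    if __name__ == "__main__":
        print(deleted_but_open_bytes() / float(1024 ** 3), "GiB held by deleted files")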

 Comments   
Comment by Volker Mische [ 22/Apr/14 ]
Filipe, could you have a look at this?

I also found in the bug tracker an issue that we needed to patch Erlang because of file descriptor leaks (CBD-753 [1]). Could it be related?

[1]: http://www.couchbase.com/issues/browse/CBD-753
Comment by Trond Norbye [ 22/Apr/14 ]
From the comments in that bug that seems to be blocker for 2.1 testing and this is 2.2...
Comment by Volker Mische [ 22/Apr/14 ]
Trond, Erlang itself was patched, so this is independent of the Couchbase version and depends only on the Erlang release. Though I guess you use Erlang >= R16 anyway (which should have that patch).

Could you also try it with 2.5? Perhaps it has been fixed already.
Comment by Filipe Manana [ 22/Apr/14 ]
There's no useful information here to work on or conclude anything.

First of all, it may be database files. Both database and view files are renamed to uuids and moved to .delete directory. And before 3.0, database compaction is orchestrated in erlang land (rename + delete).

Second, we have had such leaks in the past: one caused by Erlang itself (hence a patched R14/R15 is needed, or an unpatched R16), and others caused by CouchDB upstream code, which got fixed before 2.0 (and in Apache CouchDB 1.2) - geocouch is based on a copy of CouchDB's view engine that is much older than Apache CouchDB 1.x and hence suffers the same leak issues (not closing files after compactions, amongst other cases).

Given there's no concrete steps to reproduce this, nor has anyone observed this recently, I can't exclude the possibility of him using the geo/spatial views or running an unpatched Erlang.
Comment by Trond Norbye [ 22/Apr/14 ]
Volker: We ship a bundled erlang in our releases don't we?

Filipe: I forwarded the email to you, Volker and Alk April the 8th with all the information I had. We can ask back for more information if that helps us pinpoint where it is..
Comment by Volker Mische [ 22/Apr/14 ]
Trond: I wasn't aware that the "I" isn't you :)

I would ask to try it again on 2.5.1 and if it's still there the steps to reproduce it.
Comment by Trond Norbye [ 22/Apr/14 ]
The customer has the system running. Are we sure there are no commands to run on the Erlang side to gather more information about the current state?
Comment by Sriram Melkote [ 06/Jun/14 ]
Ketaki, can we please make sure in 3.0 tests:

(a) Number of open file descriptors does not keep growing
(b) The files in .delete directory get cleaned up eventually
(c) Disk space does not keep growing

If all of these hold in long-running tests, we can close this for 3.0.
Comment by Sriram Melkote [ 16/Jun/14 ]
I'm going to close this as we've not seen evidence of fd leaks in R16 so far. If system tests encounter fd leak, please reopen.
Comment by Sriram Admin [ 01/Jul/14 ]
Reopen as we're seeing it in another place, CBSE-1247 making it likely this is a product (and not environment) issue
Comment by Sriram Melkote [ 03/Jul/14 ]
Nimish, can we:

(a) Give a specific pattern (i.e., view-{uuid} or something) so we can distinguish KV files from View files after moving to .delete
(b) Can we add a log message to count number of entries and size of the .delete directory


This will help us see if we're accumulating .delete files during our system testing.
Comment by Nimish Gupta [ 09/Jul/14 ]
Hi, it could be due to views not closing the fds before deleting the files. I have added a debug message to log the filename when we call delete (http://review.couchbase.org/#/c/39233/). Ketaki, could you please try to reproduce the issue with the latest build?
Comment by Sriram Melkote [ 09/Jul/14 ]
Ketaki, can you please attach logs from a system test run with rebalance etc, with the above change merged? This will help us understand how to fix the problem better.
Comment by Ketaki Gangal [ 09/Jul/14 ]
Yes, will do. The above changes are a part of build 3.0.0-943-rel. With next run of system tests, I will update this bug.
Comment by Nimish Gupta [ 09/Jul/14 ]
Ketaki, please also attach the output of "ls -l" on the /var/opt/couchbase/.delete directory after running the test.




[MB-11677] Indexes on array elements Created: 09/Jul/14  Updated: 09/Jul/14  Due: 28/Jul/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-alpha
Security Level: Public

Type: Improvement Priority: Major
Reporter: Gerald Sangudi Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Investigate the ability to index array elements, and to use those indexes in collection-based queries.

See http://www.couchbase.com/communities/q-and-a/how-write-where-condition-arrays-using-index





[MB-11548] Memcached does not handle going back in time. Created: 25/Jun/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.1
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Patrick Varley Assignee: Jim Walker
Resolution: Unresolved Votes: 0
Labels: customer, memcached
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Triage: Untriaged
Is this a Regression?: No

 Description   
When you change the server time to a time in the past while the memcached process is running, it will start expiring all documents with a TTL.

To recreate, set the date to a time in the past, for example 2 hours ago:

sudo date --set="15:56:56"

You will see that time and uptime from cbstats change to very large values:

time: 5698679116
uptime: 4294946592

Looking at the code we can see how this happens:
http://src.couchbase.org/source/xref/2.5.1/memcached/daemon/memcached.c#6462

When you change the time to a value in the past, "process_started" will be greater than "timer.tv_sec", and since current_time is unsigned it will wrap around.

What I do not understand from the code is why current_time is the number of seconds since memcached started and not just the epoch time (there is a comment about avoiding 64-bit).

http://src.couchbase.org/source/xref/2.5.1/memcached/daemon/memcached.c#117

In any case, we should check whether "process_started" is bigger than "timer.tv_sec" and do something smart.

I will let you decide what the smart thing is :)
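
The wraparound is plain modular arithmetic; a hypothetical illustration (not memcached code), assuming rel_time_t is a 32-bit unsigned type:

    UINT32 = 2 ** 32

    def rel_time(process_started, now):
        """Model: current_time = (rel_time_t)(timer.tv_sec - process_started)."""
        return (now - process_started) % UINT32

    process_started = 1400000000                               # illustrative epoch seconds
    print(rel_time(process_started, process_started + 3600))   # 3600: sane uptime
    print(rel_time(process_started, process_started - 7200))   # clock set back 2h:
                                                                # 4294960096, a huge
                                                                # "uptime", so every
                                                                # TTL looks expired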

 Comments   
Comment by Patrick Varley [ 07/Jul/14 ]
It would be good if we can get this fix into 3.0. Maybe a quick patch like this is good enough for now:

static void set_current_time(void) {
    struct timeval timer;

    gettimeofday(&timer, NULL);
    if (process_started < timer.tv_sec) {
        current_time = (rel_time_t) (timer.tv_sec - process_started);
    } else {
        /* The clock has gone back past process_started; bail out rather than
           letting the unsigned subtraction wrap around. */
        settings.extensions.logger->log(EXTENSION_LOG_WARNING, NULL,
            "Time has gone backward, shutting down to protect data.\n");
        shutdown_server();
    }
}


More than happy to submit the code for review.
Comment by Chiyoung Seo [ 07/Jul/14 ]
Trond,

Can you see if we can address this issue in 3.0?
Comment by Jim Walker [ 08/Jul/14 ]
Looks to me like clock_handler (which wakes up every second) should be looking for the time going backwards. It is sampling the time every second, so it can easily see big shifts in the clock and make appropriate adjustments.

I don't think we should be shutting down though if we can deal with it, but it does open interesting questions about TTLs and gettimeofday going backwards.

Perhaps we need to adjust process_started by the shift?

Happy to pick this up, just doing some other stuff at the moment...
Comment by Patrick Varley [ 08/Jul/14 ]
clock_handler calls set_current_time which is where all the damage is done.

I agree that if we can handle it better we should not shut down. I did think about changing process_started, but that seemed a bit like a hack in my head, though I cannot explain why :).
I was also wondering: what should we do when time shifts forward?

I think this has some interesting affects on the stats too.
Comment by Patrick Varley [ 08/Jul/14 ]
Silly question but why not set current_time to epoch seconds instead of doing the offset from the process_started?
Comment by Jim Walker [ 09/Jul/14 ]
@patrick, this is shared code used by memcache and couchbase buckets. Note that memcache buckets store expiry as "seconds since the process started" and couchbase buckets store expiry as seconds since the epoch, hence all of this number shuffling.
Comment by Jim Walker [ 09/Jul/14 ]
get_current_time() is used for a number of time based lock checks (see getl) and document expiry itself (both within memcached and couchbase buckets).

process_started is an absolute timestamp and can lead to incorrect expiry if the real clock jumps. Example:
 - 11:00am memcached started, process_started = 11:00am (ignoring the -2 second thing)
 - 11:05am ntp comes in and aligns the node to the correct data-centre time (let's say -1hr); time is now 10:05am
 - 10:10am clients now set documents with an absolute expiry of 10:45am
 - documents instantly expire because memcached thinks they're in the past... client scratches head.

Ultimately we need to ensure that the functions get_current_time(), realtime() and abstime() all do sensible things if the clock is changed, e.g. don’t return large unsigned values.
 
Given all this I think the requirements are:

R1 Define a memcached time tick interval (which is 1 second)
  - set_current_time() callback executes at this frequency.

R2 get_current_time() the value returned must be shielded from clock changes.
   - If clock goes backwards, the returned value still increases by R1.
   - If clock goes forwards, the returned value still increases by R1.
   - Really this returns process uptime in seconds and the stat “uptime” is just current_time.

R3 monitor the system time for jumps (forward or backward).
   - Reset process_started to be current time if there’s a change which is greater or less than R1 ticks.

R4 Ensure documentation describes the effect of system clock changes and the two ways you can set document expiry.
  

Overall, the code changes to address the issue are simple. I will also look at adding testrunner tests to ensure the system behaves.
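
A minimal Python model of R2/R3 above (illustrative only; the actual fix belongs in memcached's C clock_handler / set_current_time, and the class and names here are hypothetical):

    import time

    TICK = 1  # R1: memcached tick interval in seconds

    class ShieldedClock(object):
        """get_current_time() keeps increasing by one tick per callback,
        regardless of wall-clock jumps in either direction."""

        def __init__(self):
            self.process_started = int(time.time())
            self.current_time = 0                  # uptime in seconds (R2)
            self.last_wall = self.process_started

        def on_tick(self):
            """Runs once per TICK, like set_current_time() from clock_handler."""
            now = int(time.time())
            jump = now - (self.last_wall + TICK)
            if abs(jump) > TICK:
                # R3: the wall clock jumped; slide process_started by the same
                # amount so absolute<->relative conversions stay consistent.
                self.process_started += jump
            self.last_wall = now
            self.current_time += TICK              # R2: monotonic uptime

        def get_current_time(self):
            return self.current_time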
Comment by Patrick Varley [ 09/Jul/14 ]
Sounds good, a small reminder about handling VMs that are suspended.




[MB-11646] bidirectional xdcr: fullEviction=true: not all items are replicated Created: 06/Jul/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-926

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-11646/dca3be89/172.27.33.10-762014-189-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11646/dca3be89/172.27.33.11-762014-1813-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11646/dca3be89/172.27.33.12-762014-1811-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11646/dca3be89/172.27.33.13-762014-1815-diag.zip
Is this a Regression?: Unknown

 Description   
./testrunner -i my_xdcr.ini -t xdcr.uniXDCR.unidirectional.load_with_async_ops,rdirection=unidirection,ctopology=chain,doc-ops=update-delete,sasl_buckets=1,replication_type=xmem,eviction_policy=fullEviction,dgm_run=true,active_resident_threshold=90,items=1250,value_size=2048

steps to reproduce:
1) 2 clusters with 2 buckets each, fullEviction policy, resident ratio 90 or less, load items
2) set up bidirectional xdcr

expected:
items are replicated

actual: .11 node has 50876 (source) and all others have 50875

[imironava@bella testrunner]$ ./scripts/ssh.py -i my_xdcr.ini "/opt/couchbase/bin/cbstats localhost:11210 all|grep curr_items"
172.27.33.10
 curr_items: 25437
 curr_items_tot: 50875
 vb_active_curr_items: 25437
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25438

172.27.33.12
 curr_items: 25437
 curr_items_tot: 50875
 vb_active_curr_items: 25437
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25438

172.27.33.13
 curr_items: 25438
 curr_items_tot: 50875
 vb_active_curr_items: 25438
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25437

172.27.33.11
 curr_items: 25439
 curr_items_tot: 50876
 vb_active_curr_items: 25439
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25437

attaching cbcollects

 Comments   
Comment by Aleksey Kondratenko [ 08/Jul/14 ]
I will need xdcr_trace enabled and data files as usual
Comment by Aruna Piravi [ 08/Jul/14 ]
Unable to reproduce this in my local environment.

on cluster node: 10.3.4.186 for bucket default is 0
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 sasl_bucket_1
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 sasl_bucket_1
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [task.check] Saw curr_items 50875 == 50875 expected on '10.3.4.188:8091''10.3.4.189:8091',sasl_bucket_1 bucket
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 sasl_bucket_1
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 sasl_bucket_1
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [task.check] Saw vb_active_curr_items 50875 == 50875 expected on '10.3.4.188:8091''10.3.4.189:8091',sasl_bucket_1 bucket
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 sasl_bucket_1
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 sasl_bucket_1
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [task.check] Saw vb_replica_curr_items 50875 == 50875 expected on '10.3.4.188:8091''10.3.4.189:8091',sasl_bucket_1 bucket
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 default
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 default
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [task.check] Saw curr_items 50875 == 50875 expected on '10.3.4.188:8091''10.3.4.189:8091',default bucket
2014-07-08 12:06:19 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 default
2014-07-08 12:06:20 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 default
2014-07-08 12:06:20 | INFO | MainProcess | Cluster_Thread | [task.check] Saw vb_active_curr_items 50875 == 50875 expected on '10.3.4.188:8091''10.3.4.189:8091',default bucket
2014-07-08 12:06:20 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 default
2014-07-08 12:06:20 | INFO | MainProcess | Cluster_Thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 default
2014-07-08 12:06:20 | INFO | MainProcess | Cluster_Thread | [task.check] Saw vb_replica_curr_items 50875 == 50875 expected on '10.3.4.188:8091''10.3.4.189:8091',default bucket
2014-07-08 12:06:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 sasl_bucket_1
2014-07-08 12:06:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 sasl_bucket_1
2014-07-08 12:06:20 | INFO | MainProcess | test_thread | [task.__init__] 51250 items will be verified on sasl_bucket_1 bucket
2014-07-08 12:06:20 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.4.188:11210 default
2014-07-08 12:06:21 | INFO | MainProcess | test_thread | [data_helper.direct_client] creating direct client 10.3.4.189:11210 default
2014-07-08 12:06:21 | INFO | MainProcess | test_thread | [task.__init__] 51250 items will be verified on default bucket



unknownc8e0eb186a83:testrunner apiravi$ ./scripts/ssh.py -i bixdcr.ini "/opt/couchbase/bin/cbstats localhost:11210 all|grep curr_items"
10.3.4.186
 curr_items: 25437
 curr_items_tot: 50875
 vb_active_curr_items: 25437
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25438

10.3.4.187
 curr_items: 25438
 curr_items_tot: 50875
 vb_active_curr_items: 25438
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25437

10.3.4.188
 curr_items: 25437
 curr_items_tot: 50875
 vb_active_curr_items: 25437
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25438

10.3.4.189
 curr_items: 25438
 curr_items_tot: 50875
 vb_active_curr_items: 25438
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25437
Comment by Aruna Piravi [ 08/Jul/14 ]
Hi Iryna,

Since I'm unable to reproduce the problem in my environment, can you enable the trace, collect data files and cbcollect?
XDCR tracing is enabled by default if testrunner is used. Please check whether you see diag/eval messages during setup, as shown:

2014-07-08 11:54:22 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster_memoryQuota] pools/default params : memoryQuota=2069&username=Administrator&password=password
2014-07-08 11:54:22 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster] settings/web params on 10.3.4.187:8091:username=Administrator&password=password&port=8091
2014-07-08 11:54:22 | INFO | MainProcess | Cluster_Thread | [rest_client.init_cluster_memoryQuota] pools/default params : memoryQuota=2069&username=Administrator&password=password
2014-07-08 11:54:23 | INFO | MainProcess | test_thread | [rest_client.diag_eval] /diag/eval status on 10.3.4.186:8091: True content: 'ale:set_loglevel(xdcr_trace, debug).' command: 'ale:set_loglevel(xdcr_trace, debug).'
2014-07-08 11:54:23 | INFO | MainProcess | test_thread | [rest_client.diag_eval] /diag/eval status on 10.3.4.187:8091: True content: 'ale:set_loglevel(xdcr_trace, debug).' command: 'ale:set_loglevel(xdcr_trace, debug).'
2014-07-08 11:54:23 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 10.3.4.187:8091 to cluster
2014-07-08 11:54:23 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @10.3.4.187:8091 to

To get the data files, please stop the test before teardown and run:

python scripts/collect_data_files.py -i <.ini>. This has also been automated: if testrunner finds any mismatch in item count or metadata, it will collect data files too.
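If you need to switch the trace level on by hand (outside testrunner), something like the sketch below should work; it posts the same ale:set_loglevel expression that testrunner issues to /diag/eval in the log above. Host list and credentials are placeholders.

import requests  # third-party; assumed available

EXPR = "ale:set_loglevel(xdcr_trace, debug)."

for node in ["172.27.33.10", "172.27.33.11", "172.27.33.12", "172.27.33.13"]:
    r = requests.post("http://%s:8091/diag/eval" % node,
                      data=EXPR,
                      auth=("Administrator", "password"))
    print(node, r.status_code, r.text)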

Thanks
Aruna
Comment by Iryna Mironava [ 09/Jul/14 ]
*.10 and *.11 are one cluster; *.12 and *.13 are the other.
[imironava@bella testrunner]$ ./scripts/ssh.py -i my_xdcr.ini "/opt/couchbase/bin/cbstats localhost:11210 all -b sasl_bucket_1 -p password|grep curr_items"
172.27.33.12
 curr_items: 25437
 curr_items_tot: 50875
 vb_active_curr_items: 25437
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25438

172.27.33.10
 curr_items: 25438
 curr_items_tot: 50876
 vb_active_curr_items: 25438
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25438

172.27.33.13
 curr_items: 25438
 curr_items_tot: 50875
 vb_active_curr_items: 25438
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25437

172.27.33.11
 curr_items: 25438
 curr_items_tot: 50876
 vb_active_curr_items: 25438
 vb_pending_curr_items: 0
 vb_replica_curr_items: 25438

cbcollect with traces enabled:
https://s3.amazonaws.com/bugdb/jira/MB-11646/dba3be89/172.27.33.10-792014-1846-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11646/dba3be89/172.27.33.11-792014-1848-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11646/dba3be89/172.27.33.12-792014-1847-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11646/dba3be89/172.27.33.13-792014-1850-diag.zip

data_files:
https://s3.amazonaws.com/bugdb/jira/MB-11646/dba3be89/172.27.33.13-792014-1845-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-11646/dba3be89/172.27.33.12-792014-1844-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-11646/dba3be89/172.27.33.11-792014-1845-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-11646/dba3be89/172.27.33.10-792014-1844-couch.tar.gz




[MB-11585] A query with stale=false never returns Created: 27/Jun/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Tom Yeh Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: viewquery
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Not sure what is really happening, but my Couchbase server never returns if I issue a query with stale=false. For example,

http://localhost:8092/default/_design/task/_view/by_project?stale=false&key=%22GgmBVrB9CGakdeHNnBMXZyms%22

Also, CPU usage is more than 95%.

It returns immediately if I don't specify stale=false.

It worked fine before, but I'm not sure what happened. Is the database corrupted? Is there anything I can do?

It is a development environment, so the data is small -- only about 120 docs (and the query should return only about 10 docs).

NOTE: the output of cbcollect_info is uploaded to https://s3.amazonaws.com/customers.couchbase.com/zk
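For anyone trying to reproduce, a client-side timeout makes the hang easier to demonstrate than a browser request. A sketch using the URL above (the 30-second timeout is arbitrary):

import requests  # third-party; assumed available

url = ("http://localhost:8092/default/_design/task/_view/by_project"
       "?stale=false&key=%22GgmBVrB9CGakdeHNnBMXZyms%22")
try:
    r = requests.get(url, timeout=30)
    print("HTTP", r.status_code, "-", len(r.content), "bytes")
except requests.exceptions.Timeout:
    print("query did not complete within 30 seconds")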


 Comments   
Comment by Sriram Melkote [ 07/Jul/14 ]
Nimish, can you please look at the cbcollect and see if you can analyze the reason the query did not return?
Comment by Nimish Gupta [ 09/Jul/14 ]
From the logs, it looks like a 200 OK response header was sent back to the client:

[couchdb:info,2014-06-27T21:43:49.754,ns_1@127.0.0.1:<0.6397.0>:couch_log:info:39]127.0.0.1 - - GET /default/_design/task/_view/by_project?stale=false&key=%22GgmBVrB9CGakdeHNnBMXZyms%22 200

From the logs, we can't figure out whether Couchbase failed to send the response body. On Windows, we have an issue of indexing getting stuck (https://www.couchbase.com/issues/browse/MB-11385), but from the logs I am not sure whether that bug is the root cause of this issue.





[MB-11670] Rebuild whole project when header file changes Created: 08/Jul/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Minor
Reporter: Volker Mische Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When you change a header file in the view-engine (couchdb project), the whole project should be rebuilt.

Currently, if you change a header file and don't clean the project, you can end up with run-time errors such as a badmatch on the #writer_acc record.

PS: I opened this as an MB bug and not as a CBD because this is valuable information about badmatch errors that should be public.

 Comments   
Comment by Chris Hillery [ 09/Jul/14 ]
This really has nothing to do with the build team, and as such it's perfectly appropriate for it to be an MB.

I'm assigning it back to Volker for some more information. Can you give me a specific set of actions that demonstrates the rebuild not happening? Is it to do with Erlang code, or C++?
Comment by Volker Mische [ 09/Jul/14 ]
Build Couchbase with a make.

Now edit a couchdb Erlang header file. For example edit couchdb/src/couch_set_view/include/couch_set_view.hrl and comment this block out (with leading `%`):

-record(set_view_params, {
    max_partitions = 0 :: non_neg_integer(),
    active_partitions = [] :: [partition_id()],
    passive_partitions = [] :: [partition_id()],
    use_replica_index = false :: boolean()
}).

When you do a "make" again, ns_server will complain about something missing, but couchdb won't as it doesn't rebuild at all.

Chris, I hope this information is good enough, if you need more, let me know.
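A crude way to confirm that couchdb did not rebuild after the header edit (a sketch; the ebin output path is a guess at a typical build tree, not taken from the actual build scripts):

import glob, os

newest_hrl = max(glob.glob("couchdb/src/couch_set_view/include/*.hrl"),
                 key=os.path.getmtime)
stale = [b for b in glob.glob("build/couchdb/src/couch_set_view/ebin/*.beam")
         if os.path.getmtime(b) < os.path.getmtime(newest_hrl)]
print("newest header:", newest_hrl)
print("%d beam files are older than that header" % len(stale))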




[MB-11327] [Windows] : Error occured during querying view ""view_group_index_updater_exit,255,<<>>"" Created: 05/Jun/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Meenakshi Goel Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: Windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-780-rel

Issue Links:
Dependency
depends on MB-11385 [Windows] indexing does not work (hang) Open
Triage: Triaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Jenkins Ref Link:
http://qa.hq.northscale.net/job/win_2008_x64--17_01--viewquery-P0/20/consoleFull

Test to Reproduce:
./testrunner -i <yourfile>ini attempt-num=20,get-cbcollect-info=True,GROUP=P0 -t view.viewquerytests.ViewQueryTests.test_simple_dataset_stale_queries,GROUP=P0

Logs:
[couchdb:error,2014-06-05T4:02:51.153,ns_1@10.3.3.38:<0.4716.0>:couch_log:error:42]Set view `default`, main group `_design/test_view-591cd33`, writer error
error: {badmatch,
             {writer_acc,<0.4711.0>,<0.4473.0>,
              {set_view_group,
               <<241,120,160,114,179,75,22,91,244,103,116,21,167,130,241,229>>,
               <0.4706.0>,<<"default">>,<<"_design/test_view-591cd33">>,[],
               [{set_view,0,
                 <<"function (doc) {if(doc.age !== undefined) { emit(doc.age, doc.name);}}">>,
                 #Ref<0.0.0.46463>,
                 {mapreduce_view,
                  [<<"test_view-591cd33">>],
                  {btree,<0.4706.0>,
                   {58631,
                    <<0,0,0,9,90,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255,255,
                      255,255,255,255,255,255,255,255,255,255,255,255,255,255,
                      255,255,255,255,255,255,255,254,247,0,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>,
                    28664},
                   identity,identity,#Fun<mapreduce_view.14.83342314>,
                   #Fun<mapreduce_view.13.83342314>,7168,6144,true},
                  [],[]}}],
               {btree,<0.4706.0>,
                {30453,
                 <<0,0,0,9,90,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                   0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,255,
                   255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
                   255,255,255,255,254,247,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>,
                 30902},
                identity,identity,#Fun<couch_btree.1.39972947>,
                #Fun<couch_set_view_group.15.75295102>,7168,6144,true},
               <0.4709.0>,
               {set_view_index_header,2,1024,
                1552518092300708935148979488462502555256886017116696611139052038026050952686363478522948466231546903925489524851003593840350987650737090830000565511011107592227653101722683002122575253773411958774409245310904522180946413204809973760,
                0,0,
                [{512,1},
                 {513,2},
                 {514,1},
                 {515,0},
                 {516,1},
                 {517,2},
                 {518,1},
                 {519,3},
                 {520,0},
                 {521,2},
                 {522,6},...

Uploading Logs

 Comments   
Comment by Meenakshi Goel [ 05/Jun/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-11327/c0313cd4/10.3.3.38-652014-44-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11327/3e5067da/10.3.3.39-652014-45-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11327/b2519932/10.3.2.239-652014-424-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11327/acb963fd/10.3.2.243-652014-414-diag.zip
Comment by Meenakshi Goel [ 11/Jun/14 ]
Promoting it to Test Blocker; many view query tests are failing:
http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/26/consoleFull
http://qa.hq.northscale.net/job/win_2012_x64--24_01--viewquery-P0/26/consoleFull
http://qa.hq.northscale.net/job/win_2008_x64--17_01--viewquery-P0/20/consoleFull

Comment by Nimish Gupta [ 11/Jun/14 ]
From the log, I am not able to find anything useful. I was trying to set up Couchbase on a Windows VM to reproduce the issue, but the setup was not working correctly. Meenakshi, could you please give me your test setup to reproduce the issue?
Comment by Meenakshi Goel [ 11/Jun/14 ]
Sure, I will share the cluster if the issue gets reproduced, because with the latest build, 3.0.0-804-rel, I am observing a different issue similar to the one reported in MB-10490.
http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/32/consoleFull
Comment by Volker Mische [ 11/Jun/14 ]
I've seen similar errors on Linux when I was rebuilding it without a proper clean. Please make sure the Windows build process really does build from a clean checkout and also removes the "build" subdirectory before building.
Comment by Meenakshi Goel [ 11/Jun/14 ]
Lowering the priority to Critical as I am no longer seeing this issue during sanity tests, though I am unable to run tests due to other issues occurring during View Query tests.
http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/32/consoleFull
Comment by Nimish Gupta [ 12/Jun/14 ]
Meenakshi, please add the error messages and logs for the latest test failure. If you are hitting an old issue, please mark this as a duplicate of, or blocked by, that issue.
Comment by Meenakshi Goel [ 12/Jun/14 ]
I am hitting an old issue as well as the new issue below; I will open another bug for the old issue.
http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/35/consoleFull

Error is ""view_group_index_updater_exit,255,<<>>""
Failure message is: Exception: DDoc=test_view-9673d1e; View={"test_view-9673d1e": {"map": "function (doc) {if(doc.age !== undefined) { emit(doc.age, doc.name);}}"}};:{'stale': 'false', 'connection_timeout': 60000}: ERROR: Status 500.Error occured querying view test_view-9673d1e: {"error":"error","reason":"{view_group_index_updater_exit,255,<<>>}"}

Logs:
https://s3.amazonaws.com/bugdb/jira/MB-11327/323454fe/10.3.3.38-6122014-058-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11327/fd51dc07/10.3.3.39-6122014-10-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11327/832ec2df/10.3.2.239-6122014-11-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-11327/4e283e16/10.3.2.243-6122014-12-diag.zip
Comment by Meenakshi Goel [ 19/Jun/14 ]
View Merge tests are also failing; I am not sure if it's related to the same issue:
http://qa.hq.northscale.net/job/win_2008_x64--17_03--viewmerge-P0/5/consoleFull

Error:
[couchdb:info,2014-06-18T23:57:20.669,ns_1@10.3.3.38:<0.19636.11>:couch_log:info:39]Updater for main set view group `_design/test2`, set `default`, read a total of 1 changes
[couchdb:info,2014-06-18T23:57:20.669,ns_1@10.3.3.38:<0.19635.11>:couch_log:info:39]Updater for set view `default`, main group `_design/test2`, starting btree build phase
[couchdb:error,2014-06-18T23:57:20.685,ns_1@10.3.3.38:<0.19635.11>:couch_log:error:42]Set view `default`, main group `_design/test2`, writer error
error: {index_builder_exit,86,<<>>}
stacktrace: [{mapreduce_view,index_builder_wait_loop,3,
                 [{file,
                      "c:/Jenkins/workspace/cs_300_win6408/couchbase/couchdb/src/couch_set_view/src/mapreduce_view.erl"},
                  {line,200}]},
             {mapreduce_view,finish_build,3,
                 [{file,
                      "c:/Jenkins/workspace/cs_300_win6408/couchbase/couchdb/src/couch_set_view/src/mapreduce_view.erl"},
                  {line,145}]},
             {couch_set_view_updater,flush_writes,1,
                 [{file,
                      "c:/Jenkins/workspace/cs_300_win6408/couchbase/couchdb/src/couch_set_view/src/couch_set_view_updater.erl"},
                  {line,869}]},
             {couch_set_view_updater,'-update/8-fun-1-',13,
                 [{file,
                      "c:/Jenkins/workspace/cs_300_win6408/couchbase/couchdb/src/couch_set_view/src/couch_set_view_updater.erl"},
                  {line,186}]}]

[couchdb:error,2014-06-18T23:57:20.685,ns_1@10.3.3.38:<0.15211.11>:couch_log:error:42]Set view `default`, main (prod) group `_design/test2`, received error from updater: {index_builder_exit,
                                                                                     86,
                                                                                     <<>>}
[couchdb:info,2014-06-18T23:57:20.747,ns_1@10.3.3.38:<0.17608.11>:couch_log:info:39]10.3.5.7 - - GET /default/_design/test2/_view/redview_stats?stale=false&on_error=stop 20
Comment by Meenakshi Goel [ 09/Jul/14 ]
Observing similar errors in View Merge tests as mentioned in above comment with 3.0.0-936-rel.

http://qa.hq.northscale.net/job/win_2008_x64--17_03--viewmerge-P0/11/console

[couchdb:error,2014-07-09T1:10:05.318,ns_1@10.1.2.89:<0.11211.1>:couch_log:error:42]Set view `default`, main group `_design/test2`, writer error
error: {index_builder_exit,86,<<>>}
stacktrace: [{mapreduce_view,index_builder_wait_loop,3,
                 [{file,
                      "c:/Jenkins/workspace/cs_300_win6408/couchbase/couchdb/src/couch_set_view/src/mapreduce_view.erl"},
                  {line,200}]},
             {mapreduce_view,finish_build,3,
                 [{file,
                      "c:/Jenkins/workspace/cs_300_win6408/couchbase/couchdb/src/couch_set_view/src/mapreduce_view.erl"},
                  {line,145}]},
             {couch_set_view_updater,flush_writes,1,
                 [{file,
                      "c:/Jenkins/workspace/cs_300_win6408/couchbase/couchdb/src/couch_set_view/src/couch_set_view_updater.erl"},
                  {line,902}]},
             {couch_set_view_updater,'-update/8-fun-1-',13,
                 [{file,
                      "c:/Jenkins/workspace/cs_300_win6408/couchbase/couchdb/src/couch_set_view/src/couch_set_view_updater.erl"},
                  {line,187}]}]

[couchdb:error,2014-07-09T1:10:05.318,ns_1@10.1.2.89:<0.7050.1>:couch_log:error:42]Set view `default`, main (prod) group `_design/test2`, received error from updater: {index_builder_exit,
                                                                                     86,
                                                                                     <<>>}

https://s3.amazonaws.com/bugdb/jira/MB-11327/b2519932/10.1.2.89-792014-17-diag.zip

Please let us know if it's not related to the original issue reported, and I will open a different bug for it. Thanks!
Comment by Meenakshi Goel [ 09/Jul/14 ]
I also tried running the ViewQuery tests with the latest build in order to verify whether the issue still exists, but couldn't verify due to MB-11385.

http://qa.hq.northscale.net/job/win_2008_x64--01_00--qe-sanity-P0/53/consoleFull




[MB-11385] [Windows] indexing does not work (hang) Created: 10/Jun/14  Updated: 09/Jul/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thuan Nguyen Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 2012 64-bit

Attachments: Zip Archive collectinfo-2014-06-11T005131-ns_1@127.0.0.1.zip    
Issue Links:
Dependency
blocks MB-11327 [Windows] : Error occured during quer... Open
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Install couchbase server 3.0.0-801 from http://factory.hq.couchbase.com:8080/job/cs_300_win6408/208/artifact/voltron/couchbase-server-enterprise-3.0.0-801.setup.exe
into node 10.1.2.49
Create default bucket
Create 6 docs with 1 view per doc
Load docs to the bucket. Indexing does not seem to work at all.

Live node 10.1.2.49 available to debug

test ran
./testrunner -i /tmp/5-nodes-win-2012_x64.ini -t view.viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in -p ,num_nodes_to_add=1,skip_rebalance=true,attempt-num=20
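For reference, the "6 docs with 1 view per doc" step can be reproduced by hand with something like the sketch below. The design document names, map function, and credentials are placeholders; the PUT against the CAPI port (8092) is, as far as I know, the usual way to create a design document.

import json
import requests  # third-party; assumed available

node = "10.1.2.49"
ddoc = {"views": {"v1": {"map": "function (doc) { emit(doc.name, null); }"}}}

for i in range(6):
    url = "http://%s:8092/default/_design/dev_test%d" % (node, i)
    r = requests.put(url, data=json.dumps(ddoc),
                     headers={"Content-Type": "application/json"},
                     auth=("Administrator", "password"))
    print(url, r.status_code, r.text)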


 Comments   
Comment by Wayne Siu [ 12/Jun/14 ]
Lowering the priority to Critical as discussed (Windows).
Comment by Thuan Nguyen [ 23/Jun/14 ]
Raising to test blocker since it blocks view tests on Windows.




[MB-11203] SSL-enabled memcached will hang when given a large buffer containing many pipelined requests Created: 24/May/14  Updated: 09/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Test Blocker
Reporter: Mark Nunberg Assignee: Trond Norbye
Resolution: Unresolved Votes: 0
Labels: memcached
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Sample code which shows filling in a large number of pipelined requests being flushed over a single buffer.

#include <libcouchbase/couchbase.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int remaining = 0;

static void
get_callback(lcb_t instance, const void *cookie, lcb_error_t err,
    const lcb_get_resp_t *resp)
{
    printf("Remaining: %d \r", remaining);
    fflush(stdout);
    if (err != LCB_SUCCESS && err != LCB_KEY_ENOENT) {
    }
    remaining--;
}

static void
stats_callback(lcb_t instance, const void *cookie, lcb_error_t err,
    const lcb_server_stat_resp_t *resp)
{
    printf("Remaining: %d \r", remaining);
    fflush(stdout);
    if (err != LCB_SUCCESS && err != LCB_KEY_ENOENT) {
    }

    if (resp->v.v0.server_endpoint == NULL) {
        fflush(stdout);
        --remaining;
    }
}

#define ITERCOUNT 5000
static int use_stats = 1;

static void
do_stat(lcb_t instance)
{
    lcb_CMDSTATS cmd;
    memset(&cmd, 0, sizeof(cmd));
    lcb_error_t err = lcb_stats3(instance, NULL, &cmd);
    assert(err==LCB_SUCCESS);
}

static void
do_get(lcb_t instance)
{
    lcb_error_t err;
    lcb_CMDGET cmd;
    memset(&cmd, 0, sizeof cmd);
    LCB_KREQ_SIMPLE(&cmd.key, "foo", 3);
    err = lcb_get3(instance, NULL, &cmd);
    assert(err==LCB_SUCCESS);
}

int main(void)
{
    lcb_t instance;
    lcb_error_t err;
    struct lcb_create_st cropt = { 0 };
    cropt.version = 2;
    char *mode = getenv("LCB_SSL_MODE");
    if (mode && *mode == '3') {
        cropt.v.v2.mchosts = "localhost:11996";
    } else {
        cropt.v.v2.mchosts = "localhost:12000";
    }
    mode = getenv("USE_STATS");
    if (mode && *mode != '\0') {
        use_stats = 1;
    } else {
        use_stats = 0;
    }
    err = lcb_create(&instance, &cropt);
    assert(err == LCB_SUCCESS);


    err = lcb_connect(instance);
    assert(err == LCB_SUCCESS);
    lcb_wait(instance);
    assert(err == LCB_SUCCESS);
    lcb_set_get_callback(instance, get_callback);
    lcb_set_stat_callback(instance, stats_callback);
    lcb_cntl_setu32(instance, LCB_CNTL_OP_TIMEOUT, 20000000);
    int nloops = 0;

    while (1) {
        unsigned ii;
        lcb_sched_enter(instance);
        for (ii = 0; ii < ITERCOUNT; ++ii) {
            if (use_stats) {
                do_stat(instance);
            } else {
                do_get(instance);
            }
            remaining++;
        }
        printf("Done Scheduling.. L=%d\n", nloops++);
        lcb_sched_leave(instance);
        lcb_wait(instance);
        assert(!remaining);
    }
    return 0;
}


 Comments   
Comment by Mark Nunberg [ 24/May/14 ]
http://review.couchbase.org/#/c/37537/
Comment by Mark Nunberg [ 07/Jul/14 ]
Trond, I'm assigning it to you because you might be able to delegate this to another person. I can't see anything obvious in the diffs since the original fix that would break it - of course, my fix might not have fixed it completely but just made it work accidentally; or it may be flush-related.
Comment by Mark Nunberg [ 07/Jul/14 ]
Oh, and I found this on an older build of master; 837, and the latest checkout (currently 055b077f4d4135e39369d4c85a4f1b47ab644e22) -- I don't think anyone broke memcached - but rather the original fix was incomplete :(




[MB-11638] Outbound mutations not correct when XDCR paused Created: 03/Jul/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0-Beta
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When XDCR is paused and a workload is ongoing, the "outbound XDCR mutations" value seems to max out at a lower value than there actually are mutations (a few thousand per node?).

Users will want to see how many mutations are outstanding while XDCR is paused to know how much will have to replicate once they resume.

 Comments   
Comment by Aleksey Kondratenko [ 03/Jul/14 ]
There's _no_ way I can do it.

I'm aware of this. Yes, unlike 2.x, we don't update our "items remaining" stat regularly. 2.x wasted some cycles on that on _every_ vbucket update, but at least it had a reasonably efficient way of getting that count from couch's count reduction.

In 3.0 we simply (at least as of now) have no way of getting that stat anywhere near efficiently.

So the current implementation simply does it:

a) once on wakeup

b) by taking the difference between the highest seqno and the replicated seqno (which is already a mere estimate)

My understanding is that there's no way the current UPR implementation can do it anywhere close to efficiently.
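In other words (a sketch of my reading only, with made-up stat names; not the actual ns_server code):

def estimate_items_remaining(vbucket_stats):
    # vbucket_stats: iterable of (high_seqno, replicated_seqno) per vbucket
    return sum(max(0, high - replicated) for high, replicated in vbucket_stats)

# Computed once when a replicator wakes up; while XDCR stays paused nothing
# recomputes it, so the UI value stops tracking new mutations.
print(estimate_items_remaining([(1200, 1200), (1350, 1100), (990, 700)]))  # 540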
Comment by Perry Krug [ 03/Jul/14 ]
Thanks Alk, very much understood.

My perspective on this is simply from that of an administrator pushing buttons and looking at the effect, and then asking me or support why.

What would you think about a once-per-second/5second/10second process that only kicked in while paused and grabbed the latest sequence number to update this particular stat with? In my mind, it would be similar to waking up and immediately sleeping all the replicators just to get a count.

On the one hand I realize it seems a very hackish/ad-hoc process, but on the other, I'm positive that end-users will notice this and ask the question (even if it's documented and release noted).

Comment by Aleksey Kondratenko [ 03/Jul/14 ]
We could poll for stats like that. But that means possibly nontrivial CPU overhead and I'd like to minimize it.
Comment by Perry Krug [ 03/Jul/14 ]
I certainly agree with reducing CPU overhead.

If it's trivial to do so, adding an unsupported interval config and/or on-off capability to this would help the field work around any problems. Given that it would only take effect when XDCR is paused, we're going to have a decent amount of "free" CPU compared to when XDCR was running previously.




[MB-11627] Expose DCP (UPR) as a public protocol for change tracking Created: 02/Jul/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: feature-backlog
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Cihan Biyikoglu Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 1
Labels: upr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Enable a publicly supported API for DCP (UPR) that allows clients to open a DCP stream and detect changes for a given bucket with a given filter on keys or values.




[MB-10349] Fix UPR documentation based on Steve's comments Created: 04/Mar/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
I'm _still_ in the midst of reading all the UPR specs.


But, in general, I'd say the spec(s) don't meet a level of specification that allows an engineer to implement it without having direct access to you or looking at code.


More specific feedback...


Would be much better to use consistent terminology throughout, instead of too many synonyms.


For example, producer vs consumer vs server vs client. And, "connection" vs "channel" vs "stream". Are they the same concept? If so, then please use a single word.


Next, on some per-document feedbacks...


(Also, no need to answer by email, but just please take as feedback to improve the actual docs.)


In transport-spec.md...


In general, need more description of expected error responses for each message.


Open Connection (opcode 0x50)


- what are the rules for the key?

- what are the expected errors?

- what is a connection sequence number?

- who needs to increment the sequence number and who is the "receiving side"?

- what do the producer / consumer bits mean? Do they represent the receiver of the request or the initiator of the request?


More synonyms - "Open Connection" versus "UPR_OPEN".


Add Stream (opcode 0x51)


To which producer does the consumer "initiate a stream request"? Is the receiver of the UPR_ADD_STREAM


Close Stream (0x52)


What if the peer receives messages for a stream that's already closed? Can it tell?


"The UPR consumer will ... let the UPR producer know as soon as it receives a message bound for that vbucket."... how? Is the UPR consumer supposed to send a message?


Failover Log Request (0x54)


"If a client can't find a known failover id, it should select the vbucket with a highest sequence number..." -- a worked example here would help. From what set of vbuckets does the client (consumer?) select? After the client/consumer selects a highest vbucket, what does it do with its selection? Perhaps I read the docs out-of-order, but a pointer to the failover-scenarios.md file here might help (but failover-scenarios.md doesn't really talk about "selecting the vbucket with a highest sequence number").


On failover-scenarios.md....


"Where to rollback to is determined by the Node C which will find the last VBucket UUID that the tow nodes had in common..."


Some participants (indexes, etc) might not have fine-grained rollback-ability, and Node C won't have that knowledge. Should it not be instead the UPR consumer that determines where to rollback to instead of the UPR producer?


In the "Failover Log (Handshake)" section...


It seems inconsistent with the previous note, where it seems the consumer is making the rollback decision. If there are two alternative pathways on who decides to roll back, then the protocol should be simplified to prescribe only one pathway.


"The failover log should be persistent immediately after receiving the OK response."


Who persists the failover log? The consumer? If so, should the consumer persist the failover log only after it has successfully rolled back?


On dead-connections.md...


The "slow consumer" messages will be tricky to handle in the midst of channel/stream metadata handshakes, like open-connections, streams, and snapshot-markers.


A worked example would be helpful. e.g., imagine the producer just sent a SET-VBUCKET-STATE msg on the wire, but the consumer is racing down a bad spiral of slow disks and sends "a response to the producer indicating a temporary failure".


And, it's possible this is all obsoleted by your recent talks on UPR memory buffering / ACK'ing with Alk but still a worked example with those "metadata" messages would be helpful.

 Comments   
Comment by Steve Yen [ 04/Mar/14 ]
Also, I think I read some of the documents out-of-normal-order, and now that I read some other docs, the transport-spec.md doc makes a bit more sense. So, another feedback would be to add some "please read this first" pointers at the top of relevant docs.
Comment by Steve Yen [ 04/Mar/14 ]
Another feedback... I see a distinction in the protocol between DELETE vs EXPIRATION messages. I might have missed this, but if not already there, a brief 1 or 2 sentences on the rationale would be good.

e.g., some clients (backup? indexes?) might not care about the difference and treat DELETION & EXPIRATION the same way? But, which consumers would care about the difference and need to handle them differently?
Comment by Steve Yen [ 04/Mar/14 ]
Yet another feedback...

Commands that close stuff (conns / streams) are interesting, in that there might be interesting edge cases of what happens to all the requests on the stream or connection that are already in flight.

Related, an interesting (troublesome) case is when clients reopen these things but inadvertently try to use the same name. It might be too late here, but a clearer protocol might instead provide an error response if someone tries to OPEN_CONNECTION when there's already a connection with that name, instead of implicitly closing the existing open conn.




[MB-11614] Discussion - Should we move auto-failover out of erlang? Created: 26/Nov/13  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.2.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Task Priority: Major
Reporter: James Mauss Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
In the field, we are seeing many times that when a node is 'slow' due to the OS, the node is auto-failed over. During this 'slow' time the memcached process is handling gets/sets from the clients without any issues.

Often the issue comes down to the Erlang nodes not being able to communicate with each other for some reason that is not impacting memcached, and it is sometimes blamed on swap, THP, erlang's internal balancing among threads, etc.

Should we look at moving the auto-failover logic out of erlang to help prevent some of these 'false' failovers?

 Comments   
Comment by Matt Ingenthron [ 26/Nov/13 ]
So, erlang's soft real-time isn't real-time enough for you?

There are other ways of doing checks for "dead" too that we can explore. A classic technique if something seems to not be responding is to try to connect to it on a TCP port known to _not_ be listening. If you get a failure right away, you know the system is there and you maybe just need to wait longer for a reply to your message, etc.
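A sketch of that probe (the "known closed" port is an arbitrary placeholder):

import errno, socket

def host_reachable(host, closed_port=1, timeout=2.0):
    # Probe a TCP port expected NOT to be listening. An immediate connection
    # refusal (or, unexpectedly, a success) proves the host is up and reachable;
    # only a timeout suggests the machine itself is unreachable or stalled.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, closed_port))
        return True                        # something answered: host is up
    except socket.timeout:
        return False                       # no RST, no SYN-ACK: host looks gone
    except socket.error as e:
        return e.errno == errno.ECONNREFUSED
    finally:
        s.close()

print(host_reachable("10.1.2.49"))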

But yes, I do think there's room for improvement here.

If we really want higher reliability, we probably need to run some very-small subset as a kernel module with real-time scheduling and locked memory pages.
Comment by Perry Krug [ 29/Nov/13 ]
We've seen pretty good success with the babysitter process helping out...would an easy win possibly be to split more things away from the ns_server component?

I know we want some of these to be improved/rewritten at some point, but in the near-term:
-XDCR
-Compaction
-View processing
-Autofailover

Could all be split into their own processes? This would also give us the ability to monitor the resource utilization of each one independently on live systems...
Comment by Perry Krug [ 16/Dec/13 ]
Just sending this onto your plate, Alk; perhaps it's a good place to have a discussion between the field and you/engineering on what is possible here and to provide feedback to the field on what you've already got in the works.

The main reason for this ticket is to prevent false-positives as much as possible, I know there are separate discussions you're involved in around how to make auto-failover faster in certain cases which may be semi-related but is not necessarily the focus of this discussion.
Comment by Aleksey Kondratenko [ 27/Dec/13 ]
CC-ed Dustin, Steve, John, Aaron and my crew. As well as some valuable not-so-tech folks. Added Filipe too for his erlang expertise.

While testrunner runs I'll add some thoughts here.

== About erlang's soft real-time-ness....

In practice we have found this to be a myth. At least for our use cases. For the following reasons:

* if a swap storm happens erlang's user-space multitasking rapidly loses its efficiency. One lightweight process that's stuck on a missing page causes the entire kernel thread of the erlang scheduler (aka interpreter thread) to be stuck.

* when GC happens erlang cannot switch processes. We've seen evidence that xdcr leak may cause GC to spend _hundreds_ of seconds. Stalling tons of more important processes.

* when a NIF (native C function called from erlang) is executed erlang cannot switch processes. Gameloft cases have shown us clearly that mapreduce nifs can easily cause badness. Worst of all, we have no cure at all against runaway map/reduce invocations, and doing those calls on dedicated threads is believed to be too expensive. The mapreduce facility limits time spent in single invocations but it cannot make this limit too small, both due to correctness reasons and also because it's hard to implement very short timeouts portably.

* erlang's SMP scheduling trades latency for throughput. That one is quite a major IMHO. Most of the issues above would be a non-issues if erlang had a single runqueue where all scheduler threads pull their runnable processes. I.e. if one or even few schedulers occasionally stuck on something, there would be more threads ready to take work. Back in the day it was the case. But now it only supports runqueue per scheduler, with very tricky heuristics dealing with spreading runnable processes (this is where "famous" +swt low matters). It also has additional heuristics that disables "excessive" scheduler threads. I would give something valuable for simple and effective behavior of old erlangs, because we don't care about throughput that much.

* there's unconfirmed suspicion that erlang treats ports work (sockets, because files are a bit like nifs) as less important than other work. This is potentially causing famous "memcached op XXXX took too long" messages which are too easy to trigger. But I'm speculating here a bit.

* despite some reports that state otherwise, I've found erlang to have a _horrible_ memory management. You can see it yourself in high count of minor page faults that happen during some reasonably heavy erlang activity. And that's _after_ we tried to tune it. This alone could be fine. But other factors combine to make this one of major latency factors. I.e.:

** When Linux "reserve" of free pages runs out (well in reality that happens strictly before _all_ free pages are consumed due to further reserves for ISRs/filesystems etc), it may sometimes stall on "direct reclaim". Which is activity that finds clean pages to evict (but it may also initiate writeout/swap and is a complex beast beyond my current understanding). Hugepages and various hugepage-related measures (like pages compaction) may make matters worse by a lot. We've seen this to cause significant delays even in environments with tons of clean and inactive pages even on system without huge pages (I would guess it might be related to locking in MM layer). That's relatively known area of badness of Linux (perhaps it's weakest point for quite some time already). It's also known that direct reclaim (or maybe even any reclaim, I'm not expert sadly) causes bad (too seeky) dirty pages writeout patterns making things even worse. I've seen windows to do noticeably better at this, at least on our use-case.

** Normally these reserves do not run out (there's a kswapd process that does "normal" reclaim periodically to keep them from exhausting). But our non-directio append-only writes approach can easily be seen to force us towards direct reclaims (i.e. when pages are allocated faster than kswapd refills the free pages reserves). I.e. every write causes a page allocation for the page cache to hold newly written data. And we never explicitly free pages (even though logically the data is overwritten), instead requiring the kernel to "guess" which pages are unused/garbage or just not hot enough. Views apparently make matters worse due to never fsyncing anything (which causes the kernel's dirty pages limit to be hit, causing us to have up to 20% of memory in dirty view pages with default settings).

== Regarding splitting

I believe that's in general way to go. But I would think twice about duplicating cluster meshing in a dedicated autofailover process and "normal" cluster orchestration process. Better model, at least as of my current thinking, appears to be separating cluster orchestration _core_ out of everything else. That would split xdcr, view _and_ management REST. So that all cluster orchestration activities (including autofailover) would have dedicated OS resources.

== General question

I think the most important question is the following: how come we see clients not noticing memcached unavailability, yet erlang triggers autofailover?

There may be something in our implementation we may want to understand _independently_ from splitting decision.

At least in the past it was not uncommon to see autofailover due to erlang's inability to talk to memcached (and not due to inability to talk to other erlangs, as we're increasingly seeing now). I think 1.8's implementation of multiple ns_memcached workers could improve things in practice. And I think I understand why.

I don't know if we have a reliable way to measure certain levels of memcached unavailability. But it's quite possible that even under a heavy swap storm many memcached requests are still quick, causing average/median latency to be decent. But the autofailover perspective (as of now) is that if a _single_ operation on a certain memcached connection is stuck, we detect "down"-ness. Add to that that erlang needs to gather a bunch of other crap (stats etc.) to send a healthy heartbeat. BTW, 2.5 is decoupling "gathering crap" from heartbeats along with some other fixes (around running more buckets) in this area.

So I think there's some food for thought to have less of "max out of samples" and more "_all_ out of recent samples" for autofailover. I'll be thinking about this more.
Comment by Steve Yen [ 13/Jan/14 ]
> I think most important question is the following: how come we see clients do not notice memcached unavailability, yet erlang triggers autofailover?

I recall that Dustin had another thought on this (which I abbreviate here), and it's more in the category of larger design question... in short, failures should be measured (instead? / additionally?) from the perspective of the clients.




[MB-11099] Couchbase cluster provisioning, deployment, and management on Open Stack KVM and Trove Created: 12/May/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: cloud
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Support Couchbase cluster provisioning, deployment, and management on OpenStack KVM and Trove.




[MB-11537] problem with ep_tap_replica_total_backlog_size stat Created: 24/Jun/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Artem Stemkovski Assignee: Artem Stemkovski
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
export COUCHBASE_NUM_VBUCKETS=8
export COUCHBASE_REPL_TYPE=tap
2 nodes with some data (~1500 documents)
Nodes are rebalanced and no documents were added/changed/deleted since the rebalance

hard failover one node
you'll see the following warning:
Attention – A significant amount of data stored on this node does not yet have replica (backup) copies! Failing over the node now will irrecoverably lose that data when the incomplete replica is activated and this node is removed from the cluster. It is recommended to select "Remove Server" and rebalance to safely remove the node without any data loss.

This is strange because after the rebalance the replicas should be up to date and it should be safe to do the failover.

 Comments   
Comment by Artem Stemkovski [ 25/Jun/14 ]
I see that ep_tap_replica_total_backlog_size steadily equals 4 in this case.
On the other hand, ep_tap_replica_queue_drain = 0.

This doesn't look right to me, and it throws off our failover safety calculation.
Comment by Artem Stemkovski [ 25/Jun/14 ]
Artems-MacBook-Pro:couchbase artem$ install/bin/cbstats localhost:12000 tapagg
 _total:backfill_remaining: 0
 _total:backoff: 0
 _total:count: 1
 _total:drain: 0
 _total:fill: 0
 _total:itemondisk: 0
 _total:qlen: 0
 _total:total_backlog_size: 4
 replication:backfill_remaining: 0
 replication:backoff: 0
 replication:count: 1
 replication:drain: 0
 replication:fill: 0
 replication:itemondisk: 0
 replication:qlen: 0
 replication:total_backlog_size: 4
Comment by Artem Stemkovski [ 25/Jun/14 ]
I wonder whether we should use qlen for TAP and items_remaining for UPR instead of total_backlog_size? Those stats seem to be working fine.
Comment by Mike Wiederhold [ 26/Jun/14 ]
Per the email thread.

UPR: items_remaining
TAP: total_backlog_size
Comment by Mike Wiederhold [ 26/Jun/14 ]
I deleted the comments about the upr issue since I want to keep this bug restricted to tap. Note that the fix for the upr issue is here (http://review.couchbase.org/#/c/38850/)
Comment by David Liao [ 08/Jul/14 ]
If I add the 2nd server and rebalance, total_backlog_size shows 4 even before I insert any items. After inserting items, it remains 4.

If I insert items first and then add the 2nd server and rebalance, then the stat is 0.

The number 4 is the number of items from the 4 checkpoints (each has 1). This is due to there being 8 vbuckets. If there were 16 vbuckets, there would be 8 checkpoints (each with 1 item), making the stat 8. So the following seems true in this case:

total_backlog_size - ep_tap_queue_backfillremaining = num_of_vbuckets / 2.

For some reason there is always 1 item in each checkpoint in the first case. I am not sure if it's a bug or by design.

Chiyoung/Mike, please provide any input to clarify this.
Comment by Chiyoung Seo [ 08/Jul/14 ]
Artem,

(backfill_remaining + qlen) will give you the number of items remaining for replication. You can use those two stats instead of total_backlog_size.

total_backlog_size includes checkpoint_meta items (e.g., checkpoint_start, checkpoint_end), which are not directly related to data loss. We will fix this backlog_size stat issue separately.
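For the safety calculation, a sketch of the suggestion above using the tapagg stat names shown earlier in this thread (the parsing is illustrative only):

def replica_items_remaining(tapagg):
    # tapagg: dict of the `cbstats ... tapagg` output, replication:* keys
    return (int(tapagg["replication:backfill_remaining"])
            + int(tapagg["replication:qlen"]))

stats = {"replication:backfill_remaining": "0",
         "replication:qlen": "0",
         "replication:total_backlog_size": "4"}   # values from the output above
print(replica_items_remaining(stats))  # 0 -> failover is actually safe here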




[MB-11516] wal_flush_before_commit doesn't constrain memory usage, if using transactions Created: 23/Jun/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: 2.5.1
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Jens Alfke Assignee: Jung-Sang Ahn
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
As I understand it, the purpose of wal_flush_before_commit is to constrain memory usage by transferring WAL entries from heap to disk periodically before a commit is issued. This is important on mobile devices which are short on RAM and tend to react badly to running out of memory (iOS will just kill an app process if it's using more than some limited fraction of the physical RAM.)

Unfortunately, using transactions in ForestDB currently prevents the WAL from being committed to disk during a transaction. The logic at the end of fdb_set (forestdb.cc:2033-2036) calls wal_commit only if there is no database transaction in effect.

Couchbase Lite does need to use transactions, so I think this is going to be a problem for us.

 Comments   
Comment by Jung-Sang Ahn [ 28/Jun/14 ]
Hi Jens,

Thanks for bringing up the issue. I clearly understand the problem and agree that setting ‘wal_flush_before_commit’ doesn’t constrain the memory usage if a transaction is active. This is because transactional WAL entries cannot be flushed unless the associated transaction is committed.

To strictly limit the memory overhead, uncommitted transactional WAL entries have to be flushed to disk, so all on-disk structures such as hb+trie or b+tree nodes and documents also need to be modified to support transactions. I have been working on this for the past few days, but it is not easy because the current hb+trie and b+tree implementation stores only one value per key. To support transactions, more than one value (the committed value and all dirty values from transactions) would have to be mapped by a single key, and it seems that this requires a lot of changes to the existing code because we have to consider various factors such as concurrent compactors or writers, and recovery. I think it will take some more time to fix.

Note that a document itself is written to disk as soon as fdb_set is called, and only its key and metadata (such as the disk offset) are kept in memory. Hence, the memory footprint per document is not as large as the document size. But if you think it is still too large, splitting a big commit into several small commits can be an alternative.
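A sketch of the "several small commits" alternative, with hypothetical db.set/db.commit wrappers standing in for fdb_set/fdb_commit (not the real C API):

def store_in_batches(db, docs, batch_size=1000):
    # Write docs in fixed-size batches, committing after each batch so the
    # WAL (and its in-memory keys/metadata) never grows past ~batch_size entries.
    # db.set / db.commit are hypothetical wrappers around fdb_set / fdb_commit.
    pending = 0
    for key, body in docs:
        db.set(key, body)
        pending += 1
        if pending >= batch_size:
            db.commit()
            pending = 0
    if pending:
        db.commit()

The trade-off, of course, is losing the atomicity of the single large transaction.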

Thanks,
Jung-Sang
Comment by Jens Alfke [ 28/Jun/14 ]
> Note that a document itself is written into disk as soon as fdb_set is called

Interesting; that's not what I'm seeing. I use dtrace (via the Instruments app on OS X) to monitor I/O syscalls, and there are no file writes happening at all until I commit the transaction. In this test there are about 12,000 documents, each of which has a few hundred bytes of body.
Comment by Jung-Sang Ahn [ 28/Jun/14 ]
I think that is because of (ForestDB's own) block cache. If block cache is turned on, documents are cached in block cache first, and written back into disk when 1) user manually invokes commit, or 2) there is no more space in the block cache. I guess file write calls will be caught by the I/O monitor if you disable the block cache.
Comment by Jung-Sang Ahn [ 28/Jun/14 ]
Anyway, since the block cache size is strictly fixed, the extra memory used by the WAL will contain keys and metadata only, excluding document bodies.




[MB-11665] {UPR}: During a 2 node rebalance-in scenario :: Java SDK (1.4.2) usage sees a drop in OPS (get/set) by 50% and high error rate Created: 07/Jul/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Parag Agarwal
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 172.23.107.174-177

Triage: Untriaged
Operating System: Centos 64-bit
Flagged:
Release Note
Is this a Regression?: Yes

 Description   
We have compared runs of 2.5.1-0194 vs 3.0.0-918; the Java SDK used was 1.4.2.

Common Scenario

1. Create a 2 node cluster
2. Create 1 default bucket
3. Add 15K items while doing gets and sets
4. Add 2 nodes and then rebalance
5. Run Get and Set again in parallel with the rebalance

Issue observed during Step 5: ops drop by 50%, and the error rate is high most of the time when compared to 2.5.1.

The comparative report is shared here

General Comparison Summary

https://docs.google.com/document/d/1PjQBdJvLFaK85OrrYzxOaZ54fklTXibj6yKVrrU-AOs/edit

3.0.0-918:: http://sdk-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-3.0.0-918/Rb2In-HYBRID/07-03-14/068545/22bcef05a4f12ef3f9e7f69edcfc6aa4-MC.html

2.5.1-1094: http://sdk-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-2.5.1-1094/Rb2In-HYBRID/06-24-14/083822/2f416c3207cf6c435ae631ae37da4861-MC.html
Attaching logs

We are trying to run more tests with different versions of the SDK, like 1.4.3 and 2.0.

https://s3.amazonaws.com/bugdb/jira/MB-11665/logs_3_0_0_918_SDK_142.tar.gz


 Comments   
Comment by Parag Agarwal [ 07/Jul/14 ]
Pavel: Please add your comments for such a scenario with libcouchbase
Comment by Pavel Paulau [ 07/Jul/14 ]
Not exactly the same scenario but I'm not seeing major drops/errors in my tests (using lcb based workload generator).
Comment by Parag Agarwal [ 08/Jul/14 ]
So Deepti posted results and we are not seeing issues with 1.4.3 for the same run. What is the difference between SDK 1.4.2 and 1.4.3?
Comment by Aleksey Kondratenko [ 08/Jul/14 ]
Given that problem seems to be sdk version specific and there's no evidence yet that it's something ns_server may cause, I'm bouncing this ticket back.
Comment by Matt Ingenthron [ 08/Jul/14 ]
Check the release notes for 1.4.3. We had an issue where there would be authentication problems, including timeouts and problems with retries. This was introduced in changes in 1.4.0 and fixed in 1.4.3. There's no direct evidence, but that sounds like a likely cause.
Comment by Matt Ingenthron [ 08/Jul/14 ]
Parag: not sure why you assigned this to me. I don't think there's any action for me. Reassigning back. I was just giving you additional information.
Comment by Wei-Li Liu [ 08/Jul/14 ]
Re-ran the test with the 1.4.2 SDK against the 3.0.0 server with just 4GB RAM per node (compared to my initial test with 16GB RAM per node).
The test result is much better: I am not seeing the errors, and the operation rate never drops significantly.
http://sdk-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-3.0.0-918/Rb2In-HYBRID/07-08-14/074980/d5e2508529f1ad565ee38c9b8ab0c75b-MC.html
 
Comment by Parag Agarwal [ 08/Jul/14 ]
Sorry, Matt! Should we close this and just document it in the release notes?
Comment by Matt Ingenthron [ 08/Jul/14 ]
Given that we believe it's an issue in a different project (JCBC), fixed and release noted there, I think we can just close this. The only other possible action, up to you and your management, is trying to verify this is the actual cause a bit more thoroughly.




[MB-11435] 1500-2000% CPU utilization by beam.smp on source nodes in XDCR scenarios (was ~500% in 2.5.x) Created: 16/Jun/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Pavel Paulau
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2680 v2 (40 vCPU)
Memory = 256 GB
Disk = RAID 10 SSD

Attachments: PNG File active_vbreps.png     PNG File beam.smp_cpu.png     PNG File replication_rate.png    
Issue Links:
Relates to
relates to MB-11434 600-800% CPU utilization by memcached... Open
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/xdcr-5x5/290/artifact/
Is this a Regression?: Yes

 Description   
5 -> 5 UniDir, 2 buckets x 500M x 1KB, 10K SETs/sec/node, LAN

 Comments   
Comment by Aleksey Kondratenko [ 16/Jun/14 ]
One additional thing to try is an R14 build. We were promised occasional R14 build(s) for perf investigations.
Comment by Pavel Paulau [ 16/Jun/14 ]
Thanks for the suggestion!

There are many things to try. I will keep it assigned to me until all discussed experiments are done.
The primary purpose of this ticket is visibility of existing problems.
Comment by Pavel Paulau [ 17/Jun/14 ]
As I mentioned before, it is critical to optimize CPU utilization due to MB-9822.

Otherwise scheduler collapse causes high variation in latency and replication throughput. E.g., the upper chart represents a node with 100-200% CPU utilization, the bottom chart a node with 1500-2000%. As a result, overall replication latency goes from several milliseconds to several seconds.
Comment by Pavel Paulau [ 19/Jun/14 ]
Interestingly, in WAN tests CPU goes down to ~500% (due to lower replication rate / higher latency). But memcached only drops to 400-500%.
Comment by Pavel Paulau [ 19/Jun/14 ]
As discussed, please think about tunable batched / buffered replication.
Comment by Pavel Paulau [ 21/Jun/14 ]
With a 100 ms delay, some nodes still show relatively high CPU utilization, 1200-1400% (others ~400%), plus 600% by memcached.
With a 500 ms delay, peak utilization decreases to 800% and most nodes use only 250-300%, plus 500% by memcached.

Please note that 500 ms is the typical latency for 2.5.1 in this scenario.
Comment by Aleksey Kondratenko [ 24/Jun/14 ]
Another possible place where we can gain a bit is better work allocation into xdcr workers. Not sure it will be a win, however. Can you try lowering the worker count to 1 to see whether that might be it? Note it's not concurrent vbucket reps but this setting: https://github.com/couchbase/ns_server/blob/master/src/xdc_settings.erl#L76
Comment by Aleksey Kondratenko [ 24/Jun/14 ]
/diag/eval snippet for lowering the worker count:

ns_config:set({xdcr, worker_processes}, 1).
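For reference, one way to apply this on a node is via the same /diag/eval endpoint used elsewhere in these tickets (default Administrator credentials and port assumed):

curl -XPOST -u Administrator:password -d 'ns_config:set({xdcr, worker_processes}, 1).' http://127.0.0.1:8091/diag/eval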
Comment by Cihan Biyikoglu [ 25/Jun/14 ]
Pavel, could you explain what the replication rate graph is plotting at the top and bottom? Are those 2.5 and 3.0?
Thanks
Comment by Pavel Paulau [ 25/Jun/14 ]
No, see https://www.couchbase.com/issues/browse/MB-11435?focusedCommentId=91208 .
Comment by Cihan Biyikoglu [ 25/Jun/14 ]
To clarify: the replication rate is higher when CPU utilization is lower (top chart on replication rate vs. bottom chart). So the CPU time isn't spent replicating; it is spent doing some other work. Is that right?
Comment by Aleksey Kondratenko [ 30/Jun/14 ]
It looks like a lot of it is due to logging overhead. If you have time for one pass, please do one run with the xdcr and ns_server log levels bumped to error. Either via static_config or via diag eval: "ale:set_loglevel(ns_server, error)."
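For example, via the same /diag/eval pattern used elsewhere in these tickets (logger names taken from the comment above; default credentials and port assumed):

curl -XPOST -u Administrator:password -d 'ale:set_loglevel(ns_server, error).' http://127.0.0.1:8091/diag/eval
curl -XPOST -u Administrator:password -d 'ale:set_loglevel(xdcr, error).' http://127.0.0.1:8091/diag/eval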
Comment by Pavel Paulau [ 30/Jun/14 ]
Will do but only after beta release.
Comment by Aleksey Kondratenko [ 08/Jul/14 ]
We've just merged some patches that should improve CPU usage.
Comment by Pavel Paulau [ 08/Jul/14 ]
I'm actually observing a huge improvement after the logging changes.




[MB-11661] mem_used is increasing and dropping in a basic setup with 5 buckets Created: 07/Jul/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Pavel Paulau Assignee: Sriram Ganesan
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-928

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2680 v2 (40 vCPU)
Memory = 256 GB
Disk = RAID 10 SSD

Attachments: PNG File bucket-1-mem_used-cluster-wide.png     PNG File bucket-2-mem_used.png     PNG File bucket-4-mem_used.png     PNG File memcached_rss-172.23.100.17.png     PNG File memcached_rss-172.23.100.18.png     PNG File mem_used_2.5.1_vs_3.0.0.png    
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/perf-dev/479/artifact/
Is this a Regression?: Yes

 Description   
2 nodes, 5 buckets, 200K x 1KB docs per bucket, 2K updates per bucket.

You can see that mem_used for bucket-1 increased from ~600 MB to ~1250 MB after 5 hours.

It doesn't look like a fragmentation issue, at least allocator stats don't indicate that:

MALLOC: 1575414200 ( 1502.4 MiB) Bytes in use by application
MALLOC: + 24248320 ( 23.1 MiB) Bytes in page heap freelist
MALLOC: + 77763952 ( 74.2 MiB) Bytes in central cache freelist
MALLOC: + 3931648 ( 3.7 MiB) Bytes in transfer cache freelist
MALLOC: + 27337432 ( 26.1 MiB) Bytes in thread cache freelists
MALLOC: + 7663776 ( 7.3 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 1716359328 ( 1636.8 MiB) Actual memory used (physical + swap)
MALLOC: + 1581056 ( 1.5 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 1717940384 ( 1638.4 MiB) Virtual address space used
MALLOC:
MALLOC: 94773 Spans in use
MALLOC: 36 Thread heaps in use
MALLOC: 8192 Tcmalloc page size

Please note that actual RAM usage (RSS) is pretty stable.

Another issue is the dropping mem_used for bucket-2 and bucket-4, accompanied by these errors:

Mon Jul 7 10:24:59.559952 PDT 3: (bucket-2) Total memory in memoryDeallocated() >= GIGANTOR !!! Disable the memory tracker...
Mon Jul 7 10:54:58.109779 PDT 3: (bucket-4) Total memory in memoryDeallocated() >= GIGANTOR !!! Disable the memory tracker...
Mon Jul 7 10:54:58.109779 PDT 3: (bucket-4) Total memory in memoryDeallocated() >= GIGANTOR !!! Disable the memory tracker...


 Comments   
Comment by Matt Ingenthron [ 07/Jul/14 ]
Any time you see GIGANTOR, that indicates a stats underflow. That was added back in the 1.7 days to try to catch these kinds of underflow allocation problems early.
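To illustrate the underflow guard Matt describes, a minimal, purely hypothetical C sketch (not the ep-engine implementation; the names and the GIGANTOR value are made up for illustration): when an unsigned memory counter is decremented by more than was ever recorded as allocated, it wraps to an enormous value, and the tracker disables itself rather than report garbage.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sentinel: any tracked total above this can only mean the
 * unsigned counter wrapped below zero, i.e. a stats underflow. */
#define GIGANTOR ((size_t)-1 / 2)

static size_t mem_used = 0;        /* hypothetical per-bucket counter */
static bool tracker_enabled = true;

static void memory_allocated(size_t sz) { mem_used += sz; }

static void memory_deallocated(size_t sz) {
    mem_used -= sz;                 /* unsigned arithmetic: underflow wraps around */
    if (mem_used >= GIGANTOR) {     /* wrapped value is absurdly large */
        tracker_enabled = false;    /* give up rather than report garbage */
        fprintf(stderr, "Total memory in memoryDeallocated() >= GIGANTOR !!! "
                        "Disable the memory tracker...\n");
    }
}

int main(void) {
    memory_allocated(100);
    memory_deallocated(200);        /* frees more than was ever tracked */
    return tracker_enabled ? 0 : 1;
}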
Comment by Pavel Paulau [ 07/Jul/14 ]
Just a comparison with 2.5.1.




[MB-10999] which CLI commands under the bin directory need to be tested Created: 29/Apr/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Thuan Nguyen Assignee: Thuan Nguyen
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
cbbackup
cbenable_core_dumps.sh
cbrestore
cbworkloadgen
couch_dbdump
couch_view_group_cleanup
curl
epmd
gencfu
icu-config
moxi
cbbrowse_logs
cbepctl
cbsasladm
couchbase-cli
couch_dbinfo
couch_view_group_compactor
curl-config
erl
gencnval
typer
cbcollect_info
cbhealthchecker
cbstats
couchbase-server
couchjs
couch_view_index_builder
derb
erlc
genctd
makeconv
sigar_port
uconv
cbcompact
cbrecovery
cbtransfer
couch_compact
couch_view_file_merger
couch_view_index_updater
dialyzer
escript
generate_cert
mctimings
sqlite3
vbmap
cbdump-config
cbreset_password
cbvbucketctl
couchdb
couch_view_file_sorter
ct_run
dump-guts
genbrk
genrb
memcached
to_erl

 Comments   
Comment by Wayne Siu [ 30/Apr/14 ]
Please also check the documentation.
Comment by Anil Kumar [ 23/Jun/14 ]
All the CLI commands are mentioned in our documentation here: http://docs.couchbase.com/couchbase-manual-2.5/cb-cli/#command-line-interface-overview.





[MB-11606] error displayed for a node not marked for collection Created: 01/Jul/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Thuan Nguyen Assignee: Pavel Blagodov
Resolution: Unresolved Votes: 0
Labels: releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu 12.04 64-bit

Attachments: Zip Archive 192.168.171.148-712014-1459-diag.zip     Zip Archive 192.168.171.149-712014-151-diag.zip     Zip Archive 192.168.171.150-712014-153-diag.zip     Zip Archive 192.168.171.151-712014-154-diag.zip     PNG File ss_2014-07-01_at_2.36.37 PM.png     PNG File ss_2014-07-01_at_2.36.55 PM.png    
Triage: Untriaged
Operating System: Ubuntu 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: Link to manifest file of this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_3.0.0-914-rel.deb.manifest.xml
Is this a Regression?: No

 Description   
Install Couchbase Server 3.0.0-914 on 4 Ubuntu 12.04 64-bit nodes.
Create a 4-node cluster.
Kill erlang on node 150.
When starting cluster-wide collectinfo, node 150 is not checked because the node is down.
When the cluster-wide collectinfo is done, an error shows that it could not collect logs from node 150.

 Comments   
Comment by Aleksey Kondratenko [ 01/Jul/14 ]
What exactly is the bug in this case?
Comment by Thuan Nguyen [ 01/Jul/14 ]
The bug is that if node 150 is not checked for log collection, no error should be displayed for node 150 when collection is done.
Comment by Aleksey Kondratenko [ 01/Jul/14 ]
Ok. Thanks for finding it.
Comment by Aleksey Kondratenko [ 01/Jul/14 ]
Actually, it's kinda expected. You've selected the "all nodes" option and all checkboxes are disabled, so it actually tried to collect from all nodes per your choice.
Comment by Thuan Nguyen [ 01/Jul/14 ]
Yes, but in this case, when I select to collect the whole cluster, all nodes should be checked.
Comment by Aleksey Kondratenko [ 07/Jul/14 ]
Let's make sure that the checkboxes reflect the "all nodes" option.




[MB-11383] warmup_min_items_threshold setting is not honored correctly in 3.0 warmup. Created: 10/Jun/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Venu Uppalapati Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Yes

 Description   
Steps to reproduce:

1) On a 3.0 node, create a default bucket and load 10,000 items using cbworkloadgen.
2) Run the following at the command line:
curl -XPOST -u Administrator:password -d 'ns_bucket:update_bucket_props("default", [{extra_config_string, "warmup_min_items_threshold=1"}]).' http://127.0.0.1:8091/diag/eval
3) Restart the node for the setting to take effect. Restart again so warmup runs with the setting.
4) Issue ./cbstats localhost:11210 raw warmup:
ep_warmup_estimated_key_count: 10000
ep_warmup_value_count: 1115
5) If I repeat the above steps on a 2.5.1 node I get:
ep_warmup_estimated_key_count: 10000
ep_warmup_value_count: 101

(With warmup_min_items_threshold=1, i.e. 1% of the 10,000 items, the expected value count is roughly 100; the 2.5.1 figure matches that, while 3.0.0 loads about 11%.)

 Comments   
Comment by Abhinav Dangeti [ 08/Jul/14 ]
Likely because of parallelization. Could you tell me the time taken for warmup for the same scenario in 2.5.1 and 3.0.0?




[MB-11237] DCP(UPR) stats should report #errors under the web console Created: 28/May/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Cihan Biyikoglu Assignee: Sriram Ganesan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We should consider adding an errors/sec counter to the stats category in the web console. The counter would report the number of errors seen through the DCP protocol, to indicate issues with communication among nodes.
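A minimal C sketch of the intended semantics (hypothetical helper and values; this ticket does not name an existing stat): the console would sample a cumulative DCP error counter periodically and display the per-second rate between samples.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: errors/sec derived from two samples of a cumulative
 * DCP error counter taken interval_secs apart. */
static double dcp_errors_per_sec(uint64_t prev_total, uint64_t curr_total,
                                 double interval_secs) {
    if (interval_secs <= 0.0) {
        return 0.0;
    }
    return (double)(curr_total - prev_total) / interval_secs;
}

int main(void) {
    /* e.g. the counter went from 120 to 180 errors over a 5-second window */
    printf("%.1f errors/sec\n", dcp_errors_per_sec(120, 180, 5.0));
    return 0;
}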

 Comments   
Comment by Anil Kumar [ 10/Jun/14 ]
Triage - June 10 2014 Anil




[MB-11643] Incoming workload suffers when XDCR enabled Created: 03/Jul/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, DCP
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Pavel Paulau
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Running the "standard sales" demo that puts a 50/50 workload of about 80k ops/sec across 4 nodes of m1.xlarge, 1 bucket 1 replica.

Turning on XDCR from this one bucket to a new 4 node cluster drops the ops/sec down to 60k.

Assigning to Pavel as it falls into the performance area and we would likely be best served if this behavior was reproduced and tracked.

 Comments   
Comment by Pavel Paulau [ 03/Jul/14 ]
Just a couple of questions:
-- what is the doc size?
-- was that east-west replication?
-- uni-dir or bi-dir?
Comment by Perry Krug [ 03/Jul/14 ]
-1kb
-same AWS region
-uni-dir




[MB-11326] [memcached] Function call argument is an uninitialised value in upr_stream_req_executor Created: 05/Jun/14  Updated: 08/Jul/14

Status: In Progress
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Dave Rigby Assignee: Trond Norbye
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File report-3a6911.html    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Bug reported by the clang static analyzer.

Description: Function call argument is an uninitialized value
File: /Users/dave/repos/couchbase/server/source/memcached/daemon/memcached.c upr_stream_req_executor()
Line: 4242

See attached report.

From speaking to Trond offline, he believes that it shouldn't be possible to enter upr_stream_req_executor() with c->aiostat == ENGINE_ROLLBACK (which is what triggers this error), in which case we should just add a suitable assert() to squash the warning.
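A self-contained C sketch of that suggestion (hypothetical types and function name; not the actual memcached patch, see the review link in the comments below): asserting the precondition makes the path in which the rollback status would be consumed uninitialized provably unreachable.

#include <assert.h>

typedef enum { ENGINE_SUCCESS, ENGINE_ROLLBACK } engine_error;
struct conn { engine_error aiostat; };  /* hypothetical stand-in for memcached's connection struct */

/* Hypothetical sketch of the suggested guard: stating the precondition makes
 * the analyzer's uninitialized-value path provably unreachable. */
static void upr_stream_req_executor_sketch(struct conn *c) {
    assert(c->aiostat != ENGINE_ROLLBACK);
    /* ... normal stream-request handling would follow here ... */
}

int main(void) {
    struct conn c = { ENGINE_SUCCESS };
    upr_stream_req_executor_sketch(&c);
    return 0;
}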

 Comments   
Comment by Dave Rigby [ 20/Jun/14 ]
http://review.couchbase.org/#/c/38560/
Comment by Wayne Siu [ 08/Jul/14 ]
Hi Trond,
The patchset is ready for review.




[MB-11434] 600-800% CPU utilization by memcached on source nodes in XDCR scenarios (was <100% in 2.5.x) Created: 16/Jun/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2680 v2 (40 vCPU)
Memory = 256 GB
Disk = RAID 10 SSD

Attachments: PNG File memcached_cpu.png    
Issue Links:
Relates to
relates to MB-11405 ~2400% CPU consumption by memcached d... Open
relates to MB-11435 1500-2000% CPU utilization by beam.sm... Open
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/xdcr-5x5/290/artifact/
Is this a Regression?: Yes

 Description   
5 -> 5 UniDir, 2 buckets x 500M x 1KB, 10K SETs/sec/node, LAN

 Comments   
Comment by Pavel Paulau [ 26/Jun/14 ]
I have tried the same workload but without XDCR: CPU utilization is 300-400% (70-80% in 2.5.1).
Essentially there is a huge overhead that is not related to XDCR.

Also in MB-11435 we tried to slow down replication. It does help with Erlang CPU utilization but memcached consumption never drops below 500%.




[MB-11587] [system test] Rebalance exited with reason {unexpected_exit, {upr_wait_for_data_move_failed, ...{error,no_stats_for_this_vbucket}}}} Created: 28/Jun/14  Updated: 08/Jul/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Andrei Baranouski Assignee: Andrei Baranouski
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-893

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
rebalance 3+1 nodes on the destination cluster

I have seen the same stack in MB-10936 (couchbase-bucket team), but in my case I don't see "(upr_add_stream) vbucket = 0 opaque = 0x20 status = 0x23 (rollback)" in the logs, so I don't know whose issue this is.



Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.25286.9>,
{upr_wait_for_data_move_failed,"AbRegNums",
510,'ns_1@172.23.105.160',
['ns_1@172.23.105.159','ns_1@172.23.105.207'],
{error,no_stats_for_this_vbucket}}}}




[rebalance:debug,2014-06-28T13:18:40.592,ns_1@172.23.105.159:<0.25286.9>:ns_single_vbucket_mover:wait_upr_data_move:313]Will wait for backfill on all opened streams for bucket = "AbRegNums" partition 510 src node = 'ns_1@172.23.105.160' dest nodes = ['ns_1@172.23.105.159',
                                                                                                                                   'ns_1@172.23.105.207']
[ns_server:debug,2014-06-28T13:18:40.593,ns_1@172.23.105.159:<0.25281.9>:upr_commands:open_connection:53]Open consumer connection "replication:ns_1@172.23.105.160->ns_1@172.23.105.159:AbRegNums" on socket #Port<0.49741>
[ns_server:debug,2014-06-28T13:18:40.594,ns_1@172.23.105.159:<0.25282.9>:upr_commands:open_connection:53]Open producer connection "replication:ns_1@172.23.105.160->ns_1@172.23.105.159:AbRegNums" on socket #Port<0.49742>
[rebalance:info,2014-06-28T13:18:40.596,ns_1@172.23.105.159:<0.25215.9>:janitor_agent:get_vbucket_high_seqno:449]AbRegNums: Doing get_vbucket_high_seqno call for vbucket 1022 on ns_1@172.23.105.206
[ns_server:debug,2014-06-28T13:18:40.595,ns_1@172.23.105.159:upr_replicator-AbRegNums-ns_1@172.23.105.160<0.7020.3>:upr_replicator:terminate:100]Terminating with reason {{badmatch,
                          [{<0.25277.9>,
                            {done,exit,
                             {normal,
                              {gen_server,call,
                               [<0.7021.3>,
                                {maybe_close_stream,511},
                                infinity]}},
                             [{gen_server,call,3,
                               [{file,"gen_server.erl"},{line,188}]},
                              {upr_replicator,'-handle_call/3-fun-1-',2,
                               [{file,"src/upr_replicator.erl"},{line,119}]},
                              {upr_replicator,'-spawn_and_wait/1-fun-0-',1,
                               [{file,"src/upr_replicator.erl"},
                                {line,189}]}]}}]},
                         [{misc,sync_shutdown_many_i_am_trapping_exits,1,
                           [{file,"src/misc.erl"},{line,1475}]},
                          {upr_replicator,spawn_and_wait,1,
                           [{file,"src/upr_replicator.erl"},{line,211}]},
                          {upr_replicator,handle_call,3,
                           [{file,"src/upr_replicator.erl"},{line,118}]},
                          {gen_server,handle_msg,5,
                           [{file,"gen_server.erl"},{line,585}]},
                          {proc_lib,init_p_do_apply,3,
                           [{file,"proc_lib.erl"},{line,239}]}]}. Nuked connection "replication:ns_1@172.23.105.160->ns_1@172.23.105.159:AbRegNums" with result [ok,
                                                                                                                                                                 ok].
[error_logger:error,2014-06-28T13:18:40.596,ns_1@172.23.105.159:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]** Generic server 'upr_replicator-AbRegNums-ns_1@172.23.105.160' terminating
** Last message in was {takeover,511}
** When Server state == {state,<0.7022.3>,<0.7021.3>,
                               "replication:ns_1@172.23.105.160->ns_1@172.23.105.159:AbRegNums",
                               'ns_1@172.23.105.160',"AbRegNums"}
** Reason for termination ==
** {{badmatch,
        [{<0.25277.9>,
          {done,exit,
              {normal,
                  {gen_server,call,
                      [<0.7021.3>,{maybe_close_stream,511},infinity]}},
              [{gen_server,call,3,[{file,"gen_server.erl"},{line,188}]},
               {upr_replicator,'-handle_call/3-fun-1-',2,
                   [{file,"src/upr_replicator.erl"},{line,119}]},
               {upr_replicator,'-spawn_and_wait/1-fun-0-',1,
                   [{file,"src/upr_replicator.erl"},{line,189}]}]}}]},
    [{misc,sync_shutdown_many_i_am_trapping_exits,1,
         [{file,"src/misc.erl"},{line,1475}]},
     {upr_replicator,spawn_and_wait,1,
         [{file,"src/upr_replicator.erl"},{line,211}]},
     {upr_replica