[MB-10869] broken look of add server dialog Created: 16/Apr/14  Updated: 21/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Aleksey Kondratenko Assignee: Pavel Blagodov
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
http://i.imgur.com/MnyevAk.png (Chrome on GNU/Linux with Windows fonts)

and

http://i.imgur.com/t6BUcN1.png (Firefox)



 Comments   
Comment by Pavel Blagodov [ 21/Apr/14 ]
http://review.couchbase.org/36097




[MB-10224] Document editor - Cannot click to insert cursor after scrolling Created: 17/Feb/14  Updated: 21/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Brian Shumate Assignee: Pavel Blagodov
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Couchbase Server 2.1.0
Couchbase Server 2.2.0

Attachments: PNG File Couchbase_Console__2_2_0_-12.png     PNG File Couchbase_Console__2_2_0_-6.png    
Triage: Untriaged

 Description   
A customer reported that, when attempting to edit a document in the web console UI, the text cursor cannot be placed by clicking if the document is large enough to require scrolling in the web browser and scrolling has taken place before the click.

Expected Behavior
---------------------------

A click anywhere in the edit text area (the .CodeMirror-lines div) should move the cursor to that position, and this does occur provided no scrolling has taken place in the browser first.

Actual Behavior
-----------------------

Clicking in the text area will move the cursor to the desired position, but after scrolling down in the browser, a click in the text area does nothing, and the cursor must be positioned using the arrow keys instead.

Steps to Reproduce
----------------------------

0. Create a large document (roughly 1.5 KB or more).
1. Access the document.
2. Scroll down in the web browser to the bottom of the document.
3. Click anywhere in the edit text area.
4. Observe that the cursor is not placed at the clicked position.

It is unclear whether this is a CodeMirror bug, but initial research shows that the CodeMirror project has numerous open cursor-positioning bugs at any given time, so the issue may lie there.

Attached are some screenshots to illustrate the behavior.


 Comments   
Comment by Brian Shumate [ 17/Feb/14 ]
I should also add: the customer reported this behavior on 2.1.0, but I was able to reproduce the issue on both 2.1.0 and 2.2.0.
Comment by Pavel Blagodov [ 21/Apr/14 ]
Could you provide more information about the OS and browser used?




[MB-10478] cluster broken after some steps with graceful failover: Got unhandled error: Uncaught TypeError: Cannot read property 'active' of undefined Created: 17/Mar/14  Updated: 21/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Andrei Baranouski Assignee: Pavel Blagodov
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-443

Attachments: PNG File MB-10478.png    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
I 'played' with the graceful failover feature, and after some steps (I will try to provide the exact steps later)
nothing happens when I click "Create New Data Bucket".

Client-side error-report for user undefined on node 'ns_1@10.3.4.144':
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36

(repeated 5 times) menelaus_web102 ns_1@10.3.4.144 06:00:47 - Mon Mar 17, 2014
Client-side error-report for user undefined on node 'ns_1@10.3.4.144':
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36
Got unhandled error: Uncaught TypeError: Cannot read property 'active' of undefined
At: http://10.3.4.144:8091/js/cells.js:64
Backtrace:
Function: collectBacktraceViaCaller
Args:

---------
Function: appOnError
Args:
"Uncaught TypeError: Cannot read property 'active' of undefined"
"http://10.3.4.144:8091/js/cells.js"
64
36
{}
---------

(repeated 1 times) menelaus_web102 ns_1@10.3.4.144 05:59:47 - Mon Mar 17, 2014

 Comments   
Comment by Andrei Baranouski [ 17/Mar/14 ]
The UI is locked up and doesn't display any statistics.


https://s3.amazonaws.com/bugdb/jira/MB-10478/0c9731ce/10.3.4.144-3172014-634-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-10478/0c9731ce/10.3.4.145-3172014-638-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-10478/0c9731ce/10.3.4.146-3172014-636-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-10478/0c9731ce/10.3.4.147-3172014-640-diag.zip
Comment by Aleksey Kondratenko [ 17/Mar/14 ]
Something is not right on the client side only, and therefore it's unclear how this could happen. I'll need you to reproduce this again and post UI logs as well as UI network logs from Developer Tools.

Or even better if you could reliably reproduce this issue.
Comment by Aleksey Kondratenko [ 17/Mar/14 ]
CC-ed Pavel too.

Even better would be if we could finally get full stack traces in those client-side exception reports. Back in 2009, when this code was written, there was no easy cross-browser way to do that. But who knows, maybe the modern HTML5 crowd has finally built such a mechanism.
Comment by Maria McDuff [ 25/Mar/14 ]
Andrei,

any update on reproducing this issue?
Comment by Aleksey Kondratenko [ 14/Apr/14 ]
Let's discuss improved client-side error capturing tomorrow.

Today's problem is that we do have an indication that something happened, but we have no backtrace at all. There must be a better way of capturing errors, perhaps some library.
Comment by Pavel Blagodov [ 21/Apr/14 ]
I can't reproduce




[MB-10914] {UPR} ::Control connection to memcached on 'ns_1@IP' disconnected with some other crashes before upr_replicator:init/1, upr_proxy:init/1, replication_manager:init/1 Created: 21/Apr/14  Updated: 21/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Meenakshi Goel Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-594-rel

Attachments: Text File log.txt    
Triage: Triaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Yes

 Description   
Jenkins Link:
http://qa.sc.couchbase.com/job/ubuntu_x64--65_01--view_query_negative-P1/17/console

Notes:
The failed test does not deterministically reproduce the error.
No core dumps were observed on the machines.
Please refer to the attached log file log.txt.

Logs:
[user:info,2014-04-21T1:42:50.417,ns_1@172.23.106.196:ns_memcached-default<0.22998.7>:ns_memcached:terminate:821]Control connection to memcached on 'ns_1@172.23.106.196' disconnected: {badmatch,
                                                                        {error,
                                                                         couldnt_connect_to_memcached}}
[error_logger:error,2014-04-21T1:42:50.419,ns_1@172.23.106.196:error_logger<0.6.0>:ale_error_logger_handler:log_msg:119]** Generic server <0.22998.7> terminating
** Last message in was {'EXIT',<0.23030.7>,
                           {badmatch,{error,couldnt_connect_to_memcached}}}
** When Server state == {state,1,0,0,
                               {[],[]},
                               {[],[]},
                               {[],[]},
                               connected,
                               {1398,69765,399403},
                               "default",#Port<0.424036>,
                               {interval,#Ref<0.0.28.134215>},
                               [{<0.23031.7>,#Ref<0.0.28.136461>},
                                {<0.23029.7>,#Ref<0.0.28.134508>},
                                {<0.23032.7>,#Ref<0.0.28.134245>}],
                               []}
** Reason for termination ==
** {badmatch,{error,couldnt_connect_to_memcached}}

[error_logger:error,2014-04-21T1:42:50.420,ns_1@172.23.106.196:error_logger<0.6.0>:ale_error_logger_handler:log_report:115]
=========================CRASH REPORT=========================
  crasher:
    initial call: ns_memcached:init/1
    pid: <0.22998.7>
    registered_name: []
    exception exit: {badmatch,{error,couldnt_connect_to_memcached}}
      in function gen_server:init_it/6
    ancestors: ['single_bucket_sup-default',<0.22992.7>]
    messages: []
    links: [<0.23029.7>,<0.23031.7>,<0.23032.7>,<0.307.0>,<0.22993.7>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 196418
    stack_size: 24
    reductions: 22902
  neighbours:
    neighbour: [{pid,<0.23032.7>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,['Argument__1','Argument__2']}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,['ns_memcached-default',
                              'single_bucket_sup-default',<0.22992.7>]},
                  {messages,[]},
                  {links,[<0.22998.7>,#Port<0.424043>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,46368},
                  {stack_size,24},
                  {reductions,4663}]
    neighbour: [{pid,<0.23031.7>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,['Argument__1','Argument__2']}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,['ns_memcached-default',
                              'single_bucket_sup-default',<0.22992.7>]},
                  {messages,[]},
                  {links,[<0.22998.7>,#Port<0.424044>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,10946},
                  {stack_size,24},
                  {reductions,44938}]
    neighbour: [{pid,<0.23029.7>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,['Argument__1','Argument__2']}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,['ns_memcached-default',
                              'single_bucket_sup-default',<0.22992.7>]},
                  {messages,[]},
                  {links,[<0.22998.7>,#Port<0.424045>]},
  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,10946},
                  {stack_size,24},
                  {reductions,11546}]


Uploading logs.

 Comments   
Comment by Meenakshi Goel [ 21/Apr/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-10914/fd10746b/172.23.106.196-4212014-20-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-10914/e390d261/172.23.106.197-4212014-22-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-10914/29d793bb/172.23.106.198-4212014-24-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-10914/d5568c08/172.23.106.199-4212014-26-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-10914/22245367/172.23.106.200-4212014-29-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-10914/5fabfeeb/172.23.106.201-4212014-211-diag.zip




[MB-10417] we might be able to remove the binary document from UI Created: 11/Mar/14  Updated: 21/Apr/14  Resolved: 21/Apr/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Andrei Baranouski Assignee: Pavel Blagodov
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File binary_doc_search_result.png    

 Description   
Currently we are not able to edit (save/save as) or delete binary documents from the UI,
but we have no reason not to allow removing even a binary document.

 Comments   
Comment by Aleksey Kondratenko [ 11/Mar/14 ]
Please elaborate on what exactly the bug is here.
Comment by Andrei Baranouski [ 11/Mar/14 ]
I think the "delete" button should be enabled here, so that the user is able to remove a binary document.
Comment by Cihan Biyikoglu [ 20/Mar/14 ]
Converting to a bug.
Comment by Pavel Blagodov [ 18/Apr/14 ]
http://review.couchbase.org/36017




[MB-10792] checkpoint commit failure at start of replication Created: 08/Apr/14  Updated: 21/Apr/14  Resolved: 21/Apr/14

Status: Resolved
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sangharsh Agarwal Assignee: Sangharsh Agarwal
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 547

Triage: Untriaged
Operating System: Centos 32-bit
Is this a Regression?: Unknown

 Description   
http://qa.hq.northscale.net/job/ubuntu_x64--37_02--biXDCR-P1/15/consoleFull -> Test case 9.


./testrunner -i /tmp/ubuntu-64-2.0-biXDCR-all.ini get-cbcollect-info=True -t xdcr.biXDCR.bidirectional.load_with_failover,replicas=1,items=10000,ctopology=chain,rdirection=bidirection,standard_buckets=1,expires=60,doc-ops=create-update-delete,doc-ops-dest=create-update,failover=destination,replication_type=xmem,GROUP=P0;xmem

The test failed with an item count mismatch on the server:
[2014-04-07 05:51:08,257] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:13,291] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:13,306] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:18,356] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:18,391] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:23,439] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:23,498] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:28,556] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:28,642] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:33,686] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
[2014-04-07 05:51:33,730] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket

[Test Steps]
1. Create SRC and DEST clusters with 3 nodes each.
2. Set up bidirectional xmem TAP-based XDCR on default and standard_bucket_1. The checkpoint interval is set to 120 seconds.
3. Load 10000 items into both buckets on both the SRC and DEST clusters.
4. Fail over/rebalance out one node on the destination side.
5. Update and delete 30% of items on the SRC side. During the update, set the expiration time to 60 seconds.
6. Update 30% of items on the destination side. During the update, set the expiration time to 60 seconds.
7. Expect 11000 items on each side.
 
The test failed at step 7: there are 10997 items on the source cluster (10.3.121.56) in the default bucket.


I can see a lot of XDCR errors on cluster 10.3.121.59:

[xdcr:error,2014-04-07T5:49:53.853,ns_1@10.3.121.59:<0.27070.18>:xdc_vbucket_rep:start_replication:1000]checkpoint commit failure at start of replication for vb 813
[xdcr:error,2014-04-07T5:49:53.854,ns_1@10.3.121.59:<0.27070.18>:xdc_vbucket_rep:terminate:534]Replication (XMem mode) `3f3e8f7fe887b7288e0e31ee0098cc72/default/default` (`default/813` -> `http://*****@10.3.4.244:8092/default%2f813%3bf806f153aba876ebc86ca21ceaceb8ce`) failed.Please see ns_server debug log for complete state dump
[xdcr:error,2014-04-07T5:49:54.655,ns_1@10.3.121.59:<0.26952.18>:xdc_vbucket_rep_ckpt:do_checkpoint_old:220]Checkpointing failed unexpectedly (or could be network problem): {local_vbuuid_mismatch,
                                                                  <<"189581637071222">>,
                                                                  <<"83846604416697">>}
[xdcr:error,2014-04-07T5:49:54.661,ns_1@10.3.121.59:<0.26952.18>:xdc_vbucket_rep:start_replication:1000]checkpoint commit failure at start of replication for vb 833
[xdcr:error,2014-04-07T5:49:54.661,ns_1@10.3.121.59:<0.26952.18>:xdc_vbucket_rep:terminate:534]Replication (XMem mode) `3f3e8f7fe887b7288e0e31ee0098cc72/default/default` (`default/833` -> `http://*****@10.3.4.244:8092/default%2f833%3bf806f153aba876ebc86ca21ceaceb8ce`) failed.Please see ns_server debug log for complete state dump
[xdcr:error,2014-04-07T5:49:55.040,ns_1@10.3.121.59:<0.26991.18>:xdc_vbucket_rep_ckpt:do_checkpoint_old:220]Checkpointing failed unexpectedly (or could be network problem): {local_vbuuid_mismatch,
                                                                  <<"31693391851147">>,
                                                                  <<"272915716248749">>}


 Comments   
Comment by Sangharsh Agarwal [ 08/Apr/14 ]
[Source]
10.3.121.56 : https://s3.amazonaws.com/bugdb/jira/MB-10792/029fee43/10.3.121.56-472014-68-diag.zip
10.3.4.244 : https://s3.amazonaws.com/bugdb/jira/MB-10792/6774f77c/10.3.4.244-472014-610-diag.zip
10.3.121.57 : https://s3.amazonaws.com/bugdb/jira/MB-10792/5e45c4f4/10.3.121.57-472014-69-diag.zip

[Destination]
10.3.121.59 : https://s3.amazonaws.com/bugdb/jira/MB-10792/a09774a8/10.3.121.59-472014-611-diag.zip
10.3.121.60 : https://s3.amazonaws.com/bugdb/jira/MB-10792/9296745f/10.3.121.60-472014-613-diag.zip
10.3.121.61 : https://s3.amazonaws.com/bugdb/jira/MB-10792/5794ff21/10.3.121.61-472014-613-diag.zip -> This node was failed over.
Comment by Aleksey Kondratenko [ 11/Apr/14 ]
Does the exact same test pass on 2.5.1?
Comment by Aleksey Kondratenko [ 11/Apr/14 ]
How about 2.2? 2.1.1?
Comment by Aleksey Kondratenko [ 11/Apr/14 ]
I cannot make any sense of this test.

But most importantly, I am not seeing the artifacts required for debugging XDCR issues.

XDCR checkpoint errors are unlikely to be related to data loss.
Comment by Sangharsh Agarwal [ 13/Apr/14 ]
Alk,
    Please confirm the artifacts you need here:

1. Data files on each server? e.g. /opt/couchbase/var/lib/couchbase/data?
2. __all_docs__ of each Source and Destination cluster?
3. cbcollectinfo logs of each node on cluster?

Anything else you need here?

Can you please suggest improvements to the test steps? What is wrong, and how can we improve it? It is one of the oldest XDCR tests so far.
Comment by Aleksey Kondratenko [ 14/Apr/14 ]
Work with Aruna on what info to provide.
Comment by Aruna Piravi [ 14/Apr/14 ]
Sangharsh, the test looks OK to me. We are loading 10K items on both sides, so with bi-directional XDCR you would expect 20K on each side. We then do some updates, expirations, and deletes on the keys loaded to the clusters. The KV store expects 11K on both sides.

However, it is important to note where the mismatch is occurring: between the two clusters, or between the KV store and the clusters' item counts? If it is between the two clusters, we know it is a data loss issue. However, if the clusters' item counts match but not that of the KV store, then we want to look into the specifics of the test. So please always make sure you add item counts for both clusters.

And as discussed, those are the things we need to add to all data loss issues in XDCR. Please note that _all_docs is not functional anymore; for manual tests use views. For the test runner implementation, we need to do getMeta() on all keys from cluster1 and compare it against all keys in cluster2, i.e. force it to verify revids and log keys that are not present in the other cluster when item counts don't match between clusters (not just stop the tests there).
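A minimal sketch of that revid comparison, assuming a memcached-style client object that exposes getMeta(key) returning a (deleted, flags, expiration, seqno, cas) tuple; the client objects and the return layout are illustrative assumptions, not the exact testrunner API:

# Sketch only: compare per-key metadata between two clusters instead of
# stopping at the first item-count mismatch. Clients and the getMeta()
# return layout are assumed for illustration.
def compare_revids(keys, src_client, dest_client):
    # Returns (key, reason) pairs for keys whose metadata differs.
    mismatches = []
    for key in keys:
        try:
            s_del, _, _, s_seqno, s_cas = src_client.getMeta(key)
        except Exception as e:
            mismatches.append((key, "missing on source: %s" % e))
            continue
        try:
            d_del, _, _, d_seqno, d_cas = dest_client.getMeta(key)
        except Exception as e:
            mismatches.append((key, "missing on destination: %s" % e))
            continue
        if (s_del, s_seqno, s_cas) != (d_del, d_seqno, d_cas):
            mismatches.append((key, "src=%s dest=%s" %
                               ((s_del, s_seqno, s_cas), (d_del, d_seqno, d_cas))))
    return mismatches

# Log every mismatch and keep going rather than failing the test early:
# for key, reason in compare_revids(all_keys, src_client, dest_client):
#     print("revid mismatch for %s: %s" % (key, reason))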
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Waiting for all mandatory data
Comment by Sangharsh Agarwal [ 17/Apr/14 ]
Alk, there were two issues in this bug:

1. Item mismatch -> For the item mismatch I will log another issue with artifacts, as I am making the required changes in the test to collect useful information.
2. Checkpoint failures in the logs (as mentioned in the description) -> Can you please analyze this from the logs; if it is already fixed, you can close this issue.
Comment by Sangharsh Agarwal [ 17/Apr/14 ]
Updated the summary of the bug for checkpoint failures.
Comment by Aleksey Kondratenko [ 18/Apr/14 ]
Checkpointing errors are expected in a number of cases and are not a bug. In this case they appear to be caused by the movement of vbuckets that once belonged to node .59 but were then rebalanced to .61 and failed over out of there.

Comment by Sangharsh Agarwal [ 21/Apr/14 ]
Marking as Cannot Reproduce.




[MB-10913] Mutations (1 deleted item) not replicated, causing an item mismatch on the destination cluster Created: 21/Apr/14  Updated: 21/Apr/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sangharsh Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0 531, Centos 32 bit

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: [Source]
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-10913/6715c078/10.3.121.65-4182014-422-diag.zip
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-10913/b39f3625/10.3.121.65-4182014-420-couch.tar.gz
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-10913/4eb29318/10.3.3.207-4182014-420-couch.tar.gz
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-10913/f81636e3/10.3.3.207-4182014-424-diag.zip
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-10913/106b5aed/10.3.3.209-4182014-420-couch.tar.gz
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-10913/d8cb5a74/10.3.3.209-4182014-425-diag.zip
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-10913/7ffea465/10.3.3.210-4182014-420-couch.tar.gz
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-10913/b8104406/10.3.3.210-4182014-426-diag.zip


[Destination]
10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-10913/88b8d050/10.3.4.177-4182014-427-diag.zip
10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-10913/e6388a6b/10.3.4.177-4182014-420-couch.tar.gz
10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-10913/289ad162/10.3.121.62-4182014-429-diag.zip
10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-10913/aec8a352/10.3.121.62-4182014-420-couch.tar.gz
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-10913/316af914/10.3.2.204-4182014-429-diag.zip
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-10913/c6294b49/10.3.2.204-4182014-421-couch.tar.gz
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-10913/619c133a/10.3.3.208-4182014-420-couch.tar.gz
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-10913/807ad34c/10.3.3.208-4182014-428-diag.zip
Is this a Regression?: Unknown

 Description   
[Jenkins]
http://qa.hq.northscale.net/job/centos_x64--31_01--uniXDCR-P1/29/consoleFull

[Test]
./testrunner -i /tmp/ubuntu-64-2.0-uniXDCR.ini GROUP=CHAIN,num_items=50000,get-cbcollect-info=True -t xdcr.uniXDCR.unidirectional.load_with_async_ops,items=100000,rdirection=unidirection,ctopology=chain,expires=60,standard_buckets=1,sasl_buckets=2,default_bucket=False,doc-ops=delete,GROUP=CHAIN;P1


[Test Logs]
[2014-04-18 03:54:11,221] - [rest_client:790] INFO - adding remote cluster hostname:10.3.4.177:8091 with username:password Administrator:password name:cluster1
[2014-04-18 03:54:11,295] - [rest_client:836] INFO - starting replication type:continuous from sasl_bucket_1 to sasl_bucket_1 in the remote cluster cluster1
[2014-04-18 03:54:11,506] - [xdcrbasetests:355] INFO - sleep for 5 secs. ...
[2014-04-18 03:54:16,513] - [rest_client:836] INFO - starting replication type:continuous from sasl_bucket_2 to sasl_bucket_2 in the remote cluster cluster1
[2014-04-18 03:54:16,653] - [xdcrbasetests:355] INFO - sleep for 5 secs. ...
[2014-04-18 03:54:21,659] - [rest_client:836] INFO - starting replication type:continuous from standard_bucket_1 to standard_bucket_1 in the remote cluster cluster1
[2014-04-18 03:54:21,824] - [xdcrbasetests:355] INFO - sleep for 5 secs. ...
..
..
..
2014-04-18 04:11:16,490] - [task:420] WARNING - Not Ready: curr_items 70001 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', sasl_bucket_1 bucket
[2014-04-18 04:11:16,542] - [task:420] WARNING - Not Ready: vb_active_curr_items 70001 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', sasl_bucket_1 bucket
[2014-04-18 04:11:21,585] - [task:420] WARNING - Not Ready: curr_items 70001 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', sasl_bucket_1 bucket
[2014-04-18 04:11:21,622] - [task:420] WARNING - Not Ready: vb_active_curr_items 70001 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', sasl_bucket_1 bucket


[Test Steps] - No UPR.
1. Create SRC and DEST clusters with 4 nodes each.
2. Create 2 sasl buckets and 1 standard bucket.
3. Configure XMEM uni-directional XDCR for all buckets.
4. Load 100K items into each bucket.
5. Delete 30% of the items in each bucket.
6. Expected items on the destination = 70000, but there are 70001 items on the destination cluster (for sasl_bucket_1). The key loadOne97310 has a mismatch: it shows as deleted on the source but not on the destination, and it has a higher sequence number on the source, indicating that the mutations were not replicated properly.

2014-04-18 04:18:16,419] - [task:1169] ERROR - ===== Verifying rev_ids failed for key: loadOne97310 =====
[2014-04-18 04:18:16,419] - [task:1170] ERROR - deleted mismatch: Source deleted:1, Destination deleted:0, Error Count:1
[2014-04-18 04:18:16,420] - [task:1170] ERROR - seqno mismatch: Source seqno:2, Destination seqno:1, Error Count:2
[2014-04-18 04:18:16,421] - [task:1170] ERROR - cas mismatch: Source cas:17748657181694094, Destination cas:17748657181694093, Error Count:3
[2014-04-18 04:18:16,423] - [task:1171] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 17748657181694094, 'flags': 0, 'expiration': 1397818797}
[2014-04-18 04:18:16,424] - [task:1172] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 17748657181694093, 'flags': 0, 'expiration': 0}

Items were properly replicated for the other 2 buckets, i.e. standard_bucket_1 and sasl_bucket_2, but not for sasl_bucket_1.
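A quick way to spot-check a single suspect key such as loadOne97310 on both clusters, assuming the same kind of getMeta(key)-capable clients as above (an illustrative API, not the exact testrunner one):

# Sketch only: print the metadata fields the test compares (deleted flag,
# seqno, cas, expiration) for one key on source and destination.
def spot_check(key, src_client, dest_client):
    for name, client in (("source", src_client), ("destination", dest_client)):
        deleted, flags, expiration, seqno, cas = client.getMeta(key)
        print("%s meta for %s: deleted=%d seqno=%d cas=%d expiration=%d"
              % (name, key, deleted, seqno, cas, expiration))

# spot_check("loadOne97310", src_client, dest_client)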

 Comments   
Comment by Sangharsh Agarwal [ 21/Apr/14 ]
Bucket data files, cbcollect logs, and the mismatching keys are mentioned above.




[MB-10912] curr_items is not the expected number of items on the destination cluster Created: 21/Apr/14  Updated: 21/Apr/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sangharsh Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0 591.

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: [Source]
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-10912/3d973e33/10.3.121.65-4182014-327-diag.zip
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-10912/e8c46be1/10.3.121.65-4182014-325-couch.tar.gz
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-10912/1dc784cb/10.3.3.207-4182014-325-couch.tar.gz
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-10912/d6067609/10.3.3.207-4182014-329-diag.zip
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-10912/89d0a503/10.3.3.209-4182014-330-diag.zip
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-10912/eb4804a3/10.3.3.209-4182014-325-couch.tar.gz
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-10912/db73c085/10.3.3.210-4182014-325-couch.tar.gz
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-10912/e225c957/10.3.3.210-4182014-331-diag.zip


[Dest]
10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-10912/81aac8cf/10.3.4.177-4182014-325-couch.tar.gz
10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-10912/b376051a/10.3.4.177-4182014-332-diag.zip
10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-10912/bdd6f5e9/10.3.121.62-4182014-325-couch.tar.gz
10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-10912/d4e5d541/10.3.121.62-4182014-334-diag.zip
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-10912/2e8d9c32/10.3.2.204-4182014-334-diag.zip
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-10912/3b707290/10.3.2.204-4182014-325-couch.tar.gz
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-10912/412475b1/10.3.3.208-4182014-325-couch.tar.gz
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-10912/5a455ac3/10.3.3.208-4182014-333-diag.zip
Is this a Regression?: Unknown

 Description   
[Jenkins, test #2]
http://qa.hq.northscale.net/job/centos_x64--31_01--uniXDCR-P1/29/consoleFull

[Test]
./testrunner -i /tmp/ubuntu-64-2.0-uniXDCR.ini GROUP=CHAIN,num_items=50000,get-cbcollect-info=True -t xdcr.uniXDCR.unidirectional.load_with_async_ops,items=100000,rdirection=unidirection,ctopology=chain,doc-ops=delete-delete,GROUP=CHAIN;P1

[Test Logs Duration]
[2014-04-18 03:07:05,150] - [rest_client:790] INFO - adding remote cluster hostname:10.3.4.177:8091 with username:password Administrator:password name:cluster1
[2014-04-18 03:07:05,204] - [rest_client:836] INFO - starting replication type:continuous from default to default in the remote cluster cluster1
[2014-04-18 03:07:05,272] - [xdcrbasetests:355] INFO - sleep for 5 secs. ...

..
..
..
[2014-04-18 03:25:14,590] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', default bucket
[2014-04-18 03:25:19,637] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', default bucket


[Test Steps] - Non UPR
1. Create SRC and DEST clusters with 4 nodes each.
2. Create 1 default bucket.
3. Configure CAPI uni-directional XDCR for the bucket.
4. Load 100K items into the bucket.
5. Delete 30% of the items in the bucket on the source.
6. Wait for the curr_items stat to reach the expected count, i.e. 70000 -> Failed; curr_items shows 69999 items, 1 item missing.

[2014-04-18 03:21:16,826] - [task:1179] ERROR - Key:loadOne336 Memcached error #1 'Not found': for vbucket :315 to mc 10.3.2.204:11210, Error Count:1

vbucket 315 exists on the source node, i.e. 10.3.3.207; the logs show that the item was put in the outgoing batch:

[xdcr_trace:debug,2014-04-18T3:07:48.976,ns_1@10.3.3.207:<0.9736.20>:xdc_vbucket_rep_worker:local_process_batch:110]added mutation loadOne36636@31 (rev = 1-..) to outgoing batch
[xdcr_trace:deb
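
To double-check which vbucket (and therefore which node) a key such as loadOne336 maps to, here is a minimal sketch of the CRC32-based vbucket mapping commonly used by Couchbase smart clients; the exact mapping is an assumption and should be verified against the client library in use.

# Sketch only: map a key to a vbucket id (CRC32 of the key, top bits masked,
# modulo the vbucket count). Compare the result with the vbucket id (315)
# reported in the test log before trusting it.
import zlib

def vbucket_id(key, num_vbuckets=1024):
    crc = zlib.crc32(key.encode("utf-8")) & 0xffffffff
    return ((crc >> 16) & 0x7fff) % num_vbuckets

print(vbucket_id("loadOne336"))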



 Comments   
Comment by Sangharsh Agarwal [ 21/Apr/14 ]
One item was not replicated; there is a memcached error for one key, i.e. loadOne336, on 10.3.2.204:11210.

[2014-04-18 03:21:01,776] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', default bucket
[2014-04-18 03:21:06,860] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', default bucket
[2014-04-18 03:21:11,902] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', default bucket


[2014-04-18 03:21:16,826] - [task:1179] ERROR - Key:loadOne336 Memcached error #1 'Not found': for vbucket :315 to mc 10.3.2.204:11210, Error Count:1


[2014-04-18 03:21:16,955] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', default bucket
[2014-04-18 03:21:21,996] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', default bucket
[2014-04-18 03:21:27,044] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', default bucket
[2014-04-18 03:21:32,115] - [task:420] WARNING - Not Ready: vb_active_curr_items 69999 == 70000 expected on '10.3.4.177:8091''10.3.3.208:8091''10.3.121.62:8091''10.3.2.204:8091', defau

It seems that the key loadOne336 was not replicated.
Comment by Sangharsh Agarwal [ 21/Apr/14 ]
Data files are also uploaded along with log files.




[MB-8723] enable dtrace probes on memcached and beam.smp Created: 30/Jul/13  Updated: 21/Apr/14

Status: Reopened
Project: Couchbase Server
Component/s: build, couchbase-bucket
Affects Version/s: 2.1.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Matt Ingenthron Assignee: Trond Norbye
Resolution: Unresolved Votes: 0
Labels: #SmartOS
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
There were some questions about the use of dtrace on Mac OS, and to provide an example I went to run some old dtrace scripts. It appears we're compiling without dtrace support for both memcached and possibly for Erlang.

Since they don't cost anything when they're off, it'd be nice to have these enabled.

 Comments   
Comment by Mike Wiederhold [ 31/Jul/13 ]
Trond removed the dtrace stuff (at least in the master branches) since it wasn't used by us and didn't really give us very useful information. You can contact him for more details.
Comment by Matt Ingenthron [ 31/Jul/13 ]
This came up recently in a discussion on performance where folks at Couchbase wanted to use these probes to dig in a bit. Trond: was this removed as part of general cleanup, or at PM direction, or other?
Comment by Trond Norbye [ 31/Jul/13 ]
It was removed as part of simplifying our build process before moving to CMake. We didn't have many valuable probes injected earlier, and there wasn't much interest in adding new probes. You can still use the function provider if you like.
Comment by Maria McDuff [ 16/Aug/13 ]
Matt,

if you agree with Trond, can we close this as 'won't fix'?
Comment by Matt Ingenthron [ 16/Aug/13 ]
I don't agree with Trond, actually. There is renewed interest in adding probes from the performance folks (who may be listening). Since we have some now, shouldn't we enable them with the new build system?

It isn't high priority, but I don't think it should be closed as won't fix.
Comment by Dustin Sallings (Inactive) [ 16/Aug/13 ]
Do we think the probes weren't very useful themselves relative to the function provider?

We are actively trying to make use of dtrace probes right now (even if only via systemtap). If we can get it to build where it can, I do think that'd be good.
Comment by Trond Norbye [ 23/Aug/13 ]
I'm closing this as "won't fix" now to reduce the number of open "bugs". At this stage in the game it is more important to complete the transition to CMake than to support USDT in memcached (Erlang is built separately and is not part of our build system). You should be able to use all the function-provider probes, and with some extra work you should be able to get a lot of the same information in D anyway (the conn parameter is always the first parameter to the functions, etc.).

When the transition to cmake is complete we can reopen the bug and get full fledged support for USDT (but we need someone to define them as well ;-))
Comment by Matt Ingenthron [ 23/Aug/13 ]
It seems rather odd to me to close it as "won't fix" when we intend to fix it later. Why not just leave it open and potentially lower its priority? If it's blocked by a change to CMake, then mark it as such. You may have a reason I'm not aware of, so maybe you can explain.
Comment by Trond Norbye [ 23/Aug/13 ]
The reason for that is to have a manageable list of bugs to work with. Fighting JIRA's interface to see what I'm supposed to work on while a ton of low-priority RFEs keep popping up isn't helping me.
Comment by Pavel Paulau [ 20/Apr/14 ]
Time to re-open the ticket?

Should we consider the view engine as well? It is low-hanging fruit for performance improvements...

Also has anybody tried SystemTap or DTrace on Linux _recently_?
Comment by Trond Norbye [ 21/Apr/14 ]
I've never tried SystemTap (all I've read are blog posts indicating that it isn't ready for production use yet). If that has changed, we should probably look into whether we could easily support both.




[MB-10911] TAP UniXDCR (xmem mode): time taken by ep_queue_size to reach 0 is significant Created: 21/Apr/14  Updated: 21/Apr/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication, storage-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sangharsh Agarwal Assignee: Venu Uppalapati
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0 - 591

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: [Source]
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-10911/5ba07ffa/10.3.121.65-4182014-258-diag.zip
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-10911/c70b7506/10.3.3.210-4182014-31-diag.zip
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-10911/a83a7060/10.3.3.207-4182014-259-diag.zip
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-10911/5a07c1b9/10.3.3.209-4182014-30-diag.zip

[Destination]
10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-10911/96a92d36/10.3.4.177-4182014-32-diag.zip
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-10911/e4c76759/10.3.3.208-4182014-33-diag.zip
10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-10911/69625ebd/10.3.121.62-4182014-33-diag.zip
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-10911/595f521b/10.3.2.204-4182014-34-diag.zip
Is this a Regression?: Unknown

 Description   
[Jenkins]
http://qa.hq.northscale.net/job/centos_x64--31_01--uniXDCR-P1/29/consoleFull

[Test]
./testrunner -i /tmp/ubuntu-64-2.0-uniXDCR.ini GROUP=CHAIN,num_items=50000,get-cbcollect-info=True -t xdcr.uniXDCR.unidirectional.load_with_async_ops,items=100000,rdirection=unidirection,ctopology=chain,doc-ops=update-delete,sasl_buckets=1,replication_type=xmem,GROUP=CHAIN;P0;xmem


[Test Logs]
2014-04-18 02:55:39 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 1748 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:55:41 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 1907 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:55:44 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 1478 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:55:46 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 1569 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:55:49 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 975 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:55:51 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 898 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:55:54 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 202 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:55:56 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 91 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:55:59 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:01 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:04 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:06 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:09 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:11 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:14 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:16 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:19 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:21 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:24 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:26 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:29 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:32 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:35 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:37 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:40 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:42 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:45 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:47 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:50 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:52 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:56:55 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
2014-04-18 02:56:57 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 43 == 0 expected on '10.3.121.65:8091', default bucket
2014-04-18 02:57:00 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: ep_queue_size 4 == 0 expected on '10.3.121.65:8091', sasl_bucket_1 bucket
ERROR
[('testrunner', 331, '<module>', 'result = unittest.TextTestRunner(verbosity=2).run(suite)'), ('/usr/lib/python2.7/unittest/runner.py', 151, 'run', 'test(result)'), ('/usr/lib/python2.7/unittest/suite.py', 70, '__call__', 'return self.run(*args, **kwds)'), ('/usr/lib/python2.7/unittest/suite.py', 108, 'run', 'test(result)'), ('/usr/lib/python2.7/unittest/case.py', 391, '__call__', 'return self.run(*args, **kwds)'), ('/usr/lib/python2.7/unittest/case.py', 327, 'run', 'testMethod()'), ('pytests/xdcr/uniXDCR.py', 42, 'load_with_async_ops', 'self._wait_for_stats_all_buckets(self.src_nodes)'), ('pytests/xdcr/xdcrbasetests.py', 1241, '_wait_for_stats_all_buckets', 'is_verified = self._poll_for_condition(verify)'), ('pytests/xdcr/xdcrbasetests.py', 901, '_poll_for_condition', 'return self._poll_for_condition_rec(condition, interval, num_itr)'), ('pytests/xdcr/xdcrbasetests.py', 907, '_poll_for_condition_rec', 'if condition():'), ('pytests/xdcr/xdcrbasetests.py', 1234, 'verify', 'task.result(timeout)'), ('lib/tasks/future.py', 162, 'result', 'self.set_exception(TimeoutError())'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
Fri Apr 18 02:57:01 2014

[Test Steps]
1. Create SRC and DEST clusters with 4 nodes each.
2. Create 1 sasl and 1 default bucket.
3. Configure XMEM uni-directional XDCR for both buckets.
4. Load 100K items into each bucket.
5. Update 30% and delete 30% of the items in each bucket.
6. Wait for the ep_queue_size stat to reach 0 -> Timed out here (see the polling sketch below).
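
The wait in step 6 is essentially a poll-until-zero with a timeout. A minimal sketch, parameterized on a get_stat callable so that it does not assume any particular stats API (a cbstats wrapper, the REST bucket stats, or a memcached client could all serve as the backend):

# Sketch only: poll a stat such as ep_queue_size until it reaches 0 or the
# timeout expires. get_stat is any callable returning the current integer value.
import time

def wait_for_drain(get_stat, timeout=600, interval=5):
    deadline = time.time() + timeout
    while time.time() < deadline:
        value = get_stat()
        if value == 0:
            return True
        print("Not Ready: ep_queue_size %d == 0 expected" % value)
        time.sleep(interval)
    return False  # mirrors the TimeoutError seen in the test traceback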

 Comments   
Comment by Sangharsh Agarwal [ 21/Apr/14 ]
Installed using xdcr_upr=False.

python scripts/install.py -i /tmp/ubuntu-64-2.0-uniXDCR.ini -p version=3.0.0-591-rel,product=cb,parallel=True,get-logs=True,xdcr_upr=False
Comment by Sangharsh Agarwal [ 21/Apr/14 ]
Alk, let me know if other artifacts are needed here. I checked ns_server.debug.log on node 10.3.121.65:8091 but couldn't find any errors in the logs.

It seems the persistence layer is slow to drain the item queue to disk.
Comment by Sangharsh Agarwal [ 21/Apr/14 ]
Venu, can you please check this from the ep_engine point of view?




[MB-10908] beam.smp RSS grows to 50GB during delta recovery causing OOM killer invocation and rebalance failure Created: 19/Apr/14  Updated: 20/Apr/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Pavel Paulau Assignee: Aliaksey Artamonau
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Builds 3.0.0-585+

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630
Memory = 64 GB
Disk = 2 x SSD

Attachments: PNG File beam.smp_rss_594.png     PNG File beam.smp_rss.png    
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: Build 3.0.0-385
http://ci.sc.couchbase.com/job/apollo-64/1238/artifact/

Build 3.0.0-394
http://ci.sc.couchbase.com/job/apollo-64/1248/artifact/
Is this a Regression?: Yes

 Description   
Delta rebalance after failover, 3 -> 4 nodes, 1 bucket x 100M x 2KB, DGM, 1 ddoc with 1 view, 10K mixed ops/sec, 400 qps

Steps:
1. "Failover" one node.
2. Add it back.
3. Enable delta recovery mode.
4. Wait predefined time (20 minutes).
5. Trigger cluster rebalance and wait for the rebalance to finish (see the REST sketch below).
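
A minimal REST sketch of the failover/delta-recovery/rebalance sequence above, assuming the 3.0 endpoints /controller/failOver, /controller/setRecoveryType and /controller/rebalance with standard admin credentials; the host, node name, and credentials are placeholders and should be checked against the actual cluster:

# Sketch only: drive failover, delta recovery and rebalance over the REST API.
# Endpoints, parameters and credentials are assumptions for illustration.
import requests

BASE = "http://cluster-node:8091"        # any node in the cluster (placeholder)
AUTH = ("Administrator", "password")     # assumed admin credentials
NODE = "ns_1@failed-over-node"           # otpNode being failed over (placeholder)

# Step 1: fail over the node.
requests.post(BASE + "/controller/failOver", auth=AUTH, data={"otpNode": NODE})

# Steps 2-3: add it back with delta recovery.
requests.post(BASE + "/controller/setRecoveryType", auth=AUTH,
              data={"otpNode": NODE, "recoveryType": "delta"})

# Step 5: rebalance; knownNodes must list every otpNode currently in the cluster.
known = ",".join(n["otpNode"]
                 for n in requests.get(BASE + "/pools/default", auth=AUTH).json()["nodes"])
requests.post(BASE + "/controller/rebalance", auth=AUTH,
              data={"knownNodes": known, "ejectedNodes": ""})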




[MB-10910] Rebalance with views after failover fails due to "wait_checkpoint_persisted_failed" Created: 20/Apr/14  Updated: 20/Apr/14  Resolved: 20/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: ns_server, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Pavel Paulau
Resolution: Duplicate Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-594

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630
Memory = 64 GB
Disk = 2 x SSD

Issue Links:
Duplicate
duplicates MB-10514 During rebalance, UPR stream gets stu... Reopened
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/apollo-64/1247/artifact/
Is this a Regression?: Yes

 Description   
Rebalance after failover, 3 -> 4 nodes, 1 bucket x 100M x 2KB, DGM, 1 x 1 views, 10K ops/sec, 400 qps

Rebalance exited with reason {unexpected_exit,
                              {'EXIT',<0.16719.261>,
                               {wait_checkpoint_persisted_failed,"bucket-1",
                                892,2204,
                                [{'ns_1@172.23.96.18',
                                  {'EXIT',
                                   {{{{badmatch,
                                       {error,
                                        {{function_clause,
                                          [{couch_set_view_group,handle_info,
                                            [{'DOWN',#Ref<16800.0.690.88151>,
                                              process,<16800.29051.86>,normal},
                                             {state,
                                              {"/ssd2",<<"bucket-1">>,
                                               {set_view_group,
                                                <<185,152,194,156,196,175,136,
                                                  156,249,192,91,24,8,146,109,
                                                  97>>,
                                                nil,<<"bucket-1">>,
                                                <<"_design/A">>,[],
                                                [{set_view,0,
                                                  <<"\n function(doc, meta) {\n emit(doc.city, null);\n }\n ">>,
                                                  undefined,
                                                  {mapreduce_view,
                                                   [<<"id_by_city">>],
                                                   nil,[],[]}}],
                                                nil,nil,
                                                {set_view_index_header,2,0,0,
                                                 0,0,[],nil,[],false,[],nil,
                                                 [],[]},
                                                main,nil,nil,nil,[],
                                                mapreduce_view,".view",prod,
                                                couch_set_view_stats_prod,0,
                                                nil}},
                                              <16800.28934.86>,
                                              {set_view_group,
                                               <<185,152,194,156,196,175,136,
                                                 156,249,192,91,24,8,146,109,97>>,
                                               <16800.28930.86>,
                                               <<"bucket-1">>,<<"_design/A">>,
                                               [],
                                               [{set_view,0,
                                                 <<"\n function(doc, meta) {\n emit(doc.city, null);\n }\n ">>,
                                                 #Ref<16800.0.690.85787>,
                                                 {mapreduce_view,
                                                  [<<"id_by_city">>],
                                                  {btree,<16800.28930.86>,
                                                   {872023493,
                                                    <<0,0,94,204,32,255,254,
                                                      254,191,207,240,120,0,0,
                                                      0,3,255,128,0,0,0,0,0,0,
                                                      0,0,3,199,65,254,0,0,0,0,
                                                      0,0,0,0,0,0,0,0,0,0,0,0,
                                                      0,0,0,0,0,0,0,0,0,0,0,0,
                                                      0,0,0,0,0,0,0,0,0,0,0,0,
                                                      0,0,0,0,0,0,0,0,0,0,0,0,
                                                      0,0,0,0,0,0,0,0,0,0,0,0,
                                                      0,0,0,0,0,0,0,0,0,0,0,0,
                                                      0,0,0,0,0,0,0,0,0,0,0,0,
                                                      0,0,0,0,0,0,0,0,0,0,0,0,
                                                      0,0,0>>,
                                                    115226299},
                                                   identity,identity,
                                                   #Fun<mapreduce_view.14.59760005>,
                                                   #Fun<mapreduce_view.13.116710578>,
                                                   7168,6144,true},
                                                  [],[]}}],
                                               {btree,<16800.28930.86>,
                                                {757316813,
                                                 <<0,0,94,204,32,255,254,254,
                                                   191,207,240,120,0,0,0,3,255,
                                                   128,0,0,0,0,0,0,0,0,3,199,
                                                   65,254,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0>>,
                                                 98005067},
                                                identity,identity,
                                                #Fun<couch_btree.1.34546481>,
                                                #Fun<couch_set_view_group.15.54553809>,
                                                7168,6144,true},
                                               <16800.28933.86>,
                                               {set_view_index_header,2,1024,
                                                179769313486231580793728973728767334576404131515618582199733964182346700693311000189374238233220773621413668943491054022143327228131710028156464112654519069829469374388748562738720335524163543296024799521113246188546084449778835326689833933543997895438175469077364855610760285917179803342821150185391109701632,
                                                7484401160755199293711447201500438591369787779463561409980691526969024494823315590706649929017925717627609195968322657071845869263978468466858275373827242994228699112960690437335703692189245416448148245614649742243301249193480738942045004516930273328675008179153418008263268875886110850416640,
                                                0,
                                                [{820,0},
                                                 {821,0},
                                                 {822,0},
                                                 {823,0},
                                                 {824,0},
                                                 {825,98576},
                                                 {826,98613},
                                                 {827,98636},
                                                 {828,98649},
                                                 {829,98744},
                                                 {830,98776},
                                                 {831,98779},
                                                 {832,98834},
                                                 {833,0},
                                                 {834,0},
                                                 {835,0},
                                                 {836,0},
                                                 {837,0},
                                                 {838,98538},
                                                 {839,0},
                                                 {840,98596},
                                                 {841,98622},
                                                 {842,98599},
                                                 {843,0},
                                                 {844,0},
                                                 {845,0},
                                                 {846,98663},
                                                 {847,98725},
                                                 {848,98727},
                                                 {849,98793},
                                                 {850,0},
                                                 {851,0},
                                                 {852,0},
                                                 {891,0},
                                                 {892,0},
                                                 {893,0},
                                                 {894,0},
                                                 {895,0},
                                                 {896,0},
                                                 {897,0},
                                                 {898,0},
                                                 {899,0},
                                                 {900,0},
                                                 {901,0},
                                                 {902,0},
                                                 {903,0},
                                                 {904,0},
                                                 {905,0},
                                                 {906,0},
                                                 {907,0},
                                                 {908,0},
                                                 {909,0},
                                                 {910,0},
                                                 {911,0},
                                                 {912,0},
                                                 {913,0},
                                                 {914,0},
                                                 {915,0},
                                                 {916,0},
                                                 {917,0},
                                                 {918,0},
                                                 {919,0},
                                                 {920,0},
                                                 {921,0},
                                                 {922,0},
                                                 {923,0},
                                                 {924,0},
                                                 {925,0},
                                                 {926,0},
                                                 {927,98576},
                                                 {928,98605},
                                                 {929,98654},
                                                 {930,98651},
                                                 {931,98704},
                                                 {932,98721},
                                                 {933,98738},
                                                 {934,98713},
                                                 {935,98782},
                                                 {936,98779},
                                                 {937,98760},
                                                 {968,0},
                                                 {969,0},
                                                 {970,0},
                                                 {971,98877},
                                                 {972,98914},
                                                 {973,98961},
                                                 {974,99028},
                                                 {975,0},
                                                 {976,0},
                                                 {977,0},
                                                 {978,0},
                                                 {979,0},
                                                 {980,98539},
                                                 {981,98574},
                                                 {982,98586},
                                                 {983,98664},
                                                 {984,98658},
                                                 {985,98722},
                                                 {986,98704},
                                                 {987,98676},
                                                 {988,0},
                                                 {989,0},
                                                 {990,98535},
                                                 {991,98594},
                                                 {992,98591},
                                                 {993,98649},
                                                 {994,98653},
                                                 {995,98711},
                                                 {996,98717},
                                                 {997,98774},
                                                 {998,0},
                                                 {999,98396},
                                                 {1000,0},
                                                 {1001,98395},
                                                 {1002,98430},
                                                 {1003,98456},
                                                 {1004,98502},
                                                 {1005,98528},
                                                 {1006,98570},
                                                 {1007,98602},
                                                 {1008,0},
                                                 {1009,98412},
                                                 {1010,98354},
                                                 {1011,98346},
                                                 {1012,98366},
                                                 {1013,98360},
                                                 {1014,98358},
                                                 {1015,98378},
                                                 {1016,98345},
                                                 {1017,98405},
                                                 {1018,98375},
                                                 {1019,98428},
                                                 {1020,98454},
                                                 {1021,98460},
                                                 {1022,98494},
                                                 {1023,98521}],
                                                <<0,0,45,35,188,205,0,0,5,215,
                                                  112,75,0,0,94,204,32,255,254,
                                                  254,191,207,240,120,0,0,0,3,
                                                  255,128,0,0,0,0,0,0,0,0,3,
                                                  199,65,254,0,0,0,0,0,0,0,0,0,
                                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                  0,0,0,0,0,0,0,0,0,0>>,
                                                [<<0,0,51,250,5,197,0,0,6,222,
                                                   54,187,0,0,94,204,32,255,
                                                   254,254,191,207,240,120,0,
                                                   0,0,3,255,128,0,0,0,0,0,0,
                                                   0,0,3,199,65,254,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                   0,0,0,0,0,0,0>>],
                                                true,[],nil,[],
                                                [{820,[{0,0}]},
                                                 {821,[{0,0}]},
                                                 {822,[{0,0}]},
                                                 {823,[{0,0}]},
                                                 {824,[{0,0}]},
                                                 {825,[{0,0}]},
                                                 {826,[{0,0}]},
                                                 {827,[{0,0}]},
                                                 {828,[{0,0}]},
                                                 {829,[{0,0}]},
                                                 {830,[{0,0}]},
                                                 {831,[{0,0}]},
                                                 {832,[{0,0}]},
                                                 {833,[{0,0}]},
                                                 {834,[{0,0}]},
                                                 {835,[{0,0}]},
                                                 {836,[{0,0}]},
                                                 {837,[{0,0}]},
                                                 {838,[{0,0}]},
                                                 {839,[{0,0}]},
                                                 {840,[{0,0}]},
                                                 {841,[{0,0}]},
                                                 {842,[{0,0}]},
                                                 {843,[{0,0}]},
                                                 {844,[{0,0}]},
                                                 {845,[{0,0}]},
                                                 {846,[{0,0}]},
                                                 {847,[{0,0}]},
                                                 {848,[{0,0}]},
                                                 {849,[{0,0}]},
                                                 {850,[{0,0}]},
                                                 {851,[{0,0}]},
                                                 {852,[{0,0}]},
                                                 {891,[{0,0}]},
                                                 {892,[{0,0}]},
                                                 {893,[{0,0}]},
                                                 {894,[{0,0}]},
                                                 {895,[{0,0}]},
                                                 {896,[{0,0}]},
                                                 {897,[{0,0}]},
                                                 {898,[{0,0}]},
                                                 {899,[{0,0}]},
                                                 {900,[{0,0}]},
                                                 {901,[{0,0}]},
                                                 {902,[{0,0}]},
                                                 {903,[{0,0}]},
                                                 {904,[{0,0}]},
                                                 {905,[{0,0}]},
                                                 {906,[{0,0}]},
                                                 {907,[{0,0}]},
                                                 {908,[{0,0}]},
                                                 {909,[{0,0}]},
                                                 {910,[{0,0}]},
                                                 {911,[{0,0}]},
                                                 {912,[{0,0}]},
                                                 {913,[{0,0}]},
                                                 {914,[{0,0}]},
                                                 {915,[{0,0}]},
                                                 {916,[{0,0}]},
                                                 {917,[{0,0}]},
                                                 {918,[{0,0}]},
                                                 {919,[{0,0}]},
                                                 {920,[{0,0}]},
                                                 {921,[{0,0}]},
                                                 {922,[{0,0}]},
                                                 {923,[{0,0}]},
                                                 {924,[{0,0}]},
                                                 {925,[{0,0}]},
                                                 {926,[{0,0}]},
                                                 {927,[{0,0}]},
                                                 {928,[{0,0}]},
                                                 {929,[{0,0}]},
                                                 {930,[{0,0}]},
                                                 {931,[{0,0}]},
                                                 {932,[{0,0}]},
                                                 {933,[{0,0}]},
                                                 {934,[{0,0}]},
                                                 {935,[{0,0}]},
                                                 {936,[{0,0}]},
                                                 {937,[{0,0}]},
                                                 {968,[{0,0}]},
                                                 {969,[{0,0}]},
                                                 {970,[{0,0}]},
                                                 {971,[{0,0}]},
                                                 {972,[{0,0}]},
                                                 {973,[{0,0}]},
                                                 {974,[{0,0}]},
                                                 {975,[{0,0}]},
                                                 {976,[{0,0}]},
                                                 {977,[{0,0}]},
                                                 {978,[{0,0}]},
                                                 {979,[{0,0}]},
                                                 {980,[{0,0}]},
                                                 {981,[{0,0}]},
                                                 {982,[{0,0}]},
                                                 {983,[{0,0}]},
                                                 {984,[{0,0}]},
                                                 {985,[{0,0}]},
                                                 {986,[{0,0}]},
                                                 {987,[{0,0}]},
                                                 {988,[{0,0}]},
                                                 {989,[{0,0}]},
                                                 {990,[{0,0}]},
                                                 {991,[{0,0}]},
                                                 {992,[{0,0}]},
                                                 {993,[{0,0}]},
                                                 {994,[{0,0}]},
                                                 {995,[{0,0}]},
                                                 {996,[{0,0}]},
                                                 {997,[{0,0}]},
                                                 {998,[{0,0}]},
                                                 {999,[{0,0}]},
                                                 {1000,[{0,0}]},
                                                 {1001,[{0,0}]},
                                                 {1002,[{0,0}]},
                                                 {1003,[{0,0}]},
                                                 {1004,[{0,0}]},
                                                 {1005,[{0,0}]},
                                                 {1006,[{0,0}]},
                                                 {1007,[{0,0}]},
                                                 {1008,[{0,0}]},
                                                 {1014,[{0,0}]},
                                                 {1015,[{0,0}]},
                                                 {1016,[{0,0}]},
                                                 {1017,[{206630118054190,0}]},
                                                 {1018,[{58851646531965,0}]},
                                                 {1019,
                                                  [{118920551874785,98345},
                                                   {103801475651071,0}]},
                                                 {1020,[{0,0}]},
                                                 {1021,[{14726742042937,0}]},
                                                 {1022,
                                                  [{210477493706492,98367},
                                                   {85820590538866,0}]},
                                                 {1023,
                                                  [{92085762404913,98351},
                                                   {277016319418037,0}]}]},
                                               main,nil,<16800.28934.86>,nil,
                                               "/ssd2/@indexes/bucket-1/main_b998c29cc4af889cf9c05b1808926d61.view.1",
                                               mapreduce_view,".view",prod,
                                               couch_set_view_stats_prod,
                                               872062976,<16800.28949.86>},
                                              nil,false,not_running,nil,nil,
                                              nil,0,[],nil,false,undefined,
                                              true,true,
                                              [171,172,173,174,175,176,177,
                                               178,179,180,181,182,183,184,
                                               185,186,187,188,189,190,191,
                                               192,193,194,195,196,197,198,
                                               199,200,201,202,203,204,205,
                                               206,207,208,209,210,211,212,
                                               213,214,215,216,217,218,219,
                                               220,221,222,223,224,225,226,
                                               227,228,229,230,231,232,233,
                                               234,235,236,237,238,239,240,
                                               241,242,243,244,245,246,247,
                                               248,249,250,251,252,253,254,
                                               255,427,428,429,430,431,432,
                                               433,434,435,436,437,438,439,
                                               440,441,442,443,444,445,446,
                                               447,448,449,450,451,452,453,
                                               454,455,456,457,458,459,460,
                                               461,462,463,464,465,466,467,
                                               468,469,470,471,472,473,474,
                                               475,476,477,478,479,480,481,
                                               482,483,484,485,486,487,488,
                                               489,490,491,492,493,494,495,
                                               496,497,498,499,500,501,502,
                                               503,504,505,506,507,508,509,
                                               510,511,682,683,684,685,686,
                                               687,688,689,690,691,692,693,
                                               694,695,696,697,698,699,700,
                                               701,702,703,704,705,706,707,
                                               708,709,710,711,712,713,714,
                                               715,716,717,718,719,720,721,
                                               722,723,724,725,726,727,728,
                                               729,730,731,732,733,734,735,
                                               736,737,738,739,740,741,742,
                                               743,744,745,746,747,748,749,
                                               750,751,752,753,754,755,756,
                                               757,758,759,760,761,762,763,
                                               764,765,766,767],
                                              [],
                                              {dict,0,16,16,8,80,48,
                                               {[],[],[],[],[],[],[],[],[],[],
                                                [],[],[],[],[],[]},
                                               {{[],[],[],[],[],[],[],[],[],
                                                 [],[],[],[],[],[],[]}}},
                                              nil,3000}]},
                                           {gen_server,handle_msg,5},
                                           {proc_lib,init_p_do_apply,3}]},
                                         {gen_server,call,
                                          [<16800.28929.86>,
                                           {monitor_partition_update,891,
                                            #Ref<16800.0.695.30261>,
                                            <16800.10511.87>},
                                           infinity]}}}},
                                      [{capi_set_view_manager,handle_call,3},
                                       {gen_server,handle_msg,5},
                                       {gen_server,init_it,6},
                                       {proc_lib,init_p_do_apply,3}]},
                                     {gen_server,call,
                                      ['capi_set_view_manager-bucket-1',
                                       {wait_index_updated,891},
                                       infinity]}},
                                    {gen_server,call,
                                     [{'janitor_agent-bucket-1',
                                       'ns_1@172.23.96.18'},
                                      {if_rebalance,<0.15317.163>,
                                       {wait_checkpoint_persisted,892,2204}},
                                      infinity]}}}}]}}}


 Comments   
Comment by Pavel Paulau [ 20/Apr/14 ]
MB-10514




[MB-10514] During rebalance, UPR stream gets stuck after sending a snapshot marker and does not send any further mutations for that stream. Created: 20/Mar/14  Updated: 20/Apr/14

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Sarath Lakshman Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive couchdb_logs.tar.gz     Text File couchdb_upr_client_inf_timeout.patch     GZip Archive logs.tar.gz     GZip Archive memc_logs.tar.gz     Text File ops.txt     Text File streams.txt     File upr_incoming.pcapng    
Issue Links:
Dependency
blocks MB-10490 Simple-test Rebalance failure with ba... Resolved
blocks MB-10548 Views tests failing with error "vbuck... Resolved
blocks MB-10730 Rebalance exited with reason "bulk_se... Closed
Duplicate
duplicates MB-10490 Simple-test Rebalance failure with ba... Resolved
is duplicated by MB-10910 Rebalance with views after failover f... Closed
Relates to
relates to MB-10772 During rebalance, getting timeout for... Resolved
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Related to view bug ticket, MB-10490

For views, we open a single connection and reuse that connection for all tasks, such as gathering stats and streaming mutations. At any one time we use this connection for requesting only one stream, but it is simultaneously used for querying stats.

Scenario:
1. Create a couchbase node with 1024 vbuckets, insert 10240 documents (no duplicates)
2. Create a default view and publish it
3. Create another couchbase node with 1024 vbuckets
4. Add the second node to the cluster and rebalance

On the second node, when building the index, the view engine requests a stream for each vbucket and tries to read mutations. It is observed that after a few streams, or even for the first stream, the stream open (seq 0 - x) succeeds and returns the failover log. Then, instead of receiving mutations 0 to x followed by stream_end, I receive a snapshot_marker and that's it. No more mutations arrive for that stream and it gets stuck.

Please apply the attached couchdb patch (to set the erlang upr client timeout to infinity) before reproducing it.

Also attached are some debug logs with comments. Please refer to streams.txt for the sequence of operations and to ops.txt for the responses coming from the server.

A packet trace of upr (port 12002), from a repro run other than the one corresponding to the debug logs, is also attached.
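
A minimal repro-setup sketch based on the description above; the patch file name comes from the attachments, while the directory layout and node count are assumptions:

# apply the attached timeout patch to the couchdb checkout (path assumed)
cd couchdb && git apply ../couchdb_upr_client_inf_timeout.patch && cd ..
# bring up a two-node cluster_run with 1024 vbuckets, as in the scenario above
cd ns_server && COUCHBASE_NUM_VBUCKETS=1024 python ./cluster_run --nodes=2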


 Comments   
Comment by Sriram Melkote [ 27/Mar/14 ]
Nimish, Pratap - please wait for final confirmation from Mike that the bug has been fixed before retrying.
Comment by Nimish Gupta [ 01/Apr/14 ]
Still seeing the issue, so reopening it. Getting UPR stream timeout in the logs.

[ns_server:debug,2014-04-01T14:42:08.664,n_1@127.0.0.1:<0.20498.0>:capi_set_view_manager:do_wait_index_updated:640]Got unexpected message from ddoc monitoring. Assuming that's shutdown: {updater_err, {timeout,
                                                                         {gen_server,
                                                                          call,
                                                                          [<0.15423.0>,
                                                                           {get_stream_event,
                                                                            57377}]}}}

I have revision="042873d5e22703bfae31e17869c5721f52fb6b3e" in ep-engine.
Comment by Ketaki Gangal [ 02/Apr/14 ]
Seeing failures with latest build simple-test view-rebalance - 3.0.0-537-rel

http://qa.sc.couchbase.com/view/3.0.0/job/centos_x64--00_01--simple-test-P0/204/consoleFull
Comment by Mike Wiederhold [ 07/Apr/14 ]
http://review.couchbase.org/#/c/35243/
Comment by Sarath Lakshman [ 09/Apr/14 ]
This problem still exists. get_stream_event timeout error messages are logged in the couchdb logs if you run the repro steps above.
Volker just mentioned a potential fix, http://review.couchbase.org/#/c/35486/, which seems to fix the problem.

Please close this bug once the fix is merged.
Comment by Sarath Lakshman [ 09/Apr/14 ]
With http://review.couchbase.org/#/c/35486/, it no longer happens with the steps in the bug description. But I can see it happening on the third node with the following test:

NODES=4 TEST=rebalance.rebalancein.RebalanceInTests.incremental_rebalance_in_with_queries,blob_generator=False,items=2000,is_dev_ddoc=False,max_verify=2000 make any-test.

I am using tap replication (COUCHBASE_REPL_TYPE=tap).


Saraths-MacBook-Pro:couchbase sarath$ cd ns_server/logs/
Saraths-MacBook-Pro:logs sarath$ grep get_stream_ -R .
./n_3/couchdb.1:error: {timeout,{gen_server,call,[<0.977.0>,{get_stream_event,957}]}}
./n_3/couchdb.1: {get_stream_event,

Currently we have a 5 second timeout. Could backfilling from ep-engine legitimately take longer than that? I can try increasing it.
Comment by Sriram Melkote [ 10/Apr/14 ]
Mike will back this out and re-add it to ensure it is not causing other regressions. Sarath will increase the timeout to give ep-engine more time. Also, with UPR replication this is not occurring.
Comment by Mike Wiederhold [ 11/Apr/14 ]
From further testing it looks like my fix doesn't cause any significant issues, but it does trade one set of sporadic failures for another set. When this change is merged I see two problems:

1. Not all items are streamed to the view engine (MB-10846; Mike to look at this since it is probably an ep-engine issue)
2. The "partition is neither active nor passive" issue in the view engine (MB-10815, which is assigned to Sarath)

I will continue to look at the ep-engine issue; once I'm done fixing the ep-engine problems, we can figure out whether or not we want to merge my change before fixing MB-10815.
Comment by Maria McDuff [ 11/Apr/14 ]
Mike,

why is this assigned to Tommie?
Is this resolved and ready for QE to test? Please confirm.
Comment by Mike Wiederhold [ 11/Apr/14 ]
Tommie assigned it to himself. I do not know.
Comment by Sriram Melkote [ 14/Apr/14 ]
Tommie, let's wait for confirmation from Mike that all planned changes with respect to stream continuity are merged before taking the bug to the verification step.
Comment by Tommie McAfee [ 14/Apr/14 ]
Sure, I may have done this by mistake.
Comment by Sarath Lakshman [ 14/Apr/14 ]
Mike,

MB-10815 seems to be a problem around ns_server interaction, so you can ignore that one.
Comment by Mike Wiederhold [ 14/Apr/14 ]
The ep-engine side fix for this is here: http://review.couchbase.org/35708

We will wait until ns_server and view engine have fixes for the remaining problems ready before merging this.
Comment by Sarath Lakshman [ 15/Apr/14 ]
It is still happening with this patch as well, using tap replication (I haven't tried upr replication).
Please see the attached logs.tar.gz

Following is the config that I am using:
diff --git a/scripts/start_cluster_and_run_tests.sh b/scripts/start_cluster_and_run_tests.sh
index ec573dc..4867ee7 100755
--- a/scripts/start_cluster_and_run_tests.sh
+++ b/scripts/start_cluster_and_run_tests.sh
@@ -72,7 +72,8 @@ else
    make dataclean
    make
 fi
-COUCHBASE_NUM_VBUCKETS=64 python ./cluster_run --nodes=$servers_count &> $wd/cluster_run.log &
+
+COUCHBASE_REPL_TYPE=tap COUCHBASE_NUM_VBUCKETS=1024 python ./cluster_run --nodes=$servers_count --loglevel=info &> $wd/cluster_run.log &
 pid=$!
 popd
 python ./testrunner $conf -i $ini $test_params 2>&1 -p makefile=True | tee make_test.log


Test:
NODES=3 TEST=rebalance.rebalancein.RebalanceInTests.incremental_rebalance_in_with_queries,blob_generator=False,items=2000,is_dev_ddoc=False,max_verify=2000,get-logs=True,get-cbcollect-info=True make any-test

You may get hit by other ns_server-related exceptions; in that case you have to try your luck in the next run :)
Comment by Mike Wiederhold [ 15/Apr/14 ]
There are no memcached logs in that tar file. Unfortunately they are in a different location. I'll look at this issue once we get some of the other things merged since I think I saw this happen very sporadically and it does not affect a make simple-test test case at the moment.
Comment by Sarath Lakshman [ 15/Apr/14 ]
Sorry that I missed the memcached logs.
Attaching logs from a different test run.
Comment by Mike Wiederhold [ 15/Apr/14 ]

I merged two changes to fix this:

http://review.couchbase.org/#/c/35708/
http://review.couchbase.org/#/c/35748/

Please reopen if you see this problem again.
Comment by Mike Wiederhold [ 16/Apr/14 ]
There still appears to be some sporadic issue. I'll look into it tomorrow.
Comment by Sarath Lakshman [ 18/Apr/14 ]
Now, with UPR replication it works fine. TAP replication still has a problem.




[MB-10909] Crash with unaligned memory access on ARM Created: 19/Apr/14  Updated: 20/Apr/14  Resolved: 20/Apr/14

Status: Resolved
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Jens Alfke Assignee: Chiyoung Seo
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: iPhone 5, iOS 7.1

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When I run an optimized build of Couchbase Lite on an iPhone, it's crashing while opening a database:

* thread #1: tid = 0x12b82, 0x000f2f62 Worker Bee`crc32_8 + 42, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_ARM_DA_ALIGN, address=0x18aaa085)
    frame #0: 0x000f2f62 Worker Bee`crc32_8 + 42
    frame #1: 0x000efb2c Worker Bee`_file_hash(hash*, hash_elem*) + 32
    frame #2: 0x000fa4c0 Worker Bee`hash_find + 12
    frame #3: 0x000efde8 Worker Bee`filemgr_open + 84
    frame #4: 0x000f45c2 Worker Bee`___lldb_unnamed_function2572$$Worker Bee + 114
    frame #5: 0x000f452c Worker Bee`fdb_open + 52

ARM requires non-byte memory accesses to be aligned, and the crash is due to a 32-bit load from an odd address.

The problem seems to be in _file_hash. It looks like it hashes only the last 8 bytes of the file path, but by counting back 8 bytes from the end of the string it's likely to start at an odd address. I'll try modifying this to align the address. (BTW, it looks like this was done for speed, but CRC32 is a fairly slow hash function. Something like murmurhash (https://code.google.com/p/smhasher/) would be a lot faster for in-memory use. CRC32 is still appropriate for use on disk.)

I have no idea why I only ran into this when using an optimized build; I had been running a debug build earlier today in the same app on the same device with no problems.

 Comments   
Comment by Jens Alfke [ 19/Apr/14 ]
http://review.couchbase.org/#/c/36040/
Comment by Chiyoung Seo [ 20/Apr/14 ]
Thanks a lot for catching the bug, and for the good fix. It's quite interesting to learn that ARM imposes that constraint.

Yeah, as you mentioned, murmurhash is known to be a fast hash solution and is also widely used in implementing bloom filters.

The fix was merged.




[MB-10907] UUID difference observed in (active vs replica) vbuckets after online upgrade 2.5.1 ==> 3.0.0-593 Created: 19/Apr/14  Updated: 19/Apr/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive online_upgrade_logs_2.5.1_3.0.0.tar.gz    
Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: 1. Install 2.5.1 on 10.6.2.144, 10.6.2.145
2. Add 10.6.2.145 to 10.6.2.144 and rebalance
3. Add default bucket with 1024 vbuckets to cluster
4. Add ~ 1000 items to the buckets
5. Online upgrade cluster to 3.0.0-593 with 10.6.2.146 as our extra node
6. Finally the cluster has nodes 10.6.2.145 and 10.6.2.146

Check the vbucket UUID for the active and replica vbucket after all replication is complete and the disk queue has drained.

Expectation: The UUIDs should be the same, according to UPR.

Actual result: They differ. Without an upgrade they are the same.

Example of difference in UUID

On 10.6.2.145 where vb_9 is active
 vb_9:high_seqno: 14

 vb_9:purge_seqno: 0

 vb_9:uuid: 18881518640852

On 10.6.2.146 where vb_9 is replica
 vb_9:high_seqno: 14

 vb_9:purge_seqno: 0

 vb_9:uuid: 120602843033209
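
A sketch of how these per-vbucket values can be collected on each node for comparison; the cbstats invocation (vbucket-seqno stat group, -b flag) and host list are assumptions based on the output shown above:

# compare seqno/uuid stats for vb_9 on the two remaining nodes (sketch)
for host in 10.6.2.145 10.6.2.146; do
  echo "== $host =="
  /opt/couchbase/bin/cbstats $host:11210 vbucket-seqno -b default | grep 'vb_9:'
done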



Is this a Regression?: No




[MB-10864] High auth latency observed from the SDKs (magnifying under load) Created: 16/Apr/14  Updated: 19/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.2.0, 2.5.0, 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Michael Nitschinger Assignee: Trond Norbye
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Happens on SDKQE environments (Linux, virtual), but also seeing locally (mac os x + virtualbox linux) slightly higher times than expected.

Attachments: Zip Archive cbcollectinfo_mb10684_11.zip     Zip Archive cbcollectinfo_mb10684_15.zip     Zip Archive cbcollectinfo_mb10684_245.zip     Zip Archive cbcollectinfo_mb10684.zip     Zip Archive mb10864_server11.zip     Zip Archive mb10864_server15.zip     Zip Archive mb10864_server245.zip     Zip Archive mb10864_tcpdump.zip    
Triage: Untriaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Unknown

 Description   
Upfront: this may or may not be hard to track down exactly; we'll see.

The gist is that during SDK development and especially in SDKQE, we observed very high auth latencies (sometimes even in the seconds), especially under load and when a node gets re-added (but not exclusively).

There is a ticket open for this on the SDKQE side (http://www.couchbase.com/issues/browse/SDKQE-485); disregard for now that it is marked as resolved. We implemented some better logistics in the SDK to mitigate the effect, but it is still there (it could be on the server side).

You can find some logging in the zip files there, but I won't bother you with the client's finest-level logging. Because we were seeing the issue, I added timing logging for:
- the sasl mechs list
- each auth step
- the total time for auth

To me it looks like it got worse with CRAM-MD5, and not only because we now have 3 roundtrips instead of the one with PLAIN. Trond mentioned something potentially wrong with regard to entropy?

Check these timings, for example:

981598:[737.68 INFO] (SDKD log:137) WARNING: SASL List Mechanisms took 2791ms on {QA sa=10.3.4.15/10.3.4.15:11210, #Rops=1, #Wops=0, #iq=0, topRop=SASL mechs operation, topWop=null, toWrite=0, interested=1}
985199:[740.73 INFO] (SDKD log:137) WARNING: SASL Step took 3042ms on {QA sa=10.3.4.15/10.3.4.15:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}
985455:[740.83 INFO] (SDKD log:137) WARNING: SASL Step took 103ms on {QA sa=10.3.4.15/10.3.4.15:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}
985457:[740.83 INFO] (SDKD log:137) WARNING: SASL Auth took 5938ms on {QA sa=10.3.4.15/10.3.4.15:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}

You can see the individual step timings and also the total auth time.

Deepti will provide tcpdump information for a run like this so we can dig into it further.
Let me/us know what you need from the SDK side to dig into that. Note that even locally I observed timings in the hundreds of milliseconds, which according to Trond should never take that long.

 Comments   
Comment by Trond Norbye [ 16/Apr/14 ]
A packet dump captured on the server with timings would be really interesting (to eliminate any networking problems). In addition the timing data from memcached would be interesting.
Comment by Deepti Dawar [ 17/Apr/14 ]
Attaching cbcollect_info logs and tcpdump file for reference.
Comment by Matt Ingenthron [ 17/Apr/14 ]
Timing data from memcached on auth isn't available in this case, as it was 2.5.1 being tested. Also, I'd originally requested that Deepti gather captures from both sides; I will follow up with her on that. In any case, I think there's sufficient info here for the MB to be investigated.
Comment by Trond Norbye [ 17/Apr/14 ]
(As a side note, we're also building the CentOS versions with too old a compiler, so we won't have the timing stats anyway...)
Comment by Chiyoung Seo [ 17/Apr/14 ]
I can take a look at this issue, but I was not involved in this area. I think Trond worked on the auth implementation in the memcached layer; he will definitely be more helpful in debugging this issue.
Comment by Trond Norbye [ 17/Apr/14 ]
What is the IP of the client, and what is the IP of the server the pcap was generated on? (To make it easier for me to filter out the right traffic...)
Comment by Deepti Dawar [ 18/Apr/14 ]
Client IP - 10.3.4.8, Server IPs - 10.3.4.6, 10.3.4.11, 10.3.4.15, 10.3.3.245
I will add the server-side tcpdump as well.
Comment by Trond Norbye [ 19/Apr/14 ]
Looking at the pcap files it looks like they are truncated at 96 bytes per packet?
Comment by Michael Nitschinger [ 19/Apr/14 ]
Trond, could you supply the parameters you need so that Deepti can take a tcpdump that suits your needs? Thanks :)
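
For reference, a full-payload capture on the server side would avoid the 96-byte truncation noted above; a sketch, where the interface name and port are assumptions:

# snap length 0 captures complete packets instead of the first 96 bytes
tcpdump -i eth0 -s 0 -w /tmp/sasl_auth_full.pcap 'host 10.3.4.8 and port 11210'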




[MB-10882] upr_notifier should not crash on server start and on server shutdown Created: 17/Apr/14  Updated: 18/Apr/14  Resolved: 18/Apr/14

Status: Resolved
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Task Priority: Critical
Reporter: Artem Stemkovski Assignee: Artem Stemkovski
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Cause:

On server start: it tries to create a connection before memcached is ready for it.

On server shutdown: after memcached closes the socket, the upr_notifier exits with reason normal.
It then gets restarted by the supervisor and crashes with:

=========================CRASH REPORT=========================
  crasher:
    initial call: upr_proxy:init/1
    pid: <0.15704.0>
    registered_name: []
    exception exit: {{badmatch,{error,econnrefused}},
                     [{mc_replication,connect,3,
                                      [{file,"src/mc_replication.erl"},
                                       {line,27}]},
                      {upr_proxy,connect,4,
                                 [{file,"src/upr_proxy.erl"},{line,147}]},
                      {upr_proxy,init,1,
                                 [{file,"src/upr_proxy.erl"},{line,46}]},
                      {gen_server,init_it,6,
                                  [{file,"gen_server.erl"},{line,304}]},
                      {proc_lib,init_p_do_apply,3,
                                [{file,"proc_lib.erl"},{line,239}]}]}
      in function gen_server:init_it/6 (gen_server.erl, line 328)
    ancestors: ['single_bucket_sup-gamesim-sample',<0.12788.0>]
    messages: []
    links: [<0.12789.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 75113
    stack_size: 27
    reductions: 6122
  neighbours:

 Comments   
Comment by Artem Stemkovski [ 18/Apr/14 ]
http://review.couchbase.org/36023




[MB-10906] CBTransfer in CSV mode for backup data acts like STDOUT Created: 18/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Parag Agarwal Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Scenario

I was trying to produce CSV-format output from a backup file, but cbtransfer wrote to stdout instead of saving to a file. In my case the file /tmp/data.txt did not get created. The only workaround was to redirect the output.

[root@palm-10307 bin]# ./cbtransfer /tmp/bucket/ csv:/tmp/data.txt
id,flags,expiration,cas,value,rev,vbid
ddd,0,0,3358190305579503,"{""click"":""to edit"",""new in 2.0"":""there are no reserved field names""}",49,17
  [####################] 100.0% (1/estimated 1 msgs)
bucket: default, msgs transferred...
       : total | last | per sec
 byte : 68 | 68 | 1028.6
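
The redirect workaround mentioned in the description would look roughly like this (a sketch; whether the progress output also lands in the file depends on which stream cbtransfer writes it to):

./cbtransfer /tmp/bucket/ csv:/tmp/data.txt > /tmp/data.txt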






[MB-10905] memcached cannot load libraries on mac when run as install/bin/couchbase-server Created: 18/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: build, storage-engine
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Artem Stemkovski Assignee: Phil Labee
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
It works without any issues, though, if run via ns_server's cluster_run.

=========================PROGRESS REPORT=========================
          supervisor: {local,ns_child_ports_sup}
             started: [{pid,<0.272.0>},
                       {name,
                           {memcached,
                               "/Users/artem/Work/couchbase/install/bin/memcached",
                               ["-C",
                                "/Users/artem/Work/couchbase/install/var/lib/couchbase/config/memcached.json"],
                               [{env,
                                    [{"EVENT_NOSELECT","1"},
                                     {"MEMCACHED_TOP_KEYS","100"},
                                     {"ISASL_PWFILE",
                                      "/Users/artem/Work/couchbase/install/var/lib/couchbase/isasl.pw"}]},
                                use_stdio,stderr_to_stdout,exit_status,
                                port_server_send_eol,stream],
                               [{"/Users/artem/Work/couchbase/install/var/lib/couchbase/config/memcached.json",
                                 <<"{\"interfaces\":[{\"host\":\"*\",\"port\":11210,\"maxconn\":10000},{\"host\":\"*\",\"port\":11209,\"maxconn\":1000},{\"host\":\"*\",\"port\":11207,\"maxconn\":10000,\"ssl\":{\"key\":\"/Users/artem/Work/couchbase/install/var/lib/couchbase/config/memcached-key.pem\",\"cert\":\"/Users/artem/Work/couchbase/install/var/lib/couchbase/config/memcached-cert.pem\"}}],\"extensions\":[{\"module\":\"/Users/artem/Work/couchbase/install/lib/memcached/stdin_term_handler.so\",\"config\":\"\"},{\"module\":\"/Users/artem/Work/couchbase/install/lib/memcached/file_logger.so\",\"config\":\"cyclesize=10485760;sleeptime=19;filename=/Users/artem/Work/couchbase/install/var/lib/couchbase/logs/memcached.log\"}],\"engine\":{\"module\":\"/Users/artem/Work/couchbase/install/lib/memcached/bucket_engine.so\",\"config\":\"admin=_admin;default_bucket_name=default;auto_create=false\"},\"verbosity\":0}">>}]}},
                       {mfargs,
                           {supervisor_cushion,start_link,
                               [memcached,5000,infinity,ns_port_server,
                                start_link,
                                [#Fun<ns_child_ports_sup.4.41629644>]]}},
                       {restart_type,permanent},
                       {shutdown,86400000},
                       {child_type,worker}]

[ns_server:info,2014-04-18T13:26:43.754,babysitter_of_ns_1@127.0.0.1:<0.273.0>:ns_port_server:log:169]memcached<0.273.0>: dyld: Library not loaded: libJSON_checker.dylib
memcached<0.273.0>: Referenced from: /Users/artem/Work/couchbase/install/bin/memcached
memcached<0.273.0>: Reason: image not found
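
A possible workaround sketch while the library search path issue is investigated; the lib directories below are guesses based on the memcached.json paths above, and whether this actually resolves the dyld error is untested:

# point dyld at the install's library directories before launching (assumption)
export DYLD_LIBRARY_PATH=/Users/artem/Work/couchbase/install/lib:/Users/artem/Work/couchbase/install/lib/memcached
/Users/artem/Work/couchbase/install/bin/couchbase-server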





[MB-10904] CBTransfer can read replica from couchstore and http:// Created: 18/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Parag Agarwal Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Currently cbtransfer can read replica data only from sqlite files (i.e. from a backup). However, the following command does not work (for couchstore and http:// sources):

Example of couchstore not working for replica

[root@palm-10307 bin]# /opt/couchbase/bin/cbtransfer couchstore-files:///opt/couchbase/var/lib/couchbase/data csv:/tmp/default.5838520c-c744-11e3-99de-6003089eed5e.csv -b default -u Administrator -p password --source-vbucket-state=replica --destination-vbucket-state=replica -i 59
error: only --source-vbucket-state=active is supported by this source: couchstore-files:///opt/couchbase/var/lib/couchbase/data
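
For contrast, a sketch of the form that does work today, reading replica data from a cbbackup directory (the backup path is an assumption):

/opt/couchbase/bin/cbtransfer /tmp/backup-dir csv:/tmp/default-replica.csv -b default -u Administrator -p password --source-vbucket-state=replica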


This request is to improve cbtransfer to support http:// and couchstore-files sources as well. It would be great to have this for data-comparison analysis in our testing.





[MB-8537] "docs data size" varying with compaction Created: 28/Jun/13  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Artem Stemkovski
Resolution: Unresolved Votes: 0
Labels: ns_server-story
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2013-06-28 at 4.10.46 PM.png     PNG File update_with_comp.png    
Triage: Triaged

 Description   
I have load running on a system, but with 100% updates and no changes to the size of the data. However, the "docs data size" appears to go up and down as compaction runs. It remains flat if compaction is configured not to run.

This appears to indicate that we are incorrectly monitoring the data size (note that this is _not_ referring to the file size, which correctly goes up and down).

This appears to be a regression in 2.1.1, though I'm not totally sure.



 Comments   
Comment by Perry Krug [ 28/Jun/13 ]
Screenshot attached
Comment by Artem Stemkovski [ 18/Apr/14 ]
Cannot repro on 3.0. I loaded 10000 keys and ran a ruby script that constantly updates randomly selected keys with randomly generated docs.
See the attached screenshot of the stats: update_with_comp.png




[MB-10896] [Doc] Unified Protocol for Replication (UPR) Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Unified Protocol for Replication (UPR)


 Comments   
Comment by Matt Ingenthron [ 18/Apr/14 ]
The description and the summary here don't seem to match. Is this about documenting UPR, or incremental backup?

Note that UPR is, to my knowledge, a private interface in 3.0. This means it should be documented still, but perhaps in a different way.
Comment by Anil Kumar [ 18/Apr/14 ]
This is for UPR. You're right, it is a private interface, and we'll have concepts, stats, etc. documented. If you have any suggestions, please let us know.




[MB-10888] [Doc] XDCR replication without persisting to disk on source Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
XDCR replication without persisting to disk on source.

http://www.couchbase.com/issues/browse/MB-9981
https://www.couchbase.com/issues/browse/MB-10400






[MB-10889] [Doc] Index building without persisting to disk on source [using UPR] Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Index building without persisting to disk on source [using UPR] .

http://www.couchbase.com/issues/browse/MB-8903.




[MB-10887] [Doc] Cluster-wide diagnostics gathering - collect_info from UI across cluster Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Cluster-wide diagnostics gathering tool on Admin UI.

http://www.couchbase.com/issues/browse/MB-10086





[MB-10890] [Doc] Delta node recovery for failed over node Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Delta node recovery for failed over node.

ns_server - https://www.couchbase.com/issues/browse/MB-9979

ui - https://www.couchbase.com/issues/browse/MB-10150

tools - https://www.couchbase.com/issues/browse/MB-10456




[MB-10891] [Doc] Gracefully failover - Node is in Repair mode Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Gracefully failover - Node is in Repair mode.

ns_server - https://www.couchbase.com/issues/browse/MB-9980

tool - https://www.couchbase.com/issues/browse/MB-10892





[MB-10893] [Doc] XDCR - pause and resume Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
XDCR - pause and resume

https://www.couchbase.com/issues/browse/MB-5487




[MB-10894] [Doc] Data encryption between client and server for bucket data - couchbase buckets Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Data encryption between client and server for bucket data - couchbase buckets.

https://www.couchbase.com/issues/browse/MB-10082
https://www.couchbase.com/issues/browse/MB-10083
https://www.couchbase.com/issues/browse/MB-10084




[MB-10895] [Doc] Incremental backup and restore (Differential, Cumulative) Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Incremental backup and restore (Differential, Cumulative).

http://www.couchbase.com/issues/browse/MB-10176





[MB-10897] [Doc] Logging/serviceability improvements Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Logging/serviceability improvements

http://www.couchbase.com/issues/browse/MB-10088
http://www.couchbase.com/issues/browse/MB-9198
http://www.couchbase.com/issues/browse/MB-10085




[MB-10898] [Doc] Password encryption between Client and Server for Admin ports credentials Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Password encryption between Client and Server for Admin ports credentials

http://www.couchbase.com/issues/browse/MB-10088
http://www.couchbase.com/issues/browse/MB-9198




[MB-10899] [Doc] Support immediate and eventual consistency level for indexes (stale=false) Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Support immediate and eventual consistency level for indexes (stale=false)






[MB-10900] [Doc] Bucket Priority for Disk I/O optimization Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Bucket Priority for Disk I/O optimization

http://www.couchbase.com/issues/browse/MB-10369
http://www.couchbase.com/issues/browse/MB-10307
http://www.couchbase.com/issues/browse/MB-10849




[MB-10902] [Doc] Progress indicator for Warm-up Operation Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Progress indicator for Warm-up Operation -

http://www.couchbase.com/issues/browse/MB-8989




[MB-10903] [Doc] Global I/O Manager Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Global I/O Manager

http://www.couchbase.com/issues/browse/MB-9036






[MB-10901] [Doc] Tunable Memory - Optional removal of keys & metadata from memory Created: 18/Apr/14  Updated: 18/Apr/14  Due: 23/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: 3.0-Beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Microsoft Word TunableMemory_TestPlan_1.1.docx     Microsoft Word TunableMemory_TestPlan.docx    
Flagged:
Release Note

 Description   
Tunable Memory - Optional removal of keys & metadata from memory

http://www.couchbase.com/issues/browse/CBD-1034

MB-10151




[MB-10098] Syntax for URI of sending a test email returns "not found" Created: 31/Jan/14  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Gwen Leong
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-10097 Escape characters and text cleanup ne... Resolved

 Description   
This link: http://docs.couchbase.com/couchbase-manual-2.2/#sending-test-emails

Directs the user to use this URI: curl -i -u admin:password http://localhost:8091/settings/alerts/sendTestEmail

Which (at least on 2.2 and 2.5) returns an error of "object not found":
[root@cb1 ~]# curl -i -u Administrator:password http://localhost:8091/settings/alerts/sendtestemail -d ' '
HTTP/1.1 404 Object Not Found
Server: MochiWeb/1.0 (Any of you quaids got a smint?)
Date: Fri, 20 Dec 2013 10:58:52 GMT
Content-Type: text/plain
Content-Length: 10

Not found.

 Comments   
Comment by Amy Kurtzman [ 31/Jan/14 ]
Can you provide the correct syntax for the documentation?
Comment by Perry Krug [ 01/Feb/14 ]
According to Alk it is just 'testEmail' rather than 'sendTestEmail' but you may want to verify that with empirical evidence :)
Comment by Ruth Harris [ 04/Apr/14 ]
Fixed in 2.5. -Ruth

Gwen,
Please check 2.2, 2.1, 2.0. If they have the same piece of code, change "sendTestEmail" to "testEmail"
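For reference, a hedged sketch of the corrected request (assuming the endpoint is 'testEmail' as Alk indicated, and that it accepts a POST with an empty body as in the original report; verify against a live server before publishing):

curl -i -u Administrator:password http://localhost:8091/settings/alerts/testEmail -d ''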




[MB-10802] 2.5 docs "Setting" section is missing new 2.5 "Cluster" tab and data Created: 08/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Kirk Kirkconnell Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Couchbase_Console__2_5_1_.png    
Triage: Triaged
Is this a Regression?: Unknown

 Description   
In the "Settings" section of the CB 2.5 Admin documentation (http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#settings) it is missing a part about the new tab in 2.5 for "Cluster" and there are no parts of the documentation on that tab that speak about the functionality on that new tab.

Also, all of the screen shots in the entire Settings section are missing the new Cluster tab as well.

I have attached a screen shot from a 2.5.1 install showing the Cluster tab and the two areas that need to be documented for that tab.




[MB-10276] zoom parameter for REST stats not working Created: 21/Feb/14  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation, ns_server, RESTful-APIs
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
Testing out some stats collection from here: http://docs.couchbase.com/couchbase-manual-2.5/cb-rest-api/#getting-bucket-statistics

This command (with -d zoom=minute) returns "Not Found":
curl -u admin:password -d zoom=minute http://localhost:8091/pools/default/buckets/default/stats

Without the "zoom=minute" the same command succeeds.



 Comments   
Comment by Ruth Harris [ 04/Apr/14 ]
2.5: removed "zoom=minute" from code line.

Gwen, please check 2.2, 2.1, and 2.0 for this same code line and remove the option, "zoom=minute"
Comment by Perry Krug [ 10/Apr/14 ]
FYI, simply removing this is not the right answer here. Apparently the zoom parameter is still valid, but for some reason needs to be specified in a different way:
curl -u Administrator:couchbase 'http://127.0.0.1:8091/pools/default/buckets/default/stats?haveTStamp=1397157182414&zoom=hour'

Can you please check with engineering to understand the correct and desired behavior here? And then we need to go back through the 2.5 manual and add it back correctly.
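For reference, a hedged sketch of the form that appears to work per the comment above (zoom supplied as a URL query parameter on a GET rather than as POST data; the accepted values should still be confirmed with engineering):

curl -u Administrator:password 'http://localhost:8091/pools/default/buckets/default/stats?zoom=minute'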




[MB-9669] Typo: ep_warmup_min_item_threshold should be ep_warmup_min_items_threshold (itemS) Created: 03/Dec/13  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.1, 2.2.0, 2.1.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Patrick Varley Assignee: Gwen Leong
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Two typos where ep_warmup_min_item_threshold should be ep_warmup_min_items_threshold. (Missing the s on item)

Both typos are in the tables under:
command-line-interface-for-administration->cbstats Tool->Getting Warmup Information

I would give you the link but mb-9668 explains why not. It is under the 2nd anchor with the same name:
http://docs.couchbase.com/couchbase-manual-2.2/#getting-warmup-information

There is another problem here, same area of the manual so it makes sense to cover it in this defect.
ep_warmup_min_items_threshold is a percentage.

In a number of places in the manual it says it's the total number of documents loaded into memory before exiting warmup. This is not true.

 Comments   
Comment by Dipti Borkar [ 03/Dec/13 ]
Anil, can you please fix this.
Comment by Ruth Harris [ 04/Apr/14 ]
In 2.5, FIXED.
CLI, cb_stats... fixed 2 tables (add "s" to items and change "number of key data" to "percentage of key data"), 2 other tables were ok.

These sections discuss total number of documents, not percentage:
http://docs.couchbase.com/couchbase-manual-2.2/#handling-server-warmup (in Admin tasks section)
http://docs.couchbase.com/couchbase-manual-2.2/#changing-the-warmup-threshold (in Admin tasks section)

Change:
"The total number of documents loaded into memory" to "The percentage of documents..."
"This indicates the number of items" TO "This indicates the percentage of.."


Gwen, Please update for 2.2, 2.1
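For reference, one hedged way to confirm the corrected stat name on a live node (cbstats output elsewhere in this tracker includes the stat; the value shown is only an example):

/opt/couchbase/bin/cbstats localhost:11210 all | grep ep_warmup_min_items_threshold
 ep_warmup_min_items_threshold: 100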




[MB-10769] DOC: Administration - DITA conversion Created: 04/Apr/14  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Ruth Harris Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Ruth Harris [ 18/Apr/14 ]
initial conversion done on admin, install, rest, cli
broken out by h2 topics
put into xml topics (this is a more generic xml format)
generated PDF and HTML with 4 different ditamaps

Need to:
1. break down to "h3" topics where appropriate
2. redo cross-references
3. re-add noteboxes
4. fix image links and replace screenshots where appropriate
5. re-organize/consolidate topics where appropriate
6. re-do cli/rest information to conform to info template
7.




[MB-10231] Docs: Collection of small text changes Created: 17/Feb/14  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Trivial
Reporter: Don Stacy Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File dstacySimpleTextEdits_2014Feb16.html    
Triage: Untriaged

 Description   
I exchanged emails with Ruth about how to report minor issues. She suggested reporting anything very simple in a single bug and others in separate bugs. Thus, this bug is a collection of minor changes that I do not think require Dev input. The changes are low hanging fruit to perform some general cleanup. Let me know if you need more info or need these to be broken up somehow. The details are in the attached HTML file.

 Comments   
Comment by Don Stacy [ 17/Feb/14 ]
Hello Amy. I just realized that the items I logged tonight all came in as CBSE items. The process doc I was following at http://hub.internal.couchbase.com/confluence/display/techpubs/Doc+Fixes+and+Edits+Process+2013 shows examples that are MB. So perhaps I did something wrong in the entry. Please advise and I will clean up as necessary. Thanks, and sorry for any confusion.
Comment by Dipti Borkar [ 17/Feb/14 ]
Don,

CBSE is only used for code bugs and communication about support tickets with dev.
Docs is managed in MB - the main couchbase server project. You can move any CBSE tickets you created to MB (under "more actions")
Comment by Don Stacy [ 18/Feb/14 ]
Thank you Dipti. I realized the error of my ways after the fact and let Amy and Ruth know. Amy made the necessary changes. Oddly, I did not get any email notifications about changes to the entry even though I am watching it.
Comment by Ruth Harris [ 04/Apr/14 ]
2.5 Couchbase server manual updated.
Comment by Ruth Harris [ 04/Apr/14 ]
Working on: 2.5 Dev Guide
Left off at:
Section: http://docs.couchbase.com/couchbase-devguide-2.5/#comparing-document-oriented-and-relational-data
Area: Last paragraph
Issues: Change ‘Amtel brewery’ to ‘Amstel brewery'
 **** Don't update this one. Amtel brewery is correct. Amstel and Buds are the beers.




[MB-10229] Docs: SASL authentication version notes incorrect? Created: 17/Feb/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Trivial
Reporter: Don Stacy Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
Section: http://docs.couchbase.com/couchbase-devguide-2.5/#providing-sasl-authentication
Area: Third paragraph
Issues: I think this is meant to say that our approach changed in version 2.2 from PLAIN to CRAM-MD5, as it states two statuses for encrypting. Yet we say both are ‘as of Couchbase Server 2.2’.




[MB-10419] docs: ports listed in the FAQ section are incorrect Created: 11/Mar/14  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Dave Rigby Assignee: Gwen Leong
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
From http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#faqs the listed ports are incorrect - I'm seeing ports "1" and "0" mentioned which aren't correct at all. From the docs:

----

What ports does couchbase Server need to run on?
The following TCP ports should be available:
8091 — GUI and REST interface
1 — Server-side Moxi port for standard memcached client access
0 — native couchbase data port
0 to 21199 — inclusive for dynamic cluster communication

----

(as an aside, this is lacking a large number of ports. Might be worth just removing this & replacing with a link to the ports table in the install guide).

 Comments   
Comment by Ruth Harris [ 04/Apr/14 ]
Fixed in 3.0

FAQ
Which ports does Couchbase Server need?

4369 - Erlang port mapper (epmd)
8091 - GUI and REST interface
8092 - Couchbase API port
11209, 11210, 11211, 11214, 11215 - Bucket and proxy ports
18091, 18092 - Internal REST and CAPI HTTPS for SSL
21100 to 21199 - Inclusive for dynamic cluster communication


Gwen,
For 2.5 put in this same information.
For 2.2 and 2.1, put in the ports that are listed in the Network Ports section (under installing)




[MB-10284] New connection management text modifications Created: 23/Feb/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
In this page: http://docs.couchbase.com/couchbase-manual-2.5/cb-release-notes/#optimized-connection-management

It states that an error would be returned if a configuration request is received on port 8091, but that's not the case as both the new clients and the new server version are backwards compatible with each other.

Can we remove that text about the error, and also add some comments around backwards compatibility with older client libraries still being able to access a 2.5 server as well as the newer clients still being able to access an older server? Also, please add a comment that it's not just the server that has to be at a certain version, but the client libraries also need updating as well.

Lastly, have we updated the rest of the 2.5 documentation (dev guide, SDK documentation, server docs) to reflect this change? There are probably a few places throughout the manuals that refer to the configuration and topology being retrieved through port 8091, and those would have to change with this. If we've already done that (or when we do), perhaps a link from the release notes to the more detailed descriptions of how this all works would be appropriate.




[MB-9143] Allow replica count to be edited Created: 17/Sep/13  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 2.5.0

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-2512 Allow replica count to be edited Closed

 Description   
Currently the replication factor cannot be edited after a bucket has been created. It would be nice to have this functionality.

 Comments   
Comment by Ruth Harris [ 06/Nov/13 ]
Currently, it's added to the 3.0 Eng branch by Alk. See MB-2512. This would be a 3.0 doc enhancement.
Comment by Perry Krug [ 25/Mar/14 ]
FYI, this is already in as-of 2.5 and probably needs to be documented there as well...if possible before 3.0




[MB-10294] Minor formatting and path update needed Created: 25/Feb/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This page: http://docs.couchbase.com/couchbase-manual-2.5/cb-install/#user-defined-ports could use some quick cleanup for formatting.

-There is an extra bullet under the first #2
-The second set of numeric ordering restarts at #1 again
-In this second set, for #2, the path is incomplete, it should be /opt/couchbase/var/lib/couchbase/config/config.dat

Also, it may be a little bit confusing for someone to be looking at configuring "user-defined ports" before they've actually installed the software. Does it make sense to have this as an appendix instead of in the "getting started"? Additionally, this page links to the RHEL installation instructions, but doesn't mention what operating systems this section is actually supported on...if it is supported across more than just RHEL, perhaps it should have links to the other installation sections, or just leave it out.




[MB-9611] Doc: For 2.2, the REST API information for "create/edit buckets" redirects to "Web console instructions" instead Created: 19/Nov/13  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0, 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Ketaki Gangal Assignee: Gwen Leong
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
http://docs.couchbase.com/couchbase-manual-2.2/#creating-and-editing-data-buckets

If you click on the REST API section and Create/Edit bucket link, it redirects to the Web-Console way of doing so.

The expected command for creating a bucket can be found in the previous documentation or below.

- Create Bucket
Make a new bucket request to the REST endpoint for buckets and provide the new bucket settings as request parameters:

shell> curl -u Administrator:password \
-d name=newBucket -d ramQuotaMB=100 -d authType=none \
-d replicaNumber=1 -d proxyPort=11215 http://localhost:8091/pools/default/buckets

You can check that your new bucket exists and is running by making a REST request to the new bucket:

curl http://localhost:8091/pools/default/buckets/newBucket

Reference link : http://www.couchbase.com/docs/couchbase-devguide-2.0/creating-a-bucket.html

 Comments   
Comment by Ruth Harris [ 25/Mar/14 ]
Issue 1:
----------
For 2.2 and earlier releases, CHANGE one of the headers. Suggestion: under REST API > Managing Buckets, use "or" instead of "and":
CHANGE "Creating and Editing Data Buckets"
TO: "Creating or Editing Data Buckets"

This is not an issue with 2.5 because I broke out CB into 4 sub-folders/major topics. Admin info is separate from REST API info.

Issue 2:
----------
Additional information provided (in this bug) for the Bucket REST API.
The documentation already has the same type of information as above. No need to update documentation with it.




[MB-9163] Upgrade instructions only state 2.1 Created: 23/Sep/13  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0, 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Gwen Leong
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File upgrading.markdown     PDF File upgrading.pdf    

 Description   
There are specific upgrade instructions for upgrading to 2.1, but not 2.2...in the 2.2 manual

 Comments   
Comment by Dipti Borkar [ 23/Sep/13 ]
Perry please provide link to the page that needs to be updated
Comment by Perry Krug [ 24/Sep/13 ]
http://docs.couchbase.com/couchbase-manual-2.2/#upgrading-to-couchbase-server-21 is the specific link, but in fact there are multiple places under: http://docs.couchbase.com/couchbase-manual-2.2/#installing-and-upgrading that need to be updated to reference 2.2 instead of 2.1. I also see 209 references to 2.1 in the 2.2 manual. Obviously not all need to change, but it's not just the title of links in one spot so we may need a more formal review. Also, the installation examples reference 2.1 in the code and screenshots.
Comment by Ruth Harris [ 06/Feb/14 ]
Upgrading section:

Updated 2.5 upgrade paths:
 * Couchbase 1.8.1 to Couchbase 2.0 and higher
 * Couchbase 2.0.x to Couchbase 2.1 and higher
 * Couchbase 2.1.x to Couchbase 2.2 and higher
 * Couchbase 2.2 to Couchbase 2.5 and higher
Comment by Ruth Harris [ 06/Feb/14 ]
I combed through 2.5.
Gwen, to fix 2.2, see my attachments. both .markdown and pdf are attached
Comment by Ruth Harris [ 06/Feb/14 ]
This is from the 2.5 Upgrading section
Comment by Gwen Leong [ 26/Feb/14 ]
Searched entire 2.2 Manual for all instances of "2.1." Updated as appropriate.
I also updated the code.
Perry, I didn't see the screenshots you were referring to. Please attach examples.
Comment by Perry Krug [ 27/Feb/14 ]
Thanks for the changes. A few more comments:
 
-I think we still need a better way of handling this though. Looking at the 2.5 manual, we are still referencing older versions of the software in our examples.
i.e., under this link (http://docs.couchbase.com/couchbase-manual-2.5/cb-install/#red-hat-linux-installation) and all of the installation examples, we have different versions listed, sometimes it's 2.0 CE, sometimes it's 2.1...all of that should get updated to 2.5 and tracked for the future. Maybe there's a better way to make it generic so that you don't have to update it each time, but otherwise we'd have to.

-It will take a bit of technical understanding, but there are places where it does make sense to reference an older version. However, I think it should be done to be generic for that version and above. I.e, this link (http://docs.couchbase.com/couchbase-manual-2.5/cb-install/#hostnames-when-upgrading-to-21) should really be "...when upgrading to 2.1+" to indicate that it's not _just_ for upgrading to 2.1, but in fact any version after that.

-On the same page (http://docs.couchbase.com/couchbase-manual-2.5/cb-install/#hostnames-when-upgrading-to-21), the text that links back to the "upgrade" instructions still actually says "2.1" in it which should be 2.5 or just a general "Upgrading to Couchbase Server" without a version

I found all of these examples just by doing a "cmd-f" through the manual looking for 2.0/2.1/2.2/etc and ignoring the ones that are appropriate...there are certainly others in there. It's certainly not a matter of the docs being wildly incorrect, it's just a matter of cleanup to make the end-user's experience clearer and smoother.




[MB-10328] pre tag visible in documentation for upgrade Created: 02/Mar/14  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Matt Ingenthron Assignee: Gwen Leong
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File upgrade doc bug.png    

 Description   
I happened to look at the upgrade docs and noticed that the upgrade commands aren't rendering correctly, since I see the <pre> tag there.

 Comments   
Comment by Ruth Harris [ 04/Apr/14 ]
Saw it in the Upgrading Individual Nodes sections.
More examples of poor conversion from docbook to markdown.
Remove the indenting of the text and either put the hash marks around the code itself or wrap it in <code></code> tags.
Make sure that each listed platform is an h3. "Ubuntu/Debian Linux" is tagged wrong (h2); it should be h3 (three # marks).
Fixed in 2.5.
Comment by Ruth Harris [ 04/Apr/14 ]
Need 2.2, 2.1, 2.0 fixed
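A hedged illustration of the markup fix described above (the heading text is from the comment; the command line and package filename are placeholders, not the manual's actual content):

### Ubuntu/Debian Linux
<code>sudo dpkg -i couchbase-server-enterprise_x86_64.deb</code>   (placeholder filename)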




[MB-9912] Verify notes are formatted correctly on 2.5 doc Created: 14/Jan/14  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Amy Kurtzman Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PDF File couchbase-manual-2.1.0.pdf    

 Description   
When the documentation set was converted from docbook to markdown, callouts such as notes, warnings, tips, and best practices lost their special formatting. The pages on the current documentation website need to be compared with the PDFs from the old site to find the callouts that need to be reformatted.

Start with the Couchbase Server 2.2 manual (and use the 2.1 PDF as the basis for the comparison). The next one should be the Couchbase Developer Guide 2.2. After those, we can decide how far back to take it.

When doing the comparison, in the PDF, please use the annotation feature to mark the problem callouts. That will make it easier to find the places that will need to be changed in the 2.5 documentation.

Please fix the callouts by using the HTML markup for callouts.
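A hedged sketch of what the HTML callout markup might look like (the div and class names here are assumptions; the actual markup used by the docs toolchain may differ):

<div class="notebox"> <!-- class name is an assumption -->
  <p><strong>Note:</strong> callout text goes here.</p>
</div>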

 Comments   
Comment by Gwen Leong [ 14/Jan/14 ]
Hi Ruth,

This is my newest, complete set of finds on annotations and gray boxes to be formatted.

- Gwen




[MB-10536] Release Notes for ElasticSearch Plugin update Created: 21/Mar/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Add Release Notes for ElasticSearch Plugin update

 Comments   
Comment by Anil Kumar [ 25/Mar/14 ]
Ruth - work with Maria to get the fixed and known issues lists for the ElasticSearch release notes.




[MB-10221] Incorrect logs location for Windows and Mac OS installation on 2.X manuals Created: 14/Feb/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0, 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Ketaki Gangal Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged

 Description   
Current log location from http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#troubleshooting

for Mac on doc : ~/Library/Logs
Correct location : /Users/couchbase/Library/Application Support/Couchbase/var/lib/couchbase/logs

for Windows on doc: C:\Program Files\Couchbase\Server\log
Correct location : C:\Program Files\Couchbase\Server\var\lib\couchbase\logs
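For quick verification, hedged examples of listing the corrected locations (paths exactly as given above):

Mac OS X: ls "/Users/couchbase/Library/Application Support/Couchbase/var/lib/couchbase/logs"
Windows:  dir "C:\Program Files\Couchbase\Server\var\lib\couchbase\logs"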




[MB-10300] [DOC] XDCR SSL uses only internally created self-signed certificate Created: 25/Feb/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We need to clarify that the certificate provided for XDCR encryption currently works only with the internally generated self-signed certificate and not with an externally supplied certificate.




[MB-8979] bucket creation REST API incorrectly complains on valid fields Created: 27/Aug/13  Updated: 18/Apr/14  Resolved: 18/Apr/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Artem Stemkovski
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
[root@cb1 ~]# /opt/couchbase/bin/couchbase-cli bucket-create -c localhost:8091 -u Administrator -p password --bucket=beer_backup2 --bucket-ramsize=1000 --bucket-port=11211 --bucket-type=couchbase --bucket-replica=1
ERROR: unable to bucket-create; please check your username (-u) and password (-p); (400) Bad Request
{u'errors': {u'replicaNumber': u'Warning, you do not have enough servers to support this number of replicas.', u'name': u'Bucket with given name already exists', u'ramQuotaMB': u'RAM quota specified is too large to be provisioned into this cluster.'}, u'summaries': {u'ramSummary': {u'thisUsed': 0, u'thisAlloc': 1048576000, u'otherBuckets': 1153433600, u'nodesCount': 1, u'free': -1021313024, u'perNodeMegs': 1000, u'total': 1180696576}, u'hddSummary': {u'thisUsed': 0, u'total': 5887008768, u'otherBuckets': 16891451, u'free': 2295933420, u'otherData': 3574183897}}}
ERROR: Warning, you do not have enough servers to support this number of replicas.
ERROR: Bucket with given name already exists
ERROR: RAM quota specified is too large to be provisioned into this cluster.
[root@cb1 ~]#


This should just be because the bucket already exists, but the command returns all errors, which makes it hard to understand which one is the actual problem. I've observed this same behavior with other types of errors.
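For context, a hedged sketch of the underlying REST call that the CLI wraps (the endpoint is named in the comments below; the parameters mirror the CLI flags and the field names in the error response):

curl -u Administrator:password http://localhost:8091/pools/default/buckets \
  -d name=beer_backup2 -d ramQuotaMB=1000 -d replicaNumber=1 \
  -d authType=none -d proxyPort=11211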

 Comments   
Comment by Bin Cui [ 23/Oct/13 ]
ERROR: Warning, you do not have enough servers to support this number of replicas.
ERROR: Bucket with given name already exists
ERROR: RAM quota specified is too large to be provisioned into this cluster.

The above errors are returned from REST API call /pools/default/buckets. Anything can we do from ns_server side to narrow down the specific error?
Comment by Aleksey Kondratenko [ 23/Oct/13 ]
I disagree about narrowing. Why would we do that ?
Comment by Bin Cui [ 23/Oct/13 ]
That's exactly what Perry asks for.
Comment by Aleksey Kondratenko [ 23/Oct/13 ]
Disagree that it's bug.

We return everything that will prevent you from creating bucket. Why it's not right thing to do?
Comment by Perry Krug [ 24/Oct/13 ]
It's actually not quite that clear.

In the above request, the actual error is that the bucket still exists. However:
-There are enough servers to support this number of replicas
-The RAM quota is not too large to be provisioned in
-Changing just the name of the bucket lets the request go through

Here's another example. Create a single-node with just the default bucket at 100mb and plenty of RAM. Then I ran this:
[root@cb1 ~]# /opt/couchbase/bin/couchbase-cli bucket-create -c localhost:8091 -u Administrator -p password --bucket=default --bucket-ramsize=1000 --bucket-port=11211 --bucket-type=couchbase --bucket-replica=1
ERROR: unable to bucket-create; please check your username (-u) and password (-p); (400) Bad Request
{u'errors': {u'replicaNumber': u'Warning, you do not have enough servers to support this number of replicas.', u'name': u'Bucket with given name already exists'}, u'summaries': {u'ramSummary': {u'thisUsed': 0, u'thisAlloc': 1048576000, u'otherBuckets': 104857600, u'nodesCount': 1, u'free': 27262976, u'perNodeMegs': 1000, u'total': 1180696576}, u'hddSummary': {u'thisUsed': 0, u'total': 5887008768, u'otherBuckets': 19033557, u'free': 2354803508, u'otherData': 3513171703}}}
ERROR: Warning, you do not have enough servers to support this number of replicas.
ERROR: Bucket with given name already exists
[root@cb1 ~]# /opt/couchbase/bin/couchbase-cli bucket-create -c localhost:8091 -u Administrator -p password --bucket=default2 --bucket-ramsize=1000 --bucket-port=11211 --bucket-type=couchbase --bucket-replica=1
SUCCESS: bucket-create
[root@cb1 ~]# /opt/couchbase/bin/couchbase-cli bucket-create -c localhost:8091 -u Administrator -p password --bucket=default2 --bucket-ramsize=1000 --bucket-port=11211 --bucket-type=couchbase --bucket-replica=1
ERROR: unable to bucket-create; please check your username (-u) and password (-p); (400) Bad Request
{u'errors': {u'replicaNumber': u'Warning, you do not have enough servers to support this number of replicas.', u'name': u'Bucket with given name already exists', u'ramQuotaMB': u'RAM quota specified is too large to be provisioned into this cluster.'}, u'summaries': {u'ramSummary': {u'thisUsed': 0, u'thisAlloc': 1048576000, u'otherBuckets': 1153433600, u'nodesCount': 1, u'free': -1021313024, u'perNodeMegs': 1000, u'total': 1180696576}, u'hddSummary': {u'thisUsed': 0, u'total': 5887008768, u'otherBuckets': 19033595, u'free': 2354803508, u'otherData': 3513171665}}}
ERROR: Warning, you do not have enough servers to support this number of replicas.
ERROR: Bucket with given name already exists
ERROR: RAM quota specified is too large to be provisioned into this cluster.
[root@cb1 ~]#

You see that the first failure actually complains about the username/password being wrong (they're correct). The second time I changed the bucket name and it succeeded perfectly. Running a 3rd time with the new bucket name generates the same failures, which are not correct...that's the bug.
Comment by Aleksey Kondratenko [ 24/Oct/13 ]
Thanks. Noted.
Comment by Artem Stemkovski [ 17/Apr/14 ]
I fixed the bogus "check username" message here: http://review.couchbase.org/35983/

Though I do not see other error messages appearing without a proper cause.




[MB-10861] test in 3.0 took more than 10X the time to finish compared to 2.5.1 Created: 15/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thuan Nguyen Assignee: Thuan Nguyen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: ubuntu 12.04 64-bit

Triage: Triaged
Is this a Regression?: Unknown

 Description   
Test to run on one vm
castest.opschangecas.OpsChangeCasTests.touch_test,value_size=256

Modify active_resident_threshold to 30
Change timeout to 7200 (since it will fail in 3.0 because the test runs for more than 90 minutes)

In 2.5.1-1073, it takes about 8 minutes to finish
In 3.0.0-580, it takes about 100 minutes to finish
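A hedged sketch of how such a run might be invoked (the ini filename is a placeholder, and passing active_resident_threshold as a test parameter is an assumption; the timeout may need to be raised elsewhere):

./testrunner -i single_node.ini -t castest.opschangecas.OpsChangeCasTests.touch_test,value_size=256,active_resident_threshold=30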

 Comments   
Comment by Chiyoung Seo [ 15/Apr/14 ]
Tony,

I'm not sure if I understand this issue correctly or not. Can you provide more details (e.g., test scenarios) for this issue?
Comment by Chiyoung Seo [ 15/Apr/14 ]
Can you please debug this issue more? On the engine side, it will take a lot of time to debug this issue without more details regarding the test cases.




[MB-10457] XDCR: Some docs not replicated after deletion and recreation of destination bucket Created: 13/Mar/14  Updated: 18/Apr/14

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket, cross-datacenter-replication
Affects Version/s: 3.0, 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Sriram Ganesan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File cbcollect_MB10457.tar     Text File diff.txt     Text File missing_items_vb_info.txt     PNG File Screen Shot 2014-03-13 at 11.58.43 AM.png     PNG File Screen Shot 2014-04-10 at 5.04.01 PM.png     PNG File Screen Shot 2014-04-10 at 5.05.43 PM.png    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: No

 Description   
Build
--------
3.0.0-432

Scenario
--------------
1. Setup two 2node clusters, 1 default bucket on each cluster, bi-dir replication between them. Start loading 1000 docs.
2. Pause replication on either sides, delete destination bucket and recreate it. Do not create replication to source cluster. No workload on dest cluster.
3. Resume replication from source cluster and wait for replication to end.

Item count on source = 10000, on dest = 9990, no xdcr activity seen for 10 mins and longer.

Reproducible
------------------
Consistently reproducible with -
./testrunner -i bixdcr.ini -t xdcr.pauseResumeXDCR.PauseResumeTest.replication_with_pause_and_resume,items=10000,delete_bucket=destination,replication_type=xmem,pause=source

Observations
------------------
1. This issue is only seen in scenarios where there is no workload on source after resume and the source cluster is only replicating what it already has in memory. After resume when there's still workload happening, all docs are replicated.
2. Not sure if this can be seen with plain xdcr without pause/resume. Will try and update the issue.
3. Data replication to the destination is spiky and slow (screenshot attached). Although there is no data load on the source, outbound mutations are seen in spurts (like spikes), which wakes up the replicators in spurts (which is justified), and this is reflected in incoming_ops on the destination cluster. So here it doesn't appear to be a problem with the replicators themselves. My question here is: is it OK to see spiky outbound mutations when there is no data load?

Attaching cbcollect logs.
.186, .187 --> source cluster
.188, .189 --> destination cluster


 Comments   
Comment by Aleksey Kondratenko [ 13/Mar/14 ]
I expect latest code (with beginnings of MB-10057 fix in 3.0) to address that case
Comment by Aleksey Kondratenko [ 13/Mar/14 ]
Latest code was merged yesterday and therefore is not part of build you have
Comment by Aruna Piravi [ 13/Mar/14 ]
Alright, will test when we have a build. Thanks.
Comment by Aruna Piravi [ 20/Mar/14 ]
Tested in build: 462 (has the above commit), tested thrice, item counts never matched even once.

From last run:

Src cluster item count : 80000
Dest cluster item count: 75463

More details:
Cbcollect - https://s3.amazonaws.com/bugdb/jira/MB-10457/cbcollect.tar.gz
Items not replicated - https://s3.amazonaws.com/bugdb/jira/MB-10457/diff.txt
Data files -
[Source]
 https://s3.amazonaws.com/bugdb/jira/MB-10457/186_data.tar.gz
 https://s3.amazonaws.com/bugdb/jira/MB-10457/187_data.tar.gz
[Destination]
 https://s3.amazonaws.com/bugdb/jira/MB-10457/188_data.tar.gz
 https://s3.amazonaws.com/bugdb/jira/MB-10457/189_data.tar.gz
Comment by Aruna Piravi [ 20/Mar/14 ]
Pls note that in the above case, data load continued in source cluster even when destination bucket was deleted and recreated and even after that. This is contrary to my previous observation.
Comment by Aleksey Kondratenko [ 09/Apr/14 ]
I've added some more diagnostic to try to catch this.

I tried reproducing myself but in few attempts could not hit the bug.

See instructions here: http://review.couchbase.org/35275

on how to enable trace logging for xdcr.
Comment by Aruna Piravi [ 10/Apr/14 ]
Consistently reproducible with -
./testrunner -i bixdcr.ini -t xdcr.pauseResumeXDCR.PauseResumeTest.replication_with_pause_and_resume,items=10000,delete_bucket=destination,replication_type=xmem,pause=source

Applied trace logging for all 4 nodes.
Source item count : 10000
Destination item count : 6732

Seeing many xdcr errors on the source -

[xdcr:error,2014-04-10T16:20:33.918,ns_1@10.3.4.186:<0.3949.1>:xdc_vbucket_rep:handle_info:118]Error initializing vb replicator ({init_state,
                                   {rep,
                                    <<"c021d5d81e64e1be37baa884b8b2f716/default/default">>,
                                    <<"default">>,
"xdcr_errors.1" 6761L, 459636C
                                    [{max_concurrent_reps,32},
                                     {checkpoint_interval,1800},
                                     {doc_batch_size_kb,2048},
                                     {failure_restart_interval,1},
                                     {worker_batch_size,500},
                                     {connection_timeout,180},
                                     {worker_processes,4},
                                     {http_connections,20},
                                     {retries_per_request,2},
                                     {optimistic_replication_threshold,256},
                                     {socket_options,
                                      [{keepalive,true},{nodelay,false}]},
                                     {pause_requested,false},
                                     {supervisor_max_r,25},
                                     {supervisor_max_t,5}]},
                                   443,"xmem",<0.20748.1>,<0.20749.1>,
                                   <0.20745.1>}):{error,function_clause}
[xdcr:error,2014-04-10T16:27:04.954,ns_1@10.3.4.186:<0.26774.2>:xdc_vbucket_rep:terminate:507]Shutting xdcr vb replicator ({init_state,
                              {rep,
                               <<"c021d5d81e64e1be37baa884b8b2f716/default/default">>,
                               <<"default">>,
                               <<"/remoteClusters/c021d5d81e64e1be37baa884b8b2f716/buckets/default">>,
                               "xmem",
                               [{max_concurrent_reps,32},
                                {checkpoint_interval,1800},
                                {doc_batch_size_kb,2048},
                                {failure_restart_interval,1},
                                {worker_batch_size,500},
                                {connection_timeout,180},
                                {worker_processes,4},
                                {http_connections,20},
                                {retries_per_request,2},
                                {optimistic_replication_threshold,256},
                                {socket_options,
                                 [{keepalive,true},{nodelay,false}]},
                                {pause_requested,false},
                                {supervisor_max_r,25},
                                {supervisor_max_t,5}]},

Attaching cbcollect info. [ Source : .186, .187 Destination: .188,.189] and missing items info.
Comment by Aruna Piravi [ 10/Apr/14 ]
Pls note in the latest screenshots uploaded - of the first 3277 items (from a query sorted by key), only the following 10 keys have been replicated. All keys from "loadOne3278" onward have been replicated.

{"id":"loadOne1801","key":"loadOne1801","partition":772,"node":"http://10.3.4.189:8092/_view_merge/","value":"1-000d37b2b9926ca90000000000000000"},
{"id":"loadOne1991","key":"loadOne1991","partition":772,"node":"http://10.3.4.189:8092/_view_merge/","value":"1-000d37b2f1b940340000000000000000"},
{"id":"loadOne3263","key":"loadOne3263","partition":206,"node":"local","value":"1-000d37b55fd6f78f0000000000000000"},
{"id":"loadOne3265","key":"loadOne3265","partition":429,"node":"local","value":"1-000d37b5603634610000000000000000"},
{"id":"loadOne3266","key":"loadOne3266","partition":164,"node":"local","value":"1-000d37b56058dbb60000000000000000"},
{"id":"loadOne3269","key":"loadOne3269","partition":27,"node":"local","value":"1-000d37b560d0ca6a0000000000000000"},
{"id":"loadOne3270","key":"loadOne3270","partition":220,"node":"local","value":"1-000d37b560f1ca630000000000000000"},
{"id":"loadOne3273","key":"loadOne3273","partition":469,"node":"local","value":"1-000d37b561a839a50000000000000000"},
{"id":"loadOne3275","key":"loadOne3275","partition":182,"node":"local","value":"1-000d37b56233f3760000000000000000"},
{"id":"loadOne3276","key":"loadOne3276","partition":447,"node":"local","value":"1-000d37b5624c68f60000000000000000"},
Comment by Aleksey Kondratenko [ 11/Apr/14 ]
I do see that xdcr replicates the missing docs. At least it attempts to. So it's either an interesting bug in ep-engine or an xmem outbound-path bug.

Can you please try exact same test against 2.5.1 ?
Comment by Aruna Piravi [ 12/Apr/14 ]
I remember seeing this with CAPI too. Will try against 2.5.1 and upload logs.
Comment by Aleksey Kondratenko [ 12/Apr/14 ]
In that case please do some test runs with CAPI too. Preferably all permutations of 3.0/2.5.1 and xmem/capi.
Comment by Aruna Piravi [ 12/Apr/14 ]
ok, in which case will cbcollect alone suffice?
Comment by Aleksey Kondratenko [ 12/Apr/14 ]
>> ok, in which case will cbcollect alone suffice?

In this case just knowing in which cases we can hit the bug should be more or less sufficient.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Waiting results still
Comment by Aruna Piravi [ 17/Apr/14 ]
On my queue next. Will get you results in a bit.
Comment by Aruna Piravi [ 17/Apr/14 ]
Seen with CAPI in 3.0. Missing items are far fewer than with XMEM though. Source: 10000, destination: 9916.
Let me now test with 2.5.1.
Comment by Aruna Piravi [ 17/Apr/14 ]
Reproducible in 2.5.1 with both CAPI and XMEM.

CAPI : Source:10000 Dest: 9918
XMEM: Source:10000 Dest: 9738
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Ok. It makes it more likely to be ep-engine bug.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Given that there's evidence that xdcr did replicate the missing elements to ep-engine, it does look like some bug in ep-engine.




[MB-10875] Flusher queue doesn't get flushed Created: 16/Apr/14  Updated: 18/Apr/14

Status: In Progress
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Aruna Piravi Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: ep-engine
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS, 64-bit, build 585, seen in builds as old as 555.

Attachments: Zip Archive 10.3.4.186-4162014-1742-diag.zip     Zip Archive 10.3.4.187-4162014-1744-diag.zip     Zip Archive 10.3.4.188-4162014-1753-diag.zip     Zip Archive 10.3.4.189-4162014-1755-diag.zip     PNG File Disk_Write.png     PNG File Screen Shot 2014-04-16 at 5.11.17 PM.png    
Issue Links:
Duplicate
is duplicated by MB-10824 1 replica item gets stuck in disk wri... Closed
Triage: Triaged
Is this a Regression?: Yes

 Description   
Scenario
--------------
- flusher queue always has a few items and never goes to zero even after minutes of waiting
- seen both on cluster_run and servers on vms
- reproducible consistently with
./testrunner -i cluster_run.ini -t xdcr.uniXDCR.unidirectional.load_with_async_ops,items=1000,rdirection=unidirection,ctopology=chain,doc-ops=delete-delete
- seen with 1000, 100 items
- not seen in some cases like
./testrunner -i cluster_run.ini -t xdcr.uniXDCR.unidirectional.load_with_ops,replicas=1,items=10000,value_size=128,ctopology=chain,rdirection=unidirection,doc-ops=update-delete
- consistently seen after warmup
- all tests that wait for final verification after drain queue size becomes 0 are timing out.

Setup
---------
Source cluster : 10.3.4.186, 10.3.4.187
Destination : 10.3.4.188, 10.3.4.189

setup live at 10.3.4.187:8091, default login credentials for SSH and GUI.
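One hedged way to watch the symptom directly on a node (stat names are taken from the cbstats output below; the loop and interval are illustrative):

while true; do /opt/couchbase/bin/cbstats localhost:11210 all | grep -E 'ep_queue_size|ep_diskqueue_items|ep_flusher_todo'; sleep 5; done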


GDB info on .187
----------------------------
Thread 13 (Thread 0x7effd79ad700 (LWP 4381)):
#0 0x00007effd904174d in read () from /lib64/libc.so.6
#1 0x00007effd8fd7fe8 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x00007effd8fd9aee in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x00007effd8fce1ca in _IO_getline_info_internal () from /lib64/libc.so.6
#4 0x00007effd8fcd029 in fgets () from /lib64/libc.so.6
#5 0x00007effd79ae8b1 in check_stdin_thread (arg=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/extensions/daemon/stdin_check.c:38
#6 0x00007effdb0f7b6f in platform_thread_wrap (arg=0x29d4070) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#7 0x00007effd9ead9d1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007effd904eb6d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7effd6d98700 (LWP 4382)):
#0 0x00007effd9eb198e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007effdb0f78eb in cb_cond_timedwait (cond=0x7effd6face60, mutex=0x7effd6face20, ms=<value optimized out>) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:156
#2 0x00007effd6d9c548 in logger_thead_main (arg=0x2a14b00) at /home/buildbot/centos-5-x64-300-builder/build/build/memcached/extensions/loggers/file_logger.c:372
#3 0x00007effdb0f7b6f in platform_thread_wrap (arg=0x29d4080) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007effd9ead9d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007effd904eb6d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7effd618a700 (LWP 4383)):
#0 0x00007effd904f163 in epoll_wait () from /lib64/libc.so.6
#1 0x00007effda68e376 in epoll_dispatch (base=0xbd60500, tv=<value optimized out>) at epoll.c:404
#2 0x00007effda679c44 in event_base_loop (base=0xbd60500, flags=<value optimized out>) at event.c:1558
#3 0x00007effdb0f7b6f in platform_thread_wrap (arg=0x29d4190) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007effd9ead9d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007effd904eb6d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7effd5789700 (LWP 4384)):
#0 0x00007effd904f163 in epoll_wait () from /lib64/libc.so.6
#1 0x00007effda68e376 in epoll_dispatch (base=0xbd60280, tv=<value optimized out>) at epoll.c:404
#2 0x00007effda679c44 in event_base_loop (base=0xbd60280, flags=<value optimized out>) at event.c:1558
#3 0x00007effdb0f7b6f in platform_thread_wrap (arg=0x29d4180) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007effd9ead9d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007effd904eb6d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7effd4d88700 (LWP 4385)):
#0 0x00007effd904f163 in epoll_wait () from /lib64/libc.so.6
#1 0x00007effda68e376 in epoll_dispatch (base=0xbd60c80, tv=<value optimized out>) at epoll.c:404
#2 0x00007effda679c44 in event_base_loop (base=0xbd60c80, flags=<value optimized out>) at event.c:1558
#3 0x00007effdb0f7b6f in platform_thread_wrap (arg=0x29d4170) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007effd9ead9d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007effd904eb6d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7effd4387700 (LWP 4386)):
#0 0x00007effd904f163 in epoll_wait () from /lib64/libc.so.6
#1 0x00007effda68e376 in epoll_dispatch (base=0xbd60a00, tv=<value optimized out>) at epoll.c:404
#2 0x00007effda679c44 in event_base_loop (base=0xbd60a00, flags=<value optimized out>) at event.c:1558
#3 0x00007effdb0f7b6f in platform_thread_wrap (arg=0x29d4160) at /home/buildbot/centos-5-x64-300-builder/build/build/platform/src/cb_pthreads.c:19
#4 0x00007effd9ead9d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007effd904eb6d in clone () from /lib64/libc.so.6


[root@centos-64-x64 ~]# /opt/couchbase/bin/cbstats localhost:11210 all
 accepting_conns: 1
 auth_cmds: 6
 auth_errors: 0
 bucket_active_conns: 1
 bucket_conns: 13
 bytes: 35154528
 bytes_read: 1277423
 bytes_written: 300104578
 cas_badval: 0
 cas_hits: 0
 cas_misses: 0
 cmd_flush: 0
 cmd_get: 0
 cmd_set: 500
 conn_yields: 85
 connection_structures: 10500
 curr_connections: 17
 curr_conns_on_port_11207: 2
 curr_conns_on_port_11209: 12
 curr_conns_on_port_11210: 3
 curr_items: 350
 curr_items_tot: 700
 curr_temp_items: 0
 daemon_connections: 6
 decr_hits: 0
 decr_misses: 0
 delete_hits: 150
 delete_misses: 0
 ep_access_scanner_last_runtime: 0
 ep_access_scanner_num_items: 0
 ep_access_scanner_task_time: 2014-04-17 23:30:57
 ep_allow_data_loss_during_shutdown: 1
 ep_alog_block_size: 4096
 ep_alog_path: /opt/couchbase/var/lib/couchbase/data/default/access.log
 ep_alog_sleep_time: 1440
 ep_alog_task_time: 10
 ep_backend: couchdb
 ep_bg_fetch_delay: 0
 ep_bg_fetched: 0
 ep_bg_meta_fetched: 0
 ep_bg_remaining_jobs: 0
 ep_bucket_priority: LOW
 ep_chk_max_items: 5000
 ep_chk_period: 1800
 ep_chk_persistence_remains: 0
 ep_chk_persistence_timeout: 10
 ep_chk_remover_stime: 5
 ep_commit_num: 2378
 ep_commit_time: 0
 ep_commit_time_total: 5599
 ep_config_file:
 ep_conflict_resolution_type: seqno
 ep_couch_bucket: default
 ep_couch_host: 127.0.0.1
 ep_couch_port: 11213
 ep_couch_reconnect_sleeptime: 250
 ep_couch_response_timeout: 180000
 ep_data_traffic_enabled: 0
 ep_db_data_size: 274320
 ep_db_file_size: 16666010
 ep_dbname: /opt/couchbase/var/lib/couchbase/data/default
 ep_degraded_mode: 0
 ep_diskqueue_drain: 1272
 ep_diskqueue_fill: 1290
 ep_diskqueue_items: 18
 ep_diskqueue_memory: 1296
 ep_diskqueue_pending: 9036
 ep_exp_pager_stime: 3600
 ep_expired_access: 0
 ep_expired_pager: 0
 ep_failpartialwarmup: 0
 ep_flush_all: false
 ep_flush_duration_total: 8
 ep_flushall_enabled: 0
 ep_flusher_state: running
 ep_flusher_todo: 0
 ep_getl_default_timeout: 15
 ep_getl_max_timeout: 30
 ep_ht_locks: 5
 ep_ht_size: 3079
 ep_initfile:
 ep_io_num_read: 0
 ep_io_num_write: 1265
 ep_io_read_bytes: 0
 ep_io_write_bytes: 261510
 ep_item_begin_failed: 0
 ep_item_commit_failed: 0
 ep_item_eviction_policy: value_only
 ep_item_flush_expired: 0
 ep_item_flush_failed: 0
 ep_item_num_based_new_chk: 1
 ep_items_rm_from_checkpoints: 2402
 ep_keep_closed_chks: 0
 ep_kv_size: 235440
 ep_max_bg_remaining_jobs: 0
 ep_max_checkpoints: 2
 ep_max_failover_entries: 25
 ep_max_item_size: 20971520
 ep_max_num_shards: 4
 ep_max_num_workers: 3
 ep_max_size: 2169503744
 ep_max_threads: 0
 ep_max_vbuckets: 1024
 ep_mem_high_wat: 1844078182
 ep_mem_low_wat: 1627127808
 ep_mem_tracker_enabled: true
 ep_meta_data_memory: 46090
 ep_mlog_compactor_runs: 0
 ep_mutation_mem_threshold: 95
 ep_num_access_scanner_runs: 0
 ep_num_eject_failures: 0
 ep_num_expiry_pager_runs: 2
 ep_num_non_resident: 0
 ep_num_not_my_vbuckets: 0
 ep_num_ops_del_meta: 0
 ep_num_ops_del_meta_res_fail: 0
 ep_num_ops_del_ret_meta: 0
 ep_num_ops_get_meta: 0
 ep_num_ops_get_meta_on_set_meta: 0
 ep_num_ops_set_meta: 0
 ep_num_ops_set_meta_res_fail: 0
 ep_num_ops_set_ret_meta: 0
 ep_num_pager_runs: 0
 ep_num_value_ejects: 0
 ep_num_workers: 4
 ep_oom_errors: 0
 ep_overhead: 27466374
 ep_pager_active_vb_pcnt: 40
 ep_pending_compactions: 0
 ep_pending_ops: 0
 ep_pending_ops_max: 0
 ep_pending_ops_max_duration: 0
 ep_pending_ops_total: 0
 ep_postInitfile:
 ep_queue_size: 18
 ep_rollback_count: 0
 ep_startup_time: 1397691056
 ep_storage_age: 0
 ep_storage_age_highwat: 1
 ep_tap_ack_grace_period: 300
 ep_tap_ack_initial_sequence_number: 1
 ep_tap_ack_interval: 1000
 ep_tap_ack_window_size: 10
 ep_tap_backfill_resident: 0.9
 ep_tap_backlog_limit: 5000
 ep_tap_backoff_period: 5
 ep_tap_bg_fetch_requeued: 0
 ep_tap_bg_fetched: 0
 ep_tap_bg_max_pending: 500
 ep_tap_keepalive: 300
 ep_tap_noop_interval: 20
 ep_tap_requeue_sleep_time: 0.1
 ep_tap_throttle_cap_pcnt: 10
 ep_tap_throttle_queue_cap: 1000000
 ep_tap_throttle_threshold: 90
 ep_tmp_oom_errors: 0
 ep_total_cache_size: 226690
 ep_total_del_items: 265
 ep_total_enqueued: 1290
 ep_total_new_items: 965
 ep_total_persisted: 1230
 ep_uncommitted_items: 0
 ep_uuid: 28b4c2f6d709668a060c9c6489f4003d
 ep_value_size: 189350
 ep_vb0: 0
 ep_vb_snapshot_total: 1704
 ep_vb_total: 1024
 ep_vbucket_del: 512
 ep_vbucket_del_avg_walltime: 67098
 ep_vbucket_del_fail: 0
 ep_vbucket_del_max_walltime: 2666206
 ep_version: 2.1.1r-602-g0e4754a
 ep_waitforwarmup: 0
 ep_warmup: 1
 ep_warmup_batch_size: 1000
 ep_warmup_dups: 0
 ep_warmup_min_items_threshold: 100
 ep_warmup_min_memory_threshold: 100
 ep_warmup_oom: 0
 ep_warmup_thread: complete
 ep_warmup_time: 461934
 get_hits: 0
 get_misses: 0
 incr_hits: 0
 incr_misses: 0
 libevent: 2.0.11-stable
 listen_disabled_num: 0
 max_conns_on_port_11207: 10000
 max_conns_on_port_11209: 1000
 max_conns_on_port_11210: 10000
 mem_used: 35154528
 memcached_version: 2.0.1-macosx-171-g493f088
 pid: 4380
 pointer_size: 64
 rejected_conns: 0
 rusage_system: 265.075702
 rusage_user: 860.965113
 tap_checkpoint_end_received: 335
 tap_checkpoint_end_sent: 322
 tap_checkpoint_start_received: 847
 tap_checkpoint_start_sent: 1346
 tap_connect_received: 2
 tap_delete_received: 150
 tap_delete_sent: 150
 tap_mutation_received: 500
 tap_mutation_sent: 500
 tap_opaque_received: 1026
 tap_opaque_sent: 1028
 threads: 4
 time: 1397698341
 total_connections: 99
 uptime: 7316
 vb_active_curr_items: 350
 vb_active_eject: 0
 vb_active_expired: 0
 vb_active_ht_memory: 12849152
 vb_active_itm_memory: 113345
 vb_active_meta_data_memory: 23045
 vb_active_num: 512
 vb_active_num_non_resident: 0
 vb_active_ops_create: 468
 vb_active_ops_delete: 118
 vb_active_ops_reject: 0
 vb_active_ops_update: 0
 vb_active_perc_mem_resident: 100
 vb_active_queue_age: 115921000
 vb_active_queue_drain: 625
 vb_active_queue_fill: 641
 vb_active_queue_memory: 1152
 vb_active_queue_pending: 8062
 vb_active_queue_size: 16
 vb_dead_num: 0
 vb_pending_curr_items: 0
 vb_pending_eject: 0
 vb_pending_expired: 0
 vb_pending_ht_memory: 0
 vb_pending_itm_memory: 0
 vb_pending_meta_data_memory: 0
 vb_pending_num: 0
 vb_pending_num_non_resident: 0
 vb_pending_ops_create: 0
 vb_pending_ops_delete: 0
 vb_pending_ops_reject: 0
 vb_pending_ops_update: 0
 vb_pending_perc_mem_resident: 0
 vb_pending_queue_age: 0
 vb_pending_queue_drain: 0
 vb_pending_queue_fill: 0
 vb_pending_queue_memory: 0
 vb_pending_queue_pending: 0
 vb_pending_queue_size: 0
 vb_replica_curr_items: 350
 vb_replica_eject: 0
 vb_replica_expired: 0
 vb_replica_ht_memory: 12849152
 vb_replica_itm_memory: 113345
 vb_replica_meta_data_memory: 23045
 vb_replica_num: 512
 vb_replica_num_non_resident: 0
 vb_replica_ops_create: 497
 vb_replica_ops_delete: 147
 vb_replica_ops_reject: 0
 vb_replica_ops_update: 0
 vb_replica_perc_mem_resident: 100
 vb_replica_queue_age: 14490000
 vb_replica_queue_drain: 647
 vb_replica_queue_fill: 649
 vb_replica_queue_memory: 144
 vb_replica_queue_pending: 974
 vb_replica_queue_size: 2
 version: 3.0.0-585-rel


Attaching cbcollect info

 Comments   
Comment by Pavel Paulau [ 16/Apr/14 ]
MB-10824?
Comment by Aruna Piravi [ 16/Apr/14 ]
Yeah, looks like it.
Comment by Aruna Piravi [ 17/Apr/14 ]
I don't mind this being closed as a duplicate if that's what it looks like to Sundar too. But I have the system running if either Sundar or Abhinav wants to take a look. Either way, please consider this issue or MB-10824 a test blocker, since most of our tests are failing consistently.
Comment by Pavel Paulau [ 17/Apr/14 ]
Likewise, I have a live cluster with this issue.

Agree with blocker status as well.

So up to Sundar/Abhinav...

Comment by Pavel Paulau [ 17/Apr/14 ]
I closed my ticket as duplicate.
It's too expensive to keep my cluster for debugging.
Comment by Sundar Sridharan [ 17/Apr/14 ]
Looks like this could be a symptom of MB-10539.
Can we verify this with a build that has the fix for the above?
Thanks
Comment by Pavel Paulau [ 17/Apr/14 ]
It happened to me in a build with the fixes from MB-10539.
Comment by Aruna Piravi [ 17/Apr/14 ]
This test was on 3.0.0-585, which contains this fix - http://builder.hq.couchbase.com/#/buildinfo/couchbase-server-enterprise_x86_64_3.0.0-585-rel.rpm
Comment by Sundar Sridharan [ 17/Apr/14 ]
We found this to be a stat-only issue. It can be seen that even when the disk queue does not go back down, the disk updates per second do show movement, indicating that items are getting flushed to disk. Thanks
Comment by Meenakshi Goel [ 18/Apr/14 ]
Seeing similar failures during CAS value manipulation tests, with 1 item getting stuck in the disk write queue.
Build tested - 3.0.0-590-rel. Please refer to the attached screenshot "Disk_Write.png".
http://qa.hq.northscale.net/job/centos_x64--28_01--cas-P1/17/console




[MB-10879] Rebalance fails sporadically on employee dataset test (make simple-test) Created: 17/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown




[MB-10876] Items do not seem to be getting purged from ep-engine after expiry Created: 17/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Meenakshi Goel Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-587-rel

Attachments: Text File cbstats.txt     Text File Stats.txt     Text File ViewQuery.txt    
Triage: Triaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Yes

 Description   
Items do not seem to be getting purged from ep-engine even after expiry.

Jenkins Job Link:
http://qa.hq.northscale.net/job/centos_x64--44_02--replica_read_tests-P0/41/consoleFull

Test to Reproduce:
./testrunner -i /tmp/replica_read.ini get-logs=True,wait_timeout=180,GROUP=P0,get-cbcollect-info=True,get-delays=true -t newmemcapable.GetrTests.getr_test,nodes_init=4,GROUP=P0,expiration=60,wait_expiration=true,error=Not found for vbucket,descr=#simple getr replica_count=1 expiration=60 flags = 0 docs_ops=create cluster ops = None

Logs:
2014-04-16 22:43:27 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items 1000 == 0 expected on '172.23.105.230:8091''172.23.105.231:8091''172.23.105.232:8091''172.23.105.245:8091', default bucket
2014-04-16 22:43:27 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_active_curr_items 1000 == 0 expected on '172.23.105.230:8091''172.23.105.231:8091''172.23.105.232:8091''172.23.105.245:8091', default bucket
2014-04-16 22:43:28 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 1000 == 0 expected on '172.23.105.230:8091''172.23.105.231:8091''172.23.105.232:8091''172.23.105.245:8091', default bucket
2014-04-16 22:43:28 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: curr_items_tot 2000 == 0 expected on '172.23.105.230:8091''172.23.105.231:8091''172.23.105.232:8091''172.23.105.245:8091', default bucket

/opt/couchbase/bin/couch_dbinfo master.couch.1
DB Info (master.couch.1) - header at 12288
   file format version: 11
   update_seq: 3
   no documents
   B-tree size: 116 bytes
   total disk size: 12.1 kB

cbstats:
 /opt/couchbase/bin/cbstats 172.23.105.230:11210 all | grep items
 curr_items: 249
 curr_items_tot: 510
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 5000
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 1038
 ep_total_del_items: 0
 ep_total_new_items: 510
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 249
 vb_pending_curr_items: 0
 vb_replica_curr_items: 261
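For reference, a minimal sketch (not part of the original report) of how this expiry check could be automated by polling cbstats until curr_items_tot drops to 0. The node address, poll interval, and timeout below are assumptions, not values from the test:

# Hypothetical helper: poll cbstats until all items have been purged after expiry.
import subprocess
import time

NODES = ["172.23.105.230:11210"]        # assumed node under test; adjust as needed
CBSTATS = "/opt/couchbase/bin/cbstats"  # path as used elsewhere in this ticket
TIMEOUT_SECS = 180                      # mirrors wait_timeout=180 from the test config

def curr_items_tot(node):
    """Return curr_items_tot as reported by `cbstats <node> all`."""
    out = subprocess.check_output([CBSTATS, node, "all"]).decode()
    for line in out.splitlines():
        key, _, value = line.strip().partition(":")
        if key.strip() == "curr_items_tot":
            return int(value)
    raise RuntimeError("curr_items_tot not found for %s" % node)

deadline = time.time() + TIMEOUT_SECS
while time.time() < deadline:
    counts = {node: curr_items_tot(node) for node in NODES}
    if all(count == 0 for count in counts.values()):
        print("all items purged:", counts)
        break
    print("still waiting:", counts)
    time.sleep(5)
else:
    raise SystemExit("items were not purged within %d seconds" % TIMEOUT_SECS)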

Notes:
Please refer to the attached Stats.txt.
Created a view and attached the ViewQuery logs in case they help.
Uploading logs.

 Comments   
Comment by Meenakshi Goel [ 17/Apr/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-10876/7f6c45fe/172.23.105.230-011.zip
https://s3.amazonaws.com/bugdb/jira/MB-10876/0d3b37e0/172.23.105.231-012.zip
https://s3.amazonaws.com/bugdb/jira/MB-10876/e0c9457f/172.23.105.232-012.zip
https://s3.amazonaws.com/bugdb/jira/MB-10876/13f68e9c/172.23.105.245-014.zip




[MB-10856] Persistence and internal replication (TAP) are broken Created: 14/Apr/14  Updated: 18/Apr/14

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive 10.1.2.12-4172014-146-diag.zip     Zip Archive 10.1.3.93-4172014-1356-diag.zip     Zip Archive 10.1.3.94-4172014-1359-diag.zip     Zip Archive 10.1.3.95-4172014-143-diag.zip     Zip Archive 10.1.3.96-4172014-148-diag.zip     Zip Archive 10.1.3.97-4172014-1411-diag.zip     Zip Archive 10.1.3.99-4172014-1413-diag.zip    
Triage: Untriaged
Is this a Regression?: Unknown

 Comments   
Comment by Aleksey Kondratenko [ 14/Apr/14 ]
I've merged the diagnostics improvement that we discussed here: http://review.couchbase.org/35715
Comment by Mike Wiederhold [ 14/Apr/14 ]
Alk,

I re-ran the test and this time I saw that the deletes were not sent through UPR, but the notifier connection was not registered for the vbucket that didn't send its deletes. As a result, ns_server appears to have never received them, because it never asked. I can run through this with you tomorrow.
Comment by Aleksey Kondratenko [ 15/Apr/14 ]
Sure. Feel free to NMI me any time.
Comment by Mike Wiederhold [ 15/Apr/14 ]
http://review.couchbase.org/#/c/35748/
Comment by Aruna Piravi [ 17/Apr/14 ]
Seeing a data mismatch consistently on build 588 with -

./testrunner -i unixdcr.ini -t xdcr.uniXDCR.unidirectional.load_with_async_ops,items=100000,rdirection=unidirection,ctopology=chain,doc-ops=update, delete,sasl_buckets=1,replication_type=xmem

Will attach logs.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
IMHO, given the current rate of changes, reopening is not the most productive way to deal with tickets. But that's IMHO.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Also feel free to pass to me once you have all the data.
Comment by Aruna Piravi [ 17/Apr/14 ]
I'm seeing some weird things.

There are 2 buckets and both have a mismatch; let us just take default.

Source: 70403 (cbstats is consistent with this value)
[root@cen-0401 ~]# /opt/couchbase/bin/cbstats localhost:11211 all|grep curr_items
 curr_items: 70403
Query on source returns only 70236 (http://10.1.3.93:8092/default/_design/dev_new/_view/docs?full_set=true&descending=false&stale=false&connection_timeout=60000&limit=100000&skip=0)

Destination : 7000, query returns 7000 items. (http://10.1.3.96:8092/default/_design/dev_new/_view/docs?full_set=true&descending=false&stale=false&connection_timeout=60000&limit=100000&skip=0)

The diff I provide based on the query is not going to be reliable.

The clusters are available for debugging.
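(Illustration only, not from the ticket: a rough sketch of how the cbstats count and the view-query count could be cross-checked side by side. The addresses are the ones quoted above; the cbstats path and the assumption that the view is reachable without auth are mine.)

# Hypothetical cross-check of item counts: memcached curr_items vs. a full-set view query.
import json
import subprocess
import urllib.request

CBSTATS = "/opt/couchbase/bin/cbstats"
NODE = "localhost:11211"  # memcached port used in the cbstats command above
VIEW_URL = ("http://10.1.3.93:8092/default/_design/dev_new/_view/docs"
            "?full_set=true&stale=false&limit=1")

def cbstats_curr_items(node):
    out = subprocess.check_output([CBSTATS, node, "all"]).decode()
    for line in out.splitlines():
        key, _, value = line.strip().partition(":")
        if key.strip() == "curr_items":
            return int(value)
    raise RuntimeError("curr_items not found")

view = json.load(urllib.request.urlopen(VIEW_URL))
print("cbstats curr_items:", cbstats_curr_items(NODE))
print("view total_rows:   ", view["total_rows"])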
Comment by Aruna Piravi [ 17/Apr/14 ]
 Source : http://10.1.3.93:8091/
 Dest : http://10.1.3.96:8091/
Comment by Aruna Piravi [ 17/Apr/14 ]
I reopened this because I did not see any mismatch on a test that had only sets and no deletes in the workload. If it's not related to the deletes, let me know and I will be happy to open another MB.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
In the data files I'm seeing a discrepancy between master and replica vbuckets, even in the source cluster. A lot of it.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Even the counts don't match between master and replica in the source cluster.
Comment by Aruna Piravi [ 17/Apr/14 ]
Is this related to deletes?
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Probably not, but I have yet to finish looking at the data.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Seeing some items _only_ on destination in .couch files:

{"loadOne1944\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="704",
    id="loadOne1944\n",
    rev="2\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="704",
    id="loadOne1944\n",
    rev="2\n">],
 "loadOne38518\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="743",
    id="loadOne38518\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="743",
    id="loadOne38518\n",
    rev="1\n">],
 "loadOne974\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="718",
    id="loadOne974\n",
    rev="2\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="718",
    id="loadOne974\n",
    rev="2\n">],
 "loadOne37948\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="745",
    id="loadOne37948\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="745",
    id="loadOne37948\n",
    rev="1\n">],
 "loadOne922\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="730",
    id="loadOne922\n",
    rev="2\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="730",
    id="loadOne922\n",
    rev="2\n">],
 "loadOne878\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="698",
    id="loadOne878\n",
    rev="2\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="698",
    id="loadOne878\n",
    rev="2\n">],
 "loadOne38529\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="717",
    id="loadOne38529\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="717",
    id="loadOne38529\n",
    rev="1\n">],
 "loadOne38649\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="721",
    id="loadOne38649\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="721",
    id="loadOne38649\n",
    rev="1\n">],
 "loadOne1975\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="746",
    id="loadOne1975\n",
    rev="2\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="746",
    id="loadOne1975\n",
    rev="2\n">],
 "loadOne38558\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="651",
    id="loadOne38558\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.3.96/default",
    vbucket="651",
    id="loadOne38558\n",
    rev="1\n">],
 "loadOne838\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="726",
    id="loadOne838\n",
    rev="2\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="726",
    id="loadOne838\n",
    rev="2\n">],
 "loadOne38799\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.97/default",
    vbucket="701",
    id="loadOne38799\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="701",
    id="loadOne38799\n",
    rev="1\n">],
 "loadOne39842\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="801",
    id="loadOne39842\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.2.12/default",
    vbucket="801",
    id="loadOne39842\n",
    rev="1\n">],
 "loadOne30895\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="483",
    id="loadOne30895\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.3.96/default",
    vbucket="483",
    id="loadOne30895\n",
    rev="1\n">],
 "loadOne39619\n"=>
  [#<struct DocMeta
    dir="dst/10.1.3.99/default",
    vbucket="794",
    id="loadOne39619\n",
    rev="1\n">,
   #<struct DocMeta
    dir="dst/10.1.2.12/default",
    vbucket="794",
    id="loadOne39619\n",
    rev="1\n">]}
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
But at least a large fraction of them is available in memory on the source. So they're in RAM on the source but not on disk on the source.

So both persistence to disk and TAP replication are broken on the source.
Comment by Aruna Piravi [ 17/Apr/14 ]
You were right about filing a separate bug. Is it too late for it? I can copy all your findings.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Ask Mike. I think that creating a new bug is just going to increase confusion.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Without some good tools it's very hard to investigate, due to the obvious mismatch between RAM and disk.

I found a few cases where deletions or non-deletions were not persisted. And not replicated.

Assuming, of course, that nobody messed up the data between the time I grabbed the data files and did the getmetas.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Confirmed just now that there's still a discrepancy between RAM and disk on the source cluster. I don't think it explains the counts mismatch, but it at least complicates digging into XDCR sufficiently to block it. And given that replication and persistence are broken, it's possible that the stats are wrong too.




[MB-10883] Difference in Rev id: Disk vs (Disk+Memory) Output for CBTransfer in CSV mode Created: 17/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 6, version:: 3.0.0-587

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
1. Add a default bucket to a cluster with 1 node
2. Add items to the cluster and wait until the disk queue is 0

Expectation:

The following should dump the same output

From (memory+disk)

/opt/couchbase/bin/cbtransfer http://10.6.2.144:8091 csv:/tmp/default.15f6a721-c688-11e3-8c97-6003089eed5e.csv -b default -u Administrator -p password --single-node

From (disk)

/opt/couchbase/bin/cbtransfer couchstore-files:///opt/couchbase/var/lib/couchbase/data csv:/tmp/default.16d409e6-c688-11e3-94cc-6003089eed5e.csv -b default -u Administrator -p password

See how the rev ids are different

[root@palm-10307 ~]# cat /tmp/default.15f6a721-c688-11e3-8c97-6003089eed5e.csv
id,flags,expiration,cas,value,rev,vbid
dd,0,0,3271998131903809,"{""click"":""to edit"",""new in 2.0"":""there are no reserved field names""}",49,663
[root@palm-10307 ~]# cat /tmp/default.16d409e6-c688-11e3-94cc-6003089eed5e.csv
id,flags,expiration,cas,value,rev,vbid
dd,0,0,3271998131903809,"{""click"":""to edit"",""new in 2.0"":""there are no reserved field names""}",1,663
[root@palm-10307 ~]#

Strangely, for any number of items, the rev ids only ever differ as 49 versus 1.
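As an illustration only (not part of the original report), a minimal sketch of how the two CSV dumps could be diffed by document id to surface the rev mismatch; the file names are the ones from the reproduction above:

# Hypothetical helper: compare rev per doc id across the two cbtransfer CSV dumps.
# Columns are id,flags,expiration,cas,value,rev,vbid as shown above.
import csv

MEMORY_DISK_CSV = "/tmp/default.15f6a721-c688-11e3-8c97-6003089eed5e.csv"  # from the live cluster
DISK_ONLY_CSV = "/tmp/default.16d409e6-c688-11e3-94cc-6003089eed5e.csv"    # from couchstore-files

def load(path):
    with open(path, newline="") as f:
        return {row["id"]: row for row in csv.DictReader(f)}

live, disk = load(MEMORY_DISK_CSV), load(DISK_ONLY_CSV)
for doc_id in sorted(set(live) & set(disk)):
    if live[doc_id]["rev"] != disk[doc_id]["rev"]:
        print("%s: rev %s (memory+disk) vs %s (disk only)"
              % (doc_id, live[doc_id]["rev"], disk[doc_id]["rev"]))
for doc_id in sorted(set(live) ^ set(disk)):
    print("%s: present in only one dump" % doc_id)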






 Comments   
Comment by Bin Cui [ 18/Apr/14 ]
http://review.couchbase.org/#/c/36024/




[MB-10400] XDCR over UPR Created: 07/Mar/14  Updated: 18/Apr/14  Resolved: 17/Apr/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Artem Stemkovski Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: ns_server-story
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The work required for supporting UPR in XDCR

 Comments   
Comment by Aleksey Kondratenko [ 26/Mar/14 ]
Spec: https://docs.google.com/document/d/1hTdABxXHn1v6CIcEkta0HNSv-SznR07MkdpTnE9k_wY/edit?usp=sharing

(update: unrestricted spec due to "open by default" policy)
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
in




[MB-9198] HTTPS support in REST API and CAPI Created: 30/Sep/13  Updated: 18/Apr/14  Resolved: 10/Feb/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0, 2.0.1, 2.1.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Anil Kumar Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: ns_server-story
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-10519 supply SSL verification prototype Open

 Description   
NOTE: this ticket was previously "encrypt password sent between client and server", and we hijacked it for the HTTPS work. Many of the comments below are for the original ticket.

Currently, admin passwords are sent in plain text over the wire when a client requests data from the server.
This feature request is to encrypt the password.

#1. Backward Compatibility
#2. UI work
#3. Certificate Deployment on every node
#4. Custom Certificate Deployment



 Comments   
Comment by Aleksey Kondratenko [ 30/Sep/13 ]
More specifically, we've agreed that we'll implement some advanced auth support, e.g. HTTP digest auth, in addition to the widely supported HTTP basic auth.
Comment by Matt Ingenthron [ 30/Sep/13 ]
As I'm sure we'll get to over the course of this, there are three public interfaces, each one needs slightly different handling.
1- The memcached/couchbase binary protocol interface supports SASL mechanisms now and works with one client (Java)
2- The views interface, does HTTP Basic auth only. This one may be complicated as any proper three-way auth is harder to handle in a RESTful approach.
3- The cluster configuration interface. This has two use cases. One is for client libraries to get information about cluster topology. The other is for cluster management. Both of these are HTTP and rely on HTTP Basic auth currently. These too are RESTful, and thus make it a little hard to handle properly.

For the latter two, one option is to use TLS/SSL and continue to use HTTP Basic. The other option would be to take on HTTP digest auth as specified in RFC-2617. The upside is that this is easy to implement. The downside is that while it meets the requirement for "encryption", strength is a consideration, as it should be for the binary protocol interface.

I know that's generally well understood, but I wanted to jump in here and say as phrased, this would need to be broken out into a number of issues and we'd need to align client work.
Comment by Aleksey Kondratenko [ 30/Sep/13 ]
Matt, please elaborate on "any proper three way auth is harder to handle in a RESTful approach".

Also note that the cluster configuration interface is heavily used by XDCR (and over unprotected networks).

SSL is clearly an option and will encrypt the password too. But it presents a certificate management problem if we're serious about it (and AFAIK we are).

Still, at least for views, having digest auth (per RFC-2617, as you pointed out) looks like a good thing to have.
Comment by Matt Ingenthron [ 09/Oct/13 ]
Alk: The reason I said it's harder is that the HTTP client being used may not allow us to set up authentication in response to a 401 reply. This is currently the case with some of the HTTP libraries we're using and we're just asserting authentication credentials without being asked.

IIRC, we have some responses which are different based on whether or not the authorization header is present. If any of those are in the bootstrap routine, we may have a problem. How do you send a 401 when the request is valid to someone who is anonymous?

When I originally designed this, I'd looked into RESTful authentication and it seemed, at the time, the best practice was HTTP Basic auth + TLS/SSL. That's now years ago and I am open to other ideas for sure.

At the moment, even switching to digest auth over HTTP would require client changes. Fewer changes than certificate management for TLS/SSL, but more than just turning on a switch. I know this bug as phrased isn't about client libraries, but the interface is supported for both.
Comment by Aleksey Kondratenko [ 10/Oct/13 ]
I believe that for public service APIs (Twitter, Google, etc.) basic auth + TLS is still the most obvious way to go, particularly because it's easy for them to have a PKI.

For our deployments, which must not rely on any infrastructure, it's a bit harder. But it is still an option.

What do you think about my simple proposal of having a self-signed certificate per cluster? Do you see a secure and simple way we can integrate that into clients and deploy TLS-enabled alternatives to ports 8091 and 8092?

My (updated) thinking is that there's demand for HTTPS anyway. So perhaps instead of spending time on digest auth we can simply do HTTPS (which requires dealing with a PKI, even if simplistic) and keep basic auth.
Comment by Matt Ingenthron [ 10/Oct/13 ]
I don't know the user requirements. Last I looked they weren't very complete.

I do agree with you that HTTP Basic + TLS may be the way to go given that people will want transport layer security as well. I do think an easy, out of the box self-signed cert would be a requirement, but based on my experience with other products we'll also need to support CA certs and server certs at the client and server side as needed.

There are a number of places where transport level security is required and even running their own CA is required to ensure things like revocation can be done if needed.

As you can imagine, the basic tools are there in things like OpenSSL. It's all about how easy/hard or how integrated we want the UX for setting things up.
Comment by Aleksey Kondratenko [ 11/Oct/13 ]
Discussed this with Dipti.

Dipti said she is fine with SSL as the only option IF there are no big downsides. Otherwise we have to implement digest auth.

Here are two options and I need Matt's opinion:

a) Go with SSL (we'll deal with certificates anyway as part of securing XDCR) and keep supporting only basic auth. The only potential issue here looks like the cost of encryption on the client and server side, particularly if certain clients are bootstrapping too often (PHP); i.e. it's widely known that SSL is most expensive during the connection establishment phase, which requires somewhat expensive asymmetric crypto. Do you see other downsides here?

b) Do digest auth as described by the RFC. Matt, will deploying this be harder than SSL on the client side?

Also, my understanding is that CCCP is going to significantly reduce client use of the REST APIs. Basically to nothing, with the notable exception of memcached-type buckets.
Comment by Matt Ingenthron [ 11/Oct/13 ]
At the moment, I believe digest auth is generally easier for the most part, but it obviously delivers less functionality. The two are pretty equivalent in terms of implementation work, but SSL brings in more dependencies and testing scenarios. Sergey raised a good point too, and that's whether or not this would be implemented as optional for all clients or at least optional for 'local' things?

The other thing I would look at here is whether or not it gives us the right REST interface. I'm slightly less concerned about the implementation.

Also, do we have user demand for security everywhere?
Comment by Aleksey Kondratenko [ 11/Oct/13 ]
>> Also, do we have user demand for security everywhere?

Yes, we do. It's not at the top of the list yet. But Dipti even wants to be able to encrypt node-to-node traffic (Erlang etc.).
Comment by Aaron Miller (Inactive) [ 16/Oct/13 ]
A few things:

1. Digest auth necessarily adds an extra round trip to each authenticated request since the challenge value sent requires a server generated nonce to calculate.
2. Digest auth *prevents* the use of a strong password storage mechanism (PBKDF2, bcrypt), which is both the Right Thing to do, and a feature that certain important customers really want.
3. There are basically no situations where an attacker can *eavesdrop* your connection (the only reason you'd want to hide your password in the first place) but cannot easily hijack it (modify your request in-flight). TLS actually does help here, since it's difficult for an attacker to do anything useful with a hijacked TLS connection, not having the session keys.
Comment by Aleksey Kondratenko [ 16/Oct/13 ]
Interestingly, all of Aaron's comments apply equally well to the CRAM-MD5 auth we recently implemented for memcached. Back then folks actually decided that securing the wire is more important than securing the disk. We might want to reconsider.

On his point 3 above I cannot comment. The other points I confirm as correct.
Comment by Aaron Miller (Inactive) [ 16/Oct/13 ]
w.r.t. #3 http://ettercap.github.io/ettercap/ makes executing such an attack very easy, and even that may not be necessary depending on *how* an attacker has managed to gain the ability to eavesdrop your connection.
Comment by Aleksey Kondratenko [ 03/Feb/14 ]
not done yet
Comment by Aleksey Kondratenko [ 10/Feb/14 ]
Here's a one-pager spec for HTTPS:


== Project Summary

We'll have HTTPS equivalents for ports 8091 and 8092. And the certificates
used for those endpoints will be set up in a way that allows clients
to verify the authenticity of the nodes they're speaking with.

== Project Description

Customers want HTTPS. In particular, there's demand for auth that
doesn't expose passwords on the wire, and HTTPS solves that problem
without having to introduce a different type of auth (beyond the existing
and widely supported HTTP basic auth, which is plain text).

For HTTPS we'll have every node serve a certificate that matches its
hostname. HTTPS clients will get a chain of two certificates, from the
node certificate to the cluster certificate. That cluster certificate
will either be self-signed and configured on clients as trusted, or
created through the organization's CA facility and given to ns_server
(and trusted on clients through trust in the CA).

== Requirements, Risks and Assumptions

We assume we'll have it enabled by default. So we'll generate
certificates as necessary on cluster initialization and node
joins/leaves.

2.5 is shipping some HTTPS already, and 2.5's XDCR-over-SSL facility
needs to be compatible with whatever 3.0's HTTPS will be. That implies
that HTTPS needs to include the cluster certificate (or any other, but
_one_ particular certificate for the entire cluster) in its SSL
handshake. 2.5 handles the case where a chain of certificates is served
with the last certificate being the expected one.

== Business Summary
== Problem Area

Securing passwords on the wire and securing view/admin requests on the
wire.

== Technical Description

== UI/HTTP-API changes

There will be a UI to upload an externally generated (i.e. via the
organization's CA) cluster certificate and matching private key. There
will be a publicly exposed/documented HTTP API endpoint for that too.

The existing (since 2.5) cluster certificate field will still contain
the cluster certificate (auto-generated or given). There will be no UI
for looking at per-node certificates.

The "regenerate certificate" button (and the corresponding 2.5 HTTP API)
will work only if the certificate is cluster-generated (i.e. not supplied
from an external source).

== Implementation Details

The existing HTTP ports (8091 and 8092) will continue to serve plain,
unencrypted HTTP. In addition to ports 8091 (for the management/UI REST
API) and 8092 (CAPI), we'll have _https_ server instances on ports
18091 (for the management UI/REST API) and 18092 (for the CAPI).

On cluster initialization we'll generate a self-signed cluster
certificate and we will use it to create and sign per-node
certificates. When a node changes its hostname, we'll regenerate its
certificate to match the new hostname.

Each node will serve _both_ HTTPS endpoints (and in fact all its SSL
endpoints) using its certificate/private-key pair. And the SSL handshake
will send both the node certificate and the cluster certificate. The
latter is a requirement for CA integration (when clients only have the
CA certificate) and for 2.5 backwards compat.

The cluster certificate and its matching private key will be stored in
ns_config and replicated to all nodes using ns_config's stock config
replication mechanism (using unencrypted erlang-to-erlang channels, as
usual). It also means they will be stored unencrypted inside the
config.dat file together with all other information contained in
ns_config. We'll use the same ns_config key as 2.5 uses today for
storing it (cert_and_pkey).

In the default setup (auto-generated cluster certificate), clients wishing
to use the HTTPS endpoints will need to add the self-signed cluster
certificate to their list of trusted certificates.

When an external certificate/private-key pair is given, clients are
expected to be configured to trust the CA that issued the cluster
certificate.
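(Editor's illustration, not part of the spec: a minimal sketch of what a client request against the new HTTPS management port could look like once the self-signed cluster certificate has been saved locally. The hostname, credentials, and certificate path are hypothetical.)

# Hypothetical client call against the HTTPS management endpoint (port 18091 per the spec).
import requests  # third-party HTTP library

CLUSTER_CERT = "/etc/couchbase/cluster-cert.pem"  # self-signed cluster certificate, obtained out of band
resp = requests.get(
    "https://cb-node-1.example.com:18091/pools/default",  # pool details API over HTTPS
    auth=("Administrator", "password"),                   # HTTP basic auth stays in place per the spec
    verify=CLUSTER_CERT,                                  # trust the cluster certificate explicitly
)
resp.raise_for_status()
print(resp.json()["nodes"][0]["hostname"])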

== Upgrade Details

2.5 serves its HTTPS endpoints using only the cluster certificate (which
is not going to pass clients' hostname checking).

The cluster will switch to per-node certificates (as per this proposal)
when the cluster compat mode is bumped to 3.0, which happens automatically
as soon as all nodes are 3.0+. We might have to regenerate the cluster
certificate at that moment (to be figured out soon).

== Interfaces

HTTPS ports 18091 and 18092. Both are discoverable (since 2.5 in fact)
through existing pool details and bucket details HTTP APIs. See 2.5's
api.txt for details (but note that 2.5's api.txt lists those fields as
private and internal for now).

== Doc Impact

We'll need to figure out how every SDK is configured to trust the
cluster certificate.

== Admin/Config Impact

Clients and/or client machines will need to be taught to trust the
cluster certificate. Optionally, cluster admins might wish to integrate
with their CA.

== Packaging and Delivery Impact

None

== Security Impact

We store private keys unencrypted. That's no different from storing
bucket and REST API passwords in plain text, which we do already.

It's not clear how customers wishing not to store private keys in
plain text on their disks can be satisfied in a secure way. But securing
this data is out of scope for this proposal.

== Dependencies

No server-side dependencies. Client SDKs might need to start depending
on things that allow them to do HTTPS.

I assume that the SSL endpoint of memcached is going to use the node's
certificate/private key from this proposal.

== Resources and Schedule
== Projected Availability

Server side: 3.0 feature freeze date.

Comment by Aleksey Kondratenko [ 10/Feb/14 ]
This stuff is in as of Saturday. I was planning to have 3rd-party certificate/private-key support, but it doesn't seem strictly necessary and it looks like it won't happen for the 3.0 freeze date.

Work left on my side:

* un-hide certificate on UI for community edition

* test backwards compat with 2.5
Comment by Aleksey Kondratenko [ 10/Feb/14 ]
Verified that backwards compat works. The cert un-hide commit is in Gerrit.
Comment by Aleksey Kondratenko [ 10/Feb/14 ]
Closing as done.

Note that the one-pager mentions support for feeding an externally generated certificate into the system, but that's not done for 3.0. We'll have to add a separate ticket for that when the time for that work comes.
Comment by Aleksey Kondratenko [ 10/Feb/14 ]
Also, I successfully tested with Firefox and Chrome (both have libnss-based crypto).
Comment by Tommie McAfee [ 18/Mar/14 ]
Matt, I'm wondering if we have any SDKs capable of handling certificates from the server for secure communication. Also, will we be able to use the SDKs to feed an externally generated cert into the system, as Alk mentioned in a previous comment?
Comment by Matt Ingenthron [ 19/Mar/14 ]
Tommie: As was passed along to Maria in case it would be useful for QE, we do have one experimental JavaScript SDK that worked well with the SSL support in 3.0. At the moment, I don't have any feature enhancements planned for adding externally generated certs.

That could be considered here. Usually you need to do this through the filesystem though, since you can't trust an insecure path for adding the cert. What do you have in mind?




[MB-10858] Couchbase server won't start after beam.smp killed and node rejected via failover in centos 6.4 Created: 15/Apr/14  Updated: 18/Apr/14

Status: Reopened
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thuan Nguyen Assignee: Aliaksey Artamonau
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 6.4 64-bit

Attachments: Zip Archive 10.1.2.84-4152014-1144-diag.zip     Zip Archive 10.1.2.86-4152014-1143-diag.zip     Zip Archive 10.3.4.225-4152014-1135-diag.zip     Zip Archive 10.3.4.227-4152014-1136-diag.zip     Zip Archive 10.3.4.228-4152014-1137-diag.zip     Zip Archive 10.3.4.229-4152014-1138-diag.zip     Zip Archive 10.3.4.230-4152014-1139-diag.zip     Zip Archive 10.3.4.231-4152014-1140-diag.zip     Zip Archive 10.3.4.232-4152014-1142-diag.zip     Zip Archive 10.3.4.233-4152014-1142-diag.zip     Zip Archive 10.3.4.234-4152014-1141-diag.zip     Zip Archive 10.3.4.235-4152014-1143-diag.zip    
Issue Links:
Relates to
relates to MB-10312 [windows] UI failed to start after re... Closed
Triage: Triaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: Manifest file for this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_centos6_x86_64_3.0.0-579-rel.rpm.manifest.xml
Is this a Regression?: Unknown

 Description   
Environment: 12 centos 6.4 64-bit nodes
1:10.3.4.225
2:10.3.4.227
3:10.3.4.228
4:10.3.4.229
5:10.3.4.230
6:10.3.4.231
7:10.3.4.234
8:10.3.4.233
9:10.3.4.232
10:10.3.4.235
11:10.1.2.86
12:10.1.2.84

Run RZA test with build 3.0.0-579-rel with openssl1
rackzone.rackzonetests.RackzoneTests:
    test_replica_distribution_in_zone,items=100000,zone=2,replicas=1,standard_buckets=1,shutdown_zone=1

This test will create 2 zones (A and B).
Then shut down all nodes in zone B completely by killing beam.smp on every node in that zone.
Fail over all down nodes.
Restart couchbase server on all down nodes in zone B; couchbase server crashes.
Live crashed nodes are available at
7:10.3.4.234
8:10.3.4.233
9:10.3.4.232
10:10.3.4.235
11:10.1.2.86
12:10.1.2.84


 Comments   
Comment by Thuan Nguyen [ 15/Apr/14 ]
This bug may be related to MB-10312.
Comment by Aleksey Kondratenko [ 15/Apr/14 ]
Found this to be an unexpected regression in our code that deletes config keys of removed nodes.
Comment by Aleksey Kondratenko [ 15/Apr/14 ]
Unrelated to MB-10312.

We've reverted the commit that caused the regression, so the next build should be OK. But we still plan to re-apply the original commit with a fix for the regression.
Comment by Thuan Nguyen [ 17/Apr/14 ]
Tested on build 3.0.0-588; I could not reproduce this bug.
Comment by Aliaksey Artamonau [ 18/Apr/14 ]
Reopening since it's fixed only because we reverted the change that broke it.
Comment by Aleksey Kondratenko [ 18/Apr/14 ]
Lowered to critical given that it's not blocking anyone. We do plan to reapply the original change that we had to revert in order to fix the bug in the short term (and unblock people). Of course the reapplied commit will have the right long-term fix for this issue.




[MB-10892] [Tools] Support graceful failover Created: 13/Mar/14  Updated: 18/Apr/14  Resolved: 19/Mar/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Task Priority: Critical
Reporter: Bin Cui Assignee: Thuan Nguyen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Bin Cui [ 13/Mar/14 ]
http://review.couchbase.org/#/c/34476/




[MB-9697] Unnecessary ejection during rebalance in? Created: 07/Dec/13  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630
Memory = 64 GB
Disk = 2 x SSD

Build 2.5.0-961

Attachments: PNG File ep_mem_high_wat-mem_used.png     PNG File mem_used_node_1.png     PNG File mem_used_node_2.png     PNG File mem_used_node_3.png     PNG File mem_used.png     PNG File mem_used.png    
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/apollo-64/589/artifact/

 Description   
Rebalance-in, 3 -> 4, 1 bucket x 350M x 256B, Heavy DGM. More details:

https://raw.github.com/pavel-paulau/perfrunner/master/tests/experimental/reb_in_kv_350M_dgm_2rep_lthr.test

Not a bug, not a regression. Just a question. Why do we start ejecting items during rebalance-in? See attached charts for details.

Full report:
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=apollo_ssd_250-961_123_rebalance&report=BaseRebalanceReport




 Comments   
Comment by Chiyoung Seo [ 10/Dec/13 ]
The memory usage per node is very close to the high watermark. When the rebalance starts, each node will perform disk backfill operations for the takeover vbuckets and also receive backfill items from other nodes. This can cause the memory usage to go slightly beyond the high watermark, which consequently causes the item pager to be scheduled. The reason the memory usage wasn't above the high watermark in the graph is mostly the timing between stats collection and the memory usage value at that point.
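(Editor's illustration: rough arithmetic only, using the 75%/85% low/high watermark ratios visible in the cbstats dump earlier in this export; the quota, steady-state usage, and backfill spike below are hypothetical numbers, not measurements from this test.)

# How little headroom is left before the item pager is scheduled when usage sits
# just under the high watermark and a rebalance backfill adds a transient spike.
quota = 2169503744                   # example ep_max_size, taken from the cbstats dump above
low_wat = int(quota * 0.75)          # ejection stops once usage drops back to this level
high_wat = int(quota * 0.85)         # item pager is scheduled once usage exceeds this level
mem_used = int(quota * 0.84)         # hypothetical steady-state usage, just under the high watermark
backfill_spike = 64 * 1024 * 1024    # hypothetical transient backfill memory during rebalance-in
print("headroom before ejection: %d MB" % ((high_wat - mem_used) // 2**20))
print("crosses high watermark during backfill:", mem_used + backfill_spike > high_wat)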
Comment by Pavel Paulau [ 10/Dec/13 ]
Will re-run with a higher sampling rate (every 1 second).
Comment by Pavel Paulau [ 10/Dec/13 ]
Another thing to try is a higher low watermark (80%) and investigating the impact on rebalance performance.
Comment by Pavel Paulau [ 20/Dec/13 ]
Tried a higher sampling rate (every 1 second) - same issue.

The attached graph shows the difference between "high watermark" and "memory used" on one of the nodes over time. It's always positive and always higher than ~1 GB.

It means that, according to your explanation, there must be a very rapid spike in memory utilization within a 1-second window.
Comment by Pavel Paulau [ 21/Dec/13 ]
Changing the low watermark to 80% doesn't affect rebalance speed. The difference in memory utilization is really minor (green - default 75%, yellow - 80%).




[MB-10506] if doc id is too long, the error message is not visible for "Lookup Id" Created: 19/Mar/14  Updated: 18/Apr/14  Resolved: 18/Apr/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Minor
Reporter: Andrei Baranouski Assignee: Pavel Blagodov
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: chrome

Attachments: PNG File ID_long.png    
Issue Links:
Dependency
depends on MB-7081 Long key is not shown properly in the... Resolved
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Please see the screenshot.

It would be better to put the error under the long ID.

 Comments   
Comment by Pavel Blagodov [ 14/Apr/14 ]
http://review.couchbase.org/35676




[MB-10857] Rebalance + Views fails with reason {{badmatch,{error,closed}} with UPR Replication Created: 15/Apr/14  Updated: 18/Apr/14  Resolved: 16/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Meenakshi Goel Assignee: Meenakshi Goel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-580-rel

Attachments: Text File log.txt    
Triage: Triaged
Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
Tests Failed:
rebalance_in_and_out_with_ddoc_ops,ddoc_ops=create,test_with_view=True,num_ddocs=3,num_views_per_ddoc=2,items=200000
rebalance_in_and_out_with_ddoc_ops,ddoc_ops=update,test_with_view=True,num_ddocs=2,num_views_per_ddoc=3,items=200000
rebalance_in_and_out_with_ddoc_ops,ddoc_ops=delete,test_with_view=True,num_ddocs=2,num_views_per_ddoc=3,items=200000
rebalance_in_and_out_with_ddoc_ops,ddoc_ops=create,test_with_view=False,num_ddocs=3,num_views_per_ddoc=2,items=200000
rebalance_in_and_out_with_ddoc_ops,ddoc_ops=update,test_with_view=False,num_ddocs=2,num_views_per_ddoc=3,items=200000
rebalance_in_and_out_with_ddoc_ops,ddoc_ops=delete,test_with_view=False,num_ddocs=2,num_views_per_ddoc=3,items=200000

Jenkins Job Link:
http://qa.sc.couchbase.com/job/centos_x64--29_01--new_view_all-P1/40/consoleFull

Logs:
Please refer to the attached log.txt.

Notes:
The test runs fine with TAP replication.
The last node added goes into the pending state.

 Comments   
Comment by Meenakshi Goel [ 15/Apr/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-10857/7f6c45fe/172.23.105.230-20140415-0111.zip
https://s3.amazonaws.com/bugdb/jira/MB-10857/0d3b37e0/172.23.105.231-20140415-0112.zip
https://s3.amazonaws.com/bugdb/jira/MB-10857/e0c9457f/172.23.105.232-20140415-0113.zip
https://s3.amazonaws.com/bugdb/jira/MB-10857/13f68e9c/172.23.105.245-20140415-0114.zip
Comment by Ketaki Gangal [ 15/Apr/14 ]
Moving this to a test blocker. Cannot run UPR view rebalance tests with the current builds.
Comment by Aleksey Kondratenko [ 15/Apr/14 ]
Some badness on the consumer side. It closed the socket after this:

[rebalance:warn,2014-04-15T1:13:40.303,ns_1@172.23.105.230:upr_consumer_conn-default-ns_1@172.23.105.231<0.12787.1>:upr_proxy:process_packet:139]Received error response: RESPONSE: 0x57 (upr_mutation) vbucket = 0 opaque = 0x1000000 status = 0x4 (einval)
81 57 00 00
00 00 00 04
00 00 00 11
01 00 00 00
00 00 00 00
00 00 00 00
49 6E 76 61
6C 69 64 20
61 72 67 75
6D 65 6E 74
73
Comment by Mike Wiederhold [ 15/Apr/14 ]
Yes I know exactly what caused this. I should be able to fix it today.
Comment by Mike Wiederhold [ 16/Apr/14 ]
Assigning to Abhinav since this looks like a datatype regression.
Comment by Abhinav Dangeti [ 16/Apr/14 ]
http://review.couchbase.org/#/c/35893/3
Comment by Wayne Siu [ 17/Apr/14 ]
Meenakshi,
Can you please run your test and update the ticket before tomorrow (Friday's) sync up? Thanks.
Comment by Meenakshi Goel [ 18/Apr/14 ]
Verified with build 3.0.0-590-rel and no longer seeing this issue. Hence closing the issue.
http://qa.sc.couchbase.com/job/centos_x64--29_01--new_view_all-P1/43/consoleFull




[MB-8760] Do not install in path with spaces if 8dot3name disabled Created: 06/Aug/13  Updated: 18/Apr/14

Status: Reopened
Project: Couchbase Server
Component/s: installer, ns_server, view-engine
Affects Version/s: 2.1.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: David Haikney Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: erlang, windows, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows Server 2008 R2, latest patches installed, 8dot3name disabled, default install dir

Issue Links:
Duplicate
duplicates MB-9998 switch to R16 for 3.0 (was: investiga... Open
is duplicated by MB-10245 Couchbase server not starting up Wind... Resolved
Triage: Triaged

 Description   
WORKAROUND: Install Couchbase afresh at a location with no space characters (e.g., C:\Couchbase)


On Windows, the NTFS filesystem has an "8dot3name" attribute that, when enabled, gives files and directories a short filename where necessary, e.g. "Program Files" is given the short filename "PROGR~1".

Erlang relies on this capability to start CB when installed in the default directory. If 8dot3name behaviour is disabled, the babysitter fails to launch the OS monitor process with the following:

erlang:open_port({spawn, "c:/Program Files/Couchbase/Server/lib/os_mon-2.2.7/priv/bin/win32sysinfo.exe"}, []).
exception error: enoent
  in function open_port/2
     called as open_port({spawn,"c:/Program Files/Couchbase/Server/lib/os_mon-2.2.7/priv/bin/win32sysinfo.exe"},
                         [])

The spawn command does not tolerate the whitespace in the path. A workaround is to install CB into a different directory that does not contain a space in the path.

The 8dot3name attribute can be set and queried using the fsutil command: http://technet.microsoft.com/en-us/library/ff621566.aspx?ppud=4


 Comments   
Comment by Aleksey Kondratenko [ 14/Aug/13 ]
What exactly is needed from me?
Comment by Sriram Melkote [ 14/Aug/13 ]
Sorry, looks like my comment didn't get saved. Details on CBSE-632.

The problem is that os_mon is calling open_port({spawn}) with a full path, instead of calling {spawn_executable}.

It looks like it is being fixed in R16B01, so I'm assigning this to you to eventually close when we upgrade to R16B01.

https://github.com/erlang/otp/commit/5f8867fb985b2b899e2ba8391652c7111f9df9bb
Comment by Aleksey Kondratenko [ 14/Aug/13 ]
The workaround and the issue itself should be mentioned in the release notes. This affects all our versions and the upcoming 2.2.0.
Comment by Sriram Melkote [ 18/Feb/14 ]
Retarget to 3.0 - let us either upgrade to R16 or change default install location on Windows to someplace without spaces.
Comment by Aleksey Kondratenko [ 18/Feb/14 ]
BTW, the decision not to do it for 3.0 was based on the thinking that "3.0 has enough stuff already". So the second option seems like the better idea right now.
Comment by Aleksey Kondratenko [ 18/Feb/14 ]
Passing to Ravi for decision making w.r.t. 3.0
Comment by Sriram Melkote [ 26/Feb/14 ]
Increasing priority as it seems fairly common to turn off 8dot3.

A minimum-effort fix would be to change the default install directory for non-upgrade installs to a location without spaces.
Comment by Anil Kumar [ 18/Mar/14 ]
Alk - Did we make the switch to R16B01? If not, is it planned for 3.0?
Comment by Anil Kumar [ 18/Mar/14 ]
Triaged by Don and Anil as per Windows Developer plan.
Comment by Sriram Melkote [ 19/Mar/14 ]
As MB-9998 was deferred, the only possible 3.0 fix is to work around it in the installer.
Comment by Bin Cui [ 04/Apr/14 ]
http://review.couchbase.org/#/c/35281/
Comment by Sriram Melkote [ 18/Apr/14 ]
Chris, we may need to roll this back if we move to Erlang R16, because it appears R16 does not require 8dot3.
Comment by Sriram Melkote [ 18/Apr/14 ]
Bin, I'm reopening and assigning to Chris so we can track this as a part of the R16 update effort.
Comment by Chris Hillery [ 18/Apr/14 ]
What does "roll this back" mean? Close the bug as no longer relevant?




[MB-9998] switch to R16 for 3.0 (was: investigate possibility of switching to R16B03) Created: 23/Jan/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: ns_server, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Aleksey Kondratenko Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by MB-8760 Do not install in path with spaces if... Reopened
is duplicated by MB-10884 Erlang Upgrade to R16 (R16B03-1) Resolved
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-10885 Linux - Upgrade the manifests to pick... Technical task Open Chris Hillery  
MB-10886 Windows - Confirm if it's on R16. Technical task Closed Chris Hillery  
Triage: Triaged
Is this a Regression?: Yes

 Description   
Updating Erlang will fix some issues (there's at least one Windows issue that states that we're waiting for R16).

R16 can also have a 64-bit Erlang VM on Windows, which might be somewhat important for use cases that "require" Erlang to have tons of memory (the 32-bit VM dies at about 2 GB of memory eaten on Windows).

R16B03 also has +MMscs (supercarrier something...) which might be useful for memory tuning.

There are, however, known downsides (at least):

* I think we've had data showing that R15 and R16 are slower on our use cases

* R16's SSL is quite broken, at least its implementation of elliptic curve DH exchange. Might be fixable by disabling it.



 Comments   
Comment by Sriram Melkote [ 18/Mar/14 ]
Can we make a call on this? As QE has started mainstream 3.0 testing, it is best if we upgrade now, or not at all for 3.0, to ensure the chosen VM version receives enough testing on all platforms.
Comment by Sriram Melkote [ 19/Mar/14 ]
Decided today that the VM will not be upgraded for 3.0, as we have enough problems already.
Comment by Aleksey Kondratenko [ 20/Mar/14 ]
Some stuff is making me think that we will soon be forced into R16B for 3.0. But let's wait and see if I'm right.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
We'll take R16 into 3.0 as per the recent decision.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Siri is orchestrating it.

Note that the SSL issues of R16 are fixed as of the latest R16 update.
Comment by Sriram Melkote [ 18/Apr/14 ]
Details:
- We're using R16B03-1
- So far, only one patch may be applicable (futex patch on Linux)
- Initially, we'll try vanilla Erlang to see if we really need to support very old kernels
- We'll try not to have separate builders for R14 and R16 (unless we hit a concrete reason to separate them)
- I understand Windows is already on R16B03-1




[MB-10884] Erlang Upgrade to R16 (R16B03-1) Created: 17/Apr/14  Updated: 18/Apr/14  Due: 23/Apr/14  Resolved: 18/Apr/14

Status: Resolved
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Wayne Siu Assignee: Chris Hillery
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates MB-9998 switch to R16 for 3.0 (was: investiga... Open

 Comments   
Comment by Sriram Melkote [ 18/Apr/14 ]
Let's track this on the older bug MB-9998 so there's only one place for R16-upgrade-related information.




switch to R16 for 3.0 (was: investigate possibility of switching to R16B03) (MB-9998)

[MB-10886] Windows - Confirm if it's on R16. Created: 17/Apr/14  Updated: 18/Apr/14  Due: 23/Apr/14  Resolved: 17/Apr/14

Status: Closed
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Technical task Priority: Critical
Reporter: Wayne Siu Assignee: Chris Hillery
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
If not:
a. Install R16 on Windows.
b. Update the path to pick up R16.

 Comments   
Comment by Chris Hillery [ 17/Apr/14 ]
3.0 builds are using R16B03-1.




switch to R16 for 3.0 (was: investigate possibility of switching to R16B03) (MB-9998)

[MB-10885] Linux - Upgrade the manifests to pick up R16 Created: 17/Apr/14  Updated: 18/Apr/14  Due: 23/Apr/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Technical task Priority: Critical
Reporter: Wayne Siu Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-10871] UPR:: Rebalance-out/in node results in change in vbucket UUID Created: 16/Apr/14  Updated: 18/Apr/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos-6

Attachments: File rebalance_in_detail     GZip Archive rebalance_in.tar.gz     GZip Archive rebalance_out.tar.gz    
Is this a Regression?: Yes

 Description   
Using UPR, CentOS 64-bit, Version:: 3.0.0-584

1. Add 5 nodes to cluster
2. Create default bucket with replica=1
3. Add 50000 items
4. Rebalance out 1 node

Cluster Nodes

10.6.2.144
10.6.2.147
10.6.2.148
10.6.2.149
10.6.2.150

Rebalanced-out Node:: 10.6.2.150

Expected: vbucket UUIDs are unchanged after the rebalance-out.

Observed: the UUIDs of vbuckets that had been hosted on 10.6.2.150 differ:

 vb_28 : Expected 170724551552615 :: Actual 10627939076719
 vb_29 : Expected 45359036031519  :: Actual 59640339796757
 vb_26 : Expected 133471459893767 :: Actual 114252468159585
 vb_27 : Expected 104126006114437 :: Actual 86374568011280
 vb_6  : Expected 175130199501286 :: Actual 23856384718339
 vb_13 : Expected 84680084219325  :: Actual 233007425883197
 vb_31 : Expected 170201395300887 :: Actual 174005664911105
 vb_30 : Expected 83362933025898  :: Actual 229555825112404
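
The expected/actual pairs above come from snapshotting per-vbucket stats before and after the rebalance and diffing the UUIDs. A minimal sketch of one way to do that (not the actual test harness; it assumes the cbstats tool shipped with the server is on PATH and that "cbstats <host>:11210 vbucket-details -b default" emits lines of the form "vb_N:uuid: <value>"):

    # vb_uuid_diff.py -- hypothetical helper for reproducing the comparison above
    import subprocess
    from collections import defaultdict

    NODES = ["10.6.2.144", "10.6.2.147", "10.6.2.148", "10.6.2.149", "10.6.2.150"]
    KEYS = ("uuid", "high_seqno", "purge_seqno")

    def vbucket_details(node, bucket="default"):
        """Return {vb_id: {key: value}} for one node, parsed from cbstats output."""
        out = subprocess.check_output(
            ["cbstats", "%s:11210" % node, "vbucket-details", "-b", bucket])
        stats = defaultdict(dict)
        for line in out.decode().splitlines():
            parts = line.strip().split(":")
            if len(parts) == 3 and parts[1] in KEYS:   # e.g. "vb_28:uuid: 1707..."
                stats[parts[0]][parts[1]] = parts[2].strip()
        return stats

    def diff_uuids(before, after):
        """Print vbuckets whose uuid changed between two snapshots of one node."""
        for vb in sorted(set(before) & set(after)):
            if before[vb].get("uuid") != after[vb].get("uuid"):
                print("%s : Expected %s :: Actual %s"
                      % (vb, before[vb]["uuid"], after[vb]["uuid"]))

    # Usage: snapshot every node, rebalance out 10.6.2.150, snapshot the survivors,
    # then diff each surviving node, e.g.:
    #   before = {n: vbucket_details(n) for n in NODES}
    #   ...rebalance out 10.6.2.150...
    #   after = {n: vbucket_details(n) for n in NODES if n != "10.6.2.150"}
    #   for n in after:
    #       diff_uuids(before[n], after[n])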

Dump of Bucket, Node, Vbucket Information (uuid, high_seqno, purge_seqno)

Before we rebalanced-out
______________________

----- Bucket default -----

-------------Node 10.6.2.147------------

   vb_9  : uuid = 224689180220009, high_seqno = 1558, purge_seqno = 0
   vb_8  : uuid = 24173416366515,  high_seqno = 1569, purge_seqno = 0
   vb_1  : uuid = 3382665152986,   high_seqno = 1569, purge_seqno = 0
   vb_0  : uuid = 252550884809196, high_seqno = 1551, purge_seqno = 0
   vb_7  : uuid = 160369294390021, high_seqno = 1551, purge_seqno = 0
   vb_11 : uuid = 91922888028629,  high_seqno = 1560, purge_seqno = 0
   vb_10 : uuid = 98217872738904,  high_seqno = 1568, purge_seqno = 0
   vb_13 : uuid = 84680084219325,  high_seqno = 1568, purge_seqno = 0
   vb_12 : uuid = 75150401842511,  high_seqno = 1560, purge_seqno = 0
   vb_15 : uuid = 21343486345542,  high_seqno = 1569, purge_seqno = 0
   vb_16 : uuid = 113002939605931, high_seqno = 1560, purge_seqno = 0
   vb_21 : uuid = 131979757081143, high_seqno = 1559, purge_seqno = 0
   vb_28 : uuid = 170724551552615, high_seqno = 1553, purge_seqno = 0

-------------Node 10.6.2.144------------

   vb_8  : uuid = 24173416366515,  high_seqno = 1569, purge_seqno = 0
   vb_1  : uuid = 3382665152986,   high_seqno = 1569, purge_seqno = 0
   vb_0  : uuid = 252550884809196, high_seqno = 1551, purge_seqno = 0
   vb_3  : uuid = 189410128542953, high_seqno = 1570, purge_seqno = 0
   vb_2  : uuid = 202175152188264, high_seqno = 1552, purge_seqno = 0
   vb_5  : uuid = 272505847148341, high_seqno = 1552, purge_seqno = 0
   vb_4  : uuid = 170782289546886, high_seqno = 1570, purge_seqno = 0
   vb_7  : uuid = 160369294390021, high_seqno = 1551, purge_seqno = 0
   vb_6  : uuid = 175130199501286, high_seqno = 1569, purge_seqno = 0
   vb_14 : uuid = 9666779467087,   high_seqno = 1558, purge_seqno = 0
   vb_20 : uuid = 226337594275581, high_seqno = 1569, purge_seqno = 0
   vb_26 : uuid = 133471459893767, high_seqno = 1569, purge_seqno = 0
   vb_27 : uuid = 104126006114437, high_seqno = 1553, purge_seqno = 0

-------------Node 10.6.2.150------------

   vb_5  : uuid = 272505847148341, high_seqno = 1552, purge_seqno = 0
   vb_6  : uuid = 175130199501286, high_seqno = 1569, purge_seqno = 0
   vb_28 : uuid = 170724551552615, high_seqno = 1553, purge_seqno = 0
   vb_29 : uuid = 45359036031519,  high_seqno = 1569, purge_seqno = 0
   vb_13 : uuid = 84680084219325,  high_seqno = 1568, purge_seqno = 0
   vb_26 : uuid = 133471459893767, high_seqno = 1569, purge_seqno = 0
   vb_19 : uuid = 251498636389290, high_seqno = 1569, purge_seqno = 0
   vb_27 : uuid = 104126006114437, high_seqno = 1553, purge_seqno = 0
   vb_24 : uuid = 184565815683126, high_seqno = 1571, purge_seqno = 0
   vb_25 : uuid = 266765133502323, high_seqno = 1552, purge_seqno = 0
   vb_31 : uuid = 170201395300887, high_seqno = 1571, purge_seqno = 0
   vb_30 : uuid = 83362933025898,  high_seqno = 1552, purge_seqno = 0

-------------Node 10.6.2.148------------

   vb_9  : uuid = 224689180220009, high_seqno = 1558, purge_seqno = 0
   vb_10 : uuid = 98217872738904,  high_seqno = 1568, purge_seqno = 0
   vb_3  : uuid = 189410128542953, high_seqno = 1570, purge_seqno = 0
   vb_2  : uuid = 202175152188264, high_seqno = 1552, purge_seqno = 0
   vb_29 : uuid = 45359036031519,  high_seqno = 1569, purge_seqno = 0
   vb_15 : uuid = 21343486345542,  high_seqno = 1569, purge_seqno = 0
   vb_14 : uuid = 9666779467087,   high_seqno = 1558, purge_seqno = 0
   vb_17 : uuid = 50245200692736,  high_seqno = 1570, purge_seqno = 0
   vb_16 : uuid = 113002939605931, high_seqno = 1560, purge_seqno = 0
   vb_19 : uuid = 251498636389290, high_seqno = 1569, purge_seqno = 0
   vb_18 : uuid = 5501229932749,   high_seqno = 1559  [remainder of dump truncated in the export]