[MB-12227] [3.0.1-Windows] Add back operation failed Created: 23/Sep/14  Updated: 23/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sangharsh Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-1313-rel

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
[Jenkins]
http://qa.hq.northscale.net/job/win_2008_x64--16_02--XDCR_SSL-P0/28/consoleFull

[Error Logs]
[2014-09-22 10:21:08,856] - [biXDCR:275] INFO - Failing over Destination Non-Master Node 10.3.2.244:8091
[2014-09-22 10:21:10,996] - [task:2257] INFO - Failing over 10.3.2.244:8091
[2014-09-22 10:22:46,005] - [rest_client:1040] INFO - fail_over node ns_1@10.3.2.244 successful
[2014-09-22 10:22:46,005] - [task:2237] INFO - 0 seconds sleep after failover, for nodes to go pending....
[2014-09-22 10:22:46,006] - [biXDCR:278] INFO - Add back Destination Non-Master Node 10.3.2.244:8091
[2014-09-22 10:22:46,155] - [rest_client:1073] INFO - add_back_node ns_1@10.3.2.244 successful
[2014-09-22 10:22:47,227] - [rest_client:1095] INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.5.2.226%2Cns_1%4010.3.2.250%2Cns_1%4010.3.2.244
[2014-09-22 10:22:47,241] - [rest_client:1099] INFO - rebalance operation started
[2014-09-22 10:22:47,252] - [rest_client:1217] INFO - rebalance percentage : 0.00 %
[2014-09-22 10:22:57,267] - [rest_client:1217] INFO - rebalance percentage : 0.00 %
[2014-09-22 10:23:07,283] - [rest_client:1217] INFO - rebalance percentage : 0.00 %
[2014-09-22 10:23:17,299] - [rest_client:1217] INFO - rebalance percentage : 0.00 %
[2014-09-22 10:23:27,314] - [rest_client:1217] INFO - rebalance percentage : 0.00 %
[2014-09-22 10:23:37,332] - [rest_client:1217] INFO - rebalance percentage : 0.00 %
[2014-09-22 10:23:47,354] - [rest_client:1217] INFO - rebalance percentage : 0.00 %
[2014-09-22 10:23:57,371] - [rest_client:1200] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
[2014-09-22 10:23:57,408] - [rest_client:2033] INFO - Latest logs from UI on 10.5.2.226:
[2014-09-22 10:23:57,409] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.5.2.226', u'code': 2, u'text': u"Rebalance exited with reason {not_all_nodes_are_ready_yet,['ns_1@10.3.2.250']}\n", u'shortText': u'message', u'serverTime': u'2014-09-22T10:23:52.185Z', u'module': u'ns_orchestrator', u'tstamp': 1411406632185, u'type': u'info'}
[2014-09-22 10:23:57,409] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.3.2.244', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.2.244\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:54.650Z', u'module': u'ns_memcached', u'tstamp': 1411406574650, u'type': u'info'}
[2014-09-22 10:23:57,410] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.5.2.226', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:52.140Z', u'module': u'ns_rebalancer', u'tstamp': 1411406572140, u'type': u'info'}
[2014-09-22 10:23:57,410] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.3.2.244', u'code': 0, u'text': u'Deleting old data files of bucket "default"', u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:50.072Z', u'module': u'ns_storage_conf', u'tstamp': 1411406570072, u'type': u'info'}
[2014-09-22 10:23:57,411] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.3.2.244', u'code': 0, u'text': u'Deleting old data files of bucket "sasl_bucket_1"', u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:50.072Z', u'module': u'ns_storage_conf', u'tstamp': 1411406570072, u'type': u'info'}
[2014-09-22 10:23:57,411] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.3.2.244', u'code': 0, u'text': u'Deleting old data files of bucket "standard_bucket_1"', u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:50.072Z', u'module': u'ns_storage_conf', u'tstamp': 1411406570072, u'type': u'info'}
[2014-09-22 10:23:57,412] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.5.2.226', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@10.5.2.226','ns_1@10.3.2.250',\n 'ns_1@10.3.2.244'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes\n", u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:48.006Z', u'module': u'ns_orchestrator', u'tstamp': 1411406568006, u'type': u'info'}
[2014-09-22 10:23:57,412] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.5.2.226', u'code': 0, u'text': u"Failed over 'ns_1@10.3.2.244': ok", u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:46.774Z', u'module': u'ns_rebalancer', u'tstamp': 1411406566774, u'type': u'info'}
[2014-09-22 10:23:57,413] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.5.2.226', u'code': 0, u'text': u'Janitor cleanup of "standard_bucket_1" failed after failover of \'ns_1@10.3.2.244\': {error,\n {badmatch,\n {error,\n {failed_nodes,\n [\'ns_1@10.3.2.250\']}}}}', u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:46.774Z', u'module': u'ns_rebalancer', u'tstamp': 1411406566774, u'type': u'critical'}
[2014-09-22 10:23:57,414] - [rest_client:2034] ERROR - {u'node': u'ns_1@10.3.2.244', u'code': 0, u'text': u'Shutting down bucket "standard_bucket_1" on \'ns_1@10.3.2.244\' for deletion', u'shortText': u'message', u'serverTime': u'2014-09-22T10:22:18.619Z', u'module': u'ns_memcached', u'tstamp': 1411406538619, u'type': u'info'}
ERROR
[('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('lib/tasks/taskmanager.py', 31, 'run', 'task.step(self)'), ('lib/tasks/task.py', 58, 'step', 'self.check(task_manager)'), ('lib/tasks/task.py', 370, 'check', 'self.set_exception(ex)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]

The test failed after this. The issue is seen on Windows only.

[Test Steps]
1. Set up XDCR between two 3-node clusters.
2. Bidirectional CAPI-mode replication between the buckets: default, sasl_bucket_1, standard_bucket_1.
3. Load 10K items into each bucket on both the source and destination clusters.
4. Fail over a non-master node on the destination, then add back the same node (REST sketch below). -> failed here
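For reference, the following is a minimal sketch (Python, with placeholder credentials and the node addresses from the log above) of the failover / add-back / rebalance REST sequence the test harness drives; it mirrors the logged parameters and is not the actual test framework code.

import requests

BASE = "http://10.5.2.226:8091"          # cluster-admin node (placeholder)
AUTH = ("Administrator", "password")     # placeholder credentials

def failover(otp_node):
    # Hard failover of one node, e.g. otp_node="ns_1@10.3.2.244"
    requests.post(f"{BASE}/controller/failOver", auth=AUTH,
                  data={"otpNode": otp_node}).raise_for_status()

def add_back(otp_node):
    # Mark the failed-over node to be added back on the next rebalance
    requests.post(f"{BASE}/controller/reAddNode", auth=AUTH,
                  data={"otpNode": otp_node}).raise_for_status()

def rebalance(known_nodes):
    # Same parameters as the "rebalance params" line in the log above
    requests.post(f"{BASE}/controller/rebalance", auth=AUTH,
                  data={"knownNodes": ",".join(known_nodes), "ejectedNodes": ""}).raise_for_status()

failover("ns_1@10.3.2.244")
add_back("ns_1@10.3.2.244")
rebalance(["ns_1@10.5.2.226", "ns_1@10.3.2.250", "ns_1@10.3.2.244"])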

 Comments   
Comment by Sangharsh Agarwal [ 23/Sep/14 ]
Rebalance exited with reason {not_all_nodes_are_ready_yet,['ns_1@10.3.2.250']} -> although this node was neither failed over nor added back.

Comment by Sangharsh Agarwal [ 23/Sep/14 ]
Seen some crashes on 10.3.2.250:

[error_logger:info,2014-09-22T10:23:01.068,ns_1@10.3.2.250:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================INFO REPORT=========================
                      SSL: Socket error: econnaborted

[error_logger:error,2014-09-22T10:23:02.568,ns_1@10.3.2.250:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
  crasher:
    initial call: mochiweb_acceptor:init/3
    pid: <0.27022.15>
    registered_name: []
    exception error: no match of right hand side value {error,closed}
      in function mochiweb_http:request/2 (c:/Jenkins/workspace/cs_301_win6408/couchbase/couchdb/src/mochiweb/mochiweb_http.erl, line 54)
    ancestors: [https,ns_ssl_services_sup,menelaus_sup,ns_server_sup,
                  ns_server_cluster_sup,<0.57.0>]
    messages: [{ssl_closed,
                      {sslsocket,
                          {gen_tcp,#Port<0.62932>,tls_connection},
                          <0.27098.15>}}]
    links: [<0.19444.13>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 6782
  neighbours:





[MB-12226] [3.0.1] Rebalance operation hung during online upgrade Created: 23/Sep/14  Updated: 23/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Sangharsh Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Online upgrade from 2.0.1-170 to 3.0.1-1309.

CentOS 5 64 bit.

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: 10.3.121.199 : https://s3.amazonaws.com/bugdb/jira/MB-12226/2fde8bb1/10.3.121.199-8091-diag.txt.gz
10.3.121.199 : https://s3.amazonaws.com/bugdb/jira/MB-12226/80625d16/10.3.121.199-9222014-518-diag.zip
10.3.3.126 : https://s3.amazonaws.com/bugdb/jira/MB-12226/ce3a5cf4/10.3.3.126-8091-diag.txt.gz
10.3.3.126 : https://s3.amazonaws.com/bugdb/jira/MB-12226/d6a72dec/10.3.3.126-9222014-58-diag.zip
10.3.5.11 : https://s3.amazonaws.com/bugdb/jira/MB-12226/7deec852/10.3.5.11-8091-diag.txt.gz
10.3.5.11 : https://s3.amazonaws.com/bugdb/jira/MB-12226/f1015de1/10.3.5.11-9222014-511-diag.zip
10.3.5.60 : https://s3.amazonaws.com/bugdb/jira/MB-12226/2b6552e4/10.3.5.60-9222014-516-diag.zip
10.3.5.60 : https://s3.amazonaws.com/bugdb/jira/MB-12226/33fcedfb/10.3.5.60-8091-diag.txt.gz
10.3.5.61 : https://s3.amazonaws.com/bugdb/jira/MB-12226/2cfdfd22/10.3.5.61-9222014-514-diag.zip
10.3.5.61 : https://s3.amazonaws.com/bugdb/jira/MB-12226/6bdd00ae/10.3.5.61-8091-diag.txt.gz
Is this a Regression?: Unknown

 Description   
[Live Cluster]
http://10.3.121.199:8091/index.html#sec=servers

Rebalance progress has been stuck at 66.6% since yesterday.

[Test Logs]
https://friendpaste.com/1mF2hXiYmVjpqoTHIGMd3F

[Error Logs]
2014-09-22 04:56:44,289 - root - INFO - adding remote node @10.3.3.126:8091 to this cluster @10.3.121.199:8091
2014-09-22 04:56:47,986 - root - INFO - adding node 10.3.5.11:8091 to cluster
2014-09-22 04:56:47,986 - root - INFO - adding remote node @10.3.5.11:8091 to this cluster @10.3.121.199:8091
2014-09-22 04:56:51,696 - root - INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.121.199%2Cns_1%4010.3.3.126%2Cns_1%4010.3.5.11
2014-09-22 04:56:51,706 - root - INFO - rebalance operation started
2014-09-22 04:56:51,717 - root - INFO - rebalance percentage : 0.00 %
2014-09-22 04:57:01,735 - root - INFO - rebalance percentage : 6.51 %
2014-09-22 04:57:11,753 - root - INFO - rebalance percentage : 15.47 %
2014-09-22 04:57:21,770 - root - INFO - rebalance percentage : 24.84 %
2014-09-22 04:57:31,796 - root - INFO - rebalance percentage : 31.64 %
2014-09-22 04:57:41,814 - root - INFO - rebalance percentage : 38.55 %
2014-09-22 04:57:51,837 - root - INFO - rebalance percentage : 47.71 %
2014-09-22 04:58:01,859 - root - INFO - rebalance percentage : 57.72 %
2014-09-22 04:58:11,877 - root - INFO - rebalance percentage : 64.93 %
2014-09-22 04:58:21,894 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:58:31,913 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:58:41,931 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:58:51,949 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:01,966 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:11,983 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:22,000 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:32,017 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:42,034 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:52,052 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:02,069 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:12,086 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:22,103 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:32,120 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:42,137 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:52,154 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:01:02,172 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:01:12,189 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:01:22,206 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:01:32,223 - root - INFO - rebalance percentage : 66.65 %

[Test Steps]
1. Set up a 2-2 node XDCR cluster, CAPI mode. Node version: 2.0.1-170-rel.

Source: 10.3.3.126, 10.3.5.11
Destination: 10.3.5.60, 10.3.5.61

2. Do online upgrade on Source:
    a. Add a new node (10.3.121.199) with 3.0.1-1309-rel; check that the new node becomes the orchestrator.
    b. Remove both old nodes from the cluster.
    c. Reinstall both old nodes with 3.0.1-1309-rel.
    d. Re-add both nodes to the cluster. Failed here: the rebalance hangs.


The cluster is live for investigation. The issue is always reproducible on the CentOS cluster with build 3.0.1-1309-rel.
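For reference, a minimal sketch (Python, placeholder credentials) of how the rebalance is started and its progress polled over the REST API, mirroring the logged rebalance params and progress lines above; this is not the test framework's actual code.

import time
import requests

BASE = "http://10.3.121.199:8091"        # new 3.0.1 node (placeholder)
AUTH = ("Administrator", "password")     # placeholder credentials

def start_rebalance(known_nodes, ejected_nodes=()):
    # Same parameters as the logged "rebalance params" line
    resp = requests.post(f"{BASE}/controller/rebalance", auth=AUTH,
                         data={"knownNodes": ",".join(known_nodes),
                               "ejectedNodes": ",".join(ejected_nodes)})
    resp.raise_for_status()

def wait_for_rebalance(poll_secs=10):
    # Poll until ns_server reports no rebalance running; a hang shows up here
    # as the same progress value being reported indefinitely.
    while True:
        status = requests.get(f"{BASE}/pools/default/rebalanceProgress", auth=AUTH).json()
        if status.get("status") == "none":
            return
        print("rebalance running:", status)
        time.sleep(poll_secs)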

 Comments   
Comment by Sangharsh Agarwal [ 23/Sep/14 ]
http://10.3.121.199:8091/index.html#sec=servers




[MB-12225] Seq iterator returns duplicated docs that have different seq numbers Created: 23/Sep/14  Updated: 23/Sep/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: bug-backlog
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Jung-Sang Ahn Assignee: Jung-Sang Ahn
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When we create an iterator for sequence numbers, ForestDB builds an AVL-tree to index the documents in the WAL, while the other documents in the sequence B+tree are accessed through the B+tree's iterator.

For each next operation, ForestDB's iterator checks whether a document returned from the sequence B+tree is up to date by traversing the AVL-tree. If the same document exists in the AVL-tree, the document from the sequence B+tree is discarded.

The problem is that if we create an iterator over a specific range, only the WAL documents corresponding to that range are indexed in the AVL-tree. This can cause the duplication check to fail.

For example, suppose that there are 10 documents, and we insert and update the documents as follows:

Insert doc#1 -> seq num: 1
Insert doc#2 -> seq num: 2
...
Insert doc#10 -> seq num: 10
Update doc #2 -> seq num: 11
Update doc #4 -> seq num: 12
Update doc #6 -> seq num: 13
Update doc #8 -> seq num: 14
Update doc #10 -> seq num: 15

In this case, if we create an iterator over seq number range 1~12, the correct iteration sequence results are as follows:

doc #1 (seq num 1)
doc #3 (seq num 3)
doc #5 (seq num 5)
doc #7 (seq num 7)
doc #9 (seq num 9)
doc #2 (seq num 11)
doc #4 (seq num 12)

However, since the AVL-tree only contains doc#2 and doc#4, the duplication checks for doc#6, #8, and #10 fail, so the actual results are as follows:

doc #1 (seq num 1)
doc #3 (seq num 3)
doc #5 (seq num 5)
doc #6 (seq num 6)
doc #7 (seq num 7)
doc #8 (seq num 8)
doc #9 (seq num 9)
doc #10 (seq num 10)
doc #2 (seq num 11)
doc #4 (seq num 12)

To solve this problem, we should index all WAL documents into the AVL-tree even if they are out of the range.
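The following is a minimal Python sketch (hypothetical names, not ForestDB code) of the merge/duplicate-check logic described above; it reproduces the example and shows why WAL entries outside the requested range still need to be indexed.

# btree holds (seq, key) pairs already flushed; wal holds the newest,
# not-yet-flushed (seq, key) updates.
def seq_iterate(btree, wal, lo, hi, index_full_wal):
    # The bug: indexing only the WAL entries inside [lo, hi] makes the
    # duplicate check miss docs whose newest update falls outside the range.
    wal_entries = wal if index_full_wal else [(s, k) for s, k in wal if lo <= s <= hi]
    wal_keys = {k for _, k in wal_entries}

    results = []
    # B+tree side: skip any entry whose key has a newer version indexed from the WAL.
    for seq, key in sorted(btree):
        if lo <= seq <= hi and key not in wal_keys:
            results.append((seq, key))
    # WAL side: return the in-range entries.
    for seq, key in sorted(wal_entries):
        if lo <= seq <= hi:
            results.append((seq, key))
    return sorted(results)

# Example from the description: docs 1..10 inserted (seq 1..10),
# docs 2, 4, 6, 8, 10 updated (seq 11..15).
btree = [(s, "doc#%d" % s) for s in range(1, 11)]
wal = [(11, "doc#2"), (12, "doc#4"), (13, "doc#6"), (14, "doc#8"), (15, "doc#10")]

print(seq_iterate(btree, wal, 1, 12, index_full_wal=False))  # stale doc#6/8/10 leak through
print(seq_iterate(btree, wal, 1, 12, index_full_wal=True))   # correct result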





[MB-12224] Active vBuckets on one server drop to zero when increasing replica count from 1 to 2 Created: 22/Sep/14  Updated: 22/Sep/14  Resolved: 22/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Anil Kumar Assignee: Aleksey Kondratenko
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Scenario - Increasing the replica count from 1 to 2

1. 5-node cluster, single bucket 'default'
2. In the bucket edit settings, change the replica count from 1 to 2 and Save
3. Rebalance operation
4. Monitoring Stats - vBucket Resources shows that, while the rebalance is in progress, the active vBucket count drops below 1024 (attached screenshots)

Expectation - Increasing the number of copies of the data should not affect the active vBuckets.

What we are seeing is that some active vBuckets were unavailable during the rebalance.

 Comments   
Comment by Aleksey Kondratenko [ 22/Sep/14 ]
This is likely simply due to inconsistency of stats between nodes.

We need logs to diagnose this. Giving us access to machines is not as useful because we don't plan to deal with this bug soon.




[MB-12223] Test Automation Advancements for Sherlock Created: 22/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: sherlock
Security Level: Public

Type: Improvement Priority: Major
Reporter: Raju Suravarjjala Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
placeholder for all the test automation work




[MB-12222] Duplicate existing cluster management ui using angularjs Created: 22/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: sherlock
Fix Version/s: techdebt-backlog, sherlock
Security Level: Public

Type: Story Priority: Major
Reporter: Aleksey Kondratenko Assignee: Pavel Blagodov
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We're having difficulties maintaining the current cells.js-based UI. To make our code base more accessible to a wider audience of JS developers, we're rewriting the UI's JavaScript using the hugely popular AngularJS.




[MB-12221] N1QL should return version information Created: 22/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Cihan Biyikoglu Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
N1QL should have a version() function that returns version information. This will be useful when behavioral changes are introduced, so that apps can issue queries tuned to specific N1QL versions.

if n1ql_version()=1.0 query='...' else if n1ql_version()=2.0 query='+++'
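A minimal sketch of the application-side branching this would enable, assuming the proposed version()/n1ql_version() function exists (the query strings below are placeholders):

# Hypothetical: choose a query string based on the N1QL version reported
# by the proposed version() function.
def pick_query(n1ql_version):
    if n1ql_version.startswith("1."):
        return "SELECT name FROM default"            # form tuned for 1.x behavior
    if n1ql_version.startswith("2."):
        return "SELECT d.name FROM default AS d"     # form tuned for 2.x behavior
    raise ValueError("unsupported N1QL version: %s" % n1ql_version)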






[MB-12220] Add unique id generation functions to n1ql Created: 22/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: sherlock
Security Level: Public

Type: Improvement Priority: Major
Reporter: Cihan Biyikoglu Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Add a unique id generation function to N1QL:
new_uuid()
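For context, a minimal sketch of the client-side workaround applications use today (Python's uuid module); the proposed new_uuid() would move this into the query engine:

import uuid

# Generate the document id in the application instead of in N1QL.
doc_id = str(uuid.uuid4())   # e.g. '0f8fad5b-d9cb-469f-a165-70867728950e'
print(doc_id)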




[MB-12219] HINTs for N1QL to suggest index selection and execution path Created: 22/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: sherlock
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Ability to specify hints for index selection and execution path.
Index selection scenario: multiple predicates in the WHERE and ORDER BY clauses where it isn't obvious from stats which of many indexes to pick. The user gets to suggest one to N1QL.
Execution path scenarios: the type of join to apply, or optimizing for fast first results vs. fast total execution, etc.

pointing at sherlock but we can live without this in v1.

 Comments   
Comment by Cihan Biyikoglu [ 22/Sep/14 ]
feel free to push out of the sherlock release if this isn't being done in sherlock.
-cihan




[MB-12218] DGM cluster saw "out of memory" errors from couchstore on vbucket snapshot path Created: 22/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Jim Walker Assignee: Jim Walker
Resolution: Unresolved Votes: 0
Labels: error-handling, memory
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [info] OS Name : Linux 3.2.0-68-virtual
[info] OS Version : Ubuntu 12.04.5 LTS
[info] CB Version : 3.0.0-1209-rel-enterprise

[info] Architecture : x86_64
[info] Virtual Host : Microsoft HyperV
[ok] Installed CPUs : 4
[ok] Installed RAM : 28140 MB
[ok] Used RAM : 69.9% (19658 / 28139 MB)

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: Some memcached.log files from cbase-43

http://customers.couchbase.com.s3.amazonaws.com/jimw/cbase-43-memcached.log.5.txt
http://customers.couchbase.com.s3.amazonaws.com/jimw/cbase-43-memcached.log.4.txt
Is this a Regression?: Unknown

 Description   
Raising this defect after looking at a large DGM cluster that had a stalled rebalance. It looks like some failures in couchstore (memory issues) led to memcached termination and a stall of the rebalance, whereas maybe the error could have been handled and ejection performed?

The cluster is a 4-node "large" scale cluster hosted in Azure. Cihan provided me access via a private key, which I would rather people request from Cihan rather than me spreading the key around :) At the moment the cluster is stuck, and there is historical logging data on a number of nodes indicating memory errors were caught but led to termination and, I suspect, the stall.

The tail end of the following file shows memory problems are detected and logged:
 
* http://customers.couchbase.com.s3.amazonaws.com/jimw/cbase-43-memcached.log.4.txt

Starting at 10:31 we see the following pattern.

Sat Sep 13 10:31:31.375401 UTC 3: (b1_full_ejection) Warning: couchstore_open_db failed, name=/data/couchbase/b1_full_ejection/1020.couch.1 option=1 rev=1 error=failed to allocate buffer [errno = 12: 'Cannot allocate memory']
Sat Sep 13 10:31:31.375461 UTC 3: (b1_full_ejection) Warning: failed to open database, name=/data/couchbase/b1_full_ejection/1020.couch.1020
Sat Sep 13 10:31:31.375474 UTC 3: (b1_full_ejection) Warning: failed to set new state, active, for vbucket 1020
Sat Sep 13 10:31:31.375398 UTC 3: (b1_full_ejection) Warning: couchstore_open_db failed, name= option=1 rev=1 error=failed to allocate buffer []
Sat Sep 13 10:31:31.375481 UTC 3: (b1_full_ejection) VBucket snapshot task failed!!! Rescheduling

And finally the file ends with:

Sat Sep 13 10:31:31.577731 UTC 3: (b1_full_ejection) nonio_worker_9: Exception caught in task "Checkpoint Remover on vb 189": std::bad_alloc

Next version of memcached.log is the following file which indicates that memcached was restarted:

* http://customers.couchbase.com.s3.amazonaws.com/jimw/cbase-43-memcached.log.5.txt

Sat Sep 13 10:32:29.783313 UTC 3: (b1_full_ejection) Trying to connect to mccouch: "127.0.0.1:11213"
Sat Sep 13 10:32:29.787504 UTC 3: (b1_full_ejection) Connected to mccouch: "127.0.0.1:11213"
Sat Sep 13 10:32:29.797130 UTC 3: (No Engine) Bucket b1_full_ejection registered with low priority
Sat Sep 13 10:32:29.797244 UTC 3: (No Engine) Spawning 4 readers, 4 writers, 1 auxIO, 1 nonIO threads
Sat Sep 13 10:32:30.100791 UTC 3: (b1_full_ejection) metadata loaded in 301 ms

cbcollect logs from 3 of 4 nodes (/tmp is tiny on node 41), which may be useful but don't include the historical data from the live node shown above:

http://customers.couchbase.com.s3.amazonaws.com/jimw/cbbase-43.zip
http://customers.couchbase.com.s3.amazonaws.com/jimw/cbbase-42.zip
http://customers.couchbase.com.s3.amazonaws.com/jimw/cbbase-40.zip

 Comments   
Comment by Jim Walker [ 22/Sep/14 ]
I'll take this unless there's an obvious dup or something already in the pipeline.




[MB-12217] Wrong parameter order in xdcr debug message Created: 22/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Chris Malarky Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The following message:

[xdcr:debug,2014-09-19T12:29:31.255,ns_1@ec2-xxx-xxx-xxx.compute-1.amazonaws.com:<0.25112.24>:concurrency_throttle:handle_call:88]no token available (total tokens:<0.25337.24>), put (pid:32, signal: start_replication, targetnode: "ec2-yyy-yyy-yyy-yyy.us-west-2.compute. amazonaws.com:8092") into waiting pool (active reps: 32, waiting reps: 305)

Is generated by:

http://src.couchbase.org/source/xref/2.5.1/ns_server/src/concurrency_throttle.erl#88

The parameters Pid and TotalTokens on line 90 need to be swapped around.
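For illustration only (Python rather than the actual Erlang), this is the class of bug being reported: the arguments are supplied in the wrong order for the format string, which is why the message above shows a pid where the token count should be and vice versa.

import logging
logging.basicConfig(level=logging.DEBUG)

total_tokens = 32            # a count
pid = "<0.25337.24>"         # an Erlang-style process id, shown here as a string

# Buggy: arguments swapped relative to the format string.
logging.debug("no token available (total tokens:%s), put (pid:%s) into waiting pool",
              pid, total_tokens)

# Fixed: arguments match the format string.
logging.debug("no token available (total tokens:%s), put (pid:%s) into waiting pool",
              total_tokens, pid)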




[MB-12216] XDCR@next release - simplified end-to-end test with kvfeed, router and xmem Created: 19/Sep/14  Updated: 19/Sep/14

Status: In Progress
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: feature-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: sprint1_xdcr
Remaining Estimate: 24h
Time Spent: Not Specified
Original Estimate: 24h

Epic Link: XDCR next release




[MB-12215] Monitor open files or file descriptors via REST and create an alert at a certain threshold Created: 19/Sep/14  Updated: 19/Sep/14

Status: Open
Project: Couchbase Server
Component/s: RESTful-APIs
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Larry Liu Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to

 Description   
Can we have a feature to monitor open files or file descriptors via REST and create an alert at a certain threshold?
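For illustration, a minimal sketch of the kind of threshold check being requested, implemented client-side against /proc on Linux (the REST endpoint itself does not exist yet; names and threshold are placeholders):

import os

def open_fd_count(pid):
    # Number of file descriptors currently open by the given process.
    return len(os.listdir("/proc/%d/fd" % pid))

def check_fd_alert(pid, threshold):
    count = open_fd_count(pid)
    if count >= threshold:
        print("ALERT: process %d has %d open file descriptors (>= %d)" % (pid, count, threshold))
        return True
    return False

# Example: check_fd_alert(memcached_pid, threshold=10000)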




[MB-12214] Move Sqoop provider to DCP Created: 19/Sep/14  Updated: 19/Sep/14

Status: Open
Project: Couchbase Server
Component/s: DCP
Affects Version/s: 3.0
Fix Version/s: sherlock
Security Level: Public

Type: Improvement Priority: Major
Reporter: Cihan Biyikoglu Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: tracking item for moving sqoop over to DCP.





[MB-12213] Get the couchbase-server_src.tar.gz for 3.0.0 Created: 18/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Wayne Siu Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-12212] Update AMI for Sync Gateway to 1.0.2 Created: 18/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.2.0
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Jessica Liu Assignee: Wei-Li Liu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Sync Gateway was updated to 1.0.2. The AMIs related to Sync Gateway need to be updated including:

Sync Gateway Enterprise, only: https://aws.amazon.com/marketplace/pp/B00M28SG0E/ref=sp_mpg_product_title?ie=UTF8&sr=0-2

Couchbase Server + Sync Gateway Community: https://aws.amazon.com/marketplace/pp/B00FA8DO50/ref=sp_mpg_product_title?ie=UTF8&sr=0-5




[MB-12211] Investigate noop not closing connection in case where a dead connection is still attached to a failed node Created: 18/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
See MB-12158 for information on how to reproduce this issue and why it needs to be looked at on the ep-engine side.




[MB-12210] xdcr related services sometimes log debug and error messages to non-xdcr logs (was: XDCR Error Logging Improvement) Created: 18/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 2.5.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Chris Malarky Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: logging
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
When debugging an XDCR issue, some very useful information was in ns_server.error.log but not in ns_server.xdcr_errors.log.

ns_server.xdcr_errors.log:

[xdcr:error,2014-09-18T7:02:12.674,ns_1@ec2-XX-XX-XX-XX.compute-1.amazonaws.com:<0.8020.1657>:xdc_vbucket_rep:init_replication_state:496]Error in fetching remot bucket, error: timeout,sleep for 30 secs before retry.
[xdcr:error,2014-09-18T7:02:12.674,ns_1@ec2-XX-XX-XX-XX.compute-1.amazonaws.com:<0.8021.1657>:xdc_vbucket_rep:init_replication_state:503]Error in fetching remot bucket, error: all_nodes_failed, msg: <<"Failed to grab remote bucket `wi_backup_bucket_` from any of known nodes">>sleep for 30 secs before retry

ns_server.error.log:

[ns_server:error,2014-09-18T7:02:12.674,ns_1@ec2-XX-XX-XX-XX.compute-1.amazonaws.com:<0.8022.1657>:remote_clusters_info: do_mk_json_get:1460]Request to http://Administrator:****@10.x.x.x:8091/pools failed:
{error,rest_error,
       <<"Error connect_timeout happened during REST call get to http://10.x.x.x:8091/pools.">>,
       {error,connect_timeout}}
[ns_server:error,2014-09-18T7:02:12.674,ns_1@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:remote_clusters_info<0.20250.6>: remote_clusters_info:handle_info:435]Failed to grab remote bucket `wi_backup_bucket_`: {error,rest_error,
                                                   <<"Error connect_timeout happened during REST call get to http://10.x.x.x:8091/pools.">>,
                                                   {error,connect_timeout}}

Is there any way these messages could appear in the xdcr_errors.log?

 Comments   
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
Yes. Valid request. And some of that but not all has been addressed in 3.0.
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
Good candidate for 3.0.1 but not necessarily important enough. I.e. in light of ongoing rewrite.




[MB-12209] [windows] failed to offline upgrade from 2.5.x to 3.0.1-1299 Created: 18/Sep/14  Updated: 19/Sep/14  Resolved: 19/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Thuan Nguyen Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows server 2008 r2 64-bit

Attachments: Zip Archive 12.11.10.145-9182014-1010-diag.zip     Zip Archive 12.11.10.145-9182014-922-diag.zip    
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Install couchbase server 2.5.1 on one node
Create default bucket
Load 1000 items to bucket
Offline upgrade from 2.5.1 to 3.0.1-1299
After the upgrade, the node was reset to the initial setup state


 Comments   
Comment by Thuan Nguyen [ 18/Sep/14 ]
I got the same issue when offline upgrade from 2.5.0 to 3.0.1-1299. Updated the title
Comment by Thuan Nguyen [ 18/Sep/14 ]
cbcollectinfo of node failed to offline upgrade from 2.5.0 to 3.0.1-1299
Comment by Bin Cui [ 18/Sep/14 ]
http://review.couchbase.org/#/c/41473/




[MB-12208] Security Risk: XDCR logs emit entire Document contents in error situations Created: 17/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.2.0, 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Critical
Reporter: Gokul Krishnan Assignee: Don Pinto
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Per recent discussions with the CFO and contract teams, we need to ensure that customer data (document keys and values) is not emitted in the logs. This poses a security risk, and we need default logging throttle levels that do not emit document data in readable format.

The support team has noticed this in version 2.2; we are verifying whether this behavior still exists in 2.5.1.

Example posted in a private comment below

 Comments   
Comment by Patrick Varley [ 18/Sep/14 ]
At the same time we need the ability to increase the log level on the fly and include this information, for when we hit a wall and need that extra detail.

To summarise:

Default setting: do not expose customer data.

Ability to increase logging on the fly, which might include customer data; the support team will explain this to the end user.
Comment by Cihan Biyikoglu [ 22/Sep/14 ]
lets triage for 3.0.1




[MB-12207] Related links could be clearer. Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I think it would be better if the "Related links" section at the bottom of the page were laid out a little differently, and we added the ability to navigate (MB-12205) from the bottom of a page (think long pages).

Maybe something like this could work:

Links

Parent Topic:
    Installation and upgrade
Previous Topic:
    Welcome to couchbase
Next Topic:
    uninstalling couchbase
Related Topics:
    Initial server setup
    Testing Couchbase Server
    Upgrading




[MB-12206] New 3.0 Doc Site, View and query pattern samples unparsed markup Created: 17/Sep/14  Updated: 17/Sep/14  Resolved: 17/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Ian McCloy Assignee: Ruth Harris
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
On the page

http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Views/views-querySample.html

The view code examples under 'General advice' are not displayed properly.

 Comments   
Comment by Ruth Harris [ 17/Sep/14 ]
Fixed. Legacy formatting issues from previous source code.




[MB-12205] Doc-system: does not have a next page button. Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0-Beta
Security Level: Public

Type: Bug Priority: Major
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When reading a manual you normally want to go to the next page. It would be good to have a "next" button at the bottom of the page. Here is a good example:

http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Views/views-operation.html




[MB-12204] New doc-system does not have anchors Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The support team uses anchors all the time to link customers directly to the section that has the information they require.

I know that we have broken a number of sections out into their own pages, but there are still some long pages, for example:

http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Misc/security-client-ssl.html


It would be good if we could link the customer directly to: "Configuring the PHP client for SSL"

I have marked this as a blocker as it will affect the way the support team works today.




[MB-12203] Available-stats table formatted incorrectly Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Task Priority: Minor
Reporter: Patrick Varley Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/couchbase-manual-2.5/cb-cli/#available-stats


 Description   
See the pending_ops cell in the link below.

http://docs.couchbase.com/couchbase-manual-2.5/cb-cli/#available-stats

I believe "client connections blocked for operations in pending vbuckets" should all be in one cell.




[MB-12202] UI shows a cbrestore as XDCR ops Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Ian McCloy Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [info] OS Name : Linux 3.13.0-30-generic
[info] OS Version : Ubuntu 14.04 LTS
[info] CB Version : 2.5.1-1083-rel-enterprise

Attachments: PNG File cbrestoreXDCRops.png    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I noticed, while doing a cbrestore of a backup on a cluster that doesn't have any XDCR configured, that the stats in the UI showed ongoing ops for XDCR (screenshot attached).

The stats code at
http://src.couchbase.org/source/xref/2.5.1/ns_server/src/stats_collector.erl#334 counts all set-with-meta operations as XDCR ops.

 Comments   
Comment by Aleksey Kondratenko [ 17/Sep/14 ]
That's the way it is. We have no way to distinguish sources of set-with-metas.




[MB-12201] Hotfix Rollup Release Created: 16/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Cihan Biyikoglu Assignee: Raju Suravarjjala
Resolution: Unresolved Votes: 0
Labels: hotfix
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: No

 Description   
This represents the rollup hotfix for 2.5.1 that includes all hotfixes released to date (September 2014), excluding the V8 change.

 Comments   
Comment by Dipti Borkar [ 16/Sep/14 ]
Is this rollup still 2.5.1? It will create lots of confusion. Can we tag it 2.5.2? Or does that lead to another round of testing? There are way too many hotfixes, so we really need a new dot release.
Comment by Cihan Biyikoglu [ 17/Sep/14 ]
Hi Dipti, to improve hotfix management we are changing the way we do hotfixes. The rollup will bring more hotfixes together and ensure we provide customers all the fixes we know about. If we have already fixed an issue at the time you request your hotfix, there is no reason to risk exposing you to known and already-fixed issues in the version you are using. A side effect of this should also be an easier life for support.
-cihan




[MB-12200] Seg fault during indexing on view-toy build testing Created: 16/Sep/14  Updated: 17/Sep/14  Resolved: 17/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Ketaki Gangal Assignee: Harsha Havanur
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: -3.0.0-700-hhs-toy
-Cen 64 Machines
- 7 Node cluster, 2 Buckets, 2 Views

Attachments: Zip Archive 10.6.2.168-9162014-106-diag.zip     Zip Archive 10.6.2.187-9162014-1010-diag.zip     File crash_beam.smp.rtf     File crash_toybuild.rtf    
Issue Links:
Duplicate
is duplicated by MB-11917 One node slow probably due to the Erl... Open
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
1. Load 70M and 100M items into the two buckets
2. Wait for initial indexing to complete
3. Start updates on the cluster: 1K gets, 7K sets across the cluster

Seeing numerous cores from beam.smp.

Stack is attached.

Adding logs from the nodes.


 Comments   
Comment by Sriram Melkote [ 16/Sep/14 ]
Harsha, this appears to clearly be a NIF related regression. We need to discuss why our own testing didn't find this after you figure out the problem.
Comment by Volker Mische [ 16/Sep/14 ]
Siri, I haven't checked if it's the same issue, but the current patch doesn't pass our unit tests. See my comment at http://review.couchbase.org/41221
Comment by Ketaki Gangal [ 16/Sep/14 ]
Logs https://s3.amazonaws.com/bugdb/bug-12200/bug_12200.tar
Comment by Harsha Havanur [ 17/Sep/14 ]
The issue Volker mentioned is one of queue size. I suspect that if a context stays in the queue beyond 5 seconds, the terminator loop destroys the context, and when the doMapDoc loop dequeues the task it results in a SEGV because the ctx has already been destroyed. Trying a fix that both increases the queue size and handles destroyed contexts.
Comment by Sriram Melkote [ 17/Sep/14 ]
Folks, let's follow this on MB-11917 as it's clear now that this bug is caused by the toy build as a result of proposed fix for MB-11917.




[MB-12199] curl -H arguments need to use double quotes Created: 16/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Matt Ingenthron Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Current documentation states:

Indicates that an HTTP PUT operation is requested.
-H 'Content-Type: application/json'

And that will fail, seemingly owing to the single quotes. See also:
https://twitter.com/RamSharp/status/511739806528077824


 Comments   
Comment by Ruth Harris [ 16/Sep/14 ]
TASK for TECHNICAL WRITER
Fix in 3.0 == FIXED: Added single quotes or removed quotes from around the http string in appropriate examples.
Design Doc rest file - added single quotes, Compaction rest file ok, Trbl design doc file ok

FIX in 2.5: TBD

-----------------------

CONCLUSION:
At least with PUT, both single and double quotes work around: Content-Type: application/json. Didn't check GET or DELETE.
With PUT and DELETE, no quotes and single quotes around the http string work. Note: Some of the examples are missing a single quote around the http string. Meaning, one quote is present, but either the ending or beginning quote is missing. Didn't check GET.

Perhaps a missing single quote around the http string was the problem?
Perhaps there were formatting tags associated with ZlatRam's byauth.ddoc code that were causing the problem?

----------------------

TEST ONE:
1. create a ddoc and view from the UI = testview and testddoc
2. retrieve the ddoc using GET
3. use single quotes around Content-Type: application/json and around the http string. Note: Some of the examples are missing single quotes around the http string.
code: curl -X GET -H 'Content-Type: application/json' 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_testddoc'
results: {
    "views": {
        "testview": {
            "map": "function (doc, meta) {\n emit(meta.id, null);\n}"
        }
    }
}

TEST TWO:
1. delete testddoc
2. use single quotes around Content-Type: application/json and around the http string
code: curl -X DELETE -H 'Content-Type: application/json' 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_testddoc'
results: {"ok":true,"id":"_design/dev_testddoc"}
visual check via UI: Yep, it's gone


TEST THREE:
1. create a myauth.ddoc text file using the code in the Couchbase design doc documentation page.
2. Use PUT to create a dev_myauth design doc
3. use single quotes around Content-Type: application/json and around the http string. Note: I used "| python -m json.tool" to get pretty print output

myauth.ddoc contents: {"views":{"byloc":{"map":"function (doc, meta) {\n if (meta.type == \"json\") {\n emit(doc.city, doc.sales);\n } else {\n emit([\"blob\"]);\n }\n}"}}}
code: curl -X PUT -H 'Content-Type: application/json' 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_myauth' -d @myauth.ddoc | python -m json.tool
results: {
    "id": "_design/dev_myauth",
    "ok": true
}
visual check via UI: Yep, it's there.

TEST FOUR:
1. copy myauth.ddoc to zlat.ddoc
2. Use PUT to create a dev_zlat design doc
3. use double quotes around Content-Type: application/json and single quotes around the http string.

zlat.ddoc contents: {"views":{"byloc":{"map":"function (doc, meta) {\n if (meta.type == \"json\") {\n emit(doc.city, doc.sales);\n } else {\n emit([\"blob\"]);\n }\n}"}}}
code: curl -X PUT -H "Content-Type: application/json" 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_zlat' -d @zlat.ddoc | python -m json.tool
results: {
    "id": "_design/dev_zlat",
    "ok": true
}
visual check via UI: Yep, it's there.


TEST FIVE:
1. create a ddoc text file using ZlatRam's ddoc code
2. flattened the formatting so it reflected the code in the Couchbase example (used above)
3. Use PUT and single quotes.

zlatram contents: {"views":{"byauth":{"map":"function (doc, username) {\n if (doc.type == \"session\" && doc.user == username && Date.Parse(doc.expires) > Date.Parse(Date.Now()) ) {\n emit(doc.token, null);\n }\n}"}}}
code: curl -X PUT -H 'Content-Type: application/json' 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_zlatram' -d @zlatram.ddoc | python -m json.tool
results: {
    "id": "_design/dev_zlatram",
    "ok": true
}
visual check via UI: Yep, it's there.

TEST SIX:
1. delete zlatram ddoc but without quotes around the http string: curl -X DELETE -H 'Content-Type: application/json' http://Administrator:password@10.5.2.54:8092/test/_design/dev_zlatram
2. results: {
    "id": "_design/dev_zlatram",
    "ok": true
}
3. verify via UI: Yep, it's gone
4. add zlatram but without quotes around the http string: curl -X PUT -H 'Content-Type: application/json' http://Administrator:password@10.5.2.54:8092/test/_design/dev_zlatram
5. results: {
    "id": "_design/dev_zlatram",
    "ok": true
}
6. verify via UI: Yep, it's back.




[MB-12197] Bucket deletion failing with error 500 reason: unknown {"_":"Bucket deletion not yet complete, but will continue."} Created: 16/Sep/14  Updated: 22/Sep/14  Resolved: 22/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Meenakshi Goel Assignee: Meenakshi Goel
Resolution: Fixed Votes: 0
Labels: windows, windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.1-1299-rel

Attachments: Text File test.txt    
Triage: Triaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Jenkins Ref Link:
http://qa.hq.northscale.net/job/win_2008_x64--14_01--replica_read-P0/32/consoleFull
http://qa.hq.northscale.net/job/win_2008_x64--59--01--bucket_flush-P1/14/console
http://qa.hq.northscale.net/job/win_2008_x64--59_01--warmup-P1/6/consoleFull

Test to Reproduce:
newmemcapable.GetrTests.getr_test,nodes_init=4,GROUP=P0,expiration=60,wait_expiration=true,error=Not found for vbucket,descr=#simple getr replica_count=1 expiration=60 flags = 0 docs_ops=create cluster ops = None
flush.bucketflush.BucketFlushTests.bucketflush,items=20000,nodes_in=3,GROUP=P0

*Note that this test doesn't fail, but subsequent tests fail with "error 400 reason: unknown ["Prepare join failed. Node is already part of cluster."]" because cleanup wasn't successful.

Logs:
[rebalance:error,2014-09-15T9:36:01.989,ns_1@10.3.121.182:<0.6938.0>:ns_rebalancer:do_wait_buckets_shutdown:307]Failed to wait deletion of some buckets on some nodes: [{'ns_1@10.3.121.182',
                                                         {'EXIT',
                                                          {old_buckets_shutdown_wait_failed,
                                                           ["default"]}}}]

[error_logger:error,2014-09-15T9:36:01.989,ns_1@10.3.121.182:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
  crasher:
    initial call: erlang:apply/2
    pid: <0.6938.0>
    registered_name: []
    exception exit: {buckets_shutdown_wait_failed,
                        [{'ns_1@10.3.121.182',
                             {'EXIT',
                                 {old_buckets_shutdown_wait_failed,
                                     ["default"]}}}]}
      in function ns_rebalancer:do_wait_buckets_shutdown/1 (src/ns_rebalancer.erl, line 308)
      in call from ns_rebalancer:rebalance/5 (src/ns_rebalancer.erl, line 361)
    ancestors: [<0.811.0>,mb_master_sup,mb_master,ns_server_sup,
                  ns_server_cluster_sup,<0.57.0>]
    messages: []
    links: [<0.811.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 46422
    stack_size: 27
    reductions: 5472
  neighbours:

[user:info,2014-09-15T9:36:01.989,ns_1@10.3.121.182:<0.811.0>:ns_orchestrator:handle_info:483]Rebalance exited with reason {buckets_shutdown_wait_failed,
                              [{'ns_1@10.3.121.182',
                                {'EXIT',
                                 {old_buckets_shutdown_wait_failed,
                                  ["default"]}}}]}
[ns_server:error,2014-09-15T9:36:09.645,ns_1@10.3.121.182:ns_memcached-default<0.4908.0>:ns_memcached:terminate:798]Failed to delete bucket "default": {error,{badmatch,{error,closed}}}

Uploading Logs

 Comments   
Comment by Meenakshi Goel [ 16/Sep/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12197/11dd43ca/10.3.121.182-9152014-938-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/e7795065/10.3.121.183-9152014-940-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/6442301b/10.3.121.102-9152014-942-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/10edf209/10.3.121.107-9152014-943-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/9f16f503/10.1.2.66-9152014-945-diag.zip
Comment by Ketaki Gangal [ 16/Sep/14 ]
Assigning to ns_server team for a first look.
Comment by Aleksey Kondratenko [ 16/Sep/14 ]
For cases like this it's very useful to get sample of backtraces from memcached on bad node. Is it still running ?
Comment by Aleksey Kondratenko [ 16/Sep/14 ]
Eh. It's windows....
Comment by Aleksey Kondratenko [ 17/Sep/14 ]
I've merged diagnostics commit (http://review.couchbase.org/41463). Please rerun, reproduce and give me new set of logs.
Comment by Meenakshi Goel [ 18/Sep/14 ]
Tested with 3.0.1-1307-rel, Please find logs below.
https://s3.amazonaws.com/bugdb/jira/MB-12197/c2191900/10.3.121.182-9172014-2245-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/28bc4a83/10.3.121.183-9172014-2246-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/8f1efbe5/10.3.121.102-9172014-2248-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/91a89d6a/10.3.121.107-9172014-2249-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/2d272074/10.1.2.66-9172014-2251-diag.zip
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
BTW I am indeed quite interested if this is specific to windows or not.
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
This continues to be superweird. Possibly another erlang bug. I need somebody to answer the following:

* can we reliably reproduce this on windows ?

* 100 % of the time ?

* if not (roughly) how often?

* can we reproduce this (at all) on GNU/Linux? How frequently?
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
No need to diagnose it any further. Thanks to Aliaksey we managed to understand this case and fix is going to be merged shortly.
Comment by Venu Uppalapati [ 18/Sep/14 ]
Here is my empirical observation for this issue:
1)I have the following inside a .bat script

C:\"Program Files"\Couchbase\Server\bin\couchbase-cli.exe bucket-delete -c 127.0.0.1:8091 --bucket=default -u Administrator -p password

C:\"Program Files"\Couchbase\Server\bin\couchbase-cli.exe rebalance -c 127.0.0.1:8091 --server-remove=172.23.106.180 -u Administrator -p password

2)I execute this script against a two node cluster with default bucket created, but with no data.

3)I see bucket deletion and rebalance fail in succession. This happened 4 times out of 4 trials.
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
http://review.couchbase.org/41474
Comment by Meenakshi Goel [ 19/Sep/14 ]
Tested with 3.0.1-1309-rel and no longer seeing the issue.
http://qa.hq.northscale.net/job/win_2008_x64--14_01--replica_read-P0/34/console




[MB-12196] [Windows] When I run cbworkloadgen.exe, I see a Warning message Created: 15/Sep/14  Updated: 19/Sep/14  Resolved: 19/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7
Build 1299

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Install 3.0.1_1299 build
Go to the bin directory under the installation directory and run cbworkloadgen.exe
You will see the following warning:
WARNING:root:could not import snappy module. Compress/uncompress function will be skipped.

Expected behavior: The above warning should not appear
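For context, a minimal sketch (an assumption about the general pattern, not the actual fix in the linked review) of an optional snappy import that degrades quietly when the C extension is missing:

import logging

try:
    import snappy                     # optional C extension for compress/uncompress
    HAVE_SNAPPY = True
except ImportError:
    snappy = None
    HAVE_SNAPPY = False
    # Log at debug level so normal CLI runs stay quiet.
    logging.debug("snappy module not available; compress/uncompress will be skipped")

def maybe_compress(data):
    # Compress when snappy is available, otherwise pass the data through.
    return snappy.compress(data) if HAVE_SNAPPY else data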


 Comments   
Comment by Bin Cui [ 19/Sep/14 ]
http://review.couchbase.org/#/c/41514/




[MB-12195] Update notifications does not seem to be working Created: 15/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Ian McCloy
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 5.8
2.5.0

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I have installed 2.5.0 build and enabled Update Notifications
Even though I enabled "Enable software Update Notifications", I keep getting "No Updates available"
I thought I would be notified in the UI that 2.5.1 is available.

I have consulted Tony to see if I have done something wrong but he also confirmed that this seems to be an issue and is a bug

 Comments   
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
Based on dev tools, we're getting "no new version" from phone home requests. So it's not a UI bug.
Comment by Ian McCloy [ 17/Sep/14 ]
Added the missing available upgrade paths to the database,

2.5.0-1059-rel-enterprise -> 2.5.1-1083-rel-enterprise
2.2.0-837-rel-enterprise -> 2.5.1-1083-rel-enterprise
2.1.0-718-rel-enterprise -> 2.2.0-837-rel-enterprise

but it looks like the code that parses http://ph.couchbase.net/v2?callback=jQueryxxx isn't checking the database.
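For reference, the phone-home response can be inspected outside the UI with a short script that strips the JSONP wrapper (an illustrative sketch; the callback name "cb" is arbitrary and the response format is assumed from the URL above):

import json
import urllib.request

# Fetch the phone-home payload and strip the JSONP wrapper "cb(...)".
url = "http://ph.couchbase.net/v2?callback=cb"
raw = urllib.request.urlopen(url).read().decode("utf-8")
payload = raw[raw.index("(") + 1 : raw.rindex(")")]
print(json.dumps(json.loads(payload), indent=2))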




[MB-12194] [Windows] When you try to uninstall CB server it comes up with Installer wizard instead of uninstall Created: 15/Sep/14  Updated: 15/Sep/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7
Build: 3.0.1_1299

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Install Windows 3.0.1_1299 build
Try to uninstall the CB server
You will see the CB InstallShield Installation Wizard, and then it comes up with a prompt about removing the selected application and all of its features.

Expected result: It would be nice to come up with an Uninstall Wizard instead of the confusing Installation Wizard.




[MB-12193] Docs should explicitly state that we don't support online downgrades in the installation guide Created: 15/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Gokul Krishnan Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
In the installation guide, we should call out the fact that online downgrades (from 3.0 to 2.5.1) aren't supported and that downgrades will require servers to be taken offline.

 Comments   
Comment by Ruth Harris [ 15/Sep/14 ]
In the 3.0 documentation:

Upgrading >
<note type="important">Online downgrades from 3.0 to 2.5.1 are not supported. Downgrades require that servers be taken offline.</note>

Should this be in the release notes too?
Comment by Matt Ingenthron [ 15/Sep/14 ]
"online" or "any"?
Comment by Ruth Harris [ 18/Sep/14 ]
Talked to Raju (QE), and online downgrades are not supported at all. This is not a behavior change and is not appropriate for the core documentation. Removed the note from the upgrading section. Please advise whether this should be explicitly stated for all downgrades.

--Ruth




[MB-12192] XDCR : After warmup, replica items are not deleted in destination cluster Created: 15/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, DCP
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Sriram Ganesan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 6.x, 3.0.1-1297-rel

Attachments: Zip Archive 172.23.106.45-9152014-1553-diag.zip     GZip Archive 172.23.106.45-9152014-1623-couch.tar.gz     Zip Archive 172.23.106.46-9152014-1555-diag.zip     GZip Archive 172.23.106.46-9152014-1624-couch.tar.gz     Zip Archive 172.23.106.47-9152014-1558-diag.zip     GZip Archive 172.23.106.47-9152014-1624-couch.tar.gz     Zip Archive 172.23.106.48-9152014-160-diag.zip     GZip Archive 172.23.106.48-9152014-1624-couch.tar.gz    
Triage: Untriaged
Is this a Regression?: Yes

 Description   
Steps
--------
1. Set up uni-XDCR between 2 clusters with at least 2 nodes
2. Load 5000 items onto 3 buckets at the source; they get replicated to the destination
3. Reboot a non-master node on the destination (in this test, .48)
4. After warmup, perform 30% updates and 30% deletes on the source cluster
5. Deletes get propagated to active vbuckets on the destination, but replica vbuckets only experience partial deletion.

Important note
--------------------
This test had passed on 3.0.0-1208-rel and 3.0.0-1209-rel. However, I'm able to reproduce this consistently on 3.0.1. Unsure if this is a recent regression.

2014-09-15 14:43:50 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', sasl_bucket_1 bucket
2014-09-15 14:43:51 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', standard_bucket_1 bucket
2014-09-15 14:43:51 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', default bucket

Testcase
------------
./testrunner -i /tmp/bixdcr.ini -t xdcr.pauseResumeXDCR.PauseResumeTest.replication_with_pause_and_resume,reboot=dest_node,items=5000,rdirection=unidirection,replication_type=xmem,standard_buckets=1,sasl_buckets=1,pause=source,doc-ops=update-delete,doc-ops-dest=update-delete

On destination cluster
-----------------------------

Arunas-MacBook-Pro:bin apiravi$ ./cbvdiff 172.23.106.47:11210,172.23.106.48:11210
VBucket 512: active count 4 != 6 replica count

VBucket 513: active count 2 != 4 replica count

VBucket 514: active count 8 != 11 replica count

VBucket 515: active count 3 != 4 replica count

VBucket 516: active count 8 != 10 replica count

VBucket 517: active count 5 != 6 replica count

VBucket 521: active count 0 != 1 replica count

VBucket 522: active count 7 != 11 replica count

VBucket 523: active count 3 != 5 replica count

VBucket 524: active count 6 != 10 replica count

VBucket 525: active count 4 != 6 replica count

VBucket 526: active count 4 != 6 replica count

VBucket 528: active count 7 != 10 replica count

VBucket 529: active count 3 != 4 replica count

VBucket 530: active count 3 != 4 replica count

VBucket 532: active count 0 != 2 replica count

VBucket 533: active count 1 != 2 replica count

VBucket 534: active count 8 != 10 replica count

VBucket 535: active count 5 != 6 replica count

VBucket 536: active count 7 != 11 replica count

VBucket 537: active count 3 != 5 replica count

VBucket 540: active count 3 != 4 replica count

VBucket 542: active count 6 != 10 replica count

VBucket 543: active count 4 != 6 replica count

VBucket 544: active count 6 != 10 replica count

VBucket 545: active count 3 != 4 replica count

VBucket 547: active count 0 != 1 replica count

VBucket 548: active count 6 != 7 replica count

VBucket 550: active count 7 != 10 replica count

VBucket 551: active count 4 != 5 replica count

VBucket 552: active count 9 != 11 replica count

VBucket 553: active count 4 != 6 replica count

VBucket 554: active count 4 != 5 replica count

VBucket 555: active count 1 != 2 replica count

VBucket 558: active count 7 != 10 replica count

VBucket 559: active count 3 != 4 replica count

VBucket 562: active count 6 != 10 replica count

VBucket 563: active count 4 != 5 replica count

VBucket 564: active count 7 != 10 replica count

VBucket 565: active count 4 != 5 replica count

VBucket 566: active count 4 != 5 replica count

VBucket 568: active count 3 != 4 replica count

VBucket 570: active count 8 != 10 replica count

VBucket 571: active count 4 != 6 replica count

VBucket 572: active count 7 != 10 replica count

VBucket 573: active count 3 != 4 replica count

VBucket 574: active count 0 != 1 replica count

VBucket 575: active count 0 != 1 replica count

VBucket 578: active count 8 != 10 replica count

VBucket 579: active count 4 != 6 replica count

VBucket 580: active count 8 != 11 replica count

VBucket 581: active count 3 != 4 replica count

VBucket 582: active count 3 != 4 replica count

VBucket 583: active count 1 != 2 replica count

VBucket 584: active count 3 != 4 replica count

VBucket 586: active count 6 != 10 replica count

VBucket 587: active count 3 != 4 replica count

VBucket 588: active count 7 != 10 replica count

VBucket 589: active count 4 != 5 replica count

VBucket 591: active count 0 != 2 replica count

VBucket 592: active count 8 != 10 replica count

VBucket 593: active count 4 != 6 replica count

VBucket 594: active count 0 != 1 replica count

VBucket 595: active count 0 != 1 replica count

VBucket 596: active count 4 != 6 replica count

VBucket 598: active count 7 != 10 replica count

VBucket 599: active count 3 != 4 replica count

VBucket 600: active count 6 != 10 replica count

VBucket 601: active count 3 != 4 replica count

VBucket 602: active count 4 != 6 replica count

VBucket 606: active count 7 != 10 replica count

VBucket 607: active count 4 != 5 replica count

VBucket 608: active count 7 != 11 replica count

VBucket 609: active count 3 != 5 replica count

VBucket 610: active count 3 != 4 replica count

VBucket 613: active count 0 != 1 replica count

VBucket 614: active count 6 != 10 replica count

VBucket 615: active count 4 != 6 replica count

VBucket 616: active count 7 != 10 replica count

VBucket 617: active count 3 != 4 replica count

VBucket 620: active count 3 != 4 replica count

VBucket 621: active count 1 != 2 replica count

VBucket 622: active count 9 != 11 replica count

VBucket 623: active count 5 != 6 replica count

VBucket 624: active count 5 != 6 replica count

VBucket 626: active count 7 != 11 replica count

VBucket 627: active count 3 != 5 replica count

VBucket 628: active count 6 != 10 replica count

VBucket 629: active count 4 != 6 replica count

VBucket 632: active count 0 != 1 replica count

VBucket 633: active count 0 != 1 replica count

VBucket 634: active count 7 != 10 replica count

VBucket 635: active count 3 != 4 replica count

VBucket 636: active count 8 != 10 replica count

VBucket 637: active count 5 != 6 replica count

VBucket 638: active count 5 != 6 replica count

VBucket 640: active count 2 != 4 replica count

VBucket 641: active count 7 != 11 replica count

VBucket 643: active count 5 != 7 replica count

VBucket 646: active count 3 != 5 replica count

VBucket 647: active count 7 != 10 replica count

VBucket 648: active count 4 != 6 replica count

VBucket 649: active count 8 != 10 replica count

VBucket 651: active count 0 != 1 replica count

VBucket 653: active count 4 != 6 replica count

VBucket 654: active count 3 != 4 replica count

VBucket 655: active count 7 != 10 replica count

VBucket 657: active count 4 != 5 replica count

VBucket 658: active count 2 != 4 replica count

VBucket 659: active count 7 != 11 replica count

VBucket 660: active count 3 != 5 replica count

VBucket 661: active count 7 != 10 replica count

VBucket 662: active count 0 != 2 replica count

VBucket 666: active count 4 != 6 replica count

VBucket 667: active count 8 != 10 replica count

VBucket 668: active count 3 != 4 replica count

VBucket 669: active count 7 != 10 replica count

VBucket 670: active count 1 != 2 replica count

VBucket 671: active count 2 != 3 replica count

VBucket 673: active count 0 != 1 replica count

VBucket 674: active count 3 != 4 replica count

VBucket 675: active count 7 != 10 replica count

VBucket 676: active count 5 != 6 replica count

VBucket 677: active count 8 != 10 replica count

VBucket 679: active count 5 != 6 replica count

VBucket 681: active count 6 != 7 replica count

VBucket 682: active count 3 != 5 replica count

VBucket 683: active count 8 != 12 replica count

VBucket 684: active count 3 != 6 replica count

VBucket 685: active count 7 != 11 replica count

VBucket 688: active count 3 != 4 replica count

VBucket 689: active count 7 != 10 replica count

VBucket 692: active count 1 != 2 replica count

VBucket 693: active count 2 != 3 replica count

VBucket 694: active count 5 != 6 replica count

VBucket 695: active count 8 != 10 replica count

VBucket 696: active count 3 != 5 replica count

VBucket 697: active count 8 != 12 replica count

VBucket 699: active count 4 != 5 replica count

VBucket 700: active count 0 != 1 replica count

VBucket 702: active count 3 != 6 replica count

VBucket 703: active count 7 != 11 replica count

VBucket 704: active count 3 != 5 replica count

VBucket 705: active count 8 != 12 replica count

VBucket 709: active count 4 != 5 replica count

VBucket 710: active count 3 != 6 replica count

VBucket 711: active count 7 != 11 replica count

VBucket 712: active count 3 != 4 replica count

VBucket 713: active count 7 != 10 replica count

VBucket 715: active count 3 != 4 replica count

VBucket 716: active count 1 != 2 replica count

VBucket 717: active count 0 != 2 replica count

VBucket 718: active count 5 != 6 replica count

VBucket 719: active count 8 != 10 replica count

VBucket 720: active count 0 != 1 replica count

VBucket 722: active count 3 != 5 replica count

VBucket 723: active count 8 != 12 replica count

VBucket 724: active count 3 != 6 replica count

VBucket 725: active count 7 != 11 replica count

VBucket 727: active count 5 != 7 replica count

VBucket 728: active count 2 != 4 replica count

VBucket 729: active count 3 != 5 replica count

VBucket 730: active count 3 != 4 replica count

VBucket 731: active count 7 != 10 replica count

VBucket 732: active count 5 != 6 replica count

VBucket 733: active count 8 != 10 replica count

VBucket 737: active count 3 != 4 replica count

VBucket 738: active count 4 != 6 replica count

VBucket 739: active count 8 != 10 replica count

VBucket 740: active count 3 != 4 replica count

VBucket 741: active count 7 != 10 replica count

VBucket 743: active count 0 != 1 replica count

VBucket 746: active count 2 != 4 replica count

VBucket 747: active count 7 != 11 replica count

VBucket 748: active count 3 != 5 replica count

VBucket 749: active count 7 != 10 replica count

VBucket 751: active count 3 != 4 replica count

VBucket 752: active count 4 != 6 replica count

VBucket 753: active count 9 != 11 replica count

VBucket 754: active count 1 != 2 replica count

VBucket 755: active count 4 != 5 replica count

VBucket 758: active count 3 != 4 replica count

VBucket 759: active count 7 != 10 replica count

VBucket 760: active count 2 != 4 replica count

VBucket 761: active count 7 != 11 replica count

VBucket 762: active count 0 != 1 replica count

VBucket 765: active count 6 != 7 replica count

VBucket 766: active count 3 != 5 replica count

VBucket 767: active count 7 != 10 replica count

VBucket 770: active count 3 != 5 replica count

VBucket 771: active count 7 != 11 replica count

VBucket 772: active count 4 != 6 replica count

VBucket 773: active count 6 != 10 replica count

VBucket 775: active count 3 != 4 replica count

VBucket 777: active count 3 != 4 replica count

VBucket 778: active count 3 != 4 replica count

VBucket 779: active count 7 != 10 replica count

VBucket 780: active count 5 != 6 replica count

VBucket 781: active count 8 != 10 replica count

VBucket 782: active count 1 != 2 replica count

VBucket 783: active count 0 != 2 replica count

VBucket 784: active count 3 != 5 replica count

VBucket 785: active count 7 != 11 replica count

VBucket 786: active count 0 != 1 replica count

VBucket 789: active count 4 != 6 replica count

VBucket 790: active count 4 != 6 replica count

VBucket 791: active count 6 != 10 replica count

VBucket 792: active count 3 != 4 replica count

VBucket 793: active count 8 != 11 replica count

VBucket 794: active count 2 != 4 replica count

VBucket 795: active count 4 != 6 replica count

VBucket 798: active count 5 != 6 replica count

VBucket 799: active count 8 != 10 replica count

VBucket 800: active count 4 != 6 replica count

VBucket 801: active count 8 != 10 replica count

VBucket 803: active count 3 != 4 replica count

VBucket 804: active count 0 != 1 replica count

VBucket 805: active count 0 != 1 replica count

VBucket 806: active count 3 != 4 replica count

VBucket 807: active count 7 != 10 replica count

VBucket 808: active count 3 != 4 replica count

VBucket 809: active count 6 != 10 replica count

VBucket 813: active count 4 != 5 replica count

VBucket 814: active count 4 != 5 replica count

VBucket 815: active count 7 != 10 replica count

VBucket 816: active count 1 != 2 replica count

VBucket 817: active count 4 != 5 replica count

VBucket 818: active count 4 != 6 replica count

VBucket 819: active count 8 != 10 replica count

VBucket 820: active count 3 != 4 replica count

VBucket 821: active count 7 != 10 replica count

VBucket 824: active count 0 != 1 replica count

VBucket 826: active count 3 != 4 replica count

VBucket 827: active count 6 != 10 replica count

VBucket 828: active count 4 != 5 replica count

VBucket 829: active count 7 != 10 replica count

VBucket 831: active count 6 != 7 replica count

VBucket 833: active count 4 != 6 replica count

VBucket 834: active count 3 != 4 replica count

VBucket 835: active count 6 != 10 replica count

VBucket 836: active count 4 != 5 replica count

VBucket 837: active count 7 != 10 replica count

VBucket 840: active count 0 != 1 replica count

VBucket 841: active count 0 != 1 replica count

VBucket 842: active count 4 != 6 replica count

VBucket 843: active count 8 != 10 replica count

VBucket 844: active count 3 != 4 replica count

VBucket 845: active count 7 != 10 replica count

VBucket 847: active count 4 != 6 replica count

VBucket 848: active count 3 != 4 replica count

VBucket 849: active count 6 != 10 replica count

VBucket 851: active count 3 != 4 replica count

VBucket 852: active count 0 != 2 replica count

VBucket 854: active count 4 != 5 replica count

VBucket 855: active count 7 != 10 replica count

VBucket 856: active count 4 != 6 replica count

VBucket 857: active count 8 != 10 replica count

VBucket 860: active count 1 != 2 replica count

VBucket 861: active count 3 != 4 replica count

VBucket 862: active count 3 != 4 replica count

VBucket 863: active count 8 != 11 replica count

VBucket 864: active count 3 != 4 replica count

VBucket 865: active count 7 != 10 replica count

VBucket 866: active count 0 != 1 replica count

VBucket 867: active count 0 != 1 replica count

VBucket 869: active count 5 != 6 replica count

VBucket 870: active count 5 != 6 replica count

VBucket 871: active count 8 != 10 replica count

VBucket 872: active count 3 != 5 replica count

VBucket 873: active count 7 != 11 replica count

VBucket 875: active count 5 != 6 replica count

VBucket 878: active count 4 != 6 replica count

VBucket 879: active count 6 != 10 replica count

VBucket 882: active count 3 != 4 replica count

VBucket 883: active count 7 != 10 replica count

VBucket 884: active count 5 != 6 replica count

VBucket 885: active count 9 != 11 replica count

VBucket 886: active count 1 != 2 replica count

VBucket 887: active count 3 != 4 replica count

VBucket 889: active count 3 != 4 replica count

VBucket 890: active count 3 != 5 replica count

VBucket 891: active count 7 != 11 replica count

VBucket 892: active count 4 != 6 replica count

VBucket 893: active count 6 != 10 replica count

VBucket 894: active count 0 != 1 replica count

VBucket 896: active count 8 != 10 replica count

VBucket 897: active count 4 != 6 replica count

VBucket 900: active count 2 != 3 replica count

VBucket 901: active count 2 != 3 replica count

VBucket 902: active count 7 != 10 replica count

VBucket 903: active count 3 != 4 replica count

VBucket 904: active count 7 != 11 replica count

VBucket 905: active count 2 != 4 replica count

VBucket 906: active count 4 != 5 replica count

VBucket 909: active count 0 != 2 replica count

VBucket 910: active count 7 != 10 replica count

VBucket 911: active count 3 != 5 replica count

VBucket 912: active count 0 != 1 replica count

VBucket 914: active count 8 != 10 replica count

VBucket 915: active count 4 != 6 replica count

VBucket 916: active count 7 != 10 replica count

VBucket 917: active count 3 != 4 replica count

VBucket 918: active count 4 != 6 replica count

VBucket 920: active count 5 != 7 replica count

VBucket 922: active count 7 != 11 replica count

VBucket 923: active count 2 != 4 replica count

VBucket 924: active count 7 != 10 replica count

VBucket 925: active count 3 != 5 replica count

VBucket 928: active count 4 != 5 replica count

VBucket 930: active count 8 != 12 replica count

VBucket 931: active count 3 != 5 replica count

VBucket 932: active count 7 != 11 replica count

VBucket 933: active count 3 != 6 replica count

VBucket 935: active count 0 != 1 replica count

VBucket 938: active count 7 != 10 replica count

VBucket 939: active count 3 != 4 replica count

VBucket 940: active count 8 != 10 replica count

VBucket 941: active count 5 != 6 replica count

VBucket 942: active count 2 != 3 replica count

VBucket 943: active count 1 != 2 replica count

VBucket 944: active count 8 != 12 replica count

VBucket 945: active count 3 != 5 replica count

VBucket 946: active count 6 != 7 replica count

VBucket 950: active count 7 != 11 replica count

VBucket 951: active count 3 != 6 replica count

VBucket 952: active count 7 != 10 replica count

VBucket 953: active count 3 != 4 replica count

VBucket 954: active count 0 != 1 replica count

VBucket 956: active count 5 != 6 replica count

VBucket 958: active count 8 != 10 replica count

VBucket 959: active count 5 != 6 replica count

VBucket 960: active count 7 != 10 replica count

VBucket 961: active count 3 != 4 replica count

VBucket 962: active count 3 != 5 replica count

VBucket 963: active count 2 != 4 replica count

VBucket 966: active count 8 != 10 replica count

VBucket 967: active count 5 != 6 replica count

VBucket 968: active count 8 != 12 replica count

VBucket 969: active count 3 != 5 replica count

VBucket 971: active count 0 != 1 replica count

VBucket 972: active count 5 != 7 replica count

VBucket 974: active count 7 != 11 replica count

VBucket 975: active count 3 != 6 replica count

VBucket 976: active count 3 != 4 replica count

VBucket 978: active count 7 != 10 replica count

VBucket 979: active count 3 != 4 replica count

VBucket 980: active count 8 != 10 replica count

VBucket 981: active count 5 != 6 replica count

VBucket 982: active count 0 != 2 replica count

VBucket 983: active count 1 != 2 replica count

VBucket 986: active count 8 != 12 replica count

VBucket 987: active count 3 != 5 replica count

VBucket 988: active count 7 != 11 replica count

VBucket 989: active count 3 != 6 replica count

VBucket 990: active count 4 != 5 replica count

VBucket 993: active count 0 != 1 replica count

VBucket 994: active count 7 != 11 replica count

VBucket 995: active count 2 != 4 replica count

VBucket 996: active count 7 != 10 replica count

VBucket 997: active count 3 != 5 replica count

VBucket 998: active count 5 != 6 replica count

VBucket 1000: active count 4 != 5 replica count

VBucket 1001: active count 1 != 2 replica count

VBucket 1002: active count 9 != 11 replica count

VBucket 1003: active count 4 != 6 replica count

VBucket 1004: active count 7 != 10 replica count

VBucket 1005: active count 3 != 4 replica count

VBucket 1008: active count 7 != 11 replica count

VBucket 1009: active count 2 != 4 replica count

VBucket 1012: active count 4 != 5 replica count

VBucket 1014: active count 7 != 10 replica count

VBucket 1015: active count 3 != 5 replica count

VBucket 1016: active count 8 != 10 replica count

VBucket 1017: active count 4 != 6 replica count

VBucket 1018: active count 3 != 4 replica count

VBucket 1020: active count 0 != 1 replica count

VBucket 1022: active count 7 != 10 replica count

VBucket 1023: active count 3 != 4 replica count

Active item count = 3500

Same at source
----------------------
Arunas-MacBook-Pro:bin apiravi$ ./cbvdiff 172.23.106.45:11210,172.23.106.46:11210
Active item count = 3500

Will attach cbcollect and data files.


 Comments   
Comment by Mike Wiederhold [ 15/Sep/14 ]
This is not a bug. We no longer do this because a replica vbucket cannot delete items on its own due to DCP.
Comment by Aruna Piravi [ 15/Sep/14 ]
I do not understand why this is not a bug. This is a case where replica items = 4250 and active = 3500. Both were initially 5000 before warmup. However, 50% of the actual deletes have happened on the replica bucket (5000 -> 4250), so I would expect the other 750 items to be deleted too, so that active = replica. If this is not a bug, then in case of failover the cluster will end up having more items than it did before the failover.
Comment by Aruna Piravi [ 15/Sep/14 ]
> We no longer do this because a replica vbucket cannot delete items on its own due to DCP
Then I would expect the deletes to be propagated from active vbuckets through DCP, but these never get propagated. If you do a cbvdiff even now, you can see the mismatch.
Comment by Sriram Ganesan [ 17/Sep/14 ]
Aruna

If there is a testrunner script available for steps (1) - (5), please update the bug. Thanks.
Comment by Aruna Piravi [ 17/Sep/14 ]
Done.
Comment by Aruna Piravi [ 19/Sep/14 ]
On 3 runs on 3.0.1-1309 in the same environment where I was able to consistently reproduce this until build 1307, I do not see this mismatch. I'm not sure if any recent check-in helped. It seems to me a tricky case that is visible in some builds but not others. In any case, I request that we look at the logs from cases where we have reproduced this problem to ascertain the cause. Thanks.
Comment by Aruna Piravi [ 22/Sep/14 ]
Not seeing this in the most recent build (3.0.1-1313) either. Reducing severity. Will resolve once the cause is known.




[MB-12191] forestdb needs an fdb_destroy() api to clean up a db Created: 15/Sep/14  Updated: 15/Sep/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: feature-backlog
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Sundar Sridharan Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged
Is this a Regression?: Unknown

 Description   
forestdb does not have an option to clean up a database.
Manual deletion of the database files after fdb_close() and fdb_shutdown() is the workaround.
An fdb_destroy() option needs to be added that will erase all forestdb files cleanly.
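A minimal sketch of the current workaround (illustrative only, not the proposed fdb_destroy() API); the file-naming pattern for compaction revisions is an assumption and should be adjusted to how the application names its files:

import glob
import os

def destroy_forestdb_files(db_path):
    # Assumed layout: the main database file plus any compaction revisions
    # such as "<db_path>.1", "<db_path>.2", ...
    # Call this only after fdb_close() and fdb_shutdown() have completed.
    for path in glob.glob(glob.escape(db_path) + "*"):
        os.remove(path)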




[MB-12190] Typo in the output of couchbase-cli bucket-flush Created: 15/Sep/14  Updated: 15/Sep/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Patrick Varley Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: cli
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
There should be a space between the full stop and Do.

[patrick:~] 2 $ couchbase-cli bucket-flush -b Test -c localhost
Running this command will totally PURGE database data from disk.Do you really want to do it? (Yes/No)

Another typo when the command times out:

Running this command will totally PURGE database data from disk.Do you really want to do it? (Yes/No)TIMED OUT: command: bucket-flush: localhost:8091, most likely bucket is not flushed





[MB-12189] (misunderstanding) XDCR REST API "max-concurrency" only works for 1 of 3 documented end-points. Created: 15/Sep/14  Updated: 17/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: ns_server, RESTful-APIs
Affects Version/s: 2.5.1, 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Jim Walker Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: supportability, xdcr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Couchbase Server 2.5.1
RHEL 6.4
VM (VirtualBox0
1 node "cluster"

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
This defect relates to the following REST APIs:

* xdcrMaxConcurrentReps (default 32) http://localhost:8091/internalSettings/
* maxConcurrentReps (default 32) http://localhost:8091/settings/replications/
* maxConcurrentReps (default 32) http://localhost:8091/settings/replications/ <replication_id>

The documentation suggests these all do the same thing, but with the scope of change being different.

<docs>
/settings/replications/ — global settings applied to all replications for a cluster
settings/replications/<replication_id> — settings for specific replication for a bucket
/internalSettings - settings applied to all replications for a cluster. Endpoint exists in Couchbase 2.0 and onward.
</docs>

This defect is because only "settings/replications/<replication_id>" has any effect. The other REST endpoints have no effect.

Out of these APIs I can confirm that changing "/settings/replications/<replication_id>" has an effect. The XDCR code shows that the concurrent reps setting feeds into the concurrency throttle as the number of available tokens. I use xdcr log files, where we print the concurrency throttle token data, to observe that the setting has an effect.

For example, a cluster in the default configuration has a total of 32 tokens. We can grep to see this.

[root@localhost logs]# grep "is done normally, total tokens:" xdcr.*
2014-09-15T13:09:03.886,ns_1@127.0.0.1:<0.32370.0>:concurrency_throttle:clean_concurr_throttle_state:275]rep <0.33.1> to node "192.168.69.102:8092" is done normally, total tokens: 32, available tokens: 32,(active reps: 0, waiting reps: 0)

After changing the setting to 42, the log file shows the change taking effect.

curl -u Administrator:password http://localhost:8091/settings/replications/01d38792865ba2d624edb4b2ad2bf07f%2fdefault%2fdefault -d maxConcurrentReps=42

[root@localhost logs]# grep "is done normally, total tokens:" xdcr.*
dcr.1:[xdcr:debug,2014-09-15T13:17:41.112,ns_1@127.0.0.1:<0.32370.0>:concurrency_throttle:clean_concurr_throttle_state:275]rep <0.2321.1> to node "192.168.69.102:8092" is done normally, total tokens: 42, available tokens: 42,(active reps: 0, waiting reps: 0)

Since this defect is that the other two REST end-points don't appear to have any effect, here's an example changing "settings/replications". This example was on a clean cluster, i.e. no other settings have been changed; only bucket and replication creation plus client writes have been performed.

root@localhost logs]# curl -u Administrator:password http://localhost:8091/settings/replications/ -d maxConcurrentReps=48
{"maxConcurrentReps":48,"checkpointInterval":1800,"docBatchSizeKb":2048,"failureRestartInterval":30,"workerBatchSize":500,"connectionTimeout":180,"workerProcesses":4,"httpConnections":20,"retriesPerRequest":2,"optimisticReplicationThreshold":256,"socketOptions":{"keepalive":true,"nodelay":false},"supervisorMaxR":25,"supervisorMaxT":5,"traceDumpInvprob":1000}

The above shows that the JSON response has acknowledged the value of 48, but the log files show no change. After much waiting and re-checking, grep shows no evidence.

[root@localhost logs]# grep "is done normally, total tokens:" xdcr.* | grep "total tokens: 48" | wc -l
0
[root@localhost logs]# grep "is done normally, total tokens:" xdcr.* | grep "total tokens: 32" | wc -l
7713

The same was observed for /internalSettings/

Found on both 2.5.1 and 3.0.
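For completeness, a small sketch of exercising all three scopes from a script, using the endpoints and parameter names listed above (credentials and replication id taken from the examples; this is illustrative, not product code):

import requests

BASE = "http://localhost:8091"
AUTH = ("Administrator", "password")

# Global scope: acknowledged in the JSON response but, per this defect,
# not reflected in the concurrency throttle of the existing replication.
requests.post(BASE + "/settings/replications/", auth=AUTH,
              data={"maxConcurrentReps": 48})

# Per-replication scope: the only endpoint observed to change the running
# total tokens. The replication id is URL-encoded ("/" -> "%2f").
repl_id = "01d38792865ba2d624edb4b2ad2bf07f%2fdefault%2fdefault"
requests.post(BASE + "/settings/replications/" + repl_id, auth=AUTH,
              data={"maxConcurrentReps": 42})

# Cluster-wide internal setting, parameter name as documented above.
requests.post(BASE + "/internalSettings/", auth=AUTH,
              data={"xdcrMaxConcurrentReps": 48})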

 Comments   
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
This is because global settings affect new replications or replications without per-replication settings defined. The UI always defines all per-replication settings.
Comment by Jim Walker [ 16/Sep/14 ]
Have you pushed a documentation update for this?
Comment by Aleksey Kondratenko [ 16/Sep/14 ]
No. I don't own docs.
Comment by Jim Walker [ 17/Sep/14 ]
Then this issue is not resolved.

Closing/resolving this defect with breadcrumbs to the opening of an issue on a different project would suffice as a satisfactory resolution.

You can also very easily put a pull request into docs on github with the correct behaviour.

Can you please perform *one* of those tasks so that the REST API here is correctly documented with the behaviours you are aware of and this matter can be closed.
Comment by Jim Walker [ 17/Sep/14 ]
Resolution requires either:

* Corrected documentation pushed to documentation repository.
* Enough accurate API information placed into a documentation defect so docs-team can correct.





[MB-12188] we should not duplicate log messages if we already have logs with "repeated n times" template Created: 15/Sep/14  Updated: 15/Sep/14  Resolved: 15/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Andrei Baranouski Assignee: Aleksey Kondratenko
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File MB-12188.png    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
please see screenshot,

I think that logs without "repeated n times" are unnecessary.

 Comments   
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
They _are_. The logic (and it's the same logic as many logging products have) is: _if_, in a short period of time (say 5 minutes), you have a bunch of the same messages, it'll log them once. But if the period between messages is larger, then they're logged separately.




[MB-12187] Webinterface is not displaying items above 2.5kb in size Created: 15/Sep/14  Updated: 15/Sep/14  Resolved: 15/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.5.1, 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Philipp Fehre Assignee: Aleksey Kondratenko
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: MacOS, Webinterface

Attachments: PNG File document_size_couchbase.png    

 Description   
When trying to display a document which is above 2.5kb, the web interface will block the display. 2.5kb seems like a really low limit and is easily reached by regular documents, which makes using the web interface inefficient, especially when a bucket contains many documents that are close to this limit.
It makes sense to have a limit so that really big documents don't have to be loaded into the interface, but 2.5kb seems like a really low one.

 Comments   
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
By design. Older browsers have trouble with larger docs. And there must be a duplicate of this somewhere.




[MB-12186] If flush can not be completed because of a timeout, we should not display a message "Failed to flush bucket" when it's still in progress Created: 15/Sep/14  Updated: 17/Sep/14  Resolved: 15/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Andrei Baranouski Assignee: Aleksey Kondratenko
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1208

Attachments: PNG File MB-12186.png    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When I tried to flush a heavily loaded cluster, I received a "Failed To Flush Bucket" popup; in fact it did not fail, but simply had not completed within the set period of time (30 sec)?

Expected behaviour: a message like "flush is not complete, but continuing..."

 Comments   
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
A timeout is a timeout. We can say "it timed out", but we cannot be sure whether it's continuing or not.
Comment by Andrei Baranouski [ 15/Sep/14 ]
Hm, we get a timeout when removing a bucket takes too long, but we inform the user that the removal is still in progress, right?
Comment by Aleksey Kondratenko [ 17/Sep/14 ]
You're right. I don't think we're entirely precise on the bucket deletion timeout message. It's one of our mid-term goals to do better on these longer-running ops and how their progress or results are exposed to the user. I see not much value in tweaking messages. Instead we'll just make this entire thing work "right".




[MB-12185] update to "couchbase" from "membase" in gerrit mirroring and manifests Created: 14/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.0, 2.5.1, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Blocker
Reporter: Matt Ingenthron Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-8297 Some key projects are still hosted at... Open

 Description   
One of the key components of Couchbase is still only at github.com/membase and not at github.com/couchbase. I think it's okay to mirror to both locations (not that there's an advantage), but for sure it should be at couchbase and the manifest for Couchbase Server releases should be pointing to Couchbase.

I believe the steps here are as follows:
- Set up a github.com/couchbase/memcached project (I've done that)
- Update gerrit's commit hook to update that repository
- Change the manifests to start using that repository

Assigning this to build as a component, as gerrit is handled by the build team. Then I'm guessing it'll need to be handed over to Trond or another developer to do the manifest change once gerrit is up to date.

Since memcached is slow changing now, perhaps the third item can be done earlier.

 Comments   
Comment by Chris Hillery [ 15/Sep/14 ]
Actually manifests are owned by build team too so I will do both parts.

However, the manifest for the hopefully-final release candidate already exists, and I'm a teensy bit wary about changing it after the fact. The manifest change may need to wait for 3.0.1.
Comment by Matt Ingenthron [ 15/Sep/14 ]
I'll leave it to you to work out how to fix it, but I'd just point out that manifest files are mutable.
Comment by Chris Hillery [ 15/Sep/14 ]
The manifest we build from is mutable. The historical manifests recording what we have already built really shouldn't be.
Comment by Matt Ingenthron [ 15/Sep/14 ]
True, but they are. :) That was half me calling back to our discussion about tagging and mutability of things in the Mountain View office. I'm sure you remember that late night conversation.

If you can help here Ceej, that'd be great. I'm just trying to make sure we have the cleanest project possible out there on the web. One wart less will bring me to 999,999 or so. :)
Comment by Trond Norbye [ 15/Sep/14 ]
Just a FYI, we've been ramping up the changes to memcached, so it's no longer a slow moving component ;-)
Comment by Matt Ingenthron [ 15/Sep/14 ]
Slow moving w.r.t. 3.0.0 though, right? That means the current github.com/couchbase/memcached probably has the commit planned to be released, so it's low risk to update github.com/couchbase/manifest with the couchbase repo instead of membase.

That's all I meant. :)
Comment by Trond Norbye [ 15/Sep/14 ]
_all_ components should be slow moving with respect to 3.0.0 ;)
Comment by Chris Hillery [ 16/Sep/14 ]
Matt, it appears that couchbase/memcached is a *fork* of membase/memcached, which is probably undesirable. We can actively rename the membase/memcached project to couchbase/memcached, and github will automatically forward requests from the old name to the new so it is seamless. It also means that we don't have to worry about migrating any commits, etc.

Does anything refer to couchbase/memcached already? Could we delete that one outright and then rename membase/memcached instead?
Comment by Matt Ingenthron [ 16/Sep/14 ]
Ah, that would be my fault. I propose deleting the couchbase/memcached and then transferring ownership from membase/memcached to couchbase/memcached. I think that's what you meant by "actively rename", right? Sounds like a great plan.

I think that's all in your hands Ceej, but I'd be glad to help if needed.

I still think in the interest of reducing warts, it'd be good to fix the manifest.
Comment by Chris Hillery [ 16/Sep/14 ]
I will do that (rename the repo), just please confirm explicitly that temporarily deleting couchbase/memcached won't cause the world to end. :)
Comment by Matt Ingenthron [ 16/Sep/14 ]
It won't since it didn't exist until this last Sunday when I created this ticket. If something world-ending happens as a result, I'll call it a bug to have depended on it. ;)
Comment by Chris Hillery [ 18/Sep/14 ]
I deleted couchbase/memcached and then transferred ownership of membase/memcached to couchbase. The original membase/memcached repository had a number of collaborators, most of which I think were historical. For now, couchbase/memcached only has "Owners" and "Robots" listed as collaborators, which is generally the desired configuration.

http://review.couchbase.org/#/c/41470/ proposes changes to the active manifests. I see no problem with committing that.

As for the historical manifests, there are two:

1. Sooner or later we will add a "released/3.0.0.xml" manifest to the couchbase/manifest repository, representing the exact SHAs which were built. I think it's probably OK to retroactively change the remote on that manifest since the two repositories are aliases for each other. This will affect any 3.0.0 hotfixes which are built, etc.

2. However, all of the already-built 3.0 packages (.deb / .rpm / .zip files) have embedded in them the manifest which was used to build them. Those, unfortunately, cannot be changed at this time. Doing so would require re-packaging the deliverables which have already undergone QE validation. While it is technically possible to do so, it would be a great deal of manual work, and IMHO a non-trivial and unnecessary risk. The only safe solution would be to trigger a new build, but in that case I would argue we would need to re-validate the deliverables, which I'm sure is a non-starter for PM. I'm afraid this particular sub-wart will need to wait for 3.0.1 to be fully addressed.
Comment by Matt Ingenthron [ 18/Sep/14 ]
Excellent, thanks Ceej. I think this is a great improvement -- especially if 3.0.0's release manifest no longer references membase.

I'll leave it to the build team to manage, but I might suggest that gerrit and various other things pointing to membase should slowly change as well, in case someone decides someday to cancel the membase organization subscription to github.




[MB-12184] Enable logging to a remote server Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: James Mauss Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
It would be nice to be able to configure Couchbase Server to log events to a remote syslog-ng server or similar.
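To illustrate the requested behaviour in generic terms (a conceptual Python sketch only, not Couchbase Server code; the host name is a placeholder), log records would be forwarded to a remote syslog/syslog-ng listener like this:

import logging
import logging.handlers

# Ship log records to a remote syslog-ng listener over UDP port 514.
handler = logging.handlers.SysLogHandler(address=("syslog.example.com", 514))
logger = logging.getLogger("couchbase-events")
logger.addHandler(handler)
logger.warning("rebalance started")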




[MB-12183] View Query Thruput regression compared with previous and 2.5.1 builds Created: 12/Sep/14  Updated: 12/Sep/14  Resolved: 12/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Thomas Anderson Assignee: Harsha Havanur
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 4xnode cluster; 2xSSD

Issue Links:
Duplicate
duplicates MB-11917 One node slow probably due to the Erl... Open
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/leto/597/artifact/172.23.100.29.zip
http://ci.sc.couchbase.com/job/leto/597/artifact/172.23.100.30.zip
http://ci.sc.couchbase.com/job/leto/597/artifact/172.23.100.31.zip
http://ci.sc.couchbase.com/job/leto/597/artifact/172.23.100.32.zip
Is this a Regression?: Yes

 Description   
query thruput, 1 Bucket, 20Mx2KB, nonDGM, 4x1 views, 500 mutations/sec/node.
performance on 2.5.1 - 2185; on 3.0.0-1205 (RC2) 1599; on 3.0.0-1208 (RC3) 1635; on 3.0.0-1209 (RC4) 331.
92% regression with 2.5.1, 72% regression with 3.0.0-1208 (RC3)

 Comments   
Comment by Sriram Melkote [ 12/Sep/14 ]
Sarath looked at it. Data points:

- First run was fine, second run was slow
http://showfast.sc.couchbase.com/#/runs/query_thr_20M_leto_ssd/3.0.0-1209

- CPU utilization in the second run was much lower on node 31, indicative of scheduler collapse
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1209_0fb_access

So this is a duplicate of MB-11917
Comment by Thomas Anderson [ 12/Sep/14 ]
Reboot of the cluster, rerun with the same parameters. 3.0.0-1209 now shows the same performance as previous 3.0 builds. It is still a 25% regression compared to 2.5.1, but is now a duplicate of MB-11917, assigned to 3.0.1: sporadic Erlang scheduler slowdown on one node in the cluster causing various performance and functional issues.
 
Comment by Thomas Anderson [ 12/Sep/14 ]
Closed as a duplicate of the planned 3.0.1 fix for the Erlang scheduler collapse, MB-11917.




[MB-12182] XDCR@next release - unit test "asynchronize" mode of XmemNozzle Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: feature-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: sprint1_xdcr
Remaining Estimate: 16h
Time Spent: Not Specified
Original Estimate: 16h

Epic Link: XDCR next release




[MB-12181] XDCR@next release - rethink XmemNozzle's configuration parameters Created: 12/Sep/14  Updated: 12/Sep/14  Resolved: 12/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: feature-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Xiaomei Zhang
Resolution: Done Votes: 0
Labels: sprint1_xdcr
Remaining Estimate: 8h
Time Spent: Not Specified
Original Estimate: 8h

Epic Link: XDCR next release

 Description   
Rethink XmemNozzle's configuration parameters. Some of them should be construction-time parameters; some of them are runtime parameters.


 Comments   
Comment by Xiaomei Zhang [ 12/Sep/14 ]
https://github.com/Xiaomei-Zhang/couchbase_goxdcr_impl/commit/44921e06e141f0c9df9cfc4ab43d106643e9b766
https://github.com/Xiaomei-Zhang/couchbase_goxdcr_impl/commit/80a8a059201b9a61bbd1784abef96859670ac233




[MB-12180] Modularize the DCP code Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We need to modularize the DCP code so that we can write unit tests to ensure that we have fewer bugs and fewer regressions from future changes.




[MB-12179] Allow incremental pausable backfills Created: 12/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Task Priority: Major
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Currently ep-engine requires that backfills run from start to end and cannot be paused. This creates a problem for a few reasons. First off, if a user has a large dataset then we will potentially need to backfill a large amount of data from disk and into memory. Without the ability to pause and resume a backfill we cannot control the memory overhead created from reading items off of disk. This can affect the resident ratio if the data that needs to be read by the backfill is large.

A second issue is that this means we can only run one backfill at a time (or two if there are enough CPU cores) and all backfills must run serially. In the future we plan on allowing more DCP connections to be created to a server. If many connections require backfill, we may have some connections that do not receive data for an extended period of time because these connections are waiting for their backfills to be scheduled.
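To illustrate the idea (a conceptual sketch only, not ep-engine's actual design): a backfill that reads in bounded chunks lets a scheduler cap the memory read from disk per connection and interleave many backfills instead of running each one to completion serially.

class PausableBackfill:
    """Conceptual sketch: scan a snapshot in bounded chunks so a scheduler
    can pause between chunks and interleave other backfills."""

    def __init__(self, items, sink, chunk_size=1000):
        self._cursor = iter(items)   # stand-in for a disk scan cursor
        self._sink = sink            # callable pushing one item to the stream
        self._chunk_size = chunk_size
        self.done = False

    def run_chunk(self):
        sent = 0
        for item in self._cursor:
            self._sink(item)
            sent += 1
            if sent >= self._chunk_size:
                return sent          # yield control; resume later from the cursor
        self.done = True
        return sent

# A scheduler would round-robin run_chunk() across connections, bounding the
# data held in memory to roughly one chunk per backfill at a time.
backfill = PausableBackfill(range(5000), sink=lambda item: None, chunk_size=1000)
while not backfill.done:
    backfill.run_chunk()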




[MB-12178] Fix race condition in checkpoint persistence command Created: 12/Sep/14  Updated: 12/Sep/14  Resolved: 12/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Gokul Krishnan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Thread 11 (Thread 0x43fcd940 (LWP 6218)):

#0 0x00000032e620d524 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000032e6208e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x00000032e6208cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaaaaf345ca in Mutex::acquire (this=0x1e79ac48) at src/mutex.cc:79
#4 0x00002aaaaaf595bb in lock (this=0x1e79a880, chkid=7, cookie=0x1d396580) at src/locks.hh:48
#5 LockHolder (this=0x1e79a880, chkid=7, cookie=0x1d396580) at src/locks.hh:26
#6 VBucket::addHighPriorityVBEntry (this=0x1e79a880, chkid=7, cookie=0x1d396580) at src/vbucket.cc:234
#7 0x00002aaaaaf1b580 in EventuallyPersistentEngine::handleCheckpointCmds (this=0x1d494a00, cookie=0x1d396580, req=<value optimized out>,
    response=0x40a390 <binary_response_handler>) at src/ep_engine.cc:3795
#8 0x00002aaaaaf20228 in processUnknownCommand (h=0x1d494a00, cookie=0x1d396580, request=0x1d3d6800, response=0x40a390 <binary_response_handler>) at src/ep_engine.cc:949
#9 0x00002aaaaaf2117c in EvpUnknownCommand (handle=<value optimized out>, cookie=0x1d396580, request=0x1d3d6800, response=0x40a390 <binary_response_handler>)
    at src/ep_engine.cc:1050
#10 0x00002aaaaacc4de4 in bucket_unknown_command (handle=<value optimized out>, cookie=0x1d396580, request=0x1d3d6800, response=0x40a390 <binary_response_handler>)
    at bucket_engine.c:2499
#11 0x00000000004122f7 in process_bin_unknown_packet (c=0x1d396580) at daemon/memcached.c:2911
#12 process_bin_packet (c=0x1d396580) at daemon/memcached.c:3238
#13 complete_nread_binary (c=0x1d396580) at daemon/memcached.c:3805
#14 complete_nread (c=0x1d396580) at daemon/memcached.c:3887
#15 conn_nread (c=0x1d396580) at daemon/memcached.c:5744
#16 0x0000000000406355 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x1d396580) at daemon/memcached.c:6012
#17 0x00002b52b162df3c in event_process_active_single_queue (base=0x1d46ec80, flags=<value optimized out>) at event.c:1308
#18 event_process_active (base=0x1d46ec80, flags=<value optimized out>) at event.c:1375
#19 event_base_loop (base=0x1d46ec80, flags=<value optimized out>) at event.c:1572
#20 0x0000000000414e34 in worker_libevent (arg=<value optimized out>) at daemon/thread.c:301
#21 0x00000032e620673d in start_thread () from /lib64/libpthread.so.0
#22 0x00000032e56d44bd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x4a3d7940 (LWP 6377)):

#0 0x00000032e620d524 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000032e6208e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x00000032e6208cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000415e16 in notify_io_complete (cookie=<value optimized out>, status=ENGINE_SUCCESS) at daemon/thread.c:485
#4 0x00002aaaaaf5a857 in notifyIOComplete (this=0x1e79a880, e=..., chkid=7) at src/ep_engine.h:423
#5 VBucket::notifyCheckpointPersisted (this=0x1e79a880, e=..., chkid=7) at src/vbucket.cc:250
#6 0x00002aaaaaf038fd in EventuallyPersistentStore::flushVBucket (this=0x1d77e000, vbid=109) at src/ep.cc:2033
#7 0x00002aaaaaf2c9e9 in doFlush (this=0x18c70dc0, tid=1046) at src/flusher.cc:222
#8 Flusher::step (this=0x18c70dc0, tid=1046) at src/flusher.cc:152
#9 0x00002aaaaaf36e74 in ExecutorThread::run (this=0x1d4c28c0) at src/scheduler.cc:159
#10 0x00002aaaaaf3746d in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:36
#11 0x00000032e620673d in start_thread () from /lib64/libpthread.so.0
#12 0x00000032e56d44bd in clone () from /lib64/libc.so.6


 Comments   
Comment by Mike Wiederhold [ 12/Sep/14 ]
http://review.couchbase.org/#/c/41363/
Comment by Gokul Krishnan [ 12/Sep/14 ]
Thanks Mike!




[MB-12177] document SDK usage of CA and self-signed certs Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Matt Ingenthron Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt: finish-start
has to be done after MB-12173 SSL certificate should allow importin... Open

 Description   
To be done after Couchbase Server supports this.




[MB-12176] Missing port number on the network ports documentation for 3.0 Created: 12/Sep/14  Updated: 18/Sep/14  Resolved: 18/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Ruth Harris
Resolution: Fixed Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Comments   
Comment by Ruth Harris [ 16/Sep/14 ]
Network Ports section of the Couchbase Server 3.0 beta doc has been updated with the new ssl port, 11207, and the table with the details for all of the ports has been updated.

http://docs.couchbase.com/prebuilt/couchbase-manual-3.0/Install/install-networkPorts.html
The site (and network ports section) should be refreshed soon.

thanks, Ruth




[MB-12175] Need a way to enforce SSL for admin and data access Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Don Pinto
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Today we allow both unencrypted and encrypted communication, and one can use firewalls to control which one stays available for communicating with Couchbase Server. It would be great to have a way to enforce secure communication through a switch and disable any unencrypted access, to help with compliance with security standards.




[MB-12174] Clarification on SSL communication documentation for 3.0 Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown




[MB-12173] SSL certificate should allow importing certs besides server generated certs Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt: finish-start
has to be done before MB-12177 document SDK usage of CA and self-sig... Open
Triage: Untriaged
Is this a Regression?: Unknown

 Comments   
Comment by Matt Ingenthron [ 12/Sep/14 ]
Existing SDKs should be compatible with this, but importing the CA certs will need to be documented.




[MB-12172] UI displays duplicate warnings after graceful failover when >1 replica configured Created: 11/Sep/14  Updated: 11/Sep/14  Resolved: 11/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I set up a bucket with 3 replicas (4 nodes) and performed a graceful failover on one node. The "server nodes" screen now displays both:
Fail Over Warning: Additional active servers required to provide the desired number of replicas!
and
Fail Over Warning: Rebalance recommended, some data does not have the desired replicas configuration!


This seems a bit duplicative, and it is also not the same behavior you see after a graceful failover with only one replica configured.

 Comments   
Comment by Aleksey Kondratenko [ 11/Sep/14 ]
Those are not exact duplicates. One is saying "in order to take advantage of 2 replicas you need at least 3 nodes". And the second is saying "some of your configured replicas are missing and it will be fixed by rebalance".
Comment by Perry Krug [ 11/Sep/14 ]
Okay, I'll let that slide ;)




[MB-12171] Typo missing space on point 4 couchbase data files Created: 11/Sep/14  Updated: 11/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 1.8.0, 2.0.1, 2.1.0, 2.2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Patrick Varley Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: documentation
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-data-files
http://docs.couchbase.com/couchbase-manual-2.1/#couchbase-data-files
http://docs.couchbase.com/couchbase-manual-2.0/#couchbase-data-files
http://docs.couchbase.com/couchbase-manual-1.8/#couchbase-data-files

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Point 4 needs a space between "and" and "monitor". The sentence currently reads:

Start the service again andmonitor the “warmup” of the data.

 Comments   
Comment by Ruth Harris [ 11/Sep/14 ]
Fixed in 2.5. N/A in 3.0




[MB-12170] Memory usage did not go down after flush Created: 10/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Wayne Siu Assignee: Gokul Krishnan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [info] OS Name : Microsoft Windows Server 2008 R2 Enterprise
[info] OS Version : 6.1.7601 Service Pack 1 Build 7601
[info] HW Platform : PowerEdge M420
[info] CB Version : 2.5.0-1059-rel-enterprise
[info] CB Uptime : 31 days, 10 hours, 3 minutes, 51 seconds
[info] Architecture : x64-based PC
[ok] Installed CPUs : 16
[ok] Installed RAM : 98259 MB
[warn] Server Quota : 81.42% of total RAM. Max recommended is 80.00%
        (Quota: 80000 MB, Total RAM: 98259 MB)
[ok] Erlang VM vsize : 546 MB
[ok] Memcached vsize : 142 MB
[ok] Swap used : 0.00%
[info] Erlang VM scheduler : swt low is not set

Issue Links:
Relates to
relates to MB-9992 Memory is not released after 'flush' Closed
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Original problem was reported by our customer.

Steps to reproduce in their setup:
- Set up a 4-node cluster (probably does not matter) with a 3 GB bucket and 1 replica.

- The program writes 10 MB binary objects from 3 threads in parallel, 50 items in each thread.
Run the program (sometimes it crashes, I do not know the reason); simply run it again.
At the end of the run, there is a difference of 500 MB between ep_kv_size and the sum of vb_active_itm_memory and vb_replica_itm_memory (this might depend a lot on the network speed; I am using just a 100 Mbit connection to the server, on production we have a faster network of course).
- Do the flush; ep_kv_size keeps the size of the difference even though the bucket is empty.
- Repeat this. On each run, the resident items percentage will go down.
- On the fourth or fifth run, it will throw a hard memory error after inserting only a part of the 150 items. (A minimal sketch of such a loader is shown below.)
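For reference, a minimal sketch of such a loader, written against the same MemcachedClient from testrunner's mc_bin_client that the tests quoted later in this report use (the node address and key names are assumptions, the default-vbucket simplification is inherited from those tests, and the customer's actual loader is not available):

import threading
from mc_bin_client import MemcachedClient  # testrunner client, as used in the tests below

NODE = "10.1.2.3"                   # hypothetical node address
THREADS = 3
ITEMS_PER_THREAD = 50
VALUE = "x" * (10 * 1024 * 1024)    # 10 MB payload per item

def loader(thread_id):
    client = MemcachedClient(host=NODE, port=11210)
    for i in range(ITEMS_PER_THREAD):
        # mc_bin_client signature: set(key, exp, flags, value)
        client.set("key_%d_%d" % (thread_id, i), 0, 0, VALUE)
    client.close()

workers = [threading.Thread(target=loader, args=(t,)) for t in range(THREADS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

After each run, compare ep_kv_size against vb_active_itm_memory + vb_replica_itm_memory (e.g. via cbstats), flush the bucket, and repeat.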




 Comments   
Comment by Wayne Siu [ 10/Sep/14 ]
Raju,
Can you please assign?
Comment by Raju Suravarjjala [ 10/Sep/14 ]
Tony, can you see if you can reproduce this bug? Please note it is 2.5.1 Windows 64bit
Comment by Anil Kumar [ 10/Sep/14 ]
Just an FYI: we previously opened a similar issue on CentOS, but it was resolved as cannot reproduce.
Comment by Ian McCloy [ 11/Sep/14 ]
It's 2.5.0 not 2.5.1 on Windows 2008 64bit
Comment by Thuan Nguyen [ 11/Sep/14 ]
Followed the reproduction steps from the description above (4-node cluster, 3 GB bucket with 1 replica, 3 threads writing 50 x 10 MB items each, then flush, repeated).


I could not reproduce this bug after 6 flushes.
After each flush, mem use on both active and replica went down to zero.
Comment by Thuan Nguyen [ 11/Sep/14 ]
Using our loader, I could not reproduce this bug. I will use the customer's loader to test again.
Comment by Raju Suravarjjala [ 12/Sep/14 ]
Gokul: As we discussed can you folks try to reproduce this bug?




[MB-12169] Unexpected disk creates during graceful failover Created: 10/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
4-node cluster with the beer-sample bucket plus 300k items. Workload is 50/50 gets/sets, but the sets constantly overwrite the same 300k items.

When I do a graceful failover of one node, I see a fair amount of disk creates even though no new data is being inserted.

If there is a reasonable explanation, great, but I am concerned that there may be something incorrect going on with either the identification of new data or the movement of vbuckets.

Logs are here:
https://s3.amazonaws.com/cb-customers/perry/diskcreates/collectinfo-2014-09-10T205907-ns_1%40ec2-54-193-230-57.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/perry/diskcreates/collectinfo-2014-09-10T205907-ns_1%40ec2-54-215-23-198.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/perry/diskcreates/collectinfo-2014-09-10T205907-ns_1%40ec2-54-215-29-139.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/perry/diskcreates/collectinfo-2014-09-10T205907-ns_1%40ec2-54-215-40-174.us-west-1.compute.amazonaws.com.zip




[MB-12168] Documentation: Clarification around server RAM quota best practice Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Brian Shumate Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The sizing[1] and RAM quota[2] documentation should be clearer about the specific best practice for the server RAM quota: no greater than 80% of physical RAM per node on nodes with 16 GB or more, and no greater than 60% on nodes with less than 16 GB.

Emphasizing that the remaining 20% or 40% of RAM is required for the operating system, file system caches, and so on would be helpful as well.

Additionally, the RAM quota sub-section of the Memory quota section[3] reads as if it is abruptly cut off or otherwise incomplete:

--------
RAM quota

You will not be able to allocate all your machine RAM to the per_node_ram_quota as there may be other programs running on your machine.
--------

1. http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#couchbase-bestpractice-sizing
2. http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#ram-quotas
3. http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#memory-quota
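Purely as an illustration of the rule of thumb described above (not an official formula), here is the arithmetic, using the 98259 MB node from MB-12170's environment section as a worked example:

def recommended_server_quota_mb(physical_ram_mb):
    # No more than 80% of RAM on nodes with 16 GB or more, 60% below that;
    # the remainder is left for the OS, file system cache and other processes.
    ratio = 0.80 if physical_ram_mb >= 16 * 1024 else 0.60
    return int(physical_ram_mb * ratio)

print(recommended_server_quota_mb(98259))     # 78607 MB; that node was configured at 80000 MB (81.42%)
print(recommended_server_quota_mb(8 * 1024))  # 4915 MB for an 8 GB node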






[MB-12167] Remove Minor / Major / Page faults graphs from the UI Created: 10/Sep/14  Updated: 16/Sep/14  Resolved: 15/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.5.1, 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Trivial
Reporter: Ian McCloy Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 1
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Customers often ask what is wrong with their system when they see anything greater than 0 page faults in the UI graphs. What are customers supposed to do with that information? This isn't a useful metric for customers and we shouldn't show it in the UI. If it is needed for development debugging we can query it from the REST API.

 Comments   
Comment by Matt Ingenthron [ 10/Sep/14 ]
Just to opine: +1. There are a number of things in the UI that aren't actionable. I know they help us when we look back over time, but as presented it's not useful.
Comment by Aleksey Kondratenko [ 10/Sep/14 ]
So it's essentially an expression of our belief that the majority of our customers are ignorant enough to be confused by "fault" in the name of this stat?

Just want to make sure that there's no misunderstanding on this.

On Matt's point I'd like to say that none of our stats are actionable. They're just information that might end up being helpful occasionally. And yes, major page faults especially are a _tremendously_ helpful sign of issues.
Comment by Matt Ingenthron [ 10/Sep/14 ]
I don't think the word "fault" is at issue, but maybe others do. I know there are others that aren't actionable and to be honest, I take issue with them too. This one is just one of the more egregious examples. :) The problem is, in my opinion, it's not clear what one would do with minor page fault data. One can't really know what's good or bad without looking at trends or doing further analysis.

While I'm tossing out opinions, similarly visualizing everything as a queue length isn't always good. To the app, latency and throughput matter-- how many queues and where they are affects this, but doesn't define it. A big queue length with fast storage can still have very good latency/throughput and equally a short queue length with slow or variable (i.e., EC2 EBS) storage can have poor latency/throughput. An app that will slow down with higher latencies won't make the queue length any bigger.

Anyway, pardon the wide opinion here-- I know you know all of this and I look forward to improvements when we get to them.

You raise a good point on major faults though.

If it only helps occasionally, then it's consistent with the request (to remove it from the UI, but still have it in there). I'm merely a user here, so please discount my opinion accordingly!
Comment by Aleksey Kondratenko [ 10/Sep/14 ]
>> If it only helps occasionally, then it's consistent with the request (to remove it from the UI, but still have it in there).

Well but then it's true for almost all of our stats isn't? Doesn't it mean that we need to hide them all then ?
Comment by Matt Ingenthron [ 10/Sep/14 ]
>> Well but then it's true for almost all of our stats isn't? Doesn't it mean that we need to hide them all then ?

I don't think so. That's an extreme argument. I'd put ops/s which is directly proportional to application load and minor faults which is affected by other things on the system in very different categories. Do we account for minor faults at a per-bucket level? ;)
Comment by Aleksey Kondratenko [ 10/Sep/14 ]
>> I'd put ops/s which is directly proportional to application load and minor faults which is affected by other things on the system in very different categories.

True.

>> Do we account for minor faults at a per-bucket level? ;)

No. And good point. Indeed lacking better UI we show all system stats (including some high-usefulness category things like count of memcached connections) as part of showing any bucket's stats. Despite gathering and storing system stats separately.

In any case, I'm not totally against hiding page fault stats. It's indeed minor topic.

But I'd like to see a good reason for that. Because for _anything_ that we do there will always be at least one user that's confused, which isn't IMO a valid reason for "let's hide it".
 
My team has spent some effort getting these stats, and we did so specifically because we knew that major page faults are important to be aware of. And we also know that on Linux even minor page faults might be "major" in terms of latency impact. We've seen it with our own eyes.

I.e. when you're running out of free pages, one might think that Linux is just supposed to grab one of the clean pages from the page cache, but we've seen this take seconds for reasons I'm not quite sure of. It does look like Linux might routinely delay a minor page fault for IO (perhaps due to some locking impacts). And things like a huge-page "minor" page fault may have an even more obviously hard effect (i.e. because you need a physically contiguous run of memory, and getting one might require "memory compaction", locking, etc.). And our system, doing constant non-direct-IO writes, routinely hits this hard condition, because nearly every write from ep-engine or the view engine has to allocate brand new page(s) for that data due to the append-only nature of our design (ForestDB's direct IO path plus custom buffer cache management should help dramatically here).

Comment by Patrick Varley [ 10/Sep/14 ]
I think there are three main consumers of stats:

* Customers (cmd_get)
* Support (ep_bg_load_avg)
* Developers of the component (erlang memory atom_used)

As a result we display and collect these stats in different ways, i.e. UI, cbstats, ns_doctor, etc.

A number of our users find the amount of stats in the UI overwhelming; a lot of the time they do not know which ones are important.

Some of our users do not even understand what a virtual memory system is, let alone what a page fault is.

I do not think we should display the page faults in the UI, but we should still collect them. I believe we can make better use of the space in the UI. For example: network usage, byte_written or byte_read, TCP retransmissions, disk performance.
Comment by David Haikney [ 11/Sep/14 ]
+ 1 for removing page faults. The justification:
* We put them front and centre of the UI. Customers see Minor faults, Major Faults and Total faults before # gets, # sets.
* They have not proven useful for support in diagnosing an issue. In fact they cause more "false positive" questions ("my minor faults look high, is that a problem?")
* Overall this constitutes "noise" that our customers can do without. The stats can quite readily be captured elsewhere if we want to record them.

It would be easy to expand this into a wider discussion of how we might like to reorder / expand all of the current graphs in the UI - and that's a useful discussion. But I propose we keep this ticket to the question of removing the page fault stats.
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
http://review.couchbase.org/41333
Comment by Ian McCloy [ 16/Sep/14 ]
Which version of Couchbase Server is this fixed in ?




[MB-12166] Linux: Warnings on install are poorly formatted and unlikely to be read by a user. Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Dave Rigby Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 6

Attachments: PNG File Screen Shot 2014-09-10 at 15.21.55.png    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
When installing the 3.0 RPM, we check for various OS settings and print warnings if they don't meet our recommendations.

This is a great idea in principle, but the actual output isn't very well presented, meaning users are (IMHO) likely to not spot the issues which are being raised.

I've attached a screenshot to show this exactly as displayed in the console, but the verbatim text is:

---cut ---
$ sudo rpm -Uvh couchbase-server-enterprise_centos6_x86_64_3.0.0-1209-rel.rpm
Preparing... ########################################### [100%]
Warning: Transparent hugepages may be used. To disable the usage
of transparent hugepages, set the kernel settings at runtime with
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Warning: Transparent hugepages may be used. To disable the usage
of transparent hugepages, set the kernel settings at runtime with
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
Warning: Swappiness is not 0.
You can set the swappiness at runtime with
sysctl vm.swappiness=0
Minimum RAM required : 4 GB
System RAM configured : 0.97 GB

Minimum number of processors required : 4 cores
Number of processors on the system : 1 cores

   1:couchbase-server ########################################### [100%]
Starting couchbase-server[ OK ]

You have successfully installed Couchbase Server.
Please browse to http://localhost.localdomain:8091/ to configure your server.
Please refer to http://couchbase.com for additional resources.

Please note that you have to update your firewall configuration to
allow connections to the following ports: 11211, 11210, 11209, 4369,
8091, 8092, 18091, 18092, 11214, 11215 and from 21100 to 21299.

By using this software you agree to the End User License Agreement.
See /opt/couchbase/LICENSE.txt.
$
---cut ---

A couple of observations:

1) Everything is run together, including informational things (Preparing, Installation successful), things the user should act on (Warning: Swappiness, THP, Firewall information).

2) It's not very clear how serious some of these messages are - is the fact that I'm running with 1/4 of the minimum RAM just a minor thing, or a showstopper? Similarly with THP - Support have seen on many occasions that this can cause false-positive failovers, but we just casually say here:

"Warning: Transparent hugepages may be used. To disable the usage of transparent hugepages, set the kernel settings at runtime with echo never > /sys/kernel/mm/transparent_hugepage/enabled"


Suggestions:

1) Make the Warnings more pronounced - e.g prefix with "[WARNING]" and add some blank lines between things

2) Make clearer why these things are listed - linking back to more detailed information in our install guide if necessary. For example: "THP may cause slowdown of the cluster manager and false positive fail overs. Couchbase recommend disabling it. See http://docs.couchbase.com/THP for more details."

3) For things like THP which we can actually fix, ask the user if they want them fixed - after all we are already root if we are installing - e.g. "THP bad.... Would you like the system THP setting to be changed to the recommended value (madvise)? (y/n)"

4) For things we can't fix (low memory, low CPUs) make the user confirm their decision to continue - e.g. "CPUs below minimum. Couchbase recommends at least XXX for production systems. Please type "test system" to continue installation."
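To make suggestions 1) and 2) concrete, here is a rough sketch of a pre-install check that prints a more prominent, self-explanatory warning (Python, purely illustrative; the real checks live in the RPM scriptlets and would stay in shell, and the exact wording is an assumption):

import os

def warn(summary, detail):
    print("")
    print("[WARNING] %s" % summary)
    print("          %s" % detail)

def check_thp(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    if not os.path.exists(path):
        return
    with open(path) as f:
        # The file looks like "always [madvise] never"; the active value is bracketed.
        if "[always]" in f.read():
            warn("Transparent huge pages are enabled (always).",
                 "THP may slow down the cluster manager and cause false-positive failovers. "
                 "Disable with: echo never > %s" % path)

def check_swappiness(path="/proc/sys/vm/swappiness"):
    if not os.path.exists(path):
        return
    with open(path) as f:
        if int(f.read().strip()) != 0:
            warn("Swappiness is not 0.",
                 "Set it at runtime with: sysctl vm.swappiness=0")

if __name__ == "__main__":
    check_thp()
    check_swappiness()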



 Comments   
Comment by David Haikney [ 10/Sep/14 ]
+1 from me - we can clearly improve the presentation here. I expect making the install interactive ("should I fix THP?") could be difficult. Are there existing precedents we can refer to here to help consistency?
Comment by Dave Rigby [ 10/Sep/14 ]
@DaveH: Admittedly I don't think they use RPM, but VMware guest tools springs to mind - they present the user a number of questions when installing - "do you want to automatically update kernel modules?", "do you want to use printer sharing", etc.

Admittedly they don't have a secondary config stage unlike us with our GUI, *but* if we are going to fix things like THP, swappiness, then we need to be root to do so (and so install-time is the only option).




[MB-12165] UI: Log - Collect Information. Upload options text boxes should be 'grayed out' when "Upload to couchbase" is not selected. Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Jim Walker Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: log, ui
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 6 CB server (1 node cluster, VirtualBox VM)
Client browsers all running on OSX 10.9.4

Triage: Untriaged
Operating System: MacOSX 64-bit
Is this a Regression?: Unknown

 Description   
Couchbase Server Version: 3.0.0 Enterprise Edition (build-1208)

When going to the log upload area of the UI I found that all the text boxes in the Upload Options section are read-only, without any visual indicator.

It took a bit of clicking and checking of browser liveness to work out that it was because the "Upload to couchbase" check box was not checked.

The input boxes should be grayed out, or some other visual indicator should show that they're not usable.

* Tested with Chrome 37.0.2062.120
* Tested with Safari 7.0.6 (9537.78.2)




[MB-12164] UI: Cancelling a pending add should not show "reducing capacity" dialog Created: 10/Sep/14  Updated: 15/Sep/14  Resolved: 15/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Improvement Priority: Trivial
Reporter: David Haikney Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
3.0.0 Beta build 2

Steps to reproduce:
In the UI click "Server add".
Add the credentials for a server to be added
In the Pending Rebalance pane click "Cancel"

Actual Behaviour:
See a dialog stating"Warning – Removing this server from the cluster will reduce cache capacity across all data buckets. Are you sure you want to remove this server?"

Expected behaviour:
Dialog is not applicable in this context, since cancelling the addition of a node that has not yet joined will do nothing to the cluster capacity. Would expect either no dialog, or a dialog acknowledging that "This node will no longer be added to the cluster on the next rebalance".

 Comments   
Comment by Aleksey Kondratenko [ 10/Sep/14 ]
But it _is_ applicable because you're returning node to "pending remove" state.
Comment by David Haikney [ 10/Sep/14 ]
A node that has never held any data or actively participated in the cluster cannot possibly reduce the cluster's capacity.
Comment by Aleksey Kondratenko [ 10/Sep/14 ]
It looks like I misunderstood this request as referring to cancelling add-back after failover. Which it isn't.

Makes sense now.
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
http://review.couchbase.org/41428




[MB-12163] Memcached Closing connection due to read error: Unknown error Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: memcached
Affects Version/s: 2.5.0, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Ian McCloy Assignee: Dave Rigby
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [info] OS Name : Microsoft Windows Server 2008 R2 Enterprise
[info] OS Version : 6.1.7601 Service Pack 1 Build 7601
[info] CB Version : 2.5.0-1059-rel-enterprise

Issue Links:
Dependency
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The error message "Closing connection due to read error: Unknown error" doesn't explain what the problem is. Unfortunately, on Windows we aren't translating the error code properly: we need to call FormatMessage(), not strerror().

Code At
http://src.couchbase.org/source/xref/2.5.0/memcached/daemon/memcached.c#5360
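The fix itself belongs in memcached's C code (call FormatMessage() instead of strerror()), but the difference is easy to demonstrate from Python via ctypes, which wraps FormatMessage() on Windows. 10054 (WSAECONNRESET) is a typical read error on a dropped connection; the exact message text may vary by Windows version:

import os
import sys

if sys.platform == "win32":
    import ctypes
    code = 10054  # WSAECONNRESET
    # The CRT's strerror() knows nothing about Winsock error codes...
    print(os.strerror(code))         # -> "Unknown error"
    # ...whereas FormatMessage(), wrapped by ctypes.FormatError, resolves them.
    print(ctypes.FormatError(code))  # -> "An existing connection was forcibly closed by the remote host."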




[MB-12162] Performance Test for Rebalance after failover fails Created: 09/Sep/14  Updated: 10/Sep/14  Resolved: 10/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: performance
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Thomas Anderson Assignee: Thomas Anderson
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 4 node cluster, each node 16 core, 64G memory

Triage: Triaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: ci.sc.couchbase.com:/tmp/Reb_Failover_100M_DGM_Views
{172.23.100.29.zip, 172.23.100.30.zip, 172.23.100.31.zip, 172.23.100.32.zip)
additionally in same folder
{node1_memory_usage, node2...}
Is this a Regression?: Unknown

 Description   
Rebalance after failover fails to complete. After ~4 hrs of processing, it hangs and makes no progress.
Eventually beam.smp declares itself out of memory (the system log shows beam.smp invoking the OOM killer). Memory allocated at the time of failure is 62G of 64G; memcached has allocated ~40G.
Other characteristics: 100M documents, DGM and 4 views.

This test passed on 3.0.0-1205 and earlier; it fails on 3.0.0-1208.
A similar test with only 20M documents, no views and no DGM passed just prior.

 Comments   
Comment by Thomas Anderson [ 10/Sep/14 ]
Problem research shows that the memory utilization issue is a known issue with memcached: under high load, memcached can consume memory to the point where other processes fail with 'no memory' conditions.
The working solution is not to allow memcached unlimited memory, by using the startup parameter -m. If memcached is limited to 50% of available memory, the OS and Couchbase processes do not exhibit memory pressure issues.
Comment by Thomas Anderson [ 10/Sep/14 ]
See the qualifying description tying the problem to the memcached memory issue.




[MB-12161] per-server UI does not refresh properly when adding a node Created: 09/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Admittedly quite minor, but a little annoying.

When you're looking at a single stat across all nodes of a cluster (e.g. active vbuckets):

- Add a new node to the cluster from another tab open to the UI
- Note that the currently open stats screen stops displaying graphs for the existing nodes and does not show that a new node has joined until you refresh the screen




[MB-12160] setWithMeta() is able to update a locked remote key Created: 09/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Sriram Ganesan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: all, 3.0.0-1208

Attachments: Zip Archive 10.3.4.186-992014-168-diag.zip     Zip Archive 10.3.4.188-992014-1611-diag.zip    
Triage: Untriaged
Is this a Regression?: No

 Description   
A simple test to check whether setWithMeta() refrains from updating a locked key:

Steps
--------
1. Uni-directional XDCR on the default bucket from .186 --> .188
2. Create a key 'pymc1098' with value "old_doc" on .186
3. Sleep for 10 secs; it gets replicated to .188.
4. Now getAndLock() on 'pymc1098' on .188 for 20s
5. Meanwhile, update the same key at .186
6. After 10s (the lock should not have expired yet; also see the timestamps in the test log below), do a getMeta() at source and dest: they match, and the destination key contains "new_doc".


def test_replication_after_getAndLock_dest(self):
        src = MemcachedClient(host=self.src_master.ip, port=11210)
        dest = MemcachedClient(host=self.dest_master.ip, port=11210)
        self.log.info("Initial set = key:pymc1098, value=\"old_doc\" ")
        src.set('pymc1098', 0, 0, "old_doc")
       # wait for doc to replicate
        self.sleep(10)
       # apply lock on destination
        self.log.info("getAndLock at destination for 20s ...")
        dest.getl('pymc1098', 20, 0)
       # update source doc
        self.log.info("Updating 'pymc1098' @ source with value \"new_doc\"...")
        src.set('pymc1098', 0, 0, "new_doc")
        self.sleep(10)
        self.log.info("getMeta @ src: {}".format(src.getMeta('pymc1098')))
        self.log.info("getMeta @ dest: {}".format(dest.getMeta('pymc1098')))
        src_doc = src.get('pymc1098')
        dest_doc = dest.get('pymc1098')


2014-09-09 15:27:13 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] Initial set = key:pymc1098, value="old_doc"
2014-09-09 15:27:13 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 10 secs for doc to be replicated ...
2014-09-09 15:27:23 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] getAndLock at destination for 20s ...
2014-09-09 15:27:23 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] Updating 'pymc1098' @ source with value "new_doc"...
2014-09-09 15:27:23 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 10 secs. ...
2014-09-09 15:27:33 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] getMeta @ src: (0, 0, 0, 2, 16849348715855509)
2014-09-09 15:27:33 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] getMeta @ dest: (0, 0, 0, 2, 16849348715855509)
2014-09-09 15:27:33 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] src_doc = (0, 16849348715855509, 'new_doc')
dest_doc =(0, 16849348715855509, 'new_doc')

Will attach cbcollect.

 Comments   
Comment by Aruna Piravi [ 09/Sep/14 ]
This causes an inconsistency: when a key is locked, the server itself disallows a plain set but allows a set through setWithMeta().




[MB-12159] Memcached throws an irrelevant message while trying to update a locked key Created: 09/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Aruna Piravi Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1208

Triage: Untriaged
Is this a Regression?: No

 Description   
A simple test to see if updates are possible on locked keys

def test_lock(self):
        src = MemcachedClient(host=self.src_master.ip, port=11210)
        # first set
        src.set('pymc1098', 0, 0, "old_doc")
        # apply lock
        src.getl('pymc1098', 30, 0)
        # update key
        src.set('pymc1098', 0, 0, "new_doc")

throws the following Memcached error -

  File "pytests/xdcr/uniXDCR.py", line 784, in test_lock
    src.set('pymc1098', 0, 0, "new_doc")
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 163, in set
    return self._mutate(memcacheConstants.CMD_SET, key, exp, flags, 0, val)
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 132, in _mutate
    cas)
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 128, in _doCmd
    return self._handleSingleResponse(opaque)
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 121, in _handleSingleResponse
    cmd, opaque, cas, keylen, extralen, data = self._handleKeyedResponse(myopaque)
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 117, in _handleKeyedResponse
    raise MemcachedError(errcode, rv)
MemcachedError: Memcached error #2 'Exists': Data exists for key for vbucket :0 to mc 10.3.4.186:11210






[MB-12158] erlang gets stuck in gen_tcp:send despite socket being closed (was: Replication queue grows unbounded after graceful failover) Created: 09/Sep/14  Updated: 18/Sep/14  Resolved: 18/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File dcp_proxy.beam    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
After speaking with Mike briefly, sounds like this may be a known issue. My apologies if there is a duplicate issue already filed.

Logs are here:
 https://s3.amazonaws.com/customers.couchbase.com/perry/replicationqueuegrowth/collectinfo-2014-09-09T205123-ns_1%40ec2-54-176-128-88.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/customers.couchbase.com/perry/replicationqueuegrowth/collectinfo-2014-09-09T205123-ns_1%40ec2-54-193-231-33.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/customers.couchbase.com/perry/replicationqueuegrowth/collectinfo-2014-09-09T205123-ns_1%40ec2-54-219-111-249.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/customers.couchbase.com/perry/replicationqueuegrowth/collectinfo-2014-09-09T205123-ns_1%40ec2-54-219-84-241.us-west-1.compute.amazonaws.com.zip

 Comments   
Comment by Mike Wiederhold [ 10/Sep/14 ]
Perry,

The stats seem to be missing for dcp streams so I cannot look further into this. If you can still reproduce this on 3.0 build 1209 then assign it back to me and include the logs.
Comment by Perry Krug [ 11/Sep/14 ]
Mike, does the cbcollect_info include these stats or do you need me to gather something specifically when the problem occurs?

If not, let's also get them included for future builds...
Comment by Perry Krug [ 11/Sep/14 ]
Hey Mike, I'm having a hard time reproducing this on build 1209 where it seemed rather easy on previous builds. Do you think any of the changes from the "bad_replicas" bug would have affected this? Is it worth reproducing on a previous build where it was easier in order to get the right logs/stats or do you think it may be fixed already?
Comment by Mike Wiederhold [ 11/Sep/14 ]
This very well could be related to MB-12137. I'll take a look at the cluster and if I don't find anything worth investigating further then I think we should close this as cannot reproduce since it doesn't seem to happen anymore on build 1209. If there is still a problem I'm sure it will be reproduced again later in one of our performance tests.
Comment by Mike Wiederhold [ 11/Sep/14 ]
It looks like one of the dcp connections to the failed over node was still active. My guess is that the node went down and came back up quickly. As a result it's possible that ns_server re-established the connection with the downed node. Can you attach the logs and assign this to Alk so he can take a look?
Comment by Perry Krug [ 11/Sep/14 ]
Thanks Mike.

Alk, logs are attached from the first time this was reproduced. Let me know if you need me to do so again.

Comment by Aleksey Kondratenko [ 11/Sep/14 ]
Mike, btw for the future, if you could post exact details (i.e. node and name of connection) of stuff you want me to double-check/explain it could have saved me time.

Also, let me note that it's replica and node master who establishes replication. I.e. we're "pulling" rather than "pushing" replication.

I'll look at all this and see if I can find something.
Comment by Aleksey Kondratenko [ 11/Sep/14 ]
Sorry, replica instead of master, who initiates replication.
Comment by Aleksey Kondratenko [ 11/Sep/14 ]
Indeed I'm seeing dcp connection from memcached on .33 to beam of .88. And it appears that something in dcp replicator is stuck. I'll need a bit more time to figure this out.
Comment by Aleksey Kondratenko [ 11/Sep/14 ]
Looks like socket send gets blocked somehow despite socket actually being closed already.

Might be serious enough to be a show stopper for 3.0.

Do you by any chance still have nodes running? Or if not, can you easily reproduce this? Having direct access to bad node might be very handy to diagnose this further.
Comment by Aleksey Kondratenko [ 11/Sep/14 ]
Moved back to 3.0. Because if it's indeed erlang bug it might be very hard to fix and because it may happen not just during failover.
Comment by Cihan Biyikoglu [ 12/Sep/14 ]
Triage - need an update, please.
Comment by Perry Krug [ 12/Sep/14 ]
I'm reproducing now and will post both the logs and the live systems momentarily
Comment by Aleksey Kondratenko [ 12/Sep/14 ]
Able to reproduce this condition with erlang outside of our product (which is great news):

* connect gen_tcp socket to nc or irb process listening

* spawn erlang process that will send stuff infinitely on that socket and will eventually block

* from erlang console do gen_tcp:close (i.e. while other erlang process is blocked writing)

* observe how erlang process that's blocked is still blocked

* observe with lsof that socket isn't really closed

* close the socket on the other end (by killing nc)

* observe with lsof that socket is closed

* observe how erlang process is still blocked (!) despite underlying socket fully dead

The fact that it's not a race is really great because dealing with deterministic bug (even if it's "feature" from erlang's point of view) is much easier
Comment by Aleksey Kondratenko [ 12/Sep/14 ]
Fix is at: http://review.couchbase.org/41396

I need approval to get this in 3.0.0.
Comment by Aleksey Kondratenko [ 12/Sep/14 ]
Attaching fixed dcp_proxy.beam if somebody wants to be able to test the fix without waiting for build
Comment by Perry Krug [ 12/Sep/14 ]
Awesome as usual Alk, thanks very much.

I'll give this a try on my side for verification.
Comment by Parag Agarwal [ 12/Sep/14 ]
Alk, will this issue occur in TAP as well? during upgrades.
Comment by Mike Wiederhold [ 12/Sep/14 ]
Alk,

I apologize for not including a better description of what happened. In the future I'll make sure to leave better details before assigning bugs to others so that we don't have multiple people duplicating the same work.
Comment by Aleksey Kondratenko [ 12/Sep/14 ]
>> Alk, will this issue occur in TAP as well? during upgrades.

No.
Comment by Perry Krug [ 12/Sep/14 ]
As of yet unable to reproduce this on build 1209+dcp_proxy.beam.

Thanks for the quick turnaround Alk.
Comment by Cihan Biyikoglu [ 12/Sep/14 ]
Triage discussion:
- Under load this may happen frequently.
- There is a good chance that this recovers by itself in a few minutes - it should, but we should validate that.
- If we are in this state, we can restart Erlang to get out of the situation - no app unavailability required.
- The fix could be risky to take at this point.

Decision: not taking this for 3.0
Comment by Aleksey Kondratenko [ 12/Sep/14 ]
Mike, I need your ACK on this:

Because of DCP NOPs between replicators, the DCP producer should, after a few minutes, close its side of the socket and release all resources.

Am I right? I said this in the meeting just a few minutes ago and it affected the decision. If I'm wrong (say, if you decided to disable NOPs in the end, or if you know it's broken, etc.), then we need to know it.
Comment by Perry Krug [ 12/Sep/14 ]
FWIW, I have seen that this does not recover after a few minutes. However, I agree that it is workaround-able, either by restarting beam or by bringing the node back into the cluster. Unless we think this will happen much more often, I agree it could be deferred out of 3.0.
Comment by Aleksey Kondratenko [ 12/Sep/14 ]
Well, if it does not recover then it can be argued that we have another bug on the ep-engine side that may lead to similar badness (queue size and resources eaten) _without_ a clean workaround.

Mike, we'll need your input on DCP NOPs.
Comment by Mike Wiederhold [ 12/Sep/14 ]
I was curious about this myself. As far as I know the noop code is working properly and we have some tests to make sure it is. I can work with Perry to try to figure out what is going on on the ep-engine side and see if the noops are actually being sent. I know this sounds unlikely, but I was curious whether or not the noops were making it through to the failed over node for some reason.
Comment by Aleksey Kondratenko [ 12/Sep/14 ]
>> I know this sounds unlikely, but I was curious whether or not the noops were making it through to the failed over node for some reason.

I can rule this out. We do have a connection between the destination's beam and the source's memcached. And we _don't_ have beam's connection to the destination memcached anymore. Erlang is stuck writing to a dead socket. So there's no way you could get NOP acks back.
Comment by Perry Krug [ 15/Sep/14 ]
I've confirmed that this state persists for much longer than a few minutes...I've not ever seen it recover itself, and have left it to run for 15-20 minutes at least.

Do you need a live system to diagnose?
Comment by Cihan Biyikoglu [ 15/Sep/14 ]
Thanks for the update. Mike, sounds like we should open an issue for DCP to reliably detect these conditions. We should add this in for 3.0.1.
Perry, could you confirm that restarting the Erlang process resolves the issue?
Thanks
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
http://review.couchbase.org/41410

Mike will open different ticket for NOPs in DCP.




[MB-12157] Intrareplication falls behind OPs causing data loss situation Created: 09/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0.1, 3.0, 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thomas Anderson Assignee: Thomas Anderson
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 4 node cluster; 4 core nodes; beer-sample application run at 60Kops (50/50 ratio), nodes provisioned on RightScale EC2 x1.large images

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
The intra-replication queue grows to unacceptable limits, exposing a potential data loss of multiple seconds of queued replication.
The problem is more pronounced on the RightScale-provisioned cluster, but can be seen on local physical clusters with a long enough test run (>20 min). Recovery requires stopping the input request queue.
Initial measurements of the Erlang process suggest that minor retries on scheduled network I/O eventually build up into a limit on the push of replication data. scheduler_wait appears to be the consuming element; the epoll_wait counter increases per measurement, as does the mean wait time, suggesting thrashing in the Erlang event scheduler. There are various papers/presentations suggesting that Erlang is sensitive to the balance of tasks (a mix of long and short events can cause performance/throughput issues).

cbcollectinfo logs will be attached shortly

 Comments   
Comment by Aleksey Kondratenko [ 09/Sep/14 ]
Still don't have any evidence. Cannot own this ticket until evidence is provided.




[MB-12156] time of check/time of use race in data path change code of ns_server may lead to deletion of all buckets after adding node to cluster Created: 09/Sep/14  Updated: 15/Sep/14  Resolved: 15/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.0, 1.8.1, 2.0, 2.1.0, 2.2.0, 2.1.1, 2.5.0, 2.5.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
Triage: Untriaged
Is this a Regression?: No

 Description   
SUBJ.

In the code that changes the data path we first check whether the node is provisioned (without preventing that from changing afterwards) and then proceed with the change of data path. As part of changing the data path we delete buckets.

So if the node gets added to a cluster after the check but before the data path is actually changed, we will delete all of the cluster's buckets.

As improbable as it may seem, it actually occurred in practice. See CBSE-1387.
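A conceptual sketch of the race, in Python for brevity (this is not ns_server's actual Erlang code, and the real fix is in the reviews linked in the comments below). The point is simply that the provisioned-ness check and the destructive data-path change are separated in time:

import threading

node = {"provisioned": False, "buckets": ["default"]}
config_lock = threading.Lock()

def change_data_path_racy(new_path):
    if not node["provisioned"]:        # time of check
        # ... the node may be added to a cluster right here ...
        node["buckets"].clear()        # time of use: wipes what are now the cluster's buckets
        node["data_path"] = new_path

def change_data_path_guarded(new_path):
    # One conventional shape of a fix: make the check and the action atomic with respect
    # to whatever can change provisioned-ness, and re-check inside the critical section.
    with config_lock:
        if node["provisioned"]:
            raise RuntimeError("node already provisioned; refusing to change data path")
        node["buckets"].clear()
        node["data_path"] = new_path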


 Comments   
Comment by Aleksey Kondratenko [ 10/Sep/14 ]
Whether it's a must-have for 3.0.0 is not for me to decide, but here's my thinking:

* the bug has been there at least since 2.0.0, and it really requires something outstanding in the customer's environment to actually occur

* 3.0.1 is just a couple of months away

* 3.0.0 is done

But if we're still open to adding this fix to 3.0.0, my team will surely be glad to do it.
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
http://review.couchbase.org/41332
http://review.couchbase.org/41333




[MB-12155] View query and index compaction failing on 1 node with error view_undefined Created: 09/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Ian McCloy Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Customer upgraded their 6 node cluster from 2.2 to 2.5.1 running on Microsoft Windows Server 2008 R2 Enterprise and one of their views stopped working.

It appears the indexing and index compaction stopped working on 1 node out of the 6. This appeared to only affect 1 design document.

snips from problem node -->>

[couchdb:error,2014-09-08T17:20:31.840,ns_1@HOST:<0.23288.321>:couch_log:error:42]Uncaught error in HTTP request: {throw,view_undefined}

Stacktrace: [{couch_set_view,get_group,3},
             {couch_set_view,get_map_view,4},
             {couch_view_merger,get_set_view,5},
             {couch_view_merger,simple_set_view_query,3},
             {couch_httpd,handle_request,6},
             {mochiweb_http,headers,5},
             {proc_lib,init_p_do_apply,3}]
[couchdb:info,2014-09-08T17:20:31.840,ns_1@HOST:<0.23288.321>:couch_log:info:39]10.7.43.229 - - POST /_view_merge/?stale=false 500

=====

[ns_server:warn,2014-09-08T17:25:10.506,ns_1@HOST:<0.14357.327>:compaction_daemon:do_chain_compactors:725]Compactor for view `Bucket/_design/DDOC/main` (pid [{type,view},
                                                {important,true},
                                                {name,
                                                  <<"Bucket/_design/DDoc/main">>},
                                                {fa,
                                                  {#Fun<compaction_daemon.16.22390493>,
                                                  [<<"Bucket">>,
                                                    <<"_design/DDoc">>,main,
                                                    {config,
                                                    {30,18446744073709551616},
                                                    {30,18446744073709551616},
                                                    undefined,false,false,
                                                    {daemon_config,30,
                                                      131072}},
                                                    false,
                                                    {[{type,bucket}]}]}}]) terminated unexpectedly: {error,
                                                                                                    view_undefined}
[ns_server:warn,2014-09-08T17:25:10.506,ns_1@HOST:<0.14267.327>:compaction_daemon:do_chain_compactors:730]Compactor for view `Bucket/_design/DDoc` (pid [{type,view},
                                            {name,<<"Bucket/_design/DDoc">>},
                                            {important,false},
                                            {fa,
                                            {#Fun<compaction_daemon.20.107749383>,
                                              [<<"Bucket">>,<<"_design/DDoc">>,
                                              {config,
                                                {30,18446744073709551616},
                                                {30,18446744073709551616},
                                                undefined,false,false,
                                                {daemon_config,30,131072}},
                                              false,
                                              {[{type,bucket}]}]}}]) terminated unexpectedly (ignoring this): {error,
                                                                                                                view_undefined}
[ns_server:debug,2014-09-08T17:25:10.506,ns_1@HOST:compaction_daemon<0.480.0>:compaction_daemon:handle_info:505]Finished compaction iteration.




[MB-12154] Create 3.x branch on gerrit Created: 09/Sep/14  Updated: 11/Sep/14  Due: 09/Sep/14  Resolved: 11/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Harsha Havanur Assignee: Chris Hillery
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Harsha Havanur [ 09/Sep/14 ]
Please create 3.x branch for couchdb on gerrit.
Comment by Harsha Havanur [ 11/Sep/14 ]
Ceej,

 Can you please help us in creating this branch so that we can continue for 3.0.1 on master? Currently we are holding off all checkins to master.

Thanks
Comment by Chris Hillery [ 11/Sep/14 ]
Sorry for the delay, I overlooked this request somehow. I have created the branch "3.x" on gerrit (based on the current master SHA) and pushed it to github.

As a note, since this branch name isn't unique to a specific release such as 3.0 or 3.0.1, it should never be directly referenced in a rel-x.x.x.xml manifest.
Comment by Wayne Siu [ 11/Sep/14 ]

________________________________________
From: Harsha Havanur
Sent: Thursday, September 11, 2014 3:48:10 AM (UTC-08:00) Pacific Time (US & Canada)
To: community_admin
Subject: Re: Couchbase Issues: (MB-12154) Create 3.x branch on gerrit
Thanks a lot Ceej,

 If I have to mention revision as 3.0.x in manifest and not SHA at which we branched out, is there an option?

Thanks,
-Harsha
Comment by Chris Hillery [ 11/Sep/14 ]
I'm afraid I don't understand Harsha's question, so I'll just highlight the rules I've been using for manifests.

First, I should amend my earlier comment: a branch that is not named for a specific release should never be directly referenced in a rel-x.x.x.xml manifest *when the manifest is being locked down prior to release*. During the active development cycle, the manifest can contain whatever the developers tell me they want it to contain.

When a manifest is locked down for a release, then the "revision" for each project must be either:

1. A branch named for precisely the release (eg, 3.0, 3.0.0, release-3.0.0, etc.)
2. A specific commit SHA

Rule #1 basically says that we trust developers not to put things into a release-specific branch which are not intended for that release. However, if the branch is named "3.x", then clearly it is intended to hold things other than a single release's changes, so it isn't safe to put that branch into a locked-down manifest. (Also, FYI, at the very late stages of the release, we will probably drop rule #1 and only allow specific commit SHAs, to prevent any accidents.)

I have my own opinions about the "best" branching strategy for projects which I'm happy to share if a project has interest. But ultimately it is the developers' decision what branches should exist and what they are used for. For the most part it is a joint decision what goes into the manifest, with the above rules only coming into play once we lock down and start creating release candidates.




[MB-12153] Garbage value seen ns_server.debug.log Created: 09/Sep/14  Updated: 09/Sep/14  Resolved: 09/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Sangharsh Agarwal Assignee: Aleksey Kondratenko
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
https://s3.amazonaws.com/bugdb/jira/MB-12143/197e82ba/10.3.3.210-952014-2022-diag.zip


[ns_server:debug,2014-09-05T19:57:02.188,ns_1@10.3.3.210:dcp_consumer_conn-sasl_bucket_1-ns_1@10.3.121.65<0.7402.35>:dcp_proxy:handle_info:88]Socket #Port<0.27798> was closed. Closing myself. State = {state,
                                                           #Port<0.27798>,
                                                           {consumer,
                                                            "replication:ns_1@10.3.121.65->ns_1@10.3.3.210:sasl_bucket_1",
                                                            'ns_1@10.3.3.210',
                                                            "sasl_bucket_1"},
                                                           <<>>,
                                                           dcp_consumer_conn,
                                                           {state,idle,
                                                            "«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"},
                                                           #Port<0.27799>,
                                                           <0.7339.35>}



 Comments   
Comment by Aleksey Kondratenko [ 09/Sep/14 ]
Not a bug, and not the first time people have raised it. That's how Erlang formats a list of numbers sometimes.




[MB-12152] phonehome ph.couchbase.net is not encrypted on https connections with 3.0 Created: 08/Sep/14  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Ian McCloy
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Comments   
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
https needs to be supported on the other end first




[MB-12151] {UI}:: Delta Recovery option shown for failover node when cluster only has memcached bucket Created: 08/Sep/14  Updated: 08/Sep/14  Resolved: 08/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   

1. Create 3 node cluster
2. Add a memcached bucket
3. Failover 1 node

After step 3, the failed-over node is shown both the delta and the full recovery option. The delta recovery option should not be offered, since delta recovery is not possible for memcached buckets. On picking the 'delta' option and hitting rebalance, the rebalance is disallowed, and the user has to cancel and pick 'full' recovery instead.



 Comments   
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
MB-12148




[MB-12150] [Windows] Cleanup unnecessary files that are part of the windows installer Created: 08/Sep/14  Updated: 22/Sep/14  Resolved: 22/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7
Build 3.0.1-1261

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Install windows build 3.0.1-1261
As part of the installation you will see two files, couchbase_console.html and membase_console.html. membase_console.html is not needed; please remove it.

 Comments   
Comment by Bin Cui [ 22/Sep/14 ]
http://review.couchbase.org/#/c/41567/




[MB-12149] [Windows] Cleanup unnecessary files that are part of the windows builder Created: 08/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7
Build 3.0.1-1261

Attachments: PNG File Screen Shot 2014-09-09 at 2.22.28 PM.png    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Install windows build 3.0.1-1261
As part of the installation you will see the following directories:

1. cmake -- Does this need to be there?
2. erts-5.10.4 appears under both the server directory and the lib directory, and some files are duplicated; please remove the duplicates
3. licenses.tgz file -- This can be removed (it is no longer present on Linux)



 Comments   
Comment by Raju Suravarjjala [ 09/Sep/14 ]
I did a search on erts_MT and found 4 files; it looks like there are duplicates, two copies each of erts_MT.lib and erts_MTD.lib in two different folders
Comment by Sriram Melkote [ 09/Sep/14 ]
I can help with erts stuff (if removing one of them breaks anything, that is)




[MB-12148] {UI}: Cluster with Only memcached bucket allows graceful failover of a node Created: 08/Sep/14  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: all

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
1. Create 2 node cluster
2. Create a memcached bucket
3. Try Graceful failover of node

Step 3 shows that graceful failover is offered for a cluster that has only a memcached bucket. Graceful failover does not make sense when the only bucket is a memcached bucket, so the option should not be displayed in the UI.




[MB-12147] {UI} :: Memcached Bucket with 0 items indicates NaNB / NaNB for Data/Disk Usage Created: 08/Sep/14  Updated: 15/Sep/14  Resolved: 15/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Any environment

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
On a 1-node cluster, create a memcached bucket with 0 items. The UI shows NaNB / NaNB for Data/Disk Usage.

 Comments   
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
http://review.couchbase.org/41379




[MB-12146] Test the performance impact of increasing XDCR workers per Replication Created: 08/Sep/14  Updated: 08/Sep/14

Status: In Progress
Project: Couchbase Server
Component/s: cross-datacenter-replication, documentation, performance
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Thomas Anderson
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
In 3.0 we have a new XDCR setting, "XDCR Workers per Replication", which we believe will be a useful knob for optimizing (boosting) performance depending on the hardware spec.

We need to evaluate the performance impact of this setting on
- physical resources on the system
- along with XDCR Max Replications per Bucket



 Comments   
Comment by Thomas Anderson [ 08/Sep/14 ]
We have already shown that on smaller systems (<8 cores or <16 GB memory) the current default settings for workers per replication result in unexpected system overhead. The current release defaults to a maximum of 16 workers, with 4 initially; for smaller systems, this is changed to a value of 1. We have observed a loss of operations per second and additional sensitivity to views.
The current XDCR throughput and latency tests will be used to exercise the XDCR workers per replication setting.




[MB-12145] {DCP}:: After Rebalance ep_queue_size Stat gives incorrect info about persistence Created: 08/Sep/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 1208, 10.6.2.145-10.6.2.150

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.145-982014-1126-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.145-982014-1143-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.146-982014-1129-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.146-982014-1143-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.147-982014-1132-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.147-982014-1143-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.148-982014-1135-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.148-982014-1143-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.149-982014-1138-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.149-982014-1144-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.150-982014-1141-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.150-982014-1144-couch.tar.gz
Is this a Regression?: Yes

 Description   

1. Create a 6-node cluster
2. Create the default bucket with 10K items
3. After ep_queue_size = 0, take a snapshot of all data using cbtransfer (for the couchstore files)
4. Rebalance out 1 node
5. After ep_queue_size = 0, sleep for 30 seconds, then take a snapshot of all data using cbtransfer (for the couchstore files)

Comparing the snapshots from step 5 and step 3 shows an inconsistency in the expected keys: some keys are missing. Data verification using another client does not fail, and the active and replica item counts are as expected. The issue is only in the items we expect to find in the couchstore files, for example:

mike1651, mike6340, mike8616, mike5380, mike2691, mike4740, mike6432, mike9418, mike9769, mike244, mike7561, mike5613, mike6743, mike2073, mike1252, mike4431, mike9346, mike4343, mike9037, mike6866, mike2302, mike3652, mike7889, mike2998

Note that on increasing the delay after ep_queue_size = 0 from 30 to 60 to 120 seconds, we still hit the issue of missing keys. After adjusting the delay to 240 seconds, the missing keys were no longer seen.

This is not a case of data loss; only the stat (ep_queue_size = 0) is incorrect. I have verified cbtransfer functionality and it does not break during the test runs.

Test Case:: ./testrunner -i ~/run_tests/palm.ini -t rebalance.rebalanceout.RebalanceOutTests.rebalance_out_after_ops,nodes_out=1,replicas=1,items=10000,skip_cleanup=True

Also, with vbuckets=128 this problem does not reproduce, so please try it with 1024 vbuckets.

The same issue has been seen in various places for failover + rebalance. The persistence wait used in steps 3 and 5 is sketched below.
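The wait for persistence can be approximated with a small polling loop (a sketch only; host, port and bucket name are placeholders, and it relies on the very stat this bug says is unreliable):

    # Poll the persistence queue on a node and only proceed once it drains to 0.
    while true; do
        q=$(/opt/couchbase/bin/cbstats localhost:11210 all -b default | awk '$1 == "ep_queue_size:" {print $2}')
        [ "$q" = "0" ] && break
        sleep 5
    done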



 Comments   
Comment by Ketaki Gangal [ 12/Sep/14 ]
Ran into the same issue with ./testrunner -i /tmp/rebal.ini active_resident_threshold=100,dgm_run=true,get-delays=True,get-cbcollect-info=True,eviction_policy=fullEviction,max_verify=100000 -t rebalance.rebalanceout.RebalanceOutTests.rebalance_out_after_ops,nodes_out=1,replicas=1,items=10000,GROUP=OUT

It uses the same verification method as above and fails due to the ep_queue_size stat:
1. Create a cluster
2. After ep_queue_size = 0, take a snapshot of all data using cbtransfer (for the couchstore files)
3. Rebalance out 1 node
4. After ep_queue_size = 0, sleep for 30 seconds, then take a snapshot of all data using cbtransfer (for the couchstore files)




[MB-12144] REST API doesn't always return 60 stat entries Created: 08/Sep/14  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: RESTful-APIs
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Ian McCloy Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The output of the REST API stats doesn't always return 60 entries for each stat; sometimes 59 or 61 are returned.

for i in {1..1000}; do /opt/couchbase/bin/curl localhost:8091/pools/default/buckets/default/stats 2>/dev/null| cut -d":" -f4 | cut -d \" -f1 | tr , "\n" | grep -v '^$' | wc -l | grep -v 60 ; done
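A variant of the same check that also reports which stats are off (a sketch only, assuming the per-stat sample arrays live under op.samples in the JSON response and that jq is available; credentials are placeholders):

    # Print the name and sample count of any stat whose series is not exactly 60 entries long.
    curl -s -u Administrator:password http://localhost:8091/pools/default/buckets/default/stats \
      | jq -r '.op.samples | to_entries[] | select((.value | length) != 60) | "\(.key) \(.value | length)"'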




[MB-12143] [RC3-SSL-XDCR-Unidirectional] Deletions were not replicated to destination after warmup Created: 08/Sep/14  Updated: 09/Sep/14  Resolved: 09/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Sangharsh Agarwal Assignee: Sangharsh Agarwal
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-1208-rel
Platform: Centos 5.8

Issue Links:
Relates to
relates to MB-12063 KV+XDCR System test : Between expirat... Closed
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: [Source]
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-12143/62bf6851/10.3.121.65-952014-2013-diag.zip
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-12143/ea077cb3/10.3.121.65-diag.txt.gz
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-12143/818d1add/10.3.3.207-952014-2016-diag.zip
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-12143/c9a85985/10.3.3.207-diag.txt.gz
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-12143/8e6ee944/10.3.3.209-diag.txt.gz
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-12143/ac4a5723/10.3.3.209-952014-2019-diag.zip
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-12143/197e82ba/10.3.3.210-952014-2022-diag.zip
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-12143/ad681f46/10.3.3.210-diag.txt.gz


[Destination]

10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-12143/0884fcda/10.3.4.177-diag.txt.gz
10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-12143/21d42ea2/10.3.4.177-952014-2025-diag.zip
10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-12143/3f5f4a4c/10.3.121.62-diag.txt.gz
10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-12143/ec8c6c89/10.3.121.62-952014-2029-diag.zip
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-12143/902df50c/10.3.2.204-diag.txt.gz
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-12143/cb1c1b14/10.3.2.204-952014-2031-diag.zip
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-12143/d97620a8/10.3.3.208-diag.txt.gz
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-12143/f39a4073/10.3.3.208-952014-2027-diag.zip

DataFiles
========

[Source]
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-12143/c302d1be/10.3.121.65-952014-2032-couch.tar.gz
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-12143/a1ff94a5/10.3.3.207-952014-2033-couch.tar.gz
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-12143/fe1f20a9/10.3.3.209-952014-2033-couch.tar.gz
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-12143/fb5428b7/10.3.3.210-952014-2033-couch.tar.gz

[Destination]
10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-12143/6c0dfc5a/10.3.4.177-952014-2033-couch.tar.gz
10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-12143/2ca67b5e/10.3.121.62-952014-2033-couch.tar.gz
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-12143/99ba869a/10.3.3.208-952014-2033-couch.tar.gz
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-12143/21bc47a2/10.3.2.204-952014-2033-couch.tar.gz
Is this a Regression?: Yes

 Description   
[Jenkins]
http://qa.hq.northscale.net/job/centos_x64--31_02--uniXDCR_SSL-P1/47/consoleFull

[Test]
./testrunner -i centos_x64--31_01--uniXDCR-P1.ini GROUP=CHAIN,get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True,demand_encryption=1 -t xdcr.uniXDCR.unidirectional.load_with_async_ops_with_warmup,items=100000,rdirection=unidirection,ctopology=chain,sasl_buckets=1,doc-ops=delete,warm=source,replication_type=xmem,GROUP=P0;CHAIN;xmem


[Test Steps]
1. Setup 4-4 Node Uni-directional XMEM replication. SSL=True
2. Buckets: default, sasl_bucket_1
3. Load 100K items (items=100000) on each bucket.
4. Warmup One Source node (10.3.3.210).
5. Perform 30K deletions on Source cluster.
6. Verify items on each cluster.
       a) The item count on the source cluster is 70K on both buckets, as expected.
       b) The item count on the default bucket of the destination cluster is 70K, as expected.
       c) Item counts mismatch on the destination cluster's "sasl_bucket_1". Expected: 70000, Actual: 70024.

There were Outbound mutations left on Source cluster's sasl_bucket_1:

[2014-09-05 20:04:50,113] - [xdcrbasetests:1318] INFO - Current outbound mutations on cluster node: 10.3.121.65 for bucket sasl_bucket_1 is 77
[2014-09-05 20:04:50,273] - [xdcrbasetests:375] INFO - sleep for 10 secs. ...
[2014-09-05 20:05:00,456] - [xdcrbasetests:1318] INFO - Current outbound mutations on cluster node: 10.3.121.65 for bucket sasl_bucket_1 is 77
[2014-09-05 20:05:00,637] - [xdcrbasetests:375] INFO - sleep for 10 secs. ...
[2014-09-05 20:05:10,830] - [xdcrbasetests:1318] INFO - Current outbound mutations on cluster node: 10.3.121.65 for bucket sasl_bucket_1 is 77
[2014-09-05 20:05:11,012] - [xdcrbasetests:375] INFO - sleep for 10 secs. ...
[2014-09-05 20:05:21,212] - [xdcrbasetests:1318] INFO - Current outbound mutations on cluster node: 10.3.121.65 for bucket sasl_bucket_1 is 77
[2014-09-05 20:05:21,370] - [xdcrbasetests:375] INFO - sleep for 10 secs. ...
[2014-09-05 20:05:31,579] - [xdcrbasetests:1318] INFO - Current outbound mutations on cluster node: 10.3.121.65 for bucket sasl_bucket_1 is 77
[2014-09-05 20:05:31,744] - [xdcrbasetests:375] INFO - sleep for 10 secs. ...
[2014-09-05 20:05:41,923] - [xdcrbasetests:1318] INFO - Current outbound mutations on cluster node: 10.3.121.65 for bucket sasl_bucket_1 is 77



The following 24 keys have metadata mismatches: [' loadOne88725 ', ' loadOne86102 ', ' loadOne82917 ', ' loadOne83013 ', ' loadOne87806 ', ' loadOne85323 ', ' loadOne72828 ', ' loadOne80232 ', ' loadOne77939 ', ' loadOne80540 ', ' loadOne91549 ', ' loadOne85451 ', ' loadOne78168 ', ' loadOne94458 ', ' loadOne83761 ', ' loadOne83183 ', ' loadOne89953 ', ' loadOne92768 ', ' loadOne87996 ', ' loadOne88057 ', ' loadOne86092 ', ' loadOne86670 ', ' loadOne97679 ', ' loadOne82887 ']

2014-09-05 20:11:52,181] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne88725 =====
[2014-09-05 20:11:52,182] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9102037377909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:11:52,182] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9102037377908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:11:53,265] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne86102 =====
[2014-09-05 20:11:53,266] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9099987685909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:11:53,266] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9099987685908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:11:57,369] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne82917 =====
[2014-09-05 20:11:57,373] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9096223797909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:11:57,374] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9096223797908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:04,955] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne83013 =====
[2014-09-05 20:12:04,962] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9097160866909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:12:04,964] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9097160866908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:11,906] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne87806 =====
[2014-09-05 20:12:11,909] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9101036419909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:11,909] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9101036419908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:16,143] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne85323 =====
[2014-09-05 20:12:16,147] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9099078731909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:16,149] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9099078731908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:17,078] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne72828 =====
[2014-09-05 20:12:17,082] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9087763065909, 'flags': 0, 'expiration': 1409972309}
[2014-09-05 20:12:17,082] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9087763065908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:24,889] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne80232 =====
[2014-09-05 20:12:24,891] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9094688048909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:12:24,892] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9094688048908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:29,005] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne77939 =====
[2014-09-05 20:12:29,006] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9092194411909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:12:29,007] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9092194411908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:34,407] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne80540 =====
[2014-09-05 20:12:34,410] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9094737801909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:12:34,412] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9094737801908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:36,252] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne91549 =====
[2014-09-05 20:12:36,256] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9104651742909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:36,257] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9104651742908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:39,751] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne85451 =====
[2014-09-05 20:12:39,752] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9099043273909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:39,752] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9099043273908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:42,400] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne78168 =====
[2014-09-05 20:12:42,401] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9093103149909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:12:42,401] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9093103149908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:49,561] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne94458 =====
[2014-09-05 20:12:49,562] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9107323149909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:49,562] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9107323149908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:50,421] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne83761 =====
[2014-09-05 20:12:50,422] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9097159065909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:12:50,423] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9097159065908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:51,071] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne83183 =====
[2014-09-05 20:12:51,074] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9097139357909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:12:51,075] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9097139357908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:54,902] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne89953 =====
[2014-09-05 20:12:54,905] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9102751366909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:54,907] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9102751366908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:55,267] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne92768 =====
[2014-09-05 20:12:55,271] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9105516359909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:55,272] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9105516359908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:56,682] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne87996 =====
[2014-09-05 20:12:56,687] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9101032935909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:56,688] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9101032935908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:58,984] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne88057 =====
[2014-09-05 20:12:58,989] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9102011877909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:58,990] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9102011877908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:59,679] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne86092 =====
[2014-09-05 20:12:59,683] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9099988888909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:59,684] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9099988888908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:12:59,746] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne86670 =====
[2014-09-05 20:12:59,750] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9099972218909, 'flags': 0, 'expiration': 1409972374}
[2014-09-05 20:12:59,752] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9099972218908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:13:06,284] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne97679 =====
[2014-09-05 20:13:06,291] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9109372769909, 'flags': 0, 'expiration': 1409972410}
[2014-09-05 20:13:06,293] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9109372769908, 'flags': 0, 'expiration': 0}
[2014-09-05 20:13:09,311] - [task:1228] ERROR - ===== Verifying rev_ids failed for key: loadOne82887 =====
[2014-09-05 20:13:09,315] - [task:1230] ERROR - Source meta data: {'deleted': 1, 'seqno': 2, 'cas': 9096243219909, 'flags': 0, 'expiration': 1409972341}
[2014-09-05 20:13:09,315] - [task:1231] ERROR - Dest meta data: {'deleted': 0, 'seqno': 1, 'cas': 9096243219908, 'flags': 0, 'expiration': 0}


All the keys belong to the warmed-up node, i.e. 10.3.3.210, vbucket=815, except loadOne85323.

The test passes on non-SSL XDCR.

 Comments   
Comment by Sangharsh Agarwal [ 08/Sep/14 ]
Aruna, Please check if bug has enough information and assign it to Alk.
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
xdcr_trace on node .210 is not available after restart. Will likely bounce the ticket for that.
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
Without xdcr traces I cannot do anything here.

Logs indicate that xdcr believes it replicated everything, i.e. I see that the last checkpoint was at seqno 118.
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
Also, I'll need you to double-check that the clocks are synchronized between all the nodes.
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
There are some indications of something weird going on with the setup. Specifically, there is a bunch of ehostunreach errors on node 210 towards node 177 which are then resolved. But I don't see any issues in the logs of node 177, nor do I see any signs of incoming XDCR on node 177 since then.

It's almost as if somebody booted another node with IP .177 and got the XDCR traffic from node 210 routed there.
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
Correction: econnrefused, rather than ehostunreach errors.
Comment by Aruna Piravi [ 08/Sep/14 ]
Clocks are synchronized. I added code to enable xdcr trace logging after restart, but I have run this test at least 10 times today, both standalone and as part of the entire job on Jenkins, and am still unable to reproduce it even once.

Sangharsh, can you also please try to reproduce the problem? Thanks.
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
Then it increasingly looks like another instance of testing environment chaos. I.e. see https://www.couchbase.com/issues/browse/MB-12129
Comment by Sangharsh Agarwal [ 09/Sep/14 ]
>It's almost like somebody booted another node with ip .177 and got xdcr traffic from node 210 to be routed there.

Alk, I couldn't find anything in the syslog (/var/log/messages.1 or /var/log/secure) on the .177 node during this test window. If you find anything, please mention it.

Comment by Sangharsh Agarwal [ 09/Sep/14 ]
Chiyoung/Mike: please also check this from the ep-engine point of view. Two patches related to warmup were merged between builds 1205 and 1208:

MB-12063: http://review.couchbase.org/#/c/41219/
MB-12100: http://review.couchbase.org/#/c/41177/

Please cross check.
Comment by Sangharsh Agarwal [ 09/Sep/14 ]
Chiyoung/Mike: "loadOne85323" (one of the deleted keys on the source) is not present in any of the sasl_bucket_1 *couch* files on the source side, i.e. 10.3.3.207, 10.3.3.209, 10.3.3.210, 10.3.121.65:

[root@cen-1413 test_5]# find . -name "*.couch*" | xargs grep "loadOne85323"
Binary file ./210/opt/couchbase/var/lib/couchbase/data/default/815.couch.1 matches
Binary file ./65/opt/couchbase/var/lib/couchbase/data/default/815.couch.1 matches

Whereas another deleted key, loadOne82917, is present in both the sasl_bucket_1 and default files:


[root@cen-1413 test_5]# find . -name "*.couch*" | xargs grep "loadOne82917"
Binary file ./210/opt/couchbase/var/lib/couchbase/data/sasl_bucket_1/815.couch.1 matches
Binary file ./210/opt/couchbase/var/lib/couchbase/data/default/815.couch.1 matches
Binary file ./65/opt/couchbase/var/lib/couchbase/data/sasl_bucket_1/815.couch.1 matches
Binary file ./65/opt/couchbase/var/lib/couchbase/data/default/815.couch.1 matches
Comment by Raju Suravarjjala [ 09/Sep/14 ]
Sangharsh seems to be correct; this seems related to the following change: http://review.couchbase.org/#/c/41219/
Comment by Chiyoung Seo [ 09/Sep/14 ]
I don't think these two changes caused this issue. Those changes simply disable deleting expired items during the warmup, but instead let the daemon expiry pager remove them after the warmup is completed.
Comment by Chiyoung Seo [ 09/Sep/14 ]
Also note that those mismatched items are not expired, but instead explicitly deleted by the client in the source cluster.
Comment by Aruna Piravi [ 09/Sep/14 ]
1. The changes below relate only to expiry after warmup. However, there is no expiration done in this test; these are plain deletes. So it is not a regression from those changes.

       MB-12063: http://review.couchbase.org/#/c/41219/
       MB-12100: http://review.couchbase.org/#/c/41177/

2. Alk saw some network errors in logs which need to be isolated to understand what the real problem causing data loss is. Since one set of logs is all we have and they point to environmental issues, we would need to reproduce this problem again.

3. Since I have not been able to reproduce it in 11 attempts (+1 today), after discussion with Chiyoung and Raju I agree with closing this issue as not reproducible and reopening it if it is encountered again.

Thanks!




[MB-12142] Rebalance Exit due to Bad Replicas Error has no support documentation Created: 05/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Rebalance can exit with a "bad replicas" error, which can be caused by ns_server or couchbase-bucket. In such situations, rebalance fails again on retry, and manual intervention is needed to diagnose the problem. We need to provide documentation for the support team as part of our release notes. Please define a process for this and then re-assign the bug to Ruth so she can add it to the release-note documentation.

 Comments   
Comment by Chiyoung Seo [ 12/Sep/14 ]
Mike,

Please provide more details on bad replica issues in DCP and assign this back to the doc team.
Comment by Mike Wiederhold [ 12/Sep/14 ]
Bad replicas is an error message that means that replication streams could not be created, so there may be many reasons for it to happen. One possible reason is that some of the vbucket sequence numbers which are maintained internally in Couchbase are invalid. If this happens you will see a log message in the memcached logs that looks something like this.

(DCP Producer) some_dcp_stream_name (vb 0) Stream request failed because the snap start seqno (100) <= start seqno (101) <= snap end seqno (100) is required

In order for a DCP producer to accept a request for a DCP stream the following must be true.

snapshot start seqno <= start seqno <= snapshot end seqno

If the above condition is not true for a stream request then a customer should contact support so that we can resolve the issue using a script to "reset" the sequence numbers. I can provide this script at a later time, but it is worth noting that we do not expect this scenario to happen and have resolved all bugs we have seen related to this error.
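For the support write-up, one quick way to check whether a cluster has hit this condition (a sketch only; the path shown is the default Linux install location for the memcached logs) is to grep each node's memcached logs for the stream-request failure quoted above:

    # Look for DCP stream requests rejected because of inconsistent snapshot/start seqnos.
    grep -h "Stream request failed because the snap start seqno" \
        /opt/couchbase/var/lib/couchbase/logs/memcached.log* 2>/dev/null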
Comment by Ruth Harris [ 12/Sep/14 ]
Put it into the release notes (not in beta but for GA) for Known Issues MB-12142.
Is this the correct MB issue?




[MB-12141] Try to delete a Server group that is empty. The error message needs to be descriptive Created: 05/Sep/14  Updated: 16/Sep/14  Resolved: 16/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Pavel Blagodov
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows build 3.0.1_1261
Environment: Windows 7 64 bit

Attachments: PNG File Screen Shot 2014-09-05 at 5.22.08 PM.png    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Login to the Couchbase console
http://10.2.2.52:8091/ (Administrator/Password)
Click on Server Nodes
Try to create a group and then click to delete
You will see the error as seen in the screenshot
Expected behavior: the dialog title should be "Removing Server Group" and it should say "Are you sure you want to remove the server group?" or something like that

 Comments   
Comment by Pavel Blagodov [ 11/Sep/14 ]
http://review.couchbase.org/41359




[MB-12140] Meaningful error should be given to the user Created: 05/Sep/14  Updated: 05/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows build 3.0.1_1261
Environment: Windows 7 64 bit

Attachments: PNG File Screen Shot 2014-09-05 at 5.09.59 PM.png    
Triage: Untriaged
Is this a Regression?: No

 Description   
Login to the Couchbase console
http://10.2.2.52:8091/ (Administrator/Password)
Click on Server Nodes
Try to Add a server
Give the Server IP address (10.3.2.43)
In the Security give a read only user name and password
You will see the error as seen in the screenshot
Expected behavior: Attention - Authentication failed as Readonly username and password are not allowed.

 Comments   
Comment by Aleksey Kondratenko [ 05/Sep/14 ]
It is against security recommendations (including PCI DSS) to reveal security-sensitive details such as "this user exists".
Comment by Raju Suravarjjala [ 05/Sep/14 ]
Anil: I have logged this bug as per your suggestion. Please advise.




[MB-12139] [windows] couchbase-cli add node twice in rebalance option Created: 05/Sep/14  Updated: 11/Sep/14  Resolved: 11/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thuan Nguyen Assignee: Bin Cui
Resolution: Cannot Reproduce Votes: 0
Labels: windows, windows-3.0-beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 2008 R2 64-bit

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Run the command to add a node and rebalance, as shown in the couchbase-cli help:

 Add a node to a cluster and rebalance:
    couchbase-cli rebalance -c 192.168.0.1:8091 \
       --server-add=192.168.0.2:8091 \
       --server-add-username=Administrator1 \
       --server-add-password=password1 \
       --group-name=group1 \
       -u Administrator -p password

It adds the node to the cluster and rebalances, but then tries to add the same node to the cluster again, so it throws this error:
ERROR: unable to add server '172.23.106.173:8091' to group 'Group 1' (400) Bad Request
[u'Prepare join failed. Node is already part of cluster.']

Command to run:
/cygdrive/c/Program\ Files/Couchbase/Server/bin/couchbase-cli.exe rebalance --cluster=localhost -u Administrator -p password --server-add=172.23.106.173:8091 --server-add-username=Administrator --server-add-password=password --group-name="Group 1"

Output from command ran:
INFO: rebalancing
SUCCESS: rebalanced cluster
ERROR: unable to add server '172.23.106.173:8091' to group 'Group 1' (400) Bad Request
[u'Prepare join failed. Node is already part of cluster.']


 Comments   
Comment by Bin Cui [ 11/Sep/14 ]
c:\t1\bin>couchbase-cli rebalance -c localhost:8091 -u Administrator -p 123456 --server-add=10.6.2.91 --group-name="Group 1"
SUCCESS: add server '10.6.2.91:8091' to group 'Group 1'
INFO: rebalancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
SUCCESS: rebalanced cluster

Comment by Bin Cui [ 11/Sep/14 ]
Reproduce steps:

1. Add one node with default bucket and sample data uploaded.
2. Add another node which is not initialized yet.
3. Run the following script:

c:\t1\bin>couchbase-cli rebalance -c localhost:8091 -u Administrator -p 123456 --server-add=10.6.2.91 --group-name="Group 1"




[MB-12138] {Windows - DCP}:: View Query fails with error 500 reason: error {"error":"error","reason":"{index_builder_exit,89,<<>>}"} Created: 05/Sep/14  Updated: 19/Sep/14  Resolved: 19/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Parag Agarwal Assignee: Nimish Gupta
Resolution: Fixed Votes: 0
Labels: windows, windows-3.0-beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.1-1267, Windows 2012, 64 x, machine:: 172.23.105.112

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12138/172.23.105.112-952014-1511-diag.zip
Is this a Regression?: Yes

 Description   


1. Create 1 Node cluster
2. Create default bucket and add 100k items
3. Create views and query it

Seeing the following exceptions

http://172.23.105.112:8092/default/_design/ddoc1/_view/default_view0?connectionTimeout=60000&full_set=true&limit=100000&stale=false error 500 reason: error {"error":"error","reason":"{index_builder_exit,89,<<>>}"}

We cannot run any view tests as a result


 Comments   
Comment by Anil Kumar [ 16/Sep/14 ]
Nimish/Siri - Any update on this.
Comment by Meenakshi Goel [ 17/Sep/14 ]
Seeing similar issue in Views DGM test http://qa.hq.northscale.net/job/win_2008_x64--69_06_view_dgm_tests-P1/1/console
Test : view.createdeleteview.CreateDeleteViewTests.test_view_ops,ddoc_ops=update,test_with_view=True,num_ddocs=4,num_views_per_ddoc=10,items=200000,active_resident_threshold=10,dgm_run=True,eviction_policy=fullEviction
Comment by Nimish Gupta [ 17/Sep/14 ]
We have found the root cause and are working on the fix.
Comment by Nimish Gupta [ 19/Sep/14 ]
http://review.couchbase.org/#/c/41480




[MB-12137] dcp_wait_for_data_move_failed error Created: 05/Sep/14  Updated: 10/Sep/14  Resolved: 10/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Set up a 4-node cluster with the beer-sample bucket and a workload running on it. Gracefully fail over one node, wait a few minutes, then delta-recover it back in. The rebalance failed once due to dcp_wait_for_data_move_failed. I gathered logs and then retried the rebalance a number of times; it failed every time with the same error.

Build 1208

Logs are at:
https://s3.amazonaws.com/cb-customers/mb11980/collectinfo-2014-09-05T202245-ns_1%40ec2-54-176-254-88.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/mb11980/collectinfo-2014-09-05T202245-ns_1%40ec2-54-176-54-69.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/mb11980/collectinfo-2014-09-05T202245-ns_1%40ec2-54-219-84-241.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/mb11980/collectinfo-2014-09-05T202245-ns_1%40ec2-54-219-94-38.us-west-1.compute.amazonaws.com.zip

 Comments   
Comment by Mike Wiederhold [ 05/Sep/14 ]
Perry,

I'll start looking at this, but can you add information on what you were doing with the cluster to cause this issue?
Comment by Parag Agarwal [ 05/Sep/14 ]
Just had a discussion with Perry; the scenario does not reproduce in RC1 and RC2, so this is a regression in RC3.

Also, in the scenario mentioned in the bug we have a workload of 40k writes/sec and 40k reads/sec.

Scenario

Set up a 4-node cluster with the beer-sample bucket (which has a view as well) and a workload running on it. Gracefully fail over one node, wait a few minutes, then delta-recover it back in. The rebalance failed once due to dcp_wait_for_data_move_failed. I gathered logs and then retried the rebalance a number of times; it failed every time with the same error.
Comment by Perry Krug [ 05/Sep/14 ]
(spoke too soon)

I was able to reproduce it again on 1208. My biggest concern is that once we hit this, the cluster seems to get stuck forever and can't rebalance.

Logs are at:
https://s3.amazonaws.com/cb-customers/mb12137/collectinfo-2014-09-06T003041-ns_1%40ec2-54-176-54-34.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/mb12137/collectinfo-2014-09-06T003041-ns_1%40ec2-54-193-134-117.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/mb12137/collectinfo-2014-09-06T003041-ns_1%40ec2-54-193-163-255.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/mb12137/collectinfo-2014-09-06T003041-ns_1%40ec2-54-193-189-191.us-west-1.compute.amazonaws.com.zip
Comment by Perry Krug [ 05/Sep/14 ]
I did try with previous builds, but I'm not confident that failing to reproduce there means much, as it took me 5 attempts to get it to happen again with 1208.
Comment by Mike Wiederhold [ 05/Sep/14 ]
The workaround in a customer scenario would be to manually modify the vbucket data file. Clearly we don't want to have to have customers do this, but that is the only way to get out of this situation once it happens.
Comment by Perry Krug [ 05/Sep/14 ]
Is there anything from the logs that tells us how we got into this situation in the first place?
Comment by Mike Wiederhold [ 05/Sep/14 ]
No, we write this data every time we persist mutations to disk so logging this sort of thing would generate too much garbage.
Comment by Parag Agarwal [ 09/Sep/14 ]
Worked with Perry to reproduce the issue; it reproduces on both CentOS and Ubuntu. Mike analyzed the issue and has provided a toy build. Will update once I test it out.
Comment by Chiyoung Seo [ 09/Sep/14 ]
The following change from Mike was merged into ep-engine 3.0 branch:

http://review.couchbase.org/#/c/41289/

Parag already verified this fix.
Comment by Perry Krug [ 10/Sep/14 ]
Can't reproduce this any longer, thanks Mike and Parag!




[MB-12136] XDCR@next release - Router Created: 05/Sep/14  Updated: 10/Sep/14  Resolved: 10/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: feature-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Yu Sui
Resolution: Done Votes: 0
Labels: sprint1_xdcr
Remaining Estimate: 32h
Time Spent: Not Specified
Original Estimate: 32h

Epic Link: XDCR next release




[MB-12135] [Windows] Once the installation of CB server is completed, it launches console page that comes up with blank page Created: 05/Sep/14  Updated: 11/Sep/14  Resolved: 11/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: Windows, windows-3.0-beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows build 3.0.1_1261
Environment: Windows 7 64 bit


Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: No

 Description   
I was able to successfully install the build on Windows 7 64-bit. Once the installation of Couchbase Server completes, it launches the Couchbase console in the web browser. Note that the page is blank because there is a delay before the server starts; you have to hit refresh a couple of times before you see it working.

 Comments   
Comment by Sriram Melkote [ 05/Sep/14 ]
We could use the PortTaken() function and wait for the server to come up before launching the console URL.
Comment by Bin Cui [ 10/Sep/14 ]
http://review.couchbase.org/#/c/41329/
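Regarding the PortTaken() suggestion above, the wait amounts to polling the REST port until it answers before opening the browser; a minimal sketch (the URL and loop are illustrative only, not the installer's actual code):

    # Block until ns_server answers on 8091; only then is it safe to open the console URL.
    until curl -sf http://127.0.0.1:8091/pools > /dev/null; do
        sleep 1
    done
    echo "Server is up - launch http://127.0.0.1:8091/index.html"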




[MB-12134] moxi does not check cluster-map connection, which may be dropped Created: 05/Sep/14  Updated: 05/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Alexander Petrossian (PAF) Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: our solution is deployed in an environment where TCP connections are treated as scarce resources and are dropped after 1 hour of inactivity.

We cannot change this.

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
1. moxi initiates a connection to Couchbase to get the cluster map and its updates.
2. Since there are normally no updates, this connection gets silently dropped by a network element that treats connections as valuable resources.
3. A new bucket is created or a rebalance happens.
4. The Couchbase cluster map changes.
5. Couchbase tries to push the cluster-map update down to moxi.
6. The TCP packets get silently dropped by the network element in the middle.

We can't change the behavior of this network element.

We want moxi to regularly check that this vital TCP connection carrying cluster-map updates is still alive and, if not, reconnect automatically.
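As a diagnostic sketch (assuming client-side moxi holds its cluster-map streaming connection open against the cluster's REST port 8091; adjust the port if your deployment differs), you can check on the moxi host whether that connection still exists. If nothing is printed, the silent drop described above has already happened:

    # List moxi's established connections towards the cluster's REST port.
    netstat -tnp 2>/dev/null | grep moxi | grep ':8091'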

 Comments   
Comment by Alexander Petrossian (PAF) [ 05/Sep/14 ]
http://jira.teligent.ru/browse/....-7028




[MB-12133] GETL metric in statistics missing Created: 05/Sep/14  Updated: 05/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Alexander Petrossian (PAF) Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: cmd_get does not count GETL operations.

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
cmd_get -- have
cmd_getl -- don't have



 Comments   
Comment by Alexander Petrossian (PAF) [ 05/Sep/14 ]
https://tracker.teligent.ru/issues/15305
Comment by Alexander Petrossian (PAF) [ 05/Sep/14 ]
Since "ops per second" is defined as "Total amount of operations per second to this bucket (measured from cmd_get + cmd_set + incr_misses + incr_hits + decr_misses + decr_hits + delete_misses + delete_hits)", getl operations are missing from "ops per second" as well.

In our solution we use getl+cas almost all the time, so "ops per second" shows approximately half of all operations.

(We have extensive monitoring on the client side too, and we see those getl operations there.)
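To illustrate the gap, the counters that feed "ops per second" can be summed by hand from cbstats output (a sketch only; host, port and bucket are placeholders, and cbstats reports cumulative totals rather than per-second rates). There is no cmd_getl counter to add to the sum:

    # Sum the cumulative counters whose deltas make up "ops per second".
    /opt/couchbase/bin/cbstats localhost:11210 all -b default \
      | awk '$1 ~ /^(cmd_get|cmd_set|incr_misses|incr_hits|decr_misses|decr_hits|delete_misses|delete_hits):$/ {sum += $2} END {print sum}'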




[MB-12132] [Doc] Support Windows Server 2012 R2 in Production Created: 04/Sep/14  Updated: 05/Sep/14  Resolved: 05/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Fixed Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-9494 Support Windows Server 2012 R2 in Pro... Closed

 Description   
We need to support Windows Server 2012 R2 in production for 3.0.

 Comments   
Comment by Ruth Harris [ 05/Sep/14 ]
Updated in Support platforms section and added to Windows installation section

Windows 2012 R2 SP1 64 bit Developer and Production




[MB-12131] Add Debian as new platform for Couchbase Server Created: 04/Sep/14  Updated: 05/Sep/14  Resolved: 05/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-10960 Add Debian package to 3.0 release - D... Closed

 Description   
Debian is consistently among the top 3 distributions in the server market, by almost every count. For example, the link below ranks operating systems across the top 10 million web servers:

http://w3techs.com/technologies/overview/operating_system/all

You can look at various other surveys, and you'll see the message is the same. Debian is pretty much at the top for servers. Yet, we don't ship packages for it. This is quite hard to understand because we're already building .deb for Ubuntu, and it takes only a few minor changes to make it compatible with Debian/Stable.

While I don't track customer requests, I've anecdotally seen them requesting the exact same thing in unambiguous terms.


 Comments   
Comment by Ruth Harris [ 05/Sep/14 ]
Added to Support platforms and changed/updated installation section to Ubuntu/Debian installation:

Debian Linux 7 64 bit Developer and Production Debian 7.0




[MB-12130] Document the Graceful Failover feature under Maintenance Mode Created: 04/Sep/14  Updated: 05/Sep/14  Resolved: 05/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
As discussed we need to document the Graceful Failover feature as part of Maintenance Mode.


 Comments   
Comment by Ruth Harris [ 05/Sep/14 ]
Now a sub-section under Cluster management > Server node maintenance
along with delta node recovery




[MB-12129] XDCR : replication broken on build 3.0.0-1206 Created: 04/Sep/14  Updated: 10/Sep/14  Resolved: 10/Sep/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket, cross-datacenter-replication
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Aruna Piravi Assignee: Aruna Piravi
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centOS 6.x build 1206

Triage: Untriaged
Is this a Regression?: Yes

 Description   
This is a regression from 1205, where XDCR worked fine.

Steps
--------
1. Create buckets on .44(4 nodes) and .54(4 nodes) clusters
2. Load till ~50 dgm on both sides
3. Set up xdcr.
     standardbucket1(.44) ---> standardbucket1(.54)
     standardbucket(.44) <---->standardbucket(.54)
4. 50% gets and 50% deletes on both sides for 15 mins
5. Rebalance-out one node on .44
6. Rebalance -in same node on .44
7. Failover and remove same node on .44 (we failover by killing beam and erlang so there is warmup involved)
8. Failover and addback same node on .44
9. Rebalance-out one node on .54
10. Rebalance -in same node on .54
11. Failover and remove same node on .54
12. Failover and addback same node on .54
13. Soft restart all 3 nodes in .44
14. Soft restart all 3 nodes in .54

At the end of the test, no keys were found in standardbucket1 (uni-XDCR with no load on the destination), and keys did not match for the bi-XDCR buckets. I think replication never happened; the keys present were the initially loaded ones.

Cross-checked the couch files to see whether this is a stats issue (indeed found no docs for standardbucket1 on .54):
[root@guinep-s10501 standardbucket1]# /opt/couchbase/bin/couch_dbdump *.couch.1
Dumping "0.couch.1":
Dumping "100.couch.1":
Dumping "101.couch.1":
:
Dumping "99.couch.1":
Dumping "9.couch.1":
Dumping "master.couch.1":

Total docs: 0

Some investigation
--------------------------
Could be a regression from MB-12100.

Seeing "startReplication" messages like

"batchSizeItems":500,"numWorkers":4,"seq":23411,"snapshotStart":23411,"snapshotEnd":23411"
Please note that seq, snapshotStart and snapshotEnd are the same, as recorded in xdcr_trace for all startReplication events. I'm not completely sure this is the root cause of the issue, but for an initial XDCR, seq, snapshotStart and snapshotEnd being identical for a vbucket looks weird.

[root@soursop-s11201 logs]# grep "startReplication" xdcr_trace.log
{"pid":"<0.5674.1>","type":"startReplication","ts":1409849619.748034,"batchSizeItems":500,"numWorkers":4,"seq":23411,"snapshotStart":23411,"snapshotEnd":23411,"failoverUUUID":130175293122263,"supportsDatatype":false,"changesReader":"<0.25996.101>","changesQueue":"<0.24382.101>","changesManager":"<0.25839.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25913.101>","<0.21455.101>","<0.25712.101>","<0.22686.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
{"pid":"<0.5613.1>","type":"startReplication","ts":1409849619.758986,"batchSizeItems":500,"numWorkers":4,"seq":23416,"snapshotStart":23416,"snapshotEnd":23416,"failoverUUUID":104945028016240,"supportsDatatype":false,"changesReader":"<0.25799.101>","changesQueue":"<0.25842.101>","changesManager":"<0.25908.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25104.101>","<0.22805.101>","<0.25126.101>","<0.25806.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
{"pid":"<0.5631.1>","type":"startReplication","ts":1409849619.759695,"batchSizeItems":500,"numWorkers":4,"seq":23205,"snapshotStart":23205,"snapshotEnd":23205,"failoverUUUID":264920799474449,"supportsDatatype":false,"changesReader":"<0.26039.101>","changesQueue":"<0.25320.101>","changesManager":"<0.25997.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25573.101>","<0.25707.101>","<0.24003.101>","<0.24432.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
{"pid":"<0.5674.1>","type":"startReplication","ts":1409849619.760088,"batchSizeItems":500,"numWorkers":4,"seq":23423,"snapshotStart":23423,"snapshotEnd":23423,"failoverUUUID":130175293122263,"supportsDatatype":false,"changesReader":"<0.26047.101>","changesQueue":"<0.25221.101>","changesManager":"<0.25790.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25803.101>","<0.15404.101>","<0.25974.101>","<0.15339.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
{"pid":"<0.5749.1>","type":"startReplication","ts":1409849619.761978,"batchSizeItems":500,"numWorkers":4,"seq":22971,"snapshotStart":22971,"snapshotEnd":22971,"failoverUUUID":134507485417479,"supportsDatatype":false,"changesReader":"<0.25943.101>","changesQueue":"<0.26046.101>","changesManager":"<0.25674.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25792.101>","<0.25894.101>","<0.25977.101>","<0.25991.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
{"pid":"<0.15915.85>","type":"startReplication","ts":1409849619.762293,"batchSizeItems":500,"numWorkers":4,"seq":23612,"snapshotStart":23612,"snapshotEnd":23612,"failoverUUUID":54183205161167,"supportsDatatype":false,"changesReader":"<0.26030.101>","changesQueue":"<0.24274.101>","changesManager":"<0.25898.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.9700.99>","<0.16284.101>","<0.15155.101>","<0.11809.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
{"pid":"<0.5719.1>","type":"startReplication","ts":1409849619.762483,"batchSizeItems":500,"numWorkers":4,"seq":22938,"snapshotStart":22938,"snapshotEnd":22938,"failoverUUUID":184213973604977,"supportsDatatype":false,"changesReader":"<0.15419.101>","changesQueue":"<0.25851.101>","changesManager":"<0.20302.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.26474.99>","<0.20073.101>","<0.20391.101>","<0.25980.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
:
:
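
A quick way to pull out every such event programmatically (a sketch assuming one JSON object per line, as in the grep output above):

#!/usr/bin/env python
# Sketch: flag startReplication events in xdcr_trace.log whose seq,
# snapshotStart and snapshotEnd are all identical.
import json

suspect = []
with open("xdcr_trace.log") as trace:
    for line in trace:
        line = line.strip()
        if not line.startswith("{"):
            continue                      # skip truncated or non-JSON lines
        try:
            ev = json.loads(line)
        except ValueError:
            continue
        if ev.get("type") != "startReplication":
            continue
        if ev.get("seq") == ev.get("snapshotStart") == ev.get("snapshotEnd"):
            suspect.append((ev["pid"], ev["seq"]))

print("%d startReplication events with seq == snapshotStart == snapshotEnd" % len(suspect))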

Collecting and attaching logs. Meanwhile, if you want to have a look at the cluster: http://172.23.105.44:8091/index.html#sec=buckets

 Comments   
Comment by Aruna Piravi [ 04/Sep/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12129/44.tar
https://s3.amazonaws.com/bugdb/jira/MB-12129/54.tar
Comment by Aruna Piravi [ 04/Sep/14 ]
The clusters will be available for an hour. If you need them for investigation, please let me know. Thanks.
Comment by Aruna Piravi [ 04/Sep/14 ]
Of course I didn't need a system test to tell me that XDCR is not working, but I found this while verifying a bug through a system test. However, I ran a couple of XDCR functional tests and they passed on 1206.
Comment by Aleksey Kondratenko [ 04/Sep/14 ]
This looks like not a bug, but operator error instead (and certainly not a regression).

It looks like cluster .54 is a brand new cluster. It has a different uuid than the uuid configured in XDCR on cluster .44. And looking at the timestamps in the user-visible logs on node .54, I can see that it was indeed set up at about the same time the XDCR issues started.

It's true that XDCR's error reporting story is a mess, and it's bad that it's nearly impossible to find the real cause of a broken XDCR. But that has nothing to do with this "bug".
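
A quick way to check for this kind of uuid mismatch from the source side (a sketch against the standard /pools and /pools/default/remoteClusters REST endpoints; credentials are placeholders):

#!/usr/bin/env python
# Sketch: compare the remote-cluster uuid stored in the source cluster's
# XDCR references against the uuid the destination cluster currently reports.
import requests

AUTH = ("Administrator", "password")       # placeholder credentials
SOURCE = "http://172.23.105.44:8091"       # source cluster (.44)

refs = requests.get(SOURCE + "/pools/default/remoteClusters", auth=AUTH).json()
for ref in refs:
    host = ref["hostname"].split(":")[0]
    actual = requests.get("http://%s:8091/pools" % host, auth=AUTH).json()["uuid"]
    state = "OK" if actual == ref["uuid"] else "MISMATCH (destination re-created?)"
    print("%s configured=%s actual=%s %s" % (ref["name"], ref["uuid"], actual, state))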
Comment by Aruna Piravi [ 04/Sep/14 ]
> It looks like cluster .54 is a brand new cluster. It has a different uuid than the uuid configured in XDCR on cluster .44.

OK, may I ask what caused .54's uuid to change? From the cluster UI log, I see that .54 was neither rebalanced out nor failed over after XDCR was set up. It was .55 that was rebalanced out and then back in; .55 was also later failed over and removed. Then .58 was failed over but added back. .54 was totally untouched. What could explain the "brand new cluster" or uuid change?
Comment by Aleksey Kondratenko [ 04/Sep/14 ]
My guess is that you re-installed it at around 9am-something, which certainly resets the uuid.
Comment by Aruna Piravi [ 04/Sep/14 ]
OK, I see that one of the Jenkins jobs got triggered while the test was running. Sorry for the trouble. Closing this issue.




[MB-12128] Stale=false may not ensure RYOW property (Regression) Created: 03/Sep/14  Updated: 16/Sep/14  Resolved: 16/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Sarath Lakshman Assignee: Sarath Lakshman
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
For performance reasons, we tried to reply to stale=false query readers immediately after the updater's internal checkpoint. This can result in replying after only a partial snapshot has been applied to the index, so the user may not observe RYOW (read-your-own-writes). To ensure RYOW, we should always return results only after processing a complete UPR snapshot.

We just need to revert this commit to fix the problem: https://github.com/couchbase/couchdb/commit/e866fe9330336ab1bda92743e0bd994530532cc8

We are fairly confident that reverting this change will not break anything; it was added purely as a performance improvement.
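
To make the RYOW expectation concrete, here is a minimal sketch of the property from a client's point of view (bucket, design document, view and document names are hypothetical; the node address and credentials are placeholders):

#!/usr/bin/env python
# Sketch of the RYOW property: once a write has been acknowledged, a
# stale=false view query over that bucket should already reflect it,
# assuming the view's map function emits the document.
import requests

NODE = "http://127.0.0.1:8092"        # view (capi) port on any data node; placeholder
AUTH = ("Administrator", "password")

def stale_false_ids(bucket, ddoc, view):
    url = "%s/%s/_design/%s/_view/%s" % (NODE, bucket, ddoc, view)
    resp = requests.get(url, params={"stale": "false"}, auth=AUTH)
    resp.raise_for_status()
    return set(row["id"] for row in resp.json().get("rows", []))

# "doc-just-written" was written and acknowledged by some client before this
# query; with correct RYOW behaviour it must show up, even if the indexer is
# in the middle of processing a UPR snapshot.
assert "doc-just-written" in stale_false_ids("default", "ddoc1", "view1")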

 Comments   
Comment by Sarath Lakshman [ 04/Sep/14 ]
Added a unit test to prove this case
http://review.couchbase.org/#/c/41192

Here is the change for reverting corresponding commit
http://review.couchbase.org/#/c/41193/
Comment by Wayne Siu [ 04/Sep/14 ]
As discussed in the release meeting on 09.04.14, this is scheduled for 3.0.1.
Comment by Sarath Lakshman [ 16/Sep/14 ]
Merged




[MB-12127] Provide an optional daemon thread that performs flushing the WAL entries into the main index Created: 03/Sep/14  Updated: 03/Sep/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: feature-backlog
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Chiyoung Seo Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
From the performance tests, we observed that write throughput drops when the writer thread flushes WAL entries into the main index (i.e., the HB+-Trie or B+-Tree). To mitigate this performance issue, we may need to explore the option of a separate daemon thread that takes care of the WAL flush task.
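
A conceptual sketch of the proposed split (written in Python for brevity, not forestdb's actual C API): writers only append to the in-memory WAL and return, while a daemon thread periodically moves the accumulated entries into the main index, keeping the flush cost off the writer's critical path.

#!/usr/bin/env python
# Conceptual sketch only: a daemon flusher thread that drains WAL entries
# into the main index so that writers never pay the flush cost themselves.
import threading
import time

class WalFlusher(object):
    def __init__(self, main_index, flush_interval=0.1):
        self.main_index = main_index          # stands in for the HB+-Trie / B+-Tree
        self.flush_interval = flush_interval
        self.wal = []
        self.lock = threading.Lock()
        self.stop = threading.Event()
        self.thread = threading.Thread(target=self._run)
        self.thread.daemon = True
        self.thread.start()

    def write(self, key, value):
        # Writer path: append to the WAL only; no main-index update here.
        with self.lock:
            self.wal.append((key, value))

    def _run(self):
        while not self.stop.is_set():
            time.sleep(self.flush_interval)
            with self.lock:
                batch, self.wal = self.wal, []
            for key, value in batch:          # the expensive part, now off the writer path
                self.main_index[key] = value

    def close(self):
        self.stop.set()
        self.thread.join()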




[MB-12126] there is not manifest file on windows 3.0.1-1253 Created: 03/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Thuan Nguyen Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 2008 r2 64-bit

Attachments: PNG File ss 2014-09-03 at 12.05.41 PM.png    
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Installed Couchbase Server 3.0.1-1253 on Windows Server 2008 R2 64-bit. There is no manifest file in the directory c:\Program Files\Couchbase\Server\



 Comments   
Comment by Chris Hillery [ 03/Sep/14 ]
Also true for 3.0 RC2 build 1205.
Comment by Chris Hillery [ 03/Sep/14 ]
(Side note: while fixing this, log onto the build slaves and delete the stale "server-overlay/licenses.tgz" file so we stop shipping it.)
Comment by Anil Kumar [ 17/Sep/14 ]
Ceej - Any update on this?
Comment by Chris Hillery [ 18/Sep/14 ]
No, not yet.




[MB-12125] rebalance swap regression of 39.3% compared with 2.5.1 Created: 03/Sep/14  Updated: 04/Sep/14

Status: Open
Project: Couchbase Server
Component/s: performance
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thomas Anderson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 6.5/ 2xSSD:: leto

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/leto/556/artifact/172.23.100.29.zip
http://ci.sc.couchbase.com/job/leto/556/artifact/172.23.100.30.zip
http://ci.sc.couchbase.com/job/leto/556/artifact/172.23.100.31.zip
http://ci.sc.couchbase.com/job/leto/556/artifact/172.23.100.32.zip
Is this a Regression?: Yes

 Description   
perfrunner test:: reb_swap_100M_dgm_views.test


 Comments   
Comment by Cihan Biyikoglu [ 04/Sep/14 ]
moving to 3.0.1 for triage.




[MB-12124] rebalance after failover, 3->4 nodes regression of 54.9% compared with 2.5.1 Created: 03/Sep/14  Updated: 04/Sep/14

Status: Open
Project: Couchbase Server
Component/s: performance
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thomas Anderson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 6.5/2 x SSD (leto)

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/leto/557/artifact/172.23.100.29.zip
http://ci.sc.couchbase.com/job/leto/557/artifact/172.23.100.30.zip
http://ci.sc.couchbase.com/job/leto/557/artifact/172.23.100.31.zip
http://ci.sc.couchbase.com/job/leto/557/artifact/172.23.100.32.zip
Is this a Regression?: Yes

 Description   
perfrunner test:: reb_failover_100M_dgm_views.test
metric:: Rebalance after failover (min), 3 -> 4, 1 bucket x 100M x 2KB, 1 x 1 views, 10K ops/sec, 400 queries/sec
regression compared with 2.5.1:: rebalance time increases to 286 min from 185 min (54.9%)
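
For reference, the percentage is the relative increase in rebalance time; with the rounded values quoted here the formula gives roughly 54.6%, so the reported 54.9% presumably comes from the unrounded measurements:

# Relative regression vs. the 2.5.1 baseline (run times in minutes).
baseline, current = 185.0, 286.0
print("%.1f%% slower" % ((current - baseline) / baseline * 100))   # ~54.6% with rounded inputs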




[MB-12123] INITCAP makes all letters uppercased Created: 03/Sep/14  Updated: 03/Sep/14  Resolved: 03/Sep/14

Status: Resolved
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public