[MB-11998] Working set is screwed up during rebalance with delta recovery (>95% cache miss rate) Created: 18/Aug/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Venu Uppalapati
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-1169

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630 (24 vCPU)
Memory = 64 GB
Disk = RAID 10 HDD

Attachments: PNG File cache_miss_rate.png    
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/ares-dev/45/artifact/
Is this a Regression?: No

 Description   
1 of 4 nodes is being re-added after failover.
500M x 2KB items, 10K mixed ops/sec.

Steps:
1. Fail over one of the nodes.
2. Add it back.
3. Enable delta recovery.
4. Sleep 20 minutes.
5. Rebalance the cluster (see the REST sketch below).
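For reference, a minimal sketch of driving these steps through the REST API, assuming placeholder node names, addresses, and credentials (adjust otpNode and knownNodes to the actual 4-node cluster):

  # fail over one node
  curl -u Administrator:password http://10.3.3.1:8091/controller/failOver -d otpNode=ns_1@10.3.3.2
  # mark the failed-over node to be added back with delta recovery
  curl -u Administrator:password http://10.3.3.1:8091/controller/setRecoveryType -d otpNode=ns_1@10.3.3.2 -d recoveryType=delta
  # rebalance the cluster
  curl -u Administrator:password http://10.3.3.1:8091/controller/rebalance -d knownNodes=ns_1@10.3.3.1,ns_1@10.3.3.2,ns_1@10.3.3.3,ns_1@10.3.3.4 -d ejectedNodes=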

 Comments   
Comment by Abhinav Dangeti [ 17/Sep/14 ]
Warming up during delta recovery without an access log seems to be the cause of this.
Comment by Abhinav Dangeti [ 18/Sep/14 ]
Venu, my suspicion here is that there was no access log generated during the course of this test. Can you set the access log task time to zero and its sleep interval to, say, 5-10 minutes, and retest this scenario? I think you will need to be using the performance framework to be able to plot the cache miss ratio.
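A hedged example of that retest configuration, assuming the access scanner parameters are exposed via cbepctl as flush_param settings named alog_task_time and alog_sleep_time (verify the parameter names against the build under test):

  # run the access scanner at task time 0 and repeat every 10 minutes, on each data node
  /opt/couchbase/bin/cbepctl <node>:11210 set flush_param alog_task_time 0
  /opt/couchbase/bin/cbepctl <node>:11210 set flush_param alog_sleep_time 10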




[MB-12210] xdcr related services sometimes log debug and error messages to non-xdcr logs (was: XDCR Error Logging Improvement) Created: 18/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 2.5.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Chris Malarky Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: logging
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
When debugging an XDCR issue, some very useful information was in ns_server.error.log but not in ns_server.xdcr_errors.log.

ns_server.xdcr_errors.log:

[xdcr:error,2014-09-18T7:02:12.674,ns_1@ec2-XX-XX-XX-XX.compute-1.amazonaws.com:<0.8020.1657>:xdc_vbucket_rep:init_replication_state:496]Error in fetching remot bucket, error: timeout,sleep for 30 secs before retry.
[xdcr:error,2014-09-18T7:02:12.674,ns_1@ec2-XX-XX-XX-XX.compute-1.amazonaws.com:<0.8021.1657>:xdc_vbucket_rep:init_replication_state:503]Error in fetching remot bucket, error: all_nodes_failed, msg: <<"Failed to grab remote bucket `wi_backup_bucket_` from any of known nodes">>sleep for 30 secs before retry

ns_server.error.log:

[ns_server:error,2014-09-18T7:02:12.674,ns_1@ec2-XX-XX-XX-XX.compute-1.amazonaws.com:<0.8022.1657>:remote_clusters_info: do_mk_json_get:1460]Request to http://Administrator:****@10.x.x.x:8091/pools failed:
{error,rest_error,
       <<"Error connect_timeout happened during REST call get to http://10.x.x.x:8091/pools.">>,
       {error,connect_timeout}}
[ns_server:error,2014-09-18T7:02:12.674,ns_1@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:remote_clusters_info<0.20250.6>: remote_clusters_info:handle_info:435]Failed to grab remote bucket `wi_backup_bucket_`: {error,rest_error,
                                                   <<"Error connect_timeout happened during REST call get to http://10.x.x.x:8091/pools.">>,
                                                   {error,connect_timeout}}

Is there any way these messages could appear in the xdcr_errors.log?

 Comments   
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
Yes, valid request. Some of that, but not all, has been addressed in 3.0.
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
Good candidate for 3.0.1, but not necessarily important enough, i.e. in light of the ongoing rewrite.




[MB-12197] [Windows]: Bucket deletion failing with error 500 reason: unknown {"_":"Bucket deletion not yet complete, but will continue."} Created: 16/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Meenakshi Goel Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: windows, windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.1-1299-rel

Attachments: Text File test.txt    
Triage: Triaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Jenkins Ref Link:
http://qa.hq.northscale.net/job/win_2008_x64--14_01--replica_read-P0/32/consoleFull
http://qa.hq.northscale.net/job/win_2008_x64--59--01--bucket_flush-P1/14/console
http://qa.hq.northscale.net/job/win_2008_x64--59_01--warmup-P1/6/consoleFull

Test to Reproduce:
newmemcapable.GetrTests.getr_test,nodes_init=4,GROUP=P0,expiration=60,wait_expiration=true,error=Not found for vbucket,descr=#simple getr replica_count=1 expiration=60 flags = 0 docs_ops=create cluster ops = None
flush.bucketflush.BucketFlushTests.bucketflush,items=20000,nodes_in=3,GROUP=P0

*Note that the test itself doesn't fail, but subsequent tests fail with "error 400 reason: unknown ["Prepare join failed. Node is already part of cluster."]" because cleanup wasn't successful.

Logs:
[rebalance:error,2014-09-15T9:36:01.989,ns_1@10.3.121.182:<0.6938.0>:ns_rebalancer:do_wait_buckets_shutdown:307]Failed to wait deletion of some buckets on some nodes: [{'ns_1@10.3.121.182',
                                                         {'EXIT',
                                                          {old_buckets_shutdown_wait_failed,
                                                           ["default"]}}}]

[error_logger:error,2014-09-15T9:36:01.989,ns_1@10.3.121.182:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
  crasher:
    initial call: erlang:apply/2
    pid: <0.6938.0>
    registered_name: []
    exception exit: {buckets_shutdown_wait_failed,
                        [{'ns_1@10.3.121.182',
                             {'EXIT',
                                 {old_buckets_shutdown_wait_failed,
                                     ["default"]}}}]}
      in function ns_rebalancer:do_wait_buckets_shutdown/1 (src/ns_rebalancer.erl, line 308)
      in call from ns_rebalancer:rebalance/5 (src/ns_rebalancer.erl, line 361)
    ancestors: [<0.811.0>,mb_master_sup,mb_master,ns_server_sup,
                  ns_server_cluster_sup,<0.57.0>]
    messages: []
    links: [<0.811.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 46422
    stack_size: 27
    reductions: 5472
  neighbours:

[user:info,2014-09-15T9:36:01.989,ns_1@10.3.121.182:<0.811.0>:ns_orchestrator:handle_info:483]Rebalance exited with reason {buckets_shutdown_wait_failed,
                              [{'ns_1@10.3.121.182',
                                {'EXIT',
                                 {old_buckets_shutdown_wait_failed,
                                  ["default"]}}}]}
[ns_server:error,2014-09-15T9:36:09.645,ns_1@10.3.121.182:ns_memcached-default<0.4908.0>:ns_memcached:terminate:798]Failed to delete bucket "default": {error,{badmatch,{error,closed}}}

Uploading Logs

 Comments   
Comment by Meenakshi Goel [ 16/Sep/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12197/11dd43ca/10.3.121.182-9152014-938-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/e7795065/10.3.121.183-9152014-940-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/6442301b/10.3.121.102-9152014-942-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/10edf209/10.3.121.107-9152014-943-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/9f16f503/10.1.2.66-9152014-945-diag.zip
Comment by Ketaki Gangal [ 16/Sep/14 ]
Assigning to ns_server team for a first look.
Comment by Aleksey Kondratenko [ 16/Sep/14 ]
For cases like this it's very useful to get a sample of backtraces from memcached on the bad node. Is it still running?
Comment by Aleksey Kondratenko [ 16/Sep/14 ]
Eh. It's windows....
Comment by Aleksey Kondratenko [ 17/Sep/14 ]
I've merged diagnostics commit (http://review.couchbase.org/41463). Please rerun, reproduce and give me new set of logs.
Comment by Meenakshi Goel [ 18/Sep/14 ]
Tested with 3.0.1-1307-rel. Please find logs below.
https://s3.amazonaws.com/bugdb/jira/MB-12197/c2191900/10.3.121.182-9172014-2245-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/28bc4a83/10.3.121.183-9172014-2246-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/8f1efbe5/10.3.121.102-9172014-2248-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/91a89d6a/10.3.121.107-9172014-2249-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/2d272074/10.1.2.66-9172014-2251-diag.zip
Comment by Aleksey Kondratenko [ 18/Sep/14 ]
BTW, I am indeed quite interested in whether this is specific to Windows or not.




[MB-12211] Investigate noop not closing connection in case where a dead connection is still attached to a failed node Created: 18/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
See MB-12158 for information on how to reproduce this issue and why it needs to be looked at on the ep-engine side.




[MB-12209] [windows] failed to offline upgrade from 2.5.x to 3.0.1-1299 Created: 18/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Thuan Nguyen Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows server 2008 r2 64-bit

Attachments: Zip Archive 12.11.10.145-9182014-1010-diag.zip     Zip Archive 12.11.10.145-9182014-922-diag.zip    
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Install Couchbase Server 2.5.1 on one node.
Create a default bucket.
Load 1000 items into the bucket.
Offline upgrade from 2.5.1 to 3.0.1-1299.
After the upgrade, the node is reset to initial setup.


 Comments   
Comment by Thuan Nguyen [ 18/Sep/14 ]
I got the same issue when doing an offline upgrade from 2.5.0 to 3.0.1-1299. Updated the title.
Comment by Thuan Nguyen [ 18/Sep/14 ]
cbcollectinfo of the node that failed to offline upgrade from 2.5.0 to 3.0.1-1299.
Comment by Bin Cui [ 18/Sep/14 ]
http://review.couchbase.org/#/c/41473/




[MB-6972] distribute couchbase-server through yum and ubuntu package repositories Created: 19/Oct/12  Updated: 18/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.1.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Anil Kumar Assignee: Phil Labee
Resolution: Unresolved Votes: 3
Labels: devX
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks MB-8693 [Doc] distribute couchbase-server thr... Reopened
blocks MB-7821 yum install couchbase-server from cou... Resolved
Duplicate
duplicates MB-2299 Create signed RPM's Resolved
is duplicated by MB-9409 repository for deb packages (debian&u... Resolved
Flagged:
Release Note

 Description   
This helps us in handling dependencies that are needed for Couchbase Server.
The SDK team has already implemented this for various SDK packages.

We might have to make some changes to our packaging metadata to work with this scheme.
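For illustration, a sketch of the intended end-user flow once the repositories exist; the repository paths below are placeholders, not final published locations:

  # RPM-based distros: drop a .repo file pointing at the Couchbase repo, then install
  sudo wget -O /etc/yum.repos.d/couchbase.repo http://packages.couchbase.com/<repo-path>/couchbase.repo
  sudo yum install couchbase-server

  # Debian/Ubuntu: import the signing key, add the APT source, then install
  wget -O - http://packages.couchbase.com/<repo-path>/couchbase.key | sudo apt-key add -
  echo "deb http://packages.couchbase.com/<repo-path>/ubuntu precise main" | sudo tee /etc/apt/sources.list.d/couchbase.list
  sudo apt-get update && sudo apt-get install couchbase-server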

 Comments   
Comment by Steve Yen [ 26/Nov/12 ]
to 2.0.2 per bug-scrub

First step is to do the repositories?
Comment by Steve Yen [ 26/Nov/12 ]
back to 2.0.1, per bug-scrub
Comment by Steve Yen [ 26/Nov/12 ]
back to 2.0.1, per bug-scrub
Comment by Farshid Ghods (Inactive) [ 19/Dec/12 ]
Phil,
please sync up with Farshid and get the instructions that Sergey and Pavel sent.
Comment by Farshid Ghods (Inactive) [ 28/Jan/13 ]
We should resolve this task once 2.0.1 is released.
Comment by Dipti Borkar [ 29/Jan/13 ]
Have we figured out the upgrade process moving forward? For example, from 2.0.1 to 2.0.2, or from 2.0.1 to 2.1?
Comment by Jin Lim [ 04/Feb/13 ]
Please ensure that we also confirm/validate the upgrade process moving from 2.0.1 to 2.0.2. Thanks.
Comment by Phil Labee [ 06/Feb/13 ]
Now have DEB repo working, but another issue has come up: We need to distribute the public key so that users can install the key before running apt-get.

wiki page has been updated.
Comment by kzeller [ 14/Feb/13 ]
Added to 2.0.1 RN as:

Fix:

We now provide Couchbase Server as yum and Debian package repositories.
Comment by Matt Ingenthron [ 09/Apr/13 ]
What are the public URLs for these repositories? This was mentioned in the release notes here:
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-server-rn_2-0-0l.html
Comment by Matt Ingenthron [ 09/Apr/13 ]
Reopening, since this isn't documented that I can find. Apologies if I'm just missing it.
Comment by Dipti Borkar [ 23/Apr/13 ]
Anil, can you work with Phil to see what are the next steps here?
Comment by Anil Kumar [ 24/Apr/13 ]
Yes I'll be having discussion with Phil and will update here with details.
Comment by Tim Ray [ 28/Apr/13 ]
Could we either remove the note about yum/deb repos in the release notes, or get those repo locations / sample files / keys added to public pages? The only links that seem like they might contain the info point to internal pages I don't have access to.
Comment by Anil Kumar [ 14/May/13 ]
Thanks Tim, we have removed it from the release notes. We will add instructions about the yum/deb repo locations/files/keys to the documentation once it's available. Thanks!
Comment by kzeller [ 14/May/13 ]
Removing duplicate ticket:

http://www.couchbase.com/issues/browse/MB-7860
Comment by h0nIg [ 24/Oct/13 ]
Any update? Maybe I created a duplicate issue: http://www.couchbase.com/issues/browse/MB-9409, but it seems that the repositories are outdated on http://hub.internal.couchbase.com/confluence/display/CR/How+to+Use+a+Linux+Repo+--+debian
Comment by Sriram Melkote [ 22/Apr/14 ]
I tried to install on Debian today. It failed badly. One .deb package didn't match the libc version of stable. The other didn't match the openssl version. Changing libc or openssl is simply not an option for someone using Debian stable because it messes with the base OS too deeply. So as of 4/23/14, we don't have support for Debian.
Comment by Sriram Melkote [ 22/Apr/14 ]
Anil, we have accumulated a lot of input in this bug. I don't think this will realistically go anywhere for 3.0 unless we define specific goals and a considered platform support matrix expansion. Can you please define the goal for 3.0 more precisely?
Comment by Matt Ingenthron [ 22/Apr/14 ]
+1 on Siri's comments. Conversations I had with both Ubuntu (who recommend their PPAs) and Red Hat experts (who recommend setting up a repo or getting into EPEL or the like) indicated that's the best way to ensure coverage of all OSs. Binary packages built on one OS and deployed on another are risky and run into dependency issues.
Comment by Anil Kumar [ 28/Apr/14 ]
This ticket is specifically for distributing DEB and RPM packages through yum and APT repositories. We have another ticket for supporting the Debian platform, MB-10960.
Comment by Anil Kumar [ 23/Jun/14 ]
Assigning ticket to Tony for verification.
Comment by Phil Labee [ 21/Jul/14 ]
Need to do before closing:

[ ] capture keys and process used for build that is currently posted (3.0.0-628), update tools and keys of record in build repo and wiki page
[ ] distribute 2.5.1 and 3.0.0-beta1 builds using same process, testing update capability
[ ] test update from 2.0.0 to 2.5.1 to 3.0.0
Comment by Phil Labee [ 21/Jul/14 ]
re-opening to assign to sprint to prepare the distribution repos for testing
Comment by Wayne Siu [ 30/Jul/14 ]
Phil,
has build 3.0.0-973 been updated in the repos for beta testing?
Comment by Wayne Siu [ 29/Aug/14 ]
Phil,
Please refresh it with build 3.0.0-1205. Thanks.
Comment by Phil Labee [ 04/Sep/14 ]
Due to loss of private keys used to post 3.0.0-628, created new key pairs. Upgrade testing was never done, so starting with 2.5.1 release version (2.5.1-1100).

upload and test using location http://packages.couchbase.com/linux-repos/TEST/:

  [X] ubuntu-12.04 x86_64
  [X] ubuntu-10.04 x86_64

  [X] centos-6-x86_64
  [X] centos-5-x86_64
Comment by Anil Kumar [ 04/Sep/14 ]
Phil / Wayne - Not sure what's happening here, please clarify.
Comment by Wayne Siu [ 16/Sep/14 ]
Please refresh with the build 3.0.0-1209.
Comment by Phil Labee [ 17/Sep/14 ]
upgrade to 3.0.0-1209:

  [ ] ubuntu-12.04 x86_64
  [ ] ubuntu-10.04 x86_64

  [X] centos-6-x86_64
  [ ] centos-5-x86_64

  [ ] debian-7-x86_64




[MB-12185] update to "couchbase" from "membase" in gerrit mirroring and manifests Created: 14/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.0, 2.5.1, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Blocker
Reporter: Matt Ingenthron Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-8297 Some key projects are still hosted at... Open

 Description   
One of the key components of Couchbase is still only at github.com/membase and not at github.com/couchbase. I think it's okay to mirror to both locations (not that there's an advantage), but for sure it should be at couchbase and the manifest for Couchbase Server releases should be pointing to Couchbase.

I believe the steps here are as follows:
- Set up a github.com/couchbase/memcached project (I've done that)
- Update gerrit's commit hook to update that repository
- Change the manifests to start using that repository

Assigning this to build as a component, as gerrit is handled by the build team. Then I'm guessing it'll need to be handed over to Trond or another developer to do the manifest change once gerrit is up to date.

Since memcached is slow changing now, perhaps the third item can be done earlier.

 Comments   
Comment by Chris Hillery [ 15/Sep/14 ]
Actually manifests are owned by build team too so I will do both parts.

However, the manifest for the hopefully-final release candidate already exists, and I'm a teensy bit wary about changing it after the fact. The manifest change may need to wait for 3.0.1.
Comment by Matt Ingenthron [ 15/Sep/14 ]
I'll leave it to you to work out how to fix it, but I'd just point out that manifest files are mutable.
Comment by Chris Hillery [ 15/Sep/14 ]
The manifest we build from is mutable. The historical manifests recording what we have already built really shouldn't be.
Comment by Matt Ingenthron [ 15/Sep/14 ]
True, but they are. :) That was half me calling back to our discussion about tagging and mutability of things in the Mountain View office. I'm sure you remember that late night conversation.

If you can help here Ceej, that'd be great. I'm just trying to make sure we have the cleanest project possible out there on the web. One wart less will bring me to 999,999 or so. :)
Comment by Trond Norbye [ 15/Sep/14 ]
Just a FYI, we've been ramping up the changes to memcached, so it's no longer a slow moving component ;-)
Comment by Matt Ingenthron [ 15/Sep/14 ]
Slow moving w.r.t. 3.0.0 though, right? That means the current github.com/couchbase/memcached probably has the commit planned to be released, so it's low risk to update github.com/couchbase/manifest with the couchbase repo instead of membase.

That's all I meant. :)
Comment by Trond Norbye [ 15/Sep/14 ]
_all_ components should be slow moving with respect to 3.0.0 ;)
Comment by Chris Hillery [ 16/Sep/14 ]
Matt, it appears that couchbase/memcached is a *fork* of membase/memcached, which is probably undesirable. We can actively rename the membase/memcached project to couchbase/memcached, and github will automatically forward requests from the old name to the new so it is seamless. It also means that we don't have to worry about migrating any commits, etc.

Does anything refer to couchbase/memcached already? Could we delete that one outright and then rename membase/memcached instead?
Comment by Matt Ingenthron [ 16/Sep/14 ]
Ah, that would be my fault. I propose deleting the couchbase/memcached and then transferring ownership from membase/memcached to couchbase/memcached. I think that's what you meant by "actively rename", right? Sounds like a great plan.

I think that's all in your hands Ceej, but I'd be glad to help if needed.

I still think in the interest of reducing warts, it'd be good to fix the manifest.
Comment by Chris Hillery [ 16/Sep/14 ]
I will do that (rename the repo), just please confirm explicitly that temporarily deleting couchbase/memcached won't cause the world to end. :)
Comment by Matt Ingenthron [ 16/Sep/14 ]
It won't since it didn't exist until this last Sunday when I created this ticket. If something world-ending happens as a result, I'll call it a bug to have depended on it. ;)
Comment by Chris Hillery [ 18/Sep/14 ]
I deleted couchbase/memcached and then transferred ownership of membase/memcached to couchbase. The original membase/memcached repository had a number of collaborators, most of which I think were historical. For now, couchbase/memcached only has "Owners" and "Robots" listed as collaborators, which is generally the desired configuration.

http://review.couchbase.org/#/c/41470/ proposes changes to the active manifests. I see no problem with committing that.

As for the historical manifests, there are two:

1. Sooner or later we will add a "released/3.0.0.xml" manifest to the couchbase/manifest repository, representing the exact SHAs which were built. I think it's probably OK to retroactively change the remote on that manifest since the two repositories are aliases for each other. This will affect any 3.0.0 hotfixes which are built, etc.

2. However, all of the already-built 3.0 packages (.deb / .rpm / .zip files) have embedded in them the manifest which was used to build them. Those, unfortunately, cannot be changed at this time. Doing so would require re-packaging the deliverables which have already undergone QE validation. While it is technically possible to do so, it would be a great deal of manual work, and IMHO a non-trivial and unnecessary risk. The only safe solution would be to trigger a new build, but in that case I would argue we would need to re-validate the deliverables, which I'm sure is a non-starter for PM. I'm afraid this particular sub-wart will need to wait for 3.0.1 to be fully addressed.
Comment by Matt Ingenthron [ 18/Sep/14 ]
Excellent, thanks Ceej. I think this is a great improvement, especially if 3.0.0's release manifest no longer references membase.

I'll leave it to the build team to manage, but I might suggest that gerrit and various other things pointing to membase should slowly change as well, in case someone decides someday to cancel the membase organization subscription to github.




[MB-4593] Windows Installer hangs on "Computing Space Requirements" Created: 27/Dec/11  Updated: 18/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: installer
Affects Version/s: 2.0-developer-preview-3, 2.0-developer-preview-4
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Bin Cui Assignee: Don Pinto
Resolution: Unresolved Votes: 3
Labels: windows, windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7 Ultimate 64. Sony Vaio, i3 with 4GB RAM and 200 GB of 500 GB free. Also on a Sony Vaio, Windows 7 Ultimate 64, i7, 6 GB RAM and a 750GB drive with about 600 GB free.

Attachments: PNG File couchbase-installer.png     PNG File image001.png     PNG File ss 2014-08-28 at 4.16.09 PM.png    
Triage: Triaged

 Description   
When installing the Community Server 2.0 DP3 on Windows, the installer hangs on the "Computing space requirements" screen. There is no additional feedback from the installer. After 90-120 minutes or so, it does move forward and complete. The same issue was reported on Google Groups a few months back - http://groups.google.com/group/couchbase/browse_thread/thread/37dbba592a9c150b/f5e6d80880f7afc8?lnk=gst&q=msi.

Executable: couchbase-server-community_x86_64_2.0.0-dev-preview-3.setup.exe

WORKAROUND IN 3.0 - Create a registry key HKLM\SOFTWARE\Couchbase, name=SkipVcRuntime, type=DWORD, value=1 to skip installing VC redistributable installation which is causing this issue. If VC redistributable is necessary, it must be installed manually if the registry key is set to skip automatic install of it.
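A command-line form of the workaround above (hedged; on 64-bit Windows the installer reads the key under the Wow6432Node path, as noted in the comments below):

  reg add "HKLM\SOFTWARE\Couchbase" /v SkipVcRuntime /t REG_DWORD /d 1 /f
  reg add "HKLM\SOFTWARE\Wow6432Node\Couchbase" /v SkipVcRuntime /t REG_DWORD /d 1 /f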


 Comments   
Comment by Filip Stas [ 23/Feb/12 ]
Is there any solution for this? I'm experiencing the same problem. Running the unpacked MSI does not seem to work because the InstallShield setup has been configured to require installing through the exe.

Comment by Farshid Ghods (Inactive) [ 22/Mar/12 ]
from Bin:

Looks like it is related to the InstallShield engine. Maybe InstallShield tries to access the system registry and it is locked by another process. The suggestion is to shut down other running programs and try again if such a problem pops up.
Comment by Farshid Ghods (Inactive) [ 22/Mar/12 ]
We were unable to reproduce this on Windows 2008 64-bit.

The bug mentions this happened on Windows 7 64-bit, which is not a supported platform, but that should not make any difference.
Comment by Farshid Ghods (Inactive) [ 23/Mar/12 ]
From Bin:

Windows 7 is my dev environment, and I have no problem installing and testing it. From your description, I cannot tell whether it failed during the installation, or whether the installation finishes but Couchbase Server cannot start.

If it is due to an InstallShield failure, you can generate a log file for debugging as:
setup.exe /debuglog"C:\PathToLog\setupexe.log"

If Couchbase Server fails to start, the most likely reason is a missing or incompatible Microsoft runtime library. You can manually run service_start.bat under the bin directory and check what is going on. And you can run cbbrowse_log.bat to generate a log file for further debugging.
Comment by John Zablocki (Inactive) [ 23/Mar/12 ]
This is an installation-only problem. There's not much more to it other than the installer hanging on the screen (see attachment).

However, after a failed install, I did get it to work by:

a) deleting C:\Program Files\Couchbase\*

b) deleting all registry keys with Couchbase Server left over from the failed install

c) rebooting

Next time I see this problem, I'll run it again with the /debuglog

I think the problem might be that a previous install of DP3 or DP4 (nightly build) failed and left some bits in place somewhere.
Comment by Steve Yen [ 05/Apr/12 ]
from Perry...
Comment by Thuan Nguyen [ 05/Apr/12 ]
I cannot reproduce this bug. I tested on Windows 7 Professional 64-bit and Windows Server 2008 64-bit.
Here are steps:
- Install couchbase server 2.0.0r-388 (dp3)
- Open web browser and go to initial setup in web console.
- Uninstall couchbase server 2.0.0r-388
- Install couchbase server 2.0.0dp4r-722
- Open web browser and go to initial setup in web console.
Install and uninstall of Couchbase Server went smoothly without any problem.
Comment by Bin Cui [ 25/Apr/12 ]
Maybe we need to get the installer verbose log file to get some clues.

setup.exe /verbose"c:\temp\logfile.txt"
Comment by John Zablocki (Inactive) [ 06/Jul/12 ]
Not sure if this is useful or not, but without fail, every time I encounter this problem, simply shutting down apps (usually Chrome for some reason) causes the hanging to stop. Right after closing Chrome, the C++ redistributable dialog pops open and installation completes.
Comment by Matt Ingenthron [ 10/Jul/12 ]
Workarounds/troubleshooting for this issue:


On installshield's website, there are similar problems reported for installshield. There are several possible reasons behind it:

1. The installation of the Microsoft C++ redistributable is blocked by some other running program, sometimes Chrome.
2. There are some remote network drives that are mapped to local system. Installshield may not have enough network privileges to access them.
3. Couchbase server was installed on the machine before and it was not totally uninstalled and/or removed. Installshield tried to recover from those old images.

To determine where to go next, run setup with debugging mode enabled:
setup.exe /debuglog"C:\temp\setupexe.log"

The contents of the log will tell you where it's getting stuck.
Comment by Bin Cui [ 30/Jul/12 ]
Matt's explanation should be included in the documentation and the Q&A website. I reproduced the hanging problem during installation when the Chrome browser is running.
Comment by Farshid Ghods (Inactive) [ 30/Jul/12 ]
So does that mean the installer should wait until Chrome and other browsers are terminated before proceeding?

I see this as a very common pattern with many installers: they ask the user to stop those applications, and if the user does not follow the instructions the setup process does not continue until these conditions are met.
Comment by Dipti Borkar [ 31/Jul/12 ]
Is there no way to fix this? At the least, we need to provide an error or guidance that Chrome needs to be quit before continuing. Is Chrome the only application we have seen causing this problem?
Comment by Steve Yen [ 13/Sep/12 ]
http://review.couchbase.org/#/c/20552/
Comment by Steve Yen [ 13/Sep/12 ]
See CBD-593
Comment by Øyvind Størkersen [ 17/Dec/12 ]
Same bug when installing 2.0.0 (build-1976) on Windows 7. Stopping Chrome did not help, but killing the process "Logitech ScrollApp" (KhalScroll.exe) did.
Comment by Joseph Lam [ 13/Sep/13 ]
It's happening to me when installing 2.1.1 on Windows 7. What is this step for, and is it really necessary? I see that it happens after the files have been copied to the installation folder. Not entirely sure what it's computing space requirements for.
Comment by MikeOliverAZ [ 16/Nov/13 ]
Same problem on 2.2.0 x86_64. I have tried everything, closing down Chrome and Torch from Task Manager to ensure no other apps are competing. Tried removing registry entries, but there are so many; my time, please. As noted above, this doesn't seem to be preventing writing the files under Program Files, so what's it doing? So I cannot install, and it now complains it cannot upgrade when I run the installer again.

BS... giving up and going to MongoDB... it installs no sweat.

Comment by Sriram Melkote [ 18/Nov/13 ]
Reopening. Testing on VMs is a problem because they are all clones. We miss many problems like these.
Comment by Sriram Melkote [ 18/Nov/13 ]
Please don't close this bug until we have clear understanding of:

(a) What is the Runtime Library that we're trying to install that conflicts with all these other apps
(b) Why we need it
(c) A prioritized task to someone to remove that dependency on 3.0 release requirements

Until we have these, please do not close the bug.

We should not do any fixes along the lines of checking for known apps that conflict etc., as that is treating the symptom and not fixing the cause.
Comment by Bin Cui [ 18/Nov/13 ]
We install the Windows runtime library because the Erlang runtime libraries depend on it. Not just any runtime library, but the one that comes with the Erlang distribution package. Without it, or with an incompatible version, erl.exe won't run.

Instead of checking for particular applications, the current solution is:
Run an Erlang test script. If it runs correctly, no runtime library is installed. Otherwise, the installer has to install the runtime library.

Please see CBD-593.

Comment by Sriram Melkote [ 18/Nov/13 ]
My suggestion is that we not attempt to install the MSVCRT ourselves.

Let us check whether the library we need is present prior to starting the install (via the appropriate registry keys).

If it is absent, let us direct the user to download and install it, and exit.
Comment by Bin Cui [ 18/Nov/13 ]
The approach is not totally right. Even if the MSVCRT exists, we still need to install it. Here the key is the exact same MSVCRT package that comes with the Erlang distribution. We had problems before where, with the same version but a different build of MSVCRT installed, Erlang won't run.

One possible solution is to ask the user to download the MSVCRT library from our website and make it a prerequisite for installing Couchbase Server.
Comment by Sriram Melkote [ 18/Nov/13 ]
OK. It looks like MS distributes some versions of VC runtime with the OS itself. I doubt that Erlang needs anything newer.

So let us rebuild Erlang and have it link to the OS-supplied version of MSVCRT (i.e., msvcr70.dll) in Couchbase 3.0 onwards.

In the meantime, let us point the user to the vcredist we ship in Couchbase 2.x versions and ask them to install it from there.
Comment by Steve Yen [ 23/Dec/13 ]
Saw this in the email inboxes...

From: Tal V
Date: December 22, 2013 at 1:19:36 AM PST
Subject: Installing Couchbase on Windows 7

Hi CouchBase support,
I would like to get your assistance with an issue I'm having. I have a Windows 7 machine on which I tried to install Couchbase; the installation is stuck on the "Computing space requirements" step.
I tried several things without success:

1. I tried to download a new installation package.
2. I deleted all records of the software from the Registry.
3. I deleted the folder that was created under C:\Program Files\Couchbase.
4. I restarted the computer.
5. Opened only the installation package.
6. Re-installed it again.

And again it was stuck on the same step.
What is the solution for it?

Thank you very much,


--
Tal V
Comment by Steve Yen [ 23/Dec/13 ]
Hi Bin,
Not knowing much about installshield here, but one idea - are there ways of forcibly, perhaps optionally, skipping the computing space requirements step? Some environment variable flag, perhaps?
Thanks,
Steve

Comment by Bin Cui [ 23/Dec/13 ]
This "Computing space requirements" message is quite misleading. It happens at the post-install step, while the GUI still shows that message. Within that step, we run the Erlang test script, it fails, and the installer runs "vcredist.exe" for the Microsoft runtime library, which gets stuck.

For the time being, the most reliable way is not to run this vcredist.exe from the installer. Instead, we should provide a link on our download web site.

1. During installation, if we fail to run the Erlang test script, we can pop up a warning dialog and ask customers to download and run it after installation.
 
Comment by Bin Cui [ 23/Dec/13 ]
To work around the problem, we can instruct the customer to download vcredist.exe and run it manually before setting up Couchbase Server. If the runtime environment is set up correctly, the installer will bypass that step.
Comment by Bin Cui [ 30/Dec/13 ]
Use windows registry key to install/skip the vcredist.exe step:

On 32bit windows, Installer will check HKEY_LOCAL_MACHINE\SOFTWARE\Couchbase\SkipVcRuntime
On 64bit windows, Installer will check HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Couchbase\SkipVcRuntime,
where SkipVcRuntime is a DWORD (32-bit) value.

When SkipVcRuntime is set to 1, installer will skip the step to install vcredist.exe. Otherwise, installer will follow the same logic as before.
vcredist_x86.exe can be found in the root directory of couchbase server. It can be run as:
c:\<couchbase_root>\vcredist_x86.exe

http://review.couchbase.org/#/c/31501/
Comment by Bin Cui [ 02/Jan/14 ]
Check into branch 2.5 http://review.couchbase.org/#/c/31558/
Comment by Iryna Mironava [ 22/Jan/14 ]
Tested with Win 7 and Win Server 2008.
I am unable to reproduce this issue (build 2.0.0-1976; DP3 is no longer available).
Installed/uninstalled Couchbase several times.
Comment by Sriram Melkote [ 22/Jan/14 ]
Unfortunately, for this problem, if it did not reproduce, we can't say it is fixed. We have to find a machine where it reproduces and then verify a fix.

Anyway, no change made actually addresses the underlying problem (the registry key just gives a way to work around it when it happens), so reopening the bug and targeting it for 3.0.
Comment by Sriram Melkote [ 23/Jan/14 ]
Bin - I just noticed that the Erlang installer itself (when downloaded from their website) installs the VC redistributable in non-silent mode. The Microsoft runtime installer dialog pops up, indicates it will install the VC redistributable, and then completes. Why do we run it in silent mode (and hence assume liability for it running properly)? Why do we not run the MSI in interactive mode like the ESL Erlang installer itself does?
Comment by Wayne Siu [ 05/Feb/14 ]
If we could get the information on the exact software version, it could be helpful.
From registry, Computer\HKLM\Software\Microsoft\WindowsNT\CurrentVersion
Comment by Wayne Siu [ 12/Feb/14 ]
Bin, looks like the erl.ini was locked when this issue happened.
Comment by Pavel Paulau [ 19/Feb/14 ]
Just happened to me in 2.2.0-837.
Comment by Anil Kumar [ 18/Mar/14 ]
Triaged by Don and Anil as per Windows Developer plan.
Comment by Bin Cui [ 08/Apr/14 ]
http://review.couchbase.org/#/c/35463/
Comment by Chris Hillery [ 13/May/14 ]
I'm new here, but it seems to me that vcredist_x64.exe does exactly the same thing as the corresponding MS-provided merge module for MSVC2013. If that's true, we should be able to just include that merge module in our project, and not need to fork out to install things. In fact, as of a few weeks ago, the 3.0 server installers are doing just that.

http://msdn.microsoft.com/en-us/library/dn501987.aspx

Is my understanding incomplete in some way?
Comment by Chris Hillery [ 14/May/14 ]
I can confirm that the most recent installers do install msvcr120.dll and msvcp120.dll in apparently the correct places, and the server can start with them. I *believe* this means that we no longer need to fork out vcredist_x64.exe, or have any of the InstallShield tricks to detect whether it is needed and/or skip installing it, etc. I'm leaving this bug open to both verify that the current merge module-based solution works, and to track removal of the unwanted code.
Comment by Sriram Melkote [ 16/May/14 ]
I've also verified that the VCRT installed by the 3.0 build (msvcp100) is sufficient for Erlang R16.
Comment by Bin Cui [ 15/Sep/14 ]
Recently I happened to reproduce this problem on my own laptop. Using setup.exe /verbose"c:\temp\verbose.log", I generated a log file with more verbose debugging information. At the end of the file, it looks something like:

MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: OVERVIEW.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_admin\overview\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: BUCKETS.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_admin\buckets\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: MN_DIALOGS.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_dialogs\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: ABOUT.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_dialogs\about\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: ALLUSERSPROFILE , Object: Q:\
MSI (c) (C4:C0) [10:51:36:274]: PROPERTY CHANGE: Adding INSTALLLEVEL property. Its value is '1'.

It means that the installer tried to populate some property values for the all-users profile after it copied all data to the install location, even though it shows this notorious "Computing space requirements" message.

For every installation, the installer uses the user temp directory to populate installer-related data. After I deleted or renamed the temp data under
c:\Users\<logonuser>\AppData\Temp and rebooted the machine, the problem was solved, at least for my laptop.

Conclusion:

1. After the installer has copied the files, it needs to set the all-users profile. This action is synchronous: it waits for and checks the exit code. It will certainly hang if this action never returns.

2. This is an issue related to the setup environment, i.e. caused by other running applications, etc.

Suggestion:

1. Stop any other browsers and applications when you install Couchbase.
2. Kill the installation process and uninstall the failed setup.
3. Delete/rename the temp location under c:\Users\<logonuser>\AppData\Temp
4. Reboot and try again.

Comment by Bin Cui [ 17/Sep/14 ]
Turns out it is really about the installation environment, not about a particular installation step.

Suggest documenting the workaround method.
Comment by Don Pinto [ 17/Sep/14 ]
Bin, some installers kill conflicting processes before installation starts so that it can complete. Why can't we do this?

(Maybe using something like this - http://stackoverflow.com/questions/251218/how-to-stop-a-running-process-during-an-msi-based-un-install)

Thanks,
Don




[MB-12208] Security Risk: XDCR logs emit entire document contents in error situations Created: 17/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.2.0, 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Major
Reporter: Gokul Krishnan Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Per recent discussions with the CFO and contract teams, we need to ensure that customers' data (document keys and values) isn't emitted in the logs. This poses a security risk, and we need default logging throttle levels that don't emit document data in a readable format.

The support team has noticed this in version 2.2; verifying whether this behavior still exists in 2.5.1.

Example posted in a private comment below

 Comments   
Comment by Patrick Varley [ 18/Sep/14 ]
At the same time, we need the ability to increase the log level on the fly and include this information, for when we hit a wall and need that extra detail.

To summarise:

Default setting: do not expose customer data.

Allow increasing the logging on the fly, which might include customer data; the support team will explain this to the end-user.




[MB-12126] there is no manifest file on windows 3.0.1-1253 Created: 03/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Thuan Nguyen Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 2008 r2 64-bit

Attachments: PNG File ss 2014-09-03 at 12.05.41 PM.png    
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Install Couchbase Server 3.0.1-1253 on Windows Server 2008 R2 64-bit. There is no manifest file in the directory c:\Program Files\Couchbase\Server\



 Comments   
Comment by Chris Hillery [ 03/Sep/14 ]
Also true for 3.0 RC2 build 1205.
Comment by Chris Hillery [ 03/Sep/14 ]
(Side note: While fixing this, log onto build slaves and delete stale "server-overlay/licenses.tgz" file so we stop shipping that)
Comment by Anil Kumar [ 17/Sep/14 ]
Ceej - Any update on this?
Comment by Chris Hillery [ 18/Sep/14 ]
No, not yet.




[MB-9897] Implement upr cursor dropping Created: 13/Jan/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Task Priority: Major
Reporter: Mike Wiederhold Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Chiyoung Seo [ 17/Sep/14 ]
This requires some significant changes to DCP and checkpointing in ep-engine. Moving this to post-3.0.1.




[MB-12201] Hotfix Rollup Release Created: 16/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Cihan Biyikoglu Assignee: Raju Suravarjjala
Resolution: Unresolved Votes: 0
Labels: hotfix
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: No

 Description   
Representing the rollup hotfix for 2.5.1 that includes all hotfixes (without the V8 change) released to date (Sept 2014).

 Comments   
Comment by Dipti Borkar [ 16/Sep/14 ]
Is this rollup still 2.5.1? It will create lots of confusion. Can we tag it 2.5.2, or does that lead to another round of testing? There are way too many hotfixes, so we really need a new point release.
Comment by Cihan Biyikoglu [ 17/Sep/14 ]
Hi Dipti, to improve hotfix management we are changing the way we'll do hotfixes. The rollup will bring in more hotfixes together and ensure we provide customers all the fixes we know about. If we had already fixed an issue at the time you requested your hotfix, there is no reason why we should risk exposing you to known and fixed issues in the version you are using. A side effect of this should also be an easier life for support.
-cihan




[MB-12084] Create 3.0.0 chef-based rightscale template for EE and CE Created: 27/Aug/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: cloud
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Anil Kumar Assignee: Thuan Nguyen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Need this before 3.0 GA




[MB-12083] Create 3.0.0 legacy rightscale templates for Enterprise and Community Edition (non-chef) Created: 27/Aug/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: cloud
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Anil Kumar Assignee: Thuan Nguyen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We need this before 3.0 GA




[MB-10789] Bloom Filter based optimization to reduce the I/O overhead Created: 07/Apr/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: feature-backlog
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Chiyoung Seo Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
A bloom filter can be considered an optimization to reduce the disk I/O overhead. Basically, we maintain a separate bloom filter per vBucket database file, and rebuild the bloom filter (e.g., increasing the filter size to reduce the false positive error rate) as part of vBucket database compaction.

As we know the number of items in a vBucket database file, we can determine the number of hash functions and the size of the bloom filter needed to achieve the desired false positive error rate. Note that Murmur hash has been widely used in Hadoop and Cassandra because it is much faster than MD5 and Jenkins. It is widely known that fewer than 10 bits per element are required for a 1% false positive probability, independent of the number of elements in the set.

We expect that having a bloom filter will enhance both XDCR and full-ejection cache management performance at the expense of the filter's memory overhead.
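For reference, the standard sizing formulas behind the "fewer than 10 bits per element" figure, where n is the number of items in the vBucket file, p the target false positive rate, m the filter size in bits, and k the number of hash functions:

  m = -n \ln p / (\ln 2)^2
  k = (m / n) \ln 2

For p = 0.01 this gives m/n ≈ 9.6 bits per element and k ≈ 7 hash functions, independent of n.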



 Comments   
Comment by Abhinav Dangeti [ 17/Sep/14 ]
Design Document:
https://docs.google.com/document/d/13ryBkiLltJDry1WZV3UHttFhYkwwWsmyE1TJ_6tKddQ




[MB-11999] Resident ratio of active items drops from 3% to 0.06% during rebalance with delta recovery Created: 18/Aug/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.0-1169

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630 (24 vCPU)
Memory = 64 GB
Disk = RAID 10 HDD

Attachments: PNG File vb_active_resident_items_ratio.png     PNG File vb_replica_resident_items_ratio.png    
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/ares-dev/45/artifact/
Is this a Regression?: No

 Description   
1 of 4 nodes is being re-added after failover.
500M x 2KB items, 10K mixed ops/sec.

Steps:
1. Fail over one of the nodes.
2. Add it back.
3. Enable delta recovery.
4. Sleep 20 minutes.
5. Rebalance the cluster.

Most importantly, this happens due to excessive memory usage.

 Comments   
Comment by Abhinav Dangeti [ 17/Sep/14 ]
http://review.couchbase.org/#/c/41468/




[MB-12202] UI shows a cbrestore as XDCR ops Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Ian McCloy Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [info] OS Name : Linux 3.13.0-30-generic
[info] OS Version : Ubuntu 14.04 LTS
[info] CB Version : 2.5.1-1083-rel-enterprise

Attachments: PNG File cbrestoreXDCRops.png    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I noticed, while doing a cbrestore of a backup on a cluster that doesn't have any XDCR configured, that the stats in the UI showed ongoing XDCR ops (screenshot attached).

The stats code at
http://src.couchbase.org/source/xref/2.5.1/ns_server/src/stats_collector.erl#334 is counting all set-with-meta operations as XDCR ops.

 Comments   
Comment by Aleksey Kondratenko [ 17/Sep/14 ]
That's the way it is. We have no way to distinguish sources of set-with-metas.




[MB-12189] (misunderstanding) XDCR REST API "max-concurrency" only works for 1 of 3 documented end-points. Created: 15/Sep/14  Updated: 17/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: ns_server, RESTful-APIs
Affects Version/s: 2.5.1, 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Jim Walker Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: supportability, xdcr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Couchbase Server 2.5.1
RHEL 6.4
VM (VirtualBox0
1 node "cluster"

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
This defect relates to the following REST APIs:

* xdcrMaxConcurrentReps (default 32) http://localhost:8091/internalSettings/
* maxConcurrentReps (default 32) http://localhost:8091/settings/replications/
* maxConcurrentReps (default 32) http://localhost:8091/settings/replications/ <replication_id>

The documentation suggests these all do the same thing, but with the scope of change being different.

<docs>
/settings/replications/ — global settings applied to all replications for a cluster
settings/replications/<replication_id> — settings for specific replication for a bucket
/internalSettings - settings applied to all replications for a cluster. Endpoint exists in Couchbase 2.0 and onward.
</docs>

This defect is because only "settings/replications/<replication_id>" has any effect. The other REST endpoints have no effect.

Out of these APIs, I can confirm that changing "/settings/replications/<replication_id>" has an effect. The XDCR code shows that the concurrent reps setting feeds into the concurrency throttle as the number of available tokens. I use the xdcr log files, where we print the concurrency throttle token data, to observe that the setting has an effect.

For example, a cluster in the default configuration has a total tokens of 32. We can grep to see this.

[root@localhost logs]# grep "is done normally, total tokens:" xdcr.*
2014-09-15T13:09:03.886,ns_1@127.0.0.1:<0.32370.0>:concurrency_throttle:clean_concurr_throttle_state:275]rep <0.33.1> to node "192.168.69.102:8092" is done normally, total tokens: 32, available tokens: 32,(active reps: 0, waiting reps: 0)

Now, changing the setting to 42, the log file shows the change take effect.

curl -u Administrator:password http://localhost:8091/settings/replications/01d38792865ba2d624edb4b2ad2bf07f%2fdefault%2fdefault -d maxConcurrentReps=42

[root@localhost logs]# grep "is done normally, total tokens:" xdcr.*
dcr.1:[xdcr:debug,2014-09-15T13:17:41.112,ns_1@127.0.0.1:<0.32370.0>:concurrency_throttle:clean_concurr_throttle_state:275]rep <0.2321.1> to node "192.168.69.102:8092" is done normally, total tokens: 42, available tokens: 42,(active reps: 0, waiting reps: 0)

Since this defect is that the other two REST end-points don't appear to have any effect, here's an example changing "settings/replications". This example was run on a clean cluster, i.e. no other settings have been changed; only bucket and replication creation plus client writes have been performed.

root@localhost logs]# curl -u Administrator:password http://localhost:8091/settings/replications/ -d maxConcurrentReps=48
{"maxConcurrentReps":48,"checkpointInterval":1800,"docBatchSizeKb":2048,"failureRestartInterval":30,"workerBatchSize":500,"connectionTimeout":180,"workerProcesses":4,"httpConnections":20,"retriesPerRequest":2,"optimisticReplicationThreshold":256,"socketOptions":{"keepalive":true,"nodelay":false},"supervisorMaxR":25,"supervisorMaxT":5,"traceDumpInvprob":1000}

The above shows that the JSON has acknowledged the value of 48, but the log files show no change. After much waiting and re-checking, grep shows no evidence.

[root@localhost logs]# grep "is done normally, total tokens:" xdcr.* | grep "total tokens: 48" | wc -l
0
[root@localhost logs]# grep "is done normally, total tokens:" xdcr.* | grep "total tokens: 32" | wc -l
7713

The same was observed for /internalSettings/

Found on both 2.5.1 and 3.0.

 Comments   
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
This is because global settings affect new replications, or replications without per-replication settings defined. The UI always defines all per-replication settings.
Comment by Jim Walker [ 16/Sep/14 ]
Have you pushed a documentation update for this?
Comment by Aleksey Kondratenko [ 16/Sep/14 ]
No. I don't own docs.
Comment by Jim Walker [ 17/Sep/14 ]
Then this issue is not resolved.

Closing/resolving this defect with breadcrumbs to the opening of an issue on a different project would suffice as a satisfactory resolution.

You can also very easily put a pull request into the docs on GitHub with the correct behaviour.

Can you please perform *one* of those tasks so that the REST API here is correctly documented with the behaviours you are aware of, and this matter can be closed.
Comment by Jim Walker [ 17/Sep/14 ]
Resolution requires either:

* Corrected documentation pushed to documentation repository.
* Enough accurate API information placed into a documentation defect so docs-team can correct.





[MB-11917] One node slow probably due to the Erlang scheduler Created: 09/Aug/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Volker Mische Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File crash_toy_701.rtf     PNG File leto_ssd_300-1105_561_build_init_indexleto_ssd_300-1105_561172.23.100.31beam.smp_cpu.png    
Issue Links:
Duplicate
duplicates MB-12200 Seg fault during indexing on view-toy... Resolved
duplicates MB-9822 One of nodes is too slow during indexing Closed
is duplicated by MB-12183 View Query Thruput regression compare... Resolved
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
One node is slow, that's probably due to the "scheduler collapse" bug in the Erlang VM R16.

I will try to find a way to verify that it is really the scheduler and not another problem. This is basically a duplicate of MB-9822, but that bug has a long history, so I dare to create a new one.

 Comments   
Comment by Volker Mische [ 09/Aug/14 ]
I forgot to add that our issue sounds exactly like that one: http://erlang.org/pipermail/erlang-questions/2012-October/069503.html
Comment by Sriram Melkote [ 11/Aug/14 ]
Upgrading to blocker as this is doubling initial index time in recent runs on showfast.
Comment by Volker Mische [ 12/Aug/14 ]
I verified that it's the "scheduler collapse". Have a look at the chart I've attached (it's from [1] [172.23.100.31] beam.smp_cpu). It starts with a utilization of around 400%. At around 120 I reduced the online schedulers to 1 (by running erlang:system_flag(schedulers_online, 1) via a remote shell). I then increased schedulers_online again at around 150, back to the original value of 24. You can see that it got back to normal.

[1]: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1105_561_build_init_index
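For reference, the same workaround can be scripted without an Erlang remote shell; this is a hedged sketch that assumes the undocumented /diag/eval endpoint is available on the affected node and that 24 is that node's original schedulers_online value:

# drop to a single online scheduler, then restore the original count
curl -u Administrator:password -X POST http://localhost:8091/diag/eval -d 'erlang:system_flag(schedulers_online, 1).'
curl -u Administrator:password -X POST http://localhost:8091/diag/eval -d 'erlang:system_flag(schedulers_online, 24).'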
Comment by Volker Mische [ 12/Aug/14 ]
I would try to run on R16 and see how often it happens with COUCHBASE_NS_SERVER_VM_EXTRA_ARGS=["+swt", "low", "+sfwi", "100"] set (as suggested in MB-9822 [1]).

[1]: https://www.couchbase.com/issues/browse/MB-9822?focusedCommentId=89219&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-89219
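A minimal sketch of how those VM arguments might be set on a test node (the init-script path, and whether it propagates the variable to the Erlang VM, are assumptions about the installation):

export COUCHBASE_NS_SERVER_VM_EXTRA_ARGS='["+swt", "low", "+sfwi", "100"]'
/etc/init.d/couchbase-server restart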
Comment by Pavel Paulau [ 12/Aug/14 ]
We agreed to try:

+sfwi 100/500 and +sbwt long

Will run test 5 times with these options.
Comment by Pavel Paulau [ 13/Aug/14 ]
5 runs of tests/index_50M_dgm.test with +sfwi 100 +sbwt long:

http://ci.sc.couchbase.com/job/leto-dev/19/
http://ci.sc.couchbase.com/job/leto-dev/20/
http://ci.sc.couchbase.com/job/leto-dev/21/
http://ci.sc.couchbase.com/job/leto-dev/22/
http://ci.sc.couchbase.com/job/leto-dev/23/

3 normal runs, 2 with slowness.
Comment by Volker Mische [ 13/Aug/14 ]
I see only one slow run (22): http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1137_6a0_build_init_index

But still :-/
Comment by Pavel Paulau [ 13/Aug/14 ]
See (20), incremental indexing: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1137_ed9_build_incr_index
Comment by Volker Mische [ 13/Aug/14 ]
Oh, I was only looking at the initial building.
Comment by Volker Mische [ 13/Aug/14 ]
I got a hint in the #erlang IRC channel. I'll try to use the erlang:bump_reductions(2000) and see if that helps.
Comment by Volker Mische [ 13/Aug/14 ]
Let's see if bumping the reductions makes it work: http://review.couchbase.org/40591
Comment by Aleksey Kondratenko [ 13/Aug/14 ]
merged that commit.
Comment by Pavel Paulau [ 13/Aug/14 ]
Just tested build 3.0.0-1150, rebalance test but with initial indexing phase.

2 nodes are super slow and utilize only a single core.
Comment by Volker Mische [ 18/Aug/14 ]
I can't reproduce it locally. I tend towards closing this issue as "won't fix". We should really not have long-running NIFs.

I also think that it won't happen much under real workloads. And even if it does, the workaround would be to reduce the number of online schedulers to 1 and then immediately increase it back to the original number.
Comment by Volker Mische [ 18/Aug/14 ]
Assigning to Siri to make the call on whether we close it or not.
Comment by Anil Kumar [ 18/Aug/14 ]
Triage - Not blocking 3.0 RC1
Comment by Raju Suravarjjala [ 19/Aug/14 ]
Triage: Siri will put additional information and this bug is being retargeted to 3.0.1
Comment by Sriram Melkote [ 19/Aug/14 ]
Folks, for too long we've had trouble that gets pinned to our NIFs. In 3.5, let's solve this with whatever is the correct Erlang approach to running heavy, high-performance code. A port, reporting reductions, moving to R17 with dirty schedulers, or some other option I missed - whatever is the best solution, let us implement it in 3.5 and be done.
Comment by Volker Mische [ 09/Sep/14 ]
I think we should close this issue and rather create a new one for whatever we come up with (e.g. the async mapreduce NIF).
Comment by Harsha Havanur [ 10/Sep/14 ]
Toy Build for this change at
http://latestbuilds.hq.couchbase.com/couchbase-server-community_ubunt12-3.0.0-toy-hhs-x86_64_3.0.0-702-toy.deb

Review in progress at
http://review.couchbase.org/#/c/41221/4
Comment by Harsha Havanur [ 12/Sep/14 ]
Please find the updated toy build for this
http://latestbuilds.hq.couchbase.com/couchbase-server-community_ubunt12-3.0.0-toy-hhs-x86_64_3.0.0-704-toy.deb
Comment by Sriram Melkote [ 12/Sep/14 ]
Another occurrence of this, MB-12183.

I'm making this a blocker.
Comment by Harsha Havanur [ 13/Sep/14 ]
Centos build at
http://latestbuilds.hq.couchbase.com/couchbase-server-community_cent64-3.0.0-toy-hhs-x86_64_3.0.0-700-toy.rpm
Comment by Ketaki Gangal [ 16/Sep/14 ]
Filed bug MB-12200 for this toy-build
Comment by Ketaki Gangal [ 17/Sep/14 ]
Attaching stack from toy-build 701: crash_toy_701.rtf

Access to machine is as mentioned previously on MB-12200.




[MB-11060] Build and test 3.0 for 32-bit Windows Created: 06/May/14  Updated: 17/Sep/14  Due: 09/Jun/14

Status: Open
Project: Couchbase Server
Component/s: build, ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Blocker
Reporter: Chris Hillery Assignee: Phil Labee
Resolution: Unresolved Votes: 0
Labels: windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7/8 32-bit

Issue Links:
Dependency
Duplicate

 Description   
For the "Developer Edition" of Couchbase Server 3.0 on Windows 32-bit, we need to first ensure that we can build 32-bit-compatible binaries. It is not possible to build 3.0 on a 32-bit machine due to the MSVC 2013 requirement. Hence we need to configure MSVC as well as Erlang on a 64-bit machine to produce 32-bit compatible binaries.

 Comments   
Comment by Chris Hillery [ 06/May/14 ]
This is assigned to Trond who is already experimenting with this. He should:

 * test being able to start the server on a 32-bit Windows 7/8 VM

 * make whatever changes are necessary to the CMake configuration or other build scripts to produce this build on a 64-bit VM

 * thoroughly document the requirements for the build team to reproduce this build

Then he can assign this bug to Chris to carry out configuring our build jobs accordingly.
Comment by Trond Norbye [ 16/Jun/14 ]
Can you give me a 32-bit Windows installation I can test on? My MSDN license has expired and I don't have Windows media available (and the internal wiki page just has a limited set of licenses and no download links).

Then assign it back to me and I'll try it
Comment by Chris Hillery [ 16/Jun/14 ]
I think you can use 172.23.106.184 - it's a 32-bit Windows 2008 VM that we can't use for 3.0 builds anyway.
Comment by Trond Norbye [ 24/Jun/14 ]
I copied the full result of a build where I set target_platform=x86 on my 64 bit windows server (the "install" directory) over to a 32 bit windows machine and was able to start memcached and it worked as expected.

Our installers do other magic, like installing the service, that is needed in order to start the full server. Once we have such an installer I can do further testing.
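For illustration, a hedged sketch of selecting a 32-bit target with CMake and MSVC 2013 on a 64-bit machine (the generator names are standard CMake; how target_platform=x86 is wired through the Couchbase top-level build is not shown here):

# 32-bit output (the default VS 2013 generator uses the 32-bit toolchain)
cmake -G "Visual Studio 12 2013" <path-to-source>
# 64-bit output, for comparison
cmake -G "Visual Studio 12 2013 Win64" <path-to-source>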
Comment by Chris Hillery [ 24/Jun/14 ]
Bin - could you take a look at this (figuring out how to make InstallShield on a 64-bit machine create a 32-bit compatible installer)? I won't likely be able to get to it for at least a month, and I think you're the only person here who still has access to an InstallShield 2010 designer anyway.
Comment by Bin Cui [ 04/Sep/14 ]
PM should make the call on whether or not we want to have 32-bit support for Windows.
Comment by Anil Kumar [ 05/Sep/14 ]
Bin - As confirmed back in March-April for the supported platforms for Couchbase Server 3.0, we decided to continue to build 32-bit Windows for development-only support, as mentioned in our documentation deprecation page: http://docs.couchbase.com/couchbase-manual-2.5/deprecated/#platforms.

Comment by Bin Cui [ 17/Sep/14 ]
1. Create a 64-bit builder with a 32-bit target.
2. Create a 32-bit builder.
3. Transfer the 64-bit staging image to the 32-bit builder.
4. Run the packaging steps and generate the final package from the 32-bit builder.




[MB-11084] Build python snappy module on windows Created: 09/May/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Task Priority: Minor
Reporter: Bin Cui Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows


 Description   
To deal with the compressed datatype, we need Python support for the snappy functions. We need to build https://github.com/andrix/python-snappy on Windows and make it part of the package.
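As a quick sanity check once the module is packaged, something like the following could verify that the snappy bindings load and round-trip data (a hedged sketch assuming Python 2.7 on the PATH):

python -c "import snappy; print snappy.uncompress(snappy.compress('hello'))"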

 Comments   
Comment by Bin Cui [ 09/May/14 ]
I implemented the related logic for CentOS 5.x, 6.x and Ubuntu. Please look at http://review.couchbase.org/#/c/36902/
Comment by Trond Norbye [ 16/Jun/14 ]
I've updated the windows build depot with the modules built for Python 2.7.6.

Please populate the depot to the builder and reassign the bug to Bin for verification.
Comment by Chris Hillery [ 13/Aug/14 ]
Depot was updated yesterday, so pysnappy is expanded into the install directory before the Couchbase build is started. I'm not sure what needs to be done to then use this package; passing off to Bin.
Comment by Don Pinto [ 03/Sep/14 ]
Question: Given that the compressed datatype is not in 3.0, is this still a requirement?

Thanks,




[MB-8508] installer - windows packages should be signed Created: 26/Nov/12  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.0, 2.1.0, 2.2.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Critical
Reporter: Steve Yen Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-5577 print out Couchbase in the warning sc... Open
relates to MB-9165 Windows 8 Smartscreen blocks Couchbas... Resolved

 Description   
see also: http://www.couchbase.com/issues/browse/MB-7250
see also: http://www.couchbase.com/issues/browse/MB-49


 Comments   
Comment by Steve Yen [ 10/Dec/12 ]
Part of the challenge here would be figuring out the key-ownership process. Perhaps PMs should go create, register and own the signing keys/certs.
Comment by Steve Yen [ 31/Jan/13 ]
Reassigning as I think Phil has been tracking down the keys to the company.
Comment by Phil Labee [ 01/May/13 ]
Need more information:

Why do we need to sign windows app?
What problems are we addressing?
Do you want to release through the Windows Store?
What versions of Windows do we need to support?
Comment by Phil Labee [ 01/May/13 ]
need to know what problem we're trying to solve
Comment by Wayne Siu [ 06/Sep/13 ]
No security warning box is the objective.
Comment by Wayne Siu [ 20/Jun/14 ]
Anil,
I assume this is out of 3.0. Please update if it's not.
Comment by Anil Kumar [ 20/Jun/14 ]
We should still consider it for 3.0; if there is no time to fix it, then it's a candidate for punting.
Comment by Wayne Siu [ 30/Jul/14 ]
Moving it out of 3.0.
Comment by Anil Kumar [ 17/Sep/14 ]
We need this for the Windows 3.0 GA timeframe.
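For reference, a hedged sketch of what signing the installer with Microsoft's signtool could look like (the certificate file, password, timestamp URL and installer name are placeholders, not the actual Couchbase signing setup):

signtool sign /f couchbase_codesign.pfx /p <password> /t http://timestamp.digicert.com /d "Couchbase Server" couchbase_server-enterprise_3.0.0-windows_amd64.exe
signtool verify /pa couchbase_server-enterprise_3.0.0-windows_amd64.exe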




[MB-9825] Rebalance exited with reason bad_replicas Created: 06/Jan/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Venu Uppalapati
Resolution: Unresolved Votes: 0
Labels: performance, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.5.0 enterprise edition (build-1015)

Platform = Physical
OS = Windows Server 2012
CPU = Intel Xeon E5-2630
Memory = 64 GB
Disk = 2 x HDD

Triage: Triaged
Operating System: Windows 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/zeus-64/564/artifact/

 Description   
Rebalance-out, 4 -> 3, 1 bucket x 50M x 2KB, DGM, 1 x 1 views

Bad replicators after rebalance:
Missing = [{'ns_1@172.23.96.27','ns_1@172.23.96.26',597},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',598},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',599},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',600},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',601},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',602},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',603},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',604},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',605},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',606},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',607},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',608},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',609},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',610},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',611},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',612},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',613},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',614},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',615},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',616},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',617},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',618},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',619},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',620},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',621},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',622},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',623},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',624},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',625},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',626},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',627},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',628},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',629},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',630},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',631},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',632},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',633},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',634},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',635},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',636},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',637},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',638},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',639},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',640},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',641},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',642},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',643},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',644},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',645},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',646},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',647},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',648},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',649},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',650},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',651},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',652},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',653},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',654},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',655},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',656},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',657},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',658},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',659},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',660},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',661},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',662},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',663},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',664},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',665},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',666},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',667},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',668},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',669},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',670},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',671},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',672},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',673},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',674},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',675},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',676},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',677},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',678},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',679},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',680},
{'ns_1@172.23.96.27','ns_1@172.23.96.26',681}]
Extras = []

 Comments   
Comment by Aleksey Kondratenko [ 06/Jan/14 ]
Looks like the producer node simply closed the socket.

Most likely a duplicate of an old issue where both sides of the socket suddenly see the connection as closed.

Relevant log messages:

[error_logger:info,2014-01-06T10:30:00.231,ns_1@172.23.96.26:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================PROGRESS REPORT=========================
          supervisor: {local,'ns_vbm_new_sup-bucket-1'}
             started: [{pid,<0.1169.0>},
                       {name,
                           {new_child_id,
                               [597,598,599,600,601,602,603,604,605,606,607,
                                608,609,610,611,612,613,614,615,616,617,618,
                                619,620,621,622,623,624,625,626,627,628,629,
                                630,631,632,633,634,635,636,637,638,639,640,
                                641,642,643,644,645,646,647,648,649,650,651,
                                652,653,654,655,656,657,658,659,660,661,662,
                                663,664,665,666,667,668,669,670,671,672,673,
                                674,675,676,677,678,679,680,681],
                               'ns_1@172.23.96.27'}},
                       {mfargs,
                           {ebucketmigrator_srv,start_link,
                               [{"172.23.96.27",11209},
                                {"172.23.96.26",11209},
                                [{on_not_ready_vbuckets,
                                     #Fun<tap_replication_manager.2.133536719>},
                                 {username,"bucket-1"},
                                 {password,get_from_config},
                                 {vbuckets,
                                     [597,598,599,600,601,602,603,604,605,606,
                                      607,608,609,610,611,612,613,614,615,616,
                                      617,618,619,620,621,622,623,624,625,626,
                                      627,628,629,630,631,632,633,634,635,636,
                                      637,638,639,640,641,642,643,644,645,646,
                                      647,648,649,650,651,652,653,654,655,656,
                                      657,658,659,660,661,662,663,664,665,666,
                                      667,668,669,670,671,672,673,674,675,676,
                                      677,678,679,680,681]},
                                 {set_to_pending_state,false},
                                 {takeover,false},
                                 {suffix,"ns_1@172.23.96.26"}]]}},
                       {restart_type,temporary},
                       {shutdown,60000},
                       {child_type,worker}]



[rebalance:debug,2014-01-06T12:12:33.870,ns_1@172.23.96.26:<0.1169.0>:ebucketmigrator_srv:terminate:737]Dying with reason: normal

Mon Jan 06 12:12:44.371917 Pacific Standard Time 3: (bucket-1) TAP (Producer) eq_tapq:replication_ns_1@172.23.96.26 - disconnected, keep alive for 300 seconds
Comment by Maria McDuff (Inactive) [ 10/Jan/14 ]
looks like a dupe of memcached connection issue.
will close this as a dupe.
Comment by Wayne Siu [ 15/Jan/14 ]
Chiyoung to add more debug logging to 2.5.1.
Comment by Chiyoung Seo [ 17/Jan/14 ]
I added more warning-level logs for disconnection events in the memcached layer. We will continue to investigate this issue for 2.5.1 or 3.0 release.

http://review.couchbase.org/#/c/32567/

merged.
Comment by Cihan Biyikoglu [ 08/Apr/14 ]
Given we have more verbose logging, can we reproduce the issue again and see if we can get a better idea on where the problem is?
thanks
Comment by Pavel Paulau [ 08/Apr/14 ]
This issue happened only on Windows so far.
I wasn't able to reproduce it in 2.5.1 and obviously we haven't tested 3.0 yet.
Comment by Cihan Biyikoglu [ 25/Jun/14 ]
Pavel, do you have the repro with the detailed logs now? If yes, could we assign it to a dev for fixing? Thanks.
Comment by Pavel Paulau [ 25/Jun/14 ]
This is Windows specific bug. We are not testing Windows yet.
Comment by Pavel Paulau [ 27/Jun/14 ]
Just FYI.

I have finally tried the Windows build. It's absolutely unstable and not ready for performance testing yet.
Please don't expect news any time soon.




[MB-9874] [Windows] Couchstore drop and reopen of file handle fails Created: 09/Jan/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Trond Norbye Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: windows, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows


 Description   
The unit test doing couchstore_drop_file and couchstore_reopen_file fails due to COUCHSTORE_READ_ERROR when it tries to reopen the file.

The commit http://review.couchbase.org/#/c/31767/ disabled the test to allow the rest of the unit tests to be executed.

 Comments   
Comment by Anil Kumar [ 17/Jul/14 ]
Triage - Chiyoung, Anil, Venu, Wayne .. July 17th




[MB-9635] Audit logs for Admin actions Created: 22/Nov/13  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.2.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Anil Kumar Assignee: Don Pinto
Resolution: Unresolved Votes: 0
Labels: security
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Duplicate

 Description   
Couchbase Server should be able to produce an audit log for all admin actions, such as login/logout events and significant events (rebalance, failover, etc.).



 Comments   
Comment by Matt Ingenthron [ 13/Mar/14 ]
Note there isn't exactly a "login/logout" event. This is mostly by design. A feature like this could be added, but there may be better ways to achieve the underlying requirement. One suggestion would be to log initial activities instead of every activity and have a 'cache' for having seen that user agent within a particular window. That would probably meet most auditing requirements and is, I think, relatively straightforward to implement.
Comment by Aleksey Kondratenko [ 06/Jun/14 ]
We have access.log implemented now. But it's not exactly the same as a full-blown audit. In particular, we do log that a certain POST was handled in access.log, but we do not log any parameters of that action. So it doesn't count as a fully-featured audit log, I think.
Comment by Aleksey Kondratenko [ 06/Jun/14 ]
ns_server's access.log and ep-engine's access.log do not conflict, as they are necessarily in different directories.
Comment by Perry Krug [ 06/Jun/14 ]
They may not conflict in terms of unique names in the same directory, but to our customers it may be a little bit too close to remember which access.log does what...
Comment by Aleksey Kondratenko [ 06/Jun/14 ]
Ok. Any specific proposals ?
Comment by Perry Krug [ 06/Jun/14 ]
Yes, as mentioned above, login.log would be one proposal but I'm not tied to it.
Comment by Aleksey Kondratenko [ 06/Jun/14 ]
access.log has very little to do with logins. It's full blown equivalent of apache's access.log.
Comment by Perry Krug [ 06/Jun/14 ]
Oh sorry, I misread this specific section.

How about audit.log? I know it's not fully "audit" but I'm just trying to avoid the name clash in our customer's minds...
Comment by Anil Kumar [ 09/Jun/14 ]
Agreed we should rename this file to audit.log to avoid any confusion. Updating the MB-10020 to make that change.
Comment by Larry Liu [ 10/Jun/14 ]
Hi, Anil

Does this feature satisfy PCI compliance?

Larry
Comment by Cihan Biyikoglu [ 11/Jun/14 ]
Hi Larry, PCI is a comprehensive set of requirements that go beyond database features. This does help with some parts of PCI, but compliance with PCI involves many additional controls, and most can be handled at the operational level or at the app level.
thanks




[MB-9656] XDCR destination endpoints for "getting xdcr stats via rest" in url encoding Created: 29/Nov/13  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.1.0, 2.2.0, 2.1.1, 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Major
Reporter: Patrick Varley Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: customer, supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/couchbase-manual-2.2/#getting-xdcr-stats-via-rest


 Description   
In our documentation the destination endpoints are not URL-encoded, where "/" should be "%2F". This has misled customers. That section should be in the following format:

replications%2F[UUID]%2F[source_bucket]%2F[destination_bucket]%2Fdocs_written

If this change is made we should remove this line too:

You need to provide properly URL-encoded /[UUID]/[source_bucket]/[destination_bucket]/[stat_name]. To get the number of documents written:
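For example, the fully URL-encoded request would look something like this (UUID and bucket names are placeholders; the stats path under /pools/default/buckets is assumed to match the rest of that documentation section):

curl -u Administrator:password 'http://localhost:8091/pools/default/buckets/default/stats/replications%2F[UUID]%2Fdefault%2Fdefault%2Fdocs_written'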



 Comments   
Comment by Amy Kurtzman [ 16/May/14 ]
The syntax and example code in this whole REST section needs to be cleaned up and tested. It is a bigger job than just fixing this one.
Comment by Patrick Varley [ 17/Sep/14 ]
I fell down this hole again and so did another Support Engineer. We really need to get this fixed in all versions.

The 3.0 documentation has this problem too.
Comment by Ruth Harris [ 17/Sep/14 ]
Why are you suggesting that the forward slash in the syntax be %2F???
This is not a blocker.




[MB-12192] XDCR : After warmup, replica items are not deleted in destination cluster Created: 15/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, DCP
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Aruna Piravi Assignee: Sriram Ganesan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 6.x, 3.0.1-1297-rel

Attachments: Zip Archive 172.23.106.45-9152014-1553-diag.zip     GZip Archive 172.23.106.45-9152014-1623-couch.tar.gz     Zip Archive 172.23.106.46-9152014-1555-diag.zip     GZip Archive 172.23.106.46-9152014-1624-couch.tar.gz     Zip Archive 172.23.106.47-9152014-1558-diag.zip     GZip Archive 172.23.106.47-9152014-1624-couch.tar.gz     Zip Archive 172.23.106.48-9152014-160-diag.zip     GZip Archive 172.23.106.48-9152014-1624-couch.tar.gz    
Triage: Untriaged
Is this a Regression?: Yes

 Description   
Steps
--------
1. Setup uni-xdcr between 2 clusters with atleast 2 nodes
2. Load 5000 items onto 3 buckets at source, they get replicated to destination
3. Reboot a non-master node on destination (in this test .48)
4. After warmup, perform 30% updates and 30% deletes on source cluster
5. Deletes get propagated to active vbuckets on destination but replica vbuckets only experience partial deletion.

Important note
--------------------
This test had passed on 3.0.0-1208-rel and 3.0.0-1209-rel. However I'm able to reproduce this consistently on 3.0.1. Unsure if this is a recent regression.

2014-09-15 14:43:50 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', sasl_bucket_1 bucket
2014-09-15 14:43:51 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', standard_bucket_1 bucket
2014-09-15 14:43:51 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', default bucket

Testcase
------------
./testrunner -i /tmp/bixdcr.ini -t xdcr.pauseResumeXDCR.PauseResumeTest.replication_with_pause_and_resume,reboot=dest_node,items=5000,rdirection=unidirection,replication_type=xmem,standard_buckets=1,sasl_buckets=1,pause=source,doc-ops=update-delete,doc-ops-dest=update-delete

On destination cluster
-----------------------------

Arunas-MacBook-Pro:bin apiravi$ ./cbvdiff 172.23.106.47:11210,172.23.106.48:11210
VBucket 512: active count 4 != 6 replica count

VBucket 513: active count 2 != 4 replica count

VBucket 514: active count 8 != 11 replica count

VBucket 515: active count 3 != 4 replica count

VBucket 516: active count 8 != 10 replica count

VBucket 517: active count 5 != 6 replica count

VBucket 521: active count 0 != 1 replica count

VBucket 522: active count 7 != 11 replica count

VBucket 523: active count 3 != 5 replica count

VBucket 524: active count 6 != 10 replica count

VBucket 525: active count 4 != 6 replica count

VBucket 526: active count 4 != 6 replica count

VBucket 528: active count 7 != 10 replica count

VBucket 529: active count 3 != 4 replica count

VBucket 530: active count 3 != 4 replica count

VBucket 532: active count 0 != 2 replica count

VBucket 533: active count 1 != 2 replica count

VBucket 534: active count 8 != 10 replica count

VBucket 535: active count 5 != 6 replica count

VBucket 536: active count 7 != 11 replica count

VBucket 537: active count 3 != 5 replica count

VBucket 540: active count 3 != 4 replica count

VBucket 542: active count 6 != 10 replica count

VBucket 543: active count 4 != 6 replica count

VBucket 544: active count 6 != 10 replica count

VBucket 545: active count 3 != 4 replica count

VBucket 547: active count 0 != 1 replica count

VBucket 548: active count 6 != 7 replica count

VBucket 550: active count 7 != 10 replica count

VBucket 551: active count 4 != 5 replica count

VBucket 552: active count 9 != 11 replica count

VBucket 553: active count 4 != 6 replica count

VBucket 554: active count 4 != 5 replica count

VBucket 555: active count 1 != 2 replica count

VBucket 558: active count 7 != 10 replica count

VBucket 559: active count 3 != 4 replica count

VBucket 562: active count 6 != 10 replica count

VBucket 563: active count 4 != 5 replica count

VBucket 564: active count 7 != 10 replica count

VBucket 565: active count 4 != 5 replica count

VBucket 566: active count 4 != 5 replica count

VBucket 568: active count 3 != 4 replica count

VBucket 570: active count 8 != 10 replica count

VBucket 571: active count 4 != 6 replica count

VBucket 572: active count 7 != 10 replica count

VBucket 573: active count 3 != 4 replica count

VBucket 574: active count 0 != 1 replica count

VBucket 575: active count 0 != 1 replica count

VBucket 578: active count 8 != 10 replica count

VBucket 579: active count 4 != 6 replica count

VBucket 580: active count 8 != 11 replica count

VBucket 581: active count 3 != 4 replica count

VBucket 582: active count 3 != 4 replica count

VBucket 583: active count 1 != 2 replica count

VBucket 584: active count 3 != 4 replica count

VBucket 586: active count 6 != 10 replica count

VBucket 587: active count 3 != 4 replica count

VBucket 588: active count 7 != 10 replica count

VBucket 589: active count 4 != 5 replica count

VBucket 591: active count 0 != 2 replica count

VBucket 592: active count 8 != 10 replica count

VBucket 593: active count 4 != 6 replica count

VBucket 594: active count 0 != 1 replica count

VBucket 595: active count 0 != 1 replica count

VBucket 596: active count 4 != 6 replica count

VBucket 598: active count 7 != 10 replica count

VBucket 599: active count 3 != 4 replica count

VBucket 600: active count 6 != 10 replica count

VBucket 601: active count 3 != 4 replica count

VBucket 602: active count 4 != 6 replica count

VBucket 606: active count 7 != 10 replica count

VBucket 607: active count 4 != 5 replica count

VBucket 608: active count 7 != 11 replica count

VBucket 609: active count 3 != 5 replica count

VBucket 610: active count 3 != 4 replica count

VBucket 613: active count 0 != 1 replica count

VBucket 614: active count 6 != 10 replica count

VBucket 615: active count 4 != 6 replica count

VBucket 616: active count 7 != 10 replica count

VBucket 617: active count 3 != 4 replica count

VBucket 620: active count 3 != 4 replica count

VBucket 621: active count 1 != 2 replica count

VBucket 622: active count 9 != 11 replica count

VBucket 623: active count 5 != 6 replica count

VBucket 624: active count 5 != 6 replica count

VBucket 626: active count 7 != 11 replica count

VBucket 627: active count 3 != 5 replica count

VBucket 628: active count 6 != 10 replica count

VBucket 629: active count 4 != 6 replica count

VBucket 632: active count 0 != 1 replica count

VBucket 633: active count 0 != 1 replica count

VBucket 634: active count 7 != 10 replica count

VBucket 635: active count 3 != 4 replica count

VBucket 636: active count 8 != 10 replica count

VBucket 637: active count 5 != 6 replica count

VBucket 638: active count 5 != 6 replica count

VBucket 640: active count 2 != 4 replica count

VBucket 641: active count 7 != 11 replica count

VBucket 643: active count 5 != 7 replica count

VBucket 646: active count 3 != 5 replica count

VBucket 647: active count 7 != 10 replica count

VBucket 648: active count 4 != 6 replica count

VBucket 649: active count 8 != 10 replica count

VBucket 651: active count 0 != 1 replica count

VBucket 653: active count 4 != 6 replica count

VBucket 654: active count 3 != 4 replica count

VBucket 655: active count 7 != 10 replica count

VBucket 657: active count 4 != 5 replica count

VBucket 658: active count 2 != 4 replica count

VBucket 659: active count 7 != 11 replica count

VBucket 660: active count 3 != 5 replica count

VBucket 661: active count 7 != 10 replica count

VBucket 662: active count 0 != 2 replica count

VBucket 666: active count 4 != 6 replica count

VBucket 667: active count 8 != 10 replica count

VBucket 668: active count 3 != 4 replica count

VBucket 669: active count 7 != 10 replica count

VBucket 670: active count 1 != 2 replica count

VBucket 671: active count 2 != 3 replica count

VBucket 673: active count 0 != 1 replica count

VBucket 674: active count 3 != 4 replica count

VBucket 675: active count 7 != 10 replica count

VBucket 676: active count 5 != 6 replica count

VBucket 677: active count 8 != 10 replica count

VBucket 679: active count 5 != 6 replica count

VBucket 681: active count 6 != 7 replica count

VBucket 682: active count 3 != 5 replica count

VBucket 683: active count 8 != 12 replica count

VBucket 684: active count 3 != 6 replica count

VBucket 685: active count 7 != 11 replica count

VBucket 688: active count 3 != 4 replica count

VBucket 689: active count 7 != 10 replica count

VBucket 692: active count 1 != 2 replica count

VBucket 693: active count 2 != 3 replica count

VBucket 694: active count 5 != 6 replica count

VBucket 695: active count 8 != 10 replica count

VBucket 696: active count 3 != 5 replica count

VBucket 697: active count 8 != 12 replica count

VBucket 699: active count 4 != 5 replica count

VBucket 700: active count 0 != 1 replica count

VBucket 702: active count 3 != 6 replica count

VBucket 703: active count 7 != 11 replica count

VBucket 704: active count 3 != 5 replica count

VBucket 705: active count 8 != 12 replica count

VBucket 709: active count 4 != 5 replica count

VBucket 710: active count 3 != 6 replica count

VBucket 711: active count 7 != 11 replica count

VBucket 712: active count 3 != 4 replica count

VBucket 713: active count 7 != 10 replica count

VBucket 715: active count 3 != 4 replica count

VBucket 716: active count 1 != 2 replica count

VBucket 717: active count 0 != 2 replica count

VBucket 718: active count 5 != 6 replica count

VBucket 719: active count 8 != 10 replica count

VBucket 720: active count 0 != 1 replica count

VBucket 722: active count 3 != 5 replica count

VBucket 723: active count 8 != 12 replica count

VBucket 724: active count 3 != 6 replica count

VBucket 725: active count 7 != 11 replica count

VBucket 727: active count 5 != 7 replica count

VBucket 728: active count 2 != 4 replica count

VBucket 729: active count 3 != 5 replica count

VBucket 730: active count 3 != 4 replica count

VBucket 731: active count 7 != 10 replica count

VBucket 732: active count 5 != 6 replica count

VBucket 733: active count 8 != 10 replica count

VBucket 737: active count 3 != 4 replica count

VBucket 738: active count 4 != 6 replica count

VBucket 739: active count 8 != 10 replica count

VBucket 740: active count 3 != 4 replica count

VBucket 741: active count 7 != 10 replica count

VBucket 743: active count 0 != 1 replica count

VBucket 746: active count 2 != 4 replica count

VBucket 747: active count 7 != 11 replica count

VBucket 748: active count 3 != 5 replica count

VBucket 749: active count 7 != 10 replica count

VBucket 751: active count 3 != 4 replica count

VBucket 752: active count 4 != 6 replica count

VBucket 753: active count 9 != 11 replica count

VBucket 754: active count 1 != 2 replica count

VBucket 755: active count 4 != 5 replica count

VBucket 758: active count 3 != 4 replica count

VBucket 759: active count 7 != 10 replica count

VBucket 760: active count 2 != 4 replica count

VBucket 761: active count 7 != 11 replica count

VBucket 762: active count 0 != 1 replica count

VBucket 765: active count 6 != 7 replica count

VBucket 766: active count 3 != 5 replica count

VBucket 767: active count 7 != 10 replica count

VBucket 770: active count 3 != 5 replica count

VBucket 771: active count 7 != 11 replica count

VBucket 772: active count 4 != 6 replica count

VBucket 773: active count 6 != 10 replica count

VBucket 775: active count 3 != 4 replica count

VBucket 777: active count 3 != 4 replica count

VBucket 778: active count 3 != 4 replica count

VBucket 779: active count 7 != 10 replica count

VBucket 780: active count 5 != 6 replica count

VBucket 781: active count 8 != 10 replica count

VBucket 782: active count 1 != 2 replica count

VBucket 783: active count 0 != 2 replica count

VBucket 784: active count 3 != 5 replica count

VBucket 785: active count 7 != 11 replica count

VBucket 786: active count 0 != 1 replica count

VBucket 789: active count 4 != 6 replica count

VBucket 790: active count 4 != 6 replica count

VBucket 791: active count 6 != 10 replica count

VBucket 792: active count 3 != 4 replica count

VBucket 793: active count 8 != 11 replica count

VBucket 794: active count 2 != 4 replica count

VBucket 795: active count 4 != 6 replica count

VBucket 798: active count 5 != 6 replica count

VBucket 799: active count 8 != 10 replica count

VBucket 800: active count 4 != 6 replica count

VBucket 801: active count 8 != 10 replica count

VBucket 803: active count 3 != 4 replica count

VBucket 804: active count 0 != 1 replica count

VBucket 805: active count 0 != 1 replica count

VBucket 806: active count 3 != 4 replica count

VBucket 807: active count 7 != 10 replica count

VBucket 808: active count 3 != 4 replica count

VBucket 809: active count 6 != 10 replica count

VBucket 813: active count 4 != 5 replica count

VBucket 814: active count 4 != 5 replica count

VBucket 815: active count 7 != 10 replica count

VBucket 816: active count 1 != 2 replica count

VBucket 817: active count 4 != 5 replica count

VBucket 818: active count 4 != 6 replica count

VBucket 819: active count 8 != 10 replica count

VBucket 820: active count 3 != 4 replica count

VBucket 821: active count 7 != 10 replica count

VBucket 824: active count 0 != 1 replica count

VBucket 826: active count 3 != 4 replica count

VBucket 827: active count 6 != 10 replica count

VBucket 828: active count 4 != 5 replica count

VBucket 829: active count 7 != 10 replica count

VBucket 831: active count 6 != 7 replica count

VBucket 833: active count 4 != 6 replica count

VBucket 834: active count 3 != 4 replica count

VBucket 835: active count 6 != 10 replica count

VBucket 836: active count 4 != 5 replica count

VBucket 837: active count 7 != 10 replica count

VBucket 840: active count 0 != 1 replica count

VBucket 841: active count 0 != 1 replica count

VBucket 842: active count 4 != 6 replica count

VBucket 843: active count 8 != 10 replica count

VBucket 844: active count 3 != 4 replica count

VBucket 845: active count 7 != 10 replica count

VBucket 847: active count 4 != 6 replica count

VBucket 848: active count 3 != 4 replica count

VBucket 849: active count 6 != 10 replica count

VBucket 851: active count 3 != 4 replica count

VBucket 852: active count 0 != 2 replica count

VBucket 854: active count 4 != 5 replica count

VBucket 855: active count 7 != 10 replica count

VBucket 856: active count 4 != 6 replica count

VBucket 857: active count 8 != 10 replica count

VBucket 860: active count 1 != 2 replica count

VBucket 861: active count 3 != 4 replica count

VBucket 862: active count 3 != 4 replica count

VBucket 863: active count 8 != 11 replica count

VBucket 864: active count 3 != 4 replica count

VBucket 865: active count 7 != 10 replica count

VBucket 866: active count 0 != 1 replica count

VBucket 867: active count 0 != 1 replica count

VBucket 869: active count 5 != 6 replica count

VBucket 870: active count 5 != 6 replica count

VBucket 871: active count 8 != 10 replica count

VBucket 872: active count 3 != 5 replica count

VBucket 873: active count 7 != 11 replica count

VBucket 875: active count 5 != 6 replica count

VBucket 878: active count 4 != 6 replica count

VBucket 879: active count 6 != 10 replica count

VBucket 882: active count 3 != 4 replica count

VBucket 883: active count 7 != 10 replica count

VBucket 884: active count 5 != 6 replica count

VBucket 885: active count 9 != 11 replica count

VBucket 886: active count 1 != 2 replica count

VBucket 887: active count 3 != 4 replica count

VBucket 889: active count 3 != 4 replica count

VBucket 890: active count 3 != 5 replica count

VBucket 891: active count 7 != 11 replica count

VBucket 892: active count 4 != 6 replica count

VBucket 893: active count 6 != 10 replica count

VBucket 894: active count 0 != 1 replica count

VBucket 896: active count 8 != 10 replica count

VBucket 897: active count 4 != 6 replica count

VBucket 900: active count 2 != 3 replica count

VBucket 901: active count 2 != 3 replica count

VBucket 902: active count 7 != 10 replica count

VBucket 903: active count 3 != 4 replica count

VBucket 904: active count 7 != 11 replica count

VBucket 905: active count 2 != 4 replica count

VBucket 906: active count 4 != 5 replica count

VBucket 909: active count 0 != 2 replica count

VBucket 910: active count 7 != 10 replica count

VBucket 911: active count 3 != 5 replica count

VBucket 912: active count 0 != 1 replica count

VBucket 914: active count 8 != 10 replica count

VBucket 915: active count 4 != 6 replica count

VBucket 916: active count 7 != 10 replica count

VBucket 917: active count 3 != 4 replica count

VBucket 918: active count 4 != 6 replica count

VBucket 920: active count 5 != 7 replica count

VBucket 922: active count 7 != 11 replica count

VBucket 923: active count 2 != 4 replica count

VBucket 924: active count 7 != 10 replica count

VBucket 925: active count 3 != 5 replica count

VBucket 928: active count 4 != 5 replica count

VBucket 930: active count 8 != 12 replica count

VBucket 931: active count 3 != 5 replica count

VBucket 932: active count 7 != 11 replica count

VBucket 933: active count 3 != 6 replica count

VBucket 935: active count 0 != 1 replica count

VBucket 938: active count 7 != 10 replica count

VBucket 939: active count 3 != 4 replica count

VBucket 940: active count 8 != 10 replica count

VBucket 941: active count 5 != 6 replica count

VBucket 942: active count 2 != 3 replica count

VBucket 943: active count 1 != 2 replica count

VBucket 944: active count 8 != 12 replica count

VBucket 945: active count 3 != 5 replica count

VBucket 946: active count 6 != 7 replica count

VBucket 950: active count 7 != 11 replica count

VBucket 951: active count 3 != 6 replica count

VBucket 952: active count 7 != 10 replica count

VBucket 953: active count 3 != 4 replica count

VBucket 954: active count 0 != 1 replica count

VBucket 956: active count 5 != 6 replica count

VBucket 958: active count 8 != 10 replica count

VBucket 959: active count 5 != 6 replica count

VBucket 960: active count 7 != 10 replica count

VBucket 961: active count 3 != 4 replica count

VBucket 962: active count 3 != 5 replica count

VBucket 963: active count 2 != 4 replica count

VBucket 966: active count 8 != 10 replica count

VBucket 967: active count 5 != 6 replica count

VBucket 968: active count 8 != 12 replica count

VBucket 969: active count 3 != 5 replica count

VBucket 971: active count 0 != 1 replica count

VBucket 972: active count 5 != 7 replica count

VBucket 974: active count 7 != 11 replica count

VBucket 975: active count 3 != 6 replica count

VBucket 976: active count 3 != 4 replica count

VBucket 978: active count 7 != 10 replica count

VBucket 979: active count 3 != 4 replica count

VBucket 980: active count 8 != 10 replica count

VBucket 981: active count 5 != 6 replica count

VBucket 982: active count 0 != 2 replica count

VBucket 983: active count 1 != 2 replica count

VBucket 986: active count 8 != 12 replica count

VBucket 987: active count 3 != 5 replica count

VBucket 988: active count 7 != 11 replica count

VBucket 989: active count 3 != 6 replica count

VBucket 990: active count 4 != 5 replica count

VBucket 993: active count 0 != 1 replica count

VBucket 994: active count 7 != 11 replica count

VBucket 995: active count 2 != 4 replica count

VBucket 996: active count 7 != 10 replica count

VBucket 997: active count 3 != 5 replica count

VBucket 998: active count 5 != 6 replica count

VBucket 1000: active count 4 != 5 replica count

VBucket 1001: active count 1 != 2 replica count

VBucket 1002: active count 9 != 11 replica count

VBucket 1003: active count 4 != 6 replica count

VBucket 1004: active count 7 != 10 replica count

VBucket 1005: active count 3 != 4 replica count

VBucket 1008: active count 7 != 11 replica count

VBucket 1009: active count 2 != 4 replica count

VBucket 1012: active count 4 != 5 replica count

VBucket 1014: active count 7 != 10 replica count

VBucket 1015: active count 3 != 5 replica count

VBucket 1016: active count 8 != 10 replica count

VBucket 1017: active count 4 != 6 replica count

VBucket 1018: active count 3 != 4 replica count

VBucket 1020: active count 0 != 1 replica count

VBucket 1022: active count 7 != 10 replica count

VBucket 1023: active count 3 != 4 replica count

Active item count = 3500

Same at source
----------------------
Arunas-MacBook-Pro:bin apiravi$ ./cbvdiff 172.23.106.45:11210,172.23.106.46:11210
Active item count = 3500

Will attach cbcollect and data files.


 Comments   
Comment by Mike Wiederhold [ 15/Sep/14 ]
This is not a bug. We no longer do this because a replica vbucket cannot delete items on its own due to DCP.
Comment by Aruna Piravi [ 15/Sep/14 ]
I do not understand why this is not a bug. This is a case where replica items = 4250 and active = 3500. Both were initially 5000 before warmup. However, only 50% of the actual deletes have happened on the replica bucket (5000 -> 4250), and so I would expect the other 750 items to be deleted too so that active = replica. If this is not a bug, then in case of failover the cluster will end up having more items than it did before the failover.
Comment by Aruna Piravi [ 15/Sep/14 ]
> We no longer do this because a replica vbucket cannot delete items on it's own due to dcp
Then I would expect the deletes to be propagated from the active vbuckets through DCP, but these never get propagated. If you run cbvdiff even now, you can see the mismatch.
Comment by Sriram Ganesan [ 17/Sep/14 ]
Aruna

If there is a testrunner script available for steps (1) - (5), please update the bug. Thanks.
Comment by Aruna Piravi [ 17/Sep/14 ]
Done.




[MB-12138] {Windows - DCP}:: View Query fails with error 500 reason: error {"error":"error","reason":"{index_builder_exit,89,<<>>}"} Created: 05/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Parag Agarwal Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: windows, windows-3.0-beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.1-1267, Windows 2012, 64 x, machine:: 172.23.105.112

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12138/172.23.105.112-952014-1511-diag.zip
Is this a Regression?: Yes

 Description   


1. Create 1 Node cluster
2. Create default bucket and add 100k items
3. Create views and query it

Seeing the following exceptions

http://172.23.105.112:8092/default/_design/ddoc1/_view/default_view0?connectionTimeout=60000&full_set=true&limit=100000&stale=false error 500 reason: error {"error":"error","reason":"{index_builder_exit,89,<<>>}"}

We cannot run any view tests as a result


 Comments   
Comment by Anil Kumar [ 16/Sep/14 ]
Nimish/Siri - Any update on this?
Comment by Meenakshi Goel [ 17/Sep/14 ]
Seeing a similar issue in the Views DGM test: http://qa.hq.northscale.net/job/win_2008_x64--69_06_view_dgm_tests-P1/1/console
Test : view.createdeleteview.CreateDeleteViewTests.test_view_ops,ddoc_ops=update,test_with_view=True,num_ddocs=4,num_views_per_ddoc=10,items=200000,active_resident_threshold=10,dgm_run=True,eviction_policy=fullEviction
Comment by Nimish Gupta [ 17/Sep/14 ]
We have found the root cause and are working on the fix.




[MB-12207] Related links could be clearer. Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I think it would be better if the "Related links" section at the bottom of the page was laid out a little differently, and if we added the ability to navigate (MB-12205) from the bottom of a page (think long pages).

Maybe something like this could work:

Links

Parent Topic:
    Installation and upgrade
Previous Topic:
    Welcome to couchbase
Next Topic:
    uninstalling couchbase
Related Topics:
    Initial server setup
    Testing Couchbase Server
    Upgrading




[MB-12195] Update notifications does not seem to be working Created: 15/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.5.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Ian McCloy
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 5.8
2.5.0

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I have installed a 2.5.0 build and enabled Update Notifications.
Even though I enabled "Enable software update notifications", I keep getting "No updates available".
I thought I would be notified in the UI that 2.5.1 is available.

I consulted Tony to see if I had done something wrong, but he also confirmed that this seems to be an issue and is a bug.

 Comments   
Comment by Aleksey Kondratenko [ 15/Sep/14 ]
Based on dev tools we're getting "no new version" from phone-home requests. So it's not a UI bug.
Comment by Ian McCloy [ 17/Sep/14 ]
Added the missing available upgrade paths to the database,

2.5.0-1059-rel-enterprise -> 2.5.1-1083-rel-enterprise
2.2.0-837-rel-enterprise -> 2.5.1-1083-rel-enterprise
2.1.0-718-rel-enterprise -> 2.2.0-837-rel-enterprise

but it looks like the code that parses http://ph.couchbase.net/v2?callback=jQueryxxx isn't checking the database.
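For reference, the phone-home response can be inspected directly with the URL above; a hedged sketch (the UI normally appends extra parameters identifying the running version, which are omitted here):

curl 'http://ph.couchbase.net/v2?callback=jQueryxxx'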




[MB-12205] Doc-system: does not have a next page button. Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0-Beta
Security Level: Public

Type: Bug Priority: Major
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When reading a manual you normally want to go to the next page. It would be good to have a "next" button at the bottom of the page. Here is a good example:

http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Views/views-operation.html




[MB-12204] New doc-system does not have anchors Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The support team uses anchors all the time to link customers directly to the section that has the information they require.

I know that we have broken a number of sections out into their own pages, but there are still some long pages, for example:

http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Misc/security-client-ssl.html


It would be good if we could link the customer directly to: "Configuring the PHP client for SSL"

I have marked this as a blocker as it will affect the way the support team works today.




[MB-12203] Available-stats table formatted incorrectly Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Task Priority: Minor
Reporter: Patrick Varley Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/couchbase-manual-2.5/cb-cli/#available-stats


 Description   
See the pending_ops cell in the link below.

http://docs.couchbase.com/couchbase-manual-2.5/cb-cli/#available-stats

I believe "client connections blocked for operations in pending vbuckets" should all be in one cell.




[MB-11938]  N1QL developer preview does not work with couchbase 3.0 beta. Created: 12/Aug/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Patrick Varley Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This came in on IRC; the user dropped offline before I could point them at Jira. I have created this defect on their behalf:

N1QL makes use of _all_docs which we have removed in 3.0.

The error from the query engine:

couchbase-query_dev_preview3_x86_64_mac ► ./cbq-engine -couchbase http://127.0.0.1:8091/
19:13:38.355197 Info line disabled false
19:13:38.367261 tuqtng started...
19:13:38.367282 version: v0.7.2
19:13:38.367287 site: http://127.0.0.1:8091/
19:14:24.179252 ERROR: Unable to access view - cause: error executing view req at http://127.0.0.1:8092/free/_all_docs?limit=1001: 400 Bad Request - {"error":"bad_request","reason":"_all_docs is no longer supported"}
 -- couchbase.(*viewIndex).ScanRange() at view_index.go:186
19:14:24.179272 Checking bucket URI: /pools/default/buckets/free?bucket_uuid=660ff64e9d1fdfee0c41017e89a4fe72
19:14:24.179315 ERROR: Get /pools/default/buckets/free?bucket_uuid=660ff64e9d1fdfee0c41017e89a4fe72: unsupported protocol scheme "" -- couchbase.(*viewIndex).ScanRange() at view_index.go:192

 Comments   
Comment by Gerald Sangudi [ 12/Aug/14 ]
Please use

CREATE PRIMARY INDEX

before issuing queries against 3.0.
Comment by Brett Lawson [ 17/Sep/14 ]
Hey Gerald,
I assume this is just a temporary workaround?
Cheers, Brett
Comment by Gerald Sangudi [ 17/Sep/14 ]
Hi Brett,

It may not be temporary. User would need to issue

CREATE PRIMARY INDEX

once per bucket. After that, they can query the bucket as often as needed. Subsequent calls to CREATE PRIMARY INDEX will notice the existing index and return immediately.

Maintaining the primary index is not cost-free, so we may not want to automatically create it for every bucket (e.g. a very large KV bucket with no N1QL or view usage).

Thanks,
Gerald




[MB-10662] _all_docs is no longer supported in 3.0 Created: 27/Mar/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sriram Melkote Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-10649 _all_docs view queries fails with err... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
As of 3.0, view engine will no longer support the special predefined view, _all_docs.

It was not a published feature, but as it has been around for a long time, it is possible it was actually utilized in some setups.

We should document that _all_docs queries will not work in 3.0

 Comments   
Comment by Cihan Biyikoglu [ 27/Mar/14 ]
Thanks. Are there internal tools depending on this? Do you know if we have deprecated this in the past? I realize it isn't a supported API, but I want to make sure we keep the door open for feedback during beta from large customers etc.
Comment by Perry Krug [ 28/Mar/14 ]
We have a few (very few) customers who have used this. They've known it is unsupported...but that doesn't ever really stop anyone if it works for them.

Do we have a doc describing what the proposed replacement will look like and will that be available for 3.0?
Comment by Ruth Harris [ 01/May/14 ]
_all_docs is not mentioned anywhere in the 2.2+ documentation. Not sure how to handle this. It's not deprecated because it was never intended for use.
Comment by Perry Krug [ 01/May/14 ]
I think at the very least a prominent release note is appropriate.
Comment by Gerald Sangudi [ 17/Sep/14 ]
For N1QL, please advise customers to do

CREATE PRIMARY INDEX on --bucket-name--.




[MB-12101] A tool to restore corrupt vbucket file Created: 29/Aug/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.1
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Larry Liu Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Relates to

 Description   
Occasionally, a vbucket file might become corrupted. It would be good to have a tool that can restore the data from a vbucket file.

 Comments   
Comment by Chiyoung Seo [ 03/Sep/14 ]
I'm not sure exactly what this ticket means. We can't fully restore the up-to-date state from a corrupted database file, but we could instead write a tool that allows us to restore one of the latest versions that is not corrupted.




[MB-12176] Missing port number on the network ports documentation for 3.0 Created: 12/Sep/14  Updated: 16/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Comments   
Comment by Ruth Harris [ 16/Sep/14 ]
The Network Ports section of the Couchbase Server 3.0 beta doc has been updated with the new SSL port, 11207, and the table with the details for all of the ports has been updated.

http://docs.couchbase.com/prebuilt/couchbase-manual-3.0/Install/install-networkPorts.html
The site (and network ports section) should be refreshed soon.

thanks, Ruth




[MB-8297] Some key projects are still hosted at Membase GitHub account Created: 16/May/13  Updated: 16/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.1.0, 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Pavel Paulau Assignee: Trond Norbye
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-12185 update to "couchbase" from "membase" ... Open

 Description   
memcached, libmemcached, grommit, buildbot-internal...

They are important components of the build workflow. For instance, repo manifests have multiple references to these projects.

This is very confusing legacy; I believe we can avoid it.

 Comments   
Comment by Chris Hillery [ 16/Sep/14 ]
buildbot-internal is at github.com/couchbase/buildbot-internal.

grommit will hopefully be retired in the 3.5 timeframe, and until then I don't want the disruption of moving it; it's private and internal.

Matt has opened MB-12185 to track moving memcached, which is the only project still referenced in the Couchbase server manifest from the membase remote.

libmemcached has been moved inside the "moxi" package for the Couchbase server build. Trond, two questions:

1. Does the project github.com/membase/libmemcached still have a purpose?

2. Do you think there are any projects under github.com/membase (including libmemcached) that should be retired, moved, or deprecated?




[MB-12199] curl -H arguments need to use double quotes Created: 16/Sep/14  Updated: 16/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0, 2.5.1, 3.0.1, 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Matt Ingenthron Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Current documentation states:

Indicates that an HTTP PUT operation is requested.
-H 'Content-Type: application/json'

And that will fail, seemingly owing to the single quotes. See also:
https://twitter.com/RamSharp/status/511739806528077824


 Comments   
Comment by Ruth Harris [ 16/Sep/14 ]
TASK for TECHNICAL WRITER
Fix in 3.0 == FIXED: Added single quotes or removed quotes from around the http string in appropriate examples.
Design Doc rest file - added single quotes, Compaction rest file ok, Trbl design doc file ok

FIX in 2.5: TBD

-----------------------

CONCLUSION:
At least with PUT, both single and double quotes work around: Content-Type: application/json. Didn't check GET or DELETE.
With PUT and DELETE, no quotes and single quotes around the http string work. Note: Some of the examples are missing a single quote around the http string. Meaning, one quote is present, but either the ending or beginning quote is missing. Didn't check GET.

Perhaps a missing single quote around the http string was the problem?
Perhaps there were formatting tags associated with ZlatRam's byauth.ddoc code that were causing the problem?

----------------------

TEST ONE:
1. create a ddoc and view from the UI = testview and testddoc
2. retrieve the ddoc using GET
3. use single quotes around Content-Type: application/json and around the http string. Note: Some of the examples are missing single quotes around the http string.
code: curl -X GET -H 'Content-Type: application/json' 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_testddoc'
results: {
    "views": {
        "testview": {
            "map": "function (doc, meta) {\n emit(meta.id, null);\n}"
        }
    }
}

TEST TWO:
1. delete testddoc
2. use single quotes around Content-Type: application/json and around the http string
code: curl -X DELETE -H 'Content-Type: application/json' 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_testddoc'
results: {"ok":true,"id":"_design/dev_testddoc"}
visual check via UI: Yep, it's gone


TEST THREE:
1. create a myauth.ddoc text file using the code in the Couchbase design doc documentation page.
2. Use PUT to create a dev_myauth design doc
3. use single quotes around Content-Type: application/json and around the http string. Note: I used "| python -m json.tool" to get pretty print output

myauth.ddoc contents: {"views":{"byloc":{"map":"function (doc, meta) {\n if (meta.type == \"json\") {\n emit(doc.city, doc.sales);\n } else {\n emit([\"blob\"]);\n }\n}"}}}
code: curl -X PUT -H 'Content-Type: application/json' 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_myauth' -d @myauth.ddoc | python -m json.tool
results: {
    "id": "_design/dev_myauth",
    "ok": true
}
visual check via UI: Yep, it's there.

TEST FOUR:
1. copy myauth.ddoc to zlat.ddoc
2. Use PUT to create a dev_zlat design doc
3. use double quotes around Content-Type: application/json and single quotes around the http string.

zlat.ddoc contents: {"views":{"byloc":{"map":"function (doc, meta) {\n if (meta.type == \"json\") {\n emit(doc.city, doc.sales);\n } else {\n emit([\"blob\"]);\n }\n}"}}}
code: curl -X PUT -H "Content-Type: application/json" 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_zlat' -d @zlat.ddoc | python -m json.tool
results: {
    "id": "_design/dev_zlat",
    "ok": true
}
visual check via UI: Yep, it's there.


TEST FIVE:
1. create a ddoc text file using ZlatRam's ddoc code
2. flattened the formatting so it reflected the code in the Couchbase example (used above)
3. Use PUT and single quotes.

zlatram contents: {"views":{"byauth":{"map":"function (doc, username) {\n if (doc.type == \"session\" && doc.user == username && Date.Parse(doc.expires) > Date.Parse(Date.Now()) ) {\n emit(doc.token, null);\n }\n}"}}}
code: curl -X PUT -H 'Content-Type: application/json' 'http://Administrator:password@10.5.2.54:8092/test/_design/dev_zlatram' -d @zlatram.ddoc | python -m json.tool
results: {
    "id": "_design/dev_zlatram",
    "ok": true
}
visual check via UI: Yep, it's there.

TEST SIX:
1. delete zlatram ddoc but without quotes around the http string: curl -X DELETE -H 'Content-Type: application/json' http://Administrator:password@10.5.2.54:8092/test/_design/dev_zlatram
2. results: {
    "id": "_design/dev_zlatram",
    "ok": true
}
3. verify via UI: Yep, it's gone
4. add zlatram but without quotes around the http string: curl -X PUT -H 'Content-Type: application/json' http://Administrator:password@10.5.2.54:8092/test/_design/dev_zlatram
5. results: {
    "id": "_design/dev_zlatram",
    "ok": true
}
6. verify via UI: Yep, it's back.




[MB-11612] mapreduce: terminator thread can be up to maxTaskDuration late Created: 02/Jul/14  Updated: 16/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Dave Rigby Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
From investigation of how long-running map functions are terminated I noticed that the code to terminate them (mapreduce_nif.cc:terminatorLoop) only checks for long-running tasks every maxTaskDuration seconds.

Therefore if a task is close to (but not exceeding) its timeout period, the terminatorLoop thread will sleep again for maxTaskDuration seconds, and will not detect the long-running task until almost 2x the timeout. Excerpt from the code: https://github.com/couchbase/couchdb/blob/master/src/mapreduce/mapreduce_nif.cc#L459

    while (!shutdownTerminator) {
        enif_mutex_lock(terminatorMutex);
        // due to truncation of second's fraction lets pretend we're one second before
        now = time(NULL) - 1;

        for (it = contexts.begin(); it != contexts.end(); ++it) {
            map_reduce_ctx_t *ctx = (*it).second;

            if (ctx->taskStartTime >= 0) {
                if (ctx->taskStartTime + maxTaskDuration < now) {
                    terminateTask(ctx);
                }
            }
        }

        enif_mutex_unlock(terminatorMutex);
        doSleep(maxTaskDuration * 1000);
    }


We should either check more frequently, or calculate how far away the "oldest" task is from hitting its deadline and sleep for that period.
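
One possible shape for the second option, sketched against the excerpt above (reusing contexts, terminatorMutex, maxTaskDuration and doSleep from that code; illustrative only, not the actual fix):

    while (!shutdownTerminator) {
        enif_mutex_lock(terminatorMutex);
        // due to truncation of second's fraction lets pretend we're one second before
        now = time(NULL) - 1;

        // Track the start time of the oldest still-running task so we can
        // sleep only until its deadline instead of a full maxTaskDuration.
        time_t oldestStart = -1;

        for (it = contexts.begin(); it != contexts.end(); ++it) {
            map_reduce_ctx_t *ctx = (*it).second;

            if (ctx->taskStartTime >= 0) {
                if (ctx->taskStartTime + maxTaskDuration < now) {
                    terminateTask(ctx);
                } else if (oldestStart < 0 || ctx->taskStartTime < oldestStart) {
                    oldestStart = ctx->taskStartTime;
                }
            }
        }

        enif_mutex_unlock(terminatorMutex);

        // Sleep until the oldest running task would hit its deadline (at least
        // 1 second), or for maxTaskDuration if nothing is currently running.
        time_t sleepSecs = maxTaskDuration;
        if (oldestStart >= 0) {
            sleepSecs = (oldestStart + maxTaskDuration) - now;
            if (sleepSecs < 1) {
                sleepSecs = 1;
            }
        }
        doSleep(sleepSecs * 1000);
    }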



 Comments   
Comment by Sriram Melkote [ 07/Jul/14 ]
Good catch, thanks! We'll fix this in 3.0.1, as we're limiting changes for 3.0 now that we've hit beta.




[MB-12196] [Windows] When I run cbworkloadgen.exe, I see a Warning message Created: 15/Sep/14  Updated: 15/Sep/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7
Build 1299

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Install the 3.0.1_1299 build.
Go to the bin directory in the installation directory and run cbworkloadgen.exe.
You will see the following warning:
WARNING:root:could not import snappy module. Compress/uncompress function will be skipped.

Expected behavior: The above warning should not appear





[MB-12194] [Windows] When you try to uninstall CB server it comes up with Installer wizard instead of uninstall Created: 15/Sep/14  Updated: 15/Sep/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7
Build: 3.0.1_1299

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Install the Windows 3.0.1_1299 build.
Try to uninstall the CB server.
You will see the CB InstallShield Installation Wizard, and then it comes up with a prompt about removing the selected application and all of its features.

Expected result: It would be nice to show an Uninstall Wizard instead of the confusing Installation Wizard.




[MB-12193] Docs should explicitly state that we don't support online downgrades in the installation guide Created: 15/Sep/14  Updated: 15/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Gokul Krishnan Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
In the installation guide, we should call out the fact that online downgrades (from 3.0 to 2.5.1) are not supported and that downgrades will require servers to be taken offline.

 Comments   
Comment by Ruth Harris [ 15/Sep/14 ]
In the 3.0 documentation:

Upgrading >
<note type="important">Online downgrades from 3.0 to 2.5.1 is not supported. Downgrades require that servers be taken offline.</note>

Should this be in the release notes too?
Comment by Matt Ingenthron [ 15/Sep/14 ]
"online" or "any"?




[MB-12191] forestdb needs an fdb_destroy() api to clean up a db Created: 15/Sep/14  Updated: 15/Sep/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: feature-backlog
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Sundar Sridharan Assignee: Sundar Sridharan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged
Is this a Regression?: Unknown

 Description   
forestdb does not have an option to clean up a database.
Manual deletion of the database files after fdb_close() and fdb_shutdown() is the current workaround.
An fdb_destroy() option needs to be added that will erase all forestdb files cleanly.
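
For reference, a minimal sketch of today's workaround (the handle type, header path and call signatures are assumptions based on the public forestdb header, and "mydb.fdb" is an illustrative path; fdb_destroy() would replace the manual std::remove step):

    #include <cstdio>                    // std::remove
    #include "libforestdb/forestdb.h"    // fdb_close, fdb_shutdown (assumed header path)

    // Assumed workaround: close the handle, shut forestdb down, then delete
    // the database file(s) by hand. With auto-compaction, revision files
    // (e.g. "mydb.fdb.1") may also exist and would need the same treatment.
    void cleanup_db(fdb_handle *db, const char *db_file)
    {
        fdb_close(db);          // release the open handle
        fdb_shutdown();         // stop background tasks and free globals
        std::remove(db_file);   // manual deletion -- the part fdb_destroy() should cover
    }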




[MB-12190] Typo in the output of couchbase-cli bucket-flush Created: 15/Sep/14  Updated: 15/Sep/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Patrick Varley Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: cli
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
There should be a space between the full stop and Do.

[patrick:~] 2 $ couchbase-cli bucket-flush -b Test -c localhost
Running this command will totally PURGE database data from disk.Do you really want to do it? (Yes/No)

Another typo when the command times out:

Running this command will totally PURGE database data from disk.Do you really want to do it? (Yes/No)TIMED OUT: command: bucket-flush: localhost:8091, most likely bucket is not flushed





[MB-12142] Rebalance Exit due to Bad Replicas Error has no support documentation Created: 05/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Rebalance exits with a Bad Replicas error, which can be caused by ns_server or couchbase-bucket. In such situations, rebalance fails on retry. To fix such an issue, we need manual intervention to diagnose the problem. For the support team we need to provide documentation as part of our release notes. Please define a process for this and then re-assign the bug to Ruth so it can be added to the release notes.

 Comments   
Comment by Chiyoung Seo [ 12/Sep/14 ]
Mike,

Please provide more details on bad replica issues in DCP and assign this back to the doc team.
Comment by Mike Wiederhold [ 12/Sep/14 ]
Bad replicas is an error message that means that replication streams could not be created, so there may be many reasons for it to appear. One reason is that some of the vbucket sequence numbers, which are maintained internally in Couchbase, are invalid. If this happens you will see a log message in the memcached logs that looks something like this:

(DCP Producer) some_dcp_stream_name (vb 0) Stream request failed because the snap start seqno (100) <= start seqno (101) <= snap end seqno (100) is required

In order for a DCP producer to accept a request for a DCP stream the following must be true.

snapshot start seqno <= start seqno <= snapshot end seqno

If the above condition is not true for a stream request then a customer should contact support so that we can resolve the issue using a script to "reset" the sequence numbers. I can provide this script at a later time, but it is worth noting that we do not expect this scenario to happen and have resolved all bugs we have seen related to this error.
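
As a small illustration of that precondition (the names below are illustrative, not the actual ep-engine symbols):

    #include <cstdint>

    // A stream request is acceptable only when the start seqno lies inside
    // the snapshot range; the log line above fails because 100 <= 101 <= 100
    // does not hold.
    static bool validStreamRequest(uint64_t snapStartSeqno,
                                   uint64_t startSeqno,
                                   uint64_t snapEndSeqno) {
        return snapStartSeqno <= startSeqno && startSeqno <= snapEndSeqno;
    }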
Comment by Ruth Harris [ 12/Sep/14 ]
Put it into the release notes (not in beta but for GA) for Known Issues MB-12142.
Is this the correct MB issue?




[MB-12170] Memory usage did not go down after flush Created: 10/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Wayne Siu Assignee: Gokul Krishnan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [info] OS Name : Microsoft Windows Server 2008 R2 Enterprise
[info] OS Version : 6.1.7601 Service Pack 1 Build 7601
[info] HW Platform : PowerEdge M420
[info] CB Version : 2.5.0-1059-rel-enterprise
[info] CB Uptime : 31 days, 10 hours, 3 minutes, 51 seconds
[info] Architecture : x64-based PC
[ok] Installed CPUs : 16
[ok] Installed RAM : 98259 MB
[warn] Server Quota : 81.42% of total RAM. Max recommended is 80.00%
        (Quota: 80000 MB, Total RAM: 98259 MB)
[ok] Erlang VM vsize : 546 MB
[ok] Memcached vsize : 142 MB
[ok] Swap used : 0.00%
[info] Erlang VM scheduler : swt low is not set

Issue Links:
Relates to
relates to MB-9992 Memory is not released after 'flush' Closed
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Original problem was reported by our customer.

Steps to reproduce in their setup:
- Setup 4 node cluster (probably does not matter) bucket with 3GB, Replication of 1

- The program writes 10MB binary objects from 3 threads in parallel, 50 items in each thread.
Run the program (sometimes it crashes, I do not know the reason); simply run it again.
At the end of the run, there is a difference of 500 MB between ep_kv_size and the sum of vb_active_itm_memory and vb_replica_itm_memory (this might depend heavily on the network speed; I am using just a 100Mbit connection to the server, on production we have a faster network of course).
- Do the flush; ep_kv_size retains the size of the difference even though the bucket is empty.
- Repeat this. On each run, the resident items percentage will go down.
- On the fourth or fifth run, it will throw a hard memory error after inserting only part of the 150 items.




 Comments   
Comment by Wayne Siu [ 10/Sep/14 ]
Raju,
Can you please assign?
Comment by Raju Suravarjjala [ 10/Sep/14 ]
Tony, can you see if you can reproduce this bug? Please note it is 2.5.1 Windows 64bit
Comment by Anil Kumar [ 10/Sep/14 ]
Just an FYI: we previously opened a similar issue on CentOS, but it was resolved as cannot reproduce.
Comment by Ian McCloy [ 11/Sep/14 ]
It's 2.5.0 not 2.5.1 on Windows 2008 64bit
Comment by Thuan Nguyen [ 11/Sep/14 ]
Followed the steps to reproduce from the description above.


I could not reproduce this bug after 6 flushes.
After each flush, mem use on both active and replica went down to zero.
Comment by Thuan Nguyen [ 11/Sep/14 ]
Using our loader, I could not reproduce this bug. I will use the customer's loader to test again.
Comment by Raju Suravarjjala [ 12/Sep/14 ]
Gokul: As we discussed, can you folks try to reproduce this bug?




[MB-12019] XDCR@next release - Replication Manager #1: barebone Created: 19/Aug/14  Updated: 12/Sep/14

Status: In Progress
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: techdebt-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: sprint1_xdcr
Remaining Estimate: 32h
Time Spent: Not Specified
Original Estimate: 32h

Epic Link: XDCR next release

 Description   
Build on top of the generic FeedManager with XDCR specifics:
1. interface with Distributed Metadata Service
2. interface with NS-server




[MB-12184] Enable logging to a remote server Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: James Mauss Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
It would be nice to be able to configure Couchbase Server to log events to a remote syslog-ng server or the like.




[MB-12020] XDCR@next release - REST Server Created: 19/Aug/14  Updated: 12/Sep/14

Status: In Progress
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: techdebt-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Yu Sui
Resolution: Unresolved Votes: 0
Labels: sprint1_xdcr
Remaining Estimate: 32h
Time Spent: Not Specified
Original Estimate: 32h

Epic Link: XDCR next release

 Description   
Build on top of the admin port:
1. request/response message format defined in protobuf
2. handlers for requests




[MB-11428] JSON versions and encodings supported by Couchbase Server need to be defined Created: 16/Jun/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.1, 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Matt Ingenthron Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: documentation
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
While JSON is a standard, there are multiple Unicode encodings, and the definition of how to interact with these encodings has changed over the course of time. Also, our dependencies (mochiweb, view engine's JSON) may not actually conform to these standards.

Couchbase Server needs to define and document what it supports with respect to JSON.

See:
http://tools.ietf.org/html/draft-ietf-json-rfc4627bis-10 and
http://tools.ietf.org/html/rfc4627


 Comments   
Comment by Cihan Biyikoglu [ 16/Jun/14 ]
making this a documentation item - we should make this public.
Comment by Chiyoung Seo [ 24/Jun/14 ]
Moving this to post 3.0 as datatype support is not included in 3.0.
Comment by Matt Ingenthron [ 11/Sep/14 ]
This isn't really datatype related, though it's not couchbase-bucket any more either. The view engine and other parts of the server use JSON; what do they expect as input? It's also sort of documentation, but not strictly documentation, since it should either be defined and validated, or determined based on what our dependencies actually do and verified. In either case, there's probably research and writing of unit tests involved, I think.
Comment by Chiyoung Seo [ 12/Sep/14 ]
Assigning to the PM team to figure out the appropriate steps to be taken.




[MB-11589] Sliding endseqno during initial index build or upr reading from disk snapshot results in longer stale=false query latency and index startup time Created: 28/Jun/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Sarath Lakshman Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks MB-11920 DCP based rebalance with views doesn'... Closed
Relates to
relates to MB-11919 3-5x increase in index size during re... Open
relates to MB-12081 Remove counting mutations introduced ... Resolved
relates to MB-11918 Latency of stale=update_after queries... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We have to fix this depending on the development cycles we have left for 3.0

 Comments   
Comment by Anil Kumar [ 17/Jul/14 ]
Triage - July 17

Currently investigating we will decide depending on the scope of changes needed.
Comment by Anil Kumar [ 30/Jul/14 ]
Triage : Anil, Wayne .. July 29th

Raising this issue to "Critical" this needs to be fixed by RC.
Comment by Sriram Melkote [ 31/Jul/14 ]
The issue is that we'll have to change the view dcp client to stream all 1024 vbuckets in parallel, or we'll need an enhancement in ep-engine to stop streaming at the point requested. Neither is a simple change - the reason it's in 3.0 is because Dipti had requested we try to optimize query performance. I'll leave it at Major as I don't want to commit to fixing this in RC and also, the product works with reasonable performance without this fix and so it's not a must have for RC.
Comment by Sriram Melkote [ 31/Jul/14 ]
Mike noted that even streaming all vbuckets in parallel (which was perhaps possible to do in 3.0) won't directly solve the issue as the backfills are scheduled one at a time. ep-engine could hold onto smaller snapshots but that's not something we can consider in 3.0 - so net effect is that we'll have to revisit this in 3.0.1 to design a proper solution.
Comment by Sriram Melkote [ 12/Aug/14 ]
Bringing back to 3.0 as this is the root cause of MB-11920 and MB-11918
Comment by Anil Kumar [ 13/Aug/14 ]
Deferring this to 3.0.1 since making this out of scope for 3.0.
Comment by Sarath Lakshman [ 05/Sep/14 ]
We need to file an EP-Engine dependency ticket to implement parallel streaming support without causing a sliding endseqno during on-disk snapshot backfill.




[MB-11840] 3.0 (Beta): Views periodically take 2 orders of magnitude longer to complete Created: 29/Jul/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Task Priority: Major
Reporter: Daniel Owen Assignee: Sriram Melkote
Resolution: Unresolved Votes: 0
Labels: customer, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Single Node running version 3.0.0 enterprise edition (build-918). Running on VirtualBox, assigned 8 vCPUs and 8GB memory. (host has 24 cores, 128GB RAM).

Attachments: File backup.tgz     File curlscript.sh     PNG File output-251.png     PNG File output-3.0.png    
Issue Links:
Dependency

 Description   
Hi Alk,

I can demonstrate the behaviour of views periodically taking 2 orders of magnitude longer with 3.0.
(Similar to the issue we were investigating relating to Stats Archiver).

See output-3.0; the x-axis is just a count of view queries. The test ran for ~53 minutes and completed 315408 views (~100 per second). The y-axis is view response time (in seconds).

In general the response time is < 0.01 of a second. However, occasionally (9 out of 315408 views) it takes > 0.1 seconds. This may be considered acceptable in the design of the server, but I wanted to get confirmation.

To replicate the test, run...

 while true; do ./curlscript.sh >> times2.txt 2>&1 ; done

I have provided curlscript.sh as an attached file.

The generated workload is test data from the same customer that hit the Stats Archiver issue.
Create a bucket named "oogway" and then do a cbtransfer of the unpacked backup.tgz file (see attached).

 Comments   
Comment by Aleksey Kondratenko [ 29/Jul/14 ]
What am I supposed to do with that?
Comment by Aleksey Kondratenko [ 29/Jul/14 ]
CC-ed some folks.
Comment by Sriram Melkote [ 29/Jul/14 ]
Daniel - can you please let me know what is plotted on X and Y axis, and the unit for them?
Comment by Daniel Owen [ 29/Jul/14 ]
Hi Sriram, I have updated the description to contain more information. I'm just currently running a similar experiment on 2.5.1 and will upload when I get the results.
Comment by Daniel Owen [ 29/Jul/14 ]
I have uploaded data for a similar experiment performed on 2.5.1 (build-1083).
Again for ~53 minutes, we performed a total of 308193 queries (~100 per second), and 15 out of 308193 took > 0.1 seconds to complete. In general the response time is < 0.01 seconds.

Note: Given the large CPU entitlement, we don't see any regular peak in view times due to the Stats Archiver (i.e., no regular spikes every 120 seconds); however, we are still seeing very large spikes in view query response times (they appear more frequently than in 3.0 beta).
Comment by Daniel Owen [ 29/Jul/14 ]
I suspect the 2.5.1 results are worse than 3.0 because 2.5.1 is using Erlang version R14B04 and therefore, as highlighted by Dave Rigby, may be impacted by the bug OTP-11163.

See https://gist.github.com/chewbranca/07d9a6eed3da7b490b47#scheduler-collapse
Comment by Sriram Melkote [ 29/Jul/14 ]
A few points I'd like to note:

(a) There is no specified guarantee on the time a query will take to respond; 300ms is not an unusual response time for the odd case.
(b) It appears to be not a regression based on the 2.5 and 3.0 comparison graph
(c) Query layer is heavily in Erlang and we are already rewriting it. So I'm targeting this outside of 3.0

I'm changing this back to a task as we need to investigate further to see if this behavior is indicative of an underlying bug before proceeding further.

EDIT: Removing comment about OTP-11163 not being a suspect because we're indeed seeing it in MB-11917




[MB-12180] Modularize the DCP code Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We need to modularize the DCP code so that we can write unit tests to ensure that we have fewer bugs and fewer regressions from future changes.




[MB-12179] Allow incremental pausable backfills Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Task Priority: Major
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Currently ep-engine requires that backfills run from start to end and cannot be paused. This creates a problem for a few reasons. First off, if a user has a large dataset then we will potentially need to backfill a large amount of data from disk and into memory. Without the ability to pause and resume a backfill we cannot control the memory overhead created from reading items off of disk. This can affect the resident ratio if the data that needs to be read by the backfill is large.

A second issue is that this means we can only run one backfill at a time (or two, if there are enough CPU cores) and all backfills must run serially. In the future we plan on allowing more DCP connections to be created to a server. If many connections require backfill, we may have some connections that do not receive data for an extended period of time because these connections are waiting for their backfills to be scheduled.




[MB-12182] XDCR@next release - unit test "asynchronize" mode of XmemNozzle Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: feature-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: sprint1_xdcr
Remaining Estimate: 16h
Time Spent: Not Specified
Original Estimate: 16h

Epic Link: XDCR next release




[MB-11989] XDCR next release Created: 18/Aug/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: feature-backlog
Fix Version/s: None
Security Level: Public

Type: Epic Priority: Major
Reporter: Xiaomei Zhang Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Epic Name: XDCR next release
Epic Status: To Do




[MB-12145] {DCP}:: After Rebalance ep_queue_size Stat gives incorrect info about persistence Created: 08/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 1208, 10.6.2.145-10.6.2.150

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.145-982014-1126-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.145-982014-1143-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.146-982014-1129-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.146-982014-1143-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.147-982014-1132-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.147-982014-1143-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.148-982014-1135-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.148-982014-1143-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.149-982014-1138-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.149-982014-1144-couch.tar.gz
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.150-982014-1141-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12145/10.6.2.150-982014-1144-couch.tar.gz
Is this a Regression?: Yes

 Description   

1. Create a 6-node cluster.
2. Create a default bucket with 10K items.
3. After ep_queue_size = 0, take a snapshot of all data using cbtransfer (for couchstore files).
4. Rebalance out 1 node.
5. After ep_queue_size = 0, sleep for 30 seconds, then take a snapshot of all data using cbtransfer (for couchstore files).

Comparing Step 5 against Step 3 shows an inconsistency in the expected keys, as we find some keys missing. We also do data verification using another client, which does not fail. Also, active and replica item counts are as expected. The issue is seen in the expected items in the couchstore files:

mike1651

 mike6340

 mike8616

 mike5380

 mike2691

 mike4740

 mike6432

 mike9418

 mike9769

 mike244

 mike7561

 mike5613

 mike6743

 mike2073

 mike1252

 mike4431

 mike9346

 mike4343

 mike9037

 mike6866

 mike2302

 mike3652

 mike7889

 mike2998

Note that on increasing the delay after we see ep_queue_size = 0, from 30 to 60 to 120 seconds, we still hit the issue where some keys are missing. After adjusting the delay to 240 seconds, we did not see the missing keys.

This is not a case of data loss. Only the stats (ep_queue_size = 0) are incorrect. I have verified cbtransfer functionality and it does not break during the test runs.

Test Case:: ./testrunner -i ~/run_tests/palm.ini -t rebalance.rebalanceout.RebalanceOutTests.rebalance_out_after_ops,nodes_out=1,replicas=1,items=10000,skip_cleanup=True

Also, with vbuckets=128 this problem does not reproduce, so please try it with 1024 vbuckets.

We have seen this issue in different places for failover+rebalance.



 Comments   
Comment by Ketaki Gangal [ 12/Sep/14 ]
Ran into the same issue with ./testrunner -i /tmp/rebal.ini active_resident_threshold=100,dgm_run=true,get-delays=True,get-cbcollect-info=True,eviction_policy=fullEviction,max_verify=100000 -t rebalance.rebalanceout.RebalanceOutTests.rebalance_out_after_ops,nodes_out=1,replicas=1,items=10000,GROUP=OUT

It uses the same verification method as above and fails due to the ep_queue_size stat:
1. Create cluster
3. After ep_queue_size =0, take snap-shot of all data using cbtransfer (for couchstore files)
4. Rebalance-out 1 Node
5. After ep_queue_size =0, sleep for 30 seconds, take snap-shot of all data using cbtransfer (for couchstore files)




[MB-12173] SSL certificate should allow importing certs besides server generated certs Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt: finish-start
has to be done before MB-12177 document SDK usage of CA and self-sig... Open
Triage: Untriaged
Is this a Regression?: Unknown

 Comments   
Comment by Matt Ingenthron [ 12/Sep/14 ]
Existing SDKs should be compatible with this, but importing the CA certs will need to be documented.




[MB-8872] a number of capi REST API endpoints are not secured Created: 19/Aug/13  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server, view-engine
Affects Version/s: 2.0, 2.1.0, 2.2.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The following APIs are apparently all unprotected.

[httpd_global_handlers]
/ = {couch_httpd_misc_handlers, handle_welcome_req, <<"Welcome">>}
_active_tasks = {couch_httpd_misc_handlers, handle_task_status_req}
_view_merge = {couch_httpd_view_merger, handle_req}
_set_view = {couch_set_view_http, handle_req}

[httpd_db_handlers]
_view_cleanup = {couch_httpd_db, handle_view_cleanup_req}
_compact = {couch_httpd_db, handle_compact_req}
_design = {couch_httpd_db, handle_design_req}
_changes = {couch_httpd_db, handle_changes_req}

[httpd_design_handlers]
_view = {couch_httpd_view, handle_view_req}
_info = {couch_httpd_db, handle_design_info_req}

At least _view above is overridden by capi layer.

I've just tried the _changes feed myself and it worked.


 Comments   
Comment by Aleksey Kondratenko [ 19/Aug/13 ]
CC-ed some stakeholders
Comment by Aleksey Kondratenko [ 10/Oct/13 ]
Should be considered for 3.0
Comment by Maria McDuff (Inactive) [ 19/May/14 ]
Alk,

yes, pls fix. required for 3.0 ssl.
Comment by Sriram Melkote [ 23/May/14 ]
Filipe, as it's not clear what the downstream effects are of securing these, request you to consider and fix this appropriately
Comment by Sriram Melkote [ 11/Jun/14 ]
Alk, do you mean that these endpoints should need authentication? Or are they bypassing SSL when it's enabled?
Comment by Aleksey Kondratenko [ 11/Jun/14 ]
They lack auth today, exposing user data.
Comment by Sriram Melkote [ 16/Jun/14 ]
I'd like to defer this to 3.0.1, as it has been this way for many releases and I don't want to put in non-bugfixes at this point in the release.




[MB-12175] Need a way to enforce SSL for admin and data access Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Don Pinto
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Today we allow both unencrypted and encrypted communication, and one can use firewalls to control which one stays available for communicating with Couchbase Server. It would be great to have a way to enforce secure communication through a switch and disable any unencrypted access, to make compliance with security standards easier.




[MB-12174] Clarification on SSL communication documentation for 3.0 Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown




[MB-12177] document SDK usage of CA and self-signed certs Created: 12/Sep/14  Updated: 12/Sep/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Matt Ingenthron Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt: finish-start
has to be done after MB-12173 SSL certificate should allow importin... Open

 Description   
To be done after Couchbase Server supports this.




[MB-10694] Eliminate cygwin requirement for testing on Windows Created: 31/Mar/14  Updated: 11/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: test-execution
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Trond Norbye Assignee: Tommie McAfee
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We have made great strides in eliminating the cygwin/mingw requirements from the main build for Couchbase Server. However, there are many parts of the environment which still are dependent on GNU make in particular, and on a Unix-like (cygwin) environment in general - voltron, the buildbot scripts, and testrunner being the most obvious. We hope to eliminate those over time as well, and this bug will track that effort.

Original description from Trond:

The script to start / stop the test is implemented in bash which is unavailable on our windows machines (after the move to cmake). Move to python?

 Comments   
Comment by Chris Hillery [ 02/Apr/14 ]
I'm lowering the priority of this one, as it is not going to happen immediately and is of less urgency than making the main product build work. I'll assign it to myself as it is a larger issue than just commit validation.
Comment by Don Pinto [ 02/Sep/14 ]
Chris,

Does this go away with the new windows build system?

Thanks,
Comment by Chris Hillery [ 03/Sep/14 ]
No, unfortunately to the best of my knowledge all the test scripts are still cygwin-based. We HAVE managed to eliminate cygwin dependencies from voltron and buildbot.
Comment by Chris Hillery [ 03/Sep/14 ]
Raju this was initially assigned to me for the build side. At this point those issues have been addressed. The only remaining dependencies are in testrunner. Assigning to you for reallocation.
Comment by Raju Suravarjjala [ 05/Sep/14 ]
Tony: Can you take a look at this? If this is no longer an issue, please close it
Comment by Thuan Nguyen [ 05/Sep/14 ]
Cygwin is used as an ssh server in the Windows VM to connect from the Jenkins slave to the Windows VM when testing.
It is used to copy files to/from the slave and the VM, and to delete/rename files in the Windows VM.
Comment by Chris Hillery [ 09/Sep/14 ]
The testrunner package still has numerous cygwin dependencies:

- Makefile invokes commands such as "tar", "echo", "rm"...

- All the actual test scripts are bash scripts

(I am assuming testrunner is still used for Windows; if those tests have been re-implemented for Windows in some other way, it seems to me it would make sense to eliminate testrunner/ entirely and use that other way for Linux and Mac testing as well.)

Also FYI it's not necessary or desirable to use an SSH server on Windows to connect it as a Jenkins slave.

Re-opening this bug. It's not hugely important, but it shouldn't be marked "Done" if it's not.
Comment by Chris Hillery [ 09/Sep/14 ]
This shouldn't be assigned to me; not sure why Tony did so. I have no knowledge at all of how the testrunner world works or even what it's trying to do. I know enough about cygwin to identify the dependency, that's all. I can offer technical assistance in converting things to CMake, python, or other cross-platform solutions, but I can't drive the effort myself.

Assigning back to Raju, I guess. Raju, the risk with the current situation is that we're forcing ourselves to test the Windows product in an environment that is substantively different than the environment that customers will run in. In reality it is probably not a large risk, but to me leaving it open with a Priority Minor makes the most sense. If you want to decide to close this as Won't Fix, you can do so.
Comment by Trond Norbye [ 10/Sep/14 ]
There is a Python implementation of "testrunner" in there that we've been using for testing on Windows so far. It seems to work as expected, but it needs to be verified on all platforms (and tests) so we don't break anything when it's replaced.

The bug is still not complete, because you still can't run all of the tests with "expected behavior" on Windows. If a test fails, the current test setup fails to copy the "cbcollectinfo" output from the running cluster because it tries to use ssh etc. (which you normally don't have on a Windows box). In the normal test scenario (run by developers on their laptops), _EVERYTHING_ is on the same physical machine, so using scp to copy files around isn't necessary.




[MB-12171] Typo missing space on point 4 couchbase data files Created: 11/Sep/14  Updated: 11/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 1.8.0, 2.0.1, 2.1.0, 2.2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Patrick Varley Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: documentation
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-data-files
http://docs.couchbase.com/couchbase-manual-2.1/#couchbase-data-files
http://docs.couchbase.com/couchbase-manual-2.0/#couchbase-data-files
http://docs.couchbase.com/couchbase-manual-1.8/#couchbase-data-files

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Point 4 needs a space between "and" and "monitor".

Start the service again andmonitor the “warmup” of the data.

 Comments   
Comment by Ruth Harris [ 11/Sep/14 ]
Fixed in 2.5. N/A in 3.0




[MB-7442] Information about SDK error-handling on Warmup Created: 18/Dec/12  Updated: 11/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Anonymous Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Hi Matt, can you get information from the team about SDK handling of this server error:

"To CouchBaser Server clients, ENGINE_TMPFAIL (0x0d) gets generated during warmup."

Is that converted into respective language error objects, or what happens?

 Comments   
Comment by Amy Kurtzman [ 23/Jun/14 ]
Matt, do you know if this issue is still outstanding?
Comment by Matt Ingenthron [ 11/Sep/14 ]
It pretty much is. This should be addressed in the Dev Guide I think.




[MB-9174] Smart Client version information available from cluster Created: 25/Sep/13  Updated: 11/Sep/14

Status: Open
Project: Couchbase Server
Component/s: clients, ns_server
Affects Version/s: 2.2.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Improvement Priority: Minor
Reporter: David Haikney Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
As part of the support process we typically capture logs using cbcollect_info. For any support issue involving the clients, we have to ask which version of the SDK is in use, which adds a delay to the process. It would be useful if the SDK could supply this information to the cluster as some form of signature as part of its initial connection. Then we would need a method for extracting this information from the cluster as part of the cbcollect_info process.

 Comments   
Comment by Michael Nitschinger [ 25/Sep/13 ]
The clients could do that by supplying an x-header in the streaming connection request.

BUT we want to move away from that, so I'm not sure it's straightforward (since there can be no state pushed from the client aside from connecting to something).
Comment by Matt Ingenthron [ 27/Nov/13 ]
Trond: do you think we can add something to authentication so client auth can be logged, including version?
Comment by Trond Norbye [ 21/Dec/13 ]
It would be better to use an explicit HELLO command that is coming in 3.0.
Comment by Matt Ingenthron [ 11/Sep/14 ]
We'd like to support this; with the move to carrier publication and questions on HELLO, I'll pass this to Anil for the moment. I'd be glad to get into a discussion about how we'd do this.




[MB-11393] cbstats doesn't give error messages Created: 11/Jun/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Anil Kumar Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: No

 Description   
cbstats should give an error message, for example if the bucket doesn't exist.

 Comments   
Comment by Bin Cui [ 17/Jun/14 ]
cbstats will be addressed by ep_engine team.
Comment by Mike Wiederhold [ 10/Sep/14 ]
We cannot tell if the username or password is incorrect based on the response from authentication. If this is the only request then I think this should be closed as won't fix. If there are other requests then please add them in the description.




[MB-9603] investigate why we need to sleep 1 sec on cluster leave Created: 19/Nov/13  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.1.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Major
Reporter: Artem Stemkovski Assignee: Artem Stemkovski
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
... and if possible get rid of this sleep




[MB-12169] Unexpected disk creates during graceful failover Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
4-node cluster with the beer-sample bucket plus 300k items. Workload is 50/50 gets/sets, but the sets are over the same 300k items constantly.

When I do a graceful failover of one node, I see a fair amount of disk creates even though no new data is being inserted.

If there is a reasonable explanation, great, but I am concerned that there may be something incorrect going on either with the identification of new data or the movement of vbuckets.

Logs are here:
https://s3.amazonaws.com/cb-customers/perry/diskcreates/collectinfo-2014-09-10T205907-ns_1%40ec2-54-193-230-57.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/perry/diskcreates/collectinfo-2014-09-10T205907-ns_1%40ec2-54-215-23-198.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/perry/diskcreates/collectinfo-2014-09-10T205907-ns_1%40ec2-54-215-29-139.us-west-1.compute.amazonaws.com.zip
https://s3.amazonaws.com/cb-customers/perry/diskcreates/collectinfo-2014-09-10T205907-ns_1%40ec2-54-215-40-174.us-west-1.compute.amazonaws.com.zip




[MB-12165] UI: Log - Collect Information. Upload options text boxes should be 'grayed out' when "Upload to couchbase" is not selected. Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Jim Walker Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: log, ui
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 6 CB server (1 node cluster, VirtualBox VM)
Client browsers all running on OSX 10.9.4

Triage: Untriaged
Operating System: MacOSX 64-bit
Is this a Regression?: Unknown

 Description   
Couchbase Server Version: 3.0.0 Enterprise Edition (build-1208)

When going to the log upload area of the UI I found that all text boxes in the Upload Options section are read-only without any visual indicator.

It took a bit of clicking and checking browser liveness to realise that this was because the "Upload to couchbase" check box was not checked.

The input boxes should be grayed out, or there should be some other visual indicator showing they're not usable.

* Tested with Chrome 37.0.2062.120
* Tested with Safari 7.0.6 (9537.78.2)




[MB-8524] vbucket move scheduling should avoid increasing count of vbuckets on old nodes in rebalance-in cases (was: Certain rebalances appear to move vbuckets onto and off of a node) Created: 26/Jun/13  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.6.4, 1.7 GA, 1.8.0, 2.0, 2.1.0, 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2013-06-26 at 4.42.10 PM.png    

 Description   
ALK:

The bug appears to be that we can schedule moves in a way that temporarily increases the number of vbuckets on some node(s), and that better move scheduling would avoid this. It matters for cases where people add more nodes when they're near their peak capacity, so it's undesirable for them to increase load (count of vbuckets) on any node during rebalance, even temporarily.

Original description below (by Perry):

-Setup cluster of 4 nodes
-Add two nodes and rebalance
-Wait a few minutes and stop the rebalance
-Restart the rebalance
-Wait a few minutes and stop it again
-Remove the same two nodes and rebalance
-Now notice that at least one of the nodes has an active vbucket count that increases before decreasing

Logs are at:
https://s3.amazonaws.com/customers.couchbase.com/perryrebalance/node1.zip
https://s3.amazonaws.com/customers.couchbase.com/perryrebalance/node2.zip
https://s3.amazonaws.com/customers.couchbase.com/perryrebalance/node3.zip
https://s3.amazonaws.com/customers.couchbase.com/perryrebalance/node4.zip
https://s3.amazonaws.com/customers.couchbase.com/perryrebalance/node5.zip
https://s3.amazonaws.com/customers.couchbase.com/perryrebalance/node6.zip
(logs have a few rebalance tests within them, last 3 relate to the above description)

Screenshot attached showing vbuckets going up and down

 Comments   
Comment by Maria McDuff (Inactive) [ 20/May/14 ]
Assigning to Iryna. If this is still happening in 3.0, pls assign to Alk.
Comment by Perry Krug [ 27/May/14 ]
I did a quick check and still see this happening in 3.0-743
Comment by Anil Kumar [ 04/Jun/14 ]
Triage - June 04 2014 Alk, Wayne, Parag, Anil
Comment by Aliaksey Artamonau [ 09/Jun/14 ]
I cannot access original logs and unfortunately wasn't able to reproduce it myself.
Comment by Perry Krug [ 10/Jun/14 ]
Aliaksey, I was just able to reproduce this on 3.0 build 797 with even simpler steps than what I wrote above:

-Setup 4-node cluster
-Load beer-sample bucket (likely not important which bucket you load)
-Add 2 nodes
-Rebalance

Observe that some of the original 4 nodes have their vbucket counts go up, then down, then back up and down again. I've seen this happen in a few different permutations, but this is the simplest.

Please let me know if you need a new set of logs or are able to reproduce yourself.

Comment by Aliaksey Artamonau [ 10/Jun/14 ]
In this scenario it's somewhat expected. After rebalance, each node's set of active vbuckets can be very different from what it used to be. The current move scheduling code doesn't try to ensure in any way that vbuckets are moved in only after some others are moved out. So it's possible for a node to receive new active vbuckets before (any) of its current active vbuckets are moved out. That's why you see ups and downs on the graph.
Comment by Perry Krug [ 10/Jun/14 ]
Well I'm not sure I agree that it's desired behavior. The 4 existing nodes "should" only lose vbuckets as the two new ones gain. This is a simplified scenario, but even in similar cases we've seen customers negatively impacted because the existing nodes may have already been close to their sizing limit and adding vbuckets to them can cause further performance and stability problems.

I agree that the current algorithm does not have any code for handling this, but that's precisely what the bug was opened (almost a year ago) to resolve.
Comment by Aleksey Kondratenko [ 10/Jun/14 ]
Updated bug description given clarified understanding and moved out of 3.0. It's valid improvement but far less important than some other post-3.0 work that we have.




[MB-11640] DCP Prioritization Created: 03/Jul/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, DCP
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-11642 Intra-replication falling far behind ... Reopened

 Description   
It would be a valuable design improvement to allow for a high/low priority on DCP streams similar to how we handle bucket IO priority.

Intra-cluster DCP should always be prioritized over anything else, to protect against single-node failure and especially false-positive autofailover.

Others such as XDCR and views should initially be low priority, with a future improvement to allow for end-user configuration if needed.
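
A minimal, hypothetical sketch of the kind of two-level scheduling this asks for is below; the names (stream_t, pick_next_stream) are illustrative only and do not come from the actual memcached/ep-engine code. Intra-cluster replication streams would register as high priority, XDCR/views as low priority, and the scheduler always drains high-priority streams first:

#include <stddef.h>

typedef enum { PRIO_HIGH = 0, PRIO_LOW = 1 } stream_prio_t;

typedef struct stream {
    struct stream *next;
    stream_prio_t prio;
    /* ... stream state ... */
} stream_t;

typedef struct {
    stream_t *ready[2];   /* one FIFO list per priority level */
} stream_sched_t;

/* Return the next stream allowed to fill its send buffer: any ready
   high-priority (intra-cluster) stream wins over low-priority ones. */
static stream_t *pick_next_stream(stream_sched_t *s) {
    for (int p = PRIO_HIGH; p <= PRIO_LOW; p++) {
        if (s->ready[p] != NULL) {
            stream_t *head = s->ready[p];
            s->ready[p] = head->next;  /* pop from the front of the FIFO */
            return head;
        }
    }
    return NULL;  /* nothing ready */
}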

 Comments   
Comment by Mike Wiederhold [ 10/Sep/14 ]
I discussed this issue with Trond since most of the work will need to be done in memcached. Resolving this issue will likely require some architectural changes and will need to be planned for a minor release.
Comment by Mike Wiederhold [ 10/Sep/14 ]
Assigning to Anil for planning since this is not a small change.




[MB-12168] Documentation: Clarification around server RAM quota best practice Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Brian Shumate Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The sizing[1] and RAM quota[2] documentation should be clearer about the specific best practice for the general server RAM quota: no greater than 80% of physical RAM per node on nodes with 16GB or more, and no greater than 60% on nodes with less than 16GB.

Emphasizing that the 20% or 40% remainder of RAM is required for the operating system, file system caches, and so on would be helpful as well.
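
A short worked example (illustrative arithmetic based on the rule above) would also help: a node with 64GB of physical RAM would have a recommended server RAM quota of at most 0.8 x 64 = 51.2GB, leaving roughly 12.8GB for the operating system and file system caches; a node with 8GB would be capped at 0.6 x 8 = 4.8GB, leaving roughly 3.2GB.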

Additionally, the RAM quota sub-section of the Memory quota section[3] reads as if it is abruptly cut off or otherwise incomplete:

--------
RAM quota

You will not be able to allocate all your machine RAM to the per_node_ram_quota as there may be other programs running on your machine.
--------

1. http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#couchbase-bestpractice-sizing
2. http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#ram-quotas
3. http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#memory-quota






[MB-12166] Linux: Warnings on install are poorly formatted and unlikely to be read by a user. Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Dave Rigby Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 6

Attachments: PNG File Screen Shot 2014-09-10 at 15.21.55.png    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
When installing the 3.0 RPM, we check for various OS settings and print warnings if they don't meet our recommendations.

This is a great idea in principle, but the actual output isn't very well presented, meaning users are (IMHO) likely to not spot the issues which are being raised.

I've attached a screenshot to show this exactly as displayed in the console, but the verbatim text is:

---cut ---
$ sudo rpm -Uvh couchbase-server-enterprise_centos6_x86_64_3.0.0-1209-rel.rpm
Preparing... ########################################### [100%]
Warning: Transparent hugepages may be used. To disable the usage
of transparent hugepages, set the kernel settings at runtime with
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Warning: Transparent hugepages may be used. To disable the usage
of transparent hugepages, set the kernel settings at runtime with
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
Warning: Swappiness is not 0.
You can set the swappiness at runtime with
sysctl vm.swappiness=0
Minimum RAM required : 4 GB
System RAM configured : 0.97 GB

Minimum number of processors required : 4 cores
Number of processors on the system : 1 cores

   1:couchbase-server ########################################### [100%]
Starting couchbase-server[ OK ]

You have successfully installed Couchbase Server.
Please browse to http://localhost.localdomain:8091/ to configure your server.
Please refer to http://couchbase.com for additional resources.

Please note that you have to update your firewall configuration to
allow connections to the following ports: 11211, 11210, 11209, 4369,
8091, 8092, 18091, 18092, 11214, 11215 and from 21100 to 21299.

By using this software you agree to the End User License Agreement.
See /opt/couchbase/LICENSE.txt.
$
---cut ---

A couple of observations:

1) Everything is run together, including informational things (Preparing, Installation successful) and things the user should act on (Warning: Swappiness, THP, firewall information).

2) It's not very clear how serious some of these messages are - is the fact I'm running with 1/4 of the minimum RAM just a minor thing, or a showstopper? Similarly with THP - Support have seen on many occasions that this can cause false-positive failovers, but we just casually say here:

"Warning: Transparent hugepages may be used. To disable the usage of transparent hugepages, set the kernel settings at runtime with echo never > /sys/kernel/mm/transparent_hugepage/enabled"


Suggestions:

1) Make the warnings more pronounced - e.g. prefix with "[WARNING]" and add some blank lines between things (an illustrative mock-up follows this list)

2) Make clearer why these things are listed - linking back to more detailed information in our install guide if necessary. For example: "THP may cause slowdown of the cluster manager and false-positive failovers. Couchbase recommends disabling it. See http://docs.couchbase.com/THP for more details."

3) For things like THP which we can actually fix, ask the user if they want them fixed - after all, we are already root if we are installing - e.g. "THP bad.... Would you like the system THP setting to be changed to the recommended value (madvise)? (y/n)"

4) For things we can't fix (low memory, low CPUs), make the user confirm their decision to continue - e.g. "CPUs below minimum. Couchbase recommends at least XXX for production systems. Please type "test system" to continue installation."
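
As an illustrative mock-up only (wording and layout here are hypothetical, not the actual installer output), suggestions 1 and 2 might render the swappiness warning as:

[WARNING] Swappiness is not 0 (recommended: 0).
          Non-zero swappiness can degrade performance under memory pressure.
          Set it at runtime with: sysctl vm.swappiness=0
          See the install guide for more details.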



 Comments   
Comment by David Haikney [ 10/Sep/14 ]
+1 from me - we can clearly improve the presentation here. I expect making the install interactive ("should I fix THP?") could be difficult. Are there existing precedents we can refer to here to help consistency?
Comment by Dave Rigby [ 10/Sep/14 ]
@DaveH: Admittedly I don't think they use RPM, but VMware guest tools springs to mind - they present the user a number of questions when installing - "do you want to automatically update kernel modules?", "do you want to use printer sharing", etc.

Admittedly they don't have a secondary config stage unlike us with our GUI, *but* if we are going to fix things like THP, swappiness, then we need to be root to do so (and so install-time is the only option).




[MB-12163] Memcached Closing connection due to read error: Unknown error Created: 10/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Couchbase Server
Component/s: memcached
Affects Version/s: 2.5.0, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Ian McCloy Assignee: Dave Rigby
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [info] OS Name : Microsoft Windows Server 2008 R2 Enterprise
[info] OS Version : 6.1.7601 Service Pack 1 Build 7601
[info] CB Version : 2.5.0-1059-rel-enterprise

Issue Links:
Dependency
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The error message "Closing connection due to read error: Unknown error" doesn't explain what the problem is. Unfortunately, on Windows we aren't translating the error code properly: we need to call FormatMessage(), not strerror().

Code At
http://src.couchbase.org/source/xref/2.5.0/memcached/daemon/memcached.c#5360
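
A minimal sketch (assumed code, not the actual memcached patch) of how the Windows branch could translate the error code with FormatMessage() instead of strerror(); the helper name and exact log format are illustrative:

#include <winsock2.h>
#include <windows.h>
#include <stdio.h>

/* Translate the last Winsock error into readable text instead of the
   generic "Unknown error" that strerror() produces for Windows codes. */
static void log_read_error(void) {
    DWORD err = WSAGetLastError();
    char msg[256];
    if (FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                       NULL, err, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                       msg, sizeof(msg), NULL) == 0) {
        snprintf(msg, sizeof(msg), "unknown error %lu", (unsigned long)err);
    }
    fprintf(stderr, "Closing connection due to read error: %s\n", msg);
}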




[MB-12149] [Windows] Cleanup unnecessary files that are part of the windows builder Created: 08/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7
Build 3.0.1-1261

Attachments: PNG File Screen Shot 2014-09-09 at 2.22.28 PM.png    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Install windows build 3.0.1-1261
As part of the installation you will see the following directories:

1. cmake -- Does this need to be there?
2. erts-5.10.4 is under the server directory and also under the lib directory, but some files are duplicated; please remove the duplicated files
3. licenses.tgz file -- This can be removed (I do not find this in Linux anymore)



 Comments   
Comment by Raju Suravarjjala [ 09/Sep/14 ]
I did a search on erts_mt and found 4 matches; it looks like there are duplicate files, 2 each of erts_MT.lib and erts_MTD.lib, in two different folders.
Comment by Sriram Melkote [ 09/Sep/14 ]
I can help with erts stuff (if removing one of them breaks anything, that is)




[MB-12157] Intrareplication falls behind OPs causing data loss situation Created: 09/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0.1, 3.0, 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thomas Anderson Assignee: Thomas Anderson
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 4 node cluster; 4 core nodes; beer-sample application run at 60Kops (50/50 ratio), nodes provisioned on RightScale EC2 x1.large images

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
The intra-replication queue grows to unacceptable limits, exposing potential data loss of multiple seconds of queued replication.
The problem is more pronounced on the RightScale-provisioned cluster, but can be seen on local physical clusters with a long enough test run (>20 min). Recovery requires stopping the input request queue.
Initial measurements of the Erlang process suggest that minor retries on scheduled network I/O eventually build up into a limit on the push of replication data. scheduler_wait appears to be the consuming element; the epoll_wait counter increases per measurement, as does the mean wait time, suggesting thrashing in the Erlang event scheduler. Various papers/presentations suggest that Erlang is sensitive to the balance of tasks (a mix of long and short events can cause throughput issues).

cbcollectinfo logs will be attached shortly

 Comments   
Comment by Aleksey Kondratenko [ 09/Sep/14 ]
Still don't have any evidence. Cannot own this ticket until evidence is provided.




[MB-12161] per-server UI does not refresh properly when adding a node Created: 09/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Admittedly quite minor, but a little annoying.

When you're looking at a single stat across all nodes of a cluster (i.e. active vbuckets):

-Add a new node to the cluster from another tab open to the UI
-Note that the currently open stats screen stops displaying graphs for the existing nodes and does not update that a new node has joined until you refresh the screen




[MB-11670] Rebuild whole project when header file changes Created: 08/Jul/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Minor
Reporter: Volker Mische Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
When you change a header file in the view-engine (couchdb project), the whole project should be rebuilt.

Currently, if you change a header file and don't clean up the project, you could end up with run-time errors like a badmatch on the #writer_acc record.

PS: I opened this as an MB bug and not as a CBD because this is valuable information about badmatch errors that should be public.

 Comments   
Comment by Chris Hillery [ 09/Jul/14 ]
This really has nothing to do with the build team, and as such it's perfectly appropriate for it to be an MB ticket.

I'm assigning it back to Volker for some more information. Can you give me a specific set of actions you can take that demonstrate this not happening? Is it to do with Erlang code, or C++?
Comment by Volker Mische [ 09/Jul/14 ]
Build Couchbase with a make.

Now edit a couchdb Erlang header file. For example edit couchdb/src/couch_set_view/include/couch_set_view.hrl and comment this block out (with leading `%`):

-record(set_view_params, {
    max_partitions = 0 :: non_neg_integer(),
    active_partitions = [] :: [partition_id()],
    passive_partitions = [] :: [partition_id()],
    use_replica_index = false :: boolean()
}).

When you do a "make" again, ns_server will complain about something missing, but couchdb won't as it doesn't rebuild at all.

Chris, I hope this information is good enough, if you need more, let me know.
Comment by Anil Kumar [ 30/Jul/14 ]
Triage : Anil, Wayne .. July 30th

Ceej/Volker - Please update the ticket.
Comment by Chris Hillery [ 30/Jul/14 ]
No update, working on beta issues.
Comment by Sriram Melkote [ 01/Aug/14 ]
Moving to 3.0.1 as I think it's probably too late to add this dependency detection for 3.0 build system
Comment by Volker Mische [ 09/Sep/14 ]
Raju, please see the history of this issue, it is intentionally set to "view-engine" and not "build".
Comment by Anil Kumar [ 09/Sep/14 ]
Volker - we see that it's related to the view-engine component, but the work needed is from the build side. Also, it's assigned to Ceej to make the changes you requested.
Comment by Chris Hillery [ 09/Sep/14 ]
IMHO, choosing the Component of the ticket based on who it is assigned to isn't productive. It's assigned to me mostly because I happen to know CMake pretty well and I'm happy to help out. But the ownership of the issue in question is the view-engine team, not the build team. If the Component field has any meaning at all, surely it should be to track the ownership?

To be fair, the compilation scripts (formerly Makefiles, now CMakeLists) are on the fuzzy edge between product dev and build team. If this had something to do with how the code was packaged, or there was a specific platform compilation requirement that needed to be addressed, then it would make sense for build to own the issue. In this case, though, I think code dependencies are on the dev side.

I don't really care what the Component is set to because AFAIK it's not actually used for anything. I do want to ensure that nobody believes that the transition to CMake also involved a transition of ownership of all CMake code to build team, though.




[MB-12160] setWithMeta() is able to update a locked remote key Created: 09/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: all, 3.0.0-1208

Attachments: Zip Archive 10.3.4.186-992014-168-diag.zip     Zip Archive 10.3.4.188-992014-1611-diag.zip    
Triage: Untriaged
Is this a Regression?: No

 Description   
A simple test to check if setWithMeta() refrains from updating a locked key-

Steps
--------
1. uni-xdcr on default bucket from .186 --> .188
2. create a key 'pymc1098' with "old_doc" on .186
3. sleep for 10 secs, it gets replicated to .188.
4. Now getAndLock() on 'pymc1098' on .188 for 20s
5. Meanwhile, update same key at .186
6. After 10s (the lock should not have expired yet; also see timestamps in the test log below), do a getMeta() at source and dest: they match.
Destination key contains "new_doc".


def test_replication_after_getAndLock_dest(self):
        src = MemcachedClient(host=self.src_master.ip, port=11210)
        dest = MemcachedClient(host=self.dest_master.ip, port=11210)
        self.log.info("Initial set = key:pymc1098, value=\"old_doc\" ")
        src.set('pymc1098', 0, 0, "old_doc")
       # wait for doc to replicate
        self.sleep(10)
       # apply lock on destination
        self.log.info("getAndLock at destination for 20s ...")
        dest.getl('pymc1098', 20, 0)
       # update source doc
        self.log.info("Updating 'pymc1098' @ source with value \"new_doc\"...")
        src.set('pymc1098', 0, 0, "new_doc")
        self.sleep(10)
        self.log.info("getMeta @ src: {}".format(src.getMeta('pymc1098')))
        self.log.info("getMeta @ dest: {}".format(dest.getMeta('pymc1098')))
        src_doc = src.get('pymc1098')
        dest_doc = dest.get('pymc1098')


2014-09-09 15:27:13 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] Initial set = key:pymc1098, value="old_doc"
2014-09-09 15:27:13 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 10 secs for doc to be replicated ...
2014-09-09 15:27:23 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] getAndLock at destination for 20s ...
2014-09-09 15:27:23 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] Updating 'pymc1098' @ source with value "new_doc"...
2014-09-09 15:27:23 | INFO | MainProcess | test_thread | [xdcrbasetests.sleep] sleep for 10 secs. ...
2014-09-09 15:27:33 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] getMeta @ src: (0, 0, 0, 2, 16849348715855509)
2014-09-09 15:27:33 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] getMeta @ dest: (0, 0, 0, 2, 16849348715855509)
2014-09-09 15:27:33 | INFO | MainProcess | test_thread | [uniXDCR.test_replication_after_getAndLock_dest] src_doc = (0, 16849348715855509, 'new_doc')
dest_doc =(0, 16849348715855509, 'new_doc')

Will attach cbcollect.

 Comments   
Comment by Aruna Piravi [ 09/Sep/14 ]
This causes inconsistency: the server by itself disallows a plain set on a locked key, but allows the set through setWithMeta.




[MB-12159] Memcached throws an irrelevant message while trying to update a locked key Created: 09/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Aruna Piravi Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.0-1208

Triage: Untriaged
Is this a Regression?: No

 Description   
A simple test to see if updates are possible on locked keys

def test_lock(self):
        src = MemcachedClient(host=self.src_master.ip, port=11210)
        # first set
        src.set('pymc1098', 0, 0, "old_doc")
        # apply lock
        src.getl('pymc1098', 30, 0)
        # update key
        src.set('pymc1098', 0, 0, "new_doc")

throws the following Memcached error -

  File "pytests/xdcr/uniXDCR.py", line 784, in test_lock
    src.set('pymc1098', 0, 0, "new_doc")
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 163, in set
    return self._mutate(memcacheConstants.CMD_SET, key, exp, flags, 0, val)
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 132, in _mutate
    cas)
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 128, in _doCmd
    return self._handleSingleResponse(opaque)
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 121, in _handleSingleResponse
    cmd, opaque, cas, keylen, extralen, data = self._handleKeyedResponse(myopaque)
  File "/Users/apiravi/Documents/testrunner/lib/mc_bin_client.py", line 117, in _handleKeyedResponse
    raise MemcachedError(errcode, rv)
MemcachedError: Memcached error #2 'Exists': Data exists for key for vbucket :0 to mc 10.3.4.186:11210






[MB-10469] Support Couchbase Server on SuSE linux platform Created: 14/Mar/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build, installer
Affects Version/s: feature-backlog
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Anil Kumar Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: SuSE linux platform

Issue Links:
Dependency
Duplicate

 Description   
Add support for SuSE Linux platform




[MB-11585] [windows] A query with stale=false never returns Created: 27/Jun/14  Updated: 09/Sep/14

Status: In Progress
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.2.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Tom Yeh Assignee: Ketaki Gangal
Resolution: Unresolved Votes: 0
Labels: viewquery, windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Not sure what really happens, but my Couchbase server never returns if I issue a query with stale=false. For example,

http://localhost:8092/default/_design/task/_view/by_project?stale=false&key=%22GgmBVrB9CGakdeHNnBMXZyms%22

Also, CPU usage is more than 95%.

It returns immediately if I don't specify stale=false.

It worked fine before, but I'm not sure what happened. Did the database get corrupted? Is there anything I can do?

It is a development environment, so the data is small -- only about 120 docs (and the query should return only about 10 docs).

NOTE: the output of cbcollect_info is uploaded to https://s3.amazonaws.com/customers.couchbase.com/zk


 Comments   
Comment by Sriram Melkote [ 07/Jul/14 ]
Nimish, can you please look at the cbcollect and see if you can analyze the reason the query did not return?
Comment by Nimish Gupta [ 09/Jul/14 ]
From the logs, it looks like a 200 OK response header was sent back to the client:

[couchdb:info,2014-06-27T21:43:49.754,ns_1@127.0.0.1:<0.6397.0>:couch_log:info:39]127.0.0.1 - - GET /default/_design/task/_view/by_project?stale=false&key=%22GgmBVrB9CGakdeHNnBMXZyms%22 200

From the logs, we can't figure out whether Couchbase didn't send the response body. On Windows, we have an issue of indexing getting stuck (https://www.couchbase.com/issues/browse/MB-11385), but I am not sure from the logs whether that bug is the root cause of this issue.

Comment by Sriram Melkote [ 15/Jul/14 ]
Waiting for 3.0 system testing to see if we can reproduce this locally
Comment by Sriram Melkote [ 22/Jul/14 ]
Ketaki, can we please look out for this issue in 3.0 windows system tests? Specifically, is there a test that will fail if a single query does not respond among many?
Comment by Ketaki Gangal [ 09/Sep/14 ]
Will check on this w/ Windows functional and system tests.

We have not yet started system testing, but are currently running the functional test cycle for validation.




[MB-12112] Improve building of Erlang OTP apps Created: 02/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build, view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Improvement Priority: Major
Reporter: Volker Mische Assignee: Volker Mische
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Streamline the CMake build of Erlang apps that follow the OTP rules.

 Comments   
Comment by Volker Mische [ 09/Sep/14 ]
I'm actively working on this as it is needed for building geocouch in a nicer way; hence it's for 3.0.1.




[MB-11919] 3-5x increase in index size during rebalance with views Created: 09/Aug/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630 (24 vCPU)
Memory = 64 GB
Disk = 2 x SSD

Attachments: PNG File couch_views_actual_disk_size-reb_in.png     PNG File couch_views_actual_disk_size-reb_out.png     PNG File couch_views_actual_disk_size-reb_swap.png    
Issue Links:
Relates to
relates to MB-11918 Latency of stale=update_after queries... Closed
relates to MB-11589 Sliding endseqno during initial index... Open
relates to MB-11920 DCP based rebalance with views doesn'... Closed
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: Build 1121, reb-swap: http://ci.sc.couchbase.com/job/leto/471/artifact/
Build 1121, reb-out: http://ci.sc.couchbase.com/job/leto/470/artifact/
Build 1121, reb-in: http://ci.sc.couchbase.com/job/leto/469/artifact/
Is this a Regression?: Yes

 Description   
Scenario is described in MB-11918.

Rebalance-out: 37GB peak in 2.5.x and 98GB peak in 3.0
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_251-1083_1f4_rebalance&snapshot=leto_ssd_300-1121_19f_rebalance

Rebalance-in: 25GB peak in 2.5.x and 111GB peak in 3.0
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_251-1083_afc_rebalance&snapshot=leto_ssd_300-1121_99c_rebalance

Rebalance-swap: 25GB peak in 2.5.x and 140GB peak in 3.0
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_251-1083_515_rebalance&snapshot=leto_ssd_300-1121_98e_rebalance

 Comments   
Comment by Pavel Paulau [ 09/Aug/14 ]
Please dispatch.
Comment by Nimish Gupta [ 11/Aug/14 ]
Hi Pavel, due to log rotation, the logs are incomplete. Is there any way to get the logs for the complete test?
Comment by Pavel Paulau [ 11/Aug/14 ]
I don't think that I can help with that.
Comment by Sriram Melkote [ 12/Aug/14 ]
Pinged ns_server team to see if compactions are being scheduled as expected
Comment by Anil Kumar [ 12/Aug/14 ]
Triage - Upgrading to 3.0 Blocker
Comment by Nimish Gupta [ 13/Aug/14 ]
This looks to be a duplicate of MB-11589. Due to MB-11589, we get more data during indexing. During rebalance, compaction for views is done less frequently (a deliberate feature of rebalance), which increases the index file size.
 
Comment by Sriram Melkote [ 13/Aug/14 ]
Requested Nimish to add logging to confirm the above explanation numerically
Comment by Pavel Paulau [ 13/Aug/14 ]
Will re-run the test with tuned logging.
Comment by Pavel Paulau [ 13/Aug/14 ]
Oops, missed your last comment.

It makes more sense to re-test with that change.
Comment by Sriram Melkote [ 13/Aug/14 ]
Alk - assuming we can't fix the root cause for 3.0, is it worthwhile to profile maximum size and slowdown we'll see if we run view compaction more often during rebalance?
Comment by Sarath Lakshman [ 13/Aug/14 ]
Today we fixed one bug that could lead to duplicate processing of the same data in certain cases. Not sure if that is causing the larger index size during rebalance.

We have merged this change.
http://review.couchbase.org/#/c/40574/

Pavel, can we retest with a build containing the above change?
Comment by Pavel Paulau [ 14/Aug/14 ]
That change didn't help much with rebalance.

Do you still plan additional logging?
Comment by Nimish Gupta [ 14/Aug/14 ]
I have added the additional logging (http://review.couchbase.org/#/c/40617/). Pavel, please run the test with log rotation disabled.
Comment by Pavel Paulau [ 15/Aug/14 ]
Complete logs with "processed x more seqs than expected"

http://ci.sc.couchbase.com/job/leto/490/artifact/
Comment by Sriram Melkote [ 15/Aug/14 ]
I see about 2.5 million extra items. I'm not sure if this can alone explain the compaction blow up. Need to debug further.
Comment by Sarath Lakshman [ 15/Aug/14 ]
Pavel, do we know the total number of items in the index?
Comment by Pavel Paulau [ 15/Aug/14 ]
In this particular case the total number of items in the index should be equal to the number of documents, which is 100M.
Comment by Volker Mische [ 18/Aug/14 ]
Sarath has 2 blockers assigned, I've none, hence taking this one.
Comment by Volker Mische [ 18/Aug/14 ]
Pavel, are the logs of the 2.5 run it is compared to available?
Comment by Pavel Paulau [ 18/Aug/14 ]
Reb-in: http://ci.sc.couchbase.com/job/leto/276/artifact/
Reb-out: http://ci.sc.couchbase.com/job/leto/140/artifact/
Comment by Anil Kumar [ 18/Aug/14 ]
Triage - Not blocking 3.0 RC1
Comment by Volker Mische [ 18/Aug/14 ]
After looking at the cbmonitor graphs I would say the reason is that the view engine got faster with processing data. During rebalance new data is added to the indexes on the target nodes, while still having the same data on the source nodes.

The graph "[bucket-1] couch_views_data_size" [1] shows the amount of data the indexes hold without the data structure overhead (and without fragmentation). It shows that in 3.0 the amount of data often increased by 15% and peaks at 25%.

The graph "[bucket-1] couch_total_disk_size" [2] shows the total disk size the indexes use. In 3.0 the size is around 50% bigger and peaks at 100% bigger before compaction.

This means that fragmentation increased in 3.0. Graph "[bucket-1] couch_views_fragmentation" [3] shows that this is indeed the case. The large increase in fragmentation is probably due to getting items through DCP rather than directly from the couchstore database files. When items are stored with couchstore, they are stored in batches (that's just a guess, but I would be surprised if they weren't). So the batches in 3.0 are smaller than in 2.5.

Below are the stats when you grep the logs for the rebalance-in run for 2.5 and 3.0. You can see that in 2.5 the batch is not often below 60 items, in 3.0 it's way more often the case (15 times more often). The "Count" means that the indexer in 3.0 was run twice as often, i.e. twice as many headers. And headers are always aligned to 4k block boundaries, so additional space is wasted.


```
$ grep 'Inserted KVs' 2.5/cbcollect_info_ns_1@172.23.100.*/ns_server.couchdb.log|cut -d ' ' -f 4|perl -MStatistics::Histogram -e '@data = <>; chomp @data; print get_histogram(\@data, 15);'

Count: 4454
Range: 10.000 - 6090660.000; Mean: 3024.630; Median: 708.500; Stddev: 92239.270
Percentiles: 90th: 3056.000; 95th: 4439.000; 99th: 9592.000
  10.000 - 25.563: 19 #
  25.563 - 63.145: 93 ####
  63.145 - 153.898: 453 ####################
 153.898 - 373.052: 852 ######################################
 373.052 - 902.268: 1136 ###################################################
 902.268 - 2180.230: 1174 #####################################################
2180.230 - 5266.277: 570 ##########################
5266.277 - 12718.527: 130 ######
12718.527 - 30714.369: 19 #
30714.369 - 74171.089: 3 |
74171.089 - 179111.253: 1 |
179111.253 - 432522.869: 2 |
432522.869 - 1044466.335: 1 |
1044466.335 - 6090660.000: 1 |

$ grep 'Inserted KVs' 3.0/cbcollect_info_ns_1@172.23.100.*/ns_server.couchdb.log|cut -d ' ' -f 4|perl -MStatistics::Histogram -e '@data = <>; chomp @data; print get_histogram(\@data, 15);'

Count: 8220
Range: 0.000 - 764463.000; Mean: 368.726; Median: 107.000; Stddev: 8451.780
Percentiles: 90th: 740.000; 95th: 1085.000; 99th: 1846.000
   0.000 - 3.712: 233 #######
   3.712 - 10.100: 368 ###########
  10.100 - 25.151: 904 ############################
  25.151 - 60.610: 1487 ##############################################
  60.610 - 144.146: 1711 #####################################################
 144.146 - 340.949: 1533 ###############################################
 340.949 - 804.593: 1240 ######################################
 804.593 - 1896.890: 657 ####################
1896.890 - 4470.222: 69 ##
4470.222 - 10532.710: 8 |
10532.710 - 58463.418: 1 |
58463.418 - 764463.000: 1 |
```

[1]: cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_251-1083_afc_rebalance&snapshot=leto_ssd_300-1121_99c_rebalance#leto_ssd_251-1083_afc_rebalanceleto_ssd_300-1121_99c_rebalanceleto_ssd_251-1083_afcleto_ssd_300-1121_99cbucket-1couch_views_data_size
[2]: cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_251-1083_afc_rebalance&snapshot=leto_ssd_300-1121_99c_rebalance#leto_ssd_251-1083_afc_rebalanceleto_ssd_300-1121_99c_rebalanceleto_ssd_251-1083_afcleto_ssd_300-1121_99cbucket-1couch_views_data_size
[3]: cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_251-1083_afc_rebalance&snapshot=leto_ssd_300-1121_99c_rebalance#leto_ssd_251-1083_afc_rebalanceleto_ssd_300-1121_99c_rebalanceleto_ssd_251-1083_afcleto_ssd_300-1121_99cbucket-1couch_views_data_size
Comment by Sarath Lakshman [ 19/Aug/14 ]
Great observation Volker. I think your findings are correct. May be I can add few lines to support what you have stated.

Why batch sizes are smaller in 3.0:
In 2.5, we used to read from disk. EP-Engine does batching before inserting items into the on-disk couchstore. Since it is a bulk write operation, more items end up written in a short time. Since the view engine reads from disk, it gets approximately the same batch size as EP-Engine.

As per your histogram, in 3.0 the number of items per updater run is smaller and hence fragmentation is higher. We need to check whether many updater runs happen during rebalance. My understanding is that only one updater run should happen; one updater run with many checkpoints within it should be the ideal case.

We can compare number of updater runs used for updating index for same data in 2.5 and 3.0.

Another note on this change, http://review.couchbase.org/#/c/40617/: we should either revert or fix it (please see my last comment).
Comment by Volker Mische [ 19/Aug/14 ]
I had an idea about throttling the requests. I'll try it with a toy build. Here's the change that will be included in the toy build: http://review.couchbase.org/40729
Comment by Raju Suravarjjala [ 19/Aug/14 ]
Triage: This is a major change and is being pushed out to 3.0.1
Comment by Volker Mische [ 20/Aug/14 ]
Pavel did a rebalance-in run [1] with a build with change 40729 [2] applied. It should have reduced the fragmentation, but it didn't really help much [3]; hence the total disk size is almost the same [4].

I'll have a closer look at the logs once I've downloaded them.

[1]: http://ci.sc.couchbase.com/job/leto-dev/27/
[2]: http://review.couchbase.org/40729
[3]: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1121_99c_rebalance&snapshot=leto_ssd_300-702-toy_81f_rebalance#leto_ssd_300-1121_99c_rebalanceleto_ssd_300-702-toy_81f_rebalanceleto_ssd_300-1121_99cleto_ssd_300-702-toy_81fbucket-1couch_views_fragmentation
[4]: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1121_99c_rebalance&snapshot=leto_ssd_300-702-toy_81f_rebalance#leto_ssd_300-1121_99c_rebalanceleto_ssd_300-702-toy_81f_rebalanceleto_ssd_300-1121_99cleto_ssd_300-702-toy_81fbucket-1couch_views_actual_disk_size
Comment by Volker Mische [ 20/Aug/14 ]
After having a look at the logs I can't really make sense of them (other than that the updater is triggered even more often with the toy build). This means that there doesn't seem to be a quick fix and I suggest going back to the drawing board for a proper solution.




[MB-11353] Index-compaction failed due to XDCR mutations Created: 09/Jun/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Jim Walker Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This bug was encountered on customer site.

A XDCR "target" bucket had a number of views enabled. During the initial-load where XDCR copied the bulk of the data-set, on one node in the cluster compaction failed due to the high-rate of mutations XDCR was driving. This eventually caused out-of-disk problems and related errors.

We've had to work around the problem by carefully staggering the XDCR initial load and then the view indexing.

The issue is that indexing, compaction and XDCR should not be able to race and collide, triggering a node outage, which is what was observed in the field.




[MB-12155] View query and index compaction failing on 1 node with error view_undefined Created: 09/Sep/14  Updated: 09/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.5.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Ian McCloy Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
The customer upgraded their 6-node cluster from 2.2 to 2.5.1, running on Microsoft Windows Server 2008 R2 Enterprise, and one of their views stopped working.

It appears the indexing and index compaction stopped working on 1 node out of the 6. This appeared to only affect 1 design document.

Snippets from the problem node:

[couchdb:error,2014-09-08T17:20:31.840,ns_1@HOST:<0.23288.321>:couch_log:error:42]Uncaught error in HTTP request: {throw,view_undefined}

Stacktrace: [{couch_set_view,get_group,3},
             {couch_set_view,get_map_view,4},
             {couch_view_merger,get_set_view,5},
             {couch_view_merger,simple_set_view_query,3},
             {couch_httpd,handle_request,6},
             {mochiweb_http,headers,5},
             {proc_lib,init_p_do_apply,3}]
[couchdb:info,2014-09-08T17:20:31.840,ns_1@HOST:<0.23288.321>:couch_log:info:39]10.7.43.229 - - POST /_view_merge/?stale=false 500

=====

[ns_server:warn,2014-09-08T17:25:10.506,ns_1@HOST:<0.14357.327>:compaction_daemon:do_chain_compactors:725]Compactor for view `Bucket/_design/DDOC/main` (pid [{type,view},
                                                {important,true},
                                                {name,
                                                  <<"Bucket/_design/DDoc/main">>},
                                                {fa,
                                                  {#Fun<compaction_daemon.16.22390493>,
                                                  [<<"Bucket">>,
                                                    <<"_design/DDoc">>,main,
                                                    {config,
                                                    {30,18446744073709551616},
                                                    {30,18446744073709551616},
                                                    undefined,false,false,
                                                    {daemon_config,30,
                                                      131072}},
                                                    false,
                                                    {[{type,bucket}]}]}}]) terminated unexpectedly: {error,
                                                                                                    view_undefined}
[ns_server:warn,2014-09-08T17:25:10.506,ns_1@HOST:<0.14267.327>:compaction_daemon:do_chain_compactors:730]Compactor for view `Bucket/_design/DDoc` (pid [{type,view},
                                            {name,<<"Bucket/_design/DDoc">>},
                                            {important,false},
                                            {fa,
                                            {#Fun<compaction_daemon.20.107749383>,
                                              [<<"Bucket">>,<<"_design/DDoc">>,
                                              {config,
                                                {30,18446744073709551616},
                                                {30,18446744073709551616},
                                                undefined,false,false,
                                                {daemon_config,30,131072}},
                                              false,
                                              {[{type,bucket}]}]}}]) terminated unexpectedly (ignoring this): {error,
                                                                                                                view_undefined}
[ns_server:debug,2014-09-08T17:25:10.506,ns_1@HOST:compaction_daemon<0.480.0>:compaction_daemon:handle_info:505]Finished compaction iteration.




[MB-12152] phonehome ph.couchbase.net is not encrypted on https connections with 3.0 Created: 08/Sep/14  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Ian McCloy
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Comments   
Comment by Aleksey Kondratenko [ 08/Sep/14 ]
https needs to be supported on the other end first




[MB-12150] [Windows] Cleanup unnecessary files that are part of the windows installer Created: 08/Sep/14  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: installer
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Raju Suravarjjala Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7
Build 3.0.1-1261

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Install windows build 3.0.1-1261
As part of the installation you will see 2 files: couchbase_console.html and membase_console.html. You do not need membase_console.html; please remove it.




[MB-12148] {UI}: Cluster with Only memcached bucket allows graceful failover of a node Created: 08/Sep/14  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Parag Agarwal Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: all

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
1. Create 2 node cluster
2. Create a memcached bucket
3. Try Graceful failover of node

Step 3 shows us that graceful failover is possible for a memcached bucket. However, it does not make sense to have graceful failover when there is only a memcached bucket, so the option to do graceful failover should not be displayed in the UI.




[MB-11222] major_faults stat is incorrect on OSX Mavericks Created: 27/May/14  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Minor
Reporter: Artem Stemkovski Assignee: Trond Norbye
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: MacOSX 64-bit
Is this a Regression?: Unknown

 Description   
major_faults: [16765952, 8371200, -8371200, -16791552, 8380416, -16757760, 25168896, -8411136, 50320384, -58692608,…]


 Comments   
Comment by Aleksey Kondratenko [ 19/Jun/14 ]
Cannot do anything about it. We need an OSX expert here.
Comment by Mike Wiederhold [ 01/Jul/14 ]
In sigar these stats are marked as not implemented, and it appears that OS X does not provide information on hard and soft faults. The library used by sigar to get this information is libproc, and it is poorly documented. Below is what we can get from libproc for process-related stats.

uint64_t pti_virtual_size; /* virtual memory size (bytes) */
uint64_t pti_resident_size; /* resident memory size (bytes) */
uint64_t pti_total_user; /* total time */
uint64_t pti_total_system;
uint64_t pti_threads_user; /* existing threads only */
uint64_t pti_threads_system;
int32_t pti_policy; /* default policy for new threads */
int32_t pti_faults; /* number of page faults */
int32_t pti_pageins; /* number of actual pageins */
int32_t pti_cow_faults; /* number of copy-on-write faults */
int32_t pti_messages_sent; /* number of messages sent */
int32_t pti_messages_received; /* number of messages received */
int32_t pti_syscalls_mach; /* number of mach system calls */
int32_t pti_syscalls_unix; /* number of unix system calls */
int32_t pti_csw; /* number of context switches */
int32_t pti_threadnum; /* number of threads in the task */
int32_t pti_numrunning; /* number of running threads */
int32_t pti_priority; /* task priority*/

I think the best thing to do here would probably be to exclude these stats for OS X builds. Otherwise we will need to try to find a way to estimate them.
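
For reference, a minimal sketch (assuming the libproc interface listed above) of reading the per-process fault counters via proc_pidinfo(); note that libproc only exposes the combined pti_faults count, with no major/minor split, which is why any major_faults value derived from it would be an estimate at best:

#include <libproc.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    struct proc_taskinfo ti;
    /* PROC_PIDTASKINFO fills a proc_taskinfo with the fields listed above. */
    int rc = proc_pidinfo(getpid(), PROC_PIDTASKINFO, 0, &ti, sizeof(ti));
    if (rc < (int)sizeof(ti)) {
        perror("proc_pidinfo");
        return 1;
    }
    printf("faults: %d pageins: %d cow_faults: %d\n",
           ti.pti_faults, ti.pti_pageins, ti.pti_cow_faults);
    return 0;
}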
Comment by Anil Kumar [ 17/Jul/14 ]
Triage - Alk, Parag, Tony, Wayne .. July 17th
Comment by Sriram Melkote [ 13/Aug/14 ]
Major faults are definitely possible, as top shows them. Minor faults also appear to be possible (see sc_usage), but as on Windows, it appears a kernel trace facility is used to gather these statistics. So I guess we should fix this bug in 3.0.1 and at least show top-equivalent statistics. I wish we had a platform support engineer.
Comment by Don Pinto [ 02/Sep/14 ]
Hi Trond,

Can you help out on this?

Thanks,
Comment by Trond Norbye [ 02/Sep/14 ]
Mac OS X isn't a platform on which we expect a lot of customers to run a real cluster. This stat is non-essential for the cluster's ability to function as expected.




[MB-12038] build go programs in ns_server instead of shipping pre-built binaries (was: [windows] F-Secure flagging binary) Created: 21/Aug/14  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Major
Reporter: Sriram Melkote Assignee: Wayne Siu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
A popular virus scanner, F-Secure, is deleting a file installed by us, generate_cert.exe.

We need to analyze why this is happening.

See: https://groups.google.com/forum/#!topic/couchbase/E3QvNolCknQ


 Comments   
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
What do you want me to do? Those crappy antiviruses are known to have false positives occasionally.
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
And btw our build machines _are_ clean.
Comment by Sriram Melkote [ 27/Aug/14 ]
You could compile it with the latest Go on Windows, and the resulting binary does not trip any virus scanners.
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
I don't compile anything on Windows. These binaries are built on a GNU/Linux machine using the Go cross-compiler. I can rebuild using a more recent Go, but it's unclear if that is really going to help.
Comment by Aleksey Kondratenko [ 27/Aug/14 ]
And surely, if the build folks are ready to build our Go stuff in ns_server, I'm OK if some fan of CMake makes it happen.
Comment by Sriram Melkote [ 27/Aug/14