[MB-11846] Compiling breakdancer test case exceeds available memory Created: 29/Jul/14  Updated: 25/Aug/14  Due: 30/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0.1, 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Chris Hillery Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Triage: Untriaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Unknown

 Description   
1. With memcached change 4bb252a2a7d9a369c80f8db71b3b5dc1c9f47eb9, cc1 on ubuntu-1204 quickly uses up 100% of the available memory (4GB RAM, 512MB swap) and crashes with an internal error.

2. Without Trond's change, cc1 compiles fine and never uses more than 12% of memory on the same hardware.

 Comments   
Comment by Chris Hillery [ 29/Jul/14 ]
Ok, weird fact - on further investigation, it appears that this is NOT happening on the production build server, which is an identically-configured VM. It only appears to be happening on the commit validation server ci03. I'm going to temporarily disable that machine so the next make-simple-github-tap test runs on a different ci server and see if it is unique to ci03. If it is I will lower the priority of the bug. I'd still appreciate some help in understanding what's going on either way.
Comment by Trond Norbye [ 30/Jul/14 ]
Please verify that the two builders have the same patch level so that we're comparing apples with apples.

It does bring up another interesting topic, though: should our builders just use the compiler provided with the installation, or should we have a reference compiler that we use to build our code? It does seem like a bad idea to have to support a ton of different compiler revisions (including the fact that they support different levels of C++11 that we have to work around).
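For example, a quick patch-level comparison might capture the toolchain and OS versions on each builder and diff the outputs (a sketch; assumes standard Ubuntu tooling):

    # run on each builder, then diff the results
    lsb_release -d             # distribution release
    uname -r                   # kernel version
    gcc --version | head -1    # compiler version
    dpkg -l 'gcc*' | grep ^ii  # installed gcc packages and their patch levels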
Comment by Chris Hillery [ 31/Jul/14 ]
This is now occurring on other CI build servers in other tests - http://www.couchbase.com/issues/browse/CBD-1423

I am bumping this back to Test Blocker and I will revert the change as a work-around for now.
Comment by Chris Hillery [ 31/Jul/14 ]
Partial revert committed to memcached master: http://review.couchbase.org/#/c/40152/ and 3.0: http://review.couchbase.org/#/c/40153/
Comment by Trond Norbye [ 01/Aug/14 ]
That review in memcached should NEVER have been pushed through. Its subject line is too long
Comment by Chris Hillery [ 01/Aug/14 ]
If there's a documented standard out there for commit messages, my apologies; it was never revealed to me.
Comment by Trond Norbye [ 01/Aug/14 ]
When it doesn't fit within a terminal window there is a problem. It is way better to use multiple lines.

In addition, I'm not happy with the fix. Instead of deleting the line, it should have checked for an environment variable so that people could explicitly disable it. This is why we have review cycles.
Comment by Chris Hillery [ 01/Aug/14 ]
I don't think I want to get into style arguments. If there's a standard I'll use it. In the meantime I'll try to keep things to 72-character lines.

As to the content of the change, it was not intended to be a "fix"; it was a simple revert of a change that was provably breaking other jobs. I returned the code to its previous state, nothing more or less. And especially given the time crunch of the beta (which is supposed to be built tomorrow), waiting for a code review on a reversion is not in the cards.
Comment by Trond Norbye [ 01/Aug/14 ]
The normal way of doing a revert is to use git revert (which, as an extra bonus, records the reverted commit in the commit message).
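For example (a sketch, reusing the memcached SHA from the description):

    git revert 4bb252a2a7d9a369c80f8db71b3b5dc1c9f47eb9
    # creates a commit whose message records the revert:
    #   Revert "<original subject line>"
    #
    #   This reverts commit 4bb252a2a7d9a369c80f8db71b3b5dc1c9f47eb9.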
Comment by Trond Norbye [ 01/Aug/14 ]
http://review.couchbase.org/#/c/40165/
Comment by Chris Hillery [ 01/Aug/14 ]
1. Your fix is not correct, because simply adding -D to cmake won't cause any preprocessor defines to be created. You need to have some CONFIGURE_FILE() or similar to create a config.h using #cmakedefine. As it is there is no way to compile with your change.

2. The default behaviour should not be the one that is known to cause problems. Until and unless there is an actual fix for the problem (whether or not that is in the code), the default should be to keep the optimization, with an option to let individuals bypass that if they desire and accept the risks.

3. Characterizing the problem as "misconfigured VMs" is, at best, premature.

I will revert this change again on the 3.0 branch shortly, unless you have a better suggestion (I'm definitely all ears for a better suggestion!).
Comment by Trond Norbye [ 01/Aug/14 ]
If you look at the comment, it passes the -D over into CMAKE_C_FLAGS, causing it to be set in the compiler flags and passed on to the compilation cycle.

As for the misconfiguration, it is either insufficient resources on the VM or a "broken" compiler version installed there.
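Roughly, the distinction under discussion (a sketch; SOME_DEFINE is a made-up name):

    # sets only a CMake cache variable; nothing reaches the C preprocessor
    # unless a CONFIGURE_FILE()/#cmakedefine step consumes it:
    cmake -DSOME_DEFINE=1 .

    # splices the define directly into the compiler flags, so cc1 sees it,
    # but may clobber CMAKE_C_FLAGS set elsewhere in the build:
    cmake "-DCMAKE_C_FLAGS=-DSOME_DEFINE=1" .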
Comment by Trond Norbye [ 01/Aug/14 ]
Can I get login credentials to the server where it fails and an identical VM where it succeeds?
Comment by Chris Hillery [ 01/Aug/14 ]
[CMAKE_C_FLAGS] Fair enough, I did misread that. That's not really a sufficient workaround, though. Doing that may overwrite other CFLAGS set by other parts of the build process.

I still maintain that the default behaviour should be the known-working version. However, for the moment I have temporarily locked the rel-3.0.0.xml manifest to the revision before my revert (ie, to 5cc2f8d928f0eef8bddbcb2fcb796bc5e9768bb8), so I won't revert anything else until that has been tested.

The only VM I know of at the moment where we haven't seen build failures is the production build slave. I can't give you access to that tonight as we're in crunch mode to produce a beta build. Let's plan to hook up next week and do some exploration.
Comment by Volker Mische [ 01/Aug/14 ]
There are commit message guidelines. The bottom of

http://www.couchbase.com/wiki/display/couchbase/Contributing+Changes

links to:

http://en.wikibooks.org/wiki/Git/Introduction#Good_commit_messages
Comment by Trond Norbye [ 01/Aug/14 ]
I've not done anything on the 3.0.0 branch; the fix going forward is for 3.0.1 and trunk. Hopefully the 3.0 branch will die relatively soon since we've got a lot of good stuff in the 3.0.1 branch.

The "workaround" is not intended as a permanent solution, its just until the vms is fixed. I've not been able to reproduce this issue on my centos, ubuntu, fedora or smartos builders. They're running in the following vm's:

[root@00-26-b9-85-bd-92 ~]# vmadm list
UUID TYPE RAM STATE ALIAS
04bf8284-9c23-4870-9510-0224e7478f08 KVM 2048 running centos-6
7bcd48a8-dcc2-43a6-a1d8-99fbf89679d9 KVM 2048 running ubuntu
c99931d7-eaa3-47b4-b7f0-cb5c4b3f5400 KVM 2048 running fedora
921a3571-e1f6-49f3-accb-354b4fa125ea OS 4096 running compilesrv
Comment by Trond Norbye [ 01/Aug/14 ]
I need access to two identical configured builders where one may reproduce the error and one where it succeeds.
Comment by Volker Mische [ 01/Aug/14 ]
I would also add that I think it is about bad VMs. On commit validation we have 6 VMs. It always failed on ubuntu-1204-64-ci-01 due to this error and never on the others (ubuntu-1204-64-ci-02 through 06).
Comment by Chris Hillery [ 01/Aug/14 ]
That's not correct. The problem originally occurred on ci-03.
Comment by Volker Mische [ 01/Aug/14 ]
Then I need to correct myself: my comment only holds true for the couchdb-gerrit-300 job.
Comment by Trond Norbye [ 01/Aug/14 ]
Can I get login creds to one that it fails on, while I'm waiting for access to one that it works on?
Comment by Volker Mische [ 01/Aug/14 ]
I don't know about creds (I think my normal user login works). The machine details are here: http://factory.couchbase.com/computer/ubuntu-1204-64-ci-01/
Comment by Chris Hillery [ 01/Aug/14 ]
Volker - it was initially detected in the make-simple-github-tap job, so it's not unique to couchdb-gerrit-300 either. Both jobs pretty much just checkout the code and build it, though; they're pretty similar.
Comment by Trond Norbye [ 01/Aug/14 ]
Adding swap space to the builder makes the compilation pass. I've been trying to figure out how to get gcc to print more information about each step (the -ftime-report memory usage didn't at all match the process usage ;-))
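For reference, a minimal sketch of adding swap on an Ubuntu builder (file path and size are arbitrary):

    sudo fallocate -l 2G /swapfile  # or: dd if=/dev/zero of=/swapfile bs=1M count=2048
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    free -m                         # verify the new swap is visible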
Comment by Anil Kumar [ 12/Aug/14 ]
Adding the component as "build". Let me know if that's not correct.




[MB-12257] n1ql + 3.0.0 primary index doesn't work after reload bucket or loading more items Created: 25/Sep/14  Updated: 29/Sep/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 3.0.0-1209
n1ql version : dp3

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
After installing Couchbase I run CREATE PRIMARY INDEX ON my_bucket
and start running queries.
If I add items and run the same query, it keeps returning the old result.
If I delete the bucket and then add a bucket with the same name again, for every query I see the error: {u'code': 5000, u'message': u'Bucket default not found.', u'caller': u'view_index:200', u'key': u'Internal Error'}

How can I update the index? Or how can I delete it and create it again? If the bucket is deleted, should the index be deleted also?

 Comments   
Comment by Gerald Sangudi [ 25/Sep/14 ]
Test blocker.
Comment by Manik Taneja [ 26/Sep/14 ]
In general we need a mechanism that informs the query engine whenever a bucket or an index is removed, either via the UI or through another instance of a query engine. I guess this set of issues will be addressed once we have a query metadata store in place.

For now the best we can do is to have a long-poll function that checks the pool/buckets at a preconfigured frequency. Results may be inconsistent until the next pass of the pool refresh thread.
Comment by Manik Taneja [ 29/Sep/14 ]
As for the new items not showing up in the query, this is because by default the views engine uses stale=update_after which means that the views will be updated right after a query is made. So the next time you run the same query you should see the updated results.
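For illustration, the stale setting is passed on the view REST query string (host, design document, and view names below are placeholders):

    # default: the index is updated *after* the response, so results lag one query behind
    curl 'http://localhost:8092/default/_design/ddoc/_view/myview?stale=update_after'

    # force the index to be updated before results are returned
    curl 'http://localhost:8092/default/_design/ddoc/_view/myview?stale=false'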




[MB-10156] "XDCR - Cluster Compare" support tool Created: 07/Feb/14  Updated: 19/Jun/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 2.5.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Cihan Biyikoglu Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: 2.5.1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
For the recent issues we have seen, we need a tool that can compare metadata (specifically revids) for a given replication definition in XDCR. To scale to large data sizes, being able to do this per vbucket or per doc range would be great, but we can do without these. For clarity, here is a high-level description.

Ideal case:
xdcr_compare cluster1_connectioninfo cluster1_bucketname cluster2connectioninfo cluster2_bucketname [vbucketid] [keyrange]
should return a line per docid where the cluster1 metadata and cluster2 metadata for the given key differ:
docID - cluster1_metadata cluster2_metadata

Simplification: the tool is expected to return false positives in a moving system, but we will tackle that by rerunning the tool multiple times.
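For example, a hypothetical invocation following the usage above (hostnames, bucket names, and the vbucket ID are placeholders):

    xdcr_compare cluster1.example.com:8091 source_bucket cluster2.example.com:8091 dest_bucket 512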

 Comments   
Comment by Cihan Biyikoglu [ 19/Feb/14 ]
Aaron, do you have a timeline for this?
thanks
-cihan
Comment by Maria McDuff (Inactive) [ 19/Feb/14 ]
Cihan,

For test automation/verification, can you list out the stats/metadata that we should be testing specifically?
we want to create/implement the tests accordingly.


Also -- is this tool de-coupled from the server package? or is this part of rpm/deb/.exe/osx build package?

Thanks,
Maria
Comment by Aaron Miller (Inactive) [ 19/Feb/14 ]
This depends on the requirements; a tool that requires the manual collection of all data from all nodes in both clusters onto one machine (like we've done recently) could be done pretty quickly, but I imagine that may be difficult or entirely infeasible for some users.

Better would be to be able to operate remotely on clusters and only look at metadata. Unfortunately there is no *currently exposed* interface to only extract metadata from the system without also retrieving values. I may be able to work around this, but the workaround is unlikely to be simple.

Also, for some users even the amount of *metadata* may be prohibitively large to transfer to one place; this too can be avoided, but again, it adds difficulty.

Q: Can the tool be JVM-based?
Comment by Aaron Miller (Inactive) [ 19/Feb/14 ]
I think it would be more feasible for this to ship separately from the server package.
Comment by Maria McDuff (Inactive) [ 19/Feb/14 ]
Cihan, Aaron,

If it's de-coupled, what older versions of Couchbase would this tool support? As far back as 1.8.x? Please confirm, as this would expand our backward-compatibility testing for this tool.
Comment by Aaron Miller (Inactive) [ 19/Feb/14 ]
Well, 1.8.x didn't have XDCR or the rev field; it can't be compatible with anything older than 2.0 since it operates mostly to check things added since 2.0.

I don't know how far back it needs to go but it *definitely* needs to be able to run against 2.2
Comment by Cihan Biyikoglu [ 19/Feb/14 ]
Agree with Aaron, let's keep this lightweight. Can we depend on Aaron for testing if this will initially be just a support tool? For 3.0, we may graduate the tool to the server-shipped category.
thanks
Comment by Sangharsh Agarwal [ 27/Feb/14 ]
Cihan, Is the Spec finalized for this tool in version 2.5.1?
Comment by Cihan Biyikoglu [ 27/Feb/14 ]
Sangharsh, for 2.5.1, we wanted to make this a "Aaron tested" tool. I believe Aaron already has the tool. Aaron?
Comment by Aaron Miller (Inactive) [ 27/Feb/14 ]
Working on it; wanted to get my actually-in-the-package 2.5.1 stuff into review first.

What I do already have is a diff tool for *files*, but it is highly inconvenient to use; this should be a tool that doesn't require collecting all data files into one place, and instead can work against a running cluster.
Comment by Maria McDuff (Inactive) [ 05/Mar/14 ]
Aaron,

Is the tool merged into the build yet? Can you update, please?
Comment by Cihan Biyikoglu [ 06/Mar/14 ]
2.5.1 shiproom note: Phil raised a build concern on getting this packaged with 2.5.1. The initial bar we set was not to ship this as part of the server - it was intended to be a downloadable support tool. Aaron/Cihan will re-eval and get back to shiproom.
Comment by Cihan Biyikoglu [ 15/Jun/14 ]
Aaron is no longer here. Assigning to Xiaomei for consideration.




[MB-10719] Missing autoCompactionSettings during create bucket through REST API Created: 01/Apr/14  Updated: 19/Jun/14

Status: Reopened
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: michayu Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File bucket-from-API-attempt1.txt     Text File bucket-from-API-attempt2.txt     Text File bucket-from-API-attempt3.txt     PNG File bucket-from-UI.png     Text File bucket-from-UI.txt    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Unless I'm not using the API correctly, there seem to be some holes in the Couchbase API – particularly with autoCompaction.

The autoCompaction parameter can be set via the UI (as long as the bucketType is couchbase).

See the following attachments:
1) bucket-from-UI.png
2) bucket-from-UI.txt

And compare with creating the bucket (with autoCompaction) through the REST API:
1) bucket-from-API-attempt1.txt
    - Reference: http://docs.couchbase.com/couchbase-manual-2.5/cb-rest-api/#creating-and-editing-buckets
2) bucket-from-API-attempt2.txt
    - Reference: http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-admin-rest-auto-compaction
3) bucket-from-API-attempt3.txt
    - Setting autoCompaction globally
    - Reference: http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-admin-rest-auto-compaction

In all cases, autoCompactionSettings is still false.


 Comments   
Comment by Anil Kumar [ 19/Jun/14 ]
Triage - June 19 2014 Alk, parag, Anil
Comment by Aleksey Kondratenko [ 19/Jun/14 ]
It works, just apparently not properly documented:

# curl -u Administrator:asdasd -d name=other -d bucketType=couchbase -d ramQuotaMB=100 -d authType=sasl -d replicaNumber=1 -d replicaIndex=0 -d parallelDBAndViewCompaction=true -d purgeInterval=1 -d 'viewFragmentationThreshold[percentage]'=30 -d autoCompactionDefined=1 http://lh:9000/pools/default/buckets

And a general hint: you can watch what the browser POSTs when it creates a bucket (or does anything else) to figure out a working (but not necessarily publicly supported) way of doing things.
Comment by Anil Kumar [ 19/Jun/14 ]
Ruth - the above documentation references need to be fixed with the correct REST API.




[MB-9358] while running concurrent queries (3-5 queries) getting 'Bucket X not found.' error from time to time Created: 16/Oct/13  Updated: 18/Jun/14  Due: 23/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 64 bit

Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
one thread gives correct result:
[root@localhost tuqtng]# curl 'http://10.3.121.120:8093/query?q=SELECT+META%28%29.cas+as+cas+FROM+bucket2'
{
    "resultset": [
        {
            "cas": 4.956322522514292e+15
        },
        {
            "cas": 4.956322525999292e+15
        },
        {
            "cas": 4.956322554862292e+15
        },
        {
            "cas": 4.956322832498292e+15
        },
        {
            "cas": 4.956322835757292e+15
        },
        {
            "cas": 4.956322838836292e+15
...

    ],
    "info": [
        {
            "caller": "http_response:152",
            "code": 100,
            "key": "total_rows",
            "message": "0"
        },
        {
            "caller": "http_response:154",
            "code": 101,
            "key": "total_elapsed_time",
            "message": "405.41885ms"
        }
    ]
}

but in another I see
{
    "error":
        {
            "caller": "view_index:195",
            "code": 5000,
            "key": "Internal Error",
            "message": "Bucket bucket2 not found."
        }
}

cbcollect will be attached

 Comments   
Comment by Marty Schoch [ 16/Oct/13 ]
This is a duplicate, though I can't yet find the original.

We believe that under higher load the view queries time out, which we report as bucket not found (it may not be possible to distinguish the two).
Comment by Iryna Mironava [ 16/Oct/13 ]
https://s3.amazonaws.com/bugdb/jira/MB-9358/447a45ae/10.3.121.120-10162013-858-diag.zip
Comment by Ketaki Gangal [ 17/Oct/13 ]
Seeing these errors and frequent tuq-server crashes on concurrent queries during typical server operations like
- w/ Failovers
- w/ Backups
- w/ Indexing.

Similar server ops for single queries however seem to run okay.

Note: This is a very small number of concurrent queries (3-5); users may typically have a higher level of concurrency at the application level.




[MB-9145] Add option to download the manual in pdf format (as before) Created: 17/Sep/13  Updated: 20/Jun/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 2.0, 2.1.0, 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Anil Kumar Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
On the documentation site there is no option to download the manual in pdf format as before. We need to add this option back.

 Comments   
Comment by Maria McDuff (Inactive) [ 18/Sep/13 ]
Needed for the 2.2.1 bug-fix release.




[MB-8838] Security Improvement - Connectors to implement security improvements Created: 14/Aug/13  Updated: 19/May/14

Status: Open
Project: Couchbase Server
Component/s: clients
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Anil Kumar Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: security
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Security Improvement - Connectors to implement security improvements

Spec ToDo.




[MB-9415] auto-failover in seconds - (reduced from minimum 30 seconds) Created: 21/May/12  Updated: 11/Mar/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.0, 1.8.1, 2.0, 2.0.1, 2.2.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Dipti Borkar Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 2
Labels: customer, ns_server-story
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Sub-Tasks:
Key      Summary                                    Type            Status  Assignee
MB-9416  Make auto-failover near immediate whe...   Technical task  Open    Aleksey Kondratenko

 Description   
including no false positives

http://www.pivotaltracker.com/story/show/25006101

 Comments   
Comment by Aleksey Kondratenko [ 25/Oct/13 ]
At the very least it requires getting our timeout-ful cases under control. So at least splitting couchdb into a separate VM is a requirement for this. But not necessarily enough.
Comment by Aleksey Kondratenko [ 25/Oct/13 ]
Still seeing misunderstanding on this one.

So we have a _different_ problem: even manual failover (let alone automatic) cannot succeed quickly if the master node fails. It can easily take up to 2 minutes because of our use of the erlang "global" facility, which requires us to detect that the node is dead, and erlang is tuned to detect that within 2 minutes.

Now, _this_ ticket is about lowering autofailover detection to 10 seconds. We can blindly make it happen today, but it will not be usable because of all sorts of timeouts happening in the cluster management layer. We have a significant proportion of CBSEs _today_ about false-positive autofailovers even with the 30-second threshold; clearly lowering it to 10 will only make it worse. Hence my point above: we have to get those timeouts under control so that heartbeats are sent/received in a timely manner (or whatever else we use to detect a node being unresponsive).

I would like to note, however, that especially in some virtualized environments (arguably, oversubscribed ones) we saw delays as high as low tens of seconds from virtualization _alone_. Given the relatively high cost of failover in our software, I'd like to point out that people could too easily abuse that feature.

The high cost of failover referred to above is this:

* you almost certainly and irrecoverably lose some recent mutations. _At least_ recent mutations, and that's if replication is really working well; on a node that's on the edge of autofailover you can imagine replication not being "diamond-hard quick". That's cost 1.

* in order to return the node to the cluster (say the node crashed and needed some time to recover, whatever that might mean) you need a rebalance. That type of rebalance is relatively quick by design, i.e. it only moves data back to this node and nothing else. But it's still a rebalance. With UPR we can possibly make it better, because its failover log is capable of rewinding just the conflicting mutations.

What I'm trying to say with "our approach appears to have a relatively high price for failover" is that this appears to be an inherent issue for a strongly consistent system. In many cases it might actually be better to wait up to a few minutes for a node to recover and restore its availability than to fail it over and pay the price of restoring cluster capacity (by rebalancing this node, or its replacement, back in - which of the two is irrelevant here). If somebody wants stronger availability, then other approaches that can "reconcile" changes from both the failed-over node and its replacement look like a fundamentally better choice _for those requirements_.




[MB-4030] enable traffic for ready nodes even if not all nodes are up/healthy/ready (aka partial janitor) (was: After two nodes crashed, curr_items remained 0 after warmup for extended period of time) Created: 06/Jul/11  Updated: 20/May/14

Status: Reopened
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.1, 2.0, 2.0.1, 2.2.0, 2.1.1, 2.5.1
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: ns_server-story, supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
we had two nodes crash at a customer, possibly related to a disk space issue, but I don't think so.

After they crashed, the nodes warmed up relatively quickly, but immediately "discarded" their items. I say that because I see that they warmed up ~10m items, but the current item counts were both 0.

I tried shutting down the service and had to kill memcached manually (kill -9). Restarting it went through the same process of warming up and then nothing.

While I was looking around, I left it sit for a little while and magically all of the items came back. I seem to recall this bug previously where a node wouldn't be told to be active until all the nodes in the cluster were active...and it got into trouble when not all of the nodes restarted.

Diags for all nodes will be attached

 Comments   
Comment by Perry Krug [ 06/Jul/11 ]
Full set of logs at \\corp-fs1\export_support_cases\bug_4030
Comment by Aleksey Kondratenko [ 20/Mar/12 ]
It _is_ an ns_server issue, caused by the janitor needing all nodes to be up for vbucket activation. We planned a fix for 1.8.1 (now 1.8.2).
Comment by Aleksey Kondratenko [ 20/Mar/12 ]
Fix would land as part of fast warmup integration
Comment by Perry Krug [ 18/Jul/12 ]
Peter, can we get a second look at this one? We've seen this before, and the problem is that the janitor did not run until all nodes had joined the cluster and warmed up. I'm not sure we've fixed that already...
Comment by Aleksey Kondratenko [ 18/Jul/12 ]
Latest 2.0 will mark nodes as green and enable memcached traffic when all of them are up. So the easy part is done.

Partial janitor (i.e. enabling traffic for some nodes when others are still down/warming up) is something that will unlikely be done soon
Comment by Perry Krug [ 18/Jul/12 ]
Thanks Alk...what's the difference in behavior (in this area) between 1.x and 2.0? It "sounds" like they're the same, no?

And this bug should still remain open until we fix the primary issue which is the partial janitor...correct?
Comment by Aleksey Kondratenko [ 18/Jul/12 ]
1.8.1 will show a node as green when ep-engine thinks it's warmed up. But confusingly, it'll not be really ready: all vbuckets will be in state dead and curr_items will be 0.

2.0 fixes this confusion. A node is marked green when it's actually warmed up from the user's perspective, i.e. the right vbucket states are set and it'll serve client traffic.

2.0 is still very conservative about only making vbucket state changes when all nodes are up and warmed up. That's the "impartial" janitor. Whether it's a bug or a "lack of feature" is debatable. But I think the main concern, that users are confused by the green-ness of nodes, is resolved.
Comment by Aleksey Kondratenko [ 18/Jul/12 ]
Closing as fixed. We'll get to the partial janitor some day in the future; it's a feature we lack today, not a bug we have, IMHO.
Comment by Perry Krug [ 12/Nov/12 ]
Reopening this for the need for the partial janitor. A recent customer had multiple nodes that needed to be hard-booted, and none returned to service until all were warmed up.
Comment by Steve Yen [ 12/Nov/12 ]
bug-scrub: moving out of 2.0, as this looks like a feature req.
Comment by Farshid Ghods (Inactive) [ 13/Nov/12 ]
In system testing we have noticed many times that if multiple nodes crash, the node status for those that have already warmed up appears as yellow until all nodes are warmed up.


The user won't be able to tell from the console which node has successfully warmed up, and if one node is actually not recovering or not warming up in a reasonable time, they have to figure it out some other way (cbstats ...).

Another issue is that the user won't be able to perform a failover for 1 node even though N-1 nodes have warmed up already.

I am not sure if fixing this bug will impact cluster-restore functionality, but it is important to fix or to suggest a workaround to the user (by workaround I mean a documented, tested, and supported set of commands).
Comment by Mike Wiederhold [ 17/Mar/13 ]
Comments say this is an ns_server issue so I am removing couchbase-bucket from affected components. Please re-add if there is a couchbase-bucket task for this issue.
Comment by Aleksey Kondratenko [ 23/Feb/14 ]
Not going to happen for 3.0.




[MB-11736] add client SSL to 3.0 beta documentation Created: 15/Jul/14  Updated: 15/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0-Beta
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Matt Ingenthron Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This is mostly a curation exercise. Add to the server 3.0 beta docs the configuration information for each of the following clients:
- Java
- .NET
- PHP
- Node.js
- C/C++

No other SDKs support SSL at the moment.

This is either in work-in-progress documentation or in the blogs from the various DPs. Please check in with the component owner if you can't find what you need.




[MB-10180] Server Quota: Inconsistency between documentation and CB behaviour Created: 11/Feb/14  Updated: 21/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Dave Rigby Assignee: Ruth Harris
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File MB-10180_max_quota.png    
Issue Links:
Relates to
relates to MB-2762 Default node quota is still too high Resolved
relates to MB-8832 Allow for some back-end setting to ov... Open
Triage: Untriaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Yes

 Description   
In the documentation for the product (and general sizing advice) we tell people to allocate no more than 80% of their memory for the Server Quota, to leave headroom for the views, disk write queues and general OS usage.

However on larger[1] nodes we don't appear to enforce this, and instead allow people to allocate up to 1GB less than the total RAM.

This is inconsistent, as we document and tell people one thing and let them do another.

This appears to be something inherited from MB-2762, the intent of which appeared to be to only allow relaxing this when joining a cluster; however, this doesn't appear to be how it works - I can successfully change the existing cluster quota from the CLI to a "large" value:

    $ /opt/couchbase/bin/couchbase-cli cluster-edit -c localhost:8091 -u Administrator -p dynam1te --cluster-ramsize=127872
    ERROR: unable to init localhost (400) Bad Request
    [u'The RAM Quota value is too large. Quota must be between 256 MB and 127871 MB (memory size minus 1024 MB).']

While I can see some logic to relax the 80% constraint on big machines, with the advent of 2.X features 1024MB seems far too small an amount of headroom.

Suggestions to resolve:

A) Revert to a straightforward 80% max, with a --force option or similar to allow specific customers to go higher if they know what they are doing
B) Leave current behaviour, but document it.
C) Increase minimum headroom to something more reasonable for 2.X, *and* document the behaviour.

([1] On a machine with 128,895MB of RAM I get the "total-1024" behaviour, on a 1GB VM I get 80%. I didn't check in the code what the cutoff for 80% / total-1024 is).


 Comments   
Comment by Dave Rigby [ 11/Feb/14 ]
Screenshot of initial cluster config: maximum quota is total_RAM-1024
Comment by Aleksey Kondratenko [ 11/Feb/14 ]
Do not agree with that logic.

There's IMHO quite a bit of difference between default settings, the recommended settings limit, and the allowed settings limit. The latter can be wider for folks who really know what they're doing.
Comment by Aleksey Kondratenko [ 11/Feb/14 ]
Passed to Anil, because changing limits is not my decision.
Comment by Dave Rigby [ 11/Feb/14 ]
@Aleksey: I'm happy to resolve as something other than my (A,B,C), but the problem here is that many people haven't even been aware of this "extended" limit in the system - and moreover on a large system we actually advertise it in the GUI when specifying the allowed limit (see attached screenshot).

Furthermore, I *suspect* that this was originally only intended for upgrades for 1.6.X (see http://review.membase.org/#/c/4051/), but somehow is now being permitted for new clusters.

Ultimately I don't mind what our actual max quota value is, but the app behaviour should be consistent with the documentation (and the sizing advice we give people).
Comment by Maria McDuff (Inactive) [ 19/May/14 ]
Raising to product blocker.
This inconsistency has to be resolved - PM to re-align.
Comment by Anil Kumar [ 28/May/14 ]
Going with option B - Leave current behaviour, but document it.
Comment by Ruth Harris [ 17/Jul/14 ]
I only see the 80% number coming up as an example of setting the high water mark (85% suggested). The Server Quota section doesn't mention anything. The working set management & ejection section(s) and the item pager sub-section also mention the high water mark.

Can you be more specific about where this information is? Anyway, the best solution is to add a "note" in the applicable section(s).

--ruth

Comment by Dave Rigby [ 21/Jul/14 ]
@Ruth: So the current product behaviour is that the Server Quota limit depends on the maximum memory available:

* For machines with <= X MB of memory, the maximum Server Quota is 80% of total physical memory.
* For machines with > X MB of memory, the maximum Server Quota is total physical memory - 1024 MB. For example, on the 128,895 MB machine in the description this gives 128,895 - 1,024 = 127,871 MB, matching the CLI error message above.

The value of 'X' is fixed in the code, but it wasn't obvious what it actually is (it's derived from a few different things). I suggest you ask Alk, who should be able to provide the value.




[MB-9632] diag / master events captured in log file Created: 22/Nov/13  Updated: 27/Aug/14

Status: Reopened
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.2.0, 2.5.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Task Priority: Blocker
Reporter: Steve Yen Assignee: Ravi Mayuram
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The information available in the diag / master events REST stream should be captured in a log (ALE?) file and hence be available to cbcollect-info and later analysis tools.

 Comments   
Comment by Aleksey Kondratenko [ 22/Nov/13 ]
It is already available in collectinfo
Comment by Dustin Sallings (Inactive) [ 26/Nov/13 ]
If it's only available in collectinfo, then it's not available at all. We lose most of the useful information if we don't run an http client to capture it continually throughout the entire course of a test.
Comment by Aleksey Kondratenko [ 26/Nov/13 ]
Feel free to submit a patch with the exact behavior you need.
Comment by Cihan Biyikoglu [ 27/Aug/14 ]
Is this still relevant?




[MB-10214] Mac version update check is incorrectly identifying newest version Created: 14/Feb/14  Updated: 28/Aug/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.0.1, 2.2.0, 2.1.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: David Haikney Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified
Environment: Mac OS X

Attachments: PNG File upgrade_check.png    
Sub-Tasks:
Key       Summary                                    Type            Status  Assignee
MB-12051  Update the Release_Server job on Jenk...   Technical task  Open    Chris Hillery
Is this a Regression?: Yes

 Description   
Running 2.1.1 version of couchbase on a Mac, "check for latest version" reports the latest version is already running (e.g. see attached screenshot)


 Comments   
Comment by Aleksey Kondratenko [ 14/Feb/14 ]
Definitely not a UI bug. It's using phone-home to find out about upgrades. And I have no idea who owns that now.
Comment by Steve Yen [ 12/Jun/14 ]
got an email from ravi to look into this
Comment by Steve Yen [ 12/Jun/14 ]
Not sure if this is correct analysis, but I did a quick scan of what I think is the mac installer, which I think is...

  https://github.com/couchbase/couchdbx-app

It gets its version string by running a "git describe", in the Makefile here...

  https://github.com/couchbase/couchdbx-app/blob/master/Makefile#L1

Currently, a "git describe" on master branch returns...

  $ git describe
  2.1.1r-35-gf6646fa

...which is *kinda* close to the reported version string in the screenshot ("2.1.1-764-rel").

So, I'm thinking one fix needed would be a tagging (e.g., "git tag -a FOO -m FOO") of the couchdbx-app repository.

So, reassigning to Phil to do that appropriately.

Also, it looks like our mac installer is using an open-source packaging / installer / runtime library called "sparkle" (which might be a little under-maintained -- not sure).

  https://github.com/andymatuschak/Sparkle/wiki

The sparkle library seems to check for version updates by looking at the URL here...

  https://github.com/couchbase/couchdbx-app/blob/master/cb.plist.tmpl#L42

Which seems to either be...

  http://appcast.couchbase.com/membasex.xml

Or, perhaps...

  http://appcast.couchbase.com/couchbasex.xml

The appcast.couchbase.com host appears to actually be an S3 bucket, off of our production couchbase AWS account. So those *.xml files need to be updated, as they currently have content that lists older versions. For example, http://appcast.couchbase.com/couchbase.xml currently looks like...

    <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sparkle="http://www.andymatuschak.org/xml-namespaces/sparkle" version="2.0">
    <channel>
    <title>Updates for Couchbase Server</title>
    <link>http://appcast.couchbase.com/couchbase.xml&lt;/link>
    <description>Recent changes to Couchbase Server.</description>
    <language>en</language>
    <item>
    <title>Version 1.8.0</title>
    <sparkle:releaseNotesLink>
    http://www.couchbase.org/wiki/display/membase/Couchbase+Server+1.8.0
    </sparkle:releaseNotesLink>
    <!-- date -u +"%a, %d %b %Y %H:%M:%S GMT" -->
    <pubDate>Fri, 06 Jan 2012 16:11:17 GMT</pubDate>
    <enclosure url="http://packages.couchbase.com/1.8.0/Couchbase-Server-Community.dmg" sparkle:version="1.8.0" sparkle:dsaSignature="MCwCFAK8uknVT3WOjPw/3LkQpLBadi2EAhQxivxe2yj6EU6hBlg9YK/5WfPa5Q==" length="33085691" type="application/octet-stream"/>
    </item>
    </channel>
    </rss>

Not updating the xml files, though, probably causes no harm. Just that our osx users won't be pushed news on updates.
Comment by Phil Labee [ 12/Jun/14 ]
This has nothing to do with "git describe". There should be no place in the product where "git describe" is used to determine version info. See:

    http://hub.internal.couchbase.com/confluence/display/CR/Branching+and+Tagging

so there's definitely a bug in the Makefile.

The version update check seems to be out of date. The phone-home file is generated during:

    http://factory.hq.couchbase.com:8080/job/Product_Staging_Server/

but the process of uploading it is not automated.
Comment by Steve Yen [ 12/Jun/14 ]
Thanks for the links.

> This has nothing to do with "git describe".

My read of the Makefile makes me think, instead, that "git describe" is the default behavior unless it's overridden by the invoker of the make.

> There should be no place in the product that "git describe" should be used to determine version info. See:
> http://hub.internal.couchbase.com/confluence/display/CR/Branching+and+Tagging

It appears all this couchdbx-app / sparkle stuff predates that wiki page by a few years, so I guess it's inherited legacy.

Perhaps voltron / buildbot are not setting the PRODUCT_VERSION correctly before invoking the couchdbx-app make, which makes the Makefile default to 'git describe'?

    commit 85710d16b1c52497d9f12e424a22f3efaeed61e4
    Date: Mon Jun 4 14:38:58 2012 -0700

    Apply correct product version number
    
    Get version number from $PRODUCT_VERSION if it's set.
    (Buildbot and/or voltron will set this.)
    If not set, default to `git describe` as before.
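
If so, a sketch of the override implied by that commit (the version string is a placeholder):

    # with PRODUCT_VERSION set, the Makefile skips the `git describe` fallback
    PRODUCT_VERSION=2.1.1-764-rel make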
    
> The version update check seems to be out of date.

Yes, that's right. The appcast files are out of date.

> The phone-home file is generated during:
> http://factory.hq.couchbase.com:8080/job/Product_Staging_Server/

I think appcast files for OSX / sparkle are a _different_ mechanism than the phone-home file, and an appcast XML file does not appear to be generated/updated by the Product_Staging_Server job.

But, I'm not an expert or really qualified on the details here -- these are just my opinions from a quick code scan, not from actually doing/knowing.

Comment by Wayne Siu [ 01/Aug/14 ]
Per PM (Anil), we should get this fixed by 3.0 RC1.
Raising the priority to Critical.
Comment by Wayne Siu [ 07/Aug/14 ]
Phil,
Please provide update.
Comment by Anil Kumar [ 12/Aug/14 ]
Triage - Upgrading to 3.0 Blocker

Comment by Wayne Siu [ 20/Aug/14 ]
Looks like we may have a short term "fix" for this ticket which Ceej and I have tested.
@Ceej, can you put in the details here?
Comment by Chris Hillery [ 20/Aug/14 ]
The file is hosted in S3, and we proved tonight that overwriting that file (membasex.xml) with a version containing updated version information and download URLs works as expected. We updated it to point to 2.2 for now, since that is the latest version with a freely-available download URL.

We can update the Release_Server job on Jenkins to create an updated version of this XML file from a template, and upload it to S3.

Assigning back to Wayne for a quick question: Do we support Enterprise edition for MacOS? If we do, then this solution won't be sufficient without more effort, because the two editions will need different Sparkle configurations for updates. Also, Enterprise edition won't be able to directly download the newer release, unless we provide a "hidden" URL for that (the download link on the website goes to a form).




Mac version update check is incorrectly identifying newest version (MB-10214)

[MB-12051] Update the Release_Server job on Jenkins to include updating the file (membasex.xml) and the download URL Created: 22/Aug/14  Updated: 22/Aug/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.0.1, 2.2.0, 2.1.1, 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Technical task Priority: Blocker
Reporter: Wayne Siu Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We can update the Release_Server job on Jenkins to create an updated version of this XML file from a template, and upload it to S3.




[MB-9917] DOC - memcached should dynamically adjust the number of worker threads Created: 14/Jan/14  Updated: 24/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Trond Norbye Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
4 threads is probably not ideal for a 24 core system ;)

 Comments   
Comment by Anil Kumar [ 25/Mar/14 ]
Trond - Can you explain: is this a new feature in 3.0, or a fix to the documentation in older docs?
Comment by Ruth Harris [ 17/Jul/14 ]
Trond, Could you provide more information here and then reassign to me? --ruth
Comment by Trond Norbye [ 24/Jul/14 ]
New in 3.0 is that memcached no longer defaults to 4 threads for the frontend, but uses 75% of the number of cores reported by the system (with a minimum of 4 threads).

There are 3 ways to tune this (see the sketch below):

* Export MEMCACHED_NUM_CPUS=<number of threads you want> before starting Couchbase Server

* Use the -t <number> command line argument (this will go away in the future)

* Specify it in the configuration file read during startup (but when started from the full server this file is regenerated every time, so you'll lose the modifications)
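
For example, a sketch of the first two options (the thread count and paths are illustrative):

    # 1) environment variable, set before starting Couchbase Server
    MEMCACHED_NUM_CPUS=8 /etc/init.d/couchbase-server start

    # 2) deprecated command-line argument, when invoking memcached directly
    memcached -t 8   # plus its usual arguments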




[MB-12052] add stale=false semantic changes to release notes Created: 22/Aug/14  Updated: 28/Aug/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Matt Ingenthron Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Need to release note the stale=false semantic changes.
This doesn't seem to be in current release notes, though MB-11589 seems loosely related.

Please use/adapt from the following text:
Starting with the 3.0 release, the "stale" view query argument "false" has been enhanced so it will consider all document changes which have been received at the time the query has been received. This means that use of the `durability requirements` or `observe` feature to block for persistence in application code before issuing the `false` stale query is no longer needed. It is recommended that you remove all such application level checks after completing the upgrade to the 3.0 release.

- - -

Ruth: assigning this to you to work out the right way to work the text into the release notes. This probably goes with a change in a different MB.




[MB-12096] collect_server_info.py does not work on a dev tree on windows.. Created: 29/Aug/14  Updated: 03/Sep/14

Status: Open
Project: Couchbase Server
Component/s: test-execution
Affects Version/s: techdebt-backlog
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Trond Norbye Assignee: Tommie McAfee
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
cbcollectinfo.py tries to use ssh to collect the files even if the target machine is the same machine the test is running on, and that doesn't seem to work on my windows development box. Since all of the files should be local, it might as well use a normal copy.




[MB-7250] Mac OS X App should be signed by a valid developer key Created: 22/Nov/12  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build, installer
Affects Version/s: 2.0-beta-2, 2.1.0, 2.2.0, 2.5.0, 2.5.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: J Chris Anderson Assignee: Phil Labee
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Build_2.5.0-950.png     PNG File Screen Shot 2013-02-17 at 9.17.16 PM.png     PNG File Screen Shot 2013-04-04 at 3.57.41 PM.png     PNG File Screen Shot 2013-08-22 at 6.12.00 PM.png     PNG File ss_2013-04-03_at_1.06.39 PM.png    
Issue Links:
Dependency
depends on MB-9437 macosx installer package fails during... Closed
Relates to
relates to CBLT-104 Enable Mac developer signing on Mac b... Open
Is this a Regression?: No

 Description   
Currently launching the Mac OS X version tells you it's from an unidentified developer. You have to right click to launch the app. We can fix this.

 Comments   
Comment by Farshid Ghods (Inactive) [ 22/Nov/12 ]
Chris,

do you know what needs to change on the build machine to embed our developer key ?
Comment by J Chris Anderson [ 22/Nov/12 ]
I have no idea. I could start researching how to get a key from Apple but maybe after the weekend. :)
Comment by Farshid Ghods (Inactive) [ 22/Nov/12 ]
we can discuss this next week : ) . Thanks for reporting the issue Chris.
Comment by Steve Yen [ 26/Nov/12 ]
we'll want separate, related bugs (tasks) for other platforms, too (windows, linux)
Comment by Jens Alfke [ 30/Nov/12 ]
We need to get a developer ID from Apple; this will give us some kind of cert, and a local private key for signing.
Then we need to figure out how to get that key and cert onto the build machine, in the Keychain of the account that runs the buildbot.
Comment by Farshid Ghods (Inactive) [ 02/Jan/13 ]
The instructions to build are available here:
https://github.com/couchbase/couchdbx-app
We need to add codesign as a build step there.
Comment by Farshid Ghods (Inactive) [ 22/Jan/13 ]
Phil,

Do you have any update on this ticket?
Comment by Phil Labee [ 22/Jan/13 ]
I have signing cert installed on 10.17.21.150 (MacBuild).

Change to Makefile: http://review.couchbase.org/#/c/24149/
Comment by Phil Labee [ 23/Jan/13 ]
need to change master.cfg and pass env.var. to package-mac
Comment by Phil Labee [ 29/Jan/13 ]
disregard previous. Have added signing to Xcode projects.

see http://review.couchbase.org/#/c/24273/
Comment by Phil Labee [ 31/Jan/13 ]
To test this go to System Preferences / Security & Privacy, and on the General tab set "Allow applications downloaded from" to "Mac App Store and Identified Developers". Set this before running Couchbase Server.app the first time. Once an app has been allowed to run this setting is no longer checked for that app, and there doesn't seem to be a way to reset that.

What is odd is that on my system, I allowed one unsigned build to run before restricting the app-run setting, and then no other unsigned builds were checked (they were all allowed to run). Either there is a flaw in my testing methodology, or a serious weakness in this security setting: just because one app called Couchbase Server was allowed to run shouldn't confer this privilege on other apps with the same name. A common malware tactic is to modify a trusted app and distribute it as an update, and if the security setting keys off the app name it will do nothing to prevent that.

I'm approving this change without having satisfactorily tested it.
Comment by Jens Alfke [ 31/Jan/13 ]
Strictly speaking it's not the app name but its bundle ID, i.e. "com.couchbase.CouchbaseServer" or whatever we use.

> I allowed one unsigned build to run before restricting the app run setting, and then no other unsigned builds would be checked

By OK'ing an unsigned app you're basically agreeing to toss security out the window, at least for that app. This feature is really just a workaround for older apps. By OK'ing the app you're not really saying "yes, I trust this build of this app" so much as "yes, I agree to run this app even though I don't trust it".

> A common malware tactic is to modify a trusted app and distribute it as update

If it's a trusted app it's hopefully been signed, so the user wouldn't have had to waive signature checking for it.
Comment by Jens Alfke [ 31/Jan/13 ]
Further thought: It might be a good idea to change the bundle ID in the new signed version of the app, because users of 2.0 with strict security settings have presumably already bypassed security on the unsigned version.
Comment by Jin Lim [ 04/Feb/13 ]
Per bug scrubs, keep this a blocker since customers ran into this issue (and originally reported it).
Comment by Phil Labee [ 06/Feb/13 ]
Reverting the change so that builds can complete. The app is currently not being signed.
Comment by Farshid Ghods (Inactive) [ 11/Feb/13 ]
I suggest that for the 2.0.1 release we do this build manually.
Comment by Jin Lim [ 11/Feb/13 ]
As one-off fix, add the signature manually and automate the required steps later in 2.0.2 or beyond.
Comment by Jin Lim [ 13/Feb/13 ]
Please move this bug to 2.0.2 after populating the required signature manually. I am lowering the severity to critical as it is no longer a blocking issue.
Comment by Farshid Ghods (Inactive) [ 15/Feb/13 ]
Phil to upload the binary to latestbuilds , ( 2.0.1-101-rel.zip )
Comment by Phil Labee [ 15/Feb/13 ]
Please verify:

http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-160-rel-signed.zip
Comment by Phil Labee [ 15/Feb/13 ]
uploaded:

http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-160-rel-signed.zip

I can rename it when uploading for release.
Comment by Farshid Ghods (Inactive) [ 17/Feb/13 ]
I still get the error that it is from an unidentified developer.

Comment by Phil Labee [ 18/Feb/13 ]
operator error.

I rebuilt the app, this time verifying that the codesign step occurred.

Uploaded new file to same location:

http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-160-rel-signed.zip
Comment by Phil Labee [ 26/Feb/13 ]
still need to perform manual workaround
Comment by Phil Labee [ 04/Mar/13 ]
release candidate has been uploaded to:

http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-172-signed.zip
Comment by Wayne Siu [ 03/Apr/13 ]
Phil, looks like version 172/185 is still getting the error. My Mac version is 10.8.2
Comment by Thuan Nguyen [ 03/Apr/13 ]
Installed couchbase server (build 2.0.1-172 community version) on my mac osx 10.7.4; I only see the warning message.
Comment by Wayne Siu [ 03/Apr/13 ]
Latest version (04.03.13) : http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_64_2.0.1-185-rel.zip
Comment by Maria McDuff (Inactive) [ 03/Apr/13 ]
Works in 10.7 but not in 10.8.
If we can get the fix for 10.8 by tomorrow, end of day, QE is willing to test for release on Tuesday, April 9.
Comment by Phil Labee [ 04/Apr/13 ]
The mac builds are not being automatically signed, so build 185 is not signed. The original 172 is also not signed.

Did you try

    http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-172-signed.zip

to see if that was signed correctly?

Comment by Wayne Siu [ 04/Apr/13 ]
Phil,
Yes, we did try the 172-signed version. It works on 10.7 but not 10.8. Can you take a look?
Comment by Phil Labee [ 04/Apr/13 ]
I rebuilt 2.0.1-185 and uploaded a signed app to:

    http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-185-rel.SIGNED.zip

Test on a machine that has never had Couchbase Server installed, and has the security setting to only allow Appstore or signed apps.

If you get the "Couchbase Server.app was downloaded from the internet" warning and you can click OK and install it, then this bug is fixed. The quarantining of files downloaded by a browser is part of the operating system and is not controlled by signing.
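
A sketch of how to check the signature from the command line before testing (the app path is a placeholder):

    # show the signing identity, if any
    codesign -dvv "/Applications/Couchbase Server.app"

    # ask Gatekeeper whether it would allow the app to launch
    spctl --assess --verbose "/Applications/Couchbase Server.app"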
Comment by Wayne Siu [ 04/Apr/13 ]
Tried the 185-signed version (see attached screen shot). Same error message.
Comment by Phil Labee [ 04/Apr/13 ]
This is not an error message related to this bug.

Comment by Maria McDuff (Inactive) [ 14/May/13 ]
Per bug triage, we need to have Mac OS X 10.8 working since it is a supported platform (published on the website).
Comment by Wayne Siu [ 29/May/13 ]
Work Around:
Step One
Hold down the Control key and click the application icon. From the contextual menu choose Open.

Step Two
A popup will appear asking you to confirm this action. Click the Open button.
Comment by Anil Kumar [ 31/May/13 ]
We need to address the signing key for both Windows and Mac; deferring this to the next release.
Comment by Dipti Borkar [ 08/Aug/13 ]
Please let's make sure this is fixed in 2.2.
Comment by Phil Labee [ 16/Aug/13 ]
New keys will be created using a new account.
Comment by Phil Labee [ 20/Aug/13 ]
iOS Apps
--------------
Certificates:
  Production:
    "Couchbase, Inc." type=iOS Distribution expires Aug 12, 2014

    ~buildbot/Desktop/appledeveloper.couchbase.com/certs/ios/ios_distribution_appledeveloper.couchbase.com.cer

Identifiers:
  App IDS:
    "Couchbase Server" id=com.couchbase.*

Provisioning Profiles:
  Distribution:
    "appledeveloper.couchbase.com" type=Distribution

  ~buildbot/Desktop/appledeveloper.couchbase.com/profiles/ios/appledevelopercouchbasecom.mobileprovision
Comment by Phil Labee [ 20/Aug/13 ]
Mac Apps
--------------
Certificates:
  Production:
    "Couchbase, Inc." type=Mac App Distribution (Aug,15,2014)
    "Couchbase, Inc." type=Developer ID installer (Aug,16,2014)
    "Couchbase, Inc." type=Developer ID Application (Aug,16,2014)
    "Couchbase, Inc." type=Mac App Distribution (Aug,15,2014)

     ~buildbot/Desktop/appledeveloper.couchbase.com/certs/mac_app/mac_app_distribution.cer
     ~buildbot/Desktop/appledeveloper.couchbase.com/certs/mac_app/developerID_installer.cer
     ~buildbot/Desktop/appledeveloper.couchbase.com/certs/mac_app/developererID_application.cer
     ~buildbot/Desktop/appledeveloper.couchbase.com/certs/mac_app/mac_app_distribution-2.cer

Identifiers:
  App IDs:
    "Couchbase Server" id=couchbase.com.* Prefix=N2Q372V7W2
    "Coucbase Server adhoc" id=couchbase.com.* Prefix=N2Q372V7W2

Provisioning Profiles:
  Distribution:
    "appstore.couchbase.com" type=Distribution
    "Couchbase Server adhoc" type=Distribution

     ~buildbot/Desktop/appledeveloper.couchbase.com/profiles/appstorecouchbasecom.privisioningprofile
     ~buildbot/Desktop/appledeveloper.couchbase.com/profiles/Couchbase_Server_adhoc.privisioningprofile

Comment by Phil Labee [ 21/Aug/13 ]

As of build 2.2.0-806 the app is signed by a new provisioning profile
Comment by Phil Labee [ 22/Aug/13 ]
 Install version 2.2.0-806 on a macosx 10.8 machine that has never had Couchbase Server installed, which has the security setting to require applications to be signed with a developer ID.
Comment by Phil Labee [ 22/Aug/13 ]
please assign to tester
Comment by Maria McDuff (Inactive) [ 22/Aug/13 ]
just tried this against newest build 809:
still getting restriction message. see attached.
Comment by Maria McDuff (Inactive) [ 22/Aug/13 ]
restriction still exists.
Comment by Maria McDuff (Inactive) [ 28/Aug/13 ]
verified in rc1 (build 817). still not fixed. getting same msg:
“Couchbase Server” can’t be opened because it is from an unidentified developer.
Your security preferences allow installation of only apps from the Mac App Store and identified developers.

Work Around:
Step One
Hold down the Control key and click the application icon. From the contextual menu choose Open.

Step Two
A popup will appear asking you to confirm this action. Click the Open button.
Comment by Phil Labee [ 03/Sep/13 ]
Need to create new certificates to replace these that were revoked:

Certificate: Mac Development
Team Name: Couchbase, Inc.

Certificate: Mac Installer Distribution
Team Name: Couchbase, Inc.

Certificate: iOS Development
Team Name: Couchbase, Inc.

Certificate: iOS Distribution
Team Name: Couchbase, Inc.
Comment by Maria McDuff (Inactive) [ 18/Sep/13 ]
candidate for 2.2.1 bug fix release.
Comment by Dipti Borkar [ 28/Oct/13 ]
Is this going to make it into 2.5? We seem to keep deferring it.
Comment by Phil Labee [ 29/Oct/13 ]
cannot test changes with installer that fails
Comment by Phil Labee [ 11/Nov/13 ]
Installed certs as buildbot and signed app with "(recommended) 3rd Party Mac Developer Application", producing

    http://factory.hq.couchbase.com//couchbase_server_2.5.0_MB-7250-001.zip

Signed with "(Oct 30) 3rd Party Mac Developer Application: Couchbase, Inc. (N2Q372V7W2)", producing

    http://factory.hq.couchbase.com//couchbase_server_2.5.0_MB-7250-002.zip

These zip files were made on the command line, not as a result of the make command. They are 2.5G in size, so they obviously include more than the zip files produced by the make command.

Both versions of the app appear to be signed correctly!

Note: cannot run make command from ssh session. Must Remote Desktop in and use terminal shell natively.
Comment by Phil Labee [ 11/Nov/13 ]
Finally, some progress: if the zip file is made using the --symlinks argument, the app appears to be unsigned. If the symlinked files are included as copies, the app appears to be signed correctly.

The zip file with symlinks is 60M, while the zip file with copies of the files is 2.5G, more than 40X the size.
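For reference, the difference is in how zip handles the symlinks; a minimal sketch (app and archive names are illustrative):

    zip -r --symlinks couchbase_server.zip "Couchbase Server.app"   # stores links as links; small archive, but the app fails signature checks
    zip -r couchbase_server.zip "Couchbase Server.app"              # follows links and stores copies; much larger, but the signature verifies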
Comment by Phil Labee [ 25/Nov/13 ]
Fixed in 2.5.0-950
Comment by Dipti Borkar [ 25/Nov/13 ]
Maria, can QE please verify this?
Comment by Wayne Siu [ 28/Nov/13 ]
Tested with build 2.5.0-950. Still see the warning box (attached).
Comment by Wayne Siu [ 19/Dec/13 ]
Phil,
Can you give an update on this?
Comment by Ashvinder Singh [ 14/Jan/14 ]
I tested the code signature with the Apple utility "spctl -a -v /Applications/Couchbase\ Server.app/" and got the output:
>>> /Applications/Couchbase Server.app/: a sealed resource is missing or invalid

also tried running the command:

codesign -dvvvv /Applications/Couchbase\ Server.app
>>>
Executable=/Applications/Couchbase Server.app/Contents/MacOS/Couchbase Server
Identifier=com.couchbase.couchbase-server
Format=bundle with Mach-O thin (x86_64)
CodeDirectory v=20100 size=639 flags=0x0(none) hashes=23+5 location=embedded
Hash type=sha1 size=20
CDHash=868e4659f4511facdf175b44a950b487fa790dc4
Signature size=4355
Authority=3rd Party Mac Developer Application: Couchbase, Inc. (N2Q372V7W2)
Authority=Apple Worldwide Developer Relations Certification Authority
Authority=Apple Root CA
Signed Time=Jan 8, 2014, 10:59:16 AM
Info.plist entries=31
Sealed Resources version=1 rules=4 files=5723
Internal requirements count=1 size=216

It looks like the code signature is present but became invalid as new files were added/modified in the project. I suggest the build team rebuild and add the code signature again.
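A re-signing pass along these lines should refresh the seal (a sketch; the identity string is taken from the codesign output above, and --force replaces the stale signature):

    codesign --force --deep --sign "3rd Party Mac Developer Application: Couchbase, Inc. (N2Q372V7W2)" "/Applications/Couchbase Server.app"
    spctl -a -v "/Applications/Couchbase Server.app"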
Comment by Phil Labee [ 17/Apr/14 ]
need VM to clone for developer experimentation
Comment by Anil Kumar [ 18/Jul/14 ]
Any update on this? We need this for 3.0.0 GA.

Please update the ticket.

Triage - July 18th
Comment by Wayne Siu [ 02/Aug/14 ]
Siri is helping to figure out what the next step is.
Comment by Anil Kumar [ 13/Aug/14 ]
Jens - Assigning as per Ravi's request.
Comment by Chris Hillery [ 13/Aug/14 ]
Jens requested assistance in setting up a MacOS development environment for building Couchbase. Phil (or maybe Siri?), can you help him with that?
Comment by Phil Labee [ 13/Aug/14 ]
The production macosx builder has been cloned:

    10.6.2.159 macosx-x64-server-builder-01-clone

if you want to use your own host, see:

    http://hub.internal.couchbase.com/confluence/display/CR/How+to+Setup+a+MacOSX+Server+Build+Node
Comment by Jens Alfke [ 15/Aug/14 ]
Here are the Apple docs on building apps signed with a Developer ID: https://developer.apple.com/library/mac/documentation/IDEs/Conceptual/AppDistributionGuide/DistributingApplicationsOutside/DistributingApplicationsOutside.html#//apple_ref/doc/uid/TP40012582-CH12-SW2

I've got everything configured, but the build process fails at the final step, after I press the Distribute button in the Organizer window. I get a very uninformative error alert, "Code signing operation failed / Check that the identity you selected is valid."

I've asked for help on the xcode-users mailing list. Blocked until I hear something back.
Comment by Anil Kumar [ 18/Aug/14 ]
Triage - Not blocking 3.0 RC1
Comment by Phil Labee [ 25/Aug/14 ]
from Apple Developer mail list:

Dear Developer,

With the release of OS X Mavericks 10.9.5, the way that OS X recognizes signed apps will change. Signatures created with OS X Mountain Lion 10.8.5 or earlier (v1 signatures) will be obsoleted and Gatekeeper will no longer recognize them. Users may receive a Gatekeeper warning and will need to exempt your app to continue using it. To ensure your apps will run without warning on updated versions of OS X, they must be signed on OS X Mavericks 10.9 or later (v2 signatures).

If you build code with an older version of OS X, use OS X Mavericks 10.9 or later to sign your app and create v2 signatures using the codesign tool. Structure your bundle according to the signature evaluation requirements for OS X Mavericks 10.9 or later. Considerations include:

 * Signed code should only be placed in directories where the system expects to find signed code.

 * Resources should not be located in directories where the system expects to find signed code.

 * The --resource-rules flag and ResourceRules.plist are not supported.

Make sure your current and upcoming releases work properly with Gatekeeper by testing on OS X Mavericks 10.9.5 and OS X Yosemite 10.10 Developer Preview 5 or later. Apps signed with v2 signatures will work on older versions of OS X.

For more details, read “Code Signing changes in OS X Mavericks” and “Changes in OS X 10.9.5 and Yosemite Developer Preview 5” in “OS X Code Signing In Depth”:

    http://c.apple.com/r?v=2&la=en&lc=us&a=EEjRsqZNfcheZauIAhlqmxVG35c6HJuf50mGu47LWEktoAjykEJp8UYqbgca3uWG&ct=AJ0T0e3y2W

Best regards,
Apple Developer Technical Support
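Whether an app carries a v1 or v2 signature can be checked with codesign (a sketch; per Apple's code-signing notes, CodeDirectory v=20100 indicates a v1 signature, while v=20200 or higher indicates a v2 signature created on 10.9+):

    codesign -dv "/Applications/Couchbase Server.app"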
Comment by Phil Labee [ 28/Aug/14 ]
change to buildbot-internal to unlock keychain before running make and lock after:

    http://review.couchbase.org/#/c/41028/

change to couchdbx-app to sign app, on dev branch "plabee/MB-7250":

    http://review.couchbase.org/#/c/41025/

change to manifest to use this dev branch for 3.0.1 builds:

    http://review.couchbase.org/#/c/41026/
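The keychain wrapping presumably amounts to something like this (a sketch; keychain path, password variable, and make invocation are illustrative):

    security unlock-keychain -p "$KEYCHAIN_PASSWORD" ~/Library/Keychains/login.keychain
    make
    security lock-keychain ~/Library/Keychains/login.keychain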
Comment by Wayne Siu [ 29/Aug/14 ]
Moving it to 3.0.1.




[MB-12204] New doc-system does not have anchors Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The support team uses anchors all the time to link customers directly to the section that has the information they require.

I know that we have broken a number of sections out into their own pages, but there are still some long pages, for example:

http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Misc/security-client-ssl.html


It would be good if we could link the customer directly to: "Configuring the PHP client for SSL"

I have marked this as a blocker as it will affect the way the support team works today.
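With anchors in place, a fragment link would take the customer straight there (the anchor id shown is illustrative):

    http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Misc/security-client-ssl.html#configuring-the-php-client-for-ssl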




[MB-12126] there is not manifest file on windows 3.0.1-1253 Created: 03/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Thuan Nguyen Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 2008 r2 64-bit

Attachments: PNG File ss 2014-09-03 at 12.05.41 PM.png    
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Installed Couchbase Server 3.0.1-1253 on Windows Server 2008 R2 64-bit. There is no manifest file in the directory c:\Program Files\Couchbase\Server\



 Comments   
Comment by Chris Hillery [ 03/Sep/14 ]
Also true for 3.0 RC2 build 1205.
Comment by Chris Hillery [ 03/Sep/14 ]
(Side note: While fixing this, log onto build slaves and delete stale "server-overlay/licenses.tgz" file so we stop shipping that)
Comment by Anil Kumar [ 17/Sep/14 ]
Ceej - Any update on this?
Comment by Chris Hillery [ 18/Sep/14 ]
No, not yet.




[MB-12185] update to "couchbase" from "membase" in gerrit mirroring and manifests Created: 14/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.0, 2.5.1, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Blocker
Reporter: Matt Ingenthron Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-8297 Some key projects are still hosted at... Resolved

 Description   
One of the key components of Couchbase is still only at github.com/membase and not at github.com/couchbase. I think it's okay to mirror to both locations (not that there's an advantage), but for sure it should be at couchbase and the manifest for Couchbase Server releases should be pointing to Couchbase.

I believe the steps here are as follows:
- Set up a github.com/couchbase/memcached project (I've done that)
- Update gerrit's commit hook to update that repository
- Change the manifests to start using that repository (a sketch of this change follows below)

Assigning this to build as a component, as gerrit is handled by the build team. Then I'm guessing it'll need to be handed over to Trond or another developer to do the manifest change once gerrit is up to date.

Since memcached is slow changing now, perhaps the third item can be done earlier.
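For illustration, the manifest change in the third step would amount to repointing the project's remote in the repo-manifest XML (a sketch; the revision attribute is illustrative):

    <remote name="couchbase" fetch="git://github.com/couchbase/"/>
    <project name="memcached" remote="couchbase" revision="master"/>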

 Comments   
Comment by Chris Hillery [ 15/Sep/14 ]
Actually manifests are owned by build team too so I will do both parts.

However, the manifest for the hopefully-final release candidate already exists, and I'm a teensy bit wary about changing it after the fact. The manifest change may need to wait for 3.0.1.
Comment by Matt Ingenthron [ 15/Sep/14 ]
I'll leave it to you to work out how to fix it, but I'd just point out that manifest files are mutable.
Comment by Chris Hillery [ 15/Sep/14 ]
The manifest we build from is mutable. The historical manifests recording what we have already built really shouldn't be.
Comment by Matt Ingenthron [ 15/Sep/14 ]
True, but they are. :) That was half me calling back to our discussion about tagging and mutability of things in the Mountain View office. I'm sure you remember that late night conversation.

If you can help here Ceej, that'd be great. I'm just trying to make sure we have the cleanest project possible out there on the web. One wart less will bring me to 999,999 or so. :)
Comment by Trond Norbye [ 15/Sep/14 ]
Just a FYI, we've been ramping up the changes to memcached, so it's no longer a slow moving component ;-)
Comment by Matt Ingenthron [ 15/Sep/14 ]
Slow moving w.r.t. 3.0.0 though, right? That means the current github.com/couchbase/memcached probably has the commit planned to be released, so it's low risk to update github.com/couchbase/manifest with the couchbase repo instead of membase.

That's all I meant. :)
Comment by Trond Norbye [ 15/Sep/14 ]
_all_ components should be slow moving with respect to 3.0.0 ;)
Comment by Chris Hillery [ 16/Sep/14 ]
Matt, it appears that couchbase/memcached is a *fork* of membase/memcached, which is probably undesirable. We can actively rename the membase/memcached project to couchbase/memcached, and github will automatically forward requests from the old name to the new so it is seamless. It also means that we don't have to worry about migrating any commits, etc.

Does anything refer to couchbase/memcached already? Could we delete that one outright and then rename membase/memcached instead?
Comment by Matt Ingenthron [ 16/Sep/14 ]
Ah, that would be my fault. I propose deleting the couchbase/memcached and then transferring ownership from membase/memcached to couchbase/memcached. I think that's what you meant by "actively rename", right? Sounds like a great plan.

I think that's all in your hands Ceej, but I'd be glad to help if needed.

I still think in the interest of reducing warts, it'd be good to fix the manifest.
Comment by Chris Hillery [ 16/Sep/14 ]
I will do that (rename the repo), just please confirm explicitly that temporarily deleting couchbase/memcached won't cause the world to end. :)
Comment by Matt Ingenthron [ 16/Sep/14 ]
It won't since it didn't exist until this last Sunday when I created this ticket. If something world-ending happens as a result, I'll call it a bug to have depended on it. ;)
Comment by Chris Hillery [ 18/Sep/14 ]
I deleted couchbase/memcached and then transferred ownership of membase/memcached to couchbase. The original membase/memcached repository had a number of collaborators, most of which I think were historical. For now, couchbase/memcached only has "Owners" and "Robots" listed as collaborators, which is generally the desired configuration.

http://review.couchbase.org/#/c/41470/ proposes changes to the active manifests. I see no problem with committing that.

As for the historical manifests, there are two:

1. Sooner or later we will add a "released/3.0.0.xml" manifest to the couchbase/manifest repository, representing the exact SHAs which were built. I think it's probably OK to retroactively change the remote on that manifest since the two repositories are aliases for each other. This will affect any 3.0.0 hotfixes which are built, etc.

2. However, all of the already-built 3.0 packages (.deb / .rpm / .zip files) have embedded in them the manifest which was used to build them. Those, unfortunately, cannot be changed at this time. Doing so would require re-packaging the deliverables which have already undergone QE validation. While it is technically possible to do so, it would be a great deal of manual work, and IMHO a non-trivial and unnecessary risk. The only safe solution would be to trigger a new build, but in that case I would argue we would need to re-validate the deliverables, which I'm sure is a non-starter for PM. I'm afraid this particular sub-wart will need to wait for 3.0.1 to be fully addressed.
Comment by Matt Ingenthron [ 18/Sep/14 ]
Excellent, thanks Ceej. I think this is a great improvement, especially if 3.0.0's release manifest no longer references membase.

I'll leave it to the build team to manage, but I might suggest that gerrit and various other things pointing to membase should slowly change as well, in case someone decides someday to cancel the membase organization subscription to github.




[MB-12090] add stale=false semantic changes to dev guide Created: 28/Aug/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Matt Ingenthron Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged
Is this a Regression?: No

 Description   
Need to change the dev guide to explain the semantics change with the stale parameter.

 Comments   
Comment by Matt Ingenthron [ 28/Aug/14 ]
I could not find the 3.0 dev guide to write up something, so I've generated a diff based on the 2.5 dev guide. Note that much of that dev guide refers to the 3.0 admin guide section on views; I could not find that in the "dita" directory, so I could not contribute a change to the XML. I think this diff, together with what I put in MB-12052, should help.


diff --git a/content/couchbase-devguide-2.5/finding-data-with-views.markdown b/content/couchbase-devguide-2.5/finding-data-with-views.markdown
index 77735b9..811dff0 100644
--- a/content/couchbase-devguide-2.5/finding-data-with-views.markdown
+++ b/content/couchbase-devguide-2.5/finding-data-with-views.markdown
@@ -1,6 +1,6 @@
 # Finding Data with Views
 
-In Couchbase 2.1.0 you can index and query JSON documents using *views*. Views
+In Couchbase you can index and query JSON documents using *views*. Views
 are functions written in JavaScript that can serve several purposes in your
 application. You can use them to:
 
@@ -323,16 +323,25 @@ Forinformation about the sort order of indexes, see the
 [Couchbase Server Manual](http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/).
 
 The real-time nature of Couchbase Server means that an index can become outdated
-fairly quickly when new entries and updates occur. Couchbase Server generates
-the index when it is queried, but in the meantime more data can be added to the
-server and this information will not yet be part of the index. To resolve this,
-Couchbase SDKs and the REST API provide a `stale` parameter you use when you
-query a view. With this parameter you can indicate you will accept the most
-current index as it is, you want to trigger a refresh of the index and retrieve
-these results, or you want to retrieve the existing index as is but also trigger
-a refresh of the index. For instance, to query a view with the stale parameter
-using the Ruby SDK:
+fairly quickly when new entries and updates occur. Couchbase Server updates
+the index at the time the query is received if you supply the argument
+`false` to the `stale` parameter.
+
+<div class="notebox">
+<p>Note</p>
+<p>Starting with the 3.0 release, the "stale" view query argument
+"false" has been enhanced so it will consider all document changes
+which have been received at the time the query has been received. This
+means that use of the `durability requirements` or `observe` feature
+to block for persistence in application code before issuing the
+`false` stale query is no longer needed. It is recommended that you
+remove all such application level checks after completing the upgrade
+to the 3.0 release.
+</p>
+</div>
 
+For instance, to query a view with the stale parameter
+using the Ruby SDK:
 
 ```
 doc.recent_posts(:body => {:stale => :ok})
@@ -905,13 +914,14 @@ for(ViewRow row : result) {
 }
 ```
 
-Before we create a Couchbase client instance and connect to the server, we set a
-system property 'viewmode' to 'development' to put the view into production
-mode. Then we query our view and limit the number of documents returned to 20
-items. Finally when we query our view we set the `stale` parameter to FALSE to
-indicate we want to reindex and include any new or updated beers in Couchbase.
-For more information about the `stale` parameter and index updates, see Index
-Updates and the Stale Parameter in the
+Before we create a Couchbase client instance and connect to the
+server, we set a system property 'viewmode' to 'development' to put
+the view into production mode. Then we query our view and limit the
+number of documents returned to 20 items. Finally when we query our
+view we set the `stale` parameter to FALSE to indicate we want to
+consider any recent changes to documents. For more information about
+the `stale` parameter and index updates, see Index Updates and the
+Stale Parameter in the
 [Couchbase Server Manual](http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#couchbase-views-writing-stale).
 
 The last part of this code sample is a loop we use to iterate through each item
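For completeness, the new stale=false semantics can be exercised directly against the view REST API (a sketch; bucket, design document, and view names are illustrative):

    curl 'http://localhost:8092/default/_design/beer/_view/brewery_beers?stale=false&limit=20'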




[MB-9656] XDCR destination endpoints for "getting xdcr stats via rest" in url encoding Created: 29/Nov/13  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.1.0, 2.2.0, 2.1.1, 3.0, 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Blocker
Reporter: Patrick Varley Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: customer, supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/couchbase-manual-2.2/#getting-xdcr-stats-via-rest


 Description   
In our documentation the destination endpoints are not URL-encoded, where "/" becomes "%2F". This has misled customers. That section should be in the following format:

replications%2F[UUID]%2F[source_bucket]%2F[destination_bucket]%2Fdocs_written

If this change is made we should remove this line too:

You need to provide properly URL-encoded /[UUID]/[source_bucket]/[destination_bucket]/[stat_name]. To get the number of documents written:



 Comments   
Comment by Amy Kurtzman [ 16/May/14 ]
The syntax and example code in this whole REST section needs to be cleaned up and tested. It is a bigger job than just fixing this one.
Comment by Patrick Varley [ 17/Sep/14 ]
I fall down this hole again and so do another Support Engineer. We really need to get this fixed in all versions.

The 3.0 documentation has this problem too.
Comment by Ruth Harris [ 17/Sep/14 ]
Why are you suggesting that the slash in the syntax be %2F???
This is not a blocker.
Comment by Patrick Varley [ 18/Sep/14 ]
I believe this is a blocker as it has consumed a large amount of support's time on 3 separate occasions now. That also means 3 separate end-users have had issues with this documentation.

Because if you use unencoded slashes it does not work. Look at the examples further down the page.

This url works:
curl -u admin:password http://localhost:8091/pools/default/buckets/default/stats/replications%2F8ba6870d88cd72b3f1db113fc8aee675%2Fsource_bucket%2Fdestination_bucket%2Fdocs_written

This url does not:
 curl -u admin:password http://localhost:8091/pools/default/buckets/default/stats/replications/8ba6870d88cd72b3f1db113fc8aee675/source_bucket/destination_bucket/docs_written

That is pretty hard to work out from our documentation.




[MB-6972] distribute couchbase-server through yum and ubuntu package repositories Created: 19/Oct/12  Updated: 19/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.1.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Anil Kumar Assignee: Phil Labee
Resolution: Unresolved Votes: 3
Labels: devX
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks MB-8693 [Doc] distribute couchbase-server thr... Reopened
blocks MB-7821 yum install couchbase-server from cou... Resolved
Duplicate
duplicates MB-2299 Create signed RPM's Resolved
is duplicated by MB-9409 repository for deb packages (debian&u... Resolved
Flagged:
Release Note

 Description   
This helps us in handling dependencies that are needed for Couchbase Server. The SDK team has already implemented this for various SDK packages.

We might have to make some changes to our packaging metadata to work with this scheme.

 Comments   
Comment by Steve Yen [ 26/Nov/12 ]
to 2.0.2 per bug-scrub

first step is to do the repositories?
Comment by Steve Yen [ 26/Nov/12 ]
back to 2.0.1, per bug-scrub
Comment by Farshid Ghods (Inactive) [ 19/Dec/12 ]
Phil,
please sync up with Farshid and get instructions that Sergey and Pavel sent
Comment by Farshid Ghods (Inactive) [ 28/Jan/13 ]
We should resolve this task once 2.0.1 is released.
Comment by Dipti Borkar [ 29/Jan/13 ]
Have we figured out the upgrade process moving forward? For example, from 2.0.1 to 2.0.2, or 2.0.1 to 2.1?
Comment by Jin Lim [ 04/Feb/13 ]
Please ensure that we also confirm/validate the upgrade process moving from 2.0.1 to 2.0.2. Thanks.
Comment by Phil Labee [ 06/Feb/13 ]
Now have DEB repo working, but another issue has come up: We need to distribute the public key so that users can install the key before running apt-get.

wiki page has been updated.
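Consuming the repo would then look something like this on Ubuntu (a sketch; the key URL and distribution names are illustrative):

    curl http://packages.couchbase.com/ubuntu/couchbase.key | sudo apt-key add -
    echo "deb http://packages.couchbase.com/ubuntu precise precise/main" | sudo tee /etc/apt/sources.list.d/couchbase.list
    sudo apt-get update && sudo apt-get install couchbase-server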
Comment by kzeller [ 14/Feb/13 ]
Added to 2.0.1 RN as:

Fix:

We now provide Couchbase Server as yum and Debian package repositories.
Comment by Matt Ingenthron [ 09/Apr/13 ]
What are the public URLs for these repositories? This was mentioned in the release notes here:
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-server-rn_2-0-0l.html
Comment by Matt Ingenthron [ 09/Apr/13 ]
Reopening, since this isn't documented that I can find. Apologies if I'm just missing it.
Comment by Dipti Borkar [ 23/Apr/13 ]
Anil, can you work with Phil to see what are the next steps here?
Comment by Anil Kumar [ 24/Apr/13 ]
Yes, I'll discuss with Phil and will update here with details.
Comment by Tim Ray [ 28/Apr/13 ]
Could we either remove the note about yum/deb repos in the release notes or get those repo locations / sample files / keys added to public pages? The only links that seem like they might contain the info point to internal pages I don't have access to.
Comment by Anil Kumar [ 14/May/13 ]
Thanks Tim, we have removed it from the release notes. We will add instructions about yum/deb repo locations/files/keys to the documentation once it's available. Thanks!
Comment by kzeller [ 14/May/13 ]
Removing duplicate ticket:

http://www.couchbase.com/issues/browse/MB-7860
Comment by h0nIg [ 24/Oct/13 ]
Any update? Maybe I created a duplicate issue: http://www.couchbase.com/issues/browse/MB-9409 but it seems that the repositories are outdated on http://hub.internal.couchbase.com/confluence/display/CR/How+to+Use+a+Linux+Repo+--+debian
Comment by Sriram Melkote [ 22/Apr/14 ]
I tried to install on Debian today. It failed badly. One .deb package didn't match the libc version of stable. The other didn't match the openssl version. Changing libc or openssl is simply not an option for someone using Debian stable because it messes with the base OS too deeply. So as of 4/23/14, we don't have support for Debian.
Comment by Sriram Melkote [ 22/Apr/14 ]
Anil, we have accumulated a lot of input in this bug. I don't think this will realistically go anywhere for 3.0 unless we define specific goals and a considered platform support matrix expansion. Can you please define the goal for 3.0 more precisely?
Comment by Matt Ingenthron [ 22/Apr/14 ]
+1 on Siri's comments. Conversations I had with both Ubuntu (who recommend their PPAs) and Red Hat experts (who recommend setting up a repo or getting into EPEL or the like) indicated that's the best way to ensure coverage of all OSs. Binary packages built on one OS and deployed on another are risky and run into dependency issues.
Comment by Anil Kumar [ 28/Apr/14 ]
This ticket is specifically for distributing DEB and RPM packages through APT and YUM repos. We have another ticket for supporting the Debian platform, MB-10960.
Comment by Anil Kumar [ 23/Jun/14 ]
Assigning ticket to Tony for verification.
Comment by Phil Labee [ 21/Jul/14 ]
Need to do before closing:

[ ] capture keys and process used for build that is currently posted (3.0.0-628), update tools and keys of record in build repo and wiki page
[ ] distribute 2.5.1 and 3.0.0-beta1 builds using same process, testing update capability
[ ] test update from 2.0.0 to 2.5.1 to 3.0.0
Comment by Phil Labee [ 21/Jul/14 ]
re-opening to assign to sprint to prepare the distribution repos for testing
Comment by Wayne Siu [ 30/Jul/14 ]
Phil,
has build 3.0.0-973 been updated in the repos for beta testing?
Comment by Wayne Siu [ 29/Aug/14 ]
Phil,
Please refresh it with build 3.0.0-1205. Thanks.
Comment by Phil Labee [ 04/Sep/14 ]
Due to loss of private keys used to post 3.0.0-628, created new key pairs. Upgrade testing was never done, so starting with 2.5.1 release version (2.5.1-1100).

upload and test using location http://packages.couchbase.com/linux-repos/TEST/:

  [X] ubuntu-12.04 x86_64
  [X] ubuntu-10.04 x86_64

  [X] centos-6-x86_64
  [X] centos-5-x86_64
Comment by Anil Kumar [ 04/Sep/14 ]
Phil / Wayne - Not sure what's happening here, please clarify.
Comment by Wayne Siu [ 16/Sep/14 ]
Please refresh with the build 3.0.0-1209.
Comment by Phil Labee [ 17/Sep/14 ]
upgrade to 3.0.0-1209 using test location:

    s3://packages.couchbase.com/linux-repos/TEST/

  [X] ubuntu-12.04 x86_64
  [X] ubuntu-10.04 x86_64

  [X] centos-6-x86_64
  [X] centos-5-x86_64

Comment by Phil Labee [ 19/Sep/14 ]
now pushing 3.0.0-1209 to production location:

    s3://packages.couchbase.com/releases/couchbase-server/

  [X] centos-6-x86_64
  [X] centos-5-x86_64

    Please verify with instructions at: http://hub.internal.couchbase.com/confluence/display/CR/How+to+Download+from+a+Linux+Repo+--+RPM

  [X] ubuntu-12.04 x86_64
  [X] ubuntu-10.04 x86_64

    Please verify with instructions at: http://hub.internal.couchbase.com/confluence/display/CR/How+to+Download+from+a+Linux+Repo+--+Ubuntu

  [ ] debian-7-x86_64





[MB-11917] One node slow probably due to the Erlang scheduler Created: 09/Aug/14  Updated: 22/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Volker Mische Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File crash_toy_701.rtf     PNG File leto_ssd_300-1105_561_build_init_indexleto_ssd_300-1105_561172.23.100.31beam.smp_cpu.png    
Issue Links:
Duplicate
duplicates MB-12200 Seg fault during indexing on view-toy... Resolved
duplicates MB-9822 One of nodes is too slow during indexing Closed
is duplicated by MB-12183 View Query Thruput regression compare... Resolved
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
One node is slow; that's probably due to the "scheduler collapse" bug in the Erlang VM R16.

I will try to find a way to verify that it is really the scheduler and not some other problem. This is basically a duplicate of MB-9822, but that bug has a long history, hence I dare to create a new one.

 Comments   
Comment by Volker Mische [ 09/Aug/14 ]
I forgot to add that our issue sounds exactly like that one: http://erlang.org/pipermail/erlang-questions/2012-October/069503.html
Comment by Sriram Melkote [ 11/Aug/14 ]
Upgrading to blocker as this is doubling initial index time in recent runs on showfast.
Comment by Volker Mische [ 12/Aug/14 ]
I verified that it's the "scheduler collapse". Have a look at the chart I've attached (it's from [1] [172.23.100.31] beam.smp_cpu). It starts with a utilization of around 400%. At around 120 I reduced the online schedulers to 1 (by running erlang:system_flag(schedulers_online, 1) via a remote shell). I then increased schedulers_online again at around 150, back to the original value of 24. You can see that it got back to normal.

[1]: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1105_561_build_init_index
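For reference, the toggle described above can be applied from a remote shell attached to the slow node (a sketch; the node name and cookie are illustrative, and 24 is this cluster's original scheduler count):

    erl -name debug@127.0.0.1 -setcookie <cookie> -remsh ns_1@172.23.100.31

then, in the Erlang shell:

    erlang:system_flag(schedulers_online, 1).
    erlang:system_flag(schedulers_online, 24).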
Comment by Volker Mische [ 12/Aug/14 ]
I would try to run on R16 and see how often it happens with COUCHBASE_NS_SERVER_VM_EXTRA_ARGS=["+swt", "low", "+sfwi", "100"] set (as suggested in MB-9822 [1]).

[1]: https://www.couchbase.com/issues/browse/MB-9822?focusedCommentId=89219&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-89219
Comment by Pavel Paulau [ 12/Aug/14 ]
We agreed to try:

+sfwi 100/500 and +sbwt long

Will run test 5 times with these options.
Comment by Pavel Paulau [ 13/Aug/14 ]
5 runs of tests/index_50M_dgm.test with -sfwi 100 -sbwt long:

http://ci.sc.couchbase.com/job/leto-dev/19/
http://ci.sc.couchbase.com/job/leto-dev/20/
http://ci.sc.couchbase.com/job/leto-dev/21/
http://ci.sc.couchbase.com/job/leto-dev/22/
http://ci.sc.couchbase.com/job/leto-dev/23/

3 normal runs, 2 with slowness.
Comment by Volker Mische [ 13/Aug/14 ]
I see only one slow run (22): http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1137_6a0_build_init_index

But still :-/
Comment by Pavel Paulau [ 13/Aug/14 ]
See (20), incremental indexing: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1137_ed9_build_incr_index
Comment by Volker Mische [ 13/Aug/14 ]
Oh, I was only looking at the initial building.
Comment by Volker Mische [ 13/Aug/14 ]
I got a hint in the #erlang IRC channel. I'll try to use the erlang:bump_reductions(2000) and see if that helps.
Comment by Volker Mische [ 13/Aug/14 ]
Let's see if bumping the reductions makes it work: http://review.couchbase.org/40591
Comment by Aleksey Kondratenko [ 13/Aug/14 ]
merged that commit.
Comment by Pavel Paulau [ 13/Aug/14 ]
Just tested build 3.0.0-1150, rebalance test but with initial indexing phase.

2 nodes are super slow and utilize only single core.
Comment by Volker Mische [ 18/Aug/14 ]
I can't reproduce it locally. I tend towards closing this issue as "won't fix". We should really not have long-running NIFs.

I also think that it won't happen much under real workloads. And even if it does, the workaround would be to reduce the number of online schedulers to 1 and immediately increase it back to the original number.
Comment by Volker Mische [ 18/Aug/14 ]
Assigning to Siri to make the call on whether we close it or not.
Comment by Anil Kumar [ 18/Aug/14 ]
Triage - Not blocking 3.0 RC1
Comment by Raju Suravarjjala [ 19/Aug/14 ]
Triage: Siri will put additional information and this bug is being retargeted to 3.0.1
Comment by Sriram Melkote [ 19/Aug/14 ]
Folks, for too long we've had trouble that gets pinned to our NIFs. In 3.5, let's solve them with whatever is the correct Erlang approach to running heavy, high-performance code. Port, or reporting reductions, or moving to R17 with dirty schedulers, or some other option I missed; whatever is the best solution, let us implement it in 3.5 and be done.
Comment by Volker Mische [ 09/Sep/14 ]
I think we should close this issue and rather create a new one for whatever we come up with (e.g. the async mapreduce NIF).
Comment by Harsha Havanur [ 10/Sep/14 ]
Toy Build for this change at
http://latestbuilds.hq.couchbase.com/couchbase-server-community_ubunt12-3.0.0-toy-hhs-x86_64_3.0.0-702-toy.deb

Review in progress at
http://review.couchbase.org/#/c/41221/4
Comment by Harsha Havanur [ 12/Sep/14 ]
Please find the updated toy build for this:
http://latestbuilds.hq.couchbase.com/couchbase-server-community_ubunt12-3.0.0-toy-hhs-x86_64_3.0.0-704-toy.deb
Comment by Sriram Melkote [ 12/Sep/14 ]
Another occurrence of this, MB-12183.

I'm making this a blocker.
Comment by Harsha Havanur [ 13/Sep/14 ]
Centos build at
http://latestbuilds.hq.couchbase.com/couchbase-server-community_cent64-3.0.0-toy-hhs-x86_64_3.0.0-700-toy.rpm
Comment by Ketaki Gangal [ 16/Sep/14 ]
Filed bug MB-12200 for this toy-build
Comment by Ketaki Gangal [ 17/Sep/14 ]
Attaching stack from toy-build 701: crash_toy_701.rtf

Access to machine is as mentioned previously on MB-12200.
Comment by Harsha Havanur [ 19/Sep/14 ]
We are facing 2 issues with the async NIF implementation.
1) Loss of signals leading to deadlock between enqueue and dequeue on the queues.
I suspect the enif mutex and condition variables. I could reproduce the deadlock scenario on CentOS, which potentially points to both producer and consumer (enqueue and dequeue) in our case going to sleep due to not handling condition variable signals correctly.
To address this issue, I have replaced the enif mutex and condition variables with their C++ STL counterparts. This seems to fix the deadlock situation.

2) Memory getting freed by the terminator task while the context is alive during mapDoc.
This is still work in progress; I will update once I have a solution.
Comment by Harsha Havanur [ 21/Sep/14 ]
The segmentation fault is probably due to termination of the Erlang process calling map_doc. This triggers the destructor, which cleans up the v8 context while the task is still in the queue. Will attempt a fix for this.
Comment by Harsha Havanur [ 22/Sep/14 ]
I have fixed both issues in this build
http://latestbuilds.hq.couchbase.com/couchbase-server-community_cent64-3.0.0-toy-hhs-x86_64_3.0.0-709-toy.rpm
I am running systests as Ketaki suggested on VMs 10.6.2.164, 165, 168, 171, 172, 194, 195. Currently rebalance is in progress.

For the deadlock situation, the resolution was to broadcast the condition signal to wake up all waiting threads instead of waking only one of them.
For the segmentation fault, the resolution was to complete the map task for the context before it is cleaned up by the destructor when the Erlang process calling the map task terminates or crashes.

Please use this build for further functional and performance verification. Thanks,




[MB-12236] update go-couchbase to retrieve node types / roles Created: 23/Sep/14  Updated: 23/Sep/14  Due: 23/Oct/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-alpha
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Gerald Sangudi Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Alk mentioned that NS server has some updated protocols / APIs / services that include node types / roles in the cluster map. When you get a chance, please update go-couchbase to use these. We will need the ability to retrieve query cluster topology for our JDBC clients.
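For reference, the cluster map in question is what ns_server already serves over REST (a sketch; credentials are illustrative):

    curl -s -u Administrator:password http://127.0.0.1:8091/pools/default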





[MB-4593] Windows Installer hangs on "Computing Space Requirements" Created: 27/Dec/11  Updated: 23/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: installer
Affects Version/s: 2.0-developer-preview-3, 2.0-developer-preview-4
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Bin Cui Assignee: Bin Cui
Resolution: Unresolved Votes: 3
Labels: windows, windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7 Ultimate 64. Sony Vaio, i3 with 4GB RAM and 200 GB of 500 GB free. Also on a Sony Vaio, Windows 7 Ultimate 64, i7, 6 GB RAM and a 750GB drive with about 600 GB free.

Attachments: PNG File couchbase-installer.png     PNG File image001.png     PNG File ss 2014-08-28 at 4.16.09 PM.png    
Triage: Triaged

 Description   
When installing the Community Server 2.0 DP3 on Windows, the installer hangs on the "Computing space requirements" screen. There is no additional feedback from the installer. After 90-120 minutes or so, it does move forward and complete. The same issue was reported on Google Groups a few months back - http://groups.google.com/group/couchbase/browse_thread/thread/37dbba592a9c150b/f5e6d80880f7afc8?lnk=gst&q=msi.

Executable: couchbase-server-community_x86_64_2.0.0-dev-preview-3.setup.exe

WORKAROUND IN 3.0 - Create a registry key HKLM\SOFTWARE\Couchbase, name=SkipVcRuntime, type=DWORD, value=1 to skip installing VC redistributable installation which is causing this issue. If VC redistributable is necessary, it must be installed manually if the registry key is set to skip automatic install of it.


 Comments   
Comment by Filip Stas [ 23/Feb/12 ]
Is there any solution for this? I'm experiencing the same problem. Running the unpacked msi does not seem to work because the InstallShield setup has been configured to require installation through the exe.

Comment by Farshid Ghods (Inactive) [ 22/Mar/12 ]
from Bin:

Looks like it is related to the InstallShield engine. Maybe InstallShield tries to access the system registry and it is locked by another process. The suggestion is to shut down other running programs and try again if such a problem pops up.
Comment by Farshid Ghods (Inactive) [ 22/Mar/12 ]
we were unable to reproduce this on windows 2008 64-bit

the bug mentions this happened on windows 7 64-bit which is not a supported platform but that should not make any difference
Comment by Farshid Ghods (Inactive) [ 23/Mar/12 ]
From Bin:

Windows 7 is my dev environment, and I have no problem installing and testing it. From your description, I cannot tell whether it failed during the installation, or whether the installation finished but Couchbase Server cannot start.
 
If it is due to installshield failure, you can generate the log file for debugging as:
setup.exe /debuglog"C:\PathToLog\setupexe.log"
 
If Couchbase Server fails to start, the most likely reason is a missing or incompatible Microsoft runtime library. You can manually run service_start.bat under the bin directory and check what is going on. And you can run cbbrowse_log.bat to generate a log file for further debugging.
Comment by John Zablocki (Inactive) [ 23/Mar/12 ]
This is an installation only problem. There's not much more to it other than the installer hangs on the screen (see attachment).

However, after a failed install, I did get it to work by:

a) deleting C:\Program Files\Couchbase\*

b) deleting all registry keys with Couchbase Server left over from the failed install

c) rebooting

Next time I see this problem, I'll run it again with the /debuglog

I think the problem might be that a previous install of DP3 or DP4 (nightly build) failed and left some bits in place somewhere.
Comment by Steve Yen [ 05/Apr/12 ]
from Perry...
Comment by Thuan Nguyen [ 05/Apr/12 ]
I cannot reproduce this bug. I tested on Windows 7 Professional 64-bit and Windows Server 2008 64-bit.
Here are steps:
- Install couchbase server 2.0.0r-388 (dp3)
- Open web browser and go to initial setup in web console.
- Uninstall couchbase server 2.0.0r-388
- Install couchbase server 2.0.0dp4r-722
- Open web browser and go to initial setup in web console.
Install and uninstall of Couchbase Server go smoothly without any problem.
Comment by Bin Cui [ 25/Apr/12 ]
Maybe we need to get the installer verbose log file to get some clues.

setup.exe /verbose"c:\temp\logfile.txt"
Comment by John Zablocki (Inactive) [ 06/Jul/12 ]
Not sure if this is useful or not, but without fail, every time I encounter this problem, simply shutting down apps (usually Chrome for some reason) causes the hanging to stop. Right after closing Chrome, the C++ redistributable dialog pops open and installation completes.
Comment by Matt Ingenthron [ 10/Jul/12 ]
Workarounds/troubleshooting for this issue:


On installshield's website, there are similar problems reported for installshield. There are several possible reasons behind it:

1. The installation of the Microsoft C++ redistributable is blocked by some other running program, sometimes Chrome.
2. There are some remote network drives that are mapped to local system. Installshield may not have enough network privileges to access them.
3. Couchbase server was installed on the machine before and it was not totally uninstalled and/or removed. Installshield tried to recover from those old images.

To determine where to go next, run setup with debugging mode enabled:
setup.exe /debuglog"C:\temp\setupexe.log"

The contents of the log will tell you where it's getting stuck.
Comment by Bin Cui [ 30/Jul/12 ]
Matt's explanation should be included in the documentation and Q&A website. I reproduced the hanging problem during installation when the Chrome browser is running.
Comment by Farshid Ghods (Inactive) [ 30/Jul/12 ]
so does that mean the installer should wait until chrome and other browsers are terminated before proceeding ?

I see this as a very common approach: many installers ask the user to stop those applications, and if the user does not follow the instructions the setup process does not continue until these conditions are met.
Comment by Dipti Borkar [ 31/Jul/12 ]
Is there no way to fix this? At the least we need to provide an error or guidance that Chrome needs to be quit before continuing. Is Chrome the only one we have seen causing this problem?
Comment by Steve Yen [ 13/Sep/12 ]
http://review.couchbase.org/#/c/20552/
Comment by Steve Yen [ 13/Sep/12 ]
See CBD-593
Comment by Øyvind Størkersen [ 17/Dec/12 ]
Same bug when installing 2.0.0 (build-1976) on Windows 7. Stopping Chrome did not help, but killing the process "Logitech ScrollApp" (KhalScroll.exe) did.
Comment by Joseph Lam [ 13/Sep/13 ]
It's happening to me when installing 2.1.1 on Windows 7. What is this step for, and is it really necessary? I see that it happens after the files have been copied to the installation folder. Not entirely sure what it's computing space requirements for.
Comment by MikeOliverAZ [ 16/Nov/13 ]
Same problem on 2.2.0 x86_64. I have tried everything: closing down Chrome and Torch from Task Manager to ensure no other apps are competing, and removing registry entries, but there are so many; what a waste of my time. As noted above this doesn't seem to prevent writing the files under Program Files, so what is it doing? So I cannot install; it now complains it cannot upgrade when I run the installer again.

BS... giving up and going to MongoDB... it installs no sweat.

Comment by Sriram Melkote [ 18/Nov/13 ]
Reopening. Testing on VMs is a problem because they are all clones. We miss many problems like these.
Comment by Sriram Melkote [ 18/Nov/13 ]
Please don't close this bug until we have clear understanding of:

(a) What is the Runtime Library that we're trying to install that conflicts with all these other apps
(b) Why we need it
(c) A prioritized task to someone to remove that dependency on 3.0 release requirements

Until we have these, please do not close the bug.

We should not do any fixes on the lines of checking for known apps that conflict etc, as that is treating the symptom and not fixing the cause.
Comment by Bin Cui [ 18/Nov/13 ]
We install the Windows runtime library because the Erlang runtime libraries depend on it. Not just any runtime library, but the one that comes with the Erlang distribution package. Without it, or with an incompatible version, erl.exe won't run.

Instead of checking for any particular applications, the current solution is:
run an Erlang test script. If it runs correctly, no runtime library is installed. Otherwise, the installer has to install the runtime library.

Please see CBD-593.

Comment by Sriram Melkote [ 18/Nov/13 ]
My suggestion is that let us not attempt to install MSVCRT ourselves.

Let us check the library we need is present or not prior to starting the install (via appropriate registry keys).

If it is absent, let us direct the user to download and install it and exit.
Comment by Bin Cui [ 18/Nov/13 ]
The approach is not totally right. Even if the MSVCRT exists, we may still need to install it. The key here is the exact same MSVCRT package that comes with the Erlang distribution. We had problems before where, with the same version but a different build of MSVCRT installed, Erlang wouldn't run.

One possible solution is to ask user to download the msvcrt library from our website and make it a prerequisite for installing couchbase server.
Comment by Sriram Melkote [ 18/Nov/13 ]
OK. It looks like MS distributes some versions of VC runtime with the OS itself. I doubt that Erlang needs anything newer.

So let us rebuild Erlang and have it link to the OS supplied version of MSVCRT (i.e., msvcr70.dll) in Couchbase 3.0 onwards

In the meanwhile, let us point the user to the vcredist we ship in Couchbase 2.x versions and ask them to install it from there.
Comment by Steve Yen [ 23/Dec/13 ]
Saw this in the email inboxes...

From: Tal V
Date: December 22, 2013 at 1:19:36 AM PST
Subject: Installing Couchbase on Windows 7

Hi CouchBase support,
I would like to get your assist on an issue I’m having. I have a windows 7 machine on which I tried to install Couchbase, the installation is stuck on the “Computing space requirements”.
I tried several things without success:

1. I tried to download a new installation package.
2. I deleted all records of the software from the Registry.
3. I deleted the folder that was created under C:\Program Files\Couchbase
4. I restarted the computer.
5. Opened only the installation package.
6. Re-installed it again.
And again it was stuck on the same step.
What is the solution for it?

Thank you very much,


--
Tal V
Comment by Steve Yen [ 23/Dec/13 ]
Hi Bin,
Not knowing much about installshield here, but one idea - are there ways of forcibly, perhaps optionally, skipping the computing space requirements step? Some environment variable flag, perhaps?
Thanks,
Steve

Comment by Bin Cui [ 23/Dec/13 ]
This "Computing space requirements" is quite misleading. It happens at the post install step while GUI still shows that message. Within the step, we run the erlang test script and fails and the installer runs "vcredist.exe" for microsoft runtime library which gets stuck.

For the time being, the most reliable way is not to run this vcredist.exe from installer. Instead, we should provide a link in our download web site.

1. During installation, if we fail to run the Erlang test script, we can pop up a warning dialog and ask customers to download and run it after installation.
Comment by Bin Cui [ 23/Dec/13 ]
To work around the problem, we can instruct the customer to download the vcredist.exe and run it manually before setting up Couchbase Server. If the running environment is set up correctly, the installer will bypass that step.
Comment by Bin Cui [ 30/Dec/13 ]
Use windows registry key to install/skip the vcredist.exe step:

On 32bit windows, Installer will check HKEY_LOCAL_MACHINE\SOFTWARE\Couchbase\SkipVcRuntime
On 64bit windows, Installer will check HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Couchbase\SkipVcRuntime,
where SkipVcRuntime is a DWORD (32-bit) value.

When SkipVcRuntime is set to 1, the installer will skip the step that installs vcredist.exe. Otherwise, the installer will follow the same logic as before.
vcredist_x86.exe can be found in the root directory of Couchbase Server. It can be run as:
c:\<couchbase_root>\vcredist_x86.exe

http://review.couchbase.org/#/c/31501/
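For convenience, the key can be created from an elevated command prompt before running setup (a sketch of the 64-bit case):

    reg add "HKLM\SOFTWARE\Wow6432Node\Couchbase" /v SkipVcRuntime /t REG_DWORD /d 1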
Comment by Bin Cui [ 02/Jan/14 ]
Checked into branch 2.5: http://review.couchbase.org/#/c/31558/
Comment by Iryna Mironava [ 22/Jan/14 ]
Tested with Win 7 and Win Server 2008.
I am unable to reproduce this issue (build 2.0.0-1976; DP3 is no longer available).
Installed/uninstalled Couchbase several times.
Comment by Sriram Melkote [ 22/Jan/14 ]
Unfortunately, for this problem, if it did not reproduce, we can't say it is fixed. We have to find a machine where it reproduces and then verify a fix.

Anyway, no change made actually addresses the underlying problem (the registry key just gives a way to work around it when it happens), so reopening the bug and targeting for 3.0.
Comment by Sriram Melkote [ 23/Jan/14 ]
Bin - I just noticed that the Erlang installer itself (when downloaded from their website) installs the VC redistributable in non-silent mode. The Microsoft runtime installer dialog pops up, indicates it will install the VC redistributable, and then completes. Why do we run it in silent mode (and hence assume liability for it running properly)? Why do we not run the MSI in interactive mode like the ESL Erlang installer itself does?
Comment by Wayne Siu [ 05/Feb/14 ]
If we could get the information on the exact software version, it could be helpful.
From registry, Computer\HKLM\Software\Microsoft\WindowsNT\CurrentVersion
Comment by Wayne Siu [ 12/Feb/14 ]
Bin, looks like the erl.ini was locked when this issue happened.
Comment by Pavel Paulau [ 19/Feb/14 ]
Just happened to me in 2.2.0-837.
Comment by Anil Kumar [ 18/Mar/14 ]
Triaged by Don and Anil as per Windows Developer plan.
Comment by Bin Cui [ 08/Apr/14 ]
http://review.couchbase.org/#/c/35463/
Comment by Chris Hillery [ 13/May/14 ]
I'm new here, but it seems to me that vcredist_x64.exe does exactly the same thing as the corresponding MS-provided merge module for MSVC2013. If that's true, we should be able to just include that merge module in our project, and not need to fork out to install things. In fact, as of a few weeks ago, the 3.0 server installers are doing just that.

http://msdn.microsoft.com/en-us/library/dn501987.aspx

Is my understanding incomplete in some way?
Comment by Chris Hillery [ 14/May/14 ]
I can confirm that the most recent installers do install msvcr120.dll and msvcp120.dll in apparently the correct places, and the server can start with them. I *believe* this means that we no longer need to fork out vcredist_x64.exe, or have any of the InstallShield tricks to detect whether it is needed and/or skip installing it, etc. I'm leaving this bug open to both verify that the current merge module-based solution works, and to track removal of the unwanted code.
Comment by Sriram Melkote [ 16/May/14 ]
I've also verified that 3.0 build installed VCRT (msvcp100) is sufficient for Erlang R16.
Comment by Bin Cui [ 15/Sep/14 ]
Recently I happened to reproduce this problem on my own laptop. Using setup.exe /verbose"c:\temp\verbose.log", I generated a log file with more verbose debugging information. At the end of the file, it looks something like:

MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: OVERVIEW.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_admin\overview\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: BUCKETS.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_admin\buckets\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: MN_DIALOGS.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_dialogs\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: ABOUT.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_dialogs\about\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: ALLUSERSPROFILE , Object: Q:\
MSI (c) (C4:C0) [10:51:36:274]: PROPERTY CHANGE: Adding INSTALLLEVEL property. Its value is '1'.

This means the installer tried to populate some property values for the all-users profile after it had copied all data to the install location, even while it was showing the notorious "Computing space requirements" message.

For every installation, the installer uses the user temp directory to stage installer-related data. After I deleted (or renamed) the temp data under
c:\Users\<logonuser>\AppData\Temp and rebooted the machine, the problem was solved, at least on my laptop.

Conclusion:

1. After the installer copies files, it needs to set the all-users profiles. This action is synchronous: the installer waits and checks the exit code, so it will certainly hang if the action never returns.

2. This is an issue related to the setup environment, i.e. one caused by other running applications, etc.

Suggestion:

1. Stop any other browsers and applications when you install Couchbase.
2. Kill the installation process and uninstall the failed setup.
3. Delete/rename the temp location under c:\Users\<logonuser>\AppData\Temp
4. Reboot and try again.

Comment by Bin Cui [ 17/Sep/14 ]
It turns out this is really about the installation environment, not about a particular installation step.

I suggest we document the workaround.
Comment by Don Pinto [ 17/Sep/14 ]
Bin, some installers kill conflicting processes before installation starts so that installation can complete. Why can't we do this?

(Maybe using something like this - http://stackoverflow.com/questions/251218/how-to-stop-a-running-process-during-an-msi-based-un-install)

Thanks,
Don
Comment by Don Pinto [ 23/Sep/14 ]
Triaged by PM and QE -
Present a dialog that says: Do you want to stop <dependent process> or Continue?

If the user hits Stop, the installer should kill the dependent process.
If the user hits Continue, the installer should retry; if the dependent process is still open, continue showing the dialog.

Thanks,




[MB-12043] cbq crash after trying to delete a key Created: 21/Aug/14  Updated: 25/Sep/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
cbq> delete from my_bucket KEYS ['query-testa7480c4-0'];
PANIC: Expected plan.Operator instead of <nil>..

 Comments   
Comment by Gerald Sangudi [ 25/Sep/14 ]
Not yet implemented. On the way.




[MB-12256] No 32bit support for 3.x on RedHat CentOS Ubuntu Linux Created: 25/Sep/14  Updated: 25/Sep/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Ian McCloy Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The documentation pages for 3.x are incorrect.

http://docs.couchbase.com/prebuilt/couchbase-manual-3.0/Install/RHEL-install-intro.html
http://docs.couchbase.com/prebuilt/couchbase-manual-3.0/Install/Ubuntu-install.html

Both pages list 32-bit and 64-bit as supported. PM's documents state that only 64-bit will be supported from 3.x onwards, and indeed the beta is only available as 64-bit.

http://www.couchbase.com/download#beta

 Comments   
Comment by Anil Kumar [ 25/Sep/14 ]
Ruth, this is a "blocker" and needs to be fixed for 3.0 GA.
Comment by Ruth Harris [ 25/Sep/14 ]
Fixed in 3 sections: supported platforms, RHEL/CentOS, and Ubuntu.

BTW, Ceej indicated that Debian 7 is supported. The PM wiki page says 2-3 months after 3.0. Please clarify: is Debian 7 64-bit in 3.0?
Comment by Anil Kumar [ 25/Sep/14 ]
I have fixed the wiki http://hub.internal.couchbase.com/confluence/display/PM/Couchbase+Server+-+Supported+Platforms. Debian 7 is supported in 3.0.




[MB-12226] Rebalance operation hanged during online upgrade Created: 23/Sep/14  Updated: 26/Sep/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Sangharsh Agarwal Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Online upgrade from 2.0.1-170 to 3.0.1-1309.

CentOS 5 64 bit.

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: 10.3.121.199 : https://s3.amazonaws.com/bugdb/jira/MB-12226/2fde8bb1/10.3.121.199-8091-diag.txt.gz
10.3.121.199 : https://s3.amazonaws.com/bugdb/jira/MB-12226/80625d16/10.3.121.199-9222014-518-diag.zip
10.3.3.126 : https://s3.amazonaws.com/bugdb/jira/MB-12226/ce3a5cf4/10.3.3.126-8091-diag.txt.gz
10.3.3.126 : https://s3.amazonaws.com/bugdb/jira/MB-12226/d6a72dec/10.3.3.126-9222014-58-diag.zip
10.3.5.11 : https://s3.amazonaws.com/bugdb/jira/MB-12226/7deec852/10.3.5.11-8091-diag.txt.gz
10.3.5.11 : https://s3.amazonaws.com/bugdb/jira/MB-12226/f1015de1/10.3.5.11-9222014-511-diag.zip
10.3.5.60 : https://s3.amazonaws.com/bugdb/jira/MB-12226/2b6552e4/10.3.5.60-9222014-516-diag.zip
10.3.5.60 : https://s3.amazonaws.com/bugdb/jira/MB-12226/33fcedfb/10.3.5.60-8091-diag.txt.gz
10.3.5.61 : https://s3.amazonaws.com/bugdb/jira/MB-12226/2cfdfd22/10.3.5.61-9222014-514-diag.zip
10.3.5.61 : https://s3.amazonaws.com/bugdb/jira/MB-12226/6bdd00ae/10.3.5.61-8091-diag.txt.gz
Is this a Regression?: Yes

 Description   
[Live Cluster]
http://10.3.121.199:8091/index.html#sec=servers

Rebalance progress has been stuck at 66.6% since yesterday.

[Test Logs]
https://friendpaste.com/1mF2hXiYmVjpqoTHIGMd3F

[Error Logs]
2014-09-22 04:56:44,289 - root - INFO - adding remote node @10.3.3.126:8091 to this cluster @10.3.121.199:8091
2014-09-22 04:56:47,986 - root - INFO - adding node 10.3.5.11:8091 to cluster
2014-09-22 04:56:47,986 - root - INFO - adding remote node @10.3.5.11:8091 to this cluster @10.3.121.199:8091
2014-09-22 04:56:51,696 - root - INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.121.199%2Cns_1%4010.3.3.126%2Cns_1%4010.3.5.11
2014-09-22 04:56:51,706 - root - INFO - rebalance operation started
2014-09-22 04:56:51,717 - root - INFO - rebalance percentage : 0.00 %
2014-09-22 04:57:01,735 - root - INFO - rebalance percentage : 6.51 %
2014-09-22 04:57:11,753 - root - INFO - rebalance percentage : 15.47 %
2014-09-22 04:57:21,770 - root - INFO - rebalance percentage : 24.84 %
2014-09-22 04:57:31,796 - root - INFO - rebalance percentage : 31.64 %
2014-09-22 04:57:41,814 - root - INFO - rebalance percentage : 38.55 %
2014-09-22 04:57:51,837 - root - INFO - rebalance percentage : 47.71 %
2014-09-22 04:58:01,859 - root - INFO - rebalance percentage : 57.72 %
2014-09-22 04:58:11,877 - root - INFO - rebalance percentage : 64.93 %
2014-09-22 04:58:21,894 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:58:31,913 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:58:41,931 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:58:51,949 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:01,966 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:11,983 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:22,000 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:32,017 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:42,034 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 04:59:52,052 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:02,069 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:12,086 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:22,103 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:32,120 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:42,137 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:00:52,154 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:01:02,172 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:01:12,189 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:01:22,206 - root - INFO - rebalance percentage : 66.65 %
2014-09-22 05:01:32,223 - root - INFO - rebalance percentage : 66.65 %

[Test Steps]
1. Set up an XDCR cluster, 2-2 nodes, CAPI mode. Node version: 2.0.1-170-rel.

Source: 10.3.3.126, 10.3.5.11
Destination: 10.3.5.60, 10.3.5.61

2. Do online upgrade on Source:
    a. Add a new node (10.3.121.199) with 3.0.1-1309-rel. Check that the new node becomes the orchestrator.
    b. Remove both old nodes from the cluster.
    c. Re-install both old nodes with 3.0.1-1309-rel.
    d. Re-add both nodes to the cluster. Failed here: rebalance hangs at this step.


The cluster is live for investigation. The issue is always reproducible on the CentOS cluster with build 3.0.1-1309-rel.
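
For reference, the "rebalance percentage" lines in the test log come from polling the REST API. A minimal sketch of such a polling loop in Python, assuming the standard /pools/default/rebalanceProgress endpoint and the Administrator/password credentials that appear in the rebalance params above:

# Minimal sketch, not the actual test harness code.
import base64
import json
import time
import urllib.request

def rebalance_progress(host, user="Administrator", password="password"):
    req = urllib.request.Request(
        "http://%s:8091/pools/default/rebalanceProgress" % host)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Poll every 10 seconds, like the log above; a value that stays flat
# (66.65% here) across many polls indicates a hung rebalance.
previous = None
for _ in range(30):
    progress = rebalance_progress("10.3.121.199")
    print(progress)
    if progress == previous:
        print("no progress since the last poll")
    previous = progress
    time.sleep(10)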

 Comments   
Comment by Sangharsh Agarwal [ 23/Sep/14 ]
http://10.3.121.199:8091/index.html#sec=servers
Comment by Aleksey Kondratenko [ 23/Sep/14 ]
This wait for seqno persistence is seemingly stuck:

 338538:[rebalance:debug,2014-09-22T4:58:12.307,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:handle_call:812]Going to wait for persistence of seqno 4 in vbucket 384
 343699:[rebalance:debug,2014-09-22T4:58:23.308,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343702:[rebalance:debug,2014-09-22T4:58:44.309,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343729:[rebalance:debug,2014-09-22T4:59:15.310,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343756:[rebalance:debug,2014-09-22T4:59:46.311,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343783:[rebalance:debug,2014-09-22T5:00:17.312,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343810:[rebalance:debug,2014-09-22T5:00:48.313,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343837:[rebalance:debug,2014-09-22T5:01:19.314,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343890:[rebalance:debug,2014-09-22T5:01:50.314,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343919:[rebalance:debug,2014-09-22T5:02:21.316,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343944:[rebalance:debug,2014-09-22T5:02:52.317,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343971:[rebalance:debug,2014-09-22T5:03:23.318,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 343998:[rebalance:debug,2014-09-22T5:03:54.319,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 344025:[rebalance:debug,2014-09-22T5:04:25.320,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 344052:[rebalance:debug,2014-09-22T5:04:56.321,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 344079:[rebalance:debug,2014-09-22T5:05:27.322,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 344104:[rebalance:debug,2014-09-22T5:05:58.323,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 344131:[rebalance:debug,2014-09-22T5:06:29.324,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 344160:[rebalance:debug,2014-09-22T5:07:00.325,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 344185:[rebalance:debug,2014-09-22T5:07:31.326,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398713:[rebalance:debug,2014-09-22T5:08:02.327,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398738:[rebalance:debug,2014-09-22T5:08:33.328,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398765:[rebalance:debug,2014-09-22T5:09:04.329,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398792:[rebalance:debug,2014-09-22T5:09:35.330,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398819:[rebalance:debug,2014-09-22T5:10:06.331,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398844:[rebalance:debug,2014-09-22T5:10:37.332,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398871:[rebalance:debug,2014-09-22T5:11:08.333,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398898:[rebalance:debug,2014-09-22T5:11:39.334,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398927:[rebalance:debug,2014-09-22T5:12:10.336,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398952:[rebalance:debug,2014-09-22T5:12:41.337,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again
 398979:[rebalance:debug,2014-09-22T5:13:12.338,ns_1@10.3.5.11:<0.23979.0>:janitor_agent:do_wait_seqno_persisted:1119]Got etmpfail waiting for seq no persistence. Will try again




[MB-11589] Sliding endseqno during initial index build or upr reading from disk snapshot results in longer stale=false query latency and index startup time Created: 28/Jun/14  Updated: 26/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Sarath Lakshman Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks MB-11920 DCP based rebalance with views doesn'... Closed
Relates to
relates to MB-11919 3-5x increase in index size during re... Open
relates to MB-12125 rebalance swap regression of 39.3% c... Open
relates to MB-12081 Remove counting mutations introduced ... Resolved
relates to MB-11918 Latency of stale=update_after queries... Closed
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
We have to fix this, depending on the development cycles we have left for 3.0.

 Comments   
Comment by Anil Kumar [ 17/Jul/14 ]
Triage - July 17

Currently investigating; we will decide depending on the scope of changes needed.
Comment by Anil Kumar [ 30/Jul/14 ]
Triage : Anil, Wayne .. July 29th

Raising this issue to "Critical"; this needs to be fixed by RC.
Comment by Sriram Melkote [ 31/Jul/14 ]
The issue is that we'll have to change the view DCP client to stream all 1024 vbuckets in parallel, or we'll need an enhancement in ep-engine to stop streaming at the point requested. Neither is a simple change; the reason it's in 3.0 is that Dipti had requested we try to optimize query performance. I'll leave it at Major, as I don't want to commit to fixing this in RC, and also the product works with reasonable performance without this fix, so it's not a must-have for RC.
Comment by Sriram Melkote [ 31/Jul/14 ]
Mike noted that even streaming all vbuckets in parallel (which was perhaps possible to do in 3.0) won't directly solve the issue as the backfills are scheduled one at a time. ep-engine could hold onto smaller snapshots but that's not something we can consider in 3.0 - so net effect is that we'll have to revisit this in 3.0.1 to design a proper solution.
Comment by Sriram Melkote [ 12/Aug/14 ]
Bringing back to 3.0 as this is the root cause of MB-11920 and MB-11918
Comment by Anil Kumar [ 13/Aug/14 ]
Deferring this to 3.0.1 since it is out of scope for 3.0.
Comment by Sarath Lakshman [ 05/Sep/14 ]
We need to file an ep-engine dependency ticket to implement parallel streaming support without causing a sliding endseqno during on-disk snapshot backfill.




[MB-11623] test for performance regressions with JSON detection Created: 02/Jul/14  Updated: 29/Sep/14

Status: In Progress
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Blocker
Reporter: Matt Ingenthron Assignee: Thomas Anderson
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: 0h
Time Spent: 120h
Original Estimate: Not Specified

Attachments: File JSONDoctPerfTest140728.rtf     File JSONPerfTestV3.uos    
Issue Links:
Relates to
relates to MB-11675 20-30% performance degradation on app... Closed

 Description   
Related to one of the changes in 3.0, we need to test what has been implemented to see if a performance regression or unexpected resource utilization has been introduced.

In 2.x, all JSON detection was handled at the time of persistence. Since persistence was done in batch and in background, with the then current document, it would limit the resource utilization of any JSON detection.

Starting in 3.x, with the datatype/HELLO changes introduced (and currently disabled), the JSON detection has moved to both memcached and ep-engine, depending on the type of mutation.

Just to paint the reason this is a concern, here's a possible scenario.

Imagine a cluster node that is happily accepting 100,000 sets/s for a given small JSON document, and it accounts for about 20mbit of the network (small enough to not notice). That node has a fast SSD at about 8k IOPS. That means we'd only be doing JSON detection some 5,000 times per second with Couchbase Server 2.x.

With the changes already integrated, that JSON detection may be tried over 100k times/s. That's a 20x increase. The detection needs to occur somewhere other than on the persistence path, as the contract between DCP and view engine is such that the JSON detection needs to occur before DCP transfer.

This request is to test/assess if there is a performance change and/or any unexpected resource utilization when having fast mutating JSON documents.

I'll leave it to the team to decide what the right test is, but here's what I might suggest.

With a view defined create a test that has a small to moderate load at steady state and one fast-changing item. Test it with a set of sizes and different complexity. For instance, permutations that might be something like this:
non-JSON of 1k, 8k, 32k, 128k
simple JSON of 1k, 8k, 32k, 128k
complex JSON of 1k, 8k, 32k, 128k
metrics to gather:
throughput, CPU utilization by process, RSS by process, memory allocation requests by process (or minor faults or something)

Hopefully we won't see anything to be concerned with, but it is possible.

There are options to move JSON detection somewhere later in processing (i.e., just before DCP transfer), or to pursue other optimizations, if there is an issue.
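
A sketch of how the suggested payload permutations could be generated (standard-library Python only; the document shapes and the "pad" filler field are my illustration, not an existing tool):

# Sketch of a generator for the permutations suggested above.
import json

SIZES = {"1k": 1024, "8k": 8192, "32k": 32768, "128k": 131072}

def padded(doc, target):
    # Grow a filler field until the serialized document is exactly target bytes.
    doc["pad"] = ""
    overhead = len(json.dumps(doc))
    doc["pad"] = "x" * max(0, target - overhead)
    return json.dumps(doc).encode()

def make_payload(kind, size):
    if kind == "non-json":
        return b"\x00" * SIZES[size]              # fails JSON detection
    if kind == "simple":
        return padded({"name": "doc"}, SIZES[size])
    if kind == "complex":
        nested = {"level%d" % i: {"arr": list(range(8))} for i in range(10)}
        return padded(nested, SIZES[size])
    raise ValueError(kind)

for kind in ("non-json", "simple", "complex"):
    for size in SIZES:
        print(kind, size, len(make_payload(kind, size)))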

 Comments   
Comment by Cihan Biyikoglu [ 07/Jul/14 ]
This is no longer needed for 3.0, is that right? Ready to postpone to 3.0.1?
Comment by Pavel Paulau [ 07/Jul/14 ]
HELLO-based negotiation was disabled but detection still happens in ep-engine.
We need to understand the impact before the 3.0 release, sooner rather than later.
Comment by Matt Ingenthron [ 23/Jul/14 ]
I'm curious Thomas, when you say "increase in bytes appended", do you mean for the same workload the RSS is larger in the 'increase' case? Great to see you making progress.
Comment by Wayne Siu [ 24/Jul/14 ]
Pasted comment from Thomas:
Subject: Re: Couchbase Issues: (MB-11623) test for performance regressions with JSON detection
Yes, ~20% increase from 2.5.1 to 3.0 for the same load generator, as reported by the CB server for the same input load. I’m verifying and ‘isolating’. I will also be looking at if/how this contributes to the replication load increase (20% on a 20% increase …)
The issues seem related. Same increase for 1K, 8K, 16K and 32K, with some variance.
—thomas
Comment by Thomas Anderson [ 29/Jul/14 ]
initial results using JSON document load test.
Comment by Matt Ingenthron [ 29/Jul/14 ]
Tom: saw your notes in the work log, out of curiosity, what was deferred to 3.0.1? Also, from the comment above, 20% increase in what?
Comment by Anil Kumar [ 13/Aug/14 ]
Thomas - as discussed, please update the ticket with the % of regression caused by JSON detection now being in memcached. I will open a separate ticket to document it.
Comment by Thomas Anderson [ 19/Aug/14 ]
A comparison of non-JSON to JSON in 2.5.1 and 3.0.0-1105 showed statistically similar performance, i.e., the minimal overhead of handling a JSON document over a similar KV document stayed consistent from 2.5.1 to 3.0.0 pre-RC1. See the attached file JSONPerfTestV3.uos; to be re-run with the official RC1 candidate. The feature to load complex JSON documents is now modified to 4 levels of JSON complexity (for each document size in bytes): {simpleJSON:: 1 element-attribute value pair; smallJSON:: 10 elements - no arrays, no nesting; mediumJSON:: 100 elements - arrays & nesting; largeJSON:: 10000 elements, a mix of element types}.

Note: the original seed of this issue was a detected performance issue with JSON documents, ~20-30%. The code/architectural change that caused it was deferred to 3.0.1. Additional modifications to the server to address the simple append-mode performance degradation further lessened the question of whether the document type was the cause of the degradation. The tests did, however, show a positive change in compaction, i.e., 3.x compacts documents ~5-7% better than 2.5.1.

 
Comment by Thomas Anderson [ 19/Aug/14 ]
Re-run with build 1105; regression comparison for the same document size and same document load, non-JSON vs. simple JSON.
2.5.1:: a 1024-byte document, 10 loaders, 1.25M documents, non-JSON to JSON, showed a < 4% performance degradation; 3.0:: shows a < 3% degradation. Many other factors seem to dominate.
Comment by Matt Ingenthron [ 19/Aug/14 ]
Just for the comments here, the original seed wasn't an observed performance regression but rather an architectural concern that there could be a space/CPU/throughput cost for the new JSON detection. That's why I opened it.




[MB-12259] data loss if reboot right after online upgrade 2.0.1->3.0.1 Created: 25/Sep/14  Updated: 30/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.1-1326-rel


 Description   
test to reproduce:
-t newupgradetests.MultiNodesUpgradeTests.online_upgrade_rebalance_in_out,initial_version=2.0.1-170-rel,initial_build_type=community,reboot_cluster=true,upgrade_version=3.0.1-1326-rel,initial_vbuckets=512

1) Initial vbucket count = 512; 2 nodes with 2.0.1, 1 default bucket, 1000 items.
2) Added two nodes with 3.0.1, then removed the 2.0.1 nodes.
3) Rebooted both nodes, waited for warmup.
4) 2 items are lost: keys upgrade45 and upgrade927, vbucket 261.
The vbucket is empty:
[root@kiwi-r112 default]# /opt/couchbase/bin/couch_dbinfo 261.couch.1
DB Info (261.couch.1) - header at 12288
   file format version: 11
   update_seq: 0
   no documents
   B-tree size: 0 bytes
   total disk size: 12.0 kB
[root@kiwi-r112 default]#

Will attach cbcollects (.10 and .12 are the 2.0.1 nodes; .11 and .13 the 3.0.1 ones):

https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/172.27.33.10-9252014-1827-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/172.27.33.11-9252014-1829-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/172.27.33.12-9252014-1828-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/172.27.33.13-9252014-1830-diag.zip

 Comments   
Comment by Aleksey Kondratenko [ 26/Sep/14 ]
Upgrading to blocker.

Do you know if the items "survived" rebalance? I.e., we need to know whether they were lost during rebalance or during restart.

In logs I see successful and completely normal shutdown followed by startup. So ep-engine was certainly supposed to preserve all items.
Comment by Aleksey Kondratenko [ 26/Sep/14 ]
Notably, the rebalance was TAP-based while the indexes used DCP to consume docs on the 3.x nodes. Could be relevant.
Comment by Aleksey Kondratenko [ 26/Sep/14 ]
I need you to reproduce the test while checking for items _after every step_.

And please consider doing it all the time.
Comment by Iryna Mironava [ 28/Sep/14 ]
Stats after each step:
1) Stats before upgrade - 1000 items, 1000 items in replica.
172.27.33.12
 curr_items: 500
 curr_items_tot: 1000
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 5000
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 0
 ep_total_del_items: 0
 ep_total_new_items: 1000
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 500
 vb_active_num_ref_items: 500
 vb_pending_curr_items: 0
 vb_pending_num_ref_items: 0
 vb_replica_curr_items: 500
 vb_replica_num_ref_items: 500

172.27.33.10
 curr_items: 500
 curr_items_tot: 1000
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 5000
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 0
 ep_total_del_items: 0
 ep_total_new_items: 1000
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 500
 vb_active_num_ref_items: 500
 vb_pending_curr_items: 0
 vb_pending_num_ref_items: 0
 vb_replica_curr_items: 500
 vb_replica_num_ref_items: 500

2) After rebalancing in the 3.0.1 nodes - 1000 items, 1000 replica items.
 
172.27.33.12
 curr_items: 267
 curr_items_tot: 507
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 5000
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 44
 ep_total_del_items: 0
 ep_total_new_items: 1000
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 267
 vb_active_num_ref_items: 267
 vb_pending_curr_items: 0
 vb_pending_num_ref_items: 0
 vb_replica_curr_items: 240
 vb_replica_num_ref_items: 240

172.27.33.13
 curr_items: 232
 curr_items_tot: 486
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 500
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 481
 ep_total_del_items: 0
 ep_total_new_items: 486
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 232
 vb_pending_curr_items: 0
 vb_replica_curr_items: 254

172.27.33.10
 curr_items: 269
 curr_items_tot: 515
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 5000
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 44
 ep_total_del_items: 0
 ep_total_new_items: 1000
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 269
 vb_active_num_ref_items: 269
 vb_pending_curr_items: 0
 vb_pending_num_ref_items: 0
 vb_replica_curr_items: 246
 vb_replica_num_ref_items: 246

172.27.33.11
 curr_items: 232
 curr_items_tot: 492
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 500
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 464
 ep_total_del_items: 0
 ep_total_new_items: 492
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 232
 vb_pending_curr_items: 0
 vb_replica_curr_items: 260

3) After rebalancing out the 2.0.1 nodes - 1000 items but 996 replica items; I can still see all items.


172.27.33.13
 curr_items: 500
 curr_items_tot: 996
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 500
 ep_diskqueue_items: 3
 ep_items_rm_from_checkpoints: 1007
 ep_total_del_items: 0
 ep_total_new_items: 993
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 500
 vb_pending_curr_items: 0
 vb_replica_curr_items: 496

172.27.33.11
 curr_items: 500
 curr_items_tot: 1000
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 500
 ep_diskqueue_items: 6
 ep_items_rm_from_checkpoints: 990
 ep_total_del_items: 0
 ep_total_new_items: 994
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 500
 vb_pending_curr_items: 0
 vb_replica_curr_items: 500

4) After warmup: 996 items and 996 replica items.

172.27.33.13
 curr_items: 500
 curr_items_tot: 996
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 500
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 0
 ep_total_del_items: 0
 ep_total_new_items: 0
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 500
 vb_pending_curr_items: 0
 vb_replica_curr_items: 496

172.27.33.11
 curr_items: 496
 curr_items_tot: 996
 curr_temp_items: 0
 ep_access_scanner_num_items: 0
 ep_chk_max_items: 500
 ep_diskqueue_items: 0
 ep_items_rm_from_checkpoints: 0
 ep_total_del_items: 0
 ep_total_new_items: 0
 ep_uncommitted_items: 0
 ep_warmup_min_items_threshold: 100
 vb_active_curr_items: 496
 vb_pending_curr_items: 0
 vb_replica_curr_items: 500

New logs:
https://s3.amazonaws.com/bugdb/jira/MB-12259/dc4r5e89/172.27.33.10-9282014-174-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dc4r5e89/172.27.33.11-9282014-175-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dc4r5e89/172.27.33.12-9282014-174-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dc4r5e89/172.27.33.13-9282014-176-diag.zip
Comment by Aleksey Kondratenko [ 29/Sep/14 ]
Stats are not a helpful enough check. I need to know whether the items actually exist. You could, for example, back up the .couch files, in addition to just doing plain GETs to see if everything is present or not.
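
For the plain-GET check, a minimal sketch (assuming the loader named the keys upgrade0..upgrade999, that the default bucket accepts unauthenticated connections on port 11210, and the standard CRC32 vbucket mapping with the test's 512 vbuckets; all of these are assumptions):

# Minimal sketch of the plain-GET verification; assumptions noted above.
import socket
import struct
import zlib

def vbucket_of(key, num_vbuckets=512):
    # Standard libvbucket CRC32 key-to-vbucket mapping (assumed).
    return ((zlib.crc32(key) >> 16) & 0x7FFF) % num_vbuckets

def recv_all(sock, n):
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise IOError("connection closed")
        data += chunk
    return data

def get_status(sock, key):
    # Binary-protocol GET (opcode 0x00): 24-byte request header plus key.
    header = struct.pack(">BBHBBHIIQ", 0x80, 0x00, len(key), 0, 0,
                         vbucket_of(key), len(key), 0, 0)
    sock.sendall(header + key)
    resp = recv_all(sock, 24)
    status = struct.unpack(">H", resp[6:8])[0]
    body_len = struct.unpack(">I", resp[8:12])[0]
    recv_all(sock, body_len)      # drain flags/value
    return status                 # 0x0000 = found, 0x0001 = KEY_ENOENT

sock = socket.create_connection(("172.27.33.11", 11210))
missing = [i for i in range(1000) if get_status(sock, b"upgrade%d" % i) != 0]
print("missing keys:", missing)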
Comment by Aleksey Kondratenko [ 29/Sep/14 ]
Given that the presented evidence (however poor) suggests the data was lost after rebalance, I'm going to pass this to the ep-engine team.
Comment by Iryna Mironava [ 30/Sep/14 ]
vbuckets after rebalancing out the old nodes (dumps from .11 and .13):
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/step3_11.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/step3_13.zip
vbuckets after DCP rebalance:
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/step4_11.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/step4_13.zip
vbuckets after warmup
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/step5_11.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dca3be89/step5_13.zip

vbucket 292 had 2 items after rebalance out.

new cbcollects:
https://s3.amazonaws.com/bugdb/jira/MB-12259/dcft6e89/172.27.33.10-9302014-1928-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dcft6e89/172.27.33.11-9302014-1929-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dcft6e89/172.27.33.12-9302014-1928-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12259/dcft6e89/172.27.33.13-9302014-1930-diag.zip




[MB-12213] Get the couchbase-server_src.tar.gz for 3.0.0 Created: 18/Sep/14  Updated: 30/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Blocker
Reporter: Wayne Siu Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-12279] 86% regression in 95th percentile GET latency on Windows 2012 Created: 30/Sep/14  Updated: 01/Oct/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Venu Uppalapati Assignee: Venu Uppalapati
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Yes

 Description   
86% regression in 95th percentile GET latency on Windows 2012

 Comments   
Comment by Venu Uppalapati [ 30/Sep/14 ]
Will assign an owner after attaching logs.




[MB-11060] Build and test 3.0 for 32-bit Windows Created: 06/May/14  Updated: 01/Oct/14  Due: 09/Jun/14

Status: Open
Project: Couchbase Server
Component/s: build, ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Blocker
Reporter: Chris Hillery Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7/8 32-bit

Issue Links:
Dependency
Duplicate

 Description   
For the "Developer Edition" of Couchbase Server 3.0 on Windows 32-bit, we need to first ensure that we can build 32-bit-compatible binaries. It is not possible to build 3.0 on a 32-bit machine due to the MSVC 2013 requirement. Hence we need to configure MSVC as well as Erlang on a 64-bit machine to produce 32-bit compatible binaries.

 Comments   
Comment by Chris Hillery [ 06/May/14 ]
This is assigned to Trond who is already experimenting with this. He should:

 * test being able to start the server on a 32-bit Windows 7/8 VM

 * make whatever changes are necessary to the CMake configuration or other build scripts to produce this build on a 64-bit VM

 * thoroughly document the requirements for the build team to reproduce this build

Then he can assign this bug to Chris to carry out configuring our build jobs accordingly.
Comment by Trond Norbye [ 16/Jun/14 ]
Can you give me a 32-bit Windows installation I can test on? My MSDN license has expired and I don't have Windows media available (and the internal wiki page just has a limited set of licenses and no download links).

Then assign it back to me and I'll try it.
Comment by Chris Hillery [ 16/Jun/14 ]
I think you can use 172.23.106.184 - it's a 32-bit Windows 2008 VM that we can't use for 3.0 builds anyway.
Comment by Trond Norbye [ 24/Jun/14 ]
I copied the full result of a build where I set target_platform=x86 on my 64-bit Windows server (the "install" directory) over to a 32-bit Windows machine, and I was able to start memcached and it worked as expected.

Our installers perform other magic, like installing the service, etc., that is needed in order to start the full server. Once we have such an installer I can do further testing.
Comment by Chris Hillery [ 24/Jun/14 ]
Bin - could you take a look at this (figuring out how to make InstallShield on a 64-bit machine create a 32-bit compatible installer)? I won't likely be able to get to it for at least a month, and I think you're the only person here who still has access to an InstallShield 2010 designer anyway.
Comment by Bin Cui [ 04/Sep/14 ]
PM should make the call on whether or not we want to have 32-bit support for Windows.
Comment by Anil Kumar [ 05/Sep/14 ]
Bin - as confirmed back in March-April for Couchbase Server 3.0 supported platforms, we decided to continue building 32-bit Windows for development-only support, as mentioned in our documentation deprecation page: http://docs.couchbase.com/couchbase-manual-2.5/deprecated/#platforms.

Comment by Bin Cui [ 17/Sep/14 ]
1. Create a 64-bit builder with a 32-bit target.
2. Create a 32-bit builder.
3. Transfer the 64-bit staging image to the 32-bit builder.
4. Run the packaging steps and generate the final package from the 32-bit builder.
Comment by Chris Hillery [ 18/Sep/14 ]
Bin - when we discussed this a few weeks ago I had thought you were going to be driving forward on the details of implementing this. There are a few steps here that we don't know how to do. I will work with Trond to figure out how to enable step #1, as it sounds like he has accomplished most of that locally. Steps 2 and 3 I think we (build team) can figure out.

Step 4, though, is what I was referring to in my comment on 24/Jun/14. You are the only person in the company, so far as I know, who has both understanding and access to InstallShield. I feel sure that this is going to require making changes to our project (or, worse, creating a new project) to create a 32-bit installer. I need you to figure out how to do that, or this task will not be completed.

Assigning this back to Bin for now, although I will work on figuring out how to enable steps 1-3 of his proposed workflow.
Comment by Wayne Siu [ 19/Sep/14 ]
Chris,
Can you give us an update on steps 1/2/3 by Monday (09.22)? Thanks.
Comment by Chris Hillery [ 22/Sep/14 ]
I have updated the Windows build script to accept an explicit architecture (x86 / amd64) and am currently re-packaging Trond's depot of third-party x86 dependencies for my cbdeps mechanism. If the build does not then succeed, I will try to work with Trond tonight on debugging it.

In the meantime, I could use some help with step 2 - either we need to just use the existing 2.5.1 x86 build slave, or else we'll want to clone it to a new VM that can still run InstallShield 2010. At that point we should be able to get Bin involved.
Comment by Chris Hillery [ 23/Sep/14 ]
I've repackaged all the dependencies and got a complete build. I will attach the output of the install/ directory to this bug so hopefully someone (Bin or Phil) can work on modifying the server-win.rb script and InstallShield files to turn this into a final installer.

I am currently working on splitting the server-windows-build.bat script and corresponding Jenkins jobs into two, so they can be run on separate machines (along with Jenkins magic to transfer the install/ directory from one to the other).
Comment by Chris Hillery [ 23/Sep/14 ]
https://copy.com/VkfqKM4GyKhx for the output of the current build.
Comment by Chris Hillery [ 23/Sep/14 ]
As I see it, there are three things that need to be done before we can hand a build to QE. At this point, I *believe* all three things can be done in parallel.

A. Configure Jenkins jobs and scripts in the build/ repo to automate the build and packaging steps. I am working on this; at the moment I do not need anything from anybody to continue working on this.

B. Verify that the 32-bit build above actually functions on a 32-bit VM, and start to identify issues (ie, start the server, run testrunner..). I would suggest Bin work on this on a new VM which Wayne is creating now.

C. Modify voltron (server-win.rb and all the InstallShield configuration files) so that, given an install directory like the one in the .zip file above, it will create a full 32-bit installer .exe. Phil has created a Jenkins slave VM and he can work on this step there. For this step, assume that server-win.rb will be invoked exactly as it is today, which is: from the voltron directory - ruby server-win.rb C:\path\to\install 5.10.4 couchbase_server 3.0.1-1290 community x86

Comment by Chris Hillery [ 24/Sep/14 ]
Nightly update: I've finished a tentative arrangement of Jenkins jobs to achieve (A) for both 64-bit and 32-bit master builds. It isn't tested yet so I will probably need to do some fine-tuning tomorrow. Once it's fully working I will need to make a minor tweak to the Buildbot configuration to trigger these jobs.
Comment by Chris Hillery [ 25/Sep/14 ]
Nightly update: great success! After considerable tweaking, my split build/package jobs are working for master builds on amd64. More excitingly, with only one minor change to voltron (and a ton of configuration changes on the x86 build slave), it appears to be working on x86 as well. I guess we inherited the InstallShield configuration from the 2.x days where we were building x86, and not much has changed.

voltron change: http://review.couchbase.org/#/c/41636/

Sample master amd64 build: http://latestbuilds.hq.couchbase.com/couchbase_server-enterprise-windows-amd64-0.0.0-0001.exe

Sample master x86 build: http://latestbuilds.hq.couchbase.com/couchbase_server-enterprise-windows-x86-0.0.0-0001.exe

I am going to do a quick test of the x86 installer on the 32-bit VM - frankly I do not expect it to work out of the box, but fingers crossed... I also am going to make the Buildbot change I mentioned.
Comment by Chris Hillery [ 25/Sep/14 ]
The x86 installer does successfully install on the x86 VM, but as expected, the server doesn't start. I have no insight into debugging this, so I'm leaving it assigned to Bin.
Comment by Wayne Siu [ 30/Sep/14 ]
The 32-bit 3.0.1 Windows build is available here (it was built manually).

http://latestbuilds.hq.couchbase.com/couchbase_server-enterprise-windows-x86-0.0.0-0001.exe

Automated builds will start coming out tomorrow; Ceej will update the ticket when they are ready.
Comment by Chris Hillery [ 01/Oct/14 ]
As of about 3:00am we're producing amd64 and x86 builds with the same build numbers as Linux. The latestbuilds 3.0.1 index page only has links for amd64; I will correct that. You can get an x86 build for now by cut-and-pasting an amd64 build URL and changing "amd64" to "x86", e.g.:

http://latestbuilds.hq.couchbase.com/couchbase_server-community-windows-x86-3.0.1-1358.exe

(there are only community x86 builds).
Comment by Chris Hillery [ 01/Oct/14 ]
Actually, I lied - the 3.0.1 index page DOES have the x86 links. It's just that since there are only community builds, they're sorted after all the enterprise builds.




[MB-11048] Range queries result in thousands of GET operations/sec Created: 05/May/14  Updated: 18/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
A benchmark of range queries demonstrated very high latency. At the same time I noticed an extremely high rate of GET operations.

Even a single query such as "SELECT name.f.f.f AS _name FROM bucket-1 WHERE coins.f > 224.210000 AND coins.f < 448.420000 LIMIT 20" led to hundreds of memcached reads.

Explain:

https://gist.github.com/pavel-paulau/5e90939d6ab28034e3ed

Engine output:

https://gist.github.com/pavel-paulau/b222716934dfa3cb598e

I don't like to use JIRA as a forum, but why does this happen? Do you fetch the entire range before returning the limited output?

 Comments   
Comment by Gerald Sangudi [ 05/May/14 ]
Pavel,

Yes, the scan and fetch are performed before we do any LIMIT. This will be fixed in DP4, but it may not be easily fixable in DP3.

Can you please post the results of the following query:

SELECT COUNT(*) FROM bucket-1 WHERE coins.f > 224.210000 AND coins.f < 448.420000

Thanks.
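
To make the difference concrete, here is a toy illustration in Python (generators standing in for the scan and fetch operators; this is not cbq-engine code):

def index_scan(low, high):
    # Stand-in for the index range scan: yields matching keys lazily.
    for i in range(10000):
        if low < i < high:
            yield "key-%d" % i

def fetch(keys):
    # Stand-in for the memcached fetch step; counts GET operations.
    for k in keys:
        fetch.gets += 1
        yield {"key": k}

fetch.gets = 0
# DP3 behaviour: fetch the entire range, then apply LIMIT.
rows = list(fetch(index_scan(100, 5000)))[:20]
print("materialize then limit:", fetch.gets, "GETs")   # thousands of GETs

fetch.gets = 0
# Limit pushdown: stop pulling from the pipeline once 20 rows are produced.
rows = []
for row in fetch(index_scan(100, 5000)):
    rows.append(row)
    if len(rows) == 20:
        break
print("limit pushdown:", fetch.gets, "GETs")           # only 20 GETs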
Comment by Pavel Paulau [ 05/May/14 ]
cbq> SELECT COUNT(*) FROM bucket-1 WHERE coins.f > 224.210000 AND coins.f < 448.420000
{
    "resultset": [
        {
            "$1": 2134
        }
    ],
    "info": [
        {
            "caller": "http_response:160",
            "code": 100,
            "key": "total_rows",
            "message": "1"
        },
        {
            "caller": "http_response:162",
            "code": 101,
            "key": "total_elapsed_time",
            "message": "547.545767ms"
        }
    ]
}
Comment by Pavel Paulau [ 05/May/14 ]
Also, it looks like we are leaking memory in this scenario.

The resident memory of cbq-engine grows very fast (several megabytes per second) and never goes down...




[MB-11007] Request for Get Multi Meta Call for bulk meta data reads Created: 30/Apr/14  Updated: 30/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Parag Agarwal Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: All


 Description   
Currently we support a per-key call for getMetaData. As a result, our verification requires a per-key fetch during the verification phase. This request is to support a bulk get-metadata call which can get us metadata per vbucket for all keys, or in batches. This would enhance our ability to verify per-document metadata over time or after operations like rebalance, as it would be faster. If there is a better alternative, please recommend one.

Current Behavior

https://github.com/couchbase/ep-engine/blob/master/src/ep.cc

ENGINE_ERROR_CODE EventuallyPersistentStore::getMetaData(
                                                        const std::string &key,
                                                        uint16_t vbucket,
                                                        const void *cookie,
                                                        ItemMetaData &metadata,
                                                        uint32_t &deleted,
                                                        bool trackReferenced)
{
    (void) cookie;
    RCPtr<VBucket> vb = getVBucket(vbucket);
    if (!vb || vb->getState() == vbucket_state_dead ||
        vb->getState() == vbucket_state_replica) {
        ++stats.numNotMyVBuckets;
        return ENGINE_NOT_MY_VBUCKET;
    }

    int bucket_num(0);
    deleted = 0;
    LockHolder lh = vb->ht.getLockedBucket(key, &bucket_num);
    StoredValue *v = vb->ht.unlocked_find(key, bucket_num, true,
                                          trackReferenced);

    if (v) {
        stats.numOpsGetMeta++;

        if (v->isTempInitialItem()) { // Need bg meta fetch.
            bgFetch(key, vbucket, -1, cookie, true);
            return ENGINE_EWOULDBLOCK;
        } else if (v->isTempNonExistentItem()) {
            metadata.cas = v->getCas();
            return ENGINE_KEY_ENOENT;
        } else {
            if (v->isTempDeletedItem() || v->isDeleted() ||
                v->isExpired(ep_real_time())) {
                deleted |= GET_META_ITEM_DELETED_FLAG;
            }
            metadata.cas = v->getCas();
            metadata.flags = v->getFlags();
            metadata.exptime = v->getExptime();
            metadata.revSeqno = v->getRevSeqno();
            return ENGINE_SUCCESS;
        }
    } else {
        // The key wasn't found. However, this may be because it was previously
        // deleted or evicted with the full eviction strategy.
        // So, add a temporary item corresponding to the key to the hash table
        // and schedule a background fetch for its metadata from the persistent
        // store. The item's state will be updated after the fetch completes.
        return addTempItemForBgFetch(lh, bucket_num, key, vb, cookie, true);
    }
}



 Comments   
Comment by Venu Uppalapati [ 30/Apr/14 ]
The server supports the quiet CMD_GETQ_META call, which can be used on the client side to create a multi-getMeta call similar to the multiGet implementation.
Comment by Parag Agarwal [ 30/Apr/14 ]
Please point to a working example of this call.
Comment by Venu Uppalapati [ 30/Apr/14 ]
Parag, you can find some relevant information on queuing requests using the quiet calls at https://code.google.com/p/memcached/wiki/BinaryProtocolRevamped#Get,_Get_Quietly,_Get_Key,_Get_Key_Quietly
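
A minimal sketch of what such a client-side multi-getMeta could look like, using pipelined quiet requests terminated by a NOOP. The GETQ_META opcode value (0xa1) and the meta response layout are assumptions taken from ep-engine's protocol extensions and should be verified against the source; SASL auth is omitted:

# Sketch only; opcode value and extras layout are assumptions.
import socket
import struct
import zlib

CMD_GETQ_META = 0xA1   # quiet getMeta (ep-engine extension, assumed value)
CMD_NOOP = 0x0A

def vbucket_of(key, num_vbuckets=1024):
    # Standard libvbucket CRC32 key-to-vbucket mapping (assumed).
    return ((zlib.crc32(key) >> 16) & 0x7FFF) % num_vbuckets

def request(opcode, key, opaque):
    return struct.pack(">BBHBBHIIQ", 0x80, opcode, len(key), 0, 0,
                       vbucket_of(key), len(key), opaque, 0) + key

def recv_all(sock, n):
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise IOError("connection closed")
        data += chunk
    return data

def multi_get_meta(sock, keys):
    # Quiet requests get no reply on a miss, so pipeline the whole batch
    # and terminate with a NOOP, which always replies.
    batch = b"".join(request(CMD_GETQ_META, k, i) for i, k in enumerate(keys))
    sock.sendall(batch + request(CMD_NOOP, b"", len(keys)))
    metas = {}
    while True:
        hdr = recv_all(sock, 24)
        opcode = hdr[1]
        body_len, = struct.unpack(">I", hdr[8:12])
        opaque, = struct.unpack(">I", hdr[12:16])
        body = recv_all(sock, body_len)
        if opcode == CMD_NOOP:
            return metas            # keys missing from the dict were not found
        metas[keys[opaque]] = body  # extras: deleted flag, flags, expiry, rev seqno

sock = socket.create_connection(("127.0.0.1", 11210))
print(multi_get_meta(sock, [b"key-1", b"key-2", b"key-3"]))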
Comment by Chiyoung Seo [ 30/Apr/14 ]
Changing the fix version to the feature backlog, given that the 3.0 feature-complete date has already passed and this is requested for the QE testing framework.




[MB-10993] Cluster Overview - Usable Free Space documentation misleading Created: 29/Apr/14  Updated: 29/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Jim Walker Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: documentation
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Issue relates to:
 http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#viewing-cluster-summary

I was working through a support case and trying to explain the cluster overview free space and usable free space.

The following statement is from our documentation. After a code review of ns_server I concluded that it is incorrect.

Usable Free Space:
The amount of usable space for storing information on disk. This figure shows the amount of space available on the configured path after non-Couchbase files have been taken into account.

The correct statement should be:

Usable Free Space:
The amount of usable space for storing information on disk. This figure is derived from the node with the least available storage in the cluster; the final value is calculated by multiplying that amount by the number of nodes in the cluster.


This change is important because users need to understand why Usable Free Space can be less than Free Space. The cluster considers all nodes to be equal. If you have a "weak" node in the cluster, e.g. one with a small disk, then all the cluster nodes have to keep their storage under the weaker node's limits; otherwise, for example, we could never fail over to the weak node, as it cannot take on the job of a stronger node. When Usable Free Space is less than Free Space, the user may want to investigate why a node has less storage available.
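
A back-of-the-envelope version of that calculation (my paraphrase of the described ns_server logic, with made-up numbers):

# Paraphrase of the calculation described above, not the actual Erlang code.
per_node_free_gb = {"node1": 500, "node2": 500, "node3": 120}  # example data

free_space = sum(per_node_free_gb.values())          # 1120 GB, as displayed
usable_free_space = (min(per_node_free_gb.values())
                     * len(per_node_free_gb))        # 120 * 3 = 360 GB
print(free_space, usable_free_space)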




[MB-10920] unable to start tuq if there are no buckets Created: 22/Apr/14  Updated: 18/Jun/14  Due: 23/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
The node is initialized but has no buckets:
[root@kiwi-r116 tuqtng]# ./tuqtng -couchbase http://localhost:8091
10:26:56.520415 Info line disabled false
10:26:56.522641 FATAL: Unable to run server, err: Unable to access site http://localhost:8091, err: HTTP error 401 Unauthorized getting "http://localhost:8091/pools": -- main.main() at main.go:76




[MB-10834] update the license.txt for enterprise edition for 2.5.1 Created: 10/Apr/14  Updated: 19/Jun/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Microsoft Word 2014-04-07 EE Free Clickthru Breif License.docx    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
document attached.

 Comments   
Comment by Phil Labee [ 10/Apr/14 ]
2.5.1 has already shipped, so this file can't be included.

Is this for the 3.0.0 release?
Comment by Phil Labee [ 10/Apr/14 ]
voltron commit: 8044c51ad7c5bc046f32095921f712234e74740b

This uses the contents of the attached file to update LICENSE-enterprise.txt on the master branch.




[MB-10821] optimize storage of larger binary object in couchbase Created: 10/Apr/14  Updated: 10/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-10084] Sub-Task: Changes required for Data Encryption in Client SDK's Created: 30/Jan/14  Updated: 28/May/14

Status: Open
Project: Couchbase Server
Component/s: clients
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Anil Kumar Assignee: Andrei Baranouski
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
depends on CCBC-344 add support for SSL to libcouchbase i... Resolved
depends on JCBC-441 add SSL support in support of Couchba... Resolved
depends on NCBC-424 Add SSL support in support of Couchba... Resolved

 Description   
Changes required for Data Encryption in Client SDK's

 Comments   
Comment by Cihan Biyikoglu [ 20/Mar/14 ]
I wanted to make sure we agree this will be in 3.0. Matt, any concerns?
Thanks
Comment by Matt Ingenthron [ 20/Mar/14 ]
This should be closed in favor of the specific project issues. That said, the description is a bit fuzzy. Is this SSL support for memcached && views && any cluster management?

Please clarify and then we can open specific issues. It'd be good to have a link to functional requirements.
Comment by Matt Ingenthron [ 20/Mar/14 ]
And Cihan: it can't be "in 3.0", unless you mean concurrent release or release prior to 3.0 GA. Is that what you mean? I'd actually aim to have this feature support in SDKs prior to 3.0's release and we are working on it right now, though it has some other dependencies. See CCBC-344, for example.
Comment by Cihan Biyikoglu [ 20/Mar/14 ]
Thanks, Matt. I meant the 3.0-paired client SDK release, so prior or shortly after is all good for me.
Context: we are doing a pass to clean up JIRA and would like to button up what's in and out for 3.0.
Comment by Cihan Biyikoglu [ 24/Mar/14 ]
Matt, is there a client-side reference implementation you did for this one? It would be good to pass that on to the test folks for initial validation until you fully integrate, so no regressions creep in while we march to GA.
Thanks
Comment by Matt Ingenthron [ 24/Mar/14 ]
We did verification with a non-mainline client since that was the quickest way to do so and have provided that to QE. Also, Brett filed a bug around HTTPS with ns-server and streaming configuration replies. See MB-10519.

We'll do a mainline client with libcouchbase and the Python client as soon as its dependency for handling packet IO is done. This is under CCBC-298 and CCBC-301, among others.




[MB-10003] [Port-configurability] Non-root instances and multiple sudo instances in a box cannot be 'offline' upgraded Created: 24/Jan/14  Updated: 27/Mar/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.5.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Unix/Linux


 Description   
Scenario
------------
As of today, we do not support offline 'upgrade' per se for packages installed as non-root/sudo users. Upgrades are usually handled by package managers. Since these are absent for non-root users, and rpm cannot handle more than a single package upgrade (if there are many instances running), offline upgrades are not supported (confirmed with Bin).

ALL non-root installations are affected by this limitation. Although a single instance running on a box under a sudo user can be offline-upgraded, this cannot be extended to more than one such instance.

This is important

Workaround
-----------------
- Online upgrade (swap in nodes running the latest build, take the old nodes down, and do a clean install)
- Back up data and restore after a fresh install (cbbackup and cbrestore)

Note: at this point these are mere suggestions; neither workaround has been tested yet.




[MB-10146] Document editor overwrites precision of long numbers Created: 06/Feb/14  Updated: 09/May/14

Status: Reopened
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged

 Description   
Just tested this out, not sure what diagnostics to capture so please let me know.

Simple test case:
-Create new document via document editor in UI
-Document contents are:
{"id": 18446744072866779556}
-As soon as you save, the above number is rewritten to:
{
  "id": 18446744072866780000
}
-The same effect occurs if you edit a document that was inserted with the above "long" number.
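
For reference, this rounding is standard IEEE 754 double-precision behavior rather than something specific to the UI: any decoder that parses JSON numbers into 64-bit floats loses the low digits of this value. A minimal Go sketch, illustrative only (the UI itself is JavaScript; json.Number is shown as one way a decoder can avoid the loss):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	payload := []byte(`{"id": 18446744072866779556}`)

	// Default decoding parses JSON numbers into float64. A float64 has a
	// 53-bit mantissa, which cannot represent this value exactly, so the
	// low digits are rounded away, just like in the document editor.
	var lossy map[string]interface{}
	if err := json.Unmarshal(payload, &lossy); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%.0f\n", lossy["id"]) // rounded; no longer ends in ...779556

	// Decoding into json.Number keeps the original digits as a string.
	var exact map[string]json.Number
	if err := json.Unmarshal(payload, &exact); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(exact["id"]) // 18446744072866779556
}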

 Comments   
Comment by Aaron Miller (Inactive) [ 06/Feb/14 ]
It's worth noting that views will always suffer from this, as it is a limitation of JavaScript in general. Many JSON libraries have this behavior as well (even though they don't *have* to).
Comment by Aleksey Kondratenko [ 11/Apr/14 ]
Cannot fix it. Just closing. If you want to reopen, please pass it to somebody responsible for the overall design.
Comment by Perry Krug [ 11/Apr/14 ]
Reopening and assigning to docs; we need this to be release-noted, IMO.
Comment by Ruth Harris [ 14/Apr/14 ]
Reassigning to Anil. He makes the call on what we put in the release notes for known and fixed issues.
Comment by Anil Kumar [ 09/May/14 ]
Ruth - Let's release-note this for 3.0.




[MB-11282] Separate stats for internal memory allocation (application vs. data) Created: 02/Jun/14  Updated: 02/Jun/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Story Priority: Critical
Reporter: Pavel Paulau Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
AFAIK we currently track allocation for data and application together.

But sometimes the application (memcached / ep-engine) overhead is huge and cannot be ignored.




[MB-11250] Go-Couchbase: Provide DML APIs using CAS Created: 29/May/14  Updated: 18/Jun/14  Due: 30/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Gerald Sangudi Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-11247] Go-Couchbase: Use password to connect to SASL buckets Created: 29/May/14  Updated: 19/Jun/14  Due: 30/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Gerald Sangudi Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Gerald Sangudi [ 19/Jun/14 ]
https://github.com/couchbaselabs/query/blob/master/docs/n1ql-authentication.md




[MB-11208] stats.org should be installed Created: 27/May/14  Updated: 27/May/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: techdebt-backlog
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Trond Norbye Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
stats.org contains a description of the stats we're sending from ep-engine. It could be useful for people.

 Comments   
Comment by Matt Ingenthron [ 27/May/14 ]
If it's "useful" shouldn't this be part of official documentation? I've often thought it should be. There's probably a duplicate here somewhere.

I also think the stats need stability labels applied as people may rely on stats when building their own integration/monitoring tools. COMMITTED, UNCOMMITTED, VOLATILE, etc. would be useful for the stats.

Relatedly, someone should document the deprecation of TAP stats for 3.0.




[MB-11195] Support binary collation for views Created: 23/May/14  Updated: 16/Jun/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sriram Melkote Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
N1QL would benefit significantly if we could allow memcmp() collation for the views it creates. So much so that we should consider this for a minor release after 3.0, so it can be available for the N1QL beta.
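
To make the distinction concrete, here is a small Go sketch (an illustration with plain ASCII keys only; the real view engine uses full ICU collation rules):

package main

import (
	"fmt"
	"sort"
)

func main() {
	keys := []string{"b", "A", "B", "a"}

	// memcmp()-style ordering is a plain byte-wise comparison, so every
	// uppercase ASCII letter (0x41..0x5A) sorts before any lowercase one
	// (0x61..0x7A).
	sort.Strings(keys) // Go string comparison is byte-wise, like memcmp()
	fmt.Println(keys)  // [A B a b]

	// The ICU-based collation views use today orders the same keys
	// roughly case-insensitively instead: a < A < b < B. Emitting view
	// keys in memcmp order would let N1QL build and scan indexes without
	// pulling in ICU comparison rules.
}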




[MB-11102] extended documentation about stats flowing out of CBSTATS and the correlation between them Created: 12/May/14  Updated: 12/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Update documentation about the stats flowing out of CBSTATS and the correlations between them. This is needed to accurately predict capacity and other bottlenecks, as well as to detect trends.




[MB-11101] supported go SDK for couchbase server Created: 12/May/14  Updated: 16/Jun/14

Status: Open
Project: Couchbase Server
Component/s: clients
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Go client.




[MB-11098] Ability to set block size written to storage for better alignment with SSDs and/or HDDs for better throughput performance Created: 12/May/14  Updated: 12/May/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Ability to set the block size written to storage, for better alignment with SSDs and/or HDDs and better throughput performance.




[MB-10767] DOC: Misc - DITA conversion Created: 04/Apr/14  Updated: 04/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Ruth Harris Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-10651] The guide for installing with a user-defined port doesn't work for the REST port change Created: 26/Mar/14  Updated: 17/Jun/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Larry Liu Assignee: Aruna Piravi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
http://docs.couchbase.com/couchbase-manual-2.5/cb-install/#install-user-defined-ports

I followed the instructions to change the admin port (REST port) by appending to the /opt/couchbase/etc/couchbase/static_config file:
{rest_port, 9000}.

[root@localhost bin]# netstat -an| grep 9000
[root@localhost bin]# netstat -an| grep :8091
tcp 0 0 0.0.0.0:8091 0.0.0.0:* LISTEN

logs:
https://s3.amazonaws.com/customers.couchbase.com/larry/output.zip

Larry



 Comments   
Comment by Larry Liu [ 26/Mar/14 ]
The log files show that the change was picked up by the server:

[ns_server:info,2014-03-26T19:13:24.063,nonode@nohost:<0.58.0>:ns_server:log_pending:30]Static config terms:
[{error_logger_mf_dir,"/opt/couchbase/var/lib/couchbase/logs"},
 {error_logger_mf_maxbytes,10485760},
 {error_logger_mf_maxfiles,20},
 {path_config_bindir,"/opt/couchbase/bin"},
 {path_config_etcdir,"/opt/couchbase/etc/couchbase"},
 {path_config_libdir,"/opt/couchbase/lib"},
 {path_config_datadir,"/opt/couchbase/var/lib/couchbase"},
 {path_config_tmpdir,"/opt/couchbase/var/lib/couchbase/tmp"},
 {nodefile,"/opt/couchbase/var/lib/couchbase/couchbase-server.node"},
 {loglevel_default,debug},
 {loglevel_couchdb,info},
 {loglevel_ns_server,debug},
 {loglevel_error_logger,debug},
 {loglevel_user,debug},
 {loglevel_menelaus,debug},
 {loglevel_ns_doctor,debug},
 {loglevel_stats,debug},
 {loglevel_rebalance,debug},
 {loglevel_cluster,debug},
 {loglevel_views,debug},
 {loglevel_mapreduce_errors,debug},
 {loglevel_xdcr,debug},
 {rest_port,9000}]
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
This is because the rest_port entry in static_config is only taken into account on a fresh install.

There's some way to install our package without starting the server first, and that has to be documented. I don't know who owns working with the docs people.
Comment by Anil Kumar [ 09/May/14 ]
Alk - Before it gets to documentation we need to test it and verify the instructions. Can you provide those instructions and assign this ticket to Aruna to test?
Comment by Anil Kumar [ 03/Jun/14 ]
Alk - can you provide those instructions and assign this ticket to Aruna to test?
Comment by Aleksey Kondratenko [ 04/Jun/14 ]
Instructions fail to mention the fact that rest_port must be changed before config.dat is written. And config.dat is initialized on first server start.

There's some way to install server without starting it.

But here's what I managed to do:

# dpkg -i ~/Desktop/forReview/couchbase-server-enterprise_ubuntu_1204_x86_2.5.1-1086-rel.deb

# /etc/init.d/couchbase-server stop

# rm /opt/couchbase/var/lib/couchbase/config/config.dat

# emacs /opt/couchbase/etc/couchbase/static_config

# /etc/init.d/couchbase-server start

I.e., I stopped the service, removed config.dat, edited static_config, then started it back up and found the REST port to be updated.
Comment by Anil Kumar [ 04/Jun/14 ]
Thanks Alk. Assigning this to Aruna for verification and later please assign this ticket to Documentation (Ruth).




[MB-10531] No longer necessary to wait for persistence to issue stale=false query Created: 21/Mar/14  Updated: 25/Mar/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Sriram Melkote Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Matt pointed out that in the past, we had to wait for an item to persist to disk before issuing a stale=false query for correct results. In 3.0, this is not necessary. One can issue a stale=false view query at any time, and the results will reflect all changes made up to the moment the query was issued. This task is a placeholder to update the 3.0 docs to remove the unnecessary step of waiting for persistence.

 Comments   
Comment by Matt Ingenthron [ 21/Mar/14 ]
Correct. Thanks for making sure this is raised, Siri. While I'm thinking of it, two points need to be in there:
1) if you have older code, you will need to change it to take advantage of the semantic change to the query
2) application developers still need to be a bit careful to ensure any modifications being done aren't async operations; they'll have to wait for the responses before doing the stale=false query (see the sketch below)
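
To illustrate point 2 concretely, a minimal Go sketch against go-couchbase (hedged: the GetBucket/Set/View signatures are recalled from that library rather than taken from these docs, and the bucket, design doc, and view names are made up):

package main

import (
	"log"

	"github.com/couchbaselabs/go-couchbase"
)

func main() {
	bucket, err := couchbase.GetBucket("http://localhost:8091/", "default", "default")
	if err != nil {
		log.Fatal(err)
	}

	// Set is synchronous: it returns only after the server acknowledges
	// the mutation, so the write is complete before we query (point 2).
	if err := bucket.Set("beer-1", 0, map[string]interface{}{"abv": 5.2}); err != nil {
		log.Fatal(err)
	}

	// In 3.0 there is no extra wait for persistence: the stale=false
	// query already observes the mutation above.
	res, err := bucket.View("beers", "by_abv", map[string]interface{}{"stale": false})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("view returned %d rows", len(res.Rows))
}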
Comment by Anil Kumar [ 25/Mar/14 ]
This is for 3.0 documentation.
Comment by Sriram Melkote [ 25/Mar/14 ]
Not an improvement. This is a task.




[MB-10511] Feature request for supporting rolling downgrades Created: 19/Mar/14  Updated: 11/Apr/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Abhishek Singh Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to

 Description   
Some customers are interested in Couchbase supporting rolling downgrades. Currently we can't add 2.2 nodes to a cluster that has all nodes on 2.5.




[MB-10512] Update documentation to convey we don't support rolling downgrades Created: 19/Mar/14  Updated: 27/Mar/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Abhishek Singh Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Update documentation to convey that we don't support rolling downgrades to 2.2 once all nodes are running on 2.5.




[MB-10432] Removed ep_max_txn_size stat/engine_parameter Created: 11/Mar/14  Updated: 11/Mar/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Mike Wiederhold Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This value is no longer used in the server. Please note that you need to update the documentation for cbepctl, since this stat could be set with that script.




[MB-10430] Add AWS AMI documentation to Installation and Upgrade Guide Created: 11/Mar/14  Updated: 25/Mar/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Brian Shumate Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
It would be useful to have some basic installation instructions for those who want to use the Couchbase Server Amazon Machine Images (AMIs) directly, without RightScale.

This particularly concerns the special case of the Administrator user and password, which can become a stumbling point for some users.


 Comments   
Comment by Anil Kumar [ 25/Mar/14 ]
Ruth - Please add a reference to the Couchbase on AWS whitepaper - http://aws.typepad.com/aws/2013/08/running-couchbase-on-aws-new-white-paper.html - which has all the information.




[MB-10379] index is not used for simple query Created: 06/Mar/14  Updated: 28/May/14  Due: 20/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: 2.5.0
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Iryna Mironava Assignee: Iryna Mironava
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 64-bit

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
I created an index (my_name) on the name field of bucket b0, and then a my_skill index on b0:
cbq> select * from :system.indexes
{
    "resultset": [
        {
            "bucket_id": "b0",
            "id": "#alldocs",
            "index_key": [
                "META().id"
            ],
            "index_type": "view",
            "name": "#alldocs",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        },
        {
            "bucket_id": "b0",
            "id": "my_name",
            "index_key": [
                "name"
            ],
            "index_type": "view",
            "name": "my_name",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        },
       {
            "bucket_id": "b0",
            "id": "my_skill",
            "index_key": [
                "skills"
            ],
            "index_type": "view",
            "name": "my_skill",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        },
        {
            "bucket_id": "b1",
            "id": "#alldocs",
            "index_key": [
                "META().id"
            ],
            "index_type": "view",
            "name": "#alldocs",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        },
        {
            "bucket_id": "default",
            "id": "#alldocs",
            "index_key": [
                "META().id"
            ],
            "index_type": "view",
            "name": "#alldocs",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        }
    ],
    "info": [
        {
            "caller": "http_response:160",
            "code": 100,
            "key": "total_rows",
            "message": "4"
        },
        {
            "caller": "http_response:162",
            "code": 101,
            "key": "total_elapsed_time",
            "message": "1.185438ms"
        }
    ]
}

I see my view in the UI and I can query it,
but explain says I am still using #alldocs:

cbq> explain select name from b0
{
    "resultset": [
        {
            "input": {
                "as": "b0",
                "bucket": "b0",
                "ids": null,
                "input": {
                    "as": "",
                    "bucket": "b0",
                    "cover": false,
                    "index": "#alldocs",
                    "pool": "default",
                    "ranges": null,
                    "type": "scan"
                },
                "pool": "default",
                "projection": null,
                "type": "fetch"
            },
            "result": [
                {
                    "as": "name",
                    "expr": {
                        "left": {
                            "path": "b0",
                            "type": "property"
                        },
                        "right": {
                            "path": "name",
                            "type": "property"
                        },
                        "type": "dot_member"
                    },
                    "star": false
                }
            ],
            "type": "projector"
        }
    ],
    "info": [
        {
            "caller": "http_response:160",
            "code": 100,
            "key": "total_rows",
            "message": "1"
        },
        {
            "caller": "http_response:162",
            "code": 101,
            "key": "total_elapsed_time",
            "message": "1.236104ms"
        }
    ]
}
I see the same result for skills.


 Comments   
Comment by Sriram Melkote [ 07/Mar/14 ]
I think the current implementation considers secondary indexes only for filtering operations. When you do SELECT <anything> FROM <bucket>, it is a full bucket scan, and that is implemented by #alldocs and by #primary index only.

So the current behavior looks to be correct. Try running "CREATE PRIMARY INDEX USING VIEW" and please see if the query will then switch from #alldocs to #primary. Please also try adding a filter, like WHERE name > 'Mary' and see if the my_name index gets used for the filtering.

As a side note, what you're running is a covered query, where all the necessary data is held entirely in a secondary index. However, covering is not implemented: a secondary index is only used as an access path, not as a source of data.
Comment by Gerald Sangudi [ 11/Mar/14 ]
This particular query will always use #primary or #alldocs. Even for documents without a "name" field, we return a result object that is missing the "name" field.

@Iryna, please test WHERE name IS NOT MISSING to see if it uses the index. If not, we'll fix that for DP4. Thanks.




[MB-9446] there's a chance of starting the janitor while not having the latest version of the config (was: On reboot of entire cluster, see many conflicting bucket config changes frequently.) Created: 30/Oct/13  Updated: 04/Jun/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.0, 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Ketaki Gangal Assignee: Aliaksey Artamonau
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 0.0.0-7040toy

Triage: Triaged
Is this a Regression?: Yes

 Description   

Load items on a cluster, build toy-000-704.
Reboot the cluster.

Post reboot, see a lot of conflicting bucket config messages in the web logs.

Cluster logs here: https://s3.amazonaws.com/bugdb/bug_9445/9435.tar

Sample

{fastForwardMap,undefined}]}]}]}, choosing the former, which looks newer.
ns_config003 ns_1@soursop-s11207.sc.couchbase.com 18:59:30 - Wed Oct 30, 2013
Conflicting configuration changes to field buckets:
{[{'ns_1@172.23.105.45',{5088,63550403967}},
{'ns_1@soursop-s11203.sc.couchbase.com',{1,63550403967}},
{'ns_1@soursop-s11204.sc.couchbase.com',{1764,63550403283}}],
[{'_vclock',[{'ns_1@172.23.105.45',{5088,63550403967}},
{'ns_1@soursop-s11203.sc.couchbase.com',{1,63550403967}},
{'ns_1@soursop-s11204.sc.couchbase.com',{1764,63550403283}}]},
{configs,[{"saslbucket",
[{uuid,<<"b51edfdad356db7e301d9b32c6ef47a3">>},
{num_replicas,1},
{replica_index,false},
{ram_quota,3355443200},
{auth_type,sasl},
{sasl_password,"password"},
{autocompaction,false},
{purge_interval,undefined},
{flush_enabled,false},
{num_threads,3},
{type,membase},
{num_vbuckets,1024},
{servers,['ns_1@soursop-s11203.sc.couchbase.com',
'ns_1@soursop-s11204.sc.couchbase.com',
'ns_1@soursop-s11205.sc.couchbase.com',
'ns_1@soursop-s11207.sc.couchbase.com']},
{map,[['ns_1@soursop-s11207.sc.couchbase.com',
'ns_1@soursop-s11205.sc.couchbase.com'],
['ns_1@soursop-s11207.sc.couchbase.com',
'ns_1@soursop-s11203.sc.couchbase.com'],
['ns_1@soursop-s11207.sc.couchbase.com',
'ns_1@soursop-s11204.sc.couchbase.com'],

 Comments   
Comment by Aleksey Kondratenko [ 30/Oct/13 ]
Very weird. But if it is indeed an issue, there's likely exactly the same issue in 2.5.0. And if that's the case, it looks pretty scary.
Comment by Aliaksey Artamonau [ 01/Nov/13 ]
I set the affected version to 2.5 because I know that it affects 2.5, and actually many preceding releases.
Comment by Maria McDuff (Inactive) [ 31/Jan/14 ]
Alk,

Is this already merged in 2.5? Please confirm, and if that's the case, mark as resolved and assign back to QE.
Thanks.
Comment by Aliaksey Artamonau [ 31/Jan/14 ]
No, it's not fixed in 2.5.
Comment by Anil Kumar [ 04/Jun/14 ]
Triage - 06/04/2014 Alk, Wayne, Parag, Anil




[MB-9356] tuq crash during query + rebalance having 1M items Created: 16/Oct/13  Updated: 18/Jun/14  Due: 23/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 64 bit

Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
Initial setup:
2 buckets, 1M items in each, 1 node

Steps:
1) start a query using curl
2) add a node and start rebalance
3) start same query using tuq_client console.


[root@localhost tuqtng]# ./tuqtng -couchbase http://localhost:8091
07:19:57.406786 tuqtng started...
07:19:57.406880 version: 0.0.0
07:19:57.406887 site: http://localhost:8091
panic: runtime error: index out of range

goroutine 283323 [running]:
github.com/couchbaselabs/go-couchbase.func·001(0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:151 +0x4f1
github.com/couchbaselabs/go-couchbase.(*Bucket).doBulkGet(0xc2000e8480, 0xc200c101fc, 0xc20043af70, 0x1, 0x1, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:188 +0x150
github.com/couchbaselabs/go-couchbase.func·002()
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:212 +0x115
created by github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:218 +0x1ef

goroutine 1 [chan receive]:
github.com/couchbaselabs/tuqtng/server.Server(0x8464a0, 0x5, 0x7fff3931ab76, 0x15, 0x8554a0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:66 +0x4f4
main.main()
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/main.go:71 +0x28a

goroutine 2 [syscall]:

goroutine 4 [syscall]:
os/signal.loop()
/usr/local/go/src/pkg/os/signal/signal_unix.go:21 +0x1c
created by os/signal.init·1
/usr/local/go/src/pkg/os/signal/signal_unix.go:27 +0x2f

goroutine 13 [chan send]:
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).SendResult(0xc2001cb930, 0x70d420, 0xc200a27880)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:47 +0x46
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).processItem(0xc2001aa080, 0xc2001c9a80, 0xc2001c99c0, 0xc200a27080, 0x2b5e52485d01, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:119 +0x119
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).executeInternal(0xc2001aa080, 0xc2001d7e10, 0xc2001c9a80, 0xc2001c99c0, 0xc2001e2720, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:90 +0x2a7
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).Execute(0xc2001aa080, 0xc2001d7e10, 0xc2001c9a80, 0xc2001c99c0, 0xc2000004f8, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:42 +0x100
github.com/couchbaselabs/tuqtng/server.Dispatch(0xc2001c9a80, 0xc2001c99c0, 0xc2001c1b10, 0xc2001ab000, 0xc2001c1b40, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:85 +0x191
created by github.com/couchbaselabs/tuqtng/server.Server
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:67 +0x59c

goroutine 6 [chan receive]:
main.dumpOnSignal(0x2b5e52484fa0, 0x1, 0x1)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/main.go:80 +0x7f
main.dumpOnSignalForPlatform()
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/dump.go:19 +0x80
created by main.main
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/main.go:62 +0x1d7

goroutine 7 [IO wait]:
net.runtime_pollWait(0x2aaaaabacf00, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001242c0, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).accept(0xc200124240, 0x90ae00, 0x0, 0xc200198660, 0xb, ...)
/usr/local/go/src/pkg/net/fd_unix.go:385 +0x2c1
net.(*TCPListener).AcceptTCP(0xc2000005f8, 0x4443f6, 0x2b5e52483e28, 0x4443f6)
/usr/local/go/src/pkg/net/tcpsock_posix.go:229 +0x45
net.(*TCPListener).Accept(0xc2000005f8, 0xc200125420, 0xc2001ac2b0, 0xc2001f46c0, 0x0, ...)
/usr/local/go/src/pkg/net/tcpsock_posix.go:239 +0x25
net/http.(*Server).Serve(0xc200107a50, 0xc2001488c0, 0xc2000005f8, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/server.go:1542 +0x85
net/http.(*Server).ListenAndServe(0xc200107a50, 0xc200107a50, 0xc2001985d0)
/usr/local/go/src/pkg/net/http/server.go:1532 +0x9e
net/http.ListenAndServe(0x846860, 0x5, 0xc2001985d0, 0xc200107960, 0x0, ...)
/usr/local/go/src/pkg/net/http/server.go:1597 +0x65
github.com/couchbaselabs/tuqtng/network/http.func·001()
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http.go:37 +0x6c
created by github.com/couchbaselabs/tuqtng/network/http.NewHttpEndpoint
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http.go:41 +0x2a0

goroutine 31 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).SendItem(0xc2001e29c0, 0xc200eb0a40, 0x85b2c0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:151 +0xbe
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).scanRange(0xc2001e29c0, 0x0, 0x8a7970)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:121 +0x640
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).Run(0xc2001e29c0, 0xc2001e2960)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:61 +0xdc
created by github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:97 +0xe3

goroutine 11 [IO wait]:
net.runtime_pollWait(0x2aaaaabace60, 0x77, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitWrite(0xc200124620, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:80 +0x31
net.(*netFD).Write(0xc2001245a0, 0xc2001af000, 0x44, 0x1000, 0x4, ...)
/usr/local/go/src/pkg/net/fd_unix.go:294 +0x3e6
net.(*conn).Write(0xc2000008f0, 0xc2001af000, 0x44, 0x1000, 0x452dd2, ...)
/usr/local/go/src/pkg/net/net.go:131 +0xc3
net/http.(*switchWriter).Write(0xc2001ad040, 0xc2001af000, 0x44, 0x1000, 0x4d5989, ...)
/usr/local/go/src/pkg/net/http/chunked.go:0 +0x62
bufio.(*Writer).Flush(0xc200148f40, 0xc20092e6b4, 0x34)
/usr/local/go/src/pkg/bufio/bufio.go:465 +0xb9
net/http.(*chunkWriter).flush(0xc2001cb8e0)
/usr/local/go/src/pkg/net/http/server.go:270 +0x59
net/http.(*response).Flush(0xc2001cb8c0)
/usr/local/go/src/pkg/net/http/server.go:953 +0x5d
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).ProcessResults(0xc2001cb930, 0x2, 0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:109 +0x16b
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).Process(0xc2001cb930, 0x40519c, 0x71b260)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:61 +0x52
github.com/couchbaselabs/tuqtng/network/http.(*HttpQuery).Process(0xc2001c99c0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_query.go:72 +0x29
github.com/couchbaselabs/tuqtng/network/http.(*HttpEndpoint).ServeHTTP(0xc200000508, 0xc2001b1140, 0xc2001cb8c0, 0xc2001cd000)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http.go:55 +0xcd
github.com/gorilla/mux.(*Router).ServeHTTP(0xc200107960, 0xc2001b1140, 0xc2001cb8c0, 0xc2001cd000)
/tmp/gocode/src/github.com/gorilla/mux/mux.go:90 +0x1e1
net/http.serverHandler.ServeHTTP(0xc200107a50, 0xc2001b1140, 0xc2001cb8c0, 0xc2001cd000)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc200124630)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x266

goroutine 21 [chan receive]:
github.com/couchbaselabs/tuqtng/catalog/couchbase.keepPoolFresh(0xc2001fa200)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/couchbase.go:157 +0x4b
created by github.com/couchbaselabs/tuqtng/catalog/couchbase.newPool
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/couchbase.go:149 +0x34c

goroutine 30 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).SendItem(0xc2001c1660, 0xc200a27940, 0xc200a27940)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:49 +0xbf
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).flushBatch(0xc2001e1c40, 0xc2001e2a00)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:141 +0x7cd
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).processItem(0xc2001e1c40, 0xc200390c00, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:78 +0xd9
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator(0xc2001c1660, 0xc2002015f0, 0xc2001e1c40, 0xc2001e2840)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:107 +0x1b0
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).Run(0xc2001e1c40, 0xc2001e2840)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:57 +0xa8
created by github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:97 +0xe3

goroutine 32 [chan send]:
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanRange(0xc200201500, 0x0, 0x0, 0x0, 0x0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:179 +0x386
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanEntries(0xc200201500, 0x0, 0xc2001e2b40, 0xc2001e2ba0, 0xc2001250c0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:112 +0x78
created by github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).scanRange
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:82 +0x18b

goroutine 29 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).SendItem(0xc2001c1600, 0xc200a27140, 0x87ea70)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:49 +0xbf
github.com/couchbaselabs/tuqtng/xpipeline.(*Project).processItem(0xc2001c1630, 0xc200a27140, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/project.go:95 +0x33b
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator(0xc2001c1600, 0xc2002015a0, 0xc2001c1630, 0xc2001e2ae0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:107 +0x1b0
github.com/couchbaselabs/tuqtng/xpipeline.(*Project).Run(0xc2001c1630, 0xc2001e2ae0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/project.go:46 +0x91
created by github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).executeInternal
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:79 +0x1c7

goroutine 33 [chan send]:
github.com/couchbaselabs/tuqtng/catalog/couchbase.WalkViewInBatches(0xc2001251e0, 0xc2001252a0, 0xc2000e8480, 0x845260, 0x0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_util.go:90 +0x424
created by github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanRange
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:159 +0x209

goroutine 165124 [select]:
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).executeInternal(0xc2001aa080, 0xc2005d6340, 0xc2001c9a80, 0xc200282580, 0xc2005bba80, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:87 +0x667
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).Execute(0xc2001aa080, 0xc2005d6340, 0xc2001c9a80, 0xc200282580, 0xc2000004f8, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:42 +0x100
github.com/couchbaselabs/tuqtng/server.Dispatch(0xc2001c9a80, 0xc200282580, 0xc2001c1b10, 0xc2001ab000, 0xc2001c1b40, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:85 +0x191
created by github.com/couchbaselabs/tuqtng/server.Server
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:67 +0x59c

goroutine 283325 [chan receive]:
github.com/dustin/gomemcached/client.(*Client).GetBulk(0xc200d55990, 0xc200d50092, 0xc2001d7d60, 0x1, 0x1, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:228 +0x3c3
github.com/couchbaselabs/go-couchbase.func·001(0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:158 +0x1dc
github.com/couchbaselabs/go-couchbase.(*Bucket).doBulkGet(0xc2000e8480, 0xc200c10092, 0xc2001d7d60, 0x1, 0x1, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:188 +0x150
github.com/couchbaselabs/go-couchbase.func·002()
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:212 +0x115
created by github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:218 +0x1ef

goroutine 283623 [runnable]:
net.runtime_pollWait(0x2aaaaabac820, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001f46b0, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).Read(0xc2001f4630, 0xc20063fda0, 0x18, 0x18, 0x0, ...)
/usr/local/go/src/pkg/net/fd_unix.go:195 +0x2b3
net.(*conn).Read(0xc2002f2658, 0xc20063fda0, 0x18, 0x18, 0x1, ...)
/usr/local/go/src/pkg/net/net.go:123 +0xc3
io.ReadAtLeast(0xc200198840, 0xc2002f2658, 0xc20063fda0, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:284 +0xf7
io.ReadFull(0xc200198840, 0xc2002f2658, 0xc20063fda0, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:302 +0x6f
github.com/dustin/gomemcached.(*MCResponse).Receive(0xc2002b4d80, 0xc200198840, 0xc2002f2658, 0xc20063fda0, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/mc_res.go:155 +0xc7
github.com/dustin/gomemcached/client.getResponse(0xc200198840, 0xc2002f2658, 0xc20063fda0, 0x18, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/transport.go:30 +0xc6
github.com/dustin/gomemcached/client.(*Client).Receive(0xc200b13f30, 0xc2002b4c00, 0x0, 0x0)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:81 +0x67
github.com/dustin/gomemcached/client.func·003()
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:193 +0xaf
created by github.com/dustin/gomemcached/client.(*Client).GetBulk
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:207 +0x1e6

goroutine 165127 [chan receive]:
github.com/couchbaselabs/go-couchbase.(*Bucket).GetBulk(0xc2000e8480, 0xc200977000, 0x3e8, 0x3e8, 0xc200000001, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:278 +0x341
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*bucket).BulkFetch(0xc2001c11e0, 0xc200977000, 0x3e8, 0x3e8, 0x15, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/couchbase.go:249 +0x83
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).flushBatch(0xc2006e8230, 0xc2005bbd00)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:113 +0x35d
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).processItem(0xc2006e8230, 0xc200c19840, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:78 +0xd9
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator(0xc200257300, 0xc2002015f0, 0xc2006e8230, 0xc2005bbba0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:107 +0x1b0
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).Run(0xc2006e8230, 0xc2005bbba0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:57 +0xa8
created by github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:97 +0xe3

goroutine 165126 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator(0xc2002572a0, 0xc2002015a0, 0xc2002572d0, 0xc2005bbe40)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:104 +0x32c
github.com/couchbaselabs/tuqtng/xpipeline.(*Project).Run(0xc2002572d0, 0xc2005bbe40)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/project.go:46 +0x91
created by github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).executeInternal
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:79 +0x1c7

goroutine 165123 [chan receive]:
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).ProcessResults(0xc2006e80e0, 0x2, 0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:88 +0x3c
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).Process(0xc2006e80e0, 0x40519c, 0x71b260)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:61 +0x52
github.com/couchbaselabs/tuqtng/network/http.(*HttpQuery).Process(0xc200282580)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_query.go:72 +0x29
github.com/couchbaselabs/tuqtng/network/http.(*HttpEndpoint).ServeHTTP(0xc200000508, 0xc2001b1140, 0xc2006e8070, 0xc2001cdea0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http.go:55 +0xcd
github.com/gorilla/mux.(*Router).ServeHTTP(0xc200107960, 0xc2001b1140, 0xc2006e8070, 0xc2001cdea0)
/tmp/gocode/src/github.com/gorilla/mux/mux.go:90 +0x1e1
net/http.serverHandler.ServeHTTP(0xc200107a50, 0xc2001b1140, 0xc2006e8070, 0xc2001cdea0)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc2001f46c0)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x266

goroutine 165129 [select]:
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanRange(0xc200201500, 0x0, 0x0, 0x0, 0x0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:166 +0x6b9
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanEntries(0xc200201500, 0x0, 0xc2005bbea0, 0xc2005bbf00, 0xc2005bbf60, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:112 +0x78
created by github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).scanRange
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:82 +0x18b

goroutine 165128 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).scanRange(0xc2005bbd20, 0x0, 0x8a7970)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:99 +0x7a2
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).Run(0xc2005bbd20, 0xc2005bbcc0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:61 +0xdc
created by github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:97 +0xe3

goroutine 165130 [select]:
net/http.(*persistConn).roundTrip(0xc200372f00, 0xc200765660, 0xc200372f00, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/transport.go:857 +0x6c7
net/http.(*Transport).RoundTrip(0xc20012e080, 0xc2004dac30, 0xc2005a5808, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/transport.go:186 +0x396
net/http.send(0xc2004dac30, 0xc2000e7e70, 0xc20012e080, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/client.go:166 +0x3a1
net/http.(*Client).send(0xb7fcc0, 0xc2004dac30, 0x7c, 0x2b5e52429020, 0xc200fca2c0, ...)
/usr/local/go/src/pkg/net/http/client.go:100 +0xcd
net/http.(*Client).doFollowingRedirects(0xb7fcc0, 0xc2004dac30, 0x90ae80, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/client.go:282 +0x5ff
net/http.(*Client).Do(0xb7fcc0, 0xc2004dac30, 0xc20052cae0, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/client.go:129 +0x8d
github.com/couchbaselabs/go-couchbase.(*Bucket).ViewCustom(0xc2000e8480, 0x845260, 0x0, 0x873ab0, 0x9, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/views.go:115 +0x210
github.com/couchbaselabs/go-couchbase.(*Bucket).View(0xc2000e8480, 0x845260, 0x0, 0x873ab0, 0x9, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/views.go:155 +0xcc
github.com/couchbaselabs/tuqtng/catalog/couchbase.WalkViewInBatches(0xc20045b000, 0xc20045b060, 0xc2000e8480, 0x845260, 0x0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_util.go:80 +0x2ce
created by github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanRange
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:159 +0x209

goroutine 283324 [chan receive]:
github.com/dustin/gomemcached/client.(*Client).GetBulk(0xc200b13f30, 0xc200b1001b, 0xc200fce200, 0x2, 0x2, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:228 +0x3c3
github.com/couchbaselabs/go-couchbase.func·001(0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:158 +0x1dc
github.com/couchbaselabs/go-couchbase.(*Bucket).doBulkGet(0xc2000e8480, 0xc200c1001b, 0xc200fce200, 0x2, 0x2, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:188 +0x150
github.com/couchbaselabs/go-couchbase.func·002()
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:212 +0x115
created by github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:218 +0x1ef

goroutine 283622 [runnable]:
net.runtime_pollWait(0x2aaaaabacb40, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001f4e00, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).Read(0xc2001f4d80, 0xc200eda4e0, 0x18, 0x18, 0x0, ...)
/usr/local/go/src/pkg/net/fd_unix.go:195 +0x2b3
net.(*conn).Read(0xc20080b948, 0xc200eda4e0, 0x18, 0x18, 0x1, ...)
/usr/local/go/src/pkg/net/net.go:123 +0xc3
io.ReadAtLeast(0xc200198840, 0xc20080b948, 0xc200eda4e0, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:284 +0xf7
io.ReadFull(0xc200198840, 0xc20080b948, 0xc200eda4e0, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:302 +0x6f
github.com/dustin/gomemcached.(*MCResponse).Receive(0xc2002b4ba0, 0xc200198840, 0xc20080b948, 0xc200eda4e0, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/mc_res.go:155 +0xc7
github.com/dustin/gomemcached/client.getResponse(0xc200198840, 0xc20080b948, 0xc200eda4e0, 0x18, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/transport.go:30 +0xc6
github.com/dustin/gomemcached/client.(*Client).Receive(0xc2004bfa20, 0xc2002b4a20, 0x0, 0x0)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:81 +0x67
github.com/dustin/gomemcached/client.func·003()
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:193 +0xaf
created by github.com/dustin/gomemcached/client.(*Client).GetBulk
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:207 +0x1e6

goroutine 283331 [select]:
net/http.(*persistConn).writeLoop(0xc200372f00)
/usr/local/go/src/pkg/net/http/transport.go:774 +0x26f
created by net/http.(*Transport).dialConn
/usr/local/go/src/pkg/net/http/transport.go:512 +0x58b

goroutine 283321 [chan receive]:
github.com/couchbaselabs/go-couchbase.errorCollector(0xc20101a900, 0xc200372c80)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:246 +0x9f
created by github.com/couchbaselabs/go-couchbase.(*Bucket).GetBulk
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:275 +0x2f2

goroutine 283544 [select]:
net/http.(*persistConn).writeLoop(0xc2006cbd80)
/usr/local/go/src/pkg/net/http/transport.go:774 +0x26f
created by net/http.(*Transport).dialConn
/usr/local/go/src/pkg/net/http/transport.go:512 +0x58b

goroutine 283322 [chan receive]:
github.com/dustin/gomemcached/client.(*Client).GetBulk(0xc2004bfa20, 0xc2004b01ec, 0xc200e8cd60, 0x2, 0x2, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:228 +0x3c3
github.com/couchbaselabs/go-couchbase.func·001(0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:158 +0x1dc
github.com/couchbaselabs/go-couchbase.(*Bucket).doBulkGet(0xc2000e8480, 0xc200c101ec, 0xc200e8cd60, 0x2, 0x2, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:188 +0x150
github.com/couchbaselabs/go-couchbase.func·002()
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:212 +0x115
created by github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:218 +0x1ef

goroutine 283621 [runnable]:
net.runtime_pollWait(0x2aaaaabac960, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001f4500, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).Read(0xc2001f4480, 0xc200a83940, 0x18, 0x18, 0x0, ...)
/usr/local/go/src/pkg/net/fd_unix.go:195 +0x2b3
net.(*conn).Read(0xc20084a5d8, 0xc200a83940, 0x18, 0x18, 0x1, ...)
/usr/local/go/src/pkg/net/net.go:123 +0xc3
io.ReadAtLeast(0xc200198840, 0xc20084a5d8, 0xc200a83940, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:284 +0xf7
io.ReadFull(0xc200198840, 0xc20084a5d8, 0xc200a83940, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:302 +0x6f
github.com/dustin/gomemcached.(*MCResponse).Receive(0xc2002b49c0, 0xc200198840, 0xc20084a5d8, 0xc200a83940, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/mc_res.go:155 +0xc7
github.com/dustin/gomemcached/client.getResponse(0xc200198840, 0xc20084a5d8, 0xc200a83940, 0x18, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/transport.go:30 +0xc6
github.com/dustin/gomemcached/client.(*Client).Receive(0xc200d55990, 0xc2002b4720, 0x0, 0x0)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:81 +0x67
github.com/dustin/gomemcached/client.func·003()
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:193 +0xaf
created by github.com/dustin/gomemcached/client.(*Client).GetBulk
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:207 +0x1e6

goroutine 283543 [runnable]:
net/http.(*persistConn).readLoop(0xc2006cbd80)
/usr/local/go/src/pkg/net/http/transport.go:761 +0x64b
created by net/http.(*Transport).dialConn
/usr/local/go/src/pkg/net/http/transport.go:511 +0x574

goroutine 283320 [chan send]:
github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet(0xc2000e8480, 0xc200c19980, 0xc20101a8a0, 0xc20101a900)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:222 +0x26b
created by github.com/couchbaselabs/go-couchbase.(*Bucket).GetBulk
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:273 +0x2d1

goroutine 283330 [IO wait]:
net.runtime_pollWait(0x2aaaaabacdc0, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001f47d0, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).Read(0xc2001f4750, 0xc20097b000, 0x1000, 0x1000, 0x0, ...)
/usr/local/go/src/pkg/net/fd_unix.go:195 +0x2b3
net.(*conn).Read(0xc20084a498, 0xc20097b000, 0x1000, 0x1000, 0x8, ...)
/usr/local/go/src/pkg/net/net.go:123 +0xc3
bufio.(*Reader).fill(0xc200b83180)
/usr/local/go/src/pkg/bufio/bufio.go:79 +0x10c
bufio.(*Reader).Peek(0xc200b83180, 0x1, 0xc200198840, 0x0, 0xc200eda4e0, ...)
/usr/local/go/src/pkg/bufio/bufio.go:107 +0xc9
net/http.(*persistConn).readLoop(0xc200372f00)
/usr/local/go/src/pkg/net/http/transport.go:670 +0xc4
created by net/http.(*Transport).dialConn
/usr/local/go/src/pkg/net/http/transport.go:511 +0x574
[root@localhost tuqtng]#


 Comments   
Comment by Marty Schoch [ 16/Oct/13 ]
Looks like a bug in go-couchbase. I have filed an issue there:

http://cbugg.hq.couchbase.com/bug/bug-906
Comment by Ketaki Gangal [ 16/Oct/13 ]
Can easily hit this on any rebalance; bumping this up to critical.




[MB-9234] Failover message should take into account availability of replica vbuckets Created: 08/Oct/13  Updated: 20/Jun/14

Status: Open
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
If a node goes down and its vbuckets do not have corresponding replicas available in the cluster, we should warn the user that pressing failover will result in perceived data loss. At the moment, we show the same failover message whether those replica vbuckets are available or not.




[MB-9143] Allow replica count to be edited Created: 17/Sep/13  Updated: 12/Jun/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 2.5.0, 3.0

Type: Task Priority: Critical
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-2512 Allow replica count to be edited Closed

 Description   
Currently the replication factor cannot be edited after a bucket has been created. It would be nice to have this functionality.

 Comments   
Comment by Ruth Harris [ 06/Nov/13 ]
Currently, it has been added to the 3.0 engineering branch by Alk. See MB-2512. This would be a 3.0 doc enhancement.
Comment by Perry Krug [ 25/Mar/14 ]
FYI, this is already in as of 2.5 and probably needs to be documented there as well... if possible, before 3.0.
Comment by Amy Kurtzman [ 16/May/14 ]
Anil, Can you verify whether this was added in 2.5 or 3.0?
Comment by Anil Kumar [ 28/May/14 ]
Verified - As Perry mentioned, this was added in the 2.5 release. We need to document this soon in the 2.5 docs.




[MB-8686] CBHealthChecker - Fix fetching number of CPU processors Created: 23/Jul/13  Updated: 05/Jun/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.1.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Anil Kumar Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: customer
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-8817 REST API support to report number of ... Technical task Open Bin Cui  
Triage: Untriaged

 Description   
Issue reported by customer - the cbhealthchecker report shows incorrect information for 'Minimum CPU core number required'.


 Comments   
Comment by Bin Cui [ 07/Aug/13 ]
This will depend on ns_server providing the number of CPU processors in the collected stats. I suggest pushing this to the next release.
Comment by Maria McDuff (Inactive) [ 01/Nov/13 ]
Per Bin:
Suggest pushing the following two bugs to the next release:
1. MB-8686: it depends on ns_server providing the capability to retrieve the number of CPU cores
2. MB-8502: caused by async communication between the main installer thread and the API to get status. The change would be dramatic for the installer.
Comment by Maria McDuff (Inactive) [ 19/May/14 ]
Bin,

Raising to Critical.
If this is still dependent on ns_server, please assign to Alk.
This needs to be fixed for 3.0.
Comment by Anil Kumar [ 05/Jun/14 ]
We need this information to be provided from ns_server. Created ticket MB-11334.

Triage - June 05 2014 Bin, Anil, Tony, Ashvinder
Comment by Aleksey Kondratenko [ 05/Jun/14 ]
Ehm. I don't think it's a good idea to treat ns_server as a "provider of random system-level stats". I believe you'll need to find another way of getting it.




[MB-8915] Tombstone purger need to find a better home for lifetime of deletion Created: 21/Aug/13  Updated: 15/Jun/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, cross-datacenter-replication, storage-engine
Affects Version/s: 2.2.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Junyi Xie (Inactive) Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-8916 migration tool for offline upgrade Technical task Open Anil Kumar  
Triage: Untriaged

 Description   
=== copied and pasted from my email to a group of people; it should explain clearly why we need this ticket ===

Thanks for your comments. Probably it is easier to read in email than code review.

Let me explain a bit to see if we can get on the same page. First of all, the current resolution algorithm (comparing all fields) is still right; yes, there is a small chance we would touch fields after CAS, but for correctness we should have them there.

The cause of MB-8825 is that the tombstone purger uses the expiration time field to store the purger-specific "lifetime of deletion". This is just a temporary solution because, IMHO, the expiration time of a key is not the right place for the "lifetime of deletion" (this is purely storage-specific metadata and should not be in ep-engine), but unfortunately today we cannot find a better place to put such info unless we change the storage format, which has too much overhead at this time. In the future, I think we need to figure out the best place for the "lifetime of deletion" and move it out of the key expiration time field.

In practice, today this temporary solution in the tombstone purger is OK in most cases, because you rarely have a CAS collision for two deletions on the same key. But MB-8825 hit exactly that small dark area: when the destination tries to replicate a deletion from the source back to the source in bi-directional XDCR, both copies share the same (SeqNo, CAS) but differ in the expiration time field (which is not the exp time of the key, but the lifetime of deletion created by the tombstone purger). The exp time at the destination is sometimes bigger than that at the source, causing incorrect resolution results at the source. The problem exists for both CAPI and XMEM.

For backward compatibility,
1) If both sides are 2.2, we use the new resolution algorithm for deletions and we are safe.
2) If both sides are pre-2.2, since they do not have the tombstone purger, the current algorithm (comparing all fields) should be safe.
3) For bi-dir XDCR between a pre-2.2 and a 2.2 cluster on CAPI, a deletion born at 2.2 replicating to pre-2.2 should be safe because there is no tombstone purger at pre-2.2. For deletions born at pre-2.2, we may see them bounced back from 2.2, but there should be no data loss since you just re-delete something already deleted.

This fix may not be perfect, but it is still much better than the issues in MB-8825. I hope that in the near future we can find the right place for the "lifetime of deletion" in the tombstone purger.


Thanks,

Junyi

 Comments   
Comment by Junyi Xie (Inactive) [ 21/Aug/13 ]
Anil and Dipti,

Please determine the priority of this task, and comment if I missed anything. Thanks.


Comment by Anil Kumar [ 21/Aug/13 ]
Upgrade - We need migration tool (which we talked about) in case of Offline upgrade to move the data. Created a subtask for that.
Comment by Aaron Miller (Inactive) [ 17/Oct/13 ]
Considering that fixing this has lots of implications w.r.t. upgrade and all components that touch the file format, and that not fixing it is not causing any problems, I believe that this is not appropriate for 2.5.0
Comment by Junyi Xie (Inactive) [ 22/Oct/13 ]
I agree with Aaron that this may not be a small task and may have lots of implications to different components.

Anil, please reconsider if this is appropriate for 2.5. Thanks.
Comment by Anil Kumar [ 22/Oct/13 ]
Moved it to 3.0.
Comment by Aleksey Kondratenko [ 13/Mar/14 ]
As "temporary head of xdcr for 3.0" I don't need this fixed in 3.0

And my guess is that after 3.0, when "the plan" for XDCR is ready, we'll just close it as won't-fix, but let's wait and see.
Comment by Cihan Biyikoglu [ 15/Jun/14 ]
Aaron is no longer here. Assigning to Chiyoung for consideration.




[MB-8832] Allow for some back-end setting to override hard limit on server quota being 80% of RAM capacity Created: 14/Aug/13  Updated: 28/May/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.1.0, 2.2.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Perry Krug Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Relates to
relates to MB-10180 Server Quota: Inconsistency between d... Open
Triage: Untriaged
Is this a Regression?: Yes

 Description   
At the moment, there is no way to override the hard 80%-of-RAM limit on the server quota. At very large node sizes, this can leave a lot of RAM unused.
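For illustration, a tiny program working out the fixed 80% cap and the RAM it leaves unused at a couple of node sizes:

    #include <stdio.h>

    /* The fixed 80% cap leaves RAM unused in proportion to node size. */
    int main(void) {
        double sizes_gib[] = { 60.0, 256.0 };
        for (int i = 0; i < 2; i++) {
            double quota  = sizes_gib[i] * 0.80;  /* hard quota cap */
            double unused = sizes_gib[i] - quota;
            printf("%6.1f GiB RAM -> max quota %6.1f GiB, %5.1f GiB left over\n",
                   sizes_gib[i], quota, unused);
        }
        return 0;
    }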

 Comments   
Comment by Aleksey Kondratenko [ 14/Aug/13 ]
Passing this to Dipti.

We've seen memory fragmentation easily reach 50% of memory usage. So even with 80% you can get into swap and badness.

I'd recommend _against_ this until we solve fragmentation issues we have today.

Also keep in mind that today you _can_ raise this above all limits with a simple /diag/eval snippet.
Comment by Perry Krug [ 14/Aug/13 ]
I agree we have seen this, but it's been fairly uncommon in production environments, and it is something that can be monitored and resolved when it does occur. On larger-RAM systems, I think we would be better served for most use cases by allowing more RAM to be used.

For example, 80% of 60GB is 48GB...leaving 12GB unused. Even worse for 256GB (leaving 50+GB unused)
Comment by Aleksey Kondratenko [ 14/Aug/13 ]
And on a 256 GB machine fragmentation can be as big as 128 GB (!). IMHO this is not about absolute numbers but about percentages. Anyways, Dipti will tell us what to do, but your numbers above are just saying how bad our _expected_ fragmentation is.
Comment by Perry Krug [ 14/Aug/13 ]
But that's where I disagree...I think it _is_ about absolute numbers. If we leave fragmentation out of it (since it's something we will fix eventually, something that is specific to certain workloads and something that can be worked around via rebalancing), the point of this overhead was specifically to leave space available for the operating system and any other processes running outside of Couchbase. I'm sure you'd agree that Linux doesn't need anywhere near 50GB of RAM to run properly :) Even if we could decrease that by half it would provide huge savings in terms of hardware and costs to our users.

Is fragmentation the only concern of yours? If we were able to analyze a running production cluster to quantify the RAM fragmentation that exists, and determine that it is within certain bounds...would it be okay to raise the quota above 80%?
Comment by Aleksey Kondratenko [ 14/Aug/13 ]
My point was that fragmentation is also a percentage, not absolute. So with larger RAM, the waste from fragmentation looks scarier.

Now that you're asking if that's my only concern I see that there's more.

Without sufficient space for the page cache, disk performance will suffer. How much we need to be at least on par with SQLite I cannot say. Nobody can, apparently. Things depend on whether you're going to do bgfetches or not.

Because if you do care about quick bgfetches (or, say, views and XDCR), then you may want to set the lowest possible quota and give as much RAM as possible to the page cache, hoping that at least all metadata is in the page cache.

If you do not care about residency of metadata, that means you don't care about btree leaves being page-cache-resident. But in order to remain I/O-efficient, you do need to keep the non-leaf nodes in the page cache. The issue is that with our append-only design, nobody knows how well it works in practice and exactly how much page cache you need to give to keep the few, perhaps hundreds of, megs of metadata-of-metadata page-cache resident. And quite possibly the "correct" recommendation is something like "you need XX percent of your data size for page cache to keep the disk subsystem efficient".
Comment by Perry Krug [ 14/Aug/13 ]
Okay, that does make a very good point.

But it also highlights the need for a flexible configuration on our end depending on the use case and customer's needs. i.e., certain customers want to enforce that they are 100% resident and to me that would mean giving Couchbase more than the default quota (while still keeping the potential for fragmentation in mind).
Comment by Patrick Varley [ 11/Feb/14 ]
MB-10180 is strongly related to this issue.
Comment by Maria McDuff (Inactive) [ 19/May/14 ]
Anil, pls see my comment on MB-10180.




[MB-8054] Couchstore's mergesort module, currently used for db compaction, can buffer too much data in memory Created: 10/Apr/13  Updated: 19/May/14

Status: Open
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 2.0.1, 2.1.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Filipe Manana Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
The size of the buffer used by the mergesort module is bounded exclusively by the number of elements. This is dangerous, because elements can have variable sizes, and a small number of elements does not necessarily mean that the buffer size (in bytes) is small.

Namely, the treewriter module, used by the database compactor to sort the temporary file containing records for the id btree, was specifying a buffer element count of 100 * 1024 * 1024. If, for example, there are 100 * 1024 * 1024 id records and each has an average size of 512 bytes, the mergesort module buffers 50 GB of data in memory!

Although the id btree records are currently very small (under a hundred bytes or so), use of other types of records may easily cause too much memory consumption - this will be the case for view records. Issue MB-8029 adds a module that uses the mergesort module to sort files containing view records.
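A minimal sketch of the missing bound, with hypothetical buffer accounting: flush whenever either the element count or the accumulated byte size would exceed its limit, instead of counting elements alone.

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical buffer accounting; the names are illustrative. */
    struct sort_buf {
        size_t count, max_count;  /* element bound (the only one today)   */
        size_t bytes, max_bytes;  /* byte bound (the one that is missing) */
    };

    /* Flush to a temp file before appending a rec_size-byte record if
       either bound would be exceeded. */
    static int must_flush(const struct sort_buf *b, size_t rec_size) {
        return b->count + 1 > b->max_count ||
               b->bytes + rec_size > b->max_bytes;
    }

    int main(void) {
        /* A 100M-element bound alone admits ~50 GB of 512-byte records;
           a 64 MB byte bound forces a flush long before that. */
        struct sort_buf b = { 0, 100u << 20, 0, 64u << 20 };
        b.count = 1u << 17;                 /* 131072 records buffered... */
        b.bytes = (size_t)b.count * 512;    /* ...at 512 bytes each       */
        printf("must flush: %d\n", must_flush(&b, 512)); /* prints 1 */
        return 0;
    }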


 Comments   
Comment by Filipe Manana [ 10/Apr/13 ]
http://review.couchbase.org/#/c/25588
Comment by Filipe Manana [ 11/Apr/13 ]
It turns out this is not a simple change.

Simply adding a buffer byte-size limit breaks the merge algorithm for some cases, particularly when the file to sort is larger than the specified buffer sizes. The mergesort.c merge phase relies on the fact that each sorted batch written to the tmp files always has the same number of elements - a thing that doesn't hold true when records have a variable size, such as with views (MB-8029).

For now it's not too bad, because for the current use of mergesort.c by the views, the files to sort are small (up to 30 MB max). Later this will have to change, as the files to sort can have any size, from a few KB to hundreds of MBs or GBs. I'll look for an alternative external mergesort implementation that allows controlling the max buffer size, merging only a group of already-sorted files (like Erlang's file_sorter allows), and that is ideally more optimized as well (N-way merge instead of a fixed 2-way merge, etc.).
Comment by Filipe Manana [ 16/May/13 ]
There's a new and improved on-disk file sorter (more flexibility, better error handling, some performance optimizations) now in the master branch.
It's being used for views already.

Introduced in:

https://github.com/couchbase/couchstore/commit/fdb0da52a1e3c059fef3fa7e74ec54b03e62d5db

Advantages:

1) Allows in-memory buffer sizes to be bounded by the number of bytes, unlike mergesort.c, which bounds buffers by the number of records regardless of their sizes.

2) Allows N-way merges, giving better performance due to a significant reduction in the moving of records between temporary files (see the sketch after this list).

3) Some optimizations to avoid unnecessary moving of records between temporary files (especially when the total number of records is smaller than the buffer size).

4) Allows specifying which directory is used to store temporary files. mergesort.c uses the C stdlib function tmpfile() to create temporary files - the standard doesn't specify in which directory such files are created, but on GNU/Linux it seems to be /tmp (see http://linux.die.net/man/3/tmpfile). For database compaction and index compaction, it's important to use a directory within the configured database and index directories (settings database_dir and view_index_dir), because those directories are what the administrator configured and may be part of a disk drive that offers better performance or simply has more available space. Further, on some systems /tmp might map to a tmpfs mount, which is an in-memory filesystem (http://en.wikipedia.org/wiki/Tmpfs).

5) Better and more fine-grained error handling. Compare with MB-8055 - the mergesort.c module completely ignored read errors when reading from the temporary files, which could lead to silent data loss.
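As an illustration of (2), a minimal sketch of an N-way merge using sorted integer arrays as stand-ins for the sorted temporary files: a small min-heap of run indices repeatedly yields the smallest head record, so a single pass replaces a cascade of 2-way merges.

    #include <stddef.h>
    #include <stdio.h>

    /* Stand-in for one sorted temporary file: a sorted array + cursor. */
    struct run { const int *rec; size_t len, pos; };

    static int head(const struct run *r) { return r->rec[r->pos]; }

    /* Restore the min-heap property for the subtree rooted at i; the heap
       holds indices into runs[], ordered by each run's current head. */
    static void sift(struct run *runs, size_t *heap, size_t n, size_t i) {
        for (;;) {
            size_t l = 2 * i + 1, r = l + 1, m = i;
            if (l < n && head(&runs[heap[l]]) < head(&runs[heap[m]])) m = l;
            if (r < n && head(&runs[heap[r]]) < head(&runs[heap[m]])) m = r;
            if (m == i) break;
            size_t t = heap[i]; heap[i] = heap[m]; heap[m] = t;
            i = m;
        }
    }

    int main(void) {
        const int a[] = {1, 4, 9}, b[] = {2, 3, 8}, c[] = {5, 6, 7};
        struct run runs[] = { {a, 3, 0}, {b, 3, 0}, {c, 3, 0} };
        size_t heap[3] = {0, 1, 2}, n = 3;

        for (size_t i = n / 2; i-- > 0; )   /* heapify */
            sift(runs, heap, n, i);

        while (n > 0) {                     /* one N-way merge pass */
            struct run *r = &runs[heap[0]];
            printf("%d ", head(r));
            if (++r->pos == r->len)         /* run exhausted: drop it */
                heap[0] = heap[--n];
            sift(runs, heap, n, 0);
        }
        printf("\n");                       /* prints: 1 2 3 4 5 6 7 8 9 */
        return 0;
    }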
Comment by Filipe Manana [ 16/May/13 ]
See above.
Since this is core database code, I believe it belongs to you.
Comment by Maria McDuff (Inactive) [ 10/Feb/14 ]
Aaron,

is this going to be in 3.0?
Comment by Aaron Miller (Inactive) [ 18/Feb/14 ]
I wouldn't count on it. This sort of thing affects views a lot more than the storage files, and the view code has already been modified to use the newer disk sort.

This is unlikely to cause any problems with storage file compaction, as the sizes of the records in storage files can't grow arbitrarily.

Using the newer sort will probably perform better, but *not* using it shouldn't cause any problems, making this issue more of a performance enhancement than a bug, and as such will probably lose to other issues I'm working on for 3.0 and 2.5.X




[MB-8022] Fsync optimizations (remove double fsyncs) Created: 05/Feb/13  Updated: 01/Apr/14

Status: Open
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 2.0, 2.0.1, 2.1.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Dipti Borkar Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: PM-PRIORITIZED
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Aaron Miller (Inactive) [ 28/Mar/13 ]
There is a toy build that Ronnie is testing to see the potential performance impacts of this (toy-aaron #1022).
Comment by Maria McDuff (Inactive) [ 10/Apr/13 ]
Jin will update the use-case scenario that QE will run.
Comment by Jin Lim [ 11/Apr/13 ]
This feature is to optimize disk writes from ep-engine/couchstore.

Any existing test that measures disk drain rate should determine any tangible improvement from the feature.
Baseline:
* Heavy dgm
* Write heavy (read:20% write:80%)
* Write I/O should be mix of set/delete/update
* Measure disk drain rate and cbstats's kvtimings (writeTime, commit, save_documents)
Comment by Aaron Miller (Inactive) [ 11/Apr/13 ]
The most complicated part of this change is the addition of a corruption check that must be run the first time a file is opened after the server comes up, since we're buying these perf gains by playing a bit more fast and loose with the disk.

To check that this is behaving correctly we'll want to make sure that corrupting the most-recent transaction in a storage file rolls that transaction back.

This could be accomplished by updating an item that will land in a known vbucket, shutting down the server, and flipping some bits around the end of the file. The update should be rolled back when the server comes back up, and nothing should freak out :)

A position guaranteed to affect an item body from the most recent transaction is 4095 bytes behind the last position in the file that is a multiple of 4096, or: floor(file_length / 4096) * 4096 - 4095
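A minimal sketch of that bit-flip, following the formula above (offline use only; the file handling and error checks are illustrative):

    #include <stdio.h>

    /* With the server shut down, flip one bit at
       floor(file_length / 4096) * 4096 - 4095, a position inside the
       body of the most recent transaction of a storage file. */
    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <vbucket-file>\n", argv[0]);
            return 1;
        }
        FILE *f = fopen(argv[1], "r+b");
        if (!f) { perror("fopen"); return 1; }
        fseek(f, 0, SEEK_END);
        long len = ftell(f);
        long off = (len / 4096) * 4096 - 4095;
        if (off < 0) { fprintf(stderr, "file too short\n"); fclose(f); return 1; }
        fseek(f, off, SEEK_SET);
        int c = fgetc(f);
        fseek(f, off, SEEK_SET);   /* reposition between read and write */
        fputc(c ^ 0x01, f);        /* flip the low bit */
        fclose(f);
        printf("flipped one bit at offset %ld of %ld bytes\n", off, len);
        return 0;
    }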
Comment by Maria McDuff (Inactive) [ 16/Apr/13 ]
Abhinav,
Will you be able to craft a test that involves updating an item and manipulating the bits at EOF? This seems tricky. Let's discuss with Jin/Aaron.
Comment by Dipti Borkar [ 19/Apr/13 ]
I don't think this is user visible and so doesn't make sense to include in the release notes.
Comment by Maria McDuff (Inactive) [ 19/Apr/13 ]
aaron, pls assign back to QE (Abhinav) once you've merged the fix.
Comment by kzeller [ 22/Apr/13 ]
Updated 4/22 - No docs needed
Comment by Maria McDuff (Inactive) [ 22/Apr/13 ]
Aaron, can you also include the code changes for review here as soon as you have checked-in the fix?
thanks.
Comment by Maria McDuff (Inactive) [ 23/Apr/13 ]
deferred.
Comment by Cihan Biyikoglu [ 20/Mar/14 ]
Hi Aaron, are you working on this for 3.0? If yes, could you push this to fixVersion=3.0?
Comment by Cihan Biyikoglu [ 01/Apr/14 ]
Chiyoung, pls close if this isn't relevant anymore, given this is a year old.




[MB-7177] lack of fsyncs in view engine may lead to silent index corruption Created: 13/Nov/12  Updated: 11/Mar/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Rahim Yaseen (Inactive)
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
SUBJ. Found out about this in discussion with Filipe about how views work.

If I understood correctly, it doesn't fsync at all, silently assuming that if there's a valid header then the preceding data is valid as well. Which is clearly not true.

IMHO that's a massive blocker that needs to be fixed sooner rather than later.
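For reference, a minimal sketch of the ordering being asked for, under an assumed append-only layout; the commit() helper and its signature are hypothetical, not the view engine's actual API.

    #include <fcntl.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Data first, then a durability barrier, then the header. Without the
       first fsync, a valid header on disk can point at preceding data the
       kernel never actually flushed. */
    static int commit(int fd, const void *data, size_t data_len,
                      const void *header, size_t hdr_len) {
        if (write(fd, data, data_len) != (ssize_t)data_len) return -1;
        if (fsync(fd) != 0) return -1;  /* data durable before the header */
        if (write(fd, header, hdr_len) != (ssize_t)hdr_len) return -1;
        return fsync(fd);               /* header durable: commit point */
    }

    int main(void) {
        int fd = open("demo.index", O_CREAT | O_WRONLY | O_APPEND, 0644);
        if (fd < 0) { perror("open"); return 1; }
        const char data[] = "btree node", hdr[] = "HDR";
        int rc = commit(fd, data, sizeof data, hdr, sizeof hdr);
        close(fd);
        printf("commit %s\n", rc == 0 ? "ok" : "failed");
        return rc != 0;
    }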

 Comments   
Comment by Steve Yen [ 14/Nov/12 ]
bug-scrub -- assigned to yaseen
Comment by Aleksey Kondratenko [ 14/Nov/12 ]
A comment was made that this cannot be silent index corruption, due to the CRC-ing of all btree nodes. But my point still holds: if there's data corruption we'll know at query time, and people will have to experience downtime to manually rebuild the index.
Comment by Steve Yen [ 15/Nov/12 ]
per bug scrub
Comment by Farshid Ghods (Inactive) [ 26/Nov/12 ]
Deep and Iryna have tried a scenario where they rebooted the system and did not hit this issue.
Comment by Steve Yen [ 26/Nov/12 ]
to .next per bug-scrub.

QE reports that deep & iryna tried to reproduce this and couldn't yet.
Comment by Aleksey Kondratenko [ 26/Nov/12 ]
It appears that the move to .next was based on the same old "we cannot reproduce" logic. It appears that we continue to under-prioritize IMHO important bugs merely because it's hard to reproduce them.

Because with that logic, I'm sure we'll forever move it to the next release. If we think we don't need to fix it, IMHO it would be better to just close it.
Comment by Filipe Manana [ 04/Jan/13 ]
Due to the CRC checks for every object written to a file (btree nodes), it certainly won't be silent.
Comment by Aleksey Kondratenko [ 04/Jan/13 ]
I agree. My earlier comment above (based on yours or Damien's verbal comment) has the same information.

But not being silent doesn't mean we can simply close it (or, IMHO, downgrade or forget it). Do we know what exactly will happen if querying or updating a view suddenly detects a corrupted index file?
Comment by Andrew DePue [ 21/May/13 ]
We just ran into this, or something like it. We have a development cluster and lost power to the entire cluster at once (it was a dev cluster so we didn't have backup power). The Couchbase cluster _seemed_ to start OK, but accessing certain views would result in strange behavior... mostly timeouts without any error or any indication as to what the problem could be.
Comment by Filipe Manana [ 21/May/13 ]
If there's a corruption issue with a file (either view or database), view queries will return an explicit file_corruption error if the index file is corrupted. If the corruption is in a database file, the error is only returned in a query response if the query is of type stale=false. For all cases, the error (and a stack trace) are logged.

Did you see such an error in your case? Example:
http://www.couchbase.com/forums/thread/filecorruption-error-executing-view




[MB-6746] separate disk path for replica index (and individual design doc) disk path Created: 26/Sep/12  Updated: 20/Jun/13

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0-beta
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Dipti Borkar Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
On the UI and REST API, add the ability to create a separate disk path for replica indexes (right under the replica index check box in the setup wizard).

This will allow users to have a better disk solution if the replica index is used.

In addition, add new REST APIs to enable a separate disk path for each design document (not in the UI, only in REST).

 Comments   
Comment by Aleksey Kondratenko [ 28/Sep/12 ]
Dipti, this checkbox in setup wizard is for default bucket. Not cluster-wide setting.

Also, are you really sure we need this? I mean, RAID 0 for views looks even better from a performance perspective.
Comment by Aleksey Kondratenko [ 04/Oct/12 ]
We discussed already that I can't do that without more instructions.
Comment by Peter Wansch (Inactive) [ 08/Oct/12 ]
Change too invasive for 2.0
Comment by Steve Yen [ 25/Oct/12 ]
alk would be a better assignee for this than peter
Comment by Aleksey Kondratenko [ 20/Jun/13 ]
Given this is per-bucket/per-node, we don't have a place for this in the current UI design.

And I'm not sure we really need this. I seriously doubt that, honestly speaking.




[MB-6527] Tools to Index and compact database/indexes when the server is offline Created: 05/Sep/12  Updated: 01/Apr/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Karan Kumar (Inactive) Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: system-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This is from the supportability point of view.
If for whatever reason customers bring their nodes down, e.g. for maintenance, etc.

When they bring the node back up, hopefully all the compaction/indexing has finished for that particular node.

We need a way to index and compact data (database and index) if possible when the nodes are offline.

 Comments   
Comment by Cihan Biyikoglu [ 20/Mar/14 ]
Anil, could you pull this into 3.0 if this is happening in 3.0 timeline?




[MB-6450] Finalize doc editing API and implementation Created: 27/Aug/12  Updated: 19/May/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0-beta
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: ns_server-story
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
We need to:

* avoid loading and displaying blobs on UI

* avoid seeing deleted docs

* handle warmup, node being down in REST API implementation and UI


 Comments   
Comment by Tug Grall (Inactive) [ 09/Sep/13 ]
As a follow up to our discussion...

We have some issue with the console where for example the user stores: (using the PHP SDK but more or less the same with other)

$cb->set("T2","2.0");

The data on disk looks like (couchstore dump)
     id: T2
     rev: 6
     content_meta: 128
     cas: 76870971085289, expiry: 0, flags: 0
     data: (snappy) 2.0

When I read it, the value is:

echo( $cb->get("T2") );
2.0

But in the Couchbase console the value is shown as:
2
Comment by Tug Grall (Inactive) [ 09/Sep/13 ]
Another issue that I have found that is related to this global work is:

If you store:
$cb->set("T3","2,0");

On disk it is:
     id: T3
     rev: 8
     content_meta: 131
     cas: 76870971228863, expiry: 0, flags: 0
     data: (snappy) 2,0

In the console, document view:
"SyntaxError: JSON.parse: unexpected non-whitespace character after JSON data (Document is invalid JSON)"
It does not show any value (where it should show a "base64" string).

In the console, preview document in the view editor:
"Lost connection to server at 127.0.0.1:8091. Repeating in 1 seconds. Retry now" and nothing is shown.

The XHR (for example http://127.0.0.1:8091/couchBase/default/T3?_=1378774035807) returns 2,0 where it "should" be base64, not a JSON "thing".
Comment by Aleksey Kondratenko [ 10/Sep/13 ]
Tug, the issue we discussed above indeed applies to the UI. _But_ there's also a full-stack issue. And IMHO a larger one.

I've cc-ed some bosses so that they're aware too. So let me elaborate and raise that.

Our product is positioned as a "document database". Yet what you've shown me the PHP SDK doing is:

* when the number 2.0 is passed to set(), it gets silently converted to the array of bytes "2", which is stored on the server. And we treat that as valid JSON (well, at least in views; we've long had plans to have a flag that would allow memcached to refuse non-JSON values). So a map function will see that as 2.0 (JS numbers are always floating point). So far so good.

* when the string "2.0" is sent, it still gets sent as the array of bytes "2.0", which is stored on the server. This time the map function will see a _number_ again. Which is arguably not good. The application is seeing a string, yet the view is seeing a number.

* when the string "2gargage" is sent, it gets to the server as "2gargage". And this is not valid JSON (note, the quotes are _mine_ and are not part of the value). So views will see that (due to some arguably questionable decision) as a base64 encoding of that octet string.

My point is: if we seriously want to be a JSON store, we should consider doing something on the clients as well. So that for example "asdsad" is sent as "\"asdasd\"" (quoted to be a JSON string). So that views see roughly the same value as your app.

This is, in my opinion, a larger issue than the UI problem, which I'm going to fix soon. And let me note again that there is _no plan at all_ to support displaying, let alone editing, arbitrary values (blobs) on the UI. We'll limit ourselves to JSON values, and quite possibly not even all types of JSON values.
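For illustration, a minimal sketch of the client-side quoting being suggested, with a hypothetical store path; escaping of embedded quotes and backslashes is omitted for brevity.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Wrap a raw C string in JSON quotes before it is sent, so the view
       engine sees the same type the application stored. */
    static char *to_json_string(const char *raw) {
        size_t n = strlen(raw);
        char *out = malloc(n + 3);
        if (!out) return NULL;
        out[0] = '"';
        memcpy(out + 1, raw, n);
        out[n + 1] = '"';
        out[n + 2] = '\0';
        return out;
    }

    int main(void) {
        char *v = to_json_string("2.0");
        if (!v) return 1;
        /* A hypothetical client_set(key, v) would now send "2.0" with
           quotes, so a map function sees a JSON string, not a number. */
        printf("stored value: %s\n", v);
        free(v);
        return 0;
    }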




[MB-4840] hotfix release should reflect the change(version#) on the web console Created: 27/Feb/12  Updated: 11/Mar/14

Status: In Progress
Project: Couchbase Server
Component/s: build
Affects Version/s: None
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Farshid Ghods (Inactive) Assignee: Thuan Nguyen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Comments   
Comment by Phil Labee [ 04/Mar/13 ]
When a hotfix is generated, capture the manifest.xml file from the build. Diff it with the GA manifest and update the installed manifest by appending the new commit info.
Comment by Steve Yen [ 18/Mar/13 ]
Related to this, need to update the hotfix creation procedures / docs.
Comment by Steve Yen [ 18/Mar/13 ]
Also, easier to wait for the next hotfix before updating this.
Comment by Phil Labee [ 17/Apr/13 ]
I need the diagnostic info that customers send to support, the one that includes the manifest.xml file.

If the customer ran the update.sh script the manifest.xml should have been updated in a way that is reflected in the diagnostic info.

I'd also like general feedback on the hot-fix installation process, as I am trying to make it easier and more reliable.
Comment by Maria McDuff (Inactive) [ 03/Jul/13 ]
Tony,

Can you quickly verify this?
You have to run a hotfix release -- get it from Phil.




[MB-4785] Meaningful alert when low-level packet corruption on node Created: 08/Feb/12  Updated: 21/Nov/13

Status: Reopened
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 1.7.2
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Tim Smith (Inactive) Assignee: Anil Kumar
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: See http://www.couchbase.com/issues/browse/CBSE-101 for complete details


 Description   
Logs showed that some low-level corruption in network data was apparent. The symptom is that nodes are going up and down. It is not clear in the UI that this is happening only on 2 nodes. It is not clear in the UI that it's low-level corruption. It is not clear that these nodes are consistently having a problem and need to be failed over. No info bubbles up about why a node flaps up and down, or how to report this up to the data center or Amazon (in this case on EC2).

Need a clear alert to user, suggesting to fail over a troublesome node. Ideal to have concrete examples of the corrupt data to pass on to data center ops.

 Comments   
Comment by Aleksey Kondratenko [ 08/Feb/12 ]
Are you sure this is really critical?
Comment by Tim Smith (Inactive) [ 08/Feb/12 ]
My priority calibration may be off here. It is OK for product management or whoever to re-triage this request based on a larger picture of priorities.

Tim
Comment by Farshid Ghods (Inactive) [ 08/Feb/12 ]
I would actually rephrase this bug to say that the node status should become red when ns_server detects corruption during send/receive, and change the issue type from enhancement to bug.

And the fact that this happened in an EC2 environment makes it more important.
Comment by Peter Wansch (Inactive) [ 19/Jul/12 ]
Maybe this has been resolved because of recent infinity fixes in Erlang
Comment by Aleksey Kondratenko [ 07/Sep/12 ]
Not fixed.

We indeed think we've fixed the cause of this.

But if this happens again, the only thing we'll see is the node being red for a moment in the UI.

Unfortunately, Erlang doesn't provide us a way to monitor and react to this particular condition. It'll just be a disconnect, and you cannot know why.

So the only way to fix this seems to be extending Erlang's VM.
Comment by Aleksey Kondratenko [ 20/Sep/12 ]
We haven't fixed it.

We think we _have_ fixed CBSE-whatever by working around some unknown subtle bug in infinity trapping via signals that's specific to Linux on EC2 (or any Linux, or any Xen, we don't know).

This particular request is to make this condition, when low-level Erlang code detects packet corruption and disconnects a pair of nodes, _visible to the end user_, particularly via an alert. It makes sense to me.

Regarding what Farshid said: we _do_ mark the node as red, but the next second we re-establish the connection and things work again, until this happens the next time.
Comment by Aleksey Kondratenko [ 20/Sep/12 ]
Also, I think we can fix it, but in a not-necessarily-future-proof or pleasant way. We can grep the log message that Erlang logs via the error logging facility that our logger implementation intercepts. That seems like the only path (excluding Erlang VM modification) that can produce alerts from this kind of event.
Comment by Aleksey Kondratenko [ 12/Aug/13 ]
Depends on non-poor-man's alerts
Comment by Brian Shumate [ 21/Nov/13 ]
It would be helpful if network errors or network partition conditions could be logged and represented almost in the same manner as the uptime command's representation of load average, i.e. number of network issues in the last 5/15/30 minutes or similar somewhere in the web console UI.




[MB-8040] 2.0 needs to support using couchbase for all the REST endpoints (no membase) Created: 21/May/12  Updated: 11/Mar/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0, 2.0.1, 2.1.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Dipti Borkar Assignee: