[MB-11846] Compiling breakdancer test case exceeds available memory Created: 29/Jul/14  Updated: 25/Aug/14  Due: 30/Jul/14

Status: Reopened
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0
Fix Version/s: 3.0.1, 3.0
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Chris Hillery Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Triage: Untriaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Unknown

 Description   
1. With memcached change 4bb252a2a7d9a369c80f8db71b3b5dc1c9f47eb9, cc1 on ubuntu-1204 quickly uses up 100% of the available memory (4GB RAM, 512MB swap) and crashes with an internal error.

2. Without Trond's change, cc1 compiles fine and never takes up more than 12% memory, running on the same hardware.

 Comments   
Comment by Chris Hillery [ 29/Jul/14 ]
Ok, weird fact - on further investigation, it appears that this is NOT happening on the production build server, which is an identically-configured VM. It only appears to be happening on the commit validation server ci03. I'm going to temporarily disable that machine so the next make-simple-github-tap test runs on a different ci server and see if it is unique to ci03. If it is I will lower the priority of the bug. I'd still appreciate some help in understanding what's going on either way.
Comment by Trond Norbye [ 30/Jul/14 ]
Please verify that the two builders have the same patch level so that we're comparing apples with apples.

It does bring up another interesting topic: should our builders just use the compiler provided with the installation, or should we have a reference compiler that we use to build our code? It does seem like a bad idea to have to support a ton of different compiler revisions (including the fact that they support different levels of C++11 that we have to work around).
Comment by Chris Hillery [ 31/Jul/14 ]
This is now occurring on other CI build servers in other tests - http://www.couchbase.com/issues/browse/CBD-1423

I am bumping this back to Test Blocker and I will revert the change as a work-around for now.
Comment by Chris Hillery [ 31/Jul/14 ]
Partial revert committed to memcached master: http://review.couchbase.org/#/c/40152/ and 3.0: http://review.couchbase.org/#/c/40153/
Comment by Trond Norbye [ 01/Aug/14 ]
That review in memcached should NEVER have been pushed through. Its subject line is too long
Comment by Chris Hillery [ 01/Aug/14 ]
If there's a documented standard out there for commit messages, my apologies; it was never revealed to me.
Comment by Trond Norbye [ 01/Aug/14 ]
When it doesn't fit within a terminal window there is a problem. It is way better to use multiple lines.

In addition, I'm not happy with the fix. Instead of deleting the line, it should have checked for an environment variable so that people could explicitly disable it. This is why we have review cycles.
Comment by Chris Hillery [ 01/Aug/14 ]
I don't think I want to get into style arguments. If there's a standard I'll use it. In the meantime I'll try to keep things to 72-character lines.

As to the content of the change, it was not intended to be a "fix"; it was a simple revert of a change that was provably breaking other jobs. I returned the code to its previous state, nothing more or less. And especially given the time crunch of the beta (which is supposed to be built tomorrow), waiting for a code review on a reversion is not in the cards.
Comment by Trond Norbye [ 01/Aug/14 ]
The normal way of doing a revert is to use git revert (which as an extra bonus makes the commit message contain that).
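For reference, a minimal sketch of the git revert workflow being described (the SHA is a placeholder, not the actual change):

    git revert <sha-of-offending-commit>
    # creates a new commit whose message records which commit was reverted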
Comment by Trond Norbye [ 01/Aug/14 ]
http://review.couchbase.org/#/c/40165/
Comment by Chris Hillery [ 01/Aug/14 ]
1. Your fix is not correct, because simply adding -D to cmake won't cause any preprocessor defines to be created. You need to have some CONFIGURE_FILE() or similar to create a config.h using #cmakedefine. As it is there is no way to compile with your change.

2. The default behaviour should not be the one that is known to cause problems. Until and unless there is an actual fix for the problem (whether or not that is in the code), the default should be to keep the optimization, with an option to let individuals bypass that if they desire and accept the risks.

3. Characterizing the problem as "misconfigured VMs" is, at best, premature.

I will revert this change again on the 3.0 branch shortly, unless you have a better suggestion (I'm definitely all ears for a better suggestion!).
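For illustration, a minimal sketch of the CONFIGURE_FILE()/#cmakedefine pattern mentioned in point 1 above (the file and option names here are hypothetical, not the actual memcached sources):

    # CMakeLists.txt
    option(COUCHBASE_DISABLE_OPTIMIZATION "Disable the optimization" OFF)
    configure_file(${CMAKE_CURRENT_SOURCE_DIR}/config.cmake.h
                   ${CMAKE_CURRENT_BINARY_DIR}/config.h)

    # config.cmake.h
    #cmakedefine COUCHBASE_DISABLE_OPTIMIZATION 1

With something like that in place, passing -DCOUCHBASE_DISABLE_OPTIMIZATION=ON to cmake ends up as a real #define in the generated config.h, rather than relying on the flag reaching the compiler command line.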
Comment by Trond Norbye [ 01/Aug/14 ]
If you look at the comment, it passes the -D over into CMAKE_C_FLAGS, causing it to be set in the compiler flags, so it gets passed on to the compilation cycle.

As for the misconfiguration, it is either insufficient resources on the VM or a "broken" compiler version installed there.
Comment by Trond Norbye [ 01/Aug/14 ]
Can I get login credentials to the server where it fails and an identical VM where it succeeds?
Comment by Chris Hillery [ 01/Aug/14 ]
[CMAKE_C_FLAGS] Fair enough, I did misread that. That's not really a sufficient workaround, though. Doing that may overwrite other CFLAGS set by other parts of the build process.

I still maintain that the default behaviour should be the known-working version. However, for the moment I have temporarily locked the rel-3.0.0.xml manifest to the revision before my revert (ie, to 5cc2f8d928f0eef8bddbcb2fcb796bc5e9768bb8), so I won't revert anything else until that has been tested.

The only VM I know of at the moment where we haven't seen build failures is the production build slave. I can't give you access to that tonight as we're in crunch mode to produce a beta build. Let's plan to hook up next week and do some exploration.
Comment by Volker Mische [ 01/Aug/14 ]
There are commit message guidelines. At the bottom of

http://www.couchbase.com/wiki/display/couchbase/Contributing+Changes

there is a link to:

http://en.wikibooks.org/wiki/Git/Introduction#Good_commit_messages
Comment by Trond Norbye [ 01/Aug/14 ]
I've not done anything on the 3.0.0 branch; the fix going forward is for 3.0.1 and trunk. Hopefully the 3.0 branch will die relatively soon since we've got a lot of good stuff in the 3.0.1 branch.

The "workaround" is not intended as a permanent solution; it's just until the VMs are fixed. I've not been able to reproduce this issue on my centos, ubuntu, fedora or smartos builders. They're running in the following VMs:

[root@00-26-b9-85-bd-92 ~]# vmadm list
UUID TYPE RAM STATE ALIAS
04bf8284-9c23-4870-9510-0224e7478f08 KVM 2048 running centos-6
7bcd48a8-dcc2-43a6-a1d8-99fbf89679d9 KVM 2048 running ubuntu
c99931d7-eaa3-47b4-b7f0-cb5c4b3f5400 KVM 2048 running fedora
921a3571-e1f6-49f3-accb-354b4fa125ea OS 4096 running compilesrv
Comment by Trond Norbye [ 01/Aug/14 ]
I need access to two identical configured builders where one may reproduce the error and one where it succeeds.
Comment by Volker Mische [ 01/Aug/14 ]
I would also add that I think it is about bad VMs. On commit validation we have 6 VMs; it only ever failed with this error on ubuntu-1204-64-ci-01 and never on the others (ubuntu-1204-64-ci-02 - 06).
Comment by Chris Hillery [ 01/Aug/14 ]
That's not correct. The problem originally occurred on ci-03.
Comment by Volker Mische [ 01/Aug/14 ]
Then I need to correct myself: my comment only holds true for the couchdb-gerrit-300 job.
Comment by Trond Norbye [ 01/Aug/14 ]
Can I get login creds to one that it fails on, while I'm waiting for access to one that it works on?
Comment by Volker Mische [ 01/Aug/14 ]
I don't know about creds (I think my normal user login works). The machine details are here: http://factory.couchbase.com/computer/ubuntu-1204-64-ci-01/
Comment by Chris Hillery [ 01/Aug/14 ]
Volker - it was initially detected in the make-simple-github-tap job, so it's not unique to couchdb-gerrit-300 either. Both jobs pretty much just checkout the code and build it, though; they're pretty similar.
Comment by Trond Norbye [ 01/Aug/14 ]
Adding swap space to the builder makes the compilation pass. I've been trying to figure out how to get gcc to print more information about each step (the -ftime-report memory usage didn't match the process usage at all ;-))
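For reference, a minimal sketch of adding swap on an Ubuntu builder (the size and file path are placeholders; run as root):

    fallocate -l 2G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile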
Comment by Anil Kumar [ 12/Aug/14 ]
Adding the component as "build". Let me know if that's not correct.




[MB-12138] {Windows - DCP}:: View Query fails with error 500 reason: error {"error":"error","reason":"{index_builder_exit,89,<<>>}"} Created: 05/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Parag Agarwal Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: windows, windows-3.0-beta
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.1-1267, Windows 2012 x64, machine: 172.23.105.112

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12138/172.23.105.112-952014-1511-diag.zip
Is this a Regression?: Yes

 Description   


1. Create 1 Node cluster
2. Create default bucket and add 100k items
3. Create views and query it

Seeing the following exceptions

http://172.23.105.112:8092/default/_design/ddoc1/_view/default_view0?connectionTimeout=60000&full_set=true&limit=100000&stale=false error 500 reason: error {"error":"error","reason":"{index_builder_exit,89,<<>>}"}

We cannot run any view tests as a result


 Comments   
Comment by Anil Kumar [ 16/Sep/14 ]
Nimish/Siri - Any update on this.
Comment by Meenakshi Goel [ 17/Sep/14 ]
Seeing similar issue in Views DGM test http://qa.hq.northscale.net/job/win_2008_x64--69_06_view_dgm_tests-P1/1/console
Test : view.createdeleteview.CreateDeleteViewTests.test_view_ops,ddoc_ops=update,test_with_view=True,num_ddocs=4,num_views_per_ddoc=10,items=200000,active_resident_threshold=10,dgm_run=True,eviction_policy=fullEviction
Comment by Nimish Gupta [ 17/Sep/14 ]
We have found the root cause and are working on the fix.




[MB-10156] "XDCR - Cluster Compare" support tool Created: 07/Feb/14  Updated: 19/Jun/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 2.5.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Cihan Biyikoglu Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: 2.5.1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
For the recent issues we have seen, we need a tool that can compare metadata (specifically revids) for a given replication definition in XDCR. To scale to large data sizes, being able to do this per vbucket or per doc range would be great, but we can do without these. For clarity, here is a high-level description.

Ideal case:
xdcr_compare cluster1_connectioninfo cluster1_bucketname cluster2connectioninfo cluster2_bucketname [vbucketid] [keyrange]
should return a line per docID for each key where the cluster1 metadata and the cluster2 metadata differ:
docID - cluster1_metadata cluster2_metadata

Simplification: the tool is expected to return false positives in a moving system, but we will tackle that by rerunning the tool multiple times.

 Comments   
Comment by Cihan Biyikoglu [ 19/Feb/14 ]
Aaron, do you have a timeline for this?
thanks
-cihan
Comment by Maria McDuff (Inactive) [ 19/Feb/14 ]
Cihan,

For test automation/verification, can you list out the stats/metadata that we should be testing specifically?
we want to create/implement the tests accordingly.


Also -- is this tool de-coupled from the server package? or is this part of rpm/deb/.exe/osx build package?

Thanks,
Maria
Comment by Aaron Miller (Inactive) [ 19/Feb/14 ]
This depends on the requirements; a tool that requires manually collecting all data from all nodes in both clusters onto one machine (like we've done recently) could be done pretty quickly, but I imagine that may be difficult or entirely unfeasible for some users.

Better would be to be able to operate remotely on clusters and only look at metadata. Unfortunately there is no *currently exposed* interface to only extract metadata from the system without also retrieving values. I may be able to work around this, but the workaround is unlikely to be simple.

Also for some users, even the amount of *metadata* may be prohibitively large to transfer all to one place, this also can be avoided, but again, adds difficulty.

Q: Can the tool be JVM-based?
Comment by Aaron Miller (Inactive) [ 19/Feb/14 ]
I think it would be more feasible for this to ship separately from the server package.
Comment by Maria McDuff (Inactive) [ 19/Feb/14 ]
Cihan, Aaron,

If it's de-coupled, what older versions of Couchbase would this tool support? As far back as 1.8.x? Please confirm, as this would expand our backward compatibility testing for this tool.
Comment by Aaron Miller (Inactive) [ 19/Feb/14 ]
Well, 1.8.x didn't have XDCR or the rev field; It can't be compatible with anything older than 2.0 since it operates mostly to check things added since 2.0.

I don't know how far back it needs to go but it *definitely* needs to be able to run against 2.2
Comment by Cihan Biyikoglu [ 19/Feb/14 ]
Agree with Aaron, let's keep this lightweight. Can we depend on Aaron for testing if this will initially be just a support tool? For 3.0, we may graduate the tool to the server-shipped category.
thanks
Comment by Sangharsh Agarwal [ 27/Feb/14 ]
Cihan, Is the Spec finalized for this tool in version 2.5.1?
Comment by Cihan Biyikoglu [ 27/Feb/14 ]
Sangharsh, for 2.5.1, we wanted to make this an "Aaron tested" tool. I believe Aaron already has the tool. Aaron?
Comment by Aaron Miller (Inactive) [ 27/Feb/14 ]
Working on it; wanted to get my actually-in-the-package 2.5.1 stuff into review first.

What I do already have is a diff tool for *files*, but it is highly inconvenient to use; this should be a tool that doesn't require collecting all data files into one place in order to use it, and instead can work against a running cluster.
Comment by Maria McDuff (Inactive) [ 05/Mar/14 ]
Aaron,

Is the tool merged yet into the build? can you update pls?
Comment by Cihan Biyikoglu [ 06/Mar/14 ]
2.5.1 shiproom note: Phil raised a build concern on getting this packaged with 2.5.1. The initial bar we set was not to ship this as part of the server - it was intended to be a downloadable support tool. Aaron/Cihan will re-eval and get back to shiproom.
Comment by Cihan Biyikoglu [ 15/Jun/14 ]
Aaron is no longer here. Assigning to Xiaomei for consideration.




[MB-10719] Missing autoCompactionSettings during create bucket through REST API Created: 01/Apr/14  Updated: 19/Jun/14

Status: Reopened
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: michayu Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File bucket-from-API-attempt1.txt     Text File bucket-from-API-attempt2.txt     Text File bucket-from-API-attempt3.txt     PNG File bucket-from-UI.png     Text File bucket-from-UI.txt    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Unless I'm not using the API correctly, there seem to be some holes in the Couchbase API – particularly with autoCompaction.

The autoCompaction parameter can be set via the UI (as long as the bucketType is couchbase).

See the following attachments:
1) bucket-from-UI.png
2) bucket-from-UI.txt

And compare with creating the bucket (with autoCompaction) through the REST API:
1) bucket-from-API-attempt1.txt
    - Reference: http://docs.couchbase.com/couchbase-manual-2.5/cb-rest-api/#creating-and-editing-buckets
2) bucket-from-API-attempt2.txt
    - Reference: http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-admin-rest-auto-compaction
3) bucket-from-API-attempt3.txt
    - Setting autoCompaction globally
    - Reference: http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-admin-rest-auto-compaction

In all cases, autoCompactionSettings is still false.


 Comments   
Comment by Anil Kumar [ 19/Jun/14 ]
Triage - June 19 2014 Alk, parag, Anil
Comment by Aleksey Kondratenko [ 19/Jun/14 ]
It works, just apparently not properly documented:

# curl -u Administrator:asdasd -d name=other -d bucketType=couchbase -d ramQuotaMB=100 -d authType=sasl -d replicaNumber=1 -d replicaIndex=0 -d parallelDBAndViewCompaction=true -d purgeInterval=1 -d 'viewFragmentationThreshold[percentage]'=30 -d autoCompactionDefined=1 http://lh:9000/pools/default/buckets

And a general hint: you can see what the browser is POSTing when it creates a bucket or does anything else, to figure out a working (but not necessarily publicly supported) way of doing things.
Comment by Anil Kumar [ 19/Jun/14 ]
Ruth - the above documentation references need to be fixed with the correct REST API.




[MB-9358] while running concurrent queries(3-5 queries) getting 'Bucket X not found.' error from time to time Created: 16/Oct/13  Updated: 18/Jun/14  Due: 23/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 64 bit

Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
one thread gives correct result:
[root@localhost tuqtng]# curl 'http://10.3.121.120:8093/query?q=SELECT+META%28%29.cas+as+cas+FROM+bucket2'
{
    "resultset": [
        {
            "cas": 4.956322522514292e+15
        },
        {
            "cas": 4.956322525999292e+15
        },
        {
            "cas": 4.956322554862292e+15
        },
        {
            "cas": 4.956322832498292e+15
        },
        {
            "cas": 4.956322835757292e+15
        },
        {
            "cas": 4.956322838836292e+15
...

    ],
    "info": [
        {
            "caller": "http_response:152",
            "code": 100,
            "key": "total_rows",
            "message": "0"
        },
        {
            "caller": "http_response:154",
            "code": 101,
            "key": "total_elapsed_time",
            "message": "405.41885ms"
        }
    ]
}

but in another I see
{
    "error":
        {
            "caller": "view_index:195",
            "code": 5000,
            "key": "Internal Error",
            "message": "Bucket bucket2 not found."
        }
}

cbcollect will be attached

 Comments   
Comment by Marty Schoch [ 16/Oct/13 ]
This is a duplicate, though I can't yet find the original.

We believe that under higher load the view queries time out, which we report as bucket not found (it may not be possible to distinguish).
Comment by Iryna Mironava [ 16/Oct/13 ]
https://s3.amazonaws.com/bugdb/jira/MB-9358/447a45ae/10.3.121.120-10162013-858-diag.zip
Comment by Ketaki Gangal [ 17/Oct/13 ]
Seeing these errors and frequent tuq-server crashes on concurrent queries during typical server operations like
- w/ Failovers
- w/ Backups
- w/ Indexing.

Similar server ops for single queries however seem to run okay.

Note: This is a very small number of concurrent queries (3-5); users may typically have a higher level of concurrency if used at an application level.




[MB-9145] Add option to download the manual in pdf format (as before) Created: 17/Sep/13  Updated: 20/Jun/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 2.0, 2.1.0, 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Anil Kumar Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
On the documentation site there is no option to download the manual in pdf format as before. We need to add this option back.

 Comments   
Comment by Maria McDuff (Inactive) [ 18/Sep/13 ]
Needed for the 2.2.1 bug fix release.




[MB-8838] Security Improvement - Connectors to implement security improvements Created: 14/Aug/13  Updated: 19/May/14

Status: Open
Project: Couchbase Server
Component/s: clients
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Anil Kumar Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: security
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Security Improvement - Connectors to implement security improvements

Spec ToDo.




[MB-9415] auto-failover in seconds - (reduced from minimum 30 seconds) Created: 21/May/12  Updated: 11/Mar/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.0, 1.8.1, 2.0, 2.0.1, 2.2.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Dipti Borkar Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 2
Labels: customer, ns_server-story
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-9416 Make auto-failover near immediate whe... Technical task Open Aleksey Kondratenko  

 Description   
including no false positives

http://www.pivotaltracker.com/story/show/25006101

 Comments   
Comment by Aleksey Kondratenko [ 25/Oct/13 ]
At the very least it requires getting our timeout-ful cases under control. So at least splitting couchdb into a separate VM is a requirement for this. But not necessarily enough.
Comment by Aleksey Kondratenko [ 25/Oct/13 ]
Still seeing misunderstanding on this one.

So we have a _different_ problem: even manual failover (let alone automatic) cannot succeed quickly if the master node fails. It can easily take up to 2 minutes because of our use of the erlang "global" facility, which requires us to detect that the node is dead, and erlang is tuned to detect that within 2 minutes.

Now _this_ problem is about lowering autofailover detection to 10 seconds. We can blindly make it happen today, but it will not be usable because of all sorts of timeouts happening in the cluster management layer. We have a significant proportion of CBSEs _today_ about false positive autofailovers even with the 30 second threshold. Clearly lowering it to 10 will only make it worse. Therefore my point above: we have to get those timeouts under control so that heartbeats (or whatever else we use to detect a node being unresponsive) are sent/received in a timely manner.

I would like to note, however, that especially in some virtualized environments (arguably, oversubscribed) we saw delays as high as low tens of seconds from virtualization _alone_. Given the relatively high cost of failover in our software, I'd like to point out that people could too easily abuse that feature.

The high cost of failover referred to above is this:

* You almost certainly and irrecoverably lose some recent mutations. _At least_ recent mutations, i.e. if replication is really working well. On a node that's on the edge of autofailover you can imagine replication not being "diamond-hard quick". That's cost 1.

* In order to return the node back to the cluster (say the node crashed and needed some time to recover, whatever that might mean) you need a rebalance. That type of rebalance is relatively quick by design, i.e. it only moves data back to this node and nothing else. But it's still a rebalance. With UPR we can possibly make it better, because its failover log is capable of rewinding just the conflicting mutations.

What I'm trying to say with "our approach appears to have a relatively high price for failover" is that this appears to be an inherent issue for a strongly consistent system. In many cases it might actually be better to wait up to a few minutes for a node to recover and restore its availability than to fail it over and pay the price of restoring cluster capacity (by rebalancing this node back, or its replacement, which is irrelevant). If somebody wants stronger availability, then other approaches which can "reconcile" changes from both the failed-over node and its replacement node look like a fundamentally better choice _for those requirements_.




[MB-4030] enable traffic for for ready nodes even if not all nodes are up/healthy/ready (aka partial janitor) (was: After two nodes crashed, curr_items remained 0 after warmup for extended period of time) Created: 06/Jul/11  Updated: 20/May/14

Status: Reopened
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.1, 2.0, 2.0.1, 2.2.0, 2.1.1, 2.5.1
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: ns_server-story, supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
we had two nodes crash at a customer, possibly related to a disk space issue, but I don't think so.

After they crashed, the nodes warmed up relatively quickly, but immediately "discarded" their items. I say that because I see that they warmed up ~10m items, but the current item counts were both 0.

I tried shutting down the service and had to kill memcached manually (kill -9). Restarting it went through the same process of warming up and then nothing.

While I was looking around, I left it sit for a little while and magically all of the items came back. I seem to recall this bug previously where a node wouldn't be told to be active until all the nodes in the cluster were active...and it got into trouble when not all of the nodes restarted.

Diags for all nodes will be attached

 Comments   
Comment by Perry Krug [ 06/Jul/11 ]
Full set of logs at \\corp-fs1\export_support_cases\bug_4030
Comment by Aleksey Kondratenko [ 20/Mar/12 ]
It _is_ an ns_server issue, caused by the janitor needing all nodes to be up for vbucket activation. We planned a fix for 1.8.1 (now 1.8.2).
Comment by Aleksey Kondratenko [ 20/Mar/12 ]
The fix would land as part of the fast warmup integration.
Comment by Perry Krug [ 18/Jul/12 ]
Peter, can we get a second look at this one? We've seen this before, and the problem is that the janitor did not run until all nodes had joined the cluster and warmed up. I'm not sure we've fixed that already...
Comment by Aleksey Kondratenko [ 18/Jul/12 ]
Latest 2.0 will mark nodes as green and enable memcached traffic when all of them are up. So the easy part is done.

Partial janitor (i.e. enabling traffic for some nodes when others are still down/warming up) is something that is unlikely to be done soon.
Comment by Perry Krug [ 18/Jul/12 ]
Thanks Alk...what's the difference in behavior (in this area) between 1.x and 2.0? It "sounds" like they're the same, no?

And this bug should still remain open until we fix the primary issue which is the partial janitor...correct?
Comment by Aleksey Kondratenko [ 18/Jul/12 ]
1.8.1 will show a node as green when ep-engine thinks it's warmed up. But confusingly it will not really be ready: all vbuckets will be in state dead and curr_items will be 0.

2.0 fixes this confusion. A node is marked green when it's actually warmed up from the user's perspective, i.e. the right vbucket states are set and it'll serve client traffic.

2.0 is still very conservative about only making vbucket state changes when all nodes are up and warmed up. That's the "impartial" janitor. Whether it's a bug or a "lack of feature" is debatable. But I think the main concern, that users are confused by the green-ness of nodes, is resolved.
Comment by Aleksey Kondratenko [ 18/Jul/12 ]
Closing as fixed. We'll get to the partial janitor some day in the future; it is a feature we lack today, not a bug we have, IMHO.
Comment by Perry Krug [ 12/Nov/12 ]
Reopening this for the need for partial janitor. Recent customer had multiple nodes need to be hard-booted and none returned to service until all were warmed up
Comment by Steve Yen [ 12/Nov/12 ]
bug-scrub: moving out of 2.0, as this looks like a feature req.
Comment by Farshid Ghods (Inactive) [ 13/Nov/12 ]
In system testing we have noticed many times that if multiple nodes crash, then until all nodes are warmed up the node status for those that have already warmed up appears as yellow.


The user won't be able to tell from the console which node has successfully warmed up, and if one node is actually not recovering or not warming up in a reasonable time they have to figure it out some other way (cbstats ...).

Another issue with this is that the user won't be able to perform a failover for 1 node even though N-1 nodes have warmed up already.

I am not sure if fixing this bug will impact cluster-restore functionality, but it is important to fix it or suggest a workaround to the user (by workaround I mean a documented, tested and supported set of commands).
Comment by Mike Wiederhold [ 17/Mar/13 ]
Comments say this is an ns_server issue so I am removing couchbase-bucket from affected components. Please re-add if there is a couchbase-bucket task for this issue.
Comment by Aleksey Kondratenko [ 23/Feb/14 ]
Not going to happen for 3.0.




[MB-10838] cbq-engine must work without all_docs Created: 11/Apr/14  Updated: 29/Jun/14  Due: 07/Jul/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: tried builds 3.0.0-555 and 3.0.0-554

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
WORKAROUND: Run "CREATE PRIMARY INDEX ON <bucket>" once per bucket, when using 3.0 server
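For illustration, one way to apply the workaround through the query engine's HTTP endpoint (assuming the default tuqtng query port 8093 and a bucket named default; adjust to your setup):

    curl 'http://localhost:8093/query?q=CREATE+PRIMARY+INDEX+ON+default'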

SYMPTOM: tuq returns Bucket default not found.', u'caller': u'view_index:200 for all queries

single node cluster, 2 buckets(default and standard)
run simple query
q=FROM+default+SELECT+name%2C+email+ORDER+BY+name%2Cemail+ASC

got {u'code': 5000, u'message': u'Bucket default not found.', u'caller': u'view_index:200', u'key': u'Internal Error'}
tuq displays
[root@grape-001 tuqtng]# ./tuqtng -couchbase http://localhost:8091
22:36:07.549322 Info line disabled false
22:36:07.554713 tuqtng started...
22:36:07.554856 version: 0.0.0
22:36:07.554942 site: http://localhost:8091
22:47:06.915183 ERROR: Unable to access view - cause: error executing view req at http://127.0.0.1:8092/default/_all_docs?limit=1001: 500 Internal Server Error - {"error":"noproc","reason":"{gen_server,call,[undefined,bytes,infinity]}"}
 -- couchbase.(*viewIndex).ScanRange() at view_index.go:186


 Comments   
Comment by Sriram Melkote [ 11/Apr/14 ]
Iryna, can you please add cbcollectinfo or at least the couchdb logs?

Also, all CBQ DP4 testing must be done against a 2.5.x server; please confirm that is the case in this bug.
Comment by Iryna Mironava [ 22/Apr/14 ]
cbcollect
https://s3.amazonaws.com/bugdb/jira/MB-10838/9c1cf39c/172.27.33.17-4222014-111-diag.zip

The bug is valid only for 3.0; 2.5.x versions are working fine.
Comment by Sriram Melkote [ 22/Apr/14 ]
Gerald, we need to update query code to not use _all_docs for 3.0

Iryna, workaround is to run "CREATE PRIMARY INDEX ON <bucket>" first before running any queries when using 3.0 server
Comment by Sriram Melkote [ 22/Apr/14 ]
Reducing severity with workaround. Please ping me if that doesn't work
Comment by Iryna Mironava [ 22/Apr/14 ]
works with workaround
Comment by Gerald Sangudi [ 22/Apr/14 ]
Manik,

Please modify the tuqtng / DP3 Couchbase catalog to return an error telling the user to CREATE PRIMARY INDEX. This should only happen with 3.0 server. For 2.5.1 or below, #all_docs should still work.

Thanks.




[MB-11736] add client SSL to 3.0 beta documentation Created: 15/Jul/14  Updated: 15/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0-Beta
Fix Version/s: 3.0-Beta
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Matt Ingenthron Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This is mostly a curation exercise. Add to the server 3.0 beta docs the configuration information for each of the following clients:
- Java
- .NET
- PHP
- Node.js
- C/C++

No other SDKs support SSL at the moment.

This is either in work-in-progress documentation or in the blogs from the various DPs. Please check in with the component owner if you can't find what you need.




[MB-10180] Server Quota: Inconsistency between documentation and CB behaviour Created: 11/Feb/14  Updated: 21/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Dave Rigby Assignee: Ruth Harris
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File MB-10180_max_quota.png    
Issue Links:
Relates to
relates to MB-2762 Default node quota is still too high Resolved
relates to MB-8832 Allow for some back-end setting to ov... Open
Triage: Untriaged
Operating System: Ubuntu 64-bit
Is this a Regression?: Yes

 Description   
In the documentation for the product (and general sizing advice) we tell people to allocate no more than 80% of their memory for the Server Quota, to leave headroom for the views, disk write queues and general OS usage.

However on larger[1] nodes we don't appear to enforce this, and instead allow people to allocate up to 1GB less than the total RAM.

This is inconsistent, as we document and tell people one thing and let them do another.

This appears to be something inherited from MB-2762, the intent of which appeared to be to only allow relaxing this when joining a cluster; however, this doesn't appear to be how it works - I can successfully change the existing cluster quota from the CLI to a "large" value:

    $ /opt/couchbase/bin/couchbase-cli cluster-edit -c localhost:8091 -u Administrator -p dynam1te --cluster-ramsize=127872
    ERROR: unable to init localhost (400) Bad Request
    [u'The RAM Quota value is too large. Quota must be between 256 MB and 127871 MB (memory size minus 1024 MB).']

While I can see some logic to relax the 80% constraint on big machines, with the advent of 2.X features 1024MB seems far too small an amount of headroom.

Suggestions to resolve:

A) Revert to a straightforward 80% max, with a --force option or similar to allow specific customers to go higher if they know what they are doing
B) Leave current behaviour, but document it.
C) Increase minimum headroom to something more reasonable for 2.X, *and* document the behaviour.

([1] On a machine with 128,895MB of RAM I get the "total-1024" behaviour, on a 1GB VM I get 80%. I didn't check in the code what the cutoff for 80% / total-1024 is).


 Comments   
Comment by Dave Rigby [ 11/Feb/14 ]
Screenshot of initial cluster config: maximum quota is total_RAM-1024
Comment by Aleksey Kondratenko [ 11/Feb/14 ]
Do not agree with that logic.

There's IMHO quite a bit of difference between default settings, the recommended settings limit and the allowed settings limit. The latter can be wider for folks who really know what they're doing.
Comment by Aleksey Kondratenko [ 11/Feb/14 ]
Passed to Anil, because it's not my decision to change limits.
Comment by Dave Rigby [ 11/Feb/14 ]
@Aleksey: I'm happy to resolve as something other than my (A,B,C), but the problem here is that many people haven't even been aware of this "extended" limit in the system - and moreover on a large system we actually advertise it in the GUI when specifying the allowed limit (see attached screenshot).

Furthermore, I *suspect* that this was originally only intended for upgrades for 1.6.X (see http://review.membase.org/#/c/4051/), but somehow is now being permitted for new clusters.

Ultimately I don't mind what our actual max quota value is, but the app behaviour should be consistent with the documentation (and the sizing advice we give people).
Comment by Maria McDuff (Inactive) [ 19/May/14 ]
raising to product blocker.
this inconsistency has to be resolved - PM to re-align.
Comment by Anil Kumar [ 28/May/14 ]
Going with option B - Leave current behaviour, but document it.
Comment by Ruth Harris [ 17/Jul/14 ]
I only see the 80% number coming up as an example of setting the high water mark (85% suggested). The Server Quota section doesn't mention anything. The working set management & ejection section(s) and item pager sub-section also mention the high water mark.

Can you be more specific about where this information is? Anyway, the best solution is to add a "note" in the applicable section(s).

--ruth

Comment by Dave Rigby [ 21/Jul/14 ]
@Ruth: So the current product behaviour is that the ServerQuota limit depends on the maximum memory available:

* For machines with <= X MB of memory, the maximum server quota is 80% of total physical memory
* For machines with > X MB of memory, the maximum Server Quota is Total Physical Memory - 1024.

The value of 'X' is fixed in the code, but it wasn't obvious what it actually is (it's derived from a few different things). I suggest you ask Alk, who should be able to provide the value.
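For example, plugging in the numbers from the description above: the machine with 128,895 MB of RAM falls into the second case, so its advertised maximum is 128,895 - 1024 = 127,871 MB (matching the CLI error message quoted earlier), while the 1 GB VM falls into the first case and is capped at roughly 80% of its RAM.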




[MB-12043] cbq crash after trying to delete a key Created: 21/Aug/14  Updated: 21/Aug/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
cbq> delete from my_bucket KEYS ['query-testa7480c4-0'];
PANIC: Expected plan.Operator instead of <nil>..




[MB-9632] diag / master events captured in log file Created: 22/Nov/13  Updated: 27/Aug/14

Status: Reopened
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.2.0, 2.5.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Task Priority: Blocker
Reporter: Steve Yen Assignee: Ravi Mayuram
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The information available in the diag / master events REST stream should be captured in a log (ALE?) file and hence available to cbcollect-info's and later analysis tools.

 Comments   
Comment by Aleksey Kondratenko [ 22/Nov/13 ]
It is already available in collectinfo
Comment by Dustin Sallings (Inactive) [ 26/Nov/13 ]
If it's only available in collectinfo, then it's not available at all. We lose most of the useful information if we don't run an http client to capture it continually throughout the entire course of a test.
Comment by Aleksey Kondratenko [ 26/Nov/13 ]
Feel free to submit a patch with exact behavior you need
Comment by Cihan Biyikoglu [ 27/Aug/14 ]
is this still relevant?




[MB-10214] Mac version update check is incorrectly identifying newest version Created: 14/Feb/14  Updated: 28/Aug/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.0.1, 2.2.0, 2.1.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: David Haikney Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified
Environment: Mac OS X

Attachments: PNG File upgrade_check.png    
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-12051 Update the Release_Server job on Jenk... Technical task Open Chris Hillery  
Is this a Regression?: Yes

 Description   
Running the 2.1.1 version of Couchbase on a Mac, "check for latest version" reports that the latest version is already running (e.g. see attached screenshot).


 Comments   
Comment by Aleksey Kondratenko [ 14/Feb/14 ]
Definitely not a UI bug. It's using phone home to find out about upgrades. And I have no idea who owns that now.
Comment by Steve Yen [ 12/Jun/14 ]
got an email from ravi to look into this
Comment by Steve Yen [ 12/Jun/14 ]
Not sure if this is correct analysis, but I did a quick scan of what I think is the mac installer, which I think is...

  https://github.com/couchbase/couchdbx-app

It gets its version string by running a "git describe", in the Makefile here...

  https://github.com/couchbase/couchdbx-app/blob/master/Makefile#L1

Currently, a "git describe" on master branch returns...

  $ git describe
  2.1.1r-35-gf6646fa

...which is *kinda* close to the reported version string in the screenshot ("2.1.1-764-rel").

So, I'm thinking one fix needed would be a tagging (e.g., "git tag -a FOO -m FOO") of the couchdbx-app repository.

So, reassigning to Phil to do that appropriately.

Also, it looks like our mac installer is using an open-source packaging / installer / runtime library called "sparkle" (which might be a little under-maintained -- not sure).

  https://github.com/andymatuschak/Sparkle/wiki

The sparkle library seems to check for version updates by looking at the URL here...

  https://github.com/couchbase/couchdbx-app/blob/master/cb.plist.tmpl#L42

Which seems to either be...

  http://appcast.couchbase.com/membasex.xml

Or, perhaps...

  http://appcast.couchbase.com/couchbasex.xml

The appcast.couchbase.com host actually appears to be an S3 bucket off of our production Couchbase AWS account. So those *.xml files need to be updated, as they currently have content referencing older versions. For example, http://appcast.couchbase.com/couchbase.xml currently looks like...

    <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sparkle="http://www.andymatuschak.org/xml-namespaces/sparkle" version="2.0">
    <channel>
    <title>Updates for Couchbase Server</title>
    <link>http://appcast.couchbase.com/couchbase.xml</link>
    <description>Recent changes to Couchbase Server.</description>
    <language>en</language>
    <item>
    <title>Version 1.8.0</title>
    <sparkle:releaseNotesLink>
    http://www.couchbase.org/wiki/display/membase/Couchbase+Server+1.8.0
    </sparkle:releaseNotesLink>
    <!-- date -u +"%a, %d %b %Y %H:%M:%S GMT" -->
    <pubDate>Fri, 06 Jan 2012 16:11:17 GMT</pubDate>
    <enclosure url="http://packages.couchbase.com/1.8.0/Couchbase-Server-Community.dmg" sparkle:version="1.8.0" sparkle:dsaSignature="MCwCFAK8uknVT3WOjPw/3LkQpLBadi2EAhQxivxe2yj6EU6hBlg9YK/5WfPa5Q==" length="33085691" type="application/octet-stream"/>
    </item>
    </channel>
    </rss>

Not updating the xml files, though, probably causes no harm. Just that our osx users won't be pushed news on updates.
Comment by Phil Labee [ 12/Jun/14 ]
This has nothing to do with "git describe". There should be no place in the product where "git describe" is used to determine version info. See:

    http://hub.internal.couchbase.com/confluence/display/CR/Branching+and+Tagging

so there's definitely a bug in the Makefile.

The version update check seems to be out of date. The phone-home file is generated during:

    http://factory.hq.couchbase.com:8080/job/Product_Staging_Server/

but the process of uploading it is not automated.
Comment by Steve Yen [ 12/Jun/14 ]
Thanks for the links.

> This has nothing to do with "git describe".

My read of the Makefile makes me think, instead, that "git describe" is the default behavior unless it's overridden by the invoker of the make.

> There should be no place in the product that "git describe" should be used to determine version info. See:
> http://hub.internal.couchbase.com/confluence/display/CR/Branching+and+Tagging

It appears all this couchdbx-app / sparkle stuff predates that wiki page by a few years, so I guess it's inherited legacy.

Perhaps voltron / buildbot are not setting PRODUCT_VERSION correctly before invoking the couchdbx-app make, which makes the Makefile default to 'git describe'?

    commit 85710d16b1c52497d9f12e424a22f3efaeed61e4
    Date: Mon Jun 4 14:38:58 2012 -0700

    Apply correct product version number
    
    Get version number from $PRODUCT_VERSION if it's set.
    (Buildbot and/or voltron will set this.)
    If not set, default to `git describe` as before.
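For illustration, the pattern that commit message describes would look roughly like this in a Makefile (a hypothetical sketch, not the actual couchdbx-app contents):

    ifndef PRODUCT_VERSION
    PRODUCT_VERSION := $(shell git describe)
    endif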
    
> The version update check seems to be out of date.

Yes, that's right. The appcast files are out of date.

> The phone-home file is generated during:
> http://factory.hq.couchbase.com:8080/job/Product_Staging_Server/

I think appcast files for OSX / sparkle are a _different_ mechanism than the phone-home file, and an appcast XML file does not appear to be generated/updated by the Product_Staging_Server job.

But, I'm not an expert or really qualified on the details here -- this is just my opinions from a quick code scan, not from actually doing/knowing.

Comment by Wayne Siu [ 01/Aug/14 ]
Per PM (Anil), we should get this fixed by 3.0 RC1.
Raising the priority to Critical.
Comment by Wayne Siu [ 07/Aug/14 ]
Phil,
Please provide update.
Comment by Anil Kumar [ 12/Aug/14 ]
Triage - Upgrading to 3.0 Blocker

Comment by Wayne Siu [ 20/Aug/14 ]
Looks like we may have a short term "fix" for this ticket which Ceej and I have tested.
@Ceej, can you put in the details here?
Comment by Chris Hillery [ 20/Aug/14 ]
The file is hosted in S3, and we proved tonight that overwriting that file (membasex.xml) with a version containing updated version information and download URLs works as expected. We updated it to point to 2.2 for now, since that is the latest version with a freely-available download URL.

We can update the Release_Server job on Jenkins to create an updated version of this XML file from a template, and upload it to S3.

Assigning back to Wayne for a quick question: Do we support Enterprise edition for MacOS? If we do, then this solution won't be sufficient without more effort, because the two editions will need different Sparkle configurations for updates. Also, Enterprise edition won't be able to directly download the newer release, unless we provide a "hidden" URL for that (the download link on the website goes to a form).




Mac version update check is incorrectly identifying newest version (MB-10214)

[MB-12051] Update the Release_Server job on Jenkins to include updating the file (membasex.xml) and the download URL Created: 22/Aug/14  Updated: 22/Aug/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.0.1, 2.2.0, 2.1.1, 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Technical task Priority: Blocker
Reporter: Wayne Siu Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We can update the Release_Server job on Jenkins to create an updated version of this XML file from a template, and upload it to S3.




[MB-11623] test for performance regressions with JSON detection Created: 02/Jul/14  Updated: 19/Aug/14

Status: In Progress
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Blocker
Reporter: Matt Ingenthron Assignee: Thomas Anderson
Resolution: Unresolved Votes: 0
Labels: performance, releasenote
Remaining Estimate: 0h
Time Spent: 120h
Original Estimate: Not Specified

Attachments: File JSONDoctPerfTest140728.rtf     File JSONPerfTestV3.uos    
Issue Links:
Relates to
relates to MB-11675 20-30% performance degradation on app... Closed

 Description   
Related to one of the changes in 3.0, we need to test what has been implemented to see if a performance regression or unexpected resource utilization has been introduced.

In 2.x, all JSON detection was handled at the time of persistence. Since persistence was done in batch and in background, with the then current document, it would limit the resource utilization of any JSON detection.

Starting in 3.x, with the datatype/HELLO changes introduced (and currently disabled), the JSON detection has moved to both memcached and ep-engine, depending on the type of mutation.

Just to paint the reason this is a concern, here's a possible scenario.

Imagine a cluster node that is happily accepting 100,000 sets/s for a given small JSON document, and it accounts for about 20mbit of the network (small enough to not notice). That node has a fast SSD at about 8k IOPS. That means that we'd only be doing JSON detection some 5000 times per second with Couchbase Server 2.x

With the changes already integrated, that JSON detection may be tried over 100k times/s. That's a 20x increase. The detection needs to occur somewhere other than on the persistence path, as the contract between DCP and view engine is such that the JSON detection needs to occur before DCP transfer.

This request is to test/assess if there is a performance change and/or any unexpected resource utilization when having fast mutating JSON documents.

I'll leave it to the team to decide what the right test is, but here's what I might suggest.

With a view defined create a test that has a small to moderate load at steady state and one fast-changing item. Test it with a set of sizes and different complexity. For instance, permutations that might be something like this:
non-JSON of 1k, 8k, 32k, 128k
simple JSON of 1k, 8k, 32k, 128k
complex JSON of 1k, 8k, 32k, 128k
metrics to gather:
throughput, CPU utilization by process, RSS by process, memory allocation requests by process (or minor faults or something)

Hopefully we won't see anything to be concerned with, but it is possible.

There are options to move JSON detection to somewhere later in processing (i.e., before DCP transfer) or other optimization thoughts if there is an issue.

 Comments   
Comment by Cihan Biyikoglu [ 07/Jul/14 ]
This is no longer needed for 3.0, is that right? Ready to postpone to 3.0.1?
Comment by Pavel Paulau [ 07/Jul/14 ]
HELLO-based negotiation was disabled but detection still happens in ep-engine.
We need to understand the impact before the 3.0 release, sooner rather than later.
Comment by Matt Ingenthron [ 23/Jul/14 ]
I'm curious Thomas, when you say "increase in bytes appended", do you mean for the same workload the RSS is larger in the 'increase' case? Great to see you making progress.
Comment by Wayne Siu [ 24/Jul/14 ]
Pasted comment from Thomas:
Subject: Re: Couchbase Issues: (MB-11623) test for performance regressions with JSON detection
Yes, ~20% increase from 2.5.1 to 3.0 for the same load generator, as reported by the CB server for the same input load. I'm verifying and 'isolating'. Will also be looking at if/how this contributes to the replication load increase (20% on 20% increase …)
The issues seem related. Same increase for 1K, 8K, 16K and 32K with some variance.
—thomas
Comment by Thomas Anderson [ 29/Jul/14 ]
initial results using JSON document load test.
Comment by Matt Ingenthron [ 29/Jul/14 ]
Tom: saw your notes in the work log, out of curiosity, what was deferred to 3.0.1? Also, from the comment above, 20% increase in what?
Comment by Anil Kumar [ 13/Aug/14 ]
Thomas - As discussed, please update the ticket with the % of regression caused by JSON detection now being in memcached. I will open a separate ticket to document it.
Comment by Thomas Anderson [ 19/Aug/14 ]
A comparison of non-JSON to JSON in 2.5.1 and 3.0.0-1105 showed statistically similar performance, i.e., the minimal overhead of handling a JSON document over a similar KV document stayed consistent from 2.5.1 to 3.0.0 pre-RC1. See the attached file JSONPerfTestV3.uos. To be re-run with the official RC1 candidate. The feature to load complex JSON documents is now modified to 4 levels of JSON complexity (for each document size in bytes): {simpleJSON: 1 element-attribute value pair; smallJSON: 10 elements - no array, no nesting; mediumJSON: 100 elements - arrays & nesting; largeJSON: 10000 elements - mix of element types}.

Note: the original seed of this issue was a detected performance issue with JSON documents, ~20-30%. The code/architectural change which caused this was deferred to 3.0.1. Additional modifications to the server to address simple append-mode performance degradation further lessened the question of whether the document type was the cause of the performance degradation. The tests did, however, show a positive change in compaction, i.e., 3.x compacts documents ~5-7% over 2.5.1.

 
Comment by Thomas Anderson [ 19/Aug/14 ]
Re-run with build 1105: regression comparing the same document size and same document load for non-JSON vs. simple JSON.
2.5.1: a 1024-byte document, 10 loaders, 1.25M documents for non-JSON to JSON showed a < 4% performance degradation; 3.0 shows a < 3% degradation. Many other factors seem to dominate.
Comment by Matt Ingenthron [ 19/Aug/14 ]
Just for the comments here, the original seed wasn't an observed performance regression but rather an architectural concern that there could be a space/CPU/throughput cost for the new JSON detection. That's why I opened it.




[MB-10440] something isn't right with tcmalloc in build 1074 on at least rhel6 causing memcached to crash Created: 11/Mar/14  Updated: 04/Aug/14

Status: Reopened
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Aleksey Kondratenko Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Relates to
relates to MB-10371 tcmalloc must be compiled with -DTCMA... Resolved
relates to MB-10439 Upgrade:: 2.5.0-1059 to 2.5.1-1074 =>... Resolved
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
SUBJ.

Just installing the latest 2.5.1 build on rhel6 and creating a bucket caused a segmentation fault (see also MB-10439).

When replacing tcmalloc with a copy I've built, it works.

I cannot be 100% sure it's tcmalloc, but the crash looks too easily reproducible to be something else.


 Comments   
Comment by Wayne Siu [ 12/Mar/14 ]
Phil,
Can you review if this change (copied from MB-10371) has been applied properly?

voltron (2.5.1) commit: 73125ad66996d34e94f0f1e5892391a633c34d3f

    http://review.couchbase.org/#/c/34344/

passes "CPPFLAGS=-DTCMALLOC_SMALL_BUT_SLOW" to each gprertools configure command
Comment by Andrei Baranouski [ 12/Mar/14 ]
I see the same issue on CentOS 64-bit.
Comment by Phil Labee [ 12/Mar/14 ]
need more info:

1. What package did you install?

2. How did you build the tcmalloc which fixes the problem?
 
Comment by Aleksey Kondratenko [ 12/Mar/14 ]
build 1740. Rhel6 package.

You can see for yourself. It's easily reproducible, as Andrei also confirmed.

I got the 2.1 tar.gz from googlecode, and then did ./configure --prefix=/opt/couchbase --enable-minimal CPPFLAGS='-DTCMALLOC_SMALL_BUT_SLOW' and then make and make install. After that it works. I have no idea why.

Do you know the exact CFLAGS and CXXFLAGS that are used to build our tcmalloc? Those variables are likely set in voltron (or even from outside of voltron) and might affect optimization and therefore expose some bugs.

Comment by Aleksey Kondratenko [ 12/Mar/14 ]
And 64 bit.
Comment by Phil Labee [ 12/Mar/14 ]
We build out of:

    https://github.com/couchbase/gperftools

and for 2.5.1 use commit:

    674fcd94a8a0a3595f64e13762ba3a6529e09926

compile using:

(cd /home/buildbot/buildbot_slave/centos-6-x86-251-builder/build/build/gperftools \
&& ./autogen.sh \
        && ./configure --prefix=/opt/couchbase CPPFLAGS=-DTCMALLOC_SMALL_BUT_SLOW --enable-minimal \
        && make \
        && make install-exec-am install-data-am)
Comment by Aleksey Kondratenko [ 12/Mar/14 ]
That part I know. What I don't know is what cflags are being used.
Comment by Phil Labee [ 13/Mar/14 ]
from the 2.5.1 centos-6-x86 build log:

http://builds.hq.northscale.net:8010/builders/centos-6-x86-251-builder/builds/18/steps/couchbase-server%20make%20enterprise%20/logs/stdio

make[1]: Entering directory `/home/buildbot/buildbot_slave/centos-6-x86-251-builder/build/build/gperftools'

/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I./src -I./src -DNO_TCMALLOC_SAMPLES -DTCMALLOC_SMALL_BUT_SLOW -DNO_TCMALLOC_SAMPLES -pthread -DNDEBUG -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -fno-builtin-malloc -fno-builtin-free -fno-builtin-realloc -fno-builtin-calloc -fno-builtin-cfree -fno-builtin-memalign -fno-builtin-posix_memalign -fno-builtin-valloc -fno-builtin-pvalloc -mmmx -fno-omit-frame-pointer -Wno-unused-result -march=i686 -mno-tls-direct-seg-refs -O3 -ggdb3 -MT libtcmalloc_minimal_la-tcmalloc.lo -MD -MP -MF .deps/libtcmalloc_minimal_la-tcmalloc.Tpo -c -o libtcmalloc_minimal_la-tcmalloc.lo `test -f 'src/tcmalloc.cc' || echo './'`src/tcmalloc.cc

libtool: compile: g++ -DHAVE_CONFIG_H -I. -I./src -I./src -DNO_TCMALLOC_SAMPLES -DTCMALLOC_SMALL_BUT_SLOW -DNO_TCMALLOC_SAMPLES -pthread -DNDEBUG -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -fno-builtin-malloc -fno-builtin-free -fno-builtin-realloc -fno-builtin-calloc -fno-builtin-cfree -fno-builtin-memalign -fno-builtin-posix_memalign -fno-builtin-valloc -fno-builtin-pvalloc -mmmx -fno-omit-frame-pointer -Wno-unused-result -march=i686 -mno-tls-direct-seg-refs -O3 -ggdb3 -MT libtcmalloc_minimal_la-tcmalloc.lo -MD -MP -MF .deps/libtcmalloc_minimal_la-tcmalloc.Tpo -c src/tcmalloc.cc -fPIC -DPIC -o .libs/libtcmalloc_minimal_la-tcmalloc.o

libtool: compile: g++ -DHAVE_CONFIG_H -I. -I./src -I./src -DNO_TCMALLOC_SAMPLES -DTCMALLOC_SMALL_BUT_SLOW -DNO_TCMALLOC_SAMPLES -pthread -DNDEBUG -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -fno-builtin-malloc -fno-builtin-free -fno-builtin-realloc -fno-builtin-calloc -fno-builtin-cfree -fno-builtin-memalign -fno-builtin-posix_memalign -fno-builtin-valloc -fno-builtin-pvalloc -mmmx -fno-omit-frame-pointer -Wno-unused-result -march=i686 -mno-tls-direct-seg-refs -O3 -ggdb3 -MT libtcmalloc_minimal_la-tcmalloc.lo -MD -MP -MF .deps/libtcmalloc_minimal_la-tcmalloc.Tpo -c src/tcmalloc.cc -o libtcmalloc_minimal_la-tcmalloc.o
Comment by Phil Labee [ 13/Mar/14 ]
from a 2.5.1 centos-6-x64 build log:

http://builds.hq.northscale.net:8010/builders/centos-6-x64-251-builder/builds/16/steps/couchbase-server%20make%20enterprise%20/logs/stdio

/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I./src -I./src -DNO_TCMALLOC_SAMPLES -DTCMALLOC_SMALL_BUT_SLOW -DNO_TCMALLOC_SAMPLES -pthread -DNDEBUG -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -fno-builtin-malloc -fno-builtin-free -fno-builtin-realloc -fno-builtin-calloc -fno-builtin-cfree -fno-builtin-memalign -fno-builtin-posix_memalign -fno-builtin-valloc -fno-builtin-pvalloc -Wno-unused-result -DNO_FRAME_POINTER -O3 -ggdb3 -MT libtcmalloc_minimal_la-tcmalloc.lo -MD -MP -MF .deps/libtcmalloc_minimal_la-tcmalloc.Tpo -c -o libtcmalloc_minimal_la-tcmalloc.lo `test -f 'src/tcmalloc.cc' || echo './'`src/tcmalloc.cc

libtool: compile: g++ -DHAVE_CONFIG_H -I. -I./src -I./src -DNO_TCMALLOC_SAMPLES -DTCMALLOC_SMALL_BUT_SLOW -DNO_TCMALLOC_SAMPLES -pthread -DNDEBUG -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -fno-builtin-malloc -fno-builtin-free -fno-builtin-realloc -fno-builtin-calloc -fno-builtin-cfree -fno-builtin-memalign -fno-builtin-posix_memalign -fno-builtin-valloc -fno-builtin-pvalloc -Wno-unused-result -DNO_FRAME_POINTER -O3 -ggdb3 -MT libtcmalloc_minimal_la-tcmalloc.lo -MD -MP -MF .deps/libtcmalloc_minimal_la-tcmalloc.Tpo -c src/tcmalloc.cc -fPIC -DPIC -o .libs/libtcmalloc_minimal_la-tcmalloc.o

libtool: compile: g++ -DHAVE_CONFIG_H -I. -I./src -I./src -DNO_TCMALLOC_SAMPLES -DTCMALLOC_SMALL_BUT_SLOW -DNO_TCMALLOC_SAMPLES -pthread -DNDEBUG -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -fno-builtin-malloc -fno-builtin-free -fno-builtin-realloc -fno-builtin-calloc -fno-builtin-cfree -fno-builtin-memalign -fno-builtin-posix_memalign -fno-builtin-valloc -fno-builtin-pvalloc -Wno-unused-result -DNO_FRAME_POINTER -O3 -ggdb3 -MT libtcmalloc_minimal_la-tcmalloc.lo -MD -MP -MF .deps/libtcmalloc_minimal_la-tcmalloc.Tpo -c src/tcmalloc.cc -o libtcmalloc_minimal_la-tcmalloc.o
Comment by Aleksey Kondratenko [ 13/Mar/14 ]
OK. I'll try to rule out -O3 as a possible cause of the failure later today (in which case it might be an upstream bug). In the meantime I suggest you try lowering optimization to -O2, unless you have other ideas of course.
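A minimal sketch of such a one-off -O2 check (directory and configure arguments copied from the build recipe quoted earlier; overriding CFLAGS/CXXFLAGS on the configure line is an assumption, since voltron normally injects them):

    (cd /home/buildbot/buildbot_slave/centos-6-x86-251-builder/build/build/gperftools \
        && ./autogen.sh \
        && ./configure --prefix=/opt/couchbase --enable-minimal \
               CPPFLAGS=-DTCMALLOC_SMALL_BUT_SLOW CFLAGS="-O2 -ggdb3" CXXFLAGS="-O2 -ggdb3" \
        && make)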
Comment by Aleksey Kondratenko [ 13/Mar/14 ]
Building tcmalloc with the exact same cflags (-O3) doesn't cause any crashes. At this point my guess is either a compiler bug or cosmic radiation hitting just this specific build.

Can we simply force a rebuild?
Comment by Phil Labee [ 13/Mar/14 ]
test with newer build 2.5.1-1075:

http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_centos6_x86_2.5.1-1075-rel.rpm

http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_centos6_x86_64_2.5.1-1075-rel.rpm
Comment by Aleksey Kondratenko [ 13/Mar/14 ]
Didn't help, unfortunately. Is that still with -O3?
Comment by Phil Labee [ 14/Mar/14 ]
Still using -O3. There are extensive comments in the voltron Makefile warning against changing to -O2.
Comment by Phil Labee [ 14/Mar/14 ]
Did you try to build gperftools out of our repo?
Comment by Aleksey Kondratenko [ 14/Mar/14 ]
The following is not true:

Got myself CentOS 6.4. And with its gcc and -O3 I'm finally able to reproduce the issue.
Comment by Aleksey Kondratenko [ 14/Mar/14 ]
So I've got myself CentOS 6.4 and the _exact same compiler version_. When I build tcmalloc myself with all the right flags and replace the tcmalloc from the package, it works. Without replacing it, it crashes.
Comment by Aleksey Kondratenko [ 14/Mar/14 ]
Phil, please clean ccache, reboot the builder host (to clear the page cache) and _then_ do another rebuild. Looking at the build logs it looks like ccache is being used, so my suspicion about RAM corruption is not fully excluded yet. And I don't have many other ideas.
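For reference, a minimal sketch of those two steps on the builder host (assuming the default ccache setup and that a full reboot of the VM is acceptable):

    ccache -C        # wipe the compiler cache completely
    sudo reboot      # restart the builder host to drop the page cache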
Comment by Phil Labee [ 14/Mar/14 ]
cleared ccache and restarted centos-6-x86-builder, centos-6-x64-builder

started build 2.5.1-1076
Comment by Pavel Paulau [ 14/Mar/14 ]
2.5.1-1076 seems to be working, it warns about "SMALL MEMORY MODEL IS IN USE, PERFORMANCE MAY SUFFER" as well.
Comment by Aleksey Kondratenko [ 14/Mar/14 ]
Maybe I'm doing something wrong, but it fails in exactly the same way on my VM.
Comment by Pavel Paulau [ 14/Mar/14 ]
Sorry, it crashed eventually.
Comment by Aleksey Kondratenko [ 14/Mar/14 ]
Confirmed again. Everything is exactly the same as before. Build 1076 on CentOS 6.4 amd64 crashes very easily, both Enterprise and Community edition. And it doesn't crash if I replace tcmalloc with the one I've built, which is the exact same source, exact same flags and exact same compiler version.

Build 1071 doesn't crash. All of this, 100% consistently.
Comment by Phil Labee [ 17/Mar/14 ]
possibly a difference in build environment

reference env is described in voltron README.md file

for centos-6 X64 (6.4 final) we use the defaults for these tools:


gcc-4.4.7-3.el6 ( 4.4.7-4 available)
gcc-c++-4.4.7-3 ( 4.4.7-4 available)
kernel-devel-2.6.32-358 ( 2.6.32-431.5.1 available)
openssl-devel-1.0.0-27.el6_4.2 ( 1.0.1e-16.el6_5.4 available)
rpm-build-4.8.0-32 ( 4.8.0-37 available)

these tools do not have an update:

scons-2.0.1-1
libtool-2.2.6-15.5

For all centos these specific versions are installed:

gcc, g++ 4.4, currently 4.4.7-3, 4.4.7-4 available
autoconf 2.65, currently 2.63-5 (no update available)
automake 1.11.1
libtool 2.4.2
Comment by Phil Labee [ 17/Mar/14 ]
downloaded gperftools-2.1.tar.gz from

    http://gperftools.googlecode.com/files/gperftools-2.1.tar.gz

and expanded into directory: gperftools-2.1

cloned https://github.com/couchbase/gperftools.git at commit:

    674fcd94a8a0a3595f64e13762ba3a6529e09926

into directory gperftools, and compared:

=> diff -r gperftools-2.1 gperftools
Only in gperftools: .git
Only in gperftools: autogen.sh
Only in gperftools/doc: pprof.see_also
Only in gperftools/src/windows: TODO
Only in gperftools/src/windows: google

Only in gperftools-2.1: Makefile.in
Only in gperftools-2.1: aclocal.m4
Only in gperftools-2.1: compile
Only in gperftools-2.1: config.guess
Only in gperftools-2.1: config.sub
Only in gperftools-2.1: configure
Only in gperftools-2.1: depcomp
Only in gperftools-2.1: install-sh
Only in gperftools-2.1: libtool
Only in gperftools-2.1: ltmain.sh
Only in gperftools-2.1/m4: libtool.m4
Only in gperftools-2.1/m4: ltoptions.m4
Only in gperftools-2.1/m4: ltsugar.m4
Only in gperftools-2.1/m4: ltversion.m4
Only in gperftools-2.1/m4: lt~obsolete.m4
Only in gperftools-2.1: missing
Only in gperftools-2.1/src: config.h.in
Only in gperftools-2.1: test-driver
Comment by Phil Labee [ 17/Mar/14 ]
Since the build files in your source are different from those in the production build, we can't really say we're using the same source.

Please build from our repo and re-try your test.
Comment by Aleksey Kondratenko [ 17/Mar/14 ]
The difference is in the autotools products. I _cannot_ build using the same autotools that are present on the build machine unless I'm given access to that box.
Comment by Aleksey Kondratenko [ 17/Mar/14 ]
The _source_ is exactly the same.
Comment by Phil Labee [ 17/Mar/14 ]
I've given the versions of autotools to use, so you can bring your build environment in line with the production builds.

As a shortcut, I've submitted a request for a clone of the builder VM that you can experiment with.

See CBIT-1053
Comment by Wayne Siu [ 17/Mar/14 ]
The cloned builder is available. Info in CBIT-1053.
Comment by Aleksey Kondratenko [ 18/Mar/14 ]
Built tcmalloc from the exact copy in the builder directory.

Installed the package from inside the builder directory (build 1077). Verified that the problem exists. Stopped the service. Replaced tcmalloc. Observed that everything is fine.

Something in the environment is causing this, maybe unusual ldflags or something else. But _not_ the source.
Comment by Aleksey Kondratenko [ 18/Mar/14 ]
Built the full rpm package under the buildbot user, with the exact same make invocation as I see in the buildbot logs. And the resultant package works. Weird indeed.
Comment by Phil Labee [ 18/Mar/14 ]
some differences between test build and production build:


1) In gperftools, production calls "make install-exec-am install-data-am" while test calls "make install" which executes extra step "all-am"

2) In ep-engine, production uses "make install" while test uses "make"

3) The test build ran as user "root" while the production build runs as user "buildbot", so PATH and other env. vars may be different.

In general it's hard to tell what steps were performed for the test build, as no output logfiles have been captured.
Comment by Wayne Siu [ 21/Mar/14 ]
Updated from Phil:
comment:
________________________________________

2.5.1-1082 was done without the tcmalloc flag: CPPFLAGS=-DTCMALLOC_SMALL_BUT_SLOW

    http://review.couchbase.org/#/c/34755/


2.5.1-1083 was done with build step timeout increased from 60 minutes to 90

2.5.1-1084 was done with the tcmalloc flag restored:

    http://review.couchbase.org/#/c/34792/
Comment by Andrei Baranouski [ 23/Mar/14 ]
 2.5.1-1082 MB-10545 Vbucket map is not ready after 60 seconds
Comment by Meenakshi Goel [ 24/Mar/14 ]
A memcached crash with a segmentation fault was observed with build 2.5.1-1084-rel on Ubuntu 12.04 during the Auto Compaction tests.

Jenkins Link:
http://qa.sc.couchbase.com/view/2.5.1%20centos/job/centos_x64--00_02--compaction_tests-P0/56/consoleFull

root@jackfruit-s12206:/tmp# gdb /opt/couchbase/bin/memcached core.memcached.8276
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /opt/couchbase/bin/memcached...done.
[New LWP 8301]
[New LWP 8302]
[New LWP 8599]
[New LWP 8303]
[New LWP 8604]
[New LWP 8299]
[New LWP 8601]
[New LWP 8600]
[New LWP 8602]
[New LWP 8287]
[New LWP 8285]
[New LWP 8300]
[New LWP 8276]
[New LWP 8516]
[New LWP 8603]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 11, Segmentation fault.
#0 tcmalloc::CentralFreeList::FetchFromSpans (this=0x7f356f45d780) at src/central_freelist.cc:298
298 src/central_freelist.cc: No such file or directory.
(gdb) t a a bt

Thread 15 (Thread 0x7f3568039700 (LWP 8603)):
#0 0x00007f356f01b9fa in __lll_unlock_wake () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f356f018104 in _L_unlock_644 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f356f018063 in pthread_mutex_unlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f3569c663d6 in Mutex::release (this=0x5f68250) at src/mutex.cc:94
#4 0x00007f3569c9691f in unlock (this=<optimized out>) at src/locks.hh:58
#5 ~LockHolder (this=<optimized out>, __in_chrg=<optimized out>) at src/locks.hh:41
#6 fireStateChange (to=<optimized out>, from=<optimized out>, this=<optimized out>) at src/warmup.cc:707
#7 transition (force=<optimized out>, to=<optimized out>, this=<optimized out>) at src/warmup.cc:685
#8 Warmup::initialize (this=<optimized out>) at src/warmup.cc:413
#9 0x00007f3569c97f75 in Warmup::step (this=0x5f68258, d=..., t=...) at src/warmup.cc:651
#10 0x00007f3569c2644a in Dispatcher::run (this=0x5e7f180) at src/dispatcher.cc:184
#11 0x00007f3569c26c1d in launch_dispatcher_thread (arg=0x5f68258) at src/dispatcher.cc:28
#12 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 14 (Thread 0x7f356a705700 (LWP 8516)):
#0 0x00007f356ed0d83d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f356ed3b774 in usleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f3569c65445 in updateStatsThread (arg=<optimized out>) at src/memory_tracker.cc:31
#3 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 13 (Thread 0x7f35703e8740 (LWP 8276)):
#0 0x00007f356ed42353 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f356fdadf36 in epoll_dispatch (base=0x5e8e000, tv=<optimized out>) at epoll.c:404
#2 0x00007f356fd99394 in event_base_loop (base=0x5e8e000, flags=<optimized out>) at event.c:1558
#3 0x000000000040c9e6 in main (argc=<optimized out>, argv=<optimized out>) at daemon/memcached.c:7996

Thread 12 (Thread 0x7f356c709700 (LWP 8300)):
#0 0x00007f356ed42353 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f356fdadf36 in epoll_dispatch (base=0x5e8e280, tv=<optimized out>) at epoll.c:404
#2 0x00007f356fd99394 in event_base_loop (base=0x5e8e280, flags=<optimized out>) at event.c:1558
#3 0x0000000000415584 in worker_libevent (arg=0x16814f8) at daemon/thread.c:301
#4 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 11 (Thread 0x7f356e534700 (LWP 8285)):
#0 0x00007f356ed348bd in read () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f356ecc8ff8 in _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f356ecca03e in _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f356ecbe18a in _IO_getline_info () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f356ecbd06b in fgets () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f356e535b19 in fgets (__stream=<optimized out>, __n=<optimized out>, __s=<optimized out>) at /usr/include/bits/stdio2.h:255
#6 check_stdin_thread (arg=<optimized out>) at extensions/daemon/stdin_check.c:37
#7 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#8 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#9 0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7f356d918700 (LWP 8287)):
#0 0x00007f356f0190fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
---Type <return> to continue, or q <return> to quit---

#1 0x00007f356db32176 in logger_thead_main (arg=<optimized out>) at extensions/loggers/file_logger.c:368
#2 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f3567037700 (LWP 8602)):
#0 SpinLock::acquire (this=0x5ff7010) at src/atomic.cc:32
#1 0x00007f3569c6351c in lock (this=<optimized out>) at src/atomic.hh:282
#2 SpinLockHolder (theLock=<optimized out>, this=<optimized out>) at src/atomic.hh:274
#3 gimme (this=<optimized out>) at src/atomic.hh:396
#4 RCPtr (other=..., this=<optimized out>) at src/atomic.hh:334
#5 KVShard::getBucket (this=0x7a6e7c0, id=256) at src/kvshard.cc:58
#6 0x00007f3569c9231d in VBucketMap::getBucket (this=0x614a448, id=256) at src/vbucketmap.cc:40
#7 0x00007f3569c314ef in EventuallyPersistentStore::getVBucket (this=<optimized out>, vbid=256, wanted_state=<optimized out>) at src/ep.cc:475
#8 0x00007f3569c315f6 in EventuallyPersistentStore::firePendingVBucketOps (this=0x614a400) at src/ep.cc:488
#9 0x00007f3569c41bb1 in EventuallyPersistentEngine::notifyPendingConnections (this=0x5eb8a00) at src/ep_engine.cc:3474
#10 0x00007f3569c41d63 in EvpNotifyPendingConns (arg=0x5eb8a00) at src/ep_engine.cc:1182
#11 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f3565834700 (LWP 8600)):
#0 0x00007f356f0190fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f3569c68f7d in wait (tv=..., this=<optimized out>) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x5e7e1c0) at src/scheduler.cc:146
#3 0x00007f3569c6963d in launch_executor_thread (arg=0x5e7e204) at src/scheduler.cc:36
#4 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f3566035700 (LWP 8601)):
#0 0x00007f356f0190fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f3569c68f7d in wait (tv=..., this=<optimized out>) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x5e7fa40) at src/scheduler.cc:146
#3 0x00007f3569c6963d in launch_executor_thread (arg=0x5e7fa84) at src/scheduler.cc:36
#4 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f356cf0a700 (LWP 8299)):
#0 0x00007f356ed42353 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f356fdadf36 in epoll_dispatch (base=0x5e8e500, tv=<optimized out>) at epoll.c:404
#2 0x00007f356fd99394 in event_base_loop (base=0x5e8e500, flags=<optimized out>) at event.c:1558
#3 0x0000000000415584 in worker_libevent (arg=0x1681400) at daemon/thread.c:301
#4 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f3567838700 (LWP 8604)):
#0 0x00007f356f01b89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f356f017065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f356f016eba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f3569c6635a in Mutex::acquire (this=0x5e7f890) at src/mutex.cc:79
#4 0x00007f3569c261f8 in lock (this=<optimized out>) at src/locks.hh:48
#5 LockHolder (m=..., this=<optimized out>) at src/locks.hh:26
---Type <return> to continue, or q <return> to quit---
#6 Dispatcher::run (this=0x5e7f880) at src/dispatcher.cc:138
#7 0x00007f3569c26c1d in launch_dispatcher_thread (arg=0x5e7f898) at src/dispatcher.cc:28
#8 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f356af06700 (LWP 8303)):
#0 0x00007f356ed42353 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f356fdadf36 in epoll_dispatch (base=0x5e8e780, tv=<optimized out>) at epoll.c:404
#2 0x00007f356fd99394 in event_base_loop (base=0x5e8e780, flags=<optimized out>) at event.c:1558
#3 0x0000000000415584 in worker_libevent (arg=0x16817e0) at daemon/thread.c:301
#4 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f3565033700 (LWP 8599)):
#0 0x00007f356ed18267 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f3569c13997 in SpinLock::acquire (this=0x5ff7010) at src/atomic.cc:35
#2 0x00007f3569c63e57 in lock (this=<optimized out>) at src/atomic.hh:282
#3 SpinLockHolder (theLock=<optimized out>, this=<optimized out>) at src/atomic.hh:274
#4 gimme (this=<optimized out>) at src/atomic.hh:396
#5 RCPtr (other=..., this=<optimized out>) at src/atomic.hh:334
#6 KVShard::getVBucketsSortedByState (this=0x7a6e7c0) at src/kvshard.cc:75
#7 0x00007f3569c5d494 in Flusher::getNextVb (this=0x168d040) at src/flusher.cc:232
#8 0x00007f3569c5da0d in doFlush (this=<optimized out>) at src/flusher.cc:211
#9 Flusher::step (this=0x5ff7010, tid=21) at src/flusher.cc:152
#10 0x00007f3569c69034 in ExecutorThread::run (this=0x5e7e8c0) at src/scheduler.cc:159
#11 0x00007f3569c6963d in launch_executor_thread (arg=0x5ff7010) at src/scheduler.cc:36
#12 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f356b707700 (LWP 8302)):
#0 0x00007f356ed42353 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f356fdadf36 in epoll_dispatch (base=0x5e8ea00, tv=<optimized out>) at epoll.c:404
#2 0x00007f356fd99394 in event_base_loop (base=0x5e8ea00, flags=<optimized out>) at event.c:1558
#3 0x0000000000415584 in worker_libevent (arg=0x16816e8) at daemon/thread.c:301
#4 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f356bf08700 (LWP 8301)):
#0 tcmalloc::CentralFreeList::FetchFromSpans (this=0x7f356f45d780) at src/central_freelist.cc:298
#1 0x00007f356f23ef19 in tcmalloc::CentralFreeList::FetchFromSpansSafe (this=0x7f356f45d780) at src/central_freelist.cc:283
#2 0x00007f356f23efb7 in tcmalloc::CentralFreeList::RemoveRange (this=0x7f356f45d780, start=0x7f356bf07268, end=0x7f356bf07260, N=4) at src/central_freelist.cc:263
#3 0x00007f356f2430b5 in tcmalloc::ThreadCache::FetchFromCentralCache (this=0xf5d298, cl=9, byte_size=128) at src/thread_cache.cc:160
#4 0x00007f356f239fa3 in Allocate (this=<optimized out>, cl=<optimized out>, size=<optimized out>) at src/thread_cache.h:364
#5 do_malloc_small (size=128, heap=<optimized out>) at src/tcmalloc.cc:1088
#6 do_malloc_no_errno (size=<optimized out>) at src/tcmalloc.cc:1095
#7 (anonymous namespace)::cpp_alloc (size=128, nothrow=<optimized out>) at src/tcmalloc.cc:1423
#8 0x00007f356f249538 in tc_new (size=139867476842368) at src/tcmalloc.cc:1601
#9 0x00007f3569c2523e in Dispatcher::schedule (this=0x5e7f880,
    callback=<error reading variable: DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>, outtid=0x6127930, priority=...,
    sleeptime=<optimized out>, isDaemon=true, mustComplete=false) at src/dispatcher.cc:243
#10 0x00007f3569c84c1a in TapConnNotifier::start (this=0x6127920) at src/tapconnmap.cc:66
---Type <return> to continue, or q <return> to quit---
#11 0x00007f3569c42362 in EventuallyPersistentEngine::initialize (this=0x5eb8a00, config=<optimized out>) at src/ep_engine.cc:1415
#12 0x00007f3569c42616 in EvpInitialize (handle=0x5eb8a00,
    config_str=0x7f356bf07993 "ht_size=3079;ht_locks=5;tap_noop_interval=20;max_txn_size=10000;max_size=1491075072;tap_keepalive=300;dbname=/opt/couchbase/var/lib/couchbase/data/default;allow_data_loss_during_shutdown=true;backend="...) at src/ep_engine.cc:126
#13 0x00007f356cf0f86a in create_bucket_UNLOCKED (e=<optimized out>, bucket_name=0x7f356bf07b80 "default", path=0x7f356bf07970 "/opt/couchbase/lib/memcached/ep.so", config=<optimized out>,
    e_out=<optimized out>, msg=0x7f356bf07560 "", msglen=1024) at bucket_engine.c:711
#14 0x00007f356cf0faac in handle_create_bucket (handle=<optimized out>, cookie=0x5e4bc80, request=<optimized out>, response=0x40d520 <binary_response_handler>) at bucket_engine.c:2168
#15 0x00007f356cf10229 in bucket_unknown_command (handle=0x7f356d1171c0, cookie=0x5e4bc80, request=0x5e44000, response=0x40d520 <binary_response_handler>) at bucket_engine.c:2478
#16 0x0000000000412c35 in process_bin_unknown_packet (c=<optimized out>) at daemon/memcached.c:2911
#17 process_bin_packet (c=<optimized out>) at daemon/memcached.c:3238
#18 complete_nread_binary (c=<optimized out>) at daemon/memcached.c:3805
#19 complete_nread (c=<optimized out>) at daemon/memcached.c:3887
#20 conn_nread (c=0x5e4bc80) at daemon/memcached.c:5744
#21 0x0000000000406e45 in event_handler (fd=<optimized out>, which=<optimized out>, arg=0x5e4bc80) at daemon/memcached.c:6012
#22 0x00007f356fd9948c in event_process_active_single_queue (activeq=<optimized out>, base=<optimized out>) at event.c:1308
#23 event_process_active (base=<optimized out>) at event.c:1375
#24 event_base_loop (base=0x5e8ec80, flags=<optimized out>) at event.c:1572
#25 0x0000000000415584 in worker_libevent (arg=0x16815f0) at daemon/thread.c:301
#26 0x00007f356f014e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#27 0x00007f356ed41cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#28 0x0000000000000000 in ?? ()
(gdb)
Comment by Aleksey Kondratenko [ 25/Mar/14 ]
Yesterday I took that consistently failing Ubuntu build and played with it on my box.

It is exactly the same situation: replacing libtcmalloc.so makes it work.

So I spent the afternoon running what's in our actual package under a debugger.

I found several pieces of evidence that some object files linked into the libtcmalloc.so we ship were built with -DTCMALLOC_SMALL_BUT_SLOW and some _were_ not.

That explains the weird crashes.

I'm unable to explain how it's possible that our builders produced such .so files. Yet.

Gut feeling is that it might be:

* something caused by ccache

* perhaps not a full cleanup between builds

In order to verify that, I'm asking for the following (see the sketch below):

* do a build with ccache completely disabled but with the define

* do git clean -xfd inside the gperftools checkout before doing the build
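A minimal sketch of that verification build (paths are relative to the buildslave working directory and are an assumption; CCACHE_DISABLE=1 is the standard way to bypass ccache without uninstalling it):

    (cd build/gperftools \
        && git clean -xfd \
        && ./autogen.sh \
        && CCACHE_DISABLE=1 ./configure --prefix=/opt/couchbase CPPFLAGS=-DTCMALLOC_SMALL_BUT_SLOW --enable-minimal \
        && CCACHE_DISABLE=1 make)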

Comment by Phil Labee [ 29/Jul/14 ]
The failure was detected by

    http://qa.sc.couchbase.com/job/centos_x64--00_02--compaction_tests-P0/

Can I run this test on a 3.0.0 build to see if this bug still exists?
Comment by Meenakshi Goel [ 30/Jul/14 ]
Started a run with latest 3.0.0 build 1057.
http://qa.hq.northscale.net/job/centos_x64--44_01--auto_compaction_tests-P0/37/console

However, we haven't seen such crashes with compaction tests during 3.0.0 testing.
Comment by Meenakshi Goel [ 30/Jul/14 ]
Tests passed with 3.0.0-1057-rel.
Comment by Wayne Siu [ 31/Jul/14 ]
Pavel also helped verify that this is not an issue in 3.0 (3.0.0-1067).
Comment by Wayne Siu [ 31/Jul/14 ]
Reopening for 2.5.x.
Comment by Aleksey Kondratenko [ 01/Aug/14 ]
We still have _exactly_ the same problem as in 2.5.1. Enabling -DSMALL_BUT_SLOW causes mis-compilation. And this is _not_ an upstream bug.
Comment by Aleksey Kondratenko [ 01/Aug/14 ]
Looking at build log here: http://builds.hq.northscale.net:8010/builders/centos-6-x64-300-builder/builds/1111/steps/couchbase-server%20make%20enterprise%20/logs/stdio

I see that just a few files were rebuilt with the new define, and the previous build did not have CPPFLAGS set to -DSMALL_BUT_SLOW.

So at least in this case I'm adamant that the builder did not rebuild tcmalloc when it should have.
Comment by Phil Labee [ 01/Aug/14 ]
Check to see if ccache is causing the failure to rebuild components under changed configure settings.
Comment by Aleksey Kondratenko [ 01/Aug/14 ]
Build logs indicate that it is unrelated to ccache, i.e. lots of files are not getting built at all.
Comment by Chris Hillery [ 01/Aug/14 ]
The quick-n-dirty solution would be to delete the buildslave directories for the next build. I would do that now, but there is a build ongoing, so possibly Phil has already taken care of it. If this build (1091) doesn't work, then we'll clean the world and try again.
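For reference, a rough sketch of that quick-n-dirty cleanup (the slave directory name is taken from the build log URL above and may differ per builder; run only while no build is in progress):

    rm -rf /home/buildbot/buildbot_slave/centos-6-x64-300-builder/build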
Comment by Chris Hillery [ 01/Aug/14 ]
build 1091 is wrapping up, and visually it doesn't appear that gperftools got recompiled. I am waiting for each builder to finish and deleting the buildslave directories. When that is done I'll start a new build.
Comment by Chris Hillery [ 01/Aug/14 ]
build 1092 should start shortly.
Comment by Chris Hillery [ 01/Aug/14 ]
build 1092 is wrapping up (bits are already on latestbuilds I believe). Please test.




[MB-9917] DOC - memcached should dynamically adjust the number of worker threads Created: 14/Jan/14  Updated: 24/Jul/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Trond Norbye Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
4 threads is probably not ideal for a 24 core system ;)

 Comments   
Comment by Anil Kumar [ 25/Mar/14 ]
Trond - Can you explain whether this is a new feature in 3.0 or a fix to the documentation for older releases?
Comment by Ruth Harris [ 17/Jul/14 ]
Trond, Could you provide more information here and then reassign to me? --ruth
Comment by Trond Norbye [ 24/Jul/14 ]
New in 3.0 is that memcached no longer defaults to 4 threads for the frontend, but uses 75% of the number of cores reported by the system (with a minimum of 4).

There are 3 ways to tune this:

* Export MEMCACHED_NUM_CPUS=<number of threads you want> before starting Couchbase Server (see the sketch below)

* Use the -t <number> command line argument (this will go away in the future)

* Specify it in the configuration file read during startup (but when started from the full server this file is regenerated every time, so you'll lose the modifications)
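A minimal sketch of the first option (the value 8 and the init-script path are examples only; the variable has to be visible to whatever process launches memcached):

    export MEMCACHED_NUM_CPUS=8
    /etc/init.d/couchbase-server restart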




[MB-12052] add stale=false semantic changes to release notes Created: 22/Aug/14  Updated: 28/Aug/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Matt Ingenthron Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Need to release note the stale=false semantic changes.
This doesn't seem to be in current release notes, though MB-11589 seems loosely related.

Please use/adapt from the following text:
Starting with the 3.0 release, the "stale" view query argument "false" has been enhanced so it will consider all document changes which have been received at the time the query has been received. This means that use of the `durability requirements` or `observe` feature to block for persistence in application code before issuing the `false` stale query is no longer needed. It is recommended that you remove all such application level checks after completing the upgrade to the 3.0 release.
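For illustration only, the behaviour change is visible in a plain view query over the REST API (bucket, design document and view names are placeholders; 8092 is the view/CAPI port):

    # with 3.0, no observe/persistence check is needed in the application before this call
    curl 'http://localhost:8092/default/_design/dev_beers/_view/by_name?stale=false'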

- - -

Ruth: assigning this to you to work out the right way to work the text into the release notes. This probably goes with a change in a different MB.




[MB-12090] add stale=false semantic changes to dev guide Created: 28/Aug/14  Updated: 28/Aug/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Matt Ingenthron Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: No

 Description   
Need to change the dev guide to explain the semantics change with the stale parameter.

 Comments   
Comment by Matt Ingenthron [ 28/Aug/14 ]
I could not find the 3.0 dev guide to write up something, so I've generated a diff based on the 2.5 dev guide. Note that much of that dev guide refers to the 3.0 admin guide section on views; I could not find that in the "dita" directory, so I could not contribute a change to the XML directly. I think this, together with what I put in MB-12052, should help.


diff --git a/content/couchbase-devguide-2.5/finding-data-with-views.markdown b/content/couchbase-devguide-2.5/finding-data-with-views.markdown
index 77735b9..811dff0 100644
--- a/content/couchbase-devguide-2.5/finding-data-with-views.markdown
+++ b/content/couchbase-devguide-2.5/finding-data-with-views.markdown
@@ -1,6 +1,6 @@
 # Finding Data with Views
 
-In Couchbase 2.1.0 you can index and query JSON documents using *views*. Views
+In Couchbase you can index and query JSON documents using *views*. Views
 are functions written in JavaScript that can serve several purposes in your
 application. You can use them to:
 
@@ -323,16 +323,25 @@ Forinformation about the sort order of indexes, see the
 [Couchbase Server Manual](http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/).
 
 The real-time nature of Couchbase Server means that an index can become outdated
-fairly quickly when new entries and updates occur. Couchbase Server generates
-the index when it is queried, but in the meantime more data can be added to the
-server and this information will not yet be part of the index. To resolve this,
-Couchbase SDKs and the REST API provide a `stale` parameter you use when you
-query a view. With this parameter you can indicate you will accept the most
-current index as it is, you want to trigger a refresh of the index and retrieve
-these results, or you want to retrieve the existing index as is but also trigger
-a refresh of the index. For instance, to query a view with the stale parameter
-using the Ruby SDK:
+fairly quickly when new entries and updates occur. Couchbase Server updates
+the index at the time the query is received if you supply the argument
+`false` to the `stale` parameter.
+
+<div class="notebox">
+<p>Note</p>
+<p>Starting with the 3.0 release, the "stale" view query argument
+"false" has been enhanced so it will consider all document changes
+which have been received at the time the query has been received. This
+means that use of the `durability requirements` or `observe` feature
+to block for persistence in application code before issuing the
+`false` stale query is no longer needed. It is recommended that you
+remove all such application level checks after completing the upgrade
+to the 3.0 release.
+</p>
+</div>
 
+For instance, to query a view with the stale parameter
+using the Ruby SDK:
 
 ```
 doc.recent_posts(:body => {:stale => :ok})
@@ -905,13 +914,14 @@ for(ViewRow row : result) {
 }
 ```
 
-Before we create a Couchbase client instance and connect to the server, we set a
-system property 'viewmode' to 'development' to put the view into production
-mode. Then we query our view and limit the number of documents returned to 20
-items. Finally when we query our view we set the `stale` parameter to FALSE to
-indicate we want to reindex and include any new or updated beers in Couchbase.
-For more information about the `stale` parameter and index updates, see Index
-Updates and the Stale Parameter in the
+Before we create a Couchbase client instance and connect to the
+server, we set a system property 'viewmode' to 'development' to put
+the view into production mode. Then we query our view and limit the
+number of documents returned to 20 items. Finally when we query our
+view we set the `stale` parameter to FALSE to indicate we want to
+consider any recent changes to documents. For more information about
+the `stale` parameter and index updates, see Index Updates and the
+Stale Parameter in the
 [Couchbase Server Manual](http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#couchbase-views-writing-stale).
 
 The last part of this code sample is a loop we use to iterate through each item




[MB-12096] collect_server_info.py does not work on a dev tree on windows.. Created: 29/Aug/14  Updated: 03/Sep/14

Status: Open
Project: Couchbase Server
Component/s: test-execution
Affects Version/s: techdebt-backlog
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Trond Norbye Assignee: Tommie McAfee
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
cbcollectinfo.py tries to use ssh to collect the files even when the target machine is the same machine the test is running on, and that doesn't seem to work on my Windows development box. Since all of the files should be local, it might as well use a normal copy.
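A possible direction, sketched with placeholder names rather than the script's actual variables: detect that the target host is the local machine and fall back to a plain copy instead of ssh/scp:

    # hypothetical sketch; TARGET_HOST, REMOTE_LOG_DIR and LOCAL_OUT_DIR are placeholders
    if [ "$TARGET_HOST" = "localhost" ] || [ "$TARGET_HOST" = "$(hostname)" ]; then
        cp -r "$REMOTE_LOG_DIR" "$LOCAL_OUT_DIR"
    else
        scp -r "$TARGET_HOST:$REMOTE_LOG_DIR" "$LOCAL_OUT_DIR"
    fi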




[MB-7250] Mac OS X App should be signed by a valid developer key Created: 22/Nov/12  Updated: 08/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build, installer
Affects Version/s: 2.0-beta-2, 2.1.0, 2.2.0, 2.5.0, 2.5.1, 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: J Chris Anderson Assignee: Phil Labee
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Build_2.5.0-950.png     PNG File Screen Shot 2013-02-17 at 9.17.16 PM.png     PNG File Screen Shot 2013-04-04 at 3.57.41 PM.png     PNG File Screen Shot 2013-08-22 at 6.12.00 PM.png     PNG File ss_2013-04-03_at_1.06.39 PM.png    
Issue Links:
Dependency
depends on MB-9437 macosx installer package fails during... Closed
Relates to
relates to CBLT-104 Enable Mac developer signing on Mac b... Open
Is this a Regression?: No

 Description   
Currently launching the Mac OS X version tells you it's from an unidentified developer. You have to right click to launch the app. We can fix this.

 Comments   
Comment by Farshid Ghods (Inactive) [ 22/Nov/12 ]
Chris,

do you know what needs to change on the build machine to embed our developer key ?
Comment by J Chris Anderson [ 22/Nov/12 ]
I have no idea. I could start researching how to get a key from Apple but maybe after the weekend. :)
Comment by Farshid Ghods (Inactive) [ 22/Nov/12 ]
we can discuss this next week : ) . Thanks for reporting the issue Chris.
Comment by Steve Yen [ 26/Nov/12 ]
we'll want separate, related bugs (tasks) for other platforms, too (windows, linux)
Comment by Jens Alfke [ 30/Nov/12 ]
We need to get a developer ID from Apple; this will give us some kind of cert, and a local private key for signing.
Then we need to figure out how to get that key and cert onto the build machine, in the Keychain of the account that runs the buildbot.
Comment by Farshid Ghods (Inactive) [ 02/Jan/13 ]
the instructions to build is available here :
https://github.com/couchbase/couchdbx-app
we need to add codesign as a build step there
Comment by Farshid Ghods (Inactive) [ 22/Jan/13 ]
Phil,

do you have any update on this ticket. ?
Comment by Phil Labee [ 22/Jan/13 ]
I have signing cert installed on 10.17.21.150 (MacBuild).

Change to Makefile: http://review.couchbase.org/#/c/24149/
Comment by Phil Labee [ 23/Jan/13 ]
need to change master.cfg and pass env.var. to package-mac
Comment by Phil Labee [ 29/Jan/13 ]
disregard previous. Have added signing to Xcode projects.

see http://review.couchbase.org/#/c/24273/
Comment by Phil Labee [ 31/Jan/13 ]
To test this go to System Preferences / Security & Privacy, and on the General tab set "Allow applications downloaded from" to "Mac App Store and Identified Developers". Set this before running Couchbase Server.app the first time. Once an app has been allowed to run this setting is no longer checked for that app, and there doesn't seem to be a way to reset that.

What is odd is that on my system, I allowed one unsigned build to run before restricting the app run setting, and then no other unsigned builds would be checked (and all would be allowed to run). Either there is a flaw in my testing methodology, or a serious weakness in this security setting: the fact that one app called Couchbase Server was allowed to run should not confer this privilege to other apps with the same name. A common malware tactic is to modify a trusted app and distribute it as an update, and if the security setting keys off the app name it will do nothing to prevent that.

I'm approving this change without having satisfactorily tested it.
Comment by Jens Alfke [ 31/Jan/13 ]
Strictly speaking it's not the app name but its bundle ID, i.e. "com.couchbase.CouchbaseServer" or whatever we use.

> I allowed one unsigned build to run before restricting the app run setting, and then no other unsigned builds would be checked

By OK'ing an unsigned app you're basically agreeing to toss security out the window, at least for that app. This feature is really just a workaround for older apps. By OK'ing the app you're not really saying "yes, I trust this build of this app" so much as "yes, I agree to run this app even though I don't trust it".

> A common malware tactic is to modify a trusted app and distribute it as update

If it's a trusted app it's hopefully been signed, so the user wouldn't have had to waive signature checking for it.
Comment by Jens Alfke [ 31/Jan/13 ]
Further thought: It might be a good idea to change the bundle ID in the new signed version of the app, because users of 2.0 with strict security settings have presumably already bypassed security on the unsigned version.
Comment by Jin Lim [ 04/Feb/13 ]
Per bug scrubs, keep this a blocker since customers ran into this issue (and originally reported it).
Comment by Phil Labee [ 06/Feb/13 ]
revert the change so that builds can complete. App is currently not being signed.
Comment by Farshid Ghods (Inactive) [ 11/Feb/13 ]
i suggest for 2.0.1 release we do this build manually.
Comment by Jin Lim [ 11/Feb/13 ]
As one-off fix, add the signature manually and automate the required steps later in 2.0.2 or beyond.
Comment by Jin Lim [ 13/Feb/13 ]
Please move this bug to 2.0.2 after populating the required signature manually. I am lowering the severity to critical since it is no longer a blocking issue.
Comment by Farshid Ghods (Inactive) [ 15/Feb/13 ]
Phil to upload the binary to latestbuilds (2.0.1-101-rel.zip).
Comment by Phil Labee [ 15/Feb/13 ]
Please verify:

http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-160-rel-signed.zip
Comment by Phil Labee [ 15/Feb/13 ]
uploaded:

http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-160-rel-signed.zip

I can rename it when uploading for release.
Comment by Farshid Ghods (Inactive) [ 17/Feb/13 ]
I still get the error that it is from an unidentified developer.

Comment by Phil Labee [ 18/Feb/13 ]
operator error.

I rebuilt the app, this time verifying that the codesign step occurred.

Uploaded new file to same location:

http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-160-rel-signed.zip
Comment by Phil Labee [ 26/Feb/13 ]
still need to perform manual workaround
Comment by Phil Labee [ 04/Mar/13 ]
release candidate has been uploaded to:

http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-172-signed.zip
Comment by Wayne Siu [ 03/Apr/13 ]
Phil, looks like version 172/185 is still getting the error. My Mac version is 10.8.2
Comment by Thuan Nguyen [ 03/Apr/13 ]
Installed Couchbase Server (build 2.0.1-172 community version) on my Mac OS X 10.7.4; I only see the warning message.
Comment by Wayne Siu [ 03/Apr/13 ]
Latest version (04.03.13) : http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_64_2.0.1-185-rel.zip
Comment by Maria McDuff (Inactive) [ 03/Apr/13 ]
works in 10.7 but not in 10.8.
if we can get the fix for 10.8 by tomorrow, end of day, QE is willing to test for release on tuesday, april 9.
Comment by Phil Labee [ 04/Apr/13 ]
The mac builds are not being automatically signed, so build 185 is not signed. The original 172 is also not signed.

Did you try

    http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-172-signed.zip

to see if that was signed correctly?

Comment by Wayne Siu [ 04/Apr/13 ]
Phil,
Yes, we did try the 172-signed version. It works on 10.7 but not 10.8. Can you take a look?
Comment by Phil Labee [ 04/Apr/13 ]
I rebuilt 2.0.1-185 and uploaded a signed app to:

    http://packages.northscale.com/latestbuilds/couchbase-server-community_x86_64_2.0.1-185-rel.SIGNED.zip

Test on a machine that has never had Couchbase Server installed, and has the security setting to only allow Appstore or signed apps.

If you get the "Couchbase Server.app was downloaded from the internet" warning and you can click OK and install it, then this bug is fixed. The quarantining of files downloaded by a browser is part of the operating system and is not controlled by signing.
Comment by Wayne Siu [ 04/Apr/13 ]
Tried the 185-signed version (see attached screen shot). Same error message.
Comment by Phil Labee [ 04/Apr/13 ]
This is not an error message related to this bug.

Comment by Maria McDuff (Inactive) [ 14/May/13 ]
per bug triage, we need to have mac 10.8 osx working since it is a supported platform (published in the website).
Comment by Wayne Siu [ 29/May/13 ]
Work Around:
Step One
Hold down the Control key and click the application icon. From the contextual menu choose Open.

Step Two
A popup will appear asking you to confirm this action. Click the Open button.
Comment by Anil Kumar [ 31/May/13 ]
We need to address the signing key for both Windows and Mac; deferring this to the next release.
Comment by Dipti Borkar [ 08/Aug/13 ]
Please let's make sure this is fixed in 2.2.
Comment by Phil Labee [ 16/Aug/13 ]
New keys will be created using new account.
Comment by Phil Labee [ 20/Aug/13 ]
iOS Apps
--------------
Certificates:
  Production:
    "Couchbase, Inc." type=iOS Distribution expires Aug 12, 2014

    ~buildbot/Desktop/appledeveloper.couchbase.com/certs/ios/ios_distribution_appledeveloper.couchbase.com.cer

Identifiers:
  App IDs:
    "Couchbase Server" id=com.couchbase.*

Provisioning Profiles:
  Distribution:
    "appledeveloper.couchbase.com" type=Distribution

  ~buildbot/Desktop/appledeveloper.couchbase.com/profiles/ios/appledevelopercouchbasecom.mobileprovision
Comment by Phil Labee [ 20/Aug/13 ]
Mac Apps
--------------
Certificates:
  Production:
    "Couchbase, Inc." type=Mac App Distribution (Aug,15,2014)
    "Couchbase, Inc." type=Developer ID installer (Aug,16,2014)
    "Couchbase, Inc." type=Developer ID Application (Aug,16,2014)
    "Couchbase, Inc." type=Mac App Distribution (Aug,15,2014)

     ~buildbot/Desktop/appledeveloper.couchbase.com/certs/mac_app/mac_app_distribution.cer
     ~buildbot/Desktop/appledeveloper.couchbase.com/certs/mac_app/developerID_installer.cer
     ~buildbot/Desktop/appledeveloper.couchbase.com/certs/mac_app/developererID_application.cer
     ~buildbot/Desktop/appledeveloper.couchbase.com/certs/mac_app/mac_app_distribution-2.cer

Identifiers:
  App IDs:
    "Couchbase Server" id=couchbase.com.* Prefix=N2Q372V7W2
    "Coucbase Server adhoc" id=couchbase.com.* Prefix=N2Q372V7W2
    .

Provisioning Profiles:
  Distribution:
    "appstore.couchbase.com" type=Distribution
    "Couchbase Server adhoc" type=Distribution

     ~buildbot/Desktop/appledeveloper.couchbase.com/profiles/appstorecouchbasecom.privisioningprofile
     ~buildbot/Desktop/appledeveloper.couchbase.com/profiles/Couchbase_Server_adhoc.privisioningprofile

Comment by Phil Labee [ 21/Aug/13 ]

As of build 2.2.0-806 the app is signed by a new provisioning profile
Comment by Phil Labee [ 22/Aug/13 ]
 Install version 2.2.0-806 on a macosx 10.8 machine that has never had Couchbase Server installed, which has the security setting to require applications to be signed with a developer ID.
Comment by Phil Labee [ 22/Aug/13 ]
please assign to tester
Comment by Maria McDuff (Inactive) [ 22/Aug/13 ]
just tried this against newest build 809:
still getting restriction message. see attached.
Comment by Maria McDuff (Inactive) [ 22/Aug/13 ]
restriction still exists.
Comment by Maria McDuff (Inactive) [ 28/Aug/13 ]
verified in rc1 (build 817). still not fixed. getting same msg:
“Couchbase Server” can’t be opened because it is from an unidentified developer.
Your security preferences allow installation of only apps from the Mac App Store and identified developers.

Work Around:
Step One
Hold down the Control key and click the application icon. From the contextual menu choose Open.

Step Two
A popup will appear asking you to confirm this action. Click the Open button.
Comment by Phil Labee [ 03/Sep/13 ]
Need to create new certificates to replace these that were revoked:

Certificate: Mac Development
Team Name: Couchbase, Inc.

Certificate: Mac Installer Distribution
Team Name: Couchbase, Inc.

Certificate: iOS Development
Team Name: Couchbase, Inc.

Certificate: iOS Distribution
Team Name: Couchbase, Inc.
Comment by Maria McDuff (Inactive) [ 18/Sep/13 ]
candidate for 2.2.1 bug fix release.
Comment by Dipti Borkar [ 28/Oct/13 ]
This is going to make it into 2.5? We seem to keep deferring it?
Comment by Phil Labee [ 29/Oct/13 ]
cannot test changes with installer that fails
Comment by Phil Labee [ 11/Nov/13 ]
Installed certs as buildbot and signed app with "(recommended) 3rd Party Mac Developer Application", producing

    http://factory.hq.couchbase.com//couchbase_server_2.5.0_MB-7250-001.zip

Signed with "(Oct 30) 3rd Party Mac Developer Application: Couchbase, Inc. (N2Q372V7W2)", producing

    http://factory.hq.couchbase.com//couchbase_server_2.5.0_MB-7250-002.zip

These zip files were made on the command line, not as a result of the make command. They are 2.5G in size, so they obviously include more than the zip files produced by the make command.

Both versions of the app appear to be signed correctly!

Note: cannot run make command from ssh session. Must Remote Desktop in and use terminal shell natively.
Comment by Phil Labee [ 11/Nov/13 ]
Finally, some progress: if the zip file is made using the --symlinks argument, the app appears to be unsigned. If the symlinked files are included as copies, the app appears to be signed correctly.

The zip file with symlinks is 60M, while the zip file with copies of the files is 2.5G, more than 40X the size.
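A minimal sketch of the difference (the archive name is a placeholder; zip's -y/--symlinks flag stores the links themselves, while the default dereferences them into copies):

    # app verifies as signed: symlink targets are copied into the archive (~2.5G)
    zip -r couchbase-server.zip "Couchbase Server.app"

    # app appears unsigned after extraction: symlinks stored as links (~60M)
    zip -ry couchbase-server.zip "Couchbase Server.app"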
Comment by Phil Labee [ 25/Nov/13 ]
Fixed in 2.5.0-950
Comment by Dipti Borkar [ 25/Nov/13 ]
Maria, can QE please verify this?
Comment by Wayne Siu [ 28/Nov/13 ]
Tested with build 2.5.0-950. Still see the warning box (attached).
Comment by Wayne Siu [ 19/Dec/13 ]
Phil,
Can you give an update on this?
Comment by Ashvinder Singh [ 14/Jan/14 ]
I tested the code signature with apple utility "spctl -a -v /Applications/Couchbase\ Server.app/" and got the output :
>>> /Applications/Couchbase Server.app/: a sealed resource is missing or invalid

also tried running the command:
 
bash: codesign -dvvvv /Applications/Couchbase\ Server.app
>>>
Executable=/Applications/Couchbase Server.app/Contents/MacOS/Couchbase Server
Identifier=com.couchbase.couchbase-server
Format=bundle with Mach-O thin (x86_64)
CodeDirectory v=20100 size=639 flags=0x0(none) hashes=23+5 location=embedded
Hash type=sha1 size=20
CDHash=868e4659f4511facdf175b44a950b487fa790dc4
Signature size=4355
Authority=3rd Party Mac Developer Application: Couchbase, Inc. (N2Q372V7W2)
Authority=Apple Worldwide Developer Relations Certification Authority
Authority=Apple Root CA
Signed Time=Jan 8, 2014, 10:59:16 AM
Info.plist entries=31
Sealed Resources version=1 rules=4 files=5723
Internal requirements count=1 size=216

It looks like the code signature is present but became invalid as new files were added/modified in the project. I suggest the build team rebuild and add the code signature again.
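A minimal sketch of that re-sign-and-verify step (the identity string is assumed to match the "Developer ID Application" certificate listed earlier and must be present in the build user's keychain):

    codesign --force --deep --sign "Developer ID Application: Couchbase, Inc." "/Applications/Couchbase Server.app"
    codesign --verify --deep --verbose=2 "/Applications/Couchbase Server.app"
    spctl -a -v "/Applications/Couchbase Server.app"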
Comment by Phil Labee [ 17/Apr/14 ]
need VM to clone for developer experimentation
Comment by Anil Kumar [ 18/Jul/14 ]
Any update on this? We need this for 3.0.0 GA.

Please update the ticket.

Triage - July 18th
Comment by Wayne Siu [ 02/Aug/14 ]
Siri is helping to figure out what the next step is.
Comment by Anil Kumar [ 13/Aug/14 ]
Jens - Assigning as per Ravi's request.
Comment by Chris Hillery [ 13/Aug/14 ]
Jens requested assistance in setting up a MacOS development environment for building Couchbase. Phil (or maybe Siri?), can you help him with that?
Comment by Phil Labee [ 13/Aug/14 ]
The production macosx builder has been cloned:

    10.6.2.159 macosx-x64-server-builder-01-clone

if you want to use your own host, see:

    http://hub.internal.couchbase.com/confluence/display/CR/How+to+Setup+a+MacOSX+Server+Build+Node
Comment by Jens Alfke [ 15/Aug/14 ]
Here are the Apple docs on building apps signed with a Developer ID: https://developer.apple.com/library/mac/documentation/IDEs/Conceptual/AppDistributionGuide/DistributingApplicationsOutside/DistributingApplicationsOutside.html#//apple_ref/doc/uid/TP40012582-CH12-SW2

I've got everything configured, but the build process fails at the final step, after I press the Distribute button in the Organizer window. I get a very uninformative error alert, "Code signing operation failed / Check that the identity you selected is valid."

I've asked for help on the xcode-users mailing list. Blocked until I hear something back.
Comment by Anil Kumar [ 18/Aug/14 ]
Triage - Not blocking 3.0 RC1
Comment by Phil Labee [ 25/Aug/14 ]
from Apple Developer mail list:

Dear Developer,

With the release of OS X Mavericks 10.9.5, the way that OS X recognizes signed apps will change. Signatures created with OS X Mountain Lion 10.8.5 or earlier (v1 signatures) will be obsoleted and Gatekeeper will no longer recognize them. Users may receive a Gatekeeper warning and will need to exempt your app to continue using it. To ensure your apps will run without warning on updated versions of OS X, they must be signed on OS X Mavericks 10.9 or later (v2 signatures).

If you build code with an older version of OS X, use OS X Mavericks 10.9 or later to sign your app and create v2 signatures using the codesign tool. Structure your bundle according to the signature evaluation requirements for OS X Mavericks 10.9 or later. Considerations include:

 * Signed code should only be placed in directories where the system expects to find signed code.

 * Resources should not be located in directories where the system expects to find signed code.

 * The --resource-rules flag and ResourceRules.plist are not supported.

Make sure your current and upcoming releases work properly with Gatekeeper by testing on OS X Mavericks 10.9.5 and OS X Yosemite 10.10 Developer Preview 5 or later. Apps signed with v2 signatures will work on older versions of OS X.

For more details, read “Code Signing changes in OS X Mavericks” and “Changes in 
OS X 10.9.5 and Yosemite Developer Preview 5” in OS X Code Signing In Depth":

    http://c.apple.com/r?v=2&la=en&lc=us&a=EEjRsqZNfcheZauIAhlqmxVG35c6HJuf50mGu47LWEktoAjykEJp8UYqbgca3uWG&ct=AJ0T0e3y2W

Best regards,
Apple Developer Technical Support
Comment by Phil Labee [ 28/Aug/14 ]
change to buildbot-internal to unlock keychain before running make and lock after:

    http://review.couchbase.org/#/c/41028/

change to couchdbx-app to sign app, on dev branch "plabee/MB-7250":

    http://review.couchbase.org/#/c/41025/

change to manifest to use this dev branch for 3.0.1 builds:

    http://review.couchbase.org/#/c/41026/
Comment by Wayne Siu [ 29/Aug/14 ]
Moving it to 3.0.1.




[MB-12204] New doc-system does not have anchors Created: 17/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: doc-system
Affects Version/s: 3.0-Beta
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The support team uses anchors all the time to link customers directly to the section that has the information they require.

I know that we have broken a number of sections out into their own pages, but there are still some long pages, for example:

http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Misc/security-client-ssl.html


It would be good if we could link the customer directly to: "Configuring the PHP client for SSL"
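
For illustration, what support needs is a stable id on each heading so that a fragment URL works, e.g. (the fragment id here is hypothetical, not one that exists in the current draft):

    http://draft.docs.couchbase.com/prebuilt/couchbase-manual-3.0/Misc/security-client-ssl.html#configuring-the-php-client-for-ssl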

I have marked this as a blocker as it will affect the way the support team works today.




[MB-12192] XDCR : After warmup, replica items are not deleted in destination cluster Created: 15/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, DCP
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Aruna Piravi Assignee: Sriram Ganesan
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 6.x, 3.0.1-1297-rel

Attachments: Zip Archive 172.23.106.45-9152014-1553-diag.zip     GZip Archive 172.23.106.45-9152014-1623-couch.tar.gz     Zip Archive 172.23.106.46-9152014-1555-diag.zip     GZip Archive 172.23.106.46-9152014-1624-couch.tar.gz     Zip Archive 172.23.106.47-9152014-1558-diag.zip     GZip Archive 172.23.106.47-9152014-1624-couch.tar.gz     Zip Archive 172.23.106.48-9152014-160-diag.zip     GZip Archive 172.23.106.48-9152014-1624-couch.tar.gz    
Triage: Untriaged
Is this a Regression?: Yes

 Description   
Steps
--------
1. Set up uni-directional XDCR between 2 clusters with at least 2 nodes
2. Load 5000 items into 3 buckets at the source; they get replicated to the destination
3. Reboot a non-master node on the destination (in this test .48)
4. After warmup, perform 30% updates and 30% deletes on the source cluster
5. Deletes are propagated to the active vbuckets on the destination, but the replica vbuckets see only partial deletion.

Important note
--------------------
This test had passed on 3.0.0-1208-rel and 3.0.0-1209-rel. However I'm able to reproduce this consistently on 3.0.1. Unsure if this is a recent regression.

2014-09-15 14:43:50 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', sasl_bucket_1 bucket
2014-09-15 14:43:51 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', standard_bucket_1 bucket
2014-09-15 14:43:51 | WARNING | MainProcess | Cluster_Thread | [task.check] Not Ready: vb_replica_curr_items 4250 == 3500 expected on '172.23.106.47:8091''172.23.106.48:8091', default bucket

Testcase
------------
./testrunner -i /tmp/bixdcr.ini -t xdcr.pauseResumeXDCR.PauseResumeTest.replication_with_pause_and_resume,reboot=dest_node,items=5000,rdirection=unidirection,replication_type=xmem,standard_buckets=1,sasl_buckets=1,pause=source,doc-ops=update-delete,doc-ops-dest=update-delete

On destination cluster
-----------------------------

Arunas-MacBook-Pro:bin apiravi$ ./cbvdiff 172.23.106.47:11210,172.23.106.48:11210
VBucket 512: active count 4 != 6 replica count

VBucket 513: active count 2 != 4 replica count

VBucket 514: active count 8 != 11 replica count

VBucket 515: active count 3 != 4 replica count

VBucket 516: active count 8 != 10 replica count

VBucket 517: active count 5 != 6 replica count

VBucket 521: active count 0 != 1 replica count

VBucket 522: active count 7 != 11 replica count

VBucket 523: active count 3 != 5 replica count

VBucket 524: active count 6 != 10 replica count

VBucket 525: active count 4 != 6 replica count

VBucket 526: active count 4 != 6 replica count

VBucket 528: active count 7 != 10 replica count

VBucket 529: active count 3 != 4 replica count

VBucket 530: active count 3 != 4 replica count

VBucket 532: active count 0 != 2 replica count

VBucket 533: active count 1 != 2 replica count

VBucket 534: active count 8 != 10 replica count

VBucket 535: active count 5 != 6 replica count

VBucket 536: active count 7 != 11 replica count

VBucket 537: active count 3 != 5 replica count

VBucket 540: active count 3 != 4 replica count

VBucket 542: active count 6 != 10 replica count

VBucket 543: active count 4 != 6 replica count

VBucket 544: active count 6 != 10 replica count

VBucket 545: active count 3 != 4 replica count

VBucket 547: active count 0 != 1 replica count

VBucket 548: active count 6 != 7 replica count

VBucket 550: active count 7 != 10 replica count

VBucket 551: active count 4 != 5 replica count

VBucket 552: active count 9 != 11 replica count

VBucket 553: active count 4 != 6 replica count

VBucket 554: active count 4 != 5 replica count

VBucket 555: active count 1 != 2 replica count

VBucket 558: active count 7 != 10 replica count

VBucket 559: active count 3 != 4 replica count

VBucket 562: active count 6 != 10 replica count

VBucket 563: active count 4 != 5 replica count

VBucket 564: active count 7 != 10 replica count

VBucket 565: active count 4 != 5 replica count

VBucket 566: active count 4 != 5 replica count

VBucket 568: active count 3 != 4 replica count

VBucket 570: active count 8 != 10 replica count

VBucket 571: active count 4 != 6 replica count

VBucket 572: active count 7 != 10 replica count

VBucket 573: active count 3 != 4 replica count

VBucket 574: active count 0 != 1 replica count

VBucket 575: active count 0 != 1 replica count

VBucket 578: active count 8 != 10 replica count

VBucket 579: active count 4 != 6 replica count

VBucket 580: active count 8 != 11 replica count

VBucket 581: active count 3 != 4 replica count

VBucket 582: active count 3 != 4 replica count

VBucket 583: active count 1 != 2 replica count

VBucket 584: active count 3 != 4 replica count

VBucket 586: active count 6 != 10 replica count

VBucket 587: active count 3 != 4 replica count

VBucket 588: active count 7 != 10 replica count

VBucket 589: active count 4 != 5 replica count

VBucket 591: active count 0 != 2 replica count

VBucket 592: active count 8 != 10 replica count

VBucket 593: active count 4 != 6 replica count

VBucket 594: active count 0 != 1 replica count

VBucket 595: active count 0 != 1 replica count

VBucket 596: active count 4 != 6 replica count

VBucket 598: active count 7 != 10 replica count

VBucket 599: active count 3 != 4 replica count

VBucket 600: active count 6 != 10 replica count

VBucket 601: active count 3 != 4 replica count

VBucket 602: active count 4 != 6 replica count

VBucket 606: active count 7 != 10 replica count

VBucket 607: active count 4 != 5 replica count

VBucket 608: active count 7 != 11 replica count

VBucket 609: active count 3 != 5 replica count

VBucket 610: active count 3 != 4 replica count

VBucket 613: active count 0 != 1 replica count

VBucket 614: active count 6 != 10 replica count

VBucket 615: active count 4 != 6 replica count

VBucket 616: active count 7 != 10 replica count

VBucket 617: active count 3 != 4 replica count

VBucket 620: active count 3 != 4 replica count

VBucket 621: active count 1 != 2 replica count

VBucket 622: active count 9 != 11 replica count

VBucket 623: active count 5 != 6 replica count

VBucket 624: active count 5 != 6 replica count

VBucket 626: active count 7 != 11 replica count

VBucket 627: active count 3 != 5 replica count

VBucket 628: active count 6 != 10 replica count

VBucket 629: active count 4 != 6 replica count

VBucket 632: active count 0 != 1 replica count

VBucket 633: active count 0 != 1 replica count

VBucket 634: active count 7 != 10 replica count

VBucket 635: active count 3 != 4 replica count

VBucket 636: active count 8 != 10 replica count

VBucket 637: active count 5 != 6 replica count

VBucket 638: active count 5 != 6 replica count

VBucket 640: active count 2 != 4 replica count

VBucket 641: active count 7 != 11 replica count

VBucket 643: active count 5 != 7 replica count

VBucket 646: active count 3 != 5 replica count

VBucket 647: active count 7 != 10 replica count

VBucket 648: active count 4 != 6 replica count

VBucket 649: active count 8 != 10 replica count

VBucket 651: active count 0 != 1 replica count

VBucket 653: active count 4 != 6 replica count

VBucket 654: active count 3 != 4 replica count

VBucket 655: active count 7 != 10 replica count

VBucket 657: active count 4 != 5 replica count

VBucket 658: active count 2 != 4 replica count

VBucket 659: active count 7 != 11 replica count

VBucket 660: active count 3 != 5 replica count

VBucket 661: active count 7 != 10 replica count

VBucket 662: active count 0 != 2 replica count

VBucket 666: active count 4 != 6 replica count

VBucket 667: active count 8 != 10 replica count

VBucket 668: active count 3 != 4 replica count

VBucket 669: active count 7 != 10 replica count

VBucket 670: active count 1 != 2 replica count

VBucket 671: active count 2 != 3 replica count

VBucket 673: active count 0 != 1 replica count

VBucket 674: active count 3 != 4 replica count

VBucket 675: active count 7 != 10 replica count

VBucket 676: active count 5 != 6 replica count

VBucket 677: active count 8 != 10 replica count

VBucket 679: active count 5 != 6 replica count

VBucket 681: active count 6 != 7 replica count

VBucket 682: active count 3 != 5 replica count

VBucket 683: active count 8 != 12 replica count

VBucket 684: active count 3 != 6 replica count

VBucket 685: active count 7 != 11 replica count

VBucket 688: active count 3 != 4 replica count

VBucket 689: active count 7 != 10 replica count

VBucket 692: active count 1 != 2 replica count

VBucket 693: active count 2 != 3 replica count

VBucket 694: active count 5 != 6 replica count

VBucket 695: active count 8 != 10 replica count

VBucket 696: active count 3 != 5 replica count

VBucket 697: active count 8 != 12 replica count

VBucket 699: active count 4 != 5 replica count

VBucket 700: active count 0 != 1 replica count

VBucket 702: active count 3 != 6 replica count

VBucket 703: active count 7 != 11 replica count

VBucket 704: active count 3 != 5 replica count

VBucket 705: active count 8 != 12 replica count

VBucket 709: active count 4 != 5 replica count

VBucket 710: active count 3 != 6 replica count

VBucket 711: active count 7 != 11 replica count

VBucket 712: active count 3 != 4 replica count

VBucket 713: active count 7 != 10 replica count

VBucket 715: active count 3 != 4 replica count

VBucket 716: active count 1 != 2 replica count

VBucket 717: active count 0 != 2 replica count

VBucket 718: active count 5 != 6 replica count

VBucket 719: active count 8 != 10 replica count

VBucket 720: active count 0 != 1 replica count

VBucket 722: active count 3 != 5 replica count

VBucket 723: active count 8 != 12 replica count

VBucket 724: active count 3 != 6 replica count

VBucket 725: active count 7 != 11 replica count

VBucket 727: active count 5 != 7 replica count

VBucket 728: active count 2 != 4 replica count

VBucket 729: active count 3 != 5 replica count

VBucket 730: active count 3 != 4 replica count

VBucket 731: active count 7 != 10 replica count

VBucket 732: active count 5 != 6 replica count

VBucket 733: active count 8 != 10 replica count

VBucket 737: active count 3 != 4 replica count

VBucket 738: active count 4 != 6 replica count

VBucket 739: active count 8 != 10 replica count

VBucket 740: active count 3 != 4 replica count

VBucket 741: active count 7 != 10 replica count

VBucket 743: active count 0 != 1 replica count

VBucket 746: active count 2 != 4 replica count

VBucket 747: active count 7 != 11 replica count

VBucket 748: active count 3 != 5 replica count

VBucket 749: active count 7 != 10 replica count

VBucket 751: active count 3 != 4 replica count

VBucket 752: active count 4 != 6 replica count

VBucket 753: active count 9 != 11 replica count

VBucket 754: active count 1 != 2 replica count

VBucket 755: active count 4 != 5 replica count

VBucket 758: active count 3 != 4 replica count

VBucket 759: active count 7 != 10 replica count

VBucket 760: active count 2 != 4 replica count

VBucket 761: active count 7 != 11 replica count

VBucket 762: active count 0 != 1 replica count

VBucket 765: active count 6 != 7 replica count

VBucket 766: active count 3 != 5 replica count

VBucket 767: active count 7 != 10 replica count

VBucket 770: active count 3 != 5 replica count

VBucket 771: active count 7 != 11 replica count

VBucket 772: active count 4 != 6 replica count

VBucket 773: active count 6 != 10 replica count

VBucket 775: active count 3 != 4 replica count

VBucket 777: active count 3 != 4 replica count

VBucket 778: active count 3 != 4 replica count

VBucket 779: active count 7 != 10 replica count

VBucket 780: active count 5 != 6 replica count

VBucket 781: active count 8 != 10 replica count

VBucket 782: active count 1 != 2 replica count

VBucket 783: active count 0 != 2 replica count

VBucket 784: active count 3 != 5 replica count

VBucket 785: active count 7 != 11 replica count

VBucket 786: active count 0 != 1 replica count

VBucket 789: active count 4 != 6 replica count

VBucket 790: active count 4 != 6 replica count

VBucket 791: active count 6 != 10 replica count

VBucket 792: active count 3 != 4 replica count

VBucket 793: active count 8 != 11 replica count

VBucket 794: active count 2 != 4 replica count

VBucket 795: active count 4 != 6 replica count

VBucket 798: active count 5 != 6 replica count

VBucket 799: active count 8 != 10 replica count

VBucket 800: active count 4 != 6 replica count

VBucket 801: active count 8 != 10 replica count

VBucket 803: active count 3 != 4 replica count

VBucket 804: active count 0 != 1 replica count

VBucket 805: active count 0 != 1 replica count

VBucket 806: active count 3 != 4 replica count

VBucket 807: active count 7 != 10 replica count

VBucket 808: active count 3 != 4 replica count

VBucket 809: active count 6 != 10 replica count

VBucket 813: active count 4 != 5 replica count

VBucket 814: active count 4 != 5 replica count

VBucket 815: active count 7 != 10 replica count

VBucket 816: active count 1 != 2 replica count

VBucket 817: active count 4 != 5 replica count

VBucket 818: active count 4 != 6 replica count

VBucket 819: active count 8 != 10 replica count

VBucket 820: active count 3 != 4 replica count

VBucket 821: active count 7 != 10 replica count

VBucket 824: active count 0 != 1 replica count

VBucket 826: active count 3 != 4 replica count

VBucket 827: active count 6 != 10 replica count

VBucket 828: active count 4 != 5 replica count

VBucket 829: active count 7 != 10 replica count

VBucket 831: active count 6 != 7 replica count

VBucket 833: active count 4 != 6 replica count

VBucket 834: active count 3 != 4 replica count

VBucket 835: active count 6 != 10 replica count

VBucket 836: active count 4 != 5 replica count

VBucket 837: active count 7 != 10 replica count

VBucket 840: active count 0 != 1 replica count

VBucket 841: active count 0 != 1 replica count

VBucket 842: active count 4 != 6 replica count

VBucket 843: active count 8 != 10 replica count

VBucket 844: active count 3 != 4 replica count

VBucket 845: active count 7 != 10 replica count

VBucket 847: active count 4 != 6 replica count

VBucket 848: active count 3 != 4 replica count

VBucket 849: active count 6 != 10 replica count

VBucket 851: active count 3 != 4 replica count

VBucket 852: active count 0 != 2 replica count

VBucket 854: active count 4 != 5 replica count

VBucket 855: active count 7 != 10 replica count

VBucket 856: active count 4 != 6 replica count

VBucket 857: active count 8 != 10 replica count

VBucket 860: active count 1 != 2 replica count

VBucket 861: active count 3 != 4 replica count

VBucket 862: active count 3 != 4 replica count

VBucket 863: active count 8 != 11 replica count

VBucket 864: active count 3 != 4 replica count

VBucket 865: active count 7 != 10 replica count

VBucket 866: active count 0 != 1 replica count

VBucket 867: active count 0 != 1 replica count

VBucket 869: active count 5 != 6 replica count

VBucket 870: active count 5 != 6 replica count

VBucket 871: active count 8 != 10 replica count

VBucket 872: active count 3 != 5 replica count

VBucket 873: active count 7 != 11 replica count

VBucket 875: active count 5 != 6 replica count

VBucket 878: active count 4 != 6 replica count

VBucket 879: active count 6 != 10 replica count

VBucket 882: active count 3 != 4 replica count

VBucket 883: active count 7 != 10 replica count

VBucket 884: active count 5 != 6 replica count

VBucket 885: active count 9 != 11 replica count

VBucket 886: active count 1 != 2 replica count

VBucket 887: active count 3 != 4 replica count

VBucket 889: active count 3 != 4 replica count

VBucket 890: active count 3 != 5 replica count

VBucket 891: active count 7 != 11 replica count

VBucket 892: active count 4 != 6 replica count

VBucket 893: active count 6 != 10 replica count

VBucket 894: active count 0 != 1 replica count

VBucket 896: active count 8 != 10 replica count

VBucket 897: active count 4 != 6 replica count

VBucket 900: active count 2 != 3 replica count

VBucket 901: active count 2 != 3 replica count

VBucket 902: active count 7 != 10 replica count

VBucket 903: active count 3 != 4 replica count

VBucket 904: active count 7 != 11 replica count

VBucket 905: active count 2 != 4 replica count

VBucket 906: active count 4 != 5 replica count

VBucket 909: active count 0 != 2 replica count

VBucket 910: active count 7 != 10 replica count

VBucket 911: active count 3 != 5 replica count

VBucket 912: active count 0 != 1 replica count

VBucket 914: active count 8 != 10 replica count

VBucket 915: active count 4 != 6 replica count

VBucket 916: active count 7 != 10 replica count

VBucket 917: active count 3 != 4 replica count

VBucket 918: active count 4 != 6 replica count

VBucket 920: active count 5 != 7 replica count

VBucket 922: active count 7 != 11 replica count

VBucket 923: active count 2 != 4 replica count

VBucket 924: active count 7 != 10 replica count

VBucket 925: active count 3 != 5 replica count

VBucket 928: active count 4 != 5 replica count

VBucket 930: active count 8 != 12 replica count

VBucket 931: active count 3 != 5 replica count

VBucket 932: active count 7 != 11 replica count

VBucket 933: active count 3 != 6 replica count

VBucket 935: active count 0 != 1 replica count

VBucket 938: active count 7 != 10 replica count

VBucket 939: active count 3 != 4 replica count

VBucket 940: active count 8 != 10 replica count

VBucket 941: active count 5 != 6 replica count

VBucket 942: active count 2 != 3 replica count

VBucket 943: active count 1 != 2 replica count

VBucket 944: active count 8 != 12 replica count

VBucket 945: active count 3 != 5 replica count

VBucket 946: active count 6 != 7 replica count

VBucket 950: active count 7 != 11 replica count

VBucket 951: active count 3 != 6 replica count

VBucket 952: active count 7 != 10 replica count

VBucket 953: active count 3 != 4 replica count

VBucket 954: active count 0 != 1 replica count

VBucket 956: active count 5 != 6 replica count

VBucket 958: active count 8 != 10 replica count

VBucket 959: active count 5 != 6 replica count

VBucket 960: active count 7 != 10 replica count

VBucket 961: active count 3 != 4 replica count

VBucket 962: active count 3 != 5 replica count

VBucket 963: active count 2 != 4 replica count

VBucket 966: active count 8 != 10 replica count

VBucket 967: active count 5 != 6 replica count

VBucket 968: active count 8 != 12 replica count

VBucket 969: active count 3 != 5 replica count

VBucket 971: active count 0 != 1 replica count

VBucket 972: active count 5 != 7 replica count

VBucket 974: active count 7 != 11 replica count

VBucket 975: active count 3 != 6 replica count

VBucket 976: active count 3 != 4 replica count

VBucket 978: active count 7 != 10 replica count

VBucket 979: active count 3 != 4 replica count

VBucket 980: active count 8 != 10 replica count

VBucket 981: active count 5 != 6 replica count

VBucket 982: active count 0 != 2 replica count

VBucket 983: active count 1 != 2 replica count

VBucket 986: active count 8 != 12 replica count

VBucket 987: active count 3 != 5 replica count

VBucket 988: active count 7 != 11 replica count

VBucket 989: active count 3 != 6 replica count

VBucket 990: active count 4 != 5 replica count

VBucket 993: active count 0 != 1 replica count

VBucket 994: active count 7 != 11 replica count

VBucket 995: active count 2 != 4 replica count

VBucket 996: active count 7 != 10 replica count

VBucket 997: active count 3 != 5 replica count

VBucket 998: active count 5 != 6 replica count

VBucket 1000: active count 4 != 5 replica count

VBucket 1001: active count 1 != 2 replica count

VBucket 1002: active count 9 != 11 replica count

VBucket 1003: active count 4 != 6 replica count

VBucket 1004: active count 7 != 10 replica count

VBucket 1005: active count 3 != 4 replica count

VBucket 1008: active count 7 != 11 replica count

VBucket 1009: active count 2 != 4 replica count

VBucket 1012: active count 4 != 5 replica count

VBucket 1014: active count 7 != 10 replica count

VBucket 1015: active count 3 != 5 replica count

VBucket 1016: active count 8 != 10 replica count

VBucket 1017: active count 4 != 6 replica count

VBucket 1018: active count 3 != 4 replica count

VBucket 1020: active count 0 != 1 replica count

VBucket 1022: active count 7 != 10 replica count

VBucket 1023: active count 3 != 4 replica count

Active item count = 3500

Same at source
----------------------
Arunas-MacBook-Pro:bin apiravi$ ./cbvdiff 172.23.106.45:11210,172.23.106.46:11210
Active item count = 3500

Will attach cbcollect and data files.


 Comments   
Comment by Mike Wiederhold [ 15/Sep/14 ]
This is not a bug. We no longer do this because a replica vbucket cannot delete items on its own due to DCP.
Comment by Aruna Piravi [ 15/Sep/14 ]
I do not understand why this is not a bug. This is a case where replica items = 4250 and active = 3500. Both were initially 5000 before warmup. However, 50% of the actual deletes have happened on the replica vbuckets (5000 -> 4250), so I would expect the other 750 items to be deleted as well, making active = replica. If this is not a bug, then in case of failover the cluster will end up with more items than it had before the failover.
Comment by Aruna Piravi [ 15/Sep/14 ]
> We no longer do this because a replica vbucket cannot delete items on it's own due to dcp
Then I would expect the deletes to be propagated from the active vbuckets through DCP, but these never get propagated. If you do a cbvdiff even now, you can see the mismatch.
Comment by Sriram Ganesan [ 17/Sep/14 ]
Aruna

If there is a testrunner script available for steps (1) - (5), please update the bug. Thanks.
Comment by Aruna Piravi [ 17/Sep/14 ]
Done.




[MB-11060] Build and test 3.0 for 32-bit Windows Created: 06/May/14  Updated: 17/Sep/14  Due: 09/Jun/14

Status: Open
Project: Couchbase Server
Component/s: build, ns_server
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Task Priority: Blocker
Reporter: Chris Hillery Assignee: Phil Labee
Resolution: Unresolved Votes: 0
Labels: windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7/8 32-bit

Issue Links:
Dependency
Duplicate

 Description   
For the "Developer Edition" of Couchbase Server 3.0 on Windows 32-bit, we need to first ensure that we can build 32-bit-compatible binaries. It is not possible to build 3.0 on a 32-bit machine due to the MSVC 2013 requirement. Hence we need to configure MSVC as well as Erlang on a 64-bit machine to produce 32-bit compatible binaries.

 Comments   
Comment by Chris Hillery [ 06/May/14 ]
This is assigned to Trond who is already experimenting with this. He should:

 * test being able to start the server on a 32-bit Windows 7/8 VM

 * make whatever changes are necessary to the CMake configuration or other build scripts to produce this build on a 64-bit VM

 * thoroughly document the requirements for the build team to reproduce this build

Then he can assign this bug to Chris to carry out configuring our build jobs accordingly.
Comment by Trond Norbye [ 16/Jun/14 ]
Can you give me a 32-bit Windows installation I can test on? My MSDN license has expired and I don't have Windows media available (and the internal wiki page has only a limited set of licenses and no download links).

Then assign it back to me and I'll try it
Comment by Chris Hillery [ 16/Jun/14 ]
I think you can use 172.23.106.184 - it's a 32-bit Windows 2008 VM that we can't use for 3.0 builds anyway.
Comment by Trond Norbye [ 24/Jun/14 ]
I copied the full result of a build where I set target_platform=x86 on my 64-bit Windows server (the "install" directory) over to a 32-bit Windows machine, and I was able to start memcached and it worked as expected.

Our installers do other magic, like installing the service, that is needed in order to start the full server. Once we have such an installer I can do further testing.
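
For illustration, with the MSVC 2013 toolchain the 32-bit-versus-64-bit choice on a 64-bit host comes down to the CMake generator; a rough sketch (the Couchbase Makefile wiring behind target_platform=x86 is not shown here, and the source path is a placeholder):

    rem 32-bit (x86) binaries on a 64-bit host
    cmake -G "Visual Studio 12 2013" C:\path\to\couchbase\source
    cmake --build . --config Release
    rem 64-bit binaries, for comparison
    cmake -G "Visual Studio 12 2013 Win64" C:\path\to\couchbase\source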
Comment by Chris Hillery [ 24/Jun/14 ]
Bin - could you take a look at this (figuring out how to make InstallShield on a 64-bit machine create a 32-bit compatible installer)? I won't likely be able to get to it for at least a month, and I think you're the only person here who still has access to an InstallShield 2010 designer anyway.
Comment by Bin Cui [ 04/Sep/14 ]
PM should make the call on whether or not we want to have 32-bit support for Windows.
Comment by Anil Kumar [ 05/Sep/14 ]
Bin - As confirmed back in March-April for the supported platforms of Couchbase Server 3.0, we decided to continue building 32-bit Windows for development-only support, as mentioned on our documentation deprecation page http://docs.couchbase.com/couchbase-manual-2.5/deprecated/#platforms.

Comment by Bin Cui [ 17/Sep/14 ]
1. Create a 64-bit builder with a 32-bit target.
2. Create a 32-bit builder.
3. Transfer the 64-bit staging image to the 32-bit builder.
4. Run the packaging steps and generate the final package from the 32-bit builder.




[MB-6972] distribute couchbase-server through yum and ubuntu package repositories Created: 19/Oct/12  Updated: 17/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.1.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Blocker
Reporter: Anil Kumar Assignee: Phil Labee
Resolution: Unresolved Votes: 3
Labels: devX
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks MB-8693 [Doc] distribute couchbase-server thr... Reopened
blocks MB-7821 yum install couchbase-server from cou... Resolved
Duplicate
duplicates MB-2299 Create signed RPM's Resolved
is duplicated by MB-9409 repository for deb packages (debian&u... Resolved
Flagged:
Release Note

 Description   
This helps us in handling dependencies that are needed for Couchbase Server.
The SDK team has already implemented this for various SDK packages.

We might have to make some changes to our packaging metadata to work with this scheme.

 Comments   
Comment by Steve Yen [ 26/Nov/12 ]
to 2.0.2 per bug-scrub

first step is to do the repositories?
Comment by Steve Yen [ 26/Nov/12 ]
back to 2.0.1, per bug-scrub
Comment by Steve Yen [ 26/Nov/12 ]
back to 2.0.1, per bug-scrub
Comment by Farshid Ghods (Inactive) [ 19/Dec/12 ]
Phil,
please sync up with Farshid and get instructions that Sergey and Pavel sent
Comment by Farshid Ghods (Inactive) [ 28/Jan/13 ]
We should resolve this task once 2.0.1 is released.
Comment by Dipti Borkar [ 29/Jan/13 ]
Have we figured out the upgrade process moving forward, for example from 2.0.1 to 2.0.2 or 2.0.1 to 2.1?
Comment by Jin Lim [ 04/Feb/13 ]
Please ensure that we also confirm/validate the upgrade process moving from 2.0.1 to 2.0.2. Thanks.
Comment by Phil Labee [ 06/Feb/13 ]
Now have the DEB repo working, but another issue has come up: we need to distribute the public key so that users can install the key before running apt-get.

The wiki page has been updated.
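
For context, the consumer-side setup for a DEB repo looks roughly like this (a sketch only; the key file name, repo path, and suite names are placeholders, not the final public locations):

    # import the repository signing key before apt-get can trust the packages
    wget -O - http://packages.couchbase.com/REPO-PATH/couchbase.key | sudo apt-key add -
    # register the repository and install
    echo "deb http://packages.couchbase.com/REPO-PATH/ubuntu precise main" | sudo tee /etc/apt/sources.list.d/couchbase.list
    sudo apt-get update
    sudo apt-get install couchbase-server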
Comment by kzeller [ 14/Feb/13 ]
Added to 2.0.1 RN as:

Fix:

We now provide Couchbase Server through yum and Debian package repositories.
Comment by Matt Ingenthron [ 09/Apr/13 ]
What are the public URLs for these repositories? This was mentioned in the release notes here:
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-server-rn_2-0-0l.html
Comment by Matt Ingenthron [ 09/Apr/13 ]
Reopening, since this isn't documented that I can find. Apologies if I'm just missing it.
Comment by Dipti Borkar [ 23/Apr/13 ]
Anil, can you work with Phil to see what the next steps are here?
Comment by Anil Kumar [ 24/Apr/13 ]
Yes, I'll have a discussion with Phil and will update here with details.
Comment by Tim Ray [ 28/Apr/13 ]
Could we either remove the note about yum/deb repos in the release notes, or get those repo locations / sample files / keys added to public pages? The only links that seem like they 'might' contain the info point to internal pages I don't have access to.
Comment by Anil Kumar [ 14/May/13 ]
Thanks Tim, we have removed it from the release notes. We will add instructions about yum/deb repo locations/files/keys to the documentation once it's available. Thanks!
Comment by kzeller [ 14/May/13 ]
Removing duplicate ticket:

http://www.couchbase.com/issues/browse/MB-7860
Comment by h0nIg [ 24/Oct/13 ]
Any update? Maybe I created a duplicate issue: http://www.couchbase.com/issues/browse/MB-9409, but it seems that the repositories are outdated on http://hub.internal.couchbase.com/confluence/display/CR/How+to+Use+a+Linux+Repo+--+debian
Comment by Sriram Melkote [ 22/Apr/14 ]
I tried to install on Debian today. It failed badly. One .deb package didn't match the libc version of stable. The other didn't match the openssl version. Changing libc or openssl is simply not an option for someone using Debian stable because it messes with the base OS too deeply. So as of 4/23/14, we don't have support for Debian.
Comment by Sriram Melkote [ 22/Apr/14 ]
Anil, we have accumulated a lot of input in this bug. I don't think this will realistically go anywhere for 3.0 unless we define specific goals and some considered platform support matrix expansion. Can you please create a goal for 3.0 more precisely?
Comment by Matt Ingenthron [ 22/Apr/14 ]
+1 on Siri's comments. Conversations I had with both Ubuntu (who recommend their PPAs) and Red Hat experts (who recommend setting up a repo or getting into EPEL or the like) indicated that's the best way to ensure coverage of all OSs. Binary packages built on one OS and deployed on another are risky and run into dependency issues.
Comment by Anil Kumar [ 28/Apr/14 ]
This ticket is specifically for distributing DEB and RPM packages through yum and APT repositories. We have another ticket, MB-10960, for supporting the Debian platform.
Comment by Anil Kumar [ 23/Jun/14 ]
Assigning ticket to Tony for verification.
Comment by Phil Labee [ 21/Jul/14 ]
Need to do before closing:

[ ] capture keys and process used for build that is currently posted (3.0.0-628), update tools and keys of record in build repo and wiki page
[ ] distribute 2.5.1 and 3.0.0-beta1 builds using same process, testing update capability
[ ] test update from 2.0.0 to 2.5.1 to 3.0.0
Comment by Phil Labee [ 21/Jul/14 ]
re-opening to assign to sprint to prepare the distribution repos for testing
Comment by Wayne Siu [ 30/Jul/14 ]
Phil,
has build 3.0.0-973 been updated in the repos for beta testing?
Comment by Wayne Siu [ 29/Aug/14 ]
Phil,
Please refresh it with build 3.0.0-1205. Thanks.
Comment by Phil Labee [ 04/Sep/14 ]
Due to loss of the private keys used to post 3.0.0-628, I created new key pairs. Upgrade testing was never done, so I am starting with the 2.5.1 release version (2.5.1-1100).

upload and test using location http://packages.couchbase.com/linux-repos/TEST/:

  [X] ubuntu-12.04 x86_64
  [X] ubuntu-10.04 x86_64

  [X] centos-6-x86_64
  [X] centos-5-x86_64
Comment by Anil Kumar [ 04/Sep/14 ]
Phil / Wayne - Not sure what's happening here, please clarify.
Comment by Wayne Siu [ 16/Sep/14 ]
Please refresh with the build 3.0.0-1209.
Comment by Phil Labee [ 17/Sep/14 ]
upgrade to 3.0.0-1209:


  [ ] ubuntu-12.04 x86_64
  [ ] ubuntu-10.04 x86_64

  [ ] centos-6-x86_64
  [ ] centos-5-x86_64

  [ ] debian-7-x86_64




[MB-11917] One node slow probably due to the Erlang scheduler Created: 09/Aug/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Volker Mische Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File crash_toy_701.rtf     PNG File leto_ssd_300-1105_561_build_init_indexleto_ssd_300-1105_561172.23.100.31beam.smp_cpu.png    
Issue Links:
Duplicate
duplicates MB-12200 Seg fault during indexing on view-toy... Resolved
duplicates MB-9822 One of nodes is too slow during indexing Closed
is duplicated by MB-12183 View Query Thruput regression compare... Resolved
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
One node is slow; that's probably due to the "scheduler collapse" bug in the Erlang VM R16.

I will try to find a way to verify that it is really the scheduler and not some other problem. This is basically a duplicate of MB-9822, though that bug has a long history, hence I dare to create a new one.

 Comments   
Comment by Volker Mische [ 09/Aug/14 ]
I forgot to add that our issue sounds exactly like that one: http://erlang.org/pipermail/erlang-questions/2012-October/069503.html
Comment by Sriram Melkote [ 11/Aug/14 ]
Upgrading to blocker as this is doubling initial index time in recent runs on showfast.
Comment by Volker Mische [ 12/Aug/14 ]
I verified that it's the "scheduler collapse". Have a look at the chart I've attached (it's from [1], [172.23.100.31] beam.smp_cpu). It starts with a utilization of around 400%. At around 120 I reduced the online schedulers to 1 (by running erlang:system_flag(schedulers_online, 1) via a remote shell). I then increased schedulers_online again at around 150, back to the original value of 24. You can see that it got back to normal.

[1]: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1105_561_build_init_index
Comment by Volker Mische [ 12/Aug/14 ]
I would try to run on R16 and see how often it happens with COUCHBASE_NS_SERVER_VM_EXTRA_ARGS=["+swt", "low", "+sfwi", "100"] set (as suggested in MB-9822 [1]).

[1]: https://www.couchbase.com/issues/browse/MB-9822?focusedCommentId=89219&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-89219
Comment by Pavel Paulau [ 12/Aug/14 ]
We agreed to try:

+sfwi 100/500 and +sbwt long

Will run test 5 times with these options.
Comment by Pavel Paulau [ 13/Aug/14 ]
5 runs of tests/index_50M_dgm.test with -sfwi 100 -sbwt long:

http://ci.sc.couchbase.com/job/leto-dev/19/
http://ci.sc.couchbase.com/job/leto-dev/20/
http://ci.sc.couchbase.com/job/leto-dev/21/
http://ci.sc.couchbase.com/job/leto-dev/22/
http://ci.sc.couchbase.com/job/leto-dev/23/

3 normal runs, 2 with slowness.
Comment by Volker Mische [ 13/Aug/14 ]
I see only one slow run (22): http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1137_6a0_build_init_index

But still :-/
Comment by Pavel Paulau [ 13/Aug/14 ]
See (20), incremental indexing: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_ssd_300-1137_ed9_build_incr_index
Comment by Volker Mische [ 13/Aug/14 ]
Oh, I was only looking at the initial building.
Comment by Volker Mische [ 13/Aug/14 ]
I got a hint in the #erlang IRC channel. I'll try to use the erlang:bump_reductions(2000) and see if that helps.
Comment by Volker Mische [ 13/Aug/14 ]
Let's see if bumping the reductions makes it work: http://review.couchbase.org/40591
Comment by Aleksey Kondratenko [ 13/Aug/14 ]
merged that commit.
Comment by Pavel Paulau [ 13/Aug/14 ]
Just tested build 3.0.0-1150, rebalance test but with initial indexing phase.

2 nodes are super slow and utilize only single core.
Comment by Volker Mische [ 18/Aug/14 ]
I can't reproduce it locally. I tend towards closing this issue as "won't fix". We should really not have long-running NIFs.

I also think that it won't happen much under real workloads. And even if it does, the workaround would be to reduce the number of online schedulers to 1 and immediately increase it back to the original number.
Comment by Volker Mische [ 18/Aug/14 ]
Assigning to Siri to make the call on whether we close it or not.
Comment by Anil Kumar [ 18/Aug/14 ]
Triage - Not blocking 3.0 RC1
Comment by Raju Suravarjjala [ 19/Aug/14 ]
Triage: Siri will put additional information and this bug is being retargeted to 3.0.1
Comment by Sriram Melkote [ 19/Aug/14 ]
Folks, for too long we've had trouble that gets pinned to our NIFs. In 3.5, let's solve it with whatever is the correct Erlang approach to running heavy, high-performance code. A port, reporting reductions, moving to R17 with dirty schedulers, or some other option I missed - whatever the best solution is, let us implement it in 3.5 and be done.
Comment by Volker Mische [ 09/Sep/14 ]
I think we should close this issue and rather create a new one for whatever we come up with (e.g. the async mapreduce NIF).
Comment by Harsha Havanur [ 10/Sep/14 ]
Toy Build for this change at
http://latestbuilds.hq.couchbase.com/couchbase-server-community_ubunt12-3.0.0-toy-hhs-x86_64_3.0.0-702-toy.deb

Review in progress at
http://review.couchbase.org/#/c/41221/4
Comment by Harsha Havanur [ 12/Sep/14 ]
Please find the updated toy build for this:
http://latestbuilds.hq.couchbase.com/couchbase-server-community_ubunt12-3.0.0-toy-hhs-x86_64_3.0.0-704-toy.deb
Comment by Sriram Melkote [ 12/Sep/14 ]
Another occurrence of this, MB-12183.

I'm making this a blocker.
Comment by Harsha Havanur [ 13/Sep/14 ]
Centos build at
http://latestbuilds.hq.couchbase.com/couchbase-server-community_cent64-3.0.0-toy-hhs-x86_64_3.0.0-700-toy.rpm
Comment by Ketaki Gangal [ 16/Sep/14 ]
Filed bug MB-12200 for this toy-build
Comment by Ketaki Gangal [ 17/Sep/14 ]
Attaching stack from toy-build 701: crash_toy_701.rtf

Access to machine is as mentioned previously on MB-12200.




[MB-4593] Windows Installer hangs on "Computing Space Requirements" Created: 27/Dec/11  Updated: 17/Sep/14

Status: Reopened
Project: Couchbase Server
Component/s: installer
Affects Version/s: 2.0-developer-preview-3, 2.0-developer-preview-4
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Bin Cui Assignee: Bin Cui
Resolution: Unresolved Votes: 3
Labels: windows, windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 7 Ultimate 64. Sony Vaio, i3 with 4GB RAM and 200 GB of 500 GB free. Also on a Sony Vaio, Windows 7 Ultimate 64, i7, 6 GB RAM and a 750GB drive with about 600 GB free.

Attachments: PNG File couchbase-installer.png     PNG File image001.png     PNG File ss 2014-08-28 at 4.16.09 PM.png    
Triage: Triaged

 Description   
When installing the Community Server 2.0 DP3 on Windows, the installer hangs on the "Computing space requirements" screen. There is no additional feedback from the installer. After 90-120 minutes or so, it does move forward and complete. The same issue was reported on Google Groups a few months back - http://groups.google.com/group/couchbase/browse_thread/thread/37dbba592a9c150b/f5e6d80880f7afc8?lnk=gst&q=msi.

Executable: couchbase-server-community_x86_64_2.0.0-dev-preview-3.setup.exe

WORKAROUND IN 3.0 - Create a registry key HKLM\SOFTWARE\Couchbase, name=SkipVcRuntime, type=DWORD, value=1 to skip installing VC redistributable installation which is causing this issue. If VC redistributable is necessary, it must be installed manually if the registry key is set to skip automatic install of it.
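
For convenience, the workaround key can be created from an elevated command prompt, roughly as follows (a sketch; on 64-bit Windows the 32-bit installer reads the key under Wow6432Node, as noted in the comments below):

    rem 32-bit Windows
    reg add "HKLM\SOFTWARE\Couchbase" /v SkipVcRuntime /t REG_DWORD /d 1 /f
    rem 64-bit Windows (32-bit installer view)
    reg add "HKLM\SOFTWARE\Wow6432Node\Couchbase" /v SkipVcRuntime /t REG_DWORD /d 1 /f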


 Comments   
Comment by Filip Stas [ 23/Feb/12 ]
Is there any solution for this? I'm experiencing the same problem. Running the unpacked MSI does not seem to work because the InstallShield setup has been configured to require installing through the exe.

Comment by Farshid Ghods (Inactive) [ 22/Mar/12 ]
from Bin:

Looks like it is related to the InstallShield engine. Maybe InstallShield tries to access the system registry and it is locked by another process. The suggestion is to shut down other running programs and try again if such a problem pops up.
Comment by Farshid Ghods (Inactive) [ 22/Mar/12 ]
We were unable to reproduce this on Windows 2008 64-bit.

The bug mentions this happened on Windows 7 64-bit, which is not a supported platform, but that should not make any difference.
Comment by Farshid Ghods (Inactive) [ 23/Mar/12 ]
From Bin:

Windows 7 is my dev environment, and I have no problem installing and testing it. From your description, I cannot tell whether it failed during the installation, or whether the installation finished but Couchbase Server cannot start.
 
If it is due to an InstallShield failure, you can generate a log file for debugging with:
setup.exe /debuglog"C:\PathToLog\setupexe.log"
 
If Couchbase Server fails to start, the most likely reason is a missing or incompatible Microsoft runtime library. You can manually run service_start.bat under the bin directory and check what is going on, and you can run cbbrowse_log.bat to generate a log file for further debugging.
Comment by John Zablocki (Inactive) [ 23/Mar/12 ]
This is an installation only problem. There's not much more to it other than the installer hangs on the screen (see attachment).

However, after a failed install, I did get it to work by:

a) deleting C:\Program Files\Couchbase\*

b) deleting all registry keys with Couchbase Server left over from the failed install

c) rebooting

Next time I see this problem, I'll run it again with the /debuglog

I think the problem might be that a previous install of DP3 or DP4 (nightly build) failed and left some bits in place somewhere.
Comment by Steve Yen [ 05/Apr/12 ]
from Perry...
Comment by Thuan Nguyen [ 05/Apr/12 ]
I can not repo this bug. I test on Windows 7 Professional 64 bit and Windows Server 2008 64 bit.
Here are steps:
- Install couchbase server 2.0.0r-388 (dp3)
- Open web browser and go to initial setup in web console.
- Uninstall couchbase server 2.0.0r-388
- Install couchbase server 2.0.0dp4r-722
- Open web browser and go to initial setup in web console.
Install and uninstall couchbase server go smoothly without any problem.
Comment by Bin Cui [ 25/Apr/12 ]
Maybe we need to get the installer verbose log file to get some clues.

setup.exe /verbose"c:\temp\logfile.txt"
Comment by John Zablocki (Inactive) [ 06/Jul/12 ]
Not sure if this is useful or not, but without fail, every time I encounter this problem, simply shutting down apps (usually Chrome for some reason) causes the hanging to stop. Right after closing Chrome, the C++ redistributable dialog pops open and installation completes.
Comment by Matt Ingenthron [ 10/Jul/12 ]
Workarounds/troubleshooting for this issue:


On installshield's website, there are similar problems reported for installshield. There are several possible reasons behind it:

1. The installation of the Microsoft C++ redistributable is blocked by some other running program, sometimes Chrome.
2. There are some remote network drives that are mapped to local system. Installshield may not have enough network privileges to access them.
3. Couchbase server was installed on the machine before and it was not totally uninstalled and/or removed. Installshield tried to recover from those old images.

To determine where to go next, run setup with debugging mode enabled:
setup.exe /debuglog"C:\temp\setupexe.log"

The contents of the log will tell you where it's getting stuck.
Comment by Bin Cui [ 30/Jul/12 ]
Matt's explanation should be included in the documentation and on the Q&A website. I reproduced the hanging problem during installation when the Chrome browser is running.
Comment by Farshid Ghods (Inactive) [ 30/Jul/12 ]
So does that mean the installer should wait until Chrome and other browsers are terminated before proceeding?

I see this as a very common pattern with many installers: they ask the user to stop those applications, and if the user does not follow the instructions, the setup process does not continue until these conditions are met.
Comment by Dipti Borkar [ 31/Jul/12 ]
Is there no way to fix this? At the least we need to provide an error or guidance that Chrome needs to be quit before continuing. Is Chrome the only application we have seen causing this problem?
Comment by Steve Yen [ 13/Sep/12 ]
http://review.couchbase.org/#/c/20552/
Comment by Steve Yen [ 13/Sep/12 ]
See CBD-593
Comment by Øyvind Størkersen [ 17/Dec/12 ]
Same bug when installing 2.0.0 (build-1976) on Windows 7. Stopping Chrome did not help, but killing the process "Logitech ScrollApp" (KhalScroll.exe) did..
Comment by Joseph Lam [ 13/Sep/13 ]
It's happening to me when installing 2.1.1 on Windows 7. What is this step for, and is it really necessary? I see that it happens after the files have been copied to the installation folder. Not entirely sure what it's computing space requirements for.
Comment by MikeOliverAZ [ 16/Nov/13 ]
Same problem on 2.2.0 x86_64. I have tried everything: closing down Chrome and Torch from Task Manager to ensure no other apps are competing, and removing registry entries (but there are so many - my time, please). As noted above, this doesn't seem to prevent writing the files under Program Files, so what is it doing? So I cannot install; it now complains it cannot upgrade and to run the installer again.

BS....giving up and going to MongoDB....it installs no sweat.

Comment by Sriram Melkote [ 18/Nov/13 ]
Reopening. Testing on VMs is a problem because they are all clones. We miss many problems like these.
Comment by Sriram Melkote [ 18/Nov/13 ]
Please don't close this bug until we have clear understanding of:

(a) What is the Runtime Library that we're trying to install that conflicts with all these other apps
(b) Why we need it
(c) A prioritized task to someone to remove that dependency on 3.0 release requirements

Until we have these, please do not close the bug.

We should not do any fixes on the lines of checking for known apps that conflict etc, as that is treating the symptom and not fixing the cause.
Comment by Bin Cui [ 18/Nov/13 ]
We install the Windows runtime library because the Erlang runtime libraries depend on it. Not just any runtime library, but the one that comes with the Erlang distribution package. Without it, or with an incompatible version, erl.exe won't run.

Instead of checking for any particular applications, the current solution is:
Run an Erlang test script. If it runs correctly, no runtime library is installed. Otherwise, the installer has to install the runtime library.

Please see CBD-593.

Comment by Sriram Melkote [ 18/Nov/13 ]
My suggestion is that we not attempt to install MSVCRT ourselves.

Let us check the library we need is present or not prior to starting the install (via appropriate registry keys).

If it is absent, let us direct the user to download and install it and exit.
Comment by Bin Cui [ 18/Nov/13 ]
The approach is not totally right. Even if the MSVCRT exists, we still need to install it. Here the key is the exact same MSVCRT package that comes with the Erlang distribution. We had problems before where, with the same version but a different build of MSVCRT installed, Erlang wouldn't run.

One possible solution is to ask the user to download the MSVCRT library from our website and make it a prerequisite for installing Couchbase Server.
Comment by Sriram Melkote [ 18/Nov/13 ]
OK. It looks like MS distributes some versions of VC runtime with the OS itself. I doubt that Erlang needs anything newer.

So let us rebuild Erlang and have it link to the OS supplied version of MSVCRT (i.e., msvcr70.dll) in Couchbase 3.0 onwards

In the meanwhile, let us point the user to the vcredist we ship in Couchbase 2.x versions and ask them to install it from there.
Comment by Steve Yen [ 23/Dec/13 ]
Saw this in the email inboxes...

From: Tal V
Date: December 22, 2013 at 1:19:36 AM PST
Subject: Installing Couchbase on Windows 7

Hi CouchBase support,
I would like to get your assist on an issue I’m having. I have a windows 7 machine on which I tried to install Couchbase, the installation is stuck on the “Computing space requirements”.
I tried several things without success:

1. 1. I tried to download a new installation package.

2. 2. I deleted all records of the software from the Registry.

3. 3. I deleted the folder that was created under C:\Program Files\Couchbase

4. 4. I restart the computer.

5. 5. Opened only the installation package.

6. 6. Re-install it again.
And again it was stuck on the same step.
What is the solution for it?

Thank you very much,


--
Tal V
Comment by Steve Yen [ 23/Dec/13 ]
Hi Bin,
Not knowing much about installshield here, but one idea - are there ways of forcibly, perhaps optionally, skipping the computing space requirements step? Some environment variable flag, perhaps?
Thanks,
Steve

Comment by Bin Cui [ 23/Dec/13 ]
This "Computing space requirements" is quite misleading. It happens at the post install step while GUI still shows that message. Within the step, we run the erlang test script and fails and the installer runs "vcredist.exe" for microsoft runtime library which gets stuck.

For the time being, the most reliable way is not to run this vcredist.exe from the installer. Instead, we should provide a link on our download web site.

1. During installation, if we fail to run the Erlang test script, we can pop up a warning dialog and ask customers to download and run it after installation.
 
Comment by Bin Cui [ 23/Dec/13 ]
To work around the problem, we can instruct the customer to download vcredist.exe and run it manually before setting up Couchbase Server. If the runtime environment is set up correctly, the installer will bypass that step.
Comment by Bin Cui [ 30/Dec/13 ]
Use windows registry key to install/skip the vcredist.exe step:

On 32bit windows, Installer will check HKEY_LOCAL_MACHINE\SOFTWARE\Couchbase\SkipVcRuntime
On 64bit windows, Installer will check HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Couchbase\SkipVcRuntime,
where SkipVcRuntime is a DWORD (32-bit) value.

When SkipVcRuntime is set to 1, the installer will skip the step of installing vcredist.exe. Otherwise, the installer will follow the same logic as before.
vcredist_x86.exe can be found in the root directory of Couchbase Server. It can be run as:
c:\<couchbase_root>\vcredist_x86.exe

http://review.couchbase.org/#/c/31501/
Comment by Bin Cui [ 02/Jan/14 ]
Check into branch 2.5 http://review.couchbase.org/#/c/31558/
Comment by Iryna Mironava [ 22/Jan/14 ]
Tested with Win 7 and Win Server 2008.
I am unable to reproduce this issue (build 2.0.0-1976; DP3 is no longer available).
Installed/uninstalled Couchbase several times.
Comment by Sriram Melkote [ 22/Jan/14 ]
Unfortunately, for this problem, if it did not reproduce, we can't say it is fixed. We have to find a machine where it reproduces and then verify a fix.

Anyway, no change made actually addresses the underlying problem (the registry key just gives a way to work around it when it happens), so reopening the bug and targeting it for 3.0.
Comment by Sriram Melkote [ 23/Jan/14 ]
Bin - I just noticed that the Erlang installer itself (when downloaded from their website) installs the VC redistributable in non-silent mode. The Microsoft runtime installer dialog pops up, indicates it will install the VC redistributable, and then completes. Why do we run it in silent mode (and hence assume liability for it running properly)? Why do we not run the MSI in interactive mode like the ESL Erlang installer itself does?
Comment by Wayne Siu [ 05/Feb/14 ]
If we could get the information on the exact software version, it could be helpful.
From registry, Computer\HKLM\Software\Microsoft\WindowsNT\CurrentVersion
Comment by Wayne Siu [ 12/Feb/14 ]
Bin, looks like the erl.ini was locked when this issue happened.
Comment by Pavel Paulau [ 19/Feb/14 ]
Just happened to me in 2.2.0-837.
Comment by Anil Kumar [ 18/Mar/14 ]
Triaged by Don and Anil as per Windows Developer plan.
Comment by Bin Cui [ 08/Apr/14 ]
http://review.couchbase.org/#/c/35463/
Comment by Chris Hillery [ 13/May/14 ]
I'm new here, but it seems to me that vcredist_x64.exe does exactly the same thing as the corresponding MS-provided merge module for MSVC2013. If that's true, we should be able to just include that merge module in our project, and not need to fork out to install things. In fact, as of a few weeks ago, the 3.0 server installers are doing just that.

http://msdn.microsoft.com/en-us/library/dn501987.aspx

Is my understanding incomplete in some way?
Comment by Chris Hillery [ 14/May/14 ]
I can confirm that the most recent installers do install msvcr120.dll and msvcp120.dll in apparently the correct places, and the server can start with them. I *believe* this means that we no longer need to fork out vcredist_x64.exe, or have any of the InstallShield tricks to detect whether it is needed and/or skip installing it, etc. I'm leaving this bug open to both verify that the current merge module-based solution works, and to track removal of the unwanted code.
Comment by Sriram Melkote [ 16/May/14 ]
I've also verified that 3.0 build installed VCRT (msvcp100) is sufficient for Erlang R16.
Comment by Bin Cui [ 15/Sep/14 ]
Recently I happened to reproduce this problem on my own laptop. Using setup.exe /verbose"c:\temp\verbose.log", I generated a log file with more verbose debugging information. At the end of the file, it looks something like:

MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: OVERVIEW.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_admin\overview\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: BUCKETS.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_admin\buckets\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: MN_DIALOGS.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_dialogs\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: ABOUT.09DE5D66_88FD_4345_97EE_506873561EC1 , Object: C:\t5\lib\ns_server\priv\public\angular\app\mn_dialogs\about\
MSI (c) (C4:C0) [10:51:36:274]: Dir (target): Key: ALLUSERSPROFILE , Object: Q:\
MSI (c) (C4:C0) [10:51:36:274]: PROPERTY CHANGE: Adding INSTALLLEVEL property. Its value is '1'.

It means that the installer tried to populate some property values for the all-users profile after it copied all data to the install location, even though it still shows this notorious "Computing space requirements" message.

For every installation, the installer uses the user temp directory to populate installer-related data. After I deleted or renamed the temp data under c:\Users\<logonuser>\AppData\Temp and rebooted the machine, the problem was solved, at least for my laptop.

Conclusion:

1. After the installer copies the files, it needs to set the all-users profiles. This action is synchronous; it waits and checks the exit code. And certainly it will hang if this action never returns.

2. This is an issue related to setup environment, i.e. caused by other running applications, etc.

Suggestion:

1. Stop any other browsers and applications when you install Couchbase.
2. Kill the installation process and uninstall the failed setup.
3. Delete/rename the temp location under c:\Users\<logonuser>\AppData\Temp
4. Reboot and try again.

Comment by Bin Cui [ 17/Sep/14 ]
Turns out, it is really about the installation environment, not about a particular installation step.

I suggest documenting the workaround method.
Comment by Don Pinto [ 17/Sep/14 ]
Bin, some installers kill conflicting processes before installation starts so that it can complete. Why can't we do this?

(Maybe using something like this - http://stackoverflow.com/questions/251218/how-to-stop-a-running-process-during-an-msi-based-un-install)

Thanks,
Don




[MB-12201] Hotfix Rollup Release Created: 16/Sep/14  Updated: 17/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Cihan Biyikoglu Assignee: Raju Suravarjjala
Resolution: Unresolved Votes: 0
Labels: hotfix
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: No

 Description   
Representing the rollup hotfix for 2.5.1 that includes all hotfixes (without the V8 change) released to date (Sept 2014).

 Comments   
Comment by Dipti Borkar [ 16/Sep/14 ]
Is this rollup still 2.5.1? It will create lots of confusion. Can we tag it 2.5.2, or does that lead to another round of testing? There are way too many hotfixes, so we really need a new point release.
Comment by Cihan Biyikoglu [ 17/Sep/14 ]
Hi Dipti, to improve hotfix management we are changing the way we'll do hotfixes. The rollup will bring in more hotfixes together and ensure we provide customers all the fixes we know about. If we have already fixed an issue at the time you requested your hotfix, there is no reason why we should risk exposing you to known and fixed issues in the version you are using. A side effect of this should also be improved life for support.
-cihan




[MB-12185] update to "couchbase" from "membase" in gerrit mirroring and manifests Created: 14/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.0, 2.5.1, 3.0-Beta
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Blocker
Reporter: Matt Ingenthron Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-8297 Some key projects are still hosted at... Open

 Description   
One of the key components of Couchbase is still only at github.com/membase and not at github.com/couchbase. I think it's okay to mirror to both locations (not that there's an advantage), but for sure it should be at couchbase and the manifest for Couchbase Server releases should be pointing to Couchbase.

I believe the steps here are as follows:
- Set up a github.com/couchbase/memcached project (I've done that)
- Update gerrit's commit hook to update that repository
- Change the manifests to start using that repository

Assigning this to build as a component, as gerrit is handled by the build team. Then I'm guessing it'll need to be handed over to Trond or another developer to do the manifest change once gerrit is up to date.

Since memcached is slow changing now, perhaps the third item can be done earlier.

 Comments   
Comment by Chris Hillery [ 15/Sep/14 ]
Actually manifests are owned by build team too so I will do both parts.

However, the manifest for the hopefully-final release candidate already exists, and I'm a teensy bit wary about changing it after the fact. The manifest change may need to wait for 3.0.1.
Comment by Matt Ingenthron [ 15/Sep/14 ]
I'll leave it to you to work out how to fix it, but I'd just point out that manifest files are mutable.
Comment by Chris Hillery [ 15/Sep/14 ]
The manifest we build from is mutable. The historical manifests recording what we have already built really shouldn't be.
Comment by Matt Ingenthron [ 15/Sep/14 ]
True, but they are. :) That was half me calling back to our discussion about tagging and mutability of things in the Mountain View office. I'm sure you remember that late night conversation.

If you can help here Ceej, that'd be great. I'm just trying to make sure we have the cleanest project possible out there on the web. One wart less will bring me to 999,999 or so. :)
Comment by Trond Norbye [ 15/Sep/14 ]
Just a FYI, we've been ramping up the changes to memcached, so it's no longer a slow moving component ;-)
Comment by Matt Ingenthron [ 15/Sep/14 ]
Slow moving w.r.t. 3.0.0 though, right? That means the current github.com/couchbase/memcached probably has the commit planned to be released, so it's low risk to update github.com/couchbase/manifest with the couchbase repo instead of membase.

That's all I meant. :)
Comment by Trond Norbye [ 15/Sep/14 ]
_all_ components should be slow moving with respect to 3.0.0 ;)
Comment by Chris Hillery [ 16/Sep/14 ]
Matt, it appears that couchbase/memcached is a *fork* of membase/memcached, which is probably undesirable. We can actively rename the membase/memcached project to couchbase/memcached, and github will automatically forward requests from the old name to the new so it is seamless. It also means that we don't have to worry about migrating any commits, etc.

Does anything refer to couchbase/memcached already? Could we delete that one outright and then rename membase/memcached instead?
Comment by Matt Ingenthron [ 16/Sep/14 ]
Ah, that would be my fault. I propose deleting the couchbase/memcached and then transferring ownership from membase/memcached to couchbase/memcached. I think that's what you meant by "actively rename", right? Sounds like a great plan.

I think that's all in your hands Ceej, but I'd be glad to help if needed.

I still think in the interest of reducing warts, it'd be good to fix the manifest.
Comment by Chris Hillery [ 16/Sep/14 ]
I will do that (rename the repo), just please confirm explicitly that temporarily deleting couchbase/memcached won't cause the world to end. :)
Comment by Matt Ingenthron [ 16/Sep/14 ]
It won't since it didn't exist until this last Sunday when I created this ticket. If something world-ending happens as a result, I'll call it a bug to have depended on it. ;)
Comment by Chris Hillery [ 18/Sep/14 ]
I deleted couchbase/memcached and then transferred ownership of membase/memcached to couchbase. The original membase/memcached repository had a number of collaborators, most of which I think were historical. For now, couchbase/memcached only has "Owners" and "Robots" listed as collaborators, which is generally the desired configuration.

http://review.couchbase.org/#/c/41470/ proposes changes to the active manifests. I see no problem with committing that.

As for the historical manifests, there are two:

1. Sooner or later we will add a "released/3.0.0.xml" manifest to the couchbase/manifest repository, representing the exact SHAs which were built. I think it's probably OK to retroactively change the remote on that manifest since the two repositories are aliases for each other. This will affect any 3.0.0 hotfixes which are built, etc.

2. However, all of the already-built 3.0 packages (.deb / .rpm / .zip files) have embedded in them the manifest which was used to build them. Those, unfortunately, cannot be changed at this time. Doing so would require re-packaging the deliverables which have already undergone QE validation. While it is technically possible to do so, it would be a great deal of manual work, and IMHO a non-trivial and unnecessary risk. The only safe solution would be to trigger a new build, but in that case I would argue we would need to re-validate the deliverables, which I'm sure is a non-starter for PM. I'm afraid this particular sub-wart will need to wait for 3.0.1 to be fully addressed.




[MB-12126] there is not manifest file on windows 3.0.1-1253 Created: 03/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Thuan Nguyen Assignee: Chris Hillery
Resolution: Unresolved Votes: 0
Labels: windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 2008 r2 64-bit

Attachments: PNG File ss 2014-09-03 at 12.05.41 PM.png    
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Installed Couchbase Server 3.0.1-1253 on Windows Server 2008 R2 64-bit. There is no manifest file in the directory c:\Program Files\Couchbase\Server\



 Comments   
Comment by Chris Hillery [ 03/Sep/14 ]
Also true for 3.0 RC2 build 1205.
Comment by Chris Hillery [ 03/Sep/14 ]
(Side note: While fixing this, log onto build slaves and delete stale "server-overlay/licenses.tgz" file so we stop shipping that)
Comment by Anil Kumar [ 17/Sep/14 ]
Ceej - Any update on this?
Comment by Chris Hillery [ 18/Sep/14 ]
No, not yet.




[MB-12197] [Windows]: Bucket deletion failing with error 500 reason: unknown {"_":"Bucket deletion not yet complete, but will continue."} Created: 16/Sep/14  Updated: 18/Sep/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: 3.0.1
Fix Version/s: 3.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Meenakshi Goel Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: windows, windows-3.0-beta, windows_pm_triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.1-1299-rel

Attachments: Text File test.txt    
Triage: Triaged
Operating System: Windows 64-bit
Is this a Regression?: Yes

 Description   
Jenkins Ref Link:
http://qa.hq.northscale.net/job/win_2008_x64--14_01--replica_read-P0/32/consoleFull
http://qa.hq.northscale.net/job/win_2008_x64--59--01--bucket_flush-P1/14/console
http://qa.hq.northscale.net/job/win_2008_x64--59_01--warmup-P1/6/consoleFull

Test to Reproduce:
newmemcapable.GetrTests.getr_test,nodes_init=4,GROUP=P0,expiration=60,wait_expiration=true,error=Not found for vbucket,descr=#simple getr replica_count=1 expiration=60 flags = 0 docs_ops=create cluster ops = None
flush.bucketflush.BucketFlushTests.bucketflush,items=20000,nodes_in=3,GROUP=P0

*Note that the test itself doesn't fail, but further tests fail with "error 400 reason: unknown ["Prepare join failed. Node is already part of cluster."]" because cleanup wasn't successful.

Logs:
[rebalance:error,2014-09-15T9:36:01.989,ns_1@10.3.121.182:<0.6938.0>:ns_rebalancer:do_wait_buckets_shutdown:307]Failed to wait deletion of some buckets on some nodes: [{'ns_1@10.3.121.182',
                                                         {'EXIT',
                                                          {old_buckets_shutdown_wait_failed,
                                                           ["default"]}}}]

[error_logger:error,2014-09-15T9:36:01.989,ns_1@10.3.121.182:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
  crasher:
    initial call: erlang:apply/2
    pid: <0.6938.0>
    registered_name: []
    exception exit: {buckets_shutdown_wait_failed,
                        [{'ns_1@10.3.121.182',
                             {'EXIT',
                                 {old_buckets_shutdown_wait_failed,
                                     ["default"]}}}]}
      in function ns_rebalancer:do_wait_buckets_shutdown/1 (src/ns_rebalancer.erl, line 308)
      in call from ns_rebalancer:rebalance/5 (src/ns_rebalancer.erl, line 361)
    ancestors: [<0.811.0>,mb_master_sup,mb_master,ns_server_sup,
                  ns_server_cluster_sup,<0.57.0>]
    messages: []
    links: [<0.811.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 46422
    stack_size: 27
    reductions: 5472
  neighbours:

[user:info,2014-09-15T9:36:01.989,ns_1@10.3.121.182:<0.811.0>:ns_orchestrator:handle_info:483]Rebalance exited with reason {buckets_shutdown_wait_failed,
                              [{'ns_1@10.3.121.182',
                                {'EXIT',
                                 {old_buckets_shutdown_wait_failed,
                                  ["default"]}}}]}
[ns_server:error,2014-09-15T9:36:09.645,ns_1@10.3.121.182:ns_memcached-default<0.4908.0>:ns_memcached:terminate:798]Failed to delete bucket "default": {error,{badmatch,{error,closed}}}

Uploading Logs

 Comments   
Comment by Meenakshi Goel [ 16/Sep/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12197/11dd43ca/10.3.121.182-9152014-938-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/e7795065/10.3.121.183-9152014-940-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/6442301b/10.3.121.102-9152014-942-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/10edf209/10.3.121.107-9152014-943-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/9f16f503/10.1.2.66-9152014-945-diag.zip
Comment by Ketaki Gangal [ 16/Sep/14 ]
Assigning to ns_server team for a first look.
Comment by Aleksey Kondratenko [ 16/Sep/14 ]
For cases like this it's very useful to get a sample of backtraces from memcached on the bad node. Is it still running?
Comment by Aleksey Kondratenko [ 16/Sep/14 ]
Eh. It's windows....
Comment by Aleksey Kondratenko [ 17/Sep/14 ]
I've merged diagnostics commit (http://review.couchbase.org/41463). Please rerun, reproduce and give me new set of logs.
Comment by Meenakshi Goel [ 18/Sep/14 ]
Tested with 3.0.1-1307-rel, Please find logs below.
https://s3.amazonaws.com/bugdb/jira/MB-12197/c2191900/10.3.121.182-9172014-2245-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/28bc4a83/10.3.121.183-9172014-2246-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/8f1efbe5/10.3.121.102-9172014-2248-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/91a89d6a/10.3.121.107-9172014-2249-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12197/2d272074/10.1.2.66-9172014-2251-diag.zip




[MB-11048] Range queries result in thousands of GET operations/sec Created: 05/May/14  Updated: 18/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
A benchmark for range queries demonstrated very high latency. At the same time I noticed an extremely high rate of GET operations.

Even a single query such as "SELECT name.f.f.f AS _name FROM bucket-1 WHERE coins.f > 224.210000 AND coins.f < 448.420000 LIMIT 20" led to hundreds of memcached reads.

Explain:

https://gist.github.com/pavel-paulau/5e90939d6ab28034e3ed

Engine output:

https://gist.github.com/pavel-paulau/b222716934dfa3cb598e

I don't like to use JIRA as a forum, but why does this happen? Do you fetch the entire range before returning the limited output?

 Comments   
Comment by Gerald Sangudi [ 05/May/14 ]
Pavel,

Yes, the scan and fetch are performed before we do any LIMIT. This will be fixed in DP4, but it may not be easily fixable in DP3.

Can you please post the results of the following query:

SELECT COUNT(*) FROM bucket-1 WHERE coins.f > 224.210000 AND coins.f < 448.420000

Thanks.
Comment by Pavel Paulau [ 05/May/14 ]
cbq> SELECT COUNT(*) FROM bucket-1 WHERE coins.f > 224.210000 AND coins.f < 448.420000
{
    "resultset": [
        {
            "$1": 2134
        }
    ],
    "info": [
        {
            "caller": "http_response:160",
            "code": 100,
            "key": "total_rows",
            "message": "1"
        },
        {
            "caller": "http_response:162",
            "code": 101,
            "key": "total_elapsed_time",
            "message": "547.545767ms"
        }
    ]
}
Comment by Pavel Paulau [ 05/May/14 ]
Also it looks like we are leaking memory in this scenario.

Resident memory of cbq-engine grows very fast (several megabytes per second) and never goes down...




[MB-11007] Request for Get Multi Meta Call for bulk meta data reads Created: 30/Apr/14  Updated: 30/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Parag Agarwal Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: All


 Description   
Currently we support a per-key call for getMetaData. As a result, our verification requires a per-key fetch during the verification phase. This request is to support a bulk get-metadata call which can return metadata per vbucket for all keys, or in batches. This would enhance our ability to verify per-document metadata over time or after operations like rebalance, as it would be faster. If there is a better alternative, please recommend one.

Current Behavior

https://github.com/couchbase/ep-engine/blob/master/src/ep.cc

ENGINE_ERROR_CODE EventuallyPersistentStore::getMetaData(
                                                        const std::string &key,
                                                        uint16_t vbucket,
                                                        const void *cookie,
                                                        ItemMetaData &metadata,
                                                        uint32_t &deleted,
                                                        bool trackReferenced)
{
    (void) cookie;
    RCPtr<VBucket> vb = getVBucket(vbucket);
    if (!vb || vb->getState() == vbucket_state_dead ||
        vb->getState() == vbucket_state_replica) {
        ++stats.numNotMyVBuckets;
        return ENGINE_NOT_MY_VBUCKET;
    }

    int bucket_num(0);
    deleted = 0;
    LockHolder lh = vb->ht.getLockedBucket(key, &bucket_num);
    StoredValue *v = vb->ht.unlocked_find(key, bucket_num, true,
                                          trackReferenced);

    if (v) {
        stats.numOpsGetMeta++;

        if (v->isTempInitialItem()) { // Need bg meta fetch.
            bgFetch(key, vbucket, -1, cookie, true);
            return ENGINE_EWOULDBLOCK;
        } else if (v->isTempNonExistentItem()) {
            metadata.cas = v->getCas();
            return ENGINE_KEY_ENOENT;
        } else {
            if (v->isTempDeletedItem() || v->isDeleted() ||
                v->isExpired(ep_real_time())) {
                deleted |= GET_META_ITEM_DELETED_FLAG;
            }
            metadata.cas = v->getCas();
            metadata.flags = v->getFlags();
            metadata.exptime = v->getExptime();
            metadata.revSeqno = v->getRevSeqno();
            return ENGINE_SUCCESS;
        }
    } else {
        // The key wasn't found. However, this may be because it was previously
        // deleted or evicted with the full eviction strategy.
        // So, add a temporary item corresponding to the key to the hash table
        // and schedule a background fetch for its metadata from the persistent
        // store. The item's state will be updated after the fetch completes.
        return addTempItemForBgFetch(lh, bucket_num, key, vb, cookie, true);
    }
}



 Comments   
Comment by Venu Uppalapati [ 30/Apr/14 ]
The server supports the quiet CMD_GETQ_META call, which can be used on the client side to build a multi-getMeta call similar to the multi-get implementation.
Comment by Parag Agarwal [ 30/Apr/14 ]
Please point to a working example for this call
Comment by Venu Uppalapati [ 30/Apr/14 ]
Parag, you can find some relevant information on queuing requests with the quiet calls at https://code.google.com/p/memcached/wiki/BinaryProtocolRevamped#Get,_Get_Quietly,_Get_Key,_Get_Key_Quietly
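
For illustration, a minimal Python sketch of the technique the wiki page describes, applied to metadata reads: pipeline quiet requests and flush with a NOOP. The CMD_GETQ_META opcode value and the absence of request extras are assumptions to be checked against the ep-engine source; SASL authentication and response parsing are omitted:

import socket
import struct

MAGIC_REQUEST = 0x80
OPCODE_NOOP = 0x0a        # standard memcached binary NOOP
OPCODE_GETQ_META = 0xa1   # assumed opcode for CMD_GETQ_META; verify in ep-engine

def request(opcode, key=b"", vbucket=0, opaque=0):
    # 24-byte binary protocol request header followed by the key as the body.
    header = struct.pack(">BBHBBHIIQ",
                         MAGIC_REQUEST, opcode, len(key), 0, 0,
                         vbucket, len(key), opaque, 0)
    return header + key

def multi_get_meta(host, keys, vbucket_of):
    # Quiet ops only generate a response on a hit, so a trailing non-quiet
    # NOOP forces the server to flush every queued response back to us.
    sock = socket.create_connection((host, 11210))
    batch = b"".join(request(OPCODE_GETQ_META, k.encode(), vbucket_of(k), i)
                     for i, k in enumerate(keys))
    sock.sendall(batch + request(OPCODE_NOOP))
    # Read responses until the NOOP reply arrives; each hit carries the item
    # metadata (deleted flag, flags, expiry, rev seqno) in its extras.
    # Parsing is left out to keep the sketch short.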
Comment by Chiyoung Seo [ 30/Apr/14 ]
Changing the fix version to the feature backlog, given that the 3.0 feature-complete date has already passed and this is requested for the QE testing framework.




[MB-10993] Cluster Overview - Usable Free Space documentation misleading Created: 29/Apr/14  Updated: 29/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.1
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Jim Walker Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: documentation
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Issue relates to:
 http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#viewing-cluster-summary

I was working through a support case and trying to explain the cluster overview free space and usable free space.

The following statement is from our documentation. After a code review of ns_server I concluded that it is incorrect.

Usable Free Space:
The amount of usable space for storing information on disk. This figure shows the amount of space available on the configured path after non-Couchbase files have been taken into account.

The correct statement should be

Usable Free Space:
The amount of usable space for storing information on disk. This figure is calculated from the node with the least amount of available storage in the cluster; the final value is obtained by multiplying that amount by the number of nodes in the cluster.


This change matters because users need to understand why Usable Free Space can be less than Free Space. The cluster considers all nodes to be equal. If you have a "weak" node in the cluster, e.g. one with a small disk, then all cluster nodes have to keep their storage under the weaker node's limits; otherwise, for example, we could never fail over to the weak node, as it cannot take on the job of a stronger node. When Usable Free Space is less than Free Space, the user may actually want to investigate why one node has less storage available.
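
A worked example of the corrected description (the per-node figures are made up):

# Hypothetical free-space figures per node, in GB.
node_free_gb = {"node1": 500, "node2": 480, "node3": 120}

free_space = sum(node_free_gb.values())                              # 1100 GB
usable_free_space = min(node_free_gb.values()) * len(node_free_gb)   # 360 GB

# Usable Free Space (360 GB) is well below Free Space (1100 GB) because the
# cluster caps every node at what the weakest node (120 GB) could absorb.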




[MB-10920] unable to start tuq if there are no buckets Created: 22/Apr/14  Updated: 18/Jun/14  Due: 23/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
node is initialized but has no buckets
[root@kiwi-r116 tuqtng]# ./tuqtng -couchbase http://localhost:8091
10:26:56.520415 Info line disabled false
10:26:56.522641 FATAL: Unable to run server, err: Unable to access site http://localhost:8091, err: HTTP error 401 Unauthorized getting "http://localhost:8091/pools": -- main.main() at main.go:76




[MB-10834] update the license.txt for enterprise edition for 2.5.1 Created: 10/Apr/14  Updated: 19/Jun/14

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Microsoft Word 2014-04-07 EE Free Clickthru Breif License.docx    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
document attached.

 Comments   
Comment by Phil Labee [ 10/Apr/14 ]
2.5.1 has already been shipped, so this file can't be included.

Is this for 3.0.0 release?
Comment by Phil Labee [ 10/Apr/14 ]
voltron commit: 8044c51ad7c5bc046f32095921f712234e74740b

uses the contents of the attached file to update LICENSE-enterprise.txt on the master branch.




[MB-10821] optimize storage of larger binary object in couchbase Created: 10/Apr/14  Updated: 10/Apr/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-10084] Sub-Task: Changes required for Data Encryption in Client SDK's Created: 30/Jan/14  Updated: 28/May/14

Status: Open
Project: Couchbase Server
Component/s: clients
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Anil Kumar Assignee: Andrei Baranouski
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
depends on JCBC-441 add SSL support in support of Couchba... Open
depends on CCBC-344 add support for SSL to libcouchbase i... Resolved
depends on NCBC-424 Add SSL support in support of Couchba... Resolved

 Description   
Changes required for Data Encryption in Client SDK's

 Comments   
Comment by Cihan Biyikoglu [ 20/Mar/14 ]
wanted to make sure we agree this will be in 3.0. Matt any concerns?
thanks
Comment by Matt Ingenthron [ 20/Mar/14 ]
This should be closed in favor of the specific project issues. That said, the description is a bit fuzzy. Is this SSL support for memcached && views && any cluster management?

Please clarify and then we can open specific issues. It'd be good to have a link to functional requirements.
Comment by Matt Ingenthron [ 20/Mar/14 ]
And Cihan: it can't be "in 3.0", unless you mean concurrent release or release prior to 3.0 GA. Is that what you mean? I'd actually aim to have this feature support in SDKs prior to 3.0's release and we are working on it right now, though it has some other dependencies. See CCBC-344, for example.
Comment by Cihan Biyikoglu [ 20/Mar/14 ]
thanks Matt. I meant 3.0 paired client SDK release so prior or shortly after is all good for me.
context - we are doing a pass to clean up JIRA. Like to button up what's in and out for 3.0.
Comment by Cihan Biyikoglu [ 24/Mar/14 ]
Matt, is there a client side ref implementation you guys did for this one? would be good to pass that onto test folks for initial validation until you guys completely integrate so no regressions creep up while we march to GA.
thanks
Comment by Matt Ingenthron [ 24/Mar/14 ]
We did verification with a non-mainline client since that was the quickest way to do so and have provided that to QE. Also, Brett filed a bug around HTTPS with ns-server and streaming configuration replies. See MB-10519.

We'll do a mainline client with libcouchbase and the python client as soon as it's dependency for handling packet IO is done. This is under CCBC-298 and CCBC-301, among others.




[MB-10003] [Port-configurability] Non-root instances and multiple sudo instances in a box cannot be 'offline' upgraded Created: 24/Jan/14  Updated: 27/Mar/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.5.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Unix/Linux


 Description   
Scenario
------------
As of today, we do not support offline 'upgrade' per se for packages installed as non-root/sudo users. Upgrades are usually handled by package managers. Since these are absent for non-root users and rpm cannot handle more than a single package upgrade (if there are many instances running), offline upgrades are not supported (confirmed with Bin).

ALL non-root installations will be affected by this limitation. Although a single instance running on a box under a sudo user can be offline upgraded, this cannot be extended to more than one such instance.

This is important

Workaround
-----------------
- Online upgrade (swap with nodes running latest build, take old nodes down and do clean install)
- Backup data and restore after fresh install (cbbackup and cbrestore)

Note : At this point, these are mere suggestions and both these workarounds haven't been tested yet.




[MB-10146] Document editor overwrites precision of long numbers Created: 06/Feb/14  Updated: 09/May/14

Status: Reopened
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged

 Description   
Just tested this out, not sure what diagnostics to capture so please let me know.

Simple test case:
-Create new document via document editor in UI
-Document contents are:
{"id": 18446744072866779556}
-As soon as you save, the above number is rewritten to:
{
  "id": 18446744072866780000
}
-The same effect is had if you edit a document that was inserted with the above "long" number

 Comments   
Comment by Aaron Miller (Inactive) [ 06/Feb/14 ]
It's worth noting that views will always suffer from this, as it is a limitation of JavaScript in general. Many JSON libraries have this behavior as well (even though they don't *have* to).
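For illustration, the same loss can be reproduced outside the browser by forcing the value through an IEEE-754 double, which is how JavaScript represents every number (a small Python sketch):

n = 18446744072866779556            # value from the test case, above 2**53

as_double = float(n)                # what a JavaScript Number would hold
print(int(as_double))               # 18446744072866779136: precision is lost
print(int(as_double) == n)          # False

# JavaScript prints the shortest decimal that maps back to the same double,
# 18446744072866780000, which is exactly what the document editor saves.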
Comment by Aleksey Kondratenko [ 11/Apr/14 ]
cannot fix it. Just closing. If you want to reopen, please pass it to somebody responsible for overall design.
Comment by Perry Krug [ 11/Apr/14 ]
Reopening and assigning to docs, we need this to be release noted IMO.
Comment by Ruth Harris [ 14/Apr/14 ]
Reassigning to Anil. He makes the call on what we put in the release notes for known and fixed issues.
Comment by Anil Kumar [ 09/May/14 ]
Ruth - Lets release note this for 3.0.




[MB-11282] Separate stats for internal memory allocation (application vs. data) Created: 02/Jun/14  Updated: 02/Jun/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Story Priority: Critical
Reporter: Pavel Paulau Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
AFAIK, we currently track allocations for data and for the application together.

But sometimes the application (memcached / ep-engine) overhead is huge and cannot be ignored.




[MB-11250] Go-Coucbase: Provide DML APIs using CAS Created: 29/May/14  Updated: 18/Jun/14  Due: 30/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Gerald Sangudi Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-11247] Go-Couchbase: Use password to connect to SASL buckets Created: 29/May/14  Updated: 19/Jun/14  Due: 30/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Gerald Sangudi Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Gerald Sangudi [ 19/Jun/14 ]
https://github.com/couchbaselabs/query/blob/master/docs/n1ql-authentication.md




[MB-11208] stats.org should be installed Created: 27/May/14  Updated: 27/May/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: techdebt-backlog
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Trond Norbye Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
stats.org contains a description of the stats we're sending from ep-engine. It could be useful for people, so it should be installed.

 Comments   
Comment by Matt Ingenthron [ 27/May/14 ]
If it's "useful" shouldn't this be part of official documentation? I've often thought it should be. There's probably a duplicate here somewhere.

I also think the stats need stability labels applied as people may rely on stats when building their own integration/monitoring tools. COMMITTED, UNCOMMITTED, VOLATILE, etc. would be useful for the stats.

Relatedly, someone should document deprecation of TAP stats for 3.0.




[MB-11195] Support binary collation for views Created: 23/May/14  Updated: 16/Jun/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sriram Melkote Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
N1QL would benefit significantly if we could allow memcmp() collation for views it creates. So much so that we should consider this for a minor release after 3.0 so it can be available for N1QL beta.




[MB-11102] extended documentation about stats flowing out of CBSTATS and the correlation between them Created: 12/May/14  Updated: 12/May/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Update the documentation about the stats flowing out of CBSTATS and the correlation between them. This is needed to accurately predict capacity and other bottlenecks, as well as to detect trends.




[MB-11101] supported go SDK for couchbase server Created: 12/May/14  Updated: 16/Jun/14

Status: Open
Project: Couchbase Server
Component/s: clients
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
go client




[MB-11098] Ability to set block size written to storage for better alignment with SSDs and/or HDDs for better throughput performance Created: 12/May/14  Updated: 12/May/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Cihan Biyikoglu Assignee: Cihan Biyikoglu
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Ability to set the block size written to storage for better alignment with SSDs and/or HDDs, for better throughput performance.




[MB-10767] DOC: Misc - DITA conversion Created: 04/Apr/14  Updated: 04/Apr/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Ruth Harris Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-10651] The guide for install user defined port doesn't work for Rest port change Created: 26/Mar/14  Updated: 17/Jun/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Larry Liu Assignee: Aruna Piravi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
http://docs.couchbase.com/couchbase-manual-2.5/cb-install/#install-user-defined-ports

I followed the instructions to change the admin port (REST port) by appending the following to the /opt/couchbase/etc/couchbase/static_config file:
{rest_port, 9000}.

[root@localhost bin]# netstat -an| grep 9000
[root@localhost bin]# netstat -an| grep :8091
tcp 0 0 0.0.0.0:8091 0.0.0.0:* LISTEN

logs:
https://s3.amazonaws.com/customers.couchbase.com/larry/output.zip

Larry



 Comments   
Comment by Larry Liu [ 26/Mar/14 ]
The log files show that the change was picked up by the server:

[ns_server:info,2014-03-26T19:13:24.063,nonode@nohost:<0.58.0>:ns_server:log_pending:30]Static config terms:
[{error_logger_mf_dir,"/opt/couchbase/var/lib/couchbase/logs"},
 {error_logger_mf_maxbytes,10485760},
 {error_logger_mf_maxfiles,20},
 {path_config_bindir,"/opt/couchbase/bin"},
 {path_config_etcdir,"/opt/couchbase/etc/couchbase"},
 {path_config_libdir,"/opt/couchbase/lib"},
 {path_config_datadir,"/opt/couchbase/var/lib/couchbase"},
 {path_config_tmpdir,"/opt/couchbase/var/lib/couchbase/tmp"},
 {nodefile,"/opt/couchbase/var/lib/couchbase/couchbase-server.node"},
 {loglevel_default,debug},
 {loglevel_couchdb,info},
 {loglevel_ns_server,debug},
 {loglevel_error_logger,debug},
 {loglevel_user,debug},
 {loglevel_menelaus,debug},
 {loglevel_ns_doctor,debug},
 {loglevel_stats,debug},
 {loglevel_rebalance,debug},
 {loglevel_cluster,debug},
 {loglevel_views,debug},
 {loglevel_mapreduce_errors,debug},
 {loglevel_xdcr,debug},
 {rest_port,9000}]
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
This is because the rest_port entry in static_config is only taken into account on a fresh install.

There's a way to install our package without starting the server first, and that has to be documented. I don't know who owns working with the docs people.
Comment by Anil Kumar [ 09/May/14 ]
Alk - before this goes to documentation we need to test it and verify the instructions. Can you provide those instructions and assign this ticket to Aruna to test them?
Comment by Anil Kumar [ 03/Jun/14 ]
Alk - can you provide those instructions and assign this ticket to Aruna to test it.
Comment by Aleksey Kondratenko [ 04/Jun/14 ]
The instructions fail to mention that rest_port must be changed before config.dat is written, and config.dat is initialized on the first server start.

There is a way to install the server without starting it.

But here's what I managed to do:

# dpkg -i ~/Desktop/forReview/couchbase-server-enterprise_ubuntu_1204_x86_2.5.1-1086-rel.deb

# /etc/init.d/couchbase-server stop

# rm /opt/couchbase/var/lib/couchbase/config/config.dat

# emacs /opt/couchbase/etc/couchbase/static_config

# /etc/init.d/couchbase-server start

I.e. I stopped the service, removed config.dat, edited static_config, then started it back up and found the REST port to be updated.
Comment by Anil Kumar [ 04/Jun/14 ]
Thanks Alk. Assigning this to Aruna for verification and later please assign this ticket to Documentation (Ruth).




[MB-10531] No longer necessary to wait for persistence to issue stale=false query Created: 21/Mar/14  Updated: 25/Mar/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Sriram Melkote Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Matt pointed out that in the past we had to wait for an item to persist to disk before issuing a stale=false query to get correct results. In 3.0 this is not necessary. One can issue a stale=false view query at any time, and the results will include all changes that had been made when the query was issued. This task is a placeholder to update the 3.0 docs to remove the unnecessary step of waiting for persistence.

 Comments   
Comment by Matt Ingenthron [ 21/Mar/14 ]
Correct. Thanks for making sure this is raised Siri. While I'm thinking of it, two points need to be in there:
1) if you have older code, you will need to change it to take advantage of the semantic change to the query
2) application developers still need to be a bit careful to ensure any modifications being done aren't async operations-- they'll have to wait for the responses before doing the stale=false query
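
For illustration, a minimal Python sketch of the 3.0 behavior being documented (the bucket, design document, and view names are hypothetical; the only prerequisite is that the preceding writes have already been acknowledged, per point 2 above):

import json
import urllib.request

def query_fresh(host, bucket, ddoc, view):
    # stale=false makes the index include every mutation acknowledged before
    # the query was issued; in 3.0 no wait-for-persistence step is needed.
    url = ("http://%s:8092/%s/_design/%s/_view/%s?stale=false"
           % (host, bucket, ddoc, view))
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))["rows"]

# Example (hypothetical names): rows = query_fresh("127.0.0.1", "default", "dev_players", "by_name")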
Comment by Anil Kumar [ 25/Mar/14 ]
This is for 3.0 documentation.
Comment by Sriram Melkote [ 25/Mar/14 ]
Not an improvement. This is a task.




[MB-10511] Feature request for supporting rolling downgrades Created: 19/Mar/14  Updated: 11/Apr/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Abhishek Singh Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to

 Description   
Some customers are interested in Couchbase supporting rolling downgrades. Currently we can't add 2.2 nodes inside a cluster that has all nodes on 2.5.




[MB-10512] Update documentation to convey we don't support rolling downgrades Created: 19/Mar/14  Updated: 27/Mar/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Task Priority: Critical
Reporter: Abhishek Singh Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Update documentation to convey we don't support rolling downgrades to 2.2 once all nodes are running on 2.5




[MB-10432] Removed ep_max_txn_size stat/engine_parameter Created: 11/Mar/14  Updated: 11/Mar/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug Priority: Critical
Reporter: Mike Wiederhold Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This value is no longer used in the server. Please note that you need to update the documentation for cbepctl, since this stat could previously be set with that script.




[MB-10430] Add AWS AMI documentation to Installation and Upgrade Guide Created: 11/Mar/14  Updated: 25/Mar/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 3.0
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Brian Shumate Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
It would be useful to have some basic installation instructions for those who want to use the Couchbase Server Amazon Machine Images (AMIs) directly, without RightScale.

This is particularly with regard to the special case of the Administrator user and password, which can be a stumbling point for some users.


 Comments   
Comment by Anil Kumar [ 25/Mar/14 ]
Ruth - Please add reference to Couchbase on AWS Whitepaper - http://aws.typepad.com/aws/2013/08/running-couchbase-on-aws-new-white-paper.html that has all the information.




[MB-10379] index is not used for simple query Created: 06/Mar/14  Updated: 28/May/14  Due: 20/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: 2.5.0
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Iryna Mironava Assignee: Iryna Mironava
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 64-bit

Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
I created an index on the name field of bucket b0 (my_name) and then a my_skill index on b0:
cbq> select * from :system.indexes
{
    "resultset": [
        {
            "bucket_id": "b0",
            "id": "#alldocs",
            "index_key": [
                "META().id"
            ],
            "index_type": "view",
            "name": "#alldocs",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        },
        {
            "bucket_id": "b0",
            "id": "my_name",
            "index_key": [
                "name"
            ],
            "index_type": "view",
            "name": "my_name",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        },
       {
            "bucket_id": "b0",
            "id": "my_skill",
            "index_key": [
                "skills"
            ],
            "index_type": "view",
            "name": "my_skill",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        },
        {
            "bucket_id": "b1",
            "id": "#alldocs",
            "index_key": [
                "META().id"
            ],
            "index_type": "view",
            "name": "#alldocs",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        },
        {
            "bucket_id": "default",
            "id": "#alldocs",
            "index_key": [
                "META().id"
            ],
            "index_type": "view",
            "name": "#alldocs",
            "pool_id": "default",
            "site_id": "http://localhost:8091"
        }
    ],
    "info": [
        {
            "caller": "http_response:160",
            "code": 100,
            "key": "total_rows",
            "message": "4"
        },
        {
            "caller": "http_response:162",
            "code": 101,
            "key": "total_elapsed_time",
            "message": "1.185438ms"
        }
    ]
}

I see my view in the UI and I can query it,
but explain says I am still using #alldocs:

cbq> explain select name from b0
{
    "resultset": [
        {
            "input": {
                "as": "b0",
                "bucket": "b0",
                "ids": null,
                "input": {
                    "as": "",
                    "bucket": "b0",
                    "cover": false,
                    "index": "#alldocs",
                    "pool": "default",
                    "ranges": null,
                    "type": "scan"
                },
                "pool": "default",
                "projection": null,
                "type": "fetch"
            },
            "result": [
                {
                    "as": "name",
                    "expr": {
                        "left": {
                            "path": "b0",
                            "type": "property"
                        },
                        "right": {
                            "path": "name",
                            "type": "property"
                        },
                        "type": "dot_member"
                    },
                    "star": false
                }
            ],
            "type": "projector"
        }
    ],
    "info": [
        {
            "caller": "http_response:160",
            "code": 100,
            "key": "total_rows",
            "message": "1"
        },
        {
            "caller": "http_response:162",
            "code": 101,
            "key": "total_elapsed_time",
            "message": "1.236104ms"
        }
    ]
}
I see the same result for skills.


 Comments   
Comment by Sriram Melkote [ 07/Mar/14 ]
I think the current implementation considers secondary indexes only for filtering operations. When you do SELECT <anything> FROM <bucket>, it is a full bucket scan, and that is implemented by #alldocs and by #primary index only.

So the current behavior looks to be correct. Try running "CREATE PRIMARY INDEX USING VIEW" and please see if the query will then switch from #alldocs to #primary. Please also try adding a filter, like WHERE name > 'Mary' and see if the my_name index gets used for the filtering.

As a side note, what you're running is a covered query, where all the data necessary is held in a secondary index completely. However, this is not implemented. A secondary index is only used as an access path, and not as a source of data.
Comment by Gerald Sangudi [ 11/Mar/14 ]
This particular query will always use #primary or #alldocs. Even for documents without a "name" field, we return a result object that is missing the "name" field.

@Iryna, please test WHERE name IS NOT MISSING to see if it uses the index. If not, we'll fix that for DP4. Thanks.




[MB-9883] High CPU utilization of SSL proxy (both source and dest) Created: 09/Jan/14  Updated: 17/Apr/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 2.5.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: ns_server-story, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 2.5.0-1032

Platform = Physical
OS = CentOS 6.5
CPU = Intel Xeon E5-2630
Memory = 64 GB
Disk = 2 x SSD

Attachments: JPEG File cpu_agg_nossl.jpeg     JPEG File cpu_agg_ssl.jpeg    
Triage: Triaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://ci.sc.couchbase.com/job/perf-dev/17/artifact/

 Description   
-- 4 -> 4, unidir, xmem, 1 bucket, moderate DGM
-- Initial replication

Based on manual observation, confirmed by your internal counters.

 Comments   
Comment by Wayne Siu [ 15/Jan/14 ]
Deferring from 2.5. Potential candidate for 2.5.1. ns_server team will make changes on top of 2.5.1.
Comment by Pavel Paulau [ 15/Jan/14 ]
Some benchmarks from our environment:

# openssl speed aes
Doing aes-128 cbc for 3s on 16 size blocks: 15847532 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 4282124 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1078408 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 274532 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 34227 aes-128 cbc's in 2.99s
Doing aes-192 cbc for 3s on 16 size blocks: 13432096 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 64 size blocks: 3576384 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 256 size blocks: 906793 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 1024 size blocks: 227850 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 8192 size blocks: 28528 aes-192 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 11684190 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 3014948 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 771234 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 190996 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 24076 aes-256 cbc's in 3.00s
OpenSSL 1.0.1e-fips 11 Feb 2013
built on: Tue Dec 3 20:18:14 UTC 2013
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Wa,--noexecstack -DPURIFY -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 84802.85k 91351.98k 92024.15k 93706.92k 93775.11k
aes-192 cbc 71637.85k 76296.19k 77379.67k 78032.91k 77900.46k
aes-256 cbc 62315.68k 64318.89k 66032.07k 65193.30k 65743.53k

This is what one core can do. Notice that during the test a single node was able to serve at most 8K documents/sec (~2KB each), utilizing 3-4 cores on average.
Comment by Aleksey Kondratenko [ 15/Jan/14 ]
BTW, one core can get much higher than that when AES-NI hardware support is enabled, but those benchmarks don't use it by default. You need to pass -evp, as noted for example here: http://stackoverflow.com/questions/19307909/how-do-i-enable-aes-ni-hardware-acceleration-for-node-js-crypto-on-linux

With AES-NI enabled I've seen my box show more than one _billion_ bytes per second for 128-bit AES! So there's definitely large potential here. And I bet Erlang is unable to utilize it, i.e. perf showed us that Erlang did not use the AES-NI-enabled versions of AES in OpenSSL, at least on my box.
Comment by Pavel Paulau [ 20/Jan/14 ]
Just for quantification: in a bidirectional scenario with encryption we should easily expect 3x higher CPU utilization on both the source and destination sides. The absolute numbers obviously depend on the rate of replication.
Comment by Wayne Siu [ 21/Jan/14 ]
Alk will provide a build to Pavel to test. Will review the results in the next meeting.
Wanted to check
a. how much CPU has improved.
b. if there is any added latency.
Comment by Aleksey Kondratenko [ 21/Jan/14 ]
>> b. if there is any added latency.

and c. if there's any throughput change.

Also if possible I'd like to see results with/without rc4 and GSO.
Comment by Pavel Paulau [ 22/Jan/14 ]
a. 1.5-2x lower CPU utilization than in build 2.5.0-1054.
b. No extra latency.
c. No change.

GSO/TSO results will be reported in MB-9896.
Comment by Wayne Siu [ 23/Jan/14 ]
Lowering the priority to Critical as the fix has helped/improved the CPU utilization in build 1054.
Keep this ticket open for further optimization in 3.0.
Comment by Cihan Biyikoglu [ 11/Mar/14 ]
I think we need to be below 5-10% overhead with SSL. We should look for ways to ensure this is a feature we can recommend in general.
Comment by Aleksey Kondratenko [ 17/Apr/14 ]
Moved out of 3.0




[MB-9446] there's chance of starting janitor while not having latest version of config (was: On reboot entire cluster , see many conflicting bucket config changes frequently.) Created: 30/Oct/13  Updated: 04/Jun/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.0, 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Ketaki Gangal Assignee: Aliaksey Artamonau
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 0.0.0-7040toy

Triage: Triaged
Is this a Regression?: Yes

 Description   

Load items on a cluster, build toy-000-704.
Reboot the cluster.

Post reboot, I see a lot of conflicting bucket config messages in the web logs.

Cluster logs here: https://s3.amazonaws.com/bugdb/bug_9445/9435.tar

Sample

{fastForwardMap,undefined}]}]}]}, choosing the former, which looks newer.
ns_config003 ns_1@soursop-s11207.sc.couchbase.com 18:59:30 - Wed Oct 30, 2013
Conflicting configuration changes to field buckets:
{[{'ns_1@172.23.105.45',{5088,63550403967}},
{'ns_1@soursop-s11203.sc.couchbase.com',{1,63550403967}},
{'ns_1@soursop-s11204.sc.couchbase.com',{1764,63550403283}}],
[{'_vclock',[{'ns_1@172.23.105.45',{5088,63550403967}},
{'ns_1@soursop-s11203.sc.couchbase.com',{1,63550403967}},
{'ns_1@soursop-s11204.sc.couchbase.com',{1764,63550403283}}]},
{configs,[{"saslbucket",
[{uuid,<<"b51edfdad356db7e301d9b32c6ef47a3">>},
{num_replicas,1},
{replica_index,false},
{ram_quota,3355443200},
{auth_type,sasl},
{sasl_password,"password"},
{autocompaction,false},
{purge_interval,undefined},
{flush_enabled,false},
{num_threads,3},
{type,membase},
{num_vbuckets,1024},
{servers,['ns_1@soursop-s11203.sc.couchbase.com',
'ns_1@soursop-s11204.sc.couchbase.com',
'ns_1@soursop-s11205.sc.couchbase.com',
'ns_1@soursop-s11207.sc.couchbase.com']},
{map,[['ns_1@soursop-s11207.sc.couchbase.com',
'ns_1@soursop-s11205.sc.couchbase.com'],
['ns_1@soursop-s11207.sc.couchbase.com',
'ns_1@soursop-s11203.sc.couchbase.com'],
['ns_1@soursop-s11207.sc.couchbase.com',
'ns_1@soursop-s11204.sc.couchbase.com'],

 Comments   
Comment by Aleksey Kondratenko [ 30/Oct/13 ]
Very weird. But if this is indeed an issue, there's likely exactly the same issue on 2.5.0, and if that's the case it looks pretty scary.
Comment by Aliaksey Artamonau [ 01/Nov/13 ]
I set the affected version to 2.5 because I really know that it affects 2.5, and actually many preceding releases.
Comment by Maria McDuff (Inactive) [ 31/Jan/14 ]
Alk,

is this already merged in 2.5? pls confirm and mark as resolved if that's the case, assign back to QE.
Thanks.
Comment by Aliaksey Artamonau [ 31/Jan/14 ]
No, it's not fixed in 2.5.
Comment by Anil Kumar [ 04/Jun/14 ]
Triage - 06/04/2014 Alk, Wayne, Parag, Anil




[MB-9356] tuq crash during query + rebalance having 1M items Created: 16/Oct/13  Updated: 18/Jun/14  Due: 23/Jun/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP3
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos 64 bit

Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
Initial setup:
2 buckets, 1M items each of them, 1 node

Steps:
1) start a query using curl
2) add a node and start rebalance
3) start same query using tuq_client console.


[root@localhost tuqtng]# ./tuqtng -couchbase http://localhost:8091
07:19:57.406786 tuqtng started...
07:19:57.406880 version: 0.0.0
07:19:57.406887 site: http://localhost:8091
panic: runtime error: index out of range

goroutine 283323 [running]:
github.com/couchbaselabs/go-couchbase.func·001(0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:151 +0x4f1
github.com/couchbaselabs/go-couchbase.(*Bucket).doBulkGet(0xc2000e8480, 0xc200c101fc, 0xc20043af70, 0x1, 0x1, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:188 +0x150
github.com/couchbaselabs/go-couchbase.func·002()
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:212 +0x115
created by github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:218 +0x1ef

goroutine 1 [chan receive]:
github.com/couchbaselabs/tuqtng/server.Server(0x8464a0, 0x5, 0x7fff3931ab76, 0x15, 0x8554a0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:66 +0x4f4
main.main()
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/main.go:71 +0x28a

goroutine 2 [syscall]:

goroutine 4 [syscall]:
os/signal.loop()
/usr/local/go/src/pkg/os/signal/signal_unix.go:21 +0x1c
created by os/signal.init·1
/usr/local/go/src/pkg/os/signal/signal_unix.go:27 +0x2f

goroutine 13 [chan send]:
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).SendResult(0xc2001cb930, 0x70d420, 0xc200a27880)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:47 +0x46
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).processItem(0xc2001aa080, 0xc2001c9a80, 0xc2001c99c0, 0xc200a27080, 0x2b5e52485d01, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:119 +0x119
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).executeInternal(0xc2001aa080, 0xc2001d7e10, 0xc2001c9a80, 0xc2001c99c0, 0xc2001e2720, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:90 +0x2a7
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).Execute(0xc2001aa080, 0xc2001d7e10, 0xc2001c9a80, 0xc2001c99c0, 0xc2000004f8, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:42 +0x100
github.com/couchbaselabs/tuqtng/server.Dispatch(0xc2001c9a80, 0xc2001c99c0, 0xc2001c1b10, 0xc2001ab000, 0xc2001c1b40, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:85 +0x191
created by github.com/couchbaselabs/tuqtng/server.Server
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:67 +0x59c

goroutine 6 [chan receive]:
main.dumpOnSignal(0x2b5e52484fa0, 0x1, 0x1)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/main.go:80 +0x7f
main.dumpOnSignalForPlatform()
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/dump.go:19 +0x80
created by main.main
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/main.go:62 +0x1d7

goroutine 7 [IO wait]:
net.runtime_pollWait(0x2aaaaabacf00, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001242c0, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).accept(0xc200124240, 0x90ae00, 0x0, 0xc200198660, 0xb, ...)
/usr/local/go/src/pkg/net/fd_unix.go:385 +0x2c1
net.(*TCPListener).AcceptTCP(0xc2000005f8, 0x4443f6, 0x2b5e52483e28, 0x4443f6)
/usr/local/go/src/pkg/net/tcpsock_posix.go:229 +0x45
net.(*TCPListener).Accept(0xc2000005f8, 0xc200125420, 0xc2001ac2b0, 0xc2001f46c0, 0x0, ...)
/usr/local/go/src/pkg/net/tcpsock_posix.go:239 +0x25
net/http.(*Server).Serve(0xc200107a50, 0xc2001488c0, 0xc2000005f8, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/server.go:1542 +0x85
net/http.(*Server).ListenAndServe(0xc200107a50, 0xc200107a50, 0xc2001985d0)
/usr/local/go/src/pkg/net/http/server.go:1532 +0x9e
net/http.ListenAndServe(0x846860, 0x5, 0xc2001985d0, 0xc200107960, 0x0, ...)
/usr/local/go/src/pkg/net/http/server.go:1597 +0x65
github.com/couchbaselabs/tuqtng/network/http.func·001()
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http.go:37 +0x6c
created by github.com/couchbaselabs/tuqtng/network/http.NewHttpEndpoint
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http.go:41 +0x2a0

goroutine 31 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).SendItem(0xc2001e29c0, 0xc200eb0a40, 0x85b2c0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:151 +0xbe
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).scanRange(0xc2001e29c0, 0x0, 0x8a7970)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:121 +0x640
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).Run(0xc2001e29c0, 0xc2001e2960)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:61 +0xdc
created by github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:97 +0xe3

goroutine 11 [IO wait]:
net.runtime_pollWait(0x2aaaaabace60, 0x77, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitWrite(0xc200124620, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:80 +0x31
net.(*netFD).Write(0xc2001245a0, 0xc2001af000, 0x44, 0x1000, 0x4, ...)
/usr/local/go/src/pkg/net/fd_unix.go:294 +0x3e6
net.(*conn).Write(0xc2000008f0, 0xc2001af000, 0x44, 0x1000, 0x452dd2, ...)
/usr/local/go/src/pkg/net/net.go:131 +0xc3
net/http.(*switchWriter).Write(0xc2001ad040, 0xc2001af000, 0x44, 0x1000, 0x4d5989, ...)
/usr/local/go/src/pkg/net/http/chunked.go:0 +0x62
bufio.(*Writer).Flush(0xc200148f40, 0xc20092e6b4, 0x34)
/usr/local/go/src/pkg/bufio/bufio.go:465 +0xb9
net/http.(*chunkWriter).flush(0xc2001cb8e0)
/usr/local/go/src/pkg/net/http/server.go:270 +0x59
net/http.(*response).Flush(0xc2001cb8c0)
/usr/local/go/src/pkg/net/http/server.go:953 +0x5d
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).ProcessResults(0xc2001cb930, 0x2, 0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:109 +0x16b
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).Process(0xc2001cb930, 0x40519c, 0x71b260)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:61 +0x52
github.com/couchbaselabs/tuqtng/network/http.(*HttpQuery).Process(0xc2001c99c0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_query.go:72 +0x29
github.com/couchbaselabs/tuqtng/network/http.(*HttpEndpoint).ServeHTTP(0xc200000508, 0xc2001b1140, 0xc2001cb8c0, 0xc2001cd000)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http.go:55 +0xcd
github.com/gorilla/mux.(*Router).ServeHTTP(0xc200107960, 0xc2001b1140, 0xc2001cb8c0, 0xc2001cd000)
/tmp/gocode/src/github.com/gorilla/mux/mux.go:90 +0x1e1
net/http.serverHandler.ServeHTTP(0xc200107a50, 0xc2001b1140, 0xc2001cb8c0, 0xc2001cd000)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc200124630)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x266

goroutine 21 [chan receive]:
github.com/couchbaselabs/tuqtng/catalog/couchbase.keepPoolFresh(0xc2001fa200)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/couchbase.go:157 +0x4b
created by github.com/couchbaselabs/tuqtng/catalog/couchbase.newPool
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/couchbase.go:149 +0x34c

goroutine 30 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).SendItem(0xc2001c1660, 0xc200a27940, 0xc200a27940)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:49 +0xbf
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).flushBatch(0xc2001e1c40, 0xc2001e2a00)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:141 +0x7cd
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).processItem(0xc2001e1c40, 0xc200390c00, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:78 +0xd9
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator(0xc2001c1660, 0xc2002015f0, 0xc2001e1c40, 0xc2001e2840)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:107 +0x1b0
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).Run(0xc2001e1c40, 0xc2001e2840)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:57 +0xa8
created by github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:97 +0xe3

goroutine 32 [chan send]:
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanRange(0xc200201500, 0x0, 0x0, 0x0, 0x0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:179 +0x386
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanEntries(0xc200201500, 0x0, 0xc2001e2b40, 0xc2001e2ba0, 0xc2001250c0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:112 +0x78
created by github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).scanRange
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:82 +0x18b

goroutine 29 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).SendItem(0xc2001c1600, 0xc200a27140, 0x87ea70)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:49 +0xbf
github.com/couchbaselabs/tuqtng/xpipeline.(*Project).processItem(0xc2001c1630, 0xc200a27140, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/project.go:95 +0x33b
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator(0xc2001c1600, 0xc2002015a0, 0xc2001c1630, 0xc2001e2ae0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:107 +0x1b0
github.com/couchbaselabs/tuqtng/xpipeline.(*Project).Run(0xc2001c1630, 0xc2001e2ae0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/project.go:46 +0x91
created by github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).executeInternal
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:79 +0x1c7

goroutine 33 [chan send]:
github.com/couchbaselabs/tuqtng/catalog/couchbase.WalkViewInBatches(0xc2001251e0, 0xc2001252a0, 0xc2000e8480, 0x845260, 0x0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_util.go:90 +0x424
created by github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanRange
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:159 +0x209

goroutine 165124 [select]:
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).executeInternal(0xc2001aa080, 0xc2005d6340, 0xc2001c9a80, 0xc200282580, 0xc2005bba80, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:87 +0x667
github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).Execute(0xc2001aa080, 0xc2005d6340, 0xc2001c9a80, 0xc200282580, 0xc2000004f8, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:42 +0x100
github.com/couchbaselabs/tuqtng/server.Dispatch(0xc2001c9a80, 0xc200282580, 0xc2001c1b10, 0xc2001ab000, 0xc2001c1b40, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:85 +0x191
created by github.com/couchbaselabs/tuqtng/server.Server
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/server/server.go:67 +0x59c

goroutine 283325 [chan receive]:
github.com/dustin/gomemcached/client.(*Client).GetBulk(0xc200d55990, 0xc200d50092, 0xc2001d7d60, 0x1, 0x1, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:228 +0x3c3
github.com/couchbaselabs/go-couchbase.func·001(0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:158 +0x1dc
github.com/couchbaselabs/go-couchbase.(*Bucket).doBulkGet(0xc2000e8480, 0xc200c10092, 0xc2001d7d60, 0x1, 0x1, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:188 +0x150
github.com/couchbaselabs/go-couchbase.func·002()
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:212 +0x115
created by github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:218 +0x1ef

goroutine 283623 [runnable]:
net.runtime_pollWait(0x2aaaaabac820, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001f46b0, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).Read(0xc2001f4630, 0xc20063fda0, 0x18, 0x18, 0x0, ...)
/usr/local/go/src/pkg/net/fd_unix.go:195 +0x2b3
net.(*conn).Read(0xc2002f2658, 0xc20063fda0, 0x18, 0x18, 0x1, ...)
/usr/local/go/src/pkg/net/net.go:123 +0xc3
io.ReadAtLeast(0xc200198840, 0xc2002f2658, 0xc20063fda0, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:284 +0xf7
io.ReadFull(0xc200198840, 0xc2002f2658, 0xc20063fda0, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:302 +0x6f
github.com/dustin/gomemcached.(*MCResponse).Receive(0xc2002b4d80, 0xc200198840, 0xc2002f2658, 0xc20063fda0, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/mc_res.go:155 +0xc7
github.com/dustin/gomemcached/client.getResponse(0xc200198840, 0xc2002f2658, 0xc20063fda0, 0x18, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/transport.go:30 +0xc6
github.com/dustin/gomemcached/client.(*Client).Receive(0xc200b13f30, 0xc2002b4c00, 0x0, 0x0)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:81 +0x67
github.com/dustin/gomemcached/client.func·003()
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:193 +0xaf
created by github.com/dustin/gomemcached/client.(*Client).GetBulk
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:207 +0x1e6

goroutine 165127 [chan receive]:
github.com/couchbaselabs/go-couchbase.(*Bucket).GetBulk(0xc2000e8480, 0xc200977000, 0x3e8, 0x3e8, 0xc200000001, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:278 +0x341
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*bucket).BulkFetch(0xc2001c11e0, 0xc200977000, 0x3e8, 0x3e8, 0x15, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/couchbase.go:249 +0x83
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).flushBatch(0xc2006e8230, 0xc2005bbd00)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:113 +0x35d
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).processItem(0xc2006e8230, 0xc200c19840, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:78 +0xd9
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator(0xc200257300, 0xc2002015f0, 0xc2006e8230, 0xc2005bbba0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:107 +0x1b0
github.com/couchbaselabs/tuqtng/xpipeline.(*Fetch).Run(0xc2006e8230, 0xc2005bbba0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/fetch.go:57 +0xa8
created by github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:97 +0xe3

goroutine 165126 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator(0xc2002572a0, 0xc2002015a0, 0xc2002572d0, 0xc2005bbe40)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:104 +0x32c
github.com/couchbaselabs/tuqtng/xpipeline.(*Project).Run(0xc2002572d0, 0xc2005bbe40)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/project.go:46 +0x91
created by github.com/couchbaselabs/tuqtng/executor/interpreted.(*InterpretedExecutor).executeInternal
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/executor/interpreted/interpreted.go:79 +0x1c7

goroutine 165123 [chan receive]:
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).ProcessResults(0xc2006e80e0, 0x2, 0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:88 +0x3c
github.com/couchbaselabs/tuqtng/network/http.(*HttpResponse).Process(0xc2006e80e0, 0x40519c, 0x71b260)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_response.go:61 +0x52
github.com/couchbaselabs/tuqtng/network/http.(*HttpQuery).Process(0xc200282580)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http_query.go:72 +0x29
github.com/couchbaselabs/tuqtng/network/http.(*HttpEndpoint).ServeHTTP(0xc200000508, 0xc2001b1140, 0xc2006e8070, 0xc2001cdea0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/network/http/http.go:55 +0xcd
github.com/gorilla/mux.(*Router).ServeHTTP(0xc200107960, 0xc2001b1140, 0xc2006e8070, 0xc2001cdea0)
/tmp/gocode/src/github.com/gorilla/mux/mux.go:90 +0x1e1
net/http.serverHandler.ServeHTTP(0xc200107a50, 0xc2001b1140, 0xc2006e8070, 0xc2001cdea0)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc2001f46c0)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x266

goroutine 165129 [select]:
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanRange(0xc200201500, 0x0, 0x0, 0x0, 0x0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:166 +0x6b9
github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanEntries(0xc200201500, 0x0, 0xc2005bbea0, 0xc2005bbf00, 0xc2005bbf60, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:112 +0x78
created by github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).scanRange
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:82 +0x18b

goroutine 165128 [select]:
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).scanRange(0xc2005bbd20, 0x0, 0x8a7970)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:99 +0x7a2
github.com/couchbaselabs/tuqtng/xpipeline.(*Scan).Run(0xc2005bbd20, 0xc2005bbcc0)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/scan.go:61 +0xdc
created by github.com/couchbaselabs/tuqtng/xpipeline.(*BaseOperator).RunOperator
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/xpipeline/base.go:97 +0xe3

goroutine 165130 [select]:
net/http.(*persistConn).roundTrip(0xc200372f00, 0xc200765660, 0xc200372f00, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/transport.go:857 +0x6c7
net/http.(*Transport).RoundTrip(0xc20012e080, 0xc2004dac30, 0xc2005a5808, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/transport.go:186 +0x396
net/http.send(0xc2004dac30, 0xc2000e7e70, 0xc20012e080, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/client.go:166 +0x3a1
net/http.(*Client).send(0xb7fcc0, 0xc2004dac30, 0x7c, 0x2b5e52429020, 0xc200fca2c0, ...)
/usr/local/go/src/pkg/net/http/client.go:100 +0xcd
net/http.(*Client).doFollowingRedirects(0xb7fcc0, 0xc2004dac30, 0x90ae80, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/client.go:282 +0x5ff
net/http.(*Client).Do(0xb7fcc0, 0xc2004dac30, 0xc20052cae0, 0x0, 0x0, ...)
/usr/local/go/src/pkg/net/http/client.go:129 +0x8d
github.com/couchbaselabs/go-couchbase.(*Bucket).ViewCustom(0xc2000e8480, 0x845260, 0x0, 0x873ab0, 0x9, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/views.go:115 +0x210
github.com/couchbaselabs/go-couchbase.(*Bucket).View(0xc2000e8480, 0x845260, 0x0, 0x873ab0, 0x9, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/views.go:155 +0xcc
github.com/couchbaselabs/tuqtng/catalog/couchbase.WalkViewInBatches(0xc20045b000, 0xc20045b060, 0xc2000e8480, 0x845260, 0x0, ...)
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_util.go:80 +0x2ce
created by github.com/couchbaselabs/tuqtng/catalog/couchbase.(*viewIndex).ScanRange
/tmp/gocode/src/github.com/couchbaselabs/tuqtng/catalog/couchbase/view_index.go:159 +0x209

goroutine 283324 [chan receive]:
github.com/dustin/gomemcached/client.(*Client).GetBulk(0xc200b13f30, 0xc200b1001b, 0xc200fce200, 0x2, 0x2, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:228 +0x3c3
github.com/couchbaselabs/go-couchbase.func·001(0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:158 +0x1dc
github.com/couchbaselabs/go-couchbase.(*Bucket).doBulkGet(0xc2000e8480, 0xc200c1001b, 0xc200fce200, 0x2, 0x2, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:188 +0x150
github.com/couchbaselabs/go-couchbase.func·002()
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:212 +0x115
created by github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:218 +0x1ef

goroutine 283622 [runnable]:
net.runtime_pollWait(0x2aaaaabacb40, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001f4e00, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).Read(0xc2001f4d80, 0xc200eda4e0, 0x18, 0x18, 0x0, ...)
/usr/local/go/src/pkg/net/fd_unix.go:195 +0x2b3
net.(*conn).Read(0xc20080b948, 0xc200eda4e0, 0x18, 0x18, 0x1, ...)
/usr/local/go/src/pkg/net/net.go:123 +0xc3
io.ReadAtLeast(0xc200198840, 0xc20080b948, 0xc200eda4e0, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:284 +0xf7
io.ReadFull(0xc200198840, 0xc20080b948, 0xc200eda4e0, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:302 +0x6f
github.com/dustin/gomemcached.(*MCResponse).Receive(0xc2002b4ba0, 0xc200198840, 0xc20080b948, 0xc200eda4e0, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/mc_res.go:155 +0xc7
github.com/dustin/gomemcached/client.getResponse(0xc200198840, 0xc20080b948, 0xc200eda4e0, 0x18, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/transport.go:30 +0xc6
github.com/dustin/gomemcached/client.(*Client).Receive(0xc2004bfa20, 0xc2002b4a20, 0x0, 0x0)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:81 +0x67
github.com/dustin/gomemcached/client.func·003()
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:193 +0xaf
created by github.com/dustin/gomemcached/client.(*Client).GetBulk
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:207 +0x1e6

goroutine 283331 [select]:
net/http.(*persistConn).writeLoop(0xc200372f00)
/usr/local/go/src/pkg/net/http/transport.go:774 +0x26f
created by net/http.(*Transport).dialConn
/usr/local/go/src/pkg/net/http/transport.go:512 +0x58b

goroutine 283321 [chan receive]:
github.com/couchbaselabs/go-couchbase.errorCollector(0xc20101a900, 0xc200372c80)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:246 +0x9f
created by github.com/couchbaselabs/go-couchbase.(*Bucket).GetBulk
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:275 +0x2f2

goroutine 283544 [select]:
net/http.(*persistConn).writeLoop(0xc2006cbd80)
/usr/local/go/src/pkg/net/http/transport.go:774 +0x26f
created by net/http.(*Transport).dialConn
/usr/local/go/src/pkg/net/http/transport.go:512 +0x58b

goroutine 283322 [chan receive]:
github.com/dustin/gomemcached/client.(*Client).GetBulk(0xc2004bfa20, 0xc2004b01ec, 0xc200e8cd60, 0x2, 0x2, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:228 +0x3c3
github.com/couchbaselabs/go-couchbase.func·001(0x0, 0x0)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:158 +0x1dc
github.com/couchbaselabs/go-couchbase.(*Bucket).doBulkGet(0xc2000e8480, 0xc200c101ec, 0xc200e8cd60, 0x2, 0x2, ...)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:188 +0x150
github.com/couchbaselabs/go-couchbase.func·002()
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:212 +0x115
created by github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:218 +0x1ef

goroutine 283621 [runnable]:
net.runtime_pollWait(0x2aaaaabac960, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001f4500, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).Read(0xc2001f4480, 0xc200a83940, 0x18, 0x18, 0x0, ...)
/usr/local/go/src/pkg/net/fd_unix.go:195 +0x2b3
net.(*conn).Read(0xc20084a5d8, 0xc200a83940, 0x18, 0x18, 0x1, ...)
/usr/local/go/src/pkg/net/net.go:123 +0xc3
io.ReadAtLeast(0xc200198840, 0xc20084a5d8, 0xc200a83940, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:284 +0xf7
io.ReadFull(0xc200198840, 0xc20084a5d8, 0xc200a83940, 0x18, 0x18, ...)
/usr/local/go/src/pkg/io/io.go:302 +0x6f
github.com/dustin/gomemcached.(*MCResponse).Receive(0xc2002b49c0, 0xc200198840, 0xc20084a5d8, 0xc200a83940, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/mc_res.go:155 +0xc7
github.com/dustin/gomemcached/client.getResponse(0xc200198840, 0xc20084a5d8, 0xc200a83940, 0x18, 0x18, ...)
/tmp/gocode/src/github.com/dustin/gomemcached/client/transport.go:30 +0xc6
github.com/dustin/gomemcached/client.(*Client).Receive(0xc200d55990, 0xc2002b4720, 0x0, 0x0)
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:81 +0x67
github.com/dustin/gomemcached/client.func·003()
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:193 +0xaf
created by github.com/dustin/gomemcached/client.(*Client).GetBulk
/tmp/gocode/src/github.com/dustin/gomemcached/client/mc.go:207 +0x1e6

goroutine 283543 [runnable]:
net/http.(*persistConn).readLoop(0xc2006cbd80)
/usr/local/go/src/pkg/net/http/transport.go:761 +0x64b
created by net/http.(*Transport).dialConn
/usr/local/go/src/pkg/net/http/transport.go:511 +0x574

goroutine 283320 [chan send]:
github.com/couchbaselabs/go-couchbase.(*Bucket).processBulkGet(0xc2000e8480, 0xc200c19980, 0xc20101a8a0, 0xc20101a900)
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:222 +0x26b
created by github.com/couchbaselabs/go-couchbase.(*Bucket).GetBulk
/tmp/gocode/src/github.com/couchbaselabs/go-couchbase/client.go:273 +0x2d1

goroutine 283330 [IO wait]:
net.runtime_pollWait(0x2aaaaabacdc0, 0x72, 0x0)
/usr/local/go/src/pkg/runtime/znetpoll_linux_amd64.c:118 +0x82
net.(*pollDesc).WaitRead(0xc2001f47d0, 0xb, 0xc200198660)
/usr/local/go/src/pkg/net/fd_poll_runtime.go:75 +0x31
net.(*netFD).Read(0xc2001f4750, 0xc20097b000, 0x1000, 0x1000, 0x0, ...)
/usr/local/go/src/pkg/net/fd_unix.go:195 +0x2b3
net.(*conn).Read(0xc20084a498, 0xc20097b000, 0x1000, 0x1000, 0x8, ...)
/usr/local/go/src/pkg/net/net.go:123 +0xc3
bufio.(*Reader).fill(0xc200b83180)
/usr/local/go/src/pkg/bufio/bufio.go:79 +0x10c
bufio.(*Reader).Peek(0xc200b83180, 0x1, 0xc200198840, 0x0, 0xc200eda4e0, ...)
/usr/local/go/src/pkg/bufio/bufio.go:107 +0xc9
net/http.(*persistConn).readLoop(0xc200372f00)
/usr/local/go/src/pkg/net/http/transport.go:670 +0xc4
created by net/http.(*Transport).dialConn
/usr/local/go/src/pkg/net/http/transport.go:511 +0x574
[root@localhost tuqtng]#


 Comments   
Comment by Marty Schoch [ 16/Oct/13 ]
Looks like a bug in go-couchbase. I have filed an issue there:

http://cbugg.hq.couchbase.com/bug/bug-906
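For context on the crash above: the panic at client.go:151 is an unguarded index into the cached vbucket map while the topology is changing during rebalance. Below is a minimal, illustrative Go sketch of the kind of defensive bounds check that turns such a race into a retryable error; the names are hypothetical and this is not the actual go-couchbase code.

package main

import (
	"errors"
	"fmt"
)

// vbucketServer looks up the server index that owns a key's vbucket.
// vbMap is a snapshot of the cluster's vbucket map; during rebalance a
// stale snapshot can be shorter than the vbucket id we compute, so we
// bounds-check instead of indexing blindly (which would panic with
// "index out of range").
func vbucketServer(vbMap [][]int, vbID int) (int, error) {
	if vbID < 0 || vbID >= len(vbMap) {
		return 0, fmt.Errorf("vbucket %d not in map of size %d (topology changed?)", vbID, len(vbMap))
	}
	chain := vbMap[vbID]
	if len(chain) == 0 || chain[0] < 0 {
		return 0, errors.New("vbucket has no active server yet")
	}
	return chain[0], nil
}

func main() {
	staleMap := [][]int{{0, 1}, {1, 0}} // only 2 vbuckets in this stale snapshot
	if _, err := vbucketServer(staleMap, 512); err != nil {
		fmt.Println("retry with a fresh map:", err) // caller refreshes the config and retries
	}
}

The point of the sketch is only that the lookup fails soft and lets the caller refetch the bucket config, rather than taking the whole process down mid-rebalance.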
Comment by Ketaki Gangal [ 16/Oct/13 ]
This can easily be hit on any rebalance; bumping this up to Critical.




[MB-9321] Get us off erlang's global facility and re-elect failed master quickly and safely Created: 10/Oct/13  Updated: 20/Jun/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: ns_server-story
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
Duplicate
is duplicated by MB-9691 rebalance repeated failed when add no... Closed
Relates to
relates to MB-9691 rebalance repeated failed when add no... Closed
Triage: Triaged
Is this a Regression?: No

 Description   
We have a number of bugs due to erlang's global facility, or the related issue of not being able to spawn a new master quickly. E.g.:

* MB-7282 (erlang's global naming facility apparently drops globally registered service with actual service still alive (was: impossible to change settings/autoFailover after rebalance))

* MB-7168 [Doc'd 2.2.0] failover of node that's completely down is still not quick (was: Rebalance exited with reason {not_all_nodes_are_ready_yet after failover node)

* MB-8682 start rebalance request is hunging sometimes (looks like another global facility issue)

* MB-5622 Crash of master node may lead to autofailover in 2 minutes instead of configured shorter autofailover period or similarly slow manual failover

By getting us off global, we will fix all of these issues.


 Comments   
Comment by Aleksey Kondratenko [ 10/Oct/13 ]
This also includes making sure autofailover takes into account the time it takes for master election in case of a master crash.

Current thinking is that every node will run the autofailover service, but it will only act if it's on the master node. And we can have special code that speeds up master re-election if we detect that the master node is down.
Comment by Aleksey Kondratenko [ 10/Oct/13 ]
Note that currently mb_master is the first thing that suffers when a timeout-heavy situation starts.

So we should look at making mb_master more robust if necessary.
Comment by Aleksey Kondratenko [ 17/Oct/13 ]
I'm _really_ curious who makes the decisions to move this into 2.5.0. Why. And why they think we have the bandwidth to handle it.
Comment by Aleksey Kondratenko [ 09/Dec/13 ]
Workaround diag/eval snippet:

rpc:call(mb_master:master_node(), erlang, apply ,[fun () -> erlang:exit(erlang:whereis(mb_master), kill) end, []]).

Detection snippet:

F = (fun (Name) ->
        {Oks, NotOks} = rpc:multicall(ns_node_disco:nodes_actual(), global, whereis_name, [Name], 60000),
        case {lists:usort(Oks), NotOks} of
            {_, [_|_]} -> {failed_rpc, NotOks};
            {[_], _} -> ok;
            {L, _} -> {different_answers, L}
        end
    end),
[(catch {N, ok} = {N, F(N)}) || N <- [ns_orchestrator, ns_tick, auto_failover]].

The detection snippet should return:

 [{ns_orchestrator,ok},{ns_tick,ok},{auto_failover,ok}]

If not, there's a decent chance that we're hitting this issue.
Comment by Aleksey Kondratenko [ 13/Dec/13 ]
As part of that we'll likely have to revamp autofailover. And John Liang suggested a nice idea: a single ejected node could disable memcached traffic on itself to signal to smart clients that something notable occurred.

On Fri, Dec 13, 2013 at 11:48 AM, John Liang <john.liang@couchbase.com> wrote:
>> I.e. consider client that only updates vbucket map if it receives "not my vbucket". And consider 3 node cluster where 1 node is partitioned off other nodes but is accessible from client. Lets name this node C. And imagine that remaining two nodes did failover that node. It can be seen that client will happily continue using old vbucket map and reading/writing to/from node C, because it'll never get a single "not my vbucket" reply.

Thanks Alk. In this case, is there a reason not to change the vbucket state on the singly-partitioned node on auto-failover? There would still be a window for "data loss", but this window should be much smaller.

Yes we can do it. Good idea.
Comment by Perry Krug [ 13/Dec/13 ]
But if that node (C) is partitioned off...how will we be able to tell it to set those vbucket states? IMO, wouldn't it be better for the clients to implement a true quorum approach to the map when they detect that something isn't right? Or am I being too naive and missing something?
Comment by Aleksey Kondratenko [ 13/Dec/13 ]
It's entirely possible that I misunderstood the original text, but I understand it as the following:

* when autofailover is enabled, every node observes whether it is alone. If a node finds itself alone and the usual autofailover threshold passes, this node can be somewhat sure that it was automatically failed over by the rest of the cluster

* when that happens, the node can either turn all vbuckets into replicas or disable traffic (similarly to what we're doing during flush).

There is of course a chance that all the other nodes have truly failed and that single node is all that's left. But it can be argued that in this case the amount of data loss is big enough anyway, and one node that artificially disables traffic doesn't change things much.

Regarding "quorum on clients". I've seen one proposal for that. And I don't think it's good idea. I.e. being in majority and being right are almost completely independent things. We can do far better than that. Particularly with CCCP we have rev field that gives sufficient ordering between bucket configurations.
Comment by Perry Krug [ 13/Dec/13 ]
My concern is that our recent observations of false-positive autofailovers may lead to lots of individual nodes deciding that they have been isolated and disabling their traffic...whether they've been automatically failed over or not.

As you know, one of the very nice safety nets of our autofailover is that it will not activate if it sees more than one node down at once, which means that we can never do the wrong thing. If we allow one node to disable its traffic when it can't intelligently reason about the state of the rest of the cluster, IMO we move away from this safety net...no?
Comment by Aleksey Kondratenko [ 13/Dec/13 ]
No. Because the node can only do that when it's sure that the other side of the cluster is not accessible. And it can recover its memcached traffic ASAP after it detects that the rest of the cluster is back.
Comment by Perry Krug [ 13/Dec/13 ]
But it can't ever be sure that the other side of the cluster is actually not accessible...clients may still be able to reach it, right?

I'm thinking about some extreme corner cases...but what about the situation where two nodes of a >2-node cluster are completely isolated via some weird networking situation and yet are still reachable by the clients? Both of them would decide that they were isolated from the whole cluster, both of them would disable all their vbuckets, and yet neither would be auto-failed over because the rest of the cluster would see two nodes down and not trigger the autofailover. I realize it's rare...but I bet there are less convoluted scenarios that would lead the software to do something undesirable.

I think this is a good discussion...but not directly relevant to the purpose of this bug which I believe is still an important fix that needs to be made. Do you want to take this discussion offline from this bug?
Comment by Aleksey Kondratenko [ 13/Dec/13 ]
There are definitely ways this can backfire. But the tradeoffs are quite clear. You "buy" the ability to detect autofailovers (and only autofailovers in my words above, though this can potentially be extended to other cases), at the expense of a small chance of a node false-positively disabling its traffic, briefly and without data loss.

Thinking about this more, I now see that it's a less good idea than I thought, i.e. covering autofailover but not manual failover is not as interesting. But we can return to this discussion when the mb_master work is actually in progress.

Comment by Aleksey Kondratenko [ 13/Mar/14 ]
Lowered to Critical. It's not blocking anyone.




[MB-9234] Failover message should take into account availability of replica vbuckets Created: 08/Oct/13  Updated: 20/Jun/14

Status: Open
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 3.0
Fix Version/s: feature-backlog
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
If a node goes down and its vbuckets do not have corresponding replicas available in the cluster, we should warn the user that pressing failover will result in perceived data loss. At the moment, we have the same failover message whether those replica vbuckets are available or not.




[MB-9143] Allow replica count to be edited Created: 17/Sep/13  Updated: 12/Jun/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0
Fix Version/s: 2.5.0, 3.0

Type: Task Priority: Critical
Reporter: Perry Krug Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-2512 Allow replica count to be edited Closed

 Description   
Currently the replication factor cannot be edited after a bucket has been created. It would be nice to have this functionality.

 Comments   
Comment by Ruth Harris [ 06/Nov/13 ]
Currently, it's added to the 3.0 Eng branch by Alk. See MB-2512. This would be a 3.0 doc enhancement.
Comment by Perry Krug [ 25/Mar/14 ]
FYI, this is already in as of 2.5 and probably needs to be documented there as well...if possible before 3.0.
Comment by Amy Kurtzman [ 16/May/14 ]
Anil, Can you verify whether this was added in 2.5 or 3.0?
Comment by Anil Kumar [ 28/May/14 ]
Verified - as Perry mentioned, this was added in the 2.5 release. We need to document this soon for the 2.5 docs.




[MB-8686] CBHealthChecker - Fix fetching number of CPU processors Created: 23/Jul/13  Updated: 05/Jun/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.1.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Anil Kumar Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: customer
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-8817 REST API support to report number of ... Technical task Open Bin Cui  
Triage: Untriaged

 Description   
Issue reported by a customer - the cbhealthchecker report shows incorrect information for 'Minimum CPU core number required'.


 Comments   
Comment by Bin Cui [ 07/Aug/13 ]
It will depend on ns_server to provide the number of CPU processors in the collected stats. I suggest pushing this to the next release.
Comment by Maria McDuff (Inactive) [ 01/Nov/13 ]
per Bin:
Suggest to push the following two bugs to next release:
1. MB-8686: it depends on ns_server to provide capability to retrieve number of cpu cores
2. MB-8502: caused by async communication between main installer thread and api to get status. Change will be dramatic for installer.
 
Comment by Maria McDuff (Inactive) [ 19/May/14 ]
Bin,

Raising to Critical.
If this is still dependent on ns_server, please assign it to Alk.
This needs to be fixed for 3.0.
Comment by Anil Kumar [ 05/Jun/14 ]
We need this information to be provided from ns_server. Created ticket MB-11334.

Triage - June 05 2014 Bin, Anil, Tony, Ashvinder
Comment by Aleksey Kondratenko [ 05/Jun/14 ]
Ehm. I don't think it's a good idea to treat ns_server as a "provider of random system-level stats". I believe you'll need to find another way of getting it.




[MB-8915] Tombstone purger need to find a better home for lifetime of deletion Created: 21/Aug/13  Updated: 15/Jun/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, cross-datacenter-replication, storage-engine
Affects Version/s: 2.2.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Junyi Xie (Inactive) Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Sub-Tasks:
Key
Summary
Type
Status
Assignee
MB-8916 migration tool for offline upgrade Technical task Open Anil Kumar  
Triage: Untriaged

 Description   
=== copied and pasted from my email to a group of people; it should explain clearly why we need this ticket ===

Thanks for your comments. Probably it is easier to read in email than code review.

Let me explain a bit to see if we can get on the same page. First of all, the current resolution algorithm (comparing all fields) is still right; yes, there is a small chance we would touch the fields after CAS, but for correctness we should have them there.

The cause of MB-8825 is that the tombstone purger uses the expiration time field to store the purger-specific "lifetime of deletion". This is just a "temporary solution", because IMHO the expiration time of a key is not the right place for the "lifetime of deletion" (it is purely storage-specific metadata and IMHO should not be in ep-engine), but unfortunately today we cannot find a better place to put such info unless we change the storage format, which has too much overhead at this time. In the future, I think we need to figure out the best place for the "lifetime of deletion" and move it out of the key expiration time field.

In practice, today this temporary solution in the tombstone purger is OK in most cases, because you rarely have a CAS collision for two deletions of the same key. But MB-8825 hit exactly that small dark area: when the destination tries to replicate a deletion from the source back to the source in bi-directional XDCR, both share the same (SeqNo, CAS) but differ in the expiration time field (which is not the exp time of the key, but the lifetime of deletion created by the tombstone purger). The exp time at the destination is sometimes bigger than that at the source, causing incorrect resolution results at the source. The problem exists for both CAPI and XMEM.
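As an illustration of the resolution change described above (with made-up metadata field names, not the actual ep-engine/XDCR structures): for tombstones the comparison ignores the expiration field, since for deletions that field carries the purger's lifetime-of-deletion rather than a real key expiry.

package main

import "fmt"

// docMeta is an illustrative stand-in for the revision metadata XDCR compares.
type docMeta struct {
	SeqNo   uint64
	CAS     uint64
	Exp     uint32 // for deletions this holds the purger's "lifetime of deletion"
	Flags   uint32
	Deleted bool
}

// sourceWins reports whether the incoming (source) revision should replace
// the local (target) one. For deletions the Exp field is excluded, because
// the tombstone purger rewrites it independently on each cluster and a
// bounced-back deletion would otherwise look "newer" than itself.
func sourceWins(source, target docMeta) bool {
	if source.SeqNo != target.SeqNo {
		return source.SeqNo > target.SeqNo
	}
	if source.CAS != target.CAS {
		return source.CAS > target.CAS
	}
	if source.Deleted && target.Deleted {
		return false // identical tombstone: nothing to do, Exp/Flags ignored
	}
	if source.Exp != target.Exp {
		return source.Exp > target.Exp
	}
	return source.Flags > target.Flags
}

func main() {
	src := docMeta{SeqNo: 7, CAS: 100, Exp: 1700000000, Deleted: true}
	dst := docMeta{SeqNo: 7, CAS: 100, Exp: 1699990000, Deleted: true}
	fmt.Println(sourceWins(src, dst)) // false: same deletion, different purger timestamps
}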

For backward compatibility,
1) If both sides are 2.2, we use the new resolution algorithm for deletions and we are safe.
2) If both sides are pre-2.2, since they do not have the tombstone purger, the current algorithm (comparing all fields) should be safe.
3) For bi-directional XDCR between a pre-2.2 and a 2.2 cluster on CAPI, a deletion born at 2.2 replicating to pre-2.2 should be safe because there is no tombstone purger at pre-2.2. For deletions born at pre-2.2, we may see them bounced back from 2.2, but there should be no data loss since you just re-delete something already deleted.

This fix may not be perfect, but it is still much better than the issues in MB-8825. I hope in the near future we can find the right place for the "lifetime of deletion" in the tombstone purger.


Thanks,

Junyi

 Comments   
Comment by Junyi Xie (Inactive) [ 21/Aug/13 ]
Anil and Dipti,

Please determine the priority of this task, and comment if I missed anything. Thanks.


Comment by Anil Kumar [ 21/Aug/13 ]
Upgrade - We need migration tool (which we talked about) in case of Offline upgrade to move the data. Created a subtask for that.
Comment by Aaron Miller (Inactive) [ 17/Oct/13 ]
Considering that fixing this has lots of implications w.r.t. upgrade and all components that touch the file format, and that not fixing it is not causing any problems, I believe that this is not appropriate for 2.5.0
Comment by Junyi Xie (Inactive) [ 22/Oct/13 ]
I agree with Aaron that this may not be a small task and may have lots of implications for different components.

Anil, please reconsider whether this is appropriate for 2.5. Thanks.
Comment by Anil Kumar [ 22/Oct/13 ]
Moved it to 3.0.
Comment by Aleksey Kondratenko [ 13/Mar/14 ]
As "temporary head of xdcr for 3.0" I don't need this fixed in 3.0

And my guess is that after 3.0 when "the plan" for xdcr will be ready, we'll just close it as won't fix, but lets wait and see.
Comment by Cihan Biyikoglu [ 15/Jun/14 ]
Aaron is no longer here. Assigning to Chiyoung for consideration.




[MB-8845] spend 5 days prototyping master-less cluster orchestration Created: 15/Aug/13  Updated: 11/Mar/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Task Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: ns_server-story
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to

 Description   
We have a number of issues caused by master election and related machinery.

We have some ideas about doing better than that, but they need to be prototyped.

 Comments   
Comment by Aleksey Kondratenko [ 26/Aug/13 ]
See MB-7282
Comment by Aleksey Kondratenko [ 16/Sep/13 ]
I've spent 1 day on that on Fri Sep 13
Comment by Andrei Baranouski [ 10/Feb/14 ]
Alk, could you provide cases to reproduce it once you've finished the task? Because this problem occurs very rarely in our tests.
Comment by Aleksey Kondratenko [ 10/Feb/14 ]
No. That task will not lead to any commits into mainline code. It's just a prototype.

After the prototype is done we'll have a more specific plan for the mainline codebase.
Comment by Maria McDuff (Inactive) [ 14/Feb/14 ]
Removed from 3.0 Release.




[MB-8832] Allow for some back-end setting to override hard limit on server quota being 80% of RAM capacity Created: 14/Aug/13  Updated: 28/May/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.1.0, 2.2.0
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Perry Krug Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Relates to
relates to MB-10180 Server Quota: Inconsistency between d... Open
Triage: Untriaged
Is this a Regression?: Yes

 Description   
At the moment, there is no way to override the 80% of RAM limit for the server quota. At very large node sizes, this can end up leaving a lot of RAM unused.

 Comments   
Comment by Aleksey Kondratenko [ 14/Aug/13 ]
Passing this to Dipti.

We've seen memory fragmentation easily reach 50% of memory usage. So even with 80% you can get into swap and badness.

I'd recommend _against_ this until we solve the fragmentation issues we have today.

Also keep in mind that today you _can_ raise this above all limits with a simple /diag/eval snippet.
Comment by Perry Krug [ 14/Aug/13 ]
We have seen this I agree, but it's been fairly uncommon in production environments and is something that can be monitored and resolved when it does occur. In larger RAM systems, I think we would be better served for most use cases by allowing more RAM to be used.

For example, 80% of 60GB is 48GB...leaving 12GB unused. Even worse for 256GB (leaving 50+GB unused)
Comment by Aleksey Kondratenko [ 14/Aug/13 ]
And on a 256GB machine, fragmentation can be as big as 128GB (!). IMHO this is not about absolute numbers but about percentages. Anyways, Dipti will tell us what to do, but your numbers above are just saying how bad our _expected_ fragmentation is.
Comment by Perry Krug [ 14/Aug/13 ]
But that's where I disagree...I think it _is_ about absolute numbers. If we leave fragmentation out of it (since it's something we will fix eventually, something that is specific to certain workloads and something that can be worked around via rebalancing), the point of this overhead was specifically to leave space available for the operating system and any other processes running outside of Couchbase. I'm sure you'd agree that Linux doesn't need anywhere near 50GB of RAM to run properly :) Even if we could decrease that by half it would provide huge savings in terms of hardware and costs to our users.

Is fragmentation your only concern? If we were able to analyze a running production cluster to quantify the RAM fragmentation that exists and determine that it is within certain bounds...would it be okay to raise the quota above 80%?
Comment by Aleksey Kondratenko [ 14/Aug/13 ]
My point was that fragmentation is also a percentage, not an absolute. So with larger RAM, the waste from fragmentation looks scarier.

Now that you're asking whether that's my only concern, I see that there's more.

Without sufficient space for the page cache, disk performance will suffer. How much we need to be at least on par with sqlite I cannot say. Nobody can, apparently. Things depend on whether you're going to do bgfetches or not.

Because if you do care about quick bgfetches (or, say, views and xdcr), then you may want to set the lowest possible quota and give as much RAM as possible to the page cache, hoping that at least all metadata is in the page cache.

If you do not care about residency of metadata, that means you don't care about btree leaves being page-cache-resident. But in order to remain IO-efficient you do need to keep non-leaf nodes in the page cache. The issue is that with our append-only design nobody knows how well it works in practice, or exactly how much page cache you need to keep the few (perhaps hundreds of) megabytes of metadata-of-metadata page-cache-resident. And quite possibly the "correct" recommendation is something like "you need XX percent of your data size for page cache to keep the disk subsystem efficient".
Comment by Perry Krug [ 14/Aug/13 ]
Okay, that does make a very good point.

But it also highlights the need for a flexible configuration on our end depending on the use case and customer's needs. i.e., certain customers want to enforce that they are 100% resident and to me that would mean giving Couchbase more than the default quota (while still keeping the potential for fragmentation in mind).
Comment by Patrick Varley [ 11/Feb/14 ]
MB-10180 is strongly related to this issue.
Comment by Maria McDuff (Inactive) [ 19/May/14 ]
Anil, pls see my comment on MB-10180.




[MB-8054] Couchstore's mergesort module, currently used for db compaction, can buffer too much data in memory Created: 10/Apr/13  Updated: 19/May/14

Status: Open
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 2.0.1, 2.1.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Filipe Manana Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged

 Description   
The size of the buffer used by the mergesort module is exclusively bounded by the number of elements. This is dangerous, because elements can have variable sizes, and a small number of elements does not necessarily mean that the buffer size (byte size) is small.

Namely, the treewriter module, used by the database compactor to sort the temporary file containing records for the id btree, was specifying a buffer element count of 100 * 1024 * 1024. If, for example, there are 100 * 1024 * 1024 id records and each has an average size of 512 bytes, the mergesort module buffers 50GB of data in memory!

Although the id btree records are currently very small (under a hundred bytes or so), use of other types of records may easily cause too much memory consumption - this will be the case for view records. Issue MB-8029 adds a module that uses the mergesort module to sort files containing view records.
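The couchstore code is C, but the underlying principle - bound the sort buffer by bytes rather than by record count - can be sketched in a few lines of Go (illustrative only, not the couchstore implementation):

package main

import "fmt"

// recordBuffer accumulates variable-size records and reports when a sorted
// batch should be flushed to a temporary file. Bounding by bytes (not by
// record count) keeps memory use predictable even when individual records
// are large.
type recordBuffer struct {
	records  [][]byte
	bytes    int
	maxBytes int
}

func (b *recordBuffer) add(rec []byte) (full bool) {
	b.records = append(b.records, rec)
	b.bytes += len(rec)
	return b.bytes >= b.maxBytes
}

func (b *recordBuffer) reset() {
	b.records = b.records[:0]
	b.bytes = 0
}

func main() {
	buf := recordBuffer{maxBytes: 1 << 20} // 1 MiB cap, regardless of record count
	for i := 0; i < 10; i++ {
		if buf.add(make([]byte, 200*1024)) { // ~200 KiB records
			fmt.Printf("flush batch of %d records (%d bytes)\n", len(buf.records), buf.bytes)
			buf.reset()
		}
	}
}

With a count-only bound, the same loop would happily keep buffering however many bytes the records happen to occupy, which is exactly the failure mode described above.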


 Comments   
Comment by Filipe Manana [ 10/Apr/13 ]
http://review.couchbase.org/#/c/25588
Comment by Filipe Manana [ 11/Apr/13 ]
It turns out this is not a simple change.

Simply adding a buffer byte-size limit breaks the merge algorithm in some cases, particularly when the file to sort is larger than the specified buffer size. The mergesort.c merge phase relies on the fact that each sorted batch written to the tmp files always has the same number of elements - a thing that doesn't hold true when records have a variable size, such as with views (MB-8029).

For now it's not too bad because, for the current use of mergesort.c by the views, the files to sort are small (up to 30MB max). Later this will have to change, as the files to sort can have any size, from a few KB to hundreds of MB or GB. I'll look for an alternative external mergesort implementation which allows controlling the max buffer size, merging only a group of already-sorted files (like Erlang's file_sorter allows), and is ideally more optimized as well (N-way merge instead of a fixed 2-way merge, etc.).
Comment by Filipe Manana [ 16/May/13 ]
There's a new and improved (better flexibility, error handling, and some performance optimizations) on-disk file sorter now in the master branch.
It's being used for views already.

Introduced in:

https://github.com/couchbase/couchstore/commit/fdb0da52a1e3c059fef3fa7e74ec54b03e62d5db

Advantages:

1) Allow in-memory buffer sizes to be bounded by number of
   bytes, unlike mergesort.c which bounds buffers by number of
   records regardless of their sizes.

2) Allow for N-way merges, allowing for better performance,
   due to a significant reduction of moving records between
   temporary files;

3) Some optimizations to avoid unnecessary moving of records
   between temporary files (especially when total number of
   records is smaller than buffer size);

4) Allow specifying which directory is used to store temporary
   files. The mergesort.c uses the C stdlib function tmpfile()
   to create temporary files - the standard doesn't specify in
   which directory such files are created, but on GNU/Linux it
   seems to be in /tmp (see http://linux.die.net/man/3/tmpfile).
   For database compaction and index compaction, it's important
   to use a directory from within the configured database and
   index directories (settings database_dir and view_index_dir),
   because those directories are what the administrator configured
   and may be part of a disk drive that offers better performance
   or just has more available space for example.
   Further, in some system /tmp might map to a tmpfs mount, which
   is an in-memory filesystem (http://en.wikipedia.org/wiki/Tmpfs);

5) Better and more fine-grained error handling. Compare with MB-8055 -
   the mergesort.c module completely ignored read errors when
   reading from the temporary files - which could lead to silent data loss.
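The N-way merge mentioned in point 2 is the textbook k-way merge over a min-heap of run heads; a small illustrative Go sketch follows (not couchstore's C implementation):

package main

import (
	"container/heap"
	"fmt"
)

// run is one sorted batch; pos is the next unread element.
type run struct {
	data []int
	pos  int
}

type runHeap []*run

func (h runHeap) Len() int            { return len(h) }
func (h runHeap) Less(i, j int) bool  { return h[i].data[h[i].pos] < h[j].data[h[j].pos] }
func (h runHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *runHeap) Push(x interface{}) { *h = append(*h, x.(*run)) }
func (h *runHeap) Pop() interface{} {
	old := *h
	r := old[len(old)-1]
	*h = old[:len(old)-1]
	return r
}

// mergeRuns performs a single N-way merge pass, avoiding the repeated
// 2-way passes (and the extra temp-file traffic) of the old mergesort.
func mergeRuns(runs []*run) []int {
	h := runHeap(runs)
	heap.Init(&h)
	var out []int
	for h.Len() > 0 {
		r := h[0]
		out = append(out, r.data[r.pos])
		r.pos++
		if r.pos == len(r.data) {
			heap.Pop(&h) // this run is exhausted
		} else {
			heap.Fix(&h, 0) // re-sift the new head of this run
		}
	}
	return out
}

func main() {
	runs := []*run{
		{data: []int{1, 4, 9}},
		{data: []int{2, 3, 8}},
		{data: []int{5, 6, 7}},
	}
	fmt.Println(mergeRuns(runs)) // [1 2 3 4 5 6 7 8 9]
}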
Comment by Filipe Manana [ 16/May/13 ]
See above.
Since this is core database, I believe it belongs to you.
Comment by Maria McDuff (Inactive) [ 10/Feb/14 ]
Aaron,

is this going to be in 3.0?
Comment by Aaron Miller (Inactive) [ 18/Feb/14 ]
I wouldn't count on it. This sort of thing affects views a lot more than the storage files, and the view code has already been modified to use the newer disk sort.

This is unlikely to cause any problems with storage file compaction, as the sizes of the records in storage files can't grow arbitrarily.

Using the newer sort will probably perform better, but *not* using it shouldn't cause any problems, making this issue more of a performance enhancement than a bug; as such it will probably lose out to other issues I'm working on for 3.0 and 2.5.x.




[MB-8022] Fsync optimizations (remove double fsyncs) Created: 05/Feb/13  Updated: 01/Apr/14

Status: Open
Project: Couchbase Server
Component/s: