[MB-7690] couchstore compactor interleaves byseq btree nodes and doc bodies Created: 05/Feb/13  Updated: 05/Feb/13

Status: Open
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Damien Katz
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
(Assigning to Damien for initial prioritization)

Looks like my old finding is still valid. The number of page cache pages required to keep the byseq index in RAM is _actually increasing_ after the compactor runs.

That's because it saves each byseq leaf node after its data items.

I have tools and data that confirm my finding (real vbucket files from perf run).

When I told Aaron about this problem prior to 2.0, he said it was going to be easy to fix on, AFAIK, the indexer branch. Looks like this branch is merged at last and it's time to reconsider.




[MB-7667] System Test : Higher ep_commit_time, beam size on source nodes during xdcr. Created: 01/Feb/13  Updated: 22/Mar/13

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Ketaki Gangal Assignee: Ketaki Gangal
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.0.1-144-rel

EC2 env.
2 buckets,
1 bidirectional replication on default bucket
1 unidirectional replication on sasl bucket.


 Description   
Seeing unusual stats for

( during xdcr_start phase)
1. ep_commit_time
2. rsize_beam : > 2.8G at some points
3. Uneven compaction growth
4. Uneven CPU across the cluster.

The source node cluster stats are here

https://github.com/couchbaselabs/couchbase-qe-docs/tree/master/system-tests/plum-cluster/01_30

 Comments   
Comment by Jin Lim [ 01/Feb/13 ]
Per bug scrub, engineering must at least identify and understand the root cause of this behavior. Assuming a fix may require cross-component changes, the actual delivery of the fix may be scheduled for the next release after 2.0.1.
Comment by Jin Lim [ 01/Feb/13 ]
Ketaki, per bug scrub, please collect atop from the 2nd run of the test, but only if possible, understanding that this is an EC2 environment and a longevity test. Thanks.
Comment by Ketaki Gangal [ 03/Feb/13 ]
Set up the xdcr_kv ec2 cluster for system testing.

Source : http://ec2-107-22-40-124.compute-1.amazonaws.com:8091
Destination : http://ec2-54-242-239-237.compute-1.amazonaws.com:8091

Stats Collection
Source : http://10.3.121.45:3133/fast
Destination: http://10.6.2.58:3133/fast

To add : Stats after a run time of 10 hours.

-Ketaki
Comment by Jin Lim [ 22/Mar/13 ]
More information is needed from QE to describe this undesired behavior.




[MB-7514] Monitoring and Best Practices Out of Date/Incorrect Created: 09/Jan/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Intake from Perry on Server Guide:

-We also severely need to update our best practices and monitoring guides. These pages are very out of date (and in some cases, patently incorrect):

http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-bestpractice-ongoing.html

http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-bestpractice-ongoing-ui.html

http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-monitoring.html

 Comments   
Comment by Karen Zeller [ 25/Jan/13 ]
MC- It would be great if you can sync with Perry these next few weeks and 1) get details from Perry, 2) start updating in chapter, and 3) prepare handoff/list-of-info of anything that can't be completed in next few weeks.


Thanks,

Karen




[MB-8211] making lots of streaming connections in general, or lots of streaming connections in a short period of time causes unexpected resource usage and/or crashing owing to OOM Created: 07/May/13  Updated: 08/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Matt Ingenthron Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Note this is not centos-64 bit, but we now require OS as a field even if it's OS independent.

Operating System: Centos 64-bit

 Description   
Going into version 2.0, we'd integrated changes that make handling of many streaming connections much less expensive in terms of memory, but there is some evidence that creating a lot of them at the same time or creating a large number of streaming connections could cause instability and crashing in ns_server.

Note that in the case of PHP deployments, we've found the number of worker processes to be 256, 512, or 5000 processes in some deployments. That is per server.

Then, deployments can be 50 or more systems.

Thus, we could have 12,000+ or 25,000+ or 250,000 processes which may all try to connect to a cluster.
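
The totals above follow from multiplying workers per server by the number of servers. A minimal sketch of that arithmetic, assuming exactly 50 client systems (the description says "50 or more", so these are lower bounds):

```python
# Worker-process counts per PHP server, as quoted in the description.
WORKERS_PER_SERVER = [256, 512, 5000]
# Assumption: the lower bound of "50 or more systems".
SERVERS = 50


def total_client_processes(workers_per_server, servers):
    """Each worker process may open its own streaming connection."""
    return workers_per_server * servers


totals = [total_client_processes(w, SERVERS) for w in WORKERS_PER_SERVER]
for workers, total in zip(WORKERS_PER_SERVER, totals):
    print(f"{workers} workers x {SERVERS} servers = {total} potential connections")
```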

This issue is to track possible solutions to this kind of thing occurring.

Possible solutions could be to drop connections or to only allow a number of connections tested to work in our minimum system configuration.

Note that at the client library side we have a method of sharing the configuration between multiple processes. However, it is not currently a complete solution and may never be because:
- It is not on by default as it requires the configuration of a file path.
- In some cases, during a rebalance, we can still have a rush of processes trying to get a configuration.
- We do not currently have a good solution for updating the configuration quickly if it's a memcached type bucket, as there is no NOT_MY_VBUCKET response.

Even if we can solve all of those issues at the client library side, a bug or a very large number of client systems can still cause problems, so we will need ns_server to be reliable even in the face of lots of requests.

 Comments   
Comment by Perry Krug [ 08/May/13 ]
From Matt:
I would argue that if ns-server starts defending itself and logging, we'll fail in a far more reliable way.

For instance, if the customer has thousands of PHP clients and they start throwing exceptions when trying to initialize, we'll be in a position to ask them to turn on the configuration cache. Right now, the failure is much harder to diagnose (requires someone to analyze logs).

From Perry:
I fully agree with Matt. We are frequently running into situations where it takes a few iterations back and forth to even understand what the issue is...the symptoms range from general slowness, to timeouts, to crashes. If we have a much clearer failure mode it will allow immense shortcutting of investigation down to the solution. It's something we can document, we can write about, we can share with the support team, etc.

So from my perspective there are two things we can do and should probably do both:
1) Improve ns_server's general behavior as much as reasonably possible to prevent it from using so much memory and causing timeouts, etc. I think this is somewhat captured in MB-8033 but may need further discussion
2) Provide some mechanism where ns_server will actively refuse new connections after a certain rate or threshold and provide a captureable message back to the requester stating "this server is currently overloaded...perhaps you'd like to not create so many client connections all at once".
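
To make option 2 concrete, here is a rough sketch of a gate that refuses new connections past a concurrency cap or an arrival-rate cap and hands back a captureable message. This is only an illustration in Python, not ns_server code (ns_server is Erlang); the class name, parameters, and thresholds are all hypothetical.

```python
import time
from collections import deque

OVERLOAD_MESSAGE = (
    "this server is currently overloaded; "
    "retry later and avoid creating so many connections at once"
)


class ConnectionGate:
    """Refuse new streaming connections past a concurrency or rate threshold."""

    def __init__(self, max_open, max_new_per_window, window_seconds=1.0,
                 clock=time.monotonic):
        self.max_open = max_open
        self.max_new_per_window = max_new_per_window
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the gate can be tested
        self.open_count = 0
        self.recent = deque()       # timestamps of recently accepted connections

    def try_accept(self):
        """Return (accepted, message) for one incoming connection attempt."""
        now = self.clock()
        # Forget accepts that have aged out of the rate window.
        while self.recent and now - self.recent[0] >= self.window_seconds:
            self.recent.popleft()
        if self.open_count >= self.max_open:
            return False, OVERLOAD_MESSAGE
        if len(self.recent) >= self.max_new_per_window:
            return False, OVERLOAD_MESSAGE
        self.recent.append(now)
        self.open_count += 1
        return True, "ok"

    def release(self):
        """Call when a streaming connection closes."""
        if self.open_count > 0:
            self.open_count -= 1
```

A refused client gets the same explicit message every time, which is the "clearer failure mode" argued for above; the thresholds would have to come from testing on the minimum system configuration.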

Further thoughts? Disagreements? Code changes? :)




[MB-8207] moxi does not allow a noop before authentication on binary protocol Created: 07/May/13  Updated: 08/May/13

Status: Open
Project: Couchbase Server
Component/s: moxi
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Matt Ingenthron Assignee: Ronnie Sun
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We had added a feature to spymemcached based on a Couchbase user's request to detect hung processes. This tries to complete a noop before doing auth.

It's fine against memcached/ep-engine in all cases, and it appears to be fine for ascii (where there is no authentication and it falls back to the version command), but moxi does not seem to allow auth after the noop. This may be because it's expecting the first command to wire it up to a downstream for the "gateway" moxi?

We're going to search for a workaround, but I wanted to make sure this issue was known.

See also:
https://code.google.com/p/spymemcached/issues/detail?id=272&thanks=272&ts=1364702110
and
https://github.com/mumoshu/play2-memcached/issues/17

 Comments   
Comment by Maria McDuff [ 08/May/13 ]
per bug triage, assigning to ronnie.
ronnie --- pls take a look. thanks.




[MB-8255] Update Sizing Calculator for 2.0.2 Created: 13/May/13  Updated: 14/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Will need help on Javascript portion.

Schedules meeting with Anil to get new stats for HTML file.




[MB-8124] unit test to ensure that error strings do not change (used in client API) Created: 18/Apr/13  Updated: 18/Apr/13

Status: Open
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Tim Smith Assignee: Damien Katz
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 8h
Time Spent: Not Specified
Original Estimate: 8h
Environment: any couchbase server, client SDK like Java


 Description   
Currently the Java SDK at least (and probably others) just passes on error strings from ep-engine to the end user (in the Java client, via a getMessage() method on the OperationStatus object, for example). In the Java client this is the only way to find out what kind of error happened (e.g., if an add() operation fails, was it a tmp oom error, does the item already exist, or is the server down? The only way to know is to look at the string from getMessage()).

This implies that the error string reported by ep-engine is currently part of the API. Ideally all clients would have error code enumerations, and error strings could be improved over time. But until that is in place, and the current string-oriented error checking is deprecated, we should not change the error strings or risk breaking application code in subtle ways upon server upgrades.

To ensure that error strings do not change without taking the API breakage into account, we should have a unit test that checks the error messages explicitly. And that unit test should be well-commented to ensure that an engineer doesn't just modify the strings and the test case without understanding why the tests exist in the first place.
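
A minimal sketch of what such a test could look like, written in Python for brevity. The status names and message texts below are illustrative placeholders, and `current_error_strings()` is a hypothetical accessor; a real test would query ep-engine for its actual messages.

```python
# Frozen copy of the strings that client applications may be matching on.
# Do NOT edit this table just to make the test pass: these strings are
# effectively part of the client API, and changing them can break
# application code that inspects getMessage() after a server upgrade.
FROZEN_ERROR_STRINGS = {
    "KEY_EEXISTS": "Data exists for key",
    "TMPFAIL": "Temporary failure",
    "ENOMEM": "Out of memory",
}


def current_error_strings():
    """Hypothetical accessor standing in for the engine's message table."""
    return {
        "KEY_EEXISTS": "Data exists for key",
        "TMPFAIL": "Temporary failure",
        "ENOMEM": "Out of memory",
    }


def changed_error_strings(current, frozen=FROZEN_ERROR_STRINGS):
    """Return the sorted status names whose message is missing or differs."""
    return sorted(code for code, text in frozen.items()
                  if current.get(code) != text)


if __name__ == "__main__":
    changed = changed_error_strings(current_error_strings())
    assert not changed, f"error strings changed (API break): {changed}"
    print("error strings unchanged")
```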

 Comments   
Comment by Matt Ingenthron [ 18/Apr/13 ]
See also JCBC-206 and SPY-117.




[MB-8108] Document how to properly shutdown a complete cluster Created: 16/Apr/13  Updated: 16/Apr/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Tug Grall Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We need to document how to properly shut down a full cluster (many nodes), both with and without autofailover (what an administrator needs to be aware of when doing this).




[MB-8001] Metadata Size Value Discrepancy as a part of documentation Created: 01/Apr/13  Updated: 01/Apr/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Muthu Kumar Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: doc-editing, documentation
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Meta Data Sizes:
Looking here
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-bestpractice-sizing-ram.html
the meta data size is listed as 64 bytes, but when looking over this
document
http://www.couchbase.com/docs/couchbase-devguide-2.0/more-on-metadata.html
it states 150 bytes.




[MB-7952] Need documentation on View/Query errors and limitations Created: 21/Mar/13  Updated: 22/Mar/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   
From Damien:
It's hard to quantify when it would fail, but the problem with too many fields is that the reduction value keeps growing unbounded and it's stored in an inner node in the btree. Then it becomes a problem of only being able to store 2 reduce values in the node and parsing and computing on these big structures every time the view is queried or updated, which slows everything to a crawl. If it gets too big, the js engine will time out or run out of memory.

I believe the best advice is to not let the total size of a reduce value, when serialized to text, be larger than 1k. If it's larger than that, you are probably doing something wrong.
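
The 1k guideline can be checked mechanically while developing a view. A small sketch, assuming "serialized to text" means compact UTF-8 JSON (the exact serialization the view engine uses may differ slightly); the function names are made up for illustration.

```python
import json

# "1k" guideline from the advice above, taken here as 1024 bytes.
MAX_REDUCE_BYTES = 1024


def reduce_value_size(value):
    """Size in bytes of the reduce value serialized as compact JSON."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def is_reasonable_reduce_value(value, limit=MAX_REDUCE_BYTES):
    """True if the serialized reduce value stays within the guideline."""
    return reduce_value_size(value) <= limit


# A reduce that actually reduces: a couple of aggregates.
small = {"count": 10, "sum": 1234}
# A "reduce" that keeps one entry per field seen and so grows without bound.
large = {f"field_{i}": i for i in range(200)}

print(reduce_value_size(small), is_reasonable_reduce_value(small))
print(reduce_value_size(large), is_reasonable_reduce_value(large))
```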

Can we please:
-Investigate and reproduce the errors associated with http://www.couchbase.com/issues/browse/CBSE-483
-Create and resolve any necessary bugs
-Document the limitations and provide best practices to avoid this in the future

 Comments   
Comment by Filipe Manana [ 21/Mar/13 ]
What do you want me to do here?

There's a rather explicit error logged:

[couchdb:error,2013-03-06T9:11:52.905, ns_1@172.31.0.65 :<0.4820.4931>:couch_log:error:42]Set view `cdb`, main group `_design/dev_counters`, writer error
error: {too_large_btree_state,70447}
stacktrace: [{couch_set_view_util,btree_state_to_bin,1},
{couch_set_view_util,'-group_to_header_bin/1-fun-0-',2},
{lists,foldl,3},
{couch_set_view_util,group_to_header_bin,1},
{couch_set_view_updater,write_header,2},
{couch_set_view_updater,checkpoint,2},
{couch_set_view_updater,maybe_checkpoint,1},
{couch_set_view_updater,flush_writes,1}]

This is unlike couchstore, for example, where reductions exceeding 64Kb are silently truncated or, at best, cause a hard crash. But in database btrees, reductions would hardly ever reach such a size or go beyond it.

The error will be returned for queries with ?stale=false, otherwise it's only logged (because such queries are async, don't wait for the indexer to finish).
Generic error tracking, with a proper distributed API is something already planned for 2.1, so this is part of it.
Comment by Perry Krug [ 21/Mar/13 ]
Thanks Filipe. The challenge we had initially was telling the difference between a "reducer that didn't reduce" and this one which was actually just dealing with too many fields...don't know if there's a better way to distinguish.

The improved error tracking will definitely help, let's keep this open and connect to that bug so that we can make sure QA can reproduce.

The other side is improving the documentation so that an end-user can see this error and know what it means, but also so that they know to avoid such large (we don't know what "large" is yet...also part of this process) reductions.

Perry
Comment by Filipe Manana [ 21/Mar/13 ]
Originally our btrees supported reductions of any size (well, at least 2Gb or 4Gb, 32 bits size). This was because the file format was based on Erlang's native encoding.

But after DP4, as you know, couchstore was created, and a custom btree file format was developed (Damien and Aaron), to make it independent from Erlang.
So the Erlang side (couchdb), specifically the btree module was updated to use this new format as well, so that we could still read from database btrees, etc. This btree module is generic, and used both by database layer and view engine layer.

For databases, reductions are just simple counts and tracking data sizes (that is something like 12 or 16 bytes). The btree file format allows a maximum reduction value of 64Kb, which should be more than enough for any sane application.

The original changes didn't raise any error if a reduction size ever exceeded 64Kb. Seeing that this would happen for views, due to user mistakes, I decided to explicitly throw an exception for this case:

https://github.com/couchbase/couchdb/commit/37a93afc5815f74f2fe73928fe34608e41a25d5b

And the more specific case, which got logged as expected:

error: {too_large_btree_state,70447}

tells us the reduction size is 70447 bytes, so technically it's an expected failure.

Large reductions are not efficient (on the contrary), as they make btree nodes too large, giving them a smaller branching factor, which in turn results in deeper trees - the extreme case leads to a binary tree, which is what happens in this case. The same applies to too large values, such as emitting the whole document.
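
A back-of-the-envelope sketch of that effect. It assumes a simplified model that is not the real couchstore layout: fixed-size nodes, and each child entry in an inner node costing key + reduction + pointer bytes. The 4096-byte node and the 8-byte pointer are assumptions for illustration only.

```python
def branching_factor(node_bytes, key_bytes, reduction_bytes, pointer_bytes=8):
    """Children per inner node in the simplified fixed-size-node model."""
    per_child = key_bytes + reduction_bytes + pointer_bytes
    # A btree node needs at least two children to be a tree at all.
    return max(2, node_bytes // per_child)


def tree_depth(num_items, fanout):
    """Smallest depth whose capacity (fanout ** depth) covers num_items."""
    depth, capacity = 1, fanout
    while capacity < num_items:
        depth += 1
        capacity *= fanout
    return depth


ITEMS = 1_000_000
for reduction_bytes in (16, 2000):
    fanout = branching_factor(4096, key_bytes=32, reduction_bytes=reduction_bytes)
    print(f"reduction={reduction_bytes}B fanout={fanout} "
          f"depth={tree_depth(ITEMS, fanout)}")
```

With a 16-byte reduction the model gives a fanout of 73 and a depth of 4 for a million items; with a 2000-byte reduction the fanout collapses to 2 and the depth grows to 20, which is the binary-tree case described above.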

As you probably know, the typical optimal size of btree nodes should be around the same as the size of a filesystem block (typically 4Kb), from wikipedia (http://en.wikipedia.org/wiki/Btree):

"The maximum number of child nodes depends on the information that must be stored for each child node and the size of a full disk block or an analogous size in secondary storage."

And I don't think there's a need to do anything special regarding testing. This is tested in couchdb's (view engine side) unit tests. The same sort of test is performed regarding key and value sizes. It would only be an issue if an error wasn't raised, corrupting the view or database file. This is something that components' unit tests must have, not something imho that we should rely only on QA to track.

Also:

Max size for a key is 4Kb - this is the key the user emits in the map functions plus the document's ID, so the size of something like ["user_key", "some_doc_id"]

Max size for a value is 255Mb (28 bits size)

For the case of a too large key, not fitting into 4Kb, you'll see an error such as the following (from the unit tests):

[error] [<0.440.0>] Set view `couch_test_set_index_errors`, main group `_design/test`, writer error
error: {error,<<"key emitted for document `doc1` is too long: \"doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc...">>}
stacktrace: [{couch_set_view_updater,convert_primary_index_kvs_to_binary,3},
             {couch_set_view_updater,'-flush_writes/1-fun-4-',3},
             {lists,foldl,3},
             {couch_set_view_updater,flush_writes,1},
             {couch_set_view_updater,'-update/8-fun-1-',15}]
[error] [<0.426.0>] Set view `couch_test_set_index_errors`, main group `_design/test`, received error from updater: {error,
                                                                                                                     <<"key emitted for document `doc1` is too long: \"doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc1doc...">>}

(It dumps the first 100 bytes/characters of the key only).
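
The limits mentioned in this comment could be checked before emitting, roughly as sketched below. The constants are taken from the figures quoted here (4Kb key, 28-bit value size, 64Kb reduction); whether each boundary is inclusive or exclusive is an assumption, and `check_index_entry` is a hypothetical helper, not part of couchdb or couchstore.

```python
MAX_KEY_BYTES = 4 * 1024          # emitted key plus document ID
MAX_VALUE_BYTES = 2 ** 28 - 1     # largest size a 28-bit field can express
MAX_REDUCTION_BYTES = 64 * 1024   # maximum reduction value in the btree format


def check_index_entry(key_bytes, value_bytes, reduction_bytes=0):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if key_bytes > MAX_KEY_BYTES:
        problems.append(f"key is {key_bytes} bytes, limit {MAX_KEY_BYTES}")
    if value_bytes > MAX_VALUE_BYTES:
        problems.append(f"value is {value_bytes} bytes, limit {MAX_VALUE_BYTES}")
    if reduction_bytes > MAX_REDUCTION_BYTES:
        problems.append(
            f"reduction is {reduction_bytes} bytes, limit {MAX_REDUCTION_BYTES}")
    return problems


# The logged failure above reported a 70447-byte btree state.
print(check_index_entry(key_bytes=100, value_bytes=1000, reduction_bytes=70447))
```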

Comment by Perry Krug [ 21/Mar/13 ]
Thanks Filipe. This sort of description is really helpful, and I imagine you have lots more to say about various other places as well. Is that planned to be documented somewhere where users can get access to it?
Comment by Filipe Manana [ 21/Mar/13 ]
I don't know if it's already documented or not.

I would suppose that when the new format was developed, the documentation team would be informed about the limits. But maybe not. The only one I know is in couchstore's file format document: https://github.com/couchbase/couchstore/blob/master/file_format.md - either way this is generic, not mentioning explicitly views (even though same exact limits apply to views).
Comment by Perry Krug [ 21/Mar/13 ]
Thanks Filipe. I've changed the title and moved this over to the documentation side to be appropriately collected and created.
Comment by Dipti Borkar [ 22/Mar/13 ]

Perry,

Which parts do you want to document? The key and value limits in the index?
Comment by Perry Krug [ 22/Mar/13 ]
I want to document EVERYTHING :)

Yes, key and value limits in the index would be helpful, but the reason this bug was created was because someone wrote a reduce that was too big...so I'm trying to avoid that in the future. It would also be good to document what Filipe has in his head regarding the different error messages that one might see, what they mean and what to do about them.




[MB-7726] reAddNode returning HTTP 200 on garbled input Created: 12/Feb/13  Updated: 12/Feb/13

Status: Open
Project: Couchbase Server
Component/s: RESTful-APIs
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Mark Nunberg Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
-----------------------------------------------------------------------------

mnunberg@csure:~/src/cbsdkd-stable$ curl -X POST -u Administrator:123456 http://10.6.2.78:8091/controller/reAddNode -d 'kjhkjlhkaskjlsjas' -vvv
* About to connect() to 10.6.2.78 port 8091 (#0)
* Trying 10.6.2.78...
* connected
* Connected to 10.6.2.78 (10.6.2.78) port 8091 (#0)
* Server auth using Basic with user 'Administrator'
> POST /controller/reAddNode HTTP/1.1
> Authorization: Basic QWRtaW5pc3RyYXRvcjoxMjM0NTY=
> User-Agent: curl/7.26.0
> Host: 10.6.2.78:8091
> Accept: */*
> Content-Length: 17
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 17 out of 17 bytes
* additional stuff not fine transfer.c:1037: 0 0
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
< Server: MochiWeb/1.0 (Any of you quaids got a smint?)
< Date: Tue, 12 Feb 2013 18:41:54 GMT
< Content-Length: 0
<
* Connection #0 to host 10.6.2.78 left intact
* Closing connection #0

-----------------------------------------------------------------------------

This might be a known issue, but I haven't seen this in a cursory search for 'reAddNode', so mentioning it here.

I've not tried every type of input variation, but I came across this when trying to "re-add" nodes that did not actually end up getting re-added.
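
For illustration, the kind of request validation one would expect before a 200 is returned. This is a Python sketch and not ns_server's handler; it assumes the endpoint takes a form-encoded `otpNode` field of the form name@host, and the function name and messages are hypothetical.

```python
import re
from urllib.parse import parse_qs

# Assumption: an OTP node name looks like "ns_1@10.6.2.78".
OTP_NODE_RE = re.compile(r"^[^@\s]+@[^@\s]+$")


def validate_readd_request(body):
    """Return (http_status, message) for a form-encoded reAddNode body."""
    fields = parse_qs(body)  # pairs without "=" are dropped by default
    values = fields.get("otpNode")
    if not values:
        return 400, "missing required parameter: otpNode"
    if len(values) != 1 or not OTP_NODE_RE.match(values[0]):
        return 400, "malformed otpNode value"
    return 200, "ok"


# The garbled body from the curl session above would be rejected.
print(validate_readd_request("kjhkjlhkaskjlsjas"))
print(validate_readd_request("otpNode=ns_1@10.6.2.78"))
```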




[MB-7721] Docs: Improve best practices docs Created: 12/Feb/13  Updated: 22/Feb/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Page here: http://www.couchbase.com/docs/couchbase-manual-2.0/best-practice-guide.html

Can we please add:
-We recommend a minimum of 3 nodes in any production deployment. This is so that auto-failover can be enabled, that you don't have a single point of failure should one node go down, if one node does go down the load of restoring that node is shared by 2 instead of 1, and so that the need for growth can be spread by 3 servers instead of just two and online upgrades can be performed without bringing the cluster down to one node.
-We recommend (and in fact enforce) no more than 10 buckets per cluster due to resource requirements.
-A link to the View writing best practices
-We recommend separating the disk paths of config files, data files and index files.

This would also be a great page to have a set of "added" reading at the top:
-Perry's rebalance blog
-Perry's sizing blog
-Architectural whitepaper
-Link to our "learn" section for more content




[MB-7694] Inventory on SDK "Advanced Topics" Created: 06/Feb/13  Updated: 28/Mar/13

Status: Open
Project: Couchbase Server
Component/s: clients, documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Story Priority: Major
Reporter: Karen Zeller Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
TODO:

-Karen/Matt meet/call
-Get inventory of examples, sample code for each SDK on "Advanced Topics", formerly "Troubleshooting", naming TBD
-If needed, Matt to get additional samples from SDK team members for consistency across all guides
-Karen: add to each guide.

Decisions:

-Naming of content, organization





[MB-7537] Rebalance button should change status depending on node statuses Created: 16/Jan/13  Updated: 16/Jan/13

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Andrei Baranouski Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive 10.3.121.112-8091-diag.txt.gz     GZip Archive 10.3.121.113-8091-diag.txt.gz     PNG File cleean_browser_cash_F5.png     PNG File rebalance_enabled.png    

 Description   
2.0.1-129-rel
steps:
1. cluster with 2 nodes: 10.3.121.112 & 10.3.121.113
2. click Remove for 10.3.121.113
3. stop cb on 10.3.121.113
result:
on UI 10.3.121.113 node is Down, but Rebalance button is enabled

only after cache refresh(F5 or Ctrl+F5) the button becomes disabled.

Alk, and another question related to rebalance with nodes down:

So we can run a rebalance (through the UI or REST) when a node is not ready to rebalance (down or in warmup). If I run a rebalance, I see progress for a minute on the UI. Does it make sense to start it at all if one of the servers is certainly not ready yet?
Because now we can't trigger a rebalance on the UI (after F5) but can via REST (the rebalance fails after 1 min).

Rebalance exited with reason {not_all_nodes_are_ready_yet,['ns_1@10.3.121.113']} ns_orchestrator002 ns_1@10.3.121.112 14:26:30 - Wed Jan 16, 2013
Shutting down bucket "defaullt" on 'ns_1@10.3.121.113' for server shutdown ns_memcached002 ns_1@10.3.121.113 14:26:09 - Wed Jan 16, 2013
Bucket "defaullt" loaded on node 'ns_1@10.3.121.113' in 0 seconds. ns_memcached001 ns_1@10.3.121.113 14:25:51 - Wed Jan 16, 2013
Couchbase Server has started on web port 8091 on node 'ns_1@10.3.121.113'. menelaus_sup001 ns_1@10.3.121.113 14:25:49 - Wed Jan 16, 2013
Node 'ns_1@10.3.121.113' synchronized otp cookie pozzoodiircndvke from cluster ns_cookie_manager002 ns_1@10.3.121.113 14:25:49 - Wed Jan 16, 2013
Started rebalancing bucket defaullt ns_rebalancer000 ns_1@10.3.121.112 14:25:30 - Wed Jan 16, 2013
Starting rebalance, KeepNodes = ['ns_1@10.3.121.112'], EjectNodes = ['ns_1@10.3.121.113'] ns_orchestrator004 ns_1@10.3.121.112 14:25:30 - Wed Jan 16, 2013








[MB-7336] Store revision trees in CouchStore Created: 03/Dec/12  Updated: 27/Feb/13

Status: Open
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 2.0-beta-2
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Jens Alfke Assignee: Jens Alfke
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
If we can store trees of multiple revisions in a CouchStore document, we can reimplement CouchDB functionality. This could be useful for TouchDB as well as Couchbase Server.

 Comments   
Comment by Jens Alfke [ 03/Dec/12 ]
http://review.couchbase.org/#/c/23006/




[MB-7160] if flush times out final stage (janitor creating vbuckets back) it returns success causing clients to see TMPFAIL after flush succeeds (WAS: there are reports that even after invoking FLUSH nodes return TMPFAIL...) Created: 11/Nov/12  Updated: 21/Mar/13

Status: Reopened
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0-beta-2, 2.0.1, 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Farshid Ghods Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: 2.0-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.rb     Text File test.txt    
Issue Links:
Duplicate
duplicates MB-6232 ep-engine needs 1.5 minutes to create... Open

 Description   
"reported by SDK team"
After a flush (through REST) for a time we still get tmpfail returned by server nodes. This is not expected, and would be kind of annoying from an application and/or cause problems with automated tests.

update:

I've updated the subject. Indeed this is possible. But IMHO, given that clients should always be prepared to handle TMPFAIL out of everything, I've lowered this to minor.

further update:

I disagree on the "always be able to handle TMPFAIL", especially in the case of running tests at the SDK side. At the moment, we ask our users to handle tmpfail directly. That's intentional and seems to make sense as a pressure relief valve for steady state, but between these flushes and especially from unit tests, it'd be best if the cluster could just block either the operation request or the flush response until complete.

Raising this to major owing to end-user reports of trouble here.


 Comments   
Comment by Farshid Ghods [ 11/Nov/12 ]
Deep,

can you please reproduce this case after coordinating with SDK team
Comment by Farshid Ghods [ 11/Nov/12 ]
maybe enabling traffic command is async instead of sync ?

Matt,
if this is easy to reproduce can you please add more information
1- how many nodes
2- buckets
3- hardware info ( cpu , RAM )

and upload diags if possible
Comment by Matt Ingenthron [ 12/Nov/12 ]
Sergey: this issue originally came from you.

Can you:
1) let us know if you still see this issue and
2) let us know if you'd opened a bug previously?

Thanks.
Comment by Sergey Avseyev [ 12/Nov/12 ]
The issue is still there. Take a look at the attached files.

1 node
1 bucket
default number of vbuckets
Comment by Steve Yen [ 12/Nov/12 ]
bug-scrub: mike described a workaround to sdk team.
Comment by Sergey Avseyev [ 12/Nov/12 ]
Where can I find that description?
Comment by Aleksey Kondratenko [ 13/Nov/12 ]
Sergey, I see potential for confusion. Is that related to slow flush you're seeing? Are you sure you're getting 200 back ?

Please describe your case better.

As for the question of the workaround, it's basically the same as for any tmpfail error: just keep re-trying.
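
A minimal sketch of that "keep re-trying" workaround with exponential backoff. `TemporaryFailure` and the `operation` callable are placeholders for whatever the client SDK raises and exposes; they are not names from any real SDK.

```python
import time


class TemporaryFailure(Exception):
    """Placeholder for the SDK's TMPFAIL error."""


def retry_on_tmpfail(operation, attempts=5, initial_delay=0.1, backoff=2.0,
                     sleep=time.sleep):
    """Call operation(), retrying on TemporaryFailure with growing delays."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TemporaryFailure:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay)
            delay *= backoff
```

For example, `retry_on_tmpfail(lambda: client.set("key", "value"))` would retry a hypothetical `client.set` up to five times, waiting 0.1s, 0.2s, 0.4s and 0.8s between attempts.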
Comment by Sergey Avseyev [ 14/Nov/12 ]
As the title says, some spec says that after a flush all operations should be OK. The script shows that the cluster might return a 500 error to flush.
Comment by Aleksey Kondratenko [ 14/Nov/12 ]
Sergey, this just can't be fair. 5xx reply means all bets are off.

Real bug you're after is: MB-6232
Comment by Aleksey Kondratenko [ 14/Nov/12 ]
See MB-6232 as pointed out above
Comment by Aleksey Kondratenko [ 14/Nov/12 ]
And workaround is to run off ram disk
Comment by Karen Zeller [ 06/Dec/12 ]
Added to RN as:

Several incidents have been reported that after using Flush on nodes, Couchbase
Server returns TMPFAIL even after a successful flush.
Comment by Matt Ingenthron [ 07/Mar/13 ]
Accidentally thought this was the right one to re-open. It actually is a dupe here, but there's a related issue.
Comment by Aleksey Kondratenko [ 07/Mar/13 ]
Matt also said the following over email:

BUT recent testing by our team indicates the fsync is not the underlying cause. UPDATE, as of 3/7, MB-6232 has now been moved from Backlog to 2.0.2. Need to reopen MB-7160 with current findings

I'm waiting for those details that could indicate there's some other reason for tmpfails.
Comment by Matt Ingenthron [ 21/Mar/13 ]
This may or may not be directly related to the specific issue reported here. It seems to be a contributing factor anyway.

Trond recently carried out a series of experiments to test whether the problems with flush time were owing to the underlying MB-6232, as previously suspected. MB-6232 says fsyncs and serially doing a lot of work with vbuckets cause much of the creation/deletion slowness we see.

To isolate this and find a solution for a particular deployment, a number of tests were run. In the last experiment, two tests were run:
1) Reduce the number of vbuckets to 8, but otherwise use a real filesystem, disks, etc. (~2sec)
2) Use ramfs (9.5sec)

1) Reduce the number of vbuckets:

I uninstalled 2.0.1-build 170 from my box and removed /opt/couchbase before reinstalling it and stopping it immediately. Then I edited /opt/couchbase/bin/couchbase-server and added (right under the license header):

COUCHBASE_NUM_VBUCKETS=8
export COUCHBASE_NUM_VBUCKETS

This sets the server to be using only 8 vbuckets instead of the default number of 1024. This number _HAS_ to be a "power-of-two", and should not be less than the number of nodes you have in your cluster (then you'll have nodes without any vbuckets).
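
The two constraints above (power of two, and at least as many vbuckets as nodes) can be checked before editing the startup script. This is a small sketch in Python; check_vbucket_count is a hypothetical helper name, not part of any Couchbase tool.

```python
def check_vbucket_count(num_vbuckets, num_nodes):
    """Return a list of problems with a proposed COUCHBASE_NUM_VBUCKETS value.

    An empty list means the value satisfies both constraints from the text.
    """
    problems = []
    # A positive integer is a power of two when it has exactly one bit set.
    if num_vbuckets < 1 or (num_vbuckets & (num_vbuckets - 1)) != 0:
        problems.append("not a power of two")
    # With fewer vbuckets than nodes, some nodes would own no vbuckets.
    if num_vbuckets < num_nodes:
        problems.append("fewer vbuckets than nodes")
    return problems
```

For example, check_vbucket_count(8, 4) returns an empty list, while check_vbucket_count(12, 4) reports that 12 is not a power of two.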

Running the same program as I used earlier (which did a flush of an empty bucket) now takes ~2 secs on a filesystem mounted with "barrier=1" and less than a second (0.7) on a filesystem mounted with "barrier=0".

2) Using ramfs

I created the ramfs by running:

mount -t ramfs -o size=100m ramfs /tmp/couchbase_data
chown couchbase:couchbase /tmp/couchbase_data

Then installed couchbase and had it use /tmp/couchbase_data for storage.

Running flush takes ~9.5 sec for an empty bucket. Reducing the number of vbuckets to 8 as described in 1) brought this down to 0.36 s.

--- --- ---

In an earlier set of experiments, there was more measurement of where time was going. I've edited this slightly from Trond:

Test rig: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz with 4GB memory.
The system uses ext4 as its filesystem, with 2GB of memory given to Couchbase (the two sample databases were installed during installation).

Test was run with the following PHP script which was intended to simulate a series of flushes as unit tests are run:

<?php
   $counter = 1;
   $errors = 0;

   $cb = new Couchbase("localhost", "Administrator", "password");

   while ($counter < 11) {
      try {
         echo "\rFlushing: " . $counter . " Errors: " . $errors;
         $cb->flush();
      } catch (CouchbaseException $e) {
         $errors++;
      }

      $counter++;
   }
?>

When I initially ran the script it reported:
trond@desktop:~/compile/sdk/php$ time php -c php.ini example/buckets.php
Flushing: 10 Errors: 8
real 5m36.613s
user 0m0.012s
sys 0m0.020s

Please note that we can't really "trust" the numbers above when we had an error, because I don't check for the reason _why_ it failed (the next one could for instance fail due to the fact that we've already got a flush running etc).

When I remounted the filesystem with barrier=0 I got:
trond@desktop:~/compile/sdk/php$ time php -c php.ini example/buckets.php
Flushing: 10 Errors: 0
real 3m4.609s
user 0m0.028s
sys 0m0.008s

At least all of the flush calls succeed, but look at the time. It takes more than 3 _minutes_ to run 10 flush commands, so for a user running 500 unit tests each separated by a flush, just running the flush for their tests would result in roughly 2 1/2 hours of waiting for the flush commands in their 500 test cycles (then depending on how much data they add it may take longer).
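
The 2 1/2 hour figure follows from the measured run. A small Python sketch of the arithmetic, assuming (as the text does) that flush time stays constant per call; estimate_flush_hours is a name made up for this illustration.

```python
def estimate_flush_hours(measured_seconds, measured_flushes, planned_flushes):
    """Scale a measured flush run linearly to a planned number of flushes."""
    per_flush = measured_seconds / measured_flushes
    return per_flush * planned_flushes / 3600.0


# "real 3m4.609s" for 10 flushes, taken from the barrier=0 run above.
measured = 3 * 60 + 4.609
hours = estimate_flush_hours(measured, 10, 500)
print(round(hours, 2))  # prints 2.56
```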

What I do find interesting is when I run top while I'm doing this I see beam.smp (erlang vm) using from 100-170% CPU (aka almost two cores), whereas memcached is relatively idle. From what I can see it looks to me that flushing a vbucket is setting all of the vbuckets to "dead" before activating them again.

I know we don't have stats for _everything_ we do, but from running the "timings" cbstat command we see (removing the printout of the distribution there):

 set_vb_cmd (2048 total)
    Avg : ( 27us)
 del_vb_cmd (2048 total)
    Avg : ( 44us)
 disk_vbstate_snapshot (1622 total)
    Avg : ( 6ms)
 disk_vb_del (1024 total)
    Avg : ( 3ms)

If I'm adding all of these up (assuming that they're all done in sequence) we end up at roughly 13 secs, but the command "time" report of the php process doing a single flush reports:
real 0m23.613s
user 0m0.016s
sys 0m0.012s

So there is 10 secs not accounted for, but this seems to vary.. (starting php and doing a single set use 0m0.033s so it's not the php runtime overhead)..

I'm not entirely sure what the "disk_vbstate_snapshot" is used for, but it accounts for roughly 9 of the 13 secs.
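
The totals quoted above can be reproduced from the four timing entries. This is only a sketch of the arithmetic in Python: it uses the rounded averages printed by cbstats and assumes, as the text does, that all commands run strictly in sequence.

```python
# (count, average seconds) per command, copied from the timings output above.
timings = {
    "set_vb_cmd": (2048, 27e-6),
    "del_vb_cmd": (2048, 44e-6),
    "disk_vbstate_snapshot": (1622, 6e-3),
    "disk_vb_del": (1024, 3e-3),
}


def total_seconds(entries):
    """Sum count * average over all commands."""
    return sum(count * avg for count, avg in entries.values())


accounted = total_seconds(timings)                    # about 12.9 s
snapshot_count, snapshot_avg = timings["disk_vbstate_snapshot"]
snapshot_time = snapshot_count * snapshot_avg         # about 9.7 s
wall_clock = 23.613                                   # "real 0m23.613s"
unaccounted = wall_clock - accounted                  # about 10.7 s

print(round(accounted, 1), round(snapshot_time, 1), round(unaccounted, 1))
```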

When I disabled that function in ep-engine, the example program above ran those 10 flushes in:

Flushing: 10 Errors: 0
real 0m22.411s
user 0m0.020s
sys 0m0.012s

Interestingly enough beam.smp is now down at ~100% CPU. For fun I tried to run the script with 500 flushes (to see if it changed over time), and we're down to less than 18 minutes flushing (which isn't that bad ;-))

I know we added that call to disk_vbstate_snapshot for some reason, so just removing it isn't an alternative. I am however pretty sure we don't need to snapshot them 1600+ times while we're running a flush (I would have guessed that two would do the trick: one when they are all disabled, one when they are all back up again). It is _ns_server_ and not ep-engine that needs to ensure that the clustermap is consistent after a potential crash during the process, and to be honest ns_server could just persist a flag saying that it is doing a flush before it starts, then nuke the flag when it receives the response that all of the vbuckets are dead, if it wants to protect itself from coming back up in a "hosed" configuration if a crash occurs during the flush.

Personally I think we should change our code to let ns_server send a _SINGLE_ message to ep_engine listing _ALL_ of the vbuckets it wants to shut down, and it should get a single return message when that is in place (and we could then have a _SINGLE_ snapshot vbucket when we've disabled all of them "atomically"), then a SINGLE enable message (resulting in a single snapshot_vbucket_state).
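
To make the difference concrete, here is a toy model in Python that only counts snapshots. ToyEngine and both flush functions are hypothetical and do not mirror the real ep-engine or ns_server interfaces. The model assumes one snapshot per state-change message; the real server recorded 1622 snapshots for 2048 state changes, so the per-vbucket figure here should be read as an upper bound.

```python
class ToyEngine:
    """Hypothetical stand-in for ep-engine; it only tracks states and snapshot count."""

    def __init__(self, num_vbuckets):
        self.states = {vb: "active" for vb in range(num_vbuckets)}
        self.snapshots = 0

    def set_state(self, vbucket, state):
        """One message per vbucket, followed by one state snapshot."""
        self.states[vbucket] = state
        self.snapshots += 1

    def set_states(self, vbuckets, state):
        """One message covering many vbuckets, followed by one state snapshot."""
        for vb in vbuckets:
            self.states[vb] = state
        self.snapshots += 1


def flush_per_vbucket(engine):
    """Mark every vbucket dead, then active again, one message each."""
    for state in ("dead", "active"):
        for vb in list(engine.states):
            engine.set_state(vb, state)
    return engine.snapshots


def flush_batched(engine):
    """Same state transitions, but one message (and snapshot) per phase."""
    for state in ("dead", "active"):
        engine.set_states(list(engine.states), state)
    return engine.snapshots
```

With 1024 vbuckets the per-vbucket variant takes 2048 snapshots in this model and the batched variant takes 2, which is the "two would do the trick" case described above.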

Right now we're also adding FLUSH markers to our tap connections. Given that ns_server is the organizer of the entire process, it could just shut down the TAP connections before running the flush command, and then restart the replication chains when it's done (the database should be empty anyway, and we wouldn't have to add special flush logic to the tap streams).

It would be interesting to know if all of this message passing and constant vbucket state change is part of its heavy CPU usage during the flush.




[MB-7007] Killing myself errors after upgrade to 1.8.1 Created: 25/Oct/12  Updated: 28/Dec/12

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Dmytro Kovalov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File cbcollect_info_20121025-063018.aa     File cbcollect_info_20121025-063018.ab     File cbcollect_info_20121025-063018.ac     File cbcollect_info_20121025-063018.ad    

 Description   
We are running 1.8.1-937 Couchbase server on 10 Linux nodes with about 200M items bucket.

About 1.5 months ago we upgraded servers from 1.7 to 1.8 and started to see the following error messages in the log:

killing myself due to unexpected upstream sender exit with reason: {{badmatch,
{error,
closed}},
[{ebucketmigrator_srv,
upstream_sender_loop,
1}]} ebucketmigrator_srv000 ns_1@172.19.4.41 03:52:33 - Thu Sep 20, 2012

In many cases this error comes from same node, but sometimes from different ones.

Since the upgrade we've experienced a failure of the node, which became unresponsive after spitting multiple errors of this kind. But in other cases (with single error appearing once a day or so), there seems to be no impact.

Google search gave me only following references for this error:

http://www.couchbase.com/issues/browse/MB-5171
http://www.couchbase.com/issues/browse/MB-5467
http://www.couchbase.com/issues/secure/ReleaseNote.jspa?projectId=10010&version=10295

It looks like, according to the last reference, this error was supposed to be fixed in version 1.8.1.

I also can't find clear information as to whether this is a 'bad' error or not and how to fix and/or prevent it.


 Comments   
Comment by Aleksey Kondratenko [ 26/Oct/12 ]
I cannot explain what I see. Debug logs are rotated past interesting moments. But info logs have a lot of useful details.

One such moment is around Sep 19 18:30 and onwards. There are lots of ebucketmigrator restarts caused mostly by source node (.41) closing its tap connection. They are rapidly restarted by their supervisor, but fail rapidly again (or sometimes rapidly again), which causes supervisor to give up and fail. Parent supervisor restarts replication supervisor and janitor sets up replications again in about 10 seconds, only to start this 'loop' again.

I see that apparently eventually replicas fall back far enough behind so that tap cannot reuse checkpoints anymore. It seems to start at around this:

[rebalance:info] [2012-09-19 20:56:53] [ns_1@172.19.4.41:<0.26205.227>:ebucketmigrator_srv:init:252] Starting tap stream:
[{vbuckets,[17,320,321,322,323,932,933,934,935,936,937,938]},
 {checkpoints,[{17,32722},
               {320,32508},
               {321,32523},
               {322,32566},
               {323,32393},
               {932,32455},
               {933,32368},
               {934,32570},
               {935,32578},
               {936,32505},
               {937,32580},
               {938,32547}]},
 {name,"replication_ns_1@172.19.4.44"},
 {takeover,false}]
{{"172.19.4.41",11209},
 {"172.19.4.44",11209},
 [{username,"default"},
  {password,[]},
  {vbuckets,[17,320,321,322,323,932,933,934,935,936,937,938]},
  {takeover,false},
  {suffix,"ns_1@172.19.4.44"}]}
[rebalance:info] [2012-09-19 20:56:53] [ns_1@172.19.4.41:<0.26205.227>:ebucketmigrator_srv:process_upstream:465] Initial stream for vbucket 320

And soon after that we see backfills for many more vbuckets for this and other different tap connections too.

I see last reached_max_restart_intensity at 22:33 Oct 19. After that I'm seeing period of massive stats archiver timeouts. Perhaps indicating there's lots of disk contention.

At around 3 AM next night (Oct 20) box was apparently restarted (or just couchbase server, doesn't matter).

Comment by Aleksey Kondratenko [ 26/Oct/12 ]
Farshid, you gate-keep-ed all bugs and you may know more. Feel free to comment/help, or assign to Chiyoung for further analysis.
Comment by Farshid Ghods [ 26/Oct/12 ]
Hi Dimitry,

would you please confirm whether you have applied this patch to the 1.8.1 cluster?

http://www.couchbase.com/issues/browse/MB-5343




[MB-6923] Please rename lib/couchbase module so that it doesn't conflict with public sdk Created: 15/Oct/12  Updated: 17/Oct/12

Status: Open
Project: Couchbase Server
Component/s: test-execution
Affects Version/s: None
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Tommie McAfee Assignee: Deepkaran Salooja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Hey Deep, believe you have many tests relying on the couchbase.py module. Problem is whenever I have public couchbase sdk installed and try to use anything from testrunner that relies on rest_client, I get the following error:

  File "../lib/membase/api/rest_client.py", line 8, in <module>
    from couchbase.document import DesignDocument, View
ImportError: No module named document


In the past I've worked around this by modifying sys.path to import the local couchbase module first. But now I have a complicated situation where this is no longer working.
It would be great to find another name for this: 'lib/couchbaseinternal' or 'lib/rest_couchbase'?
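
For reference, the sys.path workaround mentioned above can be sketched roughly as follows. prefer_local_package is a hypothetical helper, not something that exists in testrunner, and the directory passed in is whatever holds the local lib/couchbase package. As noted, this approach can stop working in more complicated setups, for example when something else re-orders sys.path afterwards.

```python
import sys


def prefer_local_package(lib_dir, package="couchbase"):
    """Make imports of `package` resolve from lib_dir before site-packages."""
    # Move lib_dir to the front of the module search path.
    if lib_dir in sys.path:
        sys.path.remove(lib_dir)
    sys.path.insert(0, lib_dir)
    # Forget any copy that was already imported (for example the public SDK),
    # otherwise Python keeps serving the cached module from sys.modules.
    for name in list(sys.modules):
        if name == package or name.startswith(package + "."):
            del sys.modules[name]
```

This has to run before rest_client is imported, since rest_client performs "from couchbase.document import ..." at import time.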



 Comments   
Comment by Thuan Nguyen [ 16/Oct/12 ]
Integrated in single-node-2.0.x-windows7-64-view #18 (See [http://qa.hq.northscale.net/job/single-node-2.0.x-windows7-64-view/18/])
    MB-6923: workaround for couchbase module confilct (Revision a81142c9617483e325e4eab98e8fc92ecae68b5a)

     Result = UNSTABLE
tmcafee :
Files :
* lib/membase/api/rest_client.py
Comment by Thuan Nguyen [ 16/Oct/12 ]
Integrated in multi-nodes-2.0.x-windows-64-backup-cli #18 (See [http://qa.hq.northscale.net/job/multi-nodes-2.0.x-windows-64-backup-cli/18/])
    MB-6923: workaround for couchbase module confilct (Revision a81142c9617483e325e4eab98e8fc92ecae68b5a)

     Result = UNSTABLE
tmcafee :
Files :
* lib/membase/api/rest_client.py
Comment by Thuan Nguyen [ 16/Oct/12 ]
Integrated in single-node-windows-64-install #372 (See [http://qa.hq.northscale.net/job/single-node-windows-64-install/372/])
    MB-6923: workaround for couchbase module confilct (Revision a81142c9617483e325e4eab98e8fc92ecae68b5a)

     Result = SUCCESS
tmcafee :
Files :
* lib/membase/api/rest_client.py
Comment by Thuan Nguyen [ 17/Oct/12 ]
Integrated in multi-nodes-windows-64-viewtest #20 (See [http://qa.hq.northscale.net/job/multi-nodes-windows-64-viewtest/20/])
    MB-6923: workaround for couchbase module confilct (Revision a81142c9617483e325e4eab98e8fc92ecae68b5a)

     Result = SUCCESS
tmcafee :
Files :
* lib/membase/api/rest_client.py
Comment by Thuan Nguyen [ 17/Oct/12 ]
Integrated in multi-nodes-2.0.x-windows-64-install #17 (See [http://qa.hq.northscale.net/job/multi-nodes-2.0.x-windows-64-install/17/])
    MB-6923: workaround for couchbase module confilct (Revision a81142c9617483e325e4eab98e8fc92ecae68b5a)

     Result = SUCCESS
tmcafee :
Files :
* lib/membase/api/rest_client.py




[MB-7196] moxi doesn't seem to failover on downstream hung Created: 15/Nov/12  Updated: 11/Apr/13

Status: Open
Project: Couchbase Server
Component/s: moxi
Affects Version/s: 1.8.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Guido Serra Assignee: Steve Yen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: couchbase community package 1.8.1 for Ubuntu/Debian
 - - -
couchbase-server 1.8.1-937-rel Linux-x86_64
bucket_engine 1.8.1-937-rel Linux-x86_64
couchbase-python-client 1.8.1-937-rel Linux-x86_64
ep-engine 1.8.1-937-rel Linux-x86_64
gperftools 1.8.1-937-rel Linux-x86_64
icu4c 1.8.1-937-rel Linux-x86_64
libconflate 1.8.1-937-rel Linux-x86_64
libmemcached 1.8.1-937-rel Linux-x86_64
libvbucket 1.8.1-937-rel Linux-x86_64
manifest-master 1.8.1-937-rel Linux-x86_64
manifest 1.8.1-937-rel Linux-x86_64
membase-cli 1.8.1-937-rel Linux-x86_64
membasex 1.8.1-937-rel Linux-x86_64
memcached 1.8.1-937-rel Linux-x86_64
memcachetest 1.8.1-937-rel Linux-x86_64
moxi 1.8.1-937-rel Linux-x86_64
ns_server 1.8.1-937-rel Linux-x86_64
otp 1.8.1-937-rel Linux-x86_64
portsigar 1.8.1-937-rel Linux-x86_64
sigar 1.8.1-937-rel Linux-x86_64
spidermonkey 1.8.1-937-rel Linux-x86_64
testrunner 1.8.1-937-rel Linux-x86_64
tlm 1.8.1-937-rel Linux-x86_64
vbucketmigrator 1.8.1-937-rel Linux-x86_64
workload-generator 1.8.1-937-rel Linux-x86_64


 Description   
just run a couple of nodes, freeze one of the downstreams with a "kill -STOP"
while running this simple piece of code:
{code}
<?php

ini_set('display_errors', 1);
error_reporting(-1);

if (! extension_loaded('memcached')) {
    dl('memcached.so');
}
$sessionId = $argc > 1 ? $argv[1] : md5(mt_rand());

ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'localhost:11211'); // host:port

session_id($sessionId);
session_start();

printf("Session %s: %s\n", $sessionId, var_export($_SESSION, true));

$_SESSION['timestamp'] = time();
{code}

it will randomly fail

p.s. initially mentioned at MB-3023

 Comments   
Comment by Guido Serra [ 15/Nov/12 ]
v1:CouchBase zeph$ php memcache-session-1.php
Session c6edc9f4624f78cdfa49af134d60887b: array (
)
PHP Warning: Unknown: Failed to write session data (memcached). Please verify that the current setting of session.save_path is correct (localhost:11211) in Unknown on line 0

Warning: Unknown: Failed to write session data (memcached). Please verify that the current setting of session.save_path is correct (localhost:11211) in Unknown on line 0




[MB-4386] As an administrator, I want to be able to give view only access to the web console for the benefit of communicating cluster, bucket and server status to stakeholders Created: 27/Oct/11  Updated: 01/Nov/11

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: Backlog
Fix Version/s: None
Security Level: Public

Type: Story Priority: Major
Reporter: Chris Cooper Assignee: Dipti Borkar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-1648] Store last persisted time in the FS Created: 29/Jul/10  Updated: 29/Mar/13

Status: Reopened
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 1.6.0 beta1, 2.0.1
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Dustin Sallings Assignee: Aaron Miller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Operating System: All
Platform: All


 Description   
We don't actually need this value, but it'd be useful to have something written we can use for auditing and what-not.

 Comments   
Comment by Peter Wansch [ 05/Jul/12 ]
Another req.
Comment by Perry Krug [ 26/Mar/13 ]
Dipti, can this get assigned to Aaron and prioritized for a release? Related to the discussion in mb-7958
Comment by Dipti Borkar [ 27/Mar/13 ]
Perry, there are several other things with higher priority, so we likely won't get to this for a while.

Aaron, if you have a chance to fix this in an efficient way without big impact to the system, please do think about it.
Comment by Aaron Miller [ 28/Mar/13 ]
I explained the potential impact on the other thread. (Will add some overhead in the form of I/O required to do disk kv/ops and disk size) Currently there's not any way to do it cheaper than that, so I'll close this until we decide we actually want to make that tradeoff.
Comment by Perry Krug [ 29/Mar/13 ]
Thanks Aaron, I think the tradeoffs are understood and that we still want to proceed with this enhancement.




[MB-7490] impossible to rebalance cluster when one node was failover before offline upgrade 2.0.0->2.0.1 cluster(cluster is broken, pun on the UI) Created: 04/Jan/13  Updated: 30/Apr/13

Status: Reopened
Project: Couchbase Server
Component/s: installer, ns_server
Affects Version/s: 2.0, 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Andrei Baranouski Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive 10.3.121.112-1182013-1818-diag.zip     GZip Archive 10.3.121.112-8091-diag.txt.gz     GZip Archive 10.3.121.113-8091-diag.txt.gz     Zip Archive 10.3.121.114-1182013-1819-diag.zip     PNG File add_back1.png     PNG File add_back2.png     PNG File failo0ver_13.png     PNG File restart_13_man.png     PNG File Screenshot from 2013-01-04 13-21-20.png     PNG File Screenshot from 2013-01-04 13-28-54.png     PNG File Screenshot from 2013-01-04 13-31-45.png     PNG File Screenshot from 2013-01-04 13-39-34.png     PNG File step4.png     PNG File step5.png     Text File test_logs.txt    

 Description   
test: newupgradetests.MultiNodesUpgradeTests.offline_cluster_upgrade,initial_version=2.0.0-1978-rel,nodes_init=2,during-ops=failover,upgrade_version=2.0.1-112-rel,initial_vbuckets=64

steps:
1. 2.0.0 release cluster with 2 nodes 10.3.121.112 & 10.3.121.113(2.0.0-1978-rel)
2. failover 10.3.121.112
3. stop 2 nodes and upgrade them on 2.0.1-112

result:
for some reason the node 10.3.121.113 does not start
panels Active Servers and Pending Rebalance reversed or show nonsense( see screenshots)

I tried to play with failover, add back, rebalance, start manually 10.3.121.113, etc. but it did not help

server's logs, tests logs and some screenshots are attached


 Comments   
Comment by Aleksey Kondratenko [ 04/Jan/13 ]
Something prevented .113 from starting up:

[error_logger:error,2013-01-04T2:09:45.459,nonode@nohost:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: couch_server:init/1
    pid: <0.217.0>
    registered_name: []
    exception exit: {undef,[{file2,ensure_dir,["/tmp/.delete/foo"]},
                            {couch_file,init_delete_dir,1},
                            {couch_server,init,1},
                            {gen_server,init_it,6},
                            {proc_lib,init_p_do_apply,3}]}
      in function gen_server:init_it/6
    ancestors: [couch_primary_services,couch_server_sup,cb_couch_sup,
                  ns_server_cluster_sup,<0.59.0>]
    messages: []
    links: [<0.212.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 377
    stack_size: 24
    reductions: 186
  neighbours:

Looks like some missing file because that function actually exists and didn't change between 2.0.0 and 2.0.1

I'd need another reproduction with collect info to see what's going on. I.e. collect_info will give me list of files.

Second problem is that somehow the UI allowed you to failover .113 even though it was the last remaining active node in the cluster. I'll try to reproduce and will file a separate bug.
Comment by Aleksey Kondratenko [ 04/Jan/13 ]
See above
Comment by Aleksey Kondratenko [ 04/Jan/13 ]
Indeed there's issue with incorrectly allowing failover in that case. Filed: MB-7493
Comment by Andrei Baranouski [ 08/Jan/13 ]
can't reproduce it now
Comment by Andrei Baranouski [ 18/Jan/13 ]
steps:
1.cluster with 2 nodes: 10.3.121.112 and 10.3.121.114
2. failover node 10.3.121.114
3. stop 2 nodes
4. start only node 10.3.121.114( step4.png)
5. on UI console of 10.3.121.114 add its back ( step5.png)

new screenshots and collect_info are attached
Comment by Aleksey Kondratenko [ 18/Jan/13 ]
Not sure exactly what you expect. Rebalancing requires both nodes to be up.
Comment by Andrei Baranouski [ 30/Apr/13 ]
Alk, after such steps I want to bring back the node that was failed over in step #2.
I know that it should be cleaned, but if node 10.3.121.112 disappeared without a trace, we cannot revive the second one and the only solution is to reinstall it.
Comment by Andrei Baranouski [ 30/Apr/13 ]
need confirmation that we will not handle it




[MB-7693] Doc request: Document administrative task of "vertically scaling" Couchbase Created: 06/Feb/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
As a specific section, document the best practices around changing the RAM size of the whole cluster via rebalance, quota change, etc.

Should be accompanied by a description of when this might be used and the considerations involved. Here is a writeup from support that should be used/adapted: http://support.couchbase.com/entries/21719273-How-to-Reduce-the-RAM-size-on-an-existing-Couchbase-cluster-without-shutting-down-the-cluster

 Comments   
Comment by Karen Zeller [ 18/Mar/13 ]
Hi Perry,

Who would be someone in the organization who knows about this topic and could provide the underlying information/knowledge/guidance?


Thanks,

Karen




[MB-7125] Doc Request: Instructions for using non-latin characters in views/queries Created: 08/Nov/12  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Aaron Miller
Resolution: Unresolved Votes: 0
Labels: documentation, info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Could we have some specific instructions on dealing with non-latin characters in views and queries?

The question is around ensuring that documents and values with a mix of latin and non-latin characters in the same range are included in the query, and providing developers the best practice from a client-side library as well as Couchbase syntax perspective.

 Comments   
Comment by Karen Zeller [ 29/Apr/13 ]
Hi Aaron,

I've been asked to bother you to get more information about how to use non-latin characters in views/queries. Do you have some information you can send via email?


Thanks,

Karen
Comment by Aaron Miller [ 30/Apr/13 ]
Views have no problems with non-latin characters in data, as long as they are in UTF-8 coded Unicode strings.

As for how they're ordered, that's defined by a pretty complex set of rules and data tables, being the Unicode Collation Algorithm as implemented by the ICU library.

ICU Collation is documented at: http://userguide.icu-project.org/collation/architecture
the Unicode Collation Algorithm standard at: http://www.unicode.org/reports/tr10/

We use ICU with the "root locale" (meaning none of the customizations or tailorings mentioned in the UCA document are at play, I believe).

A much easier way to figure out how your data will be ordered is just to try it.
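
As a quick illustration of why the ordering question comes up at all: plain code point ordering, which is what Python's sorted() uses for strings, puts every uppercase ASCII letter before every lowercase one and pushes accented letters past "z". The UCA instead compares base letters first and only then accents and case. The sketch below shows the code point ordering only. It does not call ICU, so it says nothing definitive about what a view returns; that still has to be tried against the server.

```python
# Four sample keys: two plain ASCII lowercase, one uppercase, one accented.
keys = ["z", "á", "a", "B"]

# Python compares str values code point by code point:
# "B" is U+0042, "a" is U+0061, "z" is U+007A, "á" is U+00E1.
by_code_point = sorted(keys)
print(by_code_point)  # prints ['B', 'a', 'z', 'á']
```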
Comment by Perry Krug [ 01/May/13 ]
Thanks Aaron. Unfortunately "just try it" doesn't really match the needs of customers who are trying to figure out how to make it work. Documenting and providing examples will do that.
Comment by Karen Zeller [ 01/May/13 ]
Hi Aaron,

See Perry's questions above. To fulfill this need for documentation, I believe we need the following underlying information:

1) List all the rules we use for non-latin collation in the engine

2) What is the order of precedence between the rules? What does this imply for certain characters vs. others.

3) 3-4 examples of queries and the results you get based on the rules.

I can't think of anything else, but this can at least get me started drafting the information.


Perry - if you know specific areas of confusion/need for info. from customers, let me know.


Thanks,

Karen
Comment by Aaron Miller [ 01/May/13 ]
The rules are at http://www.unicode.org/reports/tr10/ where there are also examples. There are a lot of them, they are not simple.

The rules also require looking at data tables to classify characters, mostly this one: http://www.unicode.org/Public/UCA/latest/allkeys.txt




[MB-4568] Need detailed sizing information for Couchbase Server 2.0 Created: 21/Dec/11  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Jin Lim
Resolution: Unresolved Votes: 0
Labels: customer, info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Need updated sizing information, both for RAM and disk for 2.0.

For disk, sizing needs to take into account:
-Number of indexes
-Rate of new items / updates
-Compaction timing and thresholds
-Overhead of CouchDB storage
-Difference between JSON and binary data

 Comments   
Comment by Karen Zeller [ 22/Feb/13 ]
For investigation, would defer the research to post 2.0.1




[MB-7704] Docs: Document behavior and recommend approach for dealing with pagination across a query whose view is changing Created: 07/Feb/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Tug Grall
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Need some documentation to describe how our software works, and how developers should work with it regarding paginating over a view that is changing.

For example, a customer is trying to iterate over 100M items (in 50 item "pages") and delete the data. However, since the delete is eventually consistent and the view update is going on in the background, he is not able to see all of the items because the paging skips over some of them.

The same situation would apply when paginating over a view that has items being added to it.

This is really just about describing this as a higher-level use case and what our recommendations for dealing with it would be.
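
The skipping effect described above can be reproduced with a toy model. In this Python sketch the "view" is just a sorted list of keys that is updated immediately on delete, which is a simplification of the real, eventually consistent index. page_by_skip models offset-based paging. page_after_key models one possible alternative, continuing from the last key seen, which in a view query is commonly expressed with startkey and startkey_docid. All function names here are made up for the illustration, and this is not a recommendation taken from this thread.

```python
import bisect


def page_by_skip(rows, skip, limit):
    """Offset-based page: rows[skip : skip + limit]."""
    return rows[skip:skip + limit]


def page_after_key(rows, last_key, limit):
    """Key-based page: the first `limit` rows strictly after last_key."""
    start = 0 if last_key is None else bisect.bisect_right(rows, last_key)
    return rows[start:start + limit]


def delete_all_with_skip(rows, limit):
    """Delete page by page while advancing a skip offset."""
    rows, deleted, skip = list(rows), [], 0
    while True:
        page = page_by_skip(rows, skip, limit)
        if not page:
            break
        for key in page:
            rows.remove(key)       # the "view" shrinks under the pager
        deleted.extend(page)
        skip += limit              # offset now points past unseen rows
    return deleted


def delete_all_after_key(rows, limit):
    """Delete page by page while remembering the last key seen."""
    rows, deleted, last_key = list(rows), [], None
    while True:
        page = page_after_key(rows, last_key, limit)
        if not page:
            break
        for key in page:
            rows.remove(key)
        deleted.extend(page)
        last_key = page[-1]
    return deleted


keys = list(range(200))
print(len(delete_all_with_skip(keys, 50)))   # prints 100
print(len(delete_all_after_key(keys, 50)))   # prints 200
```

In this model the skip-based loop reaches only half of the rows, because each deleted page shifts the remaining rows under the offset, while the key-based loop reaches all of them.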

 Comments   
Comment by Karen Zeller [ 14/Mar/13 ]
Who would be the best person to gather this information from in engineering?

Let me know and I will assign for info/core-dump/whiteboarding.


Thanks,

Karen
Comment by Perry Krug [ 15/Mar/13 ]
Don't know who specifically, but likely one of our SDK team and/or developer evangelists.
Comment by Karen Zeller [ 15/Mar/13 ]
Hi Tug,

See this thread on needing information documented on pagination when views are changing and consistency. Can we get some planned blog on the topic from your team that I can later use to update docs?

Thanks,

Karen




[MB-7431] Docs request: documentation/blog on using k/v versus views Created: 17/Dec/12  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Sorry if this is already started or in existence...

Need to be able to help users understand the differences in performance/scale/behavior between using views or managing indexes within k/v.

Something i just sent to a customer:
It's really a matter of understanding two things: the consistency your application requires between writes and those writes being seen in a view update, and the latency/throughput requirements you have for the queries of the views themselves.

Consistency: Because views are processed after data has been written to disk, there will be a delay between when you make a write and when it is available in the view. You can mitigate this with our "observe" function, but that will introduce a performance penalty on the write itself since you will be waiting for it to be written to disk...not something we'd recommend for every write. The amount of delay is really dependent on the resources available to the system and the amount of load you're putting on it. The nice thing is that this scales out very linearly by adding more nodes since each node only has to process the data that it is responsible for and more nodes means each node does less work.

Query latency/throughput: This is where the scale and performance questions really come in. Because a query will always have to gather data from all nodes of the cluster, it's not something that necessarily scales out with more nodes. Additionally, the query results are coming from disk. The nice thing here is that not all requests actually have to get served from disk: the OS will do a very good job of caching the disk IO and serving it from RAM. However, it will be a bit variable and unpredictable.


Also:
-need specific use case examples
-need instructions on how to accomplish within k/v

 Comments   
Comment by Karen Zeller [ 18/Mar/13 ]
Any suggestions on who could contribute the information if this turns into docs, or who would write this blog (which can then be incorporated into our docs)?
Comment by Dipti Borkar [ 22/Apr/13 ]
This should be in the documentation.
Comment by Karen Zeller [ 02/May/13 ]
Blog from JChris: here ya go

http://blog.couchbase.com/performance-oriented-architecture




[MB-7839] Docs: "Disk storage" in architecture needs heavy improvement Created: 28/Feb/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Just a note for now, will obviously require much more work.

This link: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-introduction-architecture-diskstorage.html

Is for our "disk storage" architecture but really doesn't talk at all about the append-only disk format and all the things that come along with that (b-tree, compaction, views files, data files, etc, etc).

This is the link I would go to to find that info, so we need to rework it a bit.

Most of the discussion should go into the section about managing the interaction between RAM and disk. I would recommend a section about RAM management, a section about Disk management and a section about the interaction between the two.




[MB-7374] Doc Request: Best practices for hardening and security of Couchbase Created: 06/Dec/12  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Perry Krug [ 06/Dec/12 ]
Something to reference at least: http://blog.couchbase.com/memcached-security

Discussion needed on firewalls, system-level security, links to XDCR, passwords (both UI and SASL), encryption, etc.

I've got thoughts and descriptions in my head, need someone to help format and writeup.
Comment by MC Brown [ 10/Dec/12 ]
Type up your notes, even in outline form if you want, and we'll get the information expanded and added into the docs.
Comment by Perry Krug [ 12/Dec/12 ]
Hardening Couchbase in production - Couchbase is not meant to be on the open-internet, etc
   What security does Couchbase provide out of the box? (admin pass, sasl)
       -SASL - where does the password get stored and where does it get configured (client/server)
       -Admin pass is cleartext, no user roles yet
    What kind of external security can be applied?
        -Root password best practices
        -firewalls
        -data encryption
        -network encryption (internal replication and XDCR)
    Currently known vulnerabilities




[MB-7790] Docs: Document "administrative task" of regular, planned server maintenance Created: 20/Feb/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
From a customer request:
We are obliged to follow a regular OS patching schedule for all our servers and have a maintenance window every Friday night.
How would you recommend we deal with our Couchbase clusters for patching?
 
From reading the Couchbase 2.0 Manual it looks like we have two options, one being a failover, and the other removing the node then re-adding it.
What steps would you recommend we do when taking a node out to do maintenance on it? We plan to do this during our regular maintenance window when load on the servers would be really light.

And the answer:
Our best practice would be a graceful remove and rebalance so I would recommend that first. If you find it takes too long, you could do a failover. The danger with that is that some data would not be replicated and so an unexpected failure during that time would introduce a situation where you need to manually recover data. The graceful remove doesn't introduce that.

Given that these are vms, it would actually be best to spin up one or more new nodes and swap them into the cluster, that way you never reduce capacity.

 Comments   
Comment by Thuan Nguyen [ 10/Apr/13 ]
Integrated in github-ep-engine-2-0 #483 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/483/])
    MB-7790: Always use 127.0.0.1 instead of localhost (Revision 2942eef19363e0c1ae01996e2d86583222a77588)

     Result = SUCCESS
Mike Wiederhold :
Files :
* configuration.json




[MB-7718] Docs: Document Couchbase installation file structure Created: 12/Feb/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Please create a section of documentation detailing the on-disk file/directory structure for Linux/Windows/Mac.

What/where each file is, the purpose, which files are important for backup, how much relative IO is expected for certain files and which are expected to grow and/or take up the most space.

 Comments   
Comment by Karen Zeller [ 20/Mar/13 ]
Consolidating MB-7708: Customer request for on disk file formats too.




[MB-8195] Current visual design is inconsistent and unpretty. Need new global visual (re)design. Created: 02/May/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Aleksey Kondratenko Assignee: Dipti Borkar
Resolution: Unresolved Votes: 0
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Sub-Tasks:
Key | Summary | Type | Status | Assignee
MB-6081 | "next" button in sample buckets scree... | Technical task | Reopened | Pavel Blagodov
Operating System: Centos 64-bit

 Description   
SUBJ.

Because our UI has grown over multiple releases, there are tons of inconsistencies and just plain bugs. We tried twice to fix the CSS without touching the look, but failed. We should do the CSS/HTML cleanup and the visual cleanup as one process, starting with a consistent and clean visual design.

Discussed this with Dipti some time ago and in general AFAIR she was ok. But initially assigned to her just in case.

Also intention of this work is to have some UI guidelines and some defined common style. E.g. how buttons should look (disabled, normal, ...).





[MB-7585] [windows] rebalance out 1 node failed with reason {bulk_set_vbucket_state_failed Created: 23/Jan/13  Updated: 03/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 2.0.1-140-rel
windows server r2 2008
<manifest>
<remote name="couchbase" fetch="git://10.1.1.210/"/>
<remote name="membase" fetch="git://10.1.1.210/"/>
<remote name="apache" fetch="git://github.com/apache/"/>
<remote name="erlang" fetch="git://github.com/erlang/"/>
<default remote="couchbase" revision="master"/>
<project name="tlm" path="tlm" revision="12abea946eafd7411273d18a10ae1f84390db3d4">
<copyfile dest="Makefile" src="Makefile.top"/>
</project>
<project name="bucket_engine" path="bucket_engine" revision="1495eb770b9735dd791c483eee7e69641f82f09c"/>
<project name="ep-engine" path="ep-engine" revision="35e248148abc810ab67715e86417191cd73e39dd"/>
<project name="libconflate" path="libconflate" revision="2cc8eff8e77d497d9f03a30fafaecb85280535d6"/>
<project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/>
<project name="libvbucket" path="libvbucket" revision="026c79ae424a6daed4bb9345e86cc8fc21759b28"/>
<project name="membase-cli" path="membase-cli" revision="d510a56bd07e7da5743d192300d73e8d7db2f8d2" remote="membase"/>
<project name="memcached" path="memcached" revision="e6f892cf7bd61e91a79be0d8d2c6f075d3c4ae1b" remote="membase"/>
<project name="moxi" path="moxi" revision="52a5fa887bfff0bf719c4ee5f29634dd8707500e"/>
<project name="ns_server" path="ns_server" revision="1fa77a48cc3ea1fcdaf5a686ebe5b45de202553d"/>
<project name="portsigar" path="portsigar" revision="1bc865e1622fb93a3fe0d1a4cdf18eb97ed9d600"/>
<project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/>
<project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/>
<project name="couchbase-python-client" path="couchbase-python-client" revision="006c1aa8b76f6bce11109af8a309133b57079c4c"/>
<project name="couchdb" path="couchdb" revision="c2ee2bfb20b3c53b01afea557b6907aaa501e179"/>
<project name="couchdbx-app" path="couchdbx-app" revision="b86a1a83ab74f1ef77ec9bdc814b81d739bfd930"/>
<project name="couchstore" path="couchstore" revision="b1d9a9c77545085e188b3140efa89d3dbc7890e4"/>
<project name="geocouch" path="geocouch" revision="8997159c44282cfcd89ea9984dd8c0944a35b2b4"/>
<project name="testrunner" path="testrunner" revision="9556e9972c7bc069256e51935cc4fb575e2dc5bc"/>
<project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/>
<project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/>
<project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/>
<project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/>
<project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/>
<project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/>
</manifest>


 Description   
-t view.createdeleteview.CreateDeleteViewTests.incremental_rebalance_out_with_ddoc_ops,ddoc_ops=create,test_with_view=True,num_ddocs=2,num_views_per_ddoc=3,items=200000

3 nodes cluster rebalance out 1 node

[error_logger:error,2013-01-23T2:48:33.268,ns_1@10.3.3.38:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.26716.18> terminating
** Last message in was {'EXIT',<0.26702.18>,shutdown}
** When Server state == {state,"default",652,'ns_1@10.3.3.38',
                               [{'ns_1@10.3.3.39',<21119.19056.4>}]}
** Reason for termination ==
** {{badmatch,{error,timeout}},
    [{ns_replicas_builder_utils,kill_a_bunch_of_tap_names,3},
     {misc,try_with_maybe_ignorant_after,2},
     {gen_server,terminate,6},
     {proc_lib,init_p_do_apply,3}]}

[ns_server:error,2013-01-23T2:48:33.268,ns_1@10.3.3.38:<0.26702.18>:misc:sync_shutdown_many_i_am_trapping_exits:1416]Shutdown of the following failed: [{<0.26716.18>,
                                    {{badmatch,{error,timeout}},
                                     [{ns_replicas_builder_utils,
                                       kill_a_bunch_of_tap_names,3},
                                      {misc,try_with_maybe_ignorant_after,2},
                                      {gen_server,terminate,6},
                                      {proc_lib,init_p_do_apply,3}]}}]
[ns_server:error,2013-01-23T2:48:33.268,ns_1@10.3.3.38:<0.26702.18>:misc:try_with_maybe_ignorant_after:1452]Eating exception from ignorant after-block:
{error,{badmatch,[{<0.26716.18>,
                   {{badmatch,{error,timeout}},
                    [{ns_replicas_builder_utils,kill_a_bunch_of_tap_names,3},
                     {misc,try_with_maybe_ignorant_after,2},
                     {gen_server,terminate,6},
                     {proc_lib,init_p_do_apply,3}]}}]},
       [{misc,sync_shutdown_many_i_am_trapping_exits,1},
        {misc,try_with_maybe_ignorant_after,2},
        {ns_single_vbucket_mover,mover,6},
        {proc_lib,init_p_do_apply,3}]}
[user:info,2013-01-23T2:48:33.268,ns_1@10.3.3.38:<0.389.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {bulk_set_vbucket_state_failed,
                              [{'ns_1@10.3.3.38',
                                {'EXIT',
                                 {noproc,
                                  {gen_server,call,
                                   [{'janitor_agent-default','ns_1@10.3.3.38'},
                                    {if_rebalance,<0.23803.18>,
                                     {update_vbucket_state,426,replica,
                                      passive,undefined}},
                                    infinity]}}}}]}

attaching diags

 Comments   
Comment by Iryna Mironava [ 23/Jan/13 ]
https://s3.amazonaws.com/bugdb/jira/MB-7585/66ca3441/diag-10.3.3.38.txt.gz
https://s3.amazonaws.com/bugdb/jira/MB-7585/66ca3441/diag-10.3.3.39.txt.gz
https://s3.amazonaws.com/bugdb/jira/MB-7585/66ca3441/diag-10.3.2.239.txt.gz
https://s3.amazonaws.com/bugdb/jira/MB-7585/66ca3441/diag-10.3.2.243.txt.gz

Comment by Aleksey Kondratenko [ 08/Apr/13 ]
Just timeout and windows




[MB-7683] Docs: Add statistics/verification discussion to all places where a command is run Created: 05/Feb/13  Updated: 03/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We have many pages that advise the user to run a command to change some setting. Can we please go through all of these and also show the related statistics (both UI and via cbstats) that are affected by these settings?

For instance (and there are MANY): http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-cbepctl-access-scanner.html, has a number of related statistics that come from here: http://hub.internal.couchbase.com/confluence/display/cbeng/EP-Engine+Working+Set+and+Access+Scanner.



 Comments   
Comment by MC Brown [ 05/Feb/13 ]
Karen wrote this documentation and should address the request




[MB-7707] Logging documentation Created: 08/Feb/13  Updated: 03/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Need a whole (detailed) section on the server-side logging including:
-Log-levels
-Files used
-Per-node / across cluster
-Example log messages and their meanings (both expected messages and failure messages)




[MB-7714] Restructuring Language References Created: 11/Feb/13  Updated: 08/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
I've been thinking about restructuring the Java Guides because I find it
rather confusing. So I propose the following new structure, which takes
enhancements and new features into account as well. Any inputs would be
greatly appreciated!

Here are the main changes
- Moving the "Reference" into its own subsection, clearing out the main
level
- Providing more information on using the APIs, also with 2.0 content.
- Getting started & tutorial mainly unchanged.

1. Getting Started
1.1 Preparations
1.2 Hello Couchbase
1.3 Working with Documents
1.4 Advanced Topics
1.5 Next Steps

2. Tutorial
2.1 Preparations
2.2 Quickstart
2.3 Connection Management
2.4 The Welcome Page
2.5 Managing Beers
2.6 Wrapping Up

3. Using the APIs
3.1 Connection Management
3.2 Retrieving Data
3.3 Mutating Data
3.4 Working with Views
3.5 Applying Persistence Constraints
3.6 Error Handling
3.7 JSON & Object Serialization
3.8 Design Document Management

4. Advanced Usage
4.1 Bulk Loading
4.2 Logging & Debugging

5. API Reference
5.1 Method Summary
5.2 Connecting & Disconnecting
5.3 Retrieving Data
5.4 Mutating Data
5.5 Management Operations
5.6 Other Useful Operations

A. Release Notes
B. Contributing
B.1 General Information
B.2 Source Code Styleguide

 Comments   
Comment by Karen Zeller [ 13/Mar/13 ]
Call on topic 3/13/2013:

-Michael/Matt will validate contents/topics with internal (support, evangelists) and external customers/developers
-Michael to provide descriptions (generally describing what information a chapter should contain)
Comment by Karen Zeller [ 25/Mar/13 ]
**Dependencies:

-Internal and external customer- validated/reviewed/finalized table of contents from MichaelN delivered to KarenZ
-Raw notes, existing content, new examples and technical information from MichaelN
Comment by Karen Zeller [ 26/Mar/13 ]
On 3/13 Matt assigned you the action to validate your proposed Table of Contents for Java. He had mentioned getting input from technical evangelists, and a couple external customers.

 Is this complete?
Comment by Michael Nitschinger [ 27/Mar/13 ]
Not yet - I still need to gather feedback, I'll update this issue with feedback once I got it.
Comment by Karen Zeller [ 27/Mar/13 ]
Ok. Once you have the reviewed and revised TOC, send to me to get structured as chapters/etc.
Comment by Karen Zeller [ 19/Apr/13 ]
Input from Michael for Advanced Topics Chapter:

Hi,

Yes, I think it's good to go with it!

You can also incorporate the new article into the "advanced topic" section:

Here is the markdown version from the published one:

## Motivation
This blog post is intended to be a very detailed and informative article for those who have already used the Couchbase Java SDK and want to know how the internals work. This is not an introduction on how to use the Java SDK, and we'll cover some fairly advanced topics on the way.

Normally, when talking about the SDK we mean everything that is needed to get you going (Client library, documentation, release notes,...). In this article though, the SDK refers to the Client library (code) unless stated otherwise.

As always, if you have feedback please let me/us know!

## Introduction
First and foremost, it is important to understand that the SDK wraps and extends the functionality of the [spymemcached](https://github.com/couchbase/spymemcached) (called "spy") memcached library. One of the protocols used internally is the memcached protocol, and a lot of functionality can be reused. On the other hand, once you start to peel off the first layers of the SDK you will notice that some components are somewhat more complex because spy provides more features than the SDK needs in the first place. Also keep in mind that a lot of the components are interwoven, so you always need to get the dependency right. Most of the time, we release a new spy version on the same date as a new SDK, because new stuff has been added or fixed.

So, aside from reusing the functionality provided by spy, the SDK mainly adds two blocks of functionality: automatic cluster topology management and since 1.1 (and 2.0 server) support for Views. Aside from that it also provides administrative facilities like bucket and design document management.

To understand how the client operates, we'll dissect the whole process into the different life cycle phases of the client. After we go through all three phases (bootstrap, operation and shutdown) you should have a clear picture of what's going on under the hood. Note that a separate blog post about error handling is in the making (it will be published on this blog a few weeks later), so we won't cover that topic in greater detail here.

## Phase 1: Bootstrap
Before we can actually start serving operations like `get()` and `set()`, we need to bootstrap the `CouchbaseClient` object. The important part that we need to accomplish here is to initially get a cluster configuration (which contains the nodes and vBucket map), but also to establish a streaming connection to receive cluster updates in (near) real-time.

We take the list of nodes passed in during bootstrap and iterate over it. The first node in the list that can be contacted on port 8091 is used to walk the RESTful interface on the server; if a node is not available, the next one is tried. This means that, starting from the provided `http://host:port/pools` URI, we eventually follow the links to the bucket entity. All this happens inside a `ConfigurationProvider`, which is in this case the `com.couchbase.client.vbucket.ConfigurationProviderHTTP`. If you want to poke around in the internals, look for the `getBucketConfiguration` and `readPools` methods.

A (successful) walk can be illustrated like this:

 - GET /pools
 - look for the "default" pools
 - GET /pools/default
 - look for the "buckets" hash which contains the bucket list
 - GET /pools/default/buckets
 - parse the list of buckets and extract the one provided by the application
 - GET /pools/default/buckets/<bucketname>
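
The walk above can be summarized as an ordered list of request paths. The following is only a minimal sketch of that idea, not the SDK's own code: the class and method names are hypothetical, and it builds the paths without issuing any HTTP requests or parsing any responses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the REST paths visited during bootstrap.
// It performs no HTTP requests and is not part of the SDK.
class RestWalkSketch {

  static List<String> walkPaths(String bucketName) {
    List<String> paths = new ArrayList<String>();
    paths.add("/pools");                               // look for the "default" pool
    paths.add("/pools/default");                       // look for the "buckets" entry
    paths.add("/pools/default/buckets");               // list of all buckets
    paths.add("/pools/default/buckets/" + bucketName); // the bucket configuration
    return paths;
  }

  public static void main(String[] args) {
    for (String path : walkPaths("default")) {
      System.out.println("GET " + path);
    }
  }
}
```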

Now we are at the REST endpoint we need. Inside this JSON response, you'll find all the useful details that the SDK also uses internally (for example `streamingUri`, `nodes` and `vBucketServerMap`). The config gets parsed and stored. Before we move on, let's quickly discuss the strange `pools` part inside our REST walk:

The concept of a resource pool to group buckets was designed for Couchbase Server, but is currently not implemented. Still, the REST API is implemented that way and therefore all SDKs support it. That said, while we could theoretically just go directly to `/pools/default/buckets` and skip the first few queries, the current behaviour is future proof so you won't have to change the bootstrap code once the server implements it.

Back to our bootstrap phase. Now that we have a valid cluster config which contains all the nodes (and their hostnames or ip addresses), we can establish connections to them. Aside from establishing the data connections, we also need to instantiate a streaming connection to one of them. For simplicity reasons, we just establish the streaming connection to the node from the list where we got our initial configuration.

This gets us to an important point to keep in mind: if you have lots of CouchbaseClient objects running on many nodes and they all get bootstrapped with the same list, they may end up connecting to the same node for the streaming connection and create a possible bottleneck. Therefore, to distribute the load a little better I recommend shuffling the array before it gets passed in to the CouchbaseClient object. When you only have a few CouchbaseClient objects connected to your cluster, that won't be a problem at all.
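
A minimal sketch of that shuffling step, using only the standard library. The host names are placeholders and the helper class is hypothetical; the shuffled list is what you would then hand to the `CouchbaseClient` object.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: randomize the bootstrap node order per client so that many
// clients do not all pick the same node for the streaming connection.
class ShuffledBootstrapList {

  static List<URI> shuffled(List<URI> nodes) {
    List<URI> copy = new ArrayList<URI>(nodes); // leave the caller's list untouched
    Collections.shuffle(copy);
    return copy;
  }

  public static void main(String[] args) {
    // Placeholder host names, for illustration only.
    List<URI> nodes = Arrays.asList(
        URI.create("http://node1.example.com:8091/pools"),
        URI.create("http://node2.example.com:8091/pools"),
        URI.create("http://node3.example.com:8091/pools"));
    System.out.println(shuffled(nodes));
  }
}
```

Copying the list first keeps the configured order available for logging or for other clients that want to shuffle independently.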

The streaming connection URI is taken from the config we got previously, and normally looks like this:

streamingUri: "/pools/default/bucketsStreaming/default?bucket_uuid=88cae4a609eea500d8ad072fe71a7290"

If you point your browser to this address, you will also get the cluster topology updates streamed in real-time. Since the streaming connection needs to be established all the time and potentially blocks a thread, this is done in the background handled by different threads. We are using the NIO framework [Netty](http://netty.io) for this task, which provides a very handy way of dealing with asynchronous operations. If you want to start digging into this part, keep in mind that all read operations are completely separate from write operations, so you need to deal with handlers that take care of what comes back from the server. Aside from some wiring needed for Netty, the business logic can be found in `com.couchbase.client.vbucket.BucketMonitor` and `com.couchbase.client.vbucket.BucketUpdateResponseHandler`. We also try to reestablish this streaming connection if the socket gets closed (for example if this node gets rebalanced out of the cluster).

To actually shuffle data to the cluster nodes, we need to open various sockets to them. Note that there is absolutely no connection pooling needed inside the client, because we manage all sockets proactively. Aside from the special streaming connection to one of the servers (which is opened against port 8091), we need to open the following connections:

 - Memcached Socket: Port 11210
 - View Socket: Port 8092

Note that port 11211 is not used inside the client SDKs, but used to connect generic memcached clients that are not cluster aware. This means that these generic clients do not get updated cluster topologies.

So as a rule of thumb, if you have a 10 node cluster running, one CouchbaseClient object opens about 21 (2*10 + 1) client sockets. These are directly managed, so if a node gets removed or added the numbers will change accordingly.
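
The rule of thumb can be written down as a tiny helper. This is a sketch with hypothetical names; the `viewsEnabled` flag is an assumption that models the case where no View connections are opened, so that only the memcached sockets and the one streaming connection remain.

```java
// Sketch of the socket rule of thumb: one memcached socket per node,
// optionally one View socket per node, plus one streaming connection.
class SocketEstimate {

  static int expectedSockets(int nodes, boolean viewsEnabled) {
    if (nodes < 1) {
      throw new IllegalArgumentException("nodes must be >= 1");
    }
    int perNode = viewsEnabled ? 2 : 1;
    return perNode * nodes + 1; // +1 for the streaming connection
  }

  public static void main(String[] args) {
    System.out.println(expectedSockets(10, true)); // 2*10 + 1
  }
}
```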

Now that all sockets have been opened, we are ready to perform regular cluster operations. As you can see, there is a lot of overhead involved when the CouchbaseClient object gets bootstrapped. Because of this, we strongly discourage you from either creating a new object on every request or running a lot of CouchbaseClient objects in one application server. This only adds unnecessary overhead and load on the application server and adds to the total number of sockets opened against the cluster (resulting in a possible performance problem).
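
One common way to follow this advice is to create the client once and share it across requests. The holder below is a generic sketch, not SDK code: the type parameter `T` stands in for the client type, and the factory passed in is assumed to be where the expensive bootstrap happens.

```java
import java.util.function.Supplier;

// Sketch: lazily create one expensive object and reuse it afterwards.
// "T" is a stand-in for the client type; nothing here is SDK API.
class SharedClientHolder<T> {
  private final Supplier<T> factory;
  private T instance;

  SharedClientHolder(Supplier<T> factory) {
    this.factory = factory;
  }

  // Synchronized so that concurrent first calls bootstrap only once.
  synchronized T get() {
    if (instance == null) {
      instance = factory.get();
    }
    return instance;
  }
}
```

Request handlers would call `get()` on one application-wide holder instead of constructing their own client.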

As a point of reference, with regular INFO level logging enabled this is what connecting to and disconnecting from a 1-node cluster (Couchbase bucket) should look like:

Apr 17, 2013 3:14:49 PM com.couchbase.client.CouchbaseProperties setPropertyFile
INFO: Could not load properties file "cbclient.properties" because: File not found with system classloader.
2013-04-17 15:14:49.656 INFO com.couchbase.client.CouchbaseConnection: Added {QA sa=/127.0.0.1:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2013-04-17 15:14:49.673 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@2adb1d4
2013-04-17 15:14:49.718 INFO com.couchbase.client.ViewConnection: Added localhost to connect queue
2013-04-17 15:14:49.720 INFO com.couchbase.client.CouchbaseClient: viewmode property isn't defined. Setting viewmode to production mode
2013-04-17 15:14:49.856 INFO com.couchbase.client.CouchbaseConnection: Shut down Couchbase client
2013-04-17 15:14:49.861 INFO com.couchbase.client.ViewConnection: Node localhost has no ops in the queue
2013-04-17 15:14:49.861 INFO com.couchbase.client.ViewNode: I/O reactor terminated for localhost

If you are connecting to Couchbase Server 1.8 or to a memcached bucket, you won't see View connections getting established:


INFO: Could not load properties file "cbclient.properties" because: File not found with system classloader.
2013-04-17 15:16:44.295 INFO com.couchbase.client.CouchbaseConnection: Added {QA sa=/192.168.56.101:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2013-04-17 15:16:44.297 INFO com.couchbase.client.CouchbaseConnection: Added {QA sa=/192.168.56.102:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2013-04-17 15:16:44.298 INFO com.couchbase.client.CouchbaseConnection: Added {QA sa=/192.168.56.103:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2013-04-17 15:16:44.298 INFO com.couchbase.client.CouchbaseConnection: Added {QA sa=/192.168.56.104:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2013-04-17 15:16:44.306 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@38b5dac4
2013-04-17 15:16:44.313 INFO com.couchbase.client.CouchbaseClient: viewmode property isn't defined. Setting viewmode to production mode
2013-04-17 15:16:44.332 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@69945ce
2013-04-17 15:16:44.333 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@6766afb3
2013-04-17 15:16:44.334 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@2b2d96f2
2013-04-17 15:16:44.368 INFO net.spy.memcached.auth.AuthThread: Authenticated to 192.168.56.103/192.168.56.103:11210
2013-04-17 15:16:44.368 INFO net.spy.memcached.auth.AuthThread: Authenticated to 192.168.56.102/192.168.56.102:11210
2013-04-17 15:16:44.369 INFO net.spy.memcached.auth.AuthThread: Authenticated to 192.168.56.101/192.168.56.101:11210
2013-04-17 15:16:44.369 INFO net.spy.memcached.auth.AuthThread: Authenticated to 192.168.56.104/192.168.56.104:11210
2013-04-17 15:16:44.490 INFO com.couchbase.client.CouchbaseConnection: Shut down Couchbase client


## Phase 2: Operations
When the SDK is bootstrapped, it enables your application to run operations against the attached cluster. For the purpose of this blog post, we need to distinguish between operations that get executed against a stable cluster and operations on a cluster that is currently experiencing some form of topology change (be it planned because of adding nodes or unplanned because of a node failure). Let's tackle the regular operations first.

### Operations against a stable cluster
While not directly visible in the first place, inside the SDK we need to distinguish between memcached operations and View operations. All operations that have a unique key in their method signature can be treated as memcached operations. All of them eventually end up getting funneled through spy. View operations on the other hand are implemented completely inside the SDK itself.

Both View and memcached operations are asynchronous. Inside spy, there is one thread (called the I/O thread) dedicated to dealing with I/O operations. Note that in high-traffic environments, it's not unusual for this thread to be always active. It uses the non-blocking Java NIO mechanisms to deal with traffic, and loops around "selectors" that get notified when data can either be written or read. If you profile your application and see that this thread spends most of its time waiting in a `select` method, it means that it is idling there, waiting to be notified of new traffic. The concepts used inside spy to deal with this are common Java NIO knowledge, so you may want to look into the [NIO internals](https://www.ibm.com/developerworks/java/tutorials/j-nio/) first before digging into that code path. Good starting points are the `net.spy.memcached.MemcachedConnection` and `net.spy.memcached.protocol.TCPMemcachedNodeImpl` classes. Note that inside the SDK, we override the `MemcachedConnection` to hook in our own reconfiguration logic. This class can be found inside the SDK at `com.couchbase.client.CouchbaseConnection` and for memcached-type buckets in `com.couchbase.client.CouchbaseMemcachedConnection`.
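
To make the selector idea more concrete, here is a small, generic Java NIO sketch. It is not taken from spy: it uses an in-process `Pipe` instead of a socket, the class name is hypothetical, and it assumes the message fits into a single small read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Sketch of a selector-driven read using a Pipe in place of a socket.
class SelectorLoopSketch {

  static String readOnce(String message) {
    try {
      Pipe pipe = Pipe.open();
      Selector selector = Selector.open();
      try {
        // Register the non-blocking read side for "readable" notifications.
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Simulate a peer sending data.
        pipe.sink().write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

        StringBuilder received = new StringBuilder();
        // An idle I/O thread would be parked in select() at this point.
        if (selector.select(1000) > 0) {
          Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
          while (keys.hasNext()) {
            SelectionKey key = keys.next();
            keys.remove(); // selected keys must be cleared by the caller
            if (key.isReadable()) {
              ByteBuffer buffer = ByteBuffer.allocate(1024);
              ((Pipe.SourceChannel) key.channel()).read(buffer);
              buffer.flip();
              received.append(StandardCharsets.UTF_8.decode(buffer));
            }
          }
        }
        return received.toString();
      } finally {
        selector.close();
        pipe.sink().close();
        pipe.source().close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A real I/O thread would run the `select` call in a loop and also register interest in writes; the single pass here only shows where the thread waits and how it learns that data is ready.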

So if a memcached operation (like `get()`) gets issued, it gets passed down until it reaches the IO thread. The IO thread will then put it on a write queue towards its target node. It gets written eventually and then the IO thread adds information to a read queue so the responses can be mapped accordingly. This approach is based on futures, so when the result actually arrives, the Future is marked as completed, the result gets parsed and attached as an Object.
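
To make the queue-plus-future flow a bit more concrete, here is a minimal sketch built only on `java.util.concurrent`. It is not spymemcached code: the class `IoThreadSketch`, the `GetOp` type and the in-memory map that stands in for the remote node are all hypothetical, and the real IO thread works with NIO selectors rather than a blocking queue.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class IoThreadSketch {
    // Hypothetical queued operation: the key plus the future handed back to the caller.
    static final class GetOp {
        final String key;
        final CompletableFuture<Object> result = new CompletableFuture<>();
        GetOp(String key) { this.key = key; }
    }

    private final BlockingQueue<GetOp> writeQueue = new LinkedBlockingQueue<>();
    // Assumption: an in-memory map plays the role of the server node.
    private final Map<String, Object> fakeNode = new ConcurrentHashMap<>();
    private final Thread ioThread;

    public IoThreadSketch() {
        ioThread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    GetOp op = writeQueue.take();              // idle until an operation is queued
                    op.result.complete(fakeNode.get(op.key));  // "response" arrives, future is completed
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();            // stop the loop on shutdown
            }
        }, "io-thread-sketch");
        ioThread.setDaemon(true);
        ioThread.start();
    }

    public void put(String key, Object value) {
        fakeNode.put(key, value);
    }

    // The caller gets a future immediately; only the single IO thread completes it.
    public CompletableFuture<Object> get(String key) {
        GetOp op = new GetOp(key);
        writeQueue.add(op);
        return op.result;
    }

    public void shutdown() {
        ioThread.interrupt();
    }
}
```

The point of the design is that application threads never touch the socket; they only enqueue work and wait on (or attach callbacks to) the returned future.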

The SDK only uses the memcached binary protocol, although spy would also support ASCII. The binary format is much more efficient and some of the advanced operations are only implemented there.

You may wonder how the SDK knows where to send the operation? Since we already have the up-to-date cluster map, we can hash the key and then based on the node list and vBucketMap determine which node to access. The vBucketMap not only contains the information for the master node of each vBucket, but also the information for zero to three replica nodes. Look at this (shortened) example:

    vBucketServerMap: {
        hashAlgorithm: "CRC",
        numReplicas: 1,
        serverList: [
            "192.168.56.101:11210",
            "192.168.56.102:11210"
        ],
        vBucketMap: [
            [0,1],
            [0,1],
            [0,1],
            [1,0],
            [1,0],
            [1,0]
            //.....
        ]
    },

The `serverList` contains our nodes, and the `vBucketMap` has pointers to the `serverList` array. We have 1024 vBuckets, so only some of them are shown here. You can see from looking at it that all keys that hash into the first vBucket have their master node at index 0 (so the `.101` node) and their replica at index 1 (so the `.102` node). Once the cluster map changes and the vBuckets move around, we just need to update our config and know all the time where to point our operations towards.
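
The lookup can be sketched in a few lines of Java. The server list and the six map entries are copied from the shortened example above; everything else is an assumption made for illustration. In particular, reducing a plain CRC32 of the key with a modulo is a simplification, and the exact bit manipulation the SDK applies to the checksum is not shown here. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class VBucketLookupSketch {
    // Values taken from the shortened cluster map example.
    static final String[] SERVER_LIST = {"192.168.56.101:11210", "192.168.56.102:11210"};
    static final int[][] VBUCKET_MAP = {{0, 1}, {0, 1}, {0, 1}, {1, 0}, {1, 0}, {1, 0}};

    // Assumption: CRC32 over the UTF-8 key bytes, reduced to a vBucket id with a modulo.
    static int vbucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % VBUCKET_MAP.length);
    }

    // Position 0 of a vBucketMap entry points at the master node in the server list.
    static String masterFor(String key) {
        return SERVER_LIST[VBUCKET_MAP[vbucketFor(key)][0]];
    }

    // Position 1 points at the first replica node.
    static String replicaFor(String key) {
        return SERVER_LIST[VBUCKET_MAP[vbucketFor(key)][1]];
    }

    public static void main(String[] args) {
        String key = "city:paris";
        System.out.println(key + " -> vBucket " + vbucketFor(key)
                + ", master " + masterFor(key) + ", replica " + replicaFor(key));
    }
}
```

Because the key is the only input to the hash, every client with the same cluster map arrives at the same node without asking the cluster first.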

View operations are handled differently. Since views can't be sent to a specific node (because we don't have a way to hash a key or something), we round-robin between the connected nodes. The operation gets assigned to a [com.couchbase.client.ViewNode](http://www.couchbase.com/autodocs/couchbase-java-client-1.1.5/com/couchbase/client/ViewNode.html) once it has free connections and then executed. The result is also handled through futures. To implement this functionality, the SDK uses the third party Apache HTTP Commons (NIO) library.
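
Round-robin selection itself is simple to sketch. The class below is hypothetical and only cycles through node names; the real `ViewNode` handling also takes free connections into account, which is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinSketch(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);   // defensive copy of the connected nodes
    }

    // Returns the nodes in order and wraps around; floorMod keeps the index
    // non-negative even after the counter overflows.
    public String nextNode() {
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```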

The whole View API hides behind port 8092 on every node and is very similar to [CouchDB](http://couchdb.apache.org/). It also contains a RESTful API, but the structure is a little bit different. For example, you can reach a design document at `/<bucketname>/_design/<designname>`. It contains the View definitions in JSON:

    {
        language: "javascript",
        views: {
            all: {
                map: "function (doc) { if(doc.type == \"city\") {emit([doc.continent, doc.country, doc.name], 1)}}",
                reduce: "_sum"
            }
        }
    }

You can then reach down one level further like `/<bucketname>/_design/<designname>/_view/<viewname>` to actually query it:

{"total_rows":9,"rows":[
{"id":"city:shanghai","key":["asia","china","shanghai"],"value":1},
{"id":"city:tokyo","key":["asia","japan","tokyo"],"value":1},
{"id":"city:moscow","key":["asia","russia","moscow"],"value":1},
{"id":"city:vienna","key":["europe","austria","vienna"],"value":1},
{"id":"city:paris","key":["europe","france","paris"],"value":1},
{"id":"city:rome","key":["europe","italy","rome"],"value":1},
{"id":"city:amsterdam","key":["europe","netherlands","amsterdam"],"value":1},
{"id":"city:new_york","key":["north_america","usa","new_york"],"value":1},
{"id":"city:san_francisco","key":["north_america","usa","san_francisco"],"value":1}
]
}

Once the request is sent and a response comes back, how the response gets parsed depends on the type of View request. It makes a difference, because reduced View queries look different than non-reduced ones. The SDK also includes support for spatial Views and they need to be handled differently as well.

The whole View response parsing implementation can be found inside the [com.couchbase.client.protocol.views](http://www.couchbase.com/autodocs/couchbase-java-client-1.1.5/com/couchbase/client/protocol/views/package-frame.html) namespace. You'll find abstract classes and interfaces like `ViewResponse` in there, and then their special implementations like `ViewResponseNoDocs`, `ViewResponseWithDocs` or `ViewResponseReduced`. It also makes a difference if `setIncludeDocs()` is used on the Query object, because the SDK also needs to load the full documents using the memcached protocol behind the scenes. This is also done while parsing the Views.

Now that you have a basic understanding of how the SDK distributes its operations under stable conditions, we need to cover an important topic: how the SDK deals with cluster topology changes.

### Operations against a rebalancing cluster
Note that there is a separate blog post upcoming dealing with all the scenarios that may come up when something goes wrong on the SDK. Since rebalancing and failover are crucial parts of the SDK, this post deals more with the general process on how this is handled.

As mentioned earlier, the SDK receives topology updates through the streaming connection. Leaving the special case aside where this node actually gets removed or fails, all updates will stream in near real-time (in an eventually consistent architecture, it may take some time until the cluster updates get propagated to that node). The chunks that come in over the stream look exactly like the ones we've seen when reading the initial configuration. After those chunks have been parsed, we need to check if the changes really affect the SDK (since there are many more parameters than the SDK needs, it won't make sense to listen to all of them). All changes that affect the topology and/or vBucket map are considered important. If nodes get added or removed (be it either through failure or planned), we need to open or close the sockets. This process is called "reconfiguration".

Once such a reconfiguration is triggered, lots of actions need to happen in various places. Spymemcached needs to handle its sockets, View nodes need to be managed and new configuration needs to be updated. The SDK makes sure that only one reconfiguration can happen at the same time through locks so we don't have any race conditions going on.
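
The "only one reconfiguration at a time" guarantee can be pictured with a plain lock. This is a hedged sketch, not the SDK's implementation: `ReconfigureSketch`, its fields and the string used as a stand-in for the parsed bucket configuration are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

public class ReconfigureSketch {
    private final ReentrantLock reconfigureLock = new ReentrantLock();
    // Assumption: a String stands in for the parsed bucket configuration.
    private final AtomicReference<String> currentConfig = new AtomicReference<>("initial");

    public void reconfigure(String newConfig) {
        reconfigureLock.lock();              // a second update waits here until the first is done
        try {
            // In a real client this is where sockets would be opened or closed
            // and View nodes added or removed, before the new config is published.
            currentConfig.set(newConfig);
        } finally {
            reconfigureLock.unlock();        // always release, even if the update fails
        }
    }

    public String currentConfig() {
        return currentConfig.get();
    }
}
```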

The Netty-based `BucketUpdateResponseHandler` triggers the `CouchbaseClient#reconfigure` method, which then starts to dispatch everything. Depending on the bucket type used (i.e. memcached type buckets don't have Views and therefore no ViewNodes), configs are updated and sockets closed. Once the reconfiguration is done, it can receive new ones. During planned changes, everything should be pretty much controlled and no operations should fail. If a node is actually down and cannot be reached, those operations will be cancelled. Reconfiguration is tricky because the topology changes while operations are flowing through the system.

Finally, let's cover some differences between Couchbase and Memcache type buckets. All the information that you've been reading previously only applies to Couchbase buckets. Memcache buckets are pretty basic and do not have the concept of vBuckets. Since you don't have vBuckets, all that the Client has to do is to manage the nodes and their corresponding sockets. Also, a different hashing algorithm is used (mostly Ketama) to determine the target node for each key. Also, memcache buckets don't have views, so you can't use the View API and it doesn't make much sense to keep View sockets around. So to clarify the previous statement, if you are running against a memcache bucket, for a 10 node cluster you'll only have 11 open connections.

Phase 3: Shutdown
-----------------
Once the `CouchbaseClient#shutdown()` method is called, no more operations are allowed to be added onto the `CouchbaseConnection`. Until the timeout is reached, the client wants to make sure that all operations went through accordingly. All sockets for both memcached and View connections are shut down once there are no more operations in the queue (or they get dropped). Note that the `shutdown` methods on those sockets are also used when a node gets removed from the cluster during normal operations, so it's basically the same, but just for all attached nodes at the same time.
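
The "stop accepting, drain until the timeout, then drop" sequence has a close analogy in the JDK's `ExecutorService`, which is used below purely as an illustration. The class and method names are hypothetical, and the SDK does not necessarily use an executor for this.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {
    // Returns true if everything queued was processed before the timeout.
    static boolean drainAndStop(ExecutorService queue, long timeout, TimeUnit unit) {
        queue.shutdown();                                  // no new operations are accepted
        try {
            boolean drained = queue.awaitTermination(timeout, unit);
            if (!drained) {
                queue.shutdownNow();                       // timeout reached: drop what is left
            }
            return drained;
        } catch (InterruptedException e) {
            queue.shutdownNow();
            Thread.currentThread().interrupt();            // preserve the interrupt for the caller
            return false;
        }
    }
}
```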

Summary
-------
After reading this blog post, you should have a much clearer picture of how the client SDK works and why it is designed the way it is. We have lots of enhancements planned for future releases, mostly enhancing the direct API experience. Note that this blog post didn't cover how errors are handled inside the SDK; this will be published in a separate blog post because there is also lots of information to cover.
Comment by Karen Zeller [ 22/Apr/13 ]
Confirm with MikeN:

Classpath dependencies section is up to date for 1.1


Also get a screen shot from Netbeans from him for this process:

 In this example we use the NetBeans IDE, but you can use any other Java-compatible IDE with the Java SDK as well. After you install the IDE and open it:
Select File -> New Project -> Maven -> Java Application.
Provide a name for your new project "examples" and change the location to the directory you want.
We'll use the com.couchbase namespace, but you can use your own if you like (just make sure to change it later in the source files when you copy them).
Comment by Karen Zeller [ 23/Apr/13 ]
Updates from Perry/ Michael on Complex Key (for advanced topics, cross reference from Views chapters - confirm best place where with JamesM)

Perry asked me about more information on the ComplexKey object and the current docs are a little bit thin ;)

http://www.couchbase.com/docs/couchbase-sdk-java-1.1/advanced-complexkey.html

I wrote some stuff last year that can either be incorporated or linked, whichever you think fits:

http://nitschinger.at/New-Features-in-the-Couchbase-Java-Client-1-1-dp4

This whole thing will be replaced by a full "view usage" doc from me in the next weeks, so you may not put huge effort into rewriting it or so! Just make it findable for people..


*In this specific instance, I'm going to be rolling in and editing the blog from Michael into the Language Reference. I will also cross reference it from the Views chapter.
Comment by Karen Zeller [ 24/Apr/13 ]
Sent to Michael for clarification 4/24:

Hi Michael,

So I finished reworking the content for chapter 1 and I noticed when I actually worked on the content for 1.3 and 1.4 it seems abrupt and out of place for a Getting Started chapter to suddenly go into this detail.

This chapter is typically the most bare minimum, hello world type example we already have. I suggest we move 1.3 right after 3.2 and move 1.4, which is primarily CAS and GETL to right after 3.3.

Let me know if this sounds good to you.


Karen
Comment by Karen Zeller [ 30/Apr/13 ]
Request info from Michael:

Hi Michael,

I'm editing/overhauling the tutorial chapter this week now and I need some clarification:

 "If you want to get up and running really quickly, here is how to do it with Jetty. Note that this guide assumes you have MacOS or Linux. If you use Windows, you will need to modify the paths accordingly. Also, make sure to have at least Maven installed on your machine.
 Download Couchbase Server 2.0 and install it. Make sure you install the beer-sample dataset when you run the wizard, because this tutorial will use it.
 Add the following views and design documents to the beer-sample bucket. Later we publish them to production:
 The first design document name is beer and view name is by_name:"

-Are you assuming that someone creates these in Web Console? Or as files that they then put from the Java SDK, or REST? In the later steps you run the sample app, so I am assuming they need to create them and put them into production at this point, otherwise the beer sample application will not work.



Regards,

Karen


Comment by Karen Zeller [ 02/May/13 ]
Info from Michael:


On 30.04.2013, at 20:19, "Karen Zeller" <karen.zeller@couchbase.com> wrote:

Hi Michael,
I also notice that the "Quickstart" section is named a pretty vague title. It is really previewing the application if people have Jetty. So I think we should call it that: "Previewing the Application". This also distinguishes it from the next section "Preparation" which I am relabeling "Preparing Your Project"

Sounds good!

In general it is better practice to provide titles as actions instead of things. It is more engaging for the audience:
"Couchbase Server Administration" sounds much better as "Managing Couchbase Server".
Do these changes make sense?
Jup please go ahead with it!
Thanks,
Karen
Comment by Karen Zeller [ 02/May/13 ]
Yes you need to! Please cross ref! Thanks :)



On 02.05.2013, at 19:13, "Karen Zeller" <karen.zeller@couchbase.com> wrote:

Just to make sure I am 100% clear - even at this previewing the
application phase with Jetty, someone needs to go into web console and add
those views?
If so, I'll cross reference how to add views in Web Console...
Regards,
Karen
Comment by Karen Zeller [ 07/May/13 ]
From Michael:



Logging with the Couchbase Java Client

This blog post tells you everything that you need to know about logging with the Couchbase Java Client (and Spymemcached).


## Introduction
There is a huge variety of logging frameworks for Java, and it's hard to please everyone. To understand how logging is handled in the SDK, we have to go back a few years. As you may know, the SDK depends on the [spymemcached]() library and therefore also inherits its logging mechanisms. Back in the days when [@dustin]() wrote spymemcached, there was no good abstraction available (such as SLF4J), so he wrote his own. Nowadays things have changed, but spymemcached still inherits this legacy.

At the time of writing, the SDK supports logging to a simple default logger (logs to STDERR from INFO level up), Log4J and the SunLogger (java.util.logging). In the upcoming 2.9.0 release of spymemcached, it will also support the SLF4J logging facade where you can plug in your own implementation. The next version of the SDK (1.1.6) will depend on spy 2.9, so you'll also get the benefits there.

Before we dig into the concepts, here are the supported Log Levels (defined by `net.spy.memcached.compat.log.Level`):

- TRACE (since 2.9)
- DEBUG
- INFO
- WARN
- ERROR
- FATAL

Keep in mind that different loggers implement different levels, so for some of them a mapping happens. This will be noted during the description of each implementation.

We'll now look at the different logging mechanisms available and how you can configure and work with them. SLF4J will be covered towards the end.

## Switching Logging
If you don't change anything, the default logger will be used. This mechanism just prints log messages to STDERR output (from INFO level upwards). Chances are that you want the SDK to log through the same logging library that you already use in your codebase. The LoggerFactory inside spy decides at construction which one to choose, based on a system property. So you can either change this programmatically or with a param to the `java` command.

If you want to use the Log4JLogger programmatically, do it this way (before initializing the `CouchbaseClient` object):

    Properties systemProperties = System.getProperties();
    systemProperties.put("net.spy.log.LoggerImpl", "net.spy.memcached.compat.log.Log4JLogger");
    System.setProperties(systemProperties);

Of course, you need to add the Log4J JAR to your CLASSPATH to make it work (as we'll see later). Alternatively, you can set it dynamically this way:

    java -Dnet.spy.log.LoggerImpl=net.spy.memcached.compat.log.Log4JLogger ...
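
Both routes end up in the same place: a JVM system property that is read once when the logger is chosen. The following sketch shows only that property lookup and does not need spymemcached on the classpath. The property name and the two class names are the ones used in this post; the helper methods are hypothetical and merely mimic the decision described above, they are not the actual `LoggerFactory` code.

```java
public class LoggerSelectionSketch {
    static final String PROPERTY = "net.spy.log.LoggerImpl";
    static final String DEFAULT_IMPL = "net.spy.memcached.compat.log.DefaultLogger";

    // Hypothetical helper: returns the configured logger class name,
    // falling back to the default logger if the property is not set.
    static String configuredLoggerName() {
        return System.getProperty(PROPERTY, DEFAULT_IMPL);
    }

    // Shorter equivalent of the getProperties()/setProperties() round trip.
    static void useLogger(String className) {
        System.setProperty(PROPERTY, className);
    }

    public static void main(String[] args) {
        System.out.println("before: " + configuredLoggerName());
        useLogger("net.spy.memcached.compat.log.Log4JLogger");
        System.out.println("after:  " + configuredLoggerName());
    }
}
```

Since the decision happens at construction, the property has to be set before the first `CouchbaseClient` object is created; changing it afterwards has no effect on loggers that already exist.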

Now that we are aware of the different implementations, let's look at each of them.

## The Simple Default Logger
If you don't change anything, the SDK will use the DefaultLogger (net.spy.memcached.compat.log.DefaultLogger). This logger has no dependencies and prints every log message that is INFO level or higher (INFO, WARN, ERROR and FATAL) to the system's STDERR. Since the STDERR is covered by most IDEs automatically, you'll also see them in the console output window.

Since it's so simple, you can't change the behavior right now. Every log message gets timestamped as well (the format is `yyyy-MM-dd HH:mm:ss.SSS`). Connecting to Couchbase commonly looks like this:

2013-05-07 12:28:41.852 INFO com.couchbase.client.CouchbaseConnection: Added {QA sa=/127.0.0.1:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2013-05-07 12:28:41.862 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@3d9360e2
2013-05-07 12:28:41.887 INFO com.couchbase.client.ViewConnection: Added localhost to connect queue
2013-05-07 12:28:41.888 INFO com.couchbase.client.CouchbaseClient: viewmode property isn't defined. Setting viewmode to production mode
2013-05-07 12:28:41.986 INFO com.couchbase.client.CouchbaseConnection: Shut down Couchbase client
2013-05-07 12:28:41.991 INFO com.couchbase.client.ViewConnection: Node localhost has no ops in the queue
2013-05-07 12:28:41.992 INFO com.couchbase.client.ViewNode: I/O reactor terminated for localhost

So the format is always: `<timestamp> <level> <classname> <message>`. Remember that DEBUG messages and below will not be logged, so you won't see them with the DefaultLogger.

## The SunLogger (java.util.logging)
The SunLogger also doesn't introduce additional dependencies, since it depends on the `java.util.logging` implementation. The `java.util.logging.Level` class defines the following levels: ALL, CONFIG, FINEST, FINER, FINE, INFO, WARNING, SEVERE and OFF. Since this does not map well to our defined Levels, here is the mapping that happens:

- TRACE to FINEST (since 2.9)
- DEBUG to FINE
- INFO to INFO
- WARN to WARNING
- ERROR to SEVERE
- FATAL to SEVERE
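
Written out as code, the mapping from the list above looks like this. The `SpyLevel` enum is a hypothetical stand-in for `net.spy.memcached.compat.log.Level`, so that the sketch compiles without spymemcached; the `java.util.logging.Level` constants are the real JDK ones.

```java
import java.util.logging.Level;

public class SunLevelMappingSketch {
    // Hypothetical mirror of the spy log levels listed earlier.
    enum SpyLevel { TRACE, DEBUG, INFO, WARN, ERROR, FATAL }

    // Translates a spy level into the java.util.logging level from the list above.
    static Level toJul(SpyLevel level) {
        switch (level) {
            case TRACE: return Level.FINEST;
            case DEBUG: return Level.FINE;
            case INFO:  return Level.INFO;
            case WARN:  return Level.WARNING;
            case ERROR: // ERROR and FATAL both collapse into SEVERE
            case FATAL: return Level.SEVERE;
            default:    throw new IllegalArgumentException("Unknown level: " + level);
        }
    }
}
```

Note that the mapping is lossy: once a message is logged as SEVERE, you can no longer tell whether it started out as ERROR or FATAL.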

Without any further changes, the SunLogger also prints from INFO level upwards like this:

May 7, 2013 12:42:16 PM com.couchbase.client.CouchbaseProperties setPropertyFile
INFO: Could not load properties file "cbclient.properties" because: File not found with system classloader.
May 7, 2013 12:42:16 PM net.spy.memcached.MemcachedConnection createConnections
INFO: Added {QA sa=/127.0.0.1:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
May 7, 2013 12:42:16 PM net.spy.memcached.MemcachedConnection handleIO
INFO: Connection state changed for sun.nio.ch.SelectionKeyImpl@4ce2cb55
May 7, 2013 12:42:16 PM com.couchbase.client.ViewConnection createConnections
INFO: Added localhost to connect queue
May 7, 2013 12:42:16 PM com.couchbase.client.CouchbaseClient <init>
INFO: viewmode property isn't defined. Setting viewmode to production mode
May 7, 2013 12:42:16 PM com.couchbase.client.CouchbaseConnection run
INFO: Shut down Couchbase client
May 7, 2013 12:42:16 PM com.couchbase.client.ViewConnection shutdown
INFO: Node localhost has no ops in the queue
May 7, 2013 12:42:16 PM com.couchbase.client.ViewNode$1 run
INFO: I/O reactor terminated for localhost

If you want to change the log level to DEBUG and lower, you can do it like this:

    Logger.getLogger("com.couchbase.client").setLevel(Level.FINEST);

Now there is one more thing you need to do if you want to print all debug messages to the console. You set the logging level correctly, but the `ConsoleHandler` is not set to debug yet.

    for(Handler h : Logger.getLogger("com.couchbase.client").getParent().getHandlers()) {
        if(h instanceof ConsoleHandler) {
            h.setLevel(Level.FINEST);
        }
    }

So, here is a full example on how to use the `SunLogger` and get all Debug messages on the console.

    Properties systemProperties = System.getProperties();
    systemProperties.put("net.spy.log.LoggerImpl", "net.spy.memcached.compat.log.SunLogger");
    System.setProperties(systemProperties);

    Logger logger = Logger.getLogger("com.couchbase.client");
    logger.setLevel(Level.FINEST);
    for(Handler h : logger.getParent().getHandlers()) {
        if(h instanceof ConsoleHandler) {
            h.setLevel(Level.FINEST);
        }
    }

Then just go ahead and create your `CouchbaseClient` object; you will see detailed output like this (trimmed here):

May 7, 2013 12:54:34 PM com.couchbase.client.vbucket.ReconfigurableObserver update
FINEST: Received an update, notifying reconfigurables about a com.couchbase.client.vbucket.config.Bucketcom.couchbase.client.vbucket.config.Bucket@3d77949
May 7, 2013 12:54:34 PM com.couchbase.client.vbucket.ReconfigurableObserver update
FINEST: Received an update, notifying reconfigurables about a com.couchbase.client.vbucket.config.Bucketcom.couchbase.client.vbucket.config.Bucket@4e927aef
May 7, 2013 12:54:34 PM com.couchbase.client.vbucket.ReconfigurableObserver update
FINEST: It says it is default and it's talking to /pools/default/bucketsStreaming/default?bucket_uuid=adfff22b70e09fafaa26ca37b7e05e9d
May 7, 2013 12:54:34 PM com.couchbase.client.vbucket.ReconfigurableObserver update
FINEST: It says it is default and it's talking to /pools/default/bucketsStreaming/default?bucket_uuid=adfff22b70e09fafaa26ca37b7e05e9d

## Log4J
Most people will need more flexibility, and Log4J was (and still is) standard in lots of applications. The SDK provides support for Log4J as well. To make it work, you first need to set the Instance correctly:

    Properties systemProperties = System.getProperties();
    systemProperties.put("net.spy.log.LoggerImpl", "net.spy.memcached.compat.log.Log4JLogger");
    System.setProperties(systemProperties);

Now if you run this, you'll get an error that some of the Log4J classes can not be found. This is not a surprise, because it's not on the classpath. Let's fix this by adding it accordingly. If you use Maven, add the `log4j.log4j` dependency (current version is 1.2.17). You can also just download the JAR and add it to the CLASSPATH as needed.

Now if we run it again, we get another error:

log4j:WARN No appenders could be found for logger (com.couchbase.client.vbucket.ConfigurationProviderHTTP).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

One way to fix this is to get a correct `log4j.xml` configuration file into our CLASSPATH, but to make it work quickly Log4J provides a `BasicConfigurator`. Right after the system property configurations, call this:

    org.apache.log4j.BasicConfigurator.configure();

Now if you run it again, you will see that we get nicely printed log messages. You can also see that they show up straight from the DEBUG level (and even contain information from which thread they got logged):

69 [main] INFO com.couchbase.client.CouchbaseConnection - Added {QA sa=/127.0.0.1:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
70 [main] DEBUG com.couchbase.client.vbucket.VBucketNodeLocator - Updating nodesMap in VBucketNodeLocator.
73 [main] DEBUG com.couchbase.client.vbucket.VBucketNodeLocator - Adding node with hostname 127.0.0.1:11210.
74 [main] DEBUG com.couchbase.client.vbucket.VBucketNodeLocator - Node added is {QA sa=localhost/127.0.0.1:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=8}.
74 [Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11210}] DEBUG com.couchbase.client.CouchbaseConnection - Done dealing with queue.
74 [Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11210}] DEBUG com.couchbase.client.CouchbaseConnection - Selecting with delay of 0ms
79 [Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11210}] DEBUG com.couchbase.client.CouchbaseConnection - Selected 1, selected 1 keys
79 [Memcached IO over

You can control the logging levels through the usual Log4J mechanisms. I won't go into detail about them here, so please [check out](http://logging.apache.org/log4j/1.2/manual.html) their official documentation (for example on how to use the `PropertyConfigurator` instead).

Speaking of Log4J, [Steffen Larsen](https://twitter.com/zooldk) implemented a [Log4J appender](https://github.com/zooldk/log4j-couchbase) to store logs in Couchbase (instead of a file)!

## The new Facade: SLF4J
Not binding the application to a specific logging library is always a good idea. SLF4J is a facade for various pluggable logging frameworks behind it. So you can choose the logging implementation during runtime, be it [logback](http://logback.qos.ch/), Log4J or others. Since we already tried Log4J, let's make SLF4J work with Logback, one of the other very common log frameworks out there.

Note that SLF4J support will be available in the 2.9.0 release of spymemcached and therefore also in the 1.1.6 release of the SDK.

First, we need to configure it accordingly:

    Properties systemProperties = System.getProperties();
    systemProperties.put("net.spy.log.LoggerImpl", "net.spy.memcached.compat.log.SLF4JLogger");
    System.setProperties(systemProperties);

Now, we need to include two JARs into our classpath. The first one is the SLF4J facade API and the other one is our logging framework of choice. The facade API package is called `slf4j-api` (this package always needs to be in place) and since we want to use logback we need to include the `logback-classic` JAR. Note that this is not specific to the SDK; you can find this information [here](http://logback.qos.ch/manual/introduction.html). If you use Maven, you can use this snippet:

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.5</version>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.0.12</version>
    </dependency>

SLF4J will automatically pick up our logback implementation, so the logs will look like this:

13:25:43.692 [main] INFO c.c.client.CouchbaseConnection - Added {QA sa=/127.0.0.1:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
13:25:43.694 [main] DEBUG c.c.c.vbucket.VBucketNodeLocator - Updating nodesMap in VBucketNodeLocator.
13:25:43.697 [main] DEBUG c.c.c.vbucket.VBucketNodeLocator - Adding node with hostname 127.0.0.1:11210.
13:25:43.697 [main] DEBUG c.c.c.vbucket.VBucketNodeLocator - Node added is {QA sa=localhost/127.0.0.1:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=8}.
13:25:43.698 [Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11210}] DEBUG c.c.client.CouchbaseConnection - Done dealing with queue.
13:25:43.699 [Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11210}] DEBUG c.c.client.CouchbaseConnection - Selecting with delay of 0ms
13:25:43.702 [Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11210}] DEBUG c.c.client.CouchbaseConnection - Selected 1, selected 1 keys
13:25:43.703 [Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11210}] DEBUG c.c.client.CouchbaseConnection - Handling IO for: sun.nio.ch.SelectionKeyImpl@48ff2413 (r=false, w=false, c=true, op={QA sa=localhost/127.0.0.1:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=8})
13:25:43.703 [Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11210}] INFO c.c.client.CouchbaseConnection - Connection state changed for sun.nio.ch.SelectionKeyImpl@48ff2413
13:25:43.713

As you can see, they also include DEBUG level logging here. If you don't include the logging implementation during runtime, SLF4J will complain at startup:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details

If you want to learn how to configure logback, [look here](http://logback.qos.ch/manual/configuration.html).

## Summary
tba.




[MB-8250] Doc: Need description and best practices on how to add a field/column to all documents within a bucket Created: 13/May/13  Updated: 13/May/13

Status: Open
Project: Couchbase Server
Component/s: clients, documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The two strategies are:
-Update the application to "work" with both types of schemas and add the field to documents as they are accessed
-Create a view to find documents that don't have the new field, and then iterate over every document to add it.

Want a description on what's being accomplished, why, how, etc including examples in different client libraries.




[MB-8252] Docs: How to work with data across different client languages Created: 13/May/13  Updated: 13/May/13

Status: Open
Project: Couchbase Server
Component/s: clients, documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
There are a few cases where the customer wants to use the same data from different languages and it would be helpful to provide some documentation on how to do this. There are certain things like serialization, etc to take into account.




[MB-8251] Docs: how to handle "document is too large" exception across various SDK's Created: 13/May/13  Updated: 13/May/13

Status: Open
Project: Couchbase Server
Component/s: clients, documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
A specific request for dealing with the exception that comes back when a document is too large...it probably differs across all client libraries




[MB-8247] Provide documentation and examples on using map-reduce views versus elastic search Created: 13/May/13  Updated: 13/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Coming up a lot is the request for "when do I use views and when do I use elastic search"




[MB-8278] Need to provide Tool Tips in Web Console Created: 14/May/13  Updated: 14/May/13

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Karen Zeller Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Hi,

I was talking to Maria about 2.1 futures and right now we have all of these field descriptions from Web Console in the Web Console chapter of documentation. Example here: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-web-console-data-buckets-individual.html#couchbase-admin-web-console-data-buckets-tapqueues


Where this information really belongs is as tool tips in the UI itself. People are not going to go into the docs to find out what a field means. That is awkward and makes them leave the software. It is much easier to roll over the item and get context-sensitive information about the item in the UI.

I think having the amount of web console doc in documentation is a legacy of not having a UI designer/developer.


It is much easier and a standard industry practice to:

- only document how to manage something in a UI
- not document wizards since the process of how to use a wizard is already in the UI
- provide tool tips for UI elements (what it means, what it does) and make sure they are section 508 compliant.

If we can have this for 2.1, that would be great.

Thanks,

Karen






[MB-7881] Views FAQ from Support/Customer Created: 07/Mar/13  Updated: 14/May/13

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   


Question #1: The main issue is unexpected behavior of XDCR. My team says they were able to produce a situation in which "old" keys override newer keys on a different cluster.

I will approach this by explaining how the XDCR conflict resolution works. I hope that will be the information you need.

If you have further concerns about the behavior, I'll need to know whether you're observing that Couchbase does *not* behave as expected (i.e., it appears to be having some wrong behavior according to the definition), OR if the conflict resolution that Couchbase does is not suitable for your specific needs in this case.

I'll also ask you to provide a clear description of what behavior you're seeing and how it either conflicts with the expected resolution policy, or how it is causing problems specifically for your use case.

This is described in the "Document Handling, and Conflict Resolution" section on this page:

http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-tasks-xdcr-functionality.html

I quote:

XDCR automatically performs conflict resolution between the source and destination clusters and is designed to ensure that changes to individual documents are replicated successfully. For each stored document, XDCR looks at the following items to create a check value to resolve conflicts:

- Numerical sequence, which is incremented on each mutation
- CAS value
- Document flags
- Expiration (TTL) value

During conflict resolution, XDCR sequentially checks the values until it identifies the document with the highest value. XDCR will use this version of the document for replication. The algorithm is designed to consistently select the same document on either a source or destination cluster.

The key point here is that the main determiner of which document will be selected is which one has been modified most frequently.

As an example, assume two clusters, A and B, with bi-directional replication streams. At time T1, both clusters have the same revision of the document, let's call it revision 8. To think about it more easily, let's say that at this point the link between the two clusters goes down temporarily, and clients are modifying each cluster independently. Examine these events:

T1: initial state: A=8, B=8
T2: link goes down
T3: client updates doc on A: A=9, B=8
T4: client updates doc on A: A=10, B=8
T5: client updates doc on B: A=10, B=9
T6: link goes up, A's version wins: A=10, B=10

I think this is the simplest example I can give. Here we see that the most recent change is not necessarily the one that will win. Rather, it is the version of the doc which has seen the most updates. Indeed, "most recent change" is difficult to determine precisely in distributed systems, and is not reliable when needing to resolve conflicts from N clusters. The algorithm Couchbase uses ensures that each cluster can independently come to the same consistent view of which document wins.
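
To make the rule concrete, here is a minimal Python sketch of the comparison. It is an illustration only, not Couchbase's actual code: the DocMeta class and its field names are made up for this example, and it assumes "sequentially checks the values" means comparing the four items in the order the manual lists them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocMeta:
    # Illustrative field names; not Couchbase's internal identifiers.
    rev_seq: int      # numerical sequence, incremented on each mutation
    cas: int          # CAS value
    flags: int        # document flags
    expiration: int   # expiration (TTL) value


def winner(a: DocMeta, b: DocMeta) -> DocMeta:
    """Return the version with the highest (rev_seq, cas, flags, expiration).

    Tuples compare left to right, so later fields only break ties.
    Both clusters run the same comparison and reach the same answer.
    """
    def key(m: DocMeta):
        return (m.rev_seq, m.cas, m.flags, m.expiration)

    return a if key(a) >= key(b) else b


# State at T6 in the example: A was updated twice, B once (and more recently).
a_version = DocMeta(rev_seq=10, cas=1000, flags=0, expiration=0)
b_version = DocMeta(rev_seq=9, cas=2000, flags=0, expiration=0)
resolved = winner(a_version, b_version)
```

In this sketch the later write on B loses because its revision number is lower, which matches the T6 line in the timeline.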

However, this example does point out a case where that algorithm is counter-intuitive. Particularly if there are several minutes, or hours, between T4 and T5, one would expect the revision from cluster B to win, because it was more recent. There is also an intuitive aspect to the "highest revision number wins" algorithm, in that the change made on B was ignorant of the 2 previous changes on A; cluster B had less information available to it than cluster A.

Regardless of which makes more sense to you, this is how Couchbase is defined to work.

In some cases it makes sense to have some cluster-specific documents that ensure there will be no data lost due to conflict resolution. For example, imagine that in our example, the document is a simple integer counter that gets incremented. In the above example, assume the counter starts out at 100:

T1: A=100, B=100
T3: A=101, B=100
T4: A=102, B=100
T5: A=102, B=101
T6: A=102, B=102 // Oops, this should be 103!

It depends on what the counter represents. If it's a statistic counting some common event, it may be fine to drop an increment here or there and not really make any difference, so this may be fine. But if it's counting coins in a bank account, it might be much more important. We could split this into two different documents, then, that track updates on each cluster separately. Like this:

T1: Aa=50, Ab=50, Ba=50, Bb=50 // Total of counter a + b is 100
T3: Aa=51, Ab=50, Ba=50, Bb=50
T4: Aa=52, Ab=50, Ba=50, Bb=50
T5: Aa=52, Ab=50, Ba=50, Bb=51
T6: Aa=52, Ab=51, Ba=52, Bb=51 // No conflicts, total is 103 as expected

So a client that wants the value for this counter needs to get counter_a, counter_b, counter_..., and sum them to get the actual count.
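
A rough Python sketch of that split-counter idea follows. The key names counter_a and counter_b and the replicate() helper are assumptions made for this illustration; a real client would fetch the documents through its SDK and sum them.

```python
from typing import Dict, Tuple


def counter_total(shards: Dict[str, int]) -> int:
    """Sum the per-cluster counter documents to get the logical count."""
    return sum(shards.values())


def replicate(cluster_a: Dict[str, int],
              cluster_b: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Model the link coming back up.

    Each counter document is written on only one cluster, so the copy on
    the owning cluster always has the most mutations and wins.
    """
    merged = {
        "counter_a": cluster_a["counter_a"],  # only cluster A writes this one
        "counter_b": cluster_b["counter_b"],  # only cluster B writes this one
    }
    return dict(merged), dict(merged)


# State at T5 in the example above, just before the link is restored.
cluster_a = {"counter_a": 52, "counter_b": 50}
cluster_b = {"counter_a": 50, "counter_b": 51}
cluster_a, cluster_b = replicate(cluster_a, cluster_b)
total = counter_total(cluster_a)
```

The trade-off is one extra read per cluster on every lookup, in exchange for never losing an increment to conflict resolution.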

So that's it. Let us know if this resolves the issue for you or not.


Question #2: We created a simple view that aggregates counters stored in JSON format. See the errors in the log below.

[couchdb:error,2013-03-06T9:11:47.600,ns_1@172.31.0.65:<0.32666.4809>:couch_log:error:42]Set view `cdb`, main group `_design/dev_counters`, received error from updater: {too_large_btree_state, 70447}
…

[couchdb:error,2013-03-06T9:11:37.671,ns_1@172.31.0.62:<0.9322.2210>:couch_log:error:42]Set view `cdb`, main group `_design/dev_counters`, writer error
error: {error, {reduction_too_long, <<0,0,0,2,10,… (many more numbers, very long)>>
…

The errors you're getting say "too_large_btree_state" and "reduction_too_long". This means that you have defined a custom reduce function in your view, and its output is growing too fast. It isn't in fact reducing; instead, the value computed there grows larger and larger along with the number of items.

A proper reduction will be of constant size, regardless of the number of items in the view. A simple example of this is the _sum reduction. It is just a single number, no matter how many items are contributing to the sum.

An invalid reduction will grow bigger as more items are aggregated. A simple example is "reducing" to a list of values, one for each document. The reduction value at the root of the B-tree (that is, the reduction over all values in the index) will have N elements in that list.

When you realize that the reduction value is stored in each non-leaf B-tree node, which reduces all the nodes under it, you'll see that such a growing reduction will use up a *very* large amount of space. The B-tree state grows too large; the reduction grows too long.

You don't need to limit reduce values to simple integers. It's OK, for example, to have a reduce with a dictionary (object) with a finite number of members. Or an array with a small number of elements. Etc. But you can't have a reduce value that keeps growing with the size of the input; that will explode the index very quickly.
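
The difference can be seen in a small Python simulation. This is only a sketch: real reduce functions are written in JavaScript and also handle rereduce, which is not modeled here, and the function names are made up for the illustration.

```python
import json


def reduce_sum(values):
    """Proper reduction: one number, no matter how many inputs."""
    return sum(values)


def reduce_collect(values):
    """Invalid "reduction": one list element per input, so it grows with N."""
    return list(values)


def reduction_size(reduce_fn, n):
    """Size in bytes of the JSON-encoded reduce value over n items."""
    return len(json.dumps(reduce_fn(range(1, n + 1))))


sizes = {
    n: (reduction_size(reduce_sum, n), reduction_size(reduce_collect, n))
    for n in (100, 10000)
}
```

The summed value stays a handful of bytes at both sizes, while the collected list grows roughly in proportion to the number of items, and that value would be stored again in every non-leaf B-tree node.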


Question #3: We see errors in the log about the cluster taking too long to retrieve a key (5 seconds!!).

I suspect this problem is related to #2. We'll need to get that sorted out first. I recommend you stick to just the built-in reduce functions (_sum, _count, _stats) first, and ensure things are working with that. Often it is more efficient to put some of the summary logic into the client instead of into the index itself. Also, a reasonable ballpark for the number of views is around 10-20; if you're creating 100 views, for example, probably you're wanting to use a fulltext index instead.

Once that kind of issue is addressed, if you're still seeing problems, we'll want to look at your design documents in full, look at the complete error logs, and dig deeper into it.

Here is a tool we ask you to use when collecting stats and logs from a Couchbase system. Probably we don't need this immediately, but it is helpful to know about for future problem reports:

http://www.couchbase.com/wiki/display/couchbase/Working+with+the+Couchbase+Technical+Support+Team


 Comments   
Comment by Tim Smith [ 07/Mar/13 ]
FYI, question #2 is not XDCR-related. It would go at the end of this section: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-writing-reduce.html

There is already a Note about it, but probably that could be expanded.
Comment by Karen Zeller [ 13/May/13 ]
Added this info in new Conflict Resolution in XDCR section:

<para>By default XDCR fetches metadata twice from every document before it replicates the document
at a destination cluster. XDCR fetches metadata on the source cluster and looks at the
number of document revisions. It compares this number of revisions for a document with the
number of revisions on the destination cluster and the document with more revisions is
considered the 'winner.' </para>
<para>If the document from a source cluster will win conflict resolution, XDCR puts the
document into the replication queue. If the document will lose conflict resolution because
it has a lower number of mutations than on the destination cluster, XDCR will not put it
into the replication queue. Once the document reaches the destination cluster, this cluster
will request metadata once again from the source and perform actual conflict resolution. The
destination cluster will discard the document version with the lower number of
mutations.</para>

<para>The key point is that the number of document mutations is the key factor that determines whether XDCR keeps a document version or not. This means
that the document with the most recent mutation is not necessarily the one that wins conflict resolution. In fact, the most
recently changed document is often difficult to determine precisely in a distributed system. The algorithm Couchbase
Server uses does ensure that each cluster can independently reach a consistent decision on which document wins.</para>
<!-- -->





[MB-7680] Error handling documentation Created: 05/Feb/13  Updated: 14/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Please create some documentation specifying possible error/failures to operations, what they "look" like in the stats and what our recommendation is on how to handle them (pointing back to the relevant SDK documentation)

e.g. tmp_oom, "get miss" (it's technically a failure, let's make it overly obvious what it means), CAS failure, add() failure, replace() failure.

Some of this should be covered in the SDK material, but this bug is specifically for a single page where this information is aggregated that a customer/user could read about how to handle errors both on the server-side and the SDK side.




[MB-8285] DevGuide: metadata reported as 150 bytes needs to be revised Created: 15/May/13  Updated: 15/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Found a few places where the metadata size still reads 150 bytes; it should be reduced to 60 for 2.0.1 and 54 for 2.0.2. Maybe we can set up some sort of linking or monitoring, as this will continue to change in future versions and customers are stumbling across the old values.

http://www.couchbase.com/docs/couchbase-devguide-2.0/cb-store-operations.html
http://www.couchbase.com/docs/couchbase-devguide-2.0/couchbase-keys.html




[MB-8244] add or enable dispatcher stats for each multi-readers/writers thread Created: 10/May/13  Updated: 15/May/13

Status: In Progress
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Jin Lim Assignee: Jin Lim
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-8297] Some key projects are still hosted at Membase GitHub account Created: 16/May/13  Updated: 16/May/13

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.0.2, 2.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Phil Labee
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
memcached, libmemcached, grommit, buildbot-internal...

They are important components of build workflow. For instance, repo manifests have multiple references to these projects.

This is a very confusing legacy; I believe we can avoid it.




[MB-8302] create couchbase readme.txt file for Windows and Linux Created: 16/May/13  Updated: 16/May/13

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Maria McDuff Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
per bug triage, we need couchbase installation readme.txt for Windows and Linux OS.




[MB-8304] Grommit synchronization is not automated Created: 17/May/13  Updated: 17/May/13

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.0.2, 2.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Phil Labee
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Currently any project update requires manual checkout at build slaves.

Apart from being annoying it introduces bugs.




[MB-8307] Litmus dashboard becomes unresponsive after fetching big set of results Created: 17/May/13  Updated: 17/May/13

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.0.1, 2.0.2, 2.1
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Pavel Paulau Assignee: Ronnie Sun
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Chrome latest, Firefox latest

Attachments: PNG File litmus.png    

 Description   
I can see that it eats one of my cores for a couple of seconds.

Firefox tries to kill the script from time to time.

 Comments   
Comment by Ronnie Sun [ 17/May/13 ]
Pavel, would you please provide more info: Does this happen on every user action? What network latency do you have through the VPN?

It worked fine on my computer, chrome and safari.

Comment by Ronnie Sun [ 17/May/13 ]
need more info
Comment by Pavel Paulau [ 17/May/13 ]
Sorry, not every:
-- Initial load
-- Clicking filter buttons, especially if there are many results (like ALL or KV)

Latency is about 200-300ms.




[MB-7720] Impossible to remove installed samples from initialization dialogs when they were selected before, better to start samples loading on the last step of initialization Created: 12/Feb/13  Updated: 27/Feb/13

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Andrei Baranouski Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: 2.1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: cluster run

Attachments: PNG File samples_init.png    

 Description   
steps:
1. in UI console Step 2 of 5 check any sample for installation
2. click Next
3. click Back

result: sample bucket is marked as installed and the user can't remove it during initialization (samples are loaded after the second step)

I suppose that all installation/configuration dialogs must allow changing the settings made in previous steps

can we start loading samples on the last step of initialization (when clicking Next on Step 5 of 5)?








[MB-7665] Incorrect (negative value) displayed for "other buckets" in Admin, "Data Buckets" Tab Created: 01/Feb/13  Updated: 01/Feb/13

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Michael Leib Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 6.1

Attachments: JPEG File negfilesize.jpg    

 Description   
Please see the attached image. Pretty self-explanatory - see bottom center "other buckets" value...

I don't know if this only happens during compaction, but I have seen this quite often.




[MB-7553] no warning in UI when memcached can't set maxconns (ulimit -n not in effect) Created: 17/Jan/13  Updated: 17/Jan/13

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Tim Smith Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.0.0 release, Centos 5.8 x86_64


 Description   
There's no warning in the UI when memcached can't set maxconns because of system resource limits. There is a warning in the memcached.log.2.txt file, but no one is going to look there probably.

Reproducing is easy. Edit /etc/init.d/couchbase-server and comment out "ulimit -n 10240". Ensure hard nofiles limit is not set in /etc/security/limits.conf. Run "/etc/init.d/couchbase-server restart".

Look in most recent memcached*log*txt file in /opt/couchbase/var/lib/couchbase/logs, and notice this warning:

Thu Jan 17 14:08:51.673016 PST 3: WARNING: maxconns cannot be set to (10000) connections due to system
resouce restrictions. Increase the number of file descriptors allowed
to the memcached user process or start memcached as root (remember
to use the -u parameter).
The maximum number of connections is set to 1006.

Look in UI, at logs tab, etc., and notice no warnings anywhere.

There may be other conditions that should bubble up to the UI as well, but this is one I have noticed. This doesn't happen often, but some settings (maybe SELinux or something?) may prevent the ulimit from being run or taking effect, and it is important for the admin to know this.
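
As an illustration of the kind of check that could feed such a warning, here is a minimal Python sketch that reads the file descriptor limit for the current process. It is Unix-only and deliberately simplified: the can_support() helper is hypothetical, and it ignores the handful of descriptors memcached keeps for itself (the log above shows 1006 usable connections, presumably from a limit of 1024).

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def can_support(maxconns, soft_limit):
    """True if the soft limit is unlimited or at least maxconns.

    Simplification: descriptors reserved for listening sockets, logs,
    and so on are not subtracted.
    """
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= maxconns


soft, hard = nofile_limits()
if not can_support(10000, soft):
    print("WARNING: open-file soft limit %d is below the requested maxconns" % soft)
```

Note that the limit is per process, so a real check would have to run in (or inspect) the memcached process itself rather than a separate shell.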




[MB-7308] Temporarily incorrect data bucket item count after failed 'add server node' Created: 01/Dec/12  Updated: 01/Dec/12

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: None
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Quinn Slack Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
I attempted to add a new server node to an existing 1-node Couchbase cluster, but it failed because the other node didn't have enough RAM. After the error message was displayed, the item counts on all of the buckets were 0, even though the status was green. This made it appear (to me, a fairly new Couchbase user) as though all of the data had been lost. Within about 30 seconds, the correct item counts were being displayed again in the UI, and no data was lost.




[MB-7222] Unable to vacuum other than default bucket using "cbdbmaint" tool. Created: 20/Nov/12  Updated: 11/Apr/13

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 1.8.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Vijayaraghavan Mohanasundaram Assignee: Steve Yen
Resolution: Unresolved Votes: 0
Labels: usability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux OS, Centos, Couchbase version : 1.8.1


 Description   
Hi team,
     I am using the cbdbmaint script to vacuum buckets, but I found no options or documentation for vacuuming buckets other than the default bucket. I need help with this.


 Comments   
Comment by Tim Smith [ 20/Nov/12 ]
We no longer recommend using separate ports for different buckets, so the --port option for cbdbmaint is insufficient. The tool should take a -bBUCKET option, and a -pPASSWORD, like other tools.




[MB-7013] Changes that are for the GeoCouch version for Apache CouchDB Created: 25/Oct/12  Updated: 13/Mar/13

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: .next
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Volker Mische Assignee: Volker Mische
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This bug number tracks changes that are only meant for the GeoCouch version for Apache CouchDB. Like updating the README.




[MB-7442] Information about SDK error-handling on Warmup Created: 18/Dec/12  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Karen Zeller Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Hi Matt, can you get information from the team about SDK handling of this server error:

"To Couchbase Server clients, ENGINE_TMPFAIL (0x0d) gets generated during warmup."

Is that converted into respective language error objects, or what happens?




[MB-7361] Add Documentation for http://support.couchbase.com/tickets/2144 Created: 05/Dec/12  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 1.8.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Muthu Kumar Assignee: Gokul Krishnan
Resolution: Unresolved Votes: 0
Labels: customer, info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
As a part of this ticket, http://support.couchbase.com/tickets/2144, we need to add the allowable length for the login credentials to the documentation.

 Comments   
Comment by MC Brown [ 05/Dec/12 ]
I don't have access to that ticket.

Could you please give a description of what you want added?
Comment by Karen Zeller [ 29/Apr/13 ]
Hi,

Can you please give me the information from that support ticket and add it to this ticket to document?


Thanks,

Karen





[MB-7433] More detailed information about retrieving binary in views Created: 17/Dec/12  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
I think in the case of a document that is binary, the "doc" argument to the map function will be a base64 encoded string. So you can call

 var arrayOfBytes = decodeBase64(doc);

you can get an approximation of the size of the doc from doc.length, or an exact answer from arrayOfBytes.length
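
To show why the string length is only an approximation, here is a small Python sketch of the arithmetic. It assumes, as conjectured above, that the binary document arrives base64 encoded; base64 emits 4 characters for every 3 bytes, so the encoded length overstates the real size by about a third.

```python
import base64

# Stand-in for a binary document body (1024 bytes).
raw = bytes(range(256)) * 4

# What the map function would see under the assumption above.
encoded = base64.b64encode(raw).decode("ascii")

# Estimate from the string length alone: 3 bytes per 4 characters.
# Padding can make this overshoot by up to 2 bytes.
approx_size = len(encoded) * 3 // 4

# Exact size after decoding, the analogue of arrayOfBytes.length.
exact_size = len(base64.b64decode(encoded))
```

So a size taken straight from doc.length would need the 3/4 scaling to be usable as an estimate; decoding is the only way to get the exact figure.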

I think you can do:

if (meta.deleted) emit("deleted")

But all of this is closer to conjecture about how I think it *should* work, mixed with memories of what the code looks like. I'm not sure if we have advanced docs on this stuff. CC'ing Karen, who can hopefully point to the place in the docs.

Chris



On Thu, Dec 13, 2012 at 5:41 AM, Perry Krug <perry@couchbase.com> wrote:
Hey man,

-We discussed a while ago having access within the view code to the binary doc object even if it wasn't json. Do we have any write up on how to get access to it and what's possible? Let's imagine it's just a string or csv...how would you go about attacking it?

-Is the meta.* that you see in the view screen all that's available? Customer was asking for size of the document...maybe you can just calculate that within the javascript on the doc object?

-Any way to get a view on deleted items (as we're keeping them around for the time being with xdcr)?

Thanks

 Comments   
Comment by Karen Zeller [ 25/Apr/13 ]
From Dipti: temporary workaround for now, lower priority (priority 3)




[MB-7955] Docs: cbstats documentation needs cleaning up Created: 21/Mar/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This link: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-cmdline-cbstats.html

The text: "Where BUCKET_HOST is the hostname and port (HOSTNAME[:PORT]) combination for a Couchbase bucket, and username and password are the authentication for the named bucket. COMMAND(and[options]) are one of the follow options:"

Is quite confusing and doesn't match the examples.

Additionally, this link: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-monitoring-nodestats.html, doesn't include all of the command descriptions and duplicates some of the information in the previous page. All of this seems to need a bit of reworking




[MB-8078] Compaction process: Long List requiring grouping Created: 11/Apr/13  Updated: 02/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-tasks-compaction-process.html




[MB-8197] The REST API /settings/viewUpdateDaemon does not raise any exception when the user enters an invalid login Created: 03/May/13  Updated: 03/May/13

Status: Open
Project: Couchbase Server
Component/s: RESTful-APIs
Affects Version/s: 2.0, 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Tug Grall Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Tested on my 2.0.1 cluster on Linux and Mac

Operating System: Ubuntu 64-bit

 Description   

The commands documented here:
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-operation-autoupdate.html

are correct, but the following command does not return anything (error message, 401, ...) when a user calls it with an invalid username or password:

curl http://Administrator:passwords@192.168.0.34:8091/settings/viewUpdateDaemon

We should return a 401 error




[MB-7450] Make it easier to identify multiple clusters Created: 19/Dec/12  Updated: 19/Dec/12

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
When managing multiple clusters (especially now with XDCR) they all look the same.

It would be great if we could add a very simple "cluster name" that was displayed across the top of the UI. It could be added to the setup wizard when the first node is created and then changeable later on.

Would also help identify logs if we wrap it into that...




[MB-7395] Need way to document how to stop currently running compaction process per-bucket Created: 12/Dec/12  Updated: 08/May/13

Status: Reopened
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
When compaction is going to be running for a long time, we need a way to stop it if it is causing problems.

 Comments   
Comment by Dipti Borkar [ 17/Dec/12 ]
Can you add more details of what you mean by causing problems?

Compaction can be stopped if manually started (see image https://www.evernote.com/shard/s161/sh/e16f76d0-91b7-409b-9a40-f205e3dcd6f0/8cf0fd73a7a4c5de79c13f870fdc88fa ). I believe there is a REST API as well. I thought it was documented but may not be.

Assigning to Aliaksey to check on the REST API to stop compacting.
Comment by Aleksey Kondratenko [ 17/Dec/12 ]
Yes, manual compaction can be cancelled.

If it's automatic compaction and you want to stop, you can do it by disabling autocompaction either globally or in bucket details.
Comment by Dipti Borkar [ 17/Dec/12 ]
Aliaksey,

if there is a REST API to cancel, can you please add the details here and assign to Karen for documentation? Thanks

Comment by Dipti Borkar [ 17/Dec/12 ]
I think I found them.

/pools/PoolId/buckets/Id/controller/compactBucket/
/pools/PoolId/buckets/Id/controller/cancelBucketCompaction/
/pools/PoolId/buckets/Id/controller/compactDatabases/
/pools/PoolId/buckets/Id/controller/cancelDatabasesCompaction/
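
For anyone scripting against these, a minimal Python sketch that builds the request without sending it. Several things here are assumptions rather than confirmed behavior: that the endpoints expect POST, that the pool id is "default", that the admin port is 8091, and that the trailing slash can be dropped. Authentication is left out entirely.

```python
from urllib.parse import quote
from urllib.request import Request

# Controller names copied from the list above.
ACTIONS = {
    "compactBucket",
    "cancelBucketCompaction",
    "compactDatabases",
    "cancelDatabasesCompaction",
}


def compaction_request(host, bucket, action, pool="default", port=8091):
    """Build (but do not send) a request for one of the controller paths.

    POST, the "default" pool, and port 8091 are assumptions for this sketch.
    """
    if action not in ACTIONS:
        raise ValueError("unknown compaction action: %s" % action)
    url = "http://%s:%d/pools/%s/buckets/%s/controller/%s" % (
        host, port, quote(pool, safe=""), quote(bucket, safe=""), action)
    return Request(url, data=b"", method="POST")


req = compaction_request("127.0.0.1", "my bucket", "cancelBucketCompaction")
```

Passing req to urllib.request.urlopen (with credentials added) would actually issue the call; that step is omitted here on purpose.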
Comment by Perry Krug [ 18/Dec/12 ]
It would be great to have a button on the bucket to stop compaction regardless of whether it was automatically or manually started. The main problem I'm referring to is the fact that compaction adds quite a bit of disk IO and can impact the speed both of the disk writer and background fetches. if an application in production starts experiencing problems because of this we will want to stop compaction...and as part of usability/supportability, there should be an easy and obvious way to do that by the end user.
Comment by Perry Krug [ 18/Dec/12 ]
Adding documentation...

MC/Karen, could we get some documentation specifically on how to stop compaction until we have a button?
Comment by Perry Krug [ 18/Dec/12 ]
Sorry, just saw Dipti's screenshot. So can we just enable that button (and make it work obviously) when compaction is running automatically as well?
Comment by Aleksey Kondratenko [ 18/Dec/12 ]
There's not much point in manually stopping automatic compaction. It'll restart itself within 30 seconds. The _right_ way is to disable autocompaction, if that's what you want.

Comment by Perry Krug [ 18/Dec/12 ]
Yeah, that makes enough sense now.

Docs, can we have a writeup on how to stop compaction effectively both the automatic and manual kind?

Comment by Karen Zeller [ 29/Apr/13 ]
Stopping compaction per bucket available here:

http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-rest-compacting-bucket.html
Comment by Perry Krug [ 30/Apr/13 ]
Karen, I think we need a little more work on this one.

-Manual compaction can also be started and stopped through the UI
-How does one stop an automatic compaction?




[MB-8248] GetAndLock doesn't prevent a regular "get" from succeeding Created: 13/May/13  Updated: 13/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
In the dev guide (as well as maybe other places?): http://www.couchbase.com/docs/couchbase-devguide-2.0/get-and-lock.html

The text at the beginning says that getandlock will prevent the retrieval of an item...that will only be the case if getandlock is also used to retrieve it, but a normal get will still succeed.
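
To make the distinction concrete, here is a toy in-memory model in Python. It is not the server or any SDK: the class ToyStore, its method names, and the lock-expiry behaviour are hypothetical and exist only to illustrate the point above, that a lock blocks a second get-and-lock but not a plain get.

```python
import time


class LockedError(Exception):
    """Raised when get_and_lock is called on an item that is still locked."""


class ToyStore:
    """Hypothetical in-memory model of the get / get-and-lock behaviour."""

    def __init__(self):
        self._items = {}  # key -> value
        self._locks = {}  # key -> lock expiry time in seconds

    def put(self, key, value):
        # Seeds the store; write-versus-lock behaviour is not modelled here.
        self._items[key] = value

    def get(self, key):
        # A plain get does not consult the lock table, so it succeeds
        # even while the item is locked.
        return self._items[key]

    def get_and_lock(self, key, lock_seconds, now=None):
        # `now` can be injected for deterministic examples (assumption).
        now = time.monotonic() if now is None else now
        expiry = self._locks.get(key)
        if expiry is not None and now < expiry:
            raise LockedError(key)
        self._locks[key] = now + lock_seconds
        return self._items[key]


store = ToyStore()
store.put("k", "v")
first = store.get_and_lock("k", lock_seconds=15, now=0)
plain = store.get("k")  # succeeds although "k" is locked
```

In this model a second get_and_lock("k", ...) before the lock expires raises LockedError, which is the only kind of retrieval the lock actually prevents.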

 Comments   
Comment by Perry Krug [ 13/May/13 ]
Reading further, it seems that a lot of the text here needs to be reworked. Can we get someone from the SDK team engaged to review it?




[MB-8249] Docs: link in dev guide doesn't go anywhere Created: 13/May/13  Updated: 13/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
This page: http://www.couchbase.com/docs/couchbase-devguide-2.0/get-and-lock.html

Has a link to "memcached protocol" which doesn't go anywhere: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-architecture-apis-memcached-protocol.html




[MB-8271] Document usage examples of vbuckettool Created: 14/May/13  Updated: 14/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Minor
Reporter: Perry Krug Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-cmdline-vbuckettool.html

There are a few other command-line tools that need similar attention.




[MB-8276] Run Japanese Translations Created: 14/May/13  Updated: 14/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Task Priority: Minor
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Karen Zeller [ 14/May/13 ]
Ran and checked HTML output on server then rsync'd

Need info from Sharon/Atware on what URL this actually appears on after webserver sync.




[MB-8283] Typos from Japanese Translator Created: 14/May/13  Updated: 14/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Task Priority: Minor
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

I should also mention that the Japanese translation team found a number of
typos in the original English documentation.
It is tough to submit them as issues to JIRA one by one, so I've created a
sheet in a Google Spreadsheet to share with you.
Please find the "Typos" sheet in this document:
https://docs.google.com/a/atware.co.jp/spreadsheet/ccc?key=0Ak1vmI0MjYfZdDYzM2FqYXJWSU5ueTNvTlRHaVZ5aGc&usp=sharing

We listed the original documentation URL, the sentence, and a suggested correction.
Please check it and update its 'status' column or add your comments if needed.

Thank you!
Koji Kawamura




[MB-8277] Front Matter for Release Notes Created: 14/May/13  Updated: 14/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Task Priority: Minor
Reporter: Karen Zeller Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
As we did for 2.0.1, I need the intro/front-matter for release notes. Currently:

Couchbase Server 2.0.2 is the second maintenance release for
Couchbase Server 2.0. This release contains a number of enhancements
including:




[MB-8232] [Doc'd] Multi-reader/writer Created: 09/May/13  Updated: 16/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Trivial
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Karen Zeller [ 09/May/13 ]
Sent draft for review:

Hi,

Below is the draft content on multi-reader-writers for 2.0.2 for review/input:

Release Notes:
<rnentry type="feature">

<version ver="2.0.0m"/>

<class id="db"/>

<issue type="cb" ref="MB-7518"/>


<rntext>

<para>
We now provide multiple readers and writers per data bucket for disk persistence. In the past, Couchbase
Server had only one reader/writer per data bucket. This enhancement provides significant
performance improvements for restoring data, replicating via XDCR, as well as rebalance.
For more information, see <xref linkend="couchbase-introduction-architecture-diskstorage" />.
</para>


</rntext>

</rnentry>


2. Update to existing Disk Storage Section, page 8 of attached PDF.


If I could get your input by Wednesday of next week, that would be great. Please provide on the existing ticket or as an attachment to it: http://www.couchbase.com/issues/browse/MB-8232



Regards,

Karen
Comment by Perry Krug [ 13/May/13 ]
[FIXED] Karen, there's a small bit of discussion that says the mrw feature is designed to "improve cache miss ratios". In fact it doesn't...the ratio of cache misses will not change with mrw, but the misses will be served faster and more efficiently by having multiple threads that can service them in parallel.

NEW = In order to utilize increased disk speeds and improve the read rate from disk

[FIXED: assigns] Also, a typo on page 9: "and assign each thread".
Comment by Karen Zeller [ 16/May/13 ]
Sent for final review:

Mandatory: Jin, Abhinav




Generated at Sun May 19 11:04:21 CDT 2013 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.