[MB-8066] Observed rebalance regression from 2.0.1 to 2.0.2 Created: 10/Apr/13  Updated: 22/May/13  Resolved: 21/May/13

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket, performance
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Abhinav Dangeti Assignee: Chiyoung Seo
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.0.1-170, 2.0.2-755
Centos Physical servers

Attachments: Zip Archive 2.0.1_170.zip     Zip Archive 2.0.1-185-reb-in-diag.zip     Zip Archive 2.0.1-803-reb-in-diag.zip     Zip Archive 2.0.2_755.zip     Zip Archive 2.0.2-809-reb-in-diag.zip     PDF File reb-1.loop_2.0.1-185-rel-enterprise_2.0.2-803-rel-enterprise_thor_May-20-2013_10-35-12.pdf     PDF File reb-1.loop_2.0.1-185-rel-enterprise_2.0.2-809-rel-enterprise_thor_May-21-2013_16-30-38.pdf     File rebalance-2.0.1.svg     File rebalance-2.0.2.svg     PDF File reb-in-litmus.loop_2.0.1-170-rel-enterprise_2.0.2-772-rel-enterprise_litmuses_Apr-24-2013_00-13-16.pdf     PDF File reb-out-large-2.loop_2.0.1-170-rel-enterprise_2.0.2-755-rel-enterprise_thor_Apr-17-2013_11-33-09.pdf     PDF File reb-out-litmus.loop_2.0.1-170-rel-enterprise_2.0.2-772-rel-enterprise_litmuses_Apr-24-2013_00-21-10.pdf     PDF File reb-swap-litmus.loop_2.0.1-170-rel-enterprise_2.0.2-772-rel-enterprise_litmuses_Apr-24-2013_00-29-07.pdf    
Issue Links:
Relates to
relates to MB-8308 Rebalance time regression in large sc... Open

 Description   
Noticed that rebalance consistently takes longer in 2.0.2-755 than in 2.0.1-170.

Consider the last or last 2 results from each rebalance job on:
http://dashboard.hq.couchbase.com/litmus/dashboard/?env=thor

Also consider the graphs of rebalance jobs attached in http://www.couchbase.com/issues/browse/CBD-914, to note the increase in rebalance times.

Logs and graphs from reb-out-large-2 attached.

 Comments   
Comment by Aleksey Kondratenko [ 17/Apr/13 ]
attaching visualizations of last rebalances from logs above

and about 1.5x time difference is obvious there
Comment by Aleksey Kondratenko [ 17/Apr/13 ]
Looks like difference is mostly due to waiting for checkpoint persistence.

root@beta:~/src/altoros/moxi/ns_server# ./scripts/analyze-rebalance-waiting.rb rebalance-2.0.2-master-events.json
total waitings: 41420.484508275986
total index waitings: 0.4656705856323242
total checkpoint waitings: 41420.01883769035
total vbucket moves: 512
root@beta:~/src/altoros/moxi/ns_server# ./scripts/analyze-rebalance-waiting.rb rebalance-2.0.1-master-events.json
total waitings: 19539.872338056564
total index waitings: 0.6576874256134033
total checkpoint waitings: 19539.21465063095
total vbucket moves: 512
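
To put the two runs side by side, the totals above can be reduced to a ratio and a per-move average. This is a minimal sketch and not part of analyze-rebalance-waiting.rb: the dictionary layout and the summarize() helper are made up for illustration, and the values are assumed to be seconds summed over all vbucket moves.

```python
# Totals copied from the two analyze-rebalance-waiting.rb runs above.
# Assumption: values are seconds, summed over all vbucket moves.
WAITS = {
    "2.0.2": {"total": 41420.484508275986, "index": 0.4656705856323242,
              "checkpoint": 41420.01883769035, "moves": 512},
    "2.0.1": {"total": 19539.872338056564, "index": 0.6576874256134033,
              "checkpoint": 19539.21465063095, "moves": 512},
}


def summarize(run):
    """Return (checkpoint share of all waiting, checkpoint wait per vbucket move)."""
    share = run["checkpoint"] / run["total"]
    per_move = run["checkpoint"] / run["moves"]
    return share, per_move


# How much longer 2.0.2 waits on checkpoint persistence than 2.0.1.
checkpoint_ratio = WAITS["2.0.2"]["checkpoint"] / WAITS["2.0.1"]["checkpoint"]

if __name__ == "__main__":
    for version, run in WAITS.items():
        share, per_move = summarize(run)
        print(f"{version}: checkpoint share {share:.5f}, {per_move:.1f} s per move")
    print(f"2.0.2 / 2.0.1 checkpoint waiting: {checkpoint_ratio:.2f}x")
```

In both runs almost all of the waiting is checkpoint waiting, and the 2.0.2 total is a little over twice the 2.0.1 total.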


May I ask you guys to retry same litmus test on 2.0.2 but with reduced rebalanceMovesBeforeCompaction value ?
Comment by Aleksey Kondratenko [ 17/Apr/13 ]
Also you will simplify my life a bit by grabbing master events in addition to collectinfos
Comment by Aleksey Kondratenko [ 17/Apr/13 ]
Hm. this is unrelated but apparently this hardware has some sort of forcefully disabled numa.
Comment by Aleksey Kondratenko [ 17/Apr/13 ]
Here's another per-vbucket visualization: http://i.imgur.com/fI2EEeI.png

We can see quite dramatic difference in per-vbucket move times. My guess is that current 2.0.2 ends up waiting 64 vbuckets at a time when 2.0.1 did 16 and vbucket persistence prioritization code in ep-engine is not effective anymore.

That large gap between final takeover and last master event before that is likely waiting for second checkpoint persistence that I've found we are not sending to master events.

So my proposal is the following: wait next build with better diagnostics and re-run 2.0.2 with default rebalanceMovesBeforeCompaction and with old value of 16.
Comment by Aleksey Kondratenko [ 17/Apr/13 ]
See above
Comment by Ketaki Gangal [ 19/Apr/13 ]
Hi Aliaksey,

Are there new diagnostic info pushed into the newer builds? "wait next build with better diagnostics" or does this mean, we get cbcollect_info on the bug as additional diags.

We plan to re-run this w/ the latest 202 , 773.

-Ketaki
Comment by Aleksey Kondratenko [ 19/Apr/13 ]
I don't know what 773 means. I'm seeing 768 as latest in http://builds.hq.northscale.net/latestbuilds/

Anyway 2.0.2-768 has newer diagnostics I need.
Comment by Ketaki Gangal [ 19/Apr/13 ]
Yes, 768*.

Ok, will start up the runs w/ latest diags and changed default value.

Thanks!
Ketaki
Comment by Abhinav Dangeti [ 24/Apr/13 ]
The latest run against build 772 still suggests that there is some regression in rebalance times.
Attached reb-in, reb-out and reb-swap comparison charts between 170 and 772.

*Results from litmus jobs.
Comment by Ketaki Gangal [ 24/Apr/13 ]
Does this run also have the "rebalanceMovesBeforeCompaction " as 16?
Comment by Abhinav Dangeti [ 24/Apr/13 ]
Yes.
Comment by Aleksey Kondratenko [ 30/Apr/13 ]
I didn't explicitly mention it here, but for any perf runs that affect rebalance I also need so called master events gathered and attached. In general I can often extract them from collect infos but explicitly getting them is more convenient.
Comment by Abhinav Dangeti [ 30/Apr/13 ]
For the reb-in-litmus job on 2.0.2-772-rel:

All the logs that the parent collected from all the nodes:
- https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus_logs/logs.zip

From the client:
Phase1: Stats from the load phase
- https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus_logs/load_phase_client.zip

Phase2: Stats from the hot load phase
- https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus_logs/hot_load_phase_client.zip

Phase3: Stats from the access phase
- https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus_logs/access_phase_client.zip

Master_events are available under each of the client's stats (phase1, phase2, phase3)
Comment by Aleksey Kondratenko [ 30/Apr/13 ]
Don't have all data I need. Abhinav is aware and will reassign
Comment by Abhinav Dangeti [ 01/May/13 ]
reb-in-litmus job

2.0.2-772-rel:: rebalanceMovesBeforeCompaction=16
Logs: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/logs_772.zip
Load_phase: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/client_load_phase.zip
Hot_load_phase: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/client_hot_load_phase.zip
Access_phase: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/client_access_phase.zip

2.0.1-170-rel:
Logs: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/logs_170.zip
Access_phase: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/170_client_access_phase.zip

Master_events available in access_phase logs.

Comparison graphs: https://s3.amazonaws.com/bugdb/abhinav/772_reb_in_litmus/reb-in-litmus.loop_2.0.1-170-rel-enterprise_2.0.2-772-rel-enterprise_litmuses_May-01-2013_19-11-29.pdf
Comment by Aleksey Kondratenko [ 02/May/13 ]
Indeed still seeing about 2x difference in rebalance time: http://i.imgur.com/XBX3PyI.png

With 2.0.2 being clearly slower
Comment by Aleksey Kondratenko [ 02/May/13 ]
Traced this down to quite massive difference between 2.0.1 and 2.0.2 in time it takes for replica building tap stream to complete backfill phase.

See below:

10 matches for "0.8075.0" in buffer: ns_server.debug.log
  16317:[ns_server:info,2013-04-12T18:01:49.319,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:init:509]Setting {"10.6.2.38",11209} vbucket 1021 to state replica
  16318:[ns_server:debug,2013-04-12T18:01:49.424,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:kill_tapname:1005]killing tap named: replication_building_1021_'ns_1@10.6.2.38'
  16319:[rebalance:info,2013-04-12T18:01:49.428,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:init:568]Starting tap stream:
  16331:[rebalance:debug,2013-04-12T18:01:49.429,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:init:605]upstream_sender pid: <0.8076.0>
  16332:[rebalance:debug,2013-04-12T18:01:49.429,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:handle_call:336]Suspended had_backfill waiter
  16334:[rebalance:info,2013-04-12T18:01:49.429,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:process_upstream:932]Initial stream for vbucket 1021
  16335:[rebalance:debug,2013-04-12T18:01:49.430,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:process_downstream:896]Replied had_backfill: true to [{<17946.9219.1>,#Ref<17946.0.32.211193>}]
  16365:[ns_server:debug,2013-04-12T18:01:49.585,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:process_upstream:946]seen backfill-close message
  16424:[ns_server:debug,2013-04-12T18:01:50.011,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:confirm_sent_messages:727]Going to wait for reception of opaque message ack
  16427:[rebalance:info,2013-04-12T18:01:50.011,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:do_confirm_sent_messages:701]Got close ack!


10 matches for "0.8649.0" in buffer: ns_server.debug.log|cbcollect_info_ns_1@10.6.2.38_20130502-011157
  14362:[ns_server:info,2013-05-01T15:05:38.736,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:init:509]Setting {"10.6.2.38",11209} vbucket 1021 to state replica
  14371:[ns_server:debug,2013-05-01T15:05:38.860,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:kill_tapname:1005]killing tap named: replication_building_1021_'ns_1@10.6.2.38'
  14372:[rebalance:info,2013-05-01T15:05:38.865,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:init:568]Starting tap stream:
  14384:[rebalance:debug,2013-05-01T15:05:38.866,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:init:605]upstream_sender pid: <0.8653.0>
  14385:[rebalance:debug,2013-05-01T15:05:38.866,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:handle_call:336]Suspended had_backfill waiter
  14387:[rebalance:info,2013-05-01T15:05:38.869,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:process_upstream:932]Initial stream for vbucket 1021
  14388:[rebalance:debug,2013-05-01T15:05:38.870,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:process_downstream:896]Replied had_backfill: true to [{<15619.4214.1>,#Ref<15619.0.18.184492>}]
  14431:[ns_server:debug,2013-05-01T15:05:39.409,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:process_upstream:946]seen backfill-close message
  14464:[ns_server:debug,2013-05-01T15:05:39.714,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:confirm_sent_messages:727]Going to wait for reception of opaque message ack
  14467:[rebalance:info,2013-05-01T15:05:39.715,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:do_confirm_sent_messages:701]Got close ack!

Both sets of lines are replica building ebucketmigrator logs for the same vbucket into the same node (and hopefully the same number of items). The former is for 2.0.1 and the latter is for 2.0.2.

You can see a gap of about 500 milliseconds between tap stream initiation and the "seen backfill-close message" line in 2.0.2, versus a much shorter gap of roughly 150 milliseconds in 2.0.1. This is consistent with the numbers I'm seeing for other nodes and other vbuckets.

Given ns_server didn't have any changes in this area it must be something either in memcached or ep-engine.
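
The gap can be measured directly from the log timestamps. The following is a rough sketch, not an existing ns_server tool: the backfill_gap_ms() helper and the regular expression are assumptions based only on the line format quoted above, and the sample lines are the "Starting tap stream" and "seen backfill-close message" entries copied from the two excerpts.

```python
import re
from datetime import datetime

# Matches the timestamp in lines like "[rebalance:info,2013-04-12T18:01:49.428,ns_1@...".
TIMESTAMP = re.compile(r"\[\w+:\w+,(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}),")


def backfill_gap_ms(lines):
    """Milliseconds between 'Starting tap stream' and 'seen backfill-close message'."""
    start = end = None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if "Starting tap stream" in line and start is None:
            start = stamp
        elif "seen backfill-close message" in line:
            end = stamp
    if start is None or end is None:
        raise ValueError("both marker lines are required")
    return round((end - start).total_seconds() * 1000)


# Lines copied from the 2.0.1 excerpt (pid <0.8075.0>).
LOG_201 = [
    "16319:[rebalance:info,2013-04-12T18:01:49.428,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:init:568]Starting tap stream:",
    "16365:[ns_server:debug,2013-04-12T18:01:49.585,ns_1@10.6.2.38:<0.8075.0>:ebucketmigrator_srv:process_upstream:946]seen backfill-close message",
]
# Lines copied from the 2.0.2 excerpt (pid <0.8649.0>).
LOG_202 = [
    "14372:[rebalance:info,2013-05-01T15:05:38.865,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:init:568]Starting tap stream:",
    "14431:[ns_server:debug,2013-05-01T15:05:39.409,ns_1@10.6.2.38:<0.8649.0>:ebucketmigrator_srv:process_upstream:946]seen backfill-close message",
]

if __name__ == "__main__":
    print("2.0.1 backfill gap (ms):", backfill_gap_ms(LOG_201))
    print("2.0.2 backfill gap (ms):", backfill_gap_ms(LOG_202))
```

For the quoted lines this gives 157 ms for 2.0.1 and 544 ms for 2.0.2.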
Comment by Aleksey Kondratenko [ 02/May/13 ]
See above. Given I don't know how is best to own this on ep-engine side I'm just passing it to Ravi for further distribution.
Comment by Aleksey Kondratenko [ 02/May/13 ]
Sorry, accidentally assigned back to me
Comment by Ravi Mayuram [ 02/May/13 ]
Chiyoung, can you pls look into this. Thanks,ravi
Comment by Chiyoung Seo [ 06/May/13 ]
Abhinav,

Here are the steps that I took for debugging this issue:

1) Install 2.0.1-185 package (final build for 2.0.1 release) and set up one node cluster
2) Load 1M items
3) Access phase with 200K items and the mixed load (50% set, 50% get)
4) Add the second node and rebalance

Rebalance time ranged from 6 mins 20 secs to 6 mins 50 secs in multiple runs


Repeat the above steps with the 2.0.2-786 build:

Rebalance time ranged from 4 mins 40 secs to 5 mins 10 secs in multiple runs.


Please let me know if your scenarios are quite different from the above ones. Can you also please compare 2.0.1-185 with 2.0.2-786 or latest build?
Comment by Abhinav Dangeti [ 07/May/13 ]
Chiyoung, when I compared 2.0.1-185 against 2.0.2-786, I see a slight regression:
http://qa.hq.northscale.net/job/litmuses-graph-loop/142/
Rebalance in time went from 727s to 800s

However, the 2.0.1 GA was 2.0.1-170 (if I'm not wrong), so I've been comparing all 2.0.2 builds against this:
http://qa.hq.northscale.net/job/litmuses-graph-loop/140/
Rebalance in time went from 491s to 800s

So the bigger regression was from 2.0.1-170 to 2.0.1-185 then?
http://qa.hq.northscale.net/job/litmuses-graph-loop/144/
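
For reference, the three comparisons above can be expressed as percentage increases. This is only an illustrative sketch: percent_increase() is a made-up helper, and the times are the rebalance-in times quoted in this comment (491 s for 2.0.1-170, 727 s for 2.0.1-185, 800 s for 2.0.2-786).

```python
def percent_increase(before_s, after_s):
    """Percentage increase in rebalance time going from before_s to after_s."""
    return round(100.0 * (after_s - before_s) / before_s, 1)


# Rebalance-in times (seconds) quoted in the comment above.
T_170, T_185, T_786 = 491, 727, 800

COMPARISONS = {
    "2.0.1-185 -> 2.0.2-786": percent_increase(T_185, T_786),
    "2.0.1-170 -> 2.0.2-786": percent_increase(T_170, T_786),
    "2.0.1-170 -> 2.0.1-185": percent_increase(T_170, T_185),
}

if __name__ == "__main__":
    for label, pct in COMPARISONS.items():
        print(f"{label}: +{pct}%")
```

On these numbers most of the increase sits between 2.0.1-170 and 2.0.1-185, which is what the question above is getting at.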
Comment by Chiyoung Seo [ 08/May/13 ]
Abhinav,

From the graphs that you attached, it seems to me that there is definitely a regression between 2.0.1-170 and 2.0.1-185.

Let me take a look at the changes in ep_engine between these two builds.
Comment by Chiyoung Seo [ 08/May/13 ]
There is only one change in ep-engine between 2.0.1-170 and 2.0.1-185:

http://review.couchbase.org/#/c/25284/

The above commit simply changed a type off_t ==> cs_off_t in couchstore stats. It is NOT related to the rebalance regression.
Comment by Maria McDuff [ 08/May/13 ]
per chiyoung, abhinav running more tests.

abhinav -- pls update with your latest test result.
Comment by Abhinav Dangeti [ 14/May/13 ]
Agree with chiyoung, multiple tests' results show that this rebalance regression (between 170 and 185 at least) is not consistent.
For example, consider this larger scale run, showing no rebalance regression,
http://qa.hq.northscale.net/job/litmuses-graph-loop/171/
Comment by Chiyoung Seo [ 14/May/13 ]
Abhinav,

If you don't see any rebalance regression in a large-scale test, I don't think it's a regression.

I leave it up to you to make a decision regarding if we close this bug or not.
Comment by Maria McDuff [ 14/May/13 ]
per bug triage, assigning to mike to investigate if this regression issue (between 2.0.1 170 vs 185) belongs to ep-engine.
Comment by Chiyoung Seo [ 15/May/13 ]
I need to discuss this with Abhinav to see if we can close this bug or not.
Comment by Chiyoung Seo [ 15/May/13 ]
Discussed it further with Abhinav.

Here is the summary of the things that we observed so far:

1) Rebalance performance in 2.0.2 latest including MRW is comparable (or sometimes better) to 2.0.1 185 build.

2) There is a high variation in rebalance performance between 2.0.1 170 (Linux GA) and 2.0.1 185 build in a small scale test. However, the variation becomes very minor or almost no difference in a large-scale test.

There is only one minor warning fix in ep-engine between 2.0.1 170 and 2.0.1 185 build, which doesn't have any impact on the rebalance performance.

Abhinav will work with the perf team and do more small-scale tests between 2.0.1 170 and 2.0.1 185 in physical nodes cluster.

Abhinav and I agreed that this is not a blocker at this time.


 
Comment by Maria McDuff [ 15/May/13 ]
per bug mtg, wayne to coordinate with ronnie on large scale test between 2.0.1 185 vs 2.0.2 with MRW latest build -- not toybuild.
Comment by Wayne Siu [ 16/May/13 ]
Ronnie,
Can we run the perf tests on the same hardware set with 2.0.1-185 and the latest 2.0.2? Can you give us an estimate when we can expect the results? Thanks.
Comment by Ronnie Sun [ 16/May/13 ]
- There is no regression from 2.0.1-170 to 2.0.1-185 in the reb-in large scale test.

- For 2.0.2 large scale tests, noticed that there is some bizarre cache miss in the access phase for 2.0.2. Now falling back to medium scale test. ETA tmr morning.
Comment by Ronnie Sun [ 17/May/13 ]
The medium scale tests look ok so far; there are minor regressions that MIGHT be variance-related. Will continue the full sets (probably multiple runs) - then on to large scale reb out/in tests.
Comment by Ronnie Sun [ 20/May/13 ]
Medium scale rebalance in test (very small cache miss ratio)..

2.0.2 has regression. Plz refer to the graph.

I don't know whether ns_server or ep-engine should be diagnosing the problem, so assigning to wayne for distribution.
Comment by Wayne Siu [ 20/May/13 ]
Jin,
Can you please take a look at this issue?
Comment by Jin Lim [ 20/May/13 ]
Perf team, can you please specify which particular stats suggest a rebalance regression? disk write kb/s? Thanks.
Comment by Ronnie Sun [ 20/May/13 ]
Hi Jin, rebalance time bumped from 527.37 seconds to 7196.88 seconds.Thanks.
Comment by Jin Lim [ 20/May/13 ]
Thanks, I peeked at the logs and graphs and am a bit confused. All stats other than the total elapsed time look healthy (comparable). I am not sure if this is because of 1) an unknown delay in collecting final stats from ep engine or 2) some of the changes the ep engine team made to the flusher scheduling algorithm (i.e. high priority vbuckets that are rebalancing aren't flushing as fast as before?!?)

Chiyoung is going to take a first look at the issue. Thanks!
Comment by Chiyoung Seo [ 20/May/13 ]
The fix was just pushed into gerrit for review:

http://review.couchbase.org/#/c/26438/

Comment by Wayne Siu [ 21/May/13 ]
Ronnie,
The fix has been merged. Can you run the perf tests again, and update the tickets?
Comment by Ronnie Sun [ 21/May/13 ]
The new build-809 took 5298 seconds.

Graph and diag attached. Thanks.
Comment by Filipe Manana [ 22/May/13 ]
Just going through this, the last comment from Ronnie says:

"The new build-809 took 5298 seconds. "

And his previous one says:

"Hi Jin, rebalance time bumped from 527.37 seconds to 7196.88 seconds.Thanks."

Does it refer to the same test and environment? (527.37 vs 5298 seconds, so why is it considered fixed?)

thanks




[MB-8329] ui doesn't show document content Created: 21/May/13  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.0.2-807-rel
<manifest><remote name="couchbase" fetch="git://github.com/couchbase/"/><remote name="membase" fetch="git://github.com/membase/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="adb5d29b739fc98d721665978091807d9abe74f1"><copyfile src="Makefile.top" dest="Makefile"/></project><project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/><project name="ep-engine" path="ep-engine" revision="d03c48ac89e92ad0a674a6653377fe68e38a1aba"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/><project name="couchbase-cli" path="couchbase-cli" revision="ba4b6057ca214715e85ed11522601e2e13b88c1d" remote="couchbase"/><project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/><project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/><project name="ns_server" path="ns_server" revision="3bb4e9525bc6b9a600b2285384d2e20d0cae9723"/><project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/><project name="couchdb" path="couchdb" revision="3d9a23c4a79767aec43c05636e117eedf690a4f1"/><project name="couchdbx-app" path="couchdbx-app" revision="e7e3680d94ff592915bb369a74756e7e8ba64365"/><project 
name="couchstore" path="couchstore" revision="963fc26eafc67514eed5c9a3752d5d4cbdf5971d"/><project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/><project name="testrunner" path="testrunner" revision="26873dc894b3db0fa67424da7c41ad4d37e82650"/><project name="healthchecker" path="healthchecker" revision="32d33ebee48ddfcbfa0163946f71fb0d4381f6dc"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest>

Attachments: PNG File doc_old.png     PNG File rsz_doc_new.png    
Operating System: Centos 64-bit

 Description   
Earlier, the documents page showed id and content; now content is omitted.

 Comments   
Comment by Aleksey Kondratenko [ 21/May/13 ]
Sorry. Not sure I understand what's wrong here.

Can you elaborate please ?
Comment by Iryna Mironava [ 22/May/13 ]
earlier we had 2 columns on this page - ids and contents of documents; now the contents column has disappeared
Comment by Maria McDuff [ 22/May/13 ]
bumping to major. we need it fixed for 2.0.2 release.




[MB-8308] Rebalance time regression in large scale DGM vperf tests Created: 17/May/13  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server, performance, view-engine
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 4 physical boxes, 128GB ram, 24 cores, 2x SATA.
3 - > 4, 1 ddoc
build 2.0.2-740 http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-740-rel.rpm.manifest.xml

Issue Links:
Relates to
relates to MB-8066 Observed rebalance regression from 2.... Resolved
Operating System: Centos 64-bit

 Description   
Timings (builds vs. rebalance time in sec):
--------------------
| 170 | 19763 |
--------------------
| 740 | 28559 |
| 761 | 26748 |
| 799 | 37526 |
| 803 | 34903 |
| 807 | 26430 |
| 809 | 26430 |
--------------------

diags, master events, atop:
2.0.1-170 - http://172.23.96.10:8080/view/views/job/apollo-views/105/artifact/
2.0.2-740 - http://172.23.96.10:8080/view/views/job/apollo-views/118/artifact/
2.0.2-809 - http://172.23.96.10:8080/view/views/job/apollo-views/121/artifact/


 Comments   
Comment by Maria McDuff [ 20/May/13 ]
pls re-run in build 807
Comment by Pavel Paulau [ 21/May/13 ]
It did improve, but something has been wrong for a long time: it's been 1.3-1.4x slower at least since build 2.0.2-740.
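
The slowdown per build can be read off the timings table in the description. A small sketch, assuming build 170 as the baseline; the slowdown() helper is made up for illustration and the numbers are copied from that table.

```python
# Rebalance times in seconds per build, copied from the timings table in the description.
REBALANCE_SECONDS = {
    170: 19763,
    740: 28559,
    761: 26748,
    799: 37526,
    803: 34903,
    807: 26430,
    809: 26430,
}


def slowdown(timings, baseline=170):
    """Rebalance time of each build relative to the baseline build."""
    base = timings[baseline]
    return {build: round(secs / base, 2) for build, secs in timings.items()}


if __name__ == "__main__":
    for build, factor in slowdown(REBALANCE_SECONDS).items():
        print(f"build {build}: {factor}x")
```

Builds 799 and 803 stand out at roughly 1.8-1.9x; the others sit between about 1.3x and 1.45x.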




[MB-8306] Number of document mutations pending XDC replication is rapidly growing Created: 17/May/13  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Junyi Xie
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: VMs, 24GB RAM, 4 cores.

build 2.0.2-802

4 <-> 4 bidir replication, non-DGM.

Attachments: PDF File xperf-mixed-bi_2.0.1-170-rel-2.0.2-787-rel.pdf     PDF File xperf-mixed-bi_2.0.1-170-rel-2.0.2-809-rel.pdf     PDF File xperf-mixed-bi_2.0.2-787-rel-2.0.2-788-rel.pdf     PDF File xperf-mixed-bi_2.0.2-787-rel-2.0.2-809-rel.pdf     PDF File xperf-mixed-bi_2.0.2-788-rel-2.0.2-809-rel.pdf    
Operating System: Centos 64-bit

 Description   
Also time spent waiting for checkpointing is much higher.

Diags (internal IP address):
http://172.23.96.10:8080/view/xdcr/job/xperf-lnx-bi/25/artifact/


 Comments   
Comment by Junyi Xie [ 17/May/13 ]
In the log, the XDCR worked as expected. No error was recorded within XDCR during test except 1 or 2 of checkpoint timeout, which should not matter much in terms of performance.

The increasing number of docs pending XDC is most likely due to the increasing drain rate (hence increasing the inflow rate to XDCR) , not because XDCR replicated significantly slower. This can be verified by the drain rate stat. This is not regression.

The remaining question is, why checkpoint at destination becomes a bit longer. There are around 12 checkpoints per vb during the test, the average checkpoint time in 2.0.2 like 48K/(1024*12) = 4sec while in 2.0.1, it is around 1 sec. If i am correct, this should be a priority checkpoint in ep-engine, which also used in rebalance, I am not sure if it is a side-effects of MRW.
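
The estimate above is simple arithmetic and can be reproduced as follows. This sketch assumes that "48K" means about 48,000 seconds of total checkpoint waiting, that 1024 is the number of vbuckets, and that each vbucket saw 12 checkpoints; the function name is made up.

```python
def avg_checkpoint_seconds(total_wait_s, vbuckets, checkpoints_per_vb):
    """Average time per checkpoint, given the total time spent waiting for checkpoints."""
    return total_wait_s / (vbuckets * checkpoints_per_vb)


# Assumption: "48K" in the comment above is ~48,000 seconds of total checkpoint waiting.
AVG_202 = avg_checkpoint_seconds(48_000, vbuckets=1024, checkpoints_per_vb=12)

if __name__ == "__main__":
    print(f"average checkpoint time in 2.0.2: {AVG_202:.2f} s")
```

That works out to about 3.9 seconds per checkpoint, i.e. the "4sec" quoted above, against roughly 1 second in 2.0.1.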


Comment by Junyi Xie [ 17/May/13 ]
1. increasing pending docs
This is not an XDCR regression to me, but we can try increasing # of maxConcurrentReps from 32 to a higher number to see if it is better.


2 slower checkpoint
From discussion with Mike, recent ep_engine change to flusher may make checkpoint slower. XDCR will not complain much since we only issue checkpoint once every 30 min, but rebalance may possibly slow down due to slower checkpoint.


Comment by Pavel Paulau [ 20/May/13 ]
btw, it looks like regression happened in build 2.0.2-788:

http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-788-rel.rpm.manifest.xml

http://172.23.96.10:8080/view/graphs/job/graph-loop/184/artifact/xperf-mixed-bi.loop_2.0.2-787-rel-enterprise_2.0.2-788-rel-enterprise_DEST_May-20-2013_09%3A19%3A53.pdf
Comment by Ketaki Gangal [ 20/May/13 ]
The drain rate on 787 (12.5k) is faster than 788( 10.2k )
Comment by Junyi Xie [ 20/May/13 ]
Hi Pavel,

I found several stats are missing in your report.

1) In report 787vs 788, from page 52-54, there are no stats of persistent time and lag for build 788.

2) more importantly, in 2.0.2 we introduced several new XDCR stats, can you please add them in your report? Namely, there are
    - the four stats at the 4th row in Outbound XDCR stats section
    - the single stat the 5th row in Outbound XDCR stats section

See attached screenshot for the name. They are better than "xdc pending docs" to measure the performance of XDCR.

Comment by Pavel Paulau [ 20/May/13 ]
Ok, thanks for update. I will add these stats to the report.
Comment by Junyi Xie [ 20/May/13 ]
Thanks Pavel. Once you have these stats, please rerun the test (before and after MRW) so we can have deeper understanding what happened.
Comment by Junyi Xie [ 20/May/13 ]
Pavel,

I chatted with Jin that ep_engine merged global io manager in build 788. This may possibly explain the issues you saw in test 787 vs 788.

In the meantime, please finish all the new stats and fix the current stats in the test (persistent time and lag).

After that, It would be nice to run test for at least three builds

1) 2.0.1-170 as baseline

2) 2.0.2-787, this is the build before IO manager, MRW and all other recent stuff.

3) latest build with all recent changes, Jin is now working on some fixes and he should merge the fixes by end of today. Once the build is out, Jin will update this bug with build number you could try.

Thanks.




Comment by Pavel Paulau [ 21/May/13 ]
Attaching reports with additional/updated stats.

There is obvious improvement in 2.0.2-809 but regression is not fixed completely.
Comment by Junyi Xie [ 21/May/13 ]
Pavel,

First, it seems you still miss some important stats. There is no "percent of complete" in your report (it is the single stat in the 5th row in outbound xdcr stat section, please see my early comment), and the persistent latency and lags in your original report (xperf-mixed-bi.loop_2.0.1-170-rel-enterprise_2.0.2-802-rel-enterprise_DEST_May-16-2013_22%3A42%3A54.pdf) are gone in your latest report, is there any reason for that?

There is not much difference in avg XDC ops (4553.16 vs 4492.56) between 2.0.1 (build 170) and the latest build 809. As I said before, the accumulated pending docs are mostly due to the faster ep_engine drain rate, which IMHO is a good thing in general; you may want to increase the max parallel replicators to speed up the XDCR rate.


There is a big regression from 787 to 788, but the ep_engine team seems to have fixed the issue quite a bit in 809.



Comment by Pavel Paulau [ 21/May/13 ]
Unlike other metrics "percent_completeness" has no aggregated value, it's always associated with unique replication ID:
"replications/c9f298565529d88780b48545e78c11dc/default/default/percent_completeness"

Other metrics have common interface, e.g. "replication_active_vbreps" or "replication_bandwidth_usage". That's why it's missed.
Comment by Junyi Xie [ 21/May/13 ]
"percent_completeness" is similar to the other lazily computed stats "ms doc ops latency" and "ms meta ops latency"; these three stats are all per-replication stats now, which is why they come with a uuid prefix (see commit http://review.couchbase.org/#/c/26247 , I think Alk has backported it to 2.0.2)

Apparently you have successfully added "ms doc ops latency" and "ms meta ops latency" to your report, so I am not sure why "percent_completeness" cannot be added.

Also, I really hope you can add the "persistence latency" and "xdcr lag" stats back to your report., which will be very useful to measure the lag from memory to disk.
Comment by Pavel Paulau [ 21/May/13 ]
There are "replication_docs_latency_aggr", "replication_docs_latency_wt", "replication_meta_latency_aggr", "replication_meta_latency_wt". I used them in my report.
Comment by Pavel Paulau [ 21/May/13 ]
Please take a look at "xperf-mixed-bi_2.0.2-787-rel-2.0.2-809-rel.pdf". There is no difference in drain rate but "outbound XDCR mutations" is still higher (and its derivative).

+doc/meta latency.

I agree with ep_engine regression but I can't agree that it's resolved. Probably it's expected trade-off...
Comment by Junyi Xie [ 22/May/13 ]
In 2.0.2, it will be incorrect to use these per-replication stats without prefix "replication_docs_latency_aggr", "replication_docs_latency_wt", "replication_meta_latency_aggr", "replication_meta_latency_wt". They will give you wrong results if you have more than 1 replication from that bucket.
Today, in your report there is only one replication so you luckily see the correct results, but I highly suggest fix it fundamentally. To get the correct stats, you need get the prefix for each replication from ns_server (I am not sure if ns_server provides such API to return you the replication id, if not, they need to implement it).

In the meantime, you can implement "percent of complete" like the stats "ms doc ops latency" and "ms meta ops latency"; it can be computed as

100 * docs_checked / (docs_checked + changes_left)


Again, this only works for the case where a single outbound replication is created for a bucket; it is a temporary workaround that lets you include that stat in your report. In the near future, you may need to fix it.
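
A minimal sketch of the formula above, for the single-replication case only. The function name is made up, the two arguments are assumed to be the current values of the docs_checked and changes_left stats, and returning 100 when both are zero is an assumption rather than documented behaviour.

```python
def percent_complete(docs_checked, changes_left):
    """100 * docs_checked / (docs_checked + changes_left), as suggested above."""
    total = docs_checked + changes_left
    if total == 0:
        # Assumption: nothing checked and nothing left is reported as fully complete.
        return 100.0
    return 100.0 * docs_checked / total


if __name__ == "__main__":
    # Illustrative values only, not taken from any test run.
    print(percent_complete(750, 250))
```

With more than one outbound replication per bucket, the inputs would have to be read per replication ID instead of from the unprefixed stats.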





Couchbase logo needs to be updated on UI, desktop and program-settings icon (MB-7804)

[MB-8010] Couchbase Logo update on Windows Installer Created: 02/Apr/13  Updated: 22/May/13  Resolved: 16/May/13

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Technical task Priority: Blocker
Reporter: Anil Kumar Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File banner.bmp     File couchbase.ico     PNG File new.png     PNG File old.png     File sidebar.bmp    

 Comments   
Comment by Anil Kumar [ 10/Apr/13 ]
Can you take a look at the spec and logo assets and let me know if you have what you need to make the changes?
Comment by Steve Yen [ 16/Apr/13 ]
Anil,
Assigning to you to get the right assets from the visual design folk. Bin's sent email to you regarding what he needs.

Once you've got the new stuff, please attach them here and reassign this jira issue back to Bin.
Thanks,
Steve
Comment by Bin Cui [ 18/Apr/13 ]
http://review.membase.org/#/c/25770/
Comment by Maria McDuff [ 23/Apr/13 ]
pls verify / close.
Comment by Shashank Gupta [ 24/Apr/13 ]
Verified.
Build : 2.0.2-772-rel
Comment by Anil Kumar [ 02/May/13 ]
As discussed attached are new images please fix.
Comment by Bin Cui [ 02/May/13 ]
update revised bmp and icn files
Comment by Maria McDuff [ 06/May/13 ]
pls verify / close.
Comment by Shashank Gupta [ 07/May/13 ]
I couldn't find the banner image in the latest build during setup. Attaching screenshots of the old (with banner image) and new (no banner) setup processes.
Comment by Bin Cui [ 07/May/13 ]
I found the problem and fixed it at http://review.couchbase.org/#/c/26141/. It should be included in the next build.
Comment by Shashank Gupta [ 07/May/13 ]
Ok. Will verify then.
Comment by Shashank Gupta [ 09/May/13 ]
Verified with build 2.0.2-787
Comment by Thuan Nguyen [ 16/May/13 ]
The icon in
/cygdrive/c/Program Files/Couchbase/Server/share/couchdb/www/favicon.ico needs to be updated to the new logo
Comment by Bin Cui [ 16/May/13 ]
http://review.couchbase.org/#/c/26369/
Comment by Thuan Nguyen [ 22/May/13 ]
Integrated in github-couchdb-preview #584 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/584/])
    MB-8010: update couchbase logo (Revision 3d9a23c4a79767aec43c05636e117eedf690a4f1)

     Result = SUCCESS
Bin Cui :
Files :
* share/www/favicon.ico




[MB-8328] High latency variation and regression in query throughput Created: 21/May/13  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server, performance, view-engine
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Pavel Paulau Assignee: Aliaksey Artamonau
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PDF File vperf-lnx_2.0.1-170-rel_2.0.2-755-rel.pdf     PDF File vperf-lnx_2.0.1-170-rel_2.0.2-809-rel.pdf     PDF File vperf-lnx_2.0.2-755-rel_2.0.2-758-rel.pdf     File vperf-lnx.conf    

 Description   
Build vs. query throughput
 --------------------------
| Build | Query throughput |
 --------------------------
| 170   | 1357             |
 --------------------------
| 740   | 1210             |
| 745   | 1338             |
| 755   | 1363             |
 --------------------------
| 758   | 579              |
| 759   | 571              |
| 760   | 583              |
| 765   | 531              |
| 770   | 597              |
| 790   | 530              |
| 799   | 561              |
| 803   | 587              |
| 807   | 573              |
| 809   | 576              |
 --------------------------

80th and 90th percentiles (that we usually track) are not affected but 99th and 99.9th percentiles are much worse.
Obviously average throughput regressed (>2x).
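
To put a number on the regression, here is a minimal Python sketch that averages the throughput values from the table before and after build 758. The variable names and the choice of 758 as the split point are assumptions made for illustration; the figures themselves are copied from the table.

```python
from statistics import mean

# Build number -> query throughput, copied from the table in the description.
throughput = {
    170: 1357, 740: 1210, 745: 1338, 755: 1363,
    758: 579, 759: 571, 760: 583, 765: 531, 770: 597,
    790: 530, 799: 561, 803: 587, 807: 573, 809: 576,
}

FIRST_SLOW_BUILD = 758  # assumption: first build in the table showing the drop

before = [v for build, v in throughput.items() if build < FIRST_SLOW_BUILD]
after = [v for build, v in throughput.items() if build >= FIRST_SLOW_BUILD]

ratio = mean(before) / mean(after)
print(f"mean before: {mean(before):.1f}, mean after: {mean(after):.1f}, "
      f"ratio: {ratio:.2f}x")
```

The resulting ratio falls between 2x and 2.5x, which matches the ">2x" figure above.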

Diags, stats, atop:
2.0.1-170 - http://172.23.96.10:8080/view/views/job/vesta-views/45/artifact/
2.0.2-755 - http://172.23.96.10:8080/view/views/job/vesta-views/62/artifact/
2.0.2-758 - http://172.23.96.10:8080/view/views/job/vesta-views/60/artifact/
2.0.2-809 - http://172.23.96.10:8080/view/views/job/vesta-views/63/artifact/

Manifests:
http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_2.0.2-755-rel.rpm.manifest.xml
http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_2.0.2-758-rel.rpm.manifest.xml

Changes in bucket_engine:
2a797a8 MB-6144: fixed typo
0992a55 MB-6144: Some packages used incorrect offsets

Changes in ns_server:
301435b MB-100: fixed build error due to bad .PHONY and rule interaction
414e304 MB-100: provide api for testrunner for messing with memcached
759135b MB-7398: added CHANGES entry for hostname management feature
4ac8e1b CBD-220: run memcached and ns_server from babysitting VM
ec84aad CBD-220: [ale] allow syncing all log messages to all sinks
717902e CBD-220: switched ns_server_sup to supervisor2
68c1e9f MB-100: added typespec to ns_pubsub:subscribe_link
f1032ac MB-7398: don't allow hostnames without dot
337c98e MB-100: moved samples_loader_tasks earlier in ns_server_sup list
89661aa MB-5307 Handle failure to save address after rename.
1dd0e9e MB-7398 Rename added node to what was specified as its address.
c2b0108 MB-7398 Ability to specify server hostname in wizard.
3fc9d9d MB-7398 REST call to rename a node.
e6aa76a MB-7398 Handle ip_start file in dist_manager.
e3b488a MB-7398 Move is_good_address to misc.
531d168 MB-5307 Try to stop net_kernel on dist_manager start.
400c20a MB-5307 Remove references to non-existent path-box class.


 Comments   
Comment by Maria McDuff [ 21/May/13 ]
per bug triage, what build number did you run against?
What is the actual issue?
Comment by Pavel Paulau [ 22/May/13 ]
There is a table in description with builds and query throughput which is 2-2.5x lower since build 2.0.2-758.
Comment by Pavel Paulau [ 22/May/13 ]
Filipe,
May I ask you to take a look? I know this is not your issue but your input could help.
Comment by Filipe Manana [ 22/May/13 ]
I don't know what might be the problem here.

Nothing has changed in the view engine for 2.0.2 for a long time:

commit 3d9a23c4a79767aec43c05636e117eedf690a4f1
Author: Bin Cui <bin.cui@gmail.com>
Date: Thu May 16 19:10:23 2013 -0700

    MB-8010: update couchbase logo

commit 586e4bb73b92db4362192616370c4e3edb8c34a0
Author: Filipe David Borba Manana <fdmanana@apache.org>
Date: Wed Apr 17 12:50:51 2013 +0100

    MB-8109 Set pending transition for outdated updater groups
    
    This was missing in MB-7522, commit ac82c60302422747e8804d566211a20684ec78fb.



The last change is not on the query path and can't have any influence on it.

Is it possible to know if there's a network issue? For example, high network latency between some pair of nodes.
Ep-engine has been having lots of changes recently, perhaps it has some influence - I remember seeing MB-8273 where the memcached process was using huge amount of swap.

Unfortunately at the time I'm unable to read the atop files:

fdmanana 10:41:53 ~/jira/MB-8328 > atop -r 10.2.1.65.atop
raw file 10.2.1.65.atop has incompatible format
(created by version 1.25 - current version 1.26)
trying to activate atop-1.25....
activation of atop-1.25 failed!
Comment by Filipe Manana [ 22/May/13 ]
Even one of the CentOS VMs in the office has too recent a version of atop:

[root@cen-0408 ~]# wget http://172.23.96.10:8080/view/views/job/vesta-views/63/artifact/10.2.1.65.atop
--2013-05-22 02:49:23-- http://172.23.96.10:8080/view/views/job/vesta-views/63/artifact/10.2.1.65.atop
Connecting to 172.23.96.10:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4880872 (4.7M) [application/octet-stream]
Saving to: `10.2.1.65.atop'

100%[========================================================================================================================================>] 4,880,872 2.59M/s in 1.8s

2013-05-22 02:49:24 (2.59 MB/s) - `10.2.1.65.atop' saved [4880872/4880872]

[root@cen-0408 ~]# atop -r 10.2.1.65.atop
raw file 10.2.1.65.atop has incompatible format
(created by version 1.25 - current version 1.26)
trying to activate atop-1.25....
activation of atop-1.25 failed!
Comment by Filipe Manana [ 22/May/13 ]
Checked the atop files after login to 10.2.1.65. Saw nothing particularly odd, and there was plenty of free memory per node (10Gb to 15Gb) all the time, and no swap used.

Double checking the manifests of 755 and 758, the couchdb revision is the same for both - so it's clearly not a regression introduced in the view engine.
Comment by Pavel Paulau [ 22/May/13 ]
Aliaksey,

Basically there were only ns_server changes. Could you share your feedback?
Comment by Filipe Manana [ 22/May/13 ]
Just ran a single-node test. Used the attached vperf-lnx.conf to create a bucket - it's the same as in testrunner/conf/perf except it creates 1 ddoc (with 3 views) instead of 3 ddocs (each with 3 views).

Then built 755 and 758 and used the same database files.

And then used the view_query_perf script (couchbaselabs) to spawn 10 clients, each doing 10 000 identical queries (all hitting the page cache). I've confirmed that 758 has higher latency (and repeated the test twice); however, my results were not as bad as the ones Pavel got in a cluster.

Build 755:

$ ./run --workers 10 --queries 10000 --query-url 'http://localhost:9500/default/_design/A/_view/name_and_email_by_category_and_and_coins?startkey=[0,0]&endkey=[0,201.48]&limit=30' --output-times 755_10.txt
Spawning 10 workers, each will perform 10000 view queries
View query URL is: http://localhost:9500/default/_design/A/_view/name_and_email_by_category_and_and_coins?startkey=[0,0]&endkey=[0,201.48]&limit=30

Waiting for workers to finish...

All workers finished. Final statistics are:

    Average response time: 5.507140460000002ms
    Highest response time: 49.601ms
    Lowest response time: 1.582ms
    Response time std dev: 2.0056106158202724ms
    # of errors: 0

Saving query response times to file 755_10.txt


Build 758:

$ ./run --workers 10 --queries 10000 --query-url 'http://localhost:9500/default/_design/A/_view/name_and_email_by_category_and_and_coins?startkey=[0,0]&endkey=[0,201.48]&limit=30' --output-times 758_10.txt
Spawning 10 workers, each will perform 10000 view queries
View query URL is: http://localhost:9500/default/_design/A/_view/name_and_email_by_category_and_and_coins?startkey=[0,0]&endkey=[0,201.48]&limit=30

Waiting for workers to finish...

All workers finished. Final statistics are:

    Average response time: 6.2769707200000004ms
    Highest response time: 60.328ms
    Lowest response time: 1.356ms
    Response time std dev: 2.5133877564678584ms
    # of errors: 0

Saving query response times to file 758_10.txt



Histograms:

755:

$ cat 755_10.txt | perl -MStatistics::Histogram -e '@data = <>; chomp @data; print get_histogram(\@data);'
Count: 100000
Range: 1.582 - 49.601; Mean: 5.507; Median: 5.057; Stddev: 2.006
Percentiles: 90th: 7.653; 95th: 9.056; 99th: 12.781
   1.582 - 2.477: 184 |
   2.477 - 3.682: 7251 ########
   3.682 - 5.304: 50257 #####################################################
   5.304 - 7.489: 31428 #################################
   7.489 - 10.430: 8181 #########
  10.430 - 14.391: 2181 ##
  14.391 - 19.725: 423 |
  19.725 - 26.907: 61 |
  26.907 - 36.578: 10 |
  36.578 - 49.601: 24 |

758:

$ cat 758_10.txt | perl -MStatistics::Histogram -e '@data = <>; chomp @data; print get_histogram(\@data);'
Count: 100000
Range: 1.356 - 60.328; Mean: 6.277; Median: 5.637; Stddev: 2.513
Percentiles: 90th: 9.399; 95th: 11.049; 99th: 14.476
   1.356 - 2.264: 67 |
   2.264 - 3.521: 4192 ######
   3.521 - 5.264: 37798 #####################################################
   5.264 - 7.677: 36211 ###################################################
   7.677 - 11.020: 16684 #######################
  11.020 - 15.652: 4437 ######
  15.652 - 22.068: 547 #
  22.068 - 30.957: 36 |
  30.957 - 43.270: 8 |
  43.270 - 60.328: 20 |
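
For readers without the Perl Statistics::Histogram module, the same kind of summary can be sketched in Python. This assumes the output files (755_10.txt, 758_10.txt) contain one response time in milliseconds per line, which is what the Perl one-liner above implies. The function names are hypothetical, and the nearest-rank percentile used here may differ slightly from the method the Perl module uses.

```python
import math
from statistics import mean, median, pstdev


def load_times(path):
    """Read one response time (ms) per line, skipping blank lines."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(times_ms):
    """Return count, range, mean, median, stddev and tail percentiles."""
    values = sorted(times_ms)
    return {
        "count": len(values),
        "min": values[0],
        "max": values[-1],
        "mean": mean(values),
        "median": median(values),
        "stddev": pstdev(values),
        "p90": percentile(values, 90),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
    }


# Usage with synthetic data; replace with load_times("755_10.txt") etc.
demo = summarize([float(i) for i in range(1, 101)])
print(demo["p90"], demo["p95"], demo["p99"])  # 90.0 95.0 99.0
```

Running summarize on both files and comparing the p90/p95/p99 entries gives the same comparison as the two histograms.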




[MB-8215] [windows] firewalled node is seen as healthy Created: 08/May/13  Updated: 22/May/13  Resolved: 20/May/13

Status: Closed
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Deepkaran Salooja
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 2.0.2-786
<manifest><remote name="couchbase" fetch="git://10.1.1.210/"/><remote name="membase" fetch="git://10.1.1.210/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="8adbb64f2fd38c89cd8e2f21e49d593577ca548f"><copyfile dest="Makefile" src="Makefile.top"/></project><project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/><project name="ep-engine" path="ep-engine" revision="b27577b5e1f476a50432b5f57549821b2c189cc6"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="026c79ae424a6daed4bb9345e86cc8fc21759b28"/><project name="couchbase-cli" path="couchbase-cli" revision="f550cdac33c231a13b6025a281d6308a518bcab2" remote="couchbase"/><project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/><project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/><project name="ns_server" path="ns_server" revision="477f595ebbfc6ef1429a7ba7215830fd56269688"/><project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/><project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/><project name="couchdbx-app" path="couchdbx-app" revision="ce8722ea78596663a1932881e1ea51af0164a313"/><project name="couchstore" 
path="couchstore" revision="8de31a9e4232688de0b0fa70e218601881cdd0af"/><project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/><project name="testrunner" path="testrunner" revision="e98e0059f8da7927a9e5c7896d8a69e5656befbb"/><project name="healthchecker" path="healthchecker" revision="53b4ae787cb93f53c3dbaa90c266882a211ff4d8"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest>

Operating System: Windows 64-bit

 Description   
.29 node was firewalled
Domain Profile Settings:
----------------------------------------------------------------------
State ON
Firewall Policy BlockInbound,AllowOutbound
LocalFirewallRules N/A (GPO-store only)
LocalConSecRules N/A (GPO-store only)
InboundUserNotification Disable
RemoteManagement Disable
UnicastResponseToMulticast Enable

Logging:
LogAllowedConnections Disable
LogDroppedConnections Disable
FileName %systemroot%\system32\LogFiles\Firewall\pfirewall.log
MaxFileSize 4096


Private Profile Settings:
----------------------------------------------------------------------
State ON
Firewall Policy BlockInbound,AllowOutbound
LocalFirewallRules N/A (GPO-store only)
LocalConSecRules N/A (GPO-store only)
InboundUserNotification Disable
RemoteManagement Disable
UnicastResponseToMulticast Enable

Logging:
LogAllowedConnections Disable
LogDroppedConnections Disable
FileName %systemroot%\system32\LogFiles\Firewall\pfirewall.log
MaxFileSize 4096


Public Profile Settings:
----------------------------------------------------------------------
State ON
Firewall Policy BlockInbound,AllowOutbound
LocalFirewallRules N/A (GPO-store only)
LocalConSecRules N/A (GPO-store only)
InboundUserNotification Disable
RemoteManagement Disable
UnicastResponseToMulticast Enable

Logging:
LogAllowedConnections Disable
LogDroppedConnections Disable
FileName %systemroot%\system32\LogFiles\Firewall\pfirewall.log
MaxFileSize 4096

Ok.
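
The listing above appears to be the output of the Windows firewall profile report. As a minimal sketch, the Python below parses such text and lists the profiles whose policy still allows outbound traffic, which is the setting that later turned out to matter in this ticket. The function names and the exact line format are assumptions based only on the text shown here.

```python
import re


def parse_firewall_policies(report: str) -> dict:
    """Map profile name -> (inbound policy, outbound policy)."""
    policies = {}
    profile = None
    for raw in report.splitlines():
        line = raw.strip()
        header = re.match(r"^(\w+) Profile Settings:", line)
        if header:
            profile = header.group(1)
            continue
        policy = re.match(r"^Firewall Policy\s+(\w+),(\w+)", line)
        if policy and profile:
            policies[profile] = (policy.group(1), policy.group(2))
    return policies


def profiles_allowing_outbound(policies: dict) -> list:
    """Profiles where outbound connections are not blocked."""
    return [name for name, (_, outbound) in policies.items()
            if outbound == "AllowOutbound"]


# Abbreviated sample in the same shape as the description.
sample = """Domain Profile Settings:
State ON
Firewall Policy BlockInbound,AllowOutbound

Private Profile Settings:
State ON
Firewall Policy BlockInbound,AllowOutbound

Public Profile Settings:
State ON
Firewall Policy BlockInbound,AllowOutbound
"""

policies = parse_firewall_policies(sample)
print(profiles_allowing_outbound(policies))
```

With the settings from this ticket, all three profiles are reported, meaning the node could still send traffic out.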



 Comments   
Comment by Iryna Mironava [ 08/May/13 ]
https://s3.amazonaws.com/bugdb/jira/MB-8215/16fc64ab/172.27.33.26-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8215/16fc64ab/172.27.33.27-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8215/16fc64ab/172.27.33.29-diag.zip

Comment by Maria McDuff [ 08/May/13 ]
Iryna, can you check if the couchbase ports are enabled or disabled?
Did you also specifically disable ports for couchbase?


Deep -- since Iryna is on vacation, can you take a look? thanks.
Comment by Aliaksey Artamonau [ 08/May/13 ]
I see that .29 receives fresh heartbeats from other nodes when it is supposed to be firewalled. I know nothing about Windows firewall but it seems that you're doing something wrong.
Comment by Deepkaran Salooja [ 17/May/13 ]
Tested below on a 4 node windows cluster(2.0.2-803-rel)

1. Created a 4 node cluster and default bucket.

2. Enabled Windows firewall on node 10.3.2.25. Verified that the node is unreachable by the following means:

bash> rdesktop 10.3.2.25
Autoselected keyboard map en-us
ERROR: 10.3.2.25: unable to connect

bash> ssh Administrator@10.3.2.25
ssh: connect to host 10.3.2.25 port 22: Connection timed out

Http access returns - Error 118 (net::ERR_CONNECTION_TIMED_OUT): The operation timed out.
http://10.3.2.25:8091/

[root@caper-012 bin]# ./cbstats 10.3.2.25:11210 all
Stats '' are not available from the requested engine.


But still the node .25 is shown as healthy in the cluster.
Enabled auto-failover. But .25 didn't get failed over.

3. Loaded 10k items in the default bucket. 7.5k items get loaded. No error is returned to the client.

Cluster is available for debugging at http://10.3.2.23:8091

Attaching logs from 3 nodes as I can't login to the .25 node now.
Comment by Deepkaran Salooja [ 17/May/13 ]
https://s3.amazonaws.com/bugdb/jira/MB-8215/e9125b6b/10.3.2.23-5172013-457-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8215/e9125b6b/10.3.2.24-5172013-458-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8215/e9125b6b/10.3.2.73-5172013-52-diag.zip
Comment by Aliaksey Artamonau [ 17/May/13 ]
Our auto failover mechanism is based on the heartbeats that every node sends to every other node every 5 seconds. Of course we don't print out all of the heartbeats to the logs. But we do log all node statuses every 60 seconds. Every status has a timestamp attached to it. So here's the differences between pairs of consecutive timestamps for node .25 in microseconds starting from the time when auto failover was enabled:

[60000000,60000000,59999999,60000001,60000000,60000000,
 60000000,60000000,60000000,60000006,59999994,60000000,
 60000000,60000000,60000000,60000000,60000002,59999998,
 60000000,59999999,60000003,59999998,60000000,60000000,
 59999999,60000001,60000000,60000000,60000000,60000000,
 59999999,60000001,60000000,60000000,59999999,60000001,
 60000000,60000000,60000000,60000000,59999999,60000001,
 60000000,60000000,60000002,59999998,60000000,60000000,
 60000000,60000000,60000000,59999999,60000001,60000000,
 60000000,60000000,60000000,60000000,60000004,59999996,
 60000000,60000000,60000000,60000000,59999999,60000001,
 60000000,60000000,59999999,60000003,59999998,59999999,
 60000001,60000000,60000000,60000000,60000000,60000000,
 60000000,60000000,60000000,60000000,60000000,60000000,
 59999999,60000001,60000000,59999999,60000001,60000000,
 60000000,60000000,60000000,60000000,60000000,59999999,
 60000001,60000000,59999999,60000001,60000000,59999999,
 60000001,60000000,60000000,59999999,60000001,59999999,
 60000001,60000000,60000000,59999999,60000001,59999999,
 60000001,60000000,59999999,60000001,60000000,60000002,
 59999998,60000000,60000000,60000000,60000000,60000000,
 60000000,59999999,60000000,60000001,59999999,60000000,
 60000001,59999999,60000001,60000000,60000000,59999999,
 60000001,59999999,60000001,60000000,59999999,60000001,
 60000000,60000000,59999999,60000001,59999999,60000000,
 60000001,60000000,59999999,60000001,60000000,60000000,
 59999999,60000001,60000000,59999999,60000000,60000001,
 60000000,59999999,60000001,60000006,59999993,60000001,
 60000000,60000000,60000000,59999999,60000001,60000000,
 60000000,60000000,59983999,60016001,60000000,60000000,
 60000000,60000000,59999999,60000015,59999986,60000000,
 60000000,59999999,60000001,60000000,60000000,59999999,
 60000000,60000001,60000000,60000000,60000002,59999998,
 60000000,60000000,60000000,60000000,60000000,60000000,
 59999999,60000001,59999999,60000001,60000000,60000000,
 59999999,60000001,60000000,60000000,60000000,60000004]

You can see that all of them are pretty close to 60 seconds. So if node .25 ever went down, it could not have been down for more than 60 seconds. Here's another piece of evidence for this:

2013-05-17 02:03:22.347 ns_node_disco:5:warning:node down(ns_1@10.3.2.75) - Node 'ns_1@10.3.2.75' saw that node 'ns_1@10.3.2.25' went down. Details: [{nodedown_reason,
                                                                           connection_closed}]
2013-05-17 02:03:30.441 ns_node_disco:4:info:node up(ns_1@10.3.2.75) - Node 'ns_1@10.3.2.75' saw that node 'ns_1@10.3.2.25' came up. Tags: []

I guess you waited for more than 60 seconds. And there's absolutely no evidence that auto failover process has ever seen node .25 being down for at least one heartbeat period. Though it's briefly seen nodes .24 and .75 down.
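
The timestamp check described above can be sketched in Python as follows. The helper names are hypothetical and the sample uses only a handful of the deltas from the list (including the largest outliers); the real analysis was done on the logged node statuses, not with this code.

```python
from itertools import accumulate

EXPECTED_INTERVAL_US = 60_000_000  # statuses are logged every 60 seconds


def consecutive_deltas(timestamps_us):
    """Differences between each pair of consecutive timestamps."""
    return [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]


def max_deviation(deltas_us, expected_us=EXPECTED_INTERVAL_US):
    """Largest absolute distance of any delta from the expected interval."""
    return max(abs(d - expected_us) for d in deltas_us)


# A few deltas taken from the list above, turned back into timestamps
# (starting from an arbitrary zero) to demonstrate the round trip.
sample_deltas = [60000000, 59999999, 60000001, 59983999,
                 60016001, 60000015, 59999986]
timestamps = list(accumulate(sample_deltas, initial=0))

deltas = consecutive_deltas(timestamps)
print(max_deviation(deltas))  # 16001
```

A maximum deviation of about 16 ms from the 60-second interval means no status update for the node was ever missing or stale.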

So I am compelled to reiterate my previous conclusion that you guys are doing something wrong with the firewall configuration, though I have no idea what exactly. Maybe already established connections are not dropped. Or maybe the firewall is enabled only in one direction. Or maybe something else. I don't know. But something is definitely wrong.
Comment by Aliaksey Artamonau [ 17/May/13 ]
And btw, the node was auto failed over in the end.
Comment by Aleksey Kondratenko [ 17/May/13 ]
Resolution from ns_server team lead. Please show us evidence that indeed all packets on already established connections are rejected after firewall is enabled. We do have good reasons to suspect that indeed firewall is not quite rejecting everything.

And keep in mind that you're trying to use firewall to simulate network failure, so you have to make sure that indeed you're simulating what you think you're simulating.
Comment by Deepkaran Salooja [ 20/May/13 ]
This time, for a single node, I blocked both incoming and outgoing connections while enabling the firewall. Once this is done, the node is shown as "Down" immediately. Blocking incoming connections while allowing outgoing ones when the firewall is enabled lets the node still be seen as healthy.




[MB-8294] Destination node with XDCR (heavy DGM) goes into pending. Created: 16/May/13  Updated: 22/May/13  Resolved: 20/May/13

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Deepkaran Salooja Assignee: Deepkaran Salooja
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.0.2-800-rel

<manifest>
<remote name="couchbase" fetch="git://github.com/couchbase/"/>
<remote name="membase" fetch="git://github.com/membase/"/>
<remote name="apache" fetch="git://github.com/apache/"/>
<remote name="erlang" fetch="git://github.com/erlang/"/>
<default remote="couchbase" revision="master"/>
<project name="tlm" path="tlm" revision="f30cd57af02e51eafa0b6d5fb71176c2a46a2cf9">
<copyfile src="Makefile.top" dest="Makefile"/>
</project>
<project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/>
<project name="ep-engine" path="ep-engine" revision="02714d36d61509195fb1b18953445fbdd4240ed3"/>
<project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/>
<project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/>
<project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/>
<project name="couchbase-cli" path="couchbase-cli" revision="3c3aa79db86684ba2bb01a952c43995b28797cd9" remote="couchbase"/>
<project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/>
<project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/>
<project name="ns_server" path="ns_server" revision="2251cebb7efa5b0f77e73dc53435ce9348faae9e"/>
<project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/>
<project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/>
<project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/>
<project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/>
<project name="couchdb" path="couchdb" revision="586e4bb73b92db4362192616370c4e3edb8c34a0"/>
<project name="couchdbx-app" path="couchdbx-app" revision="e83b255bc7f7548e2bc36e709666e564c2a488dd"/>
<project name="couchstore" path="couchstore" revision="963fc26eafc67514eed5c9a3752d5d4cbdf5971d"/>
<project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/>
<project name="testrunner" path="testrunner" revision="c184e199382d5af23ac4a282deb12924f0635cd3"/>
<project name="healthchecker" path="healthchecker" revision="29d45e7776ecb20800f6ad97aec585a1e1636370"/>
<project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/>
<project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/>
<project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/>
<project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/>
<project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/>
<project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/>
</manifest>

Attachments: PNG File Buckets.png     PNG File Server.png    
Operating System: Centos 64-bit

 Description   

Destination node with XDCR (heavy DGM) goes into pending.

Setup Information:

- Cluster Config : 4 nodes, OS: Centos 6.3, CPU : 6 Core, RAM : 16G, Disk : 500G
- XDCR Topology : Unidirectional ( Master(3) -> Slave(1))
- 4 buckets AbRegNums(2GB), MsgsCalls(2GB), RevAB(3GB), UserInfo(2GB)
- Data Loaded using Viber Workload with 6.7M(RR ~45%), 0.1M(RR 100%), 11M(RR 100%), 12M(RR ~30%) data on the 4 buckets
- Unidirectional XDCR setup to 1 node(master -> slave) for 3 buckets AbRegNums, RevAB, UserInfo

Destination node goes into pending state(screenshot attached). Both beam.smp and memcached are running on the node.

There are a lot of crash reports like the ones below, but there are no views/design docs in the system.

[error_logger:error,2013-05-16T1:40:30.265,ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: set_view_update_daemon:init/1
    pid: <0.9511.707>
    registered_name: set_view_update_daemon
    exception exit: {noproc,
                        {gen_server,call,
                            ['capi_set_view_manager-UserInfo',
                             {foreach_doc,
                                 #Fun<capi_ddoc_replication_srv.2.102018441>},
                             infinity]}}
      in function gen_server:terminate/6
    ancestors: [ns_server_sup,ns_server_cluster_sup,<0.58.0>]
    messages: []
    links: [<0.298.0>,<0.9512.707>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 121393
    stack_size: 24
    reductions: 7537
  neighbours:


=========================CRASH REPORT=========================
  crasher:
    initial call: compaction_daemon:-spawn_bucket_compactor/3-fun-2-/0
    pid: <0.9508.707>
    registered_name: []
    exception exit: {noproc,
                        {gen_server,call,
                            ['capi_set_view_manager-RevAB',
                             {foreach_doc,
                                 #Fun<capi_ddoc_replication_srv.1.36030090>},
                             infinity]}}
      in function gen_server:call/3
      in call from capi_ddoc_replication_srv:foreach_live_ddoc_id/2
      in call from capi_ddoc_replication_srv:fetch_ddoc_ids/1
      in call from compaction_daemon:'-spawn_bucket_compactor/3-fun-2-'/4
    ancestors: [compaction_daemon,ns_server_sup,ns_server_cluster_sup,
                  <0.58.0>]
    messages: []
    links: [<0.412.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 28657
    stack_size: 24
    reductions: 30607
  neighbours:

The pending node is in the same state and can be used for investigation:
coconut-h20804.hq.couchbase.com


 Comments   
Comment by Deepkaran Salooja [ 16/May/13 ]
Collect_info destination node
https://s3.amazonaws.com/bugdb/jira/MB-8294/e9125b6b/coconut-h20804.hq.couchbase.com-5162013-244-diag.zip

Collect_info source cluster
https://s3.amazonaws.com/bugdb/jira/MB-8294/e9125b6b/coconut-h20801.hq.couchbase.com-5162013-235-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8294/e9125b6b/coconut-h20802.hq.couchbase.com-5162013-238-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8294/e9125b6b/coconut-h20803.hq.couchbase.com-5162013-241-diag.zip
Comment by Aliaksey Artamonau [ 16/May/13 ]
Because of a race that we found recently, bucket supervisors went down on the destination node. The fix for this is already merged. Unfortunately, I cannot figure out the reason for the system_limit errors that triggered the race without the fix. Could you please rerun the same test with the latest build? The fix in question is this: http://review.couchbase.org/26331.
Comment by Maria McDuff [ 18/May/13 ]
deep, pls update this bug with your re-run using the latest build.
Comment by Deepkaran Salooja [ 20/May/13 ]
Not able to reproduce with build 2.0.2-807-rel




[MB-8334] [windows] unable to get back to previous step in installer Created: 22/May/13  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows server r2 2008
upgrade from 1.8.1 to 2.0.2-807-rel
<manifest><remote name="couchbase" fetch="git://10.1.1.210/"/><remote name="membase" fetch="git://10.1.1.210/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="adb5d29b739fc98d721665978091807d9abe74f1"><copyfile dest="Makefile" src="Makefile.top"/></project><project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/><project name="ep-engine" path="ep-engine" revision="d03c48ac89e92ad0a674a6653377fe68e38a1aba"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/><project name="couchbase-cli" path="couchbase-cli" revision="ba4b6057ca214715e85ed11522601e2e13b88c1d" remote="couchbase"/><project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/><project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/><project name="ns_server" path="ns_server" revision="3bb4e9525bc6b9a600b2285384d2e20d0cae9723"/><project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/><project name="couchdb" path="couchdb" revision="3d9a23c4a79767aec43c05636e117eedf690a4f1"/><project name="couchdbx-app" path="couchdbx-app" revision="e7e3680d94ff592915bb369a74756e7e8ba64365"/><project name="couchstore" 
path="couchstore" revision="963fc26eafc67514eed5c9a3752d5d4cbdf5971d"/><project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/><project name="testrunner" path="testrunner" revision="26873dc894b3db0fa67424da7c41ad4d37e82650"/><project name="healthchecker" path="healthchecker" revision="32d33ebee48ddfcbfa0163946f71fb0d4381f6dc"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest>

Attachments: PNG File rsz_installer_back.png    
Operating System: Windows 64-bit

 Description   
On the final step of the installation wizard (see screenshot), try to click the Back button.
Nothing happens when it is clicked.




[MB-8159] Performance / efficiency improvements for view compaction retry phase Created: 26/Apr/13  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.0, 2.0.1, 2.0.2
Fix Version/s: 2.1
Security Level: Public

Type: Improvement Priority: Major
Reporter: Filipe Manana Assignee: Filipe Manana
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
In the same vein as MB-8029, move much of the I/O- and CPU-intensive work from Erlang to C for the view compaction retry phase.

 Comments   
Comment by Thuan Nguyen [ 22/May/13 ]
Integrated in github-couchdb-preview #583 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/583/])
    MB-8159 Use couch_view_file_merger for view compaction (Revision 203fbfac19c6c671568c6578e309e9a031c4e354)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couch_set_view/src/couch_set_view_updater_helper.erl
* src/couch_set_view/src/couch_set_view_compactor.erl
* src/couch_set_view/src/mapreduce_view.erl




[MB-8295] Dev views uses bsuperstar, but single vbucket Created: 11/Apr/12  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.0-beta, 2.0, 2.0.1
Fix Version/s: 2.1
Security Level: Public

Type: Task Priority: Major
Reporter: Steve Yen Assignee: Volker Mische
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
this is something aliaksey needs

 Comments   
Comment by Peter Wansch [ 05/Jul/12 ]
Workaround for 2.0 is to schedule compaction for dev indexes. We'll continue to use stock couch indexes.
Comment by Filipe Manana [ 12/Feb/13 ]
This can be accomplished as a substep of CBD-40 / CBD-41.

Please give this higher priority, so that it lands for 2.1 and saves a lot of work for many other upcoming 2.1 features.
Comment by Volker Mische [ 18/Feb/13 ]
I've removed the `.next` fix version. This should really go into 2.1.
Comment by Thuan Nguyen [ 22/May/13 ]
Integrated in github-couchdb-preview #583 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/583/])
    MB-8295: Parametrise Set View ETS tables (Revision c71fd2fb09dcd0bbe4ac4990fef749f7c075ae66)
MB-8295: Prepare storing development views (Revision 092b26f61ecd55563418b68221c321b7a8a83d19)

     Result = SUCCESS
vmx :
Files :
* src/couch_set_view/test/22-compactor-cleanup.t
* src/couch_set_view/include/couch_set_view.hrl
* src/couch_set_view/test/20-debug-params.t
* src/couch_set_view/test/07-replica-compaction.t
* src/couch_set_view/test/14-duplicated-keys-per-doc.t
* src/couch_set_view/test/19-compaction-retry.t
* src/couch_set_view/test/23-replica-group-missing.t
* src/couch_set_view/src/couch_set_view_group.erl
* etc/couchdb/default.ini.tpl.in
* src/couch_set_view/test/16-pending-transition.t
* src/couch_set_view/test/04-handle-db-deletes.t
* src/couch_set_view/src/couch_set_view_compactor.erl
* src/couch_set_view/src/couch_set_view.erl
* src/couch_set_view/test/03-db-compaction-file-leaks.t
* src/couch_set_view/test/13-progressive-cleanup.t
* src/couch_set_view/src/couch_set_view_util.erl
* src/couch_set_view/test/15-passive-partitions.t
* src/couch_set_view/test/12-errors.t
* src/couch_set_view/test/24-updater-add-more-passive-partitions.t
* src/couch_set_view/test/25-util-stats.t
* src/couch_set_view/test/06-main-compaction.t
* src/couch_set_view/test/17-unindexable-partitions.t
* src/couch_set_view/test/02-old-index-cleanup.t
* src/couch_set_view/test/26-multiple-reductions.t
* src/couch_set_view/test/05-replicas-transfer.t
* src/couch_set_view/test/18-monitor-partition-updates.t
* src/couch_set_view/test/couch_set_view_test_util.erl
* src/couch_set_view/test/21-updater-cleanup.t

vmx :
Files :
* src/couch_set_view/test/02-old-index-cleanup.t
* src/couch_set_view/test/04-handle-db-deletes.t
* src/couch_set_view/src/couch_set_view_group.erl
* src/couch_set_view/src/couch_set_view.erl
* src/couch_set_view/test/23-replica-group-missing.t




[MB-7269] cbtransfer/cbrestore throws BadStatusLine exception when using wrong port number Created: 27/Nov/12  Updated: 22/May/13  Resolved: 16/May/13

Status: Closed
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.0
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Steve Yen Assignee: Shashank Gupta
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Thanks to Tony Fonager....

When cbtransfer is invoked incorrectly against the wrong port (11211 rather than 8091), the tool should give a more polite, useful error message rather than an ugly stack trace.

[root@cbnode05 bin]# ./cbtransfer /root/Backup/cbtransfer http://192.168.0.75:11211 -b quizmo -B quizmo -u username -p password
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/cbtransfer", line 33, in <module>
    pump_transfer.exit_handler(pump_transfer.Transfer().main(sys.argv))
  File "/opt/couchbase/lib/python/pump_transfer.py", line 85, in main
    sink_class, sink).run()
  File "/opt/couchbase/lib/python/pump.py", line 100, in run
    rv, source_map, sink_map = self.check_endpoints()
  File "/opt/couchbase/lib/python/pump.py", line 150, in check_endpoints
    rv, sink_map = self.sink_class.check(self.opts, self.sink_spec, source_map)
  File "/opt/couchbase/lib/python/pump_cb.py", line 71, in check
    rv, sink_map = pump.rest_couchbase(opts, spec)
  File "/opt/couchbase/lib/python/pump.py", line 879, in rest_couchbase
    rest_request_json(host, int(port), user, pswd, path)
  File "/opt/couchbase/lib/python/pump.py", line 856, in rest_request_json
    reason=reason)
  File "/opt/couchbase/lib/python/pump.py", line 829, in rest_request
    resp = conn.getresponse()
  File "/usr/lib64/python2.6/httplib.py", line 990, in getresponse
    response.begin()
  File "/usr/lib64/python2.6/httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.6/httplib.py", line 355, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine

 Comments   
Comment by Bin Cui [ 27/Nov/12 ]
http://review.couchbase.org/#/c/22854/
Comment by Maria McDuff [ 27/Mar/13 ]
Bin -- can you ask Pavel to review your code changes? Thanks.
Comment by Anil Kumar [ 28/Mar/13 ]
Pavel will code review and this will be checked-in.
Comment by Anil Kumar [ 10/Apr/13 ]
Bin: if this has already been code-reviewed, can you merge the changes?
Comment by Steve Yen [ 16/Apr/13 ]
Hi Bin, marking this resolved as Bin says it's been merged.
Steve
Comment by Maria McDuff [ 18/Apr/13 ]
pls verify / close using 2.0.2 latest build.
Comment by Shashank Gupta [ 29/Apr/13 ]
Tried using the build 2.0.2-767.

Command used:

./cbtransfer http://Administrator:password@10.3.3.100:11223 http://Administrator:password@10.3.3.100:8091 -b default -B def

Got the following message:
error: could not access REST API: 10.3.3.100:11223/pools/default/buckets; please check source URL, username (-u) and password (-p); exception: (111, 'Connection refused')

which I guess is correct


But when I give 11211 as the port number, it does nothing and hangs. No error message or exception is thrown. Below is the command I used:

./cbtransfer http://Administrator:password@10.3.3.100:11211 http://Administrator:password@10.3.3.100:8091 -b default -B def

Same is the case with cbrestore.
Comment by Maria McDuff [ 06/May/13 ]
pls see shashank's comment.
Comment by Bin Cui [ 13/May/13 ]
Port 11211 is used by moxi for the memcached ASCII protocol, and moxi will wait for a response from the customer. That's why it hangs.
Comment by Maria McDuff [ 13/May/13 ]
Bin, do you think the same error message should be returned even for port 11211? See below:
error: could not access REST API: 10.3.3.100:11223/pools/default/buckets; please check source URL, username (-u) and password (-p); exception: (111, 'Connection refused')
Comment by Bin Cui [ 13/May/13 ]
Unfortunately, the answer is no.
11211 is a valid port number, but it is reserved for moxi rather than ns_server, so we connect to moxi instead.

Talked to Steve about this issue. He suggested it is a low-priority bug. There are two options to solve this problem:
1. Implement a timeout for this operation, which is the preferred way.
2. Check the port number up front. This is not recommended because we don't want to hard-code any port number in our logic.

All in all, this is an edge case.
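
Option 1 can be sketched as follows. This is a minimal illustration only, not the code in pump.py: the helper name rest_probe, the default timeout and the message wording are assumptions made for the example.

```python
import http.client
import socket


def rest_probe(host, port, path="/pools/default/buckets", timeout=5.0):
    """Hypothetical helper: return (ok, message) for one REST GET.

    A timeout turns a silent peer (for example a non-HTTP service that
    never answers) into an error instead of an indefinite hang.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return True, "HTTP %d" % resp.status
    except socket.timeout:
        # The peer accepted the connection but sent nothing back in time.
        return False, ("error: no HTTP response from %s:%d within %.1fs; "
                       "please check the port number" % (host, port, timeout))
    except http.client.HTTPException as e:
        # The peer answered, but not with valid HTTP (e.g. BadStatusLine).
        return False, ("error: %s:%d did not answer with HTTP; "
                       "exception: %r" % (host, port, e))
    except OSError as e:
        # Connection refused, host unreachable, and similar socket errors.
        return False, ("error: could not access REST API: %s:%d%s; "
                       "exception: %s" % (host, port, path, e))
    finally:
        conn.close()
```

As noted in the comments that follow, a timeout alone cannot tell the user why the endpoint stayed silent, so the message can only suggest checking the port.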
Comment by Maria McDuff [ 14/May/13 ]
per bug triage, #1 is fine to implement.
Comment by Bin Cui [ 14/May/13 ]
Having thought it over again, option one is still not good or clear enough.
First, after a timeout, we still cannot clearly tell customers that the port number is incorrect for the cbtransfer tool to use. A timeout may also be caused by other issues.
Second, we have clear context that these ports are reserved for the moxi service and are by no means used for REST API calls. We can identify them and give customers an accurate answer about the input error.

http://review.couchbase.org/#/c/26301/
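
The up-front check described in this comment could look roughly like the sketch below. It is not the change in the review linked above: the function name, the port table and the message text are assumptions, with 11210 and 11211 taken from the "direct" and "proxy" ports that appear in the bucket configurations quoted elsewhere in this export.

```python
from urllib.parse import urlparse

# Assumption: ports that serve memcached/moxi traffic rather than REST.
NON_REST_PORTS = {
    11210: "memcached (direct)",
    11211: "moxi (proxy)",
}


def check_rest_url(url):
    """Hypothetical check: return an error string, or None if the URL looks fine."""
    port = urlparse(url).port
    if port in NON_REST_PORTS:
        return ("error: port %d is reserved for %s and does not serve the "
                "REST API; please use the REST port (for example 8091)"
                % (port, NON_REST_PORTS[port]))
    return None


if __name__ == "__main__":
    print(check_rest_url("http://Administrator:password@10.3.3.100:11211"))
    print(check_rest_url("http://Administrator:password@10.3.3.100:8091"))
```

The trade-off raised earlier in the thread still applies: the table hard-codes port numbers, so it only helps for default port assignments.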
Comment by Maria McDuff [ 21/May/13 ]
pls verify / close.
Comment by Shashank Gupta [ 22/May/13 ]
Verified with build 2.0.2-809




[MB-6231] 405 Method text response to POST /default/_design/doc includes POST as a valid method Created: 15/Aug/12  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: None
Fix Version/s: .next
Security Level: Public

Type: Bug Priority: Minor
Reporter: Benjamin Young Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Likely just the 405 Method text response needs updating to remove POST

 Comments   
Comment by Thuan Nguyen [ 22/May/13 ]
Integrated in github-couchdb-preview #582 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/582/])
    MB-6231: dont advertise POST on db_doc_req (Revision d3978218af46b55ff4fb168077ef7a3d0f055214)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couchdb/couch_httpd_db.erl




[MB-8333] [windows] unable to add a 1.8.0 node to 2.0.2 cluster Created: 22/May/13  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.0.2-807-rel
<manifest><remote name="couchbase" fetch="git://10.1.1.210/"/><remote name="membase" fetch="git://10.1.1.210/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="adb5d29b739fc98d721665978091807d9abe74f1"><copyfile dest="Makefile" src="Makefile.top"/></project><project name="bucket_engine" path="bucket_engine" revision="2a797a8d97f421587cce728f2e6aa2cd42c8fa26"/><project name="ep-engine" path="ep-engine" revision="d03c48ac89e92ad0a674a6653377fe68e38a1aba"/><project name="libconflate" path="libconflate" revision="c0d3e26a51f25a2b020713559cb344d43ce0b06c"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="408057ec55da3862ab8d75b1ed25d2848afd640f"/><project name="couchbase-cli" path="couchbase-cli" revision="ba4b6057ca214715e85ed11522601e2e13b88c1d" remote="couchbase"/><project name="memcached" path="memcached" revision="b6ceb46fc26ac6f1d6be7a5866d6c6c0f6e6d32a" remote="membase"/><project name="moxi" path="moxi" revision="4b391021af7a453bf88716724d2c644916ebd969"/><project name="ns_server" path="ns_server" revision="3bb4e9525bc6b9a600b2285384d2e20d0cae9723"/><project name="portsigar" path="portsigar" revision="159b6179ea8a3c2075ee9eb2afa6f91c98c0fda6"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/><project name="couchbase-python-client" path="couchbase-python-client" revision="d443169c0694fca1be67d8f6934a8c50f0175ee7"/><project name="couchdb" path="couchdb" revision="3d9a23c4a79767aec43c05636e117eedf690a4f1"/><project name="couchdbx-app" path="couchdbx-app" revision="e7e3680d94ff592915bb369a74756e7e8ba64365"/><project name="couchstore" 
path="couchstore" revision="963fc26eafc67514eed5c9a3752d5d4cbdf5971d"/><project name="geocouch" path="geocouch" revision="ed9ad43aa361df0829262fef811b5236331b44c8"/><project name="testrunner" path="testrunner" revision="26873dc894b3db0fa67424da7c41ad4d37e82650"/><project name="healthchecker" path="healthchecker" revision="32d33ebee48ddfcbfa0163946f71fb0d4381f6dc"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="44a584d1de8c89addfb4f1d0522bdbbbed83ba48" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest>

Operating System: Windows 64-bit

 Description   
2 nodes cluster (2.0.2), trying to add a 1.8.0 node

2013-05-22 00:40:06.346 ns_cluster:5:info:message(ns_1@172.23.105.74) - Failed to add node 172.23.105.72:8091 to cluster. This node cannot add another node ('ns_1@172.23.105.72') because of cluster version compatibility mismatch. Cluster works in [2, 0] mode and node only supports [1, 8]
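
The log line above reports a compatibility check: the cluster works in [2, 0] mode while the node only supports [1, 8]. A minimal sketch of such a comparison, assuming versions are compared element by element as [major, minor] pairs (the function name and the rule are illustrative, not taken from ns_server):

```python
def node_can_join(cluster_compat, node_supported):
    """Hypothetical rule: the node must support at least the cluster's mode."""
    return tuple(node_supported) >= tuple(cluster_compat)


if __name__ == "__main__":
    print(node_can_join([2, 0], [1, 8]))  # False: 1.8 node, 2.0-mode cluster
    print(node_can_join([2, 0], [2, 0]))  # True
```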


 Comments   
Comment by Iryna Mironava [ 22/May/13 ]
https://s3.amazonaws.com/bugdb/jira/MB-8333/16fc64ab/172.23.105.77-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8333/16fc64ab/172.23.105.72-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8333/16fc64ab/172.23.105.74-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8333/16fc64ab/172.23.105.75-diag.zip




[MB-8239] cbcollect_info only collects couchbase.log and misses other logs Created: 09/May/13  Updated: 22/May/13

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Chisheng Hong Assignee: Ravi Mayuram
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 2.0.2-793

Attachments: Text File couchbase.log     Text File ns_server.couchdb.log     Text File ns_server.debug.log     Text File ns_server.error.log     Text File ns_server.info.log     Text File ns_server.mapreduce_errors.log     Text File ns_server.stats.log     Text File ns_server.views.log     Text File ns_server.xdcr_errors.log     Text File ns_server.xdcr.log     PNG File Screen Shot 2013-05-14 at 6.11.39 PM.png     Text File stats.log    
Operating System: MacOSX 64-bit

 Description   
Chishengs-MacBook-Pro:bin chisheng$ ./cbcollect_info info.zip
uname (uname -a) - OK
Directory structure membase - previous versions (ls -lR /opt/membase /var/membase /var/opt/membase /etc/opt/membase) - Exit code 1
sysctl settings (sysctl -a) - OK
Process list snapshot (top -l 1) - OK
Disk activity (iostat 1 10) - OK
Process list (ps -Aww -o user,pid,lwp,ppid,nlwp,pcpu,pri,nice,vsize,rss,tty,stat,wchan:12,start,bsdtime,command) - Exit code 1
Network configuration (ifconfig -a) - OK
Taking sample 2 after 10.000000 seconds -
OK
Network status (netstat -an) - OK
Network routing table (netstat -rn) - OK
Arp cache (arp -na) - OK
Filesystem (df -ha) - OK
System activity reporter (sar 1 10) - OK
System paging activity (vmstat 1 10) - Exit code 127
System uptime (uptime) - OK
couchbase user definition (getent passwd couchbase) - Exit code 127
couchbase user limits (su couchbase -c "ulimit -a") - skipped (needs root privs)
membase user definition (getent passwd membase) - Exit code 127
couchbase user limits (su couchbase -c "ulimit -a") - skipped (needs root privs)
membase user limits (su membase -c "ulimit -a") - skipped (needs root privs)
Interrupt status (intrstat 1 10) - Exit code 127
Processor status (mpstat 1 10) - Exit code 127
System log (cat /var/adm/messages) - Exit code 1
Kernel log buffer (dmesg) - Exit code 1
Checking for server guts in /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/var/lib/couchbase/initargs...
./erl: line 28: /lib/erlang/erts-5.8.*/bin/erlexec: No such file or directory
./erl: line 28: exec: /lib/erlang/erts-5.8.*/bin/erlexec: cannot execute: No such file or directory
Checking for server guts in /opt/couchbase/var/lib/couchbase/initargs...
./erl: line 28: /lib/erlang/erts-5.8.*/bin/erlexec: No such file or directory
./erl: line 28: exec: /lib/erlang/erts-5.8.*/bin/erlexec: cannot execute: No such file or directory
Checking for server guts in /Users/chisheng/Library/Application Support/Couchbase/var/lib/couchbase/initargs...
./erl: line 28: /lib/erlang/erts-5.8.*/bin/erlexec: No such file or directory
./erl: line 28: exec: /lib/erlang/erts-5.8.*/bin/erlexec: cannot execute: No such file or directory

 Comments   
Comment by Maria McDuff [ 10/May/13 ]
need for 2.0.2. bumping up to critical.
Comment by Maria McDuff [ 14/May/13 ]
anil to sit with alk k and loan his mac.
chisheng will help if needed.
Comment by Anil Kumar [ 14/May/13 ]
Tested this on a 2.0.1 build; it works as expected and collects all the logs. Snapshot included. Thanks!
Comment by Aleksey Kondratenko [ 14/May/13 ]
That's not enough. May I have those 2.0.1 collectinfos from the Mac?
Comment by Anil Kumar [ 14/May/13 ]
attached all the log files
Comment by Wayne Siu [ 17/May/13 ]
Anil to provide a live system for Alk to look at.
Comment by Aleksey Kondratenko [ 20/May/13 ]
This is what I suspected.

Since about a month ago we have been relying a lot more on a functional escript.

We just confirmed with Anil that it was broken in exactly the same way in 2.0.1, which caused cbdump-config not to work. So an important part of the information was not gathered even in 2.0.1.

Now we depend on escript in a hard way, which causes everything to fail rather than just silently skipping a tiny but critical piece of information.


Something is clearly broken with the way we package or ship Erlang on OS X, and somebody with an OS X background has to investigate. I cannot help.
Comment by Ravi Mayuram [ 21/May/13 ]
Looks like the script is very specific to Linux and even there some of the commands it is trying to run require validation.
Bin, pls take a look. This needs to be fixed so that this script runs on all platforms - Linux, MAC and Windows.
Comment by Bin Cui [ 21/May/13 ]
On CentOS, cbcollect_info runs correctly under the root account because it needs to access the var/lib/couchbase directory, which only the couchbase account is allowed to access. We need to document this limitation.
Comment by Ravi Mayuram [ 22/May/13 ]
Have it working with some changes. Need to validate with experts before checking it in.




[MB-8332] moxi crashed when delete non default bucket Created: 21/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Thuan Nguyen Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows 8 64bit

Attachments: Zip Archive 10.17.46.220-5212013-1858-diag.zip    
Operating System: Windows 64-bit

 Description   
Tested on build 2.0.2-809 on Windows 8 64-bit.
Create default bucket,
Delete default bucket ==> moxi did not crash

Create sasl bucket
Delete sasl bucket ==> moxi crashed as shown in the log

Deleted bucket "saslbucket"
menelaus_web011 ns_1@127.0.0.1 18:43:37 - Tue May 21, 2013
Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: 2013-05-21 18:40:02: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-05-21 18:40:02: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-05-21 18:40:36: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({
"name": "default",
"nodeLocator": "vbucket",
"saslPassword": "",
"nodes": [{
"hostname": "127.0.0.1:8091",
"ports": {
"direct": 11210,
"proxy": 11211
}
}],
"vBucketServerMap": {
"hashAlgorithm": "CRC",
"numReplicas": 1,
"serverList": ["127.0.0.1:11210"],
"vBucketMap": []
}
})
2013-05-21 18:43:11: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({
"name": "saslbucket",
"nodeLocator": "vbucket",
"saslPassword": "",
"nodes": [{
"hostname": "127.0.0.1:8091",
"ports": {
"direct": 11210,
"proxy": 11211
}
}],
"vBucketServerMap": {
"hashAlgorithm": "CRC",
"numReplicas": 1,
"serverList": ["127.0.0.1:11210"],
"vBucketMap": []
}
})
EOL on stdin. Exiting ns_log000 ns_1@127.0.0.1 18:43:37 - Tue May 21, 2013
Shutting down bucket "saslbucket" on 'ns_1@127.0.0.1' for deletion ns_memcached002 ns_1@127.0.0.1 18:43:33 - Tue May 21, 2013
Bucket "saslbucket" loaded on node 'ns_1@127.0.0.1' in 0 seconds. ns_memcached001 ns_1@127.0.0.1 18:43:09 - Tue May 21, 2013
Created bucket "saslbucket" of type: membase
[{num_replicas,1},
{replica_index,false},
{ram_quota,2515533824},
{auth_type,sasl},
{autocompaction,false},
{flush_enabled,false}] menelaus_web012 ns_1@127.0.0.1 18:43:09 - Tue May 21, 2013
Shutting down bucket "default" on 'ns_1@127.0.0.1' for deletion ns_memcached002 ns_1@127.0.0.1 18:41:59 - Tue May 21, 2013
Bucket "default" loaded on node 'ns_1@127.0.0.1' in 0 seconds. ns_memcached001 ns_1@127.0.0.1 18:40:34 - Tue May 21, 2013
Created bucket "default" of type: membase
[{num_replicas,1},
{replica_index,false},
{ram_quota,2515533824},
{auth_type,sasl},
{autocompaction,false},
{flush_enabled,false}] menelaus_web012 ns_1@127.0.0.1 18:40:34 - Tue May 21, 2013
Deleted bucket "new-bucket-97c90481-0efb-4ad3-84ed-8a0bc00799ad"
(repeated 1 times) menelaus_web011 ns_1@127.0.0.1 18:40:27 - Tue May 21, 2013
Client-side error-report for user "Administrator" on node 'ns_1@127.0.0.1':
User-Agent:Python-httplib2/$Rev: 259 $
2013-05-21 18:39:58.017545 : test_non_default_moxi finished
menelaus_web102 ns_1@127.0.0.1 18:40:02 - Tue May 21, 2013
Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: 2013-05-21 18:39:43: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-05-21 18:39:43: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
EOL on stdin. Exiting ns_log000 ns_1@127.0.0.1 18:40:02 - Tue May 21, 2013
Shutting down bucket "new-bucket-97c90481-0efb-4ad3-84ed-8a0bc00799ad" on 'ns_1@127.0.0.1' for deletion

Link to manifest file of this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.2-809-rel.setup.exe.manifest.xml
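
The moxi error in the log above rejects a bucket configuration whose "vBucketMap" is empty. The rule it quotes ("a power of two > 0 and <= 65536") can be sketched as below. The assumption here is that the vBucket count is the length of "vBucketMap"; the function names are hypothetical and this is not moxi or libvbucket code.

```python
import json

MAX_VBUCKETS = 65536


def valid_vbucket_count(n):
    """True if n is a power of two with 0 < n <= 65536."""
    return 0 < n <= MAX_VBUCKETS and (n & (n - 1)) == 0


def check_bucket_config(text):
    """Return an error string for a bad config, or None if the count is valid."""
    cfg = json.loads(text)
    n = len(cfg["vBucketServerMap"]["vBucketMap"])
    if not valid_vbucket_count(n):
        return ("bad JSON configuration: %d vBuckets; must be a power of "
                "two > 0 and <= %d" % (n, MAX_VBUCKETS))
    return None


# Configuration shaped like the one in the log, with an empty vBucketMap.
SAMPLE = """{
  "name": "saslbucket",
  "nodeLocator": "vbucket",
  "vBucketServerMap": {
    "hashAlgorithm": "CRC",
    "numReplicas": 1,
    "serverList": ["127.0.0.1:11210"],
    "vBucketMap": []
  }
}"""

if __name__ == "__main__":
    print(check_bucket_config(SAMPLE))
```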





[MB-8023] Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Created: 02/Apr/13  Updated: 21/May/13  Resolved: 21/May/13

Status: Closed
Project: Couchbase Server
Component/s: moxi
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thuan Nguyen Assignee: Aleksey Kondratenko
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: ubuntu 11.04 64 bit

Attachments: Text File ns-diag-moxi_crashed_node_46_20130517143152.txt    

 Description   
I rebooted one VM that has Couchbase Server 2.0.2-746 installed.
After the VM was up and Couchbase Server had started, I saw the error "Port server moxi on node 'ns_1@10.3.121.97' exited with status 1." on the log page.

Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Restarting. Messages: 2013-04-02 15:48:43: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-04-02 15:48:43: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-04-02 15:48:45: (cproxy.c.330) ERROR: could not listen on port 11211. Please use -Z port_listen=PORT_NUM to specify a different port number. ns_port_server000 ns_1@10.3.121.97 15:48:43 - Tue Apr 2, 2013
Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Restarting. Messages: 2013-04-02 15:48:38: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-04-02 15:48:38: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-04-02 15:48:40: (cproxy.c.330) ERROR: could not listen on port 11211. Please use -Z port_listen=PORT_NUM to specify a different port number. ns_port_server000 ns_1@10.3.121.97 15:48:38 - Tue Apr 2, 2013
Service moxi exited on node 'ns_1@10.3.121.97' in 0.01s
supervisor_cushion001 ns_1@10.3.121.97 15:48:33 - Tue Apr 2, 2013
Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Restarting. Messages: 2013-04-02 15:48:33: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-04-02 15:48:33: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-04-02 15:48:35: (cproxy.c.330) ERROR: could not listen on port 11211. Please use -Z port_listen=PORT_NUM to specify a different port number. ns_port_server000 ns_1@10.3.121.97 15:48:33 - Tue Apr 2, 2013
Port server moxi on node 'ns_1@10.3.121.97' exited with status 1. Restarting. Messages: 2013-04-02 15:48:28: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-04-02 15:48:28: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-04-02 15:48:30: (cproxy.c.330) ERROR: could not listen on port 11211. Please use -Z port_listen=PORT_NUM to specify a different port number. ns_port_server000 ns_1@10.3.121.97 15:48:28 - Tue Apr 2, 2013


I will attach collect info file soon

 Comments   
Comment by Steve Yen [ 02/Apr/13 ]
That looks like some other process is already using port 11211.

For example, perhaps there's another memcached that is configured to auto-launch after a server reboot and it grabs port 11211 first before couchbase/moxi can start.
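
One quick way to test this hypothesis on the affected node is to try binding the port while Couchbase is stopped. A minimal sketch, assuming Python is available on the node (the helper name is hypothetical):

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except OSError:
        # Typically EADDRINUSE when another process already holds the port.
        return False
    finally:
        s.close()


if __name__ == "__main__":
    print("11211 free:", port_is_free(11211))
```

If this reports the port as busy with moxi not running, a tool such as netstat or lsof can then identify the process holding it.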
Comment by Thuan Nguyen [ 02/Apr/13 ]
Link to collect info file of this node https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_2/2013_04/node10.3.121.97.zip

Link to manifest file of this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_64_2.0.2-746-rel.deb.manifest.xml
Comment by Maria McDuff [ 13/May/13 ]
tony, is this still an issue with the latest 2.0.2 build?
if so, can you analyze the log and pinpoint what is causing the error?
assign to steve yen afterwards.
thanks.
Comment by Thuan Nguyen [ 14/May/13 ]
I still see the error in windows build 2.0.2-801

[user:info,2013-05-13T18:08:54.777,ns_1@10.3.2.143:<0.4483.4>:ns_log:crash_consumption_loop:64]Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: 2013-05-13 18:05:28: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-05-13 18:05:28: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-05-13 18:07:55: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({
"name": "default",
"nodeLocator": "vbucket",
"saslPassword": "",
"nodes": [{
"hostname": "10.3.2.143:8091",
"ports": {
"direct": 11210,
"proxy": 11211
}
}],
"vBucketServerMap": {
"hashAlgorithm": "CRC",
"numReplicas": 1,
"serverList": ["10.3.2.143:11210"],
"vBucketMap": []
}
})

I will dig into log to investigate it.
Comment by Maria McDuff [ 14/May/13 ]
tony, pls respond to steve's inquiry (i.e., what other process is using 11211).
Comment by Thuan Nguyen [ 17/May/13 ]
Moxi restarts after a bucket is deleted.

default bucket.
- create default bucket
- grep moxi process ID

[root@cen-1907 ~]# ps aux | grep moxi
101 29434 0.0 0.0 158060 2612 ? Ssl 13:32 0:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr
root 29524 0.0 0.0 61228 744 pts/3 S+ 13:39 0:00 grep moxi

- Load 10K items to default bucket
[root@cen-1907 ~]# /opt/couchbase/bin/cbworkloadgen -n localhost:8091 -i 10000 -r .5 -s 100 --threads=10
  [####################] 100.0% (199990/200000 msgs)
bucket: default, msgs transferred...
       : total | last | per sec
 batch : 200 | 200 | 8.4
 byte : 19999000 | 19999000 | 844890.7
 msg : 199990 | 199990 | 8448.9
done

- delete default bucket; moxi crashed and restarted
- grep moxi process ID right after deleting the default bucket; there is a new process ID
[root@cen-1907 ~]# ps aux | grep moxi
101 29561 0.0 0.0 92260 1704 ? Ssl 13:41 0:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr
root 29571 0.0 0.0 61228 748 pts/3 S+ 13:41 0:00 grep moxi

- log from info.1 log
[ns_server:info,2013-05-17T13:41:27.912,ns_1@127.0.0.1:ns_memcached-default<0.771.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"default/1">>: ok

[ns_server:info,2013-05-17T13:41:27.913,ns_1@127.0.0.1:ns_memcached-default<0.771.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"default/0">>: ok

[ns_server:info,2013-05-17T13:41:27.913,ns_1@127.0.0.1:ns_memcached-default<0.771.0>:ns_storage_conf:delete_databases_and_files:471]Couch dbs are deleted. Proceeding with bucket directory
[ns_server:info,2013-05-17T13:41:28.122,ns_1@127.0.0.1:<0.659.0>:ns_orchestrator:idle:536]Restarting moxi on nodes ['ns_1@127.0.0.1']
[user:info,2013-05-17T13:41:28.124,ns_1@127.0.0.1:<0.620.0>:ns_log:crash_consumption_loop:64]Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: 2013-05-17 13:32:07: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-05-17 13:32:07: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
EOL on stdin. Exiting
[menelaus:info,2013-05-17T13:41:28.126,ns_1@127.0.0.1:<0.11748.0>:menelaus_web_buckets:handle_bucket_delete:345]Deleted bucket "default"

I will test with a non-default bucket next.



Comment by Thuan Nguyen [ 17/May/13 ]
Tested on a non-default bucket with build 2.0.2-806 on CentOS 5.8 64-bit.
Moxi also restarted after deleting the sasl bucket.

- create sasl bucket
- grep moxi process ID
[root@cen-1907 ~]# ps aux | grep moxi
101 29561 0.0 0.0 92908 2464 ? Ssl 13:41 0:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr
root 29757 0.0 0.0 61228 748 pts/3 S+ 14:25 0:00 grep moxi

- load items to sasl bucket
[root@cen-1907 ~]# /opt/couchbase/bin/cbworkloadgen -n localhost:8091 -i 10000 -r .5 -s 100 --bucket=sasl -u Administrator -p password --threads=10
  [####################] 100.0% (199990/200000 msgs)
bucket: default, msgs transferred...
       : total | last | per sec
 batch : 200 | 200 | 8.5
 byte : 19999000 | 19999000 | 846631.0
 msg : 199990 | 199990 | 8466.3
done

- delete sasl bucket; moxi crashed and restarted
[root@cen-1907 ~]# ps aux | grep moxi
101 29761 0.0 0.0 92260 1700 ? Ssl 14:25 0:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=30000,connect_timeout=400,auth_timeout=100,cycle=200,downstream_conn_queue_timeout=200,downstream_timeout=5000,wait_queue_timeout=200 -z url=http://127.0.0.1:8091/pools/default/saslBucketsStreaming -p 0 -Y y -O stderr
root 29770 0.0 0.0 61228 748 pts/3 S+ 14:25 0:00 grep moxi

- in info log

[ns_server:info,2013-05-17T14:25:31.812,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/1001">>: ok

[ns_server:info,2013-05-17T14:25:31.813,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/1000">>: ok

[ns_server:info,2013-05-17T14:25:31.815,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/100">>: ok

[ns_server:info,2013-05-17T14:25:31.816,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/10">>: ok

[ns_server:info,2013-05-17T14:25:31.817,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/1">>: ok

[ns_server:info,2013-05-17T14:25:31.818,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_couch_database:428]Deleting database <<"sasl/0">>: ok

[ns_server:info,2013-05-17T14:25:31.818,ns_1@127.0.0.1:ns_memcached-sasl<0.15869.0>:ns_storage_conf:delete_databases_and_files:471]Couch dbs are deleted. Proceeding with bucket directory
[ns_server:info,2013-05-17T14:25:32.026,ns_1@127.0.0.1:<0.659.0>:ns_orchestrator:idle:536]Restarting moxi on nodes ['ns_1@127.0.0.1']
[user:info,2013-05-17T14:25:32.029,ns_1@127.0.0.1:<0.620.0>:ns_log:crash_consumption_loop:64]Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: 2013-05-17 13:41:28: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2013-05-17 13:41:28: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
2013-05-17 13:49:55: (agent_config.c.705) ERROR: bad JSON configuration from http://127.0.0.1:8091/pools/default/saslBucketsStreaming: Number of vBuckets must be a power of two > 0 and <= 65536 ({
"name": "sasl",
"nodeLocator": "vbucket",
"saslPassword": "password",
"nodes": [{
"hostname": "127.0.0.1:8091",
"ports": {
"direct": 11210,
"proxy": 11211
}
}],
"vBucketServerMap": {
"hashAlgorithm": "CRC",
"numReplicas": 1,
"serverList": ["127.0.0.1:11210"],
"vBucketMap": []
}
})
EOL on stdin. Exiting
[menelaus:info,2013-05-17T14:25:32.033,ns_1@127.0.0.1:<0.1865.1>:menelaus_web_buckets:handle_bucket_delete:345]Deleted bucket "sasl"


So every time any bucket is deleted, moxi crashes and restarts.
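
The manual check used in both tests (run `ps aux | grep moxi` before and after the bucket deletion, then compare process IDs) could be scripted. A minimal sketch, assuming `ps aux` output in the column layout pasted above; the helper names `moxi_pids` and `moxi_restarted` are hypothetical and not part of any Couchbase tool:

```python
def moxi_pids(ps_output):
    """Return the set of PIDs of moxi processes found in `ps aux` output."""
    pids = set()
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        # Keep only the moxi binary itself, not the `grep moxi` helper process.
        if command.split()[0].endswith("/moxi"):
            pids.add(int(fields[1]))
    return pids


def moxi_restarted(before, after):
    """True if no moxi PID seen before the operation is still present afterwards."""
    return bool(before) and before.isdisjoint(after)
```

The two snapshots would come from capturing `ps aux` once before and once after the bucket deletion, for example with `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`.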

Comment by Wayne Siu [ 17/May/13 ]
Tony,
Can you quantify what the impact is when this happens?
Comment by Thuan Nguyen [ 21/May/13 ]
SASL buckets share one moxi server. If we delete one bucket and moxi crashes, normal operation of the other SASL buckets will be affected.
Comment by Steve Yen [ 21/May/13 ]
This bug report may be slightly confusing as there are two moxi-related issues here...

1) moxi exiting, reported back in April, perhaps related to windows. Not sure what that's about.

2) moxi exiting, reported this month (May), related to bucket deletions, on at least Linux, where moxi seems to be complaining about a malformed vBucketMap (this is perhaps not the key problem) AND, more importantly, also seeing that its stdin is closing ("EOL on stdin" is an error message from moxi). (Aside: I think this #2 issue should have been reported as a separate, distinct bug from #1.)

On #2, assigning to Aliaksey K as first gut intuition (since moxi has not changed) is that the new babysitting interactions are probably involved.
Comment by Aleksey Kondratenko [ 21/May/13 ]
On Steve's point #2 above, the answer is: the moxi restart after bucket deletion is expected, and we do that explicitly. Moxi issues with an empty vbucket map shortly after bucket creation are expected too.
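
The rule quoted in moxi's error message ("Number of vBuckets must be a power of two > 0 and <= 65536") shows why a configuration with an empty `vBucketMap` is rejected. A minimal sketch of that rule applied to a streamed bucket configuration; this is not moxi's actual code, and the function name is hypothetical:

```python
import json


def vbucket_count_ok(config_json):
    """Apply the rule from the error message: power of two, > 0, <= 65536."""
    vbucket_map = json.loads(config_json)["vBucketServerMap"]["vBucketMap"]
    n = len(vbucket_map)
    # For n > 0, n & (n - 1) == 0 holds exactly when n is a power of two.
    return 0 < n <= 65536 and (n & (n - 1)) == 0
```

With the JSON from the log above, `vBucketMap` is `[]`, so the count is 0 and the check fails.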

In diags named node10.3.121.97.zip I'm seeing moxi restarting because of inability to bind to port 11211. And couchbase.log explains why. I see standalone moxi package installed and started which obviously takes port 11211.
Comment by Thuan Nguyen [ 21/May/13 ]
On Steve's point #1, I will close this bug because it does not reproduce on the latest build (2.0.2-809) on Windows. I will open another bug for the moxi crash seen only on non-default buckets.




[MB-8046] [need release notes] config.dat and other potentially security sensitive files are world readable in world readable directories Created: 09/Apr/13  Updated: 21/May/13

Status: Reopened
Project: Couchbase Server
Component/s: installer, ns_server
Affects Version/s: 2.0, 2.0.1
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Aleksey Kondratenko Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
SUBJ. Everything appears to be world-readable. Data files, config, cookie files, etc.

 Comments   
Comment by Aleksey Kondratenko [ 09/Apr/13 ]
ns_server side fix is in http://review.couchbase.org/25579

But at least rpm packaging needs to be fixed as well. Patch for that is in gerrit.
Comment by Anil Kumar [ 10/Apr/13 ]
Waiting for code-review.
Comment by Aleksey Kondratenko [ 10/Apr/13 ]
Merged two fixes
Comment by Maria McDuff [ 17/Apr/13 ]
pls verify on 2.0.2 build. thanks.
Comment by Shashank Gupta [ 19/Apr/13 ]
Verified using build 2.0.2-767-rel.
Comment by Aleksey Kondratenko [ 21/May/13 ]
Worth documenting imho. See Bin's comment MB-8239 about needing root.

Here's what I'd add to release notes:

Since Couchbase Server 2.0.2, some internal directories are no longer world-readable. On multiuser systems that was a security issue, which is now closed. The change affects people who run cbcollect_info under non-root accounts, because cbcollect_info will be unable to see many details about Couchbase Server. What it could see was already quite limited before, but since 2.0.2 nearly everything is invisible to it.
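
To verify the change on an installed node, one could scan the install tree for anything that is still readable by "other" users. A minimal sketch, assuming a POSIX system; the function name is hypothetical and the directory to scan is whatever the installation uses:

```python
import os
import stat


def world_readable_paths(root):
    """Return every file or directory under root that 'other' users can read."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # a symlink's own mode bits say nothing about its target
            if mode & stat.S_IROTH:
                found.append(path)
    return sorted(found)
```

Running it as root avoids missing subtrees that the scanning account itself cannot enter.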




[MB-8325] ns_server should collect various stats about supervised processes Created: 20/May/13  Updated: 21/May/13  Resolved: 21/May/13

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.1
Fix Version/s: 2.0.2
Security Level: Public

Type: Improvement Priority: Major
Reporter: Aliaksey Artamonau Assignee: Aliaksey Artamonau
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Aleksey Kondratenko [ 21/May/13 ]
Resolved by merging chain of commits.




[MB-8302] [READMEs] Win, Linux, Mac Created: 16/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Minor
Reporter: Maria McDuff Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File README_Linux_2.0.2.txt     Text File README_Mac_2.0.2.txt     Text File README_Windows_2.0.2.txt    

 Description   
per bug triage, we need couchbase installation readme.txt for Windows and Linux OS.

 Comments   
Comment by Anil Kumar [ 20/May/13 ]
Karen, we need to provide a Readme.txt file for both the Windows and Linux platforms.

Currently we do have a readme.txt for Mac, with the text below. We need something similar for the Windows and Linux platforms.

--*--
Couchbase Server Community Edition 2.0.1

This is a self-contained installation of Couchbase Server.
  * To start the server, simply launch the application.
  * Click the Couchbase icon in the menu bar to access menu commands.
  * To stop the server, choose "Quit Couchbase Server" from the menu.

This application may be run from any location on any writeable volume.
You may choose to move it to /Applications, but this is not required.
However:
  * Do not move the application while it's running.
  * After installing the command-line tools (via the item in the menu),
    moving the app will break the symbolic links that were created in
    /usr/local/bin (or wherever you installed the tools into.)

Before you start the server for the first time, please do make sure you
have erased any previous (1.8) Membase/Couchbase settings by deleting
"~/Library/Application Support/Couchbase" and
"~/Library/Application Support/Membase".
-- * --
Comment by Karen Zeller [ 20/May/13 ]
Ok, yes, do you have any idea what raw information we should put in the Linux and Windows readmes?

Comment by Karen Zeller [ 20/May/13 ]
See note:

Do we have any raw information/content we know we want to put in these readmes?
Comment by Karen Zeller [ 20/May/13 ]
Input from Anil:

-Get install locations/paths on each platform for 2.0.2. Get from SteveY.

-Location/function of any shortcuts

-Anything you need to do before you start server for platform

-Any major highlights from getting started that are specific to the platform. Anything that must be done, or else the server cannot start or the install ends up broken.



-ubuntu: open ssl
-windows: get platform specific


Cross reference to upgrade section
Comment by Steve Yen [ 21/May/13 ]
Linux install locations (for both *.rpm and *.deb) are...

  /opt/couchbase

For windows, please ask Bin; for mac, please ask Traun or Chisheng.

Steve
Comment by Karen Zeller [ 21/May/13 ]
Consolidating Mac one from MB-7621




[MB-7621] [Mac README] No way to install CLI tools as per readme.txt instructions Created: 29/Jan/13  Updated: 21/May/13  Resolved: 21/May/13

Status: Closed
Project: Couchbase Server
Component/s: documentation, installer
Affects Version/s: 2.0
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Perry Krug Assignee: Chisheng Hong
Resolution: Duplicate Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File README_2.0.2.txt     Text File README.txt    

 Description   
From the readme.txt in the Mac download:
  * After installing the command-line tools (via the item in the menu),
    moving the app will break the symbolic links that were created in
    /usr/local/bin (or wherever you installed the tools into.)

There is no "item in the menu" with which to install the tools. How does one do so?

We also need docs on the Mac installation in general.

 Comments   
Comment by MC Brown [ 29/Jan/13 ]
We have documentation on the Mac installation (http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-getting-started-install-macosx.html) and the installed menubar app (http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-basics-running-macosx.html#fig-couchbase-getting-started-macosx-menubar).

The README is provided by QA, not documentation.
Comment by Perry Krug [ 29/Jan/13 ]
Thanks MC, sorry I missed that.
Comment by Karen Zeller [ 27/Mar/13 ]
Hi Steve, Perry,

Confirm whether or not the links provided above are adequate information to resolve.


Thanks,

Karen
Comment by Perry Krug [ 28/Mar/13 ]
Karen, this is something that needs to change in the product, which is why it's assigned to Steve Yen. The links above are not sufficient as they don't address the issue of the readme.txt instructions being incorrect.
Comment by Karen Zeller [ 25/Apr/13 ]
Removing doc component until resolved in product. Then assign back to me once UI/func changes are ready to be doc'd.
Comment by Maria McDuff [ 29/Apr/13 ]
Steve, per bug scrub moving this up to critical.
Comment by Traun Leyden [ 09/May/13 ]
Ravi asked me to help dig into this.. I was able to successfully install CouchBase server on my mac.

I noticed something that needs to be fixed:

http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-getting-started-install-macosx.html

When you open Couchbase for the first time, you will be asked whether you want to install the Couchbase command-line tools into another directory. This creates symbolic links to the core command line tools in the chosen directory.

--

When I tested it, none of the above things actually happened, so it seems that documentation is wrong and should be removed.

*Action item #1: Remove above from documentation*

Regarding the readme.txt:

After installing the command-line tools (via the item in the menu),
    moving the app will break the symbolic links that were created in
    /usr/local/bin (or wherever you installed the tools into.)

That is definitely wrong.

So there is no "item in the menubar app" for installing the command line tools. Nor are any symbolic links created during the installation process (as stated by the misleading documentation related to action item #1 above).

I would say that snippet in the readme.txt is completely irrelevant and should also be removed.

*Action item #2: Remove snippet from readme.txt : "After installing the command-line tools...."*



Comment by Traun Leyden [ 09/May/13 ]
I forgot to mention that although it installs fine on the mac, it doesn't symlink the CLI tools when it probably should be doing so. This means that in order to use the CLI tools, I'll have to manually cd into the /Applications/Couchbase Server.app/Contents//Resources/couchbase-core/bin/ directory, or add them to my PATH.

So it would probably be a good idea to change the installation process, or add a menu item to the menu app (if that's possible), which symlinks the CLI tools to a directory that's commonly already in the user's PATH (eg, /usr/bin or /usr/local/bin). This will make life easier for users.

I'd say the overall plan of action for this ticket should be:

1. Remove section from documentation in couchbase-getting-started-install-macosx.html "When you open Couchbase for the first time, you will be asked.."
2. Remove snippet from readme.txt : "After installing the command-line tools...."*

(both of these were already mentioned earlier in previous comment)

And as part of a separate ticket:

3. Figure out a way to change the installation process, or add a menu item to the menu app (if that's possible), which creates symlinks to the CLI tools in a directory that's commonly already in the user's PATH (eg, /usr/bin or /usr/local/bin).
4. Update the documentation to inform users that the CLI tools are symlinked to a directory in their PATH, and warn them that this could break if they decide to move the Couchbase application folder to a different location on the system.
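
Item 3 (symlinking the CLI tools into a directory on the user's PATH) could look roughly like the sketch below. It is only an illustration of the idea, not the installer's code; the function name is hypothetical, and the caller chooses both directories:

```python
import os


def link_cli_tools(bin_dir, target_dir):
    """Symlink each executable file in bin_dir into target_dir; return new links."""
    created = []
    for name in sorted(os.listdir(bin_dir)):
        source = os.path.join(bin_dir, name)
        link = os.path.join(target_dir, name)
        if not (os.path.isfile(source) and os.access(source, os.X_OK)):
            continue  # skip subdirectories and non-executable files
        if os.path.lexists(link):
            continue  # never overwrite something the user already has
        os.symlink(source, link)
        created.append(link)
    return created
```

Because the links store the absolute path of the application bundle, moving the app afterwards would leave them dangling, which is the caveat item 4 asks the documentation to cover.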
Comment by Ravi Mayuram [ 13/May/13 ]

Pls make changes to the docs as mentioned by Traun for 1 and 2.
For 3 and 4 I'll open a new ER for later release.
 
Comment by Karen Zeller [ 15/May/13 ]
[REMOVED FROM SERVER MANUAL] 1. Remove section from documentation in couchbase-getting-started-install-macosx.html "When you open Couchbase for the first time, you will be asked.."

[WAITING - NEED From STEVE the README.txt]

2. Remove snippet from readme.txt : "After installing the command-line tools...."*
Comment by Karen Zeller [ 15/May/13 ]
Need to edit the README.TXT that actually goes in the build. Neither Phil nor Maria knows where this exists.
Comment by Maria McDuff [ 15/May/13 ]
found it.
attached.
Comment by Karen Zeller [ 15/May/13 ]
Fixed and updated to 2.0.2
Comment by Karen Zeller [ 15/May/13 ]
[REMOVED FROM SERVER MANUAL] 1. Remove section from documentation in couchbase-getting-started-install-macosx.html "When you open Couchbase for the first time, you will be asked.."

[REMOVED]

2. Remove snippet from readme.txt : "After installing the command-line tools...."*

[ATTACHED new readme and sent via email]
Comment by Karen Zeller [ 16/May/13 ]
I just discussed this with Maria/Tony. In the past we apparently did not have an upgrade path for Mac, so you always had to destroy these files to move from 1.8 to 2.0. In this version there will be an upgrade path for 2.0.2, in which case you may now need to keep these files. This I just discovered from Tony.

Maria asked that I reopen the ticket and assign to Chisheng who is testing the upgrade process and will assign to me once he provides the correct information for 2.0.2.
Comment by Karen Zeller [ 21/May/13 ]
Consolidating all README Fixes for 2.0.2 for all platforms into MB-8302




[MB-8319] [DOC? 2.0.2] View compaction is triggered during rebalance when auto-compaction is disabled Created: 20/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation, ns_server
Affects Version/s: 2.0.1, 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Pavel Paulau Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: Centos 64-bit

 Description   
Steps:
1. Set fragmentation threshold to 100% (both data and index)
2. Start workload, view fragmentation is high, >90%.
3. Rebalance cluster

As a result, view compaction kicks in. Bucket compaction doesn't kick in.

Not sure if it's a bug but behavior is very confusing. At least I didn't find documentation that explains it.

Diags for 2.0.2, just in case:
http://172.23.96.10:8080/view/views/job/apollo-views/116/artifact/

Basically I just want to run a test with no view compaction.

 Comments   
Comment by Aleksey Kondratenko [ 20/May/13 ]
The reason is that cleanup vbuckets (those that we're removing from the index) are known not to be properly accounted for in index fragmentation. After background cleanup is complete, that data is deleted (causing more fragmentation), after which compaction needs another pass. So our logic is to run view compaction after a certain number of incoming or outgoing moves on a node, regardless of anything else. This can be disabled via internal settings.
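
The counting logic described here can be illustrated with a small sketch. It is not ns_server's implementation (ns_server is written in Erlang); the class name, method name, and threshold parameter are all hypothetical, and the real trigger may count moves differently:

```python
from collections import defaultdict


class MoveCompactionTrigger:
    """Request a view compaction on a node after every N vBucket moves."""

    def __init__(self, moves_before_compaction):
        self.moves_before_compaction = moves_before_compaction
        self.moves = defaultdict(int)  # node name -> moves since last compaction

    def record_move(self, source_node, destination_node):
        """Count one vBucket move; return the nodes that should compact now."""
        to_compact = []
        for node in (source_node, destination_node):
            self.moves[node] += 1
            if self.moves[node] >= self.moves_before_compaction:
                self.moves[node] = 0
                to_compact.append(node)
        return to_compact
```

Note that the fragmentation threshold never appears in the decision, which matches the behavior reported in this ticket: compaction runs during rebalance even with the threshold set to 100%.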
Comment by Aleksey Kondratenko [ 20/May/13 ]
I have already replied above. Let me know if that's not enough.
Comment by Pavel Paulau [ 21/May/13 ]
could we create a separate task for documentation and keep this open? imho we should change this behavior eventually.




[MB-6972] [Doc when ready] distribute couchbase-server through yum and debian package repositories Created: 19/Oct/12  Updated: 21/May/13

Status: Reopened
Project: Couchbase Server
Component/s: build, documentation
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Improvement Priority: Critical
Reporter: Farshid Ghods Assignee: Phil Labee
Resolution: Unresolved Votes: 1
Labels: 2.0.1-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
blocks MB-7821 yum install couchbase-server from cou... Resolved
Duplicate
duplicates MB-2299 Create signed RPM's Resolved
Flagged:
Release Note

 Description   
this helps us in handling dependencies that are needed for couchbase server
sdk team has already implemented this for various sdk packages.

we might have to make some changes to our packaging metadata to work with this schema

 Comments   
Comment by Steve Yen [ 26/Nov/12 ]
to 2.0.2 per bug-scrub

first step is do the repositories?
Comment by Steve Yen [ 26/Nov/12 ]
back to 2.0.1, per bug-scrub
Comment by Farshid Ghods [ 19/Dec/12 ]
Phil,
please sync up with Farshid and get instructions that Sergey and Pavel sent
Comment by Farshid Ghods [ 28/Jan/13 ]
We should resolve this task once 2.0.1 is released.
Comment by Dipti Borkar [ 29/Jan/13 ]
Have we figured out the upgrade process moving forward? For example, from 2.0.1 to 2.0.2 or 2.0.1 to 2.1?
Comment by Jin Lim [ 04/Feb/13 ]
Please ensure that we also confirm/validate the upgrade process moving from 2.0.1 to 2.0.2. Thanks.
Comment by Phil Labee [ 06/Feb/13 ]
Now have DEB repo working, but another issue has come up: We need to distribute the public key so that users can install the key before running apt-get.

wiki page has been updated.
Comment by Karen Zeller [ 14/Feb/13 ]
Added to 2.0.1 RN as:

Fix:

We now provide Couchbase Server as yum and Debian package
repositories.
Comment by Matt Ingenthron [ 09/Apr/13 ]
What are the public URLs for these repositories? This was mentioned in the release notes here:
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-server-rn_2-0-0l.html
Comment by Matt Ingenthron [ 09/Apr/13 ]
Reopening, since this isn't documented that I can find. Apologies if I'm just missing it.
Comment by Dipti Borkar [ 23/Apr/13 ]
Anil, can you work with Phil to see what are the next steps here?
Comment by Anil Kumar [ 24/Apr/13 ]
Yes I'll be having discussion with Phil and will update here with details.
Comment by Tim Ray [ 28/Apr/13 ]
Could we either remove the note about yum/deb repos in the release notes or get those repo locations / sample files / keys added to public pages? The only links that look like they might contain the info point to internal pages I don't have access to.
Comment by Anil Kumar [ 14/May/13 ]
Thanks Tim, we have removed it from the release notes. We will add instructions about the yum/deb repos' locations/files/keys to the documentation once it's available. Thanks!
Comment by Karen Zeller [ 14/May/13 ]
Removing duplicate ticket:

http://www.couchbase.com/issues/browse/MB-7860
Comment by Phil Labee [ 21/May/13 ]

Debian install instructions:

    http://hub.internal.couchbase.com/confluence/display/CR/How+to+Download+from+a+Linux+Repo+--+debian

reviewed, up to "STOP: Instructions verified to here."

To verify that couchbase-server was installed correctly, see http://10.1.3.105:8091/index.html




[MB-8240] memslap vbucketkeygen vbuckettool is not working with MAC for latest build Created: 09/May/13  Updated: 21/May/13  Resolved: 20/May/13

Status: Closed
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Chisheng Hong Assignee: Chisheng Hong
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 2.0.2-793-rel

Operating System: MacOSX 64-bit

 Description   
Chishengs-MacBook-Pro:tools chisheng$ ./memslap -h
dyld: Library not loaded: /opt/couchbase/lib/libmemcached.6.dylib
  Referenced from: /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/bin/tools/./memslap
  Reason: image not found
Trace/BPT trap: 5


Chishengs-MacBook-Pro:tools chisheng$ ./vbuckettool -h
dyld: Library not loaded: /opt/couchbase/lib/libvbucket.1.dylib
  Referenced from: /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/bin/tools/./vbuckettool
  Reason: image not found
Trace/BPT trap: 5

Chishengs-MacBook-Pro:tools chisheng$ ./vbucketkeygen -h
dyld: Library not loaded: /opt/couchbase/lib/libvbucket.1.dylib
  Referenced from: /Applications/Couchbase Server.app/Contents/Resources/couchbase-core/bin/tools/./vbucketkeygen
  Reason: image not found
Trace/BPT trap: 5

 Comments   
Comment by Ravi Mayuram [ 13/May/13 ]
Have asked Traun L to look into this. Can't yet assign it to him.
Comment by Traun Leyden [ 13/May/13 ]
I'm getting the same error in my local installation.

It's trying to load some .dylib libraries from a non-existent, hardcoded path (/opt/couchbase/lib).

The obvious fix would be to change it to load libraries from a relative path (eg, "../../lib/")

Ravi: in order to dig in deeper, I'd need to know the location of the source code repository where these tools are stored (eg, the memslap tool).
Comment by Ravi Mayuram [ 13/May/13 ]
/Users/ravi/couchsrc/2.0.2
(or wherever your source tree is)
./libvbucket/src/vbucket.c
./libvbucket/src/vbucketkeygen.c
./libvbucket/src/vbuckettool.c
Comment by Traun Leyden [ 13/May/13 ]
I don't have a source tree.. I don't even know where the repository is to clone from.

In CouchBase mobile the repositories are hosted on github. Not sure what the situation is for couchbase server.
Comment by Traun Leyden [ 16/May/13 ]
Update: I've got the source tree and it's building now.

After digging into the issue, I've got a "lead". On the buildbot machine, the script which is supposed to re-link the .dylib libraries does not seem to be doing anything:

http://qa.hq.northscale.net:8010/builders/mac-x64-21-builder/builds/61/steps/couchbase-server%20make%20community%20/logs/stdio

Actual output
==========

Fixing library imports in bin ...
/Users/buildbot/mac-x64-21-builder/build/build/couchdbx-app/Couchbase Server/install_libraries.rb:38: warning: Insecure world writable dir /opt/couchbase in PATH, mode 040777

Fixing library imports in lib ...

Done fixing library imports!

Expected output
============

It should be outputting "Change import x to y" for the memslap and other binaries.

I sat down with Phil and we dug into it, and were not able to find the root cause. At the end we decided that I should commit some debugging information into the install_libraries.rb script, and he would re-run the build and we'd take a look at the updated output.
Comment by Traun Leyden [ 17/May/13 ]
OK, this is fixed by modifying the install_libraries.rb script to recurse into the bin/tools directory. Prior to this fix, the script was only fixing up the .dylib links in the bin directory, but not in any subdirectories such as bin/tools, where memslap, vbucketkeygen, and vbuckettool live.

I verified the fix by doing the following:

- Install a clean version of Mountain Lion in a VMWare Fusion virtual machine
- Install Couchbase server from this build: http://builder.hq.couchbase.com/get/couchbase-server-community_x86_64_2.0.2-806-rel.zip
- Cd into /Applications/Couchbase Server.app/../../bin/tools directory
- Run ./memslap and the other tools in this directory, verify that there are no .dylib related errors

This fix has only been done on the 2.0.2 branch.

@Phil: will this fix automatically make it back into master, or do I need to explicitly push it to master too?

(will close the ticket after I hear back on this)
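
The difference between the old and fixed behavior described in this comment (scanning only bin versus also recursing into bin/tools) and the kind of rewrite involved can be sketched as follows. This is a Python illustration, not the contents of install_libraries.rb, which is not shown in this ticket. The function names are hypothetical, and rewriting imports to an `@executable_path`-relative path is an assumption based on the relative-path suggestion earlier in the thread. `otool -L` and `install_name_tool -change` are the standard macOS tools for listing and rewriting library imports; the sketch only builds the commands and does not run them.

```python
import os

OLD_PREFIX = "/opt/couchbase/lib/"  # hard-coded path seen in the dyld errors


def find_files(root, recursive):
    """List files under root; with recursive=False, subdirectories are skipped."""
    if not recursive:
        return sorted(
            os.path.join(root, name)
            for name in os.listdir(root)
            if os.path.isfile(os.path.join(root, name))
        )
    return sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    )


def relink_commands(binary, otool_output, lib_dir):
    """Build `install_name_tool -change` commands for imports under OLD_PREFIX."""
    commands = []
    rel_lib = os.path.relpath(lib_dir, os.path.dirname(binary))
    for line in otool_output.splitlines()[1:]:  # the first line names the binary
        old = line.strip().split(" (")[0]
        if old.startswith(OLD_PREFIX):
            new = "@executable_path/" + rel_lib + "/" + os.path.basename(old)
            commands.append(["install_name_tool", "-change", old, new, binary])
    return commands
```

With `recursive=False`, a tool under bin/tools is never visited, so its imports keep pointing at `/opt/couchbase/lib`, which is the failure reported in the description.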
Comment by Maria McDuff [ 20/May/13 ]
phil to work on. after phil is done, pls assign to QE.
Comment by Phil Labee [ 20/May/13 ]
The executables run without dylib errors on build machine under mac-x64-202-builder directory.
Comment by Phil Labee [ 20/May/13 ]
fix merged to master: http://review.couchbase.org/#/c/26420/
Comment by Phil Labee [ 20/May/13 ]
fix is in 2.0.2 builds starting with 806

Comment by Maria McDuff [ 21/May/13 ]
pls verify / close.
Comment by Chisheng Hong [ 21/May/13 ]
Verified with build 2.0.2-807-rel

Chishengs-MacBook-Pro:tools chisheng$ ./memslap -s 127.0.0.1:11211 -t 2m -v 0.2 -e 0.05 -b
servers : 127.0.0.1:11211
threads count: 1
concurrency: 16
run time: 120s
windows size: 10k
set proportion: set_prop=0.10
get proportion: get_prop=0.90
Assertion failed: (pos <= buf), function ms_print_memslap_stats, file clients/memslap.c, line 793.
Abort trap: 6
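
The `vbucketkeygen` run that follows prints generated keys next to the vBucket ID each one maps to. For readers unfamiliar with that mapping, here is a minimal sketch of the commonly described CRC32-based scheme. It is an assumption that the tool uses exactly this scheme, since its source is not quoted in this ticket, and the vBucket count of 64 used in the usage note is likewise an assumption:

```python
import zlib


def vbucket_for_key(key, num_vbuckets):
    """Map a key (bytes) to a vBucket ID using an assumed CRC32-based scheme."""
    # Take bits 16..30 of the standard CRC32 checksum of the key.
    digest = (zlib.crc32(key) >> 16) & 0x7FFF
    # num_vbuckets must be a power of two for this mask to act as a modulo.
    return digest & (num_vbuckets - 1)
```

For example, `vbucket_for_key(b"key_0000000000", 64)` returns the ID for the first generated key under these assumptions.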

Chishengs-MacBook-Pro:tools chisheng$ curl http://127.0.0.1:8091/pools/default/buckets/default | ./vbucketkeygen - 5 10000
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 2716 100 2716 0 0 694k 0 --:--:-- --:--:-- --:--:-- 1326k
key_0000000046 0
key_0000000072 0
key_0000000293 0
key_0000000303 0
key_0000000337 0
key_0000000002 1
key_0000000036 1
key_0000000192 1
key_0000000347 1
key_0000000373 1
key_0000000146 2
key_0000000172 2
key_0000000203 2
key_0000000237 2
key_0000000393 2
key_0000000092 3
key_0000000102 3
key_0000000136 3
key_0000000247 3
key_0000000273 3
key_0000000093 4
key_0000000103 4
key_0000000137 4
key_0000000246 4
key_0000000272 4
key_0000000147 5
key_0000000173 5
key_0000000202 5
key_0000000236 5
key_0000000392 5
key_0000000003 6
key_0000000037 6
key_0000000193 6
key_0000000346 6
key_0000000372 6
key_0000000047 7
key_0000000073 7
key_0000000292 7
key_0000000302 7
key_0000000336 7
key_0000000001 8
key_0000000019 8
key_0000000035 8
key_0000000189 8
key_0000000191 8
key_0000000045 9
key_0000000069 9
key_0000000071 9
key_0000000288 9
key_0000000290 9
key_0000000089 10
key_0000000091 10
key_0000000101 10
key_0000000119 10
key_0000000135 10
key_0000000145 11
key_0000000169 11
key_0000000171 11
key_0000000200 11
key_0000000218 11
key_0000000144 12
key_0000000168 12
key_0000000170 12
key_0000000201 12
key_0000000219 12
key_0000000088 13
key_0000000090 13
key_0000000100 13
key_0000000118 13
key_0000000134 13
key_0000000044 14
key_0000000068 14
key_0000000070 14
key_0000000289 14
key_0000000291 14
key_0000000000 15
key_0000000018 15
key_0000000034 15
key_0000000188 15
key_0000000190 15
key_0000000155 16
key_0000000161 16
key_0000000179 16
key_0000000208 16
key_0000000210 16
key_0000000081 17
key_0000000099 17
key_0000000109 17
key_0000000111 17
key_0000000125 17
key_0000000055 18
key_0000000061 18
key_0000000079 18
key_0000000280 18
key_0000000298 18
key_0000000009 19
key_0000000011 19
key_0000000025 19
key_0000000181 19
key_0000000199 19
key_0000000008 20
key_0000000010 20
key_0000000024 20
key_0000000180 20
key_0000000198 20
key_0000000054 21
key_0000000060 21
key_0000000078 21
key_0000000281 21
key_0000000299 21
key_0000000080 22
key_0000000098 22
key_0000000108 22
key_0000000110 22
key_0000000124 22
key_0000000154 23
key_0000000160 23
key_0000000178 23
key_0000000209 23
key_0000000211 23
key_0000000082 24
key_0000000112 24
key_0000000126 24
key_0000000257 24
key_0000000263 24
key_0000000156 25
key_0000000162 25
key_0000000213 25
key_0000000227 25
key_0000000383 25
key_0000000012 26
key_0000000026 26
key_0000000182 26
key_0000000357 26
key_0000000363 26
key_0000000056 27
key_0000000062 27
key_0000000283 27
key_0000000313 27
key_0000000327 27
key_0000000057 28
key_0000000063 28
key_0000000282 28
key_0000000312 28
key_0000000326 28
key_0000000013 29
key_0000000027 29
key_0000000183 29
key_0000000356 29
key_0000000362 29
key_0000000157 30
key_0000000163 30
key_0000000212 30
key_0000000226 30
key_0000000382 30
key_0000000083 31
key_0000000113 31
key_0000000127 31
key_0000000256 31
key_0000000262 31
key_0000000094 32
key_0000000104 32
key_0000000128 32
key_0000000130 32
key_0000000241 32
key_0000000140 33
key_0000000158 33
key_0000000174 33
key_0000000205 33
key_0000000229 33
key_0000000004 34
key_0000000028 34
key_0000000030 34
key_0000000194 34
key_0000000341 34
key_0000000040 35
key_0000000058 35
key_0000000074 35
key_0000000295 35
key_0000000305 35
key_0000000041 36
key_0000000059 36
key_0000000075 36
key_0000000294 36
key_0000000304 36
key_0000000005 37
key_0000000029 37
key_0000000031 37
key_0000000195 37
key_0000000340 37
key_0000000141 38
key_0000000159 38
key_0000000175 38
key_0000000204 38
key_0000000228 38
key_0000000095 39
key_0000000105 39
key_0000000129 39
key_0000000131 39
key_0000000240 39
key_0000000143 40
key_0000000177 40
key_0000000206 40
key_0000000232 40
key_0000000396 40
key_0000000097 41
key_0000000107 41
key_0000000133 41
key_0000000242 41
key_0000000276 41
key_0000000043 42
key_0000000077 42
key_0000000296 42
key_0000000306 42
key_0000000332 42
key_0000000007 43
key_0000000033 43
key_0000000197 43
key_0000000342 43
key_0000000376 43
key_0000000006 44
key_0000000032 44
key_0000000196 44
key_0000000343 44
key_0000000377 44
key_0000000042 45
key_0000000076 45
key_0000000297 45
key_0000000307 45
key_0000000333 45
key_0000000096 46
key_0000000106 46
key_0000000132 46
key_0000000243 46
key_0000000277 46
key_0000000142 47
key_0000000176 47
key_0000000207 47
key_0000000233 47
key_0000000397 47
key_0000000017 48
key_0000000023 48
key_0000000187 48
key_0000000352 48
key_0000000366 48
key_0000000053 49
key_0000000067 49
key_0000000286 49
key_0000000316 49
key_0000000322 49
key_0000000087 50
key_0000000117 50
key_0000000123 50
key_0000000252 50
key_0000000266 50
key_0000000153 51
key_0000000167 51
key_0000000216 51
key_0000000222 51
key_0000000386 51
key_0000000152 52
key_0000000166 52
key_0000000217 52
key_0000000223 52
key_0000000387 52
key_0000000086 53
key_0000000116 53
key_0000000122 53
key_0000000253 53
key_0000000267 53
key_0000000052 54
key_0000000066 54
key_0000000287 54
key_0000000317 54
key_0000000323 54
key_0000000016 55
key_0000000022 55
key_0000000186 55
key_0000000353 55
key_0000000367 55
key_0000000048 56
key_0000000050 56
key_0000000064 56
key_0000000285 56
key_0000000315 56
key_0000000014 57
key_0000000020 57
key_0000000038 57
key_0000000184 57
key_0000000349 57
key_0000000148 58
key_0000000150 58
key_0000000164 58
key_0000000215 58
key_0000000221 58
key_0000000084 59
key_0000000114 59
key_0000000120 59
key_0000000138 59
key_0000000249 59
key_0000000085 60
key_0000000115 60
key_0000000121 60
key_0000000139 60
key_0000000248 60
key_0000000149 61
key_0000000151 61
key_0000000165 61
key_0000000214 61
key_0000000220 61
key_0000000015 62
key_0000000021 62
key_0000000039 62
key_0000000185 62
key_0000000348 62
key_0000000049 63
key_0000000051 63
key_0000000065 63
key_0000000284 63
key_0000000314 63


Chishengs-MacBook-Pro:tools chisheng$ curl http://127.0.0.1:8091/pools/default/buckets/default | ./vbuckettool - some_key another_key
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
100 2716 100 2716 0 0 667k 0 --:--:-- --:--:-- --:--:-- 884k
key: some_key master: 127.0.0.1:11210 vBucketId: 4 couchApiBase: http://127.0.0.1:8092/default replicas:
key: another_key master: 127.0.0.1:11210 vBucketId: 62 couchApiBase: http://127.0.0.1:8092/default replicas:
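
For reference, the key-to-vBucket mapping that vbuckettool reports can be approximated offline. This is a minimal sketch, assuming the libvbucket-style CRC32 scheme (bits 16..30 of the CRC32 of the key, reduced to the vBucket count) and assuming a 64-vBucket build, which is what the IDs 0..63 in the listing above suggest. The function name `vbucket_id` is hypothetical; treat vbuckettool as the source of truth.

```python
import zlib


def vbucket_id(key: str, num_vbuckets: int = 64) -> int:
    """Map a key to a vBucket ID (assumed libvbucket-style CRC32 hashing)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    # Keep bits 16..30 of the checksum, then reduce to the vBucket count.
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


if __name__ == "__main__":
    for key in ("some_key", "another_key"):
        print(key, vbucket_id(key))
```

The default of 64 is an assumption taken from the listing; production clusters typically use a larger vBucket count, which can be passed as `num_vbuckets`.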




[MB-7959] [Doc'd]: New 2.0.2 XDCR Contin. Optimistic, new XDCR stats, and more XDCR background Created: 22/Mar/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Improvement Priority: Trivial
Reporter: Karen Zeller Assignee: Perry Krug
Resolution: Unresolved Votes: 0
Labels: PM-PRIORITIZED, info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Provide information to document on new parameter for 2.0.2, then assign to Karen to add.

 Comments   
Comment by Junyi Xie [ 28/Mar/13 ]
I have already provided comment at MB-7837
http://www.couchbase.com/issues/browse/MB-7837

Copy and Paste
============
The comment below is for documentation purpose of 2.0.2 @Karen Zeller


In 2.0.2 we will introduce a new parameter, "xdcr_optimistic_replication_threshold", a non-negative integer in units of bytes. The default is 256 bytes.

This parameter is used by XDCR to split the docs list into two: a list of big docs, whose docs are all bigger than the threshold parameter, and a list of small docs
whose docs are no greater than it. For small docs, we skip all revs_diff operations and optimistically send them directly to the remote cluster. That eliminates the first getMeta operation from source to destination. If the doc from the source fails conflict resolution, it is discarded by the destination node. The correctness of XDCR is not affected. This new replication behavior optimizes replication latency, especially for small documents, at the possible cost of wasted bandwidth.

For big docs whose doc body is bigger than "xdcr_optimistic_replication_threshold", we keep the current XDCR behavior: for each doc, we first send revs_diff and then send only those docs that survive conflict resolution at the remote node. This behavior optimizes bandwidth instead of latency, since we never send a doc that would fail conflict resolution at the destination. However, it may not be latency-optimized, since each doc needs two sequential operations: a metadata operation to get the list of keys that actually need to be replicated, and then the send of those docs.

With this parameter, users can tune continuously which docs are replicated optimistically. As a result, users are able to choose between latency-optimized and bandwidth-optimized replication in practice. At one extreme, if we set this parameter to 0, all docs will be treated as "big docs" and sent to the remote side conservatively to save bandwidth. At the other extreme, when the parameter is set to a sufficiently large value, all updates are considered "small docs" and will be sent optimistically to the remote side in favor of latency.

Please note that a deletion, however, is always treated as "a small doc" and sent optimistically, regardless of its doc size and the parameter, because there is no benefit in sending revs_diff for deletions at all.
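
The split described above can be sketched as follows. This is only an illustration, not the XDCR implementation: the `Mutation` type and `split_by_threshold` function are hypothetical names, and the boundary (a doc exactly at the threshold counts as small) follows the "no greater than" wording in this comment.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Mutation:
    key: str
    body_size: int          # size of the document body in bytes
    deleted: bool = False   # deletions carry no body


def split_by_threshold(
    mutations: Iterable[Mutation], threshold: int = 256
) -> Tuple[List[Mutation], List[Mutation]]:
    """Return (small, big): small docs go optimistically, big docs via revs_diff."""
    if threshold < 0:
        raise ValueError("threshold must be a non-negative integer")
    small: List[Mutation] = []
    big: List[Mutation] = []
    for m in mutations:
        # Deletions are always treated as small, whatever the threshold is.
        if m.deleted or m.body_size <= threshold:
            small.append(m)
        else:
            big.append(m)
    return small, big
```

With a threshold of 0, every doc that has a non-empty body lands in the big list, which matches the "conservative" extreme described above.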



The corresponding environment parameter is:

"XDCR_OPTIMISTIC_REPLICATION_THRESHOLD"

and users can always override the ns_server parameter using the env
parameter.

Comment by Karen Zeller [ 01/May/13 ]
This will be a new setting in 2.0.2, available in the Web Console and also settable via the REST API.

As of 5/1 not yet in UI.

User clicks on "Edit Internal Settings" (need to be enabled - separate topic)
Comment by Karen Zeller [ 01/May/13 ]
Hi Junyi,

Please confirm the endpoint: is this an addition to the REST XDCR Internal settings endpoint:


http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-restapi-xdcr-change-settings.html


From Anil:

Pre-2.0.2, XDCR checks the metadata for a document at the destination, then sends the doc into the replication queue. With this change, if a document is under 256 bytes (i.e. a small document), XDCR skips this check and sends the doc immediately into the replication queue.

Abhinav: has some performance stats.....
Comment by Junyi Xie [ 01/May/13 ]
Karen,

Yes, in 2.0.2, we have one more REST XDCR Internal setting endpoint, namely "xdcrOptimisticReplicationThreshold", which can be changed in a similar way to existing XDCR-related settings

xdcrOptimisticReplicationThreshold (Integer)

It takes a non-negative integer. What Anil described is correct: for any doc under that threshold, we send it optimistically without checking.


One more note: for a delete mutation, since the body size is always 0 (no doc body for a deletion), we ALWAYS send it optimistically.


Please ask Anil or me if you have more questions, thanks.
Comment by Karen Zeller [ 06/May/13 ]
Hi Junyi,

Please send:

1) REST endpoint for setting the parameter
2) Example of setting the endpoint with the parameter
3) example of setting the parameter as an environment variable.



Thanks,

Karen

Comment by Karen Zeller [ 06/May/13 ]
Added this info to release notes:

You can now tune the performance of XDCR with a new parameter,
<literal>xdcr_optimistic_replication_threshold</literal>. XDCR usually gets metadata from every document before it puts
the document in a queue to be replicated. When there are conflicts between two versions of a document, XDCR uses
this metadata to resolve this conflict.</para>

<para>If a document is smaller than the number of bytes
provided with this parameter, XDCR immediately puts it into the
replication queue without getting document metadata. If a document is deleted, this update will also be
sent immediately to a destination cluster. If the document from a source cluster fails
conflict resolution, Couchbase Server discards it on the destination. This new feature improves replication
latency, particularly for small documents. See
Comment by Junyi Xie [ 07/May/13 ]
Hi Karen,

1) REST endpoint for setting the parameter
[jx]: the new endpoint introduced in 2.0.2 is xdcrOptimisticReplicationThreshold

2) Example of setting the endpoint with the parameter
[jx]: Users can change the parameter like other endpoints, e.g.,

curl -X POST -u Administrator:asdasd http://localhost:8091/internalSettings -d xdcrOptimisticReplicationThreshold=1024

3) example of setting the parameter as an environment variable.
[jx]: User can also use env parameter to change the value, e.g.,

export XDCR_OPTIMISTIC_REPLICATION_THRESHOLD=1024
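
For scripting, the same POST as the curl example above can be built from Python. This is a minimal sketch using only the standard library; the helper name `build_threshold_request` is hypothetical, and the host and credentials are the ones from the curl example. The request is only constructed here, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_threshold_request(
    host: str, user: str, password: str, threshold: int
) -> urllib.request.Request:
    """Build (but do not send) a POST to /internalSettings for the threshold."""
    if threshold < 0:
        raise ValueError("threshold must be a non-negative integer")
    body = urllib.parse.urlencode(
        {"xdcrOptimisticReplicationThreshold": threshold}
    ).encode("ascii")
    req = urllib.request.Request(
        f"http://{host}/internalSettings", data=body, method="POST"
    )
    # Same effect as curl's -u user:password (HTTP Basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    req = build_threshold_request("localhost:8091", "Administrator", "asdasd", 1024)
    print(req.get_method(), req.full_url, req.data)
    # urllib.request.urlopen(req) would send it to a running cluster.
```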

Comment by Karen Zeller [ 09/May/13 ]
more from Junyi:

In the logs xdcr.1, xdcr.2, etc., look for

 "out of all 11 docs, number of small docs (including dels: 0) is 0, number of big docs is 11, threshold is 256 bytes,
    after conflict resolution at target ("http://Administrator:asdasd@127.0.0.1:9501/default%2f3%3ba19c9d4e733a97fa7cb38daa4113d034/"), out of all big 11 docs the number of docs we need to replicate is: 11; total # of docs to be replicated is: 11, total latency: 142 ms
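
To pull the counts out of such log lines in bulk, a regular expression along these lines may help. It is a sketch written against the single sample above (with the terminal line wrap rejoined), so the exact wording of the log message in other builds is an assumption; `parse_xdcr_line` and the `<target>` placeholder are hypothetical.

```python
import re
from typing import Dict, Optional

XDCR_LINE = re.compile(
    r"out of all (?P<total>\d+) docs, "
    r"number of small docs \(including dels: (?P<dels>\d+)\) is (?P<small>\d+), "
    r"number of big docs is (?P<big>\d+), "
    r"threshold is (?P<threshold>\d+) bytes"
    r".*?total # of docs to be replicated is: (?P<to_replicate>\d+), "
    r"total latency: (?P<latency_ms>\d+) ms",
    re.DOTALL,
)


def parse_xdcr_line(text: str) -> Optional[Dict[str, int]]:
    """Return the counters from one XDCR batch log entry, or None if no match."""
    match = XDCR_LINE.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


SAMPLE = (
    "out of all 11 docs, number of small docs (including dels: 0) is 0, "
    "number of big docs is 11, threshold is 256 bytes,\n"
    '    after conflict resolution at target ("<target>"), '
    "out of all big 11 docs the number of docs we need to replicate is: 11; "
    "total # of docs to be replicated is: 11, total latency: 142 ms"
)

if __name__ == "__main__":
    print(parse_xdcr_line(SAMPLE))
```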


How to Monitor: The easiest way is to look at the UI, at the section "Incoming XDCR Operations" of the destination cluster.

Comment by Karen Zeller [ 09/May/13 ]
separate Engineering ticket on features: MB-7837
Comment by Karen Zeller [ 09/May/13 ]
Sent draft out. In review phase.
Comment by Perry Krug [ 13/May/13 ]
[FIXED]-Pages 140-141 - I don't think we want the "1000" in front of the name of various stats, that's just the specific value of the statistic shown in the picture

[NEED MORE INFO- Engineering] -Pages 140-141 - It would be really helpful to provide a bit more explanation of what these statistics are showing. The reader is not really getting much information...for instance, what is a "waiting vbucket replicator"?

[FIXED with input from Abhinav]-Page 224 - All the discussion about the impact of changing the threshold is a little confusing. "the chance that the right document version gets accidentally discarded on the destination cluster" is definitely not correct...the "right" version will always win no matter what, and checking the metadata before hand or not will not change that. I think engineering needs to provide better wording around this.

[NEED MORE INFO- Engineering] -Page 225 - I think we need to double-check on whether metadata reads/sec will be double or equal to sets per second. I don't personally understand (which means the reader won't) why there would be more than one metadata read for every set operation, and why setting the threshold high wouldn't totally eliminate metadata reads.



[FIXED]-Page 225, there is an extra period (.) after "sets per sec" in two places

[FIXED- commented out 2.0.2 per PM and cross referenced from XDCR for 2.0 and 2.0.2] -Why do we have the exact same list of statistics and descriptions on page 140-141 as we do on page 109...seems like a case for us to duplicate information, have to change in two places, etc...

[MOVED and reworked in XDCR section from REST, Need info from PM if we want to get into this detail on conflict resolution and the # of gets]

-I think there should be a more detailed discussion about optimistic XDCR within the main XDCR section, likely in the "behavior" area. Right now it is just in the "modifying settings" section which someone might not read if they don't think they want to change anything...but it's important to understand what it is and how it works so I think it should be discussed separately. Then maybe you can just point to the instructions for changing it rather than repeat the same.
Comment by Junyi Xie [ 13/May/13 ]
Perry,

Thanks for the great comments. Anil, Karen and I had a meeting and agreed to expose more details about optimistic XDCR, because in 2.0.2 the threshold parameter will be exposed to users, and in order to tune it users first need to understand it. Karen will modify the draft and send everybody an updated version.


Per your questions, here is my response.

[FIXED]-Pages 140-141 - I don't think we want the "1000" in front of the name of various stats, that's just the specific value of the statistic shown in the picture
[JX]: I agree. 1000 here is just an example, shall not appear as part of stat name.


[ADDED] -Pages 140-141 - It would be really helpful to provide a bit more explanation of what these statistics are showing. The reader is not really getting much information...for instance, what is a "waiting vbucket replicator"?
[ADDED]: "waiting vbucket replicators" are the replicators waiting for a token to start replication. Recall that we limit the maximum number of concurrent vbucket replicators on each node (32 by default), so when more than 32 vbuckets need to replicate, some of them have to wait for a token. Once an active vbucket replicator finishes replicating, it releases its token so the next waiting vbucket replicator can be activated and start replication.
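
The token behavior can be sketched like this. It is a toy model only: the `ReplicatorTokens` class is hypothetical, and first-come-first-served ordering of the waiting replicators is an assumption, not something stated in this ticket.

```python
from collections import deque


class ReplicatorTokens:
    """Toy model of active vs. waiting vbucket replicators on one node."""

    def __init__(self, max_concurrent: int = 32):
        self.max_concurrent = max_concurrent
        self.active = set()     # vbuckets currently holding a token
        self.waiting = deque()  # vbuckets waiting for a token (FIFO assumed)

    def request(self, vbucket: int) -> None:
        """A vbucket replicator asks for a token to start replicating."""
        if len(self.active) < self.max_concurrent:
            self.active.add(vbucket)
        else:
            self.waiting.append(vbucket)

    def finish(self, vbucket: int) -> None:
        """An active replicator finishes and hands its token to the next waiter."""
        self.active.remove(vbucket)
        if self.waiting:
            self.active.add(self.waiting.popleft())


if __name__ == "__main__":
    tokens = ReplicatorTokens()
    for vb in range(40):
        tokens.request(vb)
    print(len(tokens.active), "active,", len(tokens.waiting), "waiting")
```

In this model the "waiting vbucket replicators" statistic would correspond to `len(tokens.waiting)`.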

Karen, do we explain this behavior in doc? If not, we definitely need to add it.
 
[FIXED with input from Abhinav]-Page 224 - All the discussion about the impact of changing the threshold is a little confusing. "the chance that the right document version gets accidentally discarded on the destination cluster" is definitely not correct...the "right" version will always win no matter what, and checking the metadata before hand or not will not change that. I think engineering needs to provide better wording around this.
[ADDED]. I agree. We need to choose better wording. We NEVER discard any right doc; we only discard a doc that failed conflict resolution. The right wording should be something like "with a smaller threshold, the chance that we send a doc that is eventually discarded at the destination due to failed conflict resolution will be lower". I leave the wording to Karen.

[ADDED] -Page 225 - I think we need to double-check on whether metadata reads/sec will be double or equal to sets per second. I don't personally understand (which means the reader won't) why there would be more than one metadata read for every set operation, and why setting the threshold high wouldn't totally eliminate metadata reads.

[ADDED- JX]. Karen now understands the protocol and she will explain it in the doc. In short, for each setMeta we have 2 getMeta operations before 2.0.2. The first getMeta, from source to destination, is for conflict resolution: the destination responds to the source with the list of keys that really need to be sent. For example, out of 500 keys, the source may only need to send the 100 that win conflict resolution.

The 2nd getMeta is implicitly coupled with setMeta. For example, when the source sends 100 docs to the destination, the destination issues a getMeta before the setMeta for each doc. This is important because we need to make sure no change to the key has happened since the first getMeta in conflict resolution.

These are under-the-hood details we used to hide from users, but now we need to explain them in the documentation. I hope I have explained it clearly; please let me know if you have more questions.
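
A small worked count may make the two-getMeta protocol easier to document. This is a sketch under stated assumptions: the function `metadata_ops` is hypothetical, and it assumes that in the optimistic case every doc is sent and still triggers the getMeta that is coupled with setMeta on the destination, with only the winners being applied.

```python
from typing import Dict


def metadata_ops(docs_in_batch: int, docs_winning: int, optimistic: bool) -> Dict[str, int]:
    """Count metadata reads and docs sent for one batch (simplified model)."""
    if not 0 <= docs_winning <= docs_in_batch:
        raise ValueError("docs_winning must be between 0 and docs_in_batch")
    if optimistic:
        first_get_meta = 0            # the up-front conflict check is skipped
        docs_sent = docs_in_batch     # everything goes over the wire
    else:
        first_get_meta = docs_in_batch
        docs_sent = docs_winning      # only winners of the first check are sent
    second_get_meta = docs_sent       # destination re-checks each doc it receives
    return {
        "get_meta": first_get_meta + second_get_meta,
        "docs_sent": docs_sent,
        "set_meta": docs_winning,     # assumed: only winning docs are applied
    }


if __name__ == "__main__":
    print(metadata_ops(500, 100, optimistic=False))
    print(metadata_ops(500, 100, optimistic=True))
```

Under this model a high threshold removes the first getMeta but not the second, which is one way to read why metadata reads do not drop to zero.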

[CREATED NEW - PK] -I think there should be a more detailed discussion about optimistic XDCR within the main XDCR section, likely in the "behavior" area. Right now it is just in the "modifying settings" section which someone might not read if they don't think they want to change anything...but it's important to understand what it is and how it works so I think it should be discussed separately. Then maybe you can just point to the instructions for changing it rather than repeat the same.

[ADDED - JX] - Totally agree. Karen will update the draft with more details.


Comment by Karen Zeller [ 15/May/13 ]
sent for FINAL review.
Comment by Perry Krug [ 21/May/13 ]
Thanks a lot Karen, this is really looking good.

A few more comments from me:
[Need Clarification on how this works.....]

-Page 108 under cancelling replication...it would be good to discuss what happens if you recreate the replication stream. While not all data will be necessarily resynced, we will perform a metadata check on every item and determine whether to resend it. This needs to be further correlated with the new optimistic XDCR in that each item will be evaluated against the item size threshold and some might actually be resent when they don't need to be.

[FIXED] -Bottom of page 109: "This means that the a document " should be "This means that a document"
[REMOVED 2 Extra Xrefs]-Page 110: I don't think we need a link to the "Changing XDCR settings" in multiple places under Optimistic Replication. I think the "Changing the document threshold" is enough.

[FIXED] -Top of page 110: "which you can find int" should be "which you can find in"

-Page 111: Is section 5.8.11 really necessary in its entirety? Seems like a bit of repetitive information...And how does 5.8.12 relate and should it be a separate section?
[Note to Perry] - We got the feedback that we should reference the XDCR section from the XDCR-REST sections and vice versa. That is the purpose of 5.8.11, but I have now removed the Optimistic one since it already has its own section, and made it just a cross-ref on REST and XDCR. There are some other XDCR settings we want to highlight and cross-ref from here, particularly xdcrMaxConcurrentReps.

On 5.8.12 this is Retry setting for XDCR which one "would think" should just be another REST endpoint, but it is not implemented that way. You have to do it as an environment variable.

So it is in its own section. I consolidated that into a bigger section, called "Changing XDCR Settings", for both REST and this one-off non-REST XDCR setting. The drawback is that I cannot cross-reference it by element ID if it just lives in a bolded paragraph with the REST-XDCR stuff.......


 
-On page 147, we link back to Section 5.8.8 to describe the metadata stat, conflict resolution, etc. It seems to me that 5.8.8 is correctly named as "Behavior and Limitations" but should actually have 5.8.9 and 5.8.10 as sub-categories since they are both "behaviors". In PDF format this reads okay, but in the Web-based docs I think it's less obvious that all these are related.
-The REST API stats section (page 228) is pretty good as it is, but I think it could be improved a bit by providing more example output so the user knows what to expect

Thanks again Karen!
Comment by Karen Zeller [ 21/May/13 ]
Can you list 3-4 specific example REST calls that are the most useful for the XDCR stats endpoints? I will assign to engineering and collect from them then document.


Thanks




[MB-8274] items not draining seems... ep-engine is deadlocked Created: 14/May/13  Updated: 21/May/13  Resolved: 14/May/13

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Tommie McAfee Assignee: Tommie McAfee
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File ep_backtrace.log     GZip Archive guinep-s10502.tar.gz    

 Description   
during xdcr longevity test memcached kept restarting on a node
ui reports: Control connection to memcached on 'ns_1@172.23.105.57' disconnected: {badmatch,
{error,
couldnt_connect_to_memcached}}

(guinep-s1050.sc.couchbase.com)

Every time I take a gdb backtrace, I see threads 16, 17, 18 in the lock() method.

Thread 18 (Thread 0x7f1212849700 (LWP 21113)):
#0 0x00007f121a5b4054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f121a5af388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f121a5af257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1214436f7a in Mutex::acquire (this=0x641e0f0) at src/mutex.cc:79
#4 0x00007f121447d566 in lock (this=0x641e000) at ./src/locks.hh:48
#5 LockHolder (this=0x641e000) at ./src/locks.hh:26
#6 CouchNotifier::selectBucket (this=0x641e000) at src/couch-kvstore/couch-notifier.cc:723
#7 0x00007f121447dbcf in CouchNotifier::processInput (this=0x641e000) at src/couch-kvstore/couch-notifier.cc:606
#8 0x00007f121447e475 in waitOnce (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) at src/couch-kvstore/couch-notifier.cc:675
#9 CouchNotifier::notify_update (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) at src/couch-kvstore/couch-notifier.cc:755
#10 0x00007f1214475cb8 in notify_headerpos_update (this=0x643fb00, vbid=760, rev=1, docs=0x323bd880, docinfos=0x323bda40, docCount=56) at ./src/couch-kvstore/couch-notifier.hh:144
#11 CouchKVStore::saveDocs (this=0x643fb00, vbid=760, rev=1, docs=0x323bd880, docinfos=0x323bda40, docCount=56) at src/couch-kvstore/couch-kvstore.cc:1498
#12 0x00007f121447628b in CouchKVStore::commit2couchstore (this=0x643fb00) at src/couch-kvstore/couch-kvstore.cc:1410
#13 0x00007f121447647a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806
#14 0x00007f1214401f56 in EventuallyPersistentStore::flushVBucket (this=0x6395c00, vbid=760) at src/ep.cc:1919
#15 0x00007f121442aeb9 in doFlush (this=0x520b7a0, tid=1098) at src/flusher.cc:222
#16 Flusher::step (this=0x520b7a0, tid=1098) at src/flusher.cc:152
#17 0x00007f121443ac10 in ExecutorThread::run (this=0x52bfba0) at src/scheduler.cc:148
#18 0x00007f121443b32d in launch_executor_thread (arg=0x52bfba0) at src/scheduler.cc:34
#19 0x00007f121a5ad851 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f121a2fb90d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f1211e48700 (LWP 21114)):
#0 0x00007f121a5b4054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f121a5af388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f121a5af257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1214436f7a in Mutex::acquire (this=0x641e0f0) at src/mutex.cc:79
#4 0x00007f121447e2c3 in lock (this=0x641e000, vbs=..., file_version=1, header_offset=217088, cb=...) at ./src/locks.hh:48
#5 LockHolder (this=0x641e000, vbs=..., file_version=1, header_offset=217088, cb=...) at ./src/locks.hh:26
#6 CouchNotifier::notify_update (this=0x641e000, vbs=..., file_version=1, header_offset=217088, cb=...) at src/couch-kvstore/couch-notifier.cc:753
#7 0x00007f1214475cb8 in notify_headerpos_update (this=0x64d4c00, vbid=693, rev=1, docs=0x93f4480, docinfos=0x94186c0, docCount=72) at ./src/couch-kvstore/couch-notifier.hh:144
#8 CouchKVStore::saveDocs (this=0x64d4c00, vbid=693, rev=1, docs=0x93f4480, docinfos=0x94186c0, docCount=72) at src/couch-kvstore/couch-kvstore.cc:1498
#9 0x00007f121447628b in CouchKVStore::commit2couchstore (this=0x64d4c00) at src/couch-kvstore/couch-kvstore.cc:1410
#10 0x00007f121447647a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806
#11 0x00007f1214401f56 in EventuallyPersistentStore::flushVBucket (this=0x6395c00, vbid=693) at src/ep.cc:1919
#12 0x00007f121442aeb9 in doFlush (this=0x520b680, tid=1095) at src/flusher.cc:222
#13 Flusher::step (this=0x520b680, tid=1095) at src/flusher.cc:152
#14 0x00007f121443ac10 in ExecutorThread::run (this=0x52bfa00) at src/scheduler.cc:148
#15 0x00007f121443b32d in launch_executor_thread (arg=0x52bfa00) at src/scheduler.cc:34
#16 0x00007f121a5ad851 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f121a2fb90d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f1211447700 (LWP 21115)):
#0 0x00007f121a5b4054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f121a5af388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f121a5af257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1214436f7a in Mutex::acquire (this=0x641e0f0) at src/mutex.cc:79
#4 0x00007f121447e2c3 in lock (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) at ./src/locks.hh:48
#5 LockHolder (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) at ./src/locks.hh:26
#6 CouchNotifier::notify_update (this=0x641e000, vbs=..., file_version=1, header_offset=212992, cb=...) at src/couch-kvstore/couch-notifier.cc:753
#7 0x00007f1214475cb8 in notify_headerpos_update (this=0x64d4600, vbid=750, rev=1, docs=0x321a4480, docinfos=0x321a4fc0, docCount=65) at ./src/couch-kvstore/couch-notifier.hh:144
#8 CouchKVStore::saveDocs (this=0x64d4600, vbid=750, rev=1, docs=0x321a4480, docinfos=0x321a4fc0, docCount=65) at src/couch-kvstore/couch-kvstore.cc:1498
#9 0x00007f121447628b in CouchKVStore::commit2couchstore (this=0x64d4600) at src/couch-kvstore/couch-kvstore.cc:1410
#10 0x00007f121447647a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806
#11 0x00007f1214401f56 in EventuallyPersistentStore::flushVBucket (this=0x6395c00, vbid=750) at src/ep.cc:1919
#12 0x00007f121442aeb9 in doFlush (this=0x520b560, tid=1096) at src/flusher.cc:222
#13 Flusher::step (this=0x520b560, tid=1096) at src/flusher.cc:152
#14 0x00007f121443ac10 in ExecutorThread::run (this=0x52bf860) at src/scheduler.cc:148
#15 0x00007f121443b32d in launch_executor_thread (arg=0x52bf860) at src/scheduler.cc:34
#16 0x00007f121a5ad851 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f121a2fb90d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f1210a46700 (LWP 21116)):
#0 0x00007f121a5b4054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f121a5af388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f121a5af257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1214436f7a in Mutex::acquire (this=0x528f0f0) at src/mutex.cc:79
#4 0x00007f121447d566 in lock (this=0x528f000) at ./src/locks.hh:48
#5 LockHolder (this=0x528f000) at ./src/locks.hh:26
#6 CouchNotifier::selectBucket (this=0x528f000) at src/couch-kvstore/couch-notifier.cc:723
#7 0x00007f121447dbcf in CouchNotifier::processInput (this=0x528f000) at src/couch-kvstore/couch-notifier.cc:606
#8 0x00007f121447e475 in waitOnce (this=0x528f000, vbs=..., file_version=2, header_offset=1798144, cb=...) at src/couch-kvstore/couch-notifier.cc:675
#9 CouchNotifier::notify_update (this=0x528f000, vbs=..., file_version=2, header_offset=1798144, cb=...) at src/couch-kvstore/couch-notifier.cc:755
#10 0x00007f1214475cb8 in notify_headerpos_update (this=0x5295b00, vbid=1023, rev=2, docs=0x1a897c00, docinfos=0x1a896a80, docCount=107) at ./src/couch-kvstore/couch-notifier.hh:144
#11 CouchKVStore::saveDocs (this=0x5295b00, vbid=1023, rev=2, docs=0x1a897c00, docinfos=0x1a896a80, docCount=107) at src/couch-kvstore/couch-kvstore.cc:1498
#12 0x00007f121447628b in CouchKVStore::commit2couchstore (this=0x5295b00) at src/couch-kvstore/couch-kvstore.cc:1410
#13 0x00007f121447647a in CouchKVStore::commit (this=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:806
#14 0x00007f1214401f56 in EventuallyPersistentStore::flushVBucket (this=0x527e000, vbid=1023) at src/ep.cc:1919
#15 0x00007f121442aeb9 in doFlush (this=0x520a360, tid=12) at src/flusher.cc:222
#16 Flusher::step (this=0x520a360, tid=12) at src/flusher.cc:152
#17 0x00007f121443ac10 in ExecutorThread::run (this=0x52e6ea0) at src/scheduler.cc:148
#18 0x00007f121443b32d in launch_executor_thread (arg=0x52e6ea0) at src/scheduler.cc:34
#19 0x00007f121a5ad851 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f121a2fb90d in clone () from /lib64/libc.so.6


there are also errors in memcached.logs about "Too many connections"

I have full backtrace attached and machine is currently live.

 Comments   
Comment by Maria McDuff [ 14/May/13 ]
per bug triage, upgrading to blocker.
Comment by Ketaki Gangal [ 14/May/13 ]
Probably a related issue here http://www.couchbase.com/issues/browse/MB-8259
Comment by Chiyoung Seo [ 14/May/13 ]
Jin,

Please take a look to see if this is a regression from MRW.
Comment by Mike Wiederhold [ 14/May/13 ]
Jin,

This looks like a deadlock in the couch-notifier. Also, those "Too many connections" messages mean that memcached has too many open connections and cannot accept a new one.
Comment by Jin Lim [ 14/May/13 ]
Yep there is a deadlock. excellent finding! Thanks.
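
One reading of the backtraces above (an interpretation, not something this ticket states): in threads 18 and 15, notify_update reaches selectBucket through waitOnce and processInput, and selectBucket then tries to take a LockHolder on the same Mutex address that notify_update locks at couch-notifier.cc:753 in threads 16 and 17. If that is right, the pattern is a non-reentrant mutex being re-acquired by its own holder. The sketch below illustrates only that pattern; the `Notifier` class is hypothetical, and it does not represent the actual fix in the review linked in this ticket.

```python
import threading


class Notifier:
    """Toy stand-in for a notifier whose methods all guard on one lock."""

    def __init__(self, lock):
        self._lock = lock

    def select_bucket(self, timeout: float = 0.1) -> bool:
        # Tries to take the lock again; the timeout keeps the demo from hanging.
        acquired = self._lock.acquire(timeout=timeout)
        if acquired:
            self._lock.release()
        return acquired

    def notify_update(self) -> bool:
        with self._lock:                 # first acquisition
            return self.select_bucket()  # nested acquisition by the same thread


if __name__ == "__main__":
    # Plain lock: the nested acquire can never succeed (a hang without the timeout).
    print("Lock :", Notifier(threading.Lock()).notify_update())
    # Reentrant lock: the same thread may acquire it again.
    print("RLock:", Notifier(threading.RLock()).notify_update())
```

Other threads calling `notify_update` on the same object would queue behind the stuck holder, which is consistent with several flusher threads sitting in `__lll_lock_wait`.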
Comment by Jin Lim [ 14/May/13 ]
http://review.couchbase.org/#/c/26300/, fix is uploaded for the review.
Comment by Jin Lim [ 14/May/13 ]
the fix got merged
Comment by Maria McDuff [ 15/May/13 ]
pls verify / close.
Comment by Tommie McAfee [ 21/May/13 ]
verified fix




[MB-8324] some nodes have 0 active items after offline upgrade Created: 20/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Tommie McAfee Assignee: Jin Lim
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive 10.3.121.90-8091-diag.txt.gz     GZip Archive 10.3.121.91-8091-diag.txt.gz     GZip Archive 10.3.2.89-8091-diag.txt.gz     GZip Archive 10.3.2.92-8091-diag.txt.gz     GZip Archive 10.3.3.131-8091-diag.txt.gz     GZip Archive 10.3.3.132-8091-diag.txt.gz     GZip Archive 10.3.3.215-8091-diag.txt.gz     GZip Archive 10.3.3.60-8091-diag.txt.gz     GZip Archive 10.3.3.69-8091-diag.txt.gz     PNG File Screen Shot 2013-05-20 at 7.45.24 PM.png    

 Description   

1) Loaded data into a 9 node cluster at 181.
2) Stopped load, waited for disk-write-queue = 0.
3) shutdown all nodes simultaneously via /etc/init.d/couchbase-server stop
4) upgrade to 2.0.2-807.

Nodes come back up but with fewer items than before the upgrade. 2 nodes have 0 active items.

Impact: apparent data loss
Cluster live ( 10.3.2.89:8091)

all node diags attached.
shutdown occurred at timestamp: [2013-05-20 14:37...]
followed by upgrade.

 Comments   
Comment by Jin Lim [ 20/May/13 ]
I took a quick look at this on one of the nodes that has 0 active items (10.3.3.132) and found the below:

As you can see, warmup has completed and loaded 0 items. I believe there was already a bug where we encountered a similar issue. I will discuss this with the ep-engine team, but for now I won't spend much time on this yet.

[jin@cen-2201 bin]$ sudo ./cbstats 0.0.0.0:11210 warmup
 ep_warmup: enabled
 ep_warmup_dups: 0
 ep_warmup_estimate_time: 35
 ep_warmup_estimated_key_count: 0
 ep_warmup_estimated_value_count: 0
 ep_warmup_item_expired: 0
 ep_warmup_key_count: 0
 ep_warmup_keys_time: 870
 ep_warmup_min_item_threshold: 100
 ep_warmup_min_memory_threshold: 100
 ep_warmup_oom: 0
 ep_warmup_state: done
 ep_warmup_thread: complete
 ep_warmup_time: 1136
 ep_warmup_value_count: 0
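
A check for this symptom could be scripted across all nodes. The sketch below only parses text in the `name: value` shape shown above; the helper names `parse_cbstats` and `warmup_loaded_nothing` are hypothetical, and the sample is a subset of the output pasted in this comment.

```python
from typing import Dict


def parse_cbstats(text: str) -> Dict[str, str]:
    """Parse 'name: value' lines as printed by cbstats into a dict."""
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Keep only stat lines; this skips shell prompts and blank lines.
        if sep and name.startswith("ep_"):
            stats[name] = value.strip()
    return stats


def warmup_loaded_nothing(stats: Dict[str, str]) -> bool:
    """True when warmup reports done but no keys or values were loaded."""
    return (
        stats.get("ep_warmup_state") == "done"
        and int(stats.get("ep_warmup_key_count", "0")) == 0
        and int(stats.get("ep_warmup_value_count", "0")) == 0
    )


SAMPLE = """\
 ep_warmup: enabled
 ep_warmup_key_count: 0
 ep_warmup_state: done
 ep_warmup_thread: complete
 ep_warmup_time: 1136
 ep_warmup_value_count: 0
"""

if __name__ == "__main__":
    print(warmup_loaded_nothing(parse_cbstats(SAMPLE)))
```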



Comment by Maria McDuff [ 21/May/13 ]
Jin,

What's the status?
is this a dupe of another bug?
Or do you have a fix for this?
Comment by Jin Lim [ 21/May/13 ]
No fix yet, but this also isn't a regression in 2.0.2. I believe we have seen a similar warmup issue before (as I stated in the comment above), and we need to figure out what resolution we provided for it. It is also possible that we just deferred it to the next release. Will update after the team meeting.




[MB-7852] [Doc'd] Complete productization of Replica Read API Created: 01/Mar/13  Updated: 21/May/13

Status: Reopened
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: 2.0.2
Security Level: Public

Type: Task Priority: Minor
Reporter: Jin Lim Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: 2.0.2-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
Replica Read API implementation is ready, but testing and related documentation have not been completed to a product-feature-ready state.

 Comments   
Comment by Maria McDuff [ 21/Mar/13 ]
Jin Lim to confirm with PM that this feature is needed for the 2.0.2 release.
Comment by Maria McDuff [ 21/Mar/13 ]
anil to update business justification for this feature.
Comment by Maria McDuff [ 25/Mar/13 ]
ready for qe testing. now in 2.0.2 build.
Comment by Maria McDuff [ 27/Mar/13 ]
Iryna,

pls update with your test progress.
Thanks.
Comment by Iryna Mironava [ 27/Mar/13 ]
CBQE-1107 opened to track progress
Comment by Maria McDuff [ 05/Apr/13 ]
Iryna, pls provide test progress to this feature. thanks.
Comment by Iryna Mironava [ 16/Apr/13 ]
tested cases P0 for centos 64 bit and 32 bit, windows 64 bit, ubuntu 64 and 32 bit. tests passed
Comment by Maria McDuff [ 18/Apr/13 ]
testing done.
iryna --- pls do another round of testing when we get the RC build, end of April. Thanks.
Comment by Iryna Mironava [ 29/Apr/13 ]
build 2.0.2-772-rel tested
Comment by Anil Kumar [ 08/May/13 ]
Please follow up with Jin and Matt for server and client documentation.
Comment by Jin Lim [ 08/May/13 ]
Please review following information
============================

Replica Read
Binary opcode: CMD_GET_REPLICA - 0x83

Description: A new ep_engine-specific binary retrieval command that retrieves the data corresponding to a given key. The command behaves exactly like the existing binary get command, except that it returns data for a vbucket that is in replica state (vs. active state in the case of the normal get).

Request & Response:
* Both Response and Request header structures for this command are identical to those of the regular get.
* Request example: Replica Get("Hello")
  Field (offset) (value)
  Magic (0) : 0x80
  Opcode (1) : 0x83
  Key length (2,3) : 0x0005
  Extra length (4) : 0x00
  Data type (5) : 0x00
  VBucket (6,7) : 0x0000
  Total body (8-11) : 0x00000005
  Opaque (12-15): 0x00000000
  CAS (16-23): 0x0000000000000000
  Extras : None
  Key (24-28): The textual string: "Hello"
  Value : None

* Response example: Replica Get("Hello") response ex.
  Field (offset) (value)
  Magic (0) : 0x81
  Opcode (1) : 0x83
  Key length (2,3) : 0x0000
  Extra length (4) : 0x04
  Data type (5) : 0x00
  Status (6,7) :0x0000
  Total body (8-11) : 0x00000009
  Opaque (12-15): 0x00000000
  CAS (16-23): 0x0000000000000001
  Extras (24-27): Flags (4 bytes)
  Key : None
  Value (28-32): The textual string: "World"

Response Status:
* ENGINE_NOT_MY_VBUCKET = 0x0c
  cannot find vbucket with key or
  vbucket is not in replica state

* ENGINE_EWOULDBLOCK = 0x07
   EP Engine would block - vbucket is in pending operation

 Unit Tests:
* test_get_replica - returns data for a vbucket that is in replica state
* test_get_replica_active_state - returns error for a vbucket that is in active state (ENGINE_NOT_MY_VBUCKET)
* test_get_replica_pending_state - returns error for a vbucket that is in pending state (ENGINE_EWOULDBLOCK)
* test_get_replica_dead_state - returns error for a vbucket that is in dead state (ENGINE_NOT_MY_VBUCKET)
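
A minimal sketch of how the request example above could be assembled, assuming the standard 24-byte binary protocol header in network byte order. The function name `build_get_replica_request` and the constant names are hypothetical; the field values come from the request table above.

```python
import struct

REQUEST_MAGIC = 0x80
CMD_GET_REPLICA = 0x83

# magic, opcode, key length, extras length, data type, vbucket,
# total body length, opaque, CAS: 24 bytes, big-endian.
HEADER_FORMAT = ">BBHBBHIIQ"


def build_get_replica_request(key, vbucket=0, opaque=0):
    """Return the raw bytes of a replica-get request for `key`."""
    key_bytes = key.encode("utf-8")
    header = struct.pack(
        HEADER_FORMAT,
        REQUEST_MAGIC,
        CMD_GET_REPLICA,
        len(key_bytes),   # key length
        0,                # extras length: the request carries no extras
        0,                # data type
        vbucket,
        len(key_bytes),   # total body = extras + key + value = key only
        opaque,
        0,                # CAS is unused in the request
    )
    return header + key_bytes


packet = build_get_replica_request("Hello")
```

The only difference from a regular get request built the same way is the opcode byte.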
Comment by Karen Zeller [ 14/May/13 ]
Sent to support:

Hi,

I'm tasked with documenting the newly "productized" replica read API. The current and only information to document is below. I am meeting with Jin Thursday this week so I can get additional information that people need to know about this.

Please let me know if there are specific things a developer wants to know that are not already provided by him. I will get them from him.


Thanks

Karen
Comment by Karen Zeller [ 14/May/13 ]
Questions from Tim:

Karen,

One thing to document explicitly, although I assume it is the only
logical way for this to be implemented: a replica read for an item
that is not resident in the caching layer will cause it to be fetched
from disk and its value cached. So the resident % for replica vbuckets
can have a significant impact on replica read performance.

Question for Jin:
is there a stat to track cache miss ratio for
replica reads?

Answer from Jin: ep_bg_fetch is the closest thing, which tells you the total number of background fetches. The one thing you can check is the change pre-replica-reads compared to replica-read scenario for the same data set.
No distinction in underlying ep engine whether it is fetching active or replica.


 bg_fetch for replica reads vs. active reads, or are
those combined in a single stat?

Answer: see above. a single stat for both active and replica reads.

Any other stats that are exposed to
understand performance of replica reads?

Answer: No

I think the most important information for developers depends on the
client SDK, which I guess Matt will have to answer. For example, if
the bucket is configured with more than one replica copy, which
replica server will be queried?

Answer: All true

 If the first attempt at a replica read
fails, will it try a 2nd (and 3rd) before returning an error? How can
that behavior be configured?

Answer: All depends on client.....

Comment by Karen Zeller [ 14/May/13 ]
From Frank:

I think key for developers will be to talk about the potential staleness of the data and be explicit when to use it (active node is unreachable and your app is okay with stale data [maybe mentioning observe for replication to counter is a consideration, though performance impact I believe is quite meaningful]) and when not (to try and spread read load, as you often do with other databases).

Input from Jin: The best use case for replica read from server perspective is when, during the 30 seconds of...

It takes 30 seconds for the server to detect that a node is unavailable and initiate auto-failover, if available. During that time, clients may experience get failures, in which case clients can attempt a replica read. E.g. if 5 get attempts fail, try the replica read....

Imagine a mutation has not yet replicated to the other node and you do a replica read. You may get the data that was last replicated to that node, not the current value set on the active node.

We should recommend to users that they can use the CAS value to determine the integrity of replicated data. Basically, do a set with CAS and compare the CAS number from the active node with the CAS from your replica read: for each set, keep the CAS and compare it with the CAS from the replica read.....

Still in case of multiple concurrent sets and gets from multiple clients, there will always be some risk that data is "stale"
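
The fallback and CAS comparison described above can be sketched as follows. This is an illustration only: `get_active` and `get_replica` are hypothetical callables standing in for whatever the client SDK exposes, each assumed to return a `(value, cas)` pair, and the failure is modelled as a `ConnectionError`.

```python
def get_with_replica_fallback(get_active, get_replica, key, max_attempts=5):
    """Try the active node up to max_attempts times, then read the replica.

    Returns (value, cas, from_replica).
    """
    for _ in range(max_attempts):
        try:
            value, cas = get_active(key)
            return value, cas, False
        except ConnectionError:
            continue  # active node unreachable, retry
    value, cas = get_replica(key)
    return value, cas, True


def replica_is_current(last_set_cas, replica_cas):
    """Compare the CAS kept from the last set with the replica's CAS."""
    return last_set_cas == replica_cas


# Stand-ins for a real client: the active node is down, and the replica
# still holds an older version (CAS 41) than the last set (CAS 42).
def failing_active(key):
    raise ConnectionError("active node unreachable")


def fake_replica(key):
    return b"World", 41


value, cas, from_replica = get_with_replica_fallback(
    failing_active, fake_replica, "Hello"
)
stale = not replica_is_current(42, cas)
```

As the note above says, a matching CAS only reduces the risk: with concurrent writers the value can still change between the set and the replica read.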


Comment by Karen Zeller [ 15/May/13 ]
From Perry:

Agree that the most critical information will be around how, when and when not to use this feature.

-When: you need data and multiple gets continuously fail, then attempt this scenario. If you have SLA by certain time.
-When not: if you cannot afford to return stale data, do not use, or definitely use CAS to mitigate staleness. If you don't care about availability of data, e.g. user profile.
-how: see binary....


 We will also have to bring in both how this is different from a failover (which turns a replica vbucket into an active one)
-Failover: application will continue doing a get because the replicated data becomes available and client will still get it. Functioning nodes with replicated data can still serve it. Clients will also automatically know to go to the healthy nodes.

-Replica read is really for scenarios where applications cannot tolerate 30 seconds of downtime and must get data within that timeframe.


 and what happens to this functionality after a failover happens.

-if you attempt a replica read on a node that has been promoted to 'active', the server will send a failure message (because the replica is no longer a replica): ENGINE_NOT_MY_VBUCKET.......up to SDKs on how they handle this error.


Point on SDKs: how they handle errors if replica no longer replica and if they reroute request......Up to SDKs
Comment by Karen Zeller [ 16/May/13 ]
Send this over a

Request & Response:
* Both Response and Request header structures for this command are identical to those of the regular get.
* Request example: Replica Get("Hello")

  Field (offset) (value)
  Magic (0) : 0x80
  Opcode (1) : 0x83
  Key length (2,3) : 0x0005
  Extra length (4) : 0x00
  Data type (5) : 0x00
  VBucket (6,7) : 0x0000
  Total body (8-11) : 0x00000005
  Opaque (12-15): 0x00000000
  CAS (16-23): 0x0000000000000000
  Extras : None
  Key (24-28): The textual string: "Hello"
  Value : None

Everything in the request is the same as a binary protocol get request except the Opcode, which must be 0x83....




* Response example: Replica Get("Hello") response ex.
  Field (offset) (value)
  Magic (0) : 0x81
  Opcode (1) : 0x83
  Key length (2,3) : 0x0000
  Extra length (4) : 0x04
  Data type (5) : 0x00
  Status (6,7) :0x0000
  Total body (8-11) : 0x00000009
  Opaque (12-15): 0x00000000
  CAS (16-23): 0x0000000000000001
  Extras (24-27): Flags (4 bytes)
  Key : None
  Value (28-32): The textual string: "World"

You get the same type of return value as with GET(). No flag, etc., indicates that this is a replica read get.



Response Status:
* ENGINE_NOT_MY_VBUCKET = 0x0c
  cannot find vbucket with key or
  vbucket is not in replica state

occurs if replica node elevated to active.....


* ENGINE_EWOULDBLOCK = 0x07
   EP Engine would block - vbucket is in pending operation

can happen if node undergoing rebalance.


 Unit Tests:

(internal: informational for support)

* test_get_replica - returns data for a vbucket that is in replica state
* test_get_replica_active_state - returns error for a vbucket that is in active state (ENGINE_NOT_MY_VBUCKET)
* test_get_replica_pending_state - returns error for a vbucket that is in pending state (ENGINE_EWOULDBLOCK)
* test_get_replica_dead_state - returns error for a vbucket that is in dead state (ENGINE_NOT_MY_VBUCKET)
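
A minimal sketch of decoding the response example, assuming the same 24-byte header layout as a regular get response. Going by the header lengths in the example (extras length 4, total body 9), the body is assumed to hold 4 bytes of extras followed by the 5-byte value; the zero flags in the sample packet are an assumption, and `parse_get_replica_response` is a hypothetical name.

```python
import struct

# magic, opcode, key length, extras length, data type, status,
# total body length, opaque, CAS: 24 bytes, big-endian.
HEADER_FORMAT = ">BBHBBHIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def parse_get_replica_response(packet):
    """Split a response packet into its header fields, extras, key and value."""
    (magic, opcode, key_len, extras_len, data_type,
     status, body_len, opaque, cas) = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE]
    )
    body = packet[HEADER_SIZE:HEADER_SIZE + body_len]
    return {
        "magic": magic,
        "opcode": opcode,
        "status": status,
        "opaque": opaque,
        "cas": cas,
        "extras": body[:extras_len],
        "key": body[extras_len:extras_len + key_len],
        "value": body[extras_len + key_len:],
    }


# The response example rebuilt as bytes (flags assumed to be zero).
example = (
    struct.pack(HEADER_FORMAT, 0x81, 0x83, 0, 4, 0, 0, 9, 0, 1)
    + b"\x00\x00\x00\x00"
    + b"World"
)
parsed = parse_get_replica_response(example)
```

A non-zero `status` in the parsed header is where a client would branch on the error cases listed under Response Status.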
Comment by Karen Zeller [ 20/May/13 ]
Added to 2.0.2 RN and sent PDF for review:

<rnentry type="feature">

<version ver="2.0.0m"/>

<class id="db"/>

<issue type="cb" ref="MB-7852"/>


<rntext>

<para>
Couchbase Server provides a binary protocol for replica read. The command is similar to the existing binary get command, however it returns data for
a vBucket that is in replica state as opposed to an active state. For more information, see
<ulink url="http://www.couchbase.com/docs/couchbase-devguide-2.0/cb-protocol-replica-read.html">Couchbase Developer's Guide, Replica Read</ulink>.</para>


</rntext>

</rnentry>
Comment by Perry Krug [ 21/May/13 ]
Thanks Karen, looks great. Some quick comments from me:
-Page 128 under "Replica Read": "we have binary protocol" should be "we have a binary protocol"
-Page 128: "longer to for human intervention" should be "longer for human intervention"
-Page 128: I think the link to more information on failing over nodes should be higher up in the paragraph (after the 3rd paragraph) and/or separated from the discussion about the decision to use the replica read
-Page 128: This seems a little harsh: "We do not recommend you use replica read for any scenario, especially if you do not care about a very high level of availability". I would reword it a bit along the lines of "Using the replica read functionality can introduce the possibility of inconsistent data and our general recommendation is to have application-level logic to deal with short periods of unavailability"




[MB-8314] rebalance exited ns_vbucket_mover failed to initiate_indexing Created: 17/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Tommie McAfee Assignee: Tommie McAfee
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: GZip Archive 10.3.121.69_diag.tar.gz     GZip Archive 10.3.3.131_diags.tar.gz     File erl_crash.dump    

 Description   
Have a 5 node cluster that was upgraded from 181 -> 202 (build 805).
After upgrade attempted to swap out orchestrator(10.3.121.69) and add in a new 202 node (10.3.3.131)


Rebalance fails with ns_vbucket_mover000 on orchestrator reporting:
<0.11037.1> exited with {noproc,
{gen_server,call,
[{'janitor_agent-saslbucket',
'ns_1@10.3.3.131'},
{if_rebalance,<0.1.1>,initiate_indexing},
infinity]}}

On new node 202, couchdb went down and erlang crash dump was generated(attached).


Saw these errors from mccouch which may be why vbuckets couldn't be moved to this node:
Fri May 17 07:27:36.981472 PDT 3: (saslbucket) Trying to connect to mccouch: "127.0.0.1:11213"
Fri May 17 07:27:36.981615 PDT 3: (saslbucket) Connected to mccouch: "127.0.0.1:11213"
Fri May 17 07:27:37.019496 PDT 3: (saslbucket) Connection closed by mccouch
Fri May 17 07:27:37.019527 PDT 3: (saslbucket) Resetting connection to mccouch, lastReceivedCommand = notify_vbucket_update lastSentCommand = notify_vbucket_update currentCommand =unknown
Fri May 17 07:27:37.019595 PDT 3: (saslbucket) Trying to connect to mccouch: "127.0.0.1:11213"
Fri May 17 07:27:37.019730 PDT 3: (saslbucket) Connected to mccouch: "127.0.0.1:11213"
Fri May 17 07:27:37.021763 PDT 3: (saslbucket) Connection closed by mccouch
Fri May 17 07:27:37.021788 PDT 3: (saslbucket) Resetting connection to mccouch, lastReceivedCommand = select_bucket lastSentCommand = notify_vbucket_update currentCommand =unknown

=========================CRASH REPORT=========================
  crasher:
    initial call: mc_connection:init/1
    pid: <0.965.0>
    registered_name: []
    exception error: no case clause matching {error,system_limit}
      in function mc_connection:do_notify_vbucket_update/3
      in call from mc_connection:handle_message/9
      in call from mc_connection:read_full_message/2
      in call from mc_connection:run_loop/2
    ancestors: [mc_conn_sup,mc_sup,ns_server_sup,ns_server_cluster_sup,
                  <0.59.0>]
    messages: []
    links: [<0.641.0>,#Port<0.6784>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 1094838
  neighbours:

[error_logger:error,2013-05-17T7:27:36.622,ns_1@10.3.3.131:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
     Supervisor: {local,mc_conn_sup}
     Context: child_terminated
     Reason: {case_clause,{error,system_limit}}
     Offender: [{pid,<0.965.0>},
                  {name,mc_connection},
                  {mfargs,{mc_connection,start_link,undefined}},
                  {restart_type,temporary},
                  {shutdown,brutal_kill},
                  {child_type,worker}]

=========================CRASH REPORT=========================
  crasher:
    initial call: couch_file:spawn_reader/2
    pid: <0.652.0>
    registered_name: []
    exception exit: {problem_reopening_file,
                        {error,system_limit},
                        {set_close_after,infinity,<0.650.0>},
                        <0.652.0>,
                        "/opt/couchbase/var/lib/couchbase/data/_replicator.couch.1",
                        10}
      in function couch_file:reader_loop/3
    ancestors: [<0.650.0>,couch_server,couch_primary_services,
                  couch_server_sup,cb_couch_sup,ns_server_cluster_sup,
                  <0.59.0>]
    messages: []
    links: [<0.650.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 377
    stack_size: 24
    reductions: 504
  neighbours:






 Comments   
Comment by Tommie McAfee [ 17/May/13 ]
the final trace here is similar to MB-8235 {problem_reopening_file
although this is a much different context
Comment by Tommie McAfee [ 17/May/13 ]
1st attempt to retry rebalance fails with:

Server error during processing: ["web request failed",
{path,"/pools/default/tasks"},
{type,exit},
{what,
{{bulk_set_vbucket_state_failed,
[{'ns_1@10.3.3.131',
{'EXIT',
{{{kill,
{gen_server,call,
[couch_server,
{open,<<"saslbucket/master">>,[]},
infinity]}},
{gen_server,call,
['capi_set_view_manager-saslbucket',
{set_vbucket_states,
[replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
replica,replica,replica,replica,
missing,missing,missing,missing,
missing,missing,missing,missing,
missing,missing,missing,missing,
Comment by Aleksey Kondratenko [ 17/May/13 ]
Tommie, please avoid just giving raw logs. That's very very inconvenient compared to diag or cbcollectinfo.
Comment by Aleksey Kondratenko [ 17/May/13 ]
We hit fds limit.

Not clear why. I need cbcollectinfo from point of time when this happened.
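
A minimal sketch of checking file descriptor headroom, which is the kind of data point that would help confirm the diagnosis above. It inspects the current process only (not the Erlang VM), uses the Unix-only `resource` module, and relies on the Linux `/proc/self/fd` directory when it exists; `fd_usage` is a hypothetical helper name.

```python
import os
import resource


def fd_usage():
    """Return (soft_limit, hard_limit, open_fds) for the current process.

    open_fds is None when /proc/self/fd is not available.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None
    return soft, hard, open_fds


soft, hard, open_fds = fd_usage()
print("soft limit:", soft, "hard limit:", hard, "open fds:", open_fds)
```

For another process, the same count can be read from `/proc/<pid>/fd` given sufficient permissions.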
Comment by Tommie McAfee [ 17/May/13 ]
Sorry alk, I've restarted test

Should I do : ulimit -n unlimited ?
Comment by Aleksey Kondratenko [ 17/May/13 ]
No. Just grab me cbcollect_info ASAP from moment it fails. That's important.
Comment by Maria McDuff [ 20/May/13 ]
Tommie, pls see Alk k's comment.
Comment by Tommie McAfee [ 21/May/13 ]
I've run once without repro, will try at least 2 more tries.




[MB-8232] [Doc'd] Multi-reader/writer Created: 09/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Trivial
Reporter: Karen Zeller Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Karen Zeller [ 09/May/13 ]
Sent draft for review:

Hi,

Below is the draft content on multi-reader-writers for 2.0.2 for review/input:

Release Notes:
<rnentry type="feature">

<version ver="2.0.0m"/>

<class id="db"/>

<issue type="cb" ref="MB-7518"/>


<rntext>

<para>
We now provide multiple readers and writers per data bucket for disk persistence. In the past, Couchbase
Server had only one reader/writer per data bucket. This enhancement provides significant
performance improvements for restoring data, replicating via XDCR, as well as rebalance.
For more information, see <xref linkend="couchbase-introduction-architecture-diskstorage" />.
</para>


</rntext>

</rnentry>


2. Update to existing Disk Storage Section, page 8 of attached PDF.


If I could get your input by Wednesday of next week, that would be great. Please provide on the existing ticket or as an attachment to it: http://www.couchbase.com/issues/browse/MB-8232



Regards,

Karen
Comment by Perry Krug [ 13/May/13 ]
[FIXED]Karen, there's a small bit of discussion that says the mrw feature is designed to "improve cache miss ratios". In fact it doesn't...the ratio of cache misses will not change with mrw, but the misses will be served faster and more efficiently by having multiple threads that can service them in parallel.

NEW = In order to utilize increased disk speeds and improve the read rate from disk

[FIXED: assigns] Also, a typo on page 9: "and assign each thread".
Comment by Karen Zeller [ 16/May/13 ]
Sent for final review:

Mandatory: Jin, abhinav
Comment by Perry Krug [ 21/May/13 ]
Looks good from here. Maybe make a note for yourself that future versions will expose more configuration around this feature that is not yet available as of 2.0.2




[MB-8017] [Internal Only Doc] cbrecovery tool to recovery data for missing partitions Created: 31/Jan/13  Updated: 21/May/13

Status: In Progress
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0
Fix Version/s: 2.0.2

Type: Improvement Priority: Trivial
Reporter: Steve Yen Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: 2.0.2-release-notes, PM-PRIORITIZED
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Provide a tool to help with data recovery when nodes (beyond the number of replicas) fail. Currently if a rebalance operation is performed without restoring data, it causes loss of data. To help restore data, we need to develop a tool that manages the following:

- Check which partitions are missing altogether
- Block service for those partitions
- Recover the missing partitions from the backup cluster
- Start service on those partitions
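
The first step in the list above can be sketched as follows. This is not the cbrecovery implementation: the map layout (one list of node names per partition, active first and replicas after) and the name `missing_vbuckets` are assumptions made for illustration.

```python
def missing_vbuckets(vbucket_map, failed_nodes):
    """Return the ids of partitions whose active and replica copies
    all live on failed nodes, i.e. partitions that are missing altogether."""
    failed = set(failed_nodes)
    return [
        vb_id
        for vb_id, chain in enumerate(vbucket_map)
        if all(node in failed for node in chain)
    ]


# Hypothetical 4-partition map with one replica per partition.
vbucket_map = [
    ["a", "b"],  # vbucket 0: active on a, replica on b
    ["b", "c"],  # vbucket 1
    ["c", "a"],  # vbucket 2
    ["a", "b"],  # vbucket 3
]

# Two nodes fail with one replica configured: more failures than replicas.
lost = missing_vbuckets(vbucket_map, ["a", "b"])
```

Here partitions 0 and 3 have no surviving copy, so they are the ones that would have to be recovered from the backup cluster.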

This will require data to be backed up using XDCR on a separate cluster. The first version of the tool will be available in the early April time frame. It will be productized in the 2.1 release.

Additional details here:

Engineering spec: http://hub.internal.couchbase.com/confluence/display/~farshid/cbrecovery+tool.
PM requirements: http://hub.internal.couchbase.com/confluence/display/PM/CBRecovery+Tool

Ticket: http://www.couchbase.com/issues/browse/CBSE-301
Ticket: http://www.couchbase.com/issues/browse/MB-8017

Test-Plan: http://hub.internal.couchbase.com/confluence/display/QA/Cbrecovery+test+plan
Bin, design: https://github.com/couchbase/couchbase-cli/blob/2.0.2/docs/cbrecovery-design_spec.md

Need to build by 2.0.2 / April

Customer request

 Comments   
Comment by Perry Krug [ 15/Feb/13 ]
Steve, is that document ready to be shared with the customer? Do you or Dipti have an ETA on when we would have something like that (they're asking...)?

Thanks
Comment by Perry Krug [ 22/Feb/13 ]
Any update when these docs will be available to share with the customer?
Comment by Bin Cui [ 01/Apr/13 ]
http://review.couchbase.org/#/c/25426/
Comment by Dipti Borkar [ 08/Apr/13 ]
Anil, can you add a task for documentation for Karen?

Comment by Perry Krug [ 09/Apr/13 ]
Reopening for doc creation.

Where is this tool and can I point our customer at it before official release?
Comment by Bin Cui [ 09/Apr/13 ]
This tool is under same directory as cbbackup, cbrestore or cbtransfer.
Comment by Anil Kumar [ 09/Apr/13 ]
I'll open Documentation bug and assign to Karen.
Comment by Karen Zeller [ 16/Apr/13 ]
4/16 - SteveY- contact point is Bin, will be in 2.0.2
Comment by Karen Zeller [ 29/Apr/13 ]
Requested REST endpoints that cbrecover is based on 4/29 from AliaskyA

Hi,

I am looking for the endpoints and parameters for cbrecovery (recover missing buckets to a cluster from a remote cluster.) We need:

-Start endpoint
-Stop endpoint
-Endpoint to get the vBucket diff list
-Any endpoint for progress.


Thanks,

Karen

Comment by Karen Zeller [ 06/May/13 ]
Other input from Perry:

It's pretty good as it is, there are a few specific technical areas that need to be fixed but I also think that it's a bit on the hard side to read and understand which is why I suggested some larger reworks as well.


Resolving with approach:

I will be working with James, Abhinav and Bin to address the issues you have that are addressable in the ticket. Once the documentation is to their satisfaction I will close this ticket.
Comment by Karen Zeller [ 09/May/13 ]
Sent to Perry for clarification on his review comments:

Hi Perry,

Anil, James, Abhinav and I are working item-by-item on the feedback you provided on this ticket:

http://www.couchbase.com/issues/browse/MB-8017

You had asked that I document "topologies" from a meeting last week, based on your Google document here:

https://docs.google.com/document/d/1am9ifqfdIh9oYLsb8lHxGCukzCjEzCQ3mckjXpBsVLg/edit?pli=1

In the meeting I understood that to mean I should document these items:

"Various scenarios we need to think about/test/ensure: (Karen - describe scenarios and sequence)
1 - Bucket A has 0 replicas and a node in Cluster A fails
2 - Any node in Backup Cluster fails (with 0 replicas). How do we restore that data?
3 ……

We need clarification on scenario/topology 2:

This scenario looks like it can be handled by rebalancing the backup cluster and replicating via XDCR, or you *can* use cbrecovery? It seems as if the rebalance and replication should be the preferred method in this scenario, so why would we recommend cbrecovery?

You also suggest that it would be better to model a section after "our sales presentation that shows a small number of active vBuckets" on each node and "replica vBuckets" on each node…..


Could you please provide a copy of this sales presentation so we can leverage the graphics and information?

Please try to send this within the next couple days as we will need them as we finish overhaul of this section.



Thanks,

Karen
Comment by Karen Zeller [ 15/May/13 ]
Sent for FINAL Reviews.
Comment by Perry Krug [ 21/May/13 ]
Thank you Karen.

Just a few more comments:
-I think the diagram on page 114 needs a little cleaning up. The green and red lines don't really make sense and there is a selection box around one of the x'es
-Typo page 115: "support workload" should be "support the workload"
-Typo page 115: "have four node" should be "have a four node"
-Page 115 under "Multi-Node failure": This isn't exactly correct: "Each node as 1024 active and 1024 replica vBuckets:". The whole cluster will have 1024 active vbuckets and 1024 replica vbuckets, but each node will only have 256 active and 256 replica vbuckets.
-Page 115, this is not quite correct: "If you have multi- node failure on both your main and backup clusters you will experience data loss." I would say just remove this sentence. There are situations where even a single node failure on either side would result in data loss (with 0 replicas for example). I'm not sure there's much to be gained by making this statement.
-Page 115: "adequate number replicas" should be "adequate number of replicas"
-For step 3 on page 116, can we add a specific note about not rebalancing after the nodes have been added?
-There seems to be a bit of repeated statements, for example on page 116: "If you have unavailable vBuckets due to node failure you can recover these vBuckets if you set up replication via XDCR."
-This text on page 116: "These are the bucket you can recover from the remotes cluster." Should be "These are the vbuckets you can recover from the remote clusters"
-Sorry I missed this earlier, it seems that we're missing a bit of discussion around what happens to the replica vbuckets. I think we'll need engineering to comment on whether the replica vbuckets are created through the recovery tool or whether the rebalance is needed to do so.
-Along the lines of above, a brief description about why the rebalance is necessary will help the user understand what's going on.
-I think we want a section on various "scenarios" as described in the Google doc. There are a few permutations that I think would be worth covering to more closely match our customers' production deployments (i.e., restoring separate buckets within a single cluster, when one of those buckets has more or less replicas than others)
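
The vBucket arithmetic from the "Multi-Node failure" comment above, as a small worked example. It assumes partitions are spread evenly across nodes; `vbuckets_per_node` is a hypothetical helper.

```python
def vbuckets_per_node(total_vbuckets, nodes, replicas=1):
    """Return (active, replica) vBucket counts per node for an even spread."""
    active = total_vbuckets // nodes
    replica = (total_vbuckets * replicas) // nodes
    return active, replica


# Four-node cluster, 1024 vBuckets, one replica copy.
active, replica = vbuckets_per_node(1024, 4, replicas=1)
print(active, replica)
```

For the four-node example this gives 256 active and 256 replica vBuckets per node, while the cluster as a whole holds 1024 of each.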


Otherwise looks really good Karen, thanks.




[MB-8321] [RN 2.0.2] windows32: Rebalance exited with reason {mover_failed,{badmatch,{error,eaddrinuse}}} Created: 20/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation, ns_server
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Andrei Baranouski Assignee: Andrei Baranouski
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows32, 2.0.2-807

Operating System: Windows 32-bit

 Description   
http://qa.hq.northscale.net/job/windows32_rebalance-kv/11/consoleFull

./testrunner -i /tmp/4-w-32.ini get-cbcollect-info=True -t rebalancetests.IncrementalRebalanceOut.test_load,replica=1,do-stop=True

[2013-05-19 06:24:15,085] - [rest_client:925] INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.2.179%2Cns_1%4010.3.2.185%2Cns_1%4010.3.2.178%2Cns_1%4010.3.2.183
[2013-05-19 06:24:15,296] - [rest_client:929] INFO - rebalance operation started
[2013-05-19 06:24:15,334] - [rest_client:1031] INFO - rebalance percentage : 0 %
[2013-05-19 06:24:17,376] - [rest_client:1031] INFO - rebalance percentage : 0.0 %
[2013-05-19 06:24:19,438] - [rest_client:1031] INFO - rebalance percentage : 0.703125 %
[2013-05-19 06:24:21,473] - [rest_client:1031] INFO - rebalance percentage : 1.640625 %
[2013-05-19 06:24:23,499] - [rest_client:1031] INFO - rebalance percentage : 2.6953125 %
[2013-05-19 06:24:25,523] - [rest_client:1031] INFO - rebalance percentage : 3.6328125 %
[2013-05-19 06:24:26,544] - [rest_client:1031] INFO - rebalance percentage : 4.21875 %
[2013-05-19 06:24:27,572] - [rest_client:1031] INFO - rebalance percentage : 4.8046875 %
[2013-05-19 06:24:28,601] - [rest_client:1031] INFO - rebalance percentage : 5.5078125 %
[2013-05-19 06:24:30,639] - [rest_client:1031] INFO - rebalance percentage : 6.4453125 %
[2013-05-19 06:24:31,663] - [rest_client:1014] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
[2013-05-19 06:24:31,663] - [rest_client:1015] INFO - Latest logs from UI:
[2013-05-19 06:24:31,764] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 1, u'text': u'Bucket "bucket-0" loaded on node \'ns_1@10.3.2.179\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1368969856238, u'type': u'info'}
[2013-05-19 06:24:31,764] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 4, u'text': u"Node 'ns_1@10.3.2.179' saw that node 'ns_1@10.3.2.185' came up. Tags: []", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1368969855691, u'type': u'info'}
[2013-05-19 06:24:31,765] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 3, u'text': u'Node ns_1@10.3.2.179 joined cluster', u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1368969854738, u'type': u'info'}
[2013-05-19 06:24:31,765] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 4, u'text': u"Node 'ns_1@10.3.2.179' saw that node 'ns_1@10.3.2.183' came up. Tags: []", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1368969854738, u'type': u'info'}
[2013-05-19 06:24:31,765] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 0, u'text': u"Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: WARNING: curl error: transfer closed with outstanding read data remaining from: http://127.0.0.1:8091/pools/default/saslBucketsStreaming\nEOL on stdin. Exiting", u'shortText': u'message', u'module': u'ns_log', u'tstamp': 1368969854628, u'type': u'info'}
[2013-05-19 06:24:31,765] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 1, u'text': u"Couchbase Server has started on web port 8091 on node 'ns_1@10.3.2.179'.", u'shortText': u'web start ok', u'module': u'menelaus_sup', u'tstamp': 1368969854597, u'type': u'info'}
[2013-05-19 06:24:31,766] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.178', u'code': 2, u'text': u'Rebalance exited with reason {mover_failed,{badmatch,{error,eaddrinuse}}}\n', u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1368969854354, u'type': u'info'}
[2013-05-19 06:24:31,766] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.178', u'code': 0, u'text': u'<0.26161.33> exited with {mover_failed,{badmatch,{error,eaddrinuse}}}', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1368969854339, u'type': u'critical'}
[2013-05-19 06:24:31,767] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.185', u'code': 1, u'text': u'Bucket "bucket-0" loaded on node \'ns_1@10.3.2.185\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1368969850886, u'type': u'info'}
[2013-05-19 06:24:31,767] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.185', u'code': 3, u'text': u'Node ns_1@10.3.2.185 joined cluster', u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1368969850479, u'type': u'info'}
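
When triaging a dump like the one above, it can help to pull out only the critical events. The following is a minimal sketch, assuming each line carries a Python dict literal after the " ERROR - " marker, as in the lines shown here. The function names `parse_event` and `critical_events` are hypothetical helpers and are not part of testrunner.

```python
import ast

def parse_event(line):
    """Parse the dict literal that follows the first ' ERROR - ' marker."""
    _, _, payload = line.partition(" ERROR - ")
    # literal_eval accepts the u'...' string prefixes used in these logs.
    return ast.literal_eval(payload)

def critical_events(lines):
    """Return only the events whose 'type' field is 'critical'."""
    events = (parse_event(l) for l in lines if " ERROR - " in l)
    return [e for e in events if e.get("type") == "critical"]

# Two lines copied from the log above.
sample = [
    "[2013-05-19 06:24:31,766] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.178', u'code': 0, u'text': u'<0.26161.33> exited with {mover_failed,{badmatch,{error,eaddrinuse}}}', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1368969854339, u'type': u'critical'}",
    "[2013-05-19 06:24:31,767] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.185', u'code': 3, u'text': u'Node ns_1@10.3.2.185 joined cluster', u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1368969850479, u'type': u'info'}",
]

found = critical_events(sample)
for event in found:
    print(event["node"], event["module"], event["text"])
```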


 Comments   
Comment by Andrei Baranouski [ 20/May/13 ]
https://s3.amazonaws.com/bugdb/jira/MB-8321/3afd5ca7/10.3.2.178-5192013-624-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8321/3afd5ca7/10.3.2.179-5192013-629-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8321/3afd5ca7/10.3.2.183-5192013-628-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8321/3afd5ca7/10.3.2.185-5192013-630-diag.zip

Comment by Aleksey Kondratenko [ 20/May/13 ]
Make sure fix from here: http://support.microsoft.com/kb/196271 is applied (and apparently it requires windows reboot)
Comment by Maria McDuff [ 20/May/13 ]
andrei, pls re-test after alk k's fix is confirmed to be there.


FYI -- for specific windows, need to document that this windows patch needs to be applied.
Comment by Karen Zeller [ 21/May/13 ]
Andrei: Please provide some notes here:

1) Confirm this is correct fix
2) When does fix need to be applied on machine (after fresh install, or after shutdown....)
3) Anything else?


Thanks

Karen




[MB-8210] Update the incorrect information on the Automated Index Updates Created: 07/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0, 2.0.1, 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Anil Kumar Assignee: Karen Zeller
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We need to fix this page: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-operation-autoupdate.html

Relevant information about how it actually works:

"Every updateInterval milliseconds it checks if the index file is more than updateMinChanges behind the .couch files (which are themselves behind the in-memory source of truth, potentially by tens of seconds). If true, it triggers a view update."

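The check described in the quote can be sketched as a simple predicate. This is only an illustration under stated assumptions: `index_seq` and `couch_file_seq` are hypothetical counters standing in for how far the index file and the .couch files have progressed, and the real server logic is not shown in this ticket.

```python
def should_trigger_update(index_seq, couch_file_seq, update_min_changes):
    """Return True when the index lags the .couch files by more than the threshold.

    index_seq and couch_file_seq are hypothetical change counters; the quote
    only says "more than updateMinChanges behind", so the comparison is strict.
    """
    return (couch_file_seq - index_seq) > update_min_changes

# One evaluation per updateInterval tick (values are made up for illustration).
ticks = [
    (1000, 5000),   # 4000 changes behind
    (1000, 6000),   # exactly 5000 behind
    (1000, 6001),   # 5001 behind
]
decisions = [should_trigger_update(i, c, 5000) for i, c in ticks]
print(decisions)
```

Note that the comparison is against the .couch files on disk, so changes still held only in memory are not counted at all.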



[MB-7887] [Doc'd] Appends can cause large amounts of memory fragmentation in tcmalloc Created: 09/Mar/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, documentation
Affects Version/s: 2.0.1, 2.0.2
Fix Version/s: 2.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates MB-6120 Memory bloat when appending object up... Resolved

 Comments   
Comment by Maria McDuff [ 25/Mar/13 ]
bug scrub: aleksey submitted a patch to tcmalloc. will need to upgrade to new version of tcmalloc when avail.
Comment by Sharon Barr [ 18/Apr/13 ]
Can this be moved to 2.0.2?
Comment by Sharon Barr [ 23/Apr/13 ]
I assume this relates to the tcmalloc bug fix we submitted for large objects.

I am raising this to a blocker for 2.0.2, to make sure it is on the radar and prioritized accordingly.
We have some major customers that might encounter this issue.
Comment by Maria McDuff [ 23/Apr/13 ]
mike, can you check if the tcmalloc fix is already checked-in? sharon thinks it is... is it just a matter of merging the fix into the 2.0.2 branch?
if so, can we go ahead and merge the fix?

thanks.
Comment by Mike Wiederhold [ 23/Apr/13 ]
It's not merged. I already checked. It's also a matter of updating tcmalloc to a release that hasn't been tested yet.
Comment by Perry Krug [ 20/May/13 ]
Given that we recently upgraded tcmalloc for a Windows issue...do we know whether this fix got included as well?
Comment by Karen Zeller [ 21/May/13 ]
Added to 2.0.2 RN as Known issue. Workaround proposed by Chiyoung. Commenting out until release:

<rnentry type="knownissue">

<version ver="2.0.0m"/>

<class id="db"/>

<issue type="cb" ref="MB-7887"/>


<rntext>

<para>
If you continuously perform numerous appends to a document,
it may lead to memory fragmentation and overuse. This is due to an underlying
issue of inefficient memory allocation and deallocation. If you know you will
perform numerous small 1K appends totaling 5 MB,
you can mitigate this problem by performing one larger append of about 1 MB and then
performing your numerous, smaller 1K appends. You can use a similar approach for other appends, depending on the total, aggregate size of your appends.
</para>


</rntext>

</rnentry>
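
The workaround in the release note can be sketched as a size plan: one large initial append followed by the small ones. This is a hedged illustration only; `plan_appends` is a hypothetical helper, it does not call any Couchbase client API, and the 1 MB first append is simply the figure from the note.

```python
def plan_appends(total_bytes, small_bytes, first_bytes):
    """Return the list of append sizes: one large append, then small ones."""
    first_bytes = min(first_bytes, total_bytes)
    sizes = [first_bytes]
    remaining = total_bytes - first_bytes
    while remaining > 0:
        step = min(small_bytes, remaining)
        sizes.append(step)
        remaining -= step
    return sizes

KB = 1024
MB = 1024 * KB

# 5 MB of data in total, written as one 1 MB append plus 1K appends.
plan = plan_appends(5 * MB, 1 * KB, 1 * MB)
print(len(plan), plan[0], plan[1])
```

An application would then issue one append per entry in `plan`, in order, using whatever client it already has.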




[MB-7902] [windows] Rebalance exited with reason {detected_nodes_change, {ns_node_disco_events, Created: 13/Mar/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.1
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Deepkaran Salooja Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: 2.0.1-release-notes, windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 2.0.1-179-rel

Attachments: File dist_util.beam     File nohup.out.multi-nodes-2.0.x-windows-64-rebalance-kv    
Issue Links:
Dependency
depends on MB-7658 [Windows] severe timeouts in windows ... Resolved
depends on MB-7920 [windows] Rebalancing out the last no... Closed
Duplicate
duplicates MB-7658 [Windows] severe timeouts in windows ... Resolved
duplicates MB-7949 [windows] Rebalance exited with reaso... Closed
is duplicated by MB-7891 [windows] a node became down during r... Closed
is duplicated by MB-7946 [windows] Rebalance exited with reaso... Closed

 Description   

The below test is failing while doing rebalance:
./testrunner -i win-aws.ini -t rebalancetests.IncrementalRebalanceInTests.test_load,replica=2,delete-ratio=0.6,expiry-ratio=0.2,do-stop=True

Logs show below error message:

<0.6654.65>:ns_orchestrator:handle_info:319]Rebalance exited with reason {detected_nodes_change,
                                 {ns_node_disco_events,
                                     ['ns_1@10.130.71.207',
                                      'ns_1@10.131.33.89',
                                      'ns_1@10.131.35.139',
                                      'ns_1@10.142.174.97'],

The run log of testrunner is attached. Will be attaching cbcollect_info shortly.


 Comments   
Comment by Deepkaran Salooja [ 13/Mar/13 ]
collect info:

https://s3.amazonaws.com/bugdb/jira/MB-7902/e9125b6b/cbcollect_info_10.142.174.97.zip
https://s3.amazonaws.com/bugdb/jira/MB-7902/e9125b6b/cbcollect_info_10.131.33.89.zip
https://s3.amazonaws.com/bugdb/jira/MB-7902/e9125b6b/cbcollect_info_10.131.35.139.zip
https://s3.amazonaws.com/bugdb/jira/MB-7902/e9125b6b/cbcollect_info_10.131.71.207.zip
Comment by Thuan Nguyen [ 14/Mar/13 ]
I got the same rebalance failure in physical servers during rebalance out. I will add collect info file of all nodes soon
Comment by Thuan Nguyen [ 14/Mar/13 ]
collect info of all nodes running build 2.0.1-175
rebalance failed during rebalance out node 63
https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_1/201303/4win-nodes-201_175-reb-out-node-change-20130314-111111.tgz
Comment by Sriram Melkote [ 15/Mar/13 ]
Potentially the same issue as MB-7658
Comment by Sriram Melkote [ 16/Mar/13 ]
I too ran this many times but couldn't reproduce. But the bug is real, I've seen it happen here and elsewhere.
Comment by Ketaki Gangal [ 18/Mar/13 ]
Does rebalance eventually succeed on this test?
Comment by Sriram Melkote [ 18/Mar/13 ]
Yes - we can restart the rebalance and it will complete. It's not reproducible by running the specific test; it's a somewhat random failure.
Comment by Sriram Melkote [ 21/Mar/13 ]
This appears to be possibly a manifestation of MB-7920. I've verified that network connectivity is never lost, and the Erlang node connectivity is itself stable over long periods by themselves. Potentially, the trigger of cluster to disconnect is the bucket deletion.
Comment by Sriram Melkote [ 22/Mar/13 ]
MB-7920 does not reproduce despite many tests, so investigating this independently.
Comment by Thuan Nguyen [ 08/Apr/13 ]
Hit this bug again in build 2.0.1-185. Node down during rebalance due to net_tick_timeout.
Environment:
Create a cluster with 3 nodes
10.2.1.61
10.2.1.62
10.2.1.63
Create 2 buckets: default (14GB) and sasl (10GB)
No view or xdcr created
Load 20+ million items into both buckets until the resident ratio on both buckets is around 90%
Access the cluster for 3 hours with the spec on this page http://hub.internal.couchbase.com/confluence/pages/viewpage.action?pageId=6785119
Add node 10.2.1.64 to the cluster and rebalance. Rebalance failed (MB-7995)
Then rebalance again; rebalance hung (MB-7996)
Then rebalance again. After failing several times with the errors in bugs MB-7658 and MB-7902, rebalance succeeded
Remove node 63. After one failed rebalance (node change error), rebalance completed
Swap rebalance (add node 63 back and remove node 61); rebalance failed because nodes 62 and 63 went down

Link to collect info files of all nodes https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_1/201304/4phy-win-201_185-reb-failed_node-down-net_tick_timeout_20130408-125052.tgz

Link to erl dump https://s3.amazonaws.com/packages.couchbase/erlang/windows/erl_node_63.DMP

Link to erl dump zip file https://s3.amazonaws.com/packages.couchbase/erlang/windows/erl_node_63.DMP.zip
Comment by Thuan Nguyen [ 08/Apr/13 ]
Promote it to blocker since I see it often when testing in windows physical machines
Comment by Sriram Melkote [ 26/Apr/13 ]
I've attached dist_util.beam that will retry on net_tick_timeout. Can you please replace C:\Program Files\Couchbase\Server\lib\kernel-2.14.5\ebin\dist_util.beam with the attached version, and rerun the test and see if it does anything to the issue.
Comment by Maria McDuff [ 29/Apr/13 ]
siri still debugging this issue. problem still reproducible.
Comment by Ray Chin [ 14/May/13 ]
The 4 machines have been moved to another desk near alk.
Comment by Aleksey Kondratenko [ 20/May/13 ]
With potential fix found by Siri I'm unable to reproduce this.

Fix is described here: http://support.microsoft.com/kb/196271

I recall we tried it before, reportedly without success. But let's hope it was not properly applied.
Comment by Sriram Melkote [ 21/May/13 ]
That is interesting. The reason I didn't merge it earlier was because Win2008 has increased the limit to 16k:

http://support.microsoft.com/kb/929851

Will watch the bug to see if QA confirms the fix, if yes - I can merge the change I had for the installer to set this (and reduce TIME_WAIT delay).
Comment by Aleksey Kondratenko [ 21/May/13 ]
BTW I did not do anything with TIME_WAIT on my VMs and undid it on the cluster of boxes I was given. Don't know if it matters or not, but so far everything works.




[MB-8331] Need to move to 5.8.5.cb1 Created: 21/May/13  Updated: 21/May/13  Resolved: 21/May/13

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 2.0.2
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Sriram Melkote Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
depends on MB-7772 [windows performance] Slow Rebalance ... Closed

 Description   
We need to move to our patched Erlang version 5.8.5.cb1 as we decided not to move to newer version of Erlang that has Filipe's changes upstream.


 Comments   
Comment by Sriram Melkote [ 21/May/13 ]
Assigning to Phil as I need IP addresses of all 2.0.2 Windows builders before I can switch the version, to ensure they have the patched VM.

Phil, please assign back to me with the IP's of the builders and I'll verify and make the switch
Comment by Sriram Melkote [ 21/May/13 ]
Got confirmation that 2.0.2 builders are same as 2.0.1 -- assigning to Bin as he has picked up the changes here: http://review.couchbase.org/26378




[MB-7862] Building from source: couchbase version is not correct Created: 05/Mar/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: build
Affects Version/s: 2.0.1
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Minor
Reporter: Iryna Mironava Assignee: Chisheng Hong
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: ubuntu 64 bit

Attachments: PNG File wizard_2_0_0.png     PNG File wizard_2_0_1.png    

 Description   
built from source using command
PRODUCT_VERSION=2.0.1 PATH=/opt/couchbase/bin:$PATH LD_RUN_PATH=/opt/couchbase/lib make PREFIX=/opt/couchbase AUTO_RECONFIG=1 'couchdb_EXTRA_OPTIONS=--with-erlang=/opt/couchbase/lib/erlang/usr/include --with-btree-implementation=native --with-v8-include=~/icu4c/source/v8/include --with-v8-lib=~/icu4c/source/v8' 'couchdb_EXTRA_MAKE_OPTIONS=' ' libmemcached_EXTRA_OPTIONS=--disable-sasl' 'memcached_EXTRA_OPTIONS=--with-libevent=/opt/couchbase LDFLAGS=-L/opt/couchbase/lib LIBS=-ltcmalloc_minimal' 'ep-engine_EXTRA_OPTIONS=--with-libevent-prefix=/opt/couchbase' 'moxi_EXTRA_MAKE_OPTIONS=LTLIBEVENT=~/Downloads/libevent-2.0.19-stable/.libs/libevent.a' all

On initializing screen build version is not correct (see screenshot): 0.0.0 DEV
For 2.0.0 appropriate version is indicated

 Comments   
Comment by Maria McDuff [ 21/May/13 ]
chisheng, can you repro this by building from src?




[MB-7177] lack of fsyncs in view engine may lead to silent index corruption Created: 13/Nov/12  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.0
Fix Version/s: .next
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Rahim Yaseen
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
SUBJ. Found out about this in discussion with Filipe about how views work.

If I understood correctly, it doesn't fsync at all, silently assuming that if there's a valid header, then the preceding data is valid as well. Which is clearly not true.

IMHO that's a massive blocker that needs to be fixed sooner rather than later.
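
For illustration, the ordering that makes "a valid header implies valid preceding data" hold is to fsync the data before the header is written. The following is a generic sketch of that technique, not the view engine's code; `append_durably` and the byte strings are hypothetical.

```python
import os
import tempfile

def append_durably(path, data, header):
    """Append data, fsync it, then append the header and fsync again."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # data reaches disk before any header exists
        f.write(header)
        f.flush()
        os.fsync(f.fileno())   # header reaches disk before we report success

# Usage against a throwaway file.
fd, path = tempfile.mkstemp()
os.close(fd)
append_durably(path, b"DATA", b"HDR")
with open(path, "rb") as f:
    contents = f.read()
os.remove(path)
print(contents)
```

Without the first fsync, the operating system is free to persist the header before the data, which is the failure mode this ticket describes.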

 Comments   
Comment by Steve Yen [ 14/Nov/12 ]
bug-scrub -- assigned to yaseen
Comment by Aleksey Kondratenko [ 14/Nov/12 ]
Comment was made that this cannot be silent index corruption due to CRC-ing of all btree nodes. But my point still holds: if there's data corruption we'll know at query time, and people will have to experience downtime to manually rebuild the index.
Comment by Steve Yen [ 15/Nov/12 ]
per bug scrub
Comment by Farshid Ghods [ 26/Nov/12 ]
Deep and Iryna have tried a scenario where they rebooted the system and did not hit this issue.
Comment by Steve Yen [ 26/Nov/12 ]
to .next per bug-scrub.

QE reports that deep & iryna tried to reproduce this and couldn't yet.
Comment by Aleksey Kondratenko [ 26/Nov/12 ]
It appears that move to .next was based on same old "we cannot reproduce" logic. It appears that we continue to under-prioritize IMHO important bugs merely because it's hard to reproduce them.

Because with that logic, I'm sure we'll forever move it to the next release. If we think we don't need to do that, IMHO it would be better to just close it.
Comment by Filipe Manana [ 04/Jan/13 ]
Due to CRC checks for every object written to a file (btree nodes), it certainly won't be silent.
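As a generic illustration of why a per-object checksum makes corruption detectable at read time, here is a small sketch. It assumes a made-up layout (a 4-byte big-endian CRC32 followed by the payload); the actual on-disk format of the view files is not described in this ticket, and `frame` and `unframe` are hypothetical names.

```python
import struct
import zlib

def frame(payload):
    """Prefix the payload with its CRC32 (4 bytes, big-endian)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(">I", crc) + payload

def unframe(blob):
    """Return the payload, or raise ValueError if the CRC does not match."""
    (expected,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("CRC mismatch: object is corrupted")
    return payload

good = frame(b"btree node")
bad = bytearray(good)
bad[-1] ^= 0xFF                      # flip bits in the last payload byte

print(unframe(good))
try:
    unframe(bytes(bad))
    detected = False
except ValueError:
    detected = True
print(detected)
```

The check only reports the damage when the object is read, so detection is not the same as prevention.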
Comment by Aleksey Kondratenko [ 04/Jan/13 ]
I agree. My earlier comment above (based on yours or Damien's verbal comment) has the same information.

But not being silent doesn't mean we can simply close it (or IMHO downgrade or forget it). Do we know what exactly will happen if querying or updating view will suddenly detect corrupted index file ?
Comment by Andrew DePue [ 21/May/13 ]
We just ran into this, or something like it. We have a development cluster and lost power to the entire cluster at once (it was a dev cluster so we didn't have backup power). The Couchbase cluster _seemed_ to start OK, but accessing certain views would result in strange behavior... mostly timeouts without any error or any indication as to what the problem could be.
Comment by Filipe Manana [ 21/May/13 ]
If there's a corruption issue with a file (either view or database), view queries will return an explicit file_corruption error if the index file is corrupted. If the corruption is in a database file, the error is only returned in a query response if the query is of type stale=false. For all cases, the error (and a stack trace) are logged.

Did you see such an error in your case? Example:
http://www.couchbase.com/forums/thread/filecorruption-error-executing-view




[MB-8323] Crash caused by task running while a bucket is shutting down Created: 20/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Mike Wiederhold Assignee: Mike Wiederhold
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Mike Wiederhold [ 20/May/13 ]
http://review.couchbase.org/#/c/26426/
Comment by Maria McDuff [ 21/May/13 ]
per bug triage, bumping up to blocker.




[MB-8214] bgfetcher performance regression Created: 08/May/13  Updated: 21/May/13

Status: In Progress
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Jin Lim Assignee: Abhinav Dangeti
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
* bgfetcher condition variable for woken task is not getting correctly (hasWokenTask).
* also need to immediately move any woken task to readyQueue instead of re-pushing to futureQueue
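
The second point can be illustrated with a toy two-queue dispatcher in which a woken task goes straight to the ready queue rather than being re-pushed to the future queue. This is only a sketch under stated assumptions: `TaskQueues` and its methods are hypothetical and do not reflect the actual ep-engine classes.

```python
import heapq
from collections import deque

class TaskQueues:
    """Toy dispatcher: a time-ordered future queue and a FIFO ready queue."""

    def __init__(self):
        self.future = []       # heap of (wake_time, seq, name)
        self.ready = deque()   # names of tasks that can run now
        self._seq = 0          # tie-breaker so the heap never compares names

    def schedule(self, name, wake_time):
        heapq.heappush(self.future, (wake_time, self._seq, name))
        self._seq += 1

    def wake(self, name):
        """Move a sleeping task directly to the ready queue."""
        for i, (_, _, queued) in enumerate(self.future):
            if queued == name:
                self.future.pop(i)
                heapq.heapify(self.future)   # restore heap order after removal
                self.ready.append(name)
                return True
        return False

    def promote_due(self, now):
        """Move every task whose wake time has passed to the ready queue."""
        while self.future and self.future[0][0] <= now:
            self.ready.append(heapq.heappop(self.future)[2])

q = TaskQueues()
q.schedule("task_a", wake_time=10.0)
q.schedule("task_b", wake_time=5.0)
q.wake("task_b")             # runnable immediately, no wait until t=5.0
q.promote_due(now=1.0)       # nothing else is due yet
print(list(q.ready), len(q.future))
```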

 Comments   
Comment by Maria McDuff [ 08/May/13 ]
per bug triage, upgrading to blocker.
Comment by Jin Lim [ 10/May/13 ]
* Two toy builds, MRW28 & MRW30, have shown bgfetcher performance regression is now fixed.

* Both implemented the two fixes mentioned in the above description + different ways of binding working threads to incoming data access requests.

* QE (Abhinav) already started the 18 hours longevity litmus test

* Jin to drop MRW30 to Perf (Ronnie) for the full cycle of performance test

* Will mark this as fixed after QE and Perf validation
Comment by Maria McDuff [ 10/May/13 ]
abhinav, pls provide test result of the toy build.
Comment by Maria McDuff [ 10/May/13 ]
Jin,

abhinav will update this bug (with his test result) tonight when the run completes (~8pm onwards).
if all looks good, you can merge the fix (as agreed in today's bug mtg) then he can launch a new run with the official build this weekend.
Comment by Abhinav Dangeti [ 11/May/13 ]
Litmus run for 2.0.1-170 vs MRW28-toy (4 readers + 1 writer):
Mixed-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/178/
Mixed-dgm-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/179/
Read-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/180/
Write-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/181/
Reb-in-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/182/ -- Reb-in-time: 673s vs 845s
Reb-out-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/183/ -- Reb-out-time: 663s vs 1091s
Reb-swap-litmus: http://qa.hq.northscale.net/job/litmuses-graph-loop/184/ -- Reb-swap-time: 453s vs 640s

*Times for rebalance longer
*Set-get latencies regressed a little on the other runs.

Wait for results of 2.0.1-170 vs MRW30-toy (4 readers + 2 writers) from Ronnie.
Comment by Maria McDuff [ 13/May/13 ]
Wayne to post results from Ronnie's run.
Comment by Wayne Siu [ 13/May/13 ]
Performance tests numbers (2.0.1-170 vs MRW30-toy)
Reb-large-2 (Reb-in): 1299s vs 4865s (-275%)
Reb-large-2-out (Reb-out) : 971s vs 5728s (-490%)
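For reference, the percentages above are consistent with the increase in rebalance time relative to the 2.0.1-170 baseline, reported with a negative sign because it is a regression. A small sketch of that arithmetic, where `slowdown_percent` is a hypothetical helper name:

```python
def slowdown_percent(baseline_s, candidate_s):
    """Increase in run time relative to the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100.0

reb_in = slowdown_percent(1299, 4865)    # Reb-large-2 (Reb-in)
reb_out = slowdown_percent(971, 5728)    # Reb-large-2-out (Reb-out)
print(round(reb_in), round(reb_out))
```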
Comment by Jin Lim [ 13/May/13 ]
This is due to the same regression from rebalance. We will update from that end.
Comment by Jin Lim [ 13/May/13 ]
MB-8231 is tracking the rebalance regression. Unless this bug is still needed for another reason, please mark it as duplicate or resolved.
Comment by Maria McDuff [ 14/May/13 ]
per bug triage, keeping this bug opened since it is more about read i/o starvation.
Comment by Jin Lim [ 15/May/13 ]
After QE validates build 803 and gives us a GO, we will have the perf team resume the performance test. Thanks.
Comment by Maria McDuff [ 16/May/13 ]
abhinav will post the test result tonight. test is still running.
Comment by Abhinav Dangeti [ 18/May/13 ]
Test1's results from the following spreadsheet:
https://docs.google.com/spreadsheet/ccc?key=0Ap_3tfZFLHzcdE16WnFyb09ZcE1CckZQMnN1eWRldFE#gid=0
Status: green
Comment by Maria McDuff [ 19/May/13 ]
assigning back to Jin --- QE is giving you a GO. result is good -- see link above for result.
Comment by Maria McDuff [ 21/May/13 ]
per bug triage (Jin),
Abhinav -- pls re-launch test with build 809 and report back the result in this bug.




[MB-7772] [windows performance] Slow Rebalance with 1 bucket and views Created: 19/Feb/13  Updated: 21/May/13  Resolved: 14/Mar/13

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.1
Fix Version/s: 2.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Deepkaran Salooja Assignee: Filipe Manana
Resolution: Fixed Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: build 2.0.1-156-rel on windows 64

<manifest>
<remote name="couchbase" fetch="git://github.com/couchbase/"/>
<remote name="membase" fetch="git://github.com/membase/"/>
<remote name="apache" fetch="git://github.com/apache/"/>
<remote name="erlang" fetch="git://github.com/erlang/"/>
<default remote="couchbase" revision="master"/>
<project name="tlm" path="tlm" revision="12abea946eafd7411273d18a10ae1f84390db3d4">
<copyfile src="Makefile.top" dest="Makefile"/>
</project>
<project name="bucket_engine" path="bucket_engine" revision="1495eb770b9735dd791c483eee7e69641f82f09c"/>
<project name="ep-engine" path="ep-engine" revision="eee2e9564ef844cf8cc435911d9e80af0dece244"/>
<project name="libconflate" path="libconflate" revision="2cc8eff8e77d497d9f03a30fafaecb85280535d6"/>
<project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/>
<project name="libvbucket" path="libvbucket" revision="026c79ae424a6daed4bb9345e86cc8fc21759b28"/>
<project name="membase-cli" path="membase-cli" revision="8a716cc15a04a1ed2a67eb10ff8fa061f14d6a58" remote="membase"/>
<project name="memcached" path="memcached" revision="e6f892cf7bd61e91a79be0d8d2c6f075d3c4ae1b" remote="membase"/>
<project name="moxi" path="moxi" revision="52a5fa887bfff0bf719c4ee5f29634dd8707500e"/>
<project name="ns_server" path="ns_server" revision="d8aa12706c2edac309743a502606594a00dc9929"/>
<project name="portsigar" path="portsigar" revision="1bc865e1622fb93a3fe0d1a4cdf18eb97ed9d600"/>
<project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/>
<project name="couchbase-examples" path="couchbase-examples" revision="cd9c8600589a1996c1ba6dbea9ac171b937d3379"/>
<project name="couchbase-python-client" path="couchbase-python-client" revision="006c1aa8b76f6bce11109af8a309133b57079c4c"/>
<project name="couchdb" path="couchdb" revision="c6e715704e9fd7f2ae2e7899dc027727b50c5b8a"/>
<project name="couchdbx-app" path="couchdbx-app" revision="52d20848312254148c889de517b39199765d8e22"/>
<project name="couchstore" path="couchstore" revision="146c965a5aa5da21901209122bde60a8f3a52171"/>
<project name="geocouch" path="geocouch" revision="5f5706d8a0db214aaa942ca496661938eaf2385e"/>
<project name="testrunner" path="testrunner" revision="f3ed4fdc95236086f54b4de4bee90cab8b5e3b06"/>
<project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/>
<project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/>
<project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/>
<project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/>
<project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/>
<project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/>
</manifest>

Attachments: Text File 0001-Use-share-flags-for-all-file-operations-on-Windows.patch     Zip Archive disconnects.zip     Zip Archive physical-runs.zip    
Issue Links:
Dependency
blocks MB-8331 Need to move to 5.8.5.cb1 Resolved

 Description   

The below test takes 2 hours to run with 2.0.0-1976-rel while it took 47 hours to finish with 2.0.1-156-rel build in automation run.

viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in,max-dupe-result-count=10,num_nodes_to_add=3,skip_rebalance=true

The test loads 200k json docs on default bucket on single node. Creates 6 ddocs(1 view each). And then run view queries and start rebalance 1->4.

I have started the test again on the below cluster for 2.0.1. Rebalance is 19% complete in 20 hours and is running very slowly.
For 2.0, the test completed in 2 hours.
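
To put the 2.0.1 progress figure in perspective, a linear extrapolation gives a rough projected total. This assumes the rebalance keeps its current pace, which may well not hold; `projected_total_hours` is a hypothetical helper.

```python
def projected_total_hours(percent_done, hours_elapsed):
    """Linear extrapolation of total run time from progress so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return hours_elapsed * 100.0 / percent_done

# 19% complete after 20 hours, as reported above.
estimate = projected_total_hours(19, 20)
print(round(estimate, 1))
```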

Both the clusters are live and can be used for investigation.

2.0.1 cluster (build 2.0.1-156-rel)

10.3.2.23
10.3.2.25
10.3.2.24
10.3.2.73

2.0 cluster (build 2.0.0-1976-rel)

10.3.2.135
10.3.3.40
10.3.2.133
10.3.3.166


 Comments   
Comment by Deepkaran Salooja [ 19/Feb/13 ]
Steps to run the automated test using testrunner:

./testrunner -i <ini_file> -t viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in,max-dupe-result-count=10,num_nodes_to_add=3,skip_rebalance=true
Comment by Sriram Melkote [ 19/Feb/13 ]
It looks like at 11pm, the cluster saw interconnections failing, and orchestrator failed over as well. We've seen this before and are investigating the cause of the loss of connectivity in MB-7658. Applying the fixes suggested there for this cluster (prefer ipv4 to ipv6, MaxUserPort=60000 and TcpTimeWaitDelay=30) here and retrying. Machines will be restarted.
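The two TCP settings mentioned above could be captured as a .reg file. The sketch below only builds the file text and does not touch the registry. The key path is an assumption based on Microsoft's TCP/IP parameter documentation rather than anything stated in this ticket, `reg_file_text` is a hypothetical helper, and the "prefer ipv4 to ipv6" change is not covered.

```python
# Assumed location of the TCP/IP parameters; verify before applying.
TCPIP_PARAMS = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

def reg_file_text(settings):
    """Render DWORD settings as the text of a .reg file."""
    lines = ["Windows Registry Editor Version 5.00", "", "[%s]" % TCPIP_PARAMS]
    for name, value in settings.items():
        # .reg files store DWORD values as 8 hex digits.
        lines.append('"%s"=dword:%08x' % (name, value))
    return "\r\n".join(lines) + "\r\n"

text = reg_file_text({"MaxUserPort": 60000, "TcpTimeWaitDelay": 30})
print(text)
```

As noted in the comment, the machines are restarted after the change.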
Comment by Farshid Ghods [ 21/Feb/13 ]
Deep,

can you rerun this test on build 153 and compare the performance?
Comment by Sriram Melkote [ 22/Feb/13 ]
Please also apply the disconnect.zip patches (attached, has readme inside) to all machines in the cluster to see if the issue is one of the known problems in Windows
Comment by Deepkaran Salooja [ 24/Feb/13 ]
Reran the test with build 153.

Without applying the disconnect.zip patches, test run time:
30 hours

After applying the disconnect.zip patches, test run time:
5 hours

Rerunning the test with patches just to confirm the run time.
Comment by Farshid Ghods [ 24/Feb/13 ]
Deep/Siri,

can we merge the disconnect.zip changes to 2.0.1 branch for windows so that we can confirm this performance boost with performance team ?
Comment by Sriram Melkote [ 25/Feb/13 ]
We can, but we'll need to add an additional screen to the installer to ask the user permission to change these system-wide settings. I'd like to request that we test this patch here and in MB-7658, and when we're satisfied that it does solve the issue, then I'll add the screen.
Comment by Deepkaran Salooja [ 25/Feb/13 ]
The 2nd run of the test with disconnect.zip patches on the same VMs is running very slow.

It's only 36% complete in 24 hours.
Live cluster: http://10.3.2.135:8091/index.html#sec=servers

Running the test in EC2 as well to confirm the behavior.
Comment by Farshid Ghods [ 25/Feb/13 ]
we are also going to fix windows vms so that they use VMNet3 network device instead of E1000 to avoid such network disconnects
Comment by Deepkaran Salooja [ 26/Feb/13 ]
Tried the same test on EC2 cluster with build 153 and disconnects.zip patches.

Test completion time - 1st run
7 hours

Restarted the same test again on the same cluster. Now the rebalance is only 13% complete in 7.5 hours.
So on local VMs and EC2, the 2nd time the test runs very slow as compared to 1st time.

Live cluster:
http://ec2-54-251-6-11.ap-southeast-1.compute.amazonaws.com:8091/
Comment by Sriram Melkote [ 26/Feb/13 ]
I ran this test four times successively on a physical cluster (10.2.1.61, .62, .63, .64) with spinning disks and a lot of memory.

The run times are stable (Run 1 - 2329s, Run 2 - 2322s, Run 3 - 2253s, Run 4 - 2329s). Logs from runs are attached.

Can you please reconfigure the VM to ensure that it has statically reserved memory, cpu and disk IO, and retry? Please remember to turn off paging entirely.
Comment by Sriram Melkote [ 06/Mar/13 ]
The logs for one clean run showing the sticking of rebalance is here:
https://s3.amazonaws.com/bugdb/MB-7772/logs.zip

The test progress is here:
https://s3.amazonaws.com/bugdb/MB-7772/testrunner.log

I restart server on all nodes and start test at 13:13. I kill everything at 13:23. Between 13:19 and 13:22, the rebalance is clearly in stuck state.

I see that at 13:17, on node 1:
Killing current compactor to speed up uninhibit_view_compaction

Wondering if that triggered the state where rebalance refuses to make progress
Comment by Aleksey Kondratenko [ 07/Mar/13 ]
Quick note. Gathering plain logs is 1/10000000th as useful as gathering diag or cbcollectinfo.
Comment by Sriram Melkote [ 07/Mar/13 ]
Sure, no problem - I'll attach cbcollectinfo shortly.
Comment by Sriram Melkote [ 07/Mar/13 ]
I've attached another run, with cbcollect. In this run, note that progress stops between 2013-03-07 20:29:24 and 2013-03-07 20:34:53.

It's a bit different from the last run, in that there's some progress after 2013-03-07 20:34:53

However, the behavior of interest exhibits during 2013-03-07 20:29:24 and 2013-03-07 20:34:53.

https://s3.amazonaws.com/bugdb/MB-7772/cbcollect_thursday.zip
https://s3.amazonaws.com/bugdb/MB-7772/cbcollect_thursday_2.zip
https://s3.amazonaws.com/bugdb/MB-7772/cbcollect_thursday_3.zip
https://s3.amazonaws.com/bugdb/MB-7772/cbcollect_thursday_4.zip
https://s3.amazonaws.com/bugdb/MB-7772/run.thursday.log
Comment by Aleksey Kondratenko [ 07/Mar/13 ]
I cannot be sure it's completely stuck, but this is what view compaction is "busy" doing:

     {<0.28538.1>,
      [{registered_name,[]},
       {status,waiting},
       {initial_call,{proc_lib,init_p,3}},
       {backtrace,[<<"Program counter: 0x04c2dfe8 (timer:sleep/1 + 20)">>,
                   <<"CP: 0x00000000 (invalid)">>,<<"arity = 0">>,<<>>,
                   <<"0x078b54e4 Return addr 0x052543d0 (file2:do_file_op_loop/5 + 92)">>,
                   <<"y(0) 200">>,<<>>,
                   <<"0x078b54ec Return addr 0x05264fd4 (filelib:do_fold_files2/8 + 576)">>,
                   <<"y(0) 2490">>,<<"y(1) 200">>,
                   <<"(2) [\"c:/Program Files/Couchbase/Server/var/lib/couchbase/data/.delete/b4620ac77568343">>,
                   <<"y(3) delete">>,<<"y(4) file">>,<<>>,
                   <<"0x078b5504 Return addr 0x052543f4 (file2:do_file_op_loop/5 + 128)">>,
                   <<"y(0) file">>,<<"y(1) []">>,
                   <<"y(2) #Fun<couch_file.0.53430530>">>,
                   <<"y(3) true">>,<<"y(4) \".*\"">>,
                   <<"y(5) {re_pattern,0,1,<<49 bytes>>}">>,
                   <<"y(6) \"c:/Program Files/Couchbase/Server/var/lib/couchbase/data/.delete\"">>,
                   <<"(7) [\"afcc1670a1fbaa03b9fd5f3fb0778992\",\"ae3d04b0075f1e9f4f10463b40590adf\",\"ab7007d281">>,
                   <<"y(8) []">>,<<>>,
                   <<"0x078b552c Return addr 0x0526b470 (couch_file:init_delete_dir/1 + 124)">>,
                   <<"y(0) 3890">>,<<"y(1) 200">>,
                   <<"(2) [\"c:/Program Files/Couchbase/Server/var/lib/couchbase/data/.delete\",\".*\",true,#Fun">>,
                   <<"y(3) fold_files">>,<<"y(4) filelib">>,<<>>,
                   <<"0x078b5544 Return addr 0x074dd834 (couch_view:cleanup_index_files/1 + 384)">>,
                   <<>>,
                   <<"0x078b5548 Return addr 0x0a0f6fb4 (compaction_daemon:try_to_cleanup_indexes/1 + 668)">>,
                   <<"y(0) \"C:\\\\Program Files\\\\Couchbase\\\\Server\\\\var\\\\lib\\\\couchbase\\\\data\"">>,
                   <<"y(1) []">>,<<>>,
                   <<"x078b5554 Return addr 0x0a0fc818 (compaction_daemon:'-spawn_bucket_compactor/3-fun-2-'/4 +">>,
                   <<"y(0) []">>,<<"y(1) []">>,
                   <<"(2) {db,<0.8440.0>,<0.8441.0>,nil,<<16 bytes>>,<0.8435.0>,<0.8442.0>,{db_header,11,6,<">>,
                   <<"y(3) Catch 0x0a0f6fc4 (compaction_daemon:try_to_cleanup_indexes/1 + 684)">>,
                   <<"y(4) Catch 0x0a0f70a4 (compaction_daemon:try_to_cleanup_indexes/1 + 908)">>,
                   <<"y(5) <<14 bytes>>">>,<<>>,
                   <<"0x078b5570 Return addr 0x01892b44 (proc_lib:init_p/3 + 352)">>,
                   <<"y(0) []">>,
                   <<"(1) [forced_previously_inhibited_view_compaction,{parallel_db_and_view_compaction,fals">>,
                   <<"y(2) false">>,<<"y(3) <<7 bytes>>">>,
                   <<"(4) {config,{100,18446744073709551616},{0,0},undefined,false,false,{daemon_config,30,1">>,
                   <<>>,
                   <<"0x078b5588 Return addr 0x00ae09b4 (<terminate process normally>)">>,
                   <<"y(0) []">>,
                   <<"y(1) Catch 0x01892b54 (proc_lib:init_p/3 + 368)">>,
                   <<"y(2) []">>,<<>>]},
       {error_handler,error_handler},
       {garbage_collection,[{min_bin_vheap_size,46368},
                            {min_heap_size,233},
                            {fullsweep_after,512},
                            {minor_gcs,42}]},
       {heap_size,2584},
       {total_heap_size,13530},
       {links,[<0.6704.0>]},
       {memory,54640},
       {message_queue_len,0},
       {reductions,966651},
       {trap_exit,false}]}


This is how I tracked it down. We see that the last rebalance action is forced compaction. Right after the "Got actions" log message we see some messages from compaction_daemon. Its pid is also logged. You can also spot a message from its child (different pid, same source file) about view cleanup. Then (because we finally have diag) we can inspect this process's state. See that the main compaction process is indeed linked to this child; observe the child's state and voila! We have it.

P.S. I took the liberty of subscribing Filipe to this ticket because this issue is relevant to view engine magic.
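
As an illustration of the last step above (reading the child process's state out of the diag), here is a minimal Python sketch that pulls the call chain out of "Return addr" lines like the ones in the backtrace. The function name `call_chain` and the regular expression are assumptions made for this example; they are not part of any Couchbase tool.

```python
import re

# Matches e.g. "Return addr 0x052543d0 (file2:do_file_op_loop/5 + 92)".
# The offset after "+" is not required, because diag output may truncate long lines.
RETURN_ADDR = re.compile(r"Return addr 0x[0-9a-fA-F]+ \((\S+) \+")


def call_chain(backtrace_lines):
    """Return the functions named in 'Return addr' lines, top of stack first."""
    chain = []
    for line in backtrace_lines:
        match = RETURN_ADDR.search(line)
        if match:
            chain.append(match.group(1))
    return chain


# A few lines copied from the backtrace in this comment.
sample = [
    "Program counter: 0x04c2dfe8 (timer:sleep/1 + 20)",
    "0x078b54e4 Return addr 0x052543d0 (file2:do_file_op_loop/5 + 92)",
    "y(0) 200",
    "0x078b552c Return addr 0x0526b470 (couch_file:init_delete_dir/1 + 124)",
    "0x078b5544 Return addr 0x074dd834 (couch_view:cleanup_index_files/1 + 384)",
    "0x078b5588 Return addr 0x00ae09b4 (<terminate process normally>)",
]

chain = call_chain(sample)
for function in chain:
    print(function)
```

Read top to bottom, the printed chain shows the sleep happening underneath the file operation loop, which in turn sits underneath the view index cleanup.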
Comment by Aleksey Kondratenko [ 07/Mar/13 ]
See above
Comment by Filipe Manana [ 07/Mar/13 ]
Consequence of MB-7569, which I'll revert on 2.0.2 and master.
The eacces errors are causing an apparently finite loop, due to nested file calls inside directory fold callbacks (which itself has eacces retries).

Nothing to do until CBD-790 or we try Erlang R16B package (32 or 64 bits) from erlang.org.
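
To show why retries nested inside retries get expensive, here is a toy Python model. It is only a sketch: the attempt counts are hypothetical, `retry` and `count_inner_calls` are names invented for this example, and it does not reproduce the actual couch_file or file2 code.

```python
def retry(op, attempts):
    """Call op() up to `attempts` times; return True on the first success."""
    for _ in range(attempts):
        if op():
            return True
    return False


def count_inner_calls(outer_attempts, inner_attempts):
    """Count inner operations when every attempt fails (e.g. a persistent eacces)."""
    calls = 0

    def inner_op():
        nonlocal calls
        calls += 1
        return False  # always fails, standing in for a persistent eacces error

    def outer_op():
        # The outer callback itself retries the inner operation.
        return retry(inner_op, inner_attempts)

    retry(outer_op, outer_attempts)
    return calls


# Hypothetical limits: 10 outer retries, each wrapping 10 inner retries.
total = count_inner_calls(10, 10)
print(total)
```

With these made-up limits the inner operation runs 100 times instead of 10; if each failed attempt also sleeps, the waiting time multiplies in the same way.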
Comment by Aleksey Kondratenko [ 07/Mar/13 ]
Filipe, I've heard you have some fixes to make our code run on R16. Can you post them somewhere ?
Comment by Filipe Manana [ 07/Mar/13 ]
Alk, see the last comment on CBD-790. It's just a mochiweb upgrade; the change still allows compiling and running with R14 and R15 (at least simple-test and all QE view test suites pass with it, under both R16 and R14B04).
Comment by Aleksey Kondratenko [ 07/Mar/13 ]
Cool. So let's apply it permanently then.
Comment by Sriram Melkote [ 10/Mar/13 ]
Verified that this binary built by Trond:
www.norbye.org/otp_win32_R14B04.exe

Which is R14B04 plus Filipe's patches, 32 bit build of Erlang:
https://github.com/erlang/otp/commit/03611f821da3c4f2343f1cad97bf041c9e42187f
https://github.com/erlang/otp/commit/0e02f488971b32ff9ab88a3f0cb144fe5db161b2
(exact clean patch attached to this bug, for later)

Resolves the issue. Will apply these to builders on Monday. Likely that we'll stay with R14 + these patches for 2.0.1
Comment by Sriram Melkote [ 10/Mar/13 ]
Patch from Filipe that applies cleanly on R14 for the sharing flags issue attached
Comment by Deepkaran Salooja [ 14/Mar/13 ]
Tested with build 2.0.1-179-rel on EC2 cluster. The test now takes 23 minutes to run, compared to 7 hours on the same environment with 2.0.1-153-rel. The test time is also consistent across multiple runs (earlier the 2nd run used to take > 40 hours).
Comment by Jin Lim [ 14/Mar/13 ]
Great! Just to confirm, does build 179 include the patch from Filipe? Or do users still have to install the patch manually in order to have this issue addressed with, say, our Windows RC? Thanks.
Comment by Sriram Melkote [ 14/Mar/13 ]
Build includes the patch.




[MB-7149] cbbackup loops infinitely Created: 10/Nov/12  Updated: 21/May/13

Status: Reopened
Project: Couchbase Server
Component/s: tools
Affects Version/s: 2.0-beta-2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Mike Wiederhold Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: 2.0-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux


 Description   
I'm trying to backup my cluster via cbbackup. I start the backup via
/opt/couchbase/bin/cbbackup couchbase://Administrator:password@mymachine:8091 /backups/couchbase_backup_test
The backup appears to work fine and a progress bar appears, but then exceeds 100% progress and never stops! Example:
^Cinterrupted.###############################] 210.8% (26141950/12403784 msgs)
(I ctrl-C'd to kill it after 200% because it seems that this can't possibly work. Note that both the percent and the number of messages are off). I only have ~12 million items in the bucket, but it went right past that limit when backing up.
Help?

 Comments   
Comment by Steve Yen [ 12/Nov/12 ]
There are a couple of things going on here that probably need documentation...

- cbbackup first contacts the server to get the # of items. But, if the cluster is changing (there are item mutations), that # of items will just be an estimate.

- then, cbbackup uses the TAP protocol to perform the backup. But, under some conditions (not all item values are resident in memory), the TAP protocol might actually send duplicate messages. That's why cbbackup reports "msgs" for progress instead of "items" in its numerator, but uses "items" in its denominator. That can lead to >100% in some cases.

Whether it leads to >200% is somewhat unexpected, but it depends on the situation and what couchbase server is doing in generating the TAP stream.
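
A small worked example of the arithmetic described above, using the numbers from the reporter's progress bar. The function name `progress_percent` is an assumption for illustration and is not taken from the cbbackup source.

```python
def progress_percent(msgs_sent, estimated_items):
    """Percentage shown when the numerator counts msgs and the denominator counts items."""
    if estimated_items <= 0:
        return 0.0
    return 100.0 * msgs_sent / estimated_items


# Values from the progress bar in the description: (26141950/12403784 msgs)
pct = progress_percent(26141950, 12403784)
print(f"{pct:.1f}%")
```

Because nothing caps the numerator at the estimate, any duplicate or extra messages push the displayed value past 100%.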
Comment by Steve Yen [ 12/Nov/12 ]
The 2.0.2 filter isn't correct at the moment. Putting this into 2.0.1 for the moment as it'll be revisited again.
Comment by Michael L [ 27/Nov/12 ]
(I am the original poster)

Changing the line above to use an IP address rather than hostname seems to have fixed the problem. My backups now run to 100% and then complete as expected.

As for the root cause: I don't believe it has anything to do with the cluster changing, since I first encountered this when trying to backup an essentially idle cluster.
Comment by Bin Cui [ 06/Dec/12 ]
Verified on a multi-node cluster that cbtransfer gets the total item number correctly. Failed to reproduce the bug on an idle cluster.
Comment by Mike Wiederhold [ 06/Dec/12 ]
I asked the user on the forums for more information to reproduce this issue. I will post the information here if and when he responds.
Comment by Farshid Ghods [ 10/Dec/12 ]
deferring to 2.1 per bug scrub meeting ( Dipti & Farshid -December 7th )
Comment by Paul Janssen [ 04/Jan/13 ]
I have the same problem.
Observed progress over 32000%.
The expected msgs to save is far less than the actual msgs saved.
Restore will load the same number of msgs as were actually saved.
This will impact backup and restore time.
This will impact diskspace.
Comment by Paul Janssen [ 04/Jan/13 ]
Version info: 2.0.0 community edition (build-1723)
Comment by Paul Janssen [ 04/Jan/13 ]
Using ip-address (local,external) or hostname (localhost) does not make any difference, issue remains.
Comment by Maria McDuff [ 25/Mar/13 ]
bug scrub: Bin -- have you had a chance to take a look? pls update.
thanks.
Comment by Bin Cui [ 25/Mar/13 ]
Cannot reproduce it in house.
Comment by Maria McDuff [ 01/Apr/13 ]
per bug scrub: abhinav -- can you please repro in latest 2.0.2 build? thanks.
Comment by Abhinav Dangeti [ 01/Apr/13 ]
Cannot reproduce on 2.0.2-749-rel.
- 3 nodes, 2 buckets
[root@orange-11601 ~]# /opt/couchbase/bin/cbbackup couchbase://Administrator:password@localhost:8091 ~/backup
  [####################] 100.0% (8557766/8558766 msgs)
bucket: default, msgs transferred...
       : total | last | per sec
 batch : 10657 | 10657 | 14.7
 byte : 1131794345 | 1131794345 | 1562011.5
 msg : 8558766 | 8558766 | 11812.1
  [####################] 100.0% (2024739/2024739 msgs)
bucket: saslbucket, msgs transferred...
       : total | last | per sec
 batch : 14775 | 14775 | 86.0
 byte : 1367390279 | 1367390279 | 7955075.3
 msg : 10583505 | 10583505 | 61571.7
done

Comment by Maria McDuff [ 09/Apr/13 ]
not reproducible.
Comment by Bin Cui [ 17/May/13 ]
First, the error itself is harmless. The tool tried to transfer design docs and the source cluster doesn't contain any. Since 2.0.2, customers can specify the --data-only option for the cbtransfer/cbbackup/cbrestore tools.

But we still don't know the root cause of such a big difference between the initial number of msgs to be sent and the final number of msgs transferred.
Comment by Bin Cui [ 17/May/13 ]
One possible explanation for the deviant number:

1. the estimated number is the total active item number
2. the actual msgs transferred = total(tap_mutations + tap_delete)

For the above customer case where they have 2 million items deleted, we will transfer not only the current active items, but also any deleted items.
Again, there will be more msgs transferred if any key has repeated sets/deletions.
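
The hypothesis above can be illustrated with a toy model in Python. This is only a sketch under a strong assumption: that every set and every delete in a key's history produces one transferred message. The function `estimate_and_actual` and the sample operations are invented for this example and do not describe how TAP actually streams data.

```python
def estimate_and_actual(ops):
    """ops: ordered iterable of (key, 'set' | 'delete').

    Returns (active item count, mutations + deletes), i.e. the estimate
    versus the number of messages under the assumption stated above.
    """
    live = set()
    mutations = 0
    deletes = 0
    for key, op in ops:
        if op == "set":
            live.add(key)
            mutations += 1
        elif op == "delete":
            live.discard(key)
            deletes += 1
    return len(live), mutations + deletes


# Hypothetical history: key "b" is deleted, key "c" is set, deleted and set again.
ops = [
    ("a", "set"),
    ("b", "set"),
    ("b", "delete"),
    ("c", "set"),
    ("c", "delete"),
    ("c", "set"),
]

estimate, actual = estimate_and_actual(ops)
print(estimate, actual, f"{100.0 * actual / estimate:.1f}%")
```

In this made-up history only two items are active, yet six messages are counted, so the progress figure would read 300%.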
Comment by Perry Krug [ 18/May/13 ]
Reopening for visibility. Whether the tool is doing the "right" thing or not, there is still a major impact to the user both in terms of disk space being taken up, time being taken and perception of confidence, etc in the backup.
Comment by Maria McDuff [ 20/May/13 ]
Anil to work with Bin on customer use case.
Comment by Anil Kumar [ 20/May/13 ]
Talked to Bin, here's the update.

There are 2 issues here…
1). In case of heavy DGM – the tool only captures the 'total active items' in memory and does not include the items on disk. The fix for this is to also consider the resident ratio to get the current_item. As per Bin, we already have the stats and this should be a low-risk fix. He will be making this fix for 2.0.2.
2). In case of deletes on items – the tool currently only captures the snapshot of 'active items' and doesn't consider any items getting deleted. Hence, when it transfers, it transfers not only the current active items but also any deleted items, which is unnecessary. To fix this we require some changes on the EP-Engine side to provide stats on deleted items so that the tool can smartly ignore those. Considering the timeframe for the release, this won't make it into 2.0.2, but we will have documentation explaining this to users. [Doc ticket on Karen ]

Comment by Abhinav Dangeti [ 20/May/13 ]
When cbbackup was run on a node with 6104188 items (~45% active resident ratio),

root@plum-009:~# /opt/couchbase/bin/cbbackup http://localhost:8091 /data/backup
  [##############################] 147.6% (9007021/6104188 msgs)
bucket: default, msgs transferred...
       : total | last | per sec
 batch : 29230 | 29230 | 33.1
 byte : 10030064183 | 10030064183 | 11365547.3
 msg : 9007021 | 9007021 | 10206.3
done
Comment by Bin Cui [ 20/May/13 ]
http://review.couchbase.org/#/c/26431/
Comment by Maria McDuff [ 21/May/13 ]
per bug triage, this fix from bin only addresses item 1 (see below):
1). In case of heavy DGM – the tool only captures the 'total active items' in memory and does not include the items on disk. The fix for this is to also consider the resident ratio to get the current_item. As per Bin, we already have the stats and this should be a low-risk fix. He will be making this fix for 2.0.2.




[MB-8263] [system test] Erlang crash during data access phase with Mike's toy build Created: 13/May/13  Updated: 21/May/13  Resolved: 14/May/13

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Chisheng Hong Assignee: Mike Wiederhold
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-mikewied-x86_64_2.0.0-21-toy.rpm.manifest.xml

Operating System: Centos 64-bit

 Description   
cluster ip is 10.5.2.30

1. 3 node cluster, create 2 buckets, default 2.5G quota and saslbucket 1.3G
2. Load items into buckets to make it dgm. Around 80% resident ratio
3. After initial loading, do a lot of "gets", around 80%. 20% cache miss maximum. Total ops per sec is 3k ops/sec
During the data access phase, nodes went down in "Pend" state. The bucket is not accessible from the UI.

Find erlang_crash.dump on 10.5.2.31 under /opt/couchbase/var/lib/couchbase.

Link to the core_dump file: https://s3.amazonaws.com/bugdb/jira/MB-8263/erl_crash.dump.05-10-2013-19:05:36.7640

 Comments   
Comment by Chisheng Hong [ 13/May/13 ]
diags link https://s3.amazonaws.com/bugdb/jira/MB-8263/3nodes_mike-210_erlang_crash__20130513-183321.tgz
Comment by Maria McDuff [ 14/May/13 ]
bumping up to blocker.
Comment by Maria McDuff [ 14/May/13 ]
per bug triage, assigning to mike.
Comment by Mike Wiederhold [ 14/May/13 ]
Alk,

I saw wait backfill determination issues in the logs which is an ep-engine issue. I will get the code in the toy build re-run once we fix that issue. There was also an erlang crash dump attached to this bug. Please take a quick look at it and then close this bug as won't fix if you don't see anything interesting.

[EDIT]: My comments here are actually incorrect. I was looking at the wrong logs. In any case the crash dump should be investigated.
Comment by Aleksey Kondratenko [ 14/May/13 ]
It looks like crash dump is unrelated to this.

In fact Aliaksey managed to look at the logs here and we found that they have rotated well past the interesting times. We believe there is indeed one subtle and rare race in ns_server which we will address. But because the logs are rotated we are not sure what actually happened here.
Comment by Aleksey Kondratenko [ 14/May/13 ]
And we don't think this race could initiate any problems.
Comment by Mike Wiederhold [ 14/May/13 ]
Thanks Alk. I will create another toy build for this issue soon.
Comment by Maria McDuff [ 21/May/13 ]
mike, what's the status of this bug? is this still an issue? why is it resolved as 'wont-fix'?
Comment by Mike Wiederhold [ 21/May/13 ]
It's a toy build used for testing things that aren't in the product. We should just close this.




[MB-5673] Make a way to install Couchbase if you don't have root access Created: 25/Jun/12  Updated: 21/May/13  Resolved: 05/Jul/12

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: None
Fix Version/s: 1.8.1
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Mike Wiederhold Assignee: Steve Yen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   
If this is already done then we need to document it somewhere.

http://www.couchbase.com/forums/thread/installing-non-root-user

 Comments   
Comment by Steve Yen [ 27/Jun/12 ]
http://review.couchbase.org/17675
Comment by Steve Yen [ 05/Jul/12 ]
I've provided *.tar.gz to support for testing which folks can untar/gunzip anywhere.
Comment by Kartik v [ 23/Aug/12 ]
Hi Steve,

   Can you let us know how we can download the tar.gz file you provided?


Thanks,
Kartik V
Comment by Steve Yen [ 23/Aug/12 ]
Hi Kartik, I think it's now in the hands of our support team. Please email support@couchbase.com to see where it stands and/or whether they actually wanted to release it.
Thanks,
Steve

Comment by Ben Spiller [ 28/Sep/12 ]
The resolution on the issue says 'resolved', but I can't see anything that documents how to install Couchbase if you don't have root access.

If installing the enterprise edition then building from source seems not to be an option, so it'd be great to have instructions on how to do this...




[MB-8252] Docs: How to work with data across different client languages Created: 13/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: clients, documentation
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
There are a few cases where the customer wants to use the same data from different languages and it would be helpful to provide some documentation on how to do this. There are certain things like serialization, etc to take into account.

 Comments   
Comment by Perry Krug [ 21/May/13 ]
https://plus.google.com/116265933027154117867/posts/idoAAfT2TyE




[MB-8318] memcached crashed in CouchKVStore::setVBucketState Created: 20/May/13  Updated: 21/May/13  Resolved: 21/May/13

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0.2
Fix Version/s: 2.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Andrei Baranouski Assignee: Andrei Baranouski
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos64


 Description   
http://qa.hq.northscale.net/job/cen-ubu-64-2.0-new-rebalance-mixed-cluster-P1/9/consoleFull

 1.8.1 & 2.0.2-807 ( mixed cluster)
1_8_1:
10.5.2.11
10.6.2.81
10.6.2.82

2.0.2-807-rel:
10.3.121.119
10.3.121.121
10.3.121.122
10.3.121.123

on 10.3.121.121( root/couchbase):

[root@localhost tmp]# ls -la
total 1345100
drwxrwxrwt 10 root root 4096 May 20 01:09 .
drwxr-xr-x 24 root root 4096 May 18 09:40 ..
drwx------ 2 root root 4096 May 20 00:00 atop.d
drwxr-xr-x 2 root root 4096 May 18 03:24 backup
-rw------- 1 couchbase couchbase 698519552 May 19 17:13 core.memcached.14586
-rw------- 1 couchbase couchbase 947982336 May 19 13:57 core.memcached.3876
-rw-r--r-- 1 root root 129834728 May 18 00:32 couchbase-server-enterprise_x86_64_2.0.2-807-rel.rpm
drwxrwxrwt 2 root root 4096 May 18 09:40 .font-unix
-rw------- 1 root root 66 May 18 00:33 .gdm6DAMUW
srw-rw-rw- 1 root root 0 May 18 09:40 .gdm_socket
drwxrwxrwt 2 root root 4096 May 18 09:40 .ICE-unix
drwx------ 2 root root 4096 Apr 2 12:32 keyring-iUmHcT
drwx------ 2 couchbase couchbase 4096 Apr 11 2012 keyring-JYEPdw
drwx------ 2 root root 4096 May 18 09:40 vmware-root
-r--r--r-- 1 root root 11 May 18 09:40 .X0-lock
drwxrwxrwt 2 root root 4096 May 18 09:40 .X11-unix
[root@localhost tmp]# gdb /opt/couchbase/bin/memcached core.memcached.14586
GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/couchbase/bin/memcached...done.
[New Thread 15231]
[New Thread 20939]
[New Thread 20938]
[New Thread 20937]
[New Thread 15236]
[New Thread 15235]
[New Thread 15234]
[New Thread 15233]
[New Thread 15232]
[New Thread 15230]
[New Thread 14593]
[New Thread 14592]
[New Thread 14591]
[New Thread 14590]
[New Thread 14589]
[New Thread 14588]
[New Thread 14587]
[New Thread 14586]

warning: .dynamic section for "/usr/lib64/libstdc++.so.6" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations

warning: .dynamic section for "/lib64/libgcc_s.so.1" is not at the expected address

warning: difference appears to be caused by prelink, adjusting expectations
Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done.
Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0
Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done.
Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done.
Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/ep.so
Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done.
Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
Reading symbols from /opt/couchbase/lib/libicuuc.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicuuc.so.44
Reading symbols from /opt/couchbase/lib/libicudata.so.44...(no debugging symbols found)...done.
Loaded symbols for /opt/couchbase/lib/libicudata.so.44
Reading symbols from /opt/couchbase/lib/libicui18n.so.44...done.
Loaded symbols for /opt/couchbase/lib/libicui18n.so.44

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff41d5f000
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 6, Aborted.
#0 0x0000003018630265 in raise () from /lib64/libc.so.6
(gdb) t a a bt

Thread 18 (Thread 0x2b6944d5f220 (LWP 14586)):
#0 0x00000030186d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b69448ef576 in epoll_dispatch (base=0x13772000, tv=<value optimized out>) at epoll.c:404
#2 0x00002b69448dae44 in event_base_loop (base=0x13772000, flags=<value optimized out>) at event.c:1558
#3 0x00000000004097d6 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7926

Thread 17 (Thread 14587):
#0 0x00000030186c678b in read () from /lib64/libc.so.6
#1 0x000000301866cd57 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x000000301866d71e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x0000003018662804 in _IO_getline_info_internal () from /lib64/libc.so.6
#4 0x00000030186616a9 in fgets () from /lib64/libc.so.6
#5 0x00002b6944d60939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37
#6 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#7 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 16 (Thread 14588):
#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaaae4d6 in logger_thead_main (arg=0xef0a040) at extensions/loggers/file_logger.c:368
#2 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#3 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 15 (Thread 14589):
#0 0x00000030186d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b69448ef576 in epoll_dispatch (base=0x13772500, tv=<value optimized out>) at epoll.c:404
#2 0x00002b69448dae44 in event_base_loop (base=0x13772500, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0xef0d900) at daemon/thread.c:301
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 14 (Thread 14590):
#0 0x00000030186d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b69448ef576 in epoll_dispatch (base=0x13772280, tv=<value optimized out>) at epoll.c:404
#2 0x00002b69448dae44 in event_base_loop (base=0x13772280, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0xef0d9f8) at daemon/thread.c:301
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 13 (Thread 14591):
#0 0x00000030186d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b69448ef576 in epoll_dispatch (base=0x13772c80, tv=<value optimized out>) at epoll.c:404
#2 0x00002b69448dae44 in event_base_loop (base=0x13772c80, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0xef0daf0) at daemon/thread.c:301
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 12 (Thread 14592):
#0 0x00000030186d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b69448ef576 in epoll_dispatch (base=0x13772a00, tv=<value optimized out>) at epoll.c:404
#2 0x00002b69448dae44 in event_base_loop (base=0x13772a00, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0xef0dbe8) at daemon/thread.c:301
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 11 (Thread 14593):
#0 0x00000030186d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b69448ef576 in epoll_dispatch (base=0x13772780, tv=<value optimized out>) at epoll.c:404
#2 0x00002b69448dae44 in event_base_loop (base=0x13772780, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414604 in worker_libevent (arg=0xef0dce0) at daemon/thread.c:301
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 10 (Thread 15230):
#0 0x000000301869a541 in nanosleep () from /lib64/libc.so.6
#1 0x00000030186cded4 in usleep () from /lib64/libc.so.6
#2 0x00002aaaaaf32805 in updateStatsThread (arg=0xef0a4c0) at src/memory_tracker.cc:31
#3 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 9 (Thread 15232):

#0 0x000000301920d524 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003019208e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x0000003019208cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaaaaf336fa in Mutex::acquire (this=0x137ed110) at src/mutex.cc:79
#4 0x00002aaaaaf79dc3 in lock (this=0x137ed000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:48
#5 LockHolder (this=0x137ed000, vbs=..., file_version=1, header_offset=4096, cb=...) at ./src/locks.hh:26
#6 CouchNotifier::notify_update (this=0x137ed000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:753
#7 0x00002aaaaaf70163 in CouchKVStore::setVBucketState (this=0x195f6300, vbucketId=1, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745
#8 0x00002aaaaaf71069 in CouchKVStore::snapshotVBuckets (this=0x195f6300, vbstates=Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children
    nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype))
RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >.
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596
#9 0x00002aaaaaf021f3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2ca5b800, priority=..., shardId=<value optimized out>) at src/ep.cc:760
#10 0x00002aaaaaf51cff in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78
#11 0x00002aaaaaf371c1 in ExecutorThread::run (this=0x1381da00) at src/scheduler.cc:153
#12 0x00002aaaaaf378dd in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
#13 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#14 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 8 (Thread 15233):
#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf37051 in wait (this=0x1381d860) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x1381d860) at src/scheduler.cc:139
#3 0x00002aaaaaf378dd in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 7 (Thread 15234):
#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf37051 in wait (this=0x13844ea0) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x13844ea0) at src/scheduler.cc:139
#3 0x00002aaaaaf378dd in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 6 (Thread 15235):
#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf37051 in wait (this=0x13844d00) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x13844d00) at src/scheduler.cc:139
#3 0x00002aaaaaf378dd in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 5 (Thread 15236):
#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf37051 in wait (this=0x13844b60) at src/syncobject.hh:57
#2 ExecutorThread::run (this=0x13844b60) at src/scheduler.cc:139
#3 0x00002aaaaaf378dd in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
#4 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 4 (Thread 20937):
#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaf0ebdf in wait (this=0x13770400) at src/syncobject.hh:57
#2 wait (this=0x13770400) at src/syncobject.hh:73
#3 wait (this=0x13770400) at src/tapconnmap.hh:169
#4 EventuallyPersistentEngine::notifyPendingConnections (this=0x13770400) at src/ep_engine.cc:3377
#5 0x00002aaaaaf0ecc3 in EvpNotifyPendingConns (arg=0x13770400) at src/ep_engine.cc:1153
#6 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#7 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 3 (Thread 20938):
#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaef31c8 in wait (this=0x1428d440, d=...) at src/syncobject.hh:57
#2 IdleTask::run (this=0x1428d440, d=...) at src/dispatcher.cc:342
#3 0x00002aaaaaef5d2a in Dispatcher::run (this=0x137b2a80) at src/dispatcher.cc:184
#4 0x00002aaaaaef64ed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28
#5 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 2 (Thread 20939):
#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaef31c8 in wait (this=0x1428ccf0, d=...) at src/syncobject.hh:57
#2 IdleTask::run (this=0x1428ccf0, d=...) at src/dispatcher.cc:342
#3 0x00002aaaaaef5d2a in Dispatcher::run (this=0x137b2fc0) at src/dispatcher.cc:184
#4 0x00002aaaaaef64ed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28
#5 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000030186d44bd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x4651e940 (LWP 15231)):
#0 0x0000003018630265 in raise () from /lib64/libc.so.6
#1 0x0000003018631d10 in abort () from /lib64/libc.so.6
#2 0x00002aaaaaf7a779 in BinaryPacketHandler::implicitResponse (this=0x38fa) at src/couch-kvstore/couch-notifier.cc:41
#3 0x00002aaaaaf77009 in CouchNotifier::handleResponse (this=0x137ed000, res=0x137fc000) at src/couch-kvstore/couch-notifier.cc:216
#4 0x00002aaaaaf794eb in CouchNotifier::processInput (this=0x137ed000) at src/couch-kvstore/couch-notifier.cc:571
#5 0x00002aaaaaf79185 in waitOnce (this=0x137ed000) at src/couch-kvstore/couch-notifier.cc:677
#6 CouchNotifier::selectBucket (this=0x137ed000) at src/couch-kvstore/couch-notifier.cc:725
#7 0x00002aaaaaf7966f in CouchNotifier::processInput (this=0x137ed000) at src/couch-kvstore/couch-notifier.cc:608
#8 0x00002aaaaaf79f75 in waitOnce (this=0x137ed000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:677
#9 CouchNotifier::notify_update (this=0x137ed000, vbs=..., file_version=1, header_offset=4096, cb=...) at src/couch-kvstore/couch-notifier.cc:755
#10 0x00002aaaaaf70163 in CouchKVStore::setVBucketState (this=0x195f6000, vbucketId=0, vbstate=..., vb_change_type=1, newfile=96, notify=true) at src/couch-kvstore/couch-kvstore.cc:745
#11 0x00002aaaaaf71069 in CouchKVStore::snapshotVBuckets (this=0x195f6000, vbstates=Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children
    nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype))
RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >.
std::map with 1 elements) at src/couch-kvstore/couch-kvstore.cc:596
#12 0x00002aaaaaf021f3 in EventuallyPersistentStore::snapshotVBuckets (this=0x2ca5b800, priority=..., shardId=<value optimized out>) at src/ep.cc:760
#13 0x00002aaaaaf51cff in VBSnapshotTask::run (this=<value optimized out>) at src/tasks.cc:78
#14 0x00002aaaaaf371c1 in ExecutorThread::run (this=0x1381dba0) at src/scheduler.cc:153
#15 0x00002aaaaaf378dd in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
#16 0x000000301920673d in start_thread () from /lib64/libpthread.so.0
#17 0x00000030186d44bd in clone () from /lib64/libc.so.6
(gdb) quit
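
When triaging a dump like the one above, it can help to reduce each thread to its innermost frame that carries source information, skipping the libc/libpthread frames. The following is a minimal sketch and not part of the original report; the helper name `innermost_frames` and both regular expressions are assumptions derived only from the gdb output format shown here.

```python
import re

# "Thread 7 (Thread 15234):" or "Thread 1 (Thread 0x4651e940 (LWP 15231)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# "#1 0x00002aaaaaf37051 in wait (this=...)" or "#2 ExecutorThread::run (this=...)"
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def innermost_frames(trace):
    """Map thread number -> (function, location) of its first frame with source info."""
    result = {}
    thread = None
    for line in trace.splitlines():
        m = THREAD_RE.match(line)
        if m:
            thread = int(m.group(1))
            continue
        m = FRAME_RE.match(line)
        if m is None or thread is None or thread in result:
            continue
        # System-library frames end in "from /lib64/..."; application frames
        # end in "at file:line". Only the latter are recorded.
        _, sep, location = line.rpartition(" at ")
        if sep:
            result[thread] = (m.group(1), location)
    return result


# Two threads copied from the backtrace above.
SAMPLE = "\n".join([
    "Thread 4 (Thread 20937):",
    "#0 0x000000301920b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0",
    "#1 0x00002aaaaaf0ebdf in wait (this=0x13770400) at src/syncobject.hh:57",
    "",
    "Thread 1 (Thread 0x4651e940 (LWP 15231)):",
    "#0 0x0000003018630265 in raise () from /lib64/libc.so.6",
    "#1 0x0000003018631d10 in abort () from /lib64/libc.so.6",
    "#2 0x00002aaaaaf7a779 in BinaryPacketHandler::implicitResponse (this=0x38fa) at src/couch-kvstore/couch-notifier.cc:41",
])

for thread_id, (function, location) in innermost_frames(SAMPLE).items():
    print(thread_id, function, location)
```

With this summary the idle threads collapse to their `wait` frames, and the aborting thread stands out by its `couch-notifier.cc` location.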

 Comments   
Comment by Andrei Baranouski [ 20/May/13 ]
I see no test failures at the time the core files were generated. How useful is it for you to have just that core.memcached file?
Comment by Andrei Baranouski [ 20/May/13 ]
I got collect_info from another test:
http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions-P1/210/consoleFull

./testrunner -i /tmp/rebalance_regression.ini wait_timeout=100,get-cbcollect-info=True -t swaprebalance.SwapRebalanceFailedTests.test_add_back_failed_node,replica=2,num-buckets=3,num-swap=1,keys-count=1000000,swap-orchestrator=False
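
The `-t` argument above packs the test name and its parameters into a single comma-separated string. As an illustration only (this is not testrunner's own parser, and `split_test_spec` is a hypothetical helper), such a string could be split like this when comparing runs:

```python
def split_test_spec(spec):
    """Split 'module.Class.test,key=value,...' into (test_name, params)."""
    name, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        # Values are kept as strings; no type conversion is attempted.
        key, _, value = pair.partition("=")
        params[key] = value
    return name, params


# The -t value from the command line above.
SPEC = (
    "swaprebalance.SwapRebalanceFailedTests.test_add_back_failed_node,"
    "replica=2,num-buckets=3,num-swap=1,keys-count=1000000,swap-orchestrator=False"
)

test_name, test_params = split_test_spec(SPEC)
print(test_name)
print(test_params)
```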


root( couchbase)@10.3.121.98: /tmp/core.memcached.6368

collect_info from the test:
https://s3.amazonaws.com/bugdb/jira/MB-8318/43440a18/10.3.121.93-5202013-22-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8318/43440a18/10.3.121.98-5202013-24-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8318/43440a18/10.3.121.94-5202013-24-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8318/43440a18/10.3.121.95-5202013-25-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8318/43440a18/10.3.121.96-5202013-26-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8318/43440a18/10.3.121.97-5202013-26-diag.zip

Comment by Jin Lim [ 20/May/13 ]
Thanks. The above information is enough for us to continue debugging. The crash comes from an assert() in the original mc-couch notifier. I need to contact the original developers of this module to understand whether this is a true error case for the assert or whether we need better handling of it.
Comment by Jin Lim [ 20/May/13 ]
Also, I assume this only occurs at the tail end of your tests, when Couchbase is either shutting down or deleting buckets?
Comment by Jin Lim [ 20/May/13 ]
A fix, http://review.couchbase.org/#/c/26428/2, was just merged to 2.0.2. Unless this reveals another mccouch issue, it should address the original crash here.

Per discussion with Ketaki, I am assigning it to her so QE can coordinate a quick validation of the fix with their original test suite. Thanks much.
Comment by Ketaki Gangal [ 20/May/13 ]
Hi Andrei,

Can you repro this issue with the latest build? Jin has pushed a potential fix to the new build.

thanks,
Ketaki
Comment by Andrei Baranouski [ 21/May/13 ]
I don't see any memcached crashes on 2.0.2-809: http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions-P1/211/consoleFull





[MB-8241] Refactor set_view code Created: 10/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 2.1
Fix Version/s: 2.1
Security Level: Public

Type: Task Priority: Major
Reporter: Volker Mische Assignee: Volker Mische
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Refactor the MapReduce-specific code into its own module. The goal is to share most of the code between the MapReduce and Spatial Views.


 Comments   
Comment by Thuan Nguyen [ 17/May/13 ]
Integrated in github-couchdb-preview #578 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/578/])
    MB-8241: Refactor couch_set_view_updater module (Revision edfebea5c5db0338eecf6a8d0abb547829ca433b)
MB-8241: Refactor couch_set_view_group module (Revision bddf446702dfd1d949c1eb30b01e060f13615a8e)
MB-8241: Rename ViewBtreeStates to just ViewStates (Revision c76d9393d8ec5615836da966dc69a666ebce882e)
MB-8241: Refactor couch_set_view_util module (Revision 326a463af1cc6162705c867794bb4ab44c99d569)
MB-8241: Refactor couch_set_view_compactor module (Revision 3e627d2c8405ec9bf8bee1dbd0d28aa275a5abc9)

     Result = SUCCESS
vmx :
Files :
* src/couch_set_view/Makefile.am
* src/couch_set_view/src/couch_set_view_updater.erl
* src/couch_set_view/include/couch_set_view.hrl
* src/couch_set_view/src/mapreduce_view.erl

vmx :
Files :
* src/couch_set_view/src/couch_set_view_group.erl
* src/couch_set_view/src/couch_set_view.erl
* src/couch_set_view/src/mapreduce_view.erl
* src/couch_set_view/include/couch_set_view.hrl
* src/couch_set_view/src/couch_set_view_util.erl

vmx :
Files :
* src/couch_set_view/src/couch_set_view_util.erl

vmx :
Files :
* src/couch_set_view/src/couch_set_view_util.erl
* src/couch_set_view/src/mapreduce_view.erl

vmx :
Files :
* src/couch_set_view/src/mapreduce_view.erl
* src/couch_set_view/src/couch_set_view_compactor.erl
Comment by Thuan Nguyen [ 17/May/13 ]
Integrated in github-couchdb-preview #579 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/579/])
    MB-8241: Refactor couch_set_view module (Revision aed89e3d1f230f838cec5d0a8a32d5c68ce2cd50)
MB-8241: Move MapReduce View specific stuff into its own record (Revision 880e127299b389586941810d17936b88327ae5b4)
MB-8241: Make make_views_fun clearer (Revision 6da854cec35d1f8ec3757c7430f95026a92aff42)
MB-8241: Move records into common header file (Revision ffef2b2c61ff5d8e66815c2937dd2e0579ffb818)

     Result = SUCCESS
vmx :
Files :
* src/couch_set_view/test/22-compactor-cleanup.t
* src/couch_set_view/test/14-duplicated-keys-per-doc.t
* src/couch_set_view/test/13-progressive-cleanup.t
* src/couch_set_view/test/06-main-compaction.t
* src/couch_set_view/test/17-unindexable-partitions.t
* src/couch_set_view/test/26-multiple-reductions.t
* src/couch_set_view/src/couch_set_view_compactor.erl
* src/couch_set_view/test/20-debug-params.t
* src/couch_set_view/test/21-updater-cleanup.t
* src/couch_set_view/test/24-updater-add-more-passive-partitions.t
* src/couch_set_view/src/mapreduce_view.erl
* src/couch_set_view/test/15-passive-partitions.t
* src/couch_set_view/test/12-errors.t
* src/couch_set_view/test/02-old-index-cleanup.t
* src/couch_set_view/test/18-monitor-partition-updates.t
* src/couch_set_view/test/25-util-stats.t
* src/couch_set_view/test/19-compaction-retry.t
* src/couch_set_view/test/05-replicas-transfer.t
* src/couch_set_view/test/16-pending-transition.t
* src/couch_set_view/test/11-updates-cleanup-many-views.t
* src/couch_set_view/test/09-deletes-cleanup-many-views.t
* src/couch_set_view/test/10-updates-cleanup.t
* src/couch_set_view/test/08-deletes-cleanup.t
* src/couch_set_view/src/couch_set_view_http.erl
* src/couch_set_view/test/04-handle-db-deletes.t
* src/couch_set_view/src/couch_set_view.erl
* src/couch_set_view/test/07-replica-compaction.t
* src/couch_set_view/test/03-db-compaction-file-leaks.t
* src/couch_set_view/test/23-replica-group-missing.t

vmx :
Files :
* src/couch_set_view/src/couch_set_view_compactor.erl
* src/couch_set_view/test/22-compactor-cleanup.t
* src/couch_set_view/include/couch_set_view.hrl
* src/couch_set_view/test/16-pending-transition.t
* src/couch_set_view/test/08-deletes-cleanup.t
* src/couch_set_view/test/17-unindexable-partitions.t
* src/couch_set_view/src/couch_set_view_updater.erl
* src/couch_set_view/src/couch_set_view_util.erl
* src/couch_set_view/src/couch_set_view_http.erl
* src/couch_set_view/src/mapreduce_view.erl
* src/couch_set_view/test/21-updater-cleanup.t
* src/couch_set_view/test/14-duplicated-keys-per-doc.t
* src/couch_set_view/test/19-compaction-retry.t
* src/couch_set_view/test/09-deletes-cleanup-many-views.t
* src/couch_set_view/test/11-updates-cleanup-many-views.t
* src/couch_set_view/test/26-multiple-reductions.t
* src/couch_set_view/test/24-updater-add-more-passive-partitions.t
* src/couch_set_view/src/couch_set_view.erl
* src/couch_set_view/src/couch_set_view_group.erl
* src/couch_set_view/test/10-updates-cleanup.t
* src/couch_set_view/test/05-replicas-transfer.t
* src/couch_set_view/test/15-passive-partitions.t
* src/couch_set_view/src/couch_set_view_mapreduce.erl
* src/couch_set_view/test/13-progressive-cleanup.t

vmx :
Files :
* src/couch_set_view/src/mapreduce_view.erl
* src/couch_set_view/src/couch_set_view_group.erl

vmx :
Files :
* src/couch_set_view/src/couch_set_view_updater.erl
* src/couch_set_view/src/mapreduce_view.erl
* src/couch_set_view/src/couch_set_view_updater.hrl
Comment by Thuan Nguyen [ 21/May/13 ]
Integrated in github-couchdb-preview #581 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/581/])
    MB-8241: Fix broken test (Revision 3bcd3fa8cb32eb7c5015783e094d370bf7893044)

     Result = SUCCESS
vmx :
Files :
* src/couch_set_view/test/18-monitor-partition-updates.t
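
The integration notices in these comments list each change as a ticket key, a message, and a 40-character revision hash. As a rough sketch for pulling those fields out of such a notice (the helper name `parse_changes` and the regular expression are assumptions based only on the lines shown in this export, not on any Jenkins API):

```python
import re

# Matches both "MB-8241: message (Revision <sha>)" and "MB-100 message (Revision <sha>)".
CHANGE_RE = re.compile(r"^\s*(MB-\d+):?\s+(.*) \(Revision ([0-9a-f]{40})\)\s*$")


def parse_changes(lines):
    """Return (ticket, message, revision) tuples for lines that look like change entries."""
    changes = []
    for line in lines:
        m = CHANGE_RE.match(line)
        if m:
            changes.append(m.groups())
    return changes


# Lines copied from notices in this export.
NOTICE = [
    "    MB-8241: Fix broken test (Revision 3bcd3fa8cb32eb7c5015783e094d370bf7893044)",
    "MB-100 Move TAP backfill scheduling log messages to warning level (Revision cbd04724f1e455f62ef50920f5065dfcdedf970b)",
    "     Result = SUCCESS",
]

for ticket, message, revision in parse_changes(NOTICE):
    print(ticket, revision[:8], message)
```

Lines that do not end in a revision hash, such as the `Result =` line, are ignored.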




[MB-8322] Use couch_set_view module for index deletion Created: 20/May/13  Updated: 21/May/13

Status: Open
Project: Couchbase Server
Component/s: ns_server, view-engine
Affects Version/s: 2.1
Fix Version/s: 2.1
Security Level: Public

Type: Improvement Priority: Major
Reporter: Volker Mische Assignee: Volker Mische
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
At the moment ns_server implements its own code to clean up index files. The couch_set_view module implements similar logic. ns_server should just call the logic from couch_set_view.

 Comments   
Comment by Thuan Nguyen [ 21/May/13 ]
Integrated in github-couchdb-preview #581 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/581/])
    MB-8322: Export delete_index_dir/2 for ns_sever (Revision 990ac005ccfe26d8ecea19ea92008c9483455d24)

     Result = SUCCESS
vmx :
Files :
* src/couch_set_view/src/couch_set_view.erl




[MB-100] Bug to track trivial code merges Created: 09/Feb/10  Updated: 21/May/13

Status: Reopened
Project: Couchbase Server
Component/s: None
Affects Version/s: 1.0.3-47
Fix Version/s: .next
Security Level: Public

Type: Bug Priority: Trivial
Reporter: Farshid Ghods Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by Thuan Nguyen [ 16/May/12 ]
Integrated in single-node-1.8.x-windows-64-install #28 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/28/])
    MB-100: Added mixed suv 4-9 conf (Revision ff336943156210bd2a1ccbe0d88ca7d0446b8582)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-suv-4-9.conf
Comment by Thuan Nguyen [ 16/May/12 ]
Integrated in github-ep-engine-2-0 #282 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/282/])
    MB-100 Move TAP backfill scheduling log messages to warning level (Revision cbd04724f1e455f62ef50920f5065dfcdedf970b)
MB-100 Add more logs to TapProducer::setCursorToOpenCheckpoint (Revision 1469270622c14ac5660b1eb39b61851197cffbd8)
MB-100 Promote TAP VB_SET / OPAQUE log messages to warning level (Revision fc50c5c85a684be8962ef9f26249670c209360be)
MB-100 Convert TAP OPAQUE code correctly in the log message (Revision 6676dea2cd8f0303bc1ed230549dfdbb9949b47b)

     Result = SUCCESS
Chiyoung Seo :
Files :
* tapconnection.cc

Chiyoung Seo :
Files :
* tapconnection.cc

Chiyoung Seo :
Files :
* ep_engine.cc

Chiyoung Seo :
Files :
* ep_engine.cc
Comment by Thuan Nguyen [ 17/May/12 ]
Integrated in github-ep-engine-2-0 #283 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/283/])
    MB-100 Remove unused variable from BackfillDiskLoad::callback() (Revision 6e43a285b8c3159f9e959dabd07663901b89f006)

     Result = SUCCESS
Chiyoung Seo :
Files :
* backfill.cc
Comment by Thuan Nguyen [ 17/May/12 ]
Integrated in multi-nodes-18x-centos-failover-replica_3 #15 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-centos-failover-replica_3/15/])
    MB-100: Added mixed suv 4-9 conf (Revision ff336943156210bd2a1ccbe0d88ca7d0446b8582)
MB-100: refactored while loop (Revision 12b42be9b2b567aea8de77321fcf380e0ad04178)
MB-100: added mixed conf file (Revision 2e68c304441b6fe60c5a54e2357f966e5f6a7f9b)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-suv-4-9.conf

Pavel.Paulau :
Files :
* pytests/performance/mcsoda.py

ronnie :
Files :
* conf/perf/mixed-suv-4-10.conf
Comment by Thuan Nguyen [ 17/May/12 ]
Integrated in single-node-1.8.x-windows-64-install #29 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/29/])
    MB-100: refactored while loop (Revision 12b42be9b2b567aea8de77321fcf380e0ad04178)
MB-100: added mixed conf file (Revision 2e68c304441b6fe60c5a54e2357f966e5f6a7f9b)
MB-100: added mixed 4-11 conf (Revision 0b53f23af0232d8e52a350fedb80b9c838d2165c)

     Result = SUCCESS
Pavel.Paulau :
Files :
* pytests/performance/mcsoda.py

ronnie :
Files :
* conf/perf/mixed-suv-4-10.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-11.conf
Comment by Thuan Nguyen [ 17/May/12 ]
Integrated in multi-nodes-18x-windows-64-install #26 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-windows-64-install/26/])
    MB-100: Added mixed suv 4-9 conf (Revision ff336943156210bd2a1ccbe0d88ca7d0446b8582)
MB-100: refactored while loop (Revision 12b42be9b2b567aea8de77321fcf380e0ad04178)
MB-100: added mixed conf file (Revision 2e68c304441b6fe60c5a54e2357f966e5f6a7f9b)
MB-100: added mixed 4-11 conf (Revision 0b53f23af0232d8e52a350fedb80b9c838d2165c)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-suv-4-9.conf

Pavel.Paulau :
Files :
* pytests/performance/mcsoda.py

ronnie :
Files :
* conf/perf/mixed-suv-4-10.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-11.conf
Comment by Thuan Nguyen [ 17/May/12 ]
Integrated in multi-nodes-18x-windows-64-rebalance-kv #37 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-windows-64-rebalance-kv/37/])
    MB-100: Added mixed suv 4-9 conf (Revision ff336943156210bd2a1ccbe0d88ca7d0446b8582)
MB-100: refactored while loop (Revision 12b42be9b2b567aea8de77321fcf380e0ad04178)
MB-100: added mixed conf file (Revision 2e68c304441b6fe60c5a54e2357f966e5f6a7f9b)
MB-100: added mixed 4-11 conf (Revision 0b53f23af0232d8e52a350fedb80b9c838d2165c)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-suv-4-9.conf

Pavel.Paulau :
Files :
* pytests/performance/mcsoda.py

ronnie :
Files :
* conf/perf/mixed-suv-4-10.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-11.conf
Comment by Thuan Nguyen [ 17/May/12 ]
Integrated in multi-nodes-18x-failover_replica_1_replica_2 #22 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-failover_replica_1_replica_2/22/])
    MB-100: Added mixed suv 4-9 conf (Revision ff336943156210bd2a1ccbe0d88ca7d0446b8582)
MB-100: refactored while loop (Revision 12b42be9b2b567aea8de77321fcf380e0ad04178)
MB-100: added mixed conf file (Revision 2e68c304441b6fe60c5a54e2357f966e5f6a7f9b)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-suv-4-9.conf

Pavel.Paulau :
Files :
* pytests/performance/mcsoda.py

ronnie :
Files :
* conf/perf/mixed-suv-4-10.conf
Comment by Thuan Nguyen [ 18/May/12 ]
Integrated in multi-nodes-18x-failover_replica_1_replica_2 #23 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-failover_replica_1_replica_2/23/])
    MB-100: added mixed 4-11 conf (Revision 0b53f23af0232d8e52a350fedb80b9c838d2165c)
MB-100: added read suv 2-2 (Revision abbff7d50835e80f39fbf2c562e840ff057e0738)
MB-100: fixed inconsistent indents (Revision 2eed71d6a68b0cb2f377a48a414baba3f818ee43)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-suv-4-11.conf

ronnie :
Files :
* conf/perf/read-suv-2-2.conf

Pavel.Paulau :
Files :
* pytests/performance/mcsoda.py
Comment by Thuan Nguyen [ 19/May/12 ]
Integrated in multi-nodes-18x-win-64-failover #13 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-win-64-failover/13/])
    MB-100: Added mixed suv 4-9 conf (Revision ff336943156210bd2a1ccbe0d88ca7d0446b8582)
MB-100: refactored while loop (Revision 12b42be9b2b567aea8de77321fcf380e0ad04178)
MB-100: added mixed conf file (Revision 2e68c304441b6fe60c5a54e2357f966e5f6a7f9b)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-suv-4-9.conf

Pavel.Paulau :
Files :
* pytests/performance/mcsoda.py

ronnie :
Files :
* conf/perf/mixed-suv-4-10.conf
Comment by Thuan Nguyen [ 19/May/12 ]
Integrated in multi-nodes-18x-failover_replica_1_replica_2 #24 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-failover_replica_1_replica_2/24/])
    MB-100: Added read suv conf files (Revision 6aef79871f2937e7cd96ad5e9f77fa4306a97f9f)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/read-suv-2-4.conf
* conf/perf/read-suv-2-3.conf
Comment by Thuan Nguyen [ 22/May/12 ]
Integrated in github-ep-engine-2-0 #290 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/290/])
    MB-100 Set checkpoint parameters to their previous default values (Revision d3767feae3d1cb90ad1d0722e641ed25803403fe)

     Result = SUCCESS
Chiyoung Seo :
Files :
* configuration.json
* checkpoint.hh
Comment by Thuan Nguyen [ 22/May/12 ]
Integrated in github-couchdb-preview #406 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/406/])
    MB-100 Fix argument type in function call (Revision 18c59c9de73b547ed3301ce95304f656c4ab4996)
MB-100 Remove unnecessary debug message (Revision b2e3b0b8e614f2f48d2fc8b86aed6d9fc31b5a03)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couchdb/couch_index_merger.erl

Filipe David Borba Manana :
Files :
* src/couchdb/couch_index_merger.erl
Comment by Thuan Nguyen [ 23/May/12 ]
Integrated in single-node-1.8.x-windows-64-install #32 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/32/])
    MB-100: added mixed chevy conf files (Revision b7b09b503484f705829adfca2f9adba8243db467)
MB-100: revert comments (Revision 7d1fa6f3645b610f9adc00ed4793da3b9a3c9a2e)
MB-100: added mixed-chevy conf (Revision f0dfba422f448efbe3031204bbbc13ebdbf5ef53)
MB-100: enabled stats (Revision e3ccf0eb615d31433434bcf434721b44c82fa0c2)
MB-100: added chevy conf 4 (Revision 51fa6ab02ac28bc12ecac1fa4f90bb91bab2110e)
MB-100: added for and chevy confs (Revision 78acb1cac357d864cfe383fd272441012f51e489)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf

ronnie :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-4.conf

ronnie :
Files :
* resources/perf/chevy.ini
* resources/perf/ford.ini
Comment by Thuan Nguyen [ 23/May/12 ]
Integrated in single-node1.8.x-windows-64-smoke #34 (See [http://qa.hq.northscale.net/job/single-node1.8.x-windows-64-smoke/34/])
    MB-100: added mixed chevy conf files (Revision b7b09b503484f705829adfca2f9adba8243db467)
MB-100: revert comments (Revision 7d1fa6f3645b610f9adc00ed4793da3b9a3c9a2e)
MB-100: added mixed-chevy conf (Revision f0dfba422f448efbe3031204bbbc13ebdbf5ef53)
MB-100: enabled stats (Revision e3ccf0eb615d31433434bcf434721b44c82fa0c2)
MB-100: added chevy conf 4 (Revision 51fa6ab02ac28bc12ecac1fa4f90bb91bab2110e)
MB-100: added for and chevy confs (Revision 78acb1cac357d864cfe383fd272441012f51e489)
MB-100: added new chevy conf file (Revision 9f8b900d94eb7ab1388025c1fa9c07086fa4403f)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf

ronnie :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-4.conf

ronnie :
Files :
* resources/perf/ford.ini
* resources/perf/chevy.ini

ronnie :
Files :
* conf/perf/mixed-chevy-5.conf
Comment by Thuan Nguyen [ 24/May/12 ]
Integrated in single-node-1.8.x-windows-64-install #33 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/33/])
    MB-100: added new chevy conf file (Revision 9f8b900d94eb7ab1388025c1fa9c07086fa4403f)
MB-100: added chevy 6 (Revision 34304e5f9e075e5dfa7e4c0e01131738cf8c4ca6)
MB-100: added chevy conf (Revision 2ac823af6372cf2648836ae4f85d55522b3ff8d0)
MB-100: added mixed suv 4-13 (Revision 6534ea0cdc97e72d8485bc3c7f452796ed5f199f)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-chevy-5.conf

ronnie :
Files :
* conf/perf/mixed-chevy-6.conf

ronnie :
Files :
* conf/perf/mixed-chevy-7.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-13.conf
Comment by Thuan Nguyen [ 24/May/12 ]
Integrated in multi-nodes-18x-windows-64-install #28 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-windows-64-install/28/])
    MB-100: added mixed chevy conf files (Revision b7b09b503484f705829adfca2f9adba8243db467)
MB-100: revert comments (Revision 7d1fa6f3645b610f9adc00ed4793da3b9a3c9a2e)
MB-100: added mixed-chevy conf (Revision f0dfba422f448efbe3031204bbbc13ebdbf5ef53)
MB-100: enabled stats (Revision e3ccf0eb615d31433434bcf434721b44c82fa0c2)
MB-100: added chevy conf 4 (Revision 51fa6ab02ac28bc12ecac1fa4f90bb91bab2110e)
MB-100: added for and chevy confs (Revision 78acb1cac357d864cfe383fd272441012f51e489)
MB-100: added new chevy conf file (Revision 9f8b900d94eb7ab1388025c1fa9c07086fa4403f)
MB-100: added chevy 6 (Revision 34304e5f9e075e5dfa7e4c0e01131738cf8c4ca6)
MB-100: added chevy conf (Revision 2ac823af6372cf2648836ae4f85d55522b3ff8d0)
MB-100: added mixed suv 4-13 (Revision 6534ea0cdc97e72d8485bc3c7f452796ed5f199f)
MB-100: fixed and improved disk stats (Revision 526ac8c45ede54255f896675824f19cf697d79b6)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf

ronnie :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-2.conf
* conf/perf/mixed-chevy-3.conf
* conf/perf/mixed-chevy-1.conf

ronnie :
Files :
* conf/perf/mixed-chevy-4.conf

ronnie :
Files :
* resources/perf/ford.ini
* resources/perf/chevy.ini

ronnie :
Files :
* conf/perf/mixed-chevy-5.conf

ronnie :
Files :
* conf/perf/mixed-chevy-6.conf

ronnie :
Files :
* conf/perf/mixed-chevy-7.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-13.conf

Pavel.Paulau :
Files :
* resources/R/ep1.R
Comment by Thuan Nguyen [ 24/May/12 ]
Integrated in single-node1.8.x-windows-64-smoke #35 (See [http://qa.hq.northscale.net/job/single-node1.8.x-windows-64-smoke/35/])
    MB-100: added chevy 6 (Revision 34304e5f9e075e5dfa7e4c0e01131738cf8c4ca6)
MB-100: added chevy conf (Revision 2ac823af6372cf2648836ae4f85d55522b3ff8d0)
MB-100: added mixed suv 4-13 (Revision 6534ea0cdc97e72d8485bc3c7f452796ed5f199f)
MB-100: fixed and improved disk stats (Revision 526ac8c45ede54255f896675824f19cf697d79b6)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-chevy-6.conf

ronnie :
Files :
* conf/perf/mixed-chevy-7.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-13.conf

Pavel.Paulau :
Files :
* resources/R/ep1.R
Comment by Thuan Nguyen [ 24/May/12 ]
Integrated in single-node-1.8.x-windows-64-install #34 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/34/])
    MB-100: fixed and improved disk stats (Revision 526ac8c45ede54255f896675824f19cf697d79b6)
MB-100: added warmup phase for suv client scripts (Revision 7808325bc92c242975cacb7548c8ce3698e9a4c2)
MB-100: filtered unrealistic active resident ratio. (Revision a57429699f3c54bf9e0e9cec6e9e6621fcf48ac8)
MB-100: fixed disk size comparison (Revision ba5969dab6b8cbd2de9259a567c7e2fadc1ae9ef)
MB-100: fixed missing '-l' option (Revision 2d1513b6c4a7ed7114b08187d3f8120ff6503b63)

     Result = SUCCESS
Pavel.Paulau :
Files :
* resources/R/ep1.R

ronnie :
Files :
* scripts/perf/client

ronnie :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* TestInput.py
Comment by Thuan Nguyen [ 24/May/12 ]
Integrated in single-node1.8.x-windows-64-smoke #36 (See [http://qa.hq.northscale.net/job/single-node1.8.x-windows-64-smoke/36/])
    MB-100: added warmup phase for suv client scripts (Revision 7808325bc92c242975cacb7548c8ce3698e9a4c2)
MB-100: filtered unrealistic active resident ratio. (Revision a57429699f3c54bf9e0e9cec6e9e6621fcf48ac8)
MB-100: fixed disk size comparison (Revision ba5969dab6b8cbd2de9259a567c7e2fadc1ae9ef)
MB-100: fixed missing '-l' option (Revision 2d1513b6c4a7ed7114b08187d3f8120ff6503b63)

     Result = SUCCESS
ronnie :
Files :
* scripts/perf/client

ronnie :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* TestInput.py
Comment by Thuan Nguyen [ 25/May/12 ]
Integrated in multi-nodes-18x-failover_replica_1_replica_2 #28 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-failover_replica_1_replica_2/28/])
    MB-100: added mixed chevy conf files (Revision b7b09b503484f705829adfca2f9adba8243db467)
MB-100: revert comments (Revision 7d1fa6f3645b610f9adc00ed4793da3b9a3c9a2e)
MB-100: added mixed-chevy conf (Revision f0dfba422f448efbe3031204bbbc13ebdbf5ef53)
MB-100: enabled stats (Revision e3ccf0eb615d31433434bcf434721b44c82fa0c2)
MB-100: added chevy conf 4 (Revision 51fa6ab02ac28bc12ecac1fa4f90bb91bab2110e)
MB-100: added for and chevy confs (Revision 78acb1cac357d864cfe383fd272441012f51e489)
MB-100: added new chevy conf file (Revision 9f8b900d94eb7ab1388025c1fa9c07086fa4403f)
MB-100: added chevy 6 (Revision 34304e5f9e075e5dfa7e4c0e01131738cf8c4ca6)
MB-100: added chevy conf (Revision 2ac823af6372cf2648836ae4f85d55522b3ff8d0)
MB-100: added mixed suv 4-13 (Revision 6534ea0cdc97e72d8485bc3c7f452796ed5f199f)
MB-100: fixed and improved disk stats (Revision 526ac8c45ede54255f896675824f19cf697d79b6)
MB-100: added warmup phase for suv client scripts (Revision 7808325bc92c242975cacb7548c8ce3698e9a4c2)
MB-100: filtered unrealistic active resident ratio. (Revision a57429699f3c54bf9e0e9cec6e9e6621fcf48ac8)
MB-100: fixed disk size comparison (Revision ba5969dab6b8cbd2de9259a567c7e2fadc1ae9ef)
MB-100: fixed missing '-l' option (Revision 2d1513b6c4a7ed7114b08187d3f8120ff6503b63)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-chevy-2.conf
* conf/perf/mixed-chevy-1.conf

ronnie :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-4.conf

ronnie :
Files :
* resources/perf/chevy.ini
* resources/perf/ford.ini

ronnie :
Files :
* conf/perf/mixed-chevy-5.conf

ronnie :
Files :
* conf/perf/mixed-chevy-6.conf

ronnie :
Files :
* conf/perf/mixed-chevy-7.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-13.conf

Pavel.Paulau :
Files :
* resources/R/ep1.R

ronnie :
Files :
* scripts/perf/client

ronnie :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* TestInput.py
Comment by Thuan Nguyen [ 26/May/12 ]
Integrated in multi-nodes-online-windows-upgrade-to-181 #11 (See [http://qa.hq.northscale.net/job/multi-nodes-online-windows-upgrade-to-181/11/])
    MB-100: added config file (10M for 8 ddocs/1 view) (Revision a0d22d856037e4f4ad79d43311e677c8f66c7cd6)
MB-100: added mixed chevy 8-1 conf (Revision 4a9818425ece44ec5d1999194c771cffb43b98d8)
MB-100: added extra delay for index phase (Revision 1cc982ec67ea5f628efbaea4def119367a6e423a)

     Result = SUCCESS
Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-1.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py
Comment by Thuan Nguyen [ 26/May/12 ]
Integrated in multi-nodes-18x-centos-failover-replica_3 #20 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-centos-failover-replica_3/20/])
    MB-100: added chevy conf 4-9 (Revision 2a6ed4f34d2c0d80004a74950b1fa8c2c850d213)
MB-100: revised chevy9 (Revision 81fdd681cf3f8ecc0b54a0f1ef7846b676a39616)
MB-100: added config file (10M for 8 ddocs/1 view) (Revision a0d22d856037e4f4ad79d43311e677c8f66c7cd6)
MB-100: added mixed chevy 8-1 conf (Revision 4a9818425ece44ec5d1999194c771cffb43b98d8)
MB-100: added extra delay for index phase (Revision 1cc982ec67ea5f628efbaea4def119367a6e423a)
MB-100: added 'heavy' config file (Revision 9fe562b8b2690b8ba0cbdd1b11a1a969be7b17c0)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-1.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py

Pavel.Paulau :
Files :
* conf/perf/evperf-workload2-heavy.conf
Comment by Thuan Nguyen [ 26/May/12 ]
Integrated in multi-nodes-18x-win-64-failover #15 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-win-64-failover/15/])
    MB-100: added mixed chevy conf files (Revision b7b09b503484f705829adfca2f9adba8243db467)
MB-100: revert comments (Revision 7d1fa6f3645b610f9adc00ed4793da3b9a3c9a2e)
MB-100: added mixed-chevy conf (Revision f0dfba422f448efbe3031204bbbc13ebdbf5ef53)
MB-100: enabled stats (Revision e3ccf0eb615d31433434bcf434721b44c82fa0c2)
MB-100: added chevy conf 4 (Revision 51fa6ab02ac28bc12ecac1fa4f90bb91bab2110e)
MB-100: added for and chevy confs (Revision 78acb1cac357d864cfe383fd272441012f51e489)
MB-100: added new chevy conf file (Revision 9f8b900d94eb7ab1388025c1fa9c07086fa4403f)
MB-100: added chevy 6 (Revision 34304e5f9e075e5dfa7e4c0e01131738cf8c4ca6)
MB-100: added chevy conf (Revision 2ac823af6372cf2648836ae4f85d55522b3ff8d0)
MB-100: added mixed suv 4-13 (Revision 6534ea0cdc97e72d8485bc3c7f452796ed5f199f)
MB-100: fixed and improved disk stats (Revision 526ac8c45ede54255f896675824f19cf697d79b6)
MB-100: added warmup phase for suv client scripts (Revision 7808325bc92c242975cacb7548c8ce3698e9a4c2)
MB-100: filtered unrealistic active resident ratio. (Revision a57429699f3c54bf9e0e9cec6e9e6621fcf48ac8)
MB-100: fixed disk size comparison (Revision ba5969dab6b8cbd2de9259a567c7e2fadc1ae9ef)
MB-100: fixed missing '-l' option (Revision 2d1513b6c4a7ed7114b08187d3f8120ff6503b63)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf

ronnie :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-3.conf
* conf/perf/mixed-chevy-2.conf
* conf/perf/mixed-chevy-1.conf

ronnie :
Files :
* conf/perf/mixed-chevy-4.conf

ronnie :
Files :
* resources/perf/ford.ini
* resources/perf/chevy.ini

ronnie :
Files :
* conf/perf/mixed-chevy-5.conf

ronnie :
Files :
* conf/perf/mixed-chevy-6.conf

ronnie :
Files :
* conf/perf/mixed-chevy-7.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-13.conf

Pavel.Paulau :
Files :
* resources/R/ep1.R

ronnie :
Files :
* scripts/perf/client

ronnie :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* TestInput.py
Comment by Thuan Nguyen [ 26/May/12 ]
Integrated in multi-nodes-18x-windows-64-rebalance-kv #39 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-windows-64-rebalance-kv/39/])
    MB-100: added mixed chevy conf files (Revision b7b09b503484f705829adfca2f9adba8243db467)
MB-100: revert comments (Revision 7d1fa6f3645b610f9adc00ed4793da3b9a3c9a2e)
MB-100: added mixed-chevy conf (Revision f0dfba422f448efbe3031204bbbc13ebdbf5ef53)
MB-100: enabled stats (Revision e3ccf0eb615d31433434bcf434721b44c82fa0c2)
MB-100: added chevy conf 4 (Revision 51fa6ab02ac28bc12ecac1fa4f90bb91bab2110e)
MB-100: added for and chevy confs (Revision 78acb1cac357d864cfe383fd272441012f51e489)
MB-100: added new chevy conf file (Revision 9f8b900d94eb7ab1388025c1fa9c07086fa4403f)
MB-100: added chevy 6 (Revision 34304e5f9e075e5dfa7e4c0e01131738cf8c4ca6)
MB-100: added chevy conf (Revision 2ac823af6372cf2648836ae4f85d55522b3ff8d0)
MB-100: added mixed suv 4-13 (Revision 6534ea0cdc97e72d8485bc3c7f452796ed5f199f)
MB-100: fixed and improved disk stats (Revision 526ac8c45ede54255f896675824f19cf697d79b6)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf

ronnie :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-chevy-3.conf

ronnie :
Files :
* conf/perf/mixed-chevy-3.conf
* conf/perf/mixed-chevy-1.conf
* conf/perf/mixed-chevy-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-4.conf

ronnie :
Files :
* resources/perf/chevy.ini
* resources/perf/ford.ini

ronnie :
Files :
* conf/perf/mixed-chevy-5.conf

ronnie :
Files :
* conf/perf/mixed-chevy-6.conf

ronnie :
Files :
* conf/perf/mixed-chevy-7.conf

ronnie :
Files :
* conf/perf/mixed-suv-4-13.conf

Pavel.Paulau :
Files :
* resources/R/ep1.R
Comment by Thuan Nguyen [ 29/May/12 ]
Integrated in single-node-1.8.x-windows-64-install #35 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/35/])
    MB-100: added chevy conf 4-9 (Revision 2a6ed4f34d2c0d80004a74950b1fa8c2c850d213)
MB-100: revised chevy9 (Revision 81fdd681cf3f8ecc0b54a0f1ef7846b676a39616)
MB-100: added config file (10M for 8 ddocs/1 view) (Revision a0d22d856037e4f4ad79d43311e677c8f66c7cd6)
MB-100: added mixed chevy 8-1 conf (Revision 4a9818425ece44ec5d1999194c771cffb43b98d8)
MB-100: added extra delay for index phase (Revision 1cc982ec67ea5f628efbaea4def119367a6e423a)
MB-100: added 'heavy' config file (Revision 9fe562b8b2690b8ba0cbdd1b11a1a969be7b17c0)
MB-100: rest api for custom logging level (Revision 8dce3fa22d36e7a472d12dc19844277825593402)
MB-100: added loglevel customization in test setUp (Revision 4eff1589a3aee775b68a2ae21ad1f580c72d74d7)
MB-100: added config files with 800K queries (Revision 65b14c15e4a1c413a7df0212fcf21752fcc1c44b)
MB-100: even more config files with 800 queries (Revision 6c87c5f071b389b9f4e142af53ff0cdd66573661)
MB-100: added mixed chevy 8-2 (Revision bff21501bc147f46b45e70d33c0b97002f6750a8)
MB-100: revised chevy 8-2 (Revision b21f66269cfae56ddb95db15ec0b210abb63f874)
MB-100: moved repeating code to helper (Revision 085994361c01571df0dd475d80223592ec7f45d0)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-1.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py

Pavel.Paulau :
Files :
* conf/perf/evperf-workload2-heavy.conf

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* pytests/performance/perf.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg800-10M.conf
* conf/perf/lucky8-4-fg800-10M.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-fg800.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

Pavel.Paulau :
Files :
* lib/membase/helper/cluster_helper.py
* pytests/performance/eperf.py
Comment by Thuan Nguyen [ 29/May/12 ]
Integrated in multi-nodes-18x-windows-64-install #29 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-windows-64-install/29/])
    MB-100: added warmup phase for suv client scripts (Revision 7808325bc92c242975cacb7548c8ce3698e9a4c2)
MB-100: filtered unrealistic active resident ratio. (Revision a57429699f3c54bf9e0e9cec6e9e6621fcf48ac8)
MB-100: fixed disk size comparison (Revision ba5969dab6b8cbd2de9259a567c7e2fadc1ae9ef)
MB-100: fixed missing '-l' option (Revision 2d1513b6c4a7ed7114b08187d3f8120ff6503b63)
MB-100: added chevy conf 4-9 (Revision 2a6ed4f34d2c0d80004a74950b1fa8c2c850d213)
MB-100: revised chevy9 (Revision 81fdd681cf3f8ecc0b54a0f1ef7846b676a39616)
MB-100: added config file (10M for 8 ddocs/1 view) (Revision a0d22d856037e4f4ad79d43311e677c8f66c7cd6)
MB-100: added mixed chevy 8-1 conf (Revision 4a9818425ece44ec5d1999194c771cffb43b98d8)
MB-100: added extra delay for index phase (Revision 1cc982ec67ea5f628efbaea4def119367a6e423a)
MB-100: added 'heavy' config file (Revision 9fe562b8b2690b8ba0cbdd1b11a1a969be7b17c0)
MB-100: rest api for custom logging level (Revision 8dce3fa22d36e7a472d12dc19844277825593402)
MB-100: added loglevel customization in test setUp (Revision 4eff1589a3aee775b68a2ae21ad1f580c72d74d7)
MB-100: added config files with 800K queries (Revision 65b14c15e4a1c413a7df0212fcf21752fcc1c44b)
MB-100: even more config files with 800 queries (Revision 6c87c5f071b389b9f4e142af53ff0cdd66573661)
MB-100: added mixed chevy 8-2 (Revision bff21501bc147f46b45e70d33c0b97002f6750a8)
MB-100: revised chevy 8-2 (Revision b21f66269cfae56ddb95db15ec0b210abb63f874)
MB-100: moved repeating code to helper (Revision 085994361c01571df0dd475d80223592ec7f45d0)

     Result = SUCCESS
ronnie :
Files :
* scripts/perf/client

ronnie :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* resources/R/ep1.R

Pavel.Paulau :
Files :
* TestInput.py

ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-1.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py

Pavel.Paulau :
Files :
* conf/perf/evperf-workload2-heavy.conf

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* pytests/performance/perf.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-2-fg800-10M.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800.conf
* conf/perf/lucky8-4-2-fg800.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py
* lib/membase/helper/cluster_helper.py
Comment by Thuan Nguyen [ 29/May/12 ]
Integrated in single-node1.8.x-windows-64-smoke #37 (See [http://qa.hq.northscale.net/job/single-node1.8.x-windows-64-smoke/37/])
    MB-100: added chevy conf 4-9 (Revision 2a6ed4f34d2c0d80004a74950b1fa8c2c850d213)
MB-100: revised chevy9 (Revision 81fdd681cf3f8ecc0b54a0f1ef7846b676a39616)
MB-100: added config file (10M for 8 ddocs/1 view) (Revision a0d22d856037e4f4ad79d43311e677c8f66c7cd6)
MB-100: added mixed chevy 8-1 conf (Revision 4a9818425ece44ec5d1999194c771cffb43b98d8)
MB-100: added extra delay for index phase (Revision 1cc982ec67ea5f628efbaea4def119367a6e423a)
MB-100: added 'heavy' config file (Revision 9fe562b8b2690b8ba0cbdd1b11a1a969be7b17c0)
MB-100: rest api for custom logging level (Revision 8dce3fa22d36e7a472d12dc19844277825593402)
MB-100: added loglevel customization in test setUp (Revision 4eff1589a3aee775b68a2ae21ad1f580c72d74d7)
MB-100: added config files with 800K queries (Revision 65b14c15e4a1c413a7df0212fcf21752fcc1c44b)
MB-100: even more config files with 800 queries (Revision 6c87c5f071b389b9f4e142af53ff0cdd66573661)
MB-100: added mixed chevy 8-2 (Revision bff21501bc147f46b45e70d33c0b97002f6750a8)
MB-100: revised chevy 8-2 (Revision b21f66269cfae56ddb95db15ec0b210abb63f874)
MB-100: moved repeating code to helper (Revision 085994361c01571df0dd475d80223592ec7f45d0)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-1.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py

Pavel.Paulau :
Files :
* conf/perf/evperf-workload2-heavy.conf

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* pytests/performance/perf.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg800-10M.conf
* conf/perf/lucky8-4-fg800-10M.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-fg800.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py
* lib/membase/helper/cluster_helper.py
Comment by Thuan Nguyen [ 29/May/12 ]
Integrated in multi-nodes-online-windows-upgrade-to-181 #12 (See [http://qa.hq.northscale.net/job/multi-nodes-online-windows-upgrade-to-181/12/])
    MB-100: added 'heavy' config file (Revision 9fe562b8b2690b8ba0cbdd1b11a1a969be7b17c0)
MB-100: rest api for custom logging level (Revision 8dce3fa22d36e7a472d12dc19844277825593402)
MB-100: added loglevel customization in test setUp (Revision 4eff1589a3aee775b68a2ae21ad1f580c72d74d7)
MB-100: added config files with 800K queries (Revision 65b14c15e4a1c413a7df0212fcf21752fcc1c44b)
MB-100: even more config files with 800 queries (Revision 6c87c5f071b389b9f4e142af53ff0cdd66573661)
MB-100: added mixed chevy 8-2 (Revision bff21501bc147f46b45e70d33c0b97002f6750a8)
MB-100: revised chevy 8-2 (Revision b21f66269cfae56ddb95db15ec0b210abb63f874)
MB-100: moved repeating code to helper (Revision 085994361c01571df0dd475d80223592ec7f45d0)
MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)

     Result = SUCCESS
Pavel.Paulau :
Files :
* conf/perf/evperf-workload2-heavy.conf

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* pytests/performance/perf.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-2-fg800-10M.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-fg800.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py
* lib/membase/helper/cluster_helper.py

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py
Comment by Thuan Nguyen [ 29/May/12 ]
Integrated in single-node-offline-upgrade-to-181 #11 (See [http://qa.hq.northscale.net/job/single-node-offline-upgrade-to-181/11/])
    MB-100: added config file (10M for 8 ddocs/1 view) (Revision a0d22d856037e4f4ad79d43311e677c8f66c7cd6)
MB-100: added mixed chevy 8-1 conf (Revision 4a9818425ece44ec5d1999194c771cffb43b98d8)
MB-100: added extra delay for index phase (Revision 1cc982ec67ea5f628efbaea4def119367a6e423a)
MB-100: added 'heavy' config file (Revision 9fe562b8b2690b8ba0cbdd1b11a1a969be7b17c0)
MB-100: rest api for custom logging level (Revision 8dce3fa22d36e7a472d12dc19844277825593402)
MB-100: added loglevel customization in test setUp (Revision 4eff1589a3aee775b68a2ae21ad1f580c72d74d7)
MB-100: added config files with 800K queries (Revision 65b14c15e4a1c413a7df0212fcf21752fcc1c44b)
MB-100: even more config files with 800 queries (Revision 6c87c5f071b389b9f4e142af53ff0cdd66573661)
MB-100: added mixed chevy 8-2 (Revision bff21501bc147f46b45e70d33c0b97002f6750a8)
MB-100: revised chevy 8-2 (Revision b21f66269cfae56ddb95db15ec0b210abb63f874)
MB-100: moved repeating code to helper (Revision 085994361c01571df0dd475d80223592ec7f45d0)
MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)

     Result = UNSTABLE
Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-1.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py

Pavel.Paulau :
Files :
* conf/perf/evperf-workload2-heavy.conf

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* pytests/performance/perf.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-2-fg800-10M.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-fg800.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

Pavel.Paulau :
Files :
* lib/membase/helper/cluster_helper.py
* pytests/performance/eperf.py

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py
Comment by Thuan Nguyen [ 30/May/12 ]
Integrated in multi-nodes-18x-centos-failover-replica_3 #21 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-centos-failover-replica_3/21/])
    MB-100: rest api for custom logging level (Revision 8dce3fa22d36e7a472d12dc19844277825593402)
MB-100: added loglevel customization in test setUp (Revision 4eff1589a3aee775b68a2ae21ad1f580c72d74d7)
MB-100: added config files with 800K queries (Revision 65b14c15e4a1c413a7df0212fcf21752fcc1c44b)
MB-100: even more config files with 800 queries (Revision 6c87c5f071b389b9f4e142af53ff0cdd66573661)
MB-100: added mixed chevy 8-2 (Revision bff21501bc147f46b45e70d33c0b97002f6750a8)
MB-100: revised chevy 8-2 (Revision b21f66269cfae56ddb95db15ec0b210abb63f874)
MB-100: moved repeating code to helper (Revision 085994361c01571df0dd475d80223592ec7f45d0)

     Result = UNSTABLE
Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* pytests/performance/perf.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-2-fg800-10M.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800.conf
* conf/perf/lucky8-4-2-fg800.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

Pavel.Paulau :
Files :
* lib/membase/helper/cluster_helper.py
* pytests/performance/eperf.py
Comment by Thuan Nguyen [ 30/May/12 ]
Integrated in github-couchdb-preview #409 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/409/])
    MB-100 Fix timing issues in test/10-updates-cleanup.t (Revision 2b66ea46fd715773f0c0bdffa83e53ab6c7b7dbc)
MB-100 Set set view tests number of worker threads to 2 (Revision 0fab83ad9a8753efb1059c63609551f0622f9561)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couch_set_view/test/10-updates-cleanup.t

Filipe David Borba Manana :
Files :
* src/couch_set_view/Makefile.am
Comment by Thuan Nguyen [ 30/May/12 ]
Integrated in multi-nodes-18x-failover_replica_1_replica_2 #29 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-failover_replica_1_replica_2/29/])
    MB-100: added chevy conf 4-9 (Revision 2a6ed4f34d2c0d80004a74950b1fa8c2c850d213)
MB-100: revised chevy9 (Revision 81fdd681cf3f8ecc0b54a0f1ef7846b676a39616)
MB-100: added config file (10M for 8 ddocs/1 view) (Revision a0d22d856037e4f4ad79d43311e677c8f66c7cd6)
MB-100: added mixed chevy 8-1 conf (Revision 4a9818425ece44ec5d1999194c771cffb43b98d8)
MB-100: added extra delay for index phase (Revision 1cc982ec67ea5f628efbaea4def119367a6e423a)
MB-100: added 'heavy' config file (Revision 9fe562b8b2690b8ba0cbdd1b11a1a969be7b17c0)
MB-100: rest api for custom logging level (Revision 8dce3fa22d36e7a472d12dc19844277825593402)
MB-100: added loglevel customization in test setUp (Revision 4eff1589a3aee775b68a2ae21ad1f580c72d74d7)
MB-100: added config files with 800K queries (Revision 65b14c15e4a1c413a7df0212fcf21752fcc1c44b)
MB-100: even more config files with 800 queries (Revision 6c87c5f071b389b9f4e142af53ff0cdd66573661)
MB-100: added mixed chevy 8-2 (Revision bff21501bc147f46b45e70d33c0b97002f6750a8)
MB-100: revised chevy 8-2 (Revision b21f66269cfae56ddb95db15ec0b210abb63f874)
MB-100: moved repeating code to helper (Revision 085994361c01571df0dd475d80223592ec7f45d0)

     Result = UNSTABLE
ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-1.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py

Pavel.Paulau :
Files :
* conf/perf/evperf-workload2-heavy.conf

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* pytests/performance/perf.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-2-fg800-10M.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-fg800.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py
* lib/membase/helper/cluster_helper.py
Comment by Thuan Nguyen [ 30/May/12 ]
Integrated in single-node-1.8.x-windows-64-install #36 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/36/])
    MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)
MB-100: allow custom loglevel in testrunner clients (Revision 8d0cf338c7f1f14ddaeac308eb9ff089c5a656b5)
MB-100: eperf couchdb view script. (Revision df823a1f1ce014d274a8717f6f45c7077e52e9c1)
MB-100: added mixed conf (Revision ff547a772ded165186036b41b307e6dc5857bad4)
MB-100: reversed warmup order (Revision a300bcfe5f9035abdafc0a1fba4b664434a0dff2)
MB-100: added mixed-ec2-1 (Revision b83aeeb1e8440917b9bf162ce081754517b466b8)
MB-100: added ec2 cluster ini (Revision 2a6b88c1c034927de21d8064220ad9e4626c5c47)
MB-100: prepare for ec2 cluster run (Revision 3f11ea70dfa182522c1bc8e9fcad0dff1e026e26)
MB-100: fixed bucket name (Revision 4a7e4ea5937c578173e042e95e5f26fd05d92d6b)
MB-100: added warmup conf (Revision f0ad187893a68e7860624195976e311b8094b878)
MB-100: conf file mixed warmup (Revision cd6eafc4ea860238fee307ffe6144b9c82e4be60)

     Result = SUCCESS
Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* scripts/perf/client

ronnie :
Files :
* scripts/perf/view_by_test_time

ronnie :
Files :
* conf/perf/mixed-suv-4-10v2.conf

ronnie :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf
* resources/perf/ec2-1.ini

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf
Comment by Thuan Nguyen [ 30/May/12 ]
Integrated in single-node1.8.x-windows-64-smoke #38 (See [http://qa.hq.northscale.net/job/single-node1.8.x-windows-64-smoke/38/])
    MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)
MB-100: allow custom loglevel in testrunner clients (Revision 8d0cf338c7f1f14ddaeac308eb9ff089c5a656b5)
MB-100: eperf couchdb view script. (Revision df823a1f1ce014d274a8717f6f45c7077e52e9c1)
MB-100: added mixed conf (Revision ff547a772ded165186036b41b307e6dc5857bad4)
MB-100: reversed warmup order (Revision a300bcfe5f9035abdafc0a1fba4b664434a0dff2)
MB-100: added mixed-ec2-1 (Revision b83aeeb1e8440917b9bf162ce081754517b466b8)
MB-100: added ec2 cluster ini (Revision 2a6b88c1c034927de21d8064220ad9e4626c5c47)
MB-100: prepare for ec2 cluster run (Revision 3f11ea70dfa182522c1bc8e9fcad0dff1e026e26)
MB-100: fixed bucket name (Revision 4a7e4ea5937c578173e042e95e5f26fd05d92d6b)
MB-100: added warmup conf (Revision f0ad187893a68e7860624195976e311b8094b878)
MB-100: conf file mixed warmup (Revision cd6eafc4ea860238fee307ffe6144b9c82e4be60)

     Result = SUCCESS
Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* scripts/perf/client

ronnie :
Files :
* scripts/perf/view_by_test_time

ronnie :
Files :
* conf/perf/mixed-suv-4-10v2.conf

ronnie :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf
* resources/perf/ec2-1.ini

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf
Comment by Thuan Nguyen [ 31/May/12 ]
Integrated in multi-nodes-18x-win-64-failover #16 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-win-64-failover/16/])
    MB-100: added chevy conf 4-9 (Revision 2a6ed4f34d2c0d80004a74950b1fa8c2c850d213)
MB-100: revised chevy9 (Revision 81fdd681cf3f8ecc0b54a0f1ef7846b676a39616)
MB-100: added config file (10M for 8 ddocs/1 view) (Revision a0d22d856037e4f4ad79d43311e677c8f66c7cd6)
MB-100: added mixed chevy 8-1 conf (Revision 4a9818425ece44ec5d1999194c771cffb43b98d8)
MB-100: added extra delay for index phase (Revision 1cc982ec67ea5f628efbaea4def119367a6e423a)
MB-100: added 'heavy' config file (Revision 9fe562b8b2690b8ba0cbdd1b11a1a969be7b17c0)
MB-100: rest api for custom logging level (Revision 8dce3fa22d36e7a472d12dc19844277825593402)
MB-100: added loglevel customization in test setUp (Revision 4eff1589a3aee775b68a2ae21ad1f580c72d74d7)
MB-100: added config files with 800K queries (Revision 65b14c15e4a1c413a7df0212fcf21752fcc1c44b)
MB-100: even more config files with 800 queries (Revision 6c87c5f071b389b9f4e142af53ff0cdd66573661)
MB-100: added mixed chevy 8-2 (Revision bff21501bc147f46b45e70d33c0b97002f6750a8)
MB-100: revised chevy 8-2 (Revision b21f66269cfae56ddb95db15ec0b210abb63f874)
MB-100: moved repeating code to helper (Revision 085994361c01571df0dd475d80223592ec7f45d0)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

ronnie :
Files :
* conf/perf/mixed-chevy-9.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-1.conf

Pavel.Paulau :
Files :
* pytests/performance/eperf.py

Pavel.Paulau :
Files :
* conf/perf/evperf-workload2-heavy.conf

Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* pytests/performance/perf.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-2-fg800-10M.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg800.conf
* conf/perf/lucky8-4-2-fg800.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

ronnie :
Files :
* conf/perf/mixed-chevy-8-2.conf

Pavel.Paulau :
Files :
* lib/membase/helper/cluster_helper.py
* pytests/performance/eperf.py
Comment by Thuan Nguyen [ 31/May/12 ]
Integrated in multi-nodes-18x-failover_replica_1_replica_2 #30 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-failover_replica_1_replica_2/30/])
    MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)
MB-100: allow custom loglevel in testrunner clients (Revision 8d0cf338c7f1f14ddaeac308eb9ff089c5a656b5)
MB-100: eperf couchdb view script. (Revision df823a1f1ce014d274a8717f6f45c7077e52e9c1)
MB-100: added mixed conf (Revision ff547a772ded165186036b41b307e6dc5857bad4)
MB-100: reversed warmup order (Revision a300bcfe5f9035abdafc0a1fba4b664434a0dff2)
MB-100: added mixed-ec2-1 (Revision b83aeeb1e8440917b9bf162ce081754517b466b8)
MB-100: added ec2 cluster ini (Revision 2a6b88c1c034927de21d8064220ad9e4626c5c47)
MB-100: prepare for ec2 cluster run (Revision 3f11ea70dfa182522c1bc8e9fcad0dff1e026e26)
MB-100: fixed bucket name (Revision 4a7e4ea5937c578173e042e95e5f26fd05d92d6b)
MB-100: added warmup conf (Revision f0ad187893a68e7860624195976e311b8094b878)
MB-100: conf file mixed warmup (Revision cd6eafc4ea860238fee307ffe6144b9c82e4be60)
MB-100: added conf (Revision 70fd177cafadcec3768508e458048485e9ae660d)
MB-100: removed unnecessary spec names (Revision 5dfc9628f022bd042065ae765b287d3bb4947bc5)
MB-100: moved ns_server loglevel to conf files (Revision e9122fd104044c4f4ae6bd7214fd88873a94fba9)
MB-100: fixed issue with concurrent merge (Revision 982ea73c09bfaabf7169bbc68cbef587335f689d)
MB-100: multiclient test needs shorter start delay (Revision e3f0d6d57708c7c1657bfc9b5f925f63f894a5a7)
MB-100: fixed cluster setup order (Revision 8192551984b755fac5abd5a4e4db4e6aef172d00)
MB-100: added ec2 conf (Revision c6a578ee2b9e6041c2dc3b8cf91307857ba9b440)

     Result = UNSTABLE
Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* scripts/perf/client

ronnie :
Files :
* scripts/perf/view_by_test_time

ronnie :
Files :
* conf/perf/mixed-suv-4-10v2.conf

ronnie :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf
* resources/perf/ec2-1.ini

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-ec2-2.conf
* resources/perf/ec2-2.ini

Pavel.Paulau :
Files :
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-4-2-fg200.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200-10M.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-4-2-fg800-10M.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-4-fg200-10M.conf
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-4-2-15C.conf
* conf/perf/lucky8-4-fg800.conf
* conf/perf/lucky8-4.conf

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-15C.conf

Pavel.Paulau :
Files :
* pytests/performance/perf.py
* pytests/performance/do_cluster.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-1.conf
Comment by Thuan Nguyen [ 31/May/12 ]
Integrated in multi-nodes-18x-centos-failover-replica_3 #22 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-centos-failover-replica_3/22/])
    MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)
MB-100: allow custom loglevel in testrunner clients (Revision 8d0cf338c7f1f14ddaeac308eb9ff089c5a656b5)
MB-100: eperf couchdb view script. (Revision df823a1f1ce014d274a8717f6f45c7077e52e9c1)
MB-100: added mixed conf (Revision ff547a772ded165186036b41b307e6dc5857bad4)
MB-100: reversed warmup order (Revision a300bcfe5f9035abdafc0a1fba4b664434a0dff2)
MB-100: added mixed-ec2-1 (Revision b83aeeb1e8440917b9bf162ce081754517b466b8)
MB-100: added ec2 cluster ini (Revision 2a6b88c1c034927de21d8064220ad9e4626c5c47)
MB-100: prepare for ec2 cluster run (Revision 3f11ea70dfa182522c1bc8e9fcad0dff1e026e26)
MB-100: fixed bucket name (Revision 4a7e4ea5937c578173e042e95e5f26fd05d92d6b)
MB-100: added warmup conf (Revision f0ad187893a68e7860624195976e311b8094b878)
MB-100: conf file mixed warmup (Revision cd6eafc4ea860238fee307ffe6144b9c82e4be60)
MB-100: added conf (Revision 70fd177cafadcec3768508e458048485e9ae660d)
MB-100: removed unnecessary spec names (Revision 5dfc9628f022bd042065ae765b287d3bb4947bc5)
MB-100: moved ns_server loglevel to conf files (Revision e9122fd104044c4f4ae6bd7214fd88873a94fba9)
MB-100: fixed issue with concurrent merge (Revision 982ea73c09bfaabf7169bbc68cbef587335f689d)
MB-100: multiclient test needs shorter start delay (Revision e3f0d6d57708c7c1657bfc9b5f925f63f894a5a7)
MB-100: fixed cluster setup order (Revision 8192551984b755fac5abd5a4e4db4e6aef172d00)
MB-100: added ec2 conf (Revision c6a578ee2b9e6041c2dc3b8cf91307857ba9b440)

     Result = UNSTABLE
Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* scripts/perf/client

ronnie :
Files :
* scripts/perf/view_by_test_time

ronnie :
Files :
* conf/perf/mixed-suv-4-10v2.conf

ronnie :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf
* resources/perf/ec2-1.ini

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* resources/perf/ec2-2.ini
* conf/perf/mixed-ec2-2.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-1.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-4-fg800.conf
* conf/perf/lucky8-4-2-fg200-10M.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-4-2-15C.conf
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-fg200-10M.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-4-2-fg800-10M.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-4-1.conf

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-15C.conf

Pavel.Paulau :
Files :
* pytests/performance/perf.py
* pytests/performance/do_cluster.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-1.conf
Comment by Thuan Nguyen [ 31/May/12 ]
Integrated in single-node-1.8.x-windows-64-install #37 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/37/])
MB-100: added conf (Revision 70fd177cafadcec3768508e458048485e9ae660d)
MB-100: removed unnecessary spec names (Revision 5dfc9628f022bd042065ae765b287d3bb4947bc5)
MB-100: moved ns_server loglevel to conf files (Revision e9122fd104044c4f4ae6bd7214fd88873a94fba9)
MB-100: fixed issue with concurrent merge (Revision 982ea73c09bfaabf7169bbc68cbef587335f689d)
MB-100: multiclient test needs shorter start delay (Revision e3f0d6d57708c7c1657bfc9b5f925f63f894a5a7)
MB-100: fixed cluster setup order (Revision 8192551984b755fac5abd5a4e4db4e6aef172d00)
MB-100: added ec2 conf (Revision c6a578ee2b9e6041c2dc3b8cf91307857ba9b440)
MB-100: revised ec2 ini (Revision 787865a736ddd125f4272014e3685ecd1fb3ba16)
MB-100: parallel compaction is on by default (Revision a0e8ce1eccc9013c62741ed33c0793782eb9e5a2)

     Result = UNSTABLE
ronnie :
Files :
* resources/perf/ec2-2.ini
* conf/perf/mixed-ec2-2.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-4.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4-2-fg200-10M.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-4-2-15C.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-4-fg200-10M.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-4-2-fg800-10M.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-4-fg800.conf

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-15C.conf

Pavel.Paulau :
Files :
* pytests/performance/do_cluster.py
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

Pavel.Paulau :
Files :
* pytests/performance/perf.py
* pytests/performance/perf_defaults.py
Comment by Thuan Nguyen [ 31/May/12 ]
Integrated in github-couchdb-preview #410 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/410/])
MB-100 Remove unnecessary code (Revision 230469d00fe4740bd21f2c25fc4588aaae1c2412)
Revert "MB-100 Remove unnecessary code" (Revision 9e86e87dee306351c91c7538c289863bd6d1bdfb)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couch_set_view/src/couch_set_view_updater.erl

Filipe David Borba Manana :
Files :
* src/couch_set_view/src/couch_set_view_updater.erl
Comment by Thuan Nguyen [ 31/May/12 ]
Integrated in single-node1.8.x-windows-64-smoke #39 (See [http://qa.hq.northscale.net/job/single-node1.8.x-windows-64-smoke/39/])
MB-100: added conf (Revision 70fd177cafadcec3768508e458048485e9ae660d)
MB-100: removed unnecessary spec names (Revision 5dfc9628f022bd042065ae765b287d3bb4947bc5)
MB-100: moved ns_server loglevel to conf files (Revision e9122fd104044c4f4ae6bd7214fd88873a94fba9)
MB-100: fixed issue with concurrent merge (Revision 982ea73c09bfaabf7169bbc68cbef587335f689d)
MB-100: multiclient test needs shorter start delay (Revision e3f0d6d57708c7c1657bfc9b5f925f63f894a5a7)
MB-100: fixed cluster setup order (Revision 8192551984b755fac5abd5a4e4db4e6aef172d00)
MB-100: added ec2 conf (Revision c6a578ee2b9e6041c2dc3b8cf91307857ba9b440)
MB-100: revised ec2 ini (Revision 787865a736ddd125f4272014e3685ecd1fb3ba16)
MB-100: parallel compaction is on by default (Revision a0e8ce1eccc9013c62741ed33c0793782eb9e5a2)

     Result = SUCCESS
ronnie :
Files :
* conf/perf/mixed-ec2-2.conf
* resources/perf/ec2-2.ini

Pavel.Paulau :
Files :
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-5-1.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-fg200-10M.conf
* conf/perf/lucky8-4-fg800.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-2-fg200-10M.conf
* conf/perf/lucky8-4-2-15C.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-4-2-fg800-10M.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-5-1.conf

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-15C.conf

Pavel.Paulau :
Files :
* pytests/performance/perf.py
* pytests/performance/do_cluster.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

Pavel.Paulau :
Files :
* pytests/performance/perf_defaults.py
* pytests/performance/perf.py
Comment by Thuan Nguyen [ 01/Jun/12 ]
Integrated in multi-nodes-18x-win-64-failover #17 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-win-64-failover/17/])
MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)
MB-100: allow custom loglevel in testrunner clients (Revision 8d0cf338c7f1f14ddaeac308eb9ff089c5a656b5)
MB-100: eperf couchdb view script. (Revision df823a1f1ce014d274a8717f6f45c7077e52e9c1)
MB-100: added mixed conf (Revision ff547a772ded165186036b41b307e6dc5857bad4)
MB-100: reversed warmup order (Revision a300bcfe5f9035abdafc0a1fba4b664434a0dff2)
MB-100: added mixed-ec2-1 (Revision b83aeeb1e8440917b9bf162ce081754517b466b8)
MB-100: added ec2 cluster ini (Revision 2a6b88c1c034927de21d8064220ad9e4626c5c47)
MB-100: prepare for ec2 cluster run (Revision 3f11ea70dfa182522c1bc8e9fcad0dff1e026e26)
MB-100: fixed bucket name (Revision 4a7e4ea5937c578173e042e95e5f26fd05d92d6b)
MB-100: added warmup conf (Revision f0ad187893a68e7860624195976e311b8094b878)
MB-100: conf file mixed warmup (Revision cd6eafc4ea860238fee307ffe6144b9c82e4be60)
MB-100: added conf (Revision 70fd177cafadcec3768508e458048485e9ae660d)
MB-100: removed unnecessary spec names (Revision 5dfc9628f022bd042065ae765b287d3bb4947bc5)
MB-100: moved ns_server loglevel to conf files (Revision e9122fd104044c4f4ae6bd7214fd88873a94fba9)
MB-100: fixed issue with concurrent merge (Revision 982ea73c09bfaabf7169bbc68cbef587335f689d)
MB-100: multiclient test needs shorter start delay (Revision e3f0d6d57708c7c1657bfc9b5f925f63f894a5a7)
MB-100: fixed cluster setup order (Revision 8192551984b755fac5abd5a4e4db4e6aef172d00)
MB-100: added ec2 conf (Revision c6a578ee2b9e6041c2dc3b8cf91307857ba9b440)
MB-100: revised ec2 ini (Revision 787865a736ddd125f4272014e3685ecd1fb3ba16)
MB-100: parallel compaction is on by default (Revision a0e8ce1eccc9013c62741ed33c0793782eb9e5a2)

     Result = UNSTABLE
Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* scripts/perf/client

ronnie :
Files :
* scripts/perf/view_by_test_time

ronnie :
Files :
* conf/perf/mixed-suv-4-10v2.conf

ronnie :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

ronnie :
Files :
* resources/perf/ec2-1.ini
* conf/perf/mixed-ec2-1.conf

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-ec2-2.conf
* resources/perf/ec2-2.ini

Pavel.Paulau :
Files :
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-4-fg200.conf

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-4-2-fg200-10M.conf
* conf/perf/lucky8-4-2-15C.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-4-2-fg800-10M.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-fg200-10M.conf
* conf/perf/lucky8-4-fg800.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-5-2.conf

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py

Pavel.Paulau :
Files :
* conf/perf/lucky8-4-2-15C.conf

Pavel.Paulau :
Files :
* pytests/performance/do_cluster.py
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

Pavel.Paulau :
Files :
* pytests/performance/perf.py
* pytests/performance/perf_defaults.py
Comment by Thuan Nguyen [ 01/Jun/12 ]
Integrated in multi-nodes-18x-windows-64-rebalance-kv #41 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-windows-64-rebalance-kv/41/])
MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)

     Result = UNSTABLE
Pavel.Paulau :
Files :
* lib/membase/api/rest_client.py

Pavel.Paulau :
Files :
* lib/membase/performance/stats.py
Comment by Thuan Nguyen [ 01/Jun/12 ]
Integrated in single-node-1.8.x-windows-64-install #39 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/39/])
MB-100: fixed typo in parameter type (Revision 44180bb5e6183804060951851d41643d1b6bf627)
MB-100: added ec2 conf (Revision 5a0c60b9ca122c17d359d945d00da21dbf736dbc)
MB-100: refactored testrunner log folders (Revision 5fadd85e36e2c91a189cef1850447482e7f0ffca)
MB-100: added 30-client reb test conf (Revision fe3818e516d02ba6828101863717913a0a2ab5df)

     Result = SUCCESS
Pavel.Paulau :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-2.conf

Pavel.Paulau :
Files :
* testrunner

ronnie :
Files :
* conf/perf/reb-1-30clients.conf
Comment by Thuan Nguyen [ 01/Jun/12 ]
Integrated in single-node1.8.x-windows-64-smoke #40 (See [http://qa.hq.northscale.net/job/single-node1.8.x-windows-64-smoke/40/])
MB-100: fixed typo in parameter type (Revision 44180bb5e6183804060951851d41643d1b6bf627)
MB-100: added ec2 conf (Revision 5a0c60b9ca122c17d359d945d00da21dbf736dbc)
MB-100: refactored testrunner log folders (Revision 5fadd85e36e2c91a189cef1850447482e7f0ffca)
MB-100: added 30-client reb test conf (Revision fe3818e516d02ba6828101863717913a0a2ab5df)

     Result = SUCCESS
Pavel.Paulau :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-2.conf

Pavel.Paulau :
Files :
* testrunner

ronnie :
Files :
* conf/perf/reb-1-30clients.conf
Comment by Thuan Nguyen [ 02/Jun/12 ]
Integrated in github-couchdb-preview #412 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/412/])
MB-100 Make set view tests run faster (Revision c0928adba75b944be65c21151749f5b4d5a53e32)
MB-100 Set number of test jobs to 2 (Revision 4243c8d5e64b597c7be6bb17750a5aa453b427e2)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couch_set_view/test/09-deletes-cleanup-many-views.t
* src/couch_set_view/Makefile.am
* src/couch_set_view/test/05-replicas-transfer.t
* src/couch_set_view/test/15-passive-partitions.t
* src/couch_set_view/test/13-progressive-cleanup.t
* src/couch_set_view/test/11-updates-cleanup-many-views.t
* src/couch_set_view/test/10-updates-cleanup.t
* src/couch_set_view/test/08-deletes-cleanup.t

Filipe David Borba Manana :
Files :
* src/couch_set_view/Makefile.am
Comment by Thuan Nguyen [ 03/Jun/12 ]
Integrated in multi-nodes-18x-win-64-failover #18 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-win-64-failover/18/])
MB-100: fixed typo in parameter type (Revision 44180bb5e6183804060951851d41643d1b6bf627)
MB-100: added ec2 conf (Revision 5a0c60b9ca122c17d359d945d00da21dbf736dbc)
MB-100: refactored testrunner log folders (Revision 5fadd85e36e2c91a189cef1850447482e7f0ffca)
MB-100: added 30-client reb test conf (Revision fe3818e516d02ba6828101863717913a0a2ab5df)

     Result = UNSTABLE
Pavel.Paulau :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-2.conf

Pavel.Paulau :
Files :
* testrunner

ronnie :
Files :
* conf/perf/reb-1-30clients.conf
Comment by Thuan Nguyen [ 04/Jun/12 ]
Integrated in github-couchdb-preview #413 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/413/])
MB-100 Fix occasional test failure (Revision 8290164017eb5a144bb042cb5a0b649f2179ff0c)
MB-100 Cleanup exit message in mailbox (if any) (Revision 7ccaffe23e9fa37a0ccb59db01a90533a241984e)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couch_set_view/test/05-replicas-transfer.t

Filipe David Borba Manana :
Files :
* src/couchdb/couch_util.erl
Comment by Thuan Nguyen [ 05/Jun/12 ]
Integrated in single-node-1.8.x-windows-64-install #40 (See [http://qa.hq.northscale.net/job/single-node-1.8.x-windows-64-install/40/])
MB-100: give background thread time to stop (Revision c88c2dd1b33c952672ea59f03fe1882927123aba)
MB-100: smarter semaphore for stats merge (Revision 4f888dc6a96aafacd2ca2389e86c6ff302a254fb)
MB-100: reduced probability of missing HTTP responses (Revision 2ef5461fe6eaa49fc2e98454f3883520c54dba1e)
MB-100: refactored log names (Revision 58802e04f2c83849553156550b79460877bc94b8)
MB-100: added reb confs (Revision ed20e34809ff06b72b5f11bfa6ced4b8c02e3b75)
MB-100: added ec2 conf (Revision 2e12f1400d123bcb2ee3cd70f79cc782d777585e)
MB-100: fixed missing self (Revision 23c95db4a3455a2a2c9ac29dd1639d23f898c975)
MB-100: removed obsolete argument (Revision f4c434ad68a5102305157f0bfc6ef05ae4f5fced)
MB-100: allow to use parameters from *.conf file during cluster setup (Revision bbd977c0518ebe29b14b5672e3eef6dcd7eb75e8)
MB-100: fixed one more issue with concurrent merge (Revision f3cbc49839adeb010c2f55634827d36554181cb6)
MB-100: fixed missing test_conf argument (Revision 7a3773ade8e9fa114291b20b9b64c2530e055133)

     Result = SUCCESS
pavel.paulau :
Files :
* pytests/performance/eperf.py

pavel.paulau :
Files :
* lib/membase/performance/stats.py

pavel.paulau :
Files :
* pytests/performance/cbsoda.py

pavel.paulau :
Files :
* testrunner

ronnie :
Files :
* conf/perf/reb-1-9clients.conf
* conf/perf/reb-1-15clients.conf

ronnie :
Files :
* conf/perf/mixed-ec2-2-3.conf

ronnie :
Files :
* pytests/performance/perf.py

pavel.paulau :
Files :
* scripts/perf/parent

pavel.paulau :
Files :
* pytests/performance/do_cluster.py
* scripts/perf/parent

pavel.paulau :
Files :
* lib/membase/performance/stats.py

pavel.paulau :
Files :
* scripts/perf/stats
Comment by Thuan Nguyen [ 05/Jun/12 ]
Integrated in multi-nodes-18x-windows-64-install #31 (See [http://qa.hq.northscale.net/job/multi-nodes-18x-windows-64-install/31/])
MB-100: added new stats methods to rest api client (Revision dd99ad8502423c40655a596e266fdc0feed8fabb)
MB-100: replaced redundant code with helpers (Revision 866118474078ec93a8ec60293a7036ff7e7dd6f8)
MB-100: allow custom loglevel in testrunner clients (Revision 8d0cf338c7f1f14ddaeac308eb9ff089c5a656b5)
MB-100: eperf couchdb view script. (Revision df823a1f1ce014d274a8717f6f45c7077e52e9c1)
MB-100: added mixed conf (Revision ff547a772ded165186036b41b307e6dc5857bad4)
MB-100: reversed warmup order (Revision a300bcfe5f9035abdafc0a1fba4b664434a0dff2)
MB-100: added mixed-ec2-1 (Revision b83aeeb1e8440917b9bf162ce081754517b466b8)
MB-100: added ec2 cluster ini (Revision 2a6b88c1c034927de21d8064220ad9e4626c5c47)
MB-100: prepare for ec2 cluster run (Revision 3f11ea70dfa182522c1bc8e9fcad0dff1e026e26)
MB-100: fixed bucket name (Revision 4a7e4ea5937c578173e042e95e5f26fd05d92d6b)
MB-100: added warmup conf (Revision f0ad187893a68e7860624195976e311b8094b878)
MB-100: conf file mixed warmup (Revision cd6eafc4ea860238fee307ffe6144b9c82e4be60)
MB-100: added conf (Revision 70fd177cafadcec3768508e458048485e9ae660d)
MB-100: removed unnecessary spec names (Revision 5dfc9628f022bd042065ae765b287d3bb4947bc5)
MB-100: moved ns_server loglevel to conf files (Revision e9122fd104044c4f4ae6bd7214fd88873a94fba9)
MB-100: fixed issue with concurrent merge (Revision 982ea73c09bfaabf7169bbc68cbef587335f689d)
MB-100: multiclient test needs shorter start delay (Revision e3f0d6d57708c7c1657bfc9b5f925f63f894a5a7)
MB-100: fixed cluster setup order (Revision 8192551984b755fac5abd5a4e4db4e6aef172d00)
MB-100: added ec2 conf (Revision c6a578ee2b9e6041c2dc3b8cf91307857ba9b440)
MB-100: revised ec2 ini (Revision 787865a736ddd125f4272014e3685ecd1fb3ba16)
MB-100: parallel compaction is on by default (Revision a0e8ce1eccc9013c62741ed33c0793782eb9e5a2)
MB-100: fixed typo in parameter type (Revision 44180bb5e6183804060951851d41643d1b6bf627)
MB-100: added ec2 conf (Revision 5a0c60b9ca122c17d359d945d00da21dbf736dbc)
MB-100: refactored testrunner log folders (Revision 5fadd85e36e2c91a189cef1850447482e7f0ffca)
MB-100: added 30-client reb test conf (Revision fe3818e516d02ba6828101863717913a0a2ab5df)
MB-100: give background thread time to stop (Revision c88c2dd1b33c952672ea59f03fe1882927123aba)
MB-100: smarter semaphore for stats merge (Revision 4f888dc6a96aafacd2ca2389e86c6ff302a254fb)
MB-100: reduced probability of missing HTTP responses (Revision 2ef5461fe6eaa49fc2e98454f3883520c54dba1e)
MB-100: refactored log names (Revision 58802e04f2c83849553156550b79460877bc94b8)
MB-100: added reb confs (Revision ed20e34809ff06b72b5f11bfa6ced4b8c02e3b75)
MB-100: added ec2 conf (Revision 2e12f1400d123bcb2ee3cd70f79cc782d777585e)
MB-100: fixed missing self (Revision 23c95db4a3455a2a2c9ac29dd1639d23f898c975)
MB-100: removed obsolete argument (Revision f4c434ad68a5102305157f0bfc6ef05ae4f5fced)
MB-100: allow to use parameters from *.conf file during cluster setup (Revision bbd977c0518ebe29b14b5672e3eef6dcd7eb75e8)
MB-100: fixed one more issue with concurrent merge (Revision f3cbc49839adeb010c2f55634827d36554181cb6)
MB-100: fixed missing test_conf argument (Revision 7a3773ade8e9fa114291b20b9b64c2530e055133)

     Result = SUCCESS
pavel.paulau :
Files :
* lib/membase/api/rest_client.py

pavel.paulau :
Files :
* lib/membase/performance/stats.py

pavel.paulau :
Files :
* scripts/perf/client

ronnie :
Files :
* scripts/perf/view_by_test_time

ronnie :
Files :
* conf/perf/mixed-suv-4-10v2.conf

ronnie :
Files :
* pytests/performance/perf.py

ronnie :
Files :
* conf/perf/mixed-ec2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

ronnie :
Files :
* resources/perf/ec2-1.ini
* conf/perf/mixed-ec2-1.conf

pavel.paulau :
Files :
* lib/membase/performance/stats.py

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-warmup-4-10v2.conf

ronnie :
Files :
* conf/perf/mixed-ec2-2.conf
* resources/perf/ec2-2.ini

pavel.paulau :
Files :
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-4-2-fg200.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-5-2.conf

pavel.paulau :
Files :
* conf/perf/lucky8-4-1-fg200.conf
* conf/perf/lucky8-4-1.conf
* conf/perf/lucky8-1.conf
* conf/perf/lucky8-3.conf
* conf/perf/lucky8-4.conf
* conf/perf/lucky8-4-fg800-10M.conf
* conf/perf/lucky8-4-2-fg800.conf
* conf/perf/lucky8-4-3-fg200.conf
* conf/perf/lucky8-4-2-fg200-10M.conf
* conf/perf/lucky8-2.conf
* conf/perf/lucky8-5-2.conf
* conf/perf/lucky8-5-1-fg200.conf
* conf/perf/lucky8-5-1.conf
* conf/perf/lucky8-4-fg800.conf
* conf/perf/lucky8-6-1-fg200.conf
* conf/perf/lucky8-4-2-15C.conf
* conf/perf/lucky8-4-fg200-10M.conf
* conf/perf/lucky8-4-2-fg800-10M.conf
* conf/perf/lucky8-4-fg200.conf
* conf/perf/lucky8-4-2-fg200.conf

pavel.paulau :
Files :
* lib/membase/performance/stats.py

pavel.paulau :
Files :
* conf/perf/lucky8-4-2-15C.conf

pavel.paulau :
Files :
* pytests/performance/perf.py
* pytests/performance/do_cluster.py

ronnie :
Files :
* conf/perf/mixed-ec2-2-1.conf

ronnie :
Files :
* resources/perf/ec2-1.ini

pavel.paulau :
Files :
* pytests/performance/perf_defaults.py
* pytests/performance/perf.py

pavel.paulau :
Files :
* pytests