[MB-6714] ep-engine needs around 1 seconds to mark empty and closed checkpoint as persisted Created: 24/Sep/12 Updated: 26/Oct/12 Resolved: 25/Sep/12 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Security Level: | Public |
| Type: | Bug | Priority: | Critical |
| Reporter: | Aleksey Kondratenko | Assignee: | Chiyoung Seo |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
I'm pretty sure this was not a case up until very recently. Suspect this is caused by recent XDCR changes where we enabled closing of empty checkpoints. (BTW there's no need IMHO)
Steps to reproduce: * grab everything latest and enable consistent views * rebalance 1-2 or 1->4 * observe how slow it is * in logs of ns_server you would see a lot of this: [rebalance:info,2012-09-25T6:54:32.463,n_0@192.168.1.2:<0.18358.0>:janitor_agent:get_replication_persistence_checkpoint_id:470]default: Doing get_replication_persistence_checkpoint_id call for vbucket 127 on n_0@192.168.1.2 [ns_server:debug,2012-09-25T6:54:32.463,n_0@192.168.1.2:'janitor_agent-default':janitor_agent:handle_call:661]After creating new checkpoint here's what we have: {2,3,4} [ns_server:debug,2012-09-25T6:54:32.464,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.464,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.475,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.475,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.486,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.486,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.497,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.497,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.508,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.508,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.519,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.519,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.550,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.550,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.581,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.581,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.612,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.612,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.643,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.643,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.674,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.674,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.775,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.775,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.876,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.876,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.977,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:32.977,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.078,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.078,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.179,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.179,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.280,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.280,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.381,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.381,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2 [ns_server:debug,2012-09-25T6:54:33.482,n_3@192.168.1.2:'janitor_agent-default':janitor_agent:handle_info:677]Got done message from subprocess: <0.12508.0> (ok) P.S.: given that it'll greatly slow down testing I'd raise this to blocker, but not sure people would understand me so set this to just critical. |
| Comments |
| Comment by Chiyoung Seo [ 25/Sep/12 ] |
|
I understood what you mentioned above. I pushed a couple of changes to address this issue: http://review.couchbase.org/#/c/21067/ http://review.couchbase.org/#/c/21068/ Please reopen the bug if you still have the same issue. |
| Comment by Thuan Nguyen [ 25/Sep/12 ] |
|
Integrated in github-ep-engine-2-0 #434 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/434/]) Result = SUCCESS Chiyoung Seo : Files : * src/ep_engine.cc Chiyoung Seo : Files : * src/checkpoint.cc |
| Comment by Junyi Xie [ 26/Sep/12 ] |
|
http://review.couchbase.org/#/c/21086/
|
| Comment by Thuan Nguyen [ 27/Sep/12 ] |
|
Integrated in github-ep-engine-2-0 #435 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/435/]) Result = SUCCESS Chiyoung Seo : Files : * src/ep_engine.cc |
| Comment by Karen Zeller [ 26/Oct/12 ] |
|
RN: Couchbase Server had experienced a one second delay to mark empty
and closed checkpoings on a persistence queue. Performance improvements made to eliminate delay. |