Details
-
Type:
Bug
-
Status:
Resolved
-
Priority:
Blocker
-
Resolution: Fixed
-
Affects Version/s: 2.0
-
Fix Version/s: 2.0-beta-2
-
Component/s: ns_server
-
Security Level: Public
-
Labels:
-
Environment:centos
Description
For build 1782, rebalance took ~6800 seconds (8M workload , 2 -> 4 nodes with consistent view turned on by default).
By contrast, build 1723, rebalance took ~500 seconds for the same workload, without consistent views.
Please refer to attached screenshot.
By contrast, build 1723, rebalance took ~500 seconds for the same workload, without consistent views.
Please refer to attached screenshot.
-
- reb-1-2.loop_2.0.0-1782-rel-enterprise_2.0.0-1782-rel-enterprise_terra_Oct-01-2012_18-57-11.pdf
- 02/Oct/12 4:50 PM
- 2.26 MB
- Ronnie Sun
-
- diag3.bz2
- 04/Oct/12 10:41 AM
- 1.72 MB
- Aleksey Kondratenko
-
Hide
- diag-2.zip
- 02/Oct/12 4:46 PM
- 17.72 MB
- Ronnie Sun
-
- ns-diag-20121002144053.txt 166.31 MB
- __MACOSX/._ns-diag-20121002144053.txt 0.4 kB
- ns-diag-20121002144311.txt 68.98 MB
- __MACOSX/._ns-diag-20121002144311.txt 0.4 kB
-
- diag0.bz2
- 04/Oct/12 10:41 AM
- 1.98 MB
- Aleksey Kondratenko
-
Hide
- diag.zip
- 02/Oct/12 4:46 PM
- 16.46 MB
- Ronnie Sun
-
- ns-diag-20121002144332.txt 159.67 MB
- __MACOSX/._ns-diag-20121002144332.txt 0.4 kB
- ns-diag-20121002144351.txt 65.83 MB
- __MACOSX/._ns-diag-20121002144351.txt 0.4 kB
-
- Screen Shot 2012-10-02 at 2.33.34 PM.png
- 133 kB
- 02/Oct/12 4:46 PM
-
- reb-1782-cons-views.png
- 176 kB
- 02/Oct/12 2:22 PM
Activity
- All
- Comments
- Work Log
- History
- Activity
- Gerrit Reviews
Hide
Tommie McAfee
added a comment -
Here's what I see on 9 node cluster without indexes at build-1781: (7 million items, write-queue always at 0 before rebalancing.)
Consistent Views On
---
Rebalance out 3 nodes: 3m:16s, Rebalance in 3 nodes: 3m:3s
Consistent Views Off
---
Rebalance out 3 nodes: 2m:45s, Rebalance in 3 nodes: 1m:55s
Consistent Views On
---
Rebalance out 3 nodes: 3m:16s, Rebalance in 3 nodes: 3m:3s
Consistent Views Off
---
Rebalance out 3 nodes: 2m:45s, Rebalance in 3 nodes: 1m:55s
Show
Tommie McAfee
added a comment - Here's what I see on 9 node cluster without indexes at build-1781: (7 million items, write-queue always at 0 before rebalancing.)
Consistent Views On
---
Rebalance out 3 nodes: 3m:16s, Rebalance in 3 nodes: 3m:3s
Consistent Views Off
---
Rebalance out 3 nodes: 2m:45s, Rebalance in 3 nodes: 1m:55s
Hide
Ronnie Sun
added a comment -
build 1723 took 6926 seconds, please see the attached screenshot and diag logs.
Show
Ronnie Sun
added a comment - build 1723 took 6926 seconds, please see the attached screenshot and diag logs.
Hide
Ronnie Sun
added a comment -
perf test case:
2-4 node rebalance, with ~7M items.
Mixed workload fired from 30 clients (each client has ~300 ops/sec, so the total is ~9k) during rebalance (Read:Insert:Update:Delete Ratio = 50:4:40:6.)
Also attached the perf graph
2-4 node rebalance, with ~7M items.
Mixed workload fired from 30 clients (each client has ~300 ops/sec, so the total is ~9k) during rebalance (Read:Insert:Update:Delete Ratio = 50:4:40:6.)
Also attached the perf graph
Show
Ronnie Sun
added a comment - perf test case:
2-4 node rebalance, with ~7M items.
Mixed workload fired from 30 clients (each client has ~300 ops/sec, so the total is ~9k) during rebalance (Read:Insert:Update:Delete Ratio = 50:4:40:6.)
Also attached the perf graph
Hide
Farshid Ghods
added a comment -
more info from Andrei
I'm slightly confused. rebalance is even better with ebabled consistent view
tested on build 2.0.0-1790-rel:
./testrunner -i /tmp/andrei_rebalance.ini get-logs=True -t rebalance.rebalancein.RebalanceInTests.rebalance_in_with_ops,nodes_in=3,replicas=1,items=1000000,max_verify=100000,sasl_buckets=1,standard_buckets=1,disabled_consistent_view=False/True
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, true)
rebalancing was completed with progress: 100% in 1588.21907401 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/50/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, true)
rebalancing was completed with progress: 100% in 2004.50655818 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/52/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, false).
rebalancing was completed with progress: 100% in 1457.30416107 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/51/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, false)
rebalancing was completed with progress: 100% in 1837.2985878 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/53/console
I'm slightly confused. rebalance is even better with ebabled consistent view
tested on build 2.0.0-1790-rel:
./testrunner -i /tmp/andrei_rebalance.ini get-logs=True -t rebalance.rebalancein.RebalanceInTests.rebalance_in_with_ops,nodes_in=3,replicas=1,items=1000000,max_verify=100000,sasl_buckets=1,standard_buckets=1,disabled_consistent_view=False/True
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, true)
rebalancing was completed with progress: 100% in 1588.21907401 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/50/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, true)
rebalancing was completed with progress: 100% in 2004.50655818 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/52/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, false).
rebalancing was completed with progress: 100% in 1457.30416107 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/51/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, false)
rebalancing was completed with progress: 100% in 1837.2985878 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/53/console
Show
Farshid Ghods
added a comment - more info from Andrei
I'm slightly confused. rebalance is even better with ebabled consistent view
tested on build 2.0.0-1790-rel:
./testrunner -i /tmp/andrei_rebalance.ini get-logs=True -t rebalance.rebalancein.RebalanceInTests.rebalance_in_with_ops,nodes_in=3,replicas=1,items=1000000,max_verify=100000,sasl_buckets=1,standard_buckets=1,disabled_consistent_view=False/True
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, true)
rebalancing was completed with progress: 100% in 1588.21907401 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/50/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, true)
rebalancing was completed with progress: 100% in 2004.50655818 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/52/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, false).
rebalancing was completed with progress: 100% in 1457.30416107 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/51/console
Enabling/disabling consistent-views during rebalance: ns_config:set(index_aware_rebalance_disabled, false)
rebalancing was completed with progress: 100% in 1837.2985878 sec
http://qa.hq.northscale.net/view/All/job/andrei-rebalance-regressions/53/console
Hide
Aleksey Kondratenko
added a comment -
See above diag0 is from master node of 3->4 rebalance and diag3 is from node being rebalanced in.
I just ran the following experiment:
* 3 million small items (about 100 bytes) are loaded into bucket having 256 vbuckets in 3 nodes cluster. Everything is persisted to disk. And everything is memory-resident.
* added 4th node
* started very tiny load - ~130 sets (all of them are updates) per second
* and clicked rebalance button
No indexes are defined.
I was able to observe much slower (30 seconds versus 4 minutes) rebalance speed compared to both:
* index aware rebalance disabled
* and rebalance without any load
Note when index aware rebalance is enabled, even without indexes defined we're waiting for persistence of items on vbucket move destination node. In fact this happens twice in order to enable consistent views. We do that by creating checkpoint and waiting for it to be persisted.
In logs attached above I'm seeing that first wait for persistence is slow (we expect that as it includes backfill).
But after first checkpoint is persisted we create another one and wait for it to be persisted. Because our load is very small we know that new checkpoint can have less then 10 items. So we don't expect replicating and persisting it to take any significant time. But logs indicate that this is not the case. I.e.
[ns_server:debug,2012-10-04T8:22:07.685,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.696,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.707,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.718,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.729,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.737,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.740,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.771,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.802,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.833,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.838,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.864,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.895,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.939,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.996,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.040,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.097,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.141,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.198,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.242,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.299,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.343,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.417,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.444,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.518,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.545,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.619,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.646,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.720,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.747,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.821,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.848,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.922,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.949,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.023,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.050,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.124,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.151,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.225,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.252,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.326,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.353,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.427,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.454,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.528,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.555,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.629,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.656,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.730,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.757,n_3@10.17.23.211:'janitor_agent-default':janitor_agent:handle_info:677]Got done message from subprocess: <0.12248.2> (ok)
[ns_server:debug,2012-10-04T8:22:09.758,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.770,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.781,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.792,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.803,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.814,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.831,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.845,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.876,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.907,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.932,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.938,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.969,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.033,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.070,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.134,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.171,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.235,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.272,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.336,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.373,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.437,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.474,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.538,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.575,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.639,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.676,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.740,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.777,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.841,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.878,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.942,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.979,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.043,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.080,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.144,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.181,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.245,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.282,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.346,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.383,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.447,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.484,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.548,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.585,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.649,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.686,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.750,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.787,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.851,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.888,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.952,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.989,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.053,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.090,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.154,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.191,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.255,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.292,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.356,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.393,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.457,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.494,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.558,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.595,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.659,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.696,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.760,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.797,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.861,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
You can see that situation where we see checkpoint 35 marked as persisted and us waiting for checkpoint 36 to become persisted lasts about 5 seconds.
I just ran the following experiment:
* 3 million small items (about 100 bytes) are loaded into bucket having 256 vbuckets in 3 nodes cluster. Everything is persisted to disk. And everything is memory-resident.
* added 4th node
* started very tiny load - ~130 sets (all of them are updates) per second
* and clicked rebalance button
No indexes are defined.
I was able to observe much slower (30 seconds versus 4 minutes) rebalance speed compared to both:
* index aware rebalance disabled
* and rebalance without any load
Note when index aware rebalance is enabled, even without indexes defined we're waiting for persistence of items on vbucket move destination node. In fact this happens twice in order to enable consistent views. We do that by creating checkpoint and waiting for it to be persisted.
In logs attached above I'm seeing that first wait for persistence is slow (we expect that as it includes backfill).
But after first checkpoint is persisted we create another one and wait for it to be persisted. Because our load is very small we know that new checkpoint can have less then 10 items. So we don't expect replicating and persisting it to take any significant time. But logs indicate that this is not the case. I.e.
[ns_server:debug,2012-10-04T8:22:07.685,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.696,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.707,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.718,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.729,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.737,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.740,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.771,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.802,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.833,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.838,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.864,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.895,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.939,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.996,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.040,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.097,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.141,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.198,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.242,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.299,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.343,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.417,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.444,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.518,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.545,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.619,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.646,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.720,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.747,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.821,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.848,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.922,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.949,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.023,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.050,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.124,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.151,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.225,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.252,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.326,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.353,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.427,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.454,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.528,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.555,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.629,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.656,n_3@10.17.23.211:<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.730,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.757,n_3@10.17.23.211:'janitor_agent-default':janitor_agent:handle_info:677]Got done message from subprocess: <0.12248.2> (ok)
[ns_server:debug,2012-10-04T8:22:09.758,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.770,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.781,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.792,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.803,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.814,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.831,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.845,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.876,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.907,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.932,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.938,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.969,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.033,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.070,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.134,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.171,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.235,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.272,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.336,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.373,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.437,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.474,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.538,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.575,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.639,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.676,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.740,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.777,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.841,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.878,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.942,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.979,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.043,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.080,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.144,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.181,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.245,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.282,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.346,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.383,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.447,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.484,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.548,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.585,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.649,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.686,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.750,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.787,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.851,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.888,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.952,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.989,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.053,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.090,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.154,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.191,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.255,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.292,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.356,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.393,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.457,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.494,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.558,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.595,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.659,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.696,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.760,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.797,n_3@10.17.23.211:<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.861,n_3@10.17.23.211:<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
You can see that situation where we see checkpoint 35 marked as persisted and us waiting for checkpoint 36 to become persisted lasts about 5 seconds.
Show
Aleksey Kondratenko
added a comment - See above diag0 is from master node of 3->4 rebalance and diag3 is from node being rebalanced in.
I just ran the following experiment:
* 3 million small items (about 100 bytes) are loaded into bucket having 256 vbuckets in 3 nodes cluster. Everything is persisted to disk. And everything is memory-resident.
* added 4th node
* started very tiny load - ~130 sets (all of them are updates) per second
* and clicked rebalance button
No indexes are defined.
I was able to observe much slower (30 seconds versus 4 minutes) rebalance speed compared to both:
* index aware rebalance disabled
* and rebalance without any load
Note when index aware rebalance is enabled, even without indexes defined we're waiting for persistence of items on vbucket move destination node. In fact this happens twice in order to enable consistent views. We do that by creating checkpoint and waiting for it to be persisted.
In logs attached above I'm seeing that first wait for persistence is slow (we expect that as it includes backfill).
But after first checkpoint is persisted we create another one and wait for it to be persisted. Because our load is very small we know that new checkpoint can have less then 10 items. So we don't expect replicating and persisting it to take any significant time. But logs indicate that this is not the case. I.e.
[ns_server:debug,2012-10-04T8:22:07.685, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.696, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.707, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.718, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.729, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.737, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.740, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.771, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.802, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.833, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.838, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.864, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.895, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:07.939, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:07.996, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.040, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.097, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.141, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.198, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.242, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.299, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.343, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.417, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.444, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.518, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.545, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.619, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.646, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.720, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.747, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.821, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.848, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:08.922, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:08.949, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.023, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.050, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.124, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.151, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.225, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.252, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.326, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.353, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.427, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.454, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.528, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.555, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.629, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.656, n_3@10.17.23.211 :<0.12248.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 37 on 212, now 0
[ns_server:debug,2012-10-04T8:22:09.730, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.757, n_3@10.17.23.211 :'janitor_agent-default':janitor_agent:handle_info:677]Got done message from subprocess: <0.12248.2> (ok)
[ns_server:debug,2012-10-04T8:22:09.758, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.770, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.781, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.792, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.803, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.814, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.831, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.845, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.876, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.907, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.932, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:09.938, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:09.969, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.033, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.070, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.134, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.171, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.235, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.272, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.336, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.373, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.437, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.474, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.538, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.575, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.639, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.676, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.740, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.777, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.841, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.878, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:10.942, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:10.979, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.043, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.080, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.144, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.181, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.245, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.282, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.346, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.383, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.447, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.484, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.548, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.585, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.649, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.686, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.750, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.787, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.851, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.888, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:11.952, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:11.989, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.053, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.090, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.154, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.191, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.255, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.292, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.356, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.393, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.457, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.494, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.558, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.595, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.659, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.696, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.760, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
[ns_server:debug,2012-10-04T8:22:12.797, n_3@10.17.23.211 :<0.12283.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 38 on 212, now 37
[ns_server:debug,2012-10-04T8:22:12.861, n_3@10.17.23.211 :<0.12275.2>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 36 on 255, now 35
You can see that situation where we see checkpoint 35 marked as persisted and us waiting for checkpoint 36 to become persisted lasts about 5 seconds.
Hide
Aleksey Kondratenko
added a comment -
See my analysis above. Feel free to approach me with any questions.
Show
Aleksey Kondratenko
added a comment - See my analysis above. Feel free to approach me with any questions.
Hide
Chiyoung Seo
added a comment -
I've debugged the following cases:
(1) rebalance from one to two nodes took 20 minutes with the consistent view enabled, but without any loads from the client
(2) rebalance from one to two nodes took 2 hours with the consistence view enabled and with small loads (2K SET ops / sec) from the client
For the further debugging, I removed couchstore API calls (couchstore_save_docs, couchstore_commit) from the ep-engine flusher execution path to see if this is caused by the slow write throughput from couchstore. I saw that the rebalance with the same load (2K SET ops / sec) just took 21 minutes.
I also looked at the timings of couchstore_save_docs and couchstore_commit from the above (2) case, and saw that calling those two APIs to persist 10 -15 dirty items per vbucket takes 5 - 10ms on average, which infers 1.5. - 2K drain rate per sec. The disk write queue size usually remained in 5 - 10K, which indicates that a single vbucket takeover frequently takes 4 - 5 sec.
I need to further investigate this issue, but it seems to me that this is not caused by the ep-engine layer at this time.
(1) rebalance from one to two nodes took 20 minutes with the consistent view enabled, but without any loads from the client
(2) rebalance from one to two nodes took 2 hours with the consistence view enabled and with small loads (2K SET ops / sec) from the client
For the further debugging, I removed couchstore API calls (couchstore_save_docs, couchstore_commit) from the ep-engine flusher execution path to see if this is caused by the slow write throughput from couchstore. I saw that the rebalance with the same load (2K SET ops / sec) just took 21 minutes.
I also looked at the timings of couchstore_save_docs and couchstore_commit from the above (2) case, and saw that calling those two APIs to persist 10 -15 dirty items per vbucket takes 5 - 10ms on average, which infers 1.5. - 2K drain rate per sec. The disk write queue size usually remained in 5 - 10K, which indicates that a single vbucket takeover frequently takes 4 - 5 sec.
I need to further investigate this issue, but it seems to me that this is not caused by the ep-engine layer at this time.
Show
Chiyoung Seo
added a comment - I've debugged the following cases:
(1) rebalance from one to two nodes took 20 minutes with the consistent view enabled, but without any loads from the client
(2) rebalance from one to two nodes took 2 hours with the consistence view enabled and with small loads (2K SET ops / sec) from the client
For the further debugging, I removed couchstore API calls (couchstore_save_docs, couchstore_commit) from the ep-engine flusher execution path to see if this is caused by the slow write throughput from couchstore. I saw that the rebalance with the same load (2K SET ops / sec) just took 21 minutes.
I also looked at the timings of couchstore_save_docs and couchstore_commit from the above (2) case, and saw that calling those two APIs to persist 10 -15 dirty items per vbucket takes 5 - 10ms on average, which infers 1.5. - 2K drain rate per sec. The disk write queue size usually remained in 5 - 10K, which indicates that a single vbucket takeover frequently takes 4 - 5 sec.
I need to further investigate this issue, but it seems to me that this is not caused by the ep-engine layer at this time.
Hide
Aleksey Kondratenko
added a comment -
Interesting data, Chiyoung. I'd like to point out however that while your finding of 5-10 ms per commit looks like disk's rotational delay, in practice I know we're running with barriers disabled and "short" fsyns (which is the case here) do not really wait for disk. So there could be something else that's slowing us down somewhere. In mccouch or couchstore, for example.
Show
Aleksey Kondratenko
added a comment - Interesting data, Chiyoung. I'd like to point out however that while your finding of 5-10 ms per commit looks like disk's rotational delay, in practice I know we're running with barriers disabled and "short" fsyns (which is the case here) do not really wait for disk. So there could be something else that's slowing us down somewhere. In mccouch or couchstore, for example.
Hide
Per discussion today, I did more tests and observed the followings:
1) I commented out the fsync calls in couchstore_commit, and the rebalance time with the constant load from the client took 25 minutes.
2) I made some changes in the ep-engine so that we can batch writes for all vbuckets and issue couchstore_commit requests at the end of transaction, but didn't see much difference. The rebalance took 2 hours.
So, for the cases where the rebalance takes 2 hours, the wall clock times for couchstore low level system calls (fsync, pwrite, pread) are as follows:
rw:fsSyncTime (2289998 total)
4us - 8us : ( 0.00%) 19
8us - 16us : ( 0.00%) 94
16us - 32us : ( 0.01%) 73
32us - 64us : ( 0.01%) 3
64us - 128us : ( 0.02%) 221
128us - 256us : ( 1.85%) 41935 ##
256us - 512us : ( 3.46%) 36925 #
512us - 1ms : ( 65.99%) 1431971 ############################################################################
1ms - 2ms : ( 89.80%) 545262 #############################
2ms - 4ms : ( 97.81%) 183281 #########
4ms - 8ms : ( 99.60%) 40998 ##
8ms - 16ms : ( 99.88%) 6371
16ms - 32ms : ( 99.93%) 1229
32ms - 65ms : ( 99.95%) 532
65ms - 131ms : ( 99.97%) 457
131ms - 262ms : ( 99.99%) 431
262ms - 524ms : (100.00%) 184
524ms - 1s : (100.00%) 12
rw:fsWriteTime (14744695 total)
0 - 1us : ( 0.00%) 11
1us - 2us : ( 0.00%) 263
2us - 4us : ( 31.85%) 4695419 #######################################
4us - 8us : ( 56.85%) 3686190 ##############################
8us - 16us : ( 76.76%) 2935818 ########################
16us - 32us : ( 91.33%) 2148786 #################
32us - 64us : ( 97.96%) 977121 ########
64us - 128us : ( 99.31%) 198760 #
128us - 256us : ( 99.58%) 41135
256us - 512us : ( 99.82%) 35083
512us - 1ms : ( 99.93%) 15252
1ms - 2ms : ( 99.98%) 8638
2ms - 4ms : (100.00%) 1813
4ms - 8ms : (100.00%) 239
8ms - 16ms : (100.00%) 64
16ms - 32ms : (100.00%) 34
32ms - 65ms : (100.00%) 23
65ms - 131ms : (100.00%) 18
131ms - 262ms : (100.00%) 22
262ms - 524ms : (100.00%) 5
524ms - 1s : (100.00%) 1
rw:fsReadTime (9555100 total)
0 - 1us : ( 0.00%) 6
1us - 2us : ( 0.05%) 5156
2us - 4us : ( 53.68%) 5123773 #################################################################
4us - 8us : ( 86.88%) 3172621 ########################################
8us - 16us : ( 98.03%) 1065233 #############
16us - 32us : ( 99.39%) 130430 #
32us - 64us : ( 99.69%) 28320
64us - 128us : ( 99.83%) 13641
128us - 256us : ( 99.91%) 7176
256us - 512us : ( 99.96%) 4798
512us - 1ms : ( 99.99%) 2674
1ms - 2ms : (100.00%) 982
2ms - 4ms : (100.00%) 243
4ms - 8ms : (100.00%) 26
8ms - 16ms : (100.00%) 8
16ms - 32ms : (100.00%) 2
32ms - 65ms : (100.00%) 3
65ms - 131ms : (100.00%) 5
131ms - 262ms : (100.00%) 3
The wall clock timings for couchstore level APIs couchstore_save_docs and couchstore_commit() are
rw:save_documents (1140612 total)
16us - 32us : ( 0.00%) 1
32us - 64us : ( 0.00%) 3
64us - 128us : ( 0.00%) 53
128us - 256us : ( 7.39%) 84185 #########
256us - 512us : ( 33.63%) 299371 ################################
512us - 1ms : ( 81.68%) 548081 ###########################################################
1ms - 2ms : ( 97.57%) 181242 ###################
2ms - 4ms : ( 99.61%) 23205 ##
4ms - 8ms : ( 99.88%) 3080
8ms - 16ms : ( 99.92%) 434
16ms - 32ms : (100.00%) 901
32ms - 65ms : (100.00%) 22
65ms - 131ms : (100.00%) 16
131ms - 262ms : (100.00%) 13
262ms - 524ms : (100.00%) 4
524ms - 1s : (100.00%) 1
rw:commit (1140612 total)
256us - 512us : ( 0.00%) 15
512us - 1ms : ( 1.83%) 20857 ##
1ms - 2ms : ( 60.07%) 664282 ########################################################################
2ms - 4ms : ( 89.95%) 340869 #####################################
4ms - 8ms : ( 98.08%) 92729 ##########
8ms - 16ms : ( 99.63%) 17660 #
16ms - 32ms : ( 99.86%) 2547
32ms - 65ms : ( 99.91%) 602
65ms - 131ms : ( 99.94%) 387
131ms - 262ms : ( 99.98%) 432
262ms - 524ms : (100.00%) 212
524ms - 1s : (100.00%) 20
As I mentioned earlier, a single call of couchstore_commit() invokes two fsync opertions internally (one for btree nodes, the other for the btree header).
Also note that couchstore_save_docs() and couchstore_commit() are invoked once per vbucket.
If we comment out fsync in couchstore_commit(), couchstore_save_docs() takes much less time because of the much smaller file sizes.
In addition, a new header notification is sent to the erlang mccouch process through socket for each vbucket's commit and its wall clock timings are
rw:notify (1144999 total)
256us - 512us : ( 2.77%) 31681 ###
512us - 1ms : ( 82.19%) 909361 ##################################################################################################
1ms - 2ms : ( 97.90%) 179953 ###################
2ms - 4ms : ( 99.66%) 20110 ##
4ms - 8ms : ( 99.96%) 3440
8ms - 16ms : (100.00%) 433
16ms - 32ms : (100.00%) 20
32ms - 65ms : (100.00%) 1
In my tests, persisting 10 - 15 dirty items per vbucket requires (couchstore_save_docs() + couchstore_commit() + new header notification to the erlang process) at least, which roughly takes 5ms.
1) I commented out the fsync calls in couchstore_commit, and the rebalance time with the constant load from the client took 25 minutes.
2) I made some changes in the ep-engine so that we can batch writes for all vbuckets and issue couchstore_commit requests at the end of transaction, but didn't see much difference. The rebalance took 2 hours.
So, for the cases where the rebalance takes 2 hours, the wall clock times for couchstore low level system calls (fsync, pwrite, pread) are as follows:
rw:fsSyncTime (2289998 total)
4us - 8us : ( 0.00%) 19
8us - 16us : ( 0.00%) 94
16us - 32us : ( 0.01%) 73
32us - 64us : ( 0.01%) 3
64us - 128us : ( 0.02%) 221
128us - 256us : ( 1.85%) 41935 ##
256us - 512us : ( 3.46%) 36925 #
512us - 1ms : ( 65.99%) 1431971 ############################################################################
1ms - 2ms : ( 89.80%) 545262 #############################
2ms - 4ms : ( 97.81%) 183281 #########
4ms - 8ms : ( 99.60%) 40998 ##
8ms - 16ms : ( 99.88%) 6371
16ms - 32ms : ( 99.93%) 1229
32ms - 65ms : ( 99.95%) 532
65ms - 131ms : ( 99.97%) 457
131ms - 262ms : ( 99.99%) 431
262ms - 524ms : (100.00%) 184
524ms - 1s : (100.00%) 12
rw:fsWriteTime (14744695 total)
0 - 1us : ( 0.00%) 11
1us - 2us : ( 0.00%) 263
2us - 4us : ( 31.85%) 4695419 #######################################
4us - 8us : ( 56.85%) 3686190 ##############################
8us - 16us : ( 76.76%) 2935818 ########################
16us - 32us : ( 91.33%) 2148786 #################
32us - 64us : ( 97.96%) 977121 ########
64us - 128us : ( 99.31%) 198760 #
128us - 256us : ( 99.58%) 41135
256us - 512us : ( 99.82%) 35083
512us - 1ms : ( 99.93%) 15252
1ms - 2ms : ( 99.98%) 8638
2ms - 4ms : (100.00%) 1813
4ms - 8ms : (100.00%) 239
8ms - 16ms : (100.00%) 64
16ms - 32ms : (100.00%) 34
32ms - 65ms : (100.00%) 23
65ms - 131ms : (100.00%) 18
131ms - 262ms : (100.00%) 22
262ms - 524ms : (100.00%) 5
524ms - 1s : (100.00%) 1
rw:fsReadTime (9555100 total)
0 - 1us : ( 0.00%) 6
1us - 2us : ( 0.05%) 5156
2us - 4us : ( 53.68%) 5123773 #################################################################
4us - 8us : ( 86.88%) 3172621 ########################################
8us - 16us : ( 98.03%) 1065233 #############
16us - 32us : ( 99.39%) 130430 #
32us - 64us : ( 99.69%) 28320
64us - 128us : ( 99.83%) 13641
128us - 256us : ( 99.91%) 7176
256us - 512us : ( 99.96%) 4798
512us - 1ms : ( 99.99%) 2674
1ms - 2ms : (100.00%) 982
2ms - 4ms : (100.00%) 243
4ms - 8ms : (100.00%) 26
8ms - 16ms : (100.00%) 8
16ms - 32ms : (100.00%) 2
32ms - 65ms : (100.00%) 3
65ms - 131ms : (100.00%) 5
131ms - 262ms : (100.00%) 3
The wall clock timings for couchstore level APIs couchstore_save_docs and couchstore_commit() are
rw:save_documents (1140612 total)
16us - 32us : ( 0.00%) 1
32us - 64us : ( 0.00%) 3
64us - 128us : ( 0.00%) 53
128us - 256us : ( 7.39%) 84185 #########
256us - 512us : ( 33.63%) 299371 ################################
512us - 1ms : ( 81.68%) 548081 ###########################################################
1ms - 2ms : ( 97.57%) 181242 ###################
2ms - 4ms : ( 99.61%) 23205 ##
4ms - 8ms : ( 99.88%) 3080
8ms - 16ms : ( 99.92%) 434
16ms - 32ms : (100.00%) 901
32ms - 65ms : (100.00%) 22
65ms - 131ms : (100.00%) 16
131ms - 262ms : (100.00%) 13
262ms - 524ms : (100.00%) 4
524ms - 1s : (100.00%) 1
rw:commit (1140612 total)
256us - 512us : ( 0.00%) 15
512us - 1ms : ( 1.83%) 20857 ##
1ms - 2ms : ( 60.07%) 664282 ########################################################################
2ms - 4ms : ( 89.95%) 340869 #####################################
4ms - 8ms : ( 98.08%) 92729 ##########
8ms - 16ms : ( 99.63%) 17660 #
16ms - 32ms : ( 99.86%) 2547
32ms - 65ms : ( 99.91%) 602
65ms - 131ms : ( 99.94%) 387
131ms - 262ms : ( 99.98%) 432
262ms - 524ms : (100.00%) 212
524ms - 1s : (100.00%) 20
As I mentioned earlier, a single call of couchstore_commit() invokes two fsync opertions internally (one for btree nodes, the other for the btree header).
Also note that couchstore_save_docs() and couchstore_commit() are invoked once per vbucket.
If we comment out fsync in couchstore_commit(), couchstore_save_docs() takes much less time because of the much smaller file sizes.
In addition, a new header notification is sent to the erlang mccouch process through socket for each vbucket's commit and its wall clock timings are
rw:notify (1144999 total)
256us - 512us : ( 2.77%) 31681 ###
512us - 1ms : ( 82.19%) 909361 ##################################################################################################
1ms - 2ms : ( 97.90%) 179953 ###################
2ms - 4ms : ( 99.66%) 20110 ##
4ms - 8ms : ( 99.96%) 3440
8ms - 16ms : (100.00%) 433
16ms - 32ms : (100.00%) 20
32ms - 65ms : (100.00%) 1
In my tests, persisting 10 - 15 dirty items per vbucket requires (couchstore_save_docs() + couchstore_commit() + new header notification to the erlang process) at least, which roughly takes 5ms.
Show
Chiyoung Seo
added a comment - - edited Per discussion today, I did more tests and observed the followings:
1) I commented out the fsync calls in couchstore_commit, and the rebalance time with the constant load from the client took 25 minutes.
2) I made some changes in the ep-engine so that we can batch writes for all vbuckets and issue couchstore_commit requests at the end of transaction, but didn't see much difference. The rebalance took 2 hours.
So, for the cases where the rebalance takes 2 hours, the wall clock times for couchstore low level system calls (fsync, pwrite, pread) are as follows:
rw:fsSyncTime (2289998 total)
4us - 8us : ( 0.00%) 19
8us - 16us : ( 0.00%) 94
16us - 32us : ( 0.01%) 73
32us - 64us : ( 0.01%) 3
64us - 128us : ( 0.02%) 221
128us - 256us : ( 1.85%) 41935 ##
256us - 512us : ( 3.46%) 36925 #
512us - 1ms : ( 65.99%) 1431971 ############################################################################
1ms - 2ms : ( 89.80%) 545262 #############################
2ms - 4ms : ( 97.81%) 183281 #########
4ms - 8ms : ( 99.60%) 40998 ##
8ms - 16ms : ( 99.88%) 6371
16ms - 32ms : ( 99.93%) 1229
32ms - 65ms : ( 99.95%) 532
65ms - 131ms : ( 99.97%) 457
131ms - 262ms : ( 99.99%) 431
262ms - 524ms : (100.00%) 184
524ms - 1s : (100.00%) 12
rw:fsWriteTime (14744695 total)
0 - 1us : ( 0.00%) 11
1us - 2us : ( 0.00%) 263
2us - 4us : ( 31.85%) 4695419 #######################################
4us - 8us : ( 56.85%) 3686190 ##############################
8us - 16us : ( 76.76%) 2935818 ########################
16us - 32us : ( 91.33%) 2148786 #################
32us - 64us : ( 97.96%) 977121 ########
64us - 128us : ( 99.31%) 198760 #
128us - 256us : ( 99.58%) 41135
256us - 512us : ( 99.82%) 35083
512us - 1ms : ( 99.93%) 15252
1ms - 2ms : ( 99.98%) 8638
2ms - 4ms : (100.00%) 1813
4ms - 8ms : (100.00%) 239
8ms - 16ms : (100.00%) 64
16ms - 32ms : (100.00%) 34
32ms - 65ms : (100.00%) 23
65ms - 131ms : (100.00%) 18
131ms - 262ms : (100.00%) 22
262ms - 524ms : (100.00%) 5
524ms - 1s : (100.00%) 1
rw:fsReadTime (9555100 total)
0 - 1us : ( 0.00%) 6
1us - 2us : ( 0.05%) 5156
2us - 4us : ( 53.68%) 5123773 #################################################################
4us - 8us : ( 86.88%) 3172621 ########################################
8us - 16us : ( 98.03%) 1065233 #############
16us - 32us : ( 99.39%) 130430 #
32us - 64us : ( 99.69%) 28320
64us - 128us : ( 99.83%) 13641
128us - 256us : ( 99.91%) 7176
256us - 512us : ( 99.96%) 4798
512us - 1ms : ( 99.99%) 2674
1ms - 2ms : (100.00%) 982
2ms - 4ms : (100.00%) 243
4ms - 8ms : (100.00%) 26
8ms - 16ms : (100.00%) 8
16ms - 32ms : (100.00%) 2
32ms - 65ms : (100.00%) 3
65ms - 131ms : (100.00%) 5
131ms - 262ms : (100.00%) 3
The wall clock timings for couchstore level APIs couchstore_save_docs and couchstore_commit() are
rw:save_documents (1140612 total)
16us - 32us : ( 0.00%) 1
32us - 64us : ( 0.00%) 3
64us - 128us : ( 0.00%) 53
128us - 256us : ( 7.39%) 84185 #########
256us - 512us : ( 33.63%) 299371 ################################
512us - 1ms : ( 81.68%) 548081 ###########################################################
1ms - 2ms : ( 97.57%) 181242 ###################
2ms - 4ms : ( 99.61%) 23205 ##
4ms - 8ms : ( 99.88%) 3080
8ms - 16ms : ( 99.92%) 434
16ms - 32ms : (100.00%) 901
32ms - 65ms : (100.00%) 22
65ms - 131ms : (100.00%) 16
131ms - 262ms : (100.00%) 13
262ms - 524ms : (100.00%) 4
524ms - 1s : (100.00%) 1
rw:commit (1140612 total)
256us - 512us : ( 0.00%) 15
512us - 1ms : ( 1.83%) 20857 ##
1ms - 2ms : ( 60.07%) 664282 ########################################################################
2ms - 4ms : ( 89.95%) 340869 #####################################
4ms - 8ms : ( 98.08%) 92729 ##########
8ms - 16ms : ( 99.63%) 17660 #
16ms - 32ms : ( 99.86%) 2547
32ms - 65ms : ( 99.91%) 602
65ms - 131ms : ( 99.94%) 387
131ms - 262ms : ( 99.98%) 432
262ms - 524ms : (100.00%) 212
524ms - 1s : (100.00%) 20
As I mentioned earlier, a single call of couchstore_commit() invokes two fsync opertions internally (one for btree nodes, the other for the btree header).
Also note that couchstore_save_docs() and couchstore_commit() are invoked once per vbucket.
If we comment out fsync in couchstore_commit(), couchstore_save_docs() takes much less time because of the much smaller file sizes.
In addition, a new header notification is sent to the erlang mccouch process through socket for each vbucket's commit and its wall clock timings are
rw:notify (1144999 total)
256us - 512us : ( 2.77%) 31681 ###
512us - 1ms : ( 82.19%) 909361 ##################################################################################################
1ms - 2ms : ( 97.90%) 179953 ###################
2ms - 4ms : ( 99.66%) 20110 ##
4ms - 8ms : ( 99.96%) 3440
8ms - 16ms : (100.00%) 433
16ms - 32ms : (100.00%) 20
32ms - 65ms : (100.00%) 1
In my tests, persisting 10 - 15 dirty items per vbucket requires (couchstore_save_docs() + couchstore_commit() + new header notification to the erlang process) at least, which roughly takes 5ms.
Hide
Aleksey Kondratenko
added a comment -
Very interesting data. I wonder what honest fsync will look like. I'll try to run my own experiment quickly on honestly quick hardware.
Show
Aleksey Kondratenko
added a comment - Very interesting data. I wonder what honest fsync will look like. I'll try to run my own experiment quickly on honestly quick hardware.
Show
Aleksey Kondratenko
added a comment - Sorry honestly _slow_
Hide
Aleksey Kondratenko
added a comment -
So I tried rebalance under tiny load as before (130 ops sec) and _enabled_ barriers.
I'm seeing commit times in range 131-262 ms. Weird right?
rw:commit (8563 total)
256us - 512us : ( 2.15%) 184 ####
512us - 1ms : ( 33.07%) 2648 ##############################################################
1ms - 2ms : ( 70.42%) 3198 ###########################################################################
2ms - 4ms : ( 83.22%) 1096 #########################
4ms - 8ms : ( 85.11%) 162 ###
8ms - 16ms : ( 86.16%) 90 ##
16ms - 32ms : ( 86.93%) 66 #
32ms - 65ms : ( 87.06%) 11
65ms - 131ms : ( 88.61%) 133 ###
131ms - 262ms : ( 99.26%) 912 #####################
262ms - 524ms : ( 99.91%) 55 #
524ms - 1s : (100.00%) 8
I'm seeing commit times in range 131-262 ms. Weird right?
rw:commit (8563 total)
256us - 512us : ( 2.15%) 184 ####
512us - 1ms : ( 33.07%) 2648 ##############################################################
1ms - 2ms : ( 70.42%) 3198 ###########################################################################
2ms - 4ms : ( 83.22%) 1096 #########################
4ms - 8ms : ( 85.11%) 162 ###
8ms - 16ms : ( 86.16%) 90 ##
16ms - 32ms : ( 86.93%) 66 #
32ms - 65ms : ( 87.06%) 11
65ms - 131ms : ( 88.61%) 133 ###
131ms - 262ms : ( 99.26%) 912 #####################
262ms - 524ms : ( 99.91%) 55 #
524ms - 1s : (100.00%) 8
Show
Aleksey Kondratenko
added a comment - So I tried rebalance under tiny load as before (130 ops sec) and _enabled_ barriers.
I'm seeing commit times in range 131-262 ms. Weird right?
rw:commit (8563 total)
256us - 512us : ( 2.15%) 184 ####
512us - 1ms : ( 33.07%) 2648 ##############################################################
1ms - 2ms : ( 70.42%) 3198 ###########################################################################
2ms - 4ms : ( 83.22%) 1096 #########################
4ms - 8ms : ( 85.11%) 162 ###
8ms - 16ms : ( 86.16%) 90 ##
16ms - 32ms : ( 86.93%) 66 #
32ms - 65ms : ( 87.06%) 11
65ms - 131ms : ( 88.61%) 133 ###
131ms - 262ms : ( 99.26%) 912 #####################
262ms - 524ms : ( 99.91%) 55 #
524ms - 1s : (100.00%) 8
Hide
Chiyoung Seo
added a comment -
Yes, this is quite interesting. I don't expect that slowness honestly.
Btw, the following is the code snippet that measures the timings of couchstore_commit() API:
cs_begin = gethrtime();
errCode = couchstore_commit(db);
st.commitHisto.add((gethrtime() - cs_begin) / 1000);
Btw, the following is the code snippet that measures the timings of couchstore_commit() API:
cs_begin = gethrtime();
errCode = couchstore_commit(db);
st.commitHisto.add((gethrtime() - cs_begin) / 1000);
Show
Chiyoung Seo
added a comment - Yes, this is quite interesting. I don't expect that slowness honestly.
Btw, the following is the code snippet that measures the timings of couchstore_commit() API:
cs_begin = gethrtime();
errCode = couchstore_commit(db);
st.commitHisto.add((gethrtime() - cs_begin) / 1000);
Hide
Aleksey Kondratenko
added a comment -
Actually, given that each our fdatasync call is extending file size, it means we modify file metadata, list of free blocks etc.. Thus fdatasync call is actually doing at least one physical write to filesystem journal (to journal metadata update) and then physical write to actual file. And there's potentially very long disk seek in between. Thus 100 millseconds per fsync is perhaps not that unexpected.
Show
Aleksey Kondratenko
added a comment - Actually, given that each our fdatasync call is extending file size, it means we modify file metadata, list of free blocks etc.. Thus fdatasync call is actually doing at least one physical write to filesystem journal (to journal metadata update) and then physical write to actual file. And there's potentially very long disk seek in between. Thus 100 millseconds per fsync is perhaps not that unexpected.
Hide
Aleksey Kondratenko
added a comment -
I was able to speed up commit times 2 times by doing reserving space with linux's fallocate (with FL_KEEP_SIZE otherwise our constant re-opening will cause huge slowness finding last usable header). We can remove more of metadata syncing overhead by pre-extending with actual 0 bytes. That way our fdatasync will not have to update file metadata at all, but as I pointed out above today this will hit problems with re-opening of .couch files and with finding last valid header.
Anyway, we clearly need order of magnitude improvement rather than 2x. So we need something else.
Dirty patch can be found here: http://paste.debian.net/198227/
Anyway, we clearly need order of magnitude improvement rather than 2x. So we need something else.
Dirty patch can be found here: http://paste.debian.net/198227/
Show
Aleksey Kondratenko
added a comment - I was able to speed up commit times 2 times by doing reserving space with linux's fallocate (with FL_KEEP_SIZE otherwise our constant re-opening will cause huge slowness finding last usable header). We can remove more of metadata syncing overhead by pre-extending with actual 0 bytes. That way our fdatasync will not have to update file metadata at all, but as I pointed out above today this will hit problems with re-opening of .couch files and with finding last valid header.
Anyway, we clearly need order of magnitude improvement rather than 2x. So we need something else.
Dirty patch can be found here: http://paste.debian.net/198227/
Hide
Aleksey Kondratenko
added a comment -
I think we have enough data that proves that indeed fsync is the problem here. I think we should have Damien and Peter involved to seek a solution.
Show
Aleksey Kondratenko
added a comment - I think we have enough data that proves that indeed fsync is the problem here. I think we should have Damien and Peter involved to seek a solution.
Hide
Aleksey Kondratenko
added a comment -
There's another thing I don't understand. I'm doing around 130 sets (alll of them updates) per second. Stats show >200 disk updates per second (which is ok consider replica writes too).
Even more weird I'm seeing 4.5 megs of writes per second in iotop. Even for 300 disk update per second that's 15k of bytes in disk writes per item. My items are small btw, less then 100 bytes. Kinda big imho. I.e. 4 disk blocks per item.
We know however that overhead will be more or less stable for higher write rates.
Even more weird I'm seeing 4.5 megs of writes per second in iotop. Even for 300 disk update per second that's 15k of bytes in disk writes per item. My items are small btw, less then 100 bytes. Kinda big imho. I.e. 4 disk blocks per item.
We know however that overhead will be more or less stable for higher write rates.
Show
Aleksey Kondratenko
added a comment - There's another thing I don't understand. I'm doing around 130 sets (alll of them updates) per second. Stats show >200 disk updates per second (which is ok consider replica writes too).
Even more weird I'm seeing 4.5 megs of writes per second in iotop. Even for 300 disk update per second that's 15k of bytes in disk writes per item. My items are small btw, less then 100 bytes. Kinda big imho. I.e. 4 disk blocks per item.
We know however that overhead will be more or less stable for higher write rates.
Hide
Thuan Nguyen
added a comment -
Integrated in github-ep-engine-2-0 #441 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/441/])
MB-6796 Prioritize flushing pending vbuckets over regular vbuckets (Revision e9ce877041d101efee775e9c65ea6b9eef914926)
Result = SUCCESS
Chiyoung Seo :
Files :
* src/stats.hh
* src/ep.cc
* src/flusher.hh
* src/vbucketmap.cc
* src/flusher.cc
* src/vbucket.hh
* tests/ep_test_apis.cc
* tests/ep_testsuite.cc
* tests/ep_test_apis.h
* docs/stats.org
* src/vbucketmap.hh
* include/ep-engine/command_ids.h
* src/ep_engine.cc
* src/ep.hh
Result = SUCCESS
Chiyoung Seo :
Files :
* src/stats.hh
* src/ep.cc
* src/flusher.hh
* src/vbucketmap.cc
* src/flusher.cc
* src/vbucket.hh
* tests/ep_test_apis.cc
* tests/ep_testsuite.cc
* tests/ep_test_apis.h
* docs/stats.org
* src/vbucketmap.hh
* include/ep-engine/command_ids.h
* src/ep_engine.cc
* src/ep.hh
Show
Thuan Nguyen
added a comment - Integrated in github-ep-engine-2-0 #441 (See [ http://qa.hq.northscale.net/job/github-ep-engine-2-0/441/ ])
MB-6796 Prioritize flushing pending vbuckets over regular vbuckets (Revision e9ce877041d101efee775e9c65ea6b9eef914926)
Result = SUCCESS
Chiyoung Seo :
Files :
* src/stats.hh
* src/ep.cc
* src/flusher.hh
* src/vbucketmap.cc
* src/flusher.cc
* src/vbucket.hh
* tests/ep_test_apis.cc
* tests/ep_testsuite.cc
* tests/ep_test_apis.h
* docs/stats.org
* src/vbucketmap.hh
* include/ep-engine/command_ids.h
* src/ep_engine.cc
* src/ep.hh
Hide
Karen Zeller
added a comment -
Added to RN as: Prioritize flushing pending vbuckets
over regular vbuckets.
This is a performance improvement used
for rebalancing buckets that have no
views or design docs when consistent view mode is enabled.
over regular vbuckets.
This is a performance improvement used
for rebalancing buckets that have no
views or design docs when consistent view mode is enabled.
Show
Karen Zeller
added a comment - Added to RN as: Prioritize flushing pending vbuckets
over regular vbuckets.
This is a performance improvement used
for rebalancing buckets that have no
views or design docs when consistent view mode is enabled.
I hear from @Farshid Ghods that @Karan k also sees this.