Details
Description
____________________
From: Ronnie Sun
Sent: Friday, November 09, 2012 6:34 PM
To: Steve Yen
Subject: Re: swap rebalance perf?
Hi Steve,
Got some preliminary results showing 2.0 is more than 10x slower (about 14.7x in this run), which falls into the same area we've seen for reb-litmus tests.
1.8.1-938: 203 seconds
2.0.0-1939: 2983 seconds
Spec: (reb-litmus-swap-2) 3M mixed, start with 2 nodes, swap 1 node with 9 clients firing 1k ops/sec each.
The current Jenkins automation for this has some problems, so I don't have comparison graphs.
Thanks,
Ronnie
-
- 10.2.1.58-11142012-1140-diag.zip
- 14/Nov/12 1:48 PM
- 4.33 MB
- Ronnie Sun
-
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/couchbase.log 782 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.xdcr.log 4 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.couchdb.log 2.48 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/stats.log 1.23 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.stats.log 838 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.error.log 35 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.views.log 317 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.info.log 10.27 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.xdcr_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.mapreduce_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/diag.log 16.72 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/ns_server.debug.log 19.77 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194241/memcached.log 12.39 MB
-
- 10.2.1.61-11142012-1139-diag.zip
- 14/Nov/12 1:48 PM
- 3.52 MB
- Ronnie Sun
-
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/couchbase.log 667 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.xdcr.log 5 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.couchdb.log 996 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/stats.log 2 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.stats.log 489 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.error.log 32 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.views.log 323 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.info.log 12.14 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.xdcr_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.mapreduce_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/ns_server.debug.log 22.71 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194010/memcached.log 2.09 MB
-
- 10.2.1.63-11142012-1142-diag.zip
- 14/Nov/12 1:48 PM
- 6.99 MB
- Ronnie Sun
-
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/couchbase.log 783 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.xdcr.log 4 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.couchdb.log 2.88 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/stats.log 1.22 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.stats.log 952 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.error.log 35 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.views.log 513 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.info.log 18.32 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.xdcr_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.mapreduce_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/diag.log 16.72 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/ns_server.debug.log 31.69 MB
- cbcollect_info_ns_1@127.0.0.1_20121114-194344/memcached.log 21.82 MB
-
- info_168.zip
- 12/Nov/12 4:57 PM
- 6.50 MB
- Ronnie Sun
-
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/couchbase.log 870 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.xdcr.log 4 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.couchdb.log 6.01 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/stats.log 1.27 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.stats.log 2.21 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.error.log 33 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.views.log 293 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.info.log 15.23 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.xdcr_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.mapreduce_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/diag.log 17.51 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/ns_server.debug.log 25.02 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225004/memcached.log 16.00 MB
-
- info_169.zip
- 12/Nov/12 4:57 PM
- 11.53 MB
- Ronnie Sun
-
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/couchbase.log 857 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.xdcr.log 4 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.couchdb.log 8.06 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/stats.log 1.26 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.stats.log 2.48 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.error.log 37 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.views.log 390 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.info.log 27.60 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.xdcr_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.mapreduce_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/diag.log 17.50 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/ns_server.debug.log 41.02 MB
- cbcollect_info_ns_1@127.0.0.1_20121112-225053/memcached.log 48.04 MB
-
- info.zip
- 10/Nov/12 4:54 PM
- 8.04 MB
- Ronnie Sun
-
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/couchbase.log 875 kB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.xdcr.log 4 kB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.couchdb.log 6.03 MB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/stats.log 1.27 MB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.stats.log 14.06 MB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.error.log 37 kB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.views.log 292 kB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.info.log 16.89 MB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.xdcr_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.mapreduce_errors.log 0.2 kB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/diag.log 16.52 MB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/ns_server.debug.log 27.78 MB
- cbcollect_info_ns_1@127.0.0.1_20121110-223856/memcached.log 17.14 MB
-
- ns-diag-168.txt.zip
- 12/Nov/12 4:57 PM
- 5.23 MB
- Ronnie Sun
-
- ns-diag-168.txt 64.17 MB
- __MACOSX/._ns-diag-168.txt 0.4 kB
-
- reb-swap-6-1.loop_1.8.1-938-rel-enterprise_2.0.0-1954-rel-enterprise_orange_Nov-21-2012_10-07-08.pdf
- 21/Nov/12 12:31 PM
- 1.80 MB
- Ronnie Sun
Activity
Filipe Manana
added a comment -
No views defined, so updating component list.
Steve Yen
added a comment -
https://www.yammer.com/couchbase.com/#/Threads/show?threadId=236527848
4-3 swap rebalance:
without foreground load (1x identical):
1.8.1-938: 988.58 seconds.
2.0.0-1954: 974.67 seconds.
with ~10k foreground (7x slower):
1.8.1-938: 222 sec
2.0.0-1954: 1723 sec
6-1 swap rebalance:
with ~10k foreground (1x identical):
1.8.1-938: 639.52 sec
2.0.0-1954: 654.22 sec
Ronnie Sun: Based on 6-1 results (started with 6 nodes, swap 1 node):
2.0 has similar reb time as 1.8.1, while latencies are ~30% worse.
reb-swap-6-1.loop_1.8.1-938-rel-enterprise_2.0.0-1954-rel-enterprise_orange_Nov-21-2012_10-07-08
Steve Yen: @Aliaksey Kandratsenka, @Chiyoung Seo
"4-3 swap rebalance" == 4 nodes initially and swap 3 of them.
"6-1 swap rebalance" == 6 nodes initially and swap 1 of them.
@Ronnie Sun also reports latencies were worse in 2.0 during swap rebalance.
This seems to indicate that we don't need @Chiyoung Seo's patch that prioritizes the vbucket takeover even more.
This was on the system test cluster (Xen VMs) with SSD, key-value mixed workload (no views), with consistent views enabled (the @Chiyoung Seo .
Ronnie Sun
added a comment -
Based on 6-1 results (started with 6 nodes, swap 1 node):
2.0 has similar reb time as 1.8.1, while latencies are ~30% worse.
Ronnie Sun
added a comment -
4-3 swap rebalance:
without foreground load (1x identical):
1.8.1-938: 988.58 seconds.
2.0.0-1954: 974.67 seconds.
with ~10k foreground (7x slower):
1.8.1-938: 222 sec
2.0.0-1954: 1723 sec
6-1 swap rebalance:
with ~10k foreground (1x identical):
1.8.1-938: 639.52 sec
2.0.0-1954: 654.22 sec
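For reference, the slowdown labels above can be re-derived from the raw timings. This is only a small sketch: the dictionary layout and the `slowdown` helper are illustrative names, and the numbers are copied from this comment.

```python
# Rebalance times in seconds, copied from the results in this comment.
results = {
    ("4-3 swap", "no foreground load"): {"1.8.1-938": 988.58, "2.0.0-1954": 974.67},
    ("4-3 swap", "~10k foreground"): {"1.8.1-938": 222.0, "2.0.0-1954": 1723.0},
    ("6-1 swap", "~10k foreground"): {"1.8.1-938": 639.52, "2.0.0-1954": 654.22},
}


def slowdown(times, baseline="1.8.1-938", candidate="2.0.0-1954"):
    """Return how many times longer the candidate build took than the baseline."""
    return times[candidate] / times[baseline]


# Ratio per test case, rounded to two decimals for readability.
ratios = {case: round(slowdown(times), 2) for case, times in results.items()}

for (scenario, load), ratio in ratios.items():
    print(f"{scenario:9s} {load:20s} 2.0 / 1.8.1 = {ratio:.2f}x")
```

Only the 4-3 case under foreground load shows a large ratio; the other two stay close to 1.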
Ronnie Sun
added a comment -
Early results for swap rebalance with a larger number of nodes (swapping 3 nodes in a 4-node cluster).
The no-foreground-load case looks promising:
1.8.1-938: 988.58 seconds.
2.0.0-1954: 974.67 seconds.
Ronnie Sun
added a comment -
Due to cluster limitations, the test plan changes to:
1) 4 nodes, swap 3.
2) 6 nodes, swap 1.
Chisheng Hong
added a comment -
[global]
#username:root
username:root
password:couchbase
port:8091
data_path=/data
[servers]
1:10.6.2.37
2:10.6.2.38
3:10.6.2.39
4:10.6.2.40
5:10.6.2.42
6:10.6.2.43
7:10.6.2.44
8:10.6.2.45
#9:10.3.121.24
#10:10.3.121.25
#11:10.3.121.26
#12:10.3.121.27
[membase]
rest_username:Administrator
rest_password:password
This is the cluster information for the orange cluster. Ronnie, you can run the tests on it.
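As a rough illustration of how a script might read the active server list from a file in this layout, here is a sketch using Python's standard `configparser`, which accepts both `:` and `=` as delimiters and skips `#` comment lines by default. The `load_servers` helper and the trimmed inline copy of the file (credentials left out) are assumptions made for the example, not part of any existing test tooling.

```python
import configparser

# Trimmed copy of the cluster file above; credential entries are left out.
CLUSTER_INI = """\
[global]
port:8091
data_path=/data

[servers]
1:10.6.2.37
2:10.6.2.38
3:10.6.2.39
4:10.6.2.40
5:10.6.2.42
6:10.6.2.43
7:10.6.2.44
8:10.6.2.45
#9:10.3.121.24
#10:10.3.121.25
#11:10.3.121.26
#12:10.3.121.27
"""


def load_servers(ini_text):
    """Return the uncommented server IPs, ordered by their numeric index."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["servers"]
    return [section[key] for key in sorted(section, key=int)]


parser = configparser.ConfigParser()
parser.read_string(CLUSTER_INI)
port = parser.getint("global", "port")

servers = load_servers(CLUSTER_INI)
print(f"{len(servers)} active servers on port {port}: {servers}")
```

The four commented-out entries (9 to 12) are ignored, which leaves the 8 nodes available for the swap tests.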
Ronnie Sun
added a comment -
test 1:
- 8 nodes, swap 1
- 20M items.
- 30 clients with 500 ops per second (set:get = 50:50)
test 2:
- 8 nodes, swap 3
- 20M items.
- 30 clients with 500 ops per second
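To make the aggregate load in the two test plans explicit, here is a small sketch. The `SwapTest` class and its field names are hypothetical. The per-node figure assumes items are spread evenly across the initial nodes and ignores replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapTest:
    name: str
    initial_nodes: int
    swapped_nodes: int
    items: int
    clients: int
    ops_per_client: int  # ops per second issued by each client

    @property
    def total_ops_per_sec(self) -> int:
        """Aggregate foreground load across all clients."""
        return self.clients * self.ops_per_client

    @property
    def items_per_node(self) -> float:
        """Active items per node, assuming an even spread (replicas ignored)."""
        return self.items / self.initial_nodes


test1 = SwapTest("test 1", initial_nodes=8, swapped_nodes=1,
                 items=20_000_000, clients=30, ops_per_client=500)
test2 = SwapTest("test 2", initial_nodes=8, swapped_nodes=3,
                 items=20_000_000, clients=30, ops_per_client=500)

for t in (test1, test2):
    print(f"{t.name}: {t.total_ops_per_sec} ops/sec total, "
          f"{t.items_per_node:,.0f} items per node")
```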
Steve Yen
added a comment -
Ronnie, reassigning to you to please rerun the tests with Chiyoung's toy build, and to help cover for Pavel while he's traveling.
From Chiyoung's email...
Ronnie, Pavel,
I made more changes in ep-engine so that the flusher works on persisting high-priority vbuckets much more aggressively. In my tests, I observed ~4x faster rebalance.
However, my main concern with this change is that it might starve the flushing of regular vbuckets and consequently grow the number of dirty items on those regular vbuckets (i.e., the disk write queue size might grow a lot, especially in XDCR tests, because we prioritize flushing 32 vbuckets once every 30 minutes).
You can download the toy build from
http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_toy-chiyoung-x86_64_2.0.0-2002-toy.rpm
Ronnie, can you please test it with the swap rebalance?
Pavel, can you test it with the XDCR?
Thanks,
Chiyoung
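The starvation concern in the email can be illustrated with a toy queue model. This is not ep-engine's flusher: `simulate_flusher`, the per-tick capacity, and the arrival rates are all made-up values chosen only to show the effect when total arrivals exceed what the disk can persist.

```python
def simulate_flusher(ticks, capacity, regular_rate, priority_rate, prioritize):
    """Toy model of a flusher that persists at most `capacity` items per tick.

    Returns the (regular_backlog, priority_backlog) pair after each tick.
    """
    regular = priority = 0
    history = []
    for _ in range(ticks):
        # New dirty items arrive on both kinds of vbuckets.
        regular += regular_rate
        priority += priority_rate
        if prioritize:
            # High-priority vbuckets are drained first; regular gets leftovers.
            flushed_priority = min(priority, capacity)
            flushed_regular = min(regular, capacity - flushed_priority)
        else:
            # Even split, with unused budget handed to the other queue.
            first = min(priority, capacity // 2)
            flushed_regular = min(regular, capacity - first)
            extra = min(priority - first, capacity - first - flushed_regular)
            flushed_priority = first + extra
        priority -= flushed_priority
        regular -= flushed_regular
        history.append((regular, priority))
    return history


prioritized = simulate_flusher(10, capacity=100, regular_rate=60,
                               priority_rate=80, prioritize=True)
fair = simulate_flusher(10, capacity=100, regular_rate=60,
                        priority_rate=80, prioritize=False)
print("prioritized (regular, priority) backlog:", prioritized[-1])
print("even split  (regular, priority) backlog:", fair[-1])
```

In this model the total backlog is the same either way. Prioritization only moves all of it onto the regular vbuckets, which is the growth in dirty items the email worries about.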
Chiyoung Seo
added a comment -
Steve,
From now on, I won't work on any rebalance slowness issues caused by consistent views. As I mentioned, depending on persistence for view updates is NOT a scalable approach.
Initially, I suggested streaming incoming mutations or takeover items to the indexer through TAP without waiting for persistence, but heard from someone that it wouldn't make any difference compared with the current approach. However, I recently showed that the current approach is very slow because of disk slowness and an fsync per vbucket.
I don't expect that we will change the current architecture post 2.0, and I am not interested in giving any suggestions to people who are so stubborn.
Again, my suggestion is to disable consistent views if there are no views defined.
Please assign this to someone who is willing to work on the issue.
Post 2.0, I will remove this prioritization because it's not a good approach, only a workaround. I will send a notice to the cluster and XDCR teams after 2.0.
Aleksey Kondratenko
added a comment -
Actually, there is something interesting and non-flusher-related in the timestamps from the latest diags (without ongoing mutations).
E.g.:
[rebalance:debug,2012-11-14T11:10:17.702,ns_1@10.2.1.58:<0.17937.1>:janitor_agent:handle_call:651]Going to wait for persistence of checkpoint 2 in vbucket 511
[ns_server:info,2012-11-14T11:10:17.899,ns_1@10.2.1.58:ns_port_memcached<0.2008.0>:ns_port_server:log:171]memcached<0.2008.0>: Wed Nov 14 11:10:17.698884 PST 3: TAP (Consumer) eq_tapq:anon_1537 - Reset vbucket 511 was completed succecssfully.
[ns_server:debug,2012-11-14T11:10:17.951,ns_1@10.2.1.58:janitor_agent-default<0.2070.0>:janitor_agent:handle_info:682]Got done message from subprocess: <0.17937.1> (ok)
[ns_server:info,2012-11-14T11:10:18.151,ns_1@10.2.1.58:ns_port_memcached<0.2008.0>:ns_port_server:log:171]memcached<0.2008.0>: Wed Nov 14 11:10:17.951680 PST 3: Notified the completion of checkpoint persistence for vbucket 511, cookie 0x13fef080
[ns_server:info,2012-11-14T11:10:18.656,ns_1@10.2.1.58:ns_port_memcached<0.2008.0>:ns_port_server:log:171]memcached<0.2008.0>: Wed Nov 14 11:10:18.455993 PST 3: Schedule cleanup of "eq_tapq:anon_1536"
[rebalance:debug,2012-11-14T11:10:19.365,ns_1@10.2.1.58:<0.17944.1>:janitor_agent:handle_call:651]Going to wait for persistence of checkpoint 2 in vbucket 511
[ns_server:debug,2012-11-14T11:10:19.366,ns_1@10.2.1.58:janitor_agent-default<0.2070.0>:janitor_agent:handle_info:682]Got done message from subprocess: <0.17944.1> (ok)
We see that we actually get a reply from ep-engine quite quickly in both invocations. The second is especially quick, as it's a NOP in this case. But there's a weird delay of roughly 1.4 seconds between those two calls (11:10:17.951 to 11:10:19.365). The only significant activity in between is a stats request for "checkpoint <vbucket-id>" and possibly a create_new_checkpoint request.
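The gaps can be pulled out of the log lines mechanically. The following is a sketch that assumes the `[component:level,timestamp,node:...]message` header layout seen in the excerpt above; the regular expression and the `parse` helper are illustrative, not an existing ns_server tool.

```python
import re
from datetime import datetime

# The four janitor_agent lines from the excerpt above.
LOG_LINES = [
    "[rebalance:debug,2012-11-14T11:10:17.702,ns_1@10.2.1.58:<0.17937.1>:janitor_agent:handle_call:651]Going to wait for persistence of checkpoint 2 in vbucket 511",
    "[ns_server:debug,2012-11-14T11:10:17.951,ns_1@10.2.1.58:janitor_agent-default<0.2070.0>:janitor_agent:handle_info:682]Got done message from subprocess: <0.17937.1> (ok)",
    "[rebalance:debug,2012-11-14T11:10:19.365,ns_1@10.2.1.58:<0.17944.1>:janitor_agent:handle_call:651]Going to wait for persistence of checkpoint 2 in vbucket 511",
    "[ns_server:debug,2012-11-14T11:10:19.366,ns_1@10.2.1.58:janitor_agent-default<0.2070.0>:janitor_agent:handle_info:682]Got done message from subprocess: <0.17944.1> (ok)",
]

# Header: [component:level,timestamp,rest-of-header]message
HEADER = re.compile(
    r"^\[[^,]+,(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}),[^\]]*\](.*)$"
)


def parse(line):
    """Split a log line into (timestamp, message)."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return timestamp, match.group(2)


events = [parse(line) for line in LOG_LINES]
waits = [ts for ts, msg in events if msg.startswith("Going to wait for persistence")]
dones = [ts for ts, msg in events if msg.startswith("Got done message")]

# Time each wait-for-persistence call took, and the idle gap between the calls.
call_durations = [(done - wait).total_seconds() for wait, done in zip(waits, dones)]
gap_between_calls = (waits[1] - dones[0]).total_seconds()

print("call durations (s):", call_durations)
print("gap between calls (s):", gap_between_calls)
```

Both calls finish in a fraction of a second, so most of the elapsed time sits in the gap between them.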
Ronnie Sun
added a comment - And the test with index_waiting disabled took 2580 seconds.
Chiyoung Seo
added a comment -
I think this is a design defect in consistent views. I don't want to adapt the ep-engine flusher any further at this time. I think we really should revisit the overall design of consistent views.
For a safer and more reliable workaround, we should disable consistent views during rebalance if there are no views defined.
Ronnie Sun
added a comment -
Repeated the test (physical cluster); it took ~840 seconds.
cbcollect_info stats attached (including diags).
Starting rebalance, KeepNodes = ['ns_1@10.2.1.63','ns_1@10.2.1.58'], EjectNodes = ['ns_1@10.2.1.61']
Aleksey Kondratenko
added a comment -
Here are the related messages about vbucket 37:
[rebalance:debug,2012-11-12T12:52:58.562,ns_1@192.168.0.21:<0.7340.0>:janitor_agent:handle_call:651]Going to wait for persistence of checkpoint 2 in vbucket 37
[ns_server:info,2012-11-12T12:52:58.572,ns_1@192.168.0.21:ns_port_memcached<0.4538.0>:ns_port_server:log:171]memcached<0.4538.0>: Mon Nov 12 12:52:58.371396 PST 3: TAP (Consumer) eq_tapq:anon_137 - disconnected
memcached<0.4538.0>: Mon Nov 12 12:52:58.453295 PST 3: TAP (Consumer) eq_tapq:anon_138 - Reset vbucket 37 was completed succecssfully.
memcached<0.4538.0>: Mon Nov 12 12:52:59.092677 PST 3: Notified the completion of checkpoint persistence for vbucket 37, cookie 0x5c62000
I.e. we can see that our initial wait for the checkpoint on vbucket 37 completes in under a second. That's where we expect the bulk of the items to get persisted. After that we create another checkpoint on the source and wait for its persistence. It's needed for view consistency. In this case, even though we don't have any views, we still do that second checkpoint. But we always expect that second checkpoint to be persisted very quickly because it'll have only a few items to persist.
Here are the relevant messages:
[rebalance:debug,2012-11-12T12:53:00.225,ns_1@192.168.0.21:<0.7363.0>:janitor_agent:handle_call:651]Going to wait for persistence of checkpoint 3 in vbucket 37
(hm, we see about a 1 second delay from ep-engine's log message to ns_server proceeding to the next step, which is a potential problem in ns_server or memcached)
[ns_server:info,2012-11-12T12:53:01.296,ns_1@192.168.0.21:ns_port_memcached<0.4538.0>:ns_port_server:log:171]memcached<0.4538.0>: Mon Nov 12 12:53:01.096198 PST 3: Notified the completion of checkpoint persistence for vbucket 37, cookie
We see another delay of about 1 second. Ronnie mentioned (not on the ticket, as usual) that the load is about 9k ops per second, which for vbucket 37 alone is about 9 items per second. So that's at most a few tens of items that we needed to persist for that second "delta" checkpoint.
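The per-vbucket estimate can be checked with a quick calculation. This sketch assumes the default of 1024 vbuckets per bucket and treats every operation as a mutation, so it is an upper bound; the window lengths are arbitrary illustrative values.

```python
OPS_PER_SEC = 9_000   # foreground load quoted in this comment
NUM_VBUCKETS = 1024   # assumption: default vbucket count for the bucket


def items_for_vbucket(ops_per_sec, num_vbuckets, window_seconds):
    """Upper bound on new items for one vbucket, if every op is a mutation."""
    return ops_per_sec / num_vbuckets * window_seconds


per_second = items_for_vbucket(OPS_PER_SEC, NUM_VBUCKETS, 1.0)
print(f"per vbucket: about {per_second:.1f} items per second")

# Even a few seconds of accumulated mutations stays in the tens of items.
for window in (1.0, 2.0, 3.0):
    items = items_for_vbucket(OPS_PER_SEC, NUM_VBUCKETS, window)
    print(f"{window:.0f} s window: about {items:.0f} items")
```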
Aleksey Kondratenko
added a comment -
Ronnie just posted on an internal forum that without mutations we're still 4x slower. I'm still waiting for newer diags.
I also lack information about the environment. It looks like this is run on VMware, but details about the environment are missing.
Aleksey Kondratenko
added a comment -
As part of a meeting on understanding views and rebalance, we decided Ronnie will run the same rebalance with no mutations, and the same rebalance with rebalance_index_waiting_disabled set to false, to see whether that 10x is due to the need to persist deltas from ongoing mutations.
Chiyoung Seo
added a comment -
Reassigned it to me as I was told that it's an expected behavior.
Ronnie Sun
added a comment - collect info for chiyoung's comments.
Chiyoung Seo
added a comment -
An example of such duplicate checkpoint_persistence requests is for vbucket 37 on 10.2.2.168:
memcached<0.4538.0>: Mon Nov 12 12:52:59.092677 PST 3: Notified the completion of checkpoint persistence for vbucket 37, cookie 0x5c62000
...
memcached<0.4538.0>: Mon Nov 12 12:53:01.096198 PST 3: Notified the completion of checkpoint persistence for vbucket 37, cookie 0x5c63600
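One way to find every vbucket that shows this pattern is to scan the memcached log lines and group the notifications by vbucket. The sketch below is illustrative only: the helper names are hypothetical, and the regular expression is based solely on the two log lines quoted above, so other builds may word the message differently.

```python
import re
from collections import defaultdict

# Pattern taken from the log lines quoted in this comment.
NOTIFY_RE = re.compile(
    r"Notified the completion of checkpoint persistence "
    r"for vbucket (\d+), cookie (0x[0-9a-fA-F]+)"
)


def notifications_by_vbucket(log_text):
    """Map vbucket id -> list of cookies seen in completion notifications."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        m = NOTIFY_RE.search(line)
        if m:
            seen[int(m.group(1))].append(m.group(2))
    return dict(seen)


def repeated(seen):
    """Keep only vbuckets that were notified more than once."""
    return {vb: cookies for vb, cookies in seen.items() if len(cookies) > 1}


SAMPLE = """\
memcached<0.4538.0>: Mon Nov 12 12:52:59.092677 PST 3: Notified the completion of checkpoint persistence for vbucket 37, cookie 0x5c62000
memcached<0.4538.0>: Mon Nov 12 12:53:01.096198 PST 3: Notified the completion of checkpoint persistence for vbucket 37, cookie 0x5c63600
"""
print(repeated(notifications_by_vbucket(SAMPLE)))
```

Running the same helpers over the full ns_server log from the diag would list every vbucket that received more than one checkpoint_persistence request.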
Chiyoung Seo
added a comment -
Alk,
I have a quick question on checkpoint persistence. I saw that ep-engine sometimes receives two checkpoint_persistence commands for the same vbucket even though it doesn't send any timeout tmp_fail response to the ebucketmigrator.
For example, these are the timings from the node that was newly added for swap rebalance in a two-node cluster:
cbstats 10.2.2.168:11210 timings
chk_persistence_cmd (1354 total)
32ms - 65ms : ( 0.22%) 3
65ms - 131ms : ( 0.59%) 5
131ms - 262ms : ( 1.92%) 18 #
262ms - 524ms : ( 14.03%) 164 ##############
524ms - 1s : ( 35.08%) 285 #########################
1s - 2s : ( 50.89%) 214 ###################
2s - 4s : ( 71.34%) 277 #########################
4s - 8s : ( 88.04%) 226 ####################
8s - 16s : ( 97.71%) 131 ###########
16s - 33s : (100.00%) 31 ##
There were no timeout responses to the ebucketmigrator. In this case, I expect that there would be 1024 checkpoint_persistence commands (512 for active and 512 for replica vbuckets).
Please let me know if this is still fine and reassign it back to me.
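The gap between the observed and expected command counts can be checked straight from the pasted histogram. This is a minimal sketch that parses the timings text exactly as it appears in this comment; the parser and its regular expression are hypothetical helpers written for this layout, not part of cbstats.

```python
import re

# Matches lines such as "524ms - 1s : ( 35.08%) 285 ####".
LINE_RE = re.compile(
    r"^\s*(\d+)(ms|s)\s*-\s*(\d+)(ms|s)\s*:\s*\(\s*([\d.]+)%\)\s*(\d+)"
)
SCALE = {"ms": 0.001, "s": 1.0}


def parse_timings(text):
    """Return (low_seconds, high_seconds, cumulative_pct, count) per bucket."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            lo, lo_unit, hi, hi_unit, pct, count = m.groups()
            rows.append((int(lo) * SCALE[lo_unit], int(hi) * SCALE[hi_unit],
                         float(pct), int(count)))
    return rows


TIMINGS = """\
chk_persistence_cmd (1354 total)
32ms - 65ms : ( 0.22%) 3
65ms - 131ms : ( 0.59%) 5
131ms - 262ms : ( 1.92%) 18 #
262ms - 524ms : ( 14.03%) 164 ##############
524ms - 1s : ( 35.08%) 285 #########################
1s - 2s : ( 50.89%) 214 ###################
2s - 4s : ( 71.34%) 277 #########################
4s - 8s : ( 88.04%) 226 ####################
8s - 16s : ( 97.71%) 131 ###########
16s - 33s : (100.00%) 31 ##
"""

EXPECTED_CMDS = 1024  # 512 active + 512 replica vbuckets, as stated above

rows = parse_timings(TIMINGS)
total_cmds = sum(count for _, _, _, count in rows)
extra_cmds = total_cmds - EXPECTED_CMDS
print(f"{total_cmds} commands observed, {extra_cmds} more than expected")
```

The bucket counts add up to the reported 1354 total, which leaves 330 commands beyond the expected 1024.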
Chiyoung Seo
added a comment -
The following are the timings of vbucket checkpoint persistence from the newly added node:
chk_persistence_cmd (1336 total)
16ms - 32ms : ( 0.07%) 1
32ms - 65ms : ( 0.15%) 1
65ms - 131ms : ( 0.37%) 3
131ms - 262ms : ( 0.97%) 8
262ms - 524ms : ( 13.70%) 170 #################
524ms - 1s : ( 34.66%) 280 #############################
1s - 2s : ( 53.22%) 248 ##########################
2s - 4s : ( 68.49%) 204 #####################
4s - 8s : ( 78.89%) 139 ##############
8s - 16s : ( 95.66%) 224 #######################
16s - 33s : (100.00%) 58 ######
From the above timings, we can easily see they took more than 50 minutes overall.
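The "more than 50 minutes" figure can be sanity-checked by charging every command the lower edge of its histogram bucket, which gives a conservative lower bound on the summed command time. This is a sketch under that assumption; it adds up per-command durations and says nothing about how much the commands overlapped in wall-clock time.

```python
# (bucket lower bound in seconds, command count), copied from the
# chk_persistence_cmd histogram above.
BUCKETS = [
    (0.016, 1),
    (0.032, 1),
    (0.065, 3),
    (0.131, 8),
    (0.262, 170),
    (0.524, 280),
    (1.0, 248),
    (2.0, 204),
    (4.0, 139),
    (8.0, 224),
    (16.0, 58),
]

total_cmds = sum(count for _, count in BUCKETS)
# Each command took at least the lower bound of the bucket it fell into.
lower_bound_s = sum(low * count for low, count in BUCKETS)

print(f"{total_cmds} commands, at least {lower_bound_s / 60:.1f} minutes in total")
```

Even with every command counted at the fast edge of its bucket, the sum is above the 50 minute mark.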
Ronnie Sun
added a comment - btw: info.zip is from 10.2.2.168
Ronnie Sun
added a comment -
chiyoung,
I repeated the test on VMs; this time the rebalance took 1 hr.
Left the cluster for you, 10.2.2.168.
Note the ip addresses shown on the UI are internal ips.
Here are the maps fyi:
10.2.2.167 : 192.168.0.20 (reb out)
10.2.2.168 : 192.168.0.21 (new node)
10.2.2.169 : 192.168.0.22
Chiyoung Seo
added a comment -
Ronnie,
I need cb_collect_info from the nodes that are newly added.