Details
Description
test: newupgradetests.MultiNodesUpgradeTests.offline_cluster_upgrade,initial_version=2.0.0-1978-rel,nodes_init=2,during-ops=failover,upgrade_version=2.0.1-112-rel,initial_vbuckets=64
steps:
1. Set up a 2.0.0 release cluster with 2 nodes, 10.3.121.112 and 10.3.121.113 (2.0.0-1978-rel)
2. Fail over 10.3.121.112
3. Stop both nodes and upgrade them to 2.0.1-112
result:
For some reason the node 10.3.121.113 does not start.
The Active Servers and Pending Rebalance panels are reversed or show nonsense (see screenshots).
I tried failover, add back, rebalance, starting 10.3.121.113 manually, etc., but it did not help.
Server logs, test logs and some screenshots are attached.
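For anyone re-running this, the test line above packs the test name and its parameters into one comma-separated string. A minimal sketch of how that string breaks down, assuming plain `key=value` pairs after the test name; `parse_test_spec` is a hypothetical helper written for illustration, not the test runner's own parser:

```python
def parse_test_spec(spec):
    """Split 'module.Class.method,key=value,...' into (test_name, params).

    Hypothetical helper: assumes the first comma-separated field is the
    test name and every following field is a single key=value pair.
    """
    name, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key.strip()] = value.strip()
    return name.strip(), params


spec = (
    "newupgradetests.MultiNodesUpgradeTests.offline_cluster_upgrade,"
    "initial_version=2.0.0-1978-rel,nodes_init=2,during-ops=failover,"
    "upgrade_version=2.0.1-112-rel,initial_vbuckets=64"
)
name, params = parse_test_spec(spec)
print(name)
for key, value in params.items():
    print(f"  {key} = {value}")
```

All values stay as strings here; converting `nodes_init` or `initial_vbuckets` to integers is left to the caller.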
- 10.3.121.112-1182013-1818-diag.zip
- 18/Jan/13 9:29 AM
- 303 kB
- Andrei Baranouski
-
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/couchbase.log 593 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.xdcr.log 5 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.couchdb.log 1 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/stats.log 45 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ini.log 16 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.error.log 45 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.views.log 5 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.info.log 395 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.xdcr_errors.log 0.2 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.mapreduce_errors.log 0.2 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/diag.log 1.02 MB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.debug.log 634 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ddocs.log 0.2 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/memcached.log 5 kB
- cbcollect_info_ns_1@10.3.121.112_20130118-151809/ns_server.stats.log 57 kB
-
- 10.3.121.112-8091-diag.txt.gz
- 04/Jan/13 4:56 AM
- 2.72 MB
- Andrei Baranouski
-
- 10.3.121.113-8091-diag.txt.gz
- 04/Jan/13 4:56 AM
- 254 kB
- Andrei Baranouski
-
- 10.3.121.114-1182013-1819-diag.zip
- 18/Jan/13 9:29 AM
- 310 kB
- Andrei Baranouski
-
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/couchbase.log 583 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.xdcr.log 6 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.couchdb.log 1 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/stats.log 4 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ini.log 16 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.error.log 58 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.views.log 2 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.info.log 441 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.xdcr_errors.log 0.2 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.mapreduce_errors.log 0.2 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/diag.log 1.02 MB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.debug.log 859 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ddocs.log 0.2 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/memcached.log 0.3 kB
- cbcollect_info_ns_1@10.3.121.114_20130118-150450/ns_server.stats.log 99 kB
-
- test_logs.txt
- 04/Jan/13 4:56 AM
- 69 kB
- Andrei Baranouski
-
- add_back1.png
- 51 kB
- 04/Jan/13 4:56 AM
-
- add_back2.png
- 52 kB
- 04/Jan/13 4:56 AM
-
- failo0ver_13.png
- 58 kB
- 04/Jan/13 4:56 AM
-
- restart_13_man.png
- 65 kB
- 04/Jan/13 4:56 AM
-
- Screenshot from 2013-01-04 13-21-20.png
- 242 kB
- 04/Jan/13 4:56 AM
-
- Screenshot from 2013-01-04 13-28-54.png
- 52 kB
- 04/Jan/13 4:56 AM
-
- Screenshot from 2013-01-04 13-31-45.png
- 81 kB
- 04/Jan/13 4:56 AM
-
- Screenshot from 2013-01-04 13-39-34.png
- 44 kB
- 04/Jan/13 4:56 AM
-
- step4.png
- 163 kB
- 18/Jan/13 9:29 AM
-
- step5.png
- 102 kB
- 18/Jan/13 9:29 AM
Activity
Aleksey Kondratenko
added a comment - See above
Aleksey Kondratenko
added a comment -
Indeed, there is an issue with failover being incorrectly allowed in that case. Filed: MB-7493
Andrei Baranouski
added a comment - can't reproduce it now
Andrei Baranouski
added a comment -
steps:
1. Cluster with 2 nodes: 10.3.121.112 and 10.3.121.114
2. Fail over node 10.3.121.114
3. Stop both nodes
4. Start only node 10.3.121.114 (step4.png)
5. In the UI console of 10.3.121.114, add it back (step5.png)
New screenshots and collect_info are attached.
Aleksey Kondratenko
added a comment -
Not sure exactly what you expect. Rebalancing requires both nodes to be up.
Andrei Baranouski
added a comment -
Alk, after these steps I want to bring back the node that was failed over in step 2.
I know that it should be cleaned, but if node 10.3.121.112 has disappeared without a trace, we cannot revive the second one and the only solution is to reinstall it.
Andrei Baranouski
added a comment - need confirmation that we will not handle it
[error_logger:error,2013-01-04T2:09:45.459,nonode@nohost:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: couch_server:init/1
pid: <0.217.0>
registered_name: []
exception exit: {undef,[{file2,ensure_dir,["/tmp/.delete/foo"]},
{couch_file,init_delete_dir,1},
{couch_server,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
in function gen_server:init_it/6
ancestors: [couch_primary_services,couch_server_sup,cb_couch_sup,
ns_server_cluster_sup,<0.59.0>]
messages: []
links: [<0.212.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 377
stack_size: 24
reductions: 186
neighbours:
This looks like a missing file, because that function actually exists and did not change between 2.0.0 and 2.0.1.
I'd need another reproduction with collect_info to see what's going on, since collect_info will give me the list of files.
The second problem is that the UI somehow allowed you to fail over .113 even though it was the last remaining active node in the cluster. I'll try to reproduce it and will file a separate bug.
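To make the second problem concrete, here is a rough sketch of the kind of guard one would expect before a failover is accepted: refuse it when no other active node would remain. This is only an illustration under stated assumptions; the `can_fail_over` function and the `membership` and `healthy` fields are hypothetical and are not how ns_server represents or checks cluster state.

```python
def can_fail_over(nodes, target):
    """Return True only if another active, healthy node would remain.

    `nodes` maps a node address to a dict with hypothetical keys
    'membership' ('active' or 'failed_over') and 'healthy' (bool).
    """
    if target not in nodes:
        return False
    remaining = [
        addr
        for addr, info in nodes.items()
        if addr != target
        and info["membership"] == "active"
        and info["healthy"]
    ]
    return len(remaining) > 0


# State after step 2 of the report: .112 is already failed over,
# so .113 is the last active node and must not be failed over too.
cluster = {
    "10.3.121.112": {"membership": "failed_over", "healthy": True},
    "10.3.121.113": {"membership": "active", "healthy": True},
}
print(can_fail_over(cluster, "10.3.121.113"))
```

With both nodes still active, the same call for either node would be allowed, which matches the first failover in the steps above.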