[MB-7554] Rebalance fails with "bad match wait_backfill_determination" error on a very small load Created: 17/Jan/13 Updated: 04/Feb/13 Resolved: 22/Jan/13 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Ketaki Gangal | Assignee: | Ketaki Gangal |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 2.0.1-125 | ||
| Attachments: |
|
| Description |
|
Load 1M items on a 4-node cluster.
Rebalance in 2 nodes. Rebalance and compaction start in parallel. Rebalance is very slow for the first few minutes, catches up, but then fails with a timeout exit. The load and cluster are a very basic configuration. The same scenario works on 2.0.

** Reason for termination ==
** {unexpected_exit,
    {'EXIT',<0.896.2>,
     {{badmatch,
       [{'EXIT',
         {timeout,
          {gen_server,call,
           [<20117.4759.0>,had_backfill,30000]}}}]},
      [{ns_single_vbucket_mover,
        '-wait_backfill_determination/1-fun-1-',1}]}}}

[error_logger:error,2013-01-17T23:11:39.476,ns_1@10.176.169.6:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: ns_vbucket_mover:init/1
    pid: <0.24152.1>
    registered_name: []
    exception exit: {unexpected_exit,
                     {'EXIT',<0.896.2>,
                      {{badmatch,
                        [{'EXIT',
                          {timeout,
                           {gen_server,call,
                            [<20117.4759.0>,had_backfill,30000]}}}]},
                       [{ns_single_vbucket_mover,
                         '-wait_backfill_determination/1-fun-1-',1}]}}}
      in function gen_server:terminate/6
    ancestors: [<0.13807.1>]
    messages: [{backfill_done,
                {'ns_1@10.176.169.6',1019,
                 ['ns_1@10.176.169.6','ns_1@10.169.54.218'],
                 ['ns_1@10.176.155.132','ns_1@10.168.94.60']}},
               {move_done_new_style,
                {'ns_1@10.176.169.6',1019,
                 ['ns_1@10.176.169.6','ns_1@10.169.54.218'],
                 ['ns_1@10.176.155.132','ns_1@10.168.94.60']}},
               {'EXIT',<0.6884.2>,normal},
               {backfill_done,
                {'ns_1@10.169.54.218',678,
                 ['ns_1@10.169.54.218','ns_1@10.168.173.242'],
                 ['ns_1@10.168.94.60','ns_1@10.176.155.132']}},
               {move_done_new_style,
                {'ns_1@10.169.54.218',678,
                 ['ns_1@10.169.54.218','ns_1@10.168.173.242'],
                 ['ns_1@10.168.94.60','ns_1@10.176.155.132']}},
               {'EXIT',<0.7095.2>,normal}]
    links: [<0.13807.1>,<0.24159.1>,<0.57.0>]
    dictionary: [{bucket_name,"default"},
                 {i_am_master_mover,true},
                 {child_processes,[<0.7095.2>,<0.6884.2>,<0.6760.2>,
                                   <0.3866.2>,<0.3858.2>,<0.862.2>,<0.855.2>,
                                   <0.852.2>,<0.787.2>,<0.26181.1>,
                                   <0.26131.1>,<0.26086.1>,<0.24174.1>,
                                   <0.24173.1>]}]
    trap_exit: true
    status: running
    heap_size: 28657
    stack_size: 24
    reductions: 1198089
Logs at |
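
The crash report shows a `gen_server:call` for `had_backfill` with a 30000 ms timeout, and a `badmatch` on a list that contains the timed-out reply. The ns_server source is not part of this ticket, so the following is only a minimal Python sketch of the general pattern that could surface this way: replies are collected with a per-call timeout, and any reply that is not a success aborts the caller. It is not the ns_server implementation. Every name in it (`BadMatch`, `call_with_timeout`, `wait_for_replies`) is hypothetical, and the timeouts are shortened for illustration.

```python
import concurrent.futures
import time


class BadMatch(Exception):
    """Hypothetical stand-in for an Erlang badmatch error."""


def call_with_timeout(pool, fn, timeout_s):
    """Run fn in the pool; return ('ok', value) or ('EXIT', 'timeout')."""
    future = pool.submit(fn)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # The callee did not answer in time; report it as a failed reply.
        return ("EXIT", "timeout")


def wait_for_replies(fns, timeout_s):
    """Call every fn; raise BadMatch carrying the list of failed replies."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(fns)) as pool:
        replies = [call_with_timeout(pool, fn, timeout_s) for fn in fns]
    bad = [reply for reply in replies if reply[0] != "ok"]
    if bad:
        # Any non-ok reply aborts the caller, as in the crash report.
        raise BadMatch(bad)
    return [value for _, value in replies]


def slow_node():
    """Hypothetical node that answers later than the caller is willing to wait."""
    time.sleep(0.3)
    return True


if __name__ == "__main__":
    try:
        wait_for_replies([lambda: True, slow_node], timeout_s=0.05)
    except BadMatch as err:
        print("rebalance step failed:", err.args[0])
```

In this sketch a single slow responder is enough to fail the whole step, which is consistent with a rebalance that progresses and then exits on one timed-out call. Whether that is what happened here cannot be confirmed from the logs in this ticket.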
| Comments |
| Comment by Aliaksey Artamonau [ 17/Jan/13 ] |
| I need diags from other nodes. From 'ns_1@10.176.155.132' in particular. |
| Comment by Ketaki Gangal [ 17/Jan/13 ] |
|
The cluster is no longer around.
Do we have any idea what is causing these timeouts, based on these limited logs? |
| Comment by Aliaksey Artamonau [ 21/Jan/13 ] |
| No, unfortunately there's not enough information there. |
| Comment by Farshid Ghods [ 22/Jan/13 ] |
| Please reopen if this case occurs again. |