[MB-8321] [Doc'd] windows32: Rebalance exited with reason {mover_failed,{badmatch,{error,eaddrinuse}}} Created: 20/May/13  Updated: 17/Sep/13  Resolved: 17/Sep/13

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.1.0
Fix Version/s: 2.1.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Andrei Baranouski Assignee: Andrei Baranouski
Resolution: Duplicate Votes: 0
Labels: info-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows32, 2.0.2-807

Issue Links:
Duplicate
is duplicated by MB-7902 [windows] Rebalance exited with reaso... Reopened
Operating System: Windows 32-bit

 Description   
http://qa.hq.northscale.net/job/windows32_rebalance-kv/11/consoleFull

./testrunner -i /tmp/4-w-32.ini get-cbcollect-info=True -t rebalancetests.IncrementalRebalanceOut.test_load,replica=1,do-stop=True

[2013-05-19 06:24:15,085] - [rest_client:925] INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.2.179%2Cns_1%4010.3.2.185%2Cns_1%4010.3.2.178%2Cns_1%4010.3.2.183
[2013-05-19 06:24:15,296] - [rest_client:929] INFO - rebalance operation started
[2013-05-19 06:24:15,334] - [rest_client:1031] INFO - rebalance percentage : 0 %
[2013-05-19 06:24:17,376] - [rest_client:1031] INFO - rebalance percentage : 0.0 %
[2013-05-19 06:24:19,438] - [rest_client:1031] INFO - rebalance percentage : 0.703125 %
[2013-05-19 06:24:21,473] - [rest_client:1031] INFO - rebalance percentage : 1.640625 %
[2013-05-19 06:24:23,499] - [rest_client:1031] INFO - rebalance percentage : 2.6953125 %
[2013-05-19 06:24:25,523] - [rest_client:1031] INFO - rebalance percentage : 3.6328125 %
[2013-05-19 06:24:26,544] - [rest_client:1031] INFO - rebalance percentage : 4.21875 %
[2013-05-19 06:24:27,572] - [rest_client:1031] INFO - rebalance percentage : 4.8046875 %
[2013-05-19 06:24:28,601] - [rest_client:1031] INFO - rebalance percentage : 5.5078125 %
[2013-05-19 06:24:30,639] - [rest_client:1031] INFO - rebalance percentage : 6.4453125 %
[2013-05-19 06:24:31,663] - [rest_client:1014] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
[2013-05-19 06:24:31,663] - [rest_client:1015] INFO - Latest logs from UI:
[2013-05-19 06:24:31,764] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 1, u'text': u'Bucket "bucket-0" loaded on node \'ns_1@10.3.2.179\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1368969856238, u'type': u'info'}
[2013-05-19 06:24:31,764] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 4, u'text': u"Node 'ns_1@10.3.2.179' saw that node 'ns_1@10.3.2.185' came up. Tags: []", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1368969855691, u'type': u'info'}
[2013-05-19 06:24:31,765] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 3, u'text': u'Node ns_1@10.3.2.179 joined cluster', u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1368969854738, u'type': u'info'}
[2013-05-19 06:24:31,765] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 4, u'text': u"Node 'ns_1@10.3.2.179' saw that node 'ns_1@10.3.2.183' came up. Tags: []", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1368969854738, u'type': u'info'}
[2013-05-19 06:24:31,765] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 0, u'text': u"Port server moxi on node 'babysitter_of_ns_1@127.0.0.1' exited with status 0. Restarting. Messages: WARNING: curl error: transfer closed with outstanding read data remaining from: http://127.0.0.1:8091/pools/default/saslBucketsStreaming\nEOL on stdin. Exiting", u'shortText': u'message', u'module': u'ns_log', u'tstamp': 1368969854628, u'type': u'info'}
[2013-05-19 06:24:31,765] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.179', u'code': 1, u'text': u"Couchbase Server has started on web port 8091 on node 'ns_1@10.3.2.179'.", u'shortText': u'web start ok', u'module': u'menelaus_sup', u'tstamp': 1368969854597, u'type': u'info'}
[2013-05-19 06:24:31,766] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.178', u'code': 2, u'text': u'Rebalance exited with reason {mover_failed,{badmatch,{error,eaddrinuse}}}\n', u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1368969854354, u'type': u'info'}
[2013-05-19 06:24:31,766] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.178', u'code': 0, u'text': u'<0.26161.33> exited with {mover_failed,{badmatch,{error,eaddrinuse}}}', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1368969854339, u'type': u'critical'}
[2013-05-19 06:24:31,767] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.185', u'code': 1, u'text': u'Bucket "bucket-0" loaded on node \'ns_1@10.3.2.185\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1368969850886, u'type': u'info'}
[2013-05-19 06:24:31,767] - [rest_client:1016] ERROR - {u'node': u'ns_1@10.3.2.185', u'code': 3, u'text': u'Node ns_1@10.3.2.185 joined cluster', u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1368969850479, u'type': u'info'}
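Note that testrunner prints every UI log entry at ERROR level, even though only one entry above has type 'critical'. A minimal sketch for picking the critical entries out of such a dump follows. It assumes each "[rest_client:1016]" line ends with a Python dict literal, as in the lines above; the function names are hypothetical and not part of testrunner.

```python
import ast


def parse_ui_log_line(line):
    """Return the dict embedded after 'ERROR - ' in a log line, or None.

    Lines without a clean dict payload (for example the
    '{...} - rebalance failed' line) are skipped rather than guessed at.
    """
    marker = "ERROR - "
    start = line.find(marker)
    if start == -1:
        return None
    payload = line[start + len(marker):].strip()
    if not payload.startswith("{"):
        return None
    try:
        # The u'...' prefixes from the Python 2 repr are valid literals
        # in Python 3.3 and later.
        entry = ast.literal_eval(payload)
    except (SyntaxError, ValueError):
        return None
    return entry if isinstance(entry, dict) else None


def critical_entries(lines):
    """Return entries whose 'type' is 'critical', oldest first by 'tstamp'."""
    entries = (parse_ui_log_line(line) for line in lines)
    critical = [e for e in entries if e and e.get("type") == "critical"]
    return sorted(critical, key=lambda e: e.get("tstamp", 0))
```

Applied to the dump above, this would leave only the ns_vbucket_mover entry from node ns_1@10.3.2.178.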


 Comments   
Comment by Andrei Baranouski [ 20/May/13 ]
https://s3.amazonaws.com/bugdb/jira/MB-8321/3afd5ca7/10.3.2.178-5192013-624-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8321/3afd5ca7/10.3.2.179-5192013-629-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8321/3afd5ca7/10.3.2.183-5192013-628-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-8321/3afd5ca7/10.3.2.185-5192013-630-diag.zip

Comment by Aleksey Kondratenko [ 20/May/13 ]
Make sure the fix from http://support.microsoft.com/kb/196271 is applied (apparently it requires a Windows reboot).
Comment by Maria McDuff [ 20/May/13 ]
Andrei, please re-test once Aleksey's fix is confirmed to be in place.


FYI: for the affected Windows versions, we need to document that this Windows patch must be applied.
Comment by kzeller [ 21/May/13 ]
Andrei: Please provide some notes here:

1) Confirm this is the correct fix.
2) When does the fix need to be applied on the machine (after a fresh install, or after a shutdown)?
3) Anything else?


Thanks

Karen
Comment by Deepkaran Salooja [ 27/May/13 ]
Tested after applying fix:
http://support.microsoft.com/kb/196271

The test passes (build 2.0.2-812-rel):
http://qa.hq.northscale.net/job/windows32_rebalance-kv/14/
Comment by kzeller [ 28/May/13 ]
Added to Release Notes 2.1.0:

<rnentry type="knownissue">

<version ver="2.0.0m"/>

<class id="install"/>

<issue type="cb" ref="MB-8321"/>


<rntext>

<para>
For the 32-bit version of Couchbase Server on Windows, you must apply the third-party patch available from Microsoft at
<ulink url="http://support.microsoft.com/kb/196271">Microsoft Support</ulink>. Either do this right after the initial install, or stop your server, apply the patch,
and restart your machine. Without this patch, rebalance will fail and
you will see the error <literal>Rebalance exited with reason {mover_failed,{badmatch,{error,eaddrinuse}}}</literal>.
</para>


</rntext>

</rnentry>

Added to Windows 2.1.0 README:

- For the 32-bit version of Couchbase Server on Windows, you must apply the third-party patch available from Microsoft at "http://support.microsoft.com/kb/196271". Either do this right after the initial install, or stop your server, apply the patch, restart your machine, and then restart the server. Without this patch, rebalance will fail and you will see the error: Rebalance exited with reason {mover_failed,{badmatch,{error,eaddrinuse}}}.
Comment by Aleksey Kondratenko [ 28/May/13 ]
This is not specific to 32-bit Windows.

Also, there is a patch in Gerrit to automatically apply the KB change as part of the Windows installer.
Comment by kzeller [ 28/May/13 ]
<sarcasm>I wish that were clear from the ticket title and body</sarcasm>.....

Have we really tested this fix on both 32-bit and 64-bit Windows?
Comment by kzeller [ 28/May/13 ]
Removed from 2.1.0 README. Changed in RN 2.1.0:

<rnentry type="fix">

<version ver="2.0.0m"/>

<class id="install"/>

<issue type="cb" ref="MB-8321"/>


<rntext>

<para>
For Couchbase Server on Windows, we now include a third-party patch available from Microsoft at
<ulink url="http://support.microsoft.com/kb/196271">Microsoft Support</ulink>. In earlier releases without this patch, rebalance failed and
produced the error <literal>Rebalance exited with reason {mover_failed,{badmatch,{error,eaddrinuse}}}</literal>.
</para>


</rntext>

</rnentry>
Comment by Aleksey Kondratenko [ 28/May/13 ]
This is not quite a patch, and it is not yet merged.

Siri, may I ask you to handle this with Karen, i.e. polishing the release notes and making sure they are correct?
Comment by Aleksey Kondratenko [ 28/May/13 ]
Also, maybe the release notes should link to some other ticket with better context or a better description of what really happens.
Comment by Sriram Melkote [ 29/May/13 ]
Karen, the fix is being tracked on bug MB-7902 and should get merged shortly. Let us use this text for the note --

The Couchbase Server installers (Windows 32-bit and 64-bit) now prompt the user to set the MaxUserPort registry setting if they detect that the value has not been set or is too low. This increases the number of ephemeral ports available to applications on Windows, as documented in <ulink url="http://support.microsoft.com/kb/196271">Microsoft Knowledge Base Article 196271</ulink>. The installer also warns the user that a reboot is necessary for this change to take effect. If this registry key is not set, port starvation may occur, leading to various problems such as MB-8321 and MB-7902.
Comment by kzeller [ 29/May/13 ]
OK, updated to:

The Couchbase Server installer for Windows 32-bit and 64-bit now prompts you to set the MaxUserPort registry setting
if it detects that this value has not been set or is too low. This will increase the number of ephemeral ports available
to applications on Windows, as documented in <ulink url="http://support.microsoft.com/kb/196271">Microsoft Knowledge Base Article 196271</ulink>.
The installer also warns you that a reboot is necessary for this change to take effect.
If this registry key is not set, port exhaustion may occur, leading to various problems; see MB-8321.
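To make the "not set or too low" check concrete, here is a rough sketch of how such a check could look. It is not the installer's actual logic. The registry path and the 5000 default and 65534 maximum follow KB 196271 as commonly cited; the start of the ephemeral range (1025) and the PROMPT_THRESHOLD value are assumptions, and the function names are hypothetical.

```python
# Sketch only. The port numbers follow Microsoft KB 196271 as commonly cited;
# PROMPT_THRESHOLD is a hypothetical value, not the installer's real cutoff.

REG_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
FIRST_EPHEMERAL_PORT = 1025    # assumed start of the ephemeral range
DEFAULT_MAX_USER_PORT = 5000   # effective value when MaxUserPort is unset
HIGHEST_MAX_USER_PORT = 65534  # largest value the setting accepts
PROMPT_THRESHOLD = 65534       # hypothetical: prompt below this value


def read_max_user_port():
    """Return MaxUserPort from the registry, or None if it is not set.

    Windows only: winreg is imported lazily so the module loads elsewhere.
    """
    import winreg
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, REG_PATH) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, "MaxUserPort")
        except FileNotFoundError:
            return None
    return value


def ephemeral_port_count(max_user_port=None):
    """Number of ephemeral ports available for a given MaxUserPort value."""
    effective = DEFAULT_MAX_USER_PORT if max_user_port is None else max_user_port
    return effective - FIRST_EPHEMERAL_PORT + 1


def should_prompt(max_user_port=None, threshold=PROMPT_THRESHOLD):
    """True when the value is unset or below the (hypothetical) threshold."""
    return max_user_port is None or max_user_port < threshold
```

Under these assumptions, an unset MaxUserPort leaves 3976 ephemeral ports, while setting it to 65534 raises that to 64510, which is why a rebalance that opens many connections is less likely to hit eaddrinuse afterwards.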
Comment by Andrei Baranouski [ 03/Jun/13 ]
Closed according to Karen's comments above.
Comment by Sriram Melkote [ 03/Jun/13 ]
Linking to MB-7902, which isn't quite resolved yet. If we close this bug independently of MB-7902, then it should be because setting MaxUserPort resolves this issue on its own.
Comment by Andrei Baranouski [ 03/Jun/13 ]
Well, let's leave it in open status.

But it should be noted that all runs/tests passed after applying the patch on the VMs for http://qa.hq.northscale.net/job/windows32_rebalance-kv:

Deepkaran Salooja added a comment - 27/May/13 3:33 PM
Tested after applying fix:
http://support.microsoft.com/kb/196271

The test passes (build 2.0.2-812-rel):
http://qa.hq.northscale.net/job/windows32_rebalance-kv/14/
Comment by kzeller [ 21/Jun/13 ]
[ADDED to Doc and RN]

Sent to Tony:

Hi Tony,

There is a new prompt in the Windows installer that helps you avoid port exhaustion. Please send me a screenshot of this new prompt.


Thanks,

Karen
Comment by kzeller [ 25/Jun/13 ]
Removing documentation label for 2.1.0. Save ticket for Eng/QA.

Doc'd on 2.1.0 RN:

The Couchbase Server installer for Windows 32-bit and 64-bit now prompts you to set the MaxUserPort registry setting. This will increase the number of ephemeral ports available to applications on Windows, as documented in Microsoft Knowledge Base Article 196271. The installer also warns you that a reboot is necessary for this change to take effect. If this registry key is not set, port exhaustion may occur, leading to various problems; see MB-8321. For installer instructions, see Section 2.2.3, “Microsoft Windows Installation”.

Issues: MB-8321


Main chapter update:

http://www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-getting-started-install-win.html
Generated at Wed Apr 23 13:58:14 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.