[MB-7238] ns_server is still validating ip address in ip file even if erlang already has node name defined (was: 2.0 Build 1941: Couchbase Server does not start after a change in IP, server is looking for the old IP even after the hostname resolves to the new one.) Created: 21/Nov/12  Updated: 29/Nov/12  Resolved: 29/Nov/12

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0-beta-2, 2.0
Fix Version/s: 2.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: balak Assignee: Aliaksey Artamonau
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive cbcollectinfo_10_1_3_222.zip    

 Description   

Couchbase Server does not start after a change in IP; the server keeps looking for the old IP even after the hostname resolves to the new one. We followed the best-practices information to configure the hostname in the couchbase-server file, and the issue is still reproducible in the 2.0 build 1941.

Error messages from the log:
[ns_server:info,2012-11-16T13:16:45.502,ns_1@FQDN:dist_manager<0.2732.0>:dist_manager:read_address_config:55]Reading ip config from "/opt/couchbase/var/lib/couchbase/ip"
[ns_server:warn,2012-11-16T13:16:45.522,ns_1@FQDN:dist_manager<0.2732.0>:dist_manager:is_good_address:81]Cannot listen on address `OLD IP`: eaddrnotavail
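For readers unfamiliar with eaddrnotavail: the operating system returns it when a process tries to bind a socket to an address that is not assigned to any local interface. The sketch below is a rough Python analogue of that kind of check, not the ns_server Erlang code; the function name can_listen_on is hypothetical.

```python
import errno
import socket


def can_listen_on(address: str) -> bool:
    """Return True if a TCP socket can be bound to `address` on an ephemeral port.

    Hypothetical helper for illustration only; it is not part of ns_server.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port, so only the address is tested.
        sock.bind((address, 0))
        return True
    except OSError as exc:
        # EADDRNOTAVAIL means the address is not configured on this machine.
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise
    finally:
        sock.close()
```

Run on the affected node, a check like this would be expected to fail for the old IP once the interface has been moved to the new one.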

The logs are available in the link below:
https://s3.amazonaws.com/customers.couchbase.com/jawfishgames/couch14-build-1914.zip

update:

Apparently, as part of setting up the node name, folks just left the original /opt/couchbase/var/lib/couchbase/ip file in place. The ns_server bug is that it attempts to validate the address in that file even though the address won't actually be used.
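The intended behaviour described in the update can be sketched as follows. This is an illustrative Python sketch under assumptions, not the actual ns_server change: the function address_to_validate and its parameters are hypothetical names introduced here.

```python
from typing import Callable, Optional


def address_to_validate(
    configured_node_name: Optional[str],
    read_ip_file: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return the saved address that needs validation, or None if nothing does.

    Hypothetical sketch: when a node name is already defined, the ip file
    is not consulted at all, so a stale address in it cannot block startup.
    """
    if configured_node_name:
        # The node name is already set; the ip file will not be used.
        return None
    saved = read_ip_file()
    if saved is None:
        return None
    saved = saved.strip()
    # An empty ip file means there is no saved address to validate.
    return saved or None
```

The point of the sketch is the ordering: the check for an explicitly defined node name comes before any read of the ip file.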

 Comments   
Comment by Steve Yen [ 21/Nov/12 ]
abhinav now attempting to reproduce
Comment by Steve Yen [ 21/Nov/12 ]
please also get DNS diagnostic info... like ping (ask alk).

it could be as simple as DNS propagation delay.
Comment by Aleksey Kondratenko [ 21/Nov/12 ]
From the error message, it appears that the DNS resolver still thinks the old IP is assigned to this hostname.

So to help diagnose this, I need both cbcollect_info (or just the output of ifconfig -a) and information about which IP this hostname resolves to. A simple way is to ping the hostname and send me the output.
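As an alternative to reading ping output, the address the local resolver returns for a hostname can be printed directly. A minimal Python sketch, assuming IPv4 only; the function name resolved_addresses is hypothetical.

```python
import socket
from typing import List


def resolved_addresses(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})
```

Comparing this result with the addresses shown by ifconfig -a indicates whether the resolver (including /etc/hosts) still points at the old IP.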
Comment by Abhinav Dangeti [ 21/Nov/12 ]
- Started with 10.1.3.235 and 10.1.3.236 (build 1954)
- set the host IPs in /etc/hosts
- stopped couchbase-server on 10.1.3.236
- changed the IP of 10.1.3.236 to 10.1.3.222
- updated /etc/hosts to resolve to the new IP
- started couchbase-server back up on 10.1.3.222; the server never comes back up.

[ Servers available as is ]

10.1.3.222>>
ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:50:56:97:02:D2
          inet addr:10.1.3.222 Bcast:10.255.255.255 Mask:255.0.0.0
          inet6 addr: fe80::250:56ff:fe97:2d2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:109556236 errors:0 dropped:0 overruns:0 frame:0
          TX packets:108558164 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:83579290130 (77.8 GiB) TX bytes:86321632329 (80.3 GiB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:72816990 errors:0 dropped:0 overruns:0 frame:0
          TX packets:72816990 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:120374350544 (112.1 GiB) TX bytes:120374350544 (112.1 GiB)

sit0 Link encap:IPv6-in-IPv4
          NOARP MTU:1480 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

<<Attaching cbcollectinfo_10_1_3_222.zip>>
Comment by Aleksey Kondratenko [ 21/Nov/12 ]
a) cannot access .222.

b) don't have ping output that I need in order to understand more
Comment by Abhinav Dangeti [ 21/Nov/12 ]
If we use ifconfig eth0 10.1.3.222 to change the IP, we see the issue.
However, the issue doesn't occur when the IP is changed this way:
vim /etc/sysconfig/network-scripts/ifcfg-eth0

(comment out BOOTPROTO=dhcp and set BOOTPROTO=static)
..
# Intel Corporation 82545EM Gigabit Ethernet Controller (Copper)
DEVICE=eth0
#BOOTPROTO=dhcp
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.1.3.222
GATEWAY=10.1.0.1
NETMASK=255.255.0.0
...
sudo /etc/init.d/network restart

This worked because /opt/couchbase/var/lib/couchbase/ip was empty.
Comment by Aleksey Kondratenko [ 21/Nov/12 ]
Looks like /opt/couchbase/var/lib/.../ip is still being used somehow. I recommend manually deleting it. It's still an ns_server bug if we try to do anything with it when a hostname is specified.
Comment by Steve Yen [ 26/Nov/12 ]
The ip file was/is being used in the cbupgrade situation. Please see...

  http://www.couchbase.com/issues/browse/MB-7241
Comment by Aleksey Kondratenko [ 28/Nov/12 ]
Approved for 2.0. Be careful to use the right branch.
Comment by Steve Yen [ 29/Nov/12 ]
I think this fix was merged? -- http://review.couchbase.org/#/c/22895/
Generated at Sat Apr 19 21:14:39 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.