Unable to Telnet or Access Default Bucket
Hello, I'm running into a problem setting up my second Membase cluster on a 64-bit Ubuntu machine. I downloaded the 1.7.1 .deb package from the Couchbase site and installed a two-node cluster successfully on two other machines, and it's been running great. Now I'm trying to get a second cluster up (just a standalone node for now), and while it appears to install properly, I can't telnet to the memcached instance or use libmemcached to save any data. I used the default installation options from the GUI. The GUI works fine from anywhere. No errors appear in the GUI log after setup, and I don't see any core dumps or errors in the system logs. When I try to telnet from the machine I get:
root@myhost:~# telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.
The connection closes immediately and I can't even type STATS to see what happens. I've checked everything I could find in the forums and elsewhere. The ports are listening:
root@myhost:~# netstat -ln | grep 112
tcp 0 0 0.0.0.0:11210 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:11211 0.0.0.0:* LISTEN
tcp6 0 0 :::11210 :::* LISTEN
tcp6 0 0 :::11211 :::* LISTEN
udp 0 0 0.0.0.0:11210 0.0.0.0:*
udp6 0 0 :::11210 :::*
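The listing above confirms the ports are open but not which process holds them. As a minimal sketch, assuming the net-tools netstat with its -p option and a root shell, a small filter can narrow the output to the Membase ports (8091 for the REST/GUI port, 11210 for memcached, 11211 for Moxi). The helper name membase_listeners is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: keep only listener lines for the Membase-related ports
# (8091 REST/GUI, 11210 memcached, 11211 Moxi) from netstat output on stdin.
membase_listeners() {
    grep -E ':(8091|11210|11211)[[:space:]]'
}

# Usage (as root, so the PID/Program name column is populated):
#   netstat -lntp | membase_listeners
```

If the last column names anything other than the Membase processes for one of these ports, that would be worth following up.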
Permissions look alright as far as I can tell with membase as the user and group for the /opt/membase directories. No firewall running on the box. One thing I noticed was that the GUI's log showed "ns_1@127.0.0.1" as the host but my good cluster shows "ns_1@ipaddress" as the host. So I followed http://www.couchbase.org/wiki/display/membase/Using+Membase+in+the+Cloud and explicitly declared the hostname (and the IP) and while it changed in the GUI, I still couldn't connect.
The only indication of an error I get is constant REST errors:
INFO REPORT <0.371.0> 2011-08-08 17:17:35
===============================================================================
moxi<0.371.0>: 2011-08-08 17:17:36: (agent_config.c.423) ERROR: parse JSON failed, from REST server: http://127.0.0.1:8091/pools/default/saslBucketsStreaming,
moxi<0.371.0>:
moxi<0.371.0>: 404 Not Found
moxi<0.371.0>:
moxi<0.371.0>: Not Found
moxi<0.371.0>:
The requested URL http://127.0.0.1:8091/pools/default/saslBucketsStreaming was not found on this server.
moxi<0.371.0>:
moxi<0.371.0>: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny3 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g Server at 127.0.0.1 Port 8091
moxi<0.371.0>:
moxi<0.371.0>:
Any ideas? Thanks!
Thanks for responding, Perry. I logged the PIDs for memcached and moxi, ran my telnet command (it closed immediately), and then checked the PIDs again: no change, and no core dumps or errors in the kernel log. What else could I check?
Did you install from our provided package or compile yourself?
Straight from the package: membase-server-community_x86_64_1.7.1.deb
Can you try a fresh install without changing the name to a hostname?
Sure thing. I ran dpkg -r membase-server, then dpkg --purge membase-server, deleted the leftover /opt/membase directory, and re-installed with all the defaults (except changing the bucket size from 15GB to 2GB). It exhibits the same behavior:
Bucket "default" loaded on node 'ns_1@127.0.0.1' in 0 seconds. ns_memcached001 ns_1@127.0.0.1 18:34:12 - Tue Aug 16, 2011
Created bucket "default" of type: membase menelaus_web012 ns_1@127.0.0.1 18:34:11 - Tue Aug 16, 2011
Membase Server has started on web port 8091 on node 'ns_1@127.0.0.1'. menelaus_sup001 ns_1@127.0.0.1 18:31:57 - Tue Aug 16, 2011
Initial otp cookie generated: xsrdjghdgvasdgmh ns_node_disco003 ns_1@127.0.0.1 18:31:57 - Tue Aug 16, 2011
Okay, can you run "/opt/membase/bin/mbcollect_info " and email the resulting file to perry@couchbase.com?
On its way, thanks again!
The logs show most things are running properly, but Moxi is having problems communicating with the REST interface.
Do you have any web server installed on this box and/or something listening on port 8091?
Thanks Perry. Nope, no web server on the box and membase is the only thing using 8091. I verified by killing membase and checked netstat, no ports listening on 8091.
Do you have iptables running at all?
Looks like it's running but wide open:
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
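The listing above is the filter table only. Rules that redirect traffic (for example a REDIRECT or DNAT rule sending port 8091 somewhere else) live in the nat table, which iptables -L does not show unless asked. Whether any such rule exists on this machine is an assumption, not something the thread confirms. A minimal sketch for checking; the helper name nat_redirects is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: from `iptables -t nat -L -n` output on stdin, keep only
# rules whose target (first column) rewrites the destination of a packet.
nat_redirects() {
    grep -E '^(REDIRECT|DNAT)[[:space:]]'
}

# Usage (as root):
#   iptables -t nat -L -n | nat_redirects
# No output means no REDIRECT/DNAT rules were listed in the nat table.
```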
Hey Perry, here's some more info that may help. I went back to my working membase cluster, which was running RabbitMQ 2.3.1 without any plugins. Then I installed the management plugins for RabbitMQ, which includes a Moxi server, and restarted the MQ broker. After about an hour, I came back and saw my machine going nuts and membase had crashed. So maybe there's an issue with running two moxis on the same machine?
Never mind, the MQ broker glitched and ate up a ton of RAM, bumping everything into swap. After fixing the MQ issue I brought up membase again and it's running properly.
So now it's all good for you? The access problem has been fixed?
Well, our first cluster is fine, but updating or disabling RabbitMQ on the broken cluster did not fix the issue. It still rejects connections immediately.
What do you get when you run:
curl http://127.0.0.1:8091/pools/default/saslBucketsStreaming
From all of the nodes in the cluster that is not working?
Perry
root@host:/home/user# curl http://127.0.0.1:8091/pools/default/saslBucketsStreaming
404 Not Found
Not Found
The requested URL http://127.0.0.1:8091/pools/default/saslBucketsStreaming was not found on this server.
Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny3 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g Server at 127.0.0.1 Port 8091
And I presume running the same command on your working cluster actually returns some data?
Can you also try running 'curl http://127.0.0.1:8091/pools'?
Perry
root@host:/home/user# curl http://127.0.0.1:8091/pools
404 Not Found
Not Found
The requested URL http://127.0.0.1:8091/pools was not found on this server.
Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny3 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g Server at 127.0.0.1 Port 8091
***** BUT ***** if I use the host IP I do get data back (which may be why the GUI works as I only view it from another machine)
root@host:/home/user# curl http://host.ip:8091/pools
{"pools":[{"name":"default","uri":"/pools/default","streamingUri":"/poolsStreaming/default"}],"isAdminCreds":false,"uuid":"6ac28710-ddaa-4001-15e8-ec9a00000228","implementationVersion":"1.7.1","componentsVersion":{"os_mon":"2.2.5","mnesia":"4.4.17","kernel":"2.14.3","sasl":"2.1.9.3","ns_server":"1.7.1","stdlib":"1.17.3"}}
They both work on the good cluster using loopback and the IP. So for some reason it's not binding properly to the loopback on one cluster?
root@host:~# netstat -an | grep 8091
tcp 0 0 0.0.0.0:8091 0.0.0.0:* LISTEN
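One thing the thread does not rule out: the 404 page quotes the full URL (http://127.0.0.1:8091/pools) and carries an Apache signature, which would be consistent with the request having been handed to an HTTP proxy instead of going straight to port 8091. curl honors proxy environment variables such as http_proxy, and a no_proxy entry covering the real IP could explain why that address works. This is an assumption, not a confirmed diagnosis. A minimal sketch for checking it; the helper name proxy_vars is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list proxy-related variables in the current environment
# (matched case-insensitively, so HTTP_PROXY and http_proxy both show up).
proxy_vars() {
    env | grep -i -E '^(http_proxy|https_proxy|all_proxy|no_proxy)='
}

# Usage:
#   proxy_vars                                       # prints nothing if none are set
#   curl --noproxy '*' http://127.0.0.1:8091/pools   # bypass any proxy for this one request
```

If the second command returns the JSON seen from the real IP, the proxy settings are the place to look.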
There's clearly something wrong with that one cluster. Any chance you can start with a fresh install of the OS?
What does your 'ifconfig' look like from that one?
Sorry, I can't reimage that machine but I'll see if I can get another image up somewhere else.
ifconfig dumps the following (actual IPs redacted):
root@myhost:~# ifconfig
eth0 Link encap:Ethernet HWaddr 00:12:34:56:78:9a
inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: ffff::ffff:ffff:ffff:ffff/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:38010956021 errors:0 dropped:250 overruns:0 frame:0
TX packets:31874802323 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:24747886148302 (24.7 TB) TX bytes:11078195945959 (11.0 TB)
Memory:df420000-df440000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1252259391 errors:0 dropped:0 overruns:0 frame:0
TX packets:1252259391 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:14862907577622 (14.8 TB) TX bytes:14862907577622 (14.8 TB)
What happens if you 'ping 127.0.0.1' or 'ping localhost'?
Perry
root@host:~# ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.025 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.024 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=0.021 ms
^C
--- localhost ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3002ms
rtt min/avg/max/mdev = 0.021/0.023/0.025/0.001 ms
root@host:~# ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.030 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.054 ms
^C
--- 127.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.030/0.039/0.054/0.011 ms
I'm running out of things to look at here...it really seems to be a problem with the underlying OS.
The fact that http://127.0.0.1:8091/pools does not work while http://realip:8091/pools does means that Membase is not doing anything improperly.
It "really" seems like something else is listening on that port and is explicitly sending back a 404 (instead of a timeout or connection refused).
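One way to see who is answering is to compare the response headers from the loopback address with those from the real IP. A minimal sketch; the helper name server_header is hypothetical, and what the header contains on a healthy node is not shown in this thread:

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first Server header found in an
# HTTP response read from stdin (carriage returns are stripped first).
server_header() {
    tr -d '\r' | sed -n 's/^[Ss]erver:[[:space:]]*//p' | head -n 1
}

# Usage (realip stands for the node's actual address, as in the posts above):
#   curl -s -i http://127.0.0.1:8091/pools | server_header
#   curl -s -i http://realip:8091/pools | server_header
```

If the two values differ, the loopback request is being answered by a different server than the one bound to the real IP.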
Sorry for the trouble, Perry. I agree it looks like something to do with the OS or DNS or something. If I kill membase-server, nothing is listening on 8091 according to netstat. However, I was poking around a bit more, and one difference I noticed between the working servers and the non-working ones has to do with portsigar; I wonder if that could be causing the 404 somehow. On the bad server I see:
membase 3619 3579 0 1017 588 3 18:43 ? 00:00:00 portsigar for ns_1@127.0.0.1
in the process list whereas a good server will have:
membase 3619 3579 0 1017 588 3 18:43 ? 00:00:00 portsigar for ns_1@realIP
I couldn't find what is responsible for starting sigar_port but is there a way to force it to use the realIP via a config somewhere?
I believe the 'realIP' gets set when you actually form a cluster; until then it's using localhost (or 127.0.0.1).
I'm sorry but I'm still at a loss for what's going on here...
It "seems" like either Moxi or memcached is restarting itself, which would close your connections.
Can you look at the PIDs of the various running processes and see if they are changing, which would indicate them exiting and restarting?
Perry
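The PID check suggested above can be sketched as a before/after comparison around the failing telnet. This assumes pgrep is installed and that the processes appear under the names memcached and moxi; both helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print a sorted, space-separated list of PIDs for the
# given process names (exact name match via pgrep -x).
snapshot_pids() {
    for name in "$@"; do
        pgrep -x "$name"
    done | sort -n | tr '\n' ' '
}

# Hypothetical helper: report whether two snapshots differ.
pids_changed() {
    if [ "$1" = "$2" ]; then
        echo "unchanged"
    else
        echo "changed"
    fi
}

# Usage:
#   before=$(snapshot_pids memcached moxi)
#   telnet 127.0.0.1 11211          # reproduce the dropped connection
#   after=$(snapshot_pids memcached moxi)
#   pids_changed "$before" "$after"
```

A result of "changed" would point at a process exiting and being restarted; "unchanged" means the connection is being closed by a process that stays up.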