Node failure

Wed, 02/02/2011 - 15:41
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi

I have (or had) 3 Membase nodes running. One of them failed and I have no clue why. Here is the log for the server that went down:

Failed to add node xxx.xxx.xxx.xxx:8091 to cluster. Failed to reach erlang port mapper. Could not connect to "xxx.xxx.xxx.xxx" on port "4369". This could be due to an incorrect host/port combination or a firewall in place between the servers.

Restarting. Messages: Connecting to {Sock xxx.xxx.xxx.xxx:11210}
Failed to connect to host: Failed to connect to [xxx.xxx.xxx.xxx:11210] (repeated 38 times)

...the above message reappears 3 times, then

Node 'ns_1@xxx.xxx.xxx.xxx' saw that node 'ns_1@xxx.xxx.xxx.xxx' went down.

Node 'ns_1@xxx.xxx.xxx.xxx' saw that node 'ns_1@xxx.xxx.xxx.xxx' went down.

...then again

Restarting. Messages: Connecting to {Sock xxx.xxx.xxx.xxx:11210}
Failed to connect to host: Failed to connect to [xxx.xxx.xxx.xxx:11210]

Bucket "default" loaded on node 'ns_1@xxx.xxx.xxx.xxx' in 4 seconds.

Control connection to memcached on 'ns_1@xxx.xxx.xxx.xxx' disconnected: {{badmatch,{error,timeout}},
 [{mc_client_binary,stats_recv,4},
  {mc_client_binary,stats,4},
  {ns_memcached,ensure_bucket_config,4},
  {ns_memcached,handle_info,2},
  {gen_server,handle_msg,5},
  {proc_lib,init_p_do_apply,3}]}

Does anyone have a clue why this is happening? I have 2 other Membase nodes running and so far they are OK.

When a node fails, how does one restart it? I tried to use the web console to add the IP and then make it join the cluster but, as stated earlier, received this message:

Failed to add node xxx.xxx.xxx.xxx:8091 to cluster. Failed to reach erlang port mapper. Could not connect to "xxx.xxx.xxx.xxx" on port "4369". This could be due to an incorrect host/port combination or a firewall in place between the servers.

 

Wed, 02/02/2011 - 16:11
perry
Offline
Joined: 10/11/2010
Groups:

Looks like there could be a few things going on here.

First off, is it possible that the IP address of this node changed after it was set up?

There appear to be a number of connection issues. Is it possible that a firewall was put in place?

Also check the running processes and look at the output of /opt/membase/bin/browse_logs on the node that died.
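
To test the firewall theory, a quick reachability check of the ports that appear in the messages above (4369, 8091 and 11210) can be run from one of the healthy nodes. This is only a minimal sketch: the function names are made up for illustration, and a successful TCP connection shows that the port is reachable, not that the service behind it is healthy.

```python
import socket

# Ports taken from the error messages in this thread.
PORTS = {
    4369: "Erlang port mapper (epmd)",
    8091: "web console",
    11210: "port in the 'Failed to connect to host' messages",
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_node(host, ports=PORTS, timeout=3.0):
    """Map each port number to True (reachable) or False (closed or blocked)."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling check_node() with the address of the failed node returns something like {4369: False, 8091: True, ...}. A port that is False from another node but True when checked on the failed node itself would point towards a firewall rather than a stopped service.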

 

Perry

__________________

Forum support is great for free but sometimes you need a guaranteed response time and dedicated resources for your questions or issues.
Consider purchasing enterprise-level support from Couchbase: http://www.couchbase.com/products-and-services/overview
Call or email "sales -at- couchbase-dot- com" today!

Thu, 02/03/2011 - 15:14
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

 

This is the output I get when I tail the log (/var/opt/membase/1.6.4.1/logs) from the last day the node was running. As you can see, it is pretty difficult to read:

 

[Binary log output, mostly unreadable. The recoverable fragments are info messages "Pulling config from: ~p~n" on ns_1@xxx.xxx.220.124 referring to ns_1@xxx.xxx.218.77 and ns_1@xxx.xxx.223.231, and repeated error messages "~s:~s:~B: Dropped ~b ticks~n" from stats_collector on ns_1@xxx.xxx.220.124.]

I tried to connect to the web console of the node that died, just to see if there was a connection. Because you said the IP might have changed, I used hostname:8091 (which works for the other 2 nodes), but no luck. Can I use top to see any running processes? Is it only possible to start a Membase node through the web console?

I checked with my provider and was told that each VPS has a unique IP, but it is not static and may change over time (although it almost never does). To avoid problems if the IP changes, would it help to use the hostname instead of the IP when adding a server? That way the hostname would always resolve to the right IP. I tried this, i.e. I removed the servers one by one, re-added them and rebalanced, but each server was still listed by its IP instead of its hostname.

 

 

Thanks

Thu, 02/03/2011 - 16:05
perry
Offline
Joined: 10/11/2010
Groups:

We actually store our logs in a binary format so you'll have to use the browse_logs utility to view them.  That's located at /opt/membase/bin/browse_logs

 

Secondly, you can see whether the Membase processes are running by looking at the output of 'top' or, better yet, 'ps aux' and grep for 'membase', which is our user. You should see a memcached process running, among other things. If that's not running, then the server will not be up and responding. You should be able to start the server with 'sudo /etc/init.d/membase-server start'
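
The same check can be scripted. The sketch below parses 'ps aux' output and reports whether the 'membase' user owns a memcached process. The function names are illustrative, and it assumes the usual 11-column 'ps aux' layout with the user in the first column and the command in the last.

```python
import subprocess


def processes_for_user(ps_output, user):
    """Return the command column of every `ps aux` row owned by `user`."""
    commands = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # the 11th field is the full command line
        if len(fields) == 11 and fields[0] == user:
            commands.append(fields[10])
    return commands


def memcached_running(ps_output, user="membase"):
    """True if any process owned by `user` has 'memcached' in its command."""
    return any("memcached" in cmd for cmd in processes_for_user(ps_output, user))


def current_ps_output():
    """Capture `ps aux` from the local machine."""
    return subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
```

memcached_running(current_ps_output()) returns False when the server is not fully up. Unlike 'ps aux | grep membase', it compares the user column exactly, so ssh sessions of accounts whose names merely contain "membase" are ignored.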

 

Lastly, if the IP address is possibly going to change, you can configure Membase to use hostnames by following these instructions: http://wiki.membase.org/display/membase/Using+Membase+in+the+Cloud

 

Hope that helps, let me know if there's anything else you need.

 

Perry

 

Mon, 02/07/2011 - 09:52
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

I followed the instructions so that Membase can use the hostname instead. After restarting the server, I can see that the change was made to /opt/membase/1.6.4.1/bin/membase, but an error is reported:

/opt/membase/bin/membase: line 47: -name: command not found

... and I do not see the hostname in the cluster view at /index.html#sec=servers&serversTab=0

What could be the cause for this?

Thanks

Update: I had automated this in a script, and a redundant space after a line-continuation backslash (\) caused the problem.
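
For anyone hitting the same "-name: command not found" error: when a backslash at the end of a line is followed by a space, the shell escapes the space rather than the newline, so the command ends there and the next line (here starting with -name) is run as a command of its own. A minimal sketch for finding such lines in a script, with an illustrative function name:

```python
import re

# A backslash followed by spaces or tabs at the very end of a line.
BROKEN_CONTINUATION = re.compile(r"\\[ \t]+$")


def broken_continuations(script_text):
    """Return 1-based numbers of lines where a backslash is followed by trailing blanks."""
    return [
        number
        for number, line in enumerate(script_text.splitlines(), start=1)
        if BROKEN_CONTINUATION.search(line)
    ]
```

Passing in the contents of the edited start script returns the line numbers to fix. The error message reports the line after the broken continuation, so an error on line 47 would correspond to a stray space on line 46.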

Mon, 02/07/2011 - 13:10
perry
Offline
Joined: 10/11/2010
Groups:

Yes, I was just going to respond that it looks like a typo in the 'membase' script. Is everything working properly for you now?

Mon, 02/07/2011 - 13:54
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Yes thank you. I will now try to do some benchmarking and try various settings to see how to improve performance. But again thanks for being extremely helpful.

Thanks

Tue, 02/22/2011 - 06:34
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

 

Sorry to bring this topic up again, but the problem has reappeared. A node went down after running for 14 days. I have tried to follow your advice, but nothing has shed any light on the problem:

 

I tried to use the browse_logs feature like so:

sudo /opt/membase/1.6.4.1/bin/browse_logs /var/opt/membase/1.6.4.1/logs

... but received a "killed" message and no other output.

I also tried to ps aux | grep membase and received this output:

sshd: membase_2

sshd: membase_2

grep membase

... which I assume is related to my session with ssh and the use of grep. No memcached process.

When trying to get the server back up again with:

sudo /etc/init.d/membase-server start

... I got a:

chown: not valid user: 'membase'

unknown id: membase

Failed to start Membase server failed!

What can I do to locate the problem?

When I raise the issue with my hosting provider, the general answer is "upping your resources usually helps", despite the fact that memory and CPU usage never exceed what I'm entitled to.

I'm wondering if I could set up a cron job to check whether the node is up and, if not, start the server again. What would be required to do that?
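
One possible shape for such a cron job is sketched below. It assumes that "up" means a memcached process owned by the 'membase' user exists, and it uses the init script mentioned earlier in the thread. The function names are illustrative. It would not help when the start itself fails, as with the "chown: not valid user" error above.

```python
import subprocess

START_COMMAND = ["/etc/init.d/membase-server", "start"]  # init script from this thread


def memcached_alive():
    """True if pgrep finds a process named exactly 'memcached' run by user 'membase'."""
    result = subprocess.run(
        ["pgrep", "-u", "membase", "-x", "memcached"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def start_server():
    """Run the init script; return True if it exited with status 0."""
    return subprocess.run(START_COMMAND).returncode == 0


def watchdog(is_alive=memcached_alive, start=start_server):
    """Check once and start the server only if it is down. Returns a status string."""
    if is_alive():
        return "running"
    return "restarted" if start() else "start failed"
```

Saved as a script that ends with print(watchdog()), this could be run from root's crontab every few minutes, for example with "*/5 * * * * /usr/bin/python3 /usr/local/bin/membase_watchdog.py" (the script path is hypothetical). Redirecting the output to a file also gives a rough record of when the node went down.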

Tue, 02/22/2011 - 07:36
perry
Offline
Joined: 10/11/2010
Groups:

 I'm particularly concerned about the messages about the membase user not being present...can you run:

'sudo grep membase /etc/passwd' and see if anything is there?  If not, then something external removed the Membase user.

 

What kind of hosting platform are you using?

 

Perry

Tue, 02/22/2011 - 13:23
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

Yes, something is returned (slightly modified):

membase_user:x:11381857:750303:My Name:/home/membase_user:/bin/bash

I'm using Dreamhost VPS, and they use http://linux-vserver.org/ as the basis of their VPS service.

Thanks again :)

 

Tue, 02/22/2011 - 19:20
perry
Offline
Joined: 10/11/2010
Groups:

 Did you build this from source or use one of our pre-packaged binaries?  

 

The logs are clearly complaining about not being able to use the user 'membase' so I would think that's the first thing that we need to resolve...
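
Note that the grep output posted above is a line for the account membase_user. grep matches substrings, so that line does not show that an account named exactly 'membase' exists. A minimal sketch of an exact-name check against the passwd format follows. The function names are illustrative, and on most Linux systems 'getent passwd membase' or 'id membase' answers the same question.

```python
def passwd_users(passwd_text):
    """Return the set of account names (first colon-separated field of each line)."""
    users = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            users.add(line.split(":", 1)[0])
    return users


def has_user(passwd_text, name="membase"):
    """True only if an account is named exactly `name`."""
    return name in passwd_users(passwd_text)


def system_has_user(name="membase", path="/etc/passwd"):
    """Run the same check against the local passwd file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return has_user(handle.read(), name)
```

With only the membase_user line present, has_user() returns False for 'membase', which would be consistent with the "unknown id: membase" error from the init script.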

 

Perry

Wed, 02/23/2011 - 00:14
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

I would prefer to build it from source but never succeeded in doing so, mainly due to dependencies, so I did a:

sudo dpkg -i membase-server-community_x86_64_1.6.4.1.deb

Now that one node is down and one remains, one would expect all requests to go to the node that is still up, but occasionally I get a:

bool(false)

... should the Moxi server on the client not handle this situation?

Thanks...

Wed, 02/23/2011 - 07:41
perry
Offline
Joined: 10/11/2010
Groups:

 If a node is down, you will need to fail it over in order for the data to be available.  You should also probably upgrade to 1.6.5.1.

 

I'm still unclear why you were getting that error about the 'membase' user.

 

Perry

Fri, 02/25/2011 - 01:54
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

I tried to stop and start the remaining node. Stopping it was fine, but the same thing:

chown: invalid user: `membase'
Unknown id: membase
Failed to start Membase server failed!

... happened when trying to start it. Do you think it will be OK with package version 1.6.5.1?

Update: I installed the new version 1.6.5.1 and it is now possible to stop/start the server. I do not know whether it is the crash of a node that makes it impossible to start the server again. I will wait and see the next time a node fails.

Fri, 02/25/2011 - 09:55
perry
Offline
Joined: 10/11/2010
Groups:

 It really sounds like something is changing the user settings on your server.  Can you check the /etc/passwd file to see if there is a 'membase' entry?  Then check again in a few days to see if something has changed it?

 

Perry

Mon, 02/28/2011 - 19:49
niru
Offline
Joined: 02/23/2011
Groups: None

Hi Perry,

We have 2 Membase clusters (one in the East region and one in the West region) running on AWS EC2. Each cluster has 4 EC2 nodes of type m2.2xlarge (34 GB RAM). Each cluster has 128 GB of RAM capacity, on which we have 2 memcached buckets (one allocated 110 GB and the other 10 GB) and 1 Membase bucket with 2 GB of memory, with no replication but with persistence enabled.

Since we launched these 2 clusters on 02/06, we have been experiencing frequent node crashes in both regions, losing 1/4 of the data set in the affected cluster with each incident. We have been fortunate so far that we did not lose 2 nodes simultaneously. Apparently this was happening because of a kernel bug. For now, our workaround is to reboot the partitioned node to get it back into the cluster and then re-push the data to rebuild our data set in the cluster. Even though we lost only 1/4 of the data set, we re-push everything that was pushed daily over the last 5 days. The strange thing is that we are not able to get back to where we were before the crash, in terms of the total number of items before the crash versus after re-pushing. For example, before the crash there were 40 million items in the 10 GB bucket, but after I re-pushed the last 5 days of data it reached only 37 million. My developers are sure that there is a 95-97% key overlap between consecutive daily pushes, so the last 5 days of pushes should bring us close to what we had before the crash. Could you please explain what could be causing this?

 

Niru

 



Thu, 03/24/2011 - 23:29
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

Yet again a node is down with the exact same symptoms as before. Again I cannot restart the membase server, error being:

chown: not valid user: 'membase'

unknown id: membase

Failed to start Membase server failed!

... and looking for an entry with sudo grep membase /etc/passwd I get:
membase_2:x:11941857:750053:my name:/home/membase_2:/bin/bash

The membase server is the newest version and I know for sure that I could start/stop the server right after installation.

Do you have any suggestions for how to monitor what is going on? It typically happens after 4-5 days. I think it could be Dreamhost making changes to the server settings, but I'm not sure. To have some kind of "evidence", it would be nice to be able to give them logs or similar.
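
One low-tech way to collect such evidence is to record a checksum of /etc/passwd at regular intervals and log whenever it changes. This is only a sketch under that assumption: the function names and the state and log file locations are hypothetical, and it shows when the file changed, not who changed it.

```python
import hashlib
import os
import time


def file_digest(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def record_if_changed(watched, state_file, log_file):
    """Append a timestamped line to log_file when `watched` differs from the last run.

    Returns True if a change (or the first observation) was logged.
    """
    current = file_digest(watched)
    previous = None
    if os.path.exists(state_file):
        with open(state_file, encoding="utf-8") as handle:
            previous = handle.read().strip()
    if current == previous:
        return False
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_file, "a", encoding="utf-8") as handle:
        handle.write(f"{stamp} {watched} sha256={current} previous={previous}\n")
    with open(state_file, "w", encoding="utf-8") as handle:
        handle.write(current)
    return True
```

Run from cron every few minutes with watched set to /etc/passwd, the log narrows down the time window in which the 'membase' account disappeared, which can then be compared with the provider's maintenance activity.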

Thanks again

Thu, 03/24/2011 - 23:44
perry
Offline
Joined: 10/11/2010
Groups:

Unfortunately there's not much that the Membase server software is going to tell you about changes made to the underlying operating system. The issue here is very clear, "something" is changing the user configuration of the operating system. I would expect some entries in /var/log/messages or syslog, but I'm not sure they'd be there.

I wish I could be of more help, but this is definitely something you need to take up with your hoster or other system administrator.

Perry

Mon, 05/02/2011 - 02:11
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

I got in contact with my ISP, and they told me that the cause of the error was that user configuration was managed by them, which could affect the running Membase process, so I turned all of that management off. For a while (around 25 days or so) that appeared to have solved the problem, but then I deleted the default bucket and added 2 new ones, and a couple of days later a node (node 2) went down. I reinstalled Membase on that node and got it working again. Now, after 3 days, it first appeared that the same node had gone down, but this morning I cannot connect to any node (2 in all).

I did a ps aux | grep membase on node 1 and got:
membase 17219 0.0 0.1 10492 364 ? S 17:30 0:00 /opt/membase/1.6.5/erlang-13b03/lib/erlang/erts-5.7.4/bin/epmd -daemon

When I did a sudo /etc/init.d/membase-server start on node 1 I got:
Started Membase server.

If I do sudo grep membase /etc/passwd I get:
membase1:x:12132650:750053:my name:/home/membase1:/bin/bash
membase:x:11:71:Membase system user:/opt/membase:/bin/bash

So I suppose everything is OK here. I'm just wondering why the web console on :8091 sometimes says "difficulties communicating with the cluster - displaying cached information"?

Now for node 2...

Doing ps aux | grep membase gives:

membase 817 4.4 21.4 187892 71532 ? Sl Apr27 316:56 /opt/membase/1.6.5/erlang-13b03/lib/erlang/erts-5.7.4/bin/beam.smp -A 16 -- -root /opt/membase/1.6.5/erlang-13b03/lib/erlang -progname erl -- -home /opt/membase -- -pa /opt/membase/1.6.5/bin/ns_server/ebin /opt/membase/1.6.5/bin/ns_server/deps/menelaus/ebin /opt/membase/1.6.5/bin/ns_server/deps/menelaus/deps/erlwsh/ebin /opt/membase/1.6.5/bin/ns_server/deps/menelaus/deps/mochiweb/ebin /opt/membase/1.6.5/bin/ns_server/deps/gen_smtp/ebin -setcookie nocookie -ns_server error_logger_mf_dir "/var/opt/membase/1.6.5/logs" -ns_server error_logger_mf_maxbytes 10485760 -ns_server error_logger_mf_maxfiles 10 -kernel inet_dist_listen_min 21100 inet_dist_listen_max 21199 -mnesia dir "/var/opt/membase/1.6.5/mnesia" -noshell -noinput -noshell -noinput -run ns_bootstrap -- -ns_server ns_server_config "/etc/opt/membase/1.6.5/config" -ns_server pidfile "/var/run/membase-server.pid" -name ns_1@node2.domain.com
membase 846 0.0 0.1 3776 536 ? Ss Apr27 0:05 /opt/membase/1.6.5/erlang-13b03/lib/erlang/lib/os_mon-2.2.4/priv/bin/memsup
membase 848 0.0 0.1 3776 460 ? Ss Apr27 0:00 /opt/membase/1.6.5/erlang-13b03/lib/erlang/lib/os_mon-2.2.4/priv/bin/cpu_sup

membase 9845 0.0 0.1 10492 488 ? Ss Apr27 0:00 inet_gethost 4
membase 21340 0.0 0.1 10492 496 ? S Apr26 0:03 /opt/membase/1.6.5/erlang-13b03/lib/erlang/erts-5.7.4/bin/epmd -daemon
membase 25116 0.0 0.4 93376 1544 ? Ss 15:19 0:01 sh -s disksup

If I do sudo /etc/init.d/membase-server start I am shown:

Membase server is already started (warning).

sudo grep membase /etc/passwd gives:

membase2:x:12132653:750053:my name:/home/membase2:/bin/bash
membase:x:11:71:Membase system user:/opt/membase:/bin/bash

Tue, 05/03/2011 - 12:20
perry
Offline
Joined: 10/11/2010
Groups:

I would suspect that we're in the same situation with node 2, but I don't know for sure.

-Can you do an 'ls -l' on the Membase data directory?
-What do the logs show from that node?

Perry

Tue, 05/03/2011 - 23:34
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

If you mean ls -l /var/opt/membase/1.6.5/data/ns_1 then for node 1 it is:

-rw-r--r-- 1 membase membase 46080 25 apr 14:35 default
-rw-r--r-- 1 membase membase 23626752 25 apr 14:35 default-0.mb
-rw-r--r-- 1 membase membase 23182336 25 apr 14:35 default-1.mb
-rw-r--r-- 1 membase membase 23126016 25 apr 14:35 default-2.mb
-rw-r--r-- 1 membase membase 23469056 25 apr 14:35 default-3.mb
-rw-r--r-- 1 membase membase 64 1 maj 23:06 isasl.pw
-rw-r--r-- 1 membase membase 46080 2 maj 07:24 main
-rw-r--r-- 1 membase membase 1084416 27 apr 00:57 main-0.mb
-rw-r--r-- 1 membase membase 32768 2 maj 07:24 main-0.mb-shm
-rw-r--r-- 1 membase membase 1276496 27 apr 05:18 main-0.mb-wal
-rw-r--r-- 1 membase membase 1022976 27 apr 00:57 main-1.mb
-rw-r--r-- 1 membase membase 32768 2 maj 07:24 main-1.mb-shm
-rw-r--r-- 1 membase membase 1231432 27 apr 05:15 main-1.mb-wal
-rw-r--r-- 1 membase membase 1082368 27 apr 00:57 main-2.mb
-rw-r--r-- 1 membase membase 32768 2 maj 07:24 main-2.mb-shm
-rw-r--r-- 1 membase membase 1244008 30 apr 20:21 main-2.mb-wal
-rw-r--r-- 1 membase membase 1043456 27 apr 00:57 main-3.mb
-rw-r--r-- 1 membase membase 32768 2 maj 07:24 main-3.mb-shm
-rw-r--r-- 1 membase membase 1216760 27 apr 05:34 main-3.mb-wal
-rw-r--r-- 1 membase membase 250 1 maj 16:58 main-mj0AC34A42
-rw-r--r-- 1 membase membase 250 1 maj 13:39 main-mj10CBFBD7
-rw-r--r-- 1 membase membase 250 1 maj 15:53 main-mj23FDA3BD
-rw-r--r-- 1 membase membase 250 1 maj 16:53 main-mj2A19B776
-rw-r--r-- 1 membase membase 250 2 maj 07:07 main-mj2E529629
-rw-r--r-- 1 membase membase 250 2 maj 02:08 main-mj3AA0E1AE
-rw-r--r-- 1 membase membase 250 2 maj 07:11 main-mj3AD61911
-rw-r--r-- 1 membase membase 250 2 maj 02:15 main-mj3C490225
-rw-r--r-- 1 membase membase 250 2 maj 07:13 main-mj43334A99
-rw-r--r-- 1 membase membase 250 2 maj 06:56 main-mj534031CD
-rw-r--r-- 1 membase membase 250 1 maj 19:46 main-mj686BF9B5
-rw-r--r-- 1 membase membase 250 2 maj 06:02 main-mj6D063576
-rw-r--r-- 1 membase membase 425984 2 maj 07:24 main-shm
-rw-r--r-- 1 membase membase 53003680 2 maj 07:24 main-wal
-rw-r--r-- 1 membase membase 204171 2 maj 07:24 ns_log
-rw-r--r-- 1 membase membase 45056 2 maj 07:24 sessions
-rw-r--r-- 1 membase membase 1024 25 apr 22:38 sessions-0.mb
-rw-r--r-- 1 membase membase 32768 2 maj 07:24 sessions-0.mb-shm
-rw-r--r-- 1 membase membase 315480 25 apr 22:38 sessions-0.mb-wal
-rw-r--r-- 1 membase membase 1024 25 apr 22:38 sessions-1.mb
-rw-r--r-- 1 membase membase 32768 2 maj 07:24 sessions-1.mb-shm
-rw-r--r-- 1 membase membase 315480 25 apr 22:38 sessions-1.mb-wal
-rw-r--r-- 1 membase membase 1024 25 apr 22:38 sessions-2.mb
-rw-r--r-- 1 membase membase 32768 2 maj 07:24 sessions-2.mb-shm
-rw-r--r-- 1 membase membase 315480 25 apr 22:38 sessions-2.mb-wal
-rw-r--r-- 1 membase membase 1024 25 apr 22:38 sessions-3.mb
-rw-r--r-- 1 membase membase 32768 2 maj 07:24 sessions-3.mb-shm
-rw-r--r-- 1 membase membase 415040 28 apr 01:57 sessions-3.mb-wal
-rw-r--r-- 1 membase membase 131072 2 maj 07:24 sessions-shm
-rw-r--r-- 1 membase membase 14598672 2 maj 07:24 sessions-wal

For node 2 it is:

-rw-r--r-- 1 membase membase 64 27 apr 00:36 isasl.pw
-rw-r--r-- 1 membase membase 46080 1 maj 13:01 main
-rw-r--r-- 1 membase membase 807936 27 apr 00:57 main-0.mb
-rw-r--r-- 1 membase membase 32768 27 apr 05:18 main-0.mb-shm
-rw-r--r-- 1 membase membase 1116152 27 apr 05:18 main-0.mb-wal
-rw-r--r-- 1 membase membase 826368 27 apr 00:57 main-1.mb
-rw-r--r-- 1 membase membase 32768 27 apr 05:15 main-1.mb-shm
-rw-r--r-- 1 membase membase 1154928 27 apr 05:15 main-1.mb-wal
-rw-r--r-- 1 membase membase 821248 27 apr 00:57 main-2.mb
-rw-r--r-- 1 membase membase 32768 30 apr 20:22 main-2.mb-shm
-rw-r--r-- 1 membase membase 1157024 30 apr 20:21 main-2.mb-wal
-rw-r--r-- 1 membase membase 839680 27 apr 00:57 main-3.mb
-rw-r--r-- 1 membase membase 32768 27 apr 05:35 main-3.mb-shm
-rw-r--r-- 1 membase membase 1057464 27 apr 05:34 main-3.mb-wal
-rw-r--r-- 1 membase membase 32768 1 maj 15:18 main-shm
-rw-r--r-- 1 membase membase 1075280 1 maj 15:18 main-wal
-rw-r--r-- 1 membase membase 31574 1 maj 23:09 ns_log
-rw-r--r-- 1 membase membase 26624 1 maj 14:57 sessions
-rw-r--r-- 1 membase membase 1024 27 apr 00:36 sessions-0.mb
-rw-r--r-- 1 membase membase 32768 28 apr 07:36 sessions-0.mb-shm
-rw-r--r-- 1 membase membase 530320 28 apr 07:36 sessions-0.mb-wal
-rw-r--r-- 1 membase membase 1024 27 apr 00:36 sessions-1.mb
-rw-r--r-- 1 membase membase 32768 27 apr 00:36 sessions-1.mb-shm
-rw-r--r-- 1 membase membase 315480 27 apr 00:36 sessions-1.mb-wal
-rw-r--r-- 1 membase membase 1024 27 apr 00:36 sessions-2.mb
-rw-r--r-- 1 membase membase 32768 27 apr 00:36 sessions-2.mb-shm
-rw-r--r-- 1 membase membase 315480 27 apr 00:36 sessions-2.mb-wal
-rw-r--r-- 1 membase membase 1024 27 apr 00:36 sessions-3.mb
-rw-r--r-- 1 membase membase 32768 27 apr 00:36 sessions-3.mb-shm
-rw-r--r-- 1 membase membase 315480 27 apr 00:36 sessions-3.mb-wal
-rw-r--r-- 1 membase membase 32768 1 maj 15:18 sessions-shm
-rw-r--r-- 1 membase membase 1064800 1 maj 15:18 sessions-wal

As for the logs, I cannot connect to the web console on :8091 on either node. If I run /opt/membase/bin/browse_logs > /tmp/node1.txt on node 1 or node 2, the process runs for a while and then outputs "killed", and /tmp/node1.txt is empty. Do you have any other ideas?

Thanks

Thu, 05/05/2011 - 21:09
perry
Offline
Joined: 10/11/2010
Groups:

Sounds like something's definitely not happy in there.

-Do you have enough disk space on the nodes?
-Can you get the logs from node1?

Perry

Fri, 05/06/2011 - 03:57
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

Yes, I would say I have enough space

Node 2 gives:

/dev/hdv1 2,7T 1,5T 1,2T 56% /
none 128M 700K 128M 1% /tmp

Node 1 gives:

/dev/hdv1 3,6T 195G 3,4T 6% /
none 128M 0 128M 0% /tmp

As to logs...

Node 1 gives with "time /opt/membase/bin/browse_logs > node1.txt" :

Killed

real 0m2.635s
user 0m0.016s
sys 0m3.440s

... and node 2 with same command (not killed):

real 1m28.754s
user 1m15.981s
sys 0m5.048s

It does produce some output in node2.txt, which is 98 MB. What part of the file is of interest to you?

Sat, 05/07/2011 - 19:04
perry
Offline
Joined: 10/11/2010
Groups:

May I step back a second and understand more clearly what the current issue is?

-You had a 2 node cluster running fine (after turning off the management "feature" of your ISP)
-At some point, a node failed (node 2)
-You reinstalled that node, added it back to the cluster
-Everything was fine for 3 days
-Now another node failed (same one?) and you cannot connect to either.
-Node 1 does not have any logs available, even though that one didn't show any signs of failure?
-Node 2 does have logs available...do they show any errors? You should be looking specifically for messages coming from memcached.
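
With a 98 MB text dump, a small filter keeps the search manageable. The sketch below streams the browse_logs output line by line and yields only the lines that mention memcached. The function name is illustrative, and it assumes the dump is plain text, as produced by redirecting browse_logs to a file.

```python
def lines_mentioning(path, needle="memcached"):
    """Yield (line_number, text) for each line of `path` containing `needle`, ignoring case."""
    wanted = needle.lower()
    # errors="replace" keeps the scan going if the dump contains stray binary bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if wanted in line.lower():
                yield number, line.rstrip("\n")
```

For example, looping over lines_mentioning("node2.txt") and printing each pair lists the candidate lines. Passing needle="error" or needle="went down" narrows or widens the search in the same way.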

Perry

Sun, 05/08/2011 - 13:01
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi Perry

Yes, you have understood what has happened and it is really confusing.

Now I cannot even connect with ssh, and I must admit I'm truly tired of all these strange things happening. I'm not going to waste your time or mine anymore, and I'm not going to continue using this ISP. I'm looking for another ISP that is suitable for hosting Membase. Do you have any suggestions? Good network transfer and disk storage are of value to me.

Tue, 05/10/2011 - 16:54
perry
Offline
Joined: 10/11/2010
Groups:

Any reason you can't utilize Amazon's cloud?

Wed, 05/11/2011 - 05:51
membaseuser_dk
Offline
Joined: 01/13/2011
Groups: None

Hi

Yes, I have been looking into using Amazon EC2 instances, but I find it difficult to see through their pricing. They have so many categories that I cannot see what the actual cost will be, but of course you only pay for what you use.
