
Membase EC2 timeout error on 3 node cluster

Sun, 02/19/2012 - 08:26
chrisribe
Joined: 09/26/2011

Hello,

We are having intermittent issues with our 3-node cluster.
At random, the connection to one of the nodes is lost and we get this error:

======
Node ('ns_1@10.48.137.138') was automatically failovered.
auto_failover001 ns_1@10.226.226.6 08:39:06 - Sun Feb 19, 2012
Failed over 'ns_1@10.48.137.138': ok ns_orchestrator006 ns_1@10.226.226.6 08:39:06 - Sun Feb 19, 2012
Shutting down bucket "default" on 'ns_1@10.48.137.138' for server shutdown ns_memcached002 ns_1@10.48.137.138 08:35:52 - Sun Feb 19, 2012
Port server memcached on node 'ns_1@10.48.137.138' exited with status 134. Restarting. Messages: exception caught in task Fetching item from disk: 7c7739e276e6e10e66a29617f93c8b92: Unhandled case in sqlite-pst: 17 (database schema has changed)
Object unexpectedly changed size by 4 bytes
memcached: stored-value.cc:367: static void StoredValue::increaseCacheSize(HashTable&, size_t, bool): Assertion `ht.cacheSize.get() < ((size_t)1<<(sizeof(size_t)*8-1))' failed. ns_port_server000 ns_1@10.48.137.138 08:35:31 - Sun Feb 19, 2012
Control connection to memcached on 'ns_1@10.48.137.138' disconnected: {{badmatch,{error,timeout}},
 [{mc_client_binary,stats_recv,4},
  {mc_client_binary,stats,4},
  {ns_memcached,handle_call,3},
  {gen_server,handle_msg,5},
  {proc_lib,init_p_do_apply,3}]}

====
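Two details in the log above can be checked mechanically. Erlang reports the exit status of a port program that died from a signal as 128 plus the signal number, so status 134 corresponds to signal 6, SIGABRT, which is what a failed `assert()` raises. The assertion itself, `ht.cacheSize.get() < ((size_t)1<<(sizeof(size_t)*8-1))`, requires the top bit of the unsigned cache-size counter to be clear; one plausible way to trip it is to subtract more from the counter than it currently holds, which wraps around to a huge value. The sketch below illustrates both points. The helper names are hypothetical (this is not Membase code), and the 32-bit `size_t` is an assumption suggested by the i386-style addresses in the moxi segfault lines further down. Whether the "changed size by 4 bytes" mismatch actually caused an underflow here is a hypothesis the log does not prove.

```python
import signal


def decode_port_exit_status(status: int) -> str:
    """Interpret a port exit status using the shell convention of 128 + signal number."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            pass  # not a known signal number; report the raw code instead
    return f"exit code {status}"


def top_bit_clear(value: int, bits: int) -> bool:
    """Python mirror of the assertion: value < (1 << (bits - 1))."""
    return value < (1 << (bits - 1))


def decrement_unsigned(current: int, delta: int, bits: int) -> int:
    """Subtract delta from an unsigned counter of the given width, wrapping like C size_t."""
    return (current - delta) % (1 << bits)


# Assumed 32-bit size_t; the starting value 0 is purely illustrative.
wrapped = decrement_unsigned(0, 4, bits=32)
print(decode_port_exit_status(134))      # SIGABRT
print(wrapped)                           # 4294967292
print(top_bit_clear(wrapped, bits=32))   # False: the assertion would fail
```

An accounting error of only a few bytes is enough: once the counter goes "below zero" it reads as a value with the top bit set, and the assertion aborts memcached.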

This also happened the day before, with the same error message. I had left three SSH consoles open to the servers and noticed this morning that two of them were disconnected. Is EC2 having random network issues, or is something else going on? We have been having this issue for a while now and it is extremely annoying!

I have also checked the server logs and noticed some moxi segfaults, but I'm not sure they're related.

====
[2527244.634510] moxi[9330]: segfault at 40 ip 00000040 sp b5725530 error 14 in moxi[8048000+a7000]
[2778981.638250] moxi[18570]: segfault at 0 ip (null) sp b586e530 error 14 in moxi[8048000+a7000]
[2779059.356534] moxi[23437]: segfault at 0 ip (null) sp b5755530 error 14 in moxi[8048000+a7000]
[2779080.212025] moxi[23447]: segfault at fffff0 ip 0805e3b4 sp b5721530 error 4 in moxi[8048000+a7000]
[2779107.689050] moxi[23455]: segfault at 400 ip 00000400 sp b582d530 error 14 in moxi[8048000+a7000]
root@ip-10-227-55-162:/opt/membase/bin# cat /etc/issue
Debian GNU/Linux 6.0 \n \l

===
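The kernel segfault lines have a fixed shape, and the `error` field is an x86 page-fault error code printed in hex: bit 0 set means a protection violation (clear means the page was not present), bit 1 a write, bit 2 a user-mode access, bit 3 a reserved-bit violation, and bit 4 an instruction fetch. Read that way, `error 14` (0x14) is a user-mode instruction fetch from an unmapped page, and in several lines `ip` equals the fault address. That suggests, without proving, that moxi jumped to a bad address (for example through a corrupted pointer) rather than merely reading bad memory. The parser below is a minimal sketch; the regular expression and the `decode_segfault` name are assumptions made for illustration.

```python
import re

# Assumed format, based on lines such as:
# [2527244.634510] moxi[9330]: segfault at 40 ip 00000040 sp b5725530 error 14 in moxi[...]
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>\S+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

# x86 page-fault error-code bits above bit 0 (bit 0 is handled separately).
PF_BITS = {
    0x2: "write",
    0x4: "user mode",
    0x8: "reserved bit set",
    0x10: "instruction fetch",
}


def decode_segfault(line: str) -> dict:
    """Parse one kernel segfault line and decode its page-fault error code."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError(f"not a segfault line: {line!r}")
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    flags = ["protection violation" if err & 0x1 else "page not present"]
    flags += [name for bit, name in PF_BITS.items() if err & bit]
    fault_addr = int(m["addr"], 16)
    ip = 0 if m["ip"] == "(null)" else int(m["ip"], 16)  # "(null)" means ip == 0
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": fault_addr,
        "ip": ip,
        "flags": flags,
        "ip_is_fault_addr": ip == fault_addr,  # executing at the faulting address
    }


for line in [
    "[2527244.634510] moxi[9330]: segfault at 40 ip 00000040 sp b5725530 error 14 in moxi[8048000+a7000]",
    "[2779080.212025] moxi[23447]: segfault at fffff0 ip 0805e3b4 sp b5721530 error 4 in moxi[8048000+a7000]",
]:
    print(decode_segfault(line))
```

The second line differs from the others: `error 4` is a plain user-mode read of an unmapped address while executing inside moxi's own code, which looks more like dereferencing a bad data pointer.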

Any help would be appreciated...
Chris

Mon, 06/04/2012 - 18:22
ingenthr
Joined: 03/16/2010

It sounds like network connectivity issues.

Questions:
1) Can you verify that they're all in the same region, and preferably in the same availability zone?
2) Can you confirm that they are not micro instances? Micro instances can be starved of resources for long enough to trigger failover.

Otherwise, we'd have to gather more information. This isn't expected behavior; many people run on EC2 without this problem.
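To see why a starved micro instance can trigger failover, consider a generic heartbeat-based failure detector: a node that stays silent for longer than the timeout is flagged, whether it crashed, lost its network link, or simply could not get CPU time. The sketch below is hypothetical; `HeartbeatMonitor`, the 30-second timeout, and the one-heartbeat-per-second schedule are assumptions for illustration, not Membase's actual auto-failover implementation.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical heartbeat-based failure detector (not Membase's actual code)."""
    timeout_s: float  # assumed value for illustration
    last_seen: dict = field(default_factory=dict)  # node -> time of last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def failover_candidates(self, now: float) -> list:
        """Nodes silent for longer than the timeout, whatever the cause."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


def run(stall_s: float, timeout_s: float = 30.0) -> list:
    """Three nodes heartbeat once per second; node 'c' is starved for stall_s seconds from t=10."""
    mon = HeartbeatMonitor(timeout_s)
    flagged = []
    for t in range(120):
        for node in ("a", "b", "c"):
            starved = node == "c" and 10 <= t < 10 + stall_s
            if not starved:
                mon.heartbeat(node, t)
        for n in mon.failover_candidates(t):
            if n not in flagged:
                flagged.append(n)
    return flagged


print(run(20))  # []
print(run(45))  # ['c']
```

Under these assumptions the detector cannot tell a briefly starved node from a dead one: node `c` is healthy again after its stall, yet a 45-second stall still makes it a failover candidate.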
