Membase odd behavior

13 replies
Thu, 12/30/2010 - 03:08
Tom
Joined: 06/07/2010
Groups: None

The web GUI of our Membase server in the production environment is extremely slow at best (and non-responsive at times).
We currently have a single server in the cluster, installed on Fedora.
Is this a known issue?
Is there a known solution?
 

----

Update: this problem now seems to have passed, and the web GUI is functioning well. While the problem was occurring, I ran a test suite in the production environment (it runs as a console application). The tests failed, but I didn't save the error from the Enyim client (I'm running a C#/ASP.Net site). I assumed the bug was in my tests, because the site was working fine. Later, when I ran the tests again, they passed (I didn't change anything in the tests or on the Membase server). I know that my tests open a new connection to Membase. Could I have reached a connection limit on the Membase server?

We have 6 web servers working against the Membase server, at ~6,300 ops/sec.

I ran "cat /proc/net/sockstat" and got the following result:

sockets: used 14470
TCP: inuse 14396 orphan 0 tw 0 alloc 14404 mem 354
UDP: inuse 2
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
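To track these figures over time rather than as a one-off snapshot, the output of /proc/net/sockstat can be parsed and logged periodically. This is a minimal sketch, assuming the `PROTO: name value name value ...` layout shown above; the function names are illustrative only.

```python
from typing import Dict


def parse_sockstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/sockstat text, e.g. {"TCP": {"inuse": 14396, ...}}."""
    result: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        proto, _, rest = line.partition(":")
        tokens = rest.split()
        # Tokens alternate between a field name and its integer value.
        result[proto.strip()] = {
            name: int(value) for name, value in zip(tokens[::2], tokens[1::2])
        }
    return result


def read_sockstat(path: str = "/proc/net/sockstat") -> Dict[str, Dict[str, int]]:
    """Read and parse the live sockstat file (Linux only)."""
    with open(path) as f:
        return parse_sockstat(f.read())
```

Calling `read_sockstat()["TCP"]["inuse"]` from a scheduled job, say once a minute, would show whether the ~14,000 TCP sockets are a steady level or a spike.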

 

Do these numbers make sense or are they extreme?


Thanks,
Tom.

Thu, 12/30/2010 - 10:13
steve
Joined: 03/15/2010
Groups: None

Hi Tom,

We've seen that kind of behavior when the Membase server gets extremely busy. The number of sockets used does seem high for only 6 clients of your Membase server, but it also depends on what else is running on that server. Was Membase the only thing running on that box?

Some other thoughts come to mind...

- Is there anything interesting in your logs (which you can see in the Membase web console GUI)? For example, many repeated exit & restart reports, or other errors?

- What's the ulimit on that membase box? Please see the Release Notes regarding setting a higher ulimit (search for MB-2099)... http://wiki.membase.org/display/membase/Membase+Server+1.6.4

- Is this a VM or physical H/W server? That is, was there contention for resources at that time?

Cheers,

Steve

Sun, 01/02/2011 - 06:35
Tom
Joined: 06/07/2010
Groups: None

Hi Steve,

Membase is the only thing running on this server. It's an Amazon m1.large instance; I think that at this size it's a physical server (and not a VM).

One thing I didn't mention is that we're running both ASP.Net and classic ASP on our IIS servers. Each technology holds its own connections to Membase, which may account for the large number of connections.

In the log I see a lot of messages in this form:
Request with path /pools/default/bucketsStreaming/[bucket name] took too long to execute: 160196 ms.
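Messages like this can be pulled out of a saved copy of the log to see which paths are slow and how often. A minimal sketch, assuming the log has been exported to plain text with one message per line; the file name in the usage note is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "took too long to execute" message quoted above; other lines
# are ignored. The path may contain spaces, so it is matched lazily.
SLOW_REQUEST = re.compile(
    r"Request with path (?P<path>.+?) took too long to execute: (?P<ms>\d+) ms"
)


def slow_requests(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (path, duration_ms) pairs for slow-request messages, slowest first."""
    found = []
    for line in lines:
        match = SLOW_REQUEST.search(line)
        if match:
            found.append((match.group("path"), int(match.group("ms"))))
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

For example, `slow_requests(open("membase-log.txt"))[:10]` would list the ten slowest entries; 160196 ms is roughly 160 seconds.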

About the ulimit, I'm not sure (I'm not very proficient in Linux). We haven't configured anything specific for the membase user, and when I run "ulimit -a" I get the following output:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 61504
max locked memory       (kbytes, -l) 1048576
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Is it ok?
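Note that `ulimit -a` reports the limits of the shell it is run from, which are not necessarily the limits of the already-running Membase processes. On Linux, the limits that apply to a running process can be read from `/proc/<pid>/limits`. The sketch below (function names are illustrative) reads the "Max open files" row and compares it with a socket count such as the 14,396 TCP sockets reported above; which PID to inspect is left open and is an assumption of this example.

```python
from typing import Optional, Tuple


def _limit_value(token: str) -> Optional[int]:
    # /proc/<pid>/limits prints "unlimited" instead of a number.
    return None if token == "unlimited" else int(token)


def parse_open_files_limit(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) "Max open files" limit; None means unlimited."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return _limit_value(soft), _limit_value(hard)
    raise ValueError("no 'Max open files' line found")


def open_files_limit(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the limit that actually applies to a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_open_files_limit(f.read())


def fraction_used(sockets_in_use: int, soft_limit: int) -> float:
    """Rough share of the descriptor limit consumed by sockets.

    /proc/net/sockstat counts sockets system-wide while the limit is per
    process, so this is only an approximation.
    """
    return sockets_in_use / soft_limit
```

With the numbers from this thread, `fraction_used(14396, 65536)` is about 0.22, roughly a fifth of a 65,536 open-files limit, assuming that limit also applies to the server process.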

Would it help if I send you the log?

Thank you,
Tom.

Mon, 01/03/2011 - 00:09
steve
Joined: 03/15/2010
Groups: None

Hi Tom,

The most useful thing to send would be the (compressed) output of the collect_info program, which tries to gather lots of info that we need to diagnose issues, to try to reduce email/forum ping pong.  See the /opt/membase/VER/bin/collect_info script.

Instead of attaching publicly here (as it does grab lots of system info), you could private message me or send to steve DOT yen AT membase DOT com

Cheers,

Steve

 

Mon, 01/03/2011 - 07:20
Tom
Joined: 06/07/2010
Groups: None

Hi Steve,


I found the file named collect_info in these paths:
/opt/membase/1.6.0.1/bin/ep_engine/management/collect_info
/opt/membase/1.6.0.1/bin/ns_server/collect_info
Which one should I execute?

I chose to add another Membase server, so I now have two servers in my cluster. I've noticed a significant improvement: CPU consumption went down from 60% to 15%, and, not surprisingly, network activity dropped to half of its previous level. The number of sockets in use went down to 3,645 (from a whopping 14,000+).


How can I tell when it's the right time to add another machine to the cluster? More specifically: how can I tell whether the machine has reached some kind of limit (maximum number of connections, or NIC throughput) or whether this is a bug in Membase?
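One rough way to check the NIC side of this question on Linux is to sample the byte counters in /proc/net/dev twice and convert the difference to Mbit/s, then compare that with the bandwidth you expect from the instance. A minimal sketch; the interface name `eth0` and the 5-second interval are assumptions.

```python
import time
from typing import Dict, Tuple


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text.

    Header lines contain no ':' and are skipped; after the ':' the first
    field is received bytes and the ninth is transmitted bytes.
    """
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, _, data = line.partition(":")
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def mbit_per_sec(bytes_before: int, bytes_after: int, interval_s: float) -> float:
    """Convert a byte-counter delta over interval_s seconds into Mbit/s."""
    return (bytes_after - bytes_before) * 8 / interval_s / 1e6


def sample_interface(interface: str = "eth0", interval_s: float = 5.0) -> Tuple[float, float]:
    """Read /proc/net/dev twice and return (rx, tx) throughput in Mbit/s."""
    def read() -> Tuple[int, int]:
        with open("/proc/net/dev") as f:
            return parse_net_dev(f.read())[interface]

    rx0, tx0 = read()
    time.sleep(interval_s)
    rx1, tx1 = read()
    return mbit_per_sec(rx0, rx1, interval_s), mbit_per_sec(tx0, tx1, interval_s)
```

Sampling at peak and off-peak times, before and after adding a node, gives a rough sense of how close the network is to its ceiling.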


Thank you,
Tom.

Mon, 01/03/2011 - 10:21
steve
Joined: 03/15/2010
Groups: None

Hi Tom,

The best collect_info to use is the ns_server version.

High CPU and the high number of sockets are probably inter-related. From your messages, I presume you're running membase 1.6.0.1? I'd recommend an upgrade to the recently released 1.6.4 / 1.6.4.x version, which has some important bug fixes and improvements. Please see: wiki.membase.org/display/membase/Releases for more information and the details on the upgrade steps.

Related to when to grow your cluster, have you also seen the sizing guidelines? wiki.membase.org/display/membase/Sizing+Guidelines

Cheers,
Steve
 
 

Tue, 01/04/2011 - 00:50
Tom
Joined: 06/07/2010
Groups: None

Hi Steve,

I am running version 1.6.0.1. Since I added another server things seem fine, so I'm planning to wait for the next release, which is less than two weeks away. I didn't see anything in the release notes referring to the behavior I was experiencing. Also, regarding the sizing guidelines: they all refer to memory sizes/issues, and all my buckets use less than 15% of the allocated RAM (it was probably double that before I added a server, but that still means my entire working set fits in memory).

I sent you the logs.

Thank you,
Tom.

Tue, 01/04/2011 - 12:22
steve
Joined: 03/15/2010
Groups: None

Hi Tom,

Thanks again for the collect_info/logs. Unfortunately, there wasn't an obvious smoking gun pointing to a root cause. In particular, the many "Mnesia is overloaded" messages in the log do indicate an overloaded system, especially the disk resource, but they don't answer "why".

By the way, how have you configured your clients? E.g., the number of connections that "fan in" to the Membase node.

Best,

Steve

 

Wed, 01/05/2011 - 00:54
Tom
Joined: 06/07/2010
Groups: None

Hi Steve,

Regarding the disk - I am using a membase bucket, but it's in very low usage - 8 ops/sec. It also has only 12% of RAM used (which means that before I added a server it was probably still less than 30%). I also have two memcached buckets (one with ~6000 ops/sec and another with ~300 ops/sec).

About the connections: I'm using the Enyim client's defaults. I have 3 buckets, two server-side technologies (ASP.Net and classic ASP), and 6 IIS servers. This means I have 36 (2*3*6) instances of the Enyim client. If I'm reading the docs correctly, the default maxPoolSize is 20, which means I should have at most 720 connections. I'm probably missing something, since the server has a lot more connections than 720.
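The upper bound computed above can be written out explicitly. This sketch assumes one client instance per (bucket, runtime, web server) combination and at most `maxPoolSize` sockets per instance; `processes_per_runtime` is a hypothetical extra factor for the case where each server runs several worker processes, each holding its own client instances.

```python
def max_client_connections(buckets: int, runtimes: int, web_servers: int,
                           max_pool_size: int, processes_per_runtime: int = 1) -> int:
    """Upper bound on client sockets against one Membase node.

    Assumes one client instance per (bucket, runtime, web server, process)
    and at most max_pool_size pooled sockets per instance.
    """
    instances = buckets * runtimes * web_servers * processes_per_runtime
    return instances * max_pool_size
```

`max_client_connections(3, 2, 6, 20)` gives 720, matching the figure above, while `processes_per_runtime=4` would already give 2,880. Anything the formula does not capture, such as a client version with a different default pool size, pushes the real number higher.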
 

Thank you,
Tom.

Wed, 01/05/2011 - 09:56
steve
Joined: 03/15/2010
Groups: None

A quick look at your logs shows memcached last reported 900+ connections (curr_connections statistic), but it had spiked to over 9000 for a time period.

More connections are also used internally by Membase for its own stats gathering / heartbeat purposes, for replication (only in a multi-node cluster), and by server-side moxi (which should be quiescent as you're using the Enyim client).
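`curr_connections` is a standard memcached statistic, returned by the text-protocol `stats` command as `STAT <name> <value>` lines followed by `END`. The sketch below polls it directly; the default port 11211 is only a placeholder (use whichever port serves the bucket you want to inspect), and the function names are illustrative.

```python
import socket
from typing import Dict


def parse_stats(reply: str) -> Dict[str, str]:
    """Parse 'STAT <name> <value>' lines of a text-protocol stats reply."""
    stats: Dict[str, str] = {}
    for line in reply.splitlines():
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host: str, port: int = 11211, timeout_s: float = 5.0) -> Dict[str, str]:
    """Send 'stats' and read until the terminating END line.

    The default port is a placeholder, not a documented Membase setting.
    """
    reply = b""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(b"stats\r\n")
        while not reply.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            reply += chunk
    return parse_stats(reply.decode("ascii", errors="replace"))
```

`fetch_stats("membase-host")["curr_connections"]` returns the count as a string (`membase-host` is hypothetical). Each call opens one extra connection of its own, which is included in the count.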

Would this be correlated with any spike of activity on the web app side?  I'm less familiar with ASP / ASP.net, but drawing an analogy to PHP-style technologies -- could there have been a spike of web app "worker processes"?

Cheers,

Steve

Wed, 01/05/2011 - 13:03
perry
Joined: 10/11/2010
Groups:

Can you run netstat on the server and check if the connections are coming from the app servers or from something else?
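On a Linux server, the netstat output can be summarised per remote host to answer this question. A minimal sketch, assuming the usual `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the port number in the usage note is an assumption.

```python
import subprocess
from collections import Counter
from typing import Iterable, List


def connections_by_peer(netstat_lines: Iterable[str], local_port: int) -> Counter:
    """Count ESTABLISHED TCP connections to local_port, grouped by remote address."""
    counts: Counter = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip banners, the column header, and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # Addresses look like "10.0.0.5:11211"; the port follows the last colon.
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


def netstat_lines() -> List[str]:
    """Run `netstat -tn` (numeric TCP listing) and return its output lines."""
    result = subprocess.run(["netstat", "-tn"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

`connections_by_peer(netstat_lines(), 11211).most_common(10)` lists the ten remote addresses with the most established connections to port 11211.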

The maxPoolSize setting only applies to 2.11+ (I'll update the docs to make this clear); previous versions had a higher value, which should not be hit under normal circumstances. What's the latency between the app servers and the Membase servers?

Also, can you please either:

a) set maxPoolSize in your config to 20 (<socketPool maxPoolSize="20" />), or
b) download the latest beta and give it a go?

Thanks! 

 

Perry


Thu, 01/06/2011 - 01:39
Tom
Joined: 06/07/2010
Groups: None

Steve,
About the spike - we do have spikes, but I've noticed a high number of connections under normal load conditions.

Perry,
I ran netstat - all the connections are coming from the web servers.

About the latency: I wasn't sure how to go about measuring it. I wrote a short test that saves a key and then measures the time it takes to perform 500 get operations on that key. On my dev machine the test takes about 0.06 sec. On a prod machine I got varying results, all around 1 sec. Is this extremely slow?
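A variant of this test that times each call separately makes it easier to tell uniformly slow operations from a few long stalls. In this sketch, `client.get` and the key name in the usage note are hypothetical stand-ins for whatever client call is being measured.

```python
import statistics
import time
from typing import Callable, Dict


def measure(op: Callable[[], object], iterations: int = 500) -> Dict[str, float]:
    """Call `op` sequentially `iterations` times and summarise per-call latency in ms."""
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        op()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(samples_ms)
    total_s = sum(samples_ms) / 1000.0
    return {
        # Throughput over the time spent inside `op` only (loop overhead excluded).
        "ops_per_sec": iterations / total_s if total_s > 0 else float("inf"),
        "mean_ms": statistics.mean(samples_ms),
        # A simple 99th-percentile estimate; equals the max for small samples.
        "p99_ms": ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))],
        "max_ms": ordered[-1],
    }
```

Usage would look like `measure(lambda: client.get("latency-test"), iterations=500)`. A mean close to the maximum suggests every call is slow, while a low mean with a high maximum points to occasional stalls.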

Regarding the maxPoolSize - it sounds like a possible solution for the connections problem, but won't it slow down the overall access speeds?

Thank you,
Tom.

Thu, 01/06/2011 - 06:58
Attila Kisko
Joined: 04/22/2010
Groups: None

1 second is very high. I'm doing 1K ops/sec on my MBP in VMware (which is not an ideal environment).

What kind of objects are you storing? Simple values, like ints, strings, etc., or complex binary-serialized objects? What's their average size? Do you have logging enabled? Is this a Membase or a memcached bucket?

Limiting the maximum number of connections only causes issues when (incoming requests/sec) > (cache operations/sec), but cache operations usually take 1-20 ms, not 1 sec.
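The relationship described here can be made concrete with Little's law: the average number of connections in use equals the request rate multiplied by the time each operation holds a connection. A minimal sketch; the even split of load across client instances in the usage note is an assumption for illustration only.

```python
def pool_capacity(pool_size: int, latency_s: float) -> float:
    """Ops/sec a pool can sustain when each op holds a connection for latency_s."""
    return pool_size / latency_s


def busy_connections(request_rate: float, latency_s: float) -> float:
    """Little's law: average connections in use = arrival rate x time per op."""
    return request_rate * latency_s


def pool_saturated(request_rate: float, pool_size: int, latency_s: float) -> bool:
    """True when requests arrive faster than the pool can complete them."""
    return request_rate > pool_capacity(pool_size, latency_s)
```

If the ~6,300 ops/sec were spread evenly over 36 client instances (175 ops/sec each), a 20-connection pool would cope at 20 ms per operation (`pool_capacity(20, 0.02)` is about 1,000 ops/sec) but would saturate at 1 s per operation (`pool_capacity(20, 1.0)` is 20 ops/sec).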

So first we should figure out why your cache is so slow.

Sun, 01/09/2011 - 06:03
Tom
Joined: 06/07/2010
Groups: None

Hi Attila,

I've reached 500 ops/sec, which works out to ~2 ms per get operation (1000 ms / 500), within the 1-20 ms range you've noted. I am storing mostly serialized objects. I'm not sure about the average size, but I ran a small test and their sizes range from 2 KB to 9 KB.

What sort of logging are you referring to?

About the buckets, they're mostly memcached.

I'll try to limit the number of connections gradually on my web servers and I'll update accordingly.

Thank you,
Tom.

© 2013 COUCHBASE All rights reserved.
