High cpu load even when Java database client is idle

25 replies [Last post]
Wed, 02/01/2012 - 02:03
stefan
Offline
Joined: 02/01/2012
Groups: None

I am using the Java "Release 1.1 Developer Preview" library to connect to Couchbase 2.0 preview 3. Everything works, including view access. However, as soon as I connect to the database, one core on my quad-core Mac running Lion goes up to 100% CPU load.

This is enough to demonstrate it:

import net.spy.memcached.CouchbaseClient;
...
List<URI> baseURIs = new ArrayList<URI>();
baseURIs.add(new URI("http://localhost:8091/pools"));
CouchbaseClient couchbaseClient = new CouchbaseClient(baseURIs, "default", "");
// Wait for user input; the client is connected but idle during this time.
Scanner scanner = new Scanner(System.in);
String input = scanner.next();
couchbaseClient.shutdown(10, TimeUnit.SECONDS);

"default" is the only bucket on the server, and I am using just one node. After I enter some input, the CPU load drops back to 0%.

I have added those jars to my build path:

commons-codec-1.6.jar
httpclient-4.1.2.jar
httpcore-4.1.4.jar
httpcore-nio-4.1.4.jar
jettison-1.3.1.jar
netty-3.2.7.Final.jar
spymemcached-2.8-preview3.jar

My problem looks similar to this one here:
http://www.couchbase.com/forums/thread/using-couchbaseclient-connect

Any solution for this?

Stefan

PS: A small addition, as this might be related. The server has also suddenly started showing 100-150% CPU load (beam.smp) right from startup, even without any client access at all. Is there already a schedule for a new developer preview release?

Wed, 02/01/2012 - 12:00
mikew
Offline
Joined: 03/14/2011
Groups:

First off, I assume you're using the Spymemcached 2.8 developer preview (probably preview 3). We are aware of this issue and are working on a fix for it. It will be available in the next Java developer preview release. Also, we have split Spymemcached into Spymemcached and Couchbase Client. Couchbase Client will contain all of the Couchbase code, and Spymemcached will be specifically for memcached servers. Couchbase Client 1.1-preview should contain the fix for this bug, so you should upgrade to that version when we release it.

You may track this issue here:
http://www.couchbase.com/issues/browse/SPY-64

Wed, 02/01/2012 - 12:36
stefan
Offline
Joined: 02/01/2012
Groups: None

Great, thank you. Do you already have a release schedule for when the update will be available?

Is there also a fix for the server itself using 100% CPU on one core? This seems unrelated, as it happens even before any client connects to the server.

Stefan

Mon, 04/16/2012 - 15:14
marchywka
Offline
Joined: 04/06/2012
Groups: None

From what I can tell from the bug report, this is unresolved, and I just hit it with one client LOL.
Apparently the offending stack trace is this, although I am just reacting to a quick look at the dump:

Thread 10100: (state = IN_JAVA)
- com.couchbase.client.ViewConnection.handleIO() @bci=33, line=142 (Compiled frame; information may be imprecise)
- com.couchbase.client.ViewConnection.run() @bci=15, line=253 (Compiled frame)

My main thread is trying to do an upsert: I get a record and modify it.
Line 302 is this, with hs being the string value:
rv = client.cas(key, s.getCas(), hs);

and s was obtained as:

CASValue s = client.gets(key);

Thread 10093: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
- java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=226 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(int, long) @bci=122, line=1033 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(int, long) @bci=25, line=1326 (Interpreted frame)
- java.util.concurrent.CountDownLatch.await(long, java.util.concurrent.TimeUnit) @bci=10, line=282 (Interpreted frame)
- net.spy.memcached.internal.OperationFuture.get(long, java.util.concurrent.TimeUnit) @bci=6, line=87 (Interpreted frame)
- net.spy.memcached.MemcachedClient.cas(java.lang.String, long, int, java.lang.Object, net.spy.memcached.transcoders.Transcoder) @bci=19, line=559 (Interpreted frame)
- net.spy.memcached.MemcachedClient.cas(java.lang.String, long, java.lang.Object, net.spy.memcached.transcoders.Transcoder) @bci=8, line=539 (Interpreted frame)
- net.spy.memcached.MemcachedClient.cas(java.lang.String, long, java.lang.Object) @bci=9, line=583 (Interpreted frame)
- com.phluant.third.couch.Main.rmw(com.couchbase.client.CouchbaseClient, java.lang.String, java.util.Map, java.lang.String[], long[]) @bci=174, line=302 (Interpreted frame)
- com.phluant.third.couch.Main$1.run() @bci=78, line=243 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=722 (Interpreted frame)
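
The gets/cas read-modify-write described in this post can be sketched as a retry loop. The following is only an illustration under stated assumptions: `CasSketch`, `Store`, and `Versioned` are hypothetical in-memory stand-ins, not the Couchbase or Spymemcached client API, and the sketch assumes the key already exists.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CasSketch {
    // A value paired with a version number, loosely analogous to a CAS id.
    static final class Versioned {
        final long cas;
        final String value;

        Versioned(long cas, String value) {
            this.cas = cas;
            this.value = value;
        }
    }

    // Hypothetical in-memory store; NOT the Couchbase client API.
    static final class Store {
        private final ConcurrentMap<String, Versioned> data = new ConcurrentHashMap<>();

        Versioned gets(String key) {
            return data.get(key);
        }

        void set(String key, String value) {
            data.put(key, new Versioned(1L, value));
        }

        // Succeeds only if the stored version still matches expectedCas.
        boolean cas(String key, long expectedCas, String newValue) {
            Versioned current = data.get(key);
            if (current == null || current.cas != expectedCas) {
                return false;
            }
            return data.replace(key, current, new Versioned(expectedCas + 1, newValue));
        }
    }

    // Read-modify-write: re-read and retry whenever another writer got in first.
    // Assumes the key already exists in the store.
    static String append(Store store, String key, String suffix) {
        while (true) {
            Versioned current = store.gets(key);
            String updated = current.value + suffix;
            if (store.cas(key, current.cas, updated)) {
                return updated;
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.set("counter", "a");
        System.out.println(append(store, "counter", "b"));
    }
}
```

With a real client the cas call goes over the network, which is why the main thread in the dump above is parked waiting on the operation's future.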

Mon, 04/16/2012 - 16:21
marchywka
Offline
Joined: 04/06/2012
Groups: None

I just went ahead and patched the code. I have no idea how this is supposed to work, but it is not hard
to see how a tight loop could develop. The patch seems to work: I ran 10 threads OK, and the
transactions seem right, with no obvious loss of performance (50 ms for a CAS-based upsert to a
remote server) but a significant CPU reduction. I'm sure that if you have any idea what the code
does you can do better with yield or wait etc. My code is the sleep and the did_something stuff (which turned out to be useless).

com/couchbase/client/ViewConnection.java

public void handleIO() {
    // did_something is tracked but not used: the check below is commented out.
    boolean did_something = false;
    for (ViewNode node : couchNodes) {
        node.doWrites();
        did_something = true;
    }

    for (ViewNode qa : nodesToShutdown) {
        nodesToShutdown.remove(qa);
        Collection notCompletedOperations = qa.destroyWriteQueue();
        try {
            qa.shutdown();
        } catch (IOException e) {
            getLogger().error("Error shutting down connection to "
                + qa.getSocketAddress());
        }
        redistributeOperations(notCompletedOperations);
        did_something = true;
    }
    //System.out.println(" mike couch");
    //if (!did_something)
    // Sleep 1 ms on every call so the calling loop cannot spin at 100% CPU.
    try {
        Thread.sleep(1);
    } catch (Exception e) {
        // ignored
    }
}

Tue, 04/17/2012 - 11:36
mikew
Offline
Joined: 03/14/2011
Groups:

Sleeping when nothing happens on the view IO thread doesn't seem like the right thing to do to me. Your post did, however, point me to what I think is the real problem. When the operation queue is empty, we should be blocking until an operation becomes available. I just submitted a change to fix the issue, but haven't tested it yet. The change is here if you're interested:

http://review.couchbase.org/#change,14959

Also, I appreciate the code attached to your post. It provided a great hint to the root of the problem.
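
To illustrate the idea of blocking until an operation becomes available, here is a minimal sketch using `java.util.concurrent.BlockingQueue.take()`. It is not the change submitted for review: `BlockingWorker`, `submit`, `handled`, and `runDemo` are hypothetical names, and the queue of strings is a stand-in for the real operation queue.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for an IO thread; not the actual ViewConnection code.
class BlockingWorker implements Runnable {
    private final BlockingQueue<String> operations = new LinkedBlockingQueue<>();
    private final List<String> handled = new CopyOnWriteArrayList<>();

    void submit(String op) {
        operations.add(op);
    }

    List<String> handled() {
        return handled;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // take() parks the thread while the queue is empty, so an idle
                // worker uses no CPU instead of spinning through an empty loop.
                String op = operations.take();
                handled.add(op);
            }
        } catch (InterruptedException e) {
            // Interruption is the shutdown signal in this sketch:
            // restore the flag and let the thread exit.
            Thread.currentThread().interrupt();
        }
    }

    // Starts a worker, submits two operations, waits for them, then stops it.
    static List<String> runDemo() {
        BlockingWorker worker = new BlockingWorker();
        Thread t = new Thread(worker, "view-io-sketch");
        t.start();
        worker.submit("query-1");
        worker.submit("query-2");
        try {
            while (worker.handled().size() < 2) {
                Thread.sleep(10);
            }
            t.interrupt();
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.handled();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The design point is that the waiting is done by the queue rather than by a sleep or a busy loop, so latency for a newly submitted operation stays low while idle CPU use drops to zero.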

Tue, 04/17/2012 - 13:40
marchywka
Offline
Joined: 04/06/2012
Groups: None

Yeah, that sounds right, but I had no idea what to wait for, and the sleep made it work for now. I guess I could have looked into the methods called from the loop, but I thought someone might know offhand.

Tue, 04/17/2012 - 16:20
mikew
Offline
Joined: 03/14/2011
Groups:

No problem. I actually suspected that something more complex was the issue, which is why I hadn't taken the time to look at it. Your code made the problem very obvious to me, and I really appreciate it.

Wed, 04/18/2012 - 04:46
marchywka
Offline
Joined: 04/06/2012
Groups: None

Thanks, I probably could have spent a few minutes trying to find that but glad it helped.
I guess in that loop there were only a few calls but still trying to figure out a
wait/notify would have involved a lot of effort being unfamiliar with the code.
It worked for me LOL.

Wed, 04/18/2012 - 11:34
marchywka
Offline
Joined: 04/06/2012
Groups: None

btw, take() throws InterruptedException, and you probably do want to poll, although that
would require more code mods. I'm looking at threading models more generally, and my first case has many clients, so I may dig into this, although it is not high priority for us right now.

http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html#take()
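
As a rough sketch of the polling alternative mentioned here, `BlockingQueue.poll(timeout, unit)` waits for a bounded time and returns null on timeout, which gives the thread a chance to do other work. `PollingLoop` and `drainUntilIdle` are hypothetical names, and the string queue is only a stand-in for real operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PollingLoop {
    // Handles queued items, waiting at most timeoutMillis for each one.
    // Returns once a poll times out, i.e. the queue stayed empty for the full timeout.
    static List<String> drainUntilIdle(BlockingQueue<String> queue, long timeoutMillis) {
        List<String> handled = new ArrayList<>();
        try {
            while (true) {
                String op = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (op == null) {
                    // Timed out: a thread shared with other duties could do that
                    // other work here before polling again. This sketch just stops.
                    break;
                }
                handled.add(op);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see the interruption.
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("op-1");
        queue.add("op-2");
        System.out.println(drainUntilIdle(queue, 50));
    }
}
```

Compared with take(), the timeout trades a small amount of periodic wake-up work for the ability to service something other than the queue.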

Thu, 04/19/2012 - 12:49
mikew
Offline
Joined: 03/14/2011
Groups:

Why wouldn't we want to use take() just because it throws an exception? In my code I caught the exception and logged that we were interrupted. We might want to do a little bit more than just log the exception, but I don't see how take() is not the best solution. If you have different thoughts on this, I am interested in hearing them.

Thu, 04/19/2012 - 13:18
marchywka
Offline
Joined: 04/06/2012
Groups: None

The exception wasn't the issue, take() seems fine if you have one thread per client. I was thinking more generally about a polling / selector scheme here but again I'm not real sure what the load on this thread would be in real life etc. Still just looking.

Thanks.

Thu, 04/19/2012 - 16:35
mikew
Offline
Joined: 03/14/2011
Groups:

That thread is just responsible for moving view requests to a per-server connection pool and taking care of topology changes in the cluster. I wrote all of this code a while ago, and I'm sure there are many ways we could improve the performance. If you see anything that seems like it can be improved, or have any other questions, let me know. We also have a code review system set up at review.couchbase.com if you make any changes and want to contribute them.

Thu, 04/19/2012 - 16:42
marchywka
Offline
Joined: 04/06/2012
Groups: None

Well, thanks, but it would be quite a substantial change, and without knowing how the IO and threads interact it would take a bit of effort to design well. Generally, though, any blocking call would seem to be suspect.
I just happen to be thinking about the concept in general, as we have a netty and thread-per-page front end
on our server right now and are trying to find bottlenecks etc. For Couchbase, my interest was
doing fast upserts, and I would be more interested in extending the incr method to take a vector of
longs instead of just one value.

Sat, 04/28/2012 - 20:55
Velasticus
Offline
Joined: 04/06/2012
Groups: None

FYI,

There is a JIRA open for this at: http://www.couchbase.com/issues/browse/JCBC-26
I have attached a patch to the JIRA as well as a pull request on github (https://github.com/couchbase/couchbase-java-client/pull/1).

This fixed it for me.

Sun, 04/29/2012 - 03:18
marchywka
Offline
Joined: 04/06/2012
Groups: None

Well, what do you do when it times out? One thread babysitting one client could, I guess, take() and throw if interrupted. I used sleep because I was too lazy to dig down one layer, and poll() in theory makes a lot of sense if you are using a thread shared with other stuff.

Fri, 05/04/2012 - 07:56
drakmir
Offline
Joined: 01/07/2012
Groups: None

I've posted a possible fix for the 100% cpu issue here:

http://www.couchbase.com/issues/browse/JCBC-20
http://www.couchbase.com/issues/browse/JCBC-26

Tue, 08/07/2012 - 11:03
ingenthr
Offline
Joined: 03/16/2010
Groups:

We're planning to get a DP2 later this week. I've tossed together a build with some of the latest features and these fixes (hasn't been fully tested, some code still in review) here:
http://dl.dropbox.com/u/1537838/CouchbaseJavaObserve.zip

If anyone wants to give it a shot and pass along feedback, that'd be greatly appreciated.

Sat, 08/18/2012 - 20:57
cb
Offline
Joined: 08/10/2012
Groups: None

Is it possible to publish DP2 to the Maven repository? I'm experiencing 100% CPU using the 1.1-dp release and would like to test 1.1-dp2. According to this thread, the issue should be resolved there.

Sat, 08/18/2012 - 21:10
mikew
Offline
Joined: 03/14/2011
Groups:

Our java developers are putting the final touches on a 1.2-dp release and are aiming to release it early this week so it could be finished as soon as Monday.

Sun, 08/19/2012 - 01:25
cb
Offline
Joined: 08/10/2012
Groups: None

This is great news. Looking forward to it. My Mac is constantly at 100% CPU - in this state I can never move to production...

Wed, 08/22/2012 - 00:12
ingenthr
Offline
Joined: 03/16/2010
Groups:

The fix for this is in the 1.1-dp2, now posted: couchbase.com/develop/java/next

I hope that helps, and please give us more feedback!

Fri, 08/24/2012 - 15:14
ozgurcd
Offline
Joined: 08/24/2012
Groups: None

Upgrading to 1.1-dp2 seems to solve the CPU problem. However, dp2 cannot find the views. After I upgraded to dp2, I can no longer use the existing views.

CouchbaseClient.getViews(); shows 0 views.

To be sure, I downgraded to 1.1-dp. The views are back, along with the good old 100% CPU utilization problem.

Fri, 08/24/2012 - 15:54
ingenthr
Offline
Joined: 03/16/2010
Groups:

I suspect I know what the views issue is. Are you using a bucket with authentication by chance?

If so, the best thing to do would be to upgrade to build 1495 or later (see http://www.couchbase.com/downloads-all). Long story short, there were some authentication changes, and thus 1.1-dp2 must be used with build 1495 and later if you're using a bucket with authentication.

Fri, 08/24/2012 - 20:19
cb
Offline
Joined: 08/10/2012
Groups: None

I can confirm that 1.1-dp2 solved the high CPU problem. Regarding views, I don't see any problem, but I'm using the default bucket.

Fri, 09/07/2012 - 08:54
ozgurcd
Offline
Joined: 08/24/2012
Groups: None

Hello,

Yes, absolutely. I am using authentication.

Thank you for the solution. I installed build 1495 and retried, and the problem seems resolved, but unfortunately 1495 and 1554 are not stable enough ): (I know they are not meant to be stable (: )

I tried both of them, and they failed (processes crashed) after I modified and re-published a view to production.

I am hoping for a somewhat more stable build.

Cheers,
