Couchbase cluster node is not stable

Wed, 12/05/2012 - 00:48
dew_ice

We set up a Couchbase cluster with 2 nodes.
When we run tests on it, we find that the nodes are not stable.
The log in the web console shows the following:

Event | Module Code | Server Node | Time
Node 'ns_1@Node1' saw that node 'ns_1@Node2' came up. | ns_node_disco004 | ns_1@Node1 | 14:21:48 - Wed Dec 5, 2012
Node 'ns_1@Node2' saw that node 'ns_1@Node1' came up. | ns_node_disco004 | ns_1@Node2 |
Node 'ns_1@Node2' saw that node 'ns_1@Node1' went down. | ns_node_disco005 | ns_1@Node2 |
Node 'ns_1@Node1' saw that node 'ns_1@Node2' went down. | ns_node_disco005 | ns_1@Node1 |
Node 'ns_1@Node1' saw that node 'ns_1@Node2' came up. | ns_node_disco004 | ns_1@Node1 |
Node 'ns_1@Node1' saw that node 'ns_1@Node2' went down. | ns_node_disco005 | ns_1@Node1 |
Node 'ns_1@Node2' saw that node 'ns_1@Node1' came up. | ns_node_disco004 | ns_1@Node2 |
Node 'ns_1@Node2' saw that node 'ns_1@Node1' went down. | ns_node_disco005 | ns_1@Node2 | 14:08:34 - Wed Dec 5, 2012

These entries are just a segment of the log; this happens very often.
While it is happening, the failure rate of operations on Couchbase is very high.
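
To put a number on how often the nodes flap, the event text can be tallied per observer and peer. The following is only a minimal sketch: it assumes the log has been copied out of the web console as plain text lines, and the names `EVENT_RE` and `count_transitions` are made up for this illustration, not part of any Couchbase tool.

```python
import re
from collections import Counter

# Matches the event text as it appears in the web console log above.
EVENT_RE = re.compile(
    r"Node '(?P<observer>[^']+)' saw that node '(?P<peer>[^']+)' "
    r"(?P<state>came up|went down)"
)


def count_transitions(lines):
    """Count how often each observer saw each peer come up or go down."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            key = (match["observer"], match["peer"], match["state"])
            counts[key] += 1
    return counts


# A few event lines taken from the log segment in this post.
sample = [
    "Node 'ns_1@Node1' saw that node 'ns_1@Node2' came up.",
    "Node 'ns_1@Node2' saw that node 'ns_1@Node1' went down.",
    "Node 'ns_1@Node1' saw that node 'ns_1@Node2' went down.",
    "Node 'ns_1@Node1' saw that node 'ns_1@Node2' went down.",
]

for (observer, peer, state), n in sorted(count_transitions(sample).items()):
    print(f"{observer} saw {peer} {state}: {n}")
```

Running the same function over a longer log export would show whether both nodes report each other down equally often or whether one side dominates.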

So my questions are:
1. Why does one node frequently judge the other node to be down?
Actually, these 2 nodes are in the same network, and there are no connection issues between them.
Their IPs are like '10.70.22.100' and '10.70.22.101'.

2. How does one node decide that the other node is down?
Is there a timeout, e.g. 5 seconds, after which a node that gets no response from the other judges it as down?
If yes, can we reconfigure that time?
3. While this is happening, how can we improve the success rate?

Has anyone encountered this issue?
Thanks for your time!

Fri, 12/07/2012 - 12:55
balak

Which version of Couchbase are you running?

thanks,
BALA

Sun, 12/09/2012 - 19:44
dew_ice

couchbase-server-enterprise_x86_64_1.8.1

thanks.

Thu, 12/13/2012 - 01:46
dew_ice

couchbase-server-enterprise_x86_64_1.8.1

thanks.

Fri, 12/28/2012 - 04:00
Neo-matrix

Hi,
Your understanding is correct. A heartbeat (a kind of watchdog) is sent to the nodes every 5 seconds. If a node does not respond, it is considered down and a "Node ... went down" entry is logged; when it responds to the next heartbeat, the log shows "Node ... came up".
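
As a rough illustration of that behaviour, here is a minimal sketch of a detector that logs a transition whenever a heartbeat reply changes from present to missing or back. It only models the description above (one missed heartbeat marks the node down); it is not the actual Couchbase implementation, and the `transitions` function and its parameters are invented for this example.

```python
HEARTBEAT_INTERVAL = 5.0  # seconds, the interval described in this post


def transitions(responses, interval=HEARTBEAT_INTERVAL):
    """Turn per-heartbeat replies (True/False) into up/down events.

    Returns a list of (seconds_since_start, "came up" | "went down").
    """
    events = []
    is_up = None  # state is unknown before the first heartbeat
    for tick, responded in enumerate(responses):
        if responded != is_up:
            state = "came up" if responded else "went down"
            events.append((tick * interval, state))
            is_up = responded
    return events


# One missed reply between two good ones yields a down/up pair in the log.
for when, state in transitions([True, False, True, True, False]):
    print(f"{when:5.1f}s  node {state}")
```

Under this model, a node that is only slow to answer (for example, busy or paused for a few seconds) produces the same alternating log entries as a node that really went away, which is why the cluster details below matter.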

Could you please provide an overview of your cluster?

Couchbase version installed: 1.8.1. Is it installed with the hotfix?
How many buckets?
RAM quota per bucket and per node?
Resident ratio?
Cache miss ratio?
Which client SDK are you using? The client SDKs have default timeouts.

Thanks,
Neo

Sat, 01/05/2013 - 19:04
dew_ice

Couchbase version installed: 1.8.1. Is it installed with the hotfix?
-- How can I check that?
How many buckets?
-- 1 bucket
RAM quota per bucket and per node?
-- 3 GB
Resident ratio?
-- How do I compute it?
Cache miss ratio?
-- How do I compute it?
Which client SDK are you using? The client SDKs have default timeouts.
-- Java SDK 1.0.3; I have set the client timeout to 10 seconds.

Mon, 01/07/2013 - 12:27
Neo-matrix

Hi,
To check whether 1.8.1 is installed with the hotfix,
compare the md5sum of /opt/couchbase/bin/memcached and /opt/couchbase/lib/memcached/ep.so with the values below.

Verify the md5sum of /opt/couchbase/bin/memcached

Centos-32bit a0a6e8ac10d1537acb5fc8053eb658dc
Centos-64bit 30705e3d9db9fcb829f94ec1469697d7
Ubuntu-32bit e6e9d8c9bda52bd7245120ce1f023b43
Ubuntu-64bit 18dc15eeb83c7ced852b9fe3298ccb12

Verify the md5sum of /opt/couchbase/lib/memcached/ep.so

Centos-32bit 069436f56348c25427f17b8e2bd3361a
Centos-64bit d3b3afe576833ce202d54384856a3c17
Ubuntu-32bit 1f1f48ae3d1272a87d0b464d7e69cf2f
Ubuntu-64bit 92ad28508d99fd0a5ea7d24894fdd6e9

If the hotfix is installed, you should get the values above for your OS.
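
If you prefer to script the comparison, something like the following could do it. This is only a sketch: the paths and checksums are the ones listed in this post, while the helper names `md5_of` and `matching_platform` are made up for the example.

```python
import hashlib

# Checksums listed in this post, keyed by file path and platform.
EXPECTED = {
    "/opt/couchbase/bin/memcached": {
        "Centos-32bit": "a0a6e8ac10d1537acb5fc8053eb658dc",
        "Centos-64bit": "30705e3d9db9fcb829f94ec1469697d7",
        "Ubuntu-32bit": "e6e9d8c9bda52bd7245120ce1f023b43",
        "Ubuntu-64bit": "18dc15eeb83c7ced852b9fe3298ccb12",
    },
    "/opt/couchbase/lib/memcached/ep.so": {
        "Centos-32bit": "069436f56348c25427f17b8e2bd3361a",
        "Centos-64bit": "d3b3afe576833ce202d54384856a3c17",
        "Ubuntu-32bit": "1f1f48ae3d1272a87d0b464d7e69cf2f",
        "Ubuntu-64bit": "92ad28508d99fd0a5ea7d24894fdd6e9",
    },
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matching_platform(path, digest):
    """Return the platform whose listed checksum equals digest, else None."""
    for platform, expected in EXPECTED.get(path, {}).items():
        if expected == digest.lower():
            return platform
    return None
```

On a node, calling `matching_platform(path, md5_of(path))` for each of the two paths returns the platform label when the checksum matches one of the listed values, and `None` otherwise.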

RAM quota per bucket and per node?
Is the 3 GB for the bucket or for the node?

Resident ratio and cache miss ratio?
You can read both from the monitoring graphs.
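
For reference, both figures are simple percentages. The sketch below assumes the usual definitions (resident ratio as the share of items held in RAM, cache miss ratio as the share of reads that had to go to disk); the function and parameter names are hypothetical and do not correspond to specific Couchbase stat names, so treat the graphs as the authoritative source.

```python
def resident_ratio(total_items, non_resident_items):
    """Percentage of items whose values are held in RAM."""
    if total_items == 0:
        return 100.0  # an empty bucket has nothing evicted
    return 100.0 * (total_items - non_resident_items) / total_items


def cache_miss_ratio(gets, background_fetches):
    """Percentage of reads that had to be fetched from disk."""
    if gets == 0:
        return 0.0  # no reads, so no misses
    return 100.0 * background_fetches / gets


# Example with made-up counters: a quarter of the items evicted from RAM,
# and 10 of 200 reads served from disk.
print(resident_ratio(1000, 250))
print(cache_miss_ratio(200, 10))
```

A resident ratio well below 100% together with a high cache miss ratio would point at disk fetches as one possible source of slow responses.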

Thanks
Neo

Tue, 01/08/2013 - 00:44
dew_ice

1.
[root@localhost couchbase]# md5sum bin/memcached
187c5361e961d4d4cdc75398f822c16b bin/memcached
[root@localhost couchbase]# md5sum lib/memcached/ep.so
ef4a88a9ea343540d2914fe066983508 lib/memcached/ep.so

2.
3 GB per node.

3. Resident ratio and cache miss ratio?
We re-created the bucket, so we can't find the resident ratio from that time.
Is it very important for this issue?

Tue, 01/08/2013 - 16:35
Neo-matrix

Hi,
Judging from the md5sum output, the hotfix has not been applied, so I recommend applying the hotfix for 1.8.1.

After that, the resident ratio and cache miss ratio will help us understand the reason for the timeouts in some cases.

Did you recreate the bucket after getting the errors? If so,
how has the cluster performed since then?

Thanks,
Neo

Wed, 01/23/2013 - 18:39
dew_ice

How do I apply the hotfix? Is it free?

The cache miss ratio is 0 and the resident ratio is 100%.

Recreating the bucket did not resolve the problem.
The nodes are still not stable.
