[MB-5546] Increasing the default timeouts on ns_server to avoid rebalance failures due to ep-engine stats timeout issues in large clusters or clusters where some nodes are actively using swap Created: 13/Jun/12  Updated: 09/Jan/13  Resolved: 13/Jun/12

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.8.1-release-candidate
Fix Version/s: 1.8.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Karan Kumar (Inactive) Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows small/large cluster
Linux small/large cluster

Bucket 1, default
vbuckets 1024
RAM 18.7G
Nodes 4 ( 2 form the base-cluster)
Items Setup for 20M items


 Description   
Related issues:-
MB-5360
MB-5352

We have multiple bugs related to timeouts hit in ns_server:-
1) When nodes are in swap
2) On Windows, even on a small cluster.

This bug is to recommend increasing the default timeouts.

We used the following timeouts for most of the params. It's not an all-in-one solution, but it should hopefully cover the basic scenarios. Values are in milliseconds.
ns_memcached_outer, 60000
ns_memcached_open_checkpoint, 60000
ns_memcached_outer_heavy, 60000
ns_memcached_outer_very_heavy, 120000
ns_memcached_connected, 10000
ebucketmigrator_connect, 60000
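
For reference, the values above can be written as a small lookup table and converted to seconds. This is only an illustrative Python sketch, not ns_server code: the names PROPOSED_TIMEOUTS_MS and to_seconds are hypothetical, and it assumes the values are milliseconds (consistent with 120000 being described as 120 sec in this ticket).

```python
# Illustrative only: timeout values proposed in this ticket, assumed to be
# milliseconds. PROPOSED_TIMEOUTS_MS and to_seconds are hypothetical names,
# not part of ns_server.
PROPOSED_TIMEOUTS_MS = {
    "ns_memcached_outer": 60000,
    "ns_memcached_open_checkpoint": 60000,
    "ns_memcached_outer_heavy": 60000,
    "ns_memcached_outer_very_heavy": 120000,
    "ns_memcached_connected": 10000,
    "ebucketmigrator_connect": 60000,
}


def to_seconds(timeout_ms: int) -> float:
    """Convert a millisecond timeout to seconds."""
    return timeout_ms / 1000.0


if __name__ == "__main__":
    # Print each proposed timeout in seconds for quick comparison.
    for name, ms in PROPOSED_TIMEOUTS_MS.items():
        print(f"{name}: {to_seconds(ms):g} s")
```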

Summary, some error messages and fixes that worked:-
1) Rebalance exited with reason {exited,{'EXIT',<0.22700.12>,{timeout,{gen_server,call,[{'ns_memcached-default','ns_1@10.3.2.81'},{stats,<<"tap">>},30000]}}}}
Fix: Adjust timeout value - 120 sec - ns_memcached_outer_very_heavy
2) Rebalance exited with reason {exited,
{replicator_died,

Fix: Adjust timeout value - 120 sec - ns_memcached_outer_heavy

3) Rebalance exited with reason {exited,
{'EXIT',<0.24287.15>,
{timeout,
{gen_server,call,
[{'ns_memcached-default','ns_1@10.3.2.81'},
{stats,<<"tap">>},
30000]}}}}

Fix: Adjust timeout to 120 sec
4) Rebalance exited with reason {{change_filter_failed,
{'EXIT',
{timeout,

Fix : Adjust timeout values -
ebucketmigrator_connect 120 secs
ns_memcached_connected 1 sec
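
In the gen_server call errors above, the last element of the argument list (30000) is the call timeout in milliseconds that was exceeded. When triaging many such failures, it can help to pull the stats key and that timeout out of the log text. The following is a rough Python sketch under that assumption; the function name parse_stats_timeout and the regular expression are hypothetical helpers, not part of any Couchbase tooling, and they only handle messages shaped like items 1 and 3.

```python
import re
from typing import Optional, Tuple

# Matches {timeout,{gen_server,call,[Server,{stats,<<"KEY">>},TIMEOUT]
# Whitespace between terms is tolerated because the log may wrap across lines.
_CALL_TIMEOUT = re.compile(
    r'\{timeout,\s*\{gen_server,\s*call,\s*\['
    r'.*?\{stats,\s*<<"(?P<key>[^"]*)">>\},\s*'
    r'(?P<ms>\d+)\s*\]',
    re.DOTALL,
)


def parse_stats_timeout(message: str) -> Optional[Tuple[str, int]]:
    """Return (stats_key, timeout_ms) from a rebalance error, or None."""
    match = _CALL_TIMEOUT.search(message)
    if match is None:
        return None
    return match.group("key"), int(match.group("ms"))


if __name__ == "__main__":
    sample = (
        "{'EXIT',<0.24287.15>,\n{timeout,\n{gen_server,call,\n"
        "[{'ns_memcached-default','ns_1@10.3.2.81'},\n"
        '{stats,<<"tap">>},\n30000]}}}}'
    )
    print(parse_stats_timeout(sample))
```

Truncated messages such as items 2 and 4 do not contain the full call and yield None.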

 Comments   
Comment by Aleksey Kondratenko [ 13/Jun/12 ]
I'm a little reluctant to change the ns_memcached_connected timeout. It's the timeout we use when asking whether ns_memcached is alive. If it expires, it just marks the bucket as not quite healthy without failing anything. So raising it would affect autofailover and other things, which is something I don't want to do.
Comment by Aleksey Kondratenko [ 13/Jun/12 ]
Timeouts were bumped in a commit merged for branch-181 and merged up to master, except, as noted above, the ns_memcached_connected timeout.
Comment by Karan Kumar (Inactive) [ 13/Jun/12 ]
Thanks Alk.
Duly noted the concerns.
http://review.couchbase.org/#change,17230
Comment by Thuan Nguyen [ 13/Jun/12 ]
Integrated in github-ns-server-2-0 #374 (See [http://qa.hq.northscale.net/job/github-ns-server-2-0/374/])
    MB-5546: raised some timeouts to cope with some paging (Revision 0998b9c92a78185eae31dcbdd55ad92e07e0e6a8)

     Result = SUCCESS
Aliaksey Artamonau :
Files :
* src/ns_memcached.erl
* src/ebucketmigrator_srv.erl