[MB-7286] UI logs showing multiple errors (XDCR bi and uni, at the start of a rebalance-in operation): Server error during processing: ["web request failed" ...
Created: 28/Nov/12  Updated: 18/Mar/14  Resolved: 12/Jul/13

Status: Closed
Project: Couchbase Server
Component/s: ns_server, UI
Affects Version/s: 2.0
Fix Version/s: 3.0
Security Level: Public

Type: Bug
Priority: Major
Reporter: Abhinav Dangeti
Assignee: Aruna Piravi
Resolution: Fixed
Votes: 0
Labels: 2.0-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: - 5:5 bidirectional XDCR
- ec2 nodes with 15G RAM
- 12.04 Ubuntu LTS
- 400G disk space on each node
- http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1967-rel.deb.manifest.xml


 Description   
+ 5 nodes rebalance in on each cluster
Cluster setup: c1:c2::10:10
biXDCR_bucket: c1 <---> c2
uniXDCR_src: c1 ---> c2 :uniXDCR_dest
Front end loads on c1 and c2 for biXDCR_bucket, and on c1 for uniXDCR_src.
c1: http://ec2-177-71-230-72.sa-east-1.compute.amazonaws.com:8091/
c2: http://ec2-175-41-186-167.ap-southeast-1.compute.amazonaws.com:8091/

The UI reports a whole bunch of these errors at the start of the rebalance operation, while the front end loads are running:
Server error during processing: ["web request failed",
 {path,"/pools/default"},
 {type,exit},
 {what,
  {timeout,
   {gen_server,call,
    [ns_doctor,get_tasks_version]}}},
 {trace,
  [{gen_server,call,2},
   {menelaus_web,build_pool_info,4},
   {menelaus_web,handle_pool_info_wait,5},
   {menelaus_web,loop,3},
   {mochiweb_http,headers,5},
   {proc_lib,init_p_do_apply,3}]}] (repeated 1 times)

Will attach grabbed diags from the particular server in a bit.


 Comments   
Comment by Abhinav Dangeti [ 28/Nov/12 ]
UI reports that this error seems to be on server node: ns_1@ec2-177-71-230-72.sa-east-1.compute.amazonaws.com
Grabbed diags: https://s3.amazonaws.com/bugdb/MB-7286/ec2-177-71-230-72.sa-east-1.compute.amazonaws.com-8091-diag.txt.gz
Comment by Steve Yen [ 29/Nov/12 ]
from bug-scrub.

ketaki: errors seem to go away after 10 minutes, but the cluster is inaccessible?
Comment by Steve Yen [ 29/Nov/12 ]
per-bug-scrub, moved to 2.0.1
Comment by kzeller [ 05/Dec/12 ]
Added to RN:

       During a rebalance operation for clusters undergoing uni- and bi-directional replication
       via XDCR, the following server errors may appear, which are currently under
       investigation:
Comment by Junyi Xie (Inactive) [ 05/Dec/12 ]
It seems to me these errors usually mean the system is busy with something fairly heavy (such as a rebalance) and is unable to respond to UI requests in a timely manner. Waiting for triage from the ns_server team.
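
For illustration only, the failure mode described above can be sketched outside of Erlang. The trace shows gen_server:call/2, which uses a default timeout of 5 seconds; if the called process (here ns_doctor) is still working through earlier requests, the caller exits with a timeout even though nothing has crashed. The Python below is a toy model under that assumption, not ns_server code: BusyServer, cast and call are hypothetical names, and the sleep stands in for whatever heavy work is occupying the process.

```python
import queue
import threading
import time


class BusyServer:
    """Toy single-threaded server that handles one request at a time, in order."""

    def __init__(self):
        self._inbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            work_seconds, reply_box = self._inbox.get()
            time.sleep(work_seconds)  # stand-in for heavy work
            if reply_box is not None:
                reply_box.put("ok")

    def cast(self, work_seconds):
        """Queue work without waiting for a reply."""
        self._inbox.put((work_seconds, None))

    def call(self, work_seconds, timeout):
        """Queue work and wait for the reply, giving up after `timeout` seconds."""
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((work_seconds, reply_box))
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("call timed out while the server was busy") from None


if __name__ == "__main__":
    server = BusyServer()
    print(server.call(0.0, timeout=1.0))   # idle server replies in time
    server.cast(0.5)                       # heavy request occupies the server
    try:
        server.call(0.0, timeout=0.1)      # cheap request is stuck behind it
    except TimeoutError as exc:
        print("timeout:", exc)
```

In this model the cheap request fails only because it is queued behind slower work, which would be consistent with the errors disappearing once the heavy operation finishes.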
Comment by Farshid Ghods (Inactive) [ 10/Dec/12 ]
deferring to 2.1 per bug scrub meeting (Dipti & Farshid, December 7th)
Comment by Aleksey Kondratenko [ 20/Jun/13 ]
These logs are insufficient.

Please reproduce on 2.1.0

5 <-> 5 XDCR is IMHO insane
Comment by Abhinav Dangeti [ 12/Jul/13 ]
Closing for now; will reopen with additional logs if the issue is seen again.
Comment by Venu Uppalapati [ 18/Mar/14 ]
Aruna, this is an XDCR issue that was incorrectly assigned to me. Thanks.
Comment by Aruna Piravi [ 18/Mar/14 ]
Thanks Venu, looks like Abhinav had closed it. Why was this reopened and by whom?
Comment by Aruna Piravi [ 18/Mar/14 ]
Ok, looks like Abhinav left it as Fixed which Maria reassigned. Closing this issue, will reopen if seen in system tests.
Generated at Fri Sep 19 13:01:10 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.