[MB-3575] Moxi should automatically skip removed or uninitialized nodes Created: 08/Apr/11  Updated: 10/Apr/12  Resolved: 10/Apr/12

Status: Resolved
Project: Couchbase Server
Component/s: moxi
Affects Version/s: 1.6.5.3
Fix Version/s: 1.7 GA
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Steve Yen
Resolution: Fixed Votes: 0
Labels: 1.7.0-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Flagged:
Release Note

 Description   
-Configure Moxi with a list of URLs:
./moxi http://node1:8091/pools/default/bucketsStreaming/default,http://node2:8091/pools/default/bucketsStreaming/default,http://node3:8091/pools/default/bucketsStreaming/default

-Rebalance node 1 out of the cluster (for maintenance)
-Restart Moxi
-Moxi will continuously spin on node1 with the following errors:
2011-04-08 15:30:21: (agent_config.c.389) configuration received
2011-04-08 15:30:21: (agent_config.c.448) ERROR: could not parse JSON from REST server: Requested resource not found.

Two things need to happen here:
-Moxi's logging needs to be improved to tell the user which node it is connecting to and which ones are giving it problems (not just once at the top, but in every log message)
-Moxi needs to realize that this node has an invalid config (it already knows it got an error) and move on to the next node (see the sketch below)
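
For illustration only, a minimal sketch of the desired skip-and-log behavior; fetch_config and parse_json_config are hypothetical helper names, not moxi's actual API:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical helpers, assumed to exist for this sketch only. */
extern char *fetch_config(const char *url);        /* HTTP GET; NULL on error */
extern bool  parse_json_config(const char *body);  /* false on bad JSON */

/* Walk the configured URLs, naming the node in every log message and
   skipping any node that is unreachable or returns a bad config. */
bool get_first_valid_config(const char **urls, int n_urls) {
    for (int i = 0; i < n_urls; i++) {
        char *body = fetch_config(urls[i]);
        if (body == NULL) {
            fprintf(stderr, "ERROR: no response from %s, trying next node\n", urls[i]);
            continue;
        }
        if (!parse_json_config(body)) {
            /* e.g. "Requested resource not found" from a removed node */
            fprintf(stderr, "ERROR: could not parse JSON from %s, trying next node\n", urls[i]);
            free(body);
            continue;
        }
        free(body);
        return true;   /* usable config found; stop here */
    }
    return false;      /* every configured node failed */
}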

 Comments   
Comment by Perry Krug [ 08/Apr/11 ]
The workaround is to stop the service on the node that was removed from the cluster.
Comment by Aleksey Kondratenko [ 15/Apr/11 ]
This also happens with fast bucket deletion patches. Steps are:

* create cluster
* load data
* delete bucket
* create bucket with same name
* try loading data
* moxi fails because it keeps the now-invalid connections in its pool (see the sketch below)
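
A minimal sketch of the fix this implies, under the assumption that pooled connections carry the config version they were opened with; conn_pool, pooled_conn, and close_conn are illustrative names, not moxi's actual structures:

#include <stdlib.h>

typedef struct pooled_conn {
    struct pooled_conn *next;
    int config_ver;   /* config version this connection was opened under */
    int fd;
} pooled_conn;

typedef struct {
    pooled_conn *head;
    int config_ver;   /* current config version */
} conn_pool;

extern void close_conn(int fd);   /* hypothetical */

/* When a new config arrives (e.g. a bucket was deleted and recreated),
   close every pooled connection that predates it instead of reusing it. */
void pool_invalidate_stale(conn_pool *pool) {
    pooled_conn **pp = &pool->head;
    while (*pp != NULL) {
        if ((*pp)->config_ver != pool->config_ver) {
            pooled_conn *stale = *pp;
            *pp = stale->next;   /* unlink from the pool */
            close_conn(stale->fd);
            free(stale);
        } else {
            pp = &(*pp)->next;
        }
    }
}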
Comment by Steve Yen [ 05/May/11 ]
This requires some internal API changes between moxi and libconflate to pass the error knowledge across several function invocations to the right place. Not risky.
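
Illustratively, the shape of such a change might look like the following; the names (conf_result, on_new_config_fn, etc.) are hypothetical and do not reflect the real moxi/libconflate interfaces:

#include <stddef.h>

typedef enum {
    CONF_OK = 0,
    CONF_PARSE_ERROR,    /* bad or missing JSON from the REST server */
    CONF_CONNECT_ERROR   /* could not reach the node at all */
} conf_result;

/* The config callback returns a result instead of void... */
typedef conf_result (*on_new_config_fn)(const char *json, void *userdata);

/* ...each intermediate layer passes it back up unchanged... */
conf_result receive_config(on_new_config_fn cb, const char *json, void *ud) {
    return cb(json, ud);
}

extern const char *next_url(void);              /* hypothetical */
extern const char *fetch_body(const char *url); /* hypothetical */

/* ...so the top-level loop finally sees the error and can advance
   to the next URL in the list. */
void config_loop(on_new_config_fn cb, void *ud) {
    const char *url;
    while ((url = next_url()) != NULL) {
        const char *json = fetch_body(url);
        if (json != NULL && receive_config(cb, json, ud) == CONF_OK)
            break;   /* got a usable config; stop trying other nodes */
    }
}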
Comment by Perry Krug [ 10/May/11 ]
A Pivotal Tracker story has been created for this Issue: http://www.pivotaltracker.com/story/show/13245071
Comment by Steve Yen [ 17/May/11 ]
Some fixes on the way to the real fix...

http://review.membase.org/6338
http://review.membase.org/6339
Comment by Steve Yen [ 17/May/11 ]
Unfortunately, the code that discovers a problem is in an asynchronous thread that's "far away" from the REST HTTP code.

Will bend code to our will.
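
One common pattern for bridging that gap, sketched here with hypothetical names (moxi's actual mechanism may differ), is to have the asynchronous thread signal the config/REST thread over a notification pipe:

#include <unistd.h>

static int notify_pipe[2];        /* [0] read end, [1] write end; created with pipe() */
static volatile int last_error;   /* error code set by the asynchronous thread */

/* Called from the far-away worker thread when it detects a bad config. */
void signal_config_error(int err) {
    char b = 'E';
    last_error = err;
    (void)write(notify_pipe[1], &b, 1);   /* wake the config/REST thread */
}

/* Config thread: blocks until the worker reports an error. */
int wait_for_config_error(void) {
    char b;
    if (read(notify_pipe[0], &b, 1) == 1)
        return last_error;                /* e.g. advance to the next URL */
    return 0;
}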
Comment by Steve Yen [ 17/May/11 ]
http://review.membase.org/6342
http://review.membase.org/6343
Comment by Perry Krug [ 05/Jun/11 ]
Perry Krug deleted the linked story in Pivotal Tracker
Comment by Farshid Ghods (Inactive) [ 10/Apr/12 ]
This test is failing now (1.8.1 manifest):

OK
./t/issue-MB-3575.sh: line 212: 19957 Terminated: 15 ./moxi -z http://127.0.0.1:22100/bad,http://127.0.0.1:4567/pools/default/bucketsStreaming/default,http://127.0.0.1:22101/bad -Z port_listen=11266,downstream_conn_max=1,downstream_max=0,downstream_timeout=300,wait_queue_timeout=300,downstream_conn_queue_timeout=300,connect_timeout=300,auth_timeout=300 2>> /tmp/issue-MB-3575.out
No matching processes belonging to you were found
----------------------
FAIL count expect 0, got 20
make: *** [test] Error 1
Comment by Farshid Ghods (Inactive) [ 10/Apr/12 ]
Never mind, I didn't have the sinatra gem installed.