[MB-7786] [RN 2.0.2] Frequent replication start error messages "Failed to grab remote bucket info, vbucket.." at start of replication. Created: 19/Feb/13  Updated: 17/Jun/13  Resolved: 13/May/13

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0.1
Fix Version/s: 2.1.0
Security Level: Public

Type: Improvement Priority: Major
Reporter: Ketaki Gangal Assignee: Ketaki Gangal
Resolution: Fixed Votes: 0
Labels: 2.0.2-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.0.1.-160-rel

Attachments: File ec2-23-20-108-25.compute-1.amazonaws.com.out     PNG File Screen Shot 2013-02-25 at 6.11.45 PM.png    
Issue Links:
Duplicate
is duplicated by MB-7457 Spurious errors reported at startup o... Closed
Flagged:
Release Note

 Description   
Seeing two issues when trying to set up replication for the first time between two clusters:


1. Frequently seeing error messages when creating a replication from cluster1 to cluster2 for the first time:
"Failed to grab remote bucket info, vbucket.."

Both the source and destination buckets have been available for a long time, so this does not look like an issue with a bucket not being ready.

I don't have logs for this currently; will add them soon.

Seeing this across platforms (Linux and Windows) and on most 2.0.1 runs.

2. Replication replicates data as expected, but these error messages persist for over an hour in the XDCR last-10-errors list. This gives the user the wrong idea about the state of replication.

The initial replication call should either wait long enough to avoid these errors, or we should figure out whether something else can be done here.

Also, how frequently do we clean up the XDCR error messages on the console? Can we clear them sooner than we currently do?
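
For illustration only: one way to keep a "last 10 errors" list from showing stale entries is to expire entries by age when the list is read. The sketch below is not how ns_server stores or clears XDCR errors; the class name `RecentErrors`, the `max_age_seconds` parameter, and its default value are hypothetical and exist only to make the idea concrete.

```python
import time
from collections import deque


class RecentErrors:
    """Bounded list of recent errors that also expires entries by age.

    Hypothetical helper for illustration; not part of ns_server.
    """

    def __init__(self, max_entries=10, max_age_seconds=600.0, clock=time.monotonic):
        # deque(maxlen=...) silently drops the oldest entry once the list is full.
        self._entries = deque(maxlen=max_entries)
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the expiry logic can be tested

    def record(self, message):
        """Store a message together with the time it was recorded."""
        self._entries.append((self._clock(), message))

    def current(self):
        """Return the messages that are still younger than max_age_seconds."""
        cutoff = self._clock() - self._max_age
        # Entries are appended in time order, so expired ones are at the left.
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()
        return [message for _, message in self._entries]
```

With a structure like this, an error raised only during replication startup would drop off the console after the chosen age, even if no newer errors push it out of the list.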



 Comments   
Comment by Ketaki Gangal [ 20/Feb/13 ]
Please release-note this.
Comment by Ketaki Gangal [ 25/Feb/13 ]
Logs from the node it is trying to reach :

clusters
http://ec2-54-235-229-199.compute-1.amazonaws.com:8091/index.html#sec=replications
 
to

http://ec2-107-22-40-124.compute-1.amazonaws.com:8091/index.html#sec=analytics&statsBucket=%2Fpools%2Fdefault%2Fbuckets%2Fsasl%3Fbucket_uuid%3De1f9d1e199f28b83c35f26c61ee90ec9
Comment by Ketaki Gangal [ 25/Feb/13 ]
Hi Aliaksey,

I've added logs from one of the nodes. Could you take a look?

Please reassign this to me or Jin after you do so.

thanks,
Ketaki
Comment by Ketaki Gangal [ 07/Mar/13 ]
Change added here: http://review.couchbase.org/#/c/24986/. It will be part of the next branch.
Comment by Ketaki Gangal [ 12/Mar/13 ]
http://review.couchbase.org/#/c/24986/
Comment by kzeller [ 15/Mar/13 ]
Added as known issue to RN 2.0.1:

When you create a replication between two clusters, you
may experience the incorrect error message
"Failed to grab remote bucket info, vbucket". Replication will start
and function as expected, but the incorrect error message may persist for some time.
Please ignore this incorrect error.
Comment by Aliaksey Artamonau [ 15/Mar/13 ]
I would not call the error incorrect. It's just that replication is able to recover from it.
Comment by kzeller [ 15/Mar/13 ]
Redo as:

When you create a replication between two clusters, you
may experience two error messages:
"Failed to grab remote bucket info, vbucket" and "Error replicating vbucket X". Nonetheless,
replication
will still start and then function as expected, but the error messages may appear
for some time in the Web Console. Please ignore this behavior.
Comment by Aliaksey Artamonau [ 15/Mar/13 ]
Looks good to me.
Comment by kzeller [ 15/Mar/13 ]
Yes indeed.... : )
Comment by kzeller [ 15/Mar/13 ]
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-server-rn_2-0-0l.html
Comment by Aleksey Kondratenko [ 02/Apr/13 ]
Not sure who to assign this to.

Code-wise we've fixed it. It was caused by a thundering herd of those remote bucket info requests, and we don't allow that anymore.

I believe folks wanted to add this to the release notes.

Anyways Aliaksey is done with that.
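
For illustration only: a common way to prevent a thundering herd like the one described above is to coalesce concurrent requests for the same resource into a single in-flight call whose result is shared by every caller. The sketch below is not the actual ns_server change (which is not shown in this ticket). The names `SingleFlight`, `fetch_remote_bucket_info`, the key `"cluster2/default"`, and the returned payload are all hypothetical.

```python
import threading
import time
from concurrent.futures import Future


class SingleFlight:
    """Coalesce concurrent calls that share a key into one underlying call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by all current callers
        self.coalesced = 0    # number of callers that reused an in-flight call

    def do(self, key, fetch):
        with self._lock:
            fut = self._in_flight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._in_flight[key] = fut
            else:
                self.coalesced += 1
        if leader:
            # Only the first caller performs the expensive request.
            try:
                fut.set_result(fetch())
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Followers block here until the leader publishes a result or an error.
        return fut.result()


def demo(n_replicators=8):
    """Start n replicator threads that all ask for the same bucket info."""
    flight = SingleFlight()
    release = threading.Event()
    calls = []
    results = []

    def fetch_remote_bucket_info():
        # Stand-in for the remote request; blocks until the demo releases it.
        calls.append(1)
        release.wait(timeout=5)
        return {"bucket": "default"}  # placeholder payload

    def replicator():
        results.append(flight.do("cluster2/default", fetch_remote_bucket_info))

    threads = [threading.Thread(target=replicator) for _ in range(n_replicators)]
    for t in threads:
        t.start()
    # Wait until every other thread has joined the in-flight call.
    deadline = time.monotonic() + 5
    while flight.coalesced < n_replicators - 1 and time.monotonic() < deadline:
        time.sleep(0.01)
    release.set()
    for t in threads:
        t.join()
    return len(calls), results
```

Calling `demo(8)` starts eight threads but issues only one underlying fetch; the other seven callers wait for and share its result. The trade-off is that a failure of the single request is also delivered to every waiting caller, so callers still need their own retry handling.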
Comment by Maria McDuff [ 16/Apr/13 ]
karen, are you finished documenting this?
just flagging this for you for release note.

Will assign to Ketaki for verification/closing.

Thanks.
Comment by kzeller [ 16/Apr/13 ]
I added this to the 2.0.1 release notes as a minor known issue to ignore. Is the message now fixed for 2.0.2?
Comment by Aleksey Kondratenko [ 16/Apr/13 ]
As can be seen above it is fixed.
Comment by Maria McDuff [ 13/May/13 ]
Please verify and close.
If the issue is fixed, Karen does not need to release-note it for 2.0.2.
Comment by kzeller [ 13/May/13 ]
Relabeled in RN 2.0.2 as a fix. For earlier versions, it was in the release notes as a known issue.
Generated at Fri Apr 18 11:50:00 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.