[MB-4366] ns_server is reusing tap names unsafely which causes data loss or inconsistency in replication when a node is removed and added back Created: 19/Oct/11 Updated: 09/Jan/13 Resolved: 12/Apr/12 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 1.7.2, 1.8.0 |
| Fix Version/s: | 1.8.1 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Farshid Ghods | Assignee: | Aleksey Kondratenko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 1.8.1-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Description |
|
screenshot attached
NOTE: we're converting this to the main 'named tap issues' ticket. So what's not safe about reusing named taps as of 1.8.0? If something happened to the destination node after the tap was disconnected, and that something affected data for vbuckets replicated as part of the named tap, then a subsequent reuse of the named tap will incorrectly assume that we can continue sending data instead of re-negotiating which data needs to be resent. |
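The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not ns_server or ep-engine code: the `NamedTap` class, its `sent_upto` checkpoint, and the tap names are all hypothetical stand-ins for whatever state the producer keeps under a tap name.

```python
from dataclasses import dataclass, field


@dataclass
class NamedTap:
    """Toy producer-side state kept under a tap name (hypothetical)."""
    name: str
    sent_upto: int = 0  # assumed checkpoint: index of the next item to send


@dataclass
class Replica:
    """Toy destination node holding replicated items."""
    items: list = field(default_factory=list)


def stream(tap, source, replica):
    """Send only what the tap believes the replica has not seen yet."""
    replica.items.extend(source[tap.sent_upto:])
    tap.sent_upto = len(source)


def run(reuse_name):
    source = ["a", "b", "c"]
    replica = Replica()
    tap = NamedTap("replication_n1")
    stream(tap, source, replica)   # replica now holds a, b, c

    replica.items.clear()          # node removed and added back: its data is gone
    source.append("d")             # load keeps running in the meantime

    if not reuse_name:
        # A fresh name carries no stale checkpoint, so everything is resent.
        tap = NamedTap("replication_n1_new")
    stream(tap, source, replica)
    return replica.items


print(run(reuse_name=True))    # stale checkpoint: earlier items are never resent
print(run(reuse_name=False))   # fresh tap: full data set is resent
```

In this model, reusing the name leaves the replica with only the item written after the node came back, which is the kind of silent data loss the ticket describes.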
| Comments |
| Comment by Farshid Ghods [ 19/Oct/11 ] |
| another screenshot : 5 minutes after stopping the rebalance |
| Comment by Farshid Ghods [ 20/Oct/11 ] |
|
The tap stream only stops if no items are added to the backlog.
If the user keeps the load running, this tap stream remains alive forever. |
| Comment by Aleksey Kondratenko [ 22/Dec/11 ] |
| Farshid, I cannot make sense of these screenshots. Can you elaborate? |
| Comment by Farshid Ghods [ 22/Dec/11 ] |
|
basically that means there is still one tap_rebalance stream open and running even after rebalance was stopped.
we seem to be stopping most of the streams except one |
| Comment by Farshid Ghods [ 22/Dec/11 ] |
| Waiting 5 minutes will not work if there are ongoing mutations in the cluster, because this tap stream only times out after 5 minutes of inactivity. |
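Why waiting does not help under load can be shown with a small idle-timer model. This is a simplified sketch, not ep-engine's actual logic: the function name and the rule that every mutation restarts the timer are assumptions made for illustration; only the 5-minute figure comes from the comment above.

```python
IDLE_TIMEOUT_S = 5 * 60  # "5 minutes of inactivity", per the comment above


def stream_closes_at(mutation_times, disconnect_time=0.0,
                     idle_timeout=IDLE_TIMEOUT_S):
    """Return the time (seconds) at which an orphaned stream would expire.

    Assumption: each mutation counts as activity and restarts the idle timer.
    """
    last_activity = disconnect_time
    for t in sorted(mutation_times):
        if t - last_activity >= idle_timeout:
            break  # the timer fired before this mutation arrived
        last_activity = t
    return last_activity + idle_timeout


# No load: the stream expires 5 minutes after the disconnect.
print(stream_closes_at([]))
# Mutations arriving less than 5 minutes apart keep pushing the expiry out.
print(stream_closes_at([100, 350, 600]))
```

With a steady stream of mutations the expiry time keeps moving forward, so the stream never closes on its own.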
| Comment by Aleksey Kondratenko [ 22/Dec/11 ] |
| So it's an ep-engine issue then? I mean, we close tap streams as much as possible in ns_server. Named tap streams are kept alive by ep-engine. If there's anything ns_server can do to really stop those tap producers, I'll be happy to do that. |
| Comment by Steve Yen [ 30/Mar/12 ] |
| this is the main ticket for the named tap approach/fix |
| Comment by Steve Yen [ 30/Mar/12 ] |
| is this a blocker for 1.8.1? |
| Comment by Dipti Borkar [ 30/Mar/12 ] |
|
Yes, because this may be causing data loss in some conditions.
Farshid, I believe there are a few other tickets where this is the underlying problem. Can you reference them here for completeness? Thanks |
| Comment by Aleksey Kondratenko [ 07/Apr/12 ] |
|
http://review.couchbase.org/14555 fixes it on 1.8.1. 1.8 and master have slightly different code in this area, so this work still needs some forward-porting. |
| Comment by Steve Yen [ 09/Apr/12 ] |
| fix is in gerrit (but more work still needed to enable 1.8.2) |
| Comment by Aleksey Kondratenko [ 09/Apr/12 ] |
| Let's keep this open for now. While adapting it for 1.8.2, I may have to change the 1.8.1 code to enable forward-compatibility with 1.8.2 and master. |
| Comment by Dipti Borkar [ 11/Apr/12 ] |
|
Aliaksey, code complete is Friday and we need to merge everything in by then.
What changes need to be made to ensure forward-compatibility? |
| Comment by Aleksey Kondratenko [ 11/Apr/12 ] |
| Minor. I'll be doing that tomorrow first-priority. |
| Comment by Aleksey Kondratenko [ 12/Apr/12 ] |
| I've found that no further changes to 1.8.1 are needed. The 1.8.2 implementation is here: http://review.couchbase.org/14827 |
| Comment by Thuan Nguyen [ 20/Apr/12 ] |
|
Integrated in github-ns-server-2-0 #333 (See [http://qa.hq.northscale.net/job/github-ns-server-2-0/333/])
only reuse tap name when changing vbucket filter.
reimplemented named tap fix for branch-18.
Result = SUCCESS
Aliaksey Artamonau : Files :
* src/ns_server_cluster_sup.erl
* src/ebucketmigrator_srv.erl
* src/ns_vbm_sup.erl
Aliaksey Artamonau : Files :
* src/ns_vbm_new_sup.erl
* src/ns_vbm_sup.erl
* src/ebucketmigrator_srv.erl
* src/ns_server_cluster_sup.erl
* src/cb_gen_vbm_sup.erl |
| Comment by Thuan Nguyen [ 25/Apr/12 ] |
|
Integrated in github-ns-server-2-0 #337 (See [http://qa.hq.northscale.net/job/github-ns-server-2-0/337/])
fixed typo in start_vbucket_filter_change.
Result = SUCCESS
Steve Yen : Files :
* src/ebucketmigrator_srv.erl |