[MB-7631] Bodies for deleted documents shouldn't be read (causes garbage data to be read) Created: 29/Jan/13  Updated: 14/Feb/13  Resolved: 14/Feb/13

Status: Resolved
Project: Couchbase Server
Component/s: storage-engine
Affects Version/s: 2.0
Fix Version/s: 2.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Marty Schoch Assignee: Filipe Manana
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Observed on Mac and Linux platforms


 Description   
Steps to reproduce:

1. Set up XDCR (observed when sending to Elasticsearch, but not believed to be related to this)
2. Create JSON document
3. Observe that document is replicated as expected to remote side

[{meta={id=2, rev=2-000015818d7d3c0a0000000000000000, expiration=0, flags=0}, base64=eyJjbGljayI6InRvIGVkaXQiLCJuZXcgaW4gMi4wIjoidGhlcmUgYXJlIG5vIHJlc2VydmVkIGZpZWxkIG5hbWVzIn0=}]

In the _bulk_docs request, the document appears to be JSON and the base64 data correctly represents the document.

4. Delete the document
5. Observe that the document deletion is replicated across to remote side, but with some unusual settings:

[{meta={id=2, rev=3-000015818d7d3c0b0000000000000000, att_reason=non-JSON mode, expiration=0, flags=0, deleted=true}, base64=V1QBAOAAAENfbG9jYWwvdmJzdGF0ZXsiBQdoIjogImFjdGl2ZSIsICJjaGVja3BvaW50X2lkARsAMQEWXG1heF9kZWxldGVkX3NlcW5vIjogIjAifQ==}]

Note that the document is flagged as binary (non-JSON mode) and that the base64-encoded data appears to be some random memory from the server:

echo "V1QBAOAAAENfbG9jYWwvdmJzdGF0ZXsiBQdoIjogImFjdGl2ZSIsICJjaGVja3BvaW50X2lkARsAMQEWXG1heF9kZWxldGVkX3NlcW5vIjogIjAifQ==" | base64 -D
WT?C_local/vbstate{"h": "active", "checkpoint_id\max_deleted_seqno": "0"}
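
The JSON-vs-garbage check above can be scripted. A minimal Python sketch (not part of the original report; `classify_body` is a hypothetical helper) using the two base64 bodies quoted in this issue:

```python
import base64
import json

def classify_body(b64_body):
    """Decode a _bulk_docs base64 body and report whether it is valid JSON."""
    raw = base64.b64decode(b64_body)
    try:
        json.loads(raw)
        return "json"
    except ValueError:  # also covers UnicodeDecodeError for binary garbage
        return "non-json"

# Body from step 3: a valid JSON document
good = "eyJjbGljayI6InRvIGVkaXQiLCJuZXcgaW4gMi4wIjoidGhlcmUgYXJlIG5vIHJlc2VydmVkIGZpZWxkIG5hbWVzIn0="
# Body from step 5: garbage read back for the deleted document
bad = "V1QBAOAAAENfbG9jYWwvdmJzdGF0ZXsiBQdoIjogImFjdGl2ZSIsICJjaGVja3BvaW50X2lkARsAMQEWXG1heF9kZWxldGVkX3NlcW5vIjogIjAifQ=="

print(classify_body(good))  # json
print(classify_body(bad))   # non-json
```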



 Comments   
Comment by Marty Schoch [ 29/Jan/13 ]
Also, although the decoded data here is very text-like, in other tests the base64 data observed was just garbage.
Comment by Junyi Xie (Inactive) [ 29/Jan/13 ]
Hi Aaron,

When XDCR fetches a deletion mutation from storage, XDCR sees some garbage in the document body, as Marty described, and it is marked as NON-JSON. This seems a waste of bandwidth, because the destination side (either a regular XDCR receiver or an ES cluster) just needs the key and metadata; it does not need anything encoded in base64.

Please feel free to assign to the right person if necessary. Thanks.


Comment by Junyi Xie (Inactive) [ 29/Jan/13 ]
Added storage-engine to the component list, since the garbage comes from storage.
Comment by Dipti Borkar [ 30/Jan/13 ]
Is this something that we need to fix in 2.0.1? Do we run into the Elasticsearch problems because of this?
Comment by Marty Schoch [ 30/Jan/13 ]
This does not cause any problem for Elasticsearch; when processing a delete we do not look at the binary flag or the body of the base64 data. As Junyi mentioned, it may waste bandwidth transferring bytes that are not used for anything.
Comment by Aaron Miller (Inactive) [ 30/Jan/13 ]
Couchstore considers it "valid" for deleted items to have bodies, and will use whatever content the upstream user provided. It should be fine for XDCR to consider deleted items to be "empty" and send no body, though, as from the perspective of the rest of the components in the system, a deleted item is effectively empty.

There may be some code in CouchDB/XDCR misinterpreting the empty body and erroneously reading some garbage from the beginning of the file (location 0 is considered to be "empty"), as I believe ep-engine correctly sets the body to empty when doing deletes. I will investigate. Do we know if this happens with *every* deleted item, or just some?
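
Aaron's "location 0" hypothesis can be sketched in miniature. This is a hedged Python model of the suspected read path, not the actual Erlang/couchstore code; all names here are hypothetical:

```python
# Model of the suspected bug: a deleted document carries a body pointer
# (bp) of 0 meaning "no body", but the reader treats 0 as a real file
# offset and reads bytes from the start of the file, where header-like
# data such as the _local/vbstate document lives.

def read_doc_body_buggy(file_bytes, bp, size):
    # Always seeks to bp, even when bp == 0 means "no body".
    return file_bytes[bp:bp + size]

def read_doc_body_fixed(file_bytes, bp, size):
    # The eventual fix: interpret a body pointer of 0 as an empty body.
    if bp == 0:
        return b""
    return file_bytes[bp:bp + size]

# Toy "database file": header-like bytes at offset 0, a real body after.
db = b'_local/vbstate{"state": "active"}' + b'{"k":"v"}'

print(read_doc_body_buggy(db, 0, 16))  # garbage read from the file start
print(read_doc_body_fixed(db, 0, 16))  # b'' -- empty, as intended
```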
Comment by Dipti Borkar [ 30/Jan/13 ]
BTW, I forget: do we propagate expired documents? If yes, then for use cases in which we have TTL, this may be a huge overhead?

I don't know if this is something we want to fix in 2.0.1 given the timing, perhaps 2.0.2?
Comment by Marty Schoch [ 30/Jan/13 ]
@aaron After we discussed the issue and theorized what was happening, the first doc I put through these steps exhibited this behavior. I can't say it always happens, but it has been very easy to reproduce.

@dipti Regarding "do we propagate expired documents?": a good question, the more I think about it. While the remote cluster should expire the item on its own, when the expiration hits the disk on the source cluster I would expect that docid/revision to go across in a _revs_diff request. In the case of the Elasticsearch plugin, this would seem to result in the doc/rev being considered missing (we already deleted it); since it's missing, the _bulk_docs would then send the delete across the wire (introducing the overhead in your question). But I'm not sure that's actually how it works in practice.
Comment by Junyi Xie (Inactive) [ 30/Jan/13 ]
@Dipti, XDCR does replicate an expired item as a deletion. But since the item can expire on its own on the remote cluster, there is a race condition: there is no guarantee the deletion will be replicated to the destination, depending on which cluster the expiration happens on first. For ES, it does not impact correctness, but it may cause some network bandwidth overhead.

 
Comment by Dipti Borkar [ 30/Jan/13 ]
Bandwidth for sure, but also latency. We need to optimize for latency.
How hard is the fix?

KV + XDCR (particularly for the caching use case) is very important.
Comment by Marty Schoch [ 30/Jan/13 ]
@Dipti I'm not sure I understand the part about latency. If you've configured TTL on the Elasticsearch cluster, the expired document will be removed from the index within the "indices.ttl.interval" (default 60s). The only other thing is that at some later time the document expiration will also come across the wire via XDCR. This will involve both a _revs_diff and a _bulk_docs request. The net effect of these is nothing (the document has already been removed from the index). So what latency would be affected?
Comment by Jin Lim [ 03/Feb/13 ]
Based on the above communication thread, there are two obvious things we can discuss here:
* overhead of replicating expired items (especially with the race condition mentioned above) - yes, a waste of bandwidth, but can this affect maintaining consistent XDCR performance, including latency, when the number of expired items is huge?
* Aaron's suggestion - XDCR to consider a deleted/expired item to be empty and thus send no body over the wire.

We will discuss above two points and make a decision for feasible steps. Thanks.
Comment by Jin Lim [ 03/Feb/13 ]
Assign it back to Junyi to follow up for next bug scrub meeting.
Comment by Jin Lim [ 04/Feb/13 ]
Per bug scrubs, we will provide a fix so that XDCR sends no body across clusters for a deleted/expired item.
Comment by Junyi Xie (Inactive) [ 04/Feb/13 ]
Talked to Damien and Jin briefly.

XDCR just reads all mutations from storage and tries to send them to the remote side. XDCR itself does not create or parse the document body. All the info XDCR needs is the key and metadata, not the body of the document. The content of the doc body is totally beyond XDCR's control.

It looks like the "garbage in doc body" is persisted in storage because, per Marty's test case, the garbage is something created in memory. So even without XDCR, we may potentially waste a lot of storage space by persisting a bunch of garbage for deleted items. XDCR just exposes this issue.

On the XDCR side, of course we can parse the doc body before sending it to avoid the garbage, but IMHO it is NOT a good fix, because it just "hides" the issue instead of fixing it. Parsing the doc body is not supposed to happen in the XDCR layer.

Per Aaron's comment, "ep-engine correctly sets the body to empty when doing deletes", therefore I would suggest the CouchDB/Couchstore team look at this issue first to see why we read garbage for deleted items.

Comment by Junyi Xie (Inactive) [ 04/Feb/13 ]
Removed the xdcr component.
Comment by Junyi Xie (Inactive) [ 04/Feb/13 ]
Per comments from Damien, it is also possibly a bug in ep_engine that messed up the doc_body pointer of a deleted item. Adding ep_engine to the component list.
Comment by Dipti Borkar [ 11/Feb/13 ]
Is this a small change that we can fix in 2.0.1?
Comment by Junyi Xie (Inactive) [ 11/Feb/13 ]
Dipti,

My feeling is that if we can find the culprit, it may be a small fix. But we need to figure out where the problem is first. As I mentioned above, we incorrectly store some garbage for deleted items; XDCR just exposed that.

IMHO, ep_engine should look at this issue first to see if there is any dangling doc_body pointer. If not, couchdb and couchstore will need to investigate. Thanks!
Comment by Jin Lim [ 11/Feb/13 ]
EP Engine (Jin) will take a look at this issue this week. As Junyi stated earlier, if this turns out to be an ep engine issue, it should be a small fix. Thanks.
Comment by Farshid Ghods (Inactive) [ 11/Feb/13 ]
Jin/Dipti/Yaseen,

Given that this is not a regression from 2.0, does not impact XDCR functionality, and does not cause a huge bandwidth issue, I recommend moving this to 2.0.2.
Comment by Dipti Borkar [ 11/Feb/13 ]
I think it does cause a huge bandwidth issue for large documents and expirations. This is particularly the case for ad targeting. Also, many people have complained about unusual disk growth with TTL; I suspect this may be the issue. In addition to storing tombstones, we are also storing data in couchstore. Questions that will help us decide are:

- do we store extra data for every deleted key?
- is this also the case for expired items?
- is it a fixed amount of additional data that gets stored? how much per document?

I'm not so concerned about deletes, since typically a very small percentage of items (<1%) are deleted. I'm more concerned about expired items.
Comment by Dipti Borkar [ 11/Feb/13 ]
Forum post just for reference: http://www.couchbase.com/forums/thread/persistence-becoming-issue-can-it-be-disabled
Comment by Jin Lim [ 11/Feb/13 ]
I agree; we will still continue to look into this to see if there is a small fix. We just didn't have enough engineering cycles for debugging this, but will do so this week.
Please see my comments below for Dipti's questions above;

- do we store extra data for every deleted key? Yes on the src node, but not the dst node (Junyi, please correct me if wrong).
- is this also the case for expired items? Yes, since ep engine handles both explicitly deleted and expired items with the same disk write operation (del).
- is it a fixed amount of additional data that gets stored? It appears to be not much, but random, according to Junyi's latest debugging.

Again, we will do a bit more engineering debugging and make a final call. Thanks for the practical suggestion from QE, though.
Comment by Junyi Xie (Inactive) [ 11/Feb/13 ]
Jin,

Thanks a lot for looking into this.


Dipti,

Here is my take on your questions

- do we store extra data for every deleted key?
It is fairly easy to reproduce, but I am not sure if it happens for "every" deleted key.

- is this also the case for expired items?
Expiration will end up as a del op from ep_engine; if the code path is the same, it will probably be a victim of this bug too.

- is it a fixed amount of additional data that gets stored? how much per document?
The garbage seems to come from a random segment of memory (possibly due to some dangling pointer). If that is the case, we store a random chunk of data in doc_body, and the size could vary.

 
Comment by Jin Lim [ 13/Feb/13 ]
I finally did some debugging on this issue and found that ep engine is handling deleted items correctly: it never passes a doc body, only NULL, to couchstore when deleting docs.
However, with Aaron's help, it appears that open_doc_int() in couch_db.erl still tries to read the doc body for a deleted item instead of treating it as an empty doc. Assigning it to the couchdb team (Filipe) for further debugging and a fix. Thanks.
 
Comment by Jin Lim [ 13/Feb/13 ]
Hi Filipe, it looks like a fix is needed in couchdb. If appropriate, please assign it to Aaron, as he knows the root cause as well. Thanks.
Comment by Filipe Manana [ 13/Feb/13 ]
Is it that hard to add code to the replicator so that it doesn't read bodies for deleted items, or to make couch_db:open_doc/2 not read bodies for deleted items?
Comment by Aaron Miller (Inactive) [ 13/Feb/13 ]
http://review.couchbase.org/#/c/24586/ Here.
Comment by Junyi Xie (Inactive) [ 13/Feb/13 ]
Filipe,

It is not hard, but the replicator is not supposed to do that, and more importantly, that does not solve the problem. You would still store a bunch of garbage on disk, which causes the more serious disk usage issue Dipti mentioned.
Comment by Aaron Miller (Inactive) [ 13/Feb/13 ]
We aren't storing garbage, I think. We've only been _reading_ garbage.
Comment by Filipe Manana [ 13/Feb/13 ]
From what I understood from Jin's comments, the ep-engine/couchstore side is doing the right thing, that is, not persisting garbage to disk for deleted documents.
It's only a problem of couchdb not knowing that it shouldn't read bodies for deleted documents (unlike in Apache CouchDB).

Issues like this could well be solved by people who work on the replicator imho.
Comment by Jin Lim [ 13/Feb/13 ]
Aaron/Filipe/Junyi, thanks for your input and quick help here. It looks like we can address this issue fairly simply, whether from the replicator or couchdb side. If I may, I will leave this up to your decision. However, since it is a fairly simple, good fix, please let me know whether we should consider pushing it to 2.0.1 or not. Thanks.
Comment by Junyi Xie (Inactive) [ 13/Feb/13 ]
It is good news that we do not actually store garbage. If couchdb reads some garbage (as Aaron pointed out), then couchdb should fix it.

The replicator is just one consumer of this API and relies on couchdb to fetch all mutations; IMHO it should not look into the doc body. Putting some code in the replicator does not solve the problem, because other callers of couch_db:open_doc() will still read garbage.
 
Comment by Junyi Xie (Inactive) [ 13/Feb/13 ]
Based on input from Aaron, changed the bug title.
Comment by Jin Lim [ 13/Feb/13 ]
OK, someone changed it to 2.0.1 fix. Everyone OK with it?
Comment by Junyi Xie (Inactive) [ 13/Feb/13 ]
I am OK with that. But the fix is in couchdb, so Filipe should make the final decision.
Comment by Filipe Manana [ 13/Feb/13 ]
Yes, fine for me.
Comment by Thuan Nguyen [ 14/Feb/13 ]
Integrated in github-couchdb-preview #563 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/563/])
    MB-7631 Interpret bp of 0 as empty body (Revision c086371a81e43021966620c469134e08530fe38a)
Revert "MB-7631 Interpret bp of 0 as empty body" (Revision bfd853e859ab961f8ee6c8fb8b38e667b3dd0a9c)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couchdb/couch_db.erl

Filipe David Borba Manana :
Files :
* src/couchdb/couch_db.erl
Comment by Thuan Nguyen [ 14/Feb/13 ]
Integrated in github-couchdb-preview #564 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/564/])
    MB-7631 Interpret bp of 0 as empty body (Revision 8e961880696e0398229e7a837344b73053beed84)

     Result = SUCCESS
Filipe David Borba Manana :
Files :
* src/couchdb/couch_db.erl
Comment by Jin Lim [ 14/Feb/13 ]
http://review.couchbase.org/#/c/24586/ merged.
Generated at Thu Aug 21 14:44:36 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.