[MB-6538] In rare cases CRC codes don't match when reading data from couch file Created: 05/Sep/12 Updated: 24/Oct/12 Resolved: 24/Oct/12 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | storage-engine |
| Affects Version/s: | None |
| Fix Version/s: | 2.0 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Aleksey Kondratenko | Assignee: | Aaron Miller |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | 2.0-beta-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
I experimented with building index on 6 cluster_run nodes and 9E6 simple docs. Everything went fine and results appeared right, but I'm seeing
[ns_server:debug,2012-09-05T21:31:14.218,n_5@10.17.21.241:compaction_daemon:compaction_daemon:schedule_next_compaction:1204]Finished compaction too soon. Next run will be in 30s
[couchdb:error,2012-09-05T21:31:14.296,n_2@10.17.21.241:<0.9681.0>:couch_log:error:42]Set view `default`, replica group `_design/dev_t`, doc loader error
error: {file_corruption,<<"file corruption">>}
stacktrace: [{couch_file,pread_iolist,2}, {couch_db,open_doc_int,3}, {couch_set_view_updater,load_doc,4}, {couch_set_view_updater,'-load_changes/7-fun-0-',6}, {couch_btree,stream_kv_node2,8}, {couch_btree,stream_kp_node,7}, {couch_btree,fold,4}, {couch_db,enum_docs_since,5}]
[couchdb:error,2012-09-05T21:31:14.297,n_2@10.17.21.241:<0.6715.0>:couch_log:error:42]Set view `default`, replica group `_design/dev_t`, received error from updater: {file_corruption, <<"file corruption">>}
[couchdb:info,2012-09-05T21:31:17.856,n_2@10.17.21.241:<0.6715.0>:couch_log:info:39]Starting updater for set view `default`, replica group `_design/dev_t`
[couchdb:info,2012-09-05T21:31:17.856,n_2@10.17.21.241:<0.9753.0>:couch_log:info:39]Updater for set view `default`, replica group `_design/dev_t` started
in logs. Will attach logs from this box. |
| Comments |
| Comment by Karan Kumar [ 05/Sep/12 ] |
| Which build? |
| Comment by Karan Kumar [ 05/Sep/12 ] |
| ohh. cluster_run |
| Comment by Filipe Manana [ 06/Sep/12 ] |
| This happens when reading from a database file, not from an index file. |
| Comment by Aleksey Kondratenko [ 06/Sep/12 ] |
| corrupted files attached |
| Comment by Aaron Miller [ 10/Sep/12 ] |
| In the corrupted doc in 252.couch, it looks like the file got stomped on by one byte. Both docs have the same CRC and should have the same data, but this byte got messed up somehow. |
| Comment by Aaron Miller [ 10/Sep/12 ] |
| see attached screenshot |
| Comment by Aaron Miller [ 10/Sep/12 ] |
| other file (253.couch.1) |
| Comment by Aaron Miller [ 16/Sep/12 ] |
| I don't understand the name change here. The files in question were never compacted. |
| Comment by Karen Zeller [ 17/Sep/12 ] |
|
Added to beta release notes: In rare cases, the codes used to test for data corruption (CRC, checksum) do not match when reading data from a couch file. |
| Comment by Farshid Ghods [ 02/Oct/12 ] |
|
Aliaksey,
did you use a RAM disk for persistence when running this test? |
| Comment by Aleksey Kondratenko [ 03/Oct/12 ] |
|
No. I don't understand why this would matter. _Any_ write to a filesystem (well, except for direct I/O) goes to the kernel's page cache first.
|
| Comment by Damien Katz [ 04/Oct/12 ] |
| We think this was a regression, possibly a dangling pointer, in ep-engine that has since been fixed. Please reopen if there is another recent instance of this. |
| Comment by Aleksey Kondratenko [ 23/Oct/12 ] |
| got this again |
| Comment by Aleksey Kondratenko [ 23/Oct/12 ] |
| The vbucket in question was in bucket `other`, which was populated by incoming XDCR. |
| Comment by Aleksey Kondratenko [ 23/Oct/12 ] |
| attaching diags from node having that badness |
| Comment by Aaron Miller [ 24/Oct/12 ] |
| Single byte error again. |
| Comment by Aleksey Kondratenko [ 24/Oct/12 ] |
| Sorry folks, found that my box actually has bad RAM. |
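
The single-byte diagnosis in the comments above (two copies of a doc that should be identical, a stored CRC that no longer matches one of them) can be reproduced with a small check. The following is only a sketch under stated assumptions: `zlib.crc32` stands in for whatever checksum the couch file format actually uses, the document body is made up, and `diff_bytes` and `crc_matches` are hypothetical helper names that are not part of Couchbase.

```python
import zlib


def diff_bytes(good: bytes, bad: bytes):
    """Return (offset, good_byte, bad_byte, xor) for every position that differs."""
    if len(good) != len(bad):
        raise ValueError("buffers must have the same length")
    return [(i, g, b, g ^ b) for i, (g, b) in enumerate(zip(good, bad)) if g != b]


def crc_matches(body: bytes, stored_crc: int) -> bool:
    """Recompute the CRC-32 of the body and compare it with the stored value."""
    return (zlib.crc32(body) & 0xFFFFFFFF) == stored_crc


# Hypothetical document body; not taken from the attached files.
good = b'{"name": "doc-000252", "value": 42}'
stored_crc = zlib.crc32(good) & 0xFFFFFFFF

# Simulate one damaged byte by flipping a single bit at offset 10.
damaged = bytearray(good)
damaged[10] ^= 0x04
bad = bytes(damaged)

print("intact copy passes CRC: ", crc_matches(good, stored_crc))
print("damaged copy passes CRC:", crc_matches(bad, stored_crc))
for offset, g, b, x in diff_bytes(good, bad):
    print(f"offset {offset}: {g:#04x} -> {b:#04x} (xor {x:#04x}, {bin(x).count('1')} bit(s) differ)")
```

When exactly one byte differs between two copies that carry the same stored CRC, the damage happened after the checksum was computed, which is consistent with a hardware fault such as the bad RAM reported in the last comment, though it does not prove it.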