[MB-5534] keys failing to persist to disk with state "ram_but_not_disk" during rebalance Created: 12/Jun/12 Updated: 10/Jan/13 Resolved: 19/Jun/12 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | couchbase-bucket, storage-engine |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.0 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Tommie McAfee | Assignee: | Jin Lim |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | 2.0.0-1314-rel | ||
| Attachments: |
|
| Description |
|
in viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in
Load 200k docs Add+Rebalance 6 nodes to cluster while running queries Looks like what happens is the keys being moved to new servers are not always being persisted to disk. ...from test log's we report when key a did not persist and cannot be indexed by view-engine. ["query doc_id: admin0150-2008_12_27 doesn\'t exist in bucket: default", "Error expected in results for key with invalid state {\'key_vb_state\': \'active\', \'key_last_modification_time\': \'1339537159\', \'key_data_age\': \'0\', \'key_cas\': \'7680066560307527\', \'key_exptime\': \'0\', \'key_is_dirty\': \'0\', \'key_flags\': \'0\', \'key_valid\': \'ram_but_not_disk\'}", There should be plenty of time to allow this as we retry the query several times and no new docs are being loaded/updated during the rebalance. diags attached. |
| Comments |
| Comment by Tommie McAfee [ 12/Jun/12 ] |
| Hi Jin, trying to repro for you. I did not see anything in the logs about error's while persisting keys. |
| Comment by Tommie McAfee [ 12/Jun/12 ] |
|
Seeing now the state of the servers are in the same condition as |
| Comment by Jin Lim [ 13/Jun/12 ] |
|
|
| Comment by Tommie McAfee [ 13/Jun/12 ] |
|
|
| Comment by Tommie McAfee [ 14/Jun/12 ] |
|
Hey Jin, I currently have a cluster in this state were keys are not persisted and am using a script to help get the following info:
Here state of key(admin0110-2008_10_06) on host 10.2.2.64 says is 'ram but not disk': {'key_vb_state': 'replica', 'key_last_modification_time': '1339724247', 'key_data_age': '0', 'key_cas': '7866802812969421', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'item_deleted'} If I query the database, notice this key is skipped in the results: startkey = [2008,10,6] , startkey_docid = admin0109-2008_10_06 ….. {"id":"admin0109-2008_10_06","key":[2008,10,6],"value":["employee-109-0522e76","109@couchbase.com"]}, {"id":"admin0111-2008_10_06","key":[2008,10,6],"value":["employee-111-0522e76","111@couchbase.com"]}, {"id":"admin0112-2008_10_06","key":[2008,10,6],"value":["employee-112-0522e76","112@couchbase.com"]}, {"id":"admin0113-2008_10_06","key":[2008,10,6],"value":["employee-113-0522e76","113@couchbase.com"]}, This is post rebalance in. master node is 10.2.2.60 Would be great if someone could take a look. |
| Comment by Tommie McAfee [ 14/Jun/12 ] |
| build 1334 |
| Comment by Filipe Manana [ 15/Jun/12 ] |
|
Unlikely to be because of We've seen this error "ram_but_not_disk" before the changes that added the occasional deadlock were merged. Also, if the deadlock was triggered, all subsequent operations and tests would just timeout or never finish, and this wasn't the case in the results of |
| Comment by Filipe Manana [ 15/Jun/12 ] |
|
With build 1338, which includes the view-engine dead lock fixes (unrelated to the problem btw, http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/ The test detects, via memcached, that some keys are in RAM but didn't get persisted to disk. |
| Comment by Filipe Manana [ 15/Jun/12 ] |
|
Also, looking at one of the tests output:
[2012-06-15 09:47:29,432] - [data_helper] [140513919293184] - INFO - key arch0008-2008_09_22 doesn't exist in memcached [2012-06-15 09:47:29,511] - [data_helper] [140513919293184] - INFO - key arch0000-2008_10_09 doesn't exist in memcached [2012-06-15 09:47:29,581] - [data_helper] [140513919293184] - INFO - key arch0191-2008_01_23 doesn't exist in memcached [2012-06-15 09:47:29,659] - [data_helper] [140513919293184] - INFO - key arch0187-2008_05_14 doesn't exist in memcached [2012-06-15 09:47:29,739] - [data_helper] [140513919293184] - INFO - key arch0070-2008_09_19 doesn't exist in memcached [2012-06-15 09:47:29,815] - [data_helper] [140513919293184] - INFO - key arch0040-2008_07_27 doesn't exist in memcached [2012-06-15 09:47:29,894] - [data_helper] [140513919293184] - INFO - key arch0095-2008_10_05 doesn't exist in memcached [2012-06-15 09:47:29,979] - [data_helper] [140513919293184] - INFO - key arch0087-2008_03_13 doesn't exist in memcached [2012-06-15 09:47:30,058] - [data_helper] [140513919293184] - INFO - key arch0034-2008_04_13 doesn't exist in memcached [2012-06-15 09:47:30,139] - [data_helper] [140513919293184] - INFO - key arch0084-2008_04_08 doesn't exist in memcached From: http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/artifact/logs/testrunner-12-Jun-15_05-28-10/test_employee_dataset_alldocs_queries_rebalance_in.log |
| Comment by Filipe Manana [ 15/Jun/12 ] |
| Farshid, can you give this high priority, like trunk green blocker? |
| Comment by Jin Lim [ 15/Jun/12 ] |
|
The db files being used for this failed test are gone by now. Otherwise, we could have first verified whether the missing document has never persisted or has actually persisted on disk but somehow being not known to upper layers. Anyway, while looking at remaining logs, I found a bunch of compactor crash one just like this below. And, I think this needs to be looked at by engineering team - as the compactor crash could have left a db file unknown state including missing docs.
[ns_server:error] [2012-06-15 10:02:37] [ns_1@10.2.2.60:compaction_daemon:compaction_daemon:handle_info:211] Compactor <0.10399.2> exited unexpectedly: {timeout, {gen_server,call, ['capi_set_view_manager-default', fetch_ddocs]}}. Moving to the next bucket. [error_logger:error] [2012-06-15 10:02:37] [ns_1@10.2.2.60:error_logger:ale_error_logger_handler:log_report:72] =========================CRASH REPORT========================= crasher: initial call: compaction_daemon:-spawn_bucket_compactor/3-fun-2-/0 pid: <0.10399.2> registered_name: [] exception exit: {timeout, {gen_server,call, ['capi_set_view_manager-default',fetch_ddocs]}} in function gen_server:call/2 in call from compaction_daemon:ddoc_names/1 in call from compaction_daemon:'-spawn_bucket_compactor/3-fun-2-'/4 ancestors: [compaction_daemon,<0.431.0>,ns_server_sup, ns_server_cluster_sup,<0.59.0>] messages: [] links: [<0.438.0>] dictionary: [] trap_exit: false status: running heap_size: 6765 stack_size: 24 reductions: 174901 neighbours: |
| Comment by Filipe Manana [ 15/Jun/12 ] |
|
Jin, that doesn't cause the data loss (either persistence or memory) issues.
I'll put my hands on fire if it does. |
| Comment by Filipe Manana [ 15/Jun/12 ] |
| And btw, this is not the compactor, it's just the compaction scheduler. This failure happened before compaction being triggered - even if it failing during compaction, when compaction is restarted it never reuses an existing partially compacted file - it truncates the existing compact file (if any) to 0 bytes - this applies both to database and view compaction. |
| Comment by Jin Lim [ 15/Jun/12 ] |
| Thanks for the clarification, Filipe. |
| Comment by Tommie McAfee [ 15/Jun/12 ] |
|
Using a script from Jin that looks for the document in couchdb we found that the missing key exists in disk on a node different than the one reporting "ram_but_not_disk". Something may have happened after rebalance to cause this. Below you can see two nodes are returning meta_data for the key (probably the original and new node). However, the first node has state "item_deleted"
./get_key_meta.py 10.2.2.108 {'key_vb_state': 'replica', 'key_last_modification_time': '1339788618', 'key_data_age': '0', 'key_cas': '7930769912841273', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'item_deleted'} trying diff memcached.... trying diff memcached.... trying diff memcached.... trying diff memcached.... 10.2.2.63 {'key_vb_state': 'active', 'key_last_modification_time': '1339788619', 'key_data_age': '0', 'key_cas': '7930769912841273', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'ram_but_not_disk'} --------COUCH SEARCH ON 10.2.2.108--------- [root@localhost default]# ./finddoc.pl arch0107-2008_11_14 Unquoted string "break" may clash with future reserved word at ./finddoc.pl line 28. Useless use of a constant in void context at ./finddoc.pl line 28. -------------------------------------------------- finding doc, id: arch0107-2008_11_14, in /opt/couchbase/var/lib/couchbase/data/default, may take a while... -------------------------------------------------- skipping master doc, master.couch.1 found doc, id: arch0107-2008_11_14, in 56.couch.1 [root@localhost default]# --------COUCH SEARCH ON 10.2.2.63--------- [root@localhost default]# ./finddoc.pl arch0107-2008_11_14 Unquoted string "break" may clash with future reserved word at ./finddoc.pl line 28. Useless use of a constant in void context at ./finddoc.pl line 28. -------------------------------------------------- finding doc, id: arch0107-2008_11_14, in /opt/couchbase/var/lib/couchbase/data/default, may take a while... -------------------------------------------------- skipping master doc, master.couch.1 [root@localhost default]# |
| Comment by Filipe Manana [ 15/Jun/12 ] |
|
Tommie what's the vbucket ID?
As a side questions, where do those scripts live? |
| Comment by Tommie McAfee [ 15/Jun/12 ] |
|
>Tommie what's the vbucket ID?
56 >where do those scripts live? They were local. I will attach them here |
| Comment by Filipe Manana [ 15/Jun/12 ] |
|
Thanks.
So it seems really like a persistence issue in the first place. The key should be persisted in node 10.2.2.63, but it is not. Hard to tell when exactly the failing queries happened, but I see that at the end of the logs for those 2 nodes the vbucket 56 is not marked as active in the indexes. |
| Comment by Sharon Barr [ 15/Jun/12 ] |
| Looks like a blocker rather then a major issue |
| Comment by Tommie McAfee [ 15/Jun/12 ] |
|
More debugging tips from Filipe…. discovered update_seq = 0 for db(56) which corresponds to this key on active node. This db also reports "no documents" so no writes have happened here:
/opt/couchbase/bin/couch_dbinfo 56.couch.1 DB Info (56.couch.1) file format version: 10 update_seq: 0 no documents B-tree size: 0 bytes total disk size: 4.0 kB Information also reflected in index info: curl -s 'http://10.2.2.63:8092/_set_view/default/_design/dev_test_view-6ffa498/_info' | json_xs …. "active_partitions" : [ 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73 ], "pending_transition" : null, "update_seqs" : { "67" : 1560, "63" : 1571, "71" : 1585, "70" : 1581, "68" : 1558, "72" : 1585, "65" : 1580, "57" : 1573, "64" : 1581, "61" : 1584, "58" : 1585, "59" : 1590, "69" : 1563, "60" : 1125, "56" : 0, "73" : 1581, "66" : 1562, "62" : 1572 }, |
| Comment by Tommie McAfee [ 15/Jun/12 ] |
|
Here's the test runner command to reproduce:
python testrunner -i resources/jenkins/centos-64-7node-viewquery.ini -t viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in I've disabled the job in jenkins so you can run this exact command and it will run the test against our servers if you cannot reproduce in your own vm's. Some final observations in manually trying to reproduce this is that heavy queries causes disk write queue to drain slower, and when items are still in queue before rebalance(in) then we can get into this state. Let me know if anything else is needed. thanks, |
| Comment by Jin Lim [ 18/Jun/12 ] |
|
The test now passes without any failure after locally build ep engine with the fix, http://review.couchbase.org/#change,17367.
Note: the number of items being persisted while rebalancing still fluctuates for vbuckets movements, which is perfectly normal under the current design. After the completion of rebalancing one should expect NO data loss at all. |
| Comment by Jin Lim [ 19/Jun/12 ] |
| http://review.couchbase.org/#change,17367 |
| Comment by Filipe Manana [ 19/Jun/12 ] |
| Super, thanks Jin! :) |