[MB-5534] keys failing to persist to disk with state "ram_but_not_disk" during rebalance Created: 12/Jun/12  Updated: 10/Jan/13  Resolved: 19/Jun/12

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket, storage-engine
Affects Version/s: 2.0
Fix Version/s: 2.0
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Tommie McAfee Assignee: Jin Lim
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2.0.0-1314-rel

Attachments: GZip Archive 10.2.2.108-8091-diag.txt.gz     GZip Archive 10.2.2.109-8091-diag.txt.gz     GZip Archive 10.2.2.60-8091-diag.txt.gz     GZip Archive 10.2.2.63-8091-diag.txt.gz     GZip Archive 10.2.2.64-8091-diag.txt.gz     GZip Archive 10.2.2.65-8091-diag.txt.gz     GZip Archive 10.2.2.67-8091-diag.txt.gz     File finddoc.pl     File get_key_meta.py    

 Description   
in viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in

Load 200k docs
Add+Rebalance 6 nodes to cluster while running queries

It looks like keys being moved to the new servers during rebalance are not always persisted to disk.

...from the test logs, we report when a key did not persist and therefore cannot be indexed by the view-engine.

["query doc_id: admin0150-2008_12_27 doesn\'t exist in bucket: default", "Error expected in results for key with invalid state {\'key_vb_state\': \'active\', \'key_last_modification_time\': \'1339537159\', \'key_data_age\': \'0\', \'key_cas\': \'7680066560307527\', \'key_exptime\': \'0\', \'key_is_dirty\': \'0\', \'key_flags\': \'0\', \'key_valid\': \'ram_but_not_disk\'}",


There should be plenty of time for persistence to happen, since we retry the query several times and no new docs are being loaded or updated during the rebalance.
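
The key_* fields in that error come from ep-engine's per-key stats ("stats key <key> <vbid>"). A minimal sketch of such a check, assuming testrunner's mc_bin_client.MemcachedClient is available (the host and vbucket id below are placeholders):

# Sketch only: assumes mc_bin_client (from testrunner) is on the PYTHONPATH and
# that the node serves the "default" bucket on port 11210 without SASL auth.
from mc_bin_client import MemcachedClient

def key_persistence_state(host, key, vbucket, port=11210):
    client = MemcachedClient(host, port)
    try:
        # ep-engine returns key_valid, key_is_dirty, key_cas, ... for this sub-stat
        meta = client.stats('key %s %d' % (key, vbucket))
        return meta.get('key_valid'), meta
    finally:
        client.close()

state, meta = key_persistence_state('10.2.2.63', 'admin0150-2008_12_27', 56)
if state == 'ram_but_not_disk':
    print('key is in RAM but has not been persisted: %s' % (meta,))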

diags attached.
 



 Comments   
Comment by Tommie McAfee [ 12/Jun/12 ]
Hi Jin, trying to repro for you. I did not see anything in the logs about errors while persisting keys.
Comment by Tommie McAfee [ 12/Jun/12 ]
I'm now seeing that the servers are in the same condition as MB-5511, with the rebalance hanging and a deadlock in the indexer. This is probably a side effect of that bug.
Comment by Jin Lim [ 13/Jun/12 ]
MB-5511 is tracking the root cause of this issue.
Comment by Tommie McAfee [ 13/Jun/12 ]
MB-5511 is partially fixed (the index deadlock), but rebalance is still hanging and this bug is still reproducible in the latest 2.0 build.
Comment by Tommie McAfee [ 14/Jun/12 ]
Hey Jin, I currently have a cluster in this state, where keys are not persisted, and I'm using a script to gather the following info:

Here the state of key admin0110-2008_10_06 on host 10.2.2.64 is reported as 'ram but not disk':

{'key_vb_state': 'replica', 'key_last_modification_time': '1339724247', 'key_data_age': '0', 'key_cas': '7866802812969421', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'item_deleted'}

If I query the view, notice that this key is skipped in the results:

startkey = [2008,10,6] , startkey_docid = admin0109-2008_10_06
…..
{"id":"admin0109-2008_10_06","key":[2008,10,6],"value":["employee-109-0522e76","109@couchbase.com"]},
{"id":"admin0111-2008_10_06","key":[2008,10,6],"value":["employee-111-0522e76","111@couchbase.com"]},
{"id":"admin0112-2008_10_06","key":[2008,10,6],"value":["employee-112-0522e76","112@couchbase.com"]},
{"id":"admin0113-2008_10_06","key":[2008,10,6],"value":["employee-113-0522e76","113@couchbase.com"]},


This is after the rebalance-in.
The master node is 10.2.2.60.
Would be great if someone could take a look.
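
For reference, the query above can be reproduced with a small script along these lines (a sketch only; the design document name is taken from the _set_view _info output later in this ticket, and the view name "test_view" is a placeholder):

# Sketch only: view name is a placeholder; port 8092 is the 2.0 view (CAPI) port.
import json
import urllib
import urllib2

def query_view(host, ddoc, view, startkey, startkey_docid, bucket='default'):
    params = urllib.urlencode({'startkey': json.dumps(startkey),
                               'startkey_docid': startkey_docid})
    url = 'http://%s:8092/%s/_design/%s/_view/%s?%s' % (host, bucket, ddoc, view, params)
    return json.load(urllib2.urlopen(url))

rows = query_view('10.2.2.63', 'dev_test_view-6ffa498', 'test_view',
                  [2008, 10, 6], 'admin0109-2008_10_06')['rows']
print('\n'.join(r['id'] for r in rows))
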
Comment by Tommie McAfee [ 14/Jun/12 ]
build 1334
Comment by Filipe Manana (Inactive) [ 15/Jun/12 ]
Unlikely to be because of MB-5511 or MB-5541.

We've seen this "ram_but_not_disk" error before the changes that introduced the occasional deadlock were merged. Also, if the deadlock had been triggered, all subsequent operations and tests would just time out or never finish, and that wasn't the case in the results of MB-5331.
Comment by Filipe Manana (Inactive) [ 15/Jun/12 ]
With build 1338, which includes the view-engine deadlock fixes (MB-5541, MB-5511; unrelated to this problem, btw), we're still seeing this issue:

http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/

The test detects, via memcached, that some keys are in RAM but didn't get persisted to disk.
Comment by Filipe Manana (Inactive) [ 15/Jun/12 ]
Also, looking at the output of one of the tests:

[2012-06-15 09:47:29,432] - [data_helper] [140513919293184] - INFO - key arch0008-2008_09_22 doesn't exist in memcached
[2012-06-15 09:47:29,511] - [data_helper] [140513919293184] - INFO - key arch0000-2008_10_09 doesn't exist in memcached
[2012-06-15 09:47:29,581] - [data_helper] [140513919293184] - INFO - key arch0191-2008_01_23 doesn't exist in memcached
[2012-06-15 09:47:29,659] - [data_helper] [140513919293184] - INFO - key arch0187-2008_05_14 doesn't exist in memcached
[2012-06-15 09:47:29,739] - [data_helper] [140513919293184] - INFO - key arch0070-2008_09_19 doesn't exist in memcached
[2012-06-15 09:47:29,815] - [data_helper] [140513919293184] - INFO - key arch0040-2008_07_27 doesn't exist in memcached
[2012-06-15 09:47:29,894] - [data_helper] [140513919293184] - INFO - key arch0095-2008_10_05 doesn't exist in memcached
[2012-06-15 09:47:29,979] - [data_helper] [140513919293184] - INFO - key arch0087-2008_03_13 doesn't exist in memcached
[2012-06-15 09:47:30,058] - [data_helper] [140513919293184] - INFO - key arch0034-2008_04_13 doesn't exist in memcached
[2012-06-15 09:47:30,139] - [data_helper] [140513919293184] - INFO - key arch0084-2008_04_08 doesn't exist in memcached

From:

http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/artifact/logs/testrunner-12-Jun-15_05-28-10/test_employee_dataset_alldocs_queries_rebalance_in.log
Comment by Filipe Manana (Inactive) [ 15/Jun/12 ]
Farshid, can you give this high priority, like a trunk-green blocker?
Comment by Jin Lim [ 15/Jun/12 ]
The db files used by this failed test are gone by now. Otherwise, we could have first verified whether the missing document was never persisted, or was actually persisted on disk but somehow not known to the upper layers. Anyway, while looking at the remaining logs, I found a bunch of compactor crashes just like the one below. I think this needs to be looked at by the engineering team, as a compactor crash could have left a db file in an unknown state, including missing docs.

[ns_server:error] [2012-06-15 10:02:37] [ns_1@10.2.2.60:compaction_daemon:compaction_daemon:handle_info:211] Compactor <0.10399.2> exited unexpectedly: {timeout,
                                            {gen_server,call,
                                             ['capi_set_view_manager-default',
                                              fetch_ddocs]}}. Moving to the next bucket.
[error_logger:error] [2012-06-15 10:02:37] [ns_1@10.2.2.60:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: compaction_daemon:-spawn_bucket_compactor/3-fun-2-/0
    pid: <0.10399.2>
    registered_name: []
    exception exit: {timeout,
                        {gen_server,call,
                            ['capi_set_view_manager-default',fetch_ddocs]}}
      in function gen_server:call/2
      in call from compaction_daemon:ddoc_names/1
      in call from compaction_daemon:'-spawn_bucket_compactor/3-fun-2-'/4
    ancestors: [compaction_daemon,<0.431.0>,ns_server_sup,
                  ns_server_cluster_sup,<0.59.0>]
    messages: []
    links: [<0.438.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 174901
  neighbours:
 
Comment by Filipe Manana (Inactive) [ 15/Jun/12 ]
Jin, that doesn't cause the data loss issues (in either persistence or memory).
I'll put my hands on fire if it does.
Comment by Filipe Manana (Inactive) [ 15/Jun/12 ]
And btw, this is not the compactor, it's just the compaction scheduler. This failure happened before compaction was triggered. Even if it had failed during compaction, when compaction restarts it never reuses an existing partially compacted file; it truncates the existing compact file (if any) to 0 bytes. This applies to both database and view compaction.
Comment by Jin Lim [ 15/Jun/12 ]
Thanks for the clarification, Filipe.
Comment by Tommie McAfee [ 15/Jun/12 ]
Using a script from Jin that looks for the document in couchdb, we found that the missing key exists on disk on a different node than the one reporting "ram_but_not_disk". Something may have happened after the rebalance to cause this. Below you can see that two nodes return metadata for the key (probably the original node and the new one); however, the first node reports state "item_deleted":

./get_key_meta.py

10.2.2.108
{'key_vb_state': 'replica', 'key_last_modification_time': '1339788618', 'key_data_age': '0', 'key_cas': '7930769912841273', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'item_deleted'}
trying diff memcached....
trying diff memcached....
trying diff memcached....
trying diff memcached....
10.2.2.63
{'key_vb_state': 'active', 'key_last_modification_time': '1339788619', 'key_data_age': '0', 'key_cas': '7930769912841273', 'key_exptime': '0', 'key_is_dirty': '0', 'key_flags': '0', 'key_valid': 'ram_but_not_disk'}


--------COUCH SEARCH ON 10.2.2.108---------
[root@localhost default]# ./finddoc.pl arch0107-2008_11_14
Unquoted string "break" may clash with future reserved word at ./finddoc.pl line 28.
Useless use of a constant in void context at ./finddoc.pl line 28.
--------------------------------------------------
finding doc, id: arch0107-2008_11_14, in /opt/couchbase/var/lib/couchbase/data/default, may take a while...
--------------------------------------------------
skipping master doc, master.couch.1
found doc, id: arch0107-2008_11_14, in 56.couch.1
[root@localhost default]#



--------COUCH SEARCH ON 10.2.2.63---------
[root@localhost default]# ./finddoc.pl arch0107-2008_11_14
Unquoted string "break" may clash with future reserved word at ./finddoc.pl line 28.
Useless use of a constant in void context at ./finddoc.pl line 28.
--------------------------------------------------
finding doc, id: arch0107-2008_11_14, in /opt/couchbase/var/lib/couchbase/data/default, may take a while...
--------------------------------------------------
skipping master doc, master.couch.1
[root@localhost default]#
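
A rough Python equivalent of the finddoc.pl search above, as a sketch only (it assumes /opt/couchbase/bin/couch_dbdump is available and that its output contains the document ids):

# Sketch only: scans every per-vbucket couch file in the data directory and
# searches couch_dbdump's output for the doc id.
import glob
import os
import subprocess

def find_doc(doc_id, data_dir='/opt/couchbase/var/lib/couchbase/data/default'):
    hits = []
    for path in sorted(glob.glob(os.path.join(data_dir, '*.couch.*'))):
        if os.path.basename(path).startswith('master'):
            continue  # skip the design-doc (master) database
        proc = subprocess.Popen(['/opt/couchbase/bin/couch_dbdump', path],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, _ = proc.communicate()
        if doc_id.encode() in out:
            hits.append(os.path.basename(path))
    return hits

print(find_doc('arch0107-2008_11_14'))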


Comment by Filipe Manana (Inactive) [ 15/Jun/12 ]
Tommie, what's the vbucket ID?

As a side question, where do those scripts live?
Comment by Tommie McAfee [ 15/Jun/12 ]
>Tommie what's the vbucket ID?
56

>where do those scripts live?
They were local. I will attach them here.
Comment by Filipe Manana (Inactive) [ 15/Jun/12 ]
Thanks.

So it really does seem like a persistence issue in the first place.
The key should be persisted in node 10.2.2.63, but it is not.

It's hard to tell exactly when the failing queries happened, but I see that at the end of the logs for those 2 nodes, vbucket 56 is not marked as active in the indexes.
Comment by Sharon Barr (Inactive) [ 15/Jun/12 ]
Looks like a blocker rather than a major issue.
Comment by Tommie McAfee [ 15/Jun/12 ]
More debugging tips from Filipe: we discovered update_seq = 0 for db 56, which corresponds to this key on the active node. This db also reports "no documents", so no writes have happened here:

/opt/couchbase/bin/couch_dbinfo 56.couch.1
DB Info (56.couch.1)
   file format version: 10
   update_seq: 0
   no documents
   B-tree size: 0 bytes
   total disk size: 4.0 kB


This information is also reflected in the index info:
 curl -s 'http://10.2.2.63:8092/_set_view/default/_design/dev_test_view-6ffa498/_info' | json_xs
….

 "active_partitions" : [
      56,
      57,
      58,
      59,
      60,
      61,
      62,
      63,
      64,
      65,
      66,
      67,
      68,
      69,
      70,
      71,
      72,
      73
   ],
   "pending_transition" : null,
   "update_seqs" : {
      "67" : 1560,
      "63" : 1571,
      "71" : 1585,
      "70" : 1581,
      "68" : 1558,
      "72" : 1585,
      "65" : 1580,
      "57" : 1573,
      "64" : 1581,
      "61" : 1584,
      "58" : 1585,
      "59" : 1590,
      "69" : 1563,
      "60" : 1125,
      "56" : 0,
      "73" : 1581,
      "66" : 1562,
      "62" : 1572
   },
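
The same check could be scripted against the _info endpoint; just a sketch, reusing the URL format from the curl command above and flagging active partitions whose index update_seq is still 0:

# Sketch only: fetches the set-view _info document and reports active
# partitions that have never been indexed (update_seq == 0).
import json
import urllib2

def stale_partitions(host, ddoc, bucket='default'):
    url = 'http://%s:8092/_set_view/%s/_design/%s/_info' % (host, bucket, ddoc)
    info = json.load(urllib2.urlopen(url))
    seqs = info['update_seqs']
    return [p for p in info['active_partitions'] if int(seqs.get(str(p), 0)) == 0]

print(stale_partitions('10.2.2.63', 'dev_test_view-6ffa498'))
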
Comment by Tommie McAfee [ 15/Jun/12 ]
Here's the test runner command to reproduce:
python testrunner -i resources/jenkins/centos-64-7node-viewquery.ini -t viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in

I've disabled the job in Jenkins, so you can run this exact command and it will run the test against our servers if you cannot reproduce it in your own VMs.

A final observation from manually trying to reproduce this: heavy queries cause the disk write queue to drain more slowly, and when items are still in the queue before the rebalance-in, we can get into this state.
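
One way to watch that drain is to poll ep-engine's ep_queue_size and ep_flusher_todo stats before starting the rebalance; a sketch, under the same mc_bin_client assumption as earlier:

# Sketch only: polls the flusher queue depth once per second until it drains
# or the timeout expires.
import time
from mc_bin_client import MemcachedClient

def wait_for_drain(host, port=11210, timeout=600):
    client = MemcachedClient(host, port)
    try:
        deadline = time.time() + timeout
        while time.time() < deadline:
            stats = client.stats()
            queued = int(stats['ep_queue_size']) + int(stats['ep_flusher_todo'])
            if queued == 0:
                return True
            time.sleep(1)
        return False
    finally:
        client.close()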

Let me know if anything else is needed.

thanks,
Comment by Jin Lim [ 18/Jun/12 ]
The test now passes without any failure after locally building ep-engine with the fix, http://review.couchbase.org/#change,17367.

Note: the number of items persisted while rebalancing still fluctuates as vbuckets move, which is perfectly normal under the current design. After rebalancing completes, one should expect NO data loss at all.

Comment by Jin Lim [ 19/Jun/12 ]
 http://review.couchbase.org/#change,17367
Comment by Filipe Manana (Inactive) [ 19/Jun/12 ]
Super, thanks Jin! :)