<!-- 
RSS generated by JIRA (5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9) at Tue Jun 18 22:49:32 CDT 2013

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary add field=key&field=summary to the URL of your request.
For example:
http://www.couchbase.com/issues/si/jira.issueviews:issue-xml/MB-5534/MB-5534.xml?field=key&field=summary
-->
<rss version="0.92" >
<channel>
    <title>Couchbase</title>
    <link>http://www.couchbase.com/issues</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>5.2.4</version>
        <build-number>845</build-number>
        <build-date>26-12-2012</build-date>
    </build-info>

<item>
            <title>[MB-5534] keys failing to persist to disk with state &quot;ram_but_not_disk&quot; during rebalance</title>
                <link>http://www.couchbase.com/issues/browse/MB-5534</link>
                <project id="10010" key="MB">Couchbase Server</project>
                        <description>in viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in&lt;br/&gt;
&lt;br/&gt;
Load 200k docs&lt;br/&gt;
Add+Rebalance 6 nodes to cluster while running queries&lt;br/&gt;
&lt;br/&gt;
Looks like what happens is the keys being moved to new servers are not always being persisted to disk.&lt;br/&gt;
&lt;br/&gt;
...from the test logs, we report when a key did not persist and cannot be indexed by the view engine.&lt;br/&gt;
&lt;br/&gt;
[&amp;quot;query doc_id: admin0150-2008_12_27 doesn\&amp;#39;t exist in bucket: default&amp;quot;, &amp;quot;Error expected in results for key with invalid state {\&amp;#39;key_vb_state\&amp;#39;: \&amp;#39;active\&amp;#39;, \&amp;#39;key_last_modification_time\&amp;#39;: \&amp;#39;1339537159\&amp;#39;, \&amp;#39;key_data_age\&amp;#39;: \&amp;#39;0\&amp;#39;, \&amp;#39;key_cas\&amp;#39;: \&amp;#39;7680066560307527\&amp;#39;, \&amp;#39;key_exptime\&amp;#39;: \&amp;#39;0\&amp;#39;, \&amp;#39;key_is_dirty\&amp;#39;: \&amp;#39;0\&amp;#39;, \&amp;#39;key_flags\&amp;#39;: \&amp;#39;0\&amp;#39;, \&amp;#39;key_valid\&amp;#39;: \&amp;#39;ram_but_not_disk\&amp;#39;}&amp;quot;,&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
There should be plenty of time to allow this as we retry the query several times and no new docs are being loaded/updated during the rebalance.&lt;br/&gt;
&lt;br/&gt;
diags attached.&lt;br/&gt;
&amp;nbsp;&lt;br/&gt;
&lt;br/&gt;
</description>
                <environment>2.0.0-1314-rel</environment>
            <key id="17743">MB-5534</key>
            <summary>keys failing to persist to disk with state &quot;ram_but_not_disk&quot; during rebalance</summary>
                <type id="1" iconUrl="http://www.couchbase.com/issues/images/icons/issuetypes/bug.png">Bug</type>
                                <priority id="1" iconUrl="http://www.couchbase.com/issues/images/icons/priorities/blocker.png">Blocker</priority>
                    <status id="6" iconUrl="http://www.couchbase.com/issues/images/icons/statuses/closed.png">Closed</status>
                    <resolution id="1">Fixed</resolution>
                    <security id="10011">Public</security>
                        <assignee username="jin">Jin Lim</assignee>
                                <reporter username="tommie">Tommie McAfee</reporter>
                        <labels>
                    </labels>
                <created>Tue, 12 Jun 2012 18:01:55 -0500</created>
                <updated>Thu, 10 Jan 2013 01:53:38 -0600</updated>
                    <resolved>Tue, 19 Jun 2012 00:11:58 -0500</resolved>
                            <version>2.0</version>
                                <fixVersion>2.0</fixVersion>
                                <component>couchbase-bucket</component>
                <component>storage-engine</component>
                                <votes>0</votes>
                        <watches>1</watches>
                                                    <comments>
                    <comment id="29685" author="tommie" created="Tue, 12 Jun 2012 18:50:27 -0500"  >Hi Jin, trying to repro for you.  I did not see anything in the logs about errors while persisting keys.</comment>
                    <comment id="29700" author="tommie" created="Tue, 12 Jun 2012 19:03:40 -0500"  >I now see that the servers are in the same condition as &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-5511&quot; title=&quot;rebalance hanging in latest 2.0&quot;&gt;&lt;strike&gt;MB-5511&lt;/strike&gt;&lt;/a&gt;, with rebalance hanging and a deadlock in the indexer.  Probably a side effect of that bug.</comment>
                    <comment id="29802" author="jin" created="Wed, 13 Jun 2012 13:53:17 -0500"  >&lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-5511&quot; title=&quot;rebalance hanging in latest 2.0&quot;&gt;&lt;strike&gt;MB-5511&lt;/strike&gt;&lt;/a&gt; is tracking the root cause of this issue.</comment>
                    <comment id="29811" author="tommie" created="Wed, 13 Jun 2012 14:52:02 -0500"  >&amp;nbsp;&lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-5511&quot; title=&quot;rebalance hanging in latest 2.0&quot;&gt;&lt;strike&gt;MB-5511&lt;/strike&gt;&lt;/a&gt; is partially fixed (index deadlock)  but rebalance is still hanging and this bug is still reproducible in latest 2.0 </comment>
                    <comment id="30080" author="tommie" created="Thu, 14 Jun 2012 20:50:32 -0500"  >Hey Jin, I currently have a cluster in this state where keys are not persisted, and I am using a script to gather the following info:&lt;br/&gt;
&lt;br/&gt;
Here the state of key (admin0110-2008_10_06) on host 10.2.2.64 is reported as &amp;#39;ram but not disk&amp;#39;:&lt;br/&gt;
&lt;br/&gt;
{&amp;#39;key_vb_state&amp;#39;: &amp;#39;replica&amp;#39;, &amp;#39;key_last_modification_time&amp;#39;: &amp;#39;1339724247&amp;#39;, &amp;#39;key_data_age&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_cas&amp;#39;: &amp;#39;7866802812969421&amp;#39;, &amp;#39;key_exptime&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_is_dirty&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_flags&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_valid&amp;#39;: &amp;#39;item_deleted&amp;#39;}&lt;br/&gt;
&lt;br/&gt;
If I query the database, notice this key is skipped in the results:&lt;br/&gt;
&lt;br/&gt;
startkey = [2008,10,6]  ,  startkey_docid = admin0109-2008_10_06&lt;br/&gt;
&#8230;..&lt;br/&gt;
{&amp;quot;id&amp;quot;:&amp;quot;admin0109-2008_10_06&amp;quot;,&amp;quot;key&amp;quot;:[2008,10,6],&amp;quot;value&amp;quot;:[&amp;quot;employee-109-0522e76&amp;quot;,&amp;quot;&lt;a href=&apos;mailto:109@couchbase.com&apos;&gt;109@couchbase.com&lt;/a&gt;&amp;quot;]},&lt;br/&gt;
{&amp;quot;id&amp;quot;:&amp;quot;admin0111-2008_10_06&amp;quot;,&amp;quot;key&amp;quot;:[2008,10,6],&amp;quot;value&amp;quot;:[&amp;quot;employee-111-0522e76&amp;quot;,&amp;quot;&lt;a href=&apos;mailto:111@couchbase.com&apos;&gt;111@couchbase.com&lt;/a&gt;&amp;quot;]},&lt;br/&gt;
{&amp;quot;id&amp;quot;:&amp;quot;admin0112-2008_10_06&amp;quot;,&amp;quot;key&amp;quot;:[2008,10,6],&amp;quot;value&amp;quot;:[&amp;quot;employee-112-0522e76&amp;quot;,&amp;quot;&lt;a href=&apos;mailto:112@couchbase.com&apos;&gt;112@couchbase.com&lt;/a&gt;&amp;quot;]},&lt;br/&gt;
{&amp;quot;id&amp;quot;:&amp;quot;admin0113-2008_10_06&amp;quot;,&amp;quot;key&amp;quot;:[2008,10,6],&amp;quot;value&amp;quot;:[&amp;quot;employee-113-0522e76&amp;quot;,&amp;quot;&lt;a href=&apos;mailto:113@couchbase.com&apos;&gt;113@couchbase.com&lt;/a&gt;&amp;quot;]},&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
This is post rebalance in.&lt;br/&gt;
master node is 10.2.2.60&lt;br/&gt;
Would be great if someone could take a look.&lt;br/&gt;
</comment>
                    <comment id="30081" author="tommie" created="Thu, 14 Jun 2012 20:51:46 -0500"  >build 1334</comment>
                    <comment id="30117" author="FilipeManana" created="Fri, 15 Jun 2012 11:09:44 -0500"  >Unlikely to be because of &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-5511&quot; title=&quot;rebalance hanging in latest 2.0&quot;&gt;&lt;strike&gt;MB-5511&lt;/strike&gt;&lt;/a&gt; or &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-5541&quot; title=&quot;viewtests.ViewRebalanceTests.test_delete_x_docs_incremental_rebalance_in occasionally fails due to deadlock in vbucket deletion&quot;&gt;&lt;strike&gt;MB-5541&lt;/strike&gt;&lt;/a&gt;.&lt;br/&gt;
&lt;br/&gt;
We&amp;#39;ve seen this error &amp;quot;ram_but_not_disk&amp;quot; before the changes that added the occasional deadlock were merged. Also, if the deadlock had been triggered, all subsequent operations and tests would just time out or never finish, and this wasn&amp;#39;t the case in the results of &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-5331&quot; title=&quot;some docs are lost during rebalance in/out when queries are performed &quot;&gt;&lt;strike&gt;MB-5331&lt;/strike&gt;&lt;/a&gt;.</comment>
                    <comment id="30127" author="FilipeManana" created="Fri, 15 Jun 2012 12:53:22 -0500"  >With build 1338, which includes the view-engine deadlock fixes (unrelated to the problem btw, &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-5541&quot; title=&quot;viewtests.ViewRebalanceTests.test_delete_x_docs_incremental_rebalance_in occasionally fails due to deadlock in vbucket deletion&quot;&gt;&lt;strike&gt;MB-5541&lt;/strike&gt;&lt;/a&gt; &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-5511&quot; title=&quot;rebalance hanging in latest 2.0&quot;&gt;&lt;strike&gt;MB-5511&lt;/strike&gt;&lt;/a&gt;), we&amp;#39;re still seeing this issue:&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&quot;http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/&quot;&gt;http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/&lt;/a&gt;&lt;br/&gt;
&lt;br/&gt;
The test detects, via memcached, that some keys are in RAM but didn&amp;#39;t get persisted to disk.</comment>
                    <comment id="30128" author="FilipeManana" created="Fri, 15 Jun 2012 12:56:00 -0500"  >Also, looking at the output of one of the tests:&lt;br/&gt;
&lt;br/&gt;
[2012-06-15 09:47:29,432] - [data_helper] [140513919293184] - INFO - key arch0008-2008_09_22 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:29,511] - [data_helper] [140513919293184] - INFO - key arch0000-2008_10_09 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:29,581] - [data_helper] [140513919293184] - INFO - key arch0191-2008_01_23 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:29,659] - [data_helper] [140513919293184] - INFO - key arch0187-2008_05_14 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:29,739] - [data_helper] [140513919293184] - INFO - key arch0070-2008_09_19 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:29,815] - [data_helper] [140513919293184] - INFO - key arch0040-2008_07_27 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:29,894] - [data_helper] [140513919293184] - INFO - key arch0095-2008_10_05 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:29,979] - [data_helper] [140513919293184] - INFO - key arch0087-2008_03_13 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:30,058] - [data_helper] [140513919293184] - INFO - key arch0034-2008_04_13 doesn&amp;#39;t exist in memcached&lt;br/&gt;
[2012-06-15 09:47:30,139] - [data_helper] [140513919293184] - INFO - key arch0084-2008_04_08 doesn&amp;#39;t exist in memcached&lt;br/&gt;
&lt;br/&gt;
From:&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&quot;http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/artifact/logs/testrunner-12-Jun-15_05-28-10/test_employee_dataset_alldocs_queries_rebalance_in.log&quot;&gt;http://qa.hq.northscale.net/job/centos-64-vew-merge-tests/202/artifact/logs/testrunner-12-Jun-15_05-28-10/test_employee_dataset_alldocs_queries_rebalance_in.log&lt;/a&gt;&lt;br/&gt;
</comment>
                    <comment id="30129" author="FilipeManana" created="Fri, 15 Jun 2012 12:56:36 -0500"  >Farshid, can you give this high priority, like trunk green blocker?</comment>
                    <comment id="30130" author="jin" created="Fri, 15 Jun 2012 13:01:48 -0500"  >The db files used for this failed test are gone by now. Otherwise, we could have first verified whether the missing document was never persisted or was actually persisted on disk but somehow not known to the upper layers. Anyway, while looking at the remaining logs, I found a bunch of compactor crashes just like the one below. I think this needs to be looked at by the engineering team, as a compactor crash could have left a db file in an unknown state, including missing docs.&lt;br/&gt;
&lt;br/&gt;
[ns_server:error] [2012-06-15 10:02:37] [&lt;a href=&apos;mailto:ns_1@10.2.2.60&apos;&gt;ns_1@10.2.2.60&lt;/a&gt;:compaction_daemon:compaction_daemon:handle_info:211] Compactor &amp;lt;0.10399.2&amp;gt; exited unexpectedly: {timeout,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{gen_server,call,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[&amp;#39;capi_set_view_manager-default&amp;#39;,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;fetch_ddocs]}}. Moving to the next bucket.&lt;br/&gt;
[error_logger:error] [2012-06-15 10:02:37] [&lt;a href=&apos;mailto:ns_1@10.2.2.60&apos;&gt;ns_1@10.2.2.60&lt;/a&gt;:error_logger:ale_error_logger_handler:log_report:72]&lt;br/&gt;
=========================CRASH REPORT=========================&lt;br/&gt;
&amp;nbsp;&amp;nbsp;crasher:&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;initial call: compaction_daemon:-spawn_bucket_compactor/3-fun-2-/0&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pid: &amp;lt;0.10399.2&amp;gt;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;registered_name: []&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;exception exit: {timeout,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{gen_server,call,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[&amp;#39;capi_set_view_manager-default&amp;#39;,fetch_ddocs]}}&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;in function  gen_server:call/2&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;in call from compaction_daemon:ddoc_names/1&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;in call from compaction_daemon:&amp;#39;-spawn_bucket_compactor/3-fun-2-&amp;#39;/4&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ancestors: [compaction_daemon,&amp;lt;0.431.0&amp;gt;,ns_server_sup,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ns_server_cluster_sup,&amp;lt;0.59.0&amp;gt;]&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;messages: []&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;links: [&amp;lt;0.438.0&amp;gt;]&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dictionary: []&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;trap_exit: false&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;status: running&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;heap_size: 6765&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;stack_size: 24&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;reductions: 174901&lt;br/&gt;
&amp;nbsp;&amp;nbsp;neighbours:&lt;br/&gt;
&amp;nbsp;</comment>
                    <comment id="30131" author="FilipeManana" created="Fri, 15 Jun 2012 13:04:41 -0500"  >Jin, that doesn&amp;#39;t cause the data loss (either persistence or memory) issues.&lt;br/&gt;
I&amp;#39;ll put my hands on fire if it does.&lt;br/&gt;
</comment>
                    <comment id="30132" author="FilipeManana" created="Fri, 15 Jun 2012 13:06:42 -0500"  >And btw, this is not the compactor, it&amp;#39;s just the compaction scheduler. This failure happened before compaction was triggered. Even if it had failed during compaction, a restarted compaction never reuses an existing partially compacted file: it truncates the existing compact file (if any) to 0 bytes. This applies to both database and view compaction.</comment>
                    <comment id="30133" author="jin" created="Fri, 15 Jun 2012 13:12:05 -0500"  >Thanks for the clarification, Filipe.</comment>
                    <comment id="30145" author="tommie" created="Fri, 15 Jun 2012 14:55:18 -0500"  >Using a script from Jin that looks for the document in couchdb, we found that the missing key exists on disk on a node different from the one reporting &amp;quot;ram_but_not_disk&amp;quot;.   Something may have happened after rebalance to cause this.  Below you can see two nodes are returning metadata for the key (probably the original and new node).  However, the first node has state &amp;quot;item_deleted&amp;quot;.&lt;br/&gt;
&lt;br/&gt;
./get_key_meta.py &lt;br/&gt;
&lt;br/&gt;
10.2.2.108&lt;br/&gt;
{&amp;#39;key_vb_state&amp;#39;: &amp;#39;replica&amp;#39;, &amp;#39;key_last_modification_time&amp;#39;: &amp;#39;1339788618&amp;#39;, &amp;#39;key_data_age&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_cas&amp;#39;: &amp;#39;7930769912841273&amp;#39;, &amp;#39;key_exptime&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_is_dirty&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_flags&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_valid&amp;#39;: &amp;#39;item_deleted&amp;#39;}&lt;br/&gt;
trying diff memcached....&lt;br/&gt;
trying diff memcached....&lt;br/&gt;
trying diff memcached....&lt;br/&gt;
trying diff memcached....&lt;br/&gt;
10.2.2.63&lt;br/&gt;
{&amp;#39;key_vb_state&amp;#39;: &amp;#39;active&amp;#39;, &amp;#39;key_last_modification_time&amp;#39;: &amp;#39;1339788619&amp;#39;, &amp;#39;key_data_age&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_cas&amp;#39;: &amp;#39;7930769912841273&amp;#39;, &amp;#39;key_exptime&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_is_dirty&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_flags&amp;#39;: &amp;#39;0&amp;#39;, &amp;#39;key_valid&amp;#39;: &amp;#39;ram_but_not_disk&amp;#39;}&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
--------COUCH SEARCH ON 10.2.2.108---------&lt;br/&gt;
[&lt;a href=&apos;mailto:root@localhost&apos;&gt;root@localhost&lt;/a&gt; default]# ./finddoc.pl arch0107-2008_11_14&lt;br/&gt;
Unquoted string &amp;quot;break&amp;quot; may clash with future reserved word at ./finddoc.pl line 28.&lt;br/&gt;
Useless use of a constant in void context at ./finddoc.pl line 28.&lt;br/&gt;
--------------------------------------------------&lt;br/&gt;
finding doc, id: arch0107-2008_11_14, in /opt/couchbase/var/lib/couchbase/data/default, may take a while...&lt;br/&gt;
--------------------------------------------------&lt;br/&gt;
skipping master doc, master.couch.1&lt;br/&gt;
found doc, id: arch0107-2008_11_14, in 56.couch.1&lt;br/&gt;
[&lt;a href=&apos;mailto:root@localhost&apos;&gt;root@localhost&lt;/a&gt; default]# &lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
--------COUCH SEARCH ON 10.2.2.63---------&lt;br/&gt;
[&lt;a href=&apos;mailto:root@localhost&apos;&gt;root@localhost&lt;/a&gt; default]# ./finddoc.pl arch0107-2008_11_14&lt;br/&gt;
Unquoted string &amp;quot;break&amp;quot; may clash with future reserved word at ./finddoc.pl line 28.&lt;br/&gt;
Useless use of a constant in void context at ./finddoc.pl line 28.&lt;br/&gt;
--------------------------------------------------&lt;br/&gt;
finding doc, id: arch0107-2008_11_14, in /opt/couchbase/var/lib/couchbase/data/default, may take a while...&lt;br/&gt;
--------------------------------------------------&lt;br/&gt;
skipping master doc, master.couch.1&lt;br/&gt;
[&lt;a href=&apos;mailto:root@localhost&apos;&gt;root@localhost&lt;/a&gt; default]# &lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
</comment>
                    <comment id="30147" author="FilipeManana" created="Fri, 15 Jun 2012 15:06:58 -0500"  >Tommie what&amp;#39;s the vbucket ID?&lt;br/&gt;
&lt;br/&gt;
As a side question, where do those scripts live?</comment>
                    <comment id="30150" author="tommie" created="Fri, 15 Jun 2012 15:31:46 -0500"  >&amp;gt;Tommie what&amp;#39;s the vbucket ID? &lt;br/&gt;
56&lt;br/&gt;
&lt;br/&gt;
&amp;gt;where do those scripts live?&lt;br/&gt;
They were local.  I will attach them here</comment>
                    <comment id="30164" author="FilipeManana" created="Fri, 15 Jun 2012 15:42:24 -0500"  >Thanks.&lt;br/&gt;
&lt;br/&gt;
So it seems really like a persistence issue in the first place.&lt;br/&gt;
The key should be persisted in node 10.2.2.63, but it is not.&lt;br/&gt;
&lt;br/&gt;
Hard to tell when exactly the failing queries happened, but I see that at the end of the logs for those 2 nodes the vbucket 56 is not marked as active in the indexes.</comment>
                    <comment id="30165" author="sharon" created="Fri, 15 Jun 2012 16:10:35 -0500"  >Looks like a blocker rather than a major issue</comment>
                    <comment id="30166" author="tommie" created="Fri, 15 Jun 2012 16:12:07 -0500"  >More debugging tips from Filipe: we discovered update_seq = 0 for db (56), which corresponds to this key on the active node.  This db also reports &amp;quot;no documents&amp;quot;, so no writes have happened here:&lt;br/&gt;
&lt;br/&gt;
/opt/couchbase/bin/couch_dbinfo 56.couch.1 &lt;br/&gt;
DB Info (56.couch.1)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;file format version: 10&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;update_seq: 0&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;no documents&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;B-tree size: 0 bytes&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;total disk size: 4.0 kB&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
Information also reflected in index info:&lt;br/&gt;
&amp;nbsp;curl -s &amp;#39;&lt;a href=&quot;http://10.2.2.63:8092/_set_view/default/_design/dev_test_view-6ffa498/_info&quot;&gt;http://10.2.2.63:8092/_set_view/default/_design/dev_test_view-6ffa498/_info&lt;/a&gt;&amp;#39; | json_xs&lt;br/&gt;
&#8230;.&lt;br/&gt;
&lt;br/&gt;
&amp;nbsp;&amp;quot;active_partitions&amp;quot; : [&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;56,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;57,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;58,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;59,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;60,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;61,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;62,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;63,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;64,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;65,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;66,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;67,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;68,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;69,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;70,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;71,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;72,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;73&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;],&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;pending_transition&amp;quot; : null,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;update_seqs&amp;quot; : {&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;67&amp;quot; : 1560,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;63&amp;quot; : 1571,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;71&amp;quot; : 1585,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;70&amp;quot; : 1581,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;68&amp;quot; : 1558,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;72&amp;quot; : 1585,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;65&amp;quot; : 1580,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;57&amp;quot; : 1573,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;64&amp;quot; : 1581,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;61&amp;quot; : 1584,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;58&amp;quot; : 1585,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;59&amp;quot; : 1590,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;69&amp;quot; : 1563,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;60&amp;quot; : 1125,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;56&amp;quot; : 0,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;73&amp;quot; : 1581,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;66&amp;quot; : 1562,&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;quot;62&amp;quot; : 1572&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;},&lt;br/&gt;
</comment>
                    <comment id="30266" author="tommie" created="Fri, 15 Jun 2012 21:18:13 -0500"  >Here&amp;#39;s the test runner command to reproduce:&lt;br/&gt;
python testrunner -i resources/jenkins/centos-64-7node-viewquery.ini -t viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;ve disabled the job in Jenkins so you can run this exact command, and it will run the test against our servers if you cannot reproduce in your own VMs.&lt;br/&gt;
&lt;br/&gt;
A final observation from manually trying to reproduce this: heavy queries cause the disk write queue to drain more slowly, and when items are still in the queue before rebalance (in), we can get into this state.&lt;br/&gt;
&lt;br/&gt;
Let me know if anything else is needed.&lt;br/&gt;
&lt;br/&gt;
thanks,</comment>
                    <comment id="30506" author="jin" created="Mon, 18 Jun 2012 22:38:03 -0500"  >The test now passes without any failure after locally building ep-engine with the fix, &lt;a href=&quot;http://review.couchbase.org/#change,17367&quot;&gt;http://review.couchbase.org/#change,17367&lt;/a&gt;.&lt;br/&gt;
&lt;br/&gt;
Note: the number of items being persisted while rebalancing still fluctuates with vbucket movements, which is perfectly normal under the current design. After rebalancing completes, one should expect NO data loss at all.&lt;br/&gt;
&lt;br/&gt;
</comment>
                    <comment id="30507" author="jin" created="Tue, 19 Jun 2012 00:11:58 -0500"  >&amp;nbsp;&lt;a href=&quot;http://review.couchbase.org/#change,17367&quot;&gt;http://review.couchbase.org/#change,17367&lt;/a&gt;</comment>
                    <comment id="30524" author="FilipeManana" created="Tue, 19 Jun 2012 06:24:29 -0500"  >Super, thanks Jin! :)</comment>
                </comments>
                    <attachments>
                    <attachment id="13623" name="10.2.2.108-8091-diag.txt.gz" size="568383" author="tommie" created="Tue, 12 Jun 2012 18:01:55 -0500" />
                    <attachment id="13619" name="10.2.2.109-8091-diag.txt.gz" size="1774984" author="tommie" created="Tue, 12 Jun 2012 18:01:55 -0500" />
                    <attachment id="13624" name="10.2.2.60-8091-diag.txt.gz" size="1495413" author="tommie" created="Tue, 12 Jun 2012 18:01:55 -0500" />
                    <attachment id="13622" name="10.2.2.63-8091-diag.txt.gz" size="1762541" author="tommie" created="Tue, 12 Jun 2012 18:01:55 -0500" />
                    <attachment id="13621" name="10.2.2.64-8091-diag.txt.gz" size="575723" author="tommie" created="Tue, 12 Jun 2012 18:01:55 -0500" />
                    <attachment id="13620" name="10.2.2.65-8091-diag.txt.gz" size="572761" author="tommie" created="Tue, 12 Jun 2012 18:01:55 -0500" />
                    <attachment id="13618" name="10.2.2.67-8091-diag.txt.gz" size="580552" author="tommie" created="Tue, 12 Jun 2012 18:01:55 -0500" />
                    <attachment id="13658" name="finddoc.pl" size="885" author="tommie" created="Fri, 15 Jun 2012 15:33:29 -0500" />
                    <attachment id="13659" name="get_key_meta.py" size="861" author="tommie" created="Fri, 15 Jun 2012 15:33:29 -0500" />
                </attachments>
            <subtasks>
        </subtasks>
                <customfields>
                                                                        <customfield id="customfield_10180" key="com.atlassian.jira.ext.charting:firstresponsedate">
                <customfieldname>Date of First Response</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>Wed, 13 Jun 2012 13:53:17 -0500</customfieldvalue>

                </customfieldvalues>
            </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10081" key="com.pyxis.greenhopper.jira:gh-global-rank">
                <customfieldname>Rank</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>5012</customfieldvalue>
                </customfieldvalues>
            </customfield>
                                                                                                                                                                                        <customfield id="customfield_10181" key="com.atlassian.jira.ext.charting:timeinstatus">
                <customfieldname>Time In Status</customfieldname>
                <customfieldvalues>
                    
                </customfieldvalues>
            </customfield>
                                                                    </customfields>
    </item>
</channel>
</rss>