<!-- 
RSS generated by JIRA (5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9) at Thu May 23 00:48:50 CDT 2013

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary add field=key&field=summary to the URL of your request.
For example:
http://www.couchbase.com/issues/si/jira.issueviews:issue-xml/MB-7161/MB-7161.xml?field=key&field=summary
-->
<rss version="0.92" >
<channel>
    <title>Couchbase</title>
    <link>http://www.couchbase.com/issues</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>5.2.4</version>
        <build-number>845</build-number>
        <build-date>26-12-2012</build-date>
    </build-info>

<item>
            <title>[MB-7161] under higher sets per second clients dont get all view results after observing for all mutations</title>
                <link>http://www.couchbase.com/issues/browse/MB-7161</link>
                <project id="10010" key="MB">Couchbase Server</project>
                        <description>We have just recently found that under load, doing a series of writes, waiting until they&amp;#39;re durable, and then querying a view with stale=false gives incorrect results.  I should note that this is definitely tied to throughput load.   Under minimal load, it works fine.  Under a lot of load, we see the incorrect results.  This could be a client issue, but I admit that it sounds more like a server side issue at the moment.  We&amp;#39;re going to try to reproduce it with another client before raising a server issue.  It&amp;#39;s &lt;a href=&quot;http://www.couchbase.com/issues/browse/JCBC-142&quot; title=&quot;Observe Tests show that something is wrong in the observe impl&quot;&gt;&lt;strike&gt;JCBC-142&lt;/strike&gt;&lt;/a&gt; at the moment.</description>
                <environment></environment>
            <key id="20688">MB-7161</key>
            <summary>under higher sets per second clients dont get all view results after observing for all mutations</summary>
                <type id="1" iconUrl="http://www.couchbase.com/issues/images/icons/issuetypes/bug.png">Bug</type>
                                <priority id="2" iconUrl="http://www.couchbase.com/issues/images/icons/priorities/critical.png">Critical</priority>
                    <status id="5" iconUrl="http://www.couchbase.com/issues/images/icons/statuses/resolved.png">Resolved</status>
                    <resolution id="4">Incomplete</resolution>
                    <security id="10011">Public</security>
                        <assignee username="mnunberg">Mark Nunberg</assignee>
                                <reporter username="farshid">Farshid Ghods</reporter>
                        <labels>
                    </labels>
                <created>Sun, 11 Nov 2012 13:22:21 -0600</created>
                <updated>Mon, 19 Nov 2012 23:25:08 -0600</updated>
                    <resolved>Mon, 19 Nov 2012 22:09:43 -0600</resolved>
                            <version>2.0-beta-2</version>
                                <fixVersion>2.0</fixVersion>
                                <component>view-engine</component>
                                <votes>0</votes>
                        <watches>6</watches>
                                                    <comments>
                    <comment id="43765" author="farshid" created="Sun, 11 Nov 2012 13:26:18 -0600"  >if possible please run the stale=false query with debug=true&lt;br/&gt;
&lt;br/&gt;
more information about running queries with such option&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&quot;http://hub.internal.couchbase.com/confluence/display/QA/Debugging+view+engine+issues+and+reporting+them&quot;&gt;http://hub.internal.couchbase.com/confluence/display/QA/Debugging+view+engine+issues+and+reporting+them&lt;/a&gt;</comment>
                    <comment id="43769" author="ingenthr" created="Sun, 11 Nov 2012 16:15:10 -0600"  >Please attach your PHP tests where we repro&amp;#39;d this after the JCBC issue, and assign back to Farshid.</comment>
                    <comment id="43778" author="mnunberg" created="Mon, 12 Nov 2012 01:37:09 -0600"  >I&amp;#39;ve reproduced this issue with both php and java, so definitely a server issue.&lt;br/&gt;
&lt;br/&gt;
In a nutshell:&lt;br/&gt;
&lt;br/&gt;
PHP &amp;#39;test&amp;#39; which fails (really an ad-hoc script to reproduce this on something other than the Java SDK):&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&quot;http://www.couchbase.com/issues/secure/attachment/15765/observe-test.php&quot;&gt;http://www.couchbase.com/issues/secure/attachment/15765/observe-test.php&lt;/a&gt;&lt;br/&gt;
&lt;br/&gt;
Modified Java tests - functions a lot like the php test..&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&quot;https://github.com/mnunberg/couchbase-java-client/commit/3d788ab9d3a88c1dc20717c4dd110e3a8bb5f5bc&quot;&gt;https://github.com/mnunberg/couchbase-java-client/commit/3d788ab9d3a88c1dc20717c4dd110e3a8bb5f5bc&lt;/a&gt;&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
I should note something interesting here:&lt;br/&gt;
&lt;br/&gt;
1) - I could not reproduce when running under a single node cluster (sometimes the keys failed to endure within the constraints, but the ones which did were always returned by the view)&lt;br/&gt;
&lt;br/&gt;
2) - Under a two node cluster, this happens a lot more. The most interesting note (And what gives it away as a server issue) is this:&lt;br/&gt;
&lt;br/&gt;
About half of the time that the test failed, it failed by the view returning *exactly* half the number of expected rows.&lt;br/&gt;
&lt;br/&gt;
For example, if I set 500 keys (i.e. ensured that they persisted) and then requested the view using stale=false, I would often get back *exactly* 250.&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;ve reproduced this behavior under both Java and PHP</comment>
                    <comment id="43945" author="dipti" created="Tue, 13 Nov 2012 21:42:47 -0600"  >Farshid, this is a blocker. has there been any update on this correctness issue? </comment>
                    <comment id="43947" author="farshid" created="Tue, 13 Nov 2012 23:20:21 -0600"  >Deep,&lt;br/&gt;
&lt;br/&gt;
can you please reproduce this issue using the steps given by Mark.&lt;br/&gt;
</comment>
                    <comment id="43951" author="FilipeManana" created="Wed, 14 Nov 2012 01:49:32 -0600"  >Is someone sure/has evidence this is a view-engine issue? Like checking if things were really persisted to disk before the query?&lt;br/&gt;
Can bet a lunch the problem is not there.&lt;br/&gt;
</comment>
                    <comment id="43963" author="deepkaran.salooja" created="Wed, 14 Nov 2012 06:27:04 -0600"  >Looking at the code, I think we are not doing &amp;quot;full_set&amp;quot; queries here. &lt;br/&gt;
&lt;br/&gt;
Mark, can you please try the same queries with full_set=true option.&lt;br/&gt;
&lt;br/&gt;
Also, can you give me an idea of how many items you are loading/mutating? I have tried with 100K items.</comment>
                    <comment id="43974" author="deepkaran.salooja" created="Wed, 14 Nov 2012 10:09:29 -0600"  >Adding server logs here would be helpful. &lt;br/&gt;
&lt;br/&gt;
Mark, please run collect_info on your server node e.g.&lt;br/&gt;
&lt;br/&gt;
cd /opt/couchbase/bin/&lt;br/&gt;
./cbcollect_info out_file.zip&lt;br/&gt;
&lt;br/&gt;
and attach the out_file.zip here. </comment>
                    <comment id="43986" author="ingenthr" created="Wed, 14 Nov 2012 12:22:07 -0600"  >@deep: Have you run this??  the view is not a development view, thus full_set is not needed.  Even if it were, the results should be deterministic.  I would think the first step would be for you to try running it yourself.  Is there something you&amp;#39;re missing to be able to run this and gather whatever info you need?&lt;br/&gt;
&lt;br/&gt;
@filipe: I won&amp;#39;t take your bet.  :)  I don&amp;#39;t know that it is a view-engine issue.  In fact, there are some clues that it may not be.  Here&amp;#39;s what we know thus far:&lt;br/&gt;
* The issue is only manifested when there is a large number of items&lt;br/&gt;
* The issue only comes up if there is more than one server in the cluster (evidence that it could be in merging)&lt;br/&gt;
* The issue has been reproduced by two separate client libraries, verified by SDK QE (Mark) before escalating it&lt;br/&gt;
&lt;br/&gt;
From what I know of our architecture, it could certainly be in ep-engine or couchstore.  The client library is getting back (via the observe command) that data has been persisted, but perhaps it hasn&amp;#39;t really been persisted?  The other thing that could be happening is something at the view merging layer, since it only manifests itself when we have two or more nodes?</comment>
                    <comment id="43999" author="steve" created="Wed, 14 Nov 2012 13:13:25 -0600"  >to give comments/instructions to Deep, etc. on next info to get.</comment>
                    <comment id="44015" author="FilipeManana" created="Wed, 14 Nov 2012 14:21:32 -0600"  >So, to nail it down, there&amp;#39;s several things that need to be done:&lt;br/&gt;
&lt;br/&gt;
1) Check that there are no indexing errors due to runtime JavaScript errors (check the mapreduce_errors.1 log file on each node). This is apparently unlikely to be the issue;&lt;br/&gt;
&lt;br/&gt;
2) To know if stuff is not persisted to disk after observe unblocks the client, we need to know the ID of all keys, and know in which vbuckets (per node) keys will end up (running the hash function, or just analyzing the db vbucket files after a few minutes). Adding a new key (document) to a vbucket database file increases its seq number. So at query time, if we pass ?debug=true we&amp;#39;ll get information about the seq numbers of all vbucket database files at the time the query arrived - this will allow us to see if these sequences are lower than what we expected.&lt;br/&gt;
&lt;br/&gt;
Deep, do you think you can help on this? (Someone from SDKs can also do it, appreciated)&lt;br/&gt;
&lt;br/&gt;
As soon as I have a list of all keys (and the result of the hash function to know in which vbucket they end up on each node) and the output of a query with ?debug=true missing some rows, I&amp;#39;ll mention here, and add an entry to the wiki [1], how to analyze this issue based on all this information, so that in the future others can do this autonomously. &lt;br/&gt;
&lt;br/&gt;
[1]  &lt;a href=&quot;http://hub.internal.couchbase.com/confluence/display/QA/Debugging+view+engine+issues+and+reporting+them&quot;&gt;http://hub.internal.couchbase.com/confluence/display/QA/Debugging+view+engine+issues+and+reporting+them&lt;/a&gt;&lt;br/&gt;
</comment>
                    <comment id="44016" author="FilipeManana" created="Wed, 14 Nov 2012 14:22:07 -0600"  >See comment above Deep. Thanks.</comment>
                    <comment id="44059" author="deepkaran.salooja" created="Thu, 15 Nov 2012 06:06:14 -0600"  >Matt, &lt;br/&gt;
&lt;br/&gt;
I tested yesterday with a similar reproducer I wrote in python but couldn&amp;#39;t reproduce except for dev views(without full_set).&lt;br/&gt;
&lt;br/&gt;
I have taken up the java reproducer by Mark and set it up (with my limited knowledge of java) as below:&lt;br/&gt;
&lt;br/&gt;
&amp;gt; git clone &lt;a href=&quot;https://github.com/mnunberg/couchbase-java-client.git&quot;&gt;https://github.com/mnunberg/couchbase-java-client.git&lt;/a&gt;&lt;br/&gt;
&amp;gt; cd couchbase-java-client&lt;br/&gt;
(Modified the SERVER_URI to my server in ViewTest.java)&lt;br/&gt;
&amp;gt; ant compile jar&lt;br/&gt;
and Run the test&lt;br/&gt;
&amp;gt; java -cp .:build/ivy/lib/couchbase-client/common/*:build/jars/* org.junit.runner.JUnitCore com.couchbase.client.ViewTest&lt;br/&gt;
&lt;br/&gt;
The test testObserveWithStaleFalse is always passing on a 2-node(and 4-node) setup (1024 vbuckets, 1 replica). I am using build#1949.&lt;br/&gt;
I tried with docAmount as 500, 2500 and running the same test multiple times. &lt;br/&gt;
&lt;br/&gt;
Please let me know if I am doing anything different/incorrect.&lt;br/&gt;
</comment>
                    <comment id="44061" author="ingenthr" created="Thu, 15 Nov 2012 06:22:48 -0600"  >Deep:  can you share your environment with mark?&lt;br/&gt;
&lt;br/&gt;
Mark: can you try to repro there?</comment>
                    <comment id="44136" author="farshid" created="Thu, 15 Nov 2012 14:04:04 -0600"  >Mark,&lt;br/&gt;
&lt;br/&gt;
as part of your testing can you also grab the query results by appending ?debug=true&lt;br/&gt;
&amp;gt;&amp;gt;comments from filipe&amp;lt;&amp;lt;&amp;lt;&lt;br/&gt;
As soon as I have a list of all keys (and the result of the hash function to know in which vbucket they end up on each node) and the output of a query with ?debug=true missing some rows, &lt;br/&gt;
&lt;br/&gt;
also when you say you ran a process to consume lots of cpu is this on the same node as the one couchbase server is running ?</comment>
                    <comment id="44137" author="ingenthr" created="Thu, 15 Nov 2012 14:11:34 -0600"  >Mark may need logins to the servers.  Can you help there Farshid?</comment>
                    <comment id="44147" author="mnunberg" created="Thu, 15 Nov 2012 15:37:28 -0600"  >The script has been modified to dump the entire view output (with debug=true) into a file called &amp;#39;view_results.out&amp;#39; in the same directory.&lt;br/&gt;
&lt;br/&gt;
Yes the cpuburn is running on one of the cluster nodes (which also happens to be the one running the client)&lt;br/&gt;
&lt;br/&gt;
I don&amp;#39;t need SSH access to those other machines. I&amp;#39;ve worked around this by adding the current machine (10.3.2.100) to the cluster; making it a three node cluster.&lt;br/&gt;
&lt;br/&gt;
The server version for all those nodes is the same (1945 or such).</comment>
                    <comment id="44156" author="farshid" created="Thu, 15 Nov 2012 16:35:06 -0600"  >Thanks Mark. If you were able to replicate this issue, did you happen to capture the view_results.out?&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
Also, starting another CPU-intensive process, or any other process that eats up I/O, on a couchbase cluster which is doing indexing or other things is not recommended. At this small scale, however, if enough CPU is left for the server, indexing should happen right after persistence.&lt;br/&gt;
&lt;br/&gt;
</comment>
                    <comment id="44162" author="farshid" created="Thu, 15 Nov 2012 17:31:11 -0600"  >upload view results here : &lt;a href=&quot;http://s3.amazonaws.com/bugdb/jira/MB-7161/view-results.out&quot;&gt;http://s3.amazonaws.com/bugdb/jira/MB-7161/view-results.out&lt;/a&gt;</comment>
                    <comment id="44163" author="FilipeManana" created="Thu, 15 Nov 2012 17:58:36 -0600"  >Sorry, but can you get the raw JSON?&lt;br/&gt;
&lt;br/&gt;
It&amp;#39;s a lot easier like this, as it makes it easier to paste long arrays into a javascript interpreter or python or even erlang.&lt;br/&gt;
&lt;br/&gt;
From what I see, the view engine didn&amp;#39;t miss anything, that is, it indexed all stuff persisted in vbucket database files.&lt;br/&gt;
&lt;br/&gt;
Here&amp;#39;s how I see it:&lt;br/&gt;
&lt;br/&gt;
In debug_info object, you have an entry for each node. That entry is an object. One of its keys is &amp;quot;wanted_seqs&amp;quot;. This corresponds to the seq number of each vbucket database file (active vbuckets) at the time the query request was received.&lt;br/&gt;
Then another field (again, per node), is named &amp;quot;indexable_seqs&amp;quot;. This field tells us the seq number of each vbucket database file (active vbuckets) after the updater ran (for stale=false queries).&lt;br/&gt;
&lt;br/&gt;
As we can see (by eye), for each node, indexable_seqs == wanted_seqs. This means the view engine didn&amp;#39;t miss any persisted changes to the database file.&lt;br/&gt;
&lt;br/&gt;
If the output were in JSON, we could just paste these hash objects into a javascript console (or erlang) and, with no need for eyeballing, compare the seq numbers directly via == or &amp;gt;=.&lt;br/&gt;
&lt;br/&gt;
What I suggest, after talking with Farshid, is right after the observe command returns, to send the ep-engine &amp;quot;stop/pause persistence&amp;quot; command to all nodes. And then use couch_dbdump against all database files to find out / confirm what documents are missing.</comment>
                    <comment id="44203" author="deepkaran.salooja" created="Fri, 16 Nov 2012 13:00:15 -0600"  >I was able to reproduce this with the php reproducer with 5 instances of burnP6 running on 1 node(making erlang and memcached starve). &lt;br/&gt;
Some rows are missing from the view output. I picked one of the missing rows and checked that it is indeed there in the couchdb file.&lt;br/&gt;
&lt;br/&gt;
e.g. one of the missing keys, observeseq_1, exists in 145.couch.1&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&apos;mailto:root@ubuntu1104-64&apos;&gt;root@ubuntu1104-64&lt;/a&gt;:/opt/couchbase/var/lib/couchbase/data/default# ../../../../../bin/couch_dbdump 145.couch.1 &lt;br/&gt;
Doc seq: 27&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;id: observeseq_1&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;rev: 27&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;content_meta: 128&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cas: 5032553164630947, expiry: 0, flags: 0&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data: (snappy) {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_1&amp;quot;}&lt;br/&gt;
&lt;br/&gt;
Total docs: 1&lt;br/&gt;
&lt;br/&gt;
For 145.couch.1 the Doc seq is 27, but in the view query debug output for vbucket 145, we have:&lt;br/&gt;
indexable_seqs - 26&lt;br/&gt;
wanted_seqs - 26&lt;br/&gt;
&lt;br/&gt;
so that means when the view query request was made Doc Seq was 26 and it later got updated to 27.&lt;br/&gt;
&lt;br/&gt;
If I do a stale=false query after this, I get back the missing rows. i.e. in this case now doc seq 27 is picked up from vbucket 145.&lt;br/&gt;
&lt;br/&gt;
So somewhere due to slowness of the system things go out of sync..observe says the key has been persisted and client code issues a view query. but view engine doesn&amp;#39;t yet see the updated DocSeq.&lt;br/&gt;
&lt;br/&gt;
I modified the php reproducer to output JSON for query debug information. The modified code and output in json is attached.&lt;br/&gt;
</comment>
                    <comment id="44204" author="FilipeManana" created="Fri, 16 Nov 2012 13:08:54 -0600"  >Thanks Deep, excellent analysis.&lt;br/&gt;
&lt;br/&gt;
So my gut feeling is that observe replies to the client after stuff is written to database file, but *before* ep-engine/couchkvstore notifies the erlang side that the vbucket database has a new header/snapshot.&lt;br/&gt;
&lt;br/&gt;
Will confirm with Chiyoung/Mike if this is the case.</comment>
                    <comment id="44205" author="mikew" created="Fri, 16 Nov 2012 13:11:15 -0600"  >We do not respond until after the notification takes place so I don&amp;#39;t think that would be the issue. We can discuss more offline.</comment>
                    <comment id="44206" author="FilipeManana" created="Fri, 16 Nov 2012 13:11:37 -0600"  >See comment/question above Chiyoung/Mike.</comment>
                    <comment id="44207" author="steve" created="Fri, 16 Nov 2012 13:33:18 -0600"  >bug-scrub mtg:&lt;br/&gt;
&lt;br/&gt;
- feels like a race with mccouch notification / processing.  The design, though, is synchronous (?).&lt;br/&gt;
&lt;br/&gt;
- this scenario also artificially saturates server CPU (&amp;quot;cpuburn&amp;quot; ?).&lt;br/&gt;
&lt;br/&gt;
- need to understand the cost / risk of a fix.&lt;br/&gt;
</comment>
                    <comment id="44210" author="FilipeManana" created="Fri, 16 Nov 2012 14:01:09 -0600"  >Deep, this might have been a race/timing issue.&lt;br/&gt;
&lt;br/&gt;
Did you run couch_dbdump right after observe returned and before the query? (My guess is not, unless you have Flash powers).&lt;br/&gt;
&lt;br/&gt;
Discussing with Chiyoung, without code changes to debug this issue, we can&amp;#39;t reliably find out the source of the problem. My idea is to make a change (not to commit/submit to gerrit) that disables persistence (on all nodes) right *before observe returns and after writing the stuff to disk and synchronously notifying mccouch*.</comment>
                    <comment id="44254" author="deepkaran.salooja" created="Sat, 17 Nov 2012 00:22:57 -0600"  >Filipe, couch_dbdump was done after the stale=false view query. So that&amp;#39;s in steady state, long after the test is finished. </comment>
                    <comment id="44255" author="chiyoung" created="Sat, 17 Nov 2012 00:40:50 -0600"  >Deep,&lt;br/&gt;
&lt;br/&gt;
I ran &amp;quot;observe-test.php&amp;quot; with 5 processes of burnP6 on a single node many times, and got the following output:&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&apos;mailto:chiyoung@ubu-1706&apos;&gt;chiyoung@ubu-1706&lt;/a&gt;:~$ php observe-test.php &lt;br/&gt;
Have ID observe&lt;br/&gt;
Ooops.. key observeseq_0 didn&amp;#39;t persist&lt;br/&gt;
Ooops.. key observeseq_2 didn&amp;#39;t persist&lt;br/&gt;
Ooops.. key observeseq_3 didn&amp;#39;t persist&lt;br/&gt;
...&lt;br/&gt;
Ooops.. key observeseq_297 didn&amp;#39;t persist&lt;br/&gt;
Ooops.. key observeseq_298 didn&amp;#39;t persist&lt;br/&gt;
Ooops.. key observeseq_299 didn&amp;#39;t persist&lt;br/&gt;
Found key observeseq_48, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_82, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_87, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_94, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_100, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_105, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_113, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_140, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_145, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_156, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_199, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_225, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_233, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_236, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_260, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
Found key observeseq_273, but we didn&amp;#39;t manage to persist it&lt;br/&gt;
&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 0&lt;br/&gt;
Total rows: 284&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
I looked at the php test code and saw the following code snippet:&lt;br/&gt;
&lt;br/&gt;
# Ensure that they&amp;#39;ve all persisted (sometimes not all of them will).&lt;br/&gt;
$status = $cb-&amp;gt;keyDurabilityMulti($casvals, array(&amp;quot;persist_to&amp;quot; =&amp;gt; 2, &amp;quot;replicate_to&amp;quot; =&amp;gt; 1));&lt;br/&gt;
&lt;br/&gt;
I guess that &amp;quot;keyDurabilityMulti()&amp;quot; doesn&amp;#39;t wait until all the keys are persisted. If this is true, I don&amp;#39;t see anything wrong from my tests.&lt;br/&gt;
&lt;br/&gt;
</comment>
                    <comment id="44256" author="deepkaran.salooja" created="Sat, 17 Nov 2012 01:10:20 -0600"  >You are right Chiyoung, keyDurabilityMulti doesn&amp;#39;t wait until all the keys are persisted. But it returns a status array using which the test figures out which keys were persisted and which weren&amp;#39;t. So in the view query output, the test expects only those keys which were persisted.&lt;br/&gt;
&lt;br/&gt;
$didnt_persist = array();&lt;br/&gt;
foreach ($status as $k =&amp;gt; $v) {&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# For each of those keys which did not yet persist, store them in&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# a lookup. These are the keys we don&amp;#39;t expect to show up in our view&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if (!$v) {&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;printf(&amp;quot;Ooops.. key %s didn&amp;#39;t persist\n&amp;quot;, $k);&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;$didnt_persist[$k] = true;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br/&gt;
}&lt;br/&gt;
&lt;br/&gt;
The problematic case is the output like below:&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
Have ID observe&lt;br/&gt;
Hrrm.. value observeseq_1 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_1&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_4 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_4&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_8 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_8&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_11 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_11&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_14 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_14&amp;quot;}&lt;br/&gt;
......&lt;br/&gt;
Hrrm.. value observeseq_18 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_18&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_20 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_20&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_25 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_25&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_29 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_29&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_33 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_33&amp;quot;}&lt;br/&gt;
&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 94&lt;br/&gt;
Total rows: 206&lt;br/&gt;
&lt;br/&gt;
This output is generated when the test detects a key which was reported as persisted but didn&amp;#39;t show up in view query results(as all the keys returned by view query are deleted).&lt;br/&gt;
&lt;br/&gt;
With a single node, I wasn&amp;#39;t able to repro this error. I have a 3 node cluster and on one of the nodes I am running 5 burnP6 and on the same node I run the php code. </comment>
                    <comment id="44257" author="chiyoung" created="Sat, 17 Nov 2012 01:46:42 -0600"  >Thanks for your quick response.&lt;br/&gt;
&lt;br/&gt;
I set up three node cluster (10.5.2.35, 10.5.2.36, 10.5.2.37) and was running 5 burnP6 instances and the php client on 10.5.2.37. However, I didn&amp;#39;t see any issues in 100 runs yet:&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&apos;mailto:chiyoung@ubu-1706&apos;&gt;chiyoung@ubu-1706&lt;/a&gt;:~$ ps aux | grep burn&lt;br/&gt;
root     21640 73.0  0.0    100    16 pts/0    R    23:16   9:25 burnP6&lt;br/&gt;
root     21641 75.5  0.0    100    16 pts/0    R    23:16   9:44 burnP6&lt;br/&gt;
root     21642 81.2  0.0    100    12 pts/0    R    23:16  10:29 burnP6&lt;br/&gt;
root     21643 74.9  0.0    100    12 pts/0    R    23:16   9:40 burnP6&lt;br/&gt;
root     21644 80.1  0.0    100    16 pts/0    R    23:16  10:20 burnP6&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&apos;mailto:chiyoung@ubu-1706&apos;&gt;chiyoung@ubu-1706&lt;/a&gt;:~$ ./observe-test.sh &amp;gt; result.txt&lt;br/&gt;
&lt;a href=&apos;mailto:chiyoung@ubu-1706&apos;&gt;chiyoung@ubu-1706&lt;/a&gt;:~$ &lt;br/&gt;
&lt;a href=&apos;mailto:chiyoung@ubu-1706&apos;&gt;chiyoung@ubu-1706&lt;/a&gt;:~$ &lt;br/&gt;
&lt;a href=&apos;mailto:chiyoung@ubu-1706&apos;&gt;chiyoung@ubu-1706&lt;/a&gt;:~$ cat result.txt | grep &amp;quot;Total keys that should have been deleted&amp;quot;&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 0&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 0&lt;br/&gt;
...&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 0&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 0&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 0&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
Can you give me access to your cluster? I can take a closer look tomorrow.&lt;br/&gt;
&lt;br/&gt;
</comment>
                    <comment id="44258" author="chiyoung" created="Sat, 17 Nov 2012 02:20:04 -0600"  >Okay, I saw the same issue after iterating the test 1000 times:&lt;br/&gt;
&lt;br/&gt;
Hrrm.. value observeseq_0 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_0&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_10 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_10&amp;quot;}&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 2&lt;br/&gt;
Total rows: 292&lt;br/&gt;
</comment>
                    <comment id="44314" author="ingenthr" created="Mon, 19 Nov 2012 11:18:28 -0600"  >Unfortunately, adding a delay is not a workaround for the lack of correctness.  How long do we delay?  We&amp;#39;ll never know.  One can&amp;#39;t tell by the results in a real deployment since there&amp;#39;d be other actors in the system.</comment>
                    <comment id="44317" author="farshid" created="Mon, 19 Nov 2012 11:29:24 -0600"  >Deep has been working on reproducing this under higher mutations deterministically and he is going to update the ticket shortly</comment>
                    <comment id="44318" author="deepkaran.salooja" created="Mon, 19 Nov 2012 11:38:46 -0600"  >This can also be reproduced by running the php test when data load is in progress(20k items, 500 sets/sec) on 1 node and without any CPU burning.&lt;br/&gt;
And is much more consistently reproducible this way.&lt;br/&gt;
e.g. &lt;br/&gt;
On a 3 node cluster, start load on a node using mcsoda(no cpu burning):&lt;br/&gt;
./lib/perf_engines/mcsoda.py 10.3.121.100 vbuckets=1024 doc-gen=0 doc-cache=0 ratio-creates=1 ratio-sets=1 ratio-misses=0 min-value-size=256,512  max-items=20000 exit-after-creates=1 prefix=aone&lt;br/&gt;
&lt;br/&gt;
Run the test while load is in progress and the same problem happens:&lt;br/&gt;
&lt;br/&gt;
&lt;a href=&apos;mailto:root@ubuntu1104-64&apos;&gt;root@ubuntu1104-64&lt;/a&gt;:~/php-tests# ./runany.pl misc/observe-test.php&lt;br/&gt;
php misc/observe-test.php&lt;br/&gt;
Have ID observe&lt;br/&gt;
Hrrm.. value observeseq_8 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_8&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_18 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_18&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_39 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_39&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_58 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_58&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_81 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_81&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_82 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_82&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_91 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_91&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_92 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_92&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_94 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_94&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_97 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_97&amp;quot;}&lt;br/&gt;
....&lt;br/&gt;
Hrrm.. value observeseq_265 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_265&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_266 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_266&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_270 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_270&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_275 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_275&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_276 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_276&amp;quot;}&lt;br/&gt;
Hrrm.. value observeseq_288 still has key {&amp;quot;php_value&amp;quot;:&amp;quot;observeseq_288&amp;quot;}&lt;br/&gt;
Total keys that should have been deleted, but weren&amp;#39;t: 46&lt;br/&gt;
Total rows: 1662&lt;br/&gt;
</comment>
                    <comment id="44319" author="farshid" created="Mon, 19 Nov 2012 11:41:27 -0600"  >Deep,&lt;br/&gt;
&lt;br/&gt;
Do you have a Python test that can be run with a cluster_run configuration to reproduce this issue?</comment>
                    <comment id="44332" author="steve" created="Mon, 19 Nov 2012 13:31:09 -0600"  >Looks like there is no race between server components.&lt;br/&gt;
&lt;br/&gt;
Chiyoung provided an explanation of the test case (observe with persist-to 2 and 1 replica) and of how the client mistakenly thinks the active server has persisted the item.</comment>
                    <comment id="44333" author="steve" created="Mon, 19 Nov 2012 13:31:34 -0600"  >bug-scrub - down to critical, as it&amp;#39;s no longer a server internal issue.</comment>
                    <comment id="44337" author="chiyoung" created="Mon, 19 Nov 2012 14:00:30 -0600"  >I set up a three-node cluster (10.5.2.35, 10.5.2.36, 10.5.2.37) and ran 5 instances of burnP6 on 10.5.2.37.&lt;br/&gt;
&lt;br/&gt;
I ran &amp;quot;observe-test.php&amp;quot; and got the following ep-engine logs when the test failed:&lt;br/&gt;
&lt;br/&gt;
1) From 10.5.2.37 where active vbucket 764 is placed:&lt;br/&gt;
&lt;br/&gt;
Mon Nov 19 10:54:15.275393 PST 3: PersistenceCallback: Set the seq no 345 for key &amp;#39;observeseq_228&amp;#39; and vbucket 764: seq no 345 from DB&lt;br/&gt;
&lt;br/&gt;
2) From 10.5.2.35 where replica vbucket 764 is placed:&lt;br/&gt;
&lt;br/&gt;
Mon Nov 19 10:54:14.694152 PST 3: PersistenceCallback: Set the seq no 198 for key &amp;#39;observeseq_228&amp;#39; and vbucket 764&lt;br/&gt;
Mon Nov 19 10:54:14.722671 PST 3: OBSERVE: key &amp;#39;observeseq_228&amp;#39; and vbucket 764 was persisted&lt;br/&gt;
&lt;br/&gt;
3) From the view debug output:&lt;br/&gt;
&lt;br/&gt;
&amp;quot;wanted_seqs&amp;quot;:&lt;br/&gt;
&amp;quot;0764&amp;quot; : 344&lt;br/&gt;
&lt;br/&gt;
From the test, I confirmed that the PHP client gets a response indicating that &amp;quot;observeseq_228&amp;quot; was persisted in vbucket 764, but at that point it had actually been persisted only in the replica vbucket on 10.5.2.35. The view indexer, however, looks at the active vbucket 764 on 10.5.2.37, where the item had not been persisted yet.&lt;br/&gt;
&lt;br/&gt;
It seems to me that this is a client library issue (especially libcouchbase).</comment>
                    <comment id="44339" author="mnunberg" created="Mon, 19 Nov 2012 14:59:12 -0600"  >I haven&amp;#39;t been able to determine any specific client bug as far as libcouchbase is concerned with respect to observe.&lt;br/&gt;
&lt;br/&gt;
There are some things that should be noted, though.&lt;br/&gt;
&lt;br/&gt;
The observe as implemented in libcouchbase is a simple low-level observe; that is, libcouchbase will effectively &amp;quot;broadcast&amp;quot; requests to the relevant nodes, and will invoke a php-ext level callback for each observe response.&lt;br/&gt;
&lt;br/&gt;
Then the php-ext will go ahead and count the number of times those callbacks were received with a successful response (i.e. it was persisted and exists with the user-specified CAS).&lt;br/&gt;
&lt;br/&gt;
This logic is implemented here:&lt;br/&gt;
&lt;a href=&quot;https://github.com/couchbase/php-ext-couchbase/blob/ac35571a79c87281ea8fe3f03f89b7f8316ae71c/observe.c#L489&quot;&gt;https://github.com/couchbase/php-ext-couchbase/blob/ac35571a79c87281ea8fe3f03f89b7f8316ae71c/observe.c#L489&lt;/a&gt;&lt;br/&gt;
&lt;br/&gt;
It basically sits and waits until the durability requirements have been satisfied.&lt;br/&gt;
&lt;br/&gt;
Despite the PHP client lacking any kind of &amp;#39;master-first&amp;#39; logic, my initial tests were implemented with a bucket employing a single replica. The test script requests persistence to two nodes; considering the fact that there are only two nodes for any given vbucket, a persistence requirement of two effectively includes waiting for the item to be persisted to the master.&lt;br/&gt;
&lt;br/&gt;
The script may of course be modified to persist to an arbitrary number of nodes (if the count is too high, the PHP client will warn and cap the number at the maximum available: persist_to &amp;lt;= (num_replicas + 1) &amp;lt;= 4).&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;ve modified observe.c in the PHP ext (on 10.3.121.100) to print out keys as they receive positive responses from the master, and indeed I can reproduce this issue when such responses are received.&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;ll go ahead now and dump the packets from libcouchbase to the screen as well (i.e. a hex dump). Hopefully this will remove any confusion.&lt;br/&gt;
&lt;br/&gt;
Now as far as the keys not being replicated to the master go; the php ext only knows what libcouchbase tells it, and libcouchbase only knows what it receives on the memcached connection.&lt;br/&gt;
&lt;br/&gt;
Unless libcouchbase is doing something seriously wrong with the protocol handling, I&amp;#39;m guessing the problem isn&amp;#39;t there.</comment>
                    <comment id="44364" author="mikew" created="Mon, 19 Nov 2012 18:48:59 -0600"  >From the logs:&lt;br/&gt;
&lt;br/&gt;
Ooops.. key observeseq_599 didn&amp;#39;t persist&lt;br/&gt;
&lt;br/&gt;
[libcouchbase] observe_response_handler:459 Response from 10.3.3.95&lt;br/&gt;
MAGIC=RES OP=92 STATUS=SUCCESS KLEN=0 EXTLEN=0 NBODY=27 OPAQUE=4a9 CAS=4020000&lt;br/&gt;
	Body:&lt;br/&gt;
[0000]   03 40 00 0E 6F 62 73 65   72 76 65 73 65 71 5F 35   ....obse rveseq.5&lt;br/&gt;
[0010]   39 39 00 00 1B 84 08 DD   EC B6 E7                  99...... ...&lt;br/&gt;
Key observeseq_599 persisted to master..&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;
The key status in this message is 00, which corresponds to &amp;quot;Found, Not Persisted&amp;quot;, but the log message reports that the item was persisted to master. Can you look at the libcouchbase implementation to see whether this is a possible error?</comment>
                    <comment id="44365" author="mikew" created="Mon, 19 Nov 2012 18:49:40 -0600"  >Please see my comment.</comment>
                    <comment id="44371" author="mnunberg" created="Mon, 19 Nov 2012 19:58:43 -0600"  >You are correct; the logging statement was placed at the wrong location (after *any* success status, rather than specifically a persisted one), and the persistence check in the php extension code checked against the wrong criteria (typo).&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;m not sure how the Java client failed, but I can&amp;#39;t reproduce it with PHP anymore in light of those items which I&amp;#39;ve fixed up.&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;ve filed bugs in PHP for this (&lt;a href=&quot;http://www.couchbase.com/issues/browse/PCBC-148&quot; title=&quot;Key persistence checks wrong criterion&quot;&gt;&lt;strike&gt;PCBC-148&lt;/strike&gt;&lt;/a&gt;, &lt;a href=&quot;http://www.couchbase.com/issues/browse/PCBC-149&quot; title=&quot;persist_to should imply master when greater than zero&quot;&gt;&lt;strike&gt;PCBC-149&lt;/strike&gt;&lt;/a&gt;), and there&amp;#39;s already &lt;a href=&quot;http://www.couchbase.com/issues/browse/JCBC-142&quot; title=&quot;Observe Tests show that something is wrong in the observe impl&quot;&gt;&lt;strike&gt;JCBC-142&lt;/strike&gt;&lt;/a&gt; (from which this bug originated).&lt;br/&gt;
&lt;br/&gt;
Additionally, a means to test this (be it via the mock or via the cluster) should be made available. In reality there&amp;#39;s usually a sub-millisecond time difference between having a key persisted and having it replicated.&lt;br/&gt;
&lt;br/&gt;
So it seems that, thanks to our extra debugging, it&amp;#39;s no longer a server issue.</comment>
                    <comment id="44395" author="mikew" created="Mon, 19 Nov 2012 22:05:04 -0600"  >Thanks for following up on this, and I&amp;#39;m glad we have a fix. The Java client is a separate issue that I will take care of tomorrow. Please close this issue as invalid if there is no server fix needed. If you have any other questions, let me know.</comment>
                    <comment id="44396" author="chiyoung" created="Mon, 19 Nov 2012 22:09:43 -0600"  >I think we have enough evidence indicating that this issue was NOT caused by the server engine side. Separate bugs were created for the client side.</comment>
                    <comment id="44400" author="ingenthr" created="Mon, 19 Nov 2012 23:25:08 -0600"  >Great to know we can put this issue to rest.  I&amp;#39;m glad we resolved where it was.&lt;br/&gt;
&lt;br/&gt;
Mike: thanks for stepping through it and reviewing everything Mark submitted.  It surely helped to get everyone looking at the same thing.&lt;br/&gt;
&lt;br/&gt;
Mark: thanks for repro&amp;#39;ing the Java issue and then continuing to iterate, providing enough info so the issue could be definitively identified.  Also, great followup by filing the related issues.</comment>
                </comments>
                    <attachments>
                    <attachment id="15834" name="observe-test.php" size="3068" author="deepkaran.salooja" created="Fri, 16 Nov 2012 13:00:15 -0600" />
                    <attachment id="15833" name="view_results.out" size="63510" author="deepkaran.salooja" created="Fri, 16 Nov 2012 13:00:15 -0600" />
                </attachments>
            <subtasks>
        </subtasks>
                <customfields>
                                                                        <customfield id="customfield_10180" key="com.atlassian.jira.ext.charting:firstresponsedate">
                <customfieldname>Date of First Response</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>Sun, 11 Nov 2012 16:15:10 -0600</customfieldvalue>

                </customfieldvalues>
            </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10081" key="com.pyxis.greenhopper.jira:gh-global-rank">
                <customfieldname>Rank</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>3555</customfieldvalue>
                </customfieldvalues>
            </customfield>
                                                                                                                                                                                        <customfield id="customfield_10181" key="com.atlassian.jira.ext.charting:timeinstatus">
                <customfieldname>Time In Status</customfieldname>
                <customfieldvalues>
                    
                </customfieldvalues>
            </customfield>
                                                </customfields>
    </item>
</channel>
</rss>