<!-- 
RSS generated by JIRA (5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9) at Mon May 20 17:41:19 CDT 2013

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary add field=key&field=summary to the URL of your request.
For example:
http://www.couchbase.com/issues/si/jira.issueviews:issue-xml/MB-7160/MB-7160.xml?field=key&field=summary
-->
<rss version="0.92" >
<channel>
    <title>Couchbase</title>
    <link>http://www.couchbase.com/issues</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>5.2.4</version>
        <build-number>845</build-number>
        <build-date>26-12-2012</build-date>
    </build-info>

<item>
            <title>[MB-7160] if flush times out final stage (janitor creating vbuckets back) it returns success causing clients to see TMPFAIL after flush succeeds (WAS: there are reports that even after invoking FLUSH nodes return TMPFAIL...)</title>
                <link>http://www.couchbase.com/issues/browse/MB-7160</link>
                <project id="10010" key="MB">Couchbase Server</project>
                        <description>&amp;quot;reported by SDK team&amp;quot;&lt;br/&gt;
After a flush (through REST), server nodes still return TMPFAIL for a while.  This is not expected; it would be annoying for an application and could cause problems with automated tests.&lt;br/&gt;
&lt;br/&gt;
update:&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;ve updated the subject. Indeed this is possible. But IMHO, given that clients should always be prepared to handle TMPFAIL from any operation, I&amp;#39;ve lowered the priority to minor.&lt;br/&gt;
&lt;br/&gt;
further update:&lt;br/&gt;
&lt;br/&gt;
I disagree on the &amp;quot;always be able to handle TMPFAIL&amp;quot;, especially in the case of running tests at the SDK side.  At the moment, we ask our users to handle tmpfail directly.  That&amp;#39;s intentional and seems to make sense as a pressure relief valve for steady state, but between these flushes and especially from unit tests, it&amp;#39;d be best if the cluster could just block either the operation request or the flush response until complete.&lt;br/&gt;
&lt;br/&gt;
Raising this to major owing to end-user reports of trouble here.&lt;br/&gt;
</description>
                <environment></environment>
            <key id="20687">MB-7160</key>
            <summary>if flush times out final stage (janitor creating vbuckets back) it returns success causing clients to see TMPFAIL after flush succeeds (WAS: there are reports that even after invoking FLUSH nodes return TMPFAIL...)</summary>
                <type id="1" iconUrl="http://www.couchbase.com/issues/images/icons/issuetypes/bug.png">Bug</type>
                                <priority id="3" iconUrl="http://www.couchbase.com/issues/images/icons/priorities/major.png">Major</priority>
                    <status id="4" iconUrl="http://www.couchbase.com/issues/images/icons/statuses/reopened.png">Reopened</status>
                    <resolution id="-1">Unresolved</resolution>
                    <security id="10011">Public</security>
                        <assignee username="alkondratenko">Aleksey Kondratenko</assignee>
                                <reporter username="farshid">Farshid Ghods</reporter>
                        <labels>
                        <label>2.0-release-notes</label>
                    </labels>
                <created>Sun, 11 Nov 2012 13:01:35 -0600</created>
                <updated>Thu, 21 Mar 2013 17:07:29 -0500</updated>
                                    <version>2.0-beta-2</version>
                <version>2.0.1</version>
                <version>2.0.2</version>
                                                <component>ns_server</component>
                                <votes>0</votes>
                        <watches>7</watches>
                                                    <comments>
                    <comment id="43763" author="farshid" created="Sun, 11 Nov 2012 13:01:59 -0600"  >Deep,&lt;br/&gt;
&lt;br/&gt;
can you please reproduce this case after coordinating with SDK team</comment>
                    <comment id="43764" author="farshid" created="Sun, 11 Nov 2012 13:03:30 -0600"  >maybe enabling traffic command is async instead of sync ?&lt;br/&gt;
&lt;br/&gt;
Matt,&lt;br/&gt;
if this is easy to reproduce can you please add more information&lt;br/&gt;
1- how many nodes&lt;br/&gt;
2- buckets&lt;br/&gt;
3- hardware info ( cpu , RAM ) &lt;br/&gt;
&lt;br/&gt;
and upload diags if possible</comment>
                    <comment id="43780" author="ingenthr" created="Mon, 12 Nov 2012 01:58:28 -0600"  >Sergey: this issue originally came from you.&lt;br/&gt;
&lt;br/&gt;
Can you:&lt;br/&gt;
1) let us know if you still see this issue and&lt;br/&gt;
2) let us know if you&amp;#39;d opened a bug previously?&lt;br/&gt;
&lt;br/&gt;
Thanks.</comment>
                    <comment id="43782" author="avsej" created="Mon, 12 Nov 2012 04:39:04 -0600"  >The issue is still there. Take a look at the attached files.&lt;br/&gt;
&lt;br/&gt;
1 node&lt;br/&gt;
1 bucket&lt;br/&gt;
default number of vbuckets</comment>
                    <comment id="43813" author="steve" created="Mon, 12 Nov 2012 13:51:54 -0600"  >bug-scrub: mike described a workaround to sdk team.</comment>
                    <comment id="43823" author="avsej" created="Mon, 12 Nov 2012 14:53:56 -0600"  >Where can I find that description?</comment>
                    <comment id="43939" author="alkondratenko" created="Tue, 13 Nov 2012 18:04:09 -0600"  >Sergey, I see potential for confusion. Is that related to the slow flush you&amp;#39;re seeing? Are you sure you&amp;#39;re getting 200 back?&lt;br/&gt;
&lt;br/&gt;
Please describe your case better.&lt;br/&gt;
&lt;br/&gt;
As for the question of a workaround, it&amp;#39;s basically the same as for any tmpfail error: just keep retrying.&lt;br/&gt;
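&lt;br/&gt;
A minimal sketch of that retry workaround in Python. TemporaryFailure and retry_on_tmpfail are hypothetical names used only for illustration, not part of any Couchbase SDK; the attempt count and delays are arbitrary assumptions.&lt;br/&gt;

```python
import time


class TemporaryFailure(Exception):
    """Hypothetical stand-in for the error an SDK raises on TMPFAIL."""


def retry_on_tmpfail(operation, attempts=5, initial_delay=0.05, sleep=time.sleep):
    """Call operation(); on TemporaryFailure wait, double the delay, and retry."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TemporaryFailure:
            if attempt == attempts:
                raise  # out of attempts, surface the error to the caller
            sleep(delay)
            delay *= 2
```

A caller would wrap each operation that follows a flush, for example retry_on_tmpfail(lambda: client.set("key", "value")), where client is whatever SDK object is in use.&lt;br/&gt;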
</comment>
                    <comment id="43949" author="avsej" created="Wed, 14 Nov 2012 00:00:16 -0600"  >As the title says, some spec says that after a flush all operations should be OK. The script shows that the cluster might return a 500 error to flush.</comment>
                    <comment id="43971" author="alkondratenko" created="Wed, 14 Nov 2012 09:55:59 -0600"  >Sergey, this just can&amp;#39;t be fair. 5xx reply means all bets are off.&lt;br/&gt;
&lt;br/&gt;
Real bug you&amp;#39;re after is: &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-6232&quot; title=&quot;ep-engine needs 1.5 minutes to create 1k vbuckets. Seems too slow (but gets fast with barrier=0)&quot;&gt;MB-6232&lt;/a&gt;</comment>
                    <comment id="43972" author="alkondratenko" created="Wed, 14 Nov 2012 09:56:14 -0600"  >See &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-6232&quot; title=&quot;ep-engine needs 1.5 minutes to create 1k vbuckets. Seems too slow (but gets fast with barrier=0)&quot;&gt;MB-6232&lt;/a&gt; as pointed out above</comment>
                    <comment id="43973" author="alkondratenko" created="Wed, 14 Nov 2012 09:56:31 -0600"  >And workaround is to run off ram disk</comment>
                    <comment id="45603" author="kzeller" created="Thu, 6 Dec 2012 16:23:43 -0600"  >Added to RN as:&lt;br/&gt;
&lt;br/&gt;
Several incidents have been reported that after using Flush on nodes, Couchbase &lt;br/&gt;
Server returns TMPFAIL even after a successful flush.</comment>
                    <comment id="52246" author="ingenthr" created="Thu, 7 Mar 2013 11:42:48 -0600"  >Accidentally thought this was the right one to re-open.  It actually is a dupe here, but there&amp;#39;s a related issue.</comment>
                    <comment id="52258" author="alkondratenko" created="Thu, 7 Mar 2013 13:25:42 -0600"  >Matt also said the following over email:&lt;br/&gt;
&lt;br/&gt;
BUT recent testing by our team indicates the fsync is not the underlying cause.  UPDATE, as of 3/7, &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-6232&quot; title=&quot;ep-engine needs 1.5 minutes to create 1k vbuckets. Seems too slow (but gets fast with barrier=0)&quot;&gt;MB-6232&lt;/a&gt; has now been moved from Backlog to 2.0.2.  Need to reopen &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-7160&quot; title=&quot;if flush times out final stage (janitor creating vbuckets back) it returns success causing clients to see TMPFAIL after flush succeeds (WAS: there are reports that even after invoking FLUSH nodes return TMPFAIL...)&quot;&gt;MB-7160&lt;/a&gt; with current findings&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;m waiting for those details, which could indicate there&amp;#39;s some other reason for the tmpfails</comment>
                    <comment id="53301" author="ingenthr" created="Thu, 21 Mar 2013 17:02:59 -0500"  >This may or may not be directly related to the specific issue reported here.  It seems to be a contributing factor anyway.&lt;br/&gt;
&lt;br/&gt;
Trond recently carried out a series of experiments to test whether the flush time problems were owing to the underlying &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-6232&quot; title=&quot;ep-engine needs 1.5 minutes to create 1k vbuckets. Seems too slow (but gets fast with barrier=0)&quot;&gt;MB-6232&lt;/a&gt; as previously suspected.  &lt;a href=&quot;http://www.couchbase.com/issues/browse/MB-6232&quot; title=&quot;ep-engine needs 1.5 minutes to create 1k vbuckets. Seems too slow (but gets fast with barrier=0)&quot;&gt;MB-6232&lt;/a&gt; says that fsyncs and doing a lot of vbucket work serially cause much of the creation/deletion slowness we see.  &lt;br/&gt;
&lt;br/&gt;
To isolate this and find a solution for a particular deployment, a number of tests were run.  In the last experiment, two tests were run:&lt;br/&gt;
1) Reduce the number of vbuckets to 8, but otherwise use a real filesystem, disks, etc. (~2sec)&lt;br/&gt;
2) Use ramfs (9.5sec)&lt;br/&gt;
&lt;br/&gt;
1) Reduce the number of vbuckets:&lt;br/&gt;
&lt;br/&gt;
I uninstalled 2.0.1-build 170 from my box and removed /opt/couchbase before reinstalling it and stopping it immediately. Then I edited /opt/couchbase/bin/couchbase-server and added (right under the license header):&lt;br/&gt;
&lt;br/&gt;
COUCHBASE_NUM_VBUCKETS=8&lt;br/&gt;
export COUCHBASE_NUM_VBUCKETS&lt;br/&gt;
&lt;br/&gt;
This sets the server to be using only 8 vbuckets instead of the default number of 1024. This number _HAS_ to be a &amp;quot;power-of-two&amp;quot;, and should not be less than the number of nodes you have in your cluster (then you&amp;#39;ll have nodes without any vbuckets).&lt;br/&gt;
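&lt;br/&gt;
The two constraints above (power of two, not fewer than the number of nodes) can be checked before editing the start script. A small Python sketch; check_num_vbuckets is a hypothetical helper written for illustration, not a Couchbase tool.&lt;br/&gt;

```python
def check_num_vbuckets(num_vbuckets, num_nodes):
    """Return a list of problems with a proposed vbucket count (empty if none)."""
    problems = []
    # A positive power of two has exactly one bit set in its binary form.
    if 1 > num_vbuckets or bin(num_vbuckets).count("1") != 1:
        problems.append("not a power of two")
    # With fewer vbuckets than nodes, some nodes would hold no vbuckets.
    if num_nodes > num_vbuckets:
        problems.append("fewer vbuckets than nodes")
    return problems


print(check_num_vbuckets(8, 1))    # the single-node setup used in this test
print(check_num_vbuckets(12, 16))  # violates both constraints
```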
&lt;br/&gt;
Running the same program I used earlier (which flushes an empty bucket) now takes ~2 secs on a filesystem mounted with &amp;quot;barrier=1&amp;quot; and less than a second (0.7) on a filesystem mounted with &amp;quot;barrier=0&amp;quot;.&lt;br/&gt;
&lt;br/&gt;
2) Using ramfs&lt;br/&gt;
&lt;br/&gt;
I created the ramfs by running:&lt;br/&gt;
&lt;br/&gt;
mount -t ramfs -o size=100m ramfs /tmp/couchbase_data&lt;br/&gt;
chown couchbase:couchbase /tmp/couchbase_data&lt;br/&gt;
&lt;br/&gt;
Then installed couchbase and had it use /tmp/couchbase_data for storage.&lt;br/&gt;
&lt;br/&gt;
Running flush takes ~9.5 sec for an empty bucket. Reducing the number of vbuckets to 8, as described in 1, brought that down to 0.36 s.&lt;br/&gt;
&lt;br/&gt;
---   ---   ---&lt;br/&gt;
&lt;br/&gt;
In an earlier set of experiments, there was more measurement of where time was going. I&amp;#39;ve edited this slightly from Trond:&lt;br/&gt;
&lt;br/&gt;
Test rig: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz with 4GB memory.&lt;br/&gt;
The system uses ext4 as its filesystem and gives 2GB of memory to Couchbase (the two sample databases were installed during installation).&lt;br/&gt;
&lt;br/&gt;
Test was run with the following PHP script which was intended to simulate a series of flushes as unit tests are run:&lt;br/&gt;
&lt;br/&gt;
&amp;lt;?php&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;$counter = 1;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;$errors = 0;&lt;br/&gt;
&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;$cb = new Couchbase(&amp;quot;localhost&amp;quot;, &amp;quot;Administrator&amp;quot;, &amp;quot;password&amp;quot;);&lt;br/&gt;
&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;while ($counter &amp;lt; 11) {&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;try {&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;echo &amp;quot;\rFlushing: &amp;quot; . $counter . &amp;quot; Errors: &amp;quot; . $errors;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;$cb-&amp;gt;flush();&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} catch (CouchbaseException $e) {&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;$errors++;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br/&gt;
&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;$counter++;&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;br/&gt;
?&amp;gt;&lt;br/&gt;
&lt;br/&gt;
When I initially ran the script it reported:&lt;br/&gt;
&lt;a href=&apos;mailto:trond@desktop&apos;&gt;trond@desktop&lt;/a&gt;:~/compile/sdk/php$ time php -c php.ini example/buckets.php &lt;br/&gt;
Flushing: 10 Errors: 8&lt;br/&gt;
real 5m36.613s&lt;br/&gt;
user 0m0.012s&lt;br/&gt;
sys 0m0.020s&lt;br/&gt;
&lt;br/&gt;
Please note that we can&amp;#39;t really &amp;quot;trust&amp;quot; the numbers above when we had an error, because I don&amp;#39;t check for the reason _why_ it failed (the next one could for instance fail due to the fact that we&amp;#39;ve already got a flush running etc).&lt;br/&gt;
&lt;br/&gt;
When I remounted the filesystem with barrier=0, I got:&lt;br/&gt;
&lt;a href=&apos;mailto:trond@desktop&apos;&gt;trond@desktop&lt;/a&gt;:~/compile/sdk/php$ time php -c php.ini example/buckets.php &lt;br/&gt;
Flushing: 10 Errors: 0&lt;br/&gt;
real 3m4.609s&lt;br/&gt;
user 0m0.028s&lt;br/&gt;
sys 0m0.008s&lt;br/&gt;
&lt;br/&gt;
At least all of the flush calls succeed, but look at the time. It takes more than 3 _minutes_ to run 10 flush commands, so a user running 500 unit tests, each separated by a flush, would spend roughly 2 1/2 hours just waiting for the flush commands in their 500 test cycles (and depending on how much data they add, it may take longer).&lt;br/&gt;
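&lt;br/&gt;
The 2 1/2 hour figure follows from scaling the measured run linearly. A short Python sketch of that arithmetic; parse_real and projected_hours are hypothetical helper names, and it assumes every flush takes the same time as the average of the 10 measured above.&lt;br/&gt;

```python
def parse_real(text):
    """Convert a `time` value such as '3m4.609s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def projected_hours(real_for_batch, flushes_in_batch, planned_flushes):
    """Scale the measured wall-clock time linearly to a larger number of flushes."""
    per_flush = parse_real(real_for_batch) / flushes_in_batch
    return per_flush * planned_flushes / 3600


# 10 flushes took 3m4.609s with barrier=0; project that to 500 flushes.
print(round(projected_hours("3m4.609s", 10, 500), 2), "hours")
```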
&lt;br/&gt;
What I do find interesting is when I run top while I&amp;#39;m doing this I see beam.smp (erlang vm) using from 100-170% CPU (aka almost two cores), whereas memcached is relatively idle. From what I can see it looks to me that flushing a vbucket is setting all of the vbuckets to &amp;quot;dead&amp;quot; before activating them again. &lt;br/&gt;
&lt;br/&gt;
I know we don&amp;#39;t have stats for _everything_ we do, but from running the &amp;quot;timings&amp;quot; cbstat command we see (removing the printout of the distribution there):&lt;br/&gt;
&lt;br/&gt;
&amp;nbsp;set_vb_cmd (2048 total)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Avg           : (   27us)&lt;br/&gt;
&amp;nbsp;del_vb_cmd (2048 total)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Avg           : (   44us)&lt;br/&gt;
&amp;nbsp;disk_vbstate_snapshot (1622 total)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Avg           : (    6ms)&lt;br/&gt;
&amp;nbsp;disk_vb_del (1024 total)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Avg         : (    3ms)&lt;br/&gt;
&lt;br/&gt;
If I add all of these up (assuming that they&amp;#39;re all done in sequence) we end up at roughly 13 secs, but running &amp;quot;time&amp;quot; on the php process doing a single flush reports:&lt;br/&gt;
real 0m23.613s&lt;br/&gt;
user 0m0.016s&lt;br/&gt;
sys 0m0.012s&lt;br/&gt;
&lt;br/&gt;
So there are 10 secs not accounted for, but this seems to vary (starting php and doing a single set takes 0m0.033s, so it&amp;#39;s not the php runtime overhead).&lt;br/&gt;
&lt;br/&gt;
I&amp;#39;m not entirely sure what &amp;quot;disk_vbstate_snapshot&amp;quot; is used for, but it accounts for roughly 9 of the 13 secs.&lt;br/&gt;
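&lt;br/&gt;
For reference, the sum can be reproduced from the counts and averages listed above. A short Python sketch; it assumes the calls run strictly in sequence and uses the rounded averages as printed, so the result is only approximate.&lt;br/&gt;

```python
# (name, call count, average seconds per call) from the timings output above
timings = [
    ("set_vb_cmd", 2048, 27e-6),
    ("del_vb_cmd", 2048, 44e-6),
    ("disk_vbstate_snapshot", 1622, 6e-3),
    ("disk_vb_del", 1024, 3e-3),
]


def total_seconds(rows):
    """Sum count * average over all rows, assuming purely sequential execution."""
    return sum(count * avg for _, count, avg in rows)


for name, count, avg in timings:
    print(f"{name:24s}{count * avg:8.3f} s")
print(f"{'total':24s}{total_seconds(timings):8.3f} s")
```

With these inputs the total is about 12.9 s, of which about 9.7 s comes from disk_vbstate_snapshot.&lt;br/&gt;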
&lt;br/&gt;
When I disabled that function in ep-engine, the example program above ran those 10 flushes in:&lt;br/&gt;
&lt;br/&gt;
Flushing: 10 Errors: 0&lt;br/&gt;
real 0m22.411s&lt;br/&gt;
user 0m0.020s&lt;br/&gt;
sys 0m0.012s&lt;br/&gt;
&lt;br/&gt;
Interestingly enough, beam.smp is now down at ~100% CPU. For fun I tried to run the script with 500 flushes (to see if it changed over time), and we&amp;#39;re down to less than 18 minutes flushing (which isn&amp;#39;t that bad ;-))&lt;br/&gt;
&lt;br/&gt;
I know we added that call to disk_vbstate_snapshot for some reason, so just removing it isn&amp;#39;t an option. I am however pretty sure we don&amp;#39;t need to snapshot them 1600+ times while we&amp;#39;re running a flush (I would have guessed that two would do the trick: one when they are all disabled, one when they are all back up again). It is _ns_server_ and not ep-engine that needs to ensure that the clustermap is consistent after a potential crash during the process, and to be honest ns_server could just persist a flag saying that it is doing a flush before it starts, then nuke the flag when it receives the response that all of the vbuckets are dead, if it wants to protect itself from coming back up in a &amp;quot;hosed&amp;quot; configuration if a crash occurs during the flush. &lt;br/&gt;
&lt;br/&gt;
Personally I think we should change our code to let ns_server send a _SINGLE_ message to ep-engine listing _ALL_ of the vbuckets it wants to shut down, and it should get a single return message when that is in place (and we could then have a _SINGLE_ vbucket snapshot when we&amp;#39;ve disabled all of them &amp;quot;atomically&amp;quot;), then a SINGLE enable message (resulting in a single snapshot_vbucket_state). &lt;br/&gt;
&lt;br/&gt;
Right now we&amp;#39;re also adding FLUSH markers to our tap connections. Given that ns_server is the organizer of the entire process, it could just shut down the TAP connections before running the flush command, and then restart the replication chains when it&amp;#39;s done (the database should be empty anyway, and we wouldn&amp;#39;t have to add special flush logic to the tap streams).&lt;br/&gt;
&lt;br/&gt;
It would be interesting to know if all of this message passing and constant vbucket state change is part of its heavy CPU usage during the flush.</comment>
                </comments>
                <issuelinks>
                        <issuelinktype id="10001">
                <name>Duplicate</name>
                                <outwardlinks description="duplicates">
                            <issuelink>
            <issuekey id="19045">MB-6232</issuekey>
        </issuelink>
                    </outwardlinks>
                                            </issuelinktype>
                    </issuelinks>
                <attachments>
                    <attachment id="15773" name="test.rb" size="349" author="avsej" created="Mon, 12 Nov 2012 04:39:04 -0600" />
                    <attachment id="15772" name="test.txt" size="456" author="avsej" created="Mon, 12 Nov 2012 04:39:04 -0600" />
                </attachments>
            <subtasks>
        </subtasks>
                <customfields>
                                                                        <customfield id="customfield_10180" key="com.atlassian.jira.ext.charting:firstresponsedate">
                <customfieldname>Date of First Response</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>Mon, 12 Nov 2012 01:58:28 -0600</customfieldvalue>

                </customfieldvalues>
            </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10081" key="com.pyxis.greenhopper.jira:gh-global-rank">
                <customfieldname>Rank</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>3552</customfieldvalue>
                </customfieldvalues>
            </customfield>
                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>