<!-- 
RSS generated by JIRA (5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9) at Tue May 21 21:29:51 CDT 2013

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary add field=key&field=summary to the URL of your request.
For example:
http://www.couchbase.com/issues/si/jira.issueviews:issue-xml/MB-4896/MB-4896.xml?field=key&field=summary
-->
<rss version="0.92" >
<channel>
    <title>Couchbase</title>
    <link>http://www.couchbase.com/issues</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>5.2.4</version>
        <build-number>845</build-number>
        <build-date>26-12-2012</build-date>
    </build-info>

<item>
            <title>[MB-4896] moxi mem leak when using haproxy roundrobin load-balancing</title>
                <link>http://www.couchbase.com/issues/browse/MB-4896</link>
                <project id="10010" key="MB">Couchbase Server</project>
                        <description>I can&amp;#39;t find an existing bug that matches this exactly (other bugs also mention a moxi memory leak and might be the same issue, but they don&amp;#39;t mention haproxy).&lt;br/&gt;
&lt;br/&gt;
I reproduced this customer-reported issue, and there&amp;#39;s a quick config workaround that can slow the leak.&lt;br/&gt;
&lt;br/&gt;
More info:&lt;br/&gt;
&lt;br/&gt;
After spinning up a 20-node cluster with haproxy, valgrind, and a special debug build of moxi, using a configuration similar to XXX&amp;#39;s, I was able to reproduce a significant memory leak in moxi.  It occurs during topology changes, or when moxi *thinks* there&amp;#39;s a cluster topology change.  Other customers probably never noticed, since topology changes are usually infrequent.&lt;br/&gt;
&lt;br/&gt;
Additionally, XXX&amp;#39;s use of haproxy in a roundrobin load-balancing configuration significantly exacerbated the leak in moxi.  (I recall Tim had another report of a moxi mem leak from another customer.  Perhaps they&amp;#39;re also using haproxy?)&lt;br/&gt;
&lt;br/&gt;
Here&amp;#39;s XXX&amp;#39;s haproxy configuration...&lt;br/&gt;
&lt;br/&gt;
-----------------&lt;br/&gt;
global&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;log         127.0.0.1 local2 &lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;chroot      /var/lib/haproxy&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pidfile     /var/run/haproxy.pid&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;maxconn     4000&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;user        haproxy&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;group       haproxy&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;daemon&lt;br/&gt;
&lt;br/&gt;
defaults&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mode        http&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;log         global&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;option      dontlognull&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;option      httpclose&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;option      httplog&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;option      forwardfor&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;timeout connect 10000&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;timeout client 300000&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;timeout server 300000&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;maxconn     60000&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;retries     3&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;stats enable&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;stats uri /haproxy-status&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;stats refresh 5s&lt;br/&gt;
&lt;br/&gt;
frontend  moxi *:8092&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;default_backend             moxi&lt;br/&gt;
&lt;br/&gt;
backend moxi&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;balance roundrobin&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node1  10.80.68.152:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node2  10.80.68.178:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node3  10.80.68.146:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node4  10.80.68.166:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node5  10.80.68.154:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node6  10.80.68.158:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node7  10.80.68.156:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node8  10.80.68.160:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node9  10.80.68.162:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node10 10.80.68.144:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node11 10.80.68.170:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node12 10.80.68.174:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node13 10.80.68.164:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node14 10.80.68.168:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node15 10.80.68.150:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node16 10.80.68.148:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node17 10.80.68.176:8091 check&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server node18 10.80.68.172:8091 check&lt;br/&gt;
-----------------&lt;br/&gt;
&lt;br/&gt;
The workaround to reduce the leak includes...&lt;br/&gt;
&lt;br/&gt;
= change from haproxy&amp;#39;s &amp;#39;balance roundrobin&amp;#39; to some other load balancing choice.&lt;br/&gt;
&lt;br/&gt;
For example, when I used &amp;#39;balance source&amp;#39; instead of &amp;#39;balance roundrobin&amp;#39; in my haproxy configuration, the leak went away.  (Caveat: it went away only until I made a real topology change.)&lt;br/&gt;
&lt;br/&gt;
The underlying issue is that moxi does a simple string comparison to decide whether the topology has changed, and every node in a cluster gives a slightly different answer as to the topology.  When moxi thinks the topology has changed, it tears down its data structures and dynamically reconfigures, and there&amp;#39;s a leak somewhere in that path.&lt;br/&gt;
&lt;br/&gt;
Normally, moxi expects its HTTP/REST connection to be very long-lived.  However, when haproxy is in the middle, haproxy might decide to time out an HTTP connection that&amp;#39;s still open but has been idle (e.g., the HTTP/REST connection has been idle because there&amp;#39;s been no topology change).  This leads to the second haproxy config workaround suggestion...&lt;br/&gt;
&lt;br/&gt;
= increase haproxy&amp;#39;s timeouts &lt;br/&gt;
&lt;br/&gt;
XXX is currently using 5-minute timeouts (in milliseconds)...&lt;br/&gt;
&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;timeout client 300000&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;timeout server 300000&lt;br/&gt;
&lt;br/&gt;
So, after 5 minutes of inactivity, haproxy times out the connection and closes it.  moxi sees the closed HTTP/REST connection and reconnects.  Because haproxy is in &amp;#39;balance roundrobin&amp;#39; configuration, it picks the next server node on its list, and that node returns a slightly different topology answer.  moxi, because it does a simple string comparison, then mistakenly concludes that the topology configuration has changed (when it actually hasn&amp;#39;t), which triggers the leak.&lt;br/&gt;
&lt;br/&gt;
This was with haproxy 1.4.20.&lt;br/&gt;
&lt;br/&gt;
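A minimal sketch of both workarounds applied to the haproxy configuration above, showing only the lines that change.  The timeout value of 3600000 (1 hour, in milliseconds) is an illustrative assumption, not a tested recommendation.  This only reduces how often moxi sees a spurious topology change; a real topology change still exposes the leak.&lt;br/&gt;
&lt;br/&gt;
-----------------&lt;br/&gt;
defaults&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# assumed value (1 hour, in millisecs); not taken from the original report&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;timeout client 3600000&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;timeout server 3600000&lt;br/&gt;
&lt;br/&gt;
backend moxi&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# replaces &amp;#39;balance roundrobin&amp;#39;; the server lines stay as they are&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;balance source&lt;br/&gt;
-----------------&lt;br/&gt;
&lt;br/&gt;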
</description>
                <environment></environment>
            <key id="16302">MB-4896</key>
            <summary>moxi mem leak when using haproxy roundrobin load-balancing</summary>
                <type id="1" iconUrl="http://www.couchbase.com/issues/images/icons/issuetypes/bug.png">Bug</type>
                                <priority id="2" iconUrl="http://www.couchbase.com/issues/images/icons/priorities/critical.png">Critical</priority>
                    <status id="5" iconUrl="http://www.couchbase.com/issues/images/icons/statuses/resolved.png">Resolved</status>
                    <resolution id="1">Fixed</resolution>
                    <security id="10011">Public</security>
                        <assignee username="steve">Steve Yen</assignee>
                                <reporter username="steve">Steve Yen</reporter>
                        <labels>
                        <label>1.7.0-release-notes</label>
                        <label>1.7.1-release-notes</label>
                        <label>1.7.2-release-notes</label>
                        <label>1.8.0-release-notes</label>
                        <label>1.8.1-release-notes</label>
                    </labels>
                <created>Wed, 14 Mar 2012 19:31:03 -0500</created>
                <updated>Sun, 13 May 2012 18:39:08 -0500</updated>
                    <resolved>Wed, 14 Mar 2012 19:36:31 -0500</resolved>
                            <version>1.6.5</version>
                <version>1.7.2</version>
                                <fixVersion>1.8.1</fixVersion>
                <fixVersion>1.8.2</fixVersion>
                                <component>moxi</component>
                                <votes>0</votes>
                        <watches>0</watches>
                                                    <comments>
                    <comment id="24916" author="steve" created="Wed, 14 Mar 2012 19:36:31 -0500"  >&lt;a href=&quot;http://review.couchbase.org/#change,13939&quot;&gt;http://review.couchbase.org/#change,13939&lt;/a&gt;</comment>
                    <comment id="24917" author="steve" created="Wed, 14 Mar 2012 19:38:29 -0500"  >Also, the valgrind memleak report was...&lt;br/&gt;
&lt;br/&gt;
==23125== 280,896 bytes in 4 blocks are definitely lost in loss record 172 of 173&lt;br/&gt;
==23125==    at 0x4C25A28: calloc (vg_replace_malloc.c:467)&lt;br/&gt;
==23125==    by 0x424855: cproxy_copy_behaviors (cproxy_config.c:780)&lt;br/&gt;
==23125==    by 0x41C326: cproxy_create_downstream (cproxy.c:1270)&lt;br/&gt;
==23125==    by 0x41B22A: cproxy_add_downstream (cproxy.c:882)&lt;br/&gt;
==23125==    by 0x41B304: cproxy_reserve_downstream (cproxy.c:908)&lt;br/&gt;
==23125==    by 0x41DB61: cproxy_assign_downstream (cproxy.c:1784)&lt;br/&gt;
==23125==    by 0x41EDBF: cproxy_pause_upstream_for_downstream (cproxy.c:2222)&lt;br/&gt;
==23125==    by 0x43098A: cproxy_process_upstream_binary_nread (cproxy_protocol_b.c:188)&lt;br/&gt;
==23125==    by 0x40911C: complete_nread (memcached.c:1990)&lt;br/&gt;
==23125==    by 0x40F1E2: drive_machine (memcached.c:3576)&lt;br/&gt;
==23125==    by 0x40FD4C: event_handler (memcached.c:3799)&lt;br/&gt;
==23125==    by 0x4E3DDF8: event_base_loop (in /opt/couchbase/lib/libevent-2.0.so.5.1.0)&lt;br/&gt;
</comment>
                    <comment id="24919" author="farshid" created="Wed, 14 Mar 2012 21:13:43 -0500"  >We need to update our 1.7 and 1.8 release notes with this information.</comment>
                </comments>
                    <attachments>
                </attachments>
            <subtasks>
        </subtasks>
                <customfields>
                                                                        <customfield id="customfield_10180" key="com.atlassian.jira.ext.charting:firstresponsedate">
                <customfieldname>Date of First Response</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>Wed, 14 Mar 2012 21:13:43 -0500</customfieldvalue>

                </customfieldvalues>
            </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10081" key="com.pyxis.greenhopper.jira:gh-global-rank">
                <customfieldname>Rank</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>5570</customfieldvalue>
                </customfieldvalues>
            </customfield>
                                                                                                                                                                                        <customfield id="customfield_10181" key="com.atlassian.jira.ext.charting:timeinstatus">
                <customfieldname>Time In Status</customfieldname>
                <customfieldvalues>
                    
                </customfieldvalues>
            </customfield>
                                                </customfields>
    </item>
</channel>
</rss>