[MB-4896] moxi mem leak when using haproxy roundrobin load-balancing Created: 14/Mar/12  Updated: 31/Jan/14  Resolved: 14/Mar/12

Status: Resolved
Project: Couchbase Server
Component/s: moxi
Affects Version/s: 1.6.5, 1.7.2
Fix Version/s: 2.0-beta
Security Level: Public

Type: Bug Priority: Critical
Reporter: Steve Yen Assignee: Steve Yen
Resolution: Fixed Votes: 0
Labels: 1.7.0-release-notes, 1.7.1-release-notes, 1.7.2-release-notes, 1.8.0-release-notes, 1.8.1-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

I can't find an existing bug that matches this exactly (other bugs also mention a moxi memory leak, and might be the same issue, but they don't mention haproxy).

I reproduced this customer-reported issue, and there's a quick config workaround that can slow the leak.

More info:

After spinning up a 20-node cluster with haproxy, valgrind, and a special debug build of moxi, using a configuration similar to XXX's, I was able to reproduce a significant memory leak in moxi. It occurs during topology changes, or when moxi *thinks* there's a cluster topology change. Other customers probably never noticed, since topology changes are usually infrequent.

Additionally, XXX's use of haproxy in a roundrobin load-balancing configuration significantly exacerbated the bug/leak in moxi. (I recall Tim had another report of a moxi mem leak from another customer. Perhaps they're also using haproxy?)

Here's XXX's haproxy configuration...

global
    log local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy

defaults
    mode http
    log global
    option dontlognull
    option httpclose
    option httplog
    option forwardfor
    timeout connect 10000
    timeout client 300000
    timeout server 300000
    maxconn 60000
    retries 3
    stats enable
    stats uri /haproxy-status
    stats refresh 5s

frontend moxi *:8092
    default_backend moxi

backend moxi
    balance roundrobin
    server node1 check
    server node2 check
    server node3 check
    server node4 check
    server node5 check
    server node6 check
    server node7 check
    server node8 check
    server node9 check
    server node10 check
    server node11 check
    server node12 check
    server node13 check
    server node14 check
    server node15 check
    server node16 check
    server node17 check
    server node18 check

The workaround to reduce the leak includes...

= change from haproxy's 'balance roundrobin' to some other load balancing choice.

For example, when I used 'balance source' instead of 'balance roundrobin' in my haproxy configuration, the leak went away. (Caveat: it went away only until I did an actual, real topology change.)

The underlying issue is that moxi does a simple string comparison to decide whether the topology has changed, and every node in a cluster gives a slightly different answer as to the topology. When moxi thinks the topology has changed, it tears down its data structures and dynamically reconfigures, and there's a leak somewhere in that path.
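To illustrate why a raw string comparison misfires, here is a minimal Python sketch. It is not moxi's code (moxi is written in C) and not the actual fix. The JSON shape, the `hostname` and `thisNode` fields, and all function names are hypothetical, chosen only to model "the same topology, described slightly differently by each node".

```python
import json


def topology_changed_naive(old_raw: str, new_raw: str) -> bool:
    """Model of the problematic check: any textual difference counts as a change."""
    return old_raw != new_raw


def canonical_topology(raw: str) -> frozenset:
    """Reduce a (hypothetical) topology answer to the set of member hostnames."""
    doc = json.loads(raw)
    return frozenset(node["hostname"] for node in doc["nodes"])


def topology_changed_normalized(old_raw: str, new_raw: str) -> bool:
    """Report a change only when cluster membership actually differs."""
    return canonical_topology(old_raw) != canonical_topology(new_raw)


# Hypothetical answers: same two-node cluster, seen from node1 and from node2.
from_node1 = json.dumps(
    {"nodes": [{"hostname": "node1", "thisNode": True}, {"hostname": "node2"}]}
)
from_node2 = json.dumps(
    {"nodes": [{"hostname": "node1"}, {"hostname": "node2", "thisNode": True}]}
)
# A real change: node3 has joined.
after_add = json.dumps(
    {"nodes": [{"hostname": "node1"}, {"hostname": "node2"}, {"hostname": "node3"}]}
)

print(topology_changed_naive(from_node1, from_node2))       # spurious change
print(topology_changed_normalized(from_node1, from_node2))  # no change
print(topology_changed_normalized(from_node1, after_add))   # real change
```

Comparing a normalized form is only one possible way to avoid the false positives. It would not fix the leak itself, which would still be hit on real topology changes.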

Normally, moxi expects its HTTP/REST connection to be very long-lived. However, when haproxy is in the middle, haproxy might decide to time out an HTTP connection that's still open but hasn't been doing anything (e.g., the HTTP/REST connection has been idle because there's been no topology change). This leads to the second haproxy config workaround suggestion...

= increase haproxy's timeouts

XXX is currently using 5-minute timeouts (in milliseconds)...

      timeout client 300000
      timeout server 300000

So, every 5 minutes, haproxy times out the connection and closes it. moxi sees the closed HTTP/REST connection and tries again. haproxy will choose the next server node on its list (since haproxy is in 'balance roundrobin' configuration). That next server node will return a slightly different topology answer. Then moxi (because it's doing simple string comparison) will inadvertently think the topology configuration has changed (when it actually hasn't), exposing the leak.
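The reconnect cycle described above can be sketched as a small Python simulation. It rests on several assumptions: the connection is idle and times out every 5 minutes without fail, each of the 18 backend nodes returns a distinct answer string, and the names (`count_spurious_reconfigs`, `roundrobin_picks`, `sticky_picks`) are hypothetical helpers, not part of moxi or haproxy. The sticky case stands in for 'balance source', where a given client keeps landing on the same server as long as the server set is stable.

```python
from itertools import cycle

NODES = [f"node{i}" for i in range(1, 19)]  # the 18 backend servers

# Assumption: every node describes the same topology with a different string.
ANSWERS = {node: f"topology-as-seen-by-{node}" for node in NODES}

# One reconnect per 5-minute idle timeout (300000 ms).
RECONNECTS_PER_DAY = 24 * 60 // 5


def roundrobin_picks(count):
    """Server chosen on each reconnect under 'balance roundrobin'."""
    rr = cycle(NODES)
    return [next(rr) for _ in range(count)]


def sticky_picks(count):
    """Server chosen on each reconnect when the client always hits one node."""
    return [NODES[0]] * count


def count_spurious_reconfigs(answers, picks):
    """Count how often a plain string compare sees a 'change' between reconnects."""
    last = None
    changes = 0
    for node in picks:
        answer = answers[node]
        if last is not None and answer != last:
            changes += 1
        last = answer
    return changes


print(RECONNECTS_PER_DAY)
print(count_spurious_reconfigs(ANSWERS, roundrobin_picks(RECONNECTS_PER_DAY)))
print(count_spurious_reconfigs(ANSWERS, sticky_picks(RECONNECTS_PER_DAY)))
```

Under these assumptions, every reconnect after the first one triggers a reconfiguration with roundrobin and none do with a sticky choice. Longer timeouts reduce the number of reconnects rather than the chance that each one looks like a change.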

This was with haproxy 1.4.20.

Comment by Steve Yen [ 14/Mar/12 ]
Also, the valgrind memleak report was...

==23125== 280,896 bytes in 4 blocks are definitely lost in loss record 172 of 173
==23125== at 0x4C25A28: calloc (vg_replace_malloc.c:467)
==23125== by 0x424855: cproxy_copy_behaviors (cproxy_config.c:780)
==23125== by 0x41C326: cproxy_create_downstream (cproxy.c:1270)
==23125== by 0x41B22A: cproxy_add_downstream (cproxy.c:882)
==23125== by 0x41B304: cproxy_reserve_downstream (cproxy.c:908)
==23125== by 0x41DB61: cproxy_assign_downstream (cproxy.c:1784)
==23125== by 0x41EDBF: cproxy_pause_upstream_for_downstream (cproxy.c:2222)
==23125== by 0x43098A: cproxy_process_upstream_binary_nread (cproxy_protocol_b.c:188)
==23125== by 0x40911C: complete_nread (memcached.c:1990)
==23125== by 0x40F1E2: drive_machine (memcached.c:3576)
==23125== by 0x40FD4C: event_handler (memcached.c:3799)
==23125== by 0x4E3DDF8: event_base_loop (in /opt/couchbase/lib/libevent-2.0.so.5.1.0)
Comment by Farshid Ghods (Inactive) [ 14/Mar/12 ]
We need to update our 1.7 and 1.8 release notes with this information