[PCBC-140] Connection Socket Not Closing Created: 04/Nov/12  Updated: 07/Nov/12  Resolved: 07/Nov/12

Status: Resolved
Project: Couchbase PHP client library
Component/s: library
Affects Version/s: 1.1.0-dp5
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Jeff Minard Assignee: Mark Nunberg
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 5.5, Apache 2.2, mod_php, PHP5.3, Couchbase 2.0dp5


 Description   
(See discussion @ https://groups.google.com/forum/#!topic/couchbase/z5PWM5lS7gU)

I've been getting a "Too many files open" warning while accessing pages in my site. I tracked it down to this process using LSOF

    httpd 25992 httpduser 98u unix 0xffff81006cbcf4c0 0t0 6019192 socket
    httpd 25992 httpduser 99u unix 0xffff810074b160c0 0t0 6019193 socket
    httpd 25992 httpduser 100r 0000 0,11 0 6063613 eventpoll
    httpd 25992 httpduser 101u unix 0xffff8100701f66c0 0t0 6063614 socket
    httpd 25992 httpduser 102u unix 0xffff8100701f7a00 0t0 6063615 socket
    httpd 25992 httpduser 103r 0000 0,11 0 6064737 eventpoll
    httpd 25992 httpduser 104u unix 0xffff8100666eb6c0 0t0 6064738 socket
    httpd 25992 httpduser 105u unix 0xffff8100666eb980 0t0 6064739 socket
    httpd 25992 httpduser 108r 0000 0,11 0 6064793 eventpoll

Everytime I refresh a page, two new sockets and one new eventpoll shows up. These socket/eventpoll's only show up on pages that make use of Coucbase -- in fact, in the email thread I narrowed it down to a single command page, so I'm pretty sure it's CB here. These FD's stick around until the apache thread is restarted (due to MaxRequestsPerChild limit reached) or an httpd restart. Eventually, the number of FD's opened causes things to go very poorly for the server.

Utilizing persistent connections lessened the problem, but the number never went down/static, just increased more slowly.

 Comments   
Comment by Mark Nunberg [ 04/Nov/12 ]
I can confirm this is happening even on a CLI script. Run this script (php 5.4, debian wheezy), and lsof -p on the php process.

This seems to be an issue in libcouchbase itself and not in the php code.

Upon deeper analysis, the leaking fds seem to come from some odd socketpair call that libevent makes for no apparently good reason.

It would seem libcouchbase never properly destroys the 'event_base' object, and the socket pairs keep persisting until the application exits.

(gdb) bt
#0 socketpair () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007ffff3568181 in evsig_init () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#2 0x00007ffff35677c0 in ?? () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#3 0x00007ffff3556645 in event_base_new_with_config () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#4 0x00007ffff3556765 in event_base_new () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#5 0x00007ffff378b605 in lcb_create_libevent_io_opts ()
   from /sources/libcouchbase/inst/lib/libcouchbase_libevent.so.2
#6 0x00007ffff3bb9dcc in lcb_create_io_ops () from /sources/libcouchbase/inst/lib/libcouchbase.so.2
#7 0x00007ffff3dd463d in php_couchbase_create_impl (ht=<optimized out>, return_value=0x7ffff7fbf078,
    this_ptr=0x7ffff7fbd2e8, oo=1,
    return_value_used=<error reading variable: Unhandled dwarf expression opcode 0xfa>,
    return_value_ptr=<error reading variable: Unhandled dwarf expression opcode 0xfa>)
    at /sources/lcb2.0-repo/php/couchbase.c:1708
#8 0x00007ffff3ff2edc in xdebug_execute_internal (current_execute_data=0x7ffff7f86060, return_value_used=0)
    at /srv/debian_developer/xdebug/xdebug-2.2.1/build-php5/xdebug.c:1483
#9 0x000000000074688e in zend_do_fcall_common_helper_SPEC (execute_data=0x7ffff7f86060)
    at /tmp/buildd/php5-5.4.4/Zend/zend_vm_execute.h:644
#10 0x0000000000700297 in execute (op_array=0x7ffff7fbdbc8) at /tmp/buildd/php5-5.4.4/Zend/zend_vm_execute.h:410
#11 0x00007ffff3ff2a81 in xdebug_execute (op_array=0x7ffff7fbdbc8)
    at /srv/debian_developer/xdebug/xdebug-2.2.1/build-php5/xdebug.c:1391
#12 0x00000000006a00de in zend_execute_scripts (type=8, retval=0x7ffff7fbdb90, file_count=3)
    at /tmp/buildd/php5-5.4.4/Zend/zend.c:1279
#13 0x000000000063f6b3 in php_execute_script (primary_file=0x7fffffff9f50)
    at /tmp/buildd/php5-5.4.4/main/main.c:2473
#14 0x0000000000749003 in do_cli (argc=0, argv=0x7fffffffe40c) at /tmp/buildd/php5-5.4.4/sapi/cli/php_cli.c:988
#15 0x00000000004310ca in main (argc=32767, argv=0xdb9210) at /tmp/buildd/php5-5.4.4/sapi/cli/php_cli.c:1361
Comment by Mark Nunberg [ 04/Nov/12 ]
I was right about the problem, but not about the real cause.

The real cause is that the php extension creates the iops structure manually, and does not manually clean it up. Therefore the explicit creation of the iops structure causes a memory leak as well as a fd leak.

Ideally our extension code should never be dealing with the iops structure directly, which will be done in subsequent refactors.

This was not the case in the 1.0.x versions of libcouchbase where lcb would just assume the iops structure needed to be destroyed and freed it.
Comment by Mark Nunberg [ 04/Nov/12 ]
http://review.couchbase.org/22255

You can apply this patch manually if you'd like as a temporary workaround.
Comment by Mark Nunberg [ 04/Nov/12 ]
For the record, in output like this:

    httpd 25992 httpduser 105u unix 0xffff8100666eb980 0t0 6064739 socket

The fifth column indicates the underlying fd type. In this case, it's a unix socket.

If it were a TCP socket it'd say "IPv4" (or "IPv6") - and that gave me the hint, as libcouchbase on its own does not utilize unix domain sockets.
Comment by Matt Ingenthron [ 04/Nov/12 ]
Does a similar fix need to be made in 1.0.x? I peeked at the code briefly, but enough has changed that I couldn't quite tell.
Comment by Mark Nunberg [ 05/Nov/12 ]
No, this issue is specific to the lcb 2.0 iops refactor
Comment by Matt Ingenthron [ 05/Nov/12 ]
Good news, thanks!
Comment by Mark Nunberg [ 07/Nov/12 ]
I'm marking this as resolved/fixed. Reopen if not working.
Generated at Fri Aug 22 00:27:13 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.