[MB-5081] Stop/Start rebalance resulted in memcached crash on 181 (bucket_engine_release_cookie at bucket_engine.c:2388 __assert_fail ()) Created: 13/Apr/12  Updated: 18/Jun/13  Resolved: 17/May/12

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 1.8.1-release-candidate
Fix Version/s: 1.8.1, 2.0-beta
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Karan Kumar (Inactive) Assignee: Farshid Ghods (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File core-10.2.2.108-1.log    

 Description   
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 6, Aborted.
#0 0x0000003171230265 in raise () from /lib64/libc.so.6

Thread 32 (Thread 0x2ae3fbdb2ca0 (LWP 30194)):
#0 0x00000031712d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002ae3fb931c28 in epoll_dispatch (base=0x10d1c000,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002ae3fb920a4c in event_base_loop (base=0x10d1c000, flags=0)
    at event.c:1558
#3 0x0000000000409df3 in main (argc=<value optimized out>,
    argv=<value optimized out>) at daemon/memcached.c:7586

Thread 31 (Thread 30203):
#0 0x00000031712c678b in read () from /lib64/libc.so.6
#1 0x000000317126cd57 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x000000317126d71e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x0000003171268fdb in getc () from /lib64/libc.so.6
#4 0x00002ae3fbdb4875 in check_stdin_thread (arg=0x403500)
    at extensions/daemon/stdin_check.c:19
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 30 (Thread 30204):
#0 0x000000317129a541 in nanosleep () from /lib64/libc.so.6
#1 0x000000317129a364 in sleep () from /lib64/libc.so.6
#2 0x0000000000415918 in check_isasl_db_thread (arg=<value optimized out>)
    at daemon/isasl.c:233
#3 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 29 (Thread 30205):
#0 0x00000031712d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002ae3fb931c28 in epoll_dispatch (base=0x10d1c500,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002ae3fb920a4c in event_base_loop (base=0x10d1c500, flags=0)
    at event.c:1558
#3 0x00000000004139a4 in worker_libevent (arg=0x10cf4500)
    at daemon/thread.c:305
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 28 (Thread 30206):
#0 0x0000003171e0d524 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003171e08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x0000003171e08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaaaad40ac1 in acquire (this=0x13c7a000, it=0x2aaaaba71f00)
    at mutex.hh:70
#4 lock (this=0x13c7a000, it=0x2aaaaba71f00) at locks.hh:48
#5 LockHolder (this=0x13c7a000, it=0x2aaaaba71f00) at locks.hh:26
#6 notify (this=0x13c7a000, it=0x2aaaaba71f00) at tapconnmap.hh:191
#7 EventuallyPersistentEngine::addMutationEvent (this=0x13c7a000,
    it=0x2aaaaba71f00) at ep_engine.h:661
#8 0x00002aaaaad301b7 in EventuallyPersistentEngine::tapNotify(const void *, void *, uint16_t, uint8_t, uint16_t, ._107, uint32_t, const void *, size_t, uint32_t, uint32_t, uint64_t, const void *, size_t, uint16_t) (this=0x13c7a000,
    cookie=0x11cc1b88, engine_specific=<value optimized out>,
    nengine=<value optimized out>, tap_flags=<value optimized out>,
    tap_event=TAP_MUTATION, tap_seqno=1256, key=0x1e47f028, nkey=45, flags=0,
    exptime=0, data=0x1e47f055, ndata=2213, vbucket=22) at ep_engine.cc:2344
#9 0x00002aaaaaaad47f in bucket_tap_notify (handle=<value optimized out>,
    cookie=0x11cc1b88, engine_specific=0x1e47f020, nengine=0, ttl=254 '\376',
    tap_flags=0, tap_event=TAP_MUTATION, tap_seqno=1256, key=0x1e47f028,
    nkey=45, flags=0, exptime=0, cas=390443, data=0x1e47f055, ndata=2213,
    vbucket=22) at bucket_engine.c:1819
#10 0x00000000004068d9 in process_bin_tap_packet (event=<value optimized out>,
    c=0x11cc1b88) at daemon/memcached.c:2721
#11 0x00000000004110f5 in complete_nread_binary (c=0x13c7a5f0)
    at daemon/memcached.c:3407
#12 0x0000000000411d91 in complete_nread (c=0x11cc1b88)
    at daemon/memcached.c:3489
#13 conn_nread (c=0x11cc1b88) at daemon/memcached.c:5309
#14 0x0000000000407454 in event_handler (fd=<value optimized out>,
    which=<value optimized out>, arg=0x11cc1b88) at daemon/memcached.c:5618
#15 0x00002ae3fb920df9 in event_process_active_single_queue (base=0x10d1c280,
    flags=0) at event.c:1308
#16 event_process_active (base=0x10d1c280, flags=0) at event.c:1375
#17 event_base_loop (base=0x10d1c280, flags=0) at event.c:1572
#18 0x00000000004139a4 in worker_libevent (arg=0x10cf4600)
    at daemon/thread.c:305
#19 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#20 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 27 (Thread 30207):
#0 0x00000031712d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002ae3fb931c28 in epoll_dispatch (base=0x10d1cc80,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002ae3fb920a4c in event_base_loop (base=0x10d1cc80, flags=0)
    at event.c:1558
#3 0x00000000004139a4 in worker_libevent (arg=0x10cf4700)
    at daemon/thread.c:305
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 26 (Thread 30208):
#0 0x00000031712d48a8 in epoll_wait () from /lib64/libc.so.6
#1 0x00002ae3fb931c28 in epoll_dispatch (base=0x10d1ca00,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002ae3fb920a4c in event_base_loop (base=0x10d1ca00, flags=0)
    at event.c:1558
#3 0x00000000004139a4 in worker_libevent (arg=0x10cf4800)
    at daemon/thread.c:305
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 25 (Thread 30209):
#0 0x00002aaaaad60612 in TapProducer::windowIsFull (this=0x224c6700)
    at tapconnection.cc:402
#1 0x00002aaaaad45c34 in EventuallyPersistentEngine::doWalkTapQueue (
    this=0x13c7a000, cookie=0x1221a848, itm=0x44d107a8, es=0x44d107a0,
    nes=0x44d107bc, ttl=<value optimized out>, flags=0x44d107ba,
    seqno=0x44d107b4, vbucket=0x44d107b8, connection=0x224c6700,
    retry=@0x44d10677) at ep_engine.cc:1810
#2 0x00002aaaaad356b1 in EventuallyPersistentEngine::walkTapQueue (
    this=0x13c7a000, cookie=0x1221a848, itm=0x44d107a8, es=0x44d107a0,
    nes=0x44d107bc, ttl=0x44d107bf "\377", flags=0x44d107ba, seqno=0x44d107b4,
    vbucket=0x44d107b8) at ep_engine.cc:2062
#3 0x00002aaaaaaad336 in bucket_tap_iterator_shim (handle=0x2aaaaacb26c0,
    cookie=0x1221a848, itm=0x44d107a8, engine_specific=0x44d107a0,
    nengine_specific=0x44d107bc, ttl=0x44d107bf "\377", flags=0x44d107ba,
    seqno=0x44d107b4, vbucket=0x44d107b8) at bucket_engine.c:1848
#4 0x000000000040ffc2 in ship_tap_log (c=0x1221a848)
    at daemon/memcached.c:2395
#5 conn_ship_log (c=0x1221a848) at daemon/memcached.c:5156
#6 0x0000000000407446 in event_handler (fd=<value optimized out>,
    which=<value optimized out>, arg=0x1221a848) at daemon/memcached.c:5618
#7 0x00002ae3fb920df9 in event_process_active_single_queue (base=0x10d1c780,
    flags=0) at event.c:1308
#8 event_process_active (base=0x10d1c780, flags=0) at event.c:1375
#9 event_base_loop (base=0x10d1c780, flags=0) at event.c:1572
#10 0x00000000004139a4 in worker_libevent (arg=0x10cf4900)
    at daemon/thread.c:305
#11 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#12 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 24 (Thread 30471):
#0 0x0000003171e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad07ad8 in wait (this=0x10d368c0) at syncobject.hh:31
#2 Dispatcher::run (this=0x10d368c0) at dispatcher.cc:89
#3 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x10d36914)
    at dispatcher.cc:28
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 23 (Thread 30472):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x28c02400, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x28c02400, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x10d36fc0)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x10d37014)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 22 (Thread 30473):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x112f3900, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x112f3900, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x10d36c40)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x10d36c94)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 21 (Thread 30474):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad2706e in wait (this=0x3c081000) at syncobject.hh:42
#2 wait (this=0x3c081000) at tapconnmap.hh:199
#3 EventuallyPersistentEngine::notifyPendingConnections (this=0x3c081000)
    at ep_engine.cc:3708
#4 0x00002aaaaad271d1 in EvpNotifyPendingConns (arg=0x3c081000)
    at ep_engine.cc:971
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 20 (Thread 30475):
#0 0x0000003171e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad07ad8 in wait (this=0x11544700) at syncobject.hh:31
#2 Dispatcher::run (this=0x11544700) at dispatcher.cc:89
#3 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x11544754)
    at dispatcher.cc:28
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 19 (Thread 30476):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x28c03480, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x28c03480, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x11545180)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x115451d4)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 18 (Thread 30477):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x28c02a80, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x28c02a80, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x115448c0)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x11544914)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 17 (Thread 30478):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad2706e in wait (this=0x13c7b000) at syncobject.hh:42
#2 wait (this=0x13c7b000) at tapconnmap.hh:199
#3 EventuallyPersistentEngine::notifyPendingConnections (this=0x13c7b000)
    at ep_engine.cc:3708
#4 0x00002aaaaad271d1 in EvpNotifyPendingConns (arg=0x13c7b000)
    at ep_engine.cc:971
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 16 (Thread 30479):
#0 0x0000003171e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad07ad8 in wait (this=0x11544a80) at syncobject.hh:31
#2 Dispatcher::run (this=0x11544a80) at dispatcher.cc:89
#3 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x11544ad4)
    at dispatcher.cc:28
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 15 (Thread 30480):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x28c02500, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x28c02500, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x11545500)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x11545554)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 14 (Thread 30481):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x28c03180, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x28c03180, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x11544e00)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x11544e54)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 13 (Thread 30482):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad2706e in wait (this=0x2aaaad651000) at syncobject.hh:42
#2 wait (this=0x2aaaad651000) at tapconnmap.hh:199
#3 EventuallyPersistentEngine::notifyPendingConnections (this=0x2aaaad651000)
    at ep_engine.cc:3708
#4 0x00002aaaaad271d1 in EvpNotifyPendingConns (arg=0x2aaaad651000)
    at ep_engine.cc:971
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 12 (Thread 30483):
#0 0x0000003171e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad07ad8 in wait (this=0x19a8ea80) at syncobject.hh:31
#2 Dispatcher::run (this=0x19a8ea80) at dispatcher.cc:89
#3 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x19a8ead4)
    at dispatcher.cc:28
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 11 (Thread 30484):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x3034c400, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x3034c400, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x11545dc0)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x11545e14)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 10 (Thread 30485):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x3034db80, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x3034db80, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x19a8e8c0)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x19a8e914)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 9 (Thread 30486):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad2706e in wait (this=0x2aaaad64e000) at syncobject.hh:42
#2 wait (this=0x2aaaad64e000) at tapconnmap.hh:199
#3 EventuallyPersistentEngine::notifyPendingConnections (this=0x2aaaad64e000)
    at ep_engine.cc:3708
#4 0x00002aaaaad271d1 in EvpNotifyPendingConns (arg=0x2aaaad64e000)
    at ep_engine.cc:971
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 8 (Thread 30487):
#0 0x0000003171e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad07ad8 in wait (this=0x10d361c0) at syncobject.hh:31
#2 Dispatcher::run (this=0x10d361c0) at dispatcher.cc:89
#3 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x10d36214)
    at dispatcher.cc:28
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 7 (Thread 30488):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x112f2a80, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x112f2a80, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x10d376c0)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x10d37714)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 6 (Thread 30489):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x112f3680, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x112f3680, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x19a8f6c0)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x19a8f714)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 5 (Thread 30490):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad2706e in wait (this=0x3c080000) at syncobject.hh:42
#2 wait (this=0x3c080000) at tapconnmap.hh:199
#3 EventuallyPersistentEngine::notifyPendingConnections (this=0x3c080000)
    at ep_engine.cc:3708
#4 0x00002aaaaad271d1 in EvpNotifyPendingConns (arg=0x3c080000)
    at ep_engine.cc:971
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 4 (Thread 30491):
#0 0x0000003171e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad07ad8 in wait (this=0x19a8fc00) at syncobject.hh:31
#2 Dispatcher::run (this=0x19a8fc00) at dispatcher.cc:89
#3 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x19a8fc54)
    at dispatcher.cc:28
#4 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 3 (Thread 30492):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x32443880, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x32443880, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x19a8fdc0)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x19a8fe14)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 2 (Thread 30493):
#0 0x0000003171e0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaad059b0 in wait (this=0x32442b80, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x32442b80, d=...) at dispatcher.cc:286
#3 0x00002aaaaad07ca6 in Dispatcher::run (this=0x19a8fa40)
    at dispatcher.cc:123
#4 0x00002aaaaad0854b in launch_dispatcher_thread (arg=0x19a8fa94)
    at dispatcher.cc:28
#5 0x0000003171e0673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000031712d44bd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x53d29940 (LWP 30494)):
#0 0x0000003171230265 in raise () from /lib64/libc.so.6
#1 0x0000003171231d10 in abort () from /lib64/libc.so.6
#2 0x00000031712296e6 in __assert_fail () from /lib64/libc.so.6
#3 0x00002aaaaaaae280 in bucket_engine_release_cookie (cookie=0x11a04848)
    at bucket_engine.c:2388
#4 0x00002aaaaad606b9 in TapConnection::releaseReference (this=0x224c6a80,
    force=30) at tapconnection.cc:38
#5 0x00002aaaaad73929 in TapConnMap::notifyIOThreadMain (this=0x13c7a5e8)
    at tapconnmap.cc:495
#6 0x00002aaaaad26f99 in EventuallyPersistentEngine::notifyPendingConnections
    (this=0x13c7a000) at ep_engine.cc:3701
#7 0x00002aaaaad271d1 in EvpNotifyPendingConns (arg=0x13c7a000)
    at ep_engine.cc:971
#8 0x0000003171e0673d in start_thread () from /lib64/libpt

 Comments   
Comment by Chiyoung Seo [ 19/Apr/12 ]
Trond,

I've looked at the ep-engine and bucket-engine, but it seems to me that this crash could be more related to bucket-engine. Please take a look at it when you have time.
Comment by Trond Norbye [ 24/Apr/12 ]
This looks like the race conditions we've seen many times in bucket engine. Dustin rewrote that logic for 1.8.2 and 2.0...

Frank: what to do here?
Comment by Keith Batten (Inactive) [ 25/Apr/12 ]
Seen again while doing 4-node DGM rebalance tests.
Comment by Trond Norbye [ 25/Apr/12 ]
This is a race condition triggered by a disconnect from the upstream while the connection is also disconnecting from the downstream. I'm currently adding more tests to bucket engine to try to isolate the problem (whether it's bucket engine or ep-engine doing something wrong).
Comment by Trond Norbye [ 25/Apr/12 ]
There are a number of bugs currently open due to a race condition caused during abnormal tap shutdown. See http://www.couchbase.com/issues/browse/CBD-82
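One plausible way an assertion in a release path can fire during this kind of shutdown race is an unmatched release: a reference is dropped when no reservation is outstanding. The ticket does not include the bucket_engine.c source, so the C sketch below is only an illustration under assumptions. The names `demo_cookie`, `demo_reserve`, and `demo_release` are hypothetical, and the checked decrement shown is a generic C11 atomics technique, not the fix that was merged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-connection state (not the real struct). */
typedef struct {
    atomic_int refcount; /* number of outstanding reservations */
} demo_cookie;

void demo_cookie_init(demo_cookie *c) {
    assert(c != NULL);
    atomic_init(&c->refcount, 0);
}

/* Take one reservation on the cookie. */
void demo_reserve(demo_cookie *c) {
    atomic_fetch_add(&c->refcount, 1);
}

/*
 * Drop one reservation. The compare-exchange loop refuses to take the
 * count below zero, so an unmatched release is reported to the caller
 * (returns false) instead of corrupting the count.
 */
bool demo_release(demo_cookie *c) {
    int seen = atomic_load(&c->refcount);
    while (seen > 0) {
        /* On failure, 'seen' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&c->refcount, &seen, seen - 1)) {
            return true;
        }
    }
    return false; /* nothing was reserved: the case an assert would catch */
}
```

In this sketch a second `demo_release` after a single `demo_reserve` returns false rather than aborting, which makes the unmatched call visible to the caller and to tests.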
Comment by Thuan Nguyen [ 28/Apr/12 ]
Integrated in github-bucket-engine #49 (See [http://qa.hq.northscale.net/job/github-bucket-engine/49/])
    MB-5081 Correct the assertion predicate on reserve_cookie() (Revision 67baa91ee58fc5a302544be9a9f83132d8de664d)

     Result = SUCCESS
Chiyoung Seo :
Files :
* bucket_engine.c
Comment by Tommie McAfee [ 30/Apr/12 ]
Hi Trond, this fix is also needed for the latest 2.0.
Comment by Chiyoung Seo [ 30/Apr/12 ]
We already merged that fix into the bucket engine's master branch, but it looks like we still have the same crash issue in 2.0.
Comment by Tommie McAfee [ 01/May/12 ]
Attached is the stack trace from 2.0 (core-10.2.2.108-1.log).
Comment by Karan Kumar (Inactive) [ 07/May/12 ]
Still seeing the crash on 181-802-rel (not easily reproduced).

Seems to be the exact same stack trace.

Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 6, Aborted.
#0 0x00007f21c4cbfa75 in raise () from /lib/libc.so.6
(gdb) t a a bt

Thread 13 (Thread 3986):
#0 0x00007f21c4d72d03 in epoll_wait () from /lib/libc.so.6
#1 0x00007f21c5b285a6 in epoll_dispatch (base=0xf30500, tv=<value optimized out>) at epoll.c:404
#2 0x00007f21c5b17171 in event_base_loop (base=0xf30500, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414b04 in worker_libevent (arg=0xf08500) at daemon/thread.c:305
#4 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 12 (Thread 3984):
#0 0x00007f21c4d644bd in read () from /lib/libc.so.6
#1 0x00007f21c4cff348 in _IO_file_underflow () from /lib/libc.so.6
#2 0x00007f21c4d00eee in _IO_default_uflow () from /lib/libc.so.6
#3 0x00007f21c4cf7c7b in getc () from /lib/libc.so.6
#4 0x00007f21c455f9c9 in check_stdin_thread (arg=0x4039f0) at extensions/daemon/stdin_check.c:19
#5 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#6 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#7 0x0000000000000000 in ?? ()

Thread 11 (Thread 3994):
#0 0x00007f21c501abc9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00007f21c07fd9ca in SyncObject::wait (this=0xf0ea00, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0xf0ea00, d=...) at dispatcher.cc:286
#3 0x00007f21c0800d01 in Dispatcher::run (this=0xf4ca80) at dispatcher.cc:123
#4 0x00007f21c080145b in launch_dispatcher_thread (arg=0xf4cad4) at dispatcher.cc:28
#5 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#6 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#7 0x0000000000000000 in ?? ()

Thread 10 (Thread 3989):
#0 0x00007f21c4d72d03 in epoll_wait () from /lib/libc.so.6
#1 0x00007f21c5b285a6 in epoll_dispatch (base=0xf30a00, tv=<value optimized out>) at epoll.c:404
#2 0x00007f21c5b17171 in event_base_loop (base=0xf30a00, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414b04 in worker_libevent (arg=0xf08800) at daemon/thread.c:305
#4 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 9 (Thread 3988):
#0 0x00007f21c4d72d03 in epoll_wait () from /lib/libc.so.6
#1 0x00007f21c5b285a6 in epoll_dispatch (base=0xf30c80, tv=<value optimized out>) at epoll.c:404
#2 0x00007f21c5b17171 in event_base_loop (base=0xf30c80, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414b04 in worker_libevent (arg=0xf08700) at daemon/thread.c:305
#4 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 8 (Thread 3975):
#0 0x00007f21c4d72d03 in epoll_wait () from /lib/libc.so.6
#1 0x00007f21c5b285a6 in epoll_dispatch (base=0xf30000, tv=<value optimized out>) at epoll.c:404
#2 0x00007f21c5b17171 in event_base_loop (base=0xf30000, flags=<value optimized out>) at event.c:1558
#3 0x0000000000409468 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7584

Thread 7 (Thread 3993):
#0 0x00007f21c501abc9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00007f21c07fd9ca in SyncObject::wait (this=0xf0e380, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0xf0e380, d=...) at dispatcher.cc:286
#3 0x00007f21c0800d01 in Dispatcher::run (this=0xf4c540) at dispatcher.cc:123
#4 0x00007f21c080145b in launch_dispatcher_thread (arg=0xf4c594) at dispatcher.cc:28
#5 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#6 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#7 0x0000000000000000 in ?? ()

Thread 6 (Thread 3990):
#0 TapProducer::processAck (this=0x102f800, s=1000, status=0, msg=...) at tapconnection.cc:732
#1 0x00007f21c081c44a in EventuallyPersistentEngine::tapNotify (this=0xf9a000, cookie=0x1273348, engine_specific=<value optimized out>, nengine=<value optimized out>, tap_flags=57215, tap_event=<value optimized out>, tap_seqno=1000,
    key=0x523a018, nkey=0, flags=0, exptime=0, data=0x0, ndata=0, vbucket=0) at ep_engine.cc:2341
#2 0x00007f21c081cf38 in EvpTapNotify (handle=0xf9a000, cookie=0x1273348, engine_specific=0x0, nengine=57215, ttl=0 '\000', tap_flags=0, tap_event=TAP_ACK, tap_seqno=1000, key=0x523a018, nkey=0, flags=0, exptime=0, cas=0, data=0x0,
    ndata=0, vbucket=<value optimized out>) at ep_engine.cc:961
#3 0x00007f21c335652b in bucket_tap_notify (handle=<value optimized out>, cookie=0x1273348, engine_specific=0x0, nengine=57215, ttl=0 '\000', tap_flags=0, tap_event=TAP_ACK, tap_seqno=1000, key=0x523a018, nkey=0, flags=0, exptime=0,
    cas=0, data=0x0, ndata=0, vbucket=<value optimized out>) at bucket_engine.c:1825
#4 0x000000000040431d in process_bin_tap_ack (c=0x1273348) at daemon/memcached.c:2761
#5 0x000000000040e9cc in complete_nread_binary (c=0x1) at daemon/memcached.c:3399
#6 0x000000000040f904 in complete_nread (c=0x1273348) at daemon/memcached.c:3489
#7 conn_nread (c=0x1273348) at daemon/memcached.c:5309
#8 0x0000000000406b94 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x1273348) at daemon/memcached.c:5618
#9 0x00007f21c5b1726c in event_process_active_single_queue (base=0xf30780, flags=<value optimized out>) at event.c:1308
#10 event_process_active (base=0xf30780, flags=<value optimized out>) at event.c:1375
#11 event_base_loop (base=0xf30780, flags=<value optimized out>) at event.c:1572
#12 0x0000000000414b04 in worker_libevent (arg=0xf08900) at daemon/thread.c:305
#13 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#14 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#15 0x0000000000000000 in ?? ()

Thread 5 (Thread 3991):
#0 0x00007f21c4d3639d in nanosleep () from /lib/libc.so.6
#1 0x00007f21c4d6b844 in usleep () from /lib/libc.so.6
#2 0x00007f21c0840f95 in updateStatsThread (arg=<value optimized out>) at memory_tracker.cc:31
#3 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#4 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 4 (Thread 3992):
#0 0x00007f21c501a85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00007f21c0800f39 in SyncObject::wait (this=0xf4c380) at syncobject.hh:31
#2 Dispatcher::run (this=0xf4c380) at dispatcher.cc:89
#3 0x00007f21c080145b in launch_dispatcher_thread (arg=0xf4c3d4) at dispatcher.cc:28
#4 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 3 (Thread 3987):
#0 0x00007f21c4d72d03 in epoll_wait () from /lib/libc.so.6
#1 0x00007f21c5b285a6 in epoll_dispatch (base=0xf30280, tv=<value optimized out>) at epoll.c:404
#2 0x00007f21c5b17171 in event_base_loop (base=0xf30280, flags=<value optimized out>) at event.c:1558
#3 0x0000000000414b04 in worker_libevent (arg=0xf08600) at daemon/thread.c:305
#4 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#5 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 2 (Thread 3985):
#0 0x00007f21c4d3639d in nanosleep () from /lib/libc.so.6
#1 0x00007f21c4d36210 in sleep () from /lib/libc.so.6
#2 0x0000000000416ce8 in check_isasl_db_thread (arg=<value optimized out>) at daemon/isasl.c:233
#3 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#4 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 1 (Thread 3995):
#0 0x00007f21c4cbfa75 in raise () from /lib/libc.so.6
#1 0x00007f21c4cc35c0 in abort () from /lib/libc.so.6
#2 0x00007f21c4cb8941 in __assert_fail () from /lib/libc.so.6
#3 0x00007f21c33585af in do_bucket_engine_release_cookie (cookie=0x55a38c8) at bucket_engine.c:2393
#4 bucket_engine_release_cookie (cookie=0x55a38c8) at bucket_engine.c:2436
#5 0x00007f21c0862506 in TapConnMap::notifyIOThreadMain (this=<value optimized out>) at tapconnmap.cc:502
#6 0x00007f21c081b69f in EventuallyPersistentEngine::notifyPendingConnections (this=0xf9a000) at ep_engine.cc:3799
#7 0x00007f21c081b823 in EvpNotifyPendingConns (arg=0xf9a000) at ep_engine.cc:1046
#8 0x00007f21c50159ca in start_thread () from /lib/libpthread.so.0
#9 0x00007f21c4d7270d in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()
Comment by Chiyoung Seo [ 07/May/12 ]
Trond,

We still have a crash in bucket-engine in 1.8.1. Karan attached the backtrace.

If you have some time today, can you please look at it? I will then take it over tomorrow morning PST.
Comment by Trond Norbye [ 07/May/12 ]
I've been sick today, but I have some thoughts about why this is happening. Let's discuss tonight.
Comment by Keith Batten (Inactive) [ 09/May/12 ]
I was able to reproduce this with cluster_run testrunner on my mac.

make any-test NODES=4 TEST="rebalancetests.IncrementalRebalanceInTests.test_load,replica=2,do-stop=True,delete-ratio=0.6,expiry-ratio=0.2,num_nodes=4"
Comment by Trond Norbye [ 09/May/12 ]
There is no point in updating the bug with more information. I've got what I need, and updating or creating new bug reports interrupts me: I have to check what each one is instead of finishing up the patch.
Comment by Tommie McAfee [ 09/May/12 ]
Thanks Trond. There is a code freeze in 2.0 with some major commits pending until we can prove the current code base is stable, so this fix is high priority for us to give dev the green light.

Comment by Farshid Ghods (Inactive) [ 09/May/12 ]
@Trond,
 
We update the ticket with more information for QE internal use, so that we remember which test caused this crash when verifying after the fix is merged :)
Comment by Trond Norbye [ 14/May/12 ]
Ok, I'm moving the bug under your supervision then.

Cheers,

Trond
Comment by Chiyoung Seo [ 17/May/12 ]
http://review.couchbase.org/#change,15915
Generated at Thu Jul 24 11:13:23 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.