[MB-6676] crash with couchstore in CouchKVStore::setVBucketState at src/couch-kvstore/couch-kvstore.cc:805 Created: 17/Sep/12  Updated: 26/Oct/12  Resolved: 19/Sep/12

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.0-beta
Fix Version/s: None
Security Level: Public

Type: Bug
Priority: Major
Reporter: Matt Ingenthron
Assignee: Chiyoung Seo
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 5.8 x64 build 1723


 Description   
Found a core file left behind in my install. A backtrace of all threads follows; the abort is in Thread 1.


# gdb ../../../bin/memcached core.3168
GNU gdb (GDB) CentOS (7.0.1-42.el5.centos)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/couchbase/bin/memcached...done.
[New Thread 3225]
[New Thread 3230]
[New Thread 3228]
[New Thread 3227]
[New Thread 3226]
[New Thread 3192]
[New Thread 3191]
[New Thread 3190]
[New Thread 3189]
[New Thread 3188]
[New Thread 3187]
[New Thread 3183]
[New Thread 3182]
[New Thread 3181]
[New Thread 3180]
[New Thread 3179]
[New Thread 3178]
[New Thread 3177]
[New Thread 3168]
Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done.
Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0
Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done.
Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done.
Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
Loaded symbols for /opt/couchbase/lib/memcached/ep.so
Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done.
Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff6639b000
Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
Program terminated with signal 6, Aborted.
#0 0x0000003d16a30285 in raise () from /lib64/libc.so.6
(gdb) thr bt a a
No symbol "bt" in current context.
(gdb) thr a a bt

Thread 19 (Thread 0x2b6a2b3cd220 (LWP 3168)):
#0 0x0000003d16ad3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b6a2af4d576 in epoll_dispatch (base=0x16592000,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002b6a2af38e44 in event_base_loop (base=0x16592000,
    flags=<value optimized out>) at event.c:1558
#3 0x0000000000409742 in main (argc=<value optimized out>,
    argv=<value optimized out>) at daemon/memcached.c:7914

Thread 18 (Thread 3177):
#0 0x0000003d16ac545b in read () from /lib64/libc.so.6
#1 0x0000003d16a6b677 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x0000003d16a6c03e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x0000003d16a61124 in _IO_getline_info_internal () from /lib64/libc.so.6
#4 0x0000003d16a5ffc9 in fgets () from /lib64/libc.so.6
#5 0x00002b6a2b3ce939 in check_stdin_thread (arg=<value optimized out>)
    at extensions/daemon/stdin_check.c:37
#6 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#7 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 17 (Thread 3178):
#0 0x0000003d1760b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaaaae4d6 in logger_thead_main (arg=0x11d2a040)
    at extensions/loggers/file_logger.c:368
#2 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 16 (Thread 3179):
#0 0x0000003d16ad3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b6a2af4d576 in epoll_dispatch (base=0x16592500,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002b6a2af38e44 in event_base_loop (base=0x16592500,
    flags=<value optimized out>) at event.c:1558
#3 0x00000000004144f4 in worker_libevent (arg=0x11d2d900)
    at daemon/thread.c:301
#4 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 15 (Thread 3180):
#0 0x0000003d16ad3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b6a2af4d576 in epoll_dispatch (base=0x16592280,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002b6a2af38e44 in event_base_loop (base=0x16592280,
    flags=<value optimized out>) at event.c:1558
#3 0x00000000004144f4 in worker_libevent (arg=0x11d2d9f8)
    at daemon/thread.c:301
#4 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 14 (Thread 3181):
#0 0x0000003d16ad3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b6a2af4d576 in epoll_dispatch (base=0x16592c80,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002b6a2af38e44 in event_base_loop (base=0x16592c80,
    flags=<value optimized out>) at event.c:1558
#3 0x00000000004144f4 in worker_libevent (arg=0x11d2daf0)
    at daemon/thread.c:301
#4 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 13 (Thread 3182):
#0 0x0000003d16ad3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b6a2af4d576 in epoll_dispatch (base=0x16592a00,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002b6a2af38e44 in event_base_loop (base=0x16592a00,
    flags=<value optimized out>) at event.c:1558
#3 0x00000000004144f4 in worker_libevent (arg=0x11d2dbe8)
    at daemon/thread.c:301
#4 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 12 (Thread 3183):
#0 0x0000003d16ad3648 in epoll_wait () from /lib64/libc.so.6
#1 0x00002b6a2af4d576 in epoll_dispatch (base=0x16592780,
    tv=<value optimized out>) at epoll.c:404
#2 0x00002b6a2af38e44 in event_base_loop (base=0x16592780,
    flags=<value optimized out>) at event.c:1558
#3 0x00000000004144f4 in worker_libevent (arg=0x11d2dce0)
    at daemon/thread.c:301
#4 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 11 (Thread 3187):
#0 0x0000003d16a99221 in nanosleep () from /lib64/libc.so.6
#1 0x0000003d16accba4 in usleep () from /lib64/libc.so.6
#2 0x00002aaaaaf2de15 in updateStatsThread (arg=0x11d2a4c0)
    at src/memory_tracker.cc:31
#3 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 10 (Thread 3188):

#0 0x0000003d16acc767 in fdatasync () from /lib64/libc.so.6
#1 0x00002aaaab1ca95f in couch_sync (handle=<value optimized out>)
    at src/os.c:117
#2 0x00002aaaaaf6ee2f in cfs_sync (h=0x1663fcc0)
    at src/couch-kvstore/couch-fs-stats.cc:86
#3 0x00002aaaab1c6d7f in couchstore_commit (db=0x16598380)
    at src/couch_db.c:199
#4 0x00002aaaaaf6877c in CouchKVStore::setVBucketState (this=0x165fc000,
    vbucketId=356, vbstate=..., stateChanged=true, newfile=false)
    at src/couch-kvstore/couch-kvstore.cc:777
#5 0x00002aaaaaf68d55 in CouchKVStore::snapshotVBuckets (this=0x165fc000,
    vbstates=Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children
    nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype))
RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >.
std::map with 1024 elements) at src/couch-kvstore/couch-kvstore.cc:644
#6 0x00002aaaaaefa202 in EventuallyPersistentStore::snapshotVBuckets (
    this=0x11d2f200, priority=...) at src/ep.cc:905
#7 0x00002aaaaaf0a571 in SnapshotVBucketsCallback::callback(Dispatcher&, SingleThreadedRCPtr<Task>&) () from /opt/couchbase/lib/memcached/ep.so
#8 0x00002aaaaaef3e4a in Dispatcher::run (this=0x165cec40)
    at src/dispatcher.cc:173
#9 0x00002aaaaaef474b in launch_dispatcher_thread (arg=0x165cec40)
    at src/dispatcher.cc:28
#10 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#11 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 9 (Thread 3189):
#0 0x0000003d1760b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaaef1788 in wait (this=0x16614090, d=...) at src/syncobject.hh:58
#2 IdleTask::run (this=0x16614090, d=...) at src/dispatcher.cc:336
#3 0x00002aaaaaef3e4a in Dispatcher::run (this=0x165cea80)
    at src/dispatcher.cc:173
#4 0x00002aaaaaef474b in launch_dispatcher_thread (arg=0x165cea80)
    at src/dispatcher.cc:28
#5 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#6 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 8 (Thread 3190):
#0 0x0000003d1760b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaaef1788 in wait (this=0x166142d0, d=...) at src/syncobject.hh:58
#2 IdleTask::run (this=0x166142d0, d=...) at src/dispatcher.cc:336
#3 0x00002aaaaaef3e4a in Dispatcher::run (this=0x165cf880)
    at src/dispatcher.cc:173
#4 0x00002aaaaaef474b in launch_dispatcher_thread (arg=0x165cf880)
    at src/dispatcher.cc:28
#5 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#6 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 7 (Thread 3191):
#0 0x0000003d1760b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaaef1788 in wait (this=0x16614240, d=...) at src/syncobject.hh:58
#2 IdleTask::run (this=0x16614240, d=...) at src/dispatcher.cc:336
#3 0x00002aaaaaef3e4a in Dispatcher::run (this=0x165cf6c0)
    at src/dispatcher.cc:173
#4 0x00002aaaaaef474b in launch_dispatcher_thread (arg=0x165cf6c0)
    at src/dispatcher.cc:28
#5 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#6 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 6 (Thread 3192):
#0 0x0000003d1760b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaaf0e73f in wait (this=0x1658e000) at src/syncobject.hh:58
#2 wait (this=0x1658e000) at src/syncobject.hh:74
#3 wait (this=0x1658e000) at src/tapconnmap.hh:169
#4 EventuallyPersistentEngine::notifyPendingConnections (this=0x1658e000)
    at src/ep_engine.cc:3389
#5 0x00002aaaaaf0e823 in EvpNotifyPendingConns (arg=0x1658e000)
    at src/ep_engine.cc:1119
#6 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#7 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 5 (Thread 3226):
#0 0x0000003d1760af59 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaaef4097 in wait (this=0x165dee00) at src/syncobject.hh:47
#2 Dispatcher::run (this=0x165dee00) at src/dispatcher.cc:139
#3 0x00002aaaaaef474b in launch_dispatcher_thread (arg=0x165dee00)
    at src/dispatcher.cc:28
#4 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 4 (Thread 3227):
#0 0x0000003d1760b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaaef1788 in wait (this=0x166143f0, d=...) at src/syncobject.hh:58
#2 IdleTask::run (this=0x166143f0, d=...) at src/dispatcher.cc:336
#3 0x00002aaaaaef3e4a in Dispatcher::run (this=0x165dec40)
    at src/dispatcher.cc:173
#4 0x00002aaaaaef474b in launch_dispatcher_thread (arg=0x165dec40)
    at src/dispatcher.cc:28
#5 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#6 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 3 (Thread 3228):
#0 0x0000003d1760b1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaaef1788 in wait (this=0x16614ab0, d=...) at src/syncobject.hh:58
#2 IdleTask::run (this=0x16614ab0, d=...) at src/dispatcher.cc:336
#3 0x00002aaaaaef3e4a in Dispatcher::run (this=0x165dea80)
    at src/dispatcher.cc:173
#4 0x00002aaaaaef474b in launch_dispatcher_thread (arg=0x165dea80)
    at src/dispatcher.cc:28
#5 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#6 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 2 (Thread 3230):

#0 0x0000003d17607ba5 in pthread_join () from /lib64/libpthread.so.0
#1 0x00002aaaaaef4892 in Dispatcher::stop (this=0x165de000,
    force=<value optimized out>) at src/dispatcher.cc:212
#2 0x00002aaaaaefe858 in EventuallyPersistentStore::~EventuallyPersistentStore
    (this=0x18bec000, __in_chrg=<value optimized out>) at src/ep.cc:529
#3 0x00002aaaaaf1fa4f in EventuallyPersistentEngine::~EventuallyPersistentEngine() () from /opt/couchbase/lib/memcached/ep.so
#4 0x00002aaaaaf0fc45 in EvpDestroy (handle=<value optimized out>, force=true)
    at src/ep_engine.cc:126
#5 0x00002aaaaacc3a46 in engine_shutdown_thread (arg=0x16598460)
    at bucket_engine.c:1387
#6 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#7 0x0000003d16ad325d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x48e06940 (LWP 3225)):
#0 0x0000003d16a30285 in raise () from /lib64/libc.so.6
#1 0x0000003d16a31d30 in abort () from /lib64/libc.so.6
#2 0x00002aaaaaf68a5f in CouchKVStore::setVBucketState (this=0x18ace580,
    vbucketId=957, vbstate=..., stateChanged=true, newfile=false)
    at src/couch-kvstore/couch-kvstore.cc:805
#3 0x00002aaaaaf68d55 in CouchKVStore::snapshotVBuckets (this=0x18ace580,
    vbstates=Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 288, in children
    nodetype = gdb.lookup_type('std::_Rb_tree_node< std::pair< %s, %s > >' % (keytype, valuetype))
RuntimeError: No type named std::_Rb_tree_node< std::pair< const unsigned short, vbucket_state > >.
std::map with 1024 elements) at src/couch-kvstore/couch-kvstore.cc:644
#4 0x00002aaaaaefa202 in EventuallyPersistentStore::snapshotVBuckets (
    this=0x18bec000, priority=...) at src/ep.cc:905
#5 0x00002aaaaaf0a571 in SnapshotVBucketsCallback::callback(Dispatcher&, SingleThreadedRCPtr<Task>&) () from /opt/couchbase/lib/memcached/ep.so
#6 0x00002aaaaaef3e4a in Dispatcher::run (this=0x165de000)
    at src/dispatcher.cc:173
#7 0x00002aaaaaef474b in launch_dispatcher_thread (arg=0x165de000)
    at src/dispatcher.cc:28
#8 0x0000003d1760677d in start_thread () from /lib64/libpthread.so.0
#9 0x0000003d16ad325d in clone () from /lib64/libc.so.6


 Comments   
Comment by Peter Wansch (Inactive) [ 19/Sep/12 ]
Chiyoung, can you take a look?
Comment by Chiyoung Seo [ 19/Sep/12 ]
http://review.couchbase.org/#/c/20982/
Comment by Thuan Nguyen [ 25/Sep/12 ]
Integrated in github-ep-engine-2-0 #434 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/434/])
    MB-6676 Don't abort for communication failures with mccouch (Revision c0084a7daa8d29584885d8595319a3aeb409f740)

     Result = SUCCESS
Chiyoung Seo :
Files :
* src/couch-kvstore/couch-kvstore.cc
Comment by kzeller [ 26/Oct/12 ]
Add to release notes: The thread responsible for persisting data was crashing
during vBucket state changes. This was caused by unhandled file-not-found
exceptions. The cause of the crashes has been fixed.
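
For readers without access to the review link: the commit subject quoted above ("Don't abort for communication failures with mccouch") suggests that a failure branch which previously called abort() now reports the error instead. The actual patch is not reproduced in this ticket, so the following is only a minimal sketch of that general pattern. Every name in it (NotifyResult, setVBucketStateSketch, snapshotSketch) is hypothetical and not taken from ep-engine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical result codes for a state-change notification (not ep-engine's).
enum class NotifyResult { Ok, CommFailure };

// Hypothetical stand-in for the notification step of a vBucket state change.
// 'notify' models the call to the external process (mccouch in this ticket).
bool setVBucketStateSketch(
        uint16_t vbucketId,
        const std::function<NotifyResult(uint16_t)>& notify) {
    const NotifyResult rv = notify(vbucketId);
    if (rv != NotifyResult::Ok) {
        // Assumption: a branch like this used to call abort(), which would
        // raise the SIGABRT seen in Thread 1. Here it logs and returns false.
        std::fprintf(stderr,
                     "Warning: failed to notify state change for vbucket %u\n",
                     static_cast<unsigned>(vbucketId));
        return false;
    }
    return true;
}

// Hypothetical caller: counts failed vBuckets instead of terminating, so the
// caller can decide whether to reschedule the snapshot.
std::size_t snapshotSketch(
        const std::vector<uint16_t>& vbucketIds,
        const std::function<NotifyResult(uint16_t)>& notify) {
    std::size_t failures = 0;
    for (uint16_t id : vbucketIds) {
        if (!setVBucketStateSketch(id, notify)) {
            ++failures;
        }
    }
    return failures;
}
```

Returning a status rather than aborting leaves the retry policy to the caller. That matters in a situation like the one in this trace, where Thread 2 is already tearing the engine down (EvpDestroy with force=true) while Thread 1 is still running a snapshot task.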
Generated at Thu Apr 17 19:12:40 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.