Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Incomplete
- Affects Version/s: 2.0-developer-preview-3
- Fix Version/s: 2.0-developer-preview-4
- Component/s: couchbase-bucket
- Security Level: Public
- Labels:
- Environment: 2.0.0r-552, CentOS 5.4 (64-bit), 10 nodes, eperf write test
Description
Running a write test (Read:Insert:Update:Delete Ratio = 20:15:60:5) with 10 nodes and 50 million items, one of the nodes failed to write its items to disk.
- diags.tar.bz2 (10.41 MB, 24/Jan/12 6:49 PM, Keith Batten)
Issue Links
- blocks MB-4684: rebalance is hung at 100% when rebalancing 6 nodes
Activity
Mike Wiederhold
added a comment -
In MemcachedEngine::waitForReadable()
// @todo do not block forever.. but allow shutdown..
int ret = poll(&fds, 1, 1000);
We should probably fix waitForWritable() too.
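A minimal sketch of what the @todo above seems to ask for, not the actual ep-engine fix: poll in short slices, but give up when a shutdown flag is set or an overall deadline passes. The names waitReadableOrShutdown, WaitResult, and demo are hypothetical, and the deadline parameter is an assumption; the original code only shows the 1000 ms poll() call.

```cpp
#include <poll.h>
#include <unistd.h>   // pipe(), write(), close() for the usage sketch
#include <atomic>
#include <cassert>
#include <cerrno>
#include <chrono>

// Hypothetical result type; not taken from ep-engine.
enum class WaitResult { Ready, TimedOut, Shutdown, Error };

// Wait until fd is readable. Wakes at least every slice_ms milliseconds to
// re-check the shutdown flag and the overall deadline_ms budget, so the
// caller can never block forever.
WaitResult waitReadableOrShutdown(int fd, const std::atomic<bool>& shutdown,
                                  int deadline_ms, int slice_ms = 1000) {
    assert(slice_ms > 0);
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + std::chrono::milliseconds(deadline_ms);

    while (!shutdown.load()) {
        const auto remaining =
            std::chrono::duration_cast<std::chrono::milliseconds>(
                deadline - Clock::now()).count();
        if (remaining <= 0) {
            return WaitResult::TimedOut;
        }
        struct pollfd fds;
        fds.fd = fd;
        fds.events = POLLIN;
        fds.revents = 0;
        const int timeout =
            remaining < slice_ms ? static_cast<int>(remaining) : slice_ms;
        const int ret = poll(&fds, 1, timeout);
        if (ret > 0) {
            // POLLNVAL means fd is not open; anything else (POLLIN, POLLHUP,
            // POLLERR) is left for the caller's read() to sort out.
            return (fds.revents & POLLNVAL) ? WaitResult::Error
                                            : WaitResult::Ready;
        }
        if (ret < 0 && errno != EINTR) {
            return WaitResult::Error;
        }
        // ret == 0 (slice elapsed) or EINTR: loop and re-check the flags.
    }
    return WaitResult::Shutdown;
}

// Usage sketch: a pipe stands in for whatever descriptor the engine waits on.
bool demo() {
    int p[2];
    if (pipe(p) != 0) {
        return false;
    }
    std::atomic<bool> stop(false);
    const bool timedOut =
        waitReadableOrShutdown(p[0], stop, 50, 10) == WaitResult::TimedOut;
    const bool wrote = write(p[1], "x", 1) == 1;
    const bool ready =
        waitReadableOrShutdown(p[0], stop, 50, 10) == WaitResult::Ready;
    close(p[0]);
    close(p[1]);
    return timedOut && wrote && ready;
}
```

The same shape would apply to waitForWritable() with POLLOUT in place of POLLIN.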
Farshid Ghods
added a comment -
I also saw this issue on a 6-node cluster.
Mike/Trond/Dustin,
how can I verify that the disk-draining issue I am seeing is the same one?
Farshid Ghods
added a comment -
Upgrading this bug to blocker since I am seeing this after a few rebalances with 700k items.
Farshid Ghods
added a comment -
(gdb) t a a bt
Thread 12 (Thread 0x7fcdb940f700 (LWP 31585)):
#0 0x00007fcdb9c064dd in read () from /lib64/libc.so.6
#1 0x00007fcdb9b9efd8 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x00007fcdb9ba0ade in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x00007fcdb9b9bfcb in getc () from /lib64/libc.so.6
#4 0x00007fcdb9410875 in ?? ()
from /opt/couchbase/lib/memcached/stdin_term_handler.so
#5 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#6 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7fcdb8c0e700 (LWP 31586)):
#0 0x00007fcdb9bd8bed in nanosleep () from /lib64/libc.so.6
#1 0x00007fcdb9bd8a60 in sleep () from /lib64/libc.so.6
#2 0x0000000000414288 in ?? ()
#3 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7fcdb8203700 (LWP 31587)):
#0 0x00007fcdb9c13d73 in epoll_wait () from /lib64/libc.so.6
#1 0x00007fcdbaa58c28 in ?? () from /opt/couchbase/lib/libevent-2.0.so.5
#2 0x00007fcdbaa47a4c in event_base_loop ()
from /opt/couchbase/lib/libevent-2.0.so.5
#3 0x0000000000412b14 in ?? ()
#4 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7fcdb7a02700 (LWP 31588)):
#0 0x00007fcdb9c13d73 in epoll_wait () from /lib64/libc.so.6
#1 0x00007fcdbaa58c28 in ?? () from /opt/couchbase/lib/libevent-2.0.so.5
#2 0x00007fcdbaa47a4c in event_base_loop ()
from /opt/couchbase/lib/libevent-2.0.so.5
#3 0x0000000000412b14 in ?? ()
#4 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7fcdb7201700 (LWP 31589)):
#0 0x00007fcdb9c13d73 in epoll_wait () from /lib64/libc.so.6
#1 0x00007fcdbaa58c28 in ?? () from /opt/couchbase/lib/libevent-2.0.so.5
#2 0x00007fcdbaa47a4c in event_base_loop ()
from /opt/couchbase/lib/libevent-2.0.so.5
#3 0x0000000000412b14 in ?? ()
#4 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7fcdb6a00700 (LWP 31590)):
#0 0x00007fcdb9c13d73 in epoll_wait () from /lib64/libc.so.6
#1 0x00007fcdbaa58c28 in ?? () from /opt/couchbase/lib/libevent-2.0.so.5
#2 0x00007fcdbaa47a4c in event_base_loop ()
from /opt/couchbase/lib/libevent-2.0.so.5
#3 0x0000000000412b14 in ?? ()
#4 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7fcdb61ff700 (LWP 31591)):
#0 0x00007fcdb9c13d73 in epoll_wait () from /lib64/libc.so.6
#1 0x00007fcdbaa58c28 in ?? () from /opt/couchbase/lib/libevent-2.0.so.5
#2 0x00007fcdbaa47a4c in event_base_loop ()
from /opt/couchbase/lib/libevent-2.0.so.5
#3 0x0000000000412b14 in ?? ()
#4 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fcdb53e4700 (LWP 31595)):
#0 0x00007fcdb9ec83cc in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007fcdb5659cb2 in Dispatcher::run() ()
from /opt/couchbase/lib/memcached/ep.so
#2 0x00007fcdb565a6d3 in ?? () from /opt/couchbase/lib/memcached/ep.so
#3 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fcdb4be3700 (LWP 31596)):
#0 0x00007fcdb9c0a123 in poll () from /lib64/libc.so.6
#1 0x00007fcdb56e2e03 in MemcachedEngine::waitForReadable() ()
from /opt/couchbase/lib/memcached/ep.so
#2 0x00007fcdb56e3189 in MemcachedEngine::wait() ()
from /opt/couchbase/lib/memcached/ep.so
#3 0x00007fcdb56e4058 in MemcachedEngine::delVBucket(unsigned short, Callback<bool>&) () from /opt/couchbase/lib/memcached/ep.so
#4 0x00007fcdb56e97f9 in MCKVStore::delVBucket(unsigned short, unsigned short)
() from /opt/couchbase/lib/memcached/ep.so
#5 0x00007fcdb5661d64 in EventuallyPersistentStore::completeVBucketDeletion(unsigned short, unsigned short) () from /opt/couchbase/lib/memcached/ep.so
#6 0x00007fcdb567d325 in FastVBucketDeletionCallback::callback(Dispatcher&, std::tr1::shared_ptr<Task>) () from /opt/couchbase/lib/memcached/ep.so
#7 0x00007fcdb565ad2f in Task::run(Dispatcher&, std::tr1::shared_ptr<Task>) ()
from /opt/couchbase/lib/memcached/ep.so
#8 0x00007fcdb5659e81 in Dispatcher::run() ()
from /opt/couchbase/lib/memcached/ep.so
#9 0x00007fcdb565a6d3 in ?? () from /opt/couchbase/lib/memcached/ep.so
#10 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fcdb43e2700 (LWP 31597)):
#0 0x00007fcdb9ec874b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007fcdb5658038 in IdleTask::run(Dispatcher&, std::tr1::shared_ptr<Task>) () from /opt/couchbase/lib/memcached/ep.so
#2 0x00007fcdb5659e81 in Dispatcher::run() ()
from /opt/couchbase/lib/memcached/ep.so
#3 0x00007fcdb565a6d3 in ?? () from /opt/couchbase/lib/memcached/ep.so
#4 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fcdb3be1700 (LWP 31598)):
#0 0x00007fcdb9ec874b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007fcdb56804c0 in EventuallyPersistentEngine::notifyPendingConnections() () from /opt/couchbase/lib/memcached/ep.so
#2 0x00007fcdb5680591 in EvpNotifyPendingConns ()
from /opt/couchbase/lib/memcached/ep.so
#3 0x00007fcdb9ec47e1 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fcdb9c1377d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fcdbb08c720 (LWP 31579)):
#0 0x00007fcdb9c13d73 in epoll_wait () from /lib64/libc.so.6
#1 0x00007fcdbaa58c28 in ?? () from /opt/couchbase/lib/libevent-2.0.so.5
#2 0x00007fcdbaa47a4c in event_base_loop ()
from /opt/couchbase/lib/libevent-2.0.so.5
#3 0x0000000000408e8b in ?? ()
#4 0x00007fcdb9b4ccdd in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000402969 in ?? ()
#6 0x00007fff1bf50e68 in ?? ()
#7 0x000000000000001c in ?? ()
#8 0x000000000000000f in ?? ()
#9 0x00007fff1bf5257c in ?? ()
#10 0x00007fff1bf52599 in ?? ()
#11 0x00007fff1bf5259c in ?? ()
#12 0x00007fff1bf525cf in ?? ()
#13 0x00007fff1bf525d2 in ?? ()
#14 0x00007fff1bf525d8 in ?? ()
#15 0x00007fff1bf525db in ?? ()
#16 0x00007fff1bf52609 in ?? ()
#17 0x00007fff1bf5260c in ?? ()
#18 0x00007fff1bf52613 in ?? ()
#19 0x00007fff1bf52616 in ?? ()
#20 0x00007fff1bf52619 in ?? ()
Trond Norbye
added a comment - AFAIK Mike is looking into this
Dustin Sallings
added a comment -
This appears to be related. trap_exit pretty much always looks like a bug to me, as do infinite timeouts. Infinity is a long time to wait, and avoiding supervisors is a great way to end up with a half-crashed process.
=========================CRASH REPORT=========================
crasher:
initial call: couch_db_updater:init/1
pid: <0.3019.0>
registered_name: []
exception exit: {{badmatch,eof},
[{couch_btree,get_node,2},
{couch_btree,lookup,3},
{couch_btree,lookup,2},
{couch_db_updater,copy_docs,4},
{couch_db_updater,copy_compact,3},
{couch_db_updater,start_copy_compact,1}]}
in function gen_server:terminate/6
ancestors: [<0.3018.0>,<0.3012.0>]
messages: []
links: [<0.3018.0>]
dictionary: [{random_seed,{8236,26623,17360}}]
trap_exit: true
status: running
heap_size: 514229
stack_size: 24
reductions: 128674390
neighbours:
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server <0.3018.0> terminating
** Last message in was {'EXIT',<0.3019.0>,
{{badmatch,eof},
[{couch_btree,get_node,2},
{couch_btree,lookup,3},
{couch_btree,lookup,2},
{couch_db_updater,copy_docs,4},
{couch_db_updater,copy_compact,3},
{couch_db_updater,start_copy_compact,1}]}}
** When Server state == {db,<0.3018.0>,<0.3019.0>,<0.2584.3>,
<<"1327433645422574">>,<0.3013.0>,<0.3020.0>,
{db_header,7,330246,0,
{1032846318,{188431,3476,452524154},10086836},
{1033897971,191907,9100414},
{995250296,[],114},
0,nil,nil,1000},
330246,
{btree,<0.3013.0>,
{1057494137,{189209,3559,454504038},10131679},
#Fun<couch_db_updater.5.113966123>,
#Fun<couch_db_updater.6.1149495>,
#Fun<couch_btree.5.75272376>,
#Fun<couch_db_updater.7.73522408>,snappy,1279},
{btree,<0.3013.0>,
{1058474795,192768,9144000},
#Fun<couch_db_updater.8.38129561>,
#Fun<couch_db_updater.9.36794841>,
#Fun<couch_btree.5.75272376>,
#Fun<couch_db_updater.10.58105827>,snappy,
1279},
{btree,<0.3013.0>,
{995250296,[],114},
#Fun<couch_btree.3.75015339>,
#Fun<couch_btree.4.130216060>,
#Fun<couch_btree.5.75272376>,nil,snappy,1279},
339222,<<"default/54">>,
"/mnt/ebs/default/54.couch",[],[],nil,
{user_ctx,null,[],undefined},
#Ref<0.0.347.17103>,1000,
[before_header,after_header,on_file_open],
[create,
{user_ctx,
{user_ctx,null,[<<"_admin">>],undefined}}],
snappy,[]}
** Reason for termination ==
** {{badmatch,eof},
[{couch_btree,get_node,2},
{couch_btree,lookup,3},
{couch_btree,lookup,2},
{couch_db_updater,copy_docs,4},
{couch_db_updater,copy_compact,3},
{couch_db_updater,start_copy_compact,1}]}
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: couch_db:init/1
pid: <0.3018.0>
registered_name: []
exception exit: {{badmatch,eof},
[{couch_btree,get_node,2},
{couch_btree,lookup,3},
{couch_btree,lookup,2},
{couch_db_updater,copy_docs,4},
{couch_db_updater,copy_compact,3},
{couch_db_updater,start_copy_compact,1}]}
in function gen_server:terminate/6
ancestors: [<0.3012.0>]
messages: []
links: [<0.575.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 233
stack_size: 24
reductions: 97283
neighbours:
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server couch_server terminating
** Last message in was {'EXIT',<0.3018.0>,
{{badmatch,eof},
[{couch_btree,get_node,2},
{couch_btree,lookup,3},
{couch_btree,lookup,2},
{couch_db_updater,copy_docs,4},
{couch_db_updater,copy_compact,3},
{couch_db_updater,start_copy_compact,1}]}}
** When Server state == {server,"/mnt/ebs",
{re_pattern,0,0,
<<69,82,67,80,124,0,0,0,16,0,0,0,1,0,0,0,0,0,
0,0,0,0,0,0,48,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,93,0,72,25,77,0,0,0,0,0,0,
0,0,0,0,0,0,254,255,255,7,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,77,0,0,0,0,16,171,255,3,0,0,0,
128,254,255,255,7,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,69,26,84,0,72,0>>},
10000,53,"Tue, 24 Jan 2012 19:32:35 GMT"}
** Reason for termination ==
** kill
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: couch_server:init/1
pid: <0.575.0>
registered_name: couch_server
exception exit: kill
in function gen_server:terminate/6
ancestors: [couch_primary_services,couch_server_sup,cb_couch_sup,
ns_server_cluster_sup,<0.42.0>]
messages: [{'$gen_call',{<0.1275.0>,#Ref<0.0.347.218863>},
{open,<<"default/53">>,[]}}]
links: [<0.3864.0>,<0.4076.0>,<0.4172.0>,<0.4228.0>,<0.4307.0>,
<0.4311.0>,<0.4230.0>,<0.4303.0>,<0.4182.0>,<0.4225.0>,
<0.4178.0>,<0.4120.0>,<0.4132.0>,<0.4121.0>,<0.4106.0>,
<0.3976.0>,<0.4020.0>,<0.4071.0>,<0.4016.0>,<0.4013.0>,
<0.3922.0>,<0.3960.0>,<0.3944.0>,<0.3909.0>,<0.3918.0>,
<0.3906.0>,<0.3174.0>,<0.3302.0>,<0.3377.0>,<0.3409.0>,
<0.3393.0>,<0.3345.0>,<0.3361.0>,<0.3318.0>,<0.3238.0>,
<0.3270.0>,<0.3286.0>,<0.3254.0>,<0.3206.0>,<0.3222.0>,
<0.3190.0>,<0.3046.0>,<0.3110.0>,<0.3142.0>,<0.3158.0>,
<0.3126.0>,<0.3078.0>,<0.3094.0>,<0.3062.0>,<0.1550.0>,
<0.3002.0>,<0.2985.0>,<0.598.0>,<0.818.0>,<0.570.0>]
dictionary: [{random_seed,{27839,21123,25074}}]
trap_exit: true
status: running
heap_size: 1597
stack_size: 24
reductions: 8964675
neighbours:
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,couch_primary_services}
Context: child_terminated
Reason: kill
Offender: [{pid,<0.575.0>},
{name,couch_server},
{mfargs,{couch_server,sup_start_link,[]}},
{restart_type,permanent},
{shutdown,brutal_kill},
{child_type,worker}]
[ns_server:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:<0.1275.0>:couchbase_compaction_daemon:compact_vbucket:243] Couldn't open vbucket database `default/53`: {'EXIT',
{kill,
{gen_server,
call,
[couch_server,
{open,
<<"default/53">>,
[]},
infinity]}}}
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server <0.3960.0> terminating
** Last message in was {'EXIT',<0.575.0>,kill}
** When Server state == {db,<0.3960.0>,<0.3961.0>,nil,<<"1327433668986630">>,
<0.13025.2>,<0.13029.2>,
{db_header,7,259917,0,
{624220883,{181552,4106,434767765},10152256},
{624701197,185658,8676067},
{577286264,[],115},
0,nil,nil,1000},
259917,
{btree,<0.13025.2>,
{624220883,{181552,4106,434767765},10152256},
#Fun<couch_db_updater.5.113966123>,
#Fun<couch_db_updater.6.1149495>,
#Fun<couch_btree.5.75272376>,
#Fun<couch_db_updater.7.73522408>,snappy,1279},
{btree,<0.13025.2>,
{624701197,185658,8676067},
#Fun<couch_db_updater.8.38129561>,
#Fun<couch_db_updater.9.36794841>,
#Fun<couch_btree.5.75272376>,
#Fun<couch_db_updater.10.58105827>,snappy,
1279},
{btree,<0.13025.2>,
{577286264,[],115},
#Fun<couch_btree.3.75015339>,
#Fun<couch_btree.4.130216060>,
#Fun<couch_btree.5.75272376>,nil,snappy,1279},
259917,<<"default/186">>,
"/mnt/ebs/default/186.couch",[],[],nil,
{user_ctx,null,[],undefined},
nil,1000,
[before_header,after_header,on_file_open],
[create,
{user_ctx,
{user_ctx,null,[<<"_admin">>],undefined}}],
snappy,[]}
** Reason for termination ==
** kill
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_report:72]
ns_server_cluster_sup,<0.42.0>]
messages: [{'$gen_call',{<0.1275.0>,#Ref<0.0.347.218863>},
{open,<<"default/53">>,[]}}]
links: [<0.3864.0>,<0.4076.0>,<0.4172.0>,<0.4228.0>,<0.4307.0>,
<0.4311.0>,<0.4230.0>,<0.4303.0>,<0.4182.0>,<0.4225.0>,
<0.4178.0>,<0.4120.0>,<0.4132.0>,<0.4121.0>,<0.4106.0>,
<0.3976.0>,<0.4020.0>,<0.4071.0>,<0.4016.0>,<0.4013.0>,
<0.3922.0>,<0.3960.0>,<0.3944.0>,<0.3909.0>,<0.3918.0>,
<0.3906.0>,<0.3174.0>,<0.3302.0>,<0.3377.0>,<0.3409.0>,
<0.3393.0>,<0.3345.0>,<0.3361.0>,<0.3318.0>,<0.3238.0>,
<0.3270.0>,<0.3286.0>,<0.3254.0>,<0.3206.0>,<0.3222.0>,
<0.3190.0>,<0.3046.0>,<0.3110.0>,<0.3142.0>,<0.3158.0>,
<0.3126.0>,<0.3078.0>,<0.3094.0>,<0.3062.0>,<0.1550.0>,
<0.3002.0>,<0.2985.0>,<0.598.0>,<0.818.0>,<0.570.0>]
dictionary: [{random_seed,{27839,21123,25074}}]
trap_exit: true
status: running
heap_size: 1597
stack_size: 24
reductions: 8964675
neighbours:
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,couch_primary_services}
Context: child_terminated
Reason: kill
Offender: [{pid,<0.575.0>},
{name,couch_server},
{mfargs,{couch_server,sup_start_link,[]}},
{restart_type,permanent},
{shutdown,brutal_kill},
{child_type,worker}]
[ns_server:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:<0.1275.0>:couchbase_compaction_daemon:compact_vbucket:243] Couldn't open vbucket database `default/53`: {'EXIT',
{kill,
{gen_server,
call,
[couch_server,
{open,
<<"default/53">>,
[]},
infinity]}}}
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server <0.3960.0> terminating
** Last message in was {'EXIT',<0.575.0>,kill}
** When Server state == {db,<0.3960.0>,<0.3961.0>,nil,<<"1327433668986630">>,
<0.13025.2>,<0.13029.2>,
{db_header,7,259917,0,
{624220883,{181552,4106,434767765},10152256},
{624701197,185658,8676067},
{577286264,[],115},
0,nil,nil,1000},
259917,
{btree,<0.13025.2>,
{624220883,{181552,4106,434767765},10152256},
#Fun<couch_db_updater.5.113966123>,
#Fun<couch_db_updater.6.1149495>,
#Fun<couch_btree.5.75272376>,
#Fun<couch_db_updater.7.73522408>,snappy,1279},
{btree,<0.13025.2>,
{624701197,185658,8676067},
#Fun<couch_db_updater.8.38129561>,
#Fun<couch_db_updater.9.36794841>,
#Fun<couch_btree.5.75272376>,
#Fun<couch_db_updater.10.58105827>,snappy,
1279},
{btree,<0.13025.2>,
{577286264,[],115},
#Fun<couch_btree.3.75015339>,
#Fun<couch_btree.4.130216060>,
#Fun<couch_btree.5.75272376>,nil,snappy,1279},
259917,<<"default/186">>,
"/mnt/ebs/default/186.couch",[],[],nil,
{user_ctx,null,[],undefined},
nil,1000,
[before_header,after_header,on_file_open],
[create,
{user_ctx,
{user_ctx,null,[<<"_admin">>],undefined}}],
snappy,[]}
** Reason for termination ==
** kill
[error_logger:error] [2012-01-24 21:20:42] [ns_1@10.189.54.253:error_logger:ale_error_logger_handler:log_report:72]
Hide
Mike Wiederhold
added a comment -
I'm going to make my own issue for the ep-engine part since I think most of the information here is more relevant to the mccouch crash.
Hide
Mike Wiederhold
added a comment -
The ep-engine part of this issue is filed here: http://www.couchbase.com/issues/browse/MB-4714
Hide
Dustin Sallings
added a comment -
Good call. I'm going to see what Damien thinks about this since it looks like a fundamental couchdb crash.
Hide
Dustin Sallings
added a comment -
This looks like a CouchDB crash that perhaps failed to propagate up effectively. Damien, do you have an opinion on this one?
Hide
Damien Katz
added a comment -
This looks like an EBS error, like it ran out of space (is that possible?) or just lost some writes during compaction. Regardless, CouchDB propagated the errors up through all the levels properly (but not as cleanly as it could) and crashed the mccouch connection. I see in the log a new connection being made by ep-engine, probably to retry. I'm not sure whether the hang observed is directly related to the CouchDB crash and the mccouch broken connection and restart; it's possible that the crash was recovered correctly and the hang is unrelated.
Hide
Damien Katz
added a comment -
I think this is an issue on the ep-engine side; CouchDB looks like it failed and restarted its components correctly (though I'm not sure why it failed, perhaps EBS weirdness).
Hide
Dustin Sallings
added a comment -
No more information currently. It sounds like a bit of a mystery. We might need to know more about what mccouch/couchdb is doing after these reconnects, when the two parts are just sitting idly refusing to talk to each other.
Hide
Dustin Sallings
added a comment -
Unfortunately, nothing right now. We've got fingers pointing at each other from ep-engine and mccouch. I think the state of ep-engine is captured here, but if we could get the state of the mccouch processes on this connection at the time, we might be able to see where the mis-sync is happening.
It looks like the diags *do* have process info in a very verbose kind of way. There might be enough information somewhere in the logs.
Hide
Dustin Sallings
added a comment -
I think it's not worth putting time into if we're planning on replacing this path, anyway.
Hide
Farshid Ghods
added a comment -
Haven't seen this issue while running the latest eperf write runs (build 646).
Thread 5 (Thread 0x7f020b5b4700 (LWP 1341)):
#0 0x00007f020fdd7123 in poll () from /lib64/libc.so.6
#1 0x00007f020b8b0ed3 in MemcachedEngine::waitForReadable() () from /opt/couchbase/lib/memcached/ep.so
#2 0x00007f020b8b1259 in MemcachedEngine::wait() () from /opt/couchbase/lib/memcached/ep.so
#3 0x00007f020b8b2785 in MemcachedEngine::get(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned short, Callback<GetValue>&) () from /opt/couchbase/lib/memcached/ep.so
#4 0x00007f020b8b60d4 in MCKVStore::get(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned short, unsigned short, Callback<GetValue>&) () from /opt/couchbase/lib/memcached/ep.so
#5 0x00007f020b837f4e in EventuallyPersistentStore::completeBGFetch(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned short, unsigned short, unsigned long, void const*, unsigned long) () from /opt/couchbase/lib/memcached/ep.so
#6 0x00007f020b84c3bc in BGFetchCallback::callback(Dispatcher&, std::tr1::shared_ptr<Task>) () from /opt/couchbase/lib/memcached/ep.so
#7 0x00007f020b82a4af in Task::run(Dispatcher&, std::tr1::shared_ptr<Task>) () from /opt/couchbase/lib/memcached/ep.so
#8 0x00007f020b829601 in Dispatcher::run() () from /opt/couchbase/lib/memcached/ep.so
#9 0x00007f020b829e53 in ?? () from /opt/couchbase/lib/memcached/ep.so
#10 0x00007f02100917e1 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f020fde077d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f020adb3700 (LWP 1342)):
#0 0x00007f020fdd7123 in poll () from /lib64/libc.so.6
#1 0x00007f020b8b0ed3 in MemcachedEngine::waitForReadable() () from /opt/couchbase/lib/memcached/ep.so
#2 0x00007f020b8b1259 in MemcachedEngine::wait() () from /opt/couchbase/lib/memcached/ep.so
#3 0x00007f020b8b1497 in MemcachedEngine::noop(Callback<bool>&) () from /opt/couchbase/lib/memcached/ep.so
#4 0x00007f020b8b68e9 in MCKVStore::commit() () from /opt/couchbase/lib/memcached/ep.so
#5 0x00007f020b82d897 in TransactionContext::commit() () from /opt/couchbase/lib/memcached/ep.so
#6 0x00007f020b836216 in EventuallyPersistentStore::flushSome(std::queue<RCPtr<QueuedItem>, std::deque<RCPtr<QueuedItem>, std::allocator<RCPtr<QueuedItem> > > >*, std::queue<RCPtr<QueuedItem>, std::deque<RCPtr<QueuedItem>, std::allocator<RCPtr<QueuedItem> > > >*) () from /opt/couchbase/lib/memcached/ep.so
#7 0x00007f020b86cd77 in Flusher::doFlush() () from /opt/couchbase/lib/memcached/ep.so
#8 0x00007f020b86d61a in Flusher::step(Dispatcher&, std::tr1::shared_ptr<Task>) () from /opt/couchbase/lib/memcached/ep.so
#9 0x00007f020b86df6e in FlusherStepper::callback(Dispatcher&, std::tr1::shared_ptr<Task>) () from /opt/couchbase/lib/memcached/ep.so
#10 0x00007f020b82a4af in Task::run(Dispatcher&, std::tr1::shared_ptr<Task>) () from /opt/couchbase/lib/memcached/ep.so
#11 0x00007f020b829601 in Dispatcher::run() () from /opt/couchbase/lib/memcached/ep.so
#12 0x00007f020b829e53 in ?? () from /opt/couchbase/lib/memcached/ep.so
#13 0x00007f02100917e1 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f020fde077d in clone () from /lib64/libc.so.6
Thread 5 is the read-only dispatcher and Thread 4 is the write dispatcher (flusher). The write dispatcher is blocked waiting for the response to a commit, and the read-only dispatcher is blocked waiting for an item requested by a background disk fetch.
I think there are some issues in mccouch (such as a crash). I haven't followed the recent mccouch changes. Please assign this issue to someone who is familiar with the latest mccouch.