[MB-3592] rebalance fails with bucket_engine error : bucket_engine.c:1876: bucket_engine_release_cookie: Assertion `peh' failed Created: 12/Apr/11 Updated: 25/May/11 Resolved: 25/May/11 |
|
| Status: | Resolved |
| Project: | Couchbase Server |
| Component/s: | bucket-engine |
| Affects Version/s: | 1.7 alpha 1 |
| Fix Version/s: | 1.7 GA |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Farshid Ghods | Assignee: | Trond Norbye |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
Rebalance is failing on the latest v0.0.0. 132 builds with these error messages:
error messages:

Port server memcached on node 'ns_1@10.1.5.228' exited with status 134. Restarting. Messages:
memcached: bucket_engine.c:1876: bucket_engine_release_cookie: Assertion `peh' failed

Rebalance exited with reason {{{badmatch,{error,closed}}, [{mc_client_binary,cmd_binary_vocal_recv,5}, {mc_client_binary,get_vbucket,2}, {ns_memcached,handle_call,3}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}, {gen_server,call, [{'ns_memcached-default','ns_1@10.1.5.228'}, {get_vbucket,512}, 30000]}}
ns_orchestrator002 ns_1@10.1.5.227 11:23:55 - Tue Apr 12, 2011

Control connection to memcached on 'ns_1@10.1.5.228' disconnected: {{badmatch, {error, closed}}, [{mc_client_binary, cmd_binary_vocal_recv, 5}, {mc_client_binary, get_vbucket, 2}, {ns_memcached, handle_call, 3}, {gen_server, handle_msg, 5}, {proc_lib, init_p_do_apply,

steps to reproduce:
1- install
2- create bucket 'default'
3- add node X
4- rebalance
5- remove X
6- rebalance

rebalance fails in step 4 and step 6 |
| Comments |
| Comment by Trond Norbye [ 27/Apr/11 ] |
| Fixed by the recent bucket_engine and ep_engine fixes |
| Comment by Farshid Ghods [ 15/May/11 ] |
|
this crash happened on the latest changes again

Thread 1 (Thread 22470):
#0 0x00007f0347746a75 in raise () from /lib/libc.so.6
#1 0x00007f034774a5c0 in abort () from /lib/libc.so.6
#2 0x00007f034773f941 in __assert_fail () from /lib/libc.so.6
#3 0x00007f0346307bb8 in bucket_engine_release_cookie (cookie=0x3ea0da8) at bucket_engine.c:2004
#4 0x00007f034380a04f in TapConnection::releaseReference (this=0x3f24e20, force=198) at tapconnection.cc:35
#5 0x00007f034381b6d1 in TapConnectionReaperCallback::TapConnectionReaperCallback(EventuallyPersistentEngine&, TapConnection*)

will post the core logs and diags |
| Comment by Farshid Ghods [ 15/May/11 ] |
|
installed version: basestar-260-g2330d10 ubuntu 64-bit |
| Comment by Trond Norbye [ 15/May/11 ] |
|
This is a variant of |
| Comment by Farshid Ghods [ 16/May/11 ] |
|
Saw this on last night's run against basestar-262.
I think this crash happens when we delete a huge bucket a few seconds after calling flush on the bucket.

Thread 4 (Thread 9793):
#0 0x00000037fc60aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaacfdd5a in wait (this=0x4b33630) at syncobject.hh:31
#2 Dispatcher::run (this=0x4b33630) at dispatcher.cc:85
#3 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x4b3367c) at dispatcher.cc:28
#4 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

Thread 3 (Thread 9794):
#0 0x00000037fbacd587 in fdatasync () from /lib64/libc.so.6
#1 0x00002aaaaad7aca9 in full_fsync (id=<value optimized out>, flags=2) at embedded/sqlite3.c:25510
#2 unixSync (id=<value optimized out>, flags=2) at embedded/sqlite3.c:25558
#3 0x00002aaaaadd0777 in vdbeCommit (db=0x60a3ad8, p=<value optimized out>) at embedded/sqlite3.c:13413
#4 0x00002aaaaadd1fbd in sqlite3VdbeHalt (p=0x604bcd8) at embedded/sqlite3.c:56514
#5 0x00002aaaaae21be9 in sqlite3VdbeExec (p=0x604bcd8) at embedded/sqlite3.c:62196
#6 0x00002aaaaae0055a in sqlite3Step (pStmt=0x604bcd8) at embedded/sqlite3.c:57947
#7 sqlite3_step (pStmt=0x604bcd8) at embedded/sqlite3.c:58011
#8 0x00002aaaaad6f199 in PreparedStatement::execute (this=0x464d8aa0) at sqlite-pst.cc:73
#9 0x00002aaaaad70648 in SqliteStrategy::execute (this=<value optimized out>, query=<value optimized out>) at sqlite-strategies.cc:151
#10 0x00002aaaaad72da4 in SqliteStrategy::open (this=0x3926c50) at sqlite-strategies.cc:122
#11 0x00002aaaaad6d23d in open (this=0x3926d30) at sqlite-kvstore.hh:175
#12 StrategicSqlite3::reset (this=0x3926d30) at sqlite-kvstore.cc:144
#13 0x00002aaaaad00e6e in EventuallyPersistentStore::flushOneDeleteAll (this=<value optimized out>) at ep.cc:1731
#14 0x00002aaaaad0b13d in EventuallyPersistentStore::flushOne (this=0x3bf4800, q=0x3bf4920, rejectQueue=0x67130c0) at ep.cc:1869
#15 0x00002aaaaad0b32e in EventuallyPersistentStore::flushSome (this=0x3bf4800, q=0x3bf4920, rejectQueue=0x67130c0) at ep.cc:1487
#16 0x00002aaaaad40e1b in Flusher::doFlush (this=0x58db9b0) at flusher.cc:240
#17 0x00002aaaaad416e5 in Flusher::step (this=0x51, d=..., tid=std::tr1::shared_ptr (count 0) 0x464d8f20) at flusher.cc:154
#18 0x00002aaaaad41eae in FlusherStepper::callback (this=0x4b338c0, d=..., t=<value optimized out>) at flusher.cc:23
#19 0x00002aaaaacff34f in Task::run (this=<value optimized out>, d=..., t=<value optimized out>) at dispatcher.hh:139
#20 0x00002aaaaacfdf2b in Dispatcher::run (this=0x49b5b90) at dispatcher.cc:119
#21 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x51) at dispatcher.cc:28
#22 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
#23 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

Thread 2 (Thread 9795):
#0 0x00000037fc60b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaacfbc19 in wait (this=0x58db900, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x58db900, d=...) at dispatcher.cc:244
#3 0x00002aaaaacfdf2b in Dispatcher::run (this=0x58da870) at dispatcher.cc:119
#4 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x58da8bc) at dispatcher.cc:28
#5 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

Thread 1 (Thread 9806):
#0 0x00000037fba30265 in raise () from /lib64/libc.so.6
#1 0x00000037fba31d10 in abort () from /lib64/libc.so.6
#2 0x00000037fba296e6 in __assert_fail () from /lib64/libc.so.6
#3 0x00002aaaaaaad968 in bucket_engine_release_cookie (cookie=0x621ebe8) at bucket_engine.c:2004
#4 0x00002aaaaad54a59 in TapConnection::releaseReference (this=0x6025620, force=78) at tapconnection.cc:35
#5 0x00002aaaaad66867 in TapConnectionReaperCallback (this=0x391e440) at tapconnmap.cc:21
#6 TapConnMap::shutdownAllTapConnections (this=0x391e440) at tapconnmap.cc:341
#7 0x00002aaaaad1df93 in EventuallyPersistentEngine::destroy (this=0x391dea0, force=false) at ep_engine.cc:1734
#8 0x00002aaaaad29ef4 in EvpDestroy (handle=0x391dea0, force=true) at ep_engine.cc:96
#9 0x00002aaaaaaae711 in engine_shutdown_thread (arg=0x38e9c60) at bucket_engine.c:1099
#10 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
#11 0x00000037fbad3f6d in clone () from /lib64/libc.so.6 |
| Comment by Trond Norbye [ 25/May/11 ] |
|
-- Posted from Bugbox for Android |