[MB-3592] rebalance fails with bucket_engine error : bucket_engine.c:1876: bucket_engine_release_cookie: Assertion `peh' failed
Created: 12/Apr/11  Updated: 11/Oct/13  Resolved: 25/May/11

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 1.7 alpha 1
Fix Version/s: 1.7 GA
Security Level: Public

Type: Bug Priority: Major
Reporter: Farshid Ghods Assignee: Trond Norbye
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive 10.1.5.38-diag.zip     Zip Archive 10.1.5.59-diag.zip     Text File core-10.1.5.59-0.log     Text File ns-diag-20110412112502.txt    

 Description   
rebalance is failing on the latest v0.0.0. 132 builds with the following error messages:

Port server memcached on node 'ns_1@10.1.5.228' exited with status 134. Restarting. Messages: memcached: bucket_engine.c:1876: bucket_engine_release_cookie: Assertion `peh' failed

Rebalance exited with reason {{{badmatch,{error,closed}},
[{mc_client_binary,cmd_binary_vocal_recv,5},
{mc_client_binary,get_vbucket,2},
{ns_memcached,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[{'ns_memcached-default','ns_1@10.1.5.228'},
{get_vbucket,512},
30000]}}
ns_orchestrator002 ns_1@10.1.5.227 11:23:55 - Tue Apr 12, 2011
Control connection to memcached on 'ns_1@10.1.5.228' disconnected: {{badmatch,{error,closed}},
[{mc_client_binary,cmd_binary_vocal_recv,5},
{mc_client_binary,get_vbucket,2},
{ns_memcached,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,

steps to reproduce:

1- install
2- create bucket 'default'
3- add node X
4- rebalance
5- remove X
6- rebalance

rebalance fails in step 4 and step 6
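
The steps above can be sketched as an ordered list of REST requests against the cluster's admin port. This is only a sketch and sends nothing. The endpoint paths (/pools/default/buckets, /controller/addNode, /controller/rebalance) and their form fields are assumptions based on the Couchbase REST API, not something stated in this report. The host addresses, the ramQuotaMB value and the helper names otp and repro_plan are hypothetical. Step 1 (install) is outside its scope, and steps 5 and 6 are modelled as one rebalance request that ejects X.

```python
from urllib.parse import urlencode


def otp(host):
    """Erlang-style node name as it appears in the logs, e.g. ns_1@10.1.5.228."""
    return "ns_1@" + host


def repro_plan(master, extra, bucket="default"):
    """Return ordered (method, path, form-body) tuples for steps 2-6.

    Nothing is sent over the network. Paths and field names are assumptions.
    """
    both = ",".join([otp(master), otp(extra)])
    steps = [
        # step 2: create bucket 'default' (quota value is a placeholder)
        ("POST", "/pools/default/buckets", {"name": bucket, "ramQuotaMB": 256}),
        # step 3: add node X
        ("POST", "/controller/addNode", {"hostname": extra}),
        # step 4: rebalance with X as a member
        ("POST", "/controller/rebalance",
         {"knownNodes": both, "ejectedNodes": ""}),
        # steps 5-6: mark X for removal and rebalance it out
        ("POST", "/controller/rebalance",
         {"knownNodes": both, "ejectedNodes": otp(extra)}),
    ]
    return [(method, path, urlencode(form)) for method, path, form in steps]


if __name__ == "__main__":
    # Hypothetical addresses; the report does not say which node was X.
    for method, path, body in repro_plan("10.0.0.1", "10.0.0.2"):
        print(method, path, body)
```

Keeping the plan as plain data makes it easy to replay by hand with any HTTP client and to note which of the two rebalance requests triggered the memcached exit.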


 Comments   
Comment by Trond Norbye [ 27/Apr/11 ]
Fixed by the recent bucket_engine and ep_engine fixes
Comment by Farshid Ghods (Inactive) [ 15/May/11 ]
this crash happened again with the latest changes

Thread 1 (Thread 22470):
#0 0x00007f0347746a75 in raise () from /lib/libc.so.6
#1 0x00007f034774a5c0 in abort () from /lib/libc.so.6
#2 0x00007f034773f941 in __assert_fail () from /lib/libc.so.6
#3 0x00007f0346307bb8 in bucket_engine_release_cookie (cookie=0x3ea0da8)
    at bucket_engine.c:2004
#4 0x00007f034380a04f in TapConnection::releaseReference (this=0x3f24e20,
    force=198) at tapconnection.cc:35
#5 0x00007f034381b6d1 in TapConnectionReaperCallback::TapConnectionReaperCallback(EventuallyPersistentEngine&, TapConnection*)


will post the core logs and diags
Comment by Farshid Ghods (Inactive) [ 15/May/11 ]
installed version:

basestar-260-g2330d10

ubuntu 64-bit
Comment by Trond Norbye [ 15/May/11 ]
This is a variant of MB-3764
Comment by Farshid Ghods (Inactive) [ 16/May/11 ]
saw this on last night's run against basestar-262

I think this crash happens when we delete a huge bucket a few seconds after calling flush on the bucket

Thread 4 (Thread 9793):
#0 0x00000037fc60aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaacfdd5a in wait (this=0x4b33630) at syncobject.hh:31
#2 Dispatcher::run (this=0x4b33630) at dispatcher.cc:85
#3 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x4b3367c)
    at dispatcher.cc:28
#4 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
#5 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

Thread 3 (Thread 9794):
#0 0x00000037fbacd587 in fdatasync () from /lib64/libc.so.6
#1 0x00002aaaaad7aca9 in full_fsync (id=<value optimized out>, flags=2)
    at embedded/sqlite3.c:25510
#2 unixSync (id=<value optimized out>, flags=2) at embedded/sqlite3.c:25558
#3 0x00002aaaaadd0777 in vdbeCommit (db=0x60a3ad8, p=<value optimized out>)
    at embedded/sqlite3.c:13413
#4 0x00002aaaaadd1fbd in sqlite3VdbeHalt (p=0x604bcd8)
    at embedded/sqlite3.c:56514
#5 0x00002aaaaae21be9 in sqlite3VdbeExec (p=0x604bcd8)
    at embedded/sqlite3.c:62196
#6 0x00002aaaaae0055a in sqlite3Step (pStmt=0x604bcd8)
    at embedded/sqlite3.c:57947
#7 sqlite3_step (pStmt=0x604bcd8) at embedded/sqlite3.c:58011
#8 0x00002aaaaad6f199 in PreparedStatement::execute (this=0x464d8aa0)
    at sqlite-pst.cc:73
#9 0x00002aaaaad70648 in SqliteStrategy::execute (
    this=<value optimized out>, query=<value optimized out>)
    at sqlite-strategies.cc:151
#10 0x00002aaaaad72da4 in SqliteStrategy::open (this=0x3926c50)
    at sqlite-strategies.cc:122
#11 0x00002aaaaad6d23d in open (this=0x3926d30) at sqlite-kvstore.hh:175
#12 StrategicSqlite3::reset (this=0x3926d30) at sqlite-kvstore.cc:144
#13 0x00002aaaaad00e6e in EventuallyPersistentStore::flushOneDeleteAll (
    this=<value optimized out>) at ep.cc:1731
#14 0x00002aaaaad0b13d in EventuallyPersistentStore::flushOne (
    this=0x3bf4800, q=0x3bf4920, rejectQueue=0x67130c0) at ep.cc:1869
#15 0x00002aaaaad0b32e in EventuallyPersistentStore::flushSome (
    this=0x3bf4800, q=0x3bf4920, rejectQueue=0x67130c0) at ep.cc:1487
#16 0x00002aaaaad40e1b in Flusher::doFlush (this=0x58db9b0) at flusher.cc:240
#17 0x00002aaaaad416e5 in Flusher::step (this=0x51, d=...,
    tid=std::tr1::shared_ptr (count 0) 0x464d8f20) at flusher.cc:154
#18 0x00002aaaaad41eae in FlusherStepper::callback (this=0x4b338c0, d=...,
    t=<value optimized out>) at flusher.cc:23
#19 0x00002aaaaacff34f in Task::run (this=<value optimized out>, d=...,
    t=<value optimized out>) at dispatcher.hh:139
#20 0x00002aaaaacfdf2b in Dispatcher::run (this=0x49b5b90)
    at dispatcher.cc:119
#21 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x51)
    at dispatcher.cc:28
#22 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
#23 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

Thread 2 (Thread 9795):
#0 0x00000037fc60b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1 0x00002aaaaacfbc19 in wait (this=0x58db900, d=...) at syncobject.hh:42
#2 IdleTask::run (this=0x58db900, d=...) at dispatcher.cc:244
#3 0x00002aaaaacfdf2b in Dispatcher::run (this=0x58da870)
    at dispatcher.cc:119
#4 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x58da8bc)
    at dispatcher.cc:28
#5 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
#6 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

Thread 1 (Thread 9806):
#0 0x00000037fba30265 in raise () from /lib64/libc.so.6
#1 0x00000037fba31d10 in abort () from /lib64/libc.so.6
#2 0x00000037fba296e6 in __assert_fail () from /lib64/libc.so.6
#3 0x00002aaaaaaad968 in bucket_engine_release_cookie (cookie=0x621ebe8)
    at bucket_engine.c:2004
#4 0x00002aaaaad54a59 in TapConnection::releaseReference (this=0x6025620,
    force=78) at tapconnection.cc:35
#5 0x00002aaaaad66867 in TapConnectionReaperCallback (this=0x391e440)
    at tapconnmap.cc:21
#6 TapConnMap::shutdownAllTapConnections (this=0x391e440)
    at tapconnmap.cc:341
#7 0x00002aaaaad1df93 in EventuallyPersistentEngine::destroy (
    this=0x391dea0, force=false) at ep_engine.cc:1734
#8 0x00002aaaaad29ef4 in EvpDestroy (handle=0x391dea0, force=true)
    at ep_engine.cc:96
#9 0x00002aaaaaaae711 in engine_shutdown_thread (arg=0x38e9c60)
    at bucket_engine.c:1099
#10 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
#11 0x00000037fbad3f6d in clone () from /lib64/libc.so.6
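
The call chain of the crashing thread can be read off mechanically. The following is a minimal sketch that pulls the function names out of gdb frame lines and lists them from outermost caller to the asserting frame. The helper name call_chain and the regular expression are illustrative only and handle just the frame shapes seen in Thread 1 above. The sample frames are copied from that thread.

```python
import re

# "#N", an optional "0x... in " prefix, then the function name before " (".
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def call_chain(backtrace):
    """Return function names ordered from outermost caller to crashing frame."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME.match(line)
        if match:  # continuation lines ("    at file.c:123") do not match
            names.append(match.group(2))
    return list(reversed(names))


SAMPLE = """\
#3 0x00002aaaaaaad968 in bucket_engine_release_cookie (cookie=0x621ebe8)
    at bucket_engine.c:2004
#4 0x00002aaaaad54a59 in TapConnection::releaseReference (this=0x6025620,
    force=78) at tapconnection.cc:35
#5 0x00002aaaaad66867 in TapConnectionReaperCallback (this=0x391e440)
    at tapconnmap.cc:21
#6 TapConnMap::shutdownAllTapConnections (this=0x391e440)
    at tapconnmap.cc:341
#7 0x00002aaaaad1df93 in EventuallyPersistentEngine::destroy (
    this=0x391dea0, force=false) at ep_engine.cc:1734
#8 0x00002aaaaad29ef4 in EvpDestroy (handle=0x391dea0, force=true)
    at ep_engine.cc:96
#9 0x00002aaaaaaae711 in engine_shutdown_thread (arg=0x38e9c60)
    at bucket_engine.c:1099
"""

if __name__ == "__main__":
    print(" -> ".join(call_chain(SAMPLE)))
```

Read this way, the assertion is reached from engine_shutdown_thread through EventuallyPersistentEngine::destroy and the TAP connection shutdown, which appears consistent with the bucket-deletion hypothesis in this comment.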
Comment by Trond Norbye [ 25/May/11 ]
MB-3777: it manifests itself in different ways

-- Posted from Bugbox for Android
Comment by Maria McDuff (Inactive) [ 11/Oct/13 ]
closing as dupes.
Generated at Sun Jul 13 10:44:27 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.