Memcached crashes frequently with Access Violation Error.
I checked my Windows event logs today after some very flaky NS/Memcached behavior, and found that I have this message:
Over and over, every couple of hours:
7/16 6:13:49 AM
7/16 3:41:50 AM
7/16 1:11:31 AM
7/15 10:22:20 PM
7/15 9:06:01 PM
7/15 7:52:18 PM
...etc...
This happens on two different servers (separate physical hardware, not VM instances), both running 32-bit Windows Server 2008.
"exception code 0xc0000005" is a memory access violation.
The messages occur every 1 to 3 hours, even through the middle of the night when there would be very light cache use.
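As a quick sanity check on the "every 1 to 3 hours" estimate, here is a minimal Python sketch that computes the gaps between the six event-log timestamps listed above. The year 2010 is an assumption (the event log entries show only month and day; the NorthScale logs are dated 2010), and the names `STAMPS`, `parse` and `crash_gaps` are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the event log list above.
STAMPS = [
    "7/16 6:13:49 AM",
    "7/16 3:41:50 AM",
    "7/16 1:11:31 AM",
    "7/15 10:22:20 PM",
    "7/15 9:06:01 PM",
    "7/15 7:52:18 PM",
]


def parse(stamp, year=2010):
    # The year is an assumption; the event log list only shows month/day.
    return datetime.strptime(f"{year} {stamp}", "%Y %m/%d %I:%M:%S %p")


def crash_gaps(stamps, year=2010):
    # Sort oldest first, then take the difference between neighbours.
    times = sorted(parse(s, year) for s in stamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in crash_gaps(STAMPS):
        print(gap)
```

For these six entries the gaps fall between roughly 1h14m and 2h49m, which matches the 1 to 3 hour estimate.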
Has anyone else encountered this?
I also found this crash report in the logs. Maybe it helps explain what is happening?
INFO REPORT <6099.18142.0> 2010-07-15 09:24:49
===============================================================================
Deleting bucket "default" from "{server 1's IP}":11211
INFO REPORT <6099.18142.0> 2010-07-15 09:24:49
===============================================================================
Deleting bucket "default" from "{server 2's IP}":11211
SUPERVISOR REPORT <6099.77.0> 2010-07-15 09:24:49
===============================================================================
Reporting supervisor {local,ns_node_disco_sup}
Child process
errorContext start_error
reason
{{badmatch,{error,timeout}},
[{ns_node_disco,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
pid undefined
name ns_node_disco
start_function {ns_node_disco,start_link,[]}
restart_type permanent
shutdown 10
child_type worker
CRASH REPORT <6099.18223.0> 2010-07-15 09:24:49
===============================================================================
Crashing process
initial_call {ns_node_disco,init,['Argument__1']}
pid <6099.18223.0>
registered_name []
error_info
{exit,{{badmatch,{error,timeout}},
[{ns_node_disco,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]},
[{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}
ancestors
[ns_node_disco_sup,ns_server_sup,ns_cluster,ns_server_cluster_sup,
<6099.58.0>]
messages []
links [<6099.77.0>]
dictionary []
trap_exit false
status running
heap_size 4181
stack_size 24
reductions 436
I recreated the "default" bucket, and it seems to have fixed this issue. It is no longer dumping stuff to the log every second, and memcached has run for over 3 hrs without a crash. So I guess the moral of the story is: "never delete the default bucket!" :)
Indeed, you seem to have found an issue and solved it. Sorry for the trouble, we'll get it filed and fixed in the next release. Creating a 1 MByte "default" bucket is probably an okay workaround for now. It would actually use ~4 MByte per server.
Regarding the log, it's more or less a ring buffer, so I believe it would have topped out at 100 MByte, which is why we documented a certain amount of disk space as required. Don't worry, we wouldn't (intentionally) fill up your filesystem! :)
I'm watching dump_logs.bat do its thing (I apparently have 80 MB of raw logs, so this is going to be a huge log file...) and I'm seeing it do this over and over, every second...
INFO REPORT <6099.6014.0> 2010-07-14 15:49:23
===============================================================================
Deleting bucket "default" from "{server 1's IP}":11211
INFO REPORT <6099.6014.0> 2010-07-14 15:49:23
===============================================================================
Deleting bucket "default" from "{server 2's IP}":11211
I did indeed remove the default bucket after install and create 3 custom ones of my own... but I doubt NorthScale should be re-deleting "default" over and over on the cluster nodes every second...?
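To put a number on how often this happens, something like the following Python sketch could tally the "Deleting bucket" entries per timestamp in the text that dump_logs.bat produces. The header and message patterns are assumptions based only on the excerpt above, not on any documented log format, and `count_deletions` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns guessed from the excerpt above; the real format may differ.
HEADER = re.compile(
    r"^[A-Z]+ REPORT\s+<[\d.]+>\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)
DELETE = re.compile(r'^Deleting bucket "([^"]+)"')

SAMPLE = """\
INFO REPORT <6099.6014.0> 2010-07-14 15:49:23
===============================================================================
Deleting bucket "default" from "{server 1's IP}":11211
INFO REPORT <6099.6014.0> 2010-07-14 15:49:23
===============================================================================
Deleting bucket "default" from "{server 2's IP}":11211
"""


def count_deletions(lines):
    """Count "Deleting bucket" messages per (timestamp, bucket name)."""
    counts = Counter()
    stamp = None
    for line in lines:
        line = line.strip()
        header = HEADER.match(line)
        if header:
            # Remember the timestamp of the report we are currently inside.
            stamp = header.group(1)
            continue
        deleted = DELETE.match(line)
        if deleted and stamp is not None:
            counts[(stamp, deleted.group(1))] += 1
    return counts


if __name__ == "__main__":
    for (stamp, bucket), n in sorted(count_deletions(SAMPLE.splitlines()).items()):
        print(stamp, bucket, n)
```

Feeding it the lines of the full dump instead of `SAMPLE` would show whether the deletions really do repeat for every second, and for which buckets.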