[MB-4334] moxi assert during rebalance/multiget Created: 06/Oct/11  Updated: 13/May/12  Resolved: 13/May/12

Status: Closed
Project: Couchbase Server
Component/s: moxi
Affects Version/s: None
Fix Version/s: 1.8.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Keith Batten (Inactive) Assignee: Steve Yen
Resolution: Fixed Votes: 0
Labels: 1.8.1-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: client side moxi version 1.7.0
cluster version 1.7.1
moxi command line:
3 nodes
centos 5.4 64bit

Attachments: Text File MB-4334-moxi-vvv.txt     File MB-4334_multi_ascii.py     File MB-4334_rebalance.sh    

 Description   
1) Set up 3 node cluster
2) Start up moxi
/opt/moxi/bin/moxi -v http://10.1.5.41:8091/pools/default/bucketsStreaming/default,http://10.1.5.42:8091/pools/default/bucketsStreaming/default,http://10.1.5.134:8091/pools/default/bucketsStreaming/default -Z port_listen=11211 -u root -t 16
3) Start up set/get load against cluster
4) Start up rebalance in a loop (continually add/remove node)
5) load 1000 keys to be used for multiget
6) run ascii multiget in a loop


moxi: cproxy.c:927: cproxy_reserve_downstream: Assertion `d->timeout_tv.tv_sec == 0' failed.

Thread 19 (process 24286):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f3baf0, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f3baf0, flags=0) at event.c:1558
#3 0x0000000000408da6 in main (argc=9, argv=0x7fff83c83bf8) at memcached.c:5061

Thread 18 (process 24287):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f40640, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f40640, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 17 (process 24288):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f44c30, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f44c30, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 16 (process 24289):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f49220, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f49220, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 15 (process 24290):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f4d910, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f4d910, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 14 (process 24291):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f52000, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f52000, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 13 (process 24292):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f566f0, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f566f0, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 12 (process 24293):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f5b150, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f5b150, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 11 (process 24294):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f5f8d0, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f5f8d0, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 10 (process 24295):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f641c0, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f641c0, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 9 (process 24296):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f68ab0, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f68ab0, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 8 (process 24297):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f6d3a0, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f6d3a0, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 7 (process 24298):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f71c90, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f71c90, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 6 (process 24299):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f76580, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f76580, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 5 (process 24301):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f7fb60, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f7fb60, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 4 (process 24302):
#0 0x0000003e5f4d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000447ab4 in epoll_dispatch (base=0x8f84850, tv=<value optimized out>) at epoll.c:404
#2 0x0000000000441f95 in event_base_loop (base=0x8f84850, flags=0) at event.c:1558
#3 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 3 (process 24303):
#0 0x0000003e6000ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000412a50 in assoc_maintenance_thread (arg=<value optimized out>) at assoc.c:220
#2 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 2 (process 24304):
#0 0x0000003e5f4cae46 in poll () from /lib64/libc.so.6
#1 0x000000000044fc56 in Curl_socket_ready ()
#2 0x000000000046671b in Curl_do_perform ()
#3 0x000000000044993b in run_rest_conflate (arg=0x8f8b930) at rest.c:293
#4 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6

Thread 1 (process 24300):
#0 0x0000003e5f430265 in raise () from /lib64/libc.so.6
#1 0x0000003e5f431d10 in abort () from /lib64/libc.so.6
#2 0x0000003e5f4296e6 in __assert_fail () from /lib64/libc.so.6
#3 0x000000000041a5ad in cproxy_reserve_downstream (ptd=0x90bbe00) at cproxy.c:927
#4 0x000000000041a7ae in cproxy_assign_downstream (ptd=0x90bbe00) at cproxy.c:1783
#5 0x000000000040c86b in drive_machine (c=0x92f8600) at memcached.c:1988
#6 0x0000000000442321 in event_base_loop (base=0x8f7ae70, flags=0) at event.c:1308
#7 0x00000000004137e5 in worker_libevent (arg=<value optimized out>) at thread.c:272
#8 0x0000003e600064a7 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003e5f4d3c2d in clone () from /lib64/libc.so.6


 Comments   
Comment by Steve Yen [ 21/Mar/12 ]
Did another code review. Don't see still (yet?) how d->timeout_tv can be non-zero at that point since any release downstream has its downstream_tv cleared.
Comment by Alex Ma [ 23/Apr/12 ]
Another user saw the same assert with multiset/multiget Server 1.8/moxi 1.8

Ascii protocol
Comment by Steve Yen [ 23/Apr/12 ]
Confirming, the crash happened during rebalance?

Were they using ascii or binary moxi protocol?

If it wasn't a rebalance, can I get a stats proxy from a relatively long-running moxi (privatized and/or scrubbed, of course).

Thanks,
Steve
Comment by Alex Ma [ 24/Apr/12 ]
STAT basic:version 1.7.2_5_g3119473
STAT basic:nthreads 5
STAT basic:hostname labs6
STAT memcached:settings:maxbytes 67108864
STAT memcached:settings:maxconns 1024
STAT memcached:settings:tcpport 0
STAT memcached:settings:udpport -2
STAT memcached:settings:inter NULL
STAT memcached:settings:verbosity 0
STAT memcached:settings:oldest 0
STAT memcached:settings:evictions on
STAT memcached:settings:domain_socket NULL
STAT memcached:settings:umask 700
STAT memcached:settings:growth_factor 1.25
STAT memcached:settings:chunk_size 48
STAT memcached:settings:num_threads 5
STAT memcached:settings:stat_key_prefix :
STAT memcached:settings:detail_enabled no
STAT memcached:settings:reqs_per_event 20
STAT memcached:settings:cas_enabled yes
STAT memcached:settings:tcp_backlog 1024
STAT memcached:settings:binding_protocol auto-negotiate
STAT memcached:stats:pid 26795
STAT memcached:stats:uptime 83477
STAT memcached:stats:time 1335240838
STAT memcached:stats:version 1.7.2_5_g3119473
STAT memcached:stats:pointer_size 64
STAT memcached:stats:rusage_user 5940.055230
STAT memcached:stats:rusage_system 8618.746638
STAT memcached:stats:curr_connections 123
STAT memcached:stats:total_connections 36761379
STAT memcached:stats:connection_structures 304
STAT memcached:stats:cmd_get 0
STAT memcached:stats:cmd_set 0
STAT memcached:stats:cmd_flush 0
STAT memcached:stats:get_hits 0
STAT memcached:stats:get_misses 0
STAT memcached:stats:delete_misses 0
STAT memcached:stats:delete_hits 0
STAT memcached:stats:incr_misses 0
STAT memcached:stats:incr_hits 0
STAT memcached:stats:decr_misses 0
STAT memcached:stats:decr_hits 0
STAT memcached:stats:cas_misses 0
STAT memcached:stats:cas_hits 0
STAT memcached:stats:cas_badval 0
STAT memcached:stats:bytes_read 447531408952
STAT memcached:stats:bytes_written 260809934767
STAT memcached:stats:limit_maxbytes 67108864
STAT memcached:stats:accepting_conns 1
STAT memcached:stats:listen_disabled_num 0
STAT memcached:stats:threads 5
STAT memcached:stats:conn_yields 0
STAT proxy_main:conf_type dynamic
STAT proxy_main:behavior:cycle 200
STAT proxy_main:behavior:downstream_max 1024
STAT proxy_main:behavior:downstream_conn_max 128
STAT proxy_main:behavior:downstream_weight 0
STAT proxy_main:behavior:downstream_retry 1
STAT proxy_main:behavior:downstream_protocol 8
STAT proxy_main:behavior:downstream_timeout 5000
STAT proxy_main:behavior:downstream_conn_queue_timeout 200
STAT proxy_main:behavior:connect_timeout 400
STAT proxy_main:behavior:auth_timeout 100
STAT proxy_main:behavior:wait_queue_timeout 200
STAT proxy_main:behavior:time_stats 0
STAT proxy_main:behavior:connect_max_errors 5
STAT proxy_main:behavior:connect_retry_interval 30000
STAT proxy_main:behavior:front_cache_max 200
STAT proxy_main:behavior:front_cache_lifespan 0
STAT proxy_main:behavior:front_cache_spec
STAT proxy_main:behavior:front_cache_unspec
STAT proxy_main:behavior:key_stats_max 4000
STAT proxy_main:behavior:key_stats_lifespan 0
STAT proxy_main:behavior:key_stats_spec
STAT proxy_main:behavior:key_stats_unspec
STAT proxy_main:behavior:optimize_set
STAT proxy_main:behavior:usr
STAT proxy_main:behavior:host
STAT proxy_main:behavior:port 0
STAT proxy_main:behavior:bucket
STAT proxy_main:behavior:port_listen 11212
STAT proxy_main:behavior:default_bucket_name default
STAT proxy_main:stats:stat_configs 108
STAT proxy_main:stats:stat_config_fails 0
STAT proxy_main:stats:stat_proxy_starts 2
STAT proxy_main:stats:stat_proxy_start_fails 0
STAT proxy_main:stats:stat_proxy_existings 107
STAT proxy_main:stats:stat_proxy_shutdowns 0
STAT 11212:default:info:port 11212

STAT 11212:default:info:config_ver 108
STAT 11212:default:info:behaviors_num 4
STAT 11212:default:behavior:downstream_max 1024
STAT 11212:default:behavior:downstream_conn_max 128
STAT 11212:default:behavior:downstream_weight 0
STAT 11212:default:behavior:downstream_retry 1
STAT 11212:default:behavior:downstream_protocol 8
STAT 11212:default:behavior:downstream_timeout 5000
STAT 11212:default:behavior:downstream_conn_queue_timeout 200
STAT 11212:default:behavior:connect_timeout 400
STAT 11212:default:behavior:auth_timeout 100
STAT 11212:default:behavior:wait_queue_timeout 200
STAT 11212:default:behavior:time_stats 0
STAT 11212:default:behavior:connect_max_errors 5
STAT 11212:default:behavior:connect_retry_interval 30000
STAT 11212:default:behavior:front_cache_max 200
STAT 11212:default:behavior:front_cache_lifespan 0
STAT 11212:default:behavior:front_cache_spec
STAT 11212:default:behavior:front_cache_unspec
STAT 11212:default:behavior:key_stats_max 4000
STAT 11212:default:behavior:key_stats_lifespan 0
STAT 11212:default:behavior:key_stats_spec
STAT 11212:default:behavior:key_stats_unspec
STAT 11212:default:behavior:optimize_set
STAT 11212:default:behavior:usr default
STAT 11212:default:behavior:host
STAT 11212:default:behavior:port 0
STAT 11212:default:behavior:bucket
STAT 11212:default:behavior:port_listen 11212
STAT 11212:default:behavior:default_bucket_name default
STAT 11212:default:behavior-0:downstream_weight 0
STAT 11212:default:behavior-0:downstream_retry 1
STAT 11212:default:behavior-0:downstream_protocol 8
STAT 11212:default:behavior-0:downstream_timeout 5000
STAT 11212:default:behavior-0:downstream_conn_queue_timeout 200
STAT 11212:default:behavior-0:connect_timeout 400
STAT 11212:default:behavior-0:auth_timeout 100
STAT 11212:default:behavior-0:usr default
STAT 11212:default:behavior-0:host 172.17.18.153
STAT 11212:default:behavior-0:port 11210
STAT 11212:default:behavior-0:bucket
STAT 11212:default:behavior-1:downstream_weight 0
STAT 11212:default:behavior-1:downstream_retry 1
STAT 11212:default:behavior-1:downstream_protocol 8
STAT 11212:default:behavior-1:downstream_timeout 5000
STAT 11212:default:behavior-1:downstream_conn_queue_timeout 200
STAT 11212:default:behavior-1:connect_timeout 400
STAT 11212:default:behavior-1:auth_timeout 100
STAT 11212:default:behavior-1:usr default
STAT 11212:default:behavior-1:host 172.17.18.151
STAT 11212:default:behavior-1:port 11210
STAT 11212:default:behavior-1:bucket
STAT 11212:default:behavior-2:downstream_weight 0
STAT 11212:default:behavior-2:downstream_retry 1
STAT 11212:default:behavior-2:downstream_protocol 8
STAT 11212:default:behavior-2:downstream_timeout 5000
STAT 11212:default:behavior-2:downstream_conn_queue_timeout 200
STAT 11212:default:behavior-2:connect_timeout 400
STAT 11212:default:behavior-2:auth_timeout 100
STAT 11212:default:behavior-2:usr default
STAT 11212:default:behavior-2:host 172.17.18.152
STAT 11212:default:behavior-2:port 11210
STAT 11212:default:behavior-2:bucket
STAT 11212:default:behavior-3:downstream_weight 0
STAT 11212:default:behavior-3:downstream_retry 1
STAT 11212:default:behavior-3:downstream_protocol 8
STAT 11212:default:behavior-3:downstream_timeout 5000
STAT 11212:default:behavior-3:downstream_conn_queue_timeout 200
STAT 11212:default:behavior-3:connect_timeout 400
STAT 11212:default:behavior-3:auth_timeout 100
STAT 11212:default:behavior-3:usr default
STAT 11212:default:behavior-3:host 172.17.18.58
STAT 11212:default:behavior-3:port 11210
STAT 11212:default:behavior-3:bucket
STAT 11212:default:stats:listening 2
STAT 11212:default:stats:listening_failed 0
STAT 11212:default:frontcache:max 0
STAT 11212:default:frontcache:oldest_live 0
STAT 11212:default:frontcache:tot_get_hits 0
STAT 11212:default:frontcache:tot_get_expires 0
STAT 11212:default:frontcache:tot_get_misses 0
STAT 11212:default:frontcache:tot_get_bytes 0
STAT 11212:default:frontcache:tot_adds 0
STAT 11212:default:frontcache:tot_add_skips 0
STAT 11212:default:frontcache:tot_add_fails 0
STAT 11212:default:frontcache:tot_add_bytes 0
STAT 11212:default:frontcache:tot_deletes 0
STAT 11212:default:frontcache:tot_evictions 0
STAT 11212:default:pstd_stats:num_upstream 13
STAT 11212:default:pstd_stats:tot_upstream 36668702
STAT 11212:default:pstd_stats:num_downstream_conn 11224
STAT 11212:default:pstd_stats:tot_downstream_conn 92673
STAT 11212:default:pstd_stats:tot_downstream_conn_acquired 95418023
STAT 11212:default:pstd_stats:tot_downstream_conn_released 94408123
STAT 11212:default:pstd_stats:tot_downstream_released 59123847
STAT 11212:default:pstd_stats:tot_downstream_reserved 59123792
STAT 11212:default:pstd_stats:tot_downstream_reserved_time 0
STAT 11212:default:pstd_stats:max_downstream_reserved_time 0
STAT 11212:default:pstd_stats:tot_downstream_freed 0
STAT 11212:default:pstd_stats:tot_downstream_quit_server 81449
STAT 11212:default:pstd_stats:tot_downstream_max_reached 0
STAT 11212:default:pstd_stats:tot_downstream_create_failed 0
STAT 11212:default:pstd_stats:tot_downstream_connect_started 92673
STAT 11212:default:pstd_stats:tot_downstream_connect_wait 92673
STAT 11212:default:pstd_stats:tot_downstream_connect 83678
STAT 11212:default:pstd_stats:tot_downstream_connect_failed 146
STAT 11212:default:pstd_stats:tot_downstream_connect_timeout 118
STAT 11212:default:pstd_stats:tot_downstream_connect_interval 928451
STAT 11212:default:pstd_stats:tot_downstream_connect_max_reached 0
STAT 11212:default:pstd_stats:tot_downstream_waiting_errors 0
STAT 11212:default:pstd_stats:tot_downstream_auth 83678
STAT 11212:default:pstd_stats:tot_downstream_auth_failed 28
STAT 11212:default:pstd_stats:tot_downstream_bucket 83678
STAT 11212:default:pstd_stats:tot_downstream_bucket_failed 0
STAT 11212:default:pstd_stats:tot_downstream_propagate_failed 439407
STAT 11212:default:pstd_stats:tot_downstream_close_on_upstream_close 0
STAT 11212:default:pstd_stats:tot_downstream_conn_queue_timeout 0
STAT 11212:default:pstd_stats:tot_downstream_conn_queue_add 0
STAT 11212:default:pstd_stats:tot_downstream_conn_queue_remove 0
STAT 11212:default:pstd_stats:tot_downstream_timeout 31811
STAT 11212:default:pstd_stats:tot_wait_queue_timeout 0
STAT 11212:default:pstd_stats:tot_auth_timeout 28
STAT 11212:default:pstd_stats:tot_assign_downstream 59123792
STAT 11212:default:pstd_stats:tot_assign_upstream 59123792
STAT 11212:default:pstd_stats:tot_assign_recursion 0
STAT 11212:default:pstd_stats:tot_reset_upstream_avail 0
STAT 11212:default:pstd_stats:tot_multiget_keys 417202790
STAT 11212:default:pstd_stats:tot_multiget_keys_dedupe 0
STAT 11212:default:pstd_stats:tot_multiget_bytes_dedupe 0
STAT 11212:default:pstd_stats:tot_optimize_sets 0
STAT 11212:default:pstd_stats:tot_retry 0
STAT 11212:default:pstd_stats:tot_retry_time 0
STAT 11212:default:pstd_stats:max_retry_time 0
STAT 11212:default:pstd_stats:tot_retry_vbucket 0
STAT 11212:default:pstd_stats:tot_upstream_paused 59123792
STAT 11212:default:pstd_stats:tot_upstream_unpaused 59123792
STAT 11212:default:pstd_stats:err_oom 0
STAT 11212:default:pstd_stats:err_upstream_write_prep 0
STAT 11212:default:pstd_stats:err_downstream_write_prep 0
STAT 11212:default:pstd_stats:tot_cmd_time 8514072310757
STAT 11212:default:pstd_stats:tot_cmd_count 59123792
STAT 11212:default:pstd_stats:tot_local_cmd_time 1397715416206
STAT 11212:default:pstd_stats:tot_local_cmd_count 12464580
STAT 11212:default:pstd_stats_cmd:regular_get:seen 40974976
STAT 11212:default:pstd_stats_cmd:regular_get:hits 0
STAT 11212:default:pstd_stats_cmd:regular_get:misses 0
STAT 11212:default:pstd_stats_cmd:regular_get:read_bytes 10804703637
STAT 11212:default:pstd_stats_cmd:regular_get:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_get:cas 28956854
STAT 11212:default:pstd_stats_cmd:regular_get_key:seen 417202790
STAT 11212:default:pstd_stats_cmd:regular_get_key:hits 439615684
STAT 11212:default:pstd_stats_cmd:regular_get_key:misses 1994688
STAT 11212:default:pstd_stats_cmd:regular_get_key:read_bytes 10339072333
STAT 11212:default:pstd_stats_cmd:regular_get_key:write_bytes 175962018687
STAT 11212:default:pstd_stats_cmd:regular_get_key:cas 0
STAT 11212:default:pstd_stats_cmd:regular_set:seen 18148539
STAT 11212:default:pstd_stats_cmd:regular_set:hits 0
STAT 11212:default:pstd_stats_cmd:regular_set:misses 0
STAT 11212:default:pstd_stats_cmd:regular_set:read_bytes 44070039206
STAT 11212:default:pstd_stats_cmd:regular_set:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_set:cas 0
STAT 11212:default:pstd_stats_cmd:regular_add:seen 0
STAT 11212:default:pstd_stats_cmd:regular_add:hits 0
STAT 11212:default:pstd_stats_cmd:regular_add:misses 0
STAT 11212:default:pstd_stats_cmd:regular_add:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_add:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_add:cas 0
STAT 11212:default:pstd_stats_cmd:regular_replace:seen 0
STAT 11212:default:pstd_stats_cmd:regular_replace:hits 0
STAT 11212:default:pstd_stats_cmd:regular_replace:misses 0
STAT 11212:default:pstd_stats_cmd:regular_replace:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_replace:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_replace:cas 0
STAT 11212:default:pstd_stats_cmd:regular_delete:seen 278
STAT 11212:default:pstd_stats_cmd:regular_delete:hits 0
STAT 11212:default:pstd_stats_cmd:regular_delete:misses 0
STAT 11212:default:pstd_stats_cmd:regular_delete:read_bytes 7739
STAT 11212:default:pstd_stats_cmd:regular_delete:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_delete:cas 0
STAT 11212:default:pstd_stats_cmd:regular_append:seen 0
STAT 11212:default:pstd_stats_cmd:regular_append:hits 0
STAT 11212:default:pstd_stats_cmd:regular_append:misses 0
STAT 11212:default:pstd_stats_cmd:regular_append:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_append:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_append:cas 0
STAT 11212:default:pstd_stats_cmd:regular_prepend:seen 0
STAT 11212:default:pstd_stats_cmd:regular_prepend:hits 0
STAT 11212:default:pstd_stats_cmd:regular_prepend:misses 0
STAT 11212:default:pstd_stats_cmd:regular_prepend:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_prepend:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_prepend:cas 0
STAT 11212:default:pstd_stats_cmd:regular_incr:seen 0
STAT 11212:default:pstd_stats_cmd:regular_incr:hits 0
STAT 11212:default:pstd_stats_cmd:regular_incr:misses 0
STAT 11212:default:pstd_stats_cmd:regular_incr:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_incr:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_incr:cas 0
STAT 11212:default:pstd_stats_cmd:regular_decr:seen 0
STAT 11212:default:pstd_stats_cmd:regular_decr:hits 0
STAT 11212:default:pstd_stats_cmd:regular_decr:misses 0
STAT 11212:default:pstd_stats_cmd:regular_decr:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_decr:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_decr:cas 0
STAT 11212:default:pstd_stats_cmd:regular_flush_all:seen 0
STAT 11212:default:pstd_stats_cmd:regular_flush_all:hits 0
STAT 11212:default:pstd_stats_cmd:regular_flush_all:misses 0
STAT 11212:default:pstd_stats_cmd:regular_flush_all:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_flush_all:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_flush_all:cas 0
STAT 11212:default:pstd_stats_cmd:regular_cas:seen 0
STAT 11212:default:pstd_stats_cmd:regular_cas:hits 0
STAT 11212:default:pstd_stats_cmd:regular_cas:misses 0
STAT 11212:default:pstd_stats_cmd:regular_cas:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_cas:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_cas:cas 0
STAT 11212:default:pstd_stats_cmd:regular_stats:seen 0
STAT 11212:default:pstd_stats_cmd:regular_stats:hits 0
STAT 11212:default:pstd_stats_cmd:regular_stats:misses 0
STAT 11212:default:pstd_stats_cmd:regular_stats:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_stats:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_stats:cas 0
STAT 11212:default:pstd_stats_cmd:regular_stats_reset:seen 0
STAT 11212:default:pstd_stats_cmd:regular_stats_reset:hits 0
STAT 11212:default:pstd_stats_cmd:regular_stats_reset:misses 0
STAT 11212:default:pstd_stats_cmd:regular_stats_reset:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_stats_reset:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_stats_reset:cas 0
STAT 11212:default:pstd_stats_cmd:regular_version:seen 0
STAT 11212:default:pstd_stats_cmd:regular_version:hits 0
STAT 11212:default:pstd_stats_cmd:regular_version:misses 0
STAT 11212:default:pstd_stats_cmd:regular_version:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_version:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_version:cas 0
STAT 11212:default:pstd_stats_cmd:regular_verbosity:seen 0
STAT 11212:default:pstd_stats_cmd:regular_verbosity:hits 0
STAT 11212:default:pstd_stats_cmd:regular_verbosity:misses 0
STAT 11212:default:pstd_stats_cmd:regular_verbosity:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_verbosity:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_verbosity:cas 0
STAT 11212:default:pstd_stats_cmd:regular_quit:seen 36668411
STAT 11212:default:pstd_stats_cmd:regular_quit:hits 0
STAT 11212:default:pstd_stats_cmd:regular_quit:misses 0
STAT 11212:default:pstd_stats_cmd:regular_quit:read_bytes 146673644
STAT 11212:default:pstd_stats_cmd:regular_quit:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_quit:cas 0
STAT 11212:default:pstd_stats_cmd:regular_getl:seen 0
STAT 11212:default:pstd_stats_cmd:regular_getl:hits 0
STAT 11212:default:pstd_stats_cmd:regular_getl:misses 0
STAT 11212:default:pstd_stats_cmd:regular_getl:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_getl:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_getl:cas 0
STAT 11212:default:pstd_stats_cmd:regular_unl:seen 0
STAT 11212:default:pstd_stats_cmd:regular_unl:hits 0
STAT 11212:default:pstd_stats_cmd:regular_unl:misses 0
STAT 11212:default:pstd_stats_cmd:regular_unl:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_unl:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_unl:cas 0
STAT 11212:default:pstd_stats_cmd:regular_ERROR:seen 0
STAT 11212:default:pstd_stats_cmd:regular_ERROR:hits 0
STAT 11212:default:pstd_stats_cmd:regular_ERROR:misses 0
STAT 11212:default:pstd_stats_cmd:regular_ERROR:read_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_ERROR:write_bytes 0
STAT 11212:default:pstd_stats_cmd:regular_ERROR:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_get:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_get:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_get:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_get:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_get:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_get:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_get_key:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_get_key:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_get_key:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_get_key:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_get_key:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_get_key:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_set:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_set:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_set:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_set:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_set:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_set:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_add:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_add:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_add:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_add:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_add:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_add:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_replace:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_replace:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_replace:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_replace:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_replace:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_replace:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_delete:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_delete:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_delete:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_delete:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_delete:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_delete:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_append:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_append:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_append:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_append:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_append:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_append:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_prepend:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_prepend:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_prepend:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_prepend:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_prepend:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_prepend:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_incr:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_incr:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_incr:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_incr:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_incr:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_incr:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_decr:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_decr:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_decr:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_decr:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_decr:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_decr:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_flush_all:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_flush_all:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_flush_all:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_flush_all:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_flush_all:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_flush_all:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_cas:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_cas:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_cas:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_cas:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_cas:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_cas:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_stats:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_stats:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_stats:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_stats:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_stats:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_stats:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_stats_reset:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_stats_reset:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_stats_reset:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_stats_reset:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_stats_reset:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_stats_reset:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_version:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_version:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_version:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_version:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_version:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_version:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_verbosity:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_verbosity:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_verbosity:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_verbosity:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_verbosity:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_verbosity:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_quit:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_quit:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_quit:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_quit:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_quit:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_quit:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_getl:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_getl:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_getl:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_getl:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_getl:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_getl:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_unl:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_unl:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_unl:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_unl:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_unl:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_unl:cas 0
STAT 11212:default:pstd_stats_cmd:quiet_ERROR:seen 0
STAT 11212:default:pstd_stats_cmd:quiet_ERROR:hits 0
STAT 11212:default:pstd_stats_cmd:quiet_ERROR:misses 0
STAT 11212:default:pstd_stats_cmd:quiet_ERROR:read_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_ERROR:write_bytes 0
STAT 11212:default:pstd_stats_cmd:quiet_ERROR:cas 0
Comment by Steve Yen [ 24/Apr/12 ]
total connections look large...

  STAT memcached:stats:total_connections 36761379

probably due to client app closing conns rather than reusing them...

  STAT 11212:default:pstd_stats:num_upstream 13
  STAT 11212:default:pstd_stats:tot_upstream 36668702

although num downstream conns also looks large (but probably less likely to be the problem right now)...

  STAT 11212:default:pstd_stats:num_downstream_conn 11224
  STAT 11212:default:pstd_stats:tot_downstream_conn 92673

across 4 server nodes, across 4 worker threads, each worker thread has 701 conns. Seems large.

also, some downstream_timeouts, perhaps due to rebalance slowness...

  STAT 11212:default:pstd_stats:tot_downstream_timeout 31811

until i figure out the underlying reason for the crash, here's some workaround ideas...

- change the client app config to using persistent client-side memcached connections. I recall another user did this and was able to avoid some moxi crash (possibly the same crash?)

- increase downstream_timeout
Comment by Steve Yen [ 25/Apr/12 ]
Replicated crash on OSX moxi 1.8.0-3-g5f01fa8...

Process: moxi [9936]
Path: /Users/steveyen/work/membase/dev/voltron.couchbase/build/moxi/moxi
Identifier: moxi
Version: ??? (???)
Code Type: X86-64 (Native)
Parent Process: bash [8289]

Date/Time: 2012-04-25 15:38:34.254 -0700
OS Version: Mac OS X 10.6.8 (10K549)
Report Version: 6

Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Crashed Thread: 4

Application Specific Information:
Assertion failed: (d->timeout_tv.tv_sec == 0), function cproxy_reserve_downstream, file cproxy.c, line 927.
 

Thread 0: Dispatch queue: com.apple.main-thread
0 libSystem.B.dylib 0x00007fff87c41c0a kevent + 10
1 moxi 0x000000010004be07 kq_dispatch + 423
2 moxi 0x0000000100046080 event_base_loop + 576
3 moxi 0x0000000100005ac3 main + 3891 (memcached.c:5063)
4 moxi 0x0000000100001504 start + 52

Thread 1:
0 libSystem.B.dylib 0x00007fff87c3b272 write + 10
1 moxi 0x00000001000314cd log_error_write + 749 (log.c:230)
2 moxi 0x000000010000a282 drive_machine + 242 (memcached.c:3456)
3 moxi 0x00000001000462c2 event_base_loop + 1154
4 moxi 0x00000001000128c0 worker_libevent + 128 (thread.c:274)
5 libSystem.B.dylib 0x00007fff87c61fd6 _pthread_start + 331
6 libSystem.B.dylib 0x00007fff87c61e89 thread_start + 13

Thread 2:
0 libSystem.B.dylib 0x00007fff87c3b272 write + 10
1 moxi 0x00000001000314cd log_error_write + 749 (log.c:230)
2 moxi 0x0000000100001c40 conn_set_state + 160 (memcached.c:668)
3 moxi 0x0000000100026c5d a2b_process_downstream_response + 2397 (cproxy_protocol_a2b.c:764)
4 moxi 0x000000010000ac0c drive_machine + 2684 (memcached.c:1990)
5 moxi 0x00000001000462c2 event_base_loop + 1154
6 moxi 0x00000001000128c0 worker_libevent + 128 (thread.c:274)
7 libSystem.B.dylib 0x00007fff87c61fd6 _pthread_start + 331
8 libSystem.B.dylib 0x00007fff87c61e89 thread_start + 13

Thread 3:
0 libSystem.B.dylib 0x00007fff87c3b272 write + 10
1 moxi 0x00000001000314cd log_error_write + 749 (log.c:230)
2 moxi 0x0000000100027a0e cproxy_process_a2b_downstream_nread + 654 (cproxy_protocol_a2b.c:541)
3 moxi 0x000000010000ac0c drive_machine + 2684 (memcached.c:1990)
4 moxi 0x00000001000462c2 event_base_loop + 1154
5 moxi 0x00000001000128c0 worker_libevent + 128 (thread.c:274)
6 libSystem.B.dylib 0x00007fff87c61fd6 _pthread_start + 331
7 libSystem.B.dylib 0x00007fff87c61e89 thread_start + 13

Thread 4 Crashed:
0 libSystem.B.dylib 0x00007fff87c770b6 __kill + 10
1 libSystem.B.dylib 0x00007fff87d179f6 abort + 83
2 libSystem.B.dylib 0x00007fff87d049bc __pthread_markcancel + 0
3 moxi 0x000000010001a5b7 cproxy_reserve_downstream + 839 (cproxy.c:926)
4 moxi 0x000000010001a774 cproxy_assign_downstream + 196 (cproxy.c:1785)
5 moxi 0x0000000100021905 cproxy_process_upstream_ascii + 901 (cproxy_protocol_a.c:150)
6 moxi 0x0000000100008083 try_read_command + 835 (memcached.c:3205)
7 moxi 0x000000010000a560 drive_machine + 976 (memcached.c:3536)
8 moxi 0x00000001000462c2 event_base_loop + 1154
9 moxi 0x00000001000128c0 worker_libevent + 128 (thread.c:274)
10 libSystem.B.dylib 0x00007fff87c61fd6 _pthread_start + 331
11 libSystem.B.dylib 0x00007fff87c61e89 thread_start + 13

Thread 5:
0 libSystem.B.dylib 0x00007fff87c63a6a __semwait_signal + 10
1 libSystem.B.dylib 0x00007fff87c67881 _pthread_cond_wait + 1286
2 moxi 0x0000000100011941 assoc_maintenance_thread + 305 (assoc.c:223)
3 libSystem.B.dylib 0x00007fff87c61fd6 _pthread_start + 331
4 libSystem.B.dylib 0x00007fff87c61e89 thread_start + 13

Thread 6:
0 libSystem.B.dylib 0x00007fff87c6c932 select$DARWIN_EXTSN + 10
1 libconflate.0.dylib 0x00000001000c2996 Curl_socket_ready + 438
2 libconflate.0.dylib 0x00000001000b67f0 Curl_do_perform + 496
3 libconflate.0.dylib 0x0000000100096719 run_rest_conflate + 801 (rest.c:293)
4 libSystem.B.dylib 0x00007fff87c61fd6 _pthread_start + 331
5 libSystem.B.dylib 0x00007fff87c61e89 thread_start + 13

Thread 4 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000 rbx: 0x0000000100053f05 rcx: 0x0000000100509ac8 rdx: 0x0000000000000000
  rdi: 0x00000000000026d0 rsi: 0x0000000000000006 rbp: 0x0000000100509ae0 rsp: 0x0000000100509ac8
   r8: 0x0000000000000003 r9: 0x0000000000000000 r10: 0x00007fff87c730fa r11: 0x0000000000000206
  r12: 0x000000000000039f r13: 0x000000010005230f r14: 0x0000000100052e36 r15: 0x00007fff70c88790
  rip: 0x00007fff87c770b6 rfl: 0x0000000000000206 cr2: 0x0000000102997000

Binary Images:
       0x100000000 - 0x100064fef +moxi ??? (???) <64AE645C-76DA-41FB-A00C-C4EB4DDFE5D9> /Users/steveyen/work/membase/dev/voltron.couchbase/build/moxi/moxi
       0x100093000 - 0x1000d9fef +libconflate.0.dylib ??? (???) <A2698F74-7010-74E0-F8BF-AB40EA051667> /opt/couchbase/lib/libconflate.0.dylib
       0x1000e7000 - 0x1000e9ff7 +libhashkit.0.dylib ??? (???) <EAF5A8E0-FD91-4E2E-07AC-E891EAE0F003> /opt/couchbase/lib/libhashkit.0.dylib
       0x1000ed000 - 0x1000f3fef +libvbucket.1.dylib 3.0.0 (compatibility 3.0.0) <2581DCBA-8D50-E90A-ECAF-372830B80C0D> /opt/couchbase/lib/libvbucket.1.dylib
       0x1000f9000 - 0x100109fef +libmemcached.6.dylib 7.0.0 (compatibility 7.0.0) <58E35902-BFC2-DF19-5802-7FAC3E201773> /opt/couchbase/lib/libmemcached.6.dylib
    0x7fff5fc00000 - 0x7fff5fc3bdef dyld 132.1 (???) <69130DA3-7CB3-54C8-ABC5-423DECDD2AF7> /usr/lib/dyld
    0x7fff813a2000 - 0x7fff813a6ff7 libmathCommon.A.dylib 315.0.0 (compatibility 1.0.0) <95718673-FEEE-B6ED-B127-BCDBDB60D4E5> /usr/lib/system/libmathCommon.A.dylib
    0x7fff85634000 - 0x7fff85645ff7 libz.1.dylib 1.2.3 (compatibility 1.0.0) <97019C74-161A-3488-41EC-A6CA8738418C> /usr/lib/libz.1.dylib
    0x7fff87c28000 - 0x7fff87de9fef libSystem.B.dylib 125.2.11 (compatibility 1.0.0) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib
    0x7fff8835d000 - 0x7fff8839afff com.apple.LDAPFramework 2.0 (120.1) <54A6769E-D7E2-DBE2-EA61-87B9EA355DA4> /System/Library/Frameworks/LDAP.framework/Versions/A/LDAP
    0x7fff88499000 - 0x7fff884abfe7 libsasl2.2.dylib 3.15.0 (compatibility 3.0.0) <76B83C8D-8EFE-4467-0F75-275648AFED97> /usr/lib/libsasl2.2.dylib
    0x7fff884ef000 - 0x7fff88510fff libresolv.9.dylib 41.1.0 (compatibility 1.0.0) <9410EC7F-4D24-6740-AFEE-90405750FAD7> /usr/lib/libresolv.9.dylib
    0x7fff8a461000 - 0x7fff8a580fe7 libcrypto.0.9.8.dylib 0.9.8 (compatibility 0.9.8) <14115D29-432B-CF02-6B24-A60CC533A09E> /usr/lib/libcrypto.0.9.8.dylib
    0x7fff8a6b3000 - 0x7fff8a6b4ff7 com.apple.TrustEvaluationAgent 1.1 (1) <5952A9FA-BC2B-16EF-91A7-43902A5C07B6> /System/Library/PrivateFrameworks/TrustEvaluationAgent.framework/Versions/A/TrustEvaluationAgent
    0x7fff8a6f7000 - 0x7fff8a734ff7 libssl.0.9.8.dylib 0.9.8 (compatibility 0.9.8) <F743389F-F25A-A77D-4FCA-D6B01AF2EE6D> /usr/lib/libssl.0.9.8.dylib
    0x7fffffe00000 - 0x7fffffe01fff libSystem.B.dylib ??? (???) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib

 
Comment by Steve Yen [ 25/Apr/12 ]
Added scripts from Keith B. and moxi -vvv output leading up to the crash
Comment by Steve Yen [ 25/Apr/12 ]
http://review.couchbase.org/15232
Comment by Steve Yen [ 26/Apr/12 ]
Test using Keith's scripts has been running for several hours now with the fix, without a crash. It only stopped because somebody reset the cluster.
Comment by Steve Yen [ 10/May/12 ]
reopening to get it out of the current sprint
Generated at Sun Jul 13 03:33:48 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.