[RCBC-111] SEGV during set operation when using/running ruby client multi-threaded Created: 16/Jan/13  Updated: 13/Jan/14  Resolved: 06/May/13

Status: Closed
Project: Couchbase Ruby client library
Component/s: library
Affects Version/s: 1.2.1
Fix Version/s: 1.3.0
Security Level: Public

Type: Bug Priority: Major
Reporter: Charles Leu Assignee: Sergey Avseyev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Foundation:
  Linux: "distro": "CentOS release 5.8 (Final)"
  Kernel: "2.6.18-274.17.1.el5xen #1 SMP Tue Jan 10 18:06:37 EST 2012 x86_64 x86_64 x86_64 GNU/Linux"
  Ruby: "ruby": "1.9.2p320 (2012-04-20 revision 35421) [x86_64-linux]"
Couchbase:
  Couchbase Cluster: "nodes": ["http://server0065-vm12.listenlogic.com:4001", "http://server0065-vm13.listenlogic.com:4001", "http://server0065-vm14.listenlogic.com:4001", "http://server0065-vm15.listenlogic.com:4001"]
  Couchbase Server: "version": "2.0.0-1976-rel-community"
  Couchbase Ruby Client: "version": 1.2.1

Issue Links:
Duplicate
is duplicated by RCBC-110 SEGV during set operation when using/... Closed

 Description   
I have a ruby test program that creates a number of threads, and per thread takes a batch of documents, and stores/inserts them into couchbase. Each thread will be executing ruby code much like:

  # Insert each record (Hash) into Couchbase
  records.each { |record|
    key = record[:id].to_s
    cb.set(key, record) if !key.nil?
  }

The program starts, and subsequently SEGVs and yields a stack trace, for example:

[New Thread 0x40fbf940 (LWP 13650)]
[New Thread 0x41a33940 (LWP 13651)]
[New Thread 0x40e58940 (LWP 13652)]
[New Thread 0x400d5940 (LWP 13653)]
[New Thread 0x40ed9940 (LWP 13654)]
[New Thread 0x4020c940 (LWP 13655)]
[New Thread 0x40977940 (LWP 13656)]
[New Thread 0x40440940 (LWP 13657)]
[New Thread 0x41b0c940 (LWP 13658)]
[New Thread 0x40764940 (LWP 13659)]
[New Thread 0x412f0940 (LWP 13660)]
[New Thread 0x41422940 (LWP 13661)]
[New Thread 0x40156940 (LWP 13662)]
[New Thread 0x41cd4940 (LWP 13663)]
[New Thread 0x41065940 (LWP 13664)]
[New Thread 0x4202a940 (LWP 13665)]
[New Thread 0x40c67940 (LWP 13666)]
[New Thread 0x41dc2940 (LWP 13667)]
[New Thread 0x41734940 (LWP 13668)]
[New Thread 0x407e5940 (LWP 13669)]
[New Thread 0x41b8d940 (LWP 13670)]
[New Thread 0x417b5940 (LWP 13671)]
[New Thread 0x41c0e940 (LWP 13672)]
[New Thread 0x406b1940 (LWP 13673)]
[New Thread 0x41e43940 (LWP 13674)]
[New Thread 0x41f4c940 (LWP 13675)]
[New Thread 0x40a5d940 (LWP 13676)]
[New Thread 0x40866940 (LWP 13677)]
[New Thread 0x40ade940 (LWP 13678)]
[New Thread 0x40b5f940 (LWP 13679)]
[New Thread 0x408e7940 (LWP 13680)]
[New Thread 0x41ec4940 (LWP 13681)]
[New Thread 0x40622940 (LWP 13682)]
[New Thread 0x4164d940 (LWP 13683)]
Fetching 1024 authors starting at offset 0
[New Thread 0x42a2b940 (LWP 13685)]
[Thread 0x42a2b940 (LWP 13685) exited]
Fetching 1024 authors starting at offset 1024
Fetching 1024 authors starting at offset 2048
Fetching 1024 authors starting at offset 3072
Fetching 1024 authors starting at offset 4096

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x4020c940 (LWP 13655)]
0x00002aaaac509a1d in cb_strip_key_prefix (bucket=0x2aaab4414300, key=46912654151960) at utils.c:471
471 rb_str_update(key, 0, RSTRING_LEN(bucket->key_prefix_val), cb_vStrEmpty);
(gdb) where
#0 0x00002aaaac509a1d in cb_strip_key_prefix (bucket=0x2aaab4414300, key=46912654151960) at utils.c:471
#1 0x00002aaaac50bd64 in cb_storage_callback (handle=<optimized out>, cookie=0x2aaab4415cf0, operation=LCB_SET, error=LCB_SUCCESS, resp=0x402099b0) at store.c:52
#2 0x00002aaaac72dfd4 in ?? () from /usr/lib64/libcouchbase.so.2
#3 0x00002aaaac72bb1a in ?? () from /usr/lib64/libcouchbase.so.2
#4 0x00002aaaac504c5c in callbacks_run (callbacks=<optimized out>) at multithread_plugin.c:600
#5 loop_run_select (argp=<optimized out>) at multithread_plugin.c:780
#6 0x00002b0ebc3e8657 in rb_ensure (b_proc=0x2aaaac504aa0 <loop_run_select>, data1=46912653493520, e_proc=0x2aaaac504f30 <loop_select_cleanup>, data2=46912654152000) at eval.c:743
#7 0x00002aaaac5039fc in loop_run (loop=<optimized out>) at multithread_plugin.c:1056
#8 lcb_io_run_event_loop (iops=<optimized out>) at multithread_plugin.c:1177
#9 0x00002aaaac736aaa in lcb_wait () from /usr/lib64/libcouchbase.so.2
#10 0x00002aaaac50c275 in cb_bucket_store (self=<optimized out>, argv=<optimized out>, argc=<optimized out>, cmd=<optimized out>) at store.c:162
#11 cb_bucket_set (argc=<optimized out>, argv=<optimized out>, self=<optimized out>) at store.c:270
#12 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
    at vm_insnhelper.c:402
#13 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359a38, num=2, blockptr=0x1, flag=0, id=21448, me=0x20c7ae0, recv=33100200) at vm_insnhelper.c:528
#14 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
#15 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
#16 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x329afe0, self=30791040, argc=1, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:558
#17 0x00002b0ebc4fa2a7 in vm_yield (argv=<optimized out>, argc=<optimized out>, th=<optimized out>) at vm.c:588
#18 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:740
#19 rb_yield (val=42406680) at vm_eval.c:750
#20 0x00002b0ebc3b9c9c in rb_ary_each (ary=42907960) at array.c:1427
#21 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
    at vm_insnhelper.c:402
#22 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359b40, num=0, blockptr=0x329afe1, flag=270, id=424, me=0x1ac7760, recv=42907960) at vm_insnhelper.c:528
#23 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
#24 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
#25 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x24947d0, self=27728360, argc=1, argv=0x2aaaad25a0d0, blockptr=0x0, cref=0x0) at vm.c:558
#26 0x00002b0ebc4fa1ef in rb_vm_invoke_proc (th=0x23a2dc0, proc=0x24947d0, self=27728360, argc=1, argv=0x2aaaad25a0d0, blockptr=0x0) at vm.c:604
#27 0x00002b0ebc3f01bd in proc_call (argc=1, argv=0x2aaaad25a0d0, procval=<optimized out>) at proc.c:556
#28 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
    at vm_insnhelper.c:402
#29 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359cf8, num=1, blockptr=0x1, flag=2, id=5912, me=0x1b0a800, recv=33098120) at vm_insnhelper.c:528
#30 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
#31 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
#32 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x2aaaad359e28, self=33097800, argc=0, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:558
#33 0x00002b0ebc4fa3f8 in vm_yield (argv=<optimized out>, argc=<optimized out>, th=<optimized out>) at vm.c:588
#34 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:740
#35 loop_i () at vm_eval.c:798
#36 0x00002b0ebc3e8a4c in rb_rescue2 (b_proc=0x2b0ebc4fa3c0 <loop_i>, data1=0, r_proc=0x0, data2=0) at eval.c:646
#37 0x00002b0ebc4e8d2e in rb_f_loop (self=33097800) at vm_eval.c:826
#38 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
    at vm_insnhelper.c:402
#39 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359e00, num=0, blockptr=0x2aaaad359e29, flag=8, id=2856, me=0x1a9a1a0, recv=33097800) at vm_insnhelper.c:528
#40 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
#41 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
#42 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x2aaaad359f30, self=33097800, argc=1, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:558
#43 0x00002b0ebc4fa341 in rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm.c:588
#44 catch_i (tag=2082830, data=<optimized out>) at vm_eval.c:1459
#45 0x00002b0ebc4e7943 in rb_catch_obj (tag=2082830, func=0x2b0ebc4fa300 <catch_i>, data=0) at vm_eval.c:1534
#46 0x00002b0ebc4e8bdd in rb_f_catch (argc=<optimized out>, argv=<optimized out>) at vm_eval.c:1510
#47 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
    at vm_insnhelper.c:402
#48 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359f08, num=1, blockptr=0x2aaaad359f31, flag=8, id=2840, me=0x1a99e20, recv=33097800) at vm_insnhelper.c:528
#49 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
#50 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
#51 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x23f5230, self=33097800, argc=0, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:558
#52 0x00002b0ebc4fa1ef in rb_vm_invoke_proc (th=0x23a2dc0, proc=0x23f5230, self=33097800, argc=0, argv=0x224d2d0, blockptr=0x0) at vm.c:604
#53 0x00002b0ebc501bd2 in thread_start_func_2 (th=0x23a2dc0, stack_start=0x4020c128) at thread.c:450
#54 0x00002b0ebc501cef in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:394
#55 0x0000003cb560677d in start_thread () from /lib64/libpthread.so.0
#56 0x0000003cb46d3c1d in clone () from /lib64/libc.so.6
#57 0x0000000000000000 in ?? ()

Note that this behavior can be recreated consistently.


 Comments   
Comment by Charles Leu [ 17/Jan/13 ]
Notes and Questions:
1) Running my test single threaded does _not_ recreate this problem.
2) Running my test multi threaded yields this problem reliably (100% of the time so far).
3) To date I've only used/tested Couchbase via the Ruby client library.
4) How many other developers use the Ruby client library, and of those that use it, are their apps multi-threaded?
5) I noted that the tests for the Ruby client library didn't appear to have any that are multi-threaded.
6) Do any of your other client tests using threading, and attempt to perform set operations simultaneously?
7) Perhaps the 'Component" for this issue be changed to Ruby Client Library; please do so if you believe that is more correct.
Comment by Matt Ingenthron [ 17/Jan/13 ]
Valid questions Charles. Let me pass this to our Ruby client lead and have him look it over. We can move it to the RCBC project if it's in fact different.
Comment by Matt Ingenthron [ 17/Jan/13 ]
Sergey: Can you look at this to see if it should be moved to RCBC?
Comment by Charles Leu [ 17/Jan/13 ]
I've learned that this particular issue is/was due to the sharing of the Couchbase connection amongst threads. When the test app uses thread-local storage to house/use a connection per thread, things work without ado.

Perhaps you want to put logic in place (e.g. a monitor per connection) to ensure connection sharing doesn't lead to a SEGV or other problem (e.g. deadlock). I guess the client could also implement such logic, however using thread-local storage with multiple connections is clearly a better option.

Note that I'm fine with things as they are. However if the client lib had the aforementioned logic in place, it would have saved us some time (i.e. on both my end and yours).

Thank you for your consideration and support.
Comment by Sergey Avseyev [ 22/Jan/13 ]
Hi, Charles.

As you pointed out above, the libcouchbase handle itself isn't supposed to be shared between threads, and currently the caller should protect the connection instance either by copying it with Bucket#dup for each thread, which will lead to establishing new connection, or implementing thread-safe wrapper which will make sure that the handle won't be accessed from several threads.
Comment by Mike Evans [ 21/Mar/13 ]
Interesting, just ran into this segfault in the Ruby client 1.2.2. I also have a multi-threaded app where each thread needs to do get and set operations against CB. I get that we shouldn't be sharing the handle between threads, and the Thread.current storage in 1.2.2 does fix this for connections initiated from the Couchbase module.

The packaged CouchbaseStore ActiveSupport::Cache implementation however is not threadsafe in this regard--it creates one connection to the bucket and shares that across any threads, potentially resulting in the segfault.

The fix is rudimentary, so look for a pull request shortly.
Comment by Sergey Avseyev [ 21/Mar/13 ]
Thanks, will be glad to review
Comment by Mike Evans [ 21/Mar/13 ]
Just pushed:
http://review.couchbase.org/#/c/25294/

I'm wondering if it wouldn't be helpful in some circumstances to have the option of an imposed mutex/monitor instead of a thread local variable in the Couchbase module itself. Thinking about my own app--dozens of Couchbase Models, each with their own Bucket connection--then potentially dozens of threads. Seems like that would be spinning up a lot of connections to the server per-process.

Comment by Sergey Avseyev [ 21/Mar/13 ]
Yeah, I'm planning to implement pooling there, to allow to re-use and share connections if it is possible, but once you will store docs in different buckets, it will require separate connection per bucket
Comment by Mike Evans [ 02/May/13 ]
Pushed a Connection Pool branch for review:

http://review.couchbase.org/#/c/26028/
which relies on:
https://github.com/mperham/connection_pool

It's mainly targeted at ActiveSupport::Cache::CouchbaseStore, however could be used more generically for any Couchbase bucket connections:

@pool = Couchbase::ConnectionPool.new # => default connection args and 5 connections in the pool
@pool.set('foo', 'bar') # => checks our a connection to run set on--will block until a connection is available
@pool.get('foo') # => 'bar'

I think I implemented all operations but might have missed some.


Comment by Sergey Avseyev [ 06/May/13 ]
Merged to upstream
https://github.com/couchbase/couchbase-ruby-client/commit/5454381ca3aa7ee8a02e41e0d8ff87e8408504a2
Generated at Tue Sep 02 21:20:43 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.