[CCBC-158] Unable to store 1M objects in a loop ( Internal error (0x6), Unknown implicit send message op=1 , eventually crash ) Created: 02/Jan/13  Updated: 06/Feb/13  Resolved: 07/Jan/13

Status: Closed
Project: Couchbase C client library libcouchbase
Component/s: library
Affects Version/s: 2.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Jeroen van Bemmel Assignee: Trond Norbye
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 2 VMs running Couchbase 2.0 enterprise, with 2 buckets of 256MB each. libcouchbase from git ( latest pull 1/1/2013 ). Running own event loop calling event_base_loop in a separate thread

When executing 1M lcb_store calls in a loop, storing items like "user0000001 = <16 hex bytes>" in the default bucket, I get a number of error callbacks saying "Internal error (0x6), Unknown implicit send message op=1" ( opcode 0x1 = PROTOCOL_BINARY_CMD_SET ). Finally, after a last "Internal error (0x6), Unknown implicit send message op=46" the process gets killed
Unfortunately I cannot tell where the crash happens, as running with gdb results in:
Couchbase event loop starting/build/buildd/gdb-7.4-2012.04/gdb/infrun.c:3189: internal-error: handle_inferior_event: Assertion `inf' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

After a number of attempts in both debug and release mode, I ended up with almost 350K out of 1M items entered. However, I don't manage to run the loop to its completion; something always breaks before it completes

Comment by Jeroen van Bemmel [ 02/Jan/13 ]
Using gdb 7.5.1 I was able to generate a stacktrace:
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7fe4700 (LWP 32200)]
0x00007ffff635a425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) backtrace
#0 0x00007ffff635a425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff635db8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff7bc2501 in ringbuffer_consumed (buffer=0x7d0048, nb=180) at src/ringbuffer.c:263
#3 0x00007ffff7bb73af in do_send_data (c=0x7d0000) at src/event.c:270
#4 0x00007ffff7bb74af in lcb_server_event_handler (sock=10, which=4, arg=0x7d0000) at src/event.c:308
#5 0x00007ffff777af64 in event_process_active_single_queue (activeq=0x79c010, base=0x7a0000) at event.c:1350
#6 event_process_active (base=<optimized out>) at event.c:1420
#7 event_base_loop (base=0x7a0000, flags=0) at event.c:1621
#8 0x000000000040cf12 in RegistrationDb::authDbEventThread(void*) ()
#9 0x00007ffff734de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007ffff6417cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6

The abort at src/ringbuffer.c:263 is triggered:
void ringbuffer_consumed(ringbuffer_t *buffer, lcb_size_t nb)
    lcb_size_t n = ringbuffer_read(buffer, NULL, nb);
    if (n != nb) {

Comment by Jeroen van Bemmel [ 02/Jan/13 ]
During a different run, Valgrind reveals the following:
Couchbase config callback: 1==32436== Invalid write of size 1
==32436== at 0x4C2D1A5: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==32436== by 0x4E41034: ringbuffer_write (ringbuffer.c:127)
==32436== by 0x4E405E5: lcb_server_buffer_start_packet (packet.c:46)
==32436== by 0x4E408F7: lcb_server_start_packet (packet.c:138)
==32436== by 0x4E4491E: lcb_store (store.c:127)
==32436== by 0x40CC92: RegistrationDb::populate(util::string const&, util::string const&, util::string const&) (RegistrationDb.cpp:209)
==32436== by 0x405BEE: Registrar::Registrar(char const*, unsigned int, char const*, char*) (HARegistrar.cpp:65)
==32436== by 0x40414F: main (StatefulHAProxy.cpp:238)
==32436== Address 0x1e is not stack'd, malloc'd or (recently) free'd
Comment by Jeroen van Bemmel [ 02/Jan/13 ]
I should add that my program is linked with gperftools-2.0 ( malloc replacement )
Comment by Jeroen van Bemmel [ 02/Jan/13 ]
When I remove the separate thread for the event loop and call lcb_wait after each lcb_store, the system does not crash anymore, so I guess libcouchbase not being multi-thread safe also applies to the event handler loop. However, even when using only a single thread, I still don't reach 1M objects
Comment by Matt Ingenthron [ 02/Jan/13 ]
What happens in this single threaded case that prevents reaching 1M objects, if not a crash?
Comment by Trond Norbye [ 02/Jan/13 ]
Can you post the code somewhere (or email it to trond.norbye@couchbase.com) and I'll take a look at it. I've used libcouchbase in a multithreaded environment earlier so I is supported to be MT-safe)


Comment by Jeroen van Bemmel [ 02/Jan/13 ]
@Matt I got temporary failures in the store callback ( code 0xB IIRC ). I suppose I could change my code to retry, but that shouldn't be necessary with only a single client ( and isn't so straight forward from the callback handler, which was getting called by a separate thread )

@Trond the code is integrated with my own application, bit difficult to extract. You can easily reproduce it by taking your own "eventloop" example, changing the code to call the eventloop from a new pthread and adding a for loop with 1M times store( "user0xxxxx", "0123456789012345" ) after starting the thread ( from main() )
Comment by Trond Norbye [ 06/Jan/13 ]
Jeroen: I made an MT-version of the example program and uploaded it to http://review.couchbase.org/#/c/23743/ I ran that with 10 threads and 1 000 000 set operations per thread in two modes: one time where I batch up everything before I start sending it to the server, and a second time where the return of operation triggers the next operation. I was not able to make that crash...

Comment by Jeroen van Bemmel [ 06/Jan/13 ]
Hi Trond,

Thanks for following up. After briefly looking through your code, I think the difference is that in my case the lcb_store calls are made from a different thread than the one running the event loop. I can see how one can chain calls into libcouchbase by calling from a callback function, however the question is how one would do this in a regular program.

Suppose you have a thread listening on a socket, and doing a database lookup based on the contents of a packet it receives. How would use libcouchbase in such a case? Is there another way besides running the event loop after performing a lcb_get call, essentially making the code blocking?
Comment by Trond Norbye [ 07/Jan/13 ]
You need to synchronize the access to libcouchbase (and libevent). None of them use locks internally, but operates on their own "instance".

I'm not sure I understand your problem when you say in "regular program". If you hook libcouchbase into the same event loop you're currently doing your database requests in everything should work just as you want?? In the "callback" you have when you receive the result from your database you can simply schedule a libcouchbase operation. When you return to the event loop the event loop will continue to drive the libcouchbase operations as well? Or isn't your application using an event loop?

Part of the problem is that libcouchbase does more than simply ship a command to a server. It monitor the cluster to react to changes in the topology, send and receive data to the individual data nodes etc. If it wasn't for this you could in theory call lcb_flush_buffers() to have it send data and process all of the available input. It would however not detect changes in the cluster.

If you can't move your database code into an event framework (the same you use for libcouchbase), you could always create a dedicated libcouchbase thread running the event loop and just synchronize the communication channel between your "database" thread and the libcouchbase thread... (or even simpler you could have a threadpool of libcouchbase instances running in synchronized mode working off a single queue of operations). Anyway, it's hard to give any recommendations/suggestions when I don't understand your problem ;)
Comment by Jeroen van Bemmel [ 07/Jan/13 ]
I guess my problem is that my socket code does not use libevent or some other event framework. This makes it hard to use libcouchbase, because it implies a need to synchronize between threads as you point out. This in turn kind of defeats the purpose of using an event-based database client framework for performance reasons
Comment by Trond Norbye [ 07/Jan/13 ]
Yes, you would be better off to fix your socket code to be non-blocking and asynchronous so you could fit everything into an event loop...
Comment by Matt Ingenthron [ 07/Jan/13 ]
What is your code using? Something like /dev/poll on Solaris or epoll on linux or kqueue on BSD?

If so, you can probably continue to use that and then configure the event lib to use the same thing underneath.
Comment by Mark Nunberg [ 07/Jan/13 ]
I would second the proposition of having a 'libcouchbase thread' that handles command and communicates using some kind of queue. The problem you are having seems to be socket buffer synchronization.

Specifically whenever you schedule an lcb command (i.e. "lcb_store") it writes a packet to our own socket buffer. Considering libcouchbase is asynchronous, it will flush that buffer at some unknown time to the network (thereby modifying the socket buffer).

Therefore even if you ensure that you lock your handle with things like pthread_mutex_lock and such, you will still encounter issues unless you can also control when the socket buffer is actually flushed (i.e. when 'lcb_wait' is called).

As such, you'd probably need a libcouchbase "server" thread.

One possible implementation is having this thread run an event framework (e.g. libev) which will drive libcouchbase. In addition to the thread running libcouchbase asynchronously, it will also accept command proxies over a queue (i.e. proxied requests for "lcb_store", "lcb_get", etc.).

You may have this thread then schedule (within its own thread context) the command to libcouchbase. The thread will know about when there are commands available to shcedule by determining if there are items in the queue. This may be done in a non-blocking fashion by potentially constructing a 'socketpair', in which a byte of dummy data is written each time something si added to the queue.

Subsequently, the "libcouchbase thread" will receive callbacks. You can proxy these callbacks out to your own user-defined callbacks.

While this sounds complicated, I am guessing the whole thing (assuming you actually have a "Queue" implemented [ which being C++, you have several native structures to fit this model ]) shouldn't take up more than 300 lines of code.
Comment by Jeroen van Bemmel [ 07/Jan/13 ]
That would be possible, sure. For now I've decided to move away from using CouchBase/libcouchbase though, I'm only following up because I started this issue and to provide feedback to you guys.

If you want to make it easier for people to use libcouchbase, from my perspective you could
a) Document a bit better / more extensively what the implications are for the application. Basically the information from this thread would be good to have known upfront
b) Implement something like you describe above

I believe it could work well, the challenge is that the learning/adoption curve can be steep
Comment by Trond Norbye [ 07/Jan/13 ]
Earlier today I pushed through http://review.couchbase.org/#/c/23749/ giving you manual pages for "everything"... if you check out the lcb_attributes (looks better when you see it through man: http://review.couchbase.org/#/c/23749/3/man/svr4/man5/lcb_attributes.5 ) it'll describe how the MT thing fits together..

Comment by Mark Nunberg [ 07/Jan/13 ]
Looking at that manpage, the locking needs to be primarily focused on the state of the ringbuffers, meaning that two threads cannot be modifying the ringbuffer at any given time.

Locking simply before scheduling operations is not sufficient because at the same time, the event loop might have woken up and the lcb_t would have started writing/reading data to/from the socket. A model similar to the one I outlined would work well - but that is pretty much the only way to get true async operations from libcouchbase from multiple threads. A different option would be to lock it whenever calling lcb_wait() but this can result in a deadlock if someone tries to use lcb_store from within a callback itself.

Perhaps some stuff in this thread (i.e. jira thread) should be broken out into its own discussion about MT-safety and libcouchbase - the specific bits about maintaining ringbuffer/command log integrity are not immediately obvious; neither are its solutions
Generated at Fri Oct 24 06:40:44 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.