[CCBC-147] function lcb_wait runs indefinitely after trying to connect (by calling lcb_connect) to a Couchbase server that is in Pending state Created: 19/Dec/12 Updated: 05/Mar/13 Resolved: 05/Mar/13 |
|
| Status: | Closed |
| Project: | Couchbase C client library libcouchbase |
| Component/s: | library |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.4 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Haster | Assignee: | Sergey Avseyev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
I reproduced the bug on Red Hat Linux 5.0 x64 and on Windows XP x32
Couchbase server version 1.8.1, libcouchbase version 2.0.1 |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
Start a server with one node.
Create many buckets so that they use all of the allowed memory. The server then has to change its status to Pending (in my case the server stays in Pending permanently and never changes to Up).
After that, try to connect to one of the buckets by calling lcb_connect, then call lcb_wait to wait for the connection to complete.
As a result, lcb_wait runs indefinitely and the timeout never fires.
Below is the call stack on Red Hat Linux:
#0 0x00000034350d4473 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x00002ad81b1dccc9 in ?? () from ./lib/libevent-2.0.so.5
#2 0x00002ad81b1c9cdc in event_base_loop () from ./lib/libevent-2.0.so.5
#3 0x00002ad8174c1dc6 in lcb_wait () from ./lib/libcouchbase.so.2
#4 0x000000000043c0e1 in Couchbase::connect (this=0x60df508) at couchbase_loader_source/couchbase.cpp:152
#5 0x000000000043c8b7 in connect_to_bucket (cbase=..., config=..., bucket_name=...) at couchbase_loader_source/couchbase.cpp:478
#6 0x0000000000440924 in couchbase_loader::writer_thread::writer_thread (this=0x60df480, config=..., bucket_name=..., dbh=...) at couchbase_loader_source/writer_thread.cpp:25
#7 0x000000000042168c in main (argc=1, argv=0x7fff22162028) at couchbase_loader_source/couchbase_loader.cpp:175 |
| Comments |
| Comment by Haster [ 26/Dec/12 ] |
|
I’ve investigated this problem a little and have additional info.
My error scenario: first I try to connect to some bucket (Primary, for example) that does not exist:
lcb_connect
lcb_wait
Here I get an error. Then I create the bucket, destroy the instance by calling lcb_destroy, and try to connect to the bucket again:
func_create_bucket()
sleep(some_time)
lcb_destroy()
lcb_create()
lcb_connect()
lcb_wait()
After that I receive 3 messages from my error_callback function, where errinfo is “Number of vBuckets must be a power of two > 0 and <= 65536”, and a deadlock occurs in lcb_wait. |
| Comment by Haster [ 27/Dec/12 ] |
|
I've investigated this issue more deeply.
The problem occurs when the connection timeout happens.
connect
lcb_wait
lcb_io_run_event_loop (for Windows platform)
...
select
if (ret == 0) <-- 0 means the timeout happened
{
<-- Here our problems begin
}
After select returns zero, the code tries to reconnect in a callback function (is that a good idea? Maybe it is better to exit from the loop and return an error?), but some structures (as I think) are corrupted after that. The event loop then calls select again, but the timeout variable contains a zero value (infinite execution)... I think the problem is somewhere near here |
| Comment by Sergey Avseyev [ 10/Jan/13 ] |
| Could you post a piece of source code demonstrating the issue? |
| Comment by Sergey Avseyev [ 10/Jan/13 ] |
|
Possibly this patch has fixed the issue: https://github.com/couchbase/libcouchbase/commit/d4948192439e61a8cc23d5e8572e81db1aebef7f
Could you verify with libcouchbase from master? To do so, either pull and build the sources:
git clone git://github.com/couchbase/libcouchbase.git
cd libcouchbase
./config/autorun.sh && ./configure && make && sudo make install
or install from the snapshot deb/rpm repositories, for example for recent Ubuntu releases:
sudo wget -O/etc/apt/sources.list.d/couchbase-snapshot.list http://packages.couchbase.com/snapshot/ubuntu/couchbase-ubuntu1110.list
sudo aptitude update
sudo aptitude install libcouchbase-dev libcouchbase2-bin |
| Comment by Haster [ 10/Jan/13 ] |
|
My code is very big: one application server creates many processes (couchbase_loader), one process for each cache, and those processes work with Couchbase.
They also interact with an Oracle database... If you want, I can share all the code, but you need Oracle to run it. Or I can try to write a simpler test case... I will also check the patch tomorrow and share the results |
| Comment by Haster [ 11/Jan/13 ] |
| It looks good now. Thanks a lot |
| Comment by Haster [ 11/Jan/13 ] |
|
Sergey, I reproduced this issue on the Windows platform.
It is more stable now, but when I tried to create 9 buckets at the same time by running 9 copies of my program, 4 of them deadlocked in the lcb_wait function. |
| Comment by Haster [ 13/Jan/13 ] |
| Added a test program to reproduce the problem |
| Comment by Sergey Avseyev [ 06/Feb/13 ] |
| Fixed invalid memory access in win32 plugin http://review.couchbase.org/24451 |