[CCBC-147] lcb_wait runs indefinitely after lcb_connect is called against a Couchbase server in the Pending state Created: 19/Dec/12  Updated: 05/Mar/13  Resolved: 05/Mar/13

Status: Closed
Project: Couchbase C client library libcouchbase
Component/s: library
Affects Version/s: 2.0.1
Fix Version/s: 2.0.4
Security Level: Public

Type: Bug Priority: Major
Reporter: Haster Assignee: Sergey Avseyev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: I reproduced the bug on Red Hat Linux 5.0 x64 and on Windows XP x32
Couchbase server version 1.8.1
libcouchbase version 2.0.1

Attachments: File cloader.rar     Text File NMakefile_new    
Issue Links:
Duplicate
is duplicated by CCBC-167 lcb_wait() waits infinitely Closed

 Description   
Start a server with one node.
Create enough buckets to use up all of the allowed memory, so that the server changes its status to Pending (in my case the server stays in Pending permanently and never changes to Up).

Then try to connect to one of the buckets by calling lcb_connect.
Call lcb_wait to wait for the connection to complete.
As a result, lcb_wait runs indefinitely and the timeout never fires.

Below is the call stack on Red Hat Linux:
#0 0x00000034350d4473 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x00002ad81b1dccc9 in ?? () from ./lib/libevent-2.0.so.5
#2 0x00002ad81b1c9cdc in event_base_loop () from ./lib/libevent-2.0.so.5
#3 0x00002ad8174c1dc6 in lcb_wait () from ./lib/libcouchbase.so.2
#4 0x000000000043c0e1 in Couchbase::connect (this=0x60df508) at couchbase_loader_source/couchbase.cpp:152
#5 0x000000000043c8b7 in connect_to_bucket (cbase=..., config=..., bucket_name=...) at couchbase_loader_source/couchbase.cpp:478
#6 0x0000000000440924 in couchbase_loader::writer_thread::writer_thread (this=0x60df480, config=..., bucket_name=..., dbh=...) at couchbase_loader_source/writer_thread.cpp:25
#7 0x000000000042168c in main (argc=1, argv=0x7fff22162028) at couchbase_loader_source/couchbase_loader.cpp:175


 Comments   
Comment by Haster [ 26/Dec/12 ]
I've investigated this problem a little and have additional information.

The scenario in which the error happens:

First, I try to connect to a bucket (Primary, for example) that does not exist:

lcb_connect
lcb_wait

Here I get an error. Then I create the bucket, destroy the instance by calling lcb_destroy, and try to connect to the bucket again:

func_create_bucket()
sleep(some_time)
lcb_destroy()
lcb_create()
lcb_connect()
lcb_wait()

After that I receive 3 messages from my error_callback function, where errinfo is “Number of vBuckets must be a power of two > 0 and <= 65536”.
Then a deadlock happens in lcb_wait.
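The errinfo text describes a constraint on the vBucket count. A minimal sketch of that check in C, assuming the constraint is exactly as worded in the message; the function name `vbucket_count_is_valid` is hypothetical and is not taken from libcouchbase:

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a libcouchbase API): returns true when n is
 * a power of two in the range 1..65536, as worded in the error message.
 */
bool vbucket_count_is_valid(unsigned long n)
{
    /* A power of two has exactly one bit set, so n & (n - 1) is zero. */
    return n > 0 && n <= 65536 && (n & (n - 1)) == 0;
}
```

A count of 0 fails this check, which would be consistent with a configuration that is not yet populated, although the report does not say which value was actually seen.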
Comment by Haster [ 27/Dec/12 ]
I've investigated this issue more deeply.
The problem occurs when the connection timeout happens.

connect
lcb_wait
  lcb_io_run_event_loop (for Windows Platform)
  ...
  select

  if (ret == 0) <-- 0 means the timeout happened
  {
                                  <-- Here our problems begin
  }

After select returns zero, the code tries to reconnect in the callback function (is that a good idea? Maybe it would be better to exit the loop and return an error?), but I think some structures are corrupted after that.
Then the event loop calls select again, but the timeout variable contains a zero value (so execution never ends)...

I think the problem is somewhere near here.
Comment by Sergey Avseyev [ 10/Jan/13 ]
Could you post a piece of source code demonstrating the issue?
Comment by Sergey Avseyev [ 10/Jan/13 ]
This patch may have fixed the issue: https://github.com/couchbase/libcouchbase/commit/d4948192439e61a8cc23d5e8572e81db1aebef7f

Could you verify with libcouchbase from master?

To do so either pull and build the sources:

git clone git://github.com/couchbase/libcouchbase.git
cd libcouchbase
./config/autorun.sh && ./configure && make && sudo make install

or install from the snapshot deb/rpm repositories, for example for recent Ubuntu releases:

sudo wget -O/etc/apt/sources.list.d/couchbase-snapshot.list http://packages.couchbase.com/snapshot/ubuntu/couchbase-ubuntu1110.list
sudo aptitude update
sudo aptitude install libcouchbase-dev libcouchbase2-bin
Comment by Haster [ 10/Jan/13 ]
My code is very large: one application server creates many processes (couchbase_loader), one process per cache, and those processes work with Couchbase.
They also interact with an Oracle database...
If you want, I can share all of the code, but you need Oracle to run it. Or I can try to write a simpler test case...

I will also check the patch tomorrow and share the results.
Comment by Haster [ 11/Jan/13 ]
It looks good now. Thanks a lot
Comment by Haster [ 11/Jan/13 ]
Sergey, I reproduced this issue on the Windows platform.
It is more stable now, but when I tried to create 9 buckets at the same time by running 9 copies of my program, 4 of them deadlocked in the lcb_wait function.
Comment by Haster [ 13/Jan/13 ]
Added a test program to reproduce the problem.
Comment by Sergey Avseyev [ 06/Feb/13 ]
Fixed invalid memory access in win32 plugin http://review.couchbase.org/24451