[MB-6057] unable to create a bucket under windows Created: 30/Jul/12  Updated: 09/Jan/13  Resolved: 09/Aug/12

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: None
Fix Version/s: 2.0-beta
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Chiyoung Seo
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: windows server r2 2008 64-bit

Attachments: GZip Archive 10.3.2.188-8091-diag.txt.gz    

 Description   
Install couchbase server 2.0.0-1516 on windows server 2008 64 bit (1 node).
Run smoke test which create and delete several time default and non default bucket.
At tests create multi none default buckets, the test will fail due to unable to create bucket.
-t setgettests.MembaseBucket.test_value_100b
-t setgettests.MembaseBucket.test_value_500kb

On the UI the node's and bucket's statuses are pending.
CRASH REPORTS appear in logs:

=========================CRASH REPORT=========================
  crasher:
    initial call: supervisor_cushion:init/1
    pid: <0.23646.8>
    registered_name: []
    exception exit: {abnormal,255}
      in function gen_server:terminate/6
    ancestors: [ns_port_sup,ns_server_sup,ns_server_cluster_sup,<0.67.0>]
    messages: []=========================CRASH REPORT=========================
  crasher:
    initial call: ns_memcached:init/1
    pid: <0.23611.8>
    registered_name: 'ns_memcached-default'
    exception exit: {badmatch,{error,closed}}
      in function gen_server:terminate/6
    ancestors: ['ns_memcached_sup-default','single_bucket_sup-default',
                  <0.22907.8>]
    messages: []
    links: [<0.23665.8>,<0.23666.8>,<0.23668.8>,<0.64.0>,<0.22928.8>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 17711
    stack_size: 24
    reductions: 17835
  neighbours:
    neighbour: [{pid,<0.23668.8>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,['Argument__1','Argument__2']}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,['ns_memcached-default',
                              'ns_memcached_sup-default',
                              'single_bucket_sup-default',<0.22907.8>]},
                  {messages,[]},
                  {links,[<0.23611.8>,#Port<0.142256>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,10946},
                  {stack_size,24},
                  {reductions,1151}]
    links: [<0.407.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 2361
  neighbours:

=========================CRASH REPORT=========================
  crasher:
    initial call: ns_memcached:init/1
    pid: <0.23611.8>
    registered_name: 'ns_memcached-default'
    exception exit: {badmatch,{error,closed}}
      in function gen_server:terminate/6
    ancestors: ['ns_memcached_sup-default','single_bucket_sup-default',
                  <0.22907.8>]
    messages: []
    links: [<0.23665.8>,<0.23666.8>,<0.23668.8>,<0.64.0>,<0.22928.8>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 17711
    stack_size: 24
    reductions: 17835
  neighbours:
    neighbour: [{pid,<0.23668.8>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,['Argument__1','Argument__2']}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,['ns_memcached-default',
                              'ns_memcached_sup-default',
                              'single_bucket_sup-default',<0.22907.8>]},
                  {messages,[]},
                  {links,[<0.23611.8>,#Port<0.142256>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,10946},
                  {stack_size,24},
                  {reductions,1151}]

 Comments   
Comment by Aleksey Kondratenko [ 30/Jul/12 ]
Memcached simply crashed:


[user:info] [2012-07-30 14:56:27] [ns_1@10.3.2.188:ns_port_memcached:ns_port_server:handle_info:107] Port server memcached on node 'ns_1@10.3.2.188' exited with status 255. Restarting. Messages: Trying to connect to mccouch: "localhost:11213"
Connected to mccouch: "localhost:11213"
Warning: failed to load the engine session stats due to IO exception "basic_ios::clear"
Failed to load mutation log, falling back to key dump
Extension support isn't implemented in this version of bucket_engine
metadata loaded in 2332 usec
[ns_server:info] [2012-07-30 14:56:27] [ns_1@10.3.2.188:ns_port_memcached:ns_port_server:log:169] memcached<0.7213.9>: Connected to mccouch: "localhost:11213"
memcached<0.7213.9>: Warning: failed to load the engine session stats due to IO exception "basic_ios::clear"
memcached<0.7213.9>: Failed to load mutation log, falling back to key dump
memcached<0.7213.9>: Extension support isn't implemented in this version of bucket_engine
memcached<0.7213.9>: metadata loaded in 2332 usec
Comment by Peter Wansch (Inactive) [ 30/Jul/12 ]
Trond, can you take a look?
Comment by Trond Norbye [ 31/Jul/12 ]
From the attached logfile it seems to crash 4 seconds after the node was started. Is it possible to even start the build on windows? Can I get access to a server with the windows version and the failed version to do basic testing?
Comment by Iryna Mironava [ 31/Jul/12 ]
Win 7 64 bit VM: http://10.3.2.188:8091 with build #1516
Administrator/password
Comment by Farshid Ghods (Inactive) [ 31/Jul/12 ]
Jin is also working on this - he is seeing that couchbase service does not start
Comment by Jin Lim [ 31/Jul/12 ]
I was able to reproduce this locally and fixed it by applying the two reverts below.

http://review.couchbase.org/#change,18972
http://review.couchbase.org/#change,18973

However, while as successfully starting stand-alone memcached with couchNotifier (aka mccouch) being disabled via command line and doing some I/Os (upto 1M) successfully. I still cannot start ns_server (erl process) via 'net start couchbaseserver". We need to have Bin merge the above reverts asap and continue debug from there. Probably we need erl engineers take a look at it as well.
Comment by Jin Lim [ 02/Aug/12 ]
Downloaded the latest build and src code from the main trunk.

"net start couchbaseserver" still fails starting memcached. The log indicates that memcached exited right after the warmup.
However, I was able to start stand-alone memcached with the same ep.so library and finished running some I/Os. See the compiled output below:

 c:\Program Files\Couchbase\Server\bin>memcached -p 11211 -E ..\lib\memcached\ep.
so -r -e "ht_size=12289;ht_locks=23;min_data_age=1;dbname=..\var\lib\couchbase\d
ata\default;queue_age_cap=5;max_size=84901888;vb0=true;backend=couchdb" -u nobody
Warning: failed to load the engine session stats due to IO exception "basic_ios
::clear" -> This is a normal error output for no stats.json file being found during the initial startup!
Failed to load mutation log, falling back to key dump
metadata loaded in 2749 usec
warmup completed in 3601 usec

Jins-MacBook-Pro:bin jin$ ./memcachetest -h 10.3.121.142:11211 -i 100 -V
Average with 1 threads
Get operations:
     #of ops. min max avg max90th max95th max99th
         6716 1504 us 4555 ms 22 ms 40 ms 54 ms 110 ms

Set operations:
     #of ops. min max avg max90th max95th max99th
         3284 1699 us 937 ms 25 ms 46 ms 60 ms 151 ms

Usr: 0.238064
Sys: 1.037286
Tot: 236.509054
Server time:
Usr: 0.000000
Sys: 0.000000
Total gets: 0
Total sets: 100
Jins-MacBook-Pro:bin jin$ ./cbstats 10.3.121.142:11211 timings
 data_age (1682 total)
    1s - 2s : ( 74.97%) 1261 #######################################################
    2s - 4s : (100.00%) 421 ##################
 disk_commit (143909 total)
    0 - 1s : (100.00%) 143909 ########################################################################
 disk_insert (100 total)
    8us - 16us : ( 31.00%) 31 #######################
    16us - 32us : ( 70.00%) 39 #############################
    32us - 64us : ( 98.00%) 28 #####################
    64us - 128us : ( 99.00%) 1
    256us - 512us : (100.00%) 1
 disk_update (1582 total)
    0 - 1us : ( 0.06%) 1
    8us - 16us : ( 18.27%) 288 #############
    16us - 32us : ( 58.03%) 629 #############################
    32us - 64us : ( 99.87%) 662 ###############################
    64us - 128us : (100.00%) 2
 disk_vbstate_snapshot (1 total)
    1ms - 2ms : (100.00%) 1 #############################################################################
 get_cmd (6716 total)
    0 - 1us : ( 0.07%) 5
    8us - 16us : ( 33.41%) 2239 ########################
    16us - 32us : ( 97.56%) 4308 ###############################################
    32us - 64us : ( 99.82%) 152 #
    64us - 128us : ( 99.99%) 11
    128us - 256us : (100.00%) 1
 get_stats_cmd (13 total)
    64us - 128us : ( 7.69%) 1 #####
    128us - 256us : ( 46.15%) 5 #############################
    1ms - 2ms : ( 53.85%) 1 #####
    2ms - 4ms : (100.00%) 6 ###################################
 item_alloc_sizes (3384 total)
    32 - 64 : ( 0.89%) 30
    64 - 128 : ( 4.05%) 107 ##
    128 - 256 : ( 7.36%) 112 ##
    256 - 512 : ( 13.09%) 194 ####
    512 - 1KB : ( 28.99%) 538 ###########
    1KB - 2KB : ( 50.35%) 723 ###############
    2KB - 4KB : (100.00%) 1680 ####################################
 storage_age (1682 total)
    0 - 1s : ( 74.32%) 1250 ######################################################
    1s - 2s : (100.00%) 432 ###################
Jins-MacBook-Pro:bin jin$ ./cbstats 10.3.121.142:11211 all | grep curr
 curr_connections: 11
 curr_conns_on_port_11211: 3
 curr_items: 100
 curr_items_tot: 100
 curr_temp_items: 0
 ep_concurrentDB: 1
 ep_store_max_concurrency: 10
 vb_active_curr_items: 100
 vb_pending_curr_items: 0
 vb_replica_curr_items: 0
Comment by Jin Lim [ 02/Aug/12 ]
It appears to be that it is not an ep_engine issue since we can start memcached with the ep library without any issue and do some I/Os.
Comment by Sharon Barr (Inactive) [ 06/Aug/12 ]
Hi there.
it's a blocker bug. if it is not an ep-engine issue, then it should be passed to the next owner/component. no update since last Thursday..
Comment by Peter Wansch (Inactive) [ 06/Aug/12 ]
Bin, thank you for looking into this.
Comment by Bin Cui [ 07/Aug/12 ]
The last workable build was 1486. After that, we build snappy and couchstore as dll and ep.so link them as dlls.
Test 1: Rollback to static linking. Rebuild snappy and couchstore as static libraries. so ep.so will link them as static ones. However, the problem still exists.
Test 2: Using windbg, i observed that first chance exceptions triggerred when loading bucket_engine.so and ep.so.
Comment by Bin Cui [ 07/Aug/12 ]
Test 3: Based on build 1486. Replace memcached.exe and bucket-engine.so with the latest build, it still works. But memcached.exe will crash as soon as ep.so is replaced with the latest one.
Comment by Bin Cui [ 08/Aug/12 ]
The root cause of crashing is narrowed down to commit ccaa609dc02493b5d4dcee842a13dae8439f9714 on ep-engine, i.e. CBD-458 Add support for configurable dispatcher task start time, which was checked in at Jul 24. After reverting the change, i no longer see memcached crash any more. Chiyoung is investigating it now.
Comment by Peter Wansch (Inactive) [ 09/Aug/12 ]
Chiyoung is investigating the issue. Thank you.
Comment by Chiyoung Seo [ 09/Aug/12 ]
http://review.couchbase.org/#change,19454
Comment by Thuan Nguyen [ 10/Aug/12 ]
Integrated in github-ep-engine-2-0 #392 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/392/])
    MB-6057 Disable access scanner scheduling at a specific time (Revision 6c0952133127c583adb25a87a3c84d97875d534a)

     Result = SUCCESS
Chiyoung Seo :
Files :
* configuration.json
Comment by Thuan Nguyen [ 31/Aug/12 ]
Integrated in github-ep-engine-2-0 #420 (See [http://qa.hq.northscale.net/job/github-ep-engine-2-0/420/])
    Revert "MB-6057 Disable access scanner scheduling at a specific time" (Revision 64b4818f10aa52f9c9bf3f4f6ed5a3b27f1808aa)

     Result = SUCCESS
Mike Wiederhold :
Files :
* configuration.json
Generated at Mon Jul 14 01:47:58 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.