[MB-6612] queries return error {badmatch,{not_found,no_db_file}} Created: 11/Sep/12  Updated: 10/Jan/13  Resolved: 25/Sep/12

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: None
Fix Version/s: 2.0-beta-2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Iryna Mironava Assignee: Peter Wansch (Inactive)
Resolution: Fixed Votes: 0
Labels: regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: centos, 1 node, 1 bucket
build 1712

Attachments: GZip Archive 10.2.2.108-8091-diag.txt.gz     GZip Archive 10.2.2.60-8091-diag.txt.gz     GZip Archive 10.2.2.63-8091-diag.txt.gz     GZip Archive 10.2.2.64-8091-diag.txt.gz     GZip Archive 10.2.2.65-8091-diag.txt.gz     GZip Archive 10.3.121.104.txt.tar.gz     File error.1     File error.2    

 Description   
Test to reproduce: viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in,num_nodes_to_add=1,skip_rebalance=true
Test uses employee data set:
            -documents are structured as {"name": name<string>,
                                       "join_yr" : year<int>,
                                       "join_mo" : month<int>,
                                       "join_day" : day<int>,
                                       "email": email<string>,
                                       "job_title" : title<string>,
                                       "type" : type<string>,
                                       "desc" : desc<string>}
        Steps to repro:
           The test had just started and created the views; for the first 1-2 seconds after view creation, queries against all 6 views returned errors.

6 ddocs, 1 view per ddoc:
test_view-217467b
test_view-820faf6
test_view-6a68c94
test_view-d74d96e
test_view-b943fd9

map fns:
'function (doc) { if(doc.job_title !== undefined) { var myregexp = new RegExp("^UI "); if(doc.job_title.match(myregexp)){ emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}'
'function (doc) { if(doc.job_title !== undefined) { var myregexp = new RegExp("^System "); if(doc.job_title.match(myregexp)){ emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}'
'function (doc) { if(doc.job_title !== undefined) { var myregexp = new RegExp("^Senior "); if(doc.job_title.match(myregexp)){ emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}'
'function (doc) { if(doc.job_title !== undefined) emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] ); }'
'function (doc) { if(doc.job_title !== undefined) emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] ); }' + _count reduce fn
'function (doc, meta) { if(doc.job_title !== undefined) { var myregexp = new RegExp("^admin"); if(meta.id.match(myregexp)) { emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}'
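To see what these views index, the map functions can be run outside the server with a stub `emit`. A minimal sketch in JavaScript, assuming Node.js: the global `emit` stub, `runMap`, and the sample documents are made up for illustration and are not part of Couchbase's view engine. Two of the map functions above are reproduced with only their layout changed.

```javascript
// Hypothetical local harness: the real view engine supplies emit(); here a
// global stub collects the emitted rows so the map functions run in Node.
let emitted = [];
globalThis.emit = (key, value) => {
  emitted.push({ key, value });
};

// "^UI " job_title view from the report.
const uiTitles = function (doc) {
  if (doc.job_title !== undefined) {
    var myregexp = new RegExp("^UI ");
    if (doc.job_title.match(myregexp)) {
      emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email]);
    }
  }
};

// "^admin" document-id view from the report.
const adminIds = function (doc, meta) {
  if (doc.job_title !== undefined) {
    var myregexp = new RegExp("^admin");
    if (meta.id.match(myregexp)) {
      emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email]);
    }
  }
};

// Run a map function over [doc, meta] pairs and return the emitted rows.
function runMap(mapFn, docs) {
  emitted = [];
  for (const [doc, meta] of docs) mapFn(doc, meta);
  return emitted;
}

// Made-up sample documents in the employee data set shape.
const docs = [
  [{ name: "alice", join_yr: 2008, join_mo: 7, join_day: 15, email: "alice@example.com",
     job_title: "UI Engineer", type: "active", desc: "sample" }, { id: "admin-1" }],
  [{ name: "bob", join_yr: 2009, join_mo: 1, join_day: 2, email: "bob@example.com",
     job_title: "Senior Engineer", type: "active", desc: "sample" }, { id: "emp-2" }],
  [{ name: "carol", join_yr: 2008, join_mo: 6, join_day: 30, email: "carol@example.com",
     job_title: "System Administrator", type: "active", desc: "sample" }, { id: "admin-3" }],
];
```

With these samples, `runMap(uiTitles, docs)` yields a single row keyed `[2008, 7, 15]`, while `runMap(adminIds, docs)` selects on `meta.id` rather than `job_title` and yields rows for `admin-1` and `admin-3`.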

exact queries:
http://10.2.2.60:8092/default/_design/test_view-6a68c94/_view/test_view-6a68c94?debug=true&start_key=%5B2008%2C7%2Cnull%5D&connection_timeout=60000
http://10.2.2.60:8092/default/_design/test_view-b943fd9/_view/test_view-b943fd9?debug=true&start_key=%5B2008%2C7%2Cnull%5D&connection_timeout=60000
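The query URLs can be rebuilt from their parts: `start_key` is a JSON value (`[2008,7,null]`) that is JSON-encoded and then percent-encoded. Because null collates before numbers in CouchDB-style view collation, `[2008,7,null]` sorts before every key of the form `[2008,7,<day>]`, so these queries start at July 2008. A sketch assuming Node.js; `viewUrl`, `collate`, and `fromStartKey` are hypothetical helpers, and `collate` covers only null, numbers, and arrays.

```javascript
// Rebuild a view query URL; start_key is JSON-encoded, then percent-encoded.
function viewUrl(hostPort, bucket, ddoc, view, startKey) {
  const params = [
    "debug=true",
    "start_key=" + encodeURIComponent(JSON.stringify(startKey)),
    "connection_timeout=60000",
  ];
  return `http://${hostPort}/${bucket}/_design/${ddoc}/_view/${view}?${params.join("&")}`;
}

// Simplified subset of CouchDB-style view collation: null sorts before
// numbers, and arrays compare element by element (a shorter prefix first).
// Strings, booleans and objects are deliberately not handled here.
function typeRank(v) {
  if (v === null) return 0;
  if (typeof v === "number") return 1;
  if (Array.isArray(v)) return 2;
  throw new TypeError("key type not covered by this sketch");
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 0) return 0;
  if (ra === 1) return a - b;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const c = collate(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

// Keys a start_key-only query would return: those collating at or after it.
function fromStartKey(keys, startKey) {
  return keys.filter((k) => collate(k, startKey) >= 0);
}
```

For example, `viewUrl("10.2.2.60:8092", "default", "test_view-6a68c94", "test_view-6a68c94", [2008, 7, null])` reproduces the first URL above, and `fromStartKey` keeps `[2008, 7, 15]` and `[2009, 1, 2]` but drops `[2008, 6, 30]`.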

error while querying returned:
{{{badmatch,{not_found,no_db_file}},
  [{couch_db_set,'-handle_call/3-fun-1-',3},
   {lists,foldl,3},
   {couch_db_set,handle_call,3},
   {gen_server,handle_msg,5},
   {proc_lib,init_p_do_apply,3}]},
 {gen_server,call,
             [<0.30044.20>,
              {add_partitions,[24,861,862,863,864,865,866,867,868,869,870,871,
                               872,873,874,875,876,877,878,879,880,881,882,
                               883,884,885,886,887,888,889,890,891,892,893,
                               894,895,896,897,898,899,900,901,902,903,904,
                               905,906,907,908,909,910,911,912,913,914,915,
                               916,917,918,919,920,921,922,923,924,925,926,
                               927,928,929,930,931,932,933,934,935,936,937,
                               938,939,940,941,942,943,944,945,946,947,948,
                               949,950,951,952,953,954,955,956,957,958,959,
                               960,961,962,963,964,965,966,967,968,969,970,
                               971,972,973,974,975,976,977,978,979,980,981,
                               982,983,984,985,986,987,988,989,990,991,992,
                               993,994,995,996,997,998,999,1000,1001,1002,
                               1003,1004,1005,1006,1007,1008,1009,1010,1011,
                               1012,1013,1014,1015,1016,1017,1018,1019,1020,
                               1021,1022,1023]},
              infinity]}} {"error":"{{{badmatch,{not_found,no_db_file}},\n [{couch_db_set,'-handle_call/3-fun-1-',3},\n {lists,foldl,3},\n {couch_db_set,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]},\n {gen_server,call,\n [<0.30044.20>,\n {add_partitions,[24,861,862,863,864,865,866,867,868,869,870,871,\n 872,873,874,875,876,877,878,879,880,881,882,\n 883,884,885,886,887,888,889,890,891,892,893,\n 894,895,896,897,898,899,900,901,902,903,904,\n 905,906,907,908,909,910,911,912,913,914,915,\n 916,917,918,919,920,921,922,923,924,925,926,\n 927,928,929,930,931,932,933,934,935,936,937,\n 938,939,940,941,942,943,944,945,946,947,948,\n 949,950,951,952,953,954,955,956,957,958,959,\n 960,961,962,963,964,965,966,967,968,969,970,\n 971,972,973,974,975,976,977,978,979,980,981,\n 982,983,984,985,986,987,988,989,990,991,992,\n 993,994,995,996,997,998,999,1000,1001,1002,\n 1003,1004,1005,1006,1007,1008,1009,1010,1011,\n 1012,1013,1014,1015,1016,1017,1018,1019,1020,\n 1021,1022,1023]},\n infinity]}}","reason":"{gen_server,call,\n [<0.30029.20>,\n {set_view_group_req,update_after,true,\n [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,\n 18,19,20,21,22,23,24,25,26,27,28,29,30,31,\n 32,33,34,35,36,37,38,39,40,41,42,43,44,45,\n 46,47,48,49,50,51,52,53,54,55,56,57,58,59,\n 60,61,62,63,64,65,66,67,68,69,70,71,72,73,\n 74,75,76,77,78,79,80,81,82,83,84,85,86,87,\n 88,89,90,91,92,93,94,95,96,97,98,99,100,101,\n 102,103,104,105,106,107,108,109,110,111,112,\n 113,114,115,116,117,118,119,120,121,122,123,\n 124,125,126,127,128,129,130,131,132,133,134,\n 135,136,137,138,139,140,141,142,143,144,145,\n 146,147,148,149,150,151,152,153,154,155,156,\n 157,158,159,160,161,162,163,164,165,166,167,\n 168,169,170,171,172,173,174,175,176,177,178,\n 179,180,181,182,183,184,185,186,187,188,189,\n 190,191,192,193,194,195,196,197,198,199,200,\n 201,202,203,204,205,206,207,208,209,210,211,\n 212,213,214,215,216,217,218,219,220,221,222,\n 
223,224,225,226,227,228,229,230,231,232,233,\n 234,235,236,237,238,239,240,241,242,243,244,\n 245,246,247,248,249,250,251,252,253,254,255,\n 256,257,258,259,260,261,262,263,264,265,266,\n 267,268,269,270,271,272,273,274,275,276,277,\n 278,279,280,281,282,283,284,285,286,287,288,\n 289,290,291,292,293,294,295,296,297,298,299,\n 300,301,302,303,304,305,306,307,308,309,310,\n 311,312,313,314,315,316,317,318,319,320,321,\n 322,323,324,325,326,327,328,329,330,331,332,\n 333,334,335,336,337,338,339,340,341,342,343,\n 344,345,346,347,348,349,350,351,352,353,354,\n 355,356,357,358,359,360,361,362,363,364,365,\n 366,367,368,369,370,371,372,373,374,375,376,\n 377,378,379,380,381,382,383,384,385,386,387,\n 388,389,390,391,392,393,394,395,396,397,398,\n 399,400,401,402,403,404,405,406,407,408,409,\n 410,411,412,413,414,415,416,417,418,419,420,\n 421,422,423,424,425,426,427,428,429,430,431,\n 432,433,434,435,436,437,438,439,440,441,442,\n 443,444,445,446,447,448,449,450,451,452,453,\n 454,455,456,457,458,459,460,461,462,463,464,\n 465,466,467,468,469,470,471,472,473,474,475,\n 476,477,478,479,480,481,482,483,484,485,486,\n 487,488,489,490,491,492,493,494,495,496,497,\n 498,499,500,501,502,503,504,505,506,507,508,\n 509,510,511,512,513,514,515,516,517,518,519,\n 520,521,522,523,524,525,526,527,528,529,530,\n 531,532,533,534,535,536,537,538,539,540,541,\n 542,543,544,545,546,547,548,549,550,551,552,\n 553,554,555,556,557,558,559,560,561,562,563,\n 564,565,566,567,568,569,570,571,572,573,574,\n 575,576,577,578,579,580,581,582,583,584,585,\n 586,587,588,589,590,591,592,593,594,595,596,\n 597,598,599,600,601,602,603,604,605,606,607,\n 608,609,610,611,612,613,614,615,616,617,618,\n 619,620,621,622,623,624,625,626,627,628,629,\n 630,631,632,633,634,635,636,637,638,639,640,\n 641,642,643,644,645,646,647,648,649,650,651,\n 652,653,654,655,656,657,658,659,660,661,662,\n 663,664,665,666,667,668,669,670,671,672,673,\n 674,675,676,677,678,679,680,681,682,683,684,\n 
685,686,687,688,689,690,691,692,693,694,695,\n 696,697,698,699,700,701,702,703,704,705,706,\n 707,708,709,710,711,712,713,714,715,716,717,\n 718,719,720,721,722,723,724,725,726,727,728,\n 729,730,731,732,733,734,735,736,737,738,739,\n 740,741,742,743,744,745,746,747,748,749,750,\n 751,752,753,754,755,756,757,758,759,760,761,\n 762,763,764,765,766,767,768,769,770,771,772,\n 773,774,775,776,777,778,779,780,781,782,783,\n 784,785,786,787,788,789,790,791,792,793,794,\n 795,796,797,798,799,800,801,802,803,804,805,\n 806,807,808,809,810,811,812,813,814,815,816,\n 817,818,819,820,821,822,823,824,825,826,827,\n 828,829,830,831,832,833,834,835,836,837,838,\n 839,840,841,842,843,844,845,846,847,848,849,\n 850,851,852,853,854,855,856,857,858,859,860,\n 861,862,863,864,865,866,867,868,869,870,871,\n 872,873,874,875,876,877,878,879,880,881,882,\n 883,884,885,886,887,888,889,890,891,892,893,\n 894,895,896,897,898,899,900,901,902,903,904,\n 905,906,907,908,909,910,911,912,913,914,915,\n 916,917,918,919,920,921,922,923,924,925,926,\n 927,928,929,930,931,932,933,934,935,936,937,\n 938,939,940,941,942,943,944,945,946,947,948,\n 949,950,951,952,953,954,955,956,957,958,959,\n 960,961,962,963,964,965,966,967,968,969,970,\n 971,972,973,974,975,976,977,978,979,980,981,\n 982,983,984,985,986,987,988,989,990,991,992,\n 993,994,995,996,997,998,999,1000,1001,1002,\n 1003,1004,1005,1006,1007,1008,1009,1010,\n 1011,1012,1013,1014,1015,1016,1017,1018,\n 1019,1020,1021,1022,1023],\n true,main},\n infinity]}"}
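The dump is easier to read once the partition (vbucket) lists are collapsed into ranges: the `error` term shows `couch_db_set` crashing while handling an `add_partitions` call for partitions 24 and 861-1023, while `reason` shows the `set_view_group_req` that covered partitions 0-1023. A minimal sketch of that collapse; `partitionsAfter` and `toRanges` are hypothetical helpers, and the regex assumes the integer list directly follows the given tag, as it does in this dump.

```javascript
// Escape regex metacharacters so the tag is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the integer list that directly follows `tag,` in the dump text,
// e.g. tag "add_partitions" in "{add_partitions,[24,861,...]}".
function partitionsAfter(text, tag) {
  const re = new RegExp(escapeRegExp(tag) + ",\\s*\\[([\\d,\\s]+)\\]");
  const m = text.match(re);
  if (!m) return [];
  return m[1].split(/[\s,]+/).filter(Boolean).map(Number);
}

// Collapse partition ids into "a-b" ranges: [24, 861, 862, 863] -> "24,861-863".
function toRanges(ids) {
  const sorted = [...ids].sort((a, b) => a - b);
  const out = [];
  let start = sorted[0];
  let prev = sorted[0];
  for (let i = 1; i <= sorted.length; i++) {
    const cur = sorted[i]; // undefined past the end flushes the last range
    if (cur === prev + 1) {
      prev = cur;
      continue;
    }
    out.push(start === prev ? String(start) : `${start}-${prev}`);
    start = cur;
    prev = cur;
  }
  return out.join(",");
}
```

Applied to the `error` field above, `toRanges(partitionsAfter(errorText, "add_partitions"))` should give `24,861-1023`, i.e. 164 partitions being added.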

 Comments   
Comment by Aleksey Kondratenko [ 11/Sep/12 ]
Potentially an ns_server bug
Comment by Aleksey Kondratenko [ 11/Sep/12 ]
I cannot find any occurrences of no_db_ in the attached logs. Please double-check on your end.
Comment by Aleksey Kondratenko [ 11/Sep/12 ]
See above
Comment by Iryna Mironava [ 11/Sep/12 ]
There were no such messages in the log; only the query responses contained it.
Comment by Aleksey Kondratenko [ 11/Sep/12 ]
I see no trace of this pid either.

Most likely the logs have rotated past the incident.

May I have at least the error-level logs, in case they still contain some traces?
Comment by Iryna Mironava [ 11/Sep/12 ]
error logs attached
Comment by Iryna Mironava [ 11/Sep/12 ]
logs of error level are attached
Comment by Aleksey Kondratenko [ 11/Sep/12 ]
Now I see multiple occurrences of this problem, but sadly they happened too long ago. May I ask you to re-run your tests and capture diags in the middle of the run?
Comment by Iryna Mironava [ 11/Sep/12 ]
logs are attached
Comment by Farshid Ghods (Inactive) [ 12/Sep/12 ]
Promoting this to blocker since it happens more frequently now and it's easy to reproduce.
Comment by Aleksey Kondratenko [ 12/Sep/12 ]
Found the bug. Fixed it and will upload to gerrit soon.
Comment by Aleksey Kondratenko [ 12/Sep/12 ]
Fix is 3 commits ending here: http://review.couchbase.org/20813
Comment by Farshid Ghods (Inactive) [ 13/Sep/12 ]
Peter,

Given that this issue is related to the view engine, can Filipe take a look?
Comment by Karan Kumar (Inactive) [ 14/Sep/12 ]
Seeing this on system tests as well. build-1717
Comment by Karan Kumar (Inactive) [ 14/Sep/12 ]
[error_logger:error,2012-09-14T2:08:03.441,ns_1@10.3.121.16:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: couch_db_set:init/1
    pid: <0.17473.73>
    registered_name: []
    exception exit: {{badmatch,{not_found,no_db_file}},
                     [{couch_db_set,'-handle_call/3-fun-1-',3},
                      {lists,foldl,3},
                      {couch_db_set,handle_call,3},
                      {gen_server,handle_msg,5},
                      {proc_lib,init_p_do_apply,3}]}
      in function gen_server:terminate/6
    ancestors: [<0.17468.73>,<0.17346.73>,<0.17345.73>]
    messages: []
    links: [<0.17468.73>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 2584
    stack_size: 24
    reductions: 38041
  neighbours:
Comment by Aleksey Kondratenko [ 14/Sep/12 ]
fix is still sitting in gerrit
Comment by Farshid Ghods (Inactive) [ 16/Sep/12 ]
Karan,

Was this during rebalancing?
Did the error go away after retrying?
Comment by Thuan Nguyen [ 25/Sep/12 ]
Integrated in github-couchdb-preview #507 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/507/])
    MB-6612: stop maintaining open dbs lru (Revision 7587aa0cf6af53e49487771d1ab31eb425b29cee)
    MB-6612: open databases synchronously (Revision fa5b6fe30dbf84e03dec273c92ba6e17b2621ff7)
    MB-6612: just store pid in couch server ets tables (Revision 261348623337e84e38635c2470c95bec3b7aa757)
    MB-6612: exclude opening dbs from all_known_databases_with_prefix (Revision b94592131af1efb170f5710f65b142d4d786dc0f)

     Result = SUCCESS
pwansch :
Files :
* src/couchdb/couch_server.erl

Comment by Thuan Nguyen [ 26/Sep/12 ]
Integrated in github-couchdb-preview #508 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/508/])
    MB-6736: Revert "MB-6612: exclude opening dbs from..." (Revision 804ef001394269d3c02e3eb1ef05ac3a02f86c54)

     Result = SUCCESS
Farshid Ghods :
Files :
* src/couchdb/couch_server.erl
Comment by kzeller [ 26/Oct/12 ]
RN text: Querying a view during rebalance previously resulted in several errors due to the way database storage files were managed and named. We now exclude databases that are still being opened from indexing, and we open databases for indexing synchronously, which resolves the problem.
Comment by Aleksey Kondratenko [ 14/Nov/12 ]
The first relevant commit is this:

commit 8ba83d2df2b35ebd8fbc621dcad193aec2255fe5
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Mon Aug 27 17:44:43 2012 -0700

    MB-6413: re-implemented bucket_databases using efficient API
    
    As of http://review.couchbase.org/20223 couchdb supports listing all
    known vbuckets with given prefix. That allows us to only touch
    vbuckets of bucket we need.
    
    And it also opens all databases at startup. That allows us to avoid
    using slow all_databases call.
    
    Change-Id: I656000e408af6b977cb27e981d216f6ea11ac0cb
    Reviewed-on: http://review.couchbase.org/20224
    Reviewed-by: Damien Katz <damien@couchbase.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>

So we first fixed MB-6413 where:

The underlying issue was a race in the way couch implemented all_databases, which ns_server used to enumerate all physically present vbuckets. In particular, it used readdir, which is known not to be atomic with respect to renames. In practice it raced with the final rename at the end of compaction, causing some vbuckets to disappear from the listing. This caused view queries to fail, because we incorrectly removed a valid vbucket from the set of indexed vbuckets in all views.
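A toy model of the race described above: the directory is modeled as fixed slots that a scanner visits in order, and compaction's final rename frees the old name's slot and puts the new name into a slot the scanner has already passed. At every instant some file for vbucket 3 exists, yet the listing omits it. POSIX leaves it unspecified whether readdir returns an entry for a file added or removed after the directory was opened, which is what this models. The `<vbucket>.couch.<revision>` names and the helper names are assumptions for illustration; this is not couch's code, and real behavior depends on the filesystem.

```javascript
// Toy model only (hypothetical names): a directory is a fixed array of
// slots, null meaning a free slot, and a scanner reads slots in order.
function scanDuringCompaction(slots, renameBeforeSlot, fromName, toName) {
  const dir = [...slots];
  const seen = [];
  for (let i = 0; i < dir.length; i++) {
    if (i === renameBeforeSlot) {
      // Compaction's final rename: the old name's slot is freed and the new
      // name lands in the first free slot, which in this setup the scanner
      // has already passed.
      const from = dir.indexOf(fromName);
      const free = dir.indexOf(null);
      if (from === -1 || free === -1) throw new Error("bad toy setup");
      dir[from] = null;
      dir[free] = toName;
    }
    if (dir[i] !== null) seen.push(dir[i]);
  }
  return seen;
}

// Assumed naming "<vbucket>.couch.<revision>"; the vbucket id is the prefix.
function vbucketIds(names) {
  return names.map((n) => Number(n.split(".")[0]));
}

const slots = ["0.couch.1", null, "1.couch.3", "2.couch.2", "3.couch.1"];
```

Without the rename (`renameBeforeSlot = -1`) the scan reports vbuckets 0-3; with the rename just before slot 3 it reports only 0, 1 and 2, which is the "vbucket disappears" symptom.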

The fix was to stop using the slow and unreliable all_databases and instead keep a list of all open vbuckets in a public ordered ets table. This fixed the race above and, at the same time, gave us an efficient way to list only the vbuckets of a given bucket.
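The replacement listing can be sketched as an ordered set with a prefix scan: because names are kept sorted, all dbs of one bucket are contiguous, so listing them is a binary search plus a short walk rather than a directory scan. `OrderedNames` is a hypothetical JavaScript stand-in for the ordered ets table, and the `bucket/vbucket` naming is an assumption for illustration.

```javascript
// Sorted list of db names with binary-search insert/remove and prefix scan
// (hypothetical stand-in for an ordered ets table).
class OrderedNames {
  constructor() {
    this.names = [];
  }
  // Index of the first name >= key.
  lowerBound(key) {
    let lo = 0;
    let hi = this.names.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.names[mid] < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }
  insert(name) {
    const i = this.lowerBound(name);
    if (this.names[i] !== name) this.names.splice(i, 0, name);
  }
  remove(name) {
    const i = this.lowerBound(name);
    if (this.names[i] === name) this.names.splice(i, 1);
  }
  // Names sharing a prefix are contiguous, so start at the lower bound and
  // stop at the first name that no longer matches.
  withPrefix(prefix) {
    const out = [];
    for (let i = this.lowerBound(prefix); i < this.names.length && this.names[i].startsWith(prefix); i++) {
      out.push(this.names[i]);
    }
    return out;
  }
}

const index = new OrderedNames();
for (const n of ["default/1", "beer-sample/0", "default2/0", "default/master", "default/0", "default/10"]) {
  index.insert(n);
}
```

Here `index.withPrefix("default/")` returns only the four `default/...` names; the trailing `/` in the prefix keeps the similarly named bucket `default2` out of the result.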

In this bug (MB-6612) we discovered that the fix wasn't implemented quite correctly, as explained in one of the commits:

commit fa5b6fe30dbf84e03dec273c92ba6e17b2621ff7
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Wed Sep 12 18:30:52 2012 -0700

    MB-6612: open databases synchronously
    
    Problem that we see in MB-6612 is that all_known_databases_with_prefix
    is scanning couch_dbs_by_name ets table. Given that previously we also
    stored dbs being opened that lead to condition where some vbuckets
    would be thought as present while in fact they are simply being tried
    to be opened and do not actually exist.
    
    Given that we don't need async db opening in couchbase fork of couchdb
    for quite some time it seems logical to just get rid of async open and
    thus never have onopened dbs in ets table.
    
    Change-Id: I5d1dad5f60c64d197e143cf4a7be1996a4fc4ea2
    Reviewed-on: http://review.couchbase.org/20812
    Reviewed-by: Damien Katz <damien@couchbase.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
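The problem this commit describes can be reduced to a small model: if a db is published in the listed table as soon as an open starts, a prefix listing taken while that open is still in flight reports a db that may not exist on disk, and whatever consumes the listing (here, adding partitions to a set view) then fails with `no_db_file`. Publishing a name only after a successful synchronous open avoids this. `DbRegistry`, `demo`, and the db names are hypothetical; this is not couch_server's code.

```javascript
// Toy model (hypothetical, not couch_server's code): `known` plays the role
// of the table that a prefix listing scans; `files` is what is on disk.
class DbRegistry {
  constructor(filesOnDisk) {
    this.files = new Set(filesOnDisk);
    this.known = new Map();
  }
  // Async style: the name is published before the open has completed.
  beginOpenAsync(name) {
    this.known.set(name, "opening");
  }
  finishOpenAsync(name) {
    if (this.files.has(name)) {
      this.known.set(name, "open");
      return "ok";
    }
    this.known.delete(name);
    return "no_db_file";
  }
  // Sync style: the name is published only after a successful open.
  openSync(name) {
    if (!this.files.has(name)) return "no_db_file";
    this.known.set(name, "open");
    return "ok";
  }
  listWithPrefix(prefix) {
    return [...this.known.keys()].filter((n) => n.startsWith(prefix)).sort();
  }
}

// One interleaving: a lister runs while an open of a missing db is in flight.
function demo(mode) {
  const reg = new DbRegistry(["default/0", "default/1"]);
  reg.openSync("default/0");
  reg.openSync("default/1");
  let listed;
  if (mode === "async") {
    reg.beginOpenAsync("default/900");
    listed = reg.listWithPrefix("default/");
    reg.finishOpenAsync("default/900");
  } else {
    reg.openSync("default/900");
    listed = reg.listWithPrefix("default/");
  }
  // Listed names that a consumer would then fail to open with no_db_file.
  const missing = listed.filter((n) => !reg.files.has(n));
  return { listed, missing };
}
```

`demo("async")` lists `default/900` even though no such file exists, while `demo("sync")` never does.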

As part of this work we also removed the legacy couchdb behavior of closing some unused dbs. The relevant commits:

commit 261348623337e84e38635c2470c95bec3b7aa757
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Wed Sep 12 18:34:04 2012 -0700

    MB-6612: just store pid in couch server ets tables
    
    Instead of {opened,..} tuple.
    
    Change-Id: Ie88c1137fa55f55e8b9f1128c12f15750d4b36bd
    Reviewed-on: http://review.couchbase.org/20813
    Reviewed-by: Damien Katz <damien@couchbase.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>

commit 7587aa0cf6af53e49487771d1ab31eb425b29cee
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Wed Sep 12 18:14:32 2012 -0700

    MB-6612: stop maintaining open dbs lru
    
    Because that's long unused and simplifies things for actual MB-6612
    fix.
    
    Change-Id: Ie8ac7e5f6d03e962f181d84461cc26ee53cb7309
    Reviewed-on: http://review.couchbase.org/20811
    Reviewed-by: Damien Katz <damien@couchbase.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
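For context on the removed behavior: an open-dbs LRU keeps handles in least-recently-used order and closes the oldest ones once a cap is exceeded. A generic JavaScript sketch of that kind of bookkeeping; `OpenDbLru` and `maxOpen` are hypothetical and do not reproduce couch_server's implementation.

```javascript
// Generic LRU of open db handles (hypothetical; not couch_server's code).
class OpenDbLru {
  constructor(maxOpen) {
    this.maxOpen = maxOpen;
    this.open = new Map(); // Map iterates in insertion order: oldest first
    this.closed = [];
  }
  // Record a use of `name`, then close the least recently used dbs while
  // more than maxOpen are open.
  touch(name) {
    if (this.open.has(name)) this.open.delete(name); // re-insert as most recent
    this.open.set(name, true);
    while (this.open.size > this.maxOpen) {
      const oldest = this.open.keys().next().value;
      this.open.delete(oldest);
      this.closed.push(oldest);
    }
  }
}

const lru = new OpenDbLru(2);
for (const n of ["default/0", "default/1", "default/0", "default/2"]) lru.touch(n);
```

After these four uses, `default/1` is the one closed, since `default/0` was touched again before `default/2` arrived. Dropping this bookkeeping means every open db simply stays open, which is what the commit above relies on.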

After all that, we found another issue in the original implementation:

commit 7a0f8699723d3557bcc71d1964464a0e1248c7cf
Author: Aliaksey Artamonau <aliaksiej.artamonau@gmail.com>
Date: Fri Oct 26 13:56:30 2012 -0700

    MB-7025 Shutdown databases in couch_server:terminate correctly.
    
    Since recently couch_dbs_by_name contains only PID and nothing
    more.
    
    It's not clear though if this actually fixes referred bug.
    
    Change-Id: Ifb3a0bd9a9c011c4836e0513a4adbaba94c3c8c1
    Reviewed-on: http://review.couchbase.org/22021
    Reviewed-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Reviewed-by: Filipe David Borba Manana <fdmanana@gmail.com>
Generated at Wed Jul 30 16:36:13 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.