[MB-6612] queries returns error {badmatch,{not_found,no_db_file} Created: 11/Sep/12 Updated: 10/Jan/13 Resolved: 25/Sep/12 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | None |
| Fix Version/s: | 2.0-beta-2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Iryna Mironava | Assignee: | Peter Wansch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | regression | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
centos, 1 node, 1 bucket
build 1712 |
||
| Attachments: |
|
| Description |
|
test to reproduce: viewquerytests.ViewQueryTests.test_employee_dataset_startkey_endkey_queries_rebalance_in,num_nodes_to_add=1,skip_rebalance=true
Test uses the employee data set. Documents are structured as {"name": name<string>, "join_yr" : year<int>, "join_mo" : month<int>, "join_day" : day<int>, "email": email<string>, "job_title" : title<string>, "type" : type<string>, "desc" : desc<string>}
Steps to repro: the test had just started and created its views; for the first 1-2 seconds after view creation, queries against all 6 views returned errors.
6 ddocs, 1 view per ddoc:
test_view-217467b
test_view-820faf6
test_view-6a68c94
test_view-d74d96e
test_view-b943fd9
map fns:
'function (doc) { if(doc.job_title !== undefined) { var myregexp = new RegExp("^UI "); if(doc.job_title.match(myregexp)){ emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}'
'function (doc) { if(doc.job_title !== undefined) { var myregexp = new RegExp("^System "); if(doc.job_title.match(myregexp)){ emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}'
'function (doc) { if(doc.job_title !== undefined) { var myregexp = new RegExp("^Senior "); if(doc.job_title.match(myregexp)){ emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}'
'function (doc) { if(doc.job_title !== undefined) emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] ); }'
'function (doc) { if(doc.job_title !== undefined) emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] ); }' + _count reduce fn
'function (doc, meta) { if(doc.job_title !== undefined) { var myregexp = new RegExp("^admin"); if(meta.id.match(myregexp)) { emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email] );}}}'
exact queries:
http://10.2.2.60:8092/default/_design/test_view-6a68c94/_view/test_view-6a68c94?debug=true&start_key=%5B2008%2C7%2Cnull%5D&connection_timeout=60000
http://10.2.2.60:8092/default/_design/test_view-b943fd9/_view/test_view-b943fd9?debug=true&start_key=%5B2008%2C7%2Cnull%5D&connection_timeout=60000
error while querying returned:
{{{badmatch,{not_found,no_db_file}}, 
[{couch_db_set,'-handle_call/3-fun-1-',3}, {lists,foldl,3}, {couch_db_set,handle_call,3}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}, {gen_server,call, [<0.30044.20>, {add_partitions,[24,861,862,863,864,865,866,867,868,869,870,871, 872,873,874,875,876,877,878,879,880,881,882, 883,884,885,886,887,888,889,890,891,892,893, 894,895,896,897,898,899,900,901,902,903,904, 905,906,907,908,909,910,911,912,913,914,915, 916,917,918,919,920,921,922,923,924,925,926, 927,928,929,930,931,932,933,934,935,936,937, 938,939,940,941,942,943,944,945,946,947,948, 949,950,951,952,953,954,955,956,957,958,959, 960,961,962,963,964,965,966,967,968,969,970, 971,972,973,974,975,976,977,978,979,980,981, 982,983,984,985,986,987,988,989,990,991,992, 993,994,995,996,997,998,999,1000,1001,1002, 1003,1004,1005,1006,1007,1008,1009,1010,1011, 1012,1013,1014,1015,1016,1017,1018,1019,1020, 1021,1022,1023]}, infinity]}} {"error":"{{{badmatch,{not_found,no_db_file}},\n [{couch_db_set,'-handle_call/3-fun-1-',3},\n {lists,foldl,3},\n {couch_db_set,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]},\n {gen_server,call,\n [<0.30044.20>,\n {add_partitions,[24,861,862,863,864,865,866,867,868,869,870,871,\n 872,873,874,875,876,877,878,879,880,881,882,\n 883,884,885,886,887,888,889,890,891,892,893,\n 894,895,896,897,898,899,900,901,902,903,904,\n 905,906,907,908,909,910,911,912,913,914,915,\n 916,917,918,919,920,921,922,923,924,925,926,\n 927,928,929,930,931,932,933,934,935,936,937,\n 938,939,940,941,942,943,944,945,946,947,948,\n 949,950,951,952,953,954,955,956,957,958,959,\n 960,961,962,963,964,965,966,967,968,969,970,\n 971,972,973,974,975,976,977,978,979,980,981,\n 982,983,984,985,986,987,988,989,990,991,992,\n 993,994,995,996,997,998,999,1000,1001,1002,\n 1003,1004,1005,1006,1007,1008,1009,1010,1011,\n 1012,1013,1014,1015,1016,1017,1018,1019,1020,\n 1021,1022,1023]},\n infinity]}}","reason":"{gen_server,call,\n [<0.30029.20>,\n {set_view_group_req,update_after,true,\n 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,\n 18,19,20,21,22,23,24,25,26,27,28,29,30,31,\n 32,33,34,35,36,37,38,39,40,41,42,43,44,45,\n 46,47,48,49,50,51,52,53,54,55,56,57,58,59,\n 60,61,62,63,64,65,66,67,68,69,70,71,72,73,\n 74,75,76,77,78,79,80,81,82,83,84,85,86,87,\n 88,89,90,91,92,93,94,95,96,97,98,99,100,101,\n 102,103,104,105,106,107,108,109,110,111,112,\n 113,114,115,116,117,118,119,120,121,122,123,\n 124,125,126,127,128,129,130,131,132,133,134,\n 135,136,137,138,139,140,141,142,143,144,145,\n 146,147,148,149,150,151,152,153,154,155,156,\n 157,158,159,160,161,162,163,164,165,166,167,\n 168,169,170,171,172,173,174,175,176,177,178,\n 179,180,181,182,183,184,185,186,187,188,189,\n 190,191,192,193,194,195,196,197,198,199,200,\n 201,202,203,204,205,206,207,208,209,210,211,\n 212,213,214,215,216,217,218,219,220,221,222,\n 223,224,225,226,227,228,229,230,231,232,233,\n 234,235,236,237,238,239,240,241,242,243,244,\n 245,246,247,248,249,250,251,252,253,254,255,\n 256,257,258,259,260,261,262,263,264,265,266,\n 267,268,269,270,271,272,273,274,275,276,277,\n 278,279,280,281,282,283,284,285,286,287,288,\n 289,290,291,292,293,294,295,296,297,298,299,\n 300,301,302,303,304,305,306,307,308,309,310,\n 311,312,313,314,315,316,317,318,319,320,321,\n 322,323,324,325,326,327,328,329,330,331,332,\n 333,334,335,336,337,338,339,340,341,342,343,\n 344,345,346,347,348,349,350,351,352,353,354,\n 355,356,357,358,359,360,361,362,363,364,365,\n 366,367,368,369,370,371,372,373,374,375,376,\n 377,378,379,380,381,382,383,384,385,386,387,\n 388,389,390,391,392,393,394,395,396,397,398,\n 399,400,401,402,403,404,405,406,407,408,409,\n 410,411,412,413,414,415,416,417,418,419,420,\n 421,422,423,424,425,426,427,428,429,430,431,\n 432,433,434,435,436,437,438,439,440,441,442,\n 443,444,445,446,447,448,449,450,451,452,453,\n 454,455,456,457,458,459,460,461,462,463,464,\n 465,466,467,468,469,470,471,472,473,474,475,\n 476,477,478,479,480,481,482,483,484,485,486,\n 
487,488,489,490,491,492,493,494,495,496,497,\n 498,499,500,501,502,503,504,505,506,507,508,\n 509,510,511,512,513,514,515,516,517,518,519,\n 520,521,522,523,524,525,526,527,528,529,530,\n 531,532,533,534,535,536,537,538,539,540,541,\n 542,543,544,545,546,547,548,549,550,551,552,\n 553,554,555,556,557,558,559,560,561,562,563,\n 564,565,566,567,568,569,570,571,572,573,574,\n 575,576,577,578,579,580,581,582,583,584,585,\n 586,587,588,589,590,591,592,593,594,595,596,\n 597,598,599,600,601,602,603,604,605,606,607,\n 608,609,610,611,612,613,614,615,616,617,618,\n 619,620,621,622,623,624,625,626,627,628,629,\n 630,631,632,633,634,635,636,637,638,639,640,\n 641,642,643,644,645,646,647,648,649,650,651,\n 652,653,654,655,656,657,658,659,660,661,662,\n 663,664,665,666,667,668,669,670,671,672,673,\n 674,675,676,677,678,679,680,681,682,683,684,\n 685,686,687,688,689,690,691,692,693,694,695,\n 696,697,698,699,700,701,702,703,704,705,706,\n 707,708,709,710,711,712,713,714,715,716,717,\n 718,719,720,721,722,723,724,725,726,727,728,\n 729,730,731,732,733,734,735,736,737,738,739,\n 740,741,742,743,744,745,746,747,748,749,750,\n 751,752,753,754,755,756,757,758,759,760,761,\n 762,763,764,765,766,767,768,769,770,771,772,\n 773,774,775,776,777,778,779,780,781,782,783,\n 784,785,786,787,788,789,790,791,792,793,794,\n 795,796,797,798,799,800,801,802,803,804,805,\n 806,807,808,809,810,811,812,813,814,815,816,\n 817,818,819,820,821,822,823,824,825,826,827,\n 828,829,830,831,832,833,834,835,836,837,838,\n 839,840,841,842,843,844,845,846,847,848,849,\n 850,851,852,853,854,855,856,857,858,859,860,\n 861,862,863,864,865,866,867,868,869,870,871,\n 872,873,874,875,876,877,878,879,880,881,882,\n 883,884,885,886,887,888,889,890,891,892,893,\n 894,895,896,897,898,899,900,901,902,903,904,\n 905,906,907,908,909,910,911,912,913,914,915,\n 916,917,918,919,920,921,922,923,924,925,926,\n 927,928,929,930,931,932,933,934,935,936,937,\n 938,939,940,941,942,943,944,945,946,947,948,\n 
949,950,951,952,953,954,955,956,957,958,959,\n 960,961,962,963,964,965,966,967,968,969,970,\n 971,972,973,974,975,976,977,978,979,980,981,\n 982,983,984,985,986,987,988,989,990,991,992,\n 993,994,995,996,997,998,999,1000,1001,1002,\n 1003,1004,1005,1006,1007,1008,1009,1010,\n 1011,1012,1013,1014,1015,1016,1017,1018,\n 1019,1020,1021,1022,1023],\n true,main},\n infinity]}"} |
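For readers unfamiliar with the view setup, the following is a minimal sketch of how one of the map functions above produces its keys and how the start_key in the failing queries is encoded. It assumes a stub emit() that collects rows (the server supplies the real one at index time), and the two sample documents are invented for illustration; only the map function body and the query URL come from this ticket. The sketch does not claim which map function belongs to which ddoc.

```javascript
// Stub for the emit() that the view engine provides to map functions.
// Hypothetical: it only collects rows so the function can run standalone.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The "^UI " map function from the ticket, reformatted but otherwise unchanged.
function mapUi(doc) {
  if (doc.job_title !== undefined) {
    var myregexp = new RegExp("^UI ");
    if (doc.job_title.match(myregexp)) {
      emit([doc.join_yr, doc.join_mo, doc.join_day], [doc.name, doc.email]);
    }
  }
}

// Invented sample documents following the employee data set structure.
const docs = [
  { name: "employee-1", join_yr: 2008, join_mo: 7, join_day: 15,
    email: "1@example.com", job_title: "UI Designer", type: "emp", desc: "" },
  { name: "employee-2", join_yr: 2009, join_mo: 1, join_day: 2,
    email: "2@example.com", job_title: "System Engineer", type: "emp", desc: "" },
];
docs.forEach((doc) => mapUi(doc));

// The start_key parameter is the JSON array key, URL-encoded.
const startKey = encodeURIComponent(JSON.stringify([2008, 7, null]));
const queryUrl =
  "http://10.2.2.60:8092/default/_design/test_view-6a68c94/_view/test_view-6a68c94" +
  "?debug=true&start_key=" + startKey + "&connection_timeout=60000";

console.log(rows);
console.log(queryUrl);
```

Only the first sample document matches the "^UI " prefix, so a single row keyed by [join_yr, join_mo, join_day] is collected, and the encoded start_key matches the one in the queries listed above.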
| Comments |
| Comment by Aleksey Kondratenko [ 11/Sep/12 ] |
| Potentially ns_server bug |
| Comment by Aleksey Kondratenko [ 11/Sep/12 ] |
| Cannot find any occurrences of no_db_ in the attached logs. Please double-check on your end |
| Comment by Aleksey Kondratenko [ 11/Sep/12 ] |
| See above |
| Comment by Iryna Mironava [ 11/Sep/12 ] |
| There were no such messages in the log; only the query responses contained it |
| Comment by Aleksey Kondratenko [ 11/Sep/12 ] |
|
I see no trace of this pid either. Most likely the logs have rotated past the incident. May I have at least the error-level logs, in case they still have some traces? |
| Comment by Iryna Mironava [ 11/Sep/12 ] |
| error logs attached |
| Comment by Iryna Mironava [ 11/Sep/12 ] |
| logs of error level are attached |
| Comment by Aleksey Kondratenko [ 11/Sep/12 ] |
| Now I see multiple occurrences of this problem. But sadly they happened too long ago. May I ask you to re-run your tests and capture diags in the middle of the run? |
| Comment by Iryna Mironava [ 11/Sep/12 ] |
| logs are attached |
| Comment by Farshid Ghods [ 12/Sep/12 ] |
|
Promoting this to blocker since this now happens more frequently and it's easy to reproduce
|
| Comment by Aleksey Kondratenko [ 12/Sep/12 ] |
| Found bug. Fixed and will soon upload to gerrit |
| Comment by Aleksey Kondratenko [ 12/Sep/12 ] |
| Fix is 3 commits ending here: http://review.couchbase.org/20813 |
| Comment by Farshid Ghods [ 13/Sep/12 ] |
|
Peter, given this issue is related to view engine can Filipe take a look ? |
| Comment by Karan Kumar [ 14/Sep/12 ] |
| Seeing this on system tests as well. build-1717 |
| Comment by Karan Kumar [ 14/Sep/12 ] |
|
[error_logger:error,2012-09-14T2:08:03.441,ns_1@10.3.121.16:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: couch_db_set:init/1
    pid: <0.17473.73>
    registered_name: []
    exception exit: {{badmatch,{not_found,no_db_file}},
                     [{couch_db_set,'-handle_call/3-fun-1-',3},
                      {lists,foldl,3},
                      {couch_db_set,handle_call,3},
                      {gen_server,handle_msg,5},
                      {proc_lib,init_p_do_apply,3}]}
      in function gen_server:terminate/6
    ancestors: [<0.17468.73>,<0.17346.73>,<0.17345.73>]
    messages: []
    links: [<0.17468.73>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 2584
    stack_size: 24
    reductions: 38041
  neighbours: |
| Comment by Aleksey Kondratenko [ 14/Sep/12 ] |
| fix is still sitting in gerrit |
| Comment by Farshid Ghods [ 16/Sep/12 ] |
|
Karan,
was this during rebalancing ? did the error go away after retrying ? |
| Comment by Thuan Nguyen [ 25/Sep/12 ] |
|
Integrated in github-couchdb-preview #507 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/507/]) Result = SUCCESS pwansch : Files : * src/couchdb/couch_server.erl pwansch : Files : * src/couchdb/couch_server.erl pwansch : Files : * src/couchdb/couch_server.erl pwansch : Files : * src/couchdb/couch_server.erl |
| Comment by Thuan Nguyen [ 26/Sep/12 ] |
|
Integrated in github-couchdb-preview #508 (See [http://qa.hq.northscale.net/job/github-couchdb-preview/508/]) Result = SUCCESS Farshid Ghods : Files : * src/couchdb/couch_server.erl |
| Comment by Karen Zeller [ 26/Oct/12 ] |
|
RN text: Querying a view during rebalance resulted in several errors due to the way database
storage files were managed and named. To resolve the problem, we no longer open databases that are meant to be excluded from indexing, and we now open databases for indexing synchronously. |
| Comment by Aleksey Kondratenko [ 14/Nov/12 ] |
|
Final relevant commit is this:

commit 8ba83d2df2b35ebd8fbc621dcad193aec2255fe5
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Mon Aug 27 17:44:43 2012 -0700

    As of http://review.couchbase.org/20223 couchdb supports listing all known vbuckets with given prefix. That allows us to only touch vbuckets of bucket we need. And it also opens all databases at startup. That allows us to avoid using slow all_databases call.

    Change-Id: I656000e408af6b977cb27e981d216f6ea11ac0cb
    Reviewed-on: http://review.couchbase.org/20224
    Reviewed-by: Damien Katz <damien@couchbase.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>

So we first fixed Underlying issue was due to race in a way couch implemented all_databases that ns_server used to enumerate all physically present vbuckets. Particularly it used readdir which is known to be not atomic with respect to renames. And in practice it raced with final rename at the end of compaction causing some vbuckets to disappear. Which caused view queries to fail because we incorrectly removed valid vbucket from set of indexed vbuckets in all views.

Fix was to stop doing slow and unreliable all_databases and just keep list of all open vbuckets in public ordered ets table. Thus we fixed race above and at the same time we've got efficient way to list just vbuckets of some given bucket.

In this bug (

commit fa5b6fe30dbf84e03dec273c92ba6e17b2621ff7
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Wed Sep 12 18:30:52 2012 -0700

    Problem that we see in is scanning couch_dbs_by_name ets table. Given that previously we also stored dbs being opened that lead to condition where some vbuckets would be thought as present while in fact they are simply being tried to be opened and do not actually exist.

    Given that we don't need async db opening in couchbase fork of couchdb for quite some time it seems logical to just get rid of async open and thus never have onopened dbs in ets table.

    Change-Id: I5d1dad5f60c64d197e143cf4a7be1996a4fc4ea2
    Reviewed-on: http://review.couchbase.org/20812
    Reviewed-by: Damien Katz <damien@couchbase.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>

As part of doing this work we also removed legacy couchdb behavior where some unused dbs were closed. I.e.:

commit 261348623337e84e38635c2470c95bec3b7aa757
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Wed Sep 12 18:34:04 2012 -0700

    Instead of {opened,..} tuple.

    Change-Id: Ie88c1137fa55f55e8b9f1128c12f15750d4b36bd
    Reviewed-on: http://review.couchbase.org/20813
    Reviewed-by: Damien Katz <damien@couchbase.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>

commit 7587aa0cf6af53e49487771d1ab31eb425b29cee
Author: Aliaksey Kandratsenka <alk@tut.by>
Date: Wed Sep 12 18:14:32 2012 -0700

    Because that's long unused and simplifies things for actual fix.

    Change-Id: Ie8ac7e5f6d03e962f181d84461cc26ee53cb7309
    Reviewed-on: http://review.couchbase.org/20811
    Reviewed-by: Damien Katz <damien@couchbase.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>

And after all that we found another issue in original implementation:

commit 7a0f8699723d3557bcc71d1964464a0e1248c7cf
Author: Aliaksey Artamonau <aliaksiej.artamonau@gmail.com>
Date: Fri Oct 26 13:56:30 2012 -0700

    Since recently couch_dbs_by_name contains only PID and nothing more. It's not clear though if this actually fixes referred bug.

    Change-Id: Ifb3a0bd9a9c011c4836e0513a4adbaba94c3c8c1
    Reviewed-on: http://review.couchbase.org/22021
    Reviewed-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
    Reviewed-by: Filipe David Borba Manana <fdmanana@gmail.com> |
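The idea behind the two fixes described in the comment above can be sketched roughly as follows. This is an illustration only, not the code from couch_server.erl: the class name OpenDbRegistry, the opener callback, and the "bucket/vbucket" naming are assumptions made for the example, and a JavaScript Map with an explicit sort stands in for the ordered ets table. It shows databases being registered only after a synchronous open succeeds, and a bucket's vbuckets being listed by name prefix from that registry rather than by scanning the data directory.

```javascript
// Illustrative sketch only; all names here are hypothetical.
class OpenDbRegistry {
  constructor(openFn) {
    this.openFn = openFn;    // synchronous opener; throws if the file is missing
    this.byName = new Map(); // stands in for the couch_dbs_by_name ets table
  }

  // Open synchronously and register the db only once the open has succeeded,
  // so a failed or still-pending open never appears in listings.
  open(name) {
    if (!this.byName.has(name)) {
      const handle = this.openFn(name);
      this.byName.set(name, handle);
    }
    return this.byName.get(name);
  }

  // List the registered dbs of one bucket by name prefix, in sorted order,
  // instead of reading the directory (which can race with renames).
  listWithPrefix(prefix) {
    return [...this.byName.keys()].filter((n) => n.startsWith(prefix)).sort();
  }
}

// Pretend file system: vbucket 24 of "default" has no db file.
const filesOnDisk = new Set(["default/0", "default/1", "other/0"]);
const registry = new OpenDbRegistry((name) => {
  if (!filesOnDisk.has(name)) throw new Error("no_db_file");
  return { name };
});

for (const name of ["default/1", "default/0", "other/0", "default/24"]) {
  try {
    registry.open(name);
  } catch (err) {
    console.log(`skipped ${name}: ${err.message}`);
  }
}

const listed = registry.listWithPrefix("default/");
console.log(listed);
```

Because the missing database is never registered, a later listing cannot report it as present, which is the condition the second commit describes removing.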