[MB-4781] start_key_docid returns unexpected (or unintuitive) results Created: 07/Feb/12  Updated: 09/Jan/13  Due: 07/Feb/12  Resolved: 17/Feb/12

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0-developer-preview-4
Fix Version/s: 2.0-developer-preview-4
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Tommie McAfee Assignee: Benjamin Young
Resolution: Fixed Votes: 0
Labels: 2.0-dev-preview-4-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 6 node cluster .deb build 653

Attachments: Zip Archive 10.1.2.104-8091-diag.txt.zip     Zip Archive 10.1.2.105-8091-diag.txt.zip     Zip Archive 10.1.2.106-8091-diag.txt.zip     Zip Archive 10.1.2.107-8091-diag.txt.zip     Zip Archive 10.1.2.108-8091-diag.txt.zip     Zip Archive 10.1.2.109-8091-diag.txt.zip     File query_with_key     File query_with_key_and_startkeydocid    

 Description   
Below are the results from a query that returns multiple rows for a single key. The third document has id "0-8fbe114", but if I apply the start_key_docid filter I still get exactly the same results, with the list starting at doc id "0-14479a7", although the view should have returned a subset. I also tried start_key_docid in combination with start_key and end_key, but neither of these returns a subset starting at the requested docid.

QUERY WITH KEY = [2008,11,1]
curl "http://10.1.2.104:8092/default/_design/dev_test_view-ed4bf91/_view/dev_test_view-ed4bf91?full_set=true&key=%5B2008%2C11%2C1%5D&connection_timeout=60000&debug=true" > query_with_key

{"id":"0-2857e2f","key":[2008,11,1],"value":{"_id":"0-2857e2f","_rev":"1-000040a864a752c20000024c00000000","$flags":0,"$expiration":0,"name":"employee-0-...
{"id":"0-2857e2f","key":[2008,11,1],"value":{"_id":"0-328e876","_rev":"1-000040a8b60c1357000001b800000000","$flags":0,"$expiration":0,"name":"employee-0-...
{"id":"0-91f1a76","key":[2008,11,1],"value":{"_id":"0-91f1a76","_rev":"1-000040a6f0ddebd30000023500000000","$flags":0,"$expiration":0,"name":"employee-0-...
{"id":"1-2857e2f","key":[2008,11,1],"value":{"_id":"1-2857e2f","_rev":"1-000040a86e8ccc2d0000024c00000000","$flags":0,"$expiration":0,"name":"employee-1-...
{"id":"1-2857e2f","key":[2008,11,1],"value":{"_id":"1-328e876","_rev":"1-000040a8bffda165000001b800000000","$flags":0,"$expiration":0,"name":"employee-1-...
{"id":"1-91f1a76","key":[2008,11,1],"value":{"_id":"1-91f1a76","_rev":"1-000040a7b360b9840000023500000000","$flags":0,"$expiration":0,"name":"employee-1-...
{"id":"10-2857e2f","key":[2008,11,1],"value":{"_id":"10-2857e2f","_rev":"1-000040a9848f822f0000024f00000000","$flags":0,"$expiration":0,"name":"employee-10-...


QUERY WITH KEY = [2008,11,1] and START_KEY_DOCID = "0-8fbe114"
curl "http://10.1.2.104:8092/default/_design/dev_test_view-ed4bf91/_view/dev_test_view-ed4bf91?full_set=true&key=%5B2008%2C11%2C1%5D&start_key_docid=%220-8fbe114%22&connection_timeout=60000&limit=10&skip=0" > query_with_startkeydocid

   ....results are the same as previous query, although I expected them to start with the requested doc_id...

Also noticed that "id" and "_id" are mismatched; not sure if that has something to do with the behavior of this filter.




 Comments   
Comment by Filipe Manana [ 07/Feb/12 ]
"start_key_doc_id" (same as startkey_docid) is meant to be used together with "start_key" (same as startkey), not "key".

The _id is there because you're apparently emitting the documents themselves as map values. This is how it has always worked in Couch. General rule: meta information in docs has a _ prefix; everywhere else (views, changes feed) it doesn't.
Comment by Tommie McAfee [ 07/Feb/12 ]
Right, I was advised to try with start_key, but the results are the same and do not start at the requested id.

Perhaps only the filters that can be used in conjunction with, say, "key" should be selectable in the UI. Otherwise a non-Couch user may expect these filters to do something.
Comment by Filipe Manana [ 07/Feb/12 ]
Tommie, you specified "start_key_docid" - this doesn't exist - use "startkey_docid" or "start_key_doc_id".

Originally, in Couch, every name uses _ to separate words, all except startkey, endkey, startkey_docid and endkey_docid. For these 4, the aliases "start_key", "end_key", "start_key_doc_id" and "end_key_doc_id" were added upstream (I did it) and to our codebase.
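
The naming rule described in this comment can be sketched as a small client-side check. This is only an illustration, not server code: `normalize_range_params` is a hypothetical helper, and the rule that any other name beginning with "start" or "end" is reported as unrecognized is an assumption made for the example.

```python
# Canonical parameter names and their aliases, as listed in the comment above.
ALIASES = {
    "start_key": "startkey",
    "end_key": "endkey",
    "start_key_doc_id": "startkey_docid",
    "end_key_doc_id": "endkey_docid",
}
CANONICAL = set(ALIASES.values())


def normalize_range_params(params):
    """Hypothetical helper: map aliases to canonical names.

    Returns (normalized, unknown), where `unknown` lists range-like names
    (assumed here: anything else starting with "start" or "end") that match
    neither a canonical name nor an alias.
    """
    normalized = {}
    unknown = []
    for name, value in params.items():
        if name in CANONICAL:
            normalized[name] = value
        elif name in ALIASES:
            normalized[ALIASES[name]] = value
        elif name.startswith(("start", "end")):
            unknown.append(name)
        else:
            # Unrelated parameters (limit, skip, ...) pass through untouched.
            normalized[name] = value
    return normalized, unknown


if __name__ == "__main__":
    # "start_key_docid" is the name the UI produced in this ticket.
    print(normalize_range_params(
        {"start_key": 40, "start_key_docid": "a67408a-40", "limit": 10}
    ))
```

With the parameters from this ticket, "start_key" is mapped to "startkey" while "start_key_docid" is reported as unrecognized, which matches the behavior Filipe describes.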

Comment by Tommie McAfee [ 07/Feb/12 ]
Then there is a UI bug in the variable naming, as "start_key_docid" was added to the query via the Couchbase 2.0 UI.

Also, I tried using startkey_docid and "start_key_doc_id", but neither seems to filter the results.

Comment by Filipe Manana [ 07/Feb/12 ]
Tommie, do you think you can write a simple script to create that dataset and do the 2 queries?
I would like to try it locally.
thanks
Comment by Farshid Ghods (Inactive) [ 08/Feb/12 ]
Tommie,

can you please provide a test case which Filipe can run against cluster_run with one node ?
Comment by Filipe Manana [ 09/Feb/12 ]
Waiting for the testrunner test or a standalone script to reproduce.
Comment by Tommie McAfee [ 09/Feb/12 ]
Filipe, maybe you also have some data to give this a quick try.
  I tried it on a simple set of integers from data loaded in testrunner and it doesn't look like it's working (startkey_docid = 684a59d-480)

http://10.17.3.56:9500/default/_design/dev_test_view-684a59d/_view/dev_test_view-684a59d?start_key=200&startkey_docid=%22684a59d-480%22&connection_timeout=60000&limit=10&skip=0

{"total_rows":61,"rows":[
{"id":"684a59d-262","key":262,"value":null},
{"id":"684a59d-480","key":480,"value":null},
{"id":"684a59d-510","key":510,"value":null},
.....

Also tried start_key_doc_id and start_key_docid.
Comment by Farshid Ghods (Inactive) [ 09/Feb/12 ]
I also noticed that debug=true does not return any extra info.

http://10.17.3.56:9500/default/_design/dev_test_view-684a59d/_view/dev_test_view-684a59d?start_key=200&startkey_docid=%22684a59d-480%22&connection_timeout=60000&limit=10&skip=0&debug=true

{"total_rows":61,"rows":[
{"id":"684a59d-262","key":262,"value":null},
{"id":"684a59d-480","key":480,"value":null},
{"id":"684a59d-510","key":510,"value":null},
{"id":"684a59d-661","key":661,"value":null},
{"id":"684a59d-944","key":944,"value":null},
{"id":"684a59d-1175","key":1175,"value":null},
{"id":"684a59d-1204","key":1204,"value":null},
{"id":"684a59d-1394","key":1394,"value":null},
{"id":"684a59d-1576","key":1576,"value":null},
{"id":"684a59d-1607","key":1607,"value":null}
]
}
Comment by Tommie McAfee [ 14/Feb/12 ]
Hi Filipe,

Still not getting the start_key_docid filter to work as expected. There is now a test in testrunner that you can use to reproduce this:

python testrunner -i <resource_file> -t viewquerytests.ViewQueryTests.test_simple_dataset_startkey_endkey_docid_queries

2012-02-13 19:41:14,686 - root - INFO - Quering view dev_test_view-11f6a22 with params: {'debug': 'true', 'start_key': 5000, 'startkey_docid': '"11f6a22-5100"'}
2012-02-13 19:41:14,687 - root - INFO - Params {'debug': 'true', 'start_key': 5000, 'connection_timeout': 60000, 'startkey_docid': '"11f6a22-5100"', 'full_set': True}
2012-02-13 19:41:14,687 - root - INFO - index query url: http://10.2.2.10:8091/couchBase/default/_design/dev_test_view-11f6a22/_view/dev_test_view-11f6a22?debug=true&start_key=5000&connection_timeout=60000&startkey_docid="11f6a22-5100"&full_set=true
2012-02-13 19:41:14,906 - root - INFO - view returned in 0.21882891655 seconds
2012-02-13 19:41:14,906 - root - INFO - was able to get view results after trying 1 times
2012-02-13 19:41:14,917 - root - INFO - key_set has 5000 elements
2012-02-13 19:41:14,917 - root - INFO - retrieved 5000 keys expected: 4900
Comment by Tommie McAfee [ 16/Feb/12 ]
Filipe,

I have this query result from using start_key = 20:

{"total_rows":30000,"rows":[
{"id":"3a25fe7-20","key":20,"value":null},
{"id":"da9d0f6-20","key":20,"value":null},
{"id":"eda9e3d-20","key":20,"value":null},
{"id":"3a25fe7-21","key":21,"value":null},
{"id":"da9d0f6-21","key":21,"value":null},
{"id":"eda9e3d-21","key":21,"value":null},
{"id":"3a25fe7-22","key":22,"value":null},
{"id":"da9d0f6-22","key":22,"value":null},
{"id":"eda9e3d-22","key":22,"value":null},
{"id":"3a25fe7-23","key":23,"value":null}
]
}

Attempting to set start_key_docid to "da9d0f6-20" returns the same number of rows, although the first row with the duplicate key should be skipped:

http://127.0.0.1:9500/default/_design/dev_test_view-9460592/_view/dev_test_view-9460592?full_set=true&debug=true&start_key=40&start_key_docid=%22a67408a-40%22&connection_timeout=60000&limit=10&skip=0



Comment by Tommie McAfee [ 16/Feb/12 ]
This is basically a bug in the UI, because it uses start_key_docid instead of 'startkey_docid'.

This query works:

http://127.0.0.1:9500/default/_design/dev_test_view-9460592/_view/dev_test_view-9460592?full_set=true&startkey=40&startkey_docid=a67408a-40&connection_timeout=60000&limit=10&skip=0
Comment by Benjamin Young [ 17/Feb/12 ]
Yeah, looks like it could have also been start_key_doc_id (per Filipe's first comment).

Will fix.
Comment by Benjamin Young [ 17/Feb/12 ]
Resolved: http://review.couchbase.org/13336

Feel free to close this ticket (or re-open it) based on the final review/merging.
Comment by Thuan Nguyen [ 17/Feb/12 ]
Integrated in github-ns-server-2-0 #303 (See [http://qa.hq.northscale.net/job/github-ns-server-2-0/303/])
    removing underscores from startkey/endkey fields. MB-4781 (Revision 9bc2bde6f88d43b273f7278e18a91c3871404cf2)

     Result = SUCCESS
Aliaksey Kandratsenka :
Files :
* priv/public/index.html
Comment by francares [ 28/Mar/12 ]
Does anyone know if this bug was fixed in DP4?
I'm using Couchbase version 2.0.0 community edition (build-724) and it still happens. I'm using the startkey_docid and startkey query params.
Comment by Tommie McAfee [ 28/Mar/12 ]
Yes, this was fixed in DP4. What do your ids and keys look like?
Depending on your docids, you should not have quotes around the startkey_docid, even if the ids are strings.
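
A minimal sketch of how the two parameters could be encoded on the client side, assuming (as in the key=%5B2008%2C11%2C1%5D examples earlier in this ticket) that startkey is sent as JSON while startkey_docid is sent as the raw document id without quotes. `build_view_query` is a hypothetical helper, not part of testrunner or any Couchbase client.

```python
import json
from urllib.parse import urlencode


def build_view_query(startkey, startkey_docid, limit=10):
    """Hypothetical helper: build the query string for a view request."""
    params = {
        # The key is JSON-encoded: 40 -> 40, [2008, 11, 1] -> [2008,11,1].
        "startkey": json.dumps(startkey, separators=(",", ":")),
        # The doc id is passed as-is, with no surrounding quotes.
        "startkey_docid": startkey_docid,
        "limit": limit,
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_view_query(40, "a67408a-40"))
    print(build_view_query([2008, 11, 1], "0-8fbe114"))
```

For the integer key, this produces the same startkey and startkey_docid values as the working query above (startkey=40&startkey_docid=a67408a-40).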
Comment by francares [ 28/Mar/12 ]
They are GUIDs.

When I call the view with the following URLs:
http://10.230.58.221:8092/test/_design/dev_appsByCategory/_view/appsByCategory?startkey_docid=%2200%22&connection_timeout=60000&limit=10&skip=0

or

http://10.230.58.221:8092/test/_design/dev_appsByCategory/_view/appsByCategory?startkey_docid=00&connection_timeout=60000&limit=10&skip=0

It returns keys like 03057CA7-5F27-4364-87FD-892548D8CB43, so the filter is not applied in the view.
Comment by francares [ 28/Mar/12 ]
The same happens with string-type keys.
Comment by Tommie McAfee [ 29/Mar/12 ]
Well, a couple of things here, as I also thought this was unintuitive before understanding how this used to work in CouchDB.

Using startkey_docid requires 2 things:
1. that the startkey filter is also used in the same query
2. that the results returned from using startkey contain duplicate keys

So if I have:
{
   "key0" : "val0"
   "key1" : "val1" <_id = k1v1>
   "key1" : "val2" <_id = k1v2>
   "key1" : "val3" <_id = k1v3>
   "key2" : "val4"
}

I can do something like
startkey = key1, startkey_docid = k1v2

and my results would be
{
   "key1" : "val2" <_id = k1v2>
   "key1" : "val3" <_id = k1v3>
   "key2" : "val4"
}

It could be that in your case the only thing you need is startkey, if all your map functions emit unique keys.
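
The behavior described above can be modeled with a short simulation. This is a sketch under stated assumptions, not the view engine's implementation: rows are treated as sorted by (key, doc id), plain Python comparison stands in for the real collation rules, `query_view` is a hypothetical function, and the ids "k0v0" and "k2v4" are made up because the example does not give ids for those rows.

```python
# Rows from the example above; "k0v0" and "k2v4" are hypothetical ids.
ROWS = [
    {"key": "key0", "id": "k0v0", "value": "val0"},
    {"key": "key1", "id": "k1v1", "value": "val1"},
    {"key": "key1", "id": "k1v2", "value": "val2"},
    {"key": "key1", "id": "k1v3", "value": "val3"},
    {"key": "key2", "id": "k2v4", "value": "val4"},
]


def query_view(rows, startkey=None, startkey_docid=None):
    """Hypothetical model of a range query over (key, doc id) pairs."""
    ordered = sorted(rows, key=lambda r: (r["key"], r["id"]))
    if startkey is None:
        # Without startkey there is no starting position to refine,
        # so startkey_docid has no effect in this model.
        return ordered
    start = (startkey, startkey_docid or "")
    return [r for r in ordered if (r["key"], r["id"]) >= start]


if __name__ == "__main__":
    for row in query_view(ROWS, startkey="key1", startkey_docid="k1v2"):
        print(row)
```

In this model startkey_docid only breaks ties among rows that share the start key, which is why it appears to do nothing when it is used alone or when every key is unique.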
Generated at Sat Aug 30 10:28:58 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.