Order by makes query very slow

Hi,

I have a bucket named user with 160k records in it. When I run query to select recebt 10 records from the it, it takes more than 8 sec to select 10 record.

ORDER BY needs to produce all the qualifying records. Check the article and see if you can create index based on query order https://dzone.com/articles/designing-index-for-query-in-couchbase-n1ql

If further help needed please post exact query and sample document

@vsr1

Hi,

sorry for the delay didn’t get time to come here, however query which is creating problem is this

SELECT , (6371acos(cos(radians(28.549056336607507))*cos(radians(place_lat))*cos(radians(place_long)-radians(77.2689280000161))+
sin(radians(28.549056336607507))*sin(radians(place_lat)))) AS distance,(SELECT t2.user_photo,t2.name FROM users t2 use keys pins.user_id
WHERE t2.user_name= pins.user_name) AS details FROM pins USE INDEX (pin_geo_latlon_new USING GSI)
WHERE user_name NOT IN [‘mustanish123’] AND place_lat>=28.546153612815345 AND place_lat<=28.551958980392783 AND
place_long>=77.26699680953641 AND place_long<=77.27085919047727 AND status=1 AND cat_id!=1 ORDER BY distance ASC LIMIT 10 OFFSET 0;

I also want to sort this record based on pin_created_date for that I wrote this indexing

CREATE INDEX my-journey-new ON pins(user_name,place_lat,place_long,pin_create_date DESC) but this is not working

Sample data after running query

[
{
“details”: [
{
“name”: “Abhinav”,
“user_photo”: “Abhi.mmi.jpg”
}
],
“distance”: 0.026640982767172527,
“pins”: {
“action”: {
“commentCount”: 0,
“flagCount”: 0,
“likeCount”: 0
},
“alt”: 0,
“cat_id”: 0,
“cat_name”: null,
“child_cat_id”: 0,
“child_cat_name”: null,
“course”: 0,
“expiry_date”: 0,
“flag”: 0,
“images”: [],
“modified_by”: null,
“name”: “Abhinav”,
“photo”: “Abhi.mmi.jpg”,
“pin_create_date”: “2016-10-14 11:02:06”,
“pin_id”: “pin3100259512”,
“pin_privacy”: 0,
“pin_rate”: 1,
“pin_story”: “Grt”,
“pin_story_title”: null,
“pin_type”: “review”,
“place_id”: “2FHGLJ”,
“place_lat”: 28.549148,
“place_long”: 77.268676,
“place_name”: “Navketan Engineering Corporation”,
“point”: 0,
“progress”: 0,
“quality”: 0,
“social_id”: “0”,
“speed”: 0,
“status”: 1,
“user_id”: “4d01892401d0cbe60835148cc4a09737”,
“user_name”: “abhi.mmi”,
“utc”: 0
}
},
{
“details”: [
{
“name”: “lab”,
“user_photo”: “lab1235a813709b5436.jpg”
}
],
“distance”: 0.04750143392658823,
“pins”: {
“action”: {
“commentCount”: null,
“flagCount”: null,
“likeCount”: null
},
“alt”: 0,
“cat_id”: 3,
“cat_name”: “Smart City Issue”,
“child_cat_id”: 19,
“child_cat_name”: “Electricity Failure”,
“course”: 0,
“expiry_date”: 1834224456,
“flag”: 0,
“identifier”: “faa6daf3b45b46a2bf02ad1798ff428c”,
“images”: [
{
“created_on”: “2018-02-15 16:17:37”,
“photo”: “9b5a8bc969f476cd.jpg”,
“status”: 0,
“user_name”: “lab123”
}
],
“modified_by”: “”,
“name”: “lab”,
“old_pin_id”: 35166,
“photo”: “lab1235a813709b5436.jpg”,
“pin_create_date”: “2018-02-15 16:17:36”,
“pin_id”: “pin73012bf47bb9399a235c”,
“pin_privacy”: 0,
“pin_rate”: 0,
“pin_story”: “Bhruutbtbtyi”,
“pin_story_title”: “”,
“pin_type”: “report”,
“place_id”: “0”,
“place_lat”: 28.5491023,
“place_long”: 77.2684445,
“place_name”: “Reported Near 226, Okhla Industrial Estate Phase 3, New Delhi, Delhi. 5 M From Airtel Parking Area Pin 110020”,
“point”: 0,
“progress”: 0,
“quality”: 0,
“social_id”: 104,
“speed”: 0,
“status”: 1,
“user_id”: “205000fd12d7e7ee820c351e18899406”,
“user_name”: “lab123”,
“utc”: 0
}
}
]

ORDER BY needs to produce all the qualified rows before applying LIMIT or OFFSET because lost row may come first in the ORDER.

So their is no way to optimize order by queries in n1ql but creating index similar to this CREATE INDEX my-journey-new ON pins(user_name,place_lat,place_long,pin_create_date DESC) reduced execution time to 30 ms for another query but how do I optimize this query.I mean their must be a way to optimize this.

ORDER BY needs to produce all the qualified rows before applying LIMIT or OFFSET because lost row may come first in the ORDER. If query ORDER BY matches with index key order left to right and index or query doesn’t change order of items produced by indexer we will use index order avoid sort (query can’t have NOT, OR, IN predicates, no JOINS, no GROUP BY).

Your query doesn’t qualify it needs produce all qualified documents.

Thanks for explaining that one more thing where can I learn more about scan in couchbase I mean this
simple select * from pins where user_name NOT IN[‘mustanish123’] LIMIT 10 on explain does union scan but When IN operator is used exlain says Indexscan2.I hope I made my question clear

https://developer.couchbase.com/documentation/server/current/performance/index-scans.html

You are using NOT IN operator.

user_name NOT IN[‘mustanish123’] is same as
user_name < ‘mustanish123’ OR user_name > ‘mustanish123’

Couchbase IndexScans are range scans those are converted to equivalent range to minimize scans.

Hi,

SELECT p.pin_id,p.place_id,p.place_name,p.pin_story,p.pin_rate,p.cat_name,p.child_cat_name,p.pin_create_date,p.place_lat,p.place_long,p.pin_type,p.user_name,p.images, (acos(sin(RADIANS(28.549056336607507))*sin(RADIANS(p.place_lat)) + cos( RADIANS(28.549056336607507))*cos(RADIANS(p.place_lat))*cos (RADIANS(p.place_long) - RADIANS(77.2689280000161))) * 6371) AS distance,u.user_photo,u.name FROM pins AS p INNER JOIN users AS u ON KEYS p.user_id WHERE p.cat_id = 2 AND p.status=1 ORDER BY distance ASC LIMIT 10 OFFSET 0

Above query sort result based on distance and as suggested by you I am not using NOT IN but this query also taked more than 600ms.

Index used by this query is

CREATE INDEX my_world_idx1 ON pins(cat_id) WHERE (status = 1)

Explain for this query is

{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan2”,
“index”: “my_world_idx1”,
“index_id”: “2a1ca5e34e467f97”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “pins”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: “2”,
“inclusion”: 3,
“low”: “2”
}
]
}
],
“using”: “gsi”
},
{
"#operator": “Fetch”,
“as”: “p”,
“keyspace”: “pins”,
“namespace”: “default”
},
{
"#operator": “Join”,
“as”: “u”,
“keyspace”: “users”,
“namespace”: “default”,
“on_keys”: “(p.user_id)”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “(((p.cat_id) = 2) and ((p.status) = 1))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “(p.pin_id)”
},
{
“expr”: “(p.place_id)”
},
{
“expr”: “(p.place_name)”
},
{
“expr”: “(p.pin_story)”
},
{
“expr”: “(p.pin_rate)”
},
{
“expr”: “(p.cat_name)”
},
{
“expr”: “(p.child_cat_name)”
},
{
“expr”: “(p.pin_create_date)”
},
{
“expr”: “(p.place_lat)”
},
{
“expr”: “(p.place_long)”
},
{
“expr”: “(p.pin_type)”
},
{
“expr”: “(p.user_name)”
},
{
“expr”: “(p.images)”
},
{
“as”: “distance”,
“expr”: “(acos(((sin(radians(28.549056336607507)) * sin(radians((p.place_lat)))) + ((cos(radians(28.549056336607507)) * cos(radians((p.place_lat)))) * cos((radians((p.place_long)) - radians(77.2689280000161)))))) * 6371)”
},
{
“expr”: “(u.user_photo)”
},
{
“expr”: “(u.name)”
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “10”,
“sort_terms”: [
{
“expr”: “distance
}
]
},
{
"#operator": “Limit”,
“expr”: “10”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT p.pin_id,p.place_id,p.place_name,p.pin_story,p.pin_rate,p.cat_name,p.child_cat_name,p.pin_create_date,p.place_lat,p.place_long,\np.pin_type,p.user_name,p.images, (acos(sin(RADIANS(28.549056336607507))sin(RADIANS(p.place_lat)) + cos( RADIANS(28.549056336607507))\ncos(RADIANS(p.place_lat))*cos (RADIANS(p.place_long) - RADIANS(77.2689280000161))) * 6371) AS distance,u.user_photo,u.name FROM \npins AS p INNER JOIN users AS u ON KEYS p.user_id WHERE p.cat_id=2 AND p.status=1 ORDER BY distance \nASC LIMIT 10 OFFSET 0”
}

Your Query doesn’t qualify. Please Check article https://dzone.com/articles/designing-index-for-query-in-couchbase-n1ql Step 6

Your query ORDER BY distance and distance is calculated in the query projection and NOT a leading Indexkey that has chosen. Also you have JOIN.
So, ORDER BY needs to produce all the qualified rows before applying LIMIT or OFFSET because lost row may come first in the ORDER.

So what should be done to optimize these kinds of query.Their must be a way to optimize.