N1Ql query performance with IN clause

manish_bansal · December 11, 2019, 12:10pm

We have below N1ql query and corresponding index which is performing as expected.
Query:

select * from bucket WHERE TRUE AND type = ‘ORDER’ AND orderType = “X” AND
orderDetails.code IN [“100”]
ORDER BY type ASC, orderType ASC, orderDetails.code ASC LIMIT 200 OFFSET 0
Index:

CREATE INDEX idx_01ONbucket(type,orderType,(orderDetails.code),(orderDetails.status),prodNum) WHERE (type = "ORDER")

Execution time: 25 ms

Query 2:

select * from bucket WHERE TRUE AND type = ‘ORDER’ AND orderType = “X” AND
orderDetails.code IN [“100”,“200”]
ORDER BY type ASC, orderType ASC, orderDetails.code ASC LIMIT 200 OFFSET 0

Execution time: 10 sec

Total records: 75k

Question: why is it taking almost 1000 times for second query? Is it possible to improve performance?

vsr1 · December 11, 2019, 1:17pm

What version of the couchbase. You should try with latest version. Execute the query in query workbench and post the output from PlanText tab

manish_bansal · December 11, 2019, 2:08pm

This is 5.0.1 Community edition.

Plan for Query 2

{
“plan”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “IndexScan2”,
“index”: “idx_01”,
“index_id”: “2cb329ed9071ba64”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “bucket”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: “"ORDER"”,
“inclusion”: 3,
“low”: “"ORDER"”
},
{
“high”: “"X"”,
“inclusion”: 3,
“low”: “"X"”
},
{
“high”: “"100"”,
“inclusion”: 3,
“low”: “"100"”
}
]
},
{
“exact”: true,
“range”: [
{
“high”: “"ORDER"”,
“inclusion”: 3,
“low”: “"ORDER"”
},
{
“high”: “"X"”,
“inclusion”: 3,
“low”: “"X"”
},
{
“high”: “"101"”,
“inclusion”: 3,
“low”: “"101"”
}
]
}
],
“using”: “gsi”
},
{
“#operator”: “Fetch”,
“keyspace”: “bucket”,
“namespace”: “default”
},
{
“#operator”: “Parallel”,
“~child”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Filter”,
“condition”: “(((true and ((bucket.type) = "ORDER")) and ((bucket.orderType) in ["X"])) and (((bucket.orderDetails).code) in ["100", "101"]))”
},
{
“#operator”: “InitialProject”,
“result_terms”: [
{
“as”: “_ID”,
“expr”: “(meta(bucket).id)”
},
{
“as”: “_CAS”,
“expr”: “(meta((bucket.bucket\r\n)).cas)”
},
{
“expr”: “bucket”,
“star”: true
}
]
}
]
}
}
]
},
{
“#operator”: “Order”,
“limit”: “200”,
“sort_terms”: [
{
“expr”: “(bucket.type)”
},
{
“expr”: “(bucket.orderType)”
},
{
“expr”: “((bucket.orderDetails).code)”
}
]
},
{
“#operator”: “Limit”,
“expr”: “200”
},
{
“#operator”: “FinalProject”
}
]
},
“text”: “SELECT META(bucket).id as _ID, META(bucket\r\n).cas as _CAS, bucket.* FROM bucket WHERE TRUE AND type = ‘ORDER’ AND orderType IN ["X"] AND\r\norderDetails.code IN ["100","101"]\r\nORDER BY type ASC, orderType ASC, orderDetails.code ASC LIMIT 200 OFFSET 0”
}

Plan for Query 1:

{
“plan”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “IndexScan2”,
“index”: “idx_01”,
“index_id”: “2cb329ed9071ba64”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “bucket”,
“limit”: “200”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: “"ORDER"”,
“inclusion”: 3,
“low”: “"ORDER"”
},
{
“high”: “"X"”,
“inclusion”: 3,
“low”: “"X"”
},
{
“high”: “"100"”,
“inclusion”: 3,
“low”: “"100"”
}
]
}
],
“using”: “gsi”
},
{
“#operator”: “Fetch”,
“keyspace”: “bucket”,
“namespace”: “default”
},
{
“#operator”: “Parallel”,
“maxParallelism”: 1,
“~child”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Filter”,
“condition”: “(((true and ((bucket.type) = "ORDER")) and ((bucket.orderType) in ["X"])) and (((bucket.orderDetails).code) in ["100"]))”
},
{
“#operator”: “InitialProject”,
“result_terms”: [
{
“as”: “_ID”,
“expr”: “(meta(bucket).id)”
},
{
“as”: “_CAS”,
“expr”: “(meta((bucket.bucket\r\n)).cas)”
},
{
“expr”: “bucket”,
“star”: true
}
]
},
{
“#operator”: “FinalProject”
}
]
}
}
]
},
{
“#operator”: “Limit”,
“expr”: “200”
}
]
},
“text”: “SELECT META(bucket).id as _ID, META(bucket\r\n).cas as _CAS, bucket.* FROM bucket WHERE TRUE AND type = ‘ORDER’ AND orderType IN ["X"] AND\r\norderDetails.code IN ["100"]\r\nORDER BY type ASC, orderType ASC, orderDetails.code ASC LIMIT 200 OFFSET 0”
}

vsr1 · December 11, 2019, 4:39pm

IN caluse has more than 1 element. So it must produce all possible values and order the results. This use case has improved in 6.0. You can try 6.0

In 5.1 Use covered query produce document keys and then fetch the documents

SELECT t.*
FROM bucket AS t
USE KEYS ( SELECT RAW d.id
           FROM ( ( SELECT META().id , type, orderType, orderDetails.code
                    FROM bucket
                    WHERE TRUE AND type = "ORDER" AND orderType = "X" AND orderDetails.code = "100"
                    ORDER BY type ASC, orderType ASC, orderDetails.code ASC LIMIT 200 )
                  UNION
                   ( SELECT META().id , type, orderType, orderDetails.code
                     FROM bucket
                     WHERE TRUE AND type = "ORDER" AND orderType = "X" AND orderDetails.code = "200"
                     ORDER BY type ASC, orderType ASC, orderDetails.code ASC LIMIT 200)
                ) AS d
           ORDER BY d.type ASC, d.orderType ASC, d.orderDetails.code ASC LIMIT 200 OFFSET 0)
ORDER BY t.type ASC, t.orderType ASC, t.orderDetails.code ASC;

manish_bansal · December 12, 2019, 10:24am

Thanks @vsr1 for detailed analysis. Unfortunately, i cannot implement any of the solution as my DB is in production so up-gradation of version is not possible as of now. Also, changing all the queries into series of UNION calls will be time consuming.
Further, to confirm on your hypothesis, i removed order by clause from my queries and added couple of more IN clauses. Fortunately, that query responded with ms. The moment i add order by calsue, query takes 10s of secs to respond.

My question is what is it which takes so much time for sorting? AFAIK, indexer keeps the data in sorted manner. Since, in all my cases, my index is getting used, only thing required is to concatenate the results of multiple calls. Am i missing anything?

vsr1 · December 12, 2019, 1:35pm

Once you have order It must produce all possible values. Last value may be first in the sort.
Query can use index order only if it doesn’t change the order in the process. In 6.0 this has been taken care.

manish_bansal · January 6, 2020, 9:57am

@vsr1 We tried with 6.0 CE version. However, it performed even worst. Below is the way this issue can be simulated.
Import travel-sample data-set with ram quota as below for different services.

data = 4732 mb
index = 512
search = 512

Execute below queries to create the data for current use case.

update travel-sample set orderType = ‘T_ORDER’ where type = ‘route’ and id >= 35000
mutations: 10748
update travel-sample set orderType = ‘C_ORDER’ where type = ‘route’ and id < 35000
mutations: 13276

update travel-sample set orderType = ‘T_ORDER’ where type in [‘hotel’,‘landmark’];
mutations: 5412
update travel-sample set orderType = ‘C_ORDER’ where type in [‘airline’,‘airport’];
mutations: 2155

update travel-sample set brandCode = ‘11’ where type <> ‘route’;
mutations: 7567
update travel-sample set brandCode = ‘26’ where type = ‘route’;
mutations: 24024

create index idx_travel_sample_brandcode_ordertype_01 on travel-sample (brandCode,orderType);

Query 1:
select * from travel-sample where brandCode in [‘11’] and orderType in [‘C_ORDER’]
order by brandcode asc,orderType asc limit 200

Query 2:
select * from travel-sample where brandCode in [‘11’,‘26’] and orderType in [‘C_ORDER’,‘T_ORDER’]
order by brandcode asc,orderType asc limit 200

Performance data
CE 5.1.1 build 5723 - Query 1 = 632 ms ; Query 2: < 10 sec
CE 6.0.0 build 1693 - Query 1 = 1.9 sec; Query 2: > 30 sec

manish_bansal · January 6, 2020, 10:14am

Further, to add, explain plan has no change. It is same in both version 5 and 6.

Topic		Replies	Views
Query with "ORDER BY" and "IN" operator very slow SQL++ query , n1ql	4	2340	March 8, 2018
Why IN cluse is such peneltiy SQL++ query	1	1222	November 4, 2016
Performance of single vs multiple n1ql queries for small data sets SQL++	6	2444	August 23, 2017
Index for query SQL++ n1ql	4	1762	August 26, 2016
Couchbase Query Help SQL++ n1ql	4	1199	March 8, 2017

N1Ql query performance with IN clause

Related topics