N1QL with covering index: performance very slow


Hi @geraldss,

I'm facing a unique case: the index I created shows up in EXPLAIN, but if I force it with USE INDEX(xx), no data is returned, while with another index data is returned.

Index definition:
create index idx7 on default (SOURCE_ID,PERSON_ID,LOCATION_ID,(distinct array c.QUESTION_ID FOR c IN CONTENTS when c.SENTIMENTS IS MISSING END )) WHERE t.TYPE="DOCUMENT"
AND t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01';

Query:
SELECT t.ID FROM default t
WHERE t.TYPE="DOCUMENT"
AND t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01'
and t.SOURCE_ID=258
and t.PERSON_ID=0
AND t.LOCATION_ID=11931 and
( ANY c IN CONTENTS SATISFIES c.QUESTION_ID=1073 and c.SENTIMENTS IS MISSING END) AND any client in t.CLIENT satisfies client="retailpsi" end;

EXPLAIN:
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IntersectScan",
          "scans": [
            {
              "#operator": "DistinctScan",
              "scan": {
                "#operator": "IndexScan",
                "index": "idx7",
                "index_id": "1702691442b51b29",
                "keyspace": "default",
                "namespace": "default",
                "spans": [
                  {
                    "Range": {
                      "High": ["258", "0", "11931", "1073"],
                      "Inclusion": 3,
                      "Low": ["258", "0", "11931", "1073"]
                    }
                  }
                ],
                "using": "gsi"
              }
            },
            {
              "#operator": "DistinctScan",
              "scan": {
                "#operator": "IndexScan",
                "index": "idx_client",
                "index_id": "f7e9a9cccb98083d",
                "keyspace": "default",
                "namespace": "default",
                "spans": [
                  {
                    "Range": {
                      "High": ["\"retailpsi\""],
                      "Inclusion": 3,
                      "Low": ["\"retailpsi\""]
                    }
                  }
                ],
                "using": "gsi"
              }
            }
          ]
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "as": "t",
                "keyspace": "default",
                "namespace": "default"
              },
              {
                "#operator": "Filter",
                "condition": "(((((((((t.TYPE) = \"DOCUMENT\") and (\"2001-01-01\" <= (t.MENTION_TIME))) and ((t.MENTION_TIME) < \"2015-01-01\")) and ((t.SOURCE_ID) = 258)) and ((t.PERSON_ID) = 0)) and ((t.LOCATION_ID) = 11931)) and any c in (t.CONTENTS) satisfies (((c.QUESTION_ID) = 1073) and ((c.SENTIMENTS) is missing)) end) and any client in (t.CLIENT) satisfies (client = \"retailpsi\") end)"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  { "expr": "(t.ID)" }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT t.ID FROM default t\n WHERE t.TYPE=\"DOCUMENT\"\n AND t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01' \n and t.SOURCE_ID=258 \nand t.PERSON_ID=0\nAND t.LOCATION_ID=11931 and\n( ANY c IN CONTENTS SATISFIES c.QUESTION_ID=1073 and c.SENTIMENTS IS MISSING END) AND any client in t.CLIENT satisfies client=\"retailpsi\" end;"
  }
]

Could you help me out here?

Hi @rkumar,

Can you upgrade to 4.5.1 so we can continue?

@geraldss,

That data is on our staging server, so we cannot update directly to 4.5.1; it might be a few days before I can upgrade.
But is there any way to upgrade directly to 4.5.1 without reinstalling or moving the data?
I guess the syntax has also changed in 4.5.1.

But in the above scenario the problem is not COUNT; the problem is that the index does not capture the query's requirements.
I urgently need a solution for that, please.

Thanks!

Ok, let’s try a covering index. What is the execution time?

CREATE INDEX idx5 ON default
(PERSON_ID, LOCATION_ID, MENTION_TIME, ID, CONTENTS)
WHERE MENTION_TIME>='2001-01-01' AND TYPE="DOCUMENT";

EXPLAIN SELECT ROUND(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),count(t.ID)
FROM default t use index(idx5)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
AND t.PERSON_ID=143124 AND t.LOCATION_ID=5204 AND s.CATEGORY_ID IN [4000,4001,4002,4003,4004,4005,4010,4011,4012,4013]
and t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01' ;

Hey @geraldss,

I think you picked the wrong one.
Let me repeat the whole thing so that it is clear.

Problem: I have a query, and I created an index to satisfy its requirements,
but the query does not use that index, and forcing it with the USE INDEX() keyword returns no data.
Query:
SELECT round(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),count(distinct t.ID),t.PERSON_ID
FROM default t use index(idx8)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
and t.SOURCE_ID=246
AND t.LOCATION_ID=2286
and c.QUESTION_ID=969
AND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,4036,4037,4038,4039,4040]
and t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01' and any client in t.CLIENT satisfies client ="novamed" end group by (t.PERSON_ID) ;

Index Definition:
CREATE INDEX idx8 ON default(LOCATION_ID,SOURCE_ID,PERSON_ID>0) WHERE (("2001-01-01" <= MENTION_TIME) and (TYPE = "DOCUMENT"))

Execution time: 8+ seconds with 1M records.
I don't know what the problem is, but it should not take this much time.
Please guide me on what I'm doing wrong.

Ok, I will take a look. Meanwhile, can you run the idx5 index and query that I just sent you? It will help me figure things out. Please post the execution time.

Hi @geraldss,

It takes a huge amount of time to build; after all this time it is only at 5%, so it will take the whole night to build. I will update you on this by tomorrow. Apart from that, I need the above problem optimized.

Why do you have PERSON_ID > 0 in your CREATE INDEX?

@geraldss,

We were trying to create an index that gives us low latency; I added that in the course of that exercise, but I'm not sure it's the best condition.
Our requirement is that PERSON_ID must be greater than 0, so I put it in the index.

Regards,
rahul

Here you go.

CREATE INDEX idx8 ON default(LOCATION_ID,SOURCE_ID,PERSON_ID)
WHERE (("2001-01-01" <=MENTION_TIME) and (TYPE = "DOCUMENT"));


SELECT ROUND(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),COUNT(DISTINCT t.ID),t.PERSON_ID
FROM default t USE INDEX(idx8)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
AND t.SOURCE_ID=246
AND t.LOCATION_ID=2286
AND c.QUESTION_ID=969
AND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,4036,4037,4038,4039,4040]
AND t.MENTION_TIME >='2001-01-01' AND t.MENTION_TIME<'2015-01-01' AND ANY client IN t.CLIENT SATISFIES client ="novamed" END
GROUP BY (t.PERSON_ID);
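The point of moving PERSON_ID into the index keys, instead of indexing the boolean expression PERSON_ID>0, is that a range predicate on PERSON_ID can then be pushed into the index scan as a span on the third key rather than applied only after the Fetch. A sketch (the filter values are reused from the query above, purely for illustration):

```sql
-- With equality predicates on the leading keys (LOCATION_ID, SOURCE_ID),
-- a range predicate on the next key (PERSON_ID) can become an index span.
SELECT t.ID
FROM default t USE INDEX (idx8)
WHERE t.TYPE = "DOCUMENT"
  AND t.MENTION_TIME >= '2001-01-01'
  AND t.LOCATION_ID = 2286
  AND t.SOURCE_ID = 246
  AND t.PERSON_ID > 0;
```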

Hi Geraldss,

Thank you for the above help, but some scenarios have come up, described below.

  1. If no data is returned for the given filter values, execution time is in milliseconds (ms).
  2. If data is returned for the filter values, there are two cases:
    2a. If PERSON_ID is zero, execution time is in ms.
    2b. If PERSON_ID is not zero, it takes 2-3 seconds to execute.

Apart from that, I have one more question:
1. We are using 16 GB of RAM on a single node. If we add nodes and distribute the data across them, will execution time be affected, and how?

Thanks & Regards,
rahul

For 2b, send the EXPLAIN plan.

Your memory, 16 GB, sounds low.

Hey @geraldss,

Below is what you asked for.
Note: 1. I used index idx8, which you mentioned above.
2. I upgraded to Couchbase 4.5.1.

explain SELECT ROUND(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),COUNT(DISTINCT t.ID),t.PERSON_ID
FROM default t use index(idx8)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
AND t.SOURCE_ID=246
AND t.LOCATION_ID=2286
AND c.QUESTION_ID=969
AND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,
4036,4037,4038,4039,4040]
AND t.MENTION_TIME >='2001-01-01' AND t.MENTION_TIME<'2015-01-01' AND ANY client IN t.CLIENT SATISFIES client ="novamed" END
GROUP BY (t.PERSON_ID);

EXPLAIN

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "idx8",
          "index_id": "fcedf99f578eb9ce",
          "keyspace": "default",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": ["2286", "successor(246)"],
                "Inclusion": 1,
                "Low": ["2286", "246"]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "as": "t",
          "keyspace": "default",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Unnest",
                "as": "c",
                "expr": "(t.CONTENTS)"
              },
              {
                "#operator": "Unnest",
                "as": "s",
                "expr": "(c.SENTIMENTS)"
              },
              {
                "#operator": "Filter",
                "condition": "(((((((((t.TYPE) = \"DOCUMENT\") and ((t.SOURCE_ID) = 246)) and ((t.LOCATION_ID) = 2286)) and ((c.QUESTION_ID) = 969)) and ((s.CATEGORY_ID) in [4001, 4002, 4003, 4004, 4005, 4010, 4011, 4012, 4013, 4014, 4015, 4016, 4017, 4018, 4019, 4020, 4021, 4022, 4023, 4024, 4025, 4026, 4027, 4028, 4029, 4030, 4031, 4032, 4033, 4034, 4035, 4036, 4037, 4038, 4039, 4040])) and (\"2001-01-01\" <= (t.MENTION_TIME))) and ((t.MENTION_TIME) < \"2015-01-01\")) and any client in (t.CLIENT) satisfies (client = \"novamed\") end)"
              },
              {
                "#operator": "InitialGroup",
                "aggregates": [
                  "avg((s.SCORE))",
                  "count((c.SENTIMENTS))",
                  "count(distinct (t.ID))"
                ],
                "group_keys": ["(t.PERSON_ID)"]
              }
            ]
          }
        },
        {
          "#operator": "IntermediateGroup",
          "aggregates": [
            "avg((s.SCORE))",
            "count((c.SENTIMENTS))",
            "count(distinct (t.ID))"
          ],
          "group_keys": ["(t.PERSON_ID)"]
        },
        {
          "#operator": "FinalGroup",
          "aggregates": [
            "avg((s.SCORE))",
            "count((c.SENTIMENTS))",
            "count(distinct (t.ID))"
          ],
          "group_keys": ["(t.PERSON_ID)"]
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  { "expr": "round(avg((s.SCORE)), 1)" },
                  { "expr": "count((c.SENTIMENTS))" },
                  { "expr": "count(distinct (t.ID))" },
                  { "expr": "(t.PERSON_ID)" }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT ROUND(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),COUNT(DISTINCT t.ID),t.PERSON_ID\nFROM default t use index(idx8)\nUNNEST t.CONTENTS c\nUNNEST c.SENTIMENTS s\nWHERE t.TYPE=\"DOCUMENT\"\nAND t.SOURCE_ID=246\nAND t.LOCATION_ID=2286\nAND c.QUESTION_ID=969\nAND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,\\n4036,4037,4038,4039,4040]\nAND t.MENTION_TIME >='2001-01-01' AND t.MENTION_TIME<'2015-01-01' AND ANY client IN t.CLIENT SATISFIES client =\"novamed\" END\nGROUP BY (t.PERSON_ID);"
  }
]

EXECUTION TIME: 8+ seconds
Regards,
rahul

Please try each of these indexes separately, using a USE INDEX clause for each.

CREATE INDEX idx_question ON default
( DISTINCT ARRAY c.QUESTION_ID FOR c IN CONTENTS END )
WHERE TYPE = "DOCUMENT" AND MENTION_TIME >= "2001-01-01";

CREATE INDEX idx_category ON default
( DISTINCT ARRAY ( DISTINCT ARRAY s.CATEGORY_ID FOR s IN c.SENTIMENTS END ) FOR c IN CONTENTS END )
WHERE TYPE = "DOCUMENT" AND MENTION_TIME >= "2001-01-01";

CREATE INDEX idx_client ON default
( DISTINCT ARRAY client FOR client IN CLIENT END )
WHERE TYPE = "DOCUMENT" AND MENTION_TIME >= "2001-01-01";
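One caveat worth noting: for any of these array indexes to be selected (or to return rows under USE INDEX), the query's ANY clause must mirror the index's ARRAY expression, and the query must also include the predicates from the index's WHERE clause. A minimal sketch against idx_client (the client value is illustrative):

```sql
-- The ANY clause matches the index's DISTINCT ARRAY expression, and the
-- TYPE/MENTION_TIME predicates satisfy the partial-index WHERE clause.
SELECT t.ID
FROM default t USE INDEX (idx_client)
WHERE t.TYPE = "DOCUMENT"
  AND t.MENTION_TIME >= "2001-01-01"
  AND ANY client IN t.CLIENT SATISFIES client = "novamed" END;
```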

Hi @geraldss,

I did exactly what you mentioned above, but only the idx_question index seems to work, at 6+ seconds; however, the combination of idx8 and idx_question gives me 3.6 seconds.

CREATE INDEX idx8 ON default(LOCATION_ID,SOURCE_ID,PERSON_ID)
WHERE (("2001-01-01" <=MENTION_TIME) and (TYPE = "DOCUMENT"));

SELECT round(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),count(distinct t.ID),t.PERSON_ID
FROM default t use index(idx8,idx_question)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
and t.SOURCE_ID=289
AND t.LOCATION_ID=3600
and c.QUESTION_ID=1347
AND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,4036,4037,4038,4039,4040]
and t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01' and any client in t.CLIENT satisfies client ="retailpsi" end group by (t.PERSON_ID) ;

I tried the below index as well:
Index: CREATE INDEX idx_for_phy ON default(LOCATION_ID,SOURCE_ID,(distinct (array client for client in CLIENT end))) WHERE (("2001-01-01" <= MENTION_TIME) and (TYPE = "DOCUMENT"))

But with different filter values I'm still getting different execution times.

We are testing against hard-coded data of 1 million records, but we have 30 million records; will this approach hold up at that scale?

Please help us.
Regards,
Rahul

Hi @vsr1 @keshav_m, please help @rkumar.

He needs pretty=false, max_parallelism, and index selection.

Rahul, one other thing. Your test data should have the same distribution as realistic production data. You should not generate test data with all the same values, for example.

@geraldss,

We have production data in MySQL, and from it we transferred 1 million records into Couchbase 4.5.1,
so the input data does vary across users.

So suppose our requirement is as below:
1. Find the average SCORE, the number of SENTIMENTS, and the number of documents for a physician (for each PERSON_ID), using MENTION_TIME between A and B, LOCATION_ID E, SOURCE_ID F, multiple CATEGORY_IDs G, and CLIENT H.

There are two cases for PERSON_ID:
a. PERSON_ID > 0 (data returns in some seconds).
b. PERSON_ID = 0 (data returns in milliseconds).

What are pretty=false and max_parallelism? Will they help me reduce the execution time?
Please give me an example so I can try it now.

Use the cbq shell.

\SET -max-parallelism 16;
\SET -pretty false;
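A full cbq session would then look roughly like this (a sketch; substitute any of the queries above for the SELECT):

```sql
\SET -max-parallelism 16;
\SET -pretty false;

SELECT t.ID FROM default t
WHERE t.TYPE = "DOCUMENT" AND t.MENTION_TIME >= '2001-01-01'
LIMIT 10;
```

pretty=false only trims the response formatting overhead; max_parallelism lets the query service run the parallel operators in the plan across more cores.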

@isha