N1QL with covering index: performance very slow


Hi @geraldss,

I'm facing a unique case: the index I created shows up in EXPLAIN, but if I force it with USE INDEX(xx), no data is returned, while with another index data is returned.

Index definition:
create index idx7 on default (SOURCE_ID,PERSON_ID,LOCATION_ID,(distinct array c.QUESTION_ID FOR c IN CONTENTS when c.SENTIMENTS IS MISSING END )) WHERE t.TYPE="DOCUMENT"
AND t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01';

Query:
SELECT t.ID FROM default t
WHERE t.TYPE="DOCUMENT"
AND t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01'
and t.SOURCE_ID=258
and t.PERSON_ID=0
AND t.LOCATION_ID=11931 and
( ANY c IN CONTENTS SATISFIES c.QUESTION_ID=1073 and c.SENTIMENTS IS MISSING END) AND any client in t.CLIENT satisfies client="retailpsi" end;

EXPLAIN:
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IntersectScan",
          "scans": [
            {
              "#operator": "DistinctScan",
              "scan": {
                "#operator": "IndexScan",
                "index": "idx7",
                "index_id": "1702691442b51b29",
                "keyspace": "default",
                "namespace": "default",
                "spans": [
                  {
                    "Range": {
                      "High": ["258", "0", "11931", "1073"],
                      "Inclusion": 3,
                      "Low": ["258", "0", "11931", "1073"]
                    }
                  }
                ],
                "using": "gsi"
              }
            },
            {
              "#operator": "DistinctScan",
              "scan": {
                "#operator": "IndexScan",
                "index": "idx_client",
                "index_id": "f7e9a9cccb98083d",
                "keyspace": "default",
                "namespace": "default",
                "spans": [
                  {
                    "Range": {
                      "High": ["\"retailpsi\""],
                      "Inclusion": 3,
                      "Low": ["\"retailpsi\""]
                    }
                  }
                ],
                "using": "gsi"
              }
            }
          ]
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "as": "t",
                "keyspace": "default",
                "namespace": "default"
              },
              {
                "#operator": "Filter",
                "condition": "(((((((((t.TYPE) = \"DOCUMENT\") and (\"2001-01-01\" <= (t.MENTION_TIME))) and ((t.MENTION_TIME) < \"2015-01-01\")) and ((t.SOURCE_ID) = 258)) and ((t.PERSON_ID) = 0)) and ((t.LOCATION_ID) = 11931)) and any c in (t.CONTENTS) satisfies (((c.QUESTION_ID) = 1073) and ((c.SENTIMENTS) is missing)) end) and any client in (t.CLIENT) satisfies (client = \"retailpsi\") end)"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  { "expr": "(t.ID)" }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT t.ID FROM default t\n WHERE t.TYPE=\"DOCUMENT\"\n AND t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01' \n and t.SOURCE_ID=258 \nand t.PERSON_ID=0\nAND t.LOCATION_ID=11931 and\n( ANY c IN CONTENTS SATISFIES c.QUESTION_ID=1073 and c.SENTIMENTS IS MISSING END) AND any client in t.CLIENT satisfies client=\"retailpsi\" end;"
  }
]

Could you help me out here?

Hi @rkumar,

Can you upgrade to 4.5.1 so we can continue?

@geraldss,

That data is on our staging server, so we cannot update directly to 4.5.1; it might be a few days before I can upgrade.
But is there any way to upgrade directly to 4.5.1 without reinstalling or moving the data?
I guess the syntax has also changed in 4.5.1.

But in the above scenario the problem is not COUNT; the problem is that the index does not capture the query's requirements.
I urgently need a solution for that, please.

Thanks!

Ok, let’s try a covering index. What is the execution time?

CREATE INDEX idx5 ON default
(PERSON_ID, LOCATION_ID, MENTION_TIME, ID, CONTENTS)
WHERE MENTION_TIME>='2001-01-01' AND TYPE="DOCUMENT";

EXPLAIN SELECT ROUND(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),count(t.ID)
FROM default t use index(idx5)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
AND t.PERSON_ID=143124 AND t.LOCATION_ID=5204 AND s.CATEGORY_ID IN [4000,4001,4002,4003,4004,4005,4010,4011,4012,4013]
and t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01' ;

Hey @geraldss,

I think you picked the wrong one.
Let me repeat the whole thing so that it is clear.

Problem: I have a query, and I created an index to satisfy its requirements,
but the query does not use that index, and forcing it with the USE INDEX() keyword returns no data.
Query:
SELECT round(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),count(distinct t.ID),t.PERSON_ID
FROM default t use index(idx8)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
and t.SOURCE_ID=246
AND t.LOCATION_ID=2286
and c.QUESTION_ID=969
AND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,4036,4037,4038,4039,4040]
and t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01' and any client in t.CLIENT satisfies client ="novamed" end group by (t.PERSON_ID) ;

Index Definition:
CREATE INDEX idx8 ON default(LOCATION_ID,SOURCE_ID,PERSON_ID>0) WHERE (("2001-01-01" <= MENTION_TIME) and (TYPE = "DOCUMENT"))

Execution time: 8+ seconds with 1M records.
I don't know what the problem is, but it should not take this much time.
Please guide me on what I'm doing wrong.

Ok, I will take a look. Meanwhile, can you run the idx5 index and query that I just sent you? It will help me figure things out. Please post the execution time.

Hi @geraldss,

It takes a huge amount of time to build; after all this time it is only at 5%, so it will take the whole night to build. I will update you on this by tomorrow. Apart from that, I need the above problem optimized.

Why do you have PERSON_ID > 0 in your CREATE INDEX?

@geraldss,

We were trying to create an index that gives us low latency; I added that in the course of that exercise, but I'm not sure it's the best condition.
Our requirement is that PERSON_ID must be greater than 0, so I put it in the index.

Regards,
rahul

Here you go.

CREATE INDEX idx8 ON default(LOCATION_ID,SOURCE_ID,PERSON_ID)
WHERE (("2001-01-01" <=MENTION_TIME) and (TYPE = "DOCUMENT"));


SELECT ROUND(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),COUNT(DISTINCT t.ID),t.PERSON_ID
FROM default t USE INDEX(idx8)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
AND t.SOURCE_ID=246
AND t.LOCATION_ID=2286
AND c.QUESTION_ID=969
AND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,4036,4037,4038,4039,4040]
AND t.MENTION_TIME >='2001-01-01' AND t.MENTION_TIME<'2015-01-01' AND ANY client IN t.CLIENT SATISFIES client ="novamed" END
GROUP BY (t.PERSON_ID);
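The point of moving PERSON_ID into the index keys, instead of indexing the boolean expression PERSON_ID>0, is that a range predicate on PERSON_ID can then be pushed into the index scan as a span on the third key rather than applied only after the Fetch. A sketch (the filter values are reused from the query above, purely for illustration):

```sql
-- With equality predicates on the leading keys (LOCATION_ID, SOURCE_ID),
-- a range predicate on the next key (PERSON_ID) can become an index span.
SELECT t.ID
FROM default t USE INDEX (idx8)
WHERE t.TYPE = "DOCUMENT"
  AND t.MENTION_TIME >= '2001-01-01'
  AND t.LOCATION_ID = 2286
  AND t.SOURCE_ID = 246
  AND t.PERSON_ID > 0;
```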

Hi Geraldss,

Thank you for the above help, but some scenarios have come up, described below.

  1. If no data is returned for the given filter values, execution time is in milliseconds (ms).
  2. If data is returned for the filter values, there are two cases:
    2a. If PERSON_ID is zero, execution time is in ms.
    2b. If PERSON_ID is not zero, it takes 2-3 seconds to execute.

Apart from that, I have one more question:
1. We are using 16 GB of RAM on a single node. If we add nodes and distribute the data across them, will execution time be affected, and how?

Thanks & Regards,
rahul

For 2b, send the EXPLAIN plan.

Your memory, 16 GB, sounds low.

Hey @geraldss,

Below is what you asked for.
Note: 1. I used index idx8, which you mentioned above.
2. I upgraded to Couchbase 4.5.1.

explain SELECT ROUND(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),COUNT(DISTINCT t.ID),t.PERSON_ID
FROM default t use index(idx8)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
AND t.SOURCE_ID=246
AND t.LOCATION_ID=2286
AND c.QUESTION_ID=969
AND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,
4036,4037,4038,4039,4040]
AND t.MENTION_TIME >='2001-01-01' AND t.MENTION_TIME<'2015-01-01' AND ANY client IN t.CLIENT SATISFIES client ="novamed" END
GROUP BY (t.PERSON_ID);

EXPLAIN

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "idx8",
          "index_id": "fcedf99f578eb9ce",
          "keyspace": "default",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": ["2286", "successor(246)"],
                "Inclusion": 1,
                "Low": ["2286", "246"]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "as": "t",
          "keyspace": "default",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Unnest",
                "as": "c",
                "expr": "(t.CONTENTS)"
              },
              {
                "#operator": "Unnest",
                "as": "s",
                "expr": "(c.SENTIMENTS)"
              },
              {
                "#operator": "Filter",
                "condition": "(((((((((t.TYPE) = \"DOCUMENT\") and ((t.SOURCE_ID) = 246)) and ((t.LOCATION_ID) = 2286)) and ((c.QUESTION_ID) = 969)) and ((s.CATEGORY_ID) in [4001, 4002, 4003, 4004, 4005, 4010, 4011, 4012, 4013, 4014, 4015, 4016, 4017, 4018, 4019, 4020, 4021, 4022, 4023, 4024, 4025, 4026, 4027, 4028, 4029, 4030, 4031, 4032, 4033, 4034, 4035, 4036, 4037, 4038, 4039, 4040])) and (\"2001-01-01\" <= (t.MENTION_TIME))) and ((t.MENTION_TIME) < \"2015-01-01\")) and any client in (t.CLIENT) satisfies (client = \"novamed\") end)"
              },
              {
                "#operator": "InitialGroup",
                "aggregates": [
                  "avg((s.SCORE))",
                  "count((c.SENTIMENTS))",
                  "count(distinct (t.ID))"
                ],
                "group_keys": ["(t.PERSON_ID)"]
              }
            ]
          }
        },
        {
          "#operator": "IntermediateGroup",
          "aggregates": [
            "avg((s.SCORE))",
            "count((c.SENTIMENTS))",
            "count(distinct (t.ID))"
          ],
          "group_keys": ["(t.PERSON_ID)"]
        },
        {
          "#operator": "FinalGroup",
          "aggregates": [
            "avg((s.SCORE))",
            "count((c.SENTIMENTS))",
            "count(distinct (t.ID))"
          ],
          "group_keys": ["(t.PERSON_ID)"]
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  { "expr": "round(avg((s.SCORE)), 1)" },
                  { "expr": "count((c.SENTIMENTS))" },
                  { "expr": "count(distinct (t.ID))" },
                  { "expr": "(t.PERSON_ID)" }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT ROUND(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),COUNT(DISTINCT t.ID),t.PERSON_ID\nFROM default t use index(idx8)\nUNNEST t.CONTENTS c\nUNNEST c.SENTIMENTS s\nWHERE t.TYPE=\"DOCUMENT\"\nAND t.SOURCE_ID=246\nAND t.LOCATION_ID=2286\nAND c.QUESTION_ID=969\nAND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,\\n4036,4037,4038,4039,4040]\nAND t.MENTION_TIME >='2001-01-01' AND t.MENTION_TIME<'2015-01-01' AND ANY client IN t.CLIENT SATISFIES client =\"novamed\" END\nGROUP BY (t.PERSON_ID);"
  }
]

EXECUTION TIME: 8+ seconds
Regards,
rahul

Please try each of these indexes separately, using a USE INDEX clause for each.

CREATE INDEX idx_question ON default
( DISTINCT ARRAY c.QUESTION_ID FOR c IN CONTENTS END )
WHERE TYPE = "DOCUMENT" AND MENTION_TIME >= "2001-01-01";

CREATE INDEX idx_category ON default
( DISTINCT ARRAY ( DISTINCT ARRAY s.CATEGORY_ID FOR s IN c.SENTIMENTS END ) FOR c IN CONTENTS END )
WHERE TYPE = "DOCUMENT" AND MENTION_TIME >= "2001-01-01";

CREATE INDEX idx_client ON default
( DISTINCT ARRAY client FOR client IN CLIENT END )
WHERE TYPE = "DOCUMENT" AND MENTION_TIME >= "2001-01-01";
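One caveat worth noting: for any of these array indexes to be selected (or to return rows under USE INDEX), the query's ANY clause must mirror the index's ARRAY expression, and the query must also include the predicates from the index's WHERE clause. A minimal sketch against idx_client (the client value is illustrative):

```sql
-- The ANY clause matches the index's DISTINCT ARRAY expression, and the
-- TYPE/MENTION_TIME predicates satisfy the partial-index WHERE clause.
SELECT t.ID
FROM default t USE INDEX (idx_client)
WHERE t.TYPE = "DOCUMENT"
  AND t.MENTION_TIME >= "2001-01-01"
  AND ANY client IN t.CLIENT SATISFIES client = "novamed" END;
```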

Hi @geraldss,

I did exactly what you mentioned above, but only the idx_question index seems to work, at 6+ seconds; however, the combination of idx8 and idx_question gives me 3.6 seconds.

CREATE INDEX idx8 ON default(LOCATION_ID,SOURCE_ID,PERSON_ID)
WHERE (("2001-01-01" <=MENTION_TIME) and (TYPE = "DOCUMENT"));

SELECT round(AVG( s.SCORE),1),COUNT(c.SENTIMENTS),count(distinct t.ID),t.PERSON_ID
FROM default t use index(idx8,idx_question)
UNNEST t.CONTENTS c
UNNEST c.SENTIMENTS s
WHERE t.TYPE="DOCUMENT"
and t.SOURCE_ID=289
AND t.LOCATION_ID=3600
and c.QUESTION_ID=1347
AND s.CATEGORY_ID IN [4001,4002,4003,4004,4005,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028,4029,4030,4031,4032,4033,4034,4035,4036,4037,4038,4039,4040]
and t.MENTION_TIME >='2001-01-01' and t.MENTION_TIME<'2015-01-01' and any client in t.CLIENT satisfies client ="retailpsi" end group by (t.PERSON_ID) ;

I tried the below index as well:
Index: CREATE INDEX idx_for_phy ON default(LOCATION_ID,SOURCE_ID,(distinct (array client for client in CLIENT end))) WHERE (("2001-01-01" <= MENTION_TIME) and (TYPE = "DOCUMENT"))

But with different filter values I'm still getting different execution times.

We are testing against hard-coded data of 1 million records, but we have 30 million records; will this approach hold up at that scale?

Please help us.
Regards,
Rahul

Hi @vsr1 @keshav_m, please help @rkumar.

He needs pretty=false, max_parallelism, and index selection.

Rahul, one other thing. Your test data should have the same distribution as realistic production data. You should not generate test data with all the same values, for example.

@geraldss,

We have production data in MySQL, and from it we transferred 1 million records into Couchbase 4.5.1,
so the input data does vary across users.

So suppose our requirement is as below:
1. Find the average SCORE, the number of SENTIMENTS, and the number of documents for a physician (for each PERSON_ID), using MENTION_TIME between A and B, LOCATION_ID E, SOURCE_ID F, multiple CATEGORY_IDs G, and CLIENT H.

There are two cases for PERSON_ID:
a. PERSON_ID > 0 (data returns in some seconds).
b. PERSON_ID = 0 (data returns in milliseconds).

What are pretty=false and max_parallelism? Will they help me reduce the execution time?
Please give me an example so I can try it now.

Use the cbq shell.

\SET -max-parallelism 16;
\SET -pretty false;
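A full cbq session would then look roughly like this (a sketch; substitute any of the queries above for the SELECT):

```sql
\SET -max-parallelism 16;
\SET -pretty false;

SELECT t.ID FROM default t
WHERE t.TYPE = "DOCUMENT" AND t.MENTION_TIME >= '2001-01-01'
LIMIT 10;
```

pretty=false only trims the response formatting overhead; max_parallelism lets the query service run the parallel operators in the plan across more cores.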

@isha