Indexing for Aggregation of Array data

wchandler · January 20, 2020, 5:14pm

Hi,

I’m struggling to get the best query plan for the following, this is currently taking roughly 10 seconds to run which is not great.

I have a document such as:

{
  "type": "scores",
  "CustomerName": "dc8fb0ed-78c6-48f9-acc1-cdca1c5e823f",
  "CustomerCompanies": [
    "95ed7060-99da-447f-b593-0ccd2c810a7d",
    "7829d0d5-67ff-4adf-81b7-7f7455e04d29",
    "b1d2fd38-0011-49fa-a2b5-87291f652d49",
  ],
  "dateTime": "2020-01-16T12:02:45Z",
  "kpi": {
     "Kpi"{ 
	 "name":"first",
      "rawValue": 0,
      "score": 5,
      "duration": 350
    },
     "Kpi":{
		"name":""second",
      "rawValue": 0,
      "score": 1,
      "duration": 297
    },
    "Kpi"{ 
	 "name":"third",
      "rawValue": 0,
      "score": 6,
      "duration": 297
    },
  "stats": {
    "totaltimespent": 297,
    "totaltimenotworking": 297
  }

I’m looking to do a query to provide me with the score * the duration of each KPI across a particular customer company, like:

SELECT 
    SUM(d.firstScore * d.firstDuration) as firstResult,
	SUM(d.secondScore * d.secondDuration) as secondResult
FROM (
	SELECT 
	   SUM(CASE WHEN kpi.name = "first" THEN kpi.score ELSE 0 END) as firstScore,
	   SUM(CASE WHEN kpi.name = "first" THEN kpi.duration ELSE 0 END) as firstDuration,
   	   SUM(CASE WHEN kpi.name = "second" THEN kpi.score ELSE 0 END) as secondScore,
   	   SUM(CASE WHEN kpi.name = "second" THEN kpi.duration ELSE 0 END) as secondDuration
  	FROM `customer_scores` a
   	   UNNEST kpis kpi
   	WHERE kpi.name IS NOT NULL
 	AND a.type = "scores"
 	AND ANY o IN a.CustomerCompanies SATISFIES o = "ff2d1082-5063-4a69-8010-0d377755c5cb" END
 	AND a.dateTime > '2020-01-01' AND a.dateTime <= '2020-01-31'
 ) d

I have about 10,000 records in the bucket.

index I’ve got so far:

CREATE INDEX `customer_scores_kpis`
ON `customer_scores`((all (array [(`kpi`.`name`), (`kpi`.`score`), (`kpi`.`duration`)] for `kpi` in `kpis` end)))
WHERE ((`Type` = "scores") and ((`kpi`.`name`) is not null))

and

CREATE INDEX `customer_scores_companies` 
ON `customer_scores`((distinct (array (`o`.`CustomerCompany`) for `o` in `CustomerCompanies` end)),`dateTime`) 
WHERE (`Type` = "scores")

Any help people can provide would be greatly appreciated.

Thank you.

vsr1 · January 20, 2020, 9:25pm

If your kpis array is small you can use covering index.

CREATE INDEX ix1 ON customer_scores(DISTINCT CustomerCompanies, dateTime, ARRAY [kpi.name, IFMISSINGORNULL(kpi.score,0) * IFMISSINGORNULL(kpi.duration,0)] FOR kpi IN kpis END ) WHERE type = "scores";

SELECT (OBJECT v.name:v.val FOR v IN results END).*
LET results = ( SELECT kpi[0] AS name, SUM(kpi[1]) AS val
                FROM ( SELECT ARRAY [kpi.name, IFMISSINGORNULL(kpi.score,0) * IFMISSINGORNULL(kpi.duration,0)]
                              FOR kpi IN kpis END AS kpis
                       FROM customer_scores AS a
                       WHERE a.type = "scores"
                             AND ANY o IN a.CustomerCompanies SATISFIES o = "ff2d1082-5063-4a69-8010-0d377755c5cb" END
                             AND a.dateTime > "2020-01-01" AND a.dateTime <= "2020-01-31"
                    ) AS d UNNEST d.kpis AS kpi
                WHERE kpi[0] IS NOT NULL
                GROUP BY kpi[0]) ;

Non covering index

CREATE INDEX ix2 ON customer_scores(DISTINCT CustomerCompanies, dateTime) WHERE type = "scores";

SELECT (OBJECT v.name:v.val FOR v IN results END).*
LET results = (SELECT kpi.name, SUM(IFMISSINGORNULL(kpi.score,0) * IFMISSINGORNULL(kpi.duration,0)) AS val
               FROM customer_scores AS a
               UNNEST a.kpis AS kpi
               WHERE a.type = "scores"
                     AND ANY o IN a.CustomerCompanies SATISFIES o = "ff2d1082-5063-4a69-8010-0d377755c5cb" END
                     AND a.dateTime > "2020-01-01" AND a.dateTime <= "2020-01-31"
                     AND kpi.name IS NOT NULL
               GROUP BY kpi.name);

wchandler · January 22, 2020, 8:25am

This looks great, thank you very much this will guide me in moving this forward!

Topic		Replies	Views
Indexing & Querying Array for Aggregation SQL++ n1ql , index , arrays , array-query	14	1750	January 29, 2020
Array operation slow SQL++ n1ql	1	591	September 3, 2022
Index specific items inside an array not working SQL++	6	796	February 14, 2019
Covering Index with performance using select query SQL++ n1ql	3	1738	August 29, 2016
N1ql index performance is different on same query SQL++ query , n1ql , index	3	848	March 30, 2020

Indexing for Aggregation of Array data

Related topics