Indexing for Aggregation of Array data

Hi,

I’m struggling to get the best query plan for the following, this is currently taking roughly 10 seconds to run which is not great.

I have a document such as:

{
  "type": "scores",
  "CustomerName": "dc8fb0ed-78c6-48f9-acc1-cdca1c5e823f",
  "CustomerCompanies": [
    "95ed7060-99da-447f-b593-0ccd2c810a7d",
    "7829d0d5-67ff-4adf-81b7-7f7455e04d29",
    "b1d2fd38-0011-49fa-a2b5-87291f652d49",
  ],
  "dateTime": "2020-01-16T12:02:45Z",
  "kpi": {
     "Kpi"{ 
	 "name":"first",
      "rawValue": 0,
      "score": 5,
      "duration": 350
    },
     "Kpi":{
		"name":""second",
      "rawValue": 0,
      "score": 1,
      "duration": 297
    },
    "Kpi"{ 
	 "name":"third",
      "rawValue": 0,
      "score": 6,
      "duration": 297
    },
  "stats": {
    "totaltimespent": 297,
    "totaltimenotworking": 297
  }

I’m looking to do a query to provide me with the score * the duration of each KPI across a particular customer company, like:

SELECT 
    SUM(d.firstScore * d.firstDuration) as firstResult,
	SUM(d.secondScore * d.secondDuration) as secondResult
FROM (
	SELECT 
	   SUM(CASE WHEN kpi.name = "first" THEN kpi.score ELSE 0 END) as firstScore,
	   SUM(CASE WHEN kpi.name = "first" THEN kpi.duration ELSE 0 END) as firstDuration,
   	   SUM(CASE WHEN kpi.name = "second" THEN kpi.score ELSE 0 END) as secondScore,
   	   SUM(CASE WHEN kpi.name = "second" THEN kpi.duration ELSE 0 END) as secondDuration
  	FROM `customer_scores` a
   	   UNNEST kpis kpi
   	WHERE kpi.name IS NOT NULL
 	AND a.type = "scores"
 	AND ANY o IN a.CustomerCompanies SATISFIES o = "ff2d1082-5063-4a69-8010-0d377755c5cb" END
 	AND a.dateTime > '2020-01-01' AND a.dateTime <= '2020-01-31'
 ) d

I have about 10,000 records in the bucket.

index I’ve got so far:

CREATE INDEX `customer_scores_kpis`
ON `customer_scores`((all (array [(`kpi`.`name`), (`kpi`.`score`), (`kpi`.`duration`)] for `kpi` in `kpis` end)))
WHERE ((`Type` = "scores") and ((`kpi`.`name`) is not null))

and

CREATE INDEX `customer_scores_companies` 
ON `customer_scores`((distinct (array (`o`.`CustomerCompany`) for `o` in `CustomerCompanies` end)),`dateTime`) 
WHERE (`Type` = "scores")

Any help people can provide would be greatly appreciated.

Thank you.

If your kpis array is small you can use covering index.

CREATE INDEX ix1 ON customer_scores(DISTINCT CustomerCompanies, dateTime, ARRAY [kpi.name, IFMISSINGORNULL(kpi.score,0) * IFMISSINGORNULL(kpi.duration,0)] FOR kpi IN kpis END ) WHERE type = "scores";

SELECT (OBJECT v.name:v.val FOR v IN results END).*
LET results = ( SELECT kpi[0] AS name, SUM(kpi[1]) AS val
                FROM ( SELECT ARRAY [kpi.name, IFMISSINGORNULL(kpi.score,0) * IFMISSINGORNULL(kpi.duration,0)]
                              FOR kpi IN kpis END AS kpis
                       FROM customer_scores AS a
                       WHERE a.type = "scores"
                             AND ANY o IN a.CustomerCompanies SATISFIES o = "ff2d1082-5063-4a69-8010-0d377755c5cb" END
                             AND a.dateTime > "2020-01-01" AND a.dateTime <= "2020-01-31"
                    ) AS d UNNEST d.kpis AS kpi
                WHERE kpi[0] IS NOT NULL
                GROUP BY kpi[0]) ;

Non covering index

CREATE INDEX ix2 ON customer_scores(DISTINCT CustomerCompanies, dateTime) WHERE type = "scores";

SELECT (OBJECT v.name:v.val FOR v IN results END).*
LET results = (SELECT kpi.name, SUM(IFMISSINGORNULL(kpi.score,0) * IFMISSINGORNULL(kpi.duration,0)) AS val
               FROM customer_scores AS a
               UNNEST a.kpis AS kpi
               WHERE a.type = "scores"
                     AND ANY o IN a.CustomerCompanies SATISFIES o = "ff2d1082-5063-4a69-8010-0d377755c5cb" END
                     AND a.dateTime > "2020-01-01" AND a.dateTime <= "2020-01-31"
                     AND kpi.name IS NOT NULL
               GROUP BY kpi.name);
1 Like

This looks great, thank you very much this will guide me in moving this forward!