Array operation slow

Hi,
I have table purchases with such structure:

SELECT VALUE a 
FROM purchases a
LIMIT 2
[
	{
		"type": "type_x",
		"customer": "c4211995-e113-4cf1-98cb-93bb06b5847e",
		"seller": "44ca450c-tim4-7856-boa3-d60ae9eb7ade",
		"price": 13
	},
	{
		"type": "type_x",
		"customer": "c4214985-e419-4cf1-98cb-93bb06b5847e",
		"seller": "44ca450c-ffb4-4626-b1a3-d60ae9eb7ade",
		"price": 12
	}
]

To speed up this query :

SELECT 
	customer, 
	seller,
	SUM(price),
	count(*)
FROM purchases
WHERE customer IS NOT MISSING
	and seller IS NOT MISSING
GROUP BY
	customer, 
	seller

I use such index :

CREATE INDEX adv_customer_seller_price ON `default`:`purchases`(`customer`,`seller`,`price`)

Without index query is completed in 16.1s, then with index it is completed in 358.2ms

Then there is a similar structure, only now data is in an array:

SELECT VALUE a 
FROM purchases_array a
LIMIT 2
[
	{
		"arr": [
			{
				"type": "type_x",
				"customer": "c4214985-e419-4cf1-98cb-93bb06b5847e",
				"seller": "44ca450c-ffb4-4626-b1a3-d60ae9eb7ade",
				"price": 12
			},
			{
				"type": "type_x",
				"customer": "c4214985-e419-4cf1-98cb-93bb06b5847e",
				"seller": "44ca450c-ffb4-4626-b1a3-d60ae9eb7ade",
				"price": 12
			}
		]
	},
	{
		"arr": [
			{
				"type": "type_x",
				"customer": "c4214985-e419-4cf1-98cb-93bb06b5847e",
				"seller": "44ca450c-ffb4-4626-b1a3-d60ae9eb7ade",
				"price": 12
			},
			{
				"type": "type_x",
				"customer": "c4214985-e419-4cf1-98cb-93bb06b5847e",
				"seller": "44ca450c-ffb4-4626-b1a3-d60ae9eb7ade",
				"price": 12
			}
		]
	}
]

I need to speed up this query:

SELECT a.customer,
       a.seller,
       SUM(a.price),
       COUNT(*)
FROM purchases_array p
UNNEST p.arr a
WHERE a.customer IS NOT MISSING
    and a.seller IS NOT MISSING
GROUP BY a.customer,
         a.seller

It runs in 8.2s without an index.

Index Advisor suggests such index:

CREATE INDEX adv_ALL_arr_seller_customer ON `default`:`purchases_array`(
	ALL ARRAY FLATTEN_KEYS(`a`.`seller`,`a`.`customer`) FOR a IN `arr` END
)

But this index did not speed up the query.

I also tried such index:

CREATE INDEX adv_ALL_arr_seller_customer ON `default`:`purchases_array`(
	ALL ARRAY FLATTEN_KEYS(`a`.`seller`,`a`.`customer`,`a`.`price`) FOR a IN `arr` END
)

with such query:


SELECT a.customer,
       a.seller,
       SUM(a.price),
       COUNT(*)
FROM purchases_array p
UNNEST p.arr a
WHERE a.customer IS NOT MISSING
    and a.seller IS NOT MISSING
    and a.price IS NOT MISSING
GROUP BY a.customer,
         a.seller

It didn’t help either.

Is there a way to speed up the query, and preferably as fast as in the case of an expanded structure (as in purchases)?

If it is okay then use IS NOT NULL

CREATE INDEX adv_ALL_arr_seller_customer ON `default`( ALL ARRAY FLATTEN_KEYS(`a`.`seller`,`a`.`customer`,`a`.`price`) FOR a IN `arr` END);

SELECT a.customer, a.seller, SUM(a.price), COUNT(1)
FROM default p
UNNEST p.arr a
WHERE a.seller IS NOT NULL
GROUP BY a.customer, a.seller;

Query predicate has unknowns i.e NULL or MISSING Array index will not cover because from index entry it will not able to derive array (arr) , expression (a.seller, a.customer, a.price) i is MISSING or NULL
UNNEST p.arr requires arr Must be ARRAY

MB-49110 has examples why it can’t cover

1 Like