Array Index problem

Hi Folk,

I got an Nested Array Index problem. I followed the reference and blog, but no luck. Need help from expertise.

Doc Structure:

{
	"type": "data",
	"name":"abc",
	"dc":"a1",
	...
	"Server": [
		{
			"Name": "webextsrv1",
			"Port": 1234,
			...
		},
		{
			...
		},
		...
	],
	"Cluster": [
		{
			"Name": "webext",
			"ClusterAddress": "x.x.x.x",
		},
		...
	],
	"Application": [
		{
			"Name": "myweb",
			"Version": "1.0",
			"Order": 100,
			...
		},
		...
	],
	...
}

Index:

CREATE INDEX `data-server1` ON `EM-NP`(ALL ARRAY a.Name FOR a IN Application END, Application)
WHERE (`type` = "data")

CREATE INDEX `data-server2` ON `EM-NP`(ALL Application)
WHERE (`type` = "data")

Count query:

select count(*)
from `EM-NP` e
USE INDEX (`data-serverX`)
UNNEST e.`Application` a
WHERE e.`type` = "data" AND a.Name like '%'  <== this is required in 4.5 to enable index.

Paged data query:

select meta(e).id as data_id, 
e.`name`, e.dc, ...
t.ownerteam, t.supportteam, t.dmz, t.enabled, 
a.* 
from `EM-NP` e 
USE INDEX (`data-serverX`)
LEFT join `EM-NP` t ON KEYS e.spec_id 
UNNEST e.`Application` a 
where e.`type` = 'data' AND a.`Name` like '%'
ORDER BY e.dc asc 
OFFSET 0 LIMIT 25

Problem 1, for index data-server1:

For count query, query time looks good, but count is incorrect, much less than it should be. E.g: 5000 vs 25000 (should be)

For data query: index is being used, however, when I request data after 5000, it resturns []

Problem 2, for index data-server2:

For both query, it seems index is not being used, however, count query speed is acceptable.
For data query, it take more than 1 minute to return data.

I also found a funny thing: if I change the query to this, it is faster (half of the time, 30+ seconds)

select meta(e).id as data_id, 
e.*,
t.*, 
a.* 
from `EM-NP` e 

I tried both indices in couchbase 4.5 and 5.0 community edition, same issue.

All I want is some indices for nested arrays in each docs. I have about 2000 docs, each doc contains a bunch of arrays of data different type, such as Cluster, Application, Server… I need return a page of type data based on dynamic query conditions. Everything works fine, except the query execution time is not acceptable.

Appreciate for any help.

Allen

Check your indexer.log make sure there are no skipped items due to index key size.
a.Name like ‘%’ means you want all the Name that are string. Is that is your use case or Are you looking something a.Name like ‘Prefix%’.

CREATE INDEX `data-server1` ON `EM-NP`(ALL ARRAY a.Name FOR a IN Application END) WHERE (`type` = "data")
SELECT SUM(ARRAY_COUNT(ARRAY 1 FOR v IN e.Application WHEN v.Name LIKE '%' END))
FROM `EM-NP` e WHERE e.`type` = "data" AND ANY a IN e.Application SATISFIES a.Name LIKE '%' END;

CREATE INDEX `data-server1c` ON `EM-NP`(ALL ARRAY a.Name FOR a IN Application END, Application) WHERE (`type` = "data")

data-server1c index makes cover but if the array is too big pre-5.0 it may skip indexing the document.

data-server2 index doesn’t qualify because the predicate needs to be matched with array index key. In your case it is o
bject so you need to give a = {“Name”…}

data query involves JOIN/UNNEST i.e 2 JOINS and the documents can be multiplier that is why it taking time

1 Like

You are right. There are lots skipped items.

If I want to keep the doc structure, is there any known work-around?

All my code are based on this structure.

Regards,

Allen

In 5.0.0 you should not have problem (no index key size limit or it might be very large number) .
Pre 5.0.0 remove Application as index key. This will uses non covered can (i.e fetch the document)

CREATE INDEX `data-server1` ON `EM-NP`(ALL ARRAY a.Name FOR a IN Application END)
WHERE (`type` = "data");

Based on COUNT query the following performs better. UNNEST acts as like JOIN. In your case you need only count not actual unnested document the following works better.

SELECT SUM(ARRAY_COUNT(ARRAY 1 FOR v IN e.Application WHEN v.Name LIKE ‘%’ END))
FROM EM-NP e WHERE e.type = “data” AND ANY a IN e.Application SATISFIES a.Name LIKE ‘%’ END;