Very much interested in resolving issue i tried below option also but no gain in time
ID1
{
groups[“1”,“2”,‘3’]
entity:‘ENT1’,
users:[‘User1’,‘User2’,…‘User1000’]
numbericvalue1:585,
numericvalue2…numericvalue60
}
since both groups and users are array i created only in one (groups)
select META().id from default where any grp in groups satisfies grp=1 End limit 30 (Good reponse)
select META().id from default where any grp in groups satisfies grp=1 End order by {any field} limit 30 (Order by Adds slowness)
select META().id from default where any grp in groups satisfies grp=1 End and ARRAY_CONTAINS(users,‘USER1’) order by {any field} limit 30 (Very very slow)
At present if the query has JOIN clause it will not use index order.
create index such that left side of JOIN can push as many constant predicates to indexer and if possible to make it covered.
CREATE INDEX idx_totturnover ON test(groupid, type, totturnover, entity, user );
EXPLAIN SELECT META(a).id
FROM test a JOIN test b ON KEYS (a.entity)
WHERE a.groupid=1
AND a.type='FIRST_FILTER'
AND a.totturnover IS NOT MISSING
AND b.type='SECOND_TYPE'
AND ARRAY_CONTAINS(b.users,a.user)
ORDER BY a.totturnover ;
select META().id from default where any grp in groups satisfies grp=1 End limit 30;
Above query doesn’t use order by so when first 30 documents are satisfied the query stops.
select META().id from default where any grp in groups satisfies grp=1 End order by {any field} limit 30
Above query has order by and limit 30. We need to get all the qualified documents before we decide 30 documents. If the number of qualified documents are more it takes more time.
select META().id from default where any grp in groups satisfies grp=1 End and ARRAY_CONTAINS(users,'USER1') order by {any field} limit 30
In addition to order by and limit you have extra predicate, which almost iterating the array for each qualified documents. It takes more time.
You can try the following.
CREATE INDEX idx100 ON default(DISTINCT ARRAY u FOR u IN users END);
select META().id from default where any u in users satisfies u = "USER1" End and ARRAY_CONTAINS(grp,1) order by {any field} limit 30
Also post details on.
Number of documents in bucket
Number of distinct groups and number of documents in each group
number of distinct users and number of documents each user
There are 800 to 900 users per doc so creating index on users further slows down search and ordering
Order by currently is very very costly operation even on 100K docs
below is details
Number of documents in bucket :: 200K
Number of distinct groups and number of documents in each group :: 40 odd groups and 100K docs satifying highest match
number of distinct users and number of documents each user :: tot users 1500 , 100K documents per users within 200K
The query predicates qualify 50% of the documents on top of that you are predicates searches in big arrays.
Try using max_parallelism, increase query service REST api parameters pipeline-batch, pipeline-cap parameters.
There are only so many ways to improve sort performance. Best sort performance is to avoid it totally.
Here is a way to improve your performance.
Use covering index to fetch the documents you need, in the order you need.
You can THEN do the filtering and LIMIT on top of this.
This will avoid the ORDER BY, and FETCH of the whole 200K documents.
create index idx1 ON default(field1, field2, field3);
select *
from (select * from default order by field1) d
where any u in users satisfies u = “USER1” End and ARRAY_CONTAINS(grp,1)
limit 30;
Added where clause for keshav_m post to pick the index
select *
from (select * from default where field1 is not missing order by field1) d
where any u in users satisfies u = “USER1” End and ARRAY_CONTAINS(grp,1)
limit 30;
below query looks to have bug limit is getting applied on inner select statement rather on external select
select *
from (select * from default where field1 is not missing order by field1) d
where any u in users satisfies u = “USER1” End and ARRAY_CONTAINS(grp,1)
limit 30;
Could you try this.
sselect * from default where field1 is not missing AND “USER1” IN users AND 1 IN grp ORDER BY field1 limit 30;
OR add limit (huge value) to inner query
Tried suggestion of adding more limit in inner select
looks another issue or bug
select *
from (select * from default where field1 is not missing order by field1 limit 200000) d
where any u in users satisfies u = “USER1” End and ARRAY_CONTAINS(grp,1)
limit 30 offset 50000;
covering index does not return sortCount in n1qlmetrics for pagination
now if i have to use count to get data matching filter then goes back to same point
adding offset adds slowness
one more point all data is in Ram then what is the requirement of disk fetch for any join or sorting and i confirm all my data seating in Ram
not giving sortCount in metrics is not a bug or issue. If the top level query doesn’t use order by (If query uses index order the sortCount will be same as limit because we don’t look how many entries qualified) it will not give the sortCount (it is json).
For pagination, If sortCount is missing, if resultCount < limit there are no more records else more records might be there.
limit 30 offset 50000 - query service needs to process 50,030 records even though it gives 30 to client.
Data is in RAM on DataNode. Query Service needs to get the information from Index node, Data node to process the query.
Provide the exact index definitions and EXPLAIN output of the both queries.
Also could you please check the queries. It is not valid ANY syntax.
select meta().id, test and find out what are those 6 documents missing and see what are the values of grps and users and why it eliminated.
may not be able to post data have restriction for this, but if you try with sample documents in ur enviroment same issue will result sorry not able share my data