Okay
Can you please add some option for DESC order in CB 4.5? DESC is very important.
The current implementation of DESC ORDER BY is very slow: for 30K records it takes almost 700 ms.
In my case ORDER BY is very slow. Kindly suggest something. I want to apply it on a Date column whose format is "timestamp": "2016-04-14 19:03:05". Should I apply an index, change the date format, or anything else?
I also tried for SELECT *, meta(DEFAULT).id id FROM Test_mysql_import
ORDER BY id LIMIT 5 OFFSET 0;
but there was no result and no error for up to 180 sec.
Thanks in advance.
Hi @geraldss
4.5 EE is still not suitable for data where every field is sortable, as adding all fields to a covering index is not possible.
ORDER BY is very slow if the query does not use a covering index; ordering 70K records is nowhere near RDBMS performance.
One more observation: joins are even worse, as only the right side uses the covered index while the left side does key-value fetches, and adding ORDER BY along with the join makes it worse still.
Are you using index order? Ascending or descending? If you post your queries, we can help identify the right set of indexes. For ordering, they do not necessarily need to be covering indexes.
Hi @geraldss
Here is my query.
The index is on (groupid, user, entity).
select meta(a).id from test a join test b on keys (a.entity) where a.groupid=1 and a.type='FIRST_FILTER' and b.type='SECOND_TYPE' and ARRAY_CONTAINS(b.users,a.user) order by a.totturnover
I have 60 sortable attributes on alias a (test); the total records on the left side matching groupid=1 are 70K.
Try the following exactly.
CREATE INDEX idx_totturnover ON test( groupid, type, totturnover );
EXPLAIN SELECT META(a).id
FROM test a JOIN test b ON KEYS (a.entity)
WHERE a.groupid=1
AND a.type='FIRST_FILTER'
AND a.totturnover IS NOT MISSING
AND b.type='SECOND_TYPE'
AND ARRAY_CONTAINS(b.users,a.user)
ORDER BY a.groupid, a.type, a.totturnover
;
Hi @krishnasahu2016,
You are not providing enough information. You say that all the queries and indexes have the same performance, regardless of the query and index definitions.
(1) That is not likely, unless there is something wrong with your environment.
(2) Just saying everything is slow does not provide any information we can build on.
Can you run the exact index and query I sent? Can you post the output of EXPLAIN? Can you post the metrics from running the query?
Hi @geraldss
My working environment does not allow me to load data onto any forum,
but please try the below in your working env and you will see the slowness.
ID1 (documents like this, with groupid 1 matching 100K documents):
{
groupid: 1,
type: 'FILTER_1',
entity: 'ENT1',
numericvalue1: 585,
numericvalue2 … numericvalue60
}
ENT1 (this id is referred to in the above document):
{
users: ['user1', 'user2', … 'user1000'],
type: 'filter2'
}
Now we need to filter by groupid 1 (matching 100K documents), sort by any of the 60 columns, with a join on the entity attribute to the second document, and for a specific user.
e.g.
select META(a).id from default a join default b on keys (a.entity) where a.groupid=1 and a.type='FILTER1' and b.type='FILTER2' AND ARRAY_CONTAINS(b.users,'TEST1') order by {any attribute of a} limit 30
Please suggest on indexes or data model to make this faster
Hi @geraldss
I am very much interested in resolving this issue. I tried the below option also, but there was no gain in time.
ID1
{
groups: ['1', '2', '3'],
entity: 'ENT1',
users: ['User1', 'User2', … 'User1000'],
numericvalue1: 585,
numericvalue2 … numericvalue60
}
ID2
{
groups: ['1'],
entity: 'ENT1',
users: ['User1', 'User2', … 'User1000']
}
Like this, I created an array index on groups.
Since both groups and users are arrays, I created the index on only one of them (groups).
select META().id from default where any grp in groups satisfies grp=1 END limit 30 (good response)
select META().id from default where any grp in groups satisfies grp=1 END order by {any field} limit 30 (ORDER BY adds slowness)
select META().id from default where any grp in groups satisfies grp=1 END and ARRAY_CONTAINS(users,'USER1') order by {any field} limit 30 (very, very slow)
Hi @krishnasahu2016,
At present, if the query has a JOIN clause it will not use index order.
Create the index such that the left side of the JOIN can push as many constant predicates to the indexer as possible, and, if possible, make it covered.
CREATE INDEX idx_totturnover ON test(groupid, type, totturnover, entity, user );
EXPLAIN SELECT META(a).id
FROM test a JOIN test b ON KEYS (a.entity)
WHERE a.groupid=1
AND a.type='FIRST_FILTER'
AND a.totturnover IS NOT MISSING
AND b.type='SECOND_TYPE'
AND ARRAY_CONTAINS(b.users,a.user)
ORDER BY a.totturnover ;
select META().id from default where any grp in groups satisfies grp=1 END limit 30;
The above query doesn't use ORDER BY, so the query stops as soon as the first 30 satisfying documents are found.
select META().id from default where any grp in groups satisfies grp=1 END order by {any field} limit 30
The above query has ORDER BY and LIMIT 30. We need to get all the qualified documents before we can decide on the top 30. If the number of qualified documents is large, it takes more time.
select META().id from default where any grp in groups satisfies grp=1 END and ARRAY_CONTAINS(users,'USER1') order by {any field} limit 30
In addition to the ORDER BY and LIMIT, you have an extra predicate, which iterates the array for each qualified document. That takes more time.
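The cost model above can be sketched in plain Python. This is only a conceptual illustration; the document shapes and names below are made up to mirror this thread, not real Couchbase internals:

```python
import heapq
import random

random.seed(42)

# Stand-in for the qualifying documents: one shared ~900-user array,
# as in the data model described in this thread (names are illustrative).
USERS = [f"user{i}" for i in range(900)]
docs = [{"numericvalue1": random.random(), "users": USERS}
        for _ in range(100_000)]

# ORDER BY ... LIMIT 30 without index order: every qualifying document
# must be examined before the top 30 are known. A top-k heap avoids a
# full sort, but not the full scan.
top30 = heapq.nsmallest(30, docs, key=lambda d: d["numericvalue1"])

# Adding ARRAY_CONTAINS(users, 'user1') puts a linear scan of the
# ~900-element array in front of the sort, once per qualifying document.
filtered_top30 = heapq.nsmallest(
    30,
    (d for d in docs if "user1" in d["users"]),
    key=lambda d: d["numericvalue1"],
)
```

Both calls must visit all 100K documents; the second one additionally pays an array-membership scan per document, which matches the slowdown pattern reported above.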
You can try the following.
CREATE INDEX idx100 ON default(DISTINCT ARRAY u FOR u IN users END);
select META().id from default where any u in users satisfies u = "USER1" END and ARRAY_CONTAINS(grp,1) order by {any field} limit 30
Also post details on:
Number of documents in the bucket
Number of distinct groups and number of documents in each group
Number of distinct users and number of documents per user
Hi @vsr1
There are 800 to 900 users per doc, so creating an index on users further slows down search and ordering.
ORDER BY is currently a very costly operation, even on 100K docs.
Below are the details:
Number of documents in bucket: 200K
Number of distinct groups and number of documents in each group: about 40 groups, with 100K docs satisfying the highest match
Number of distinct users and number of documents per user: 1500 users in total, with up to 100K documents per user within the 200K
Hi @krishnasahu2016,
The query predicates qualify 50% of the documents, and on top of that your predicates search inside big arrays.
Try using max_parallelism, and increase the query service REST API parameters pipeline-batch and pipeline-cap.
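For reference, these knobs can be set through the query service's REST endpoints. The host, credentials, and values below are placeholders, so verify the parameter names and endpoints against the docs for your Couchbase version:

```shell
# Per-request parallelism: pass max_parallelism along with the statement
# (values here are examples only, not recommendations).
curl -u Administrator:password http://localhost:8093/query/service \
  -d 'statement=SELECT META().id FROM default LIMIT 30' \
  -d 'max_parallelism=4'

# Service-wide pipeline settings via the query admin endpoint.
curl -u Administrator:password http://localhost:8093/admin/settings \
  -X POST -H 'Content-Type: application/json' \
  -d '{"pipeline-batch": 64, "pipeline-cap": 1024}'
```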
There are only so many ways to improve sort performance; the best is to avoid the sort entirely.
Here is a way to improve your performance:
Use a covering index to fetch the documents you need, in the order you need.
You can THEN do the filtering and LIMIT on top of this.
This avoids the ORDER BY and the FETCH of the whole 200K documents.
create index idx1 ON default(field1, field2, field3);
select *
from (select * from default order by field1) d
where any u in users satisfies u = 'USER1' END and ARRAY_CONTAINS(grp,1)
limit 30;
--
Added a WHERE clause to keshav_m's post so that the query picks the index:
select *
from (select * from default where field1 is not missing order by field1) d
where any u in users satisfies u = 'USER1' END and ARRAY_CONTAINS(grp,1)
limit 30;
Thank you for the suggestion.
What about ORDER BY DESC, or OFFSET 50000? Will these slow down the query even if I use covered-index sorting?
The below query looks to have a bug: the LIMIT is getting applied to the inner SELECT statement rather than to the outer SELECT.
select *
from (select * from default where field1 is not missing order by field1) d
where any u in users satisfies u = 'USER1' END and ARRAY_CONTAINS(grp,1)
limit 30;
Hi @krishnasahu2016,
Will check it. Thanks for the information.
Could you try this:
select * from default where field1 is not missing AND 'USER1' IN users AND 1 IN grp ORDER BY field1 limit 30;
OR add a LIMIT (huge value) to the inner query.
I tried the suggestion of adding a larger LIMIT to the inner select.
It looks like another issue or bug:
select *
from (select * from default where field1 is not missing order by field1 limit 200000) d
where any u in users satisfies u = 'USER1' END and ARRAY_CONTAINS(grp,1)
limit 30 offset 50000;
A covering index does not return sortCount in the N1QL metrics for pagination.
Now, if I have to run a count to get the number of records matching the filter, then I am back at the same point:
adding OFFSET adds slowness.
One more point: all my data is in RAM, so what is the need for a disk fetch for any join or sorting? I confirm all my data sits in RAM.