Couchbase 4.5: ORDER BY difference in performance for ASC and DESC, even when the query is covered

@geraldss

Okay

Can you please add an option for DESC order in CB 4.5? DESC is very important.

The current implementation of DESC ORDER BY is very slow: almost 700ms for 30K records.

In my case ORDER BY is very, very slow :frowning: . Kindly suggest something. I want to apply it on a date column whose format is "timestamp": "2016-04-14 19:03:05". Should I add an index, change the date format, or anything else?

I also tried SELECT *, meta(DEFAULT).id id FROM Test_mysql_import ORDER BY id LIMIT 5 OFFSET 0;

but got no result and no error :frowning: for up to 180 sec.

Thanks in advance.

Hi @pratyushtty,

Can you post your exact index and query?

Hi @geraldss

4.5 EE is still not suitable for data where every field is sortable, as adding all fields to a covering index is not possible.

ORDER BY is very, very slow if the query is not using a covering index; ordering 70K records is even worse, nowhere near RDBMS performance.

One more observation: joins are even worse. Only the right side uses the covering index; the left side does a key-value fetch, and if we put an ORDER BY along with the join it is worse still.

Are you using index order? Ascending or descending? If you post your queries, we can help identify the right set of indexes. For ordering, they do not necessarily need to be covering indexes.

Hi @geraldss

Here is my query. The index is on (groupid, user, entity):

select meta(a).id from test a join test b on keys (a.entity) where a.groupid=1 and a.type='FIRST_FILTER' and b.type='SECOND_TYPE' and ARRAY_CONTAINS(b.users,a.user) order by a.totturnover

I have 60 sortable attributes on test (alias a); on the left side, about 70K records match groupid=1.

Try the following exactly.

CREATE INDEX idx_totturnover ON test( groupid, type, totturnover );

EXPLAIN SELECT META(a).id
FROM test a JOIN test b ON KEYS (a.entity)
WHERE a.groupid=1
AND a.type='FIRST_FILTER'
AND a.totturnover IS NOT MISSING
AND b.type='SECOND_TYPE'
AND ARRAY_CONTAINS(b.users,a.user)
ORDER BY a.groupid, a.type, a.totturnover
;

Hi @geraldss

I tried the above option, but it is still just as slow when I add ORDER BY, even in the same order as the index.

Hi @krishnasahu2016,

You are not providing enough information. You say that all the queries and indexes have the same performance, regardless of the query and index definitions.

(1) That is not likely, unless there is something wrong with your environment.

(2) Just saying everything is slow does not provide any information we can build on.

Can you run the exact index and query I sent? Can you post the output of EXPLAIN? Can you post the metrics from running the query?

Hi @geraldss

My working environment does not allow me to load data onto any forum.

But please try the below in your environment; you will see the slowness.

ID1 (like this, with groupid 1 having 100K documents)
{
  "groupid": 1,
  "type": "FILTER_1",
  "entity": "ENT1",
  "numericvalue1": 585,
  "numericvalue2" ... "numericvalue60"
}

ENT1 (this id is referred to in the document above)
{
  "users": ["user1", "user2", ... "user1000"],
  "type": "filter2"
}

Now we need to filter by groupid 1 (matching 100K documents), join to the second document on the entity attribute, restrict to a specific user, and sort by any of the 60 columns.

e.g.

select META(a).id from default a join default b on keys (a.entity) where a.groupid=1 and a.type='FILTER1' and b.type='FILTER2' AND ARRAY_CONTAINS(b.users,'TEST1') order by {any attribute of a} limit 30

Please suggest indexes or a data model to make this faster.

Hi @geraldss

I am very much interested in resolving this issue. I also tried the option below, but there was no gain in time.
ID1
{
  "groups": ["1", "2", "3"],
  "entity": "ENT1",
  "users": ["User1", "User2", ... "User1000"],
  "numericvalue1": 585,
  "numericvalue2" ... "numericvalue60"
}

ID2
{
  "groups": ["1"],
  "entity": "ENT1",
  "users": ["User1", "User2", ... "User1000"]
}

like this

I created an array index on groups.

Since both groups and users are arrays, I created the index on only one of them (groups).

select META().id from default where any grp in groups satisfies grp=1 End limit 30 (good response)

select META().id from default where any grp in groups satisfies grp=1 End order by {any field} limit 30 (ORDER BY adds slowness)

select META().id from default where any grp in groups satisfies grp=1 End and ARRAY_CONTAINS(users,'USER1') order by {any field} limit 30 (very, very slow)

Hi @krishnasahu2016,

At present, if the query has a JOIN clause it will not use index order.
Create the index such that the left side of the JOIN can push as many constant predicates to the indexer as possible and, if possible, make it covered.

CREATE INDEX idx_totturnover ON test(groupid, type, totturnover, entity, user );

EXPLAIN SELECT META(a).id
FROM test a JOIN test b ON KEYS (a.entity)
WHERE a.groupid=1
AND a.type='FIRST_FILTER'
AND a.totturnover IS NOT MISSING
AND b.type='SECOND_TYPE'
AND ARRAY_CONTAINS(b.users,a.user)
ORDER BY a.totturnover ;

select META().id from default where any grp in groups satisfies grp=1 End limit 30;

The above query doesn't use ORDER BY, so as soon as the first 30 documents satisfy the predicate, the query stops.

select META().id from default where any grp in groups satisfies grp=1 End order by {any field} limit 30

The above query has ORDER BY and LIMIT 30. We need to get all the qualified documents before we can decide on the top 30. The more qualified documents there are, the more time it takes.
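The contrast between these two queries can be sketched in plain Python (an illustration of the principle, not Couchbase internals; the data and field names are hypothetical):

```python
import heapq
import itertools
import random

random.seed(7)
# Hypothetical corpus: each doc carries a groups array and one sortable field.
docs = [{"id": i,
         "groups": [random.randint(1, 40)],
         "field1": random.random()}
        for i in range(100_000)]

def qualifies(doc):
    # Stand-in for: ANY grp IN groups SATISFIES grp = 1 END
    return 1 in doc["groups"]

# LIMIT 30 without ORDER BY: results stream, and the scan stops
# as soon as 30 matches have been produced.
first_30 = list(itertools.islice((d for d in docs if qualifies(d)), 30))

# ORDER BY field1 LIMIT 30: every qualifying document must be examined
# before the smallest 30 are known; heapq.nsmallest keeps only a 30-item
# heap, but it still consumes the whole qualifying stream.
top_30 = heapq.nsmallest(30, (d for d in docs if qualifies(d)),
                         key=lambda d: d["field1"])

assert len(first_30) == len(top_30) == 30
assert all(a["field1"] <= b["field1"] for a, b in zip(top_30, top_30[1:]))
```

The more documents qualify, the more work the ordered query does, while the unordered query's cost stays roughly constant.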

select META().id from default where any grp in groups satisfies grp=1 End and ARRAY_CONTAINS(users,'USER1') order by {any field} limit 30

In addition to the ORDER BY and LIMIT you have an extra predicate, which iterates over the array for each qualified document. That takes more time.

You can try the following.

CREATE INDEX idx100 ON default(DISTINCT ARRAY u FOR u IN users END);
select META().id from default where any u in users satisfies u = "USER1" End and ARRAY_CONTAINS(grp,1) order by {any field} limit 30
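The suggested array index effectively inverts the lookup: instead of scanning every qualifying document's users array for ARRAY_CONTAINS, the engine can start from the user and go straight to the matching documents. A rough Python sketch of the idea (hypothetical data, not Couchbase internals):

```python
from collections import defaultdict

# Hypothetical documents keyed by document id.
docs = {
    "ID1": {"groups": [1, 2, 3], "users": [f"user{i}" for i in range(900)]},
    "ID2": {"groups": [1], "users": [f"user{i}" for i in range(400, 500)]},
    "ID3": {"groups": [2], "users": ["alice", "bob"]},
}

# Without the array index: ARRAY_CONTAINS(users, 'user450') is evaluated
# per document, a linear scan of each users array.
scan_hits = {k for k, d in docs.items() if "user450" in d["users"]}

# With CREATE INDEX ... ON default(DISTINCT ARRAY u FOR u IN users END):
# conceptually, the index maps each user to the documents containing it,
# so qualifying documents are found by a direct lookup.
users_index = defaultdict(set)
for doc_id, d in docs.items():
    for u in d["users"]:
        users_index[u].add(doc_id)

index_hits = users_index["user450"]
assert scan_hits == index_hits == {"ID1", "ID2"}
```

This is why the query above leads with the users predicate and keeps ARRAY_CONTAINS(grp, 1) as the residual filter.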

Also post details on:
Number of documents in the bucket
Number of distinct groups and the number of documents in each group
Number of distinct users and the number of documents per user


Hi @vsr1

There are 800 to 900 users per doc, so creating an index on users further slows down search and ordering.

ORDER BY is currently a very costly operation, even on 100K docs.

Below are the details:

Number of documents in bucket: 200K
Number of distinct groups and number of documents in each group: about 40 groups; the biggest group matches 100K docs
Number of distinct users and number of documents per user: 1500 users in total; up to 100K documents per user within the 200K

Hi @krishnasahu2016,

The query predicates qualify 50% of the documents, and on top of that your predicates search big arrays.

Try using max_parallelism, and increase the query service REST API parameters pipeline-batch and pipeline-cap.

Hi @krishnasahu2016

There are only so many ways to improve sort performance. The best sort performance comes from avoiding the sort entirely.

Here is a way to improve your performance.

1. Use a covering index to fetch the documents you need, in the order you need.

2. THEN do the filtering and LIMIT on top of this.

3. This avoids the ORDER BY, and the FETCH of the whole 200K documents.

create index idx1 ON default(field1, field2, field3);
select *
from (select * from default order by field1) d
where any u in users satisfies u = 'USER1' End and ARRAY_CONTAINS(grp, 1)
limit 30;


Adding a WHERE clause to keshav_m's post so it picks the index:
select *
from (select * from default where field1 is not missing order by field1) d
where any u in users satisfies u = 'USER1' End and ARRAY_CONTAINS(grp, 1)
limit 30;

Hi @keshav_m @vsr1

Thank you for the suggestion.

What about ORDER BY DESC, or OFFSET 50000? Will these slow down the query even if I use covered-index sorting?

HI @keshav_m @vsr1

The below query looks to have a bug: the LIMIT is getting applied to the inner SELECT statement rather than to the outer SELECT.

select *
from (select * from default where field1 is not missing order by field1) d
where any u in users satisfies u = 'USER1' End and ARRAY_CONTAINS(grp, 1)
limit 30;

Hi @krishnasahu2016,

Will check it. Thanks for the information.

Could you try this:
select * from default where field1 is not missing AND 'USER1' IN users AND 1 IN grp ORDER BY field1 limit 30;
OR add a LIMIT (huge value) to the inner query.

Hi @keshav_m @vsr1

I tried the suggestion of adding a larger LIMIT to the inner select.
It looks like another issue or bug:

select *
from (select * from default where field1 is not missing order by field1 limit 200000) d
where any u in users satisfies u = 'USER1' End and ARRAY_CONTAINS(grp, 1)
limit 30 offset 50000;

The covering index does not return sortCount in the N1QL metrics for pagination.
So if I have to run a count to get the number of rows matching the filter, I am back to the same point.

Adding an OFFSET adds slowness.

One more point: all my data is in RAM (I have confirmed this), so why is a disk fetch required for any join or sort?