N1ql indexing sequence - Mulitple Keys

Hi Cb Team,
i have some question about indexing sequence and would like to understand , since advised index give some better response time and different index.

  • i am fetching “orderid” from the documents and did not include “orderid” in my current index since i am using other predicates to filter out the data , now advised index suggested add the “orderid” in my current index , also suggested use where clause isEsOrdTr in the index as well but i also have document which contains isKsOrdTr , isTsOrdTr as well.

  • Can you please explain

  • Can you please share a link of any detail document and example for indexing sequence and predicates.

Query :slight_smile:

SELECT ARRAY_AGG (b.OrderId) tn
FROM Bucket_Order b
WHERE b.OrderId BETWEEN 0 AND 19
AND b.EsOrderTriggerEpoch>=1591210768 AND b.EsOrderTriggerEpoch<=1591228785
AND b.isEsOrdTr=false

Created index:

CREATE INDEX Order ON Bucket_Order (OrderId,EsOrderTriggerEpoch,isOrdTr)
PARTITION BY hash(OrderId) WITH { “defer_build”:false,“num_partition”:32 }

Advise index by couch base

CREATE INDEX adv_Order ON Bucket_Order (OrderId,OrderNumber,EsOrderTriggerEpoch) WHERE (isEsOrdTr` = false)

[ Note we have 3 types in OrdTr - > 1 : isEsOrdTr , 2 : isKsOrdTr , 3 : isTsOrdTr

{
“EpochTm1”: 1590599032,
“ESOrderTriggerEpoch”: 1590591832,
“isEsOrdTr”: false,
“OrderId”: 10,
"OrderNumber: “68636”
}

https://index-advisor.couchbase.com/indexadvisor/#1


can, you please look at my given scenario and give advise why advise index little diferrent than what i have created .

SELECT ARRAY_AGG (b.OrderId)  AS tn
FROM Bucket_Order b
WHERE b.OrderId BETWEEN 0 AND 19
AND b.EsOrderTriggerEpoch>=1591210768 AND b.EsOrderTriggerEpoch<=1591228785
AND b.isEsOrdTr=false;


CREATE INDEX ix1  ON  Bucket_Order (OrderId, EsOrderTriggerEpoch) WHERE isEsOrdTr = false;
CREATE INDEX ix2  ON  Bucket_Order (EsOrderTriggerEpoch, OrderId) WHERE isEsOrdTr = false;

See which index performs better and use that (you can add your partition key). Also if you think orderId has duplicates you can use ARRAY_AGG (DISTINCT b.OrderId)

You can avoid the Aggregation also.

SELECT tn
LET tn = (SELECT RAW b.OrderId
          FROM Bucket_Order b
          WHERE b.OrderId BETWEEN 0 AND 19
               AND b.EsOrderTriggerEpoch>=1591210768 AND b.EsOrderTriggerEpoch<=1591228785
               AND b.isEsOrdTr=false);

The following will explain why your index( may need to go through lot of index entries) vs recommended make difference. If equality is first or index where clause this can avoid number of index keys lookup can make faster

https://docs.couchbase.com/server/current/learn/services-and-indexes/indexes/index_pushdowns.html#index-key-order-and-structure

INDEX ix1 ON Bucket_Order (OrderId, EsOrderTriggerEpoch) WHERE isEsOrdTr = false; - On this index you have suggested Where clause and use equality predicate b.isEsOrdTr=false but it alos contain “True” value as well then how it will work ?

  • Might be i got your point let me elaborate and correct me if i wrong , basically we need to consider the documents in an index where isEsOrdTr value as “false” in case of value populating as “True” we are not considering in N1QL query then why need to add composite index Ex: ( OrderId , EsOrderTriggerEpoch , isEsOrdTr )

you have also mentioned "you can add your partition key " in your query currently my partition is on orderId number if i will add my partition in my where clause , then still i need to use same field in covering index like below, please explain.

CREATE INDEX ix1 ON Bucket_Order (OrderId, EsOrderTriggerEpoch) WHERE isEsOrdTr = false;

If you want to query on both isEsOrdTr = false and isEsOrdTr=true

CREATE INDEX ix1 ON Bucket_Order (isEsOrdTr, OrderId, EsOrderTriggerEpoch) 
PARTITION BY hash( `OrderId` ) WITH { “defer_build”:false,“num_partition”:32 }

If you want query all the values of isEsOrdTr, what ever you have at the beginning is right.

CREATE INDEX Order ON Bucket_Order ( OrderId , EsOrderTriggerEpoch , isOrdTr )
PARTITION BY hash( OrderId ) WITH { “defer_build”:false,“num_partition”:32 }

CREATE INDEX Order ON Bucket_Order ( OrderId , EsOrderTriggerEpoch , isOrdTr )
PARTITION BY hash( OrderId ) WITH { “defer_build”:false,“num_partition”:32 }

but using this index perfomance is very poor 9s for 5k documents , hence now i am thinking use where clause but not sure if i can use where clause and PARTITION BY combine something like below

CREATE INDEX Order ON Bucket_Order ( OrderId , EsOrderTriggerEpoch , isOrdTr )
WHERE isEsOrdTr = false
PARTITION BY hash( OrderId ) WITH { “defer_build”:false,“num_partition”:32 } , but in this index i have question if do PARTITION By OrderId then why need to num_partition”:32 do you have any thoughts …?

Where clause, partition by independent you can use it.

Depends on how many document you have. If small you don’t need PARTITION BY. If you have many you can use default 8 or even less (if have less distinct values more partitions may leave them empty). As you use range on orderId it has to do scatter gather (no partition elimination).

If you want distinct orderId

SELECT RAW b.OrderId
FROM Bucket_Order b
WHERE b.OrderId BETWEEN 0 AND 19
        AND b.EsOrderTriggerEpoch>=1591210768 AND b.EsOrderTriggerEpoch<=1591228785
        AND b.isEsOrdTr=false
GROUP BY b.OrderId ;

I have tried diffrent options and change the index and run the query this is my results.
CREATE INDEX IDX_Order ON Bucket_Order(EsOrderTriggerEpoch,OrderId)
PARTITION BY hash(OrderId)
WHERE (isEsOrdTr = false)
WITH { “num_partition”:32 }

Executed query - Total documents 800k - response time 871 ms

SELECT tn
LET tn = (SELECT RAW b.OrderId
FROM Bucket_Order b
WHERE b.EsOrderTriggerEpoch>=1591210768 AND b.EsOrderTriggerEpoch<=1591228785
AND b.isEsOrdTr=false
LIMIT 100000);
Note: Spans range { “high”: “1591228785”, “inclusion”: 3, “low”: “1591210768” }

But when i have added additional required filter orderid it took 5 sec to retrive the same amount of orderids

	SELECT tn

LET tn = (SELECT RAW b.OrderId
FROM Bucket_Order b
WHERE b.EsOrderTriggerEpoch>=1591210768 AND b.EsOrderTriggerEpoch<=1591228785
AND b.isEsOrdTr=false
AND b.OrderId >= 0 AND 19 <=b.OrderId
LIMIT 100000);

Note: Spans range { “high”: “1591228785”, “inclusion”: 3, “low”: “1591210768” } , {“high”: “19”,“inclusion”: 3,“low”: “0”}

that extra filter bumped it my query response time is too high …

Hi Vsr1,
i have found why my response bumped it up, basically filter by orderId was not included in my index and hence it making a call to a data node to get a order id , i have made query change it worked fine.

CREATE INDEX  ix1   ON  `Bucket_Order ` (  isEsOrdTr, `EsOrderTriggerEpoch`  ,  `OrderId`  )
PARTITION BY hash(  `OrderId`  );

CREATE INDEX  ix2   ON  `Bucket_Order ` (  `EsOrderTriggerEpoch`  ,  `OrderId`  )
PARTITION BY hash(  `OrderId`  ) WHERE isEsOrdTr=false;

Use one of the above index.

sorry i forgot to add where clause in my post actulay , i do using where clause in my index

CREATE INDEX Order ON Bucket_Order ( EsOrderTriggerEpoch , OrderId )
PARTITION BY hash( OrderId ) WHERE isEsOrdTr=false;

post the explain plan

The plan looks right and as you are doing 250K items and it taking time. Indexer took 1sec. cc @deepkaran.salooja

@mehul, you can check the following:
a. resident percent of the index(UI stats). If it is less than 100, allocate more memory so the scans don’t need to do disk fetch.
b. index partitions can be scanned in parallel if there are enough cores. with 16 partitions on a box, ideally it needs 16 cores for full concurrency. Also check how many cores have been allocated to the index service(UI->Settings).

Hi Deepkaran,
Do you have any document which shows how the data been store into index in the couchabase.
Now i am curioues since i have multicolumn indexes.
I have also checked resident percent and it 1oo , and for cores set up working with DBA team. Once we set up everything, we will run query and see hows query perfrome.

Hi Deepkaran,
I have checked ui-setting but not able to find cores section assigned to a index node.
I saw Ram assigned to a node , analytics , eventing but not cores , can you please navigate me.

Can, you please also explain
N1QL uses stale=ok for a consistency model . It was observed that this query utilized 24 cores completely to achieve an 80% latency of 5ms against a bucket of 20M documents.

thanks,
Mehul

Please look for “Indexer Threads” under General settings. More details on the documentation page:
https://docs.couchbase.com/server/6.5/manage/manage-settings/general-settings.html

Are the 24 cores used by the index service or query service? How many total cores are on the box?