Count Query With Time Condition - Indexing Problem

Hi,

I am having problem with my count queries. Using v4.5.3. I have index on all the queried fields. There is time condition in my query. If I remove time condition part from the query, it takes milliseconds to respond but with time condition, it takes more than 20 seconds. I checked out many related questions but they don’t have (or I couldn’t find) time condition and that is what slows my query down.

Here are 2 queries and EXPLAIN outputs:

With time condition:

select COUNT(*) as count fromtransaction_mswhere _class='net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction' and customer.code='EVM' and transactiontime > 1567896132879

Output of EXPLAIN query:

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “IntersectScan”,
“scans”: [
{
"#operator": “IndexScan”,
“index”: “ptCustomerCode”,
“index_id”: “4c5b146267e036a4”,
“keyspace”: “transaction_ms”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“EVM”"
],
“Inclusion”: 3,
“Low”: [
"“EVM”"
]
}
}
],
“using”: “gsi”
},
{
"#operator": “IndexScan”,
“index”: “paymentTransactionSecondaryIndex”,
“index_id”: “b66c0eab845e7f77”,
“keyspace”: “transaction_ms”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction”"
],
“Inclusion”: 3,
“Low”: [
"“net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction”"
]
}
}
],
“using”: “gsi”
}
]
},
{
"#operator": “Fetch”,
“keyspace”: “transaction_ms”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((((transaction_ms._class) = “net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction”) and (((transaction_ms.customer).code) = “EVM”)) and (1567896132879 < (transaction_ms.transactiontime)))”
},
{
"#operator": “InitialGroup”,
“aggregates”: [
“count()"
],
“group_keys”: []
}
]
}
},
{
"#operator": “IntermediateGroup”,
“aggregates”: [
"count(
)”
],
“group_keys”: []
},
{
"#operator": “FinalGroup”,
“aggregates”: [
“count()"
],
“group_keys”: []
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “InitialProject”,
“result_terms”: [
{
“as”: “count”,
“expr”: "count(
)”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “select COUNT(*) as count from transaction_ms where _class=‘net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction’ and customer.code=‘EVM’ and transactiontime > 1567896132879”
}
]

Without time condition:

select COUNT(1) as count fromtransaction_mswhere _class='net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction' and customer.code='EVM'

Output of EXPLAIN query:

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexCountScan”,
“covers”: [
“cover (((transaction_ms.customer).code))”,
“cover ((meta(transaction_ms).id))”
],
“index”: “ptCustomerCode”,
“index_id”: “4c5b146267e036a4”,
“keyspace”: “transaction_ms”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“EVM”"
],
“Inclusion”: 3,
“Low”: [
"“EVM”"
]
}
}
],
“using”: “gsi”
},
{
"#operator": “IndexCountProject”,
“result_terms”: [
{
“as”: “count”,
“expr”: “count()"
}
]
}
]
},
“text”: "select COUNT(
) as count from transaction_ms where _class=‘net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction’ and customer.code=‘EVM’”
}
]

My query with time condition took 20.44s and the other took 337.47ms when I tried just before writing here. Both 2 queries have the same result but timings are not.

And also, here is a sample document.

{
“basket”: {
“basketAmount”: 1
},
“bankOrderId”: “----”,
“modifiedAt”: 1502375629754,
“transactionTime”: 1502375628000,
“result”: {},
“createdAt”: 1502375629754,
“merchantId”: “B9D3-A6E20103C109”,
“currency”: “TRY”,
“modifiedBy”: “----”,
“amount”: 11100,
“opc”: false,
“basketItems”: [
{
“unitPrice”: 100,
“count”: 1,
“description”: “desc”,
“basketItemId”: “0001”,
“productCode”: “test1”
}
],
“mailSubscription”: false,
“smsSubscription”: false,
“transactionId”: “12444”,
“transactionType”: “SALE”,
“createdBy”: “----”,
“installment”: 1,
“paymentMethod”: “NONE”,
“header”: {
“operationName”: “Payment”,
“application”: “Dpay”,
“transactionDate”: 1502375628000,
“transactionId”: “----”
},
"_class": “net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction” ,
“customer”: {
“name”: “----”,
“ipAddress”: “----”,
“phoneNumber”: “----”,
“surname”: “----”,
“email”: “----”,
“code”: “EVM”,
}
}

My question is:
I know about IndexScan and IndexCountScan. There is no grouping and no distinct here. I tried with count(*) and count(1) but same result. I even tried triple index with _class, customer.code, transactionTime but no change. Query with time condition doesn’t use IndexCountScan. There is also index on transactionTime field. What am I missing here?

Any suggestions?

Thanks in advance
Best regards

slhddn

Query uses IndexCountScan In the following situations.
         Query needs to be covered by the index
          All the predicates needs to be pushed to indexer as part of the spans
          Predicate pushdown as part of the spans needs to be exact and no false positives
          Projection needs to be one and it needs to be COUNT()
           COUNT arguments needs to be constant or leading Index key and spans should not include NULLs or MISSINGS

The reason above rules are indexer is doing count for query instead of transferring data. If any one of them false query need to eliminate rows and query will not choose the plan that will not result in right results.

First query the transaction_ms.transactiontime is not part of the index so it is not covered, leading key also not true.

Try the following index

CREATE INDEX ix1 ON transaction_ms(transactiontime) WHERE _class="net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction" and customer.code="EVM"

Thanks for your reply.

Index you shared works fine but customer.code can change. I can’t create indexes for every customer. I created an index using the following script:

CREATE INDEX ix1 ON transaction_ms(transactionTime, customer.code) WHERE _class="net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction"

This index speeds up the query but it is still slow. It takes more than 6 seconds. Any suggestion on this?

Thanks

You can try with CB 5.0.0

Couldn’t update yet. After updating, I’ll share the result here.

Hi again,

I tried with 5.0.1. There is no big difference.

5.0.1 took less than 5 seconds with the index above. Any suggestions? I need to speed up this query, it should return in milliseconds. How can I achieve this?

Query:

select COUNT(*) as count from transaction_ms where _class=‘net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction’ and customer.code=‘EVM’ and date_diff_millis(NOW_MILLIS(), transactionTime, ‘day’) > 180;

Explain output:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan2",
          "covers": [
            "cover ((`transaction_ms`.`transactionTime`))",
            "cover (((`transaction_ms`.`customer`).`code`))",
            "cover ((meta(`transaction_ms`).`id`))"
          ],
          "filter_covers": {
            "cover ((`transaction_ms`.`_class`))": "net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction"
          },
          "index": "transactionTimeCustomerCodeCompositeIndex",
          "index_id": "cf48f0fa62beb7c2",
          "index_projection": {
            "entry_keys": [
              0,
              1
            ]
          },
          "keyspace": "transaction_ms",
          "namespace": "default",
          "spans": [
            {
              "range": [
                {
                  "inclusion": 0,
                  "low": "null"
                },
                {
                  "high": "\"EVM\"",
                  "inclusion": 3,
                  "low": "\"EVM\""
                }
              ]
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "(((cover ((`transaction_ms`.`_class`)) = \"net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction\") and (cover (((`transaction_ms`.`customer`).`code`)) = \"EVM\")) and (180 < date_diff_millis(now_millis(), cover ((`transaction_ms`.`transactionTime`)), \"day\")))"
              },
              {
                "#operator": "InitialGroup",
                "aggregates": [
                  "count(*)"
                ],
                "group_keys": []
              }
            ]
          }
        },
        {
          "#operator": "IntermediateGroup",
          "aggregates": [
            "count(*)"
          ],
          "group_keys": []
        },
        {
          "#operator": "FinalGroup",
          "aggregates": [
            "count(*)"
          ],
          "group_keys": []
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "as": "count",
                    "expr": "count(*)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "select COUNT(*) as count from transaction_ms where _class='net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction' and customer.code='EVM' and date_diff_millis(NOW_MILLIS(), transactionTime, 'day') > 180;"
  }
]

Thanks in advance.

Using date_diff_millis decreases query performance.

I modified the query like below:

select COUNT(*) as count from transaction_ms where _class=‘net.infoowl.fraud.microservice.transaction.domain.PaymentTransaction’ and customer.code=‘EVM’ and transactionTime < NOW_MILLIS()-15552000000;

It took about 2 seconds. How to speed up?

Hi again,

After upgrading, we didn’t see very good results but then we added 2 new clusters and increased their resources. We also added new indexes. Now most queries return in milliseconds.

Thanks for your help.