Query performance issue while parameterizing

Hello CB Team,

I have a fully covered N1QL index. When I execute the query without parameterizing input values, it works great and provides results within 100ms. However, when I parameterize a particular input value, it gives me trouble, and the query latency jumps to 20-25 seconds (“elapsedTimeMs”:23854,“executionTimeMs”:23853). Note in both scenarios I receive the rows.

When I compared the query plans, there isn’t much difference, and there is no fetch operator.

Index:

CREATE INDEX `by_type_location_size_sg_idx`
       ON `sync_gateway_sw1`(
            `type`,
            IFMISSINGORNULL(`locationRoot`, `locationId`, `location`, "\uefff"),
            SUBSTR(META().`id`, 0),
            IFMISSINGORNULL(`subType`, `changeType`),
            ENCODED_SIZE(OBJECT_REMOVE(self, "_sync")),
            ENCODED_SIZE(self)
          )
    WHERE (`type` IS VALUED)
      AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
      AND (NOT IFMISSINGORNULL(`_deleted`, FALSE))
      AND (META().`id` NOT LIKE "_sync:%")

Query:

      SELECT  meta(sg).`id`
        FROM `sync_gateway_sw1` AS sg
      WHERE (sg.`type` IN ["location"])
         AND (((sg.`type` = "") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") = "") AND (SUBSTR(META(sg).`id`,0) > ""))
          OR  ((sg.`type` = "") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") > ""))
          OR   (sg.`type` > $lastType))
         AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE))
         AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE))
         AND (META(sg).`id` NOT LIKE "_sync:%")
       ORDER BY sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff"), SUBSTR(META(sg).`id`,0)
       LIMIT 10;
        named: {
          '$lastType' => ""
        }

NOTE: In the above query I removed most of the parameters and just wanted to keep the one that is affecting the performance.

ISSUE:
Here it looks like ‘$lastType’ is the one that gives me trouble. As soon as I replace the parameter with the value (“”) directly in the query, it works perfectly. You can see that ‘sg.type’ is the first leading key in the index. Here, I’m performing key-set pagination, seeking directly to the index row where I stopped in the last iteration.

It would be great if someone could explain to me what is happening here and provide a solution. Please let me know if you need any further information.

Thanks,
Vishnu

CREATE INDEX `by_type_location_size_sg_idx` ON `sync_gateway_sw1`(
            `type`,
            IFMISSINGORNULL(`locationRoot`, `locationId`, `location`, "\uefff"),
            SUBSTR(META().`id`, 0),
            IFMISSINGORNULL(`subType`, `changeType`),
            ENCODED_SIZE(OBJECT_REMOVE(self, "_sync")),
            ENCODED_SIZE(self)
          ) WHERE IFMISSINGORNULL(`softDelete`, FALSE) = FALSE
                  AND IFMISSINGORNULL(`_deleted`, FALSE) = FALSE
                  AND META().`id` NOT LIKE "_sync:%";

$lastType = ""
$loc = ""
$id  = ""

SELECT t.*
FROM ( SELECT  meta().id, type, IFMISSINGORNULL(`locationRoot`, `locationId`, `location`, "\uefff") AS loc
       FROM `sync_gateway_sw1` AS sg
       WHERE IFMISSINGORNULL(`softDelete`, FALSE) = FALSE
             AND IFMISSINGORNULL(`_deleted`, FALSE) = FALSE
             AND META().`id` NOT LIKE "_sync:%"
             AND type >= $lastType
       ORDER BY type, loc, SUBSTR(META().`id`,0)
) AS t
WHERE t.type  != $lastType
      OR t.loc != $loc
      OR t.id > $id
LIMIT 10;

Hi Vrs1,

I tried the new index you provided, but the query you gave me won’t work for my scenario because I’m executing key-set pagination on top of over 500 million indexed rows. Essentially, the index contains all documents with multiple types, which is why the type field is my first leading key. The INNER query in your SUBQUERY will keep returning the same result to me repeatedly in every iteration.

I have a scenario where I need to iterate through all documents of specific types. For instance: types = [“location”, “user”]

Since this index is sorted by the fields below,

  1. type (1st leading key)
  2. locationId (2nd leading key)
  3. meta().id (3rd leading key)

I can structure my query to run the iteration. In each iteration, I can start from where I left off in the previous one, thereby avoiding over-fetching rows (that I already fetched before) and query will be very efficient.

Say my requirement for this example is FETCH ONLY types = [“location”, “user”] WITH LIMIT 100 (i.e. I need 100 rows in every iterations of types locations or users)

#1st ITERATION: In the first iteration since there are no previous/last values - $lastType=“”, $lastLocation=“”, $lastKey=“” and $documentTypes=[“location”, “user”]

      SELECT  sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") AS `location`, meta(sg).`id`
        FROM `sync_gateway_sw1` AS sg
      WHERE (sg.`type` IN $documentTypes)
         AND (((sg.`type` = $lastType) AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") = $lastLocation) AND (SUBSTR(META(sg).`id`,0) > $lastKey))
          OR  ((sg.`type` = $lastType) AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") > $lastLocation))
          OR   (sg.`type` > $lastType))
         AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE))
         AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE))
         AND (META(sg).`id` NOT LIKE "_sync:%")
       ORDER BY sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff"), SUBSTR(META(sg).`id`,0)
       LIMIT 100;

^^^ This query gives me first 100 rows in sorted order of type, locationId and meta().id

  • Imagine this is the last row(or 100th row) I received => {“type”:“location”, “location”:“a3dbe271-03c1-4cb7-a638-b2a3cb80cf47”, “id”:“bc656abc-5c3e-4140-a7ac-e8bc7728b095”}. I can use these values to pinpoint the next 100 rows where I left off in the previous query

#2nd ITERATION: Therefore in the second iteration $lastType=“location”, $lastLocation=“a3dbe271-03c1-4cb7-a638-b2a3cb80cf47”, $lastKey=“bc656abc-5c3e-4140-a7ac-e8bc7728b095” and $documentTypes=[“location”, “user”]

      SELECT  sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") AS `location`, meta(sg).`id`
        FROM `sync_gateway_sw1` AS sg
      WHERE (sg.`type` IN $documentTypes)
         AND (((sg.`type` = $lastType) AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") = $lastLocation) AND (SUBSTR(META(sg).`id`,0) > $lastKey))
          OR  ((sg.`type` = $lastType) AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") > $lastLocation))
          OR   (sg.`type` > $lastType))
         AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE))
         AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE))
         AND (META(sg).`id` NOT LIKE "_sync:%")
       ORDER BY sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff"), SUBSTR(META(sg).`id`,0)
       LIMIT 100;

^^^ This query gives me next 100 rows and no delay here from 2nd iteration onwards.

I hope you understood the intension of using such queries. My issue is not the query or the index, but the delay of 25 seconds it produces when I introduce parameterization in my code for this specific WHERE condition OR (sg.type > $lastType)). In as soon I replace the $lastType param with a value like OR (sg.type > “”)) it works.
NOTE: This issue occurs only in my first iteration; from the second iteration onwards, everything is fine.

I believe this issue could be related to a bug in your parameterization module. Another possibility is that in the first iteration, when $lastType is an empty string (“”), the query might be interpreting (sg.type > $lastType)) as a potential N1QL injection attack, intentionally causing delays in the response. If you need, I can provide logs from our Couchbase server to compare parameterized and non-parameterized queries.

Thanks,
Vishnu

index items sorted in first key, then second key,…

If you have multiple range keys keyset pagination will not work.
type “a” there 1000
type “b” there 3000
WHERE type > “” AND META().id > “xx”
LIMIT 10
you should able to start where you left of i,e “a” can break in the middle. So it will not work.

Hi Vsr1,

The key-set pagination works with multiple range keys. In your example imagine you have two types of documents => [“a”, “b”]. And there are 1000 type “a” documents and 3000 type “b” documents that are indexed.

  • 1st leading key = type
  • 2nd leading key = meta().id

If you have a query that looks like below:

1st ITERATION:

 SELECT  `type`, meta().`id`
    FROM `sync_gateway_sw1` AS sg
 WHERE (`type` IN ["a", "b"])
   AND ( ((sg.`type` = "")AND(meta().`id` > ""))  OR   (sg.`type` > ""))
 LIMIT 100

^^^ Here since the index begins with document “a” and “a” is larger than empty string (“”), this condition → (sg.type > “”) gets satisfied & returns you the first 100 rows of “a”.

  • ^^^ Imagine the 100th (last) row in 1st iteration returned you id=“83838” and type=“a”

2nd ITERATION: lastType=“a”, lastKey=“83838”

 SELECT  `type`, meta().`id`
       FROM `sync_gateway_sw1` AS sg
 WHERE (`type` IN ["a", "b"])
   AND ( ((sg.`type` = "a")AND(meta().`id` > "83838"))  OR   (sg.`type` > "a"))
LIMIT 100

^^^ This time this condition → ((sg.type = “a”)AND(meta().id > “83838”)) gets satisfied and return you next set of “a” having the IDs greater than “83838”, so that you will receive “a” records starting from the point where you left off in the 1st iteration. This process continuous until all documents of “a” and “b” are returned.

Thanks,
Vishnu

If works great. Extend that with your 3 keys.
Now check if that optimal or not By looking at profile various values. How many documents IndexScan produces how many filters thrown away.

Prepare statement (adhoc=false) WITH Query parameters it makes complex you can supply any values. If you need optimal one use adhoc=True with query parameters

Hi Vsr1,

Yes as I mentioned before at this point I’m not worried about index, query or index scan. I also don’t think saving the query plan on client SDK side(using prepare statement) will solve this issue.

My concern is ‘PARAMETERIZATION.’ Why does query performance degrade when I parameterize it? Query latency jumps from less than 100ms to 25 seconds. Parameterization is simply a way of passing values via parameters; why should it degrade performance? Does parameterization go through any extra checks? Does it intentionally cause delays in the response if it detects any potential N1QL injection attack?

adhoc=true effectively un-parameterizes the query. i.e. since the query plan is not going to be cached, the query is re-written with the parameters substituted with the values provided before the query plan is generated.

“With Couchbase Server 6.5 and newer, if a statement has placeholders, and a placeholder is supplied, the Query Service will generate specially optimized plans. Therefore, if you are supplying the placeholder each time, adhoc = true will actually return a better-optimized plan (at the price of generating a fresh plan for each query).”

My understanding is that if the actual value is available (versus a placholder to be provided at execution) a more efficient query plan can be generated. An obvious example would be LIKE ?. When parameterized, an index scan would be required. When not parameterized, and if the value was ‘Paris’ (i.e no wild cards) then the query plan could have an index lookup versus an index scan.

For your example, with $lastType=“”, the predicate type >= $lastType is always true (everything is >= the empty string).

Hi Michael,

I tried adhoc=true. I dont see any difference. I still see the 25 seconds delay in response from CB server.

  • I’m using Ruby SDK ( ‘couchbase’, ‘3.4.0’) and Couchbase server(
    Enterprise Edition 6.6.3 build 9808)
  • I’m using named_parameters not positional_parameters

So I set adhoc=true and just to make sure I printed the options as well:

options = Couchbase::Cluster::QueryOptions.new
options.adhoc = true

And I can see → “adhoc”:true,
options={“timeout”:null,“retry_strategy”:null,“client_context”:null,“parent_span”:null,“adhoc”:true,“client_context_id”:null,“max_parallelism”:null,“readonly”:false,“scan_wait”:null,“scan_cap”:null,“pipeline_cap”:null,“pipeline_batch”:null,“metrics”:true,“profile”:“off”,“flex_index”:null,“preserve_expiry”:null,“scope_qualifier”:null,“scan_consistency”:“not_bounded”,“mutation_state”:null,“transcoder”:{},“positional_parameters”:null,“named_parameters”:{“$lastType”:“”},“raw_parameters”:{}}

I took an additional step forward by capturing the TCPdump on query port 8093.

POST /query/service HTTP/1.1
host: couchbase1.spdev.sightplan-ops.net:8093
authorization: Basic XXXXX== <redacted>
client-context-id: 4848cf-13f4-1e40-33cc-4c024c38f712ab
connection: keep-alive
content-length: 792
content-type: application/json
user-agent: cxx/1.0.0/ec53d75;Linux/x86_64; client/e96632-c201-954d-b5c6-6cc5d76c22a332; session/117666-76b4-8340-3533-a041acb5c3711b; Linux-5.4.0-155-generic; ruby_sdk/85dacb90;ssl/1010114f;ruby_abi/3.0.0

{"$lastType":"","client_context_id":"4848cf-13f4-1e40-33cc-4c024c38f712ab","scan_consistency":"not_bounded","statement":"SELECT meta(sg).`id` FROM `sync_gateway_sw1` AS sg WHERE (sg.`type` IN [\"location\"]) AND (((sg.`type` = \"\") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"...\") = \"\") AND (SUBSTR(META(sg).`id`,0) > \"\")) OR ((sg.`type` = \"\") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"...\") > \"\")) OR (sg.`type` > $lastType)) AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE)) AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE)) AND (META(sg).`id` NOT LIKE \"_sync:%\") ORDER BY sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"...\"), SUBSTR(META(sg).`id`,0) LIMIT 10;","timeout":"74500ms"}

^^^ You mentioned “adhoc=true effectively un-parameterizes the query” - But I still parameters. $lastType is the one parameter here in the query request (check HTTP request). Did you mean when I pass adhoc=true the client SDK will unparameterize the query before it send the request to the server?

Here is the comparison of PARAMETERIZED EXPLAIN (25 seconds delay) v/s UN-PARAMETERIZED EXPLAIN (<100milliseconds delay):

PARAMETERIZED EXPLAIN (has 25 seconds delay) :

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "as": "sg",
            "covers": [
              "cover ((`sg`.`type`))",
              "cover (ifmissingornull((`sg`.`locationRoot`), (`sg`.`locationId`), (`sg`.`location`), \"\"))",
              "cover (substr0((meta(`sg`).`id`), 0))",
              "cover (ifmissingornull((`sg`.`subType`), (`sg`.`changeType`)))",
              "cover (encoded_size(object_remove(`sg`, \"_sync\")))",
              "cover (encoded_size(`sg`))",
              "cover ((meta(`sg`).`id`))"
            ],
            "filter_covers": {
              "cover (((`sg`.`type`) is valued))": true,
              "cover ((not ((meta(`sg`).`id`) like \"_sync:%\")))": true,
              "cover ((not ifmissingornull((`sg`.`_deleted`), false)))": true,
              "cover ((not ifmissingornull((`sg`.`softDelete`), false)))": true
            },
            "index": "by_type_location_size_sg_idx_v2",
            "index_id": "f6ee0e217e16f1a5",
            "index_order": [
              {
                "keypos": 0
              },
              {
                "keypos": 1
              },
              {
                "keypos": 2
              }
            ],
            "index_projection": {
              "entry_keys": [
                0,
                1,
                2
              ],
              "primary_key": true
            },
            "keyspace": "sync_gateway_sw1",
            "namespace": "default",
            "spans": [
              {
                "range": [
                  {
                    "high": "\"location\"",
                    "inclusion": 2,
                    "low": "$lastType"
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Parallel",
            "maxParallelism": 1,
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "(((((cover ((`sg`.`type`)) in [\"location\"]) and (((((cover ((`sg`.`type`)) = \"\") and (cover (ifmissingornull((`sg`.`locationRoot`), (`sg`.`locationId`), (`sg`.`location`), \"\")) = \"\")) and (\"\" < cover (substr0((meta(`sg`).`id`), 0)))) or ((cover ((`sg`.`type`)) = \"\") and (\"\" < cover (ifmissingornull((`sg`.`locationRoot`), (`sg`.`locationId`), (`sg`.`location`), \"\"))))) or ($lastType < cover ((`sg`.`type`))))) and cover ((not ifmissingornull((`sg`.`softDelete`), false)))) and cover ((not ifmissingornull((`sg`.`_deleted`), false)))) and cover ((not ((meta(`sg`).`id`) like \"_sync:%\"))))"
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "cover ((meta(`sg`).`id`))"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Limit",
        "expr": "10"
      }
    ]
  },
  "text": "SELECT meta(sg).`id` FROM `sync_gateway_sw1` AS sg WHERE (sg.`type` IN [\"location\"]) AND (((sg.`type` = \"\") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"\") = \"\") AND (SUBSTR(META(sg).`id`,0) > \"\")) OR ((sg.`type` = \"\") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"\") > \"\")) OR (sg.`type` > $lastType)) AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE)) AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE)) AND (META(sg).`id` NOT LIKE \"_sync:%\") ORDER BY sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"\"), SUBSTR(META(sg).`id`,0) LIMIT 10;"
}

UNPARAMETERIZED PLAN (< 100 milliseconds delay):

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "as": "sg",
            "covers": [
              "cover ((`sg`.`type`))",
              "cover (ifmissingornull((`sg`.`locationRoot`), (`sg`.`locationId`), (`sg`.`location`), \"\"))",
              "cover (substr0((meta(`sg`).`id`), 0))",
              "cover (ifmissingornull((`sg`.`subType`), (`sg`.`changeType`)))",
              "cover (encoded_size(object_remove(`sg`, \"_sync\")))",
              "cover (encoded_size(`sg`))",
              "cover ((meta(`sg`).`id`))"
            ],
            "filter_covers": {
              "cover (((`sg`.`type`) is valued))": true,
              "cover ((not ((meta(`sg`).`id`) like \"_sync:%\")))": true,
              "cover ((not ifmissingornull((`sg`.`_deleted`), false)))": true,
              "cover ((not ifmissingornull((`sg`.`softDelete`), false)))": true
            },
            "index": "by_type_location_size_sg_idx_v2",
            "index_id": "f6ee0e217e16f1a5",
            "index_order": [
              {
                "keypos": 0
              },
              {
                "keypos": 1
              },
              {
                "keypos": 2
              }
            ],
            "index_projection": {
              "entry_keys": [
                0,
                1,
                2
              ],
              "primary_key": true
            },
            "keyspace": "sync_gateway_sw1",
            "limit": "10",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"location\"",
                    "inclusion": 3,
                    "low": "\"location\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Parallel",
            "maxParallelism": 1,
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "(((((cover ((`sg`.`type`)) in [\"location\"]) and (((((cover ((`sg`.`type`)) = \"\") and (cover (ifmissingornull((`sg`.`locationRoot`), (`sg`.`locationId`), (`sg`.`location`), \"\")) = \"\")) and (\"\" < cover (substr0((meta(`sg`).`id`), 0)))) or ((cover ((`sg`.`type`)) = \"\") and (\"\" < cover (ifmissingornull((`sg`.`locationRoot`), (`sg`.`locationId`), (`sg`.`location`), \"\"))))) or (\"\" < cover ((`sg`.`type`))))) and cover ((not ifmissingornull((`sg`.`softDelete`), false)))) and cover ((not ifmissingornull((`sg`.`_deleted`), false)))) and cover ((not ((meta(`sg`).`id`) like \"_sync:%\"))))"
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "cover ((meta(`sg`).`id`))"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Limit",
        "expr": "10"
      }
    ]
  },
  "text": "SELECT meta(sg).`id` FROM `sync_gateway_sw1` AS sg WHERE (sg.`type` IN [\"location\"]) AND (((sg.`type` = \"\") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"\") = \"\") AND (SUBSTR(META(sg).`id`,0) > \"\")) OR ((sg.`type` = \"\") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"\") > \"\")) OR (sg.`type` > \"\")) AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE)) AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE)) AND (META(sg).`id` NOT LIKE \"_sync:%\") ORDER BY sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, \"\"), SUBSTR(META(sg).`id`,0) LIMIT 10;"
}

Thanks,
Vishnu

It would need to be happening on the server. The SDK just sends the statement as-is, with adhoc=true.
I don’t exactly know what adhoc=true does, but from the documentation, that it “generates an optimized query plan when the parameter is provided” - that’s really the only thing it can be doing.

btw - what’s the purpose of SUBSTR(META().id,0) ?

I dont see adhoc=true in the HTTP request to the server.

What I understood about PREPARE STATEMENT is ( Class: Couchbase::Cluster::QueryOptions — Documentation by YARD 0.9.25) it allows to save query plan either in SDK (client side) or on Couchbase(server side). So if it receives the same query from the same client, it doesn’t need to generate it for every iteration. But what I see is its taking only milliseconds to generate the plan.

Re - “btw - what’s the purpose of SUBSTR(META().id ,0) ?”

  • I think it should return the same (full) string.

If I diff your query plans I notice that the parameterized has a spans of $lastType to “location”, versus the unparameterized has a spans of “location” to “location” which looks to be an exact (match?). Which could explain the difference. Maybe more importantly is that the unparametrized plan has “limit 10” there. A query plan engineer would have better information than me.

            "namespace": "default",
            "spans": [
              {   
                "range": [
                  {   
                    "high": "\"location\"",
                    "inclusion": 2,
                    "low": "$lastType"
                  } 

— vs —

           "limit": "10",
           "namespace": "default",
            "spans": [
              {   
                "exact": true,
                "range": [
                  {   
                    "high": "\"location\"",
                    "inclusion": 3,
                    "low": "\"location\""
                  } 

Thats right. I noticed that difference too. It will be great if you can forward the EXPLAINs to your Query plan team. The difference in performance is 100ms vs 25s which is huge. I know Databases perform SQL injection check when query is parameterized (I hope it’s true for Couchbase as well?), thats the reason why I suspected if Couchbase server is intentionally inserting a delay of 25seconds in the response if it detects any potential N1QL injection attack. May be a way of protecting itself from an attack.

I’m 99.99% sure there is no delay inserted. Also - there’s no way to discern between SQL injection and application-generated SQL statement. The defense against that is for applications to generate the SQL, and only allow user-supplied data into parameters. So this leaves you in a pickle, as the parameterized statement gives poor performance, while the non-parameterized, SQL-Injection-vulnerable statement gives good performance.

I’m left wondering if this difference exists for any value of $lastType or only “”. If it is for any value of $lastType - then you have a problem. If it exists only for “”, then you could just make two queries. Also - why is “limit 10” apparently not pushed in the parameterized and pushed in the non-parameterized.

Ok that makes sense.

This is happening only when $lastType=“” (empty string) - which is my very 1st iteration of KeySet pagination. From the 2nd iteration onwards it works fine using the same query but with different $lastType=“validString”.

Re - "If it exists only for “”, then you could just make two queries. "

  • That’s the work around that I could implement. But I was curious to know why it’s not working for the empty string.

Re - Also - why is “limit 10” apparently not pushed in the parameterized and pushed in the non-parameterized.

  • In my both queries (parameterized and unparameterized) I have LIMIT 10 (as plain value in the query). You can see that in PARAMETERIZED EXPLAIN statement that I provided in “text:” field. So I have no idea why it’s no appearing along with keyspace & namespace. I think your Query plan engineer should have an explanation for that.
            "keyspace": "sync_gateway_sw1",
            "limit": "10",
            "namespace": "default"

What server version do you have? If you don’t have the latest, then upgrading would be worth a try. When I generate the plans against a development version of. enterprise server, they come out indentical except for the $lastType replaced with “”.

Re - "What server version do you have? "

  • I’m using Couchbase server(Enterprise Edition 6.6.3 build 9808) and Ruby SDK ( ‘couchbase’, ‘3.4.0’)

Re - then upgrading would be worth a try.

  • I tested the same with version 7.x (my colleague has the latest one), it has the same delay issue as well.

Re - When I generate the plans against a development version of. enterprise server, they come out indentical except for the $lastType replaced with “”

  • Did you mean you installed my index (I provided in my first POST here in this thread) and tried EXPLAIN on my query?

Yes.

  • I’m using Couchbase server(Enterprise Edition 6.6.3 build 9808) and Ruby SDK ( ‘couchbase’, ‘3.4.0’)

Have you tried executing your queries in the webui? Set the parameters under ‘Settings’. Just to check if the delay is in the Ruby SDK (I don’t have experience with the Ruby SDK).

Yes you can see elapsed and execution time 24.5s

1 Like