FTS + N1QL query performance is very slow

flaviu · February 11, 2021, 5:01pm

I am trying to run this query and the response is very slow. (more than minutes)

SELECT COUNT(*)
FROM `bucket` a LEFT
    JOIN `bucket` d ON a.app_uuid = d.app_uuid
    AND d.sub_type="app-extra"
WHERE d.sub_type="app"
    AND SEARCH(d,{ "fields": [ "*" ], "highlight": {}, "sort": ["-_score"], "query": { "query": " +redacted.app_release_date:>=1581379200 +redacted.app_release_date:<=1613001600  +country_code:us" }})
    AND (d.sub_type="app-extra"
        AND TO_STR(d.i_genre_id) LIKE "10%")

if query just for the i_genre_id

SELECT COUNT(*)
FROM bucket d
WHERE (d.sub_type=“app-extra”
AND TO_STR(d.i_genre_id) LIKE “10%”)

i am getting the results in 3 ms

if I query just the FTS

SELECT COUNT(*)
FROM `bucket` d
WHERE d.sub_type="app"
    AND SEARCH(d,{ "fields": [ "*" ], "highlight": {}, "sort": ["-_score"], "query": { "query": " +redacted.app_release_date:>=1581379200 +redacted.app_release_date:<=1613001600  +country_code:us" }})

I am getting the results in 500 ms

So, it seems that something strange is happening when I am using them together. This is how the explain look:

Is there any way in which I can optimize this query?

vsr1 · February 11, 2021, 5:04pm

Avoid Intersect scan with USE INDEX. FTS index must fetch the document. cc @abhinav

flaviu · February 11, 2021, 5:08pm

Not sure where to use the use index. can you please give me more details?

abhinav · February 11, 2021, 5:15pm

Here’s how you can force your query to use an FTS index with USE INDEX …

SELECT COUNT(*)
FROM `bucket` d USE INDEX(USING FTS)
WHERE d.sub_type="app"
    AND SEARCH(d,{ "fields": [ "*" ], "highlight": {}, "sort": ["-_score"], "query": { "query": " +redacted.app_release_date:>=1581379200 +redacted.app_release_date:<=1613001600  +country_code:us" }})

Would you share the index definition you’re using for it?
If you’d want this query to run just with an FTS index, you’re FTS index definition needs to have a custom type mapping “app” for type_field “sub_type” with a child field of country_code (text) and a child mapping for redacted with child field app_release_date (datetime) within it.

flaviu · February 11, 2021, 5:23pm

Thanks @abhinav

I added the USE FTS INDEX and there is an improvment on lowering the complexity

the speed remained the same (actually is so slow that I cannot get any result back from the query)

I think the FTS index is correct, as I get ~fast results (500ms)

here is the FTS index definition:

{
  "type": "fulltext-index",
  "name": "apps_search",
  "uuid": "ee69269900de6ce2",
  "sourceType": "couchbase",
  "sourceName": "<redacted>",
  "sourceUUID": "54a8a8a8c2c77f7cd8a8b789fbdb55be",
  "planParams": {
    "maxPartitionsPerPIndex": 11,
    "indexPartitions": 100
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "sub_type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": true,
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "app": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "<redacted>": {
              "dynamic": false,
              "enabled": true,
              "properties": {
                "app_provider_id": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "index": true,
                      "name": "app_provider_id",
                      "type": "number"
                    }
                  ]
                },
                "app_bundle_id": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "index": true,
                      "name": "app_bundle_id",
                      "type": "text"
                    }
                  ]
                },
                "app_name": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "include_in_all": true,
                      "index": true,
                      "name": "app_name",
                      "type": "text"
                    }
                  ]
                },
                "app_primary_genre_id": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "index": true,
                      "name": "app_primary_genre_id",
                      "type": "number"
                    }
                  ]
                },
                "app_provider_name": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "include_in_all": true,
                      "index": true,
                      "name": "app_provider_name",
                      "type": "text"
                    }
                  ]
                },
                "app_release_date": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "include_term_vectors": true,
                      "index": true,
                      "name": "app_release_date",
                      "type": "number"
                    }
                  ]
                },
                "app_seller_name": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "index": true,
                      "name": "app_seller_name",
                      "type": "text"
                    }
                  ]
                },
                "user_rating_avg": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "include_term_vectors": true,
                      "index": true,
                      "name": "user_rating_avg",
                      "type": "number"
                    }
                  ]
                }
              }
            },
            "country_code": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "index": true,
                  "name": "country_code",
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch"
    }
  },
  "sourceParams": {}
}

abhinav · February 11, 2021, 5:32pm

@flaviu Your FTS index definition looks good to me.
Looking at the EXPLAIN plan, the intersect scan is happening because your GSI index is being used to interpret …

TO_STR(d.i_genre_id) LIKE “10%”

If you’d like to eliminate this situation, you should accommodate i_genre_id into your FTS index definition and add a conjunction for it within your search function as …

i_genre_id:"10*"

However, for this your data within i_genre_id needs to be of text type.

flaviu · February 11, 2021, 5:35pm

this i_genre_id is part of a different document type. Can I index multiple document types in the same index?

for changing the data to text, do I need to change it in the document, or can I just define it in the index as text type?

abhinav · February 11, 2021, 5:56pm

Unfortunately, indexing a numeric field as text will not convert it to text automatically for you just yet. We have this feature on our roadmap though.

FTS does support indexing multiple types of documents and you will be able to run FTS search requests on such an index. This is however not supported when using the SEARCH() function from N1QL.

So if you change your documents for i_genre_id to hold text rather than numeric, I’d recommend using FTS queries directly, once you have i_genre_id also indexed within that index (from the different document sub_type). You could use the SDK for this or just do this …

curl -XPOST -H "Content-type:application/json"
http://<username>:<password>@<ip>:8094/api/index/<index_name>/query -d
'{
  "query": {
    "conjuncts": [
      {
        "field": "redacted.app_release_date",
        "min": 1581379200,
        "max": 1613001600,
        "inclusive_min": true,
        "inclusive_max": true
      },
      {
        "field": "country_code",
        "match": "us"
      },
      {
        "field": "i_genre_id",
        "wildcard": "10*"
      }
    ]
  }
}'

flaviu · February 11, 2021, 6:00pm

Thanks for the detailed explanation. let me try to see if I can do this.

Instead of using CURL, do you think this should work using the PHP SDK?

abhinav · February 11, 2021, 6:08pm

Yes, of course you can!

flaviu · February 11, 2021, 9:04pm

Dear @abhinav ,

I have converted the field to string in the documents and running the query using your suggested solution. Now, It seems that I get some results, but the speed is still super slow (187 seconds for 10 results). Also, using your suggested solution, I don’t know how to do the count of the results. I don’t think it makes sense to retrieve 200k results just to count them. Is there any option to get the count of results using just the query?

Now, in this situation, how can we see how the index is used if the query cannot be run in the Couchbase admin interface and just through the API or the CURL command?

LATER EDIT: Something that I just realized is that the results are not correct. Meaning that if I add the { "field": "redacted.app_name", "match": "dark" } to the conjunction I am getting the results which are not containing the word “dark”, so somehow the conjunction doesn’t do “AND” in between the values of the conjunction array. What could be the problem?

this is the query I am running

{
   "query":{
      "conjuncts":[
         {
            "field":"redacted.app_name",
            "match":"dark"
         },
         {
            "field":"country_code",
            "match":"us"
         },
         {
            "field":"redacted.app_primary_genre_id",
            "min":1,
            "max":1,
            "inclusive_min":true,
            "inclusive_max":true
         }
      ]
   }
}

So, definitely, I am doing something wrong, as such a bug could not be in the platform. But I don’t understand what could be the problem…

Another thing I don’t understand is how do I create an index with multiple document types and the relationships between these documents. Like, for example, if one of the document contains App details, and another document additional details about this app, where/how do you let the index know how to join these two documents?

Thank you for your kind answers!

abhinav · February 11, 2021, 10:42pm

I don’t know how to do the count of the results. I don’t think it makes sense to retrieve 200k results just to count them. Is there any option to get the count of results using just the query?

As part of the search response, you will see an entry in the JSON for “hits” which carries an array of the documents matched based on the pagination filers you use and along side it you will find “total_hits” which tells you the total number of documents in the result set which matched the query criteria.

Now, in this situation, how can we see how the index is used if the query cannot be run in the Couchbase admin interface and just through the API or the CURL command?

The FTS UI only allows for you to execute a “string” which is sort of limited, but I think it’d work for your situation. Your query string would look like this …

+redacted.app_name:dark +country_code:us +redacted.app_primary_genre_id:1

You can use the SDKs that couchbase supports to communicate with the search service or via the HTTP requests (curl).

Something that I just realized is that the results are not correct. Meaning that if I add the { "field": "redacted.app_name", "match": "dark" } to the conjunction I am getting the results which are not containing the word “dark”

This is not expected. Would you share the contents of a document that shouldn’t have been returned by your query but does?

how do I create an index with multiple document types and the relationships between these documents. Like, for example, if one of the document contains App details, and another document additional details about this app, where/how do you let the index know how to join these two documents?

Firstly, you should simply be able to define 2 type mappings to index documents of the 2 types. You will then be adding fields in the 2nd type mapping like you did for the first. Now if any of these field names are identical to those in the other mapping, you can change their “searchable as” name.

For example, lets say you have fields “x” under type mapping “type1” and fields “y” under type mapping “type 2”, here’s you get a join over x and a …

{
  "query": {
    "disjuncts": [
      {
        "field": "x",
        "match": ".."
      },
      {
        "field": "y",
        "match": "..."
      }
    ]
  }
}

, and here’s how you get a intersect over x and a …

{
  "query": {
    "conjuncts": [
      {
        "field": "x",
        "match": ".."
      },
      {
        "field": "y",
        "match": "..."
      }
    ]
  }
}

flaviu · February 12, 2021, 10:04am

I want to mention that there are results (in general the first result) who is a correct match and the match score is above 1

here is an example, this document is bigger, but I redacted some parts of it:

{
“country_code”: “is”,
“sub_type”: “app”,
“redacted”: {
“app_name”: “TRUE Field”,
“app_primary_genre_id”: 22,
}
“key”: “meta:bc9f7bc7-ad0a-4f91-a213-357872d34d8c:is”,
“created_timestamp”: 1602820362,
“type”: “tracker-app”,
“version”: 2,
“updated_timestamp”: 1611807733
}

As you can see none of the fields in the conjunction is correct:

{"query":  "conjuncts": 
    [ 
        { "field": "redacted.app_name", "match": "dark" },
        { "field": "country_code", "match": "us" },
        { "field": "redacted.app_primary_genre_id",  "min": 1, "max": 1, "inclusive_min": true, "inclusive_max": true}
     ] }

In this case, I suspect I am doing something wrong with the PHP SDK, as the documentation is almost nonexistent… so I am just guessing how to use the FTS search from PHP SDK

abhinav · February 12, 2021, 5:13pm

You’re right the query should not have matched that document.
You can quickly verify by running an HTTP request (via curl) to the server of the query like I mentioned earlier already.

Now looking at the document, I recommend using the “keyword” analyzer for fields “country_code”, so you index the content as is. (Here’s a blog on the analyzers we support - Text Analysis within a Full-Text Search Engine - The Couchbase Blog).

I’m certain you’re looking at the documentation here for the PHP sdk … Search | Couchbase Docs. I’ll need you to share your code to be able to assist you with how you’re using the SDK code for FTS.

Also, roping in @daschl here to guide you with your SDK code, once you’ve shared it.

Topic		Replies	Views
Count/Aggregation Full Text Search	36	2399	January 15, 2020
Evaluating Full Text Search Full Text Search fts	26	2826	February 24, 2021
How to configure the FTS scoring? Full Text Search	22	1509	February 16, 2021
FTS with N1QL or API Full Text Search	16	3621	September 22, 2016
Query for search including join on different bucket performing very slow SQL++ query , n1ql , couchbase-cli	9	899	July 15, 2022

FTS + N1QL query performance is very slow

Related topics