How to configure the FTS scoring?

I have a quick question regarding the FTS scoring.

For example, I have this query:

SELECT
search_score() as score,
a_app_name AS app_name
FROM search AS a
WHERE sub_type=“me”
AND SEARCH(a,{ “fields”: [ “*” ], “highlight”: {}, “sort”: [“-_score”], “query”: { “query”: ‘dark’ }, “size”: 10, “from”: 0 })
ORDER BY score DESC

and then I get a list of results:

[
    {
        "app_name": "bla bla  dark - bla dark bla bla",
        "score": 1.5706989807863172
    },
    {
        "app_name": "bla bla dark - dark bla",
        "score": 1.5326266605096939
    },
    {
        "app_name": "bla bla dark,
        "score": 1.5178101798989465
    },
    {
        "app_name": "dark",
        "score": 1.1970086199227965
    }
]

Is there any way in which I could configure the results algorithm a little bit? For example I would normally think that a result which is an exact match, (as the last one from this list) to have a higher score, but in this case this one has a lower score. Would it be possible to change anything in the way the scoring is done?

Yes @flaviu ,

Your expectation is correct and the “app_name”: “dark” would get the highest score.
Normally this can happen if there is less data density (low amount of data) with multiple index partitions.

Remember the query time scoring happens at each partition level and then later gathered and this could result in such discrepancies in the scoring with smaller data sets. With huge volumes of data, this would get normalized easily.

You could verify this by trying the same query on a single partitioned index. You can override the index partition count on the index definition edit page.

Cheers!

thanks @sreeks , I will try to limit the number of partitions. I have ~200k documents

@sreeks I just tried with one single partition, and I get exactly the same results. How can I investigate further?

Oops… I missed another point there.

You have to do a field-scoped search and in your case, the problem mostly is that as you already have ~200K plus docs.
Without the field name here, it is getting searched in the default field (_all) and hence could get noise there.

refer: Query String Queries | Couchbase Docs

example:

"query": {
    "query": "app_name:dark"
  }

The problem I have is that when I do this I don’t get any results:

SELECT *
FROM `search` AS a
WHERE sub_type="me"
    AND SEARCH(a,{"query": { "query": "app_name:dark" }})

This is the index:

{
  "type": "fulltext-index",
  "name": "search_reduced",
  "uuid": "35f39c227af2034a",
  "sourceType": "couchbase",
  "sourceName": "search",
  "sourceUUID": "0784c2c848bd1dcf8d8c5148fe6a2bdd",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "sub_type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": true,
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "me": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "a_app_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "index": true,
                  "name": "app_id",
                  "type": "number"
                }
              ]
            },
            "a_app_bundle_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "index": true,
                  "name": "app_bundle_id",
                  "type": "text"
                }
              ]
            },
            "a_app_name": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "app_name",
                  "type": "text"
                }
              ]
            },
            "a_app_provider_name": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "app_provider_name",
                  "type": "text"
                }
              ]
            },
            "a_app_release_date": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "app_release_date",
                  "type": "number"
                }
              ]
            },
            "a_app_seller_name": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "app_seller_name",
                  "type": "text"
                }
              ]
            },
            "a_hp": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "hp",
                  "type": "number"
                }
              ]
            },
            "app_uuid": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "app_uuid",
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch"
    }
  },
  "sourceParams": {}
}

I have also tried with the app_name vs a_app_name (see the a_ at the beginning of the field name

    SELECT *
    FROM `search` AS a
    WHERE sub_type="me"
        AND SEARCH(a,{"query": { "query": "a_app_name:dark" }})

If I search directly in the search bar of the index, I get results, as you can see in the below image

What could be the problem?

The query looks right. Since you’ve indexed “a_app_name” as “app_name”, the field you need to use in the query is “app_name”.

What version of couchbase-server are you using?
I’ve just tested this on 6.6 and things work as expected.

I am on 6.6 enterprises (paid licence)

Ok what’s the exact output you see when you run your N1QL query?
Could you also share the EXPLAIN “JSON” output for your query.

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IntersectScan",
          "scans": [
            {
              "#operator": "IndexScan3",
              "as": "a",
              "index": "s_me_rd_asc",
              "index_id": "d8c6056c8acd585c",
              "index_projection": {
                "primary_key": true
              },
              "keyspace": "search",
              "namespace": "default",
              "spans": [
                {
                  "exact": true,
                  "range": [
                    {
                      "high": "\"me\"",
                      "inclusion": 3,
                      "low": "\"mer\""
                    }
                  ]
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "IndexFtsSearch",
              "as": "a",
              "index": "search_reduced",
              "index_id": "35f39c227af2034a",
              "keyspace": "search",
              "namespace": "default",
              "search_info": {
                "field": "\"\"",
                "outname": "out",
                "query": "{\"query\": {\"query\": \"app_name:dark\"}}"
              },
              "using": "fts"
            }
          ]
        },
        {
          "#operator": "Fetch",
          "as": "a",
          "keyspace": "search",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "(((`a`.`sub_type`) = \"me\") and search(`a`, {\"query\": {\"query\": \"app_name:dark\"}}))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT *\nFROM `search` AS a\nWHERE sub_type=\"me\"\n    AND SEARCH(a,{\"query\": { \"query\": \"app_name:dark\" }})"
  }
]

Your query is doing an intersect scan across GSI and FTS and not just going to FTS.
It is necessary for the expression sub_type = me" to be handed alongside the SEARCH expression to FTS by N1QL.

Would you try this query for me …

SELECT *
FROM `search` AS a USE INDEX(USING FTS)
WHERE sub_type="me"
    AND SEARCH(a,{"query": { "query": "app_name:dark" }})

yes, I saw that, and tried also with the use index and still the same resul, here is the explain:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexFtsSearch",
          "as": "a",
          "index": "search_reduced",
          "index_id": "35f39c227af2034a",
          "keyspace": "search",
          "namespace": "default",
          "search_info": {
            "field": "\"\"",
            "outname": "out",
            "query": "{\"query\": {\"query\": \"app_name:dark\"}}"
          },
          "using": "fts"
        },
        {
          "#operator": "Fetch",
          "as": "a",
          "keyspace": "search",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "(((`a`.`sub_type`) = \"me\") and search(`a`, {\"query\": {\"query\": \"app_name:dark\"}}))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT *\nFROM `search` AS a USE INDEX(USING FTS)\nWHERE sub_type=\"me\"\n    AND SEARCH(a,{\"query\": { \"query\": \"app_name:dark\" }})"
  }
]

Hmm, I just tried to change the name in the search index from app_name to a_app_name and now it seems to be working. So, this may be a strange bug?!

Still something is not right, as the execution time is very long

Here is the query explain:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexFtsSearch",
          "as": "a",
          "index": "search_reduced",
          "index_id": "7bda6de8faa7b30b",
          "keyspace": "search",
          "namespace": "default",
          "search_info": {
            "field": "\"\"",
            "outname": "out",
            "query": "{\"query\": {\"query\": \"a_app_provider_name:dark\"}}"
          },
          "using": "fts"
        },
        {
          "#operator": "Fetch",
          "as": "a",
          "keyspace": "search",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "(((`a`.`sub_type`) = \"me\") and search(`a`, {\"query\": {\"query\": \"a_app_provider_name:dark\"}}))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(`a`.`a_app_bundle_id`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT a_app_bundle_id\nFROM `search` AS a USE INDEX(USING FTS)\nWHERE sub_type=\me\"\n    AND SEARCH(a,{\"query\": { \"query\": \"a_app_provider_name:dark\" }})"
  }
]

if I search in the search console I get this:

Do you think in this situation would be better to run the query like this?

iI get the same result in 12.6 ms compared with 1.3 seconds. My only concern is how do I take care of the order of the correct results.

Is there a better way to run this?

Ah you’ve identified bug.

With your original index definition, if you try this query, things should work …

SELECT meta().id
FROM `search` AS a USE INDEX(USING FTS)
WHERE sub_type="me"
    AND SEARCH(a,{"query": { "query": "app_name:dark" }})

SELECT * however is failing to fetch your the results, because the results go through a “verification” phase where the query is unaware that you’ve mapped a field to a different name.

Here’s the query you’ll need to run for this to get results as expected …

SELECT *
FROM `search` AS a USE INDEX(USING FTS)
WHERE sub_type="me"
    AND SEARCH(a,{"query": { "query": "app_name:dark" }}, {"index": "search_reduced"})

However, there’s a bug in the N1QL’s verification path when you change the name of your fields within the search index. I’ll put a fix in for this soon.

Ref: https://issues.couchbase.com/browse/MB-44356

iI get the same result in 12.6 ms compared with 1.3 seconds. My only concern is how do I take care of the order of the correct results.

The N1QL query and the FTS query are NOT a fair comparison. With N1QL you’re fetching some document content that isn’t indexed, while the straight FTS query you’re just obtained document IDs, so there’s an extra step you’re doing within N1QL. Here’s how I recommend you change things to get comparable results …

SELECT search_meta()
FROM `search` AS a USE INDEX(USING FTS)
WHERE sub_type="me"
    AND SEARCH(a,{"query": { "query": "app_name:dark" }, "fields":["a_app_bundle_id"]})

For this, you’ll need to checkbox “store” for the a_app_bundle_id field within your FTS index definition.

if I run your recomaned query I get 0 results:

but if I change from * to meta().id I get the results in 6ms?

SELECT META().id
FROM `search` AS a USE INDEX(USING FTS)
WHERE sub_type="me"
    AND SEARCH(a,{"query": { "query": "a_app_name:dark" }}, {"index": "search_reduced"})

is this an expected result?

There’s a bug in the “verification” path like I mentioned already in my comment here - How to configure the FTS scoring? - #15 by abhinav

I’ll be fixing it shortly.

https://issues.couchbase.com/browse/MB-44356

I thoutht they are not related, as I changed the index and used the a_app_name in the query

So, I was not sure they are realted to the same alias name mapping (bug) as I am not using that now

You’re pursuing multiple conversations on this ticket.

Like I replied in this here comment - How to configure the FTS scoring? - #16 by abhinav,
To reduce latency of your query, SELECT over meta().id or search_meta(). For this, follow these steps -

  • Firstly, you can continue to index a_app_name as app_name
  • Now, check “store” for field “a_app_bundle_id” in your FTS index definition

And now come back to the query workbench and run this query …

SELECT search_meta()
FROM `search` AS a USE INDEX(USING FTS)
WHERE sub_type="me"
    AND SEARCH(a,{"query": { "query": "app_name:dark" }, "fields":["a_app_bundle_id"]}