Convert FTS scoring to a percentage match (absolute?)

I am hoping to replace a handmade solution based upon the “damerauLevenshtein” algorithm with an FTS counterpart.

How does one design the index or construct a query that returns only the matches whose relevancy between the search term and the field values exceeds a given percentage, say 50%?

If I search for “Joseph Public” in a name field, I would like every match whose similarity to that search term exceeds 50% (or whatever other similarity threshold is provided) to be returned.

Thank you.

JG

Hi @jgcoding,

I can’t think of a direct/explicit way to achieve this.
But a couple of related options come to mind:

  1. Try a match query with a “prefix_length” parameter. It sets how many leading characters of each term must match exactly, so choose a value large enough to cover the minimum portion (percentage) of the term that you need matched.
    The match query also accepts a “fuzziness” parameter, which is then applied to the rest of the term after the specified prefix_length.
    e.g.:
    "query": {
    "match": "Joseph Public",
    "field": "name",
    "operator": "and",
    "fuzziness": 2,
    "prefix_length": 7
    }

  2. Another approach to a similar result, if there are always multiple tokens to search for, is to use boosting based on the number of tokens searched.
    For example, you can build a disjunction query with multiple child match_phrase/phrase queries, depending on your requirements, and give the highest boost to the child query with the largest number of tokens.
    e.g. (N is a placeholder for a numeric boost value):
    "disjuncts": [
    {"match_phrase": "term1 term2 term3", "field": "name", "boost": N},
    {"match_phrase": "term1 term2", "field": "name", "boost": 2N/3},
    {"match_phrase": "term2 term3", "field": "name", "boost": N/3}
    ]
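
To make the two options easier to try out, here is a minimal Python sketch that only builds the query bodies as plain dictionaries; it does not call any Couchbase SDK. The function names, the `max_boost` parameter, and the rule that a phrase's boost is proportional to its token count are assumptions made for illustration (the hand-written example above gives the two two-token phrases different boosts instead).

```python
import json


def fuzzy_match_query(term, field, fuzziness=2, prefix_length=0):
    """Option 1: a match query with fuzziness and an exact-match prefix."""
    return {
        "query": {
            "match": term,
            "field": field,
            "operator": "and",
            "fuzziness": fuzziness,
            "prefix_length": prefix_length,
        }
    }


def boosted_phrase_query(term, field, max_boost=3.0):
    """Option 2: one match_phrase child per contiguous run of tokens.

    Assumption: boost = max_boost * (tokens in phrase) / (tokens in term),
    so the full phrase gets max_boost and shorter phrases get less.
    """
    tokens = term.split()
    if not tokens:
        raise ValueError("search term must contain at least one token")
    n = len(tokens)
    smallest = min(2, n)  # skip single-token phrases unless the term has one token
    disjuncts = []
    for size in range(n, smallest - 1, -1):
        for start in range(n - size + 1):
            disjuncts.append({
                "match_phrase": " ".join(tokens[start:start + size]),
                "field": field,
                "boost": max_boost * size / n,
            })
    return {"query": {"disjuncts": disjuncts}}


if __name__ == "__main__":
    # Print both request bodies as JSON for inspection.
    print(json.dumps(fuzzy_match_query("Joseph Public", "name", 2, 7), indent=2))
    print(json.dumps(boosted_phrase_query("term1 term2 term3", "name"), indent=2))
```

The resulting dictionaries still have to be sent through whichever FTS client or REST call you already use, and the scores that come back remain relative rather than true percentages.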

But all of these are approximations rather than a precise answer to your requirement.

Regards,

I appreciate your effort. I will review your suggestions to determine if they may get us closer to our objective.

The more code I can replace with features and solutions already available in FTS, the better. A reduction of 25% or more would be a nice first round of refactoring and optimization.

Are you available for paid consultation?

Thank you.

JG

@jgcoding, if you are a licensed customer then you may reach out to the Support Team for such assistance.
Cheers!
Sreekanth