Understanding FTS score value

Hello,

I have a bucket of documents about TV shows. The FTS index indexes the title, year, and show type (movie, series, episode, etc.).

Example: say I am searching for the movie “Braveheart” (1995)

When I supplied the title “braveheart”, I got:

[ {
		id: 5060546,
		score: 14.76826030751886,
		fields: {
			title: "Braveheart",
			year: "2012",
			showtype: "tvEpisode",
		},
	}, {
		id: 7266106,
		score: 14.76826030751886,
		fields: {
			title: "Braveheart",
			year: "2001",
			showtype: "tvEpisode",
		},
	}, {
		id: 2838506,
		score: 14.76826030751886,
		fields: {
			title: "Braveheart",
			year: "2012",
			showtype: "tvEpisode",
		},
	}, {
		id: 8806644,
		score: 14.768155463589437,
		fields: {
			title: "Braveheart",
			year: "2018",
			showtype: "short",
		},
	}, {
		id: 4383716,
		score: 14.768155463589437,
		fields: {
			title: "After Braveheart",
			year: "2015",
			showtype: "tvMiniSeries",
		},
	}, {
		id: 1407018,
		score: 14.48033987395586,
		fields: {
			title: "Braveheart",
			year: "2008",
			showtype: "tvEpisode",
		},
	}, {
		id: 5114738,
		score: 14.48033987395586,
		fields: {
			title: "Braveheart",
			year: "2011",
			showtype: "tvEpisode",
		},
	}, {
		id: 112573,
		score: 14.338453489376002,
		fields: {
			title: "Braveheart",
			year: "1995",
			showtype: "movie",
		},
	}, {
		id: 7597954,
		score: 14.338453489376002,
		fields: {
			title: "Braveheart",
			year: "2018",
			showtype: "tvEpisode",
		},
	},
        <<---- AND MANY MORE --->>
]

When I supplied the title and show type, I got:

[ {
		id: 0112573,
		score: 15.240031317494054,
		fields: {
			title: "Braveheart",
			year: "1995",
			showtype: "movie",
		},
	}, {
		id: 0015643,
		score: 15.078619173430923,
		fields: {
			title: "Braveheart",
			year: "1925",
			showtype: "movie",
		},
	}, {
		id: 1126487,
		score: 11.182786153072048,
		fields: {
			title: "The Braveheart of Sussex",
			year: null,
			showtype: "movie",
		},
	}, {
		id: 5852040,
		score: 8.494925478013252,
		fields: {
			title: "Braveheart II: Lions of the North",
			year: null,
			showtype: "movie",
		},
	},
]

When I supplied title, show type and year, I got:

[ {
		id: 0112573,
		score: 10.100123984821272,
		fields: {
			title: "Braveheart",
			year: "1995",
			showtype: "movie",
		},
	}
]

All three searches return the correct results and I am not questioning that. What I don’t understand is the score values: it seems that the more precise my query got, the lower the score became (see the actual show I am searching for, id = 112573).

  1. Why is that?
  2. What is the range of scores? 0 to 100? But the last search, which pretty much matches every indexed field, only scores 10.1+. I expected something close to 100.

Thanks.

Hi @mosabusan,

This is a bit tricky to answer briefly, but let me try.
FTS’s internal text indexing library (bleve) uses a slightly tweaked version of the standard tf-idf scoring. The tweaks normalise the score by various relevant factors, and the scoring happens at query time.

There isn’t a real pre-defined maximum score. When bleve scores a document, it essentially sums a set of sub-scores to reach a final TotalScore. Scores across different searches are not directly comparable, because the search query itself is an input to the scoring function: the more conjuncts/disjuncts/sub-clauses your query has, the more they influence the scoring.
The score of a particular hit is not absolute; it is only meaningful in comparison to the highest score from the same search result.

So there is no defined range for these scores.

To summarise the scoring function in a more formal way:

Given a document with a field f over which a given match query q is applied, the scoreFn for that document is defined as:

scoreFn(q, f) = coord(q, f) * SUM(tw(t0, q, f), tw(t1, q, f), ..., tw(tn, q, f))
where ti := term in q
coord(q, f) = nFoundTokens(q, f) / nTokens(q)
tw(ti, q, f) = queryWeight(q, ti) * fieldWeight(f, ti)
queryWeight(q, ti) = w(ti) * queryNorm(q)
w(ti) = boost(ti) * idf(ti)
queryNorm(q) = 1 / SQROOT(SUM(SQ(w(t0)), ..., SQ(w(tn))))
fieldWeight(f, ti) = SQROOT(FREQ(ti, f)) * idf(f, ti) * fieldNorm(f)
fieldNorm(f) = 1 / SQROOT(nTokens(f))
idf(f, ti) = 1 + LN(|Docs| / (1 + FREQ(ti, FIELDNAME(f), Docs)))
Docs = the set of all indexed documents

where SQROOT, SQ, SUM, and LN denote the standard mathematical functions. The auxiliary functions are:

  • coord(q, f) — a dampening factor, defined as the ratio of query tokens found in the given field to the total number of tokens in the query.
  • tw(ti, q, f) — ti’s term weight, the product of ti’s query weight and ti’s field weight.
  • queryWeight(q, ti) — ti’s query weight (with respect to q), the product of its inverse document frequency (see idf below) and its boosting factor, normalised by queryNorm(q).
  • queryNorm(q) — normalises each query term’s contribution, using the Euclidean norm as the normalisation factor.
  • fieldWeight(f, ti) — a normalised product of ti’s idf and the square root of its frequency.
  • FREQ(ti, f) — the frequency of ti in the given field f.
  • fieldNorm(f) — normalises each term’s (in f) contribution to the score. The normalisation factor is the square root of the number of distinct terms in f. (Note that f’s terms may or may not be part of q.)
  • idf(f, ti) — a dampening factor that favours terms that occur frequently in a small set of fields rather than across the whole indexed (document) set.
  • FREQ(ti, FIELDNAME(f), Docs) — the frequency of ti across all documents’ fields that have the same ID/name as f.

Bleve’s tf-idf scoring variant differs from the standard textbook functions (see Introduction to Information Retrieval) mainly in these points:

  1. Term frequency is dampened by taking its square root.
  2. The idf function is “inverse document frequency smooth” (due to the (1 +) factor). Note that it is present in both the query weight and the field weight.
  3. The normalisation factors are different for the field weight (a variant of byte-size normalisation) and the query weight (Euclidean).
  4. The coordination factor, which is often not present by default, can have an impact on scores for small queries.
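
To make that concrete, here is a minimal Python sketch of the scoreFn above. It is only an illustration derived from these definitions, not bleve’s actual implementation; the corpus statistics passed in (docs_with_term, total_docs) are assumed inputs.

```python
import math

def score_fn(query_terms, field_tokens, docs_with_term, total_docs, boost=1.0):
    """Toy re-implementation of the scoreFn above for one field of one document."""

    def idf(term):
        # idf(f, ti) = 1 + LN(|Docs| / (1 + FREQ(ti, FIELDNAME(f), Docs)))
        return 1 + math.log(total_docs / (1 + docs_with_term.get(term, 0)))

    # w(ti) = boost(ti) * idf(ti); queryNorm is the reciprocal Euclidean norm of the w's
    w = {t: boost * idf(t) for t in query_terms}
    query_norm = 1 / math.sqrt(sum(x * x for x in w.values()))

    # fieldNorm(f) = 1 / SQROOT(number of distinct terms in f)
    field_norm = 1 / math.sqrt(len(set(field_tokens)))

    found = [t for t in query_terms if t in field_tokens]
    coord = len(found) / len(query_terms)  # coord(q, f)

    total = 0.0
    for t in found:
        freq = field_tokens.count(t)  # FREQ(ti, f)
        field_weight = math.sqrt(freq) * idf(t) * field_norm
        query_weight = w[t] * query_norm
        total += query_weight * field_weight  # tw(ti, q, f)
    return coord * total

# e.g. a one-term query against a one-word title, in a corpus of 1M documents
# where 40 of them contain "braveheart" in the title field:
print(score_fn(["braveheart"], ["braveheart"], {"braveheart": 40}, 1_000_000))
```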

Users have the option of exploring the score computations for any search in FTS.
Enable the “explain” field in the search request to see the score computation for each hit. The results then include a hierarchical depiction of the score function, which you can compare against the scoreFn above to work out the actual numbers for your query and document corpus.
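
For example, here is a rough sketch of such a request against the FTS REST query endpoint on the usual FTS port (the host, index name, and credentials below are placeholders; adjust them for your cluster):

```python
import base64
import json
import urllib.request

# Placeholder endpoint, index name, and credentials.
url = "http://localhost:8094/api/index/shows_index/query"
auth = base64.b64encode(b"username:password").decode()

search_request = {
    "query": {"match": "braveheart", "field": "title"},
    "fields": ["title", "year", "showtype"],
    "explain": True,  # ask FTS to include the score breakdown for each hit
}

req = urllib.request.Request(
    url,
    data=json.dumps(search_request).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Basic " + auth},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

for hit in result["hits"]:
    # With explain enabled, each hit should carry a hierarchical explanation
    # of how its score was computed.
    print(hit["id"], hit["score"], hit.get("explanation"))
```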

Sreekanth

Hi Sreekanth,

Thanks for the prompt answer. I forgot to mention that I wanted it answered in English! :rofl:

But that’s OK. I guess then, the takeaway is that:

  1. There is no fixed score range. My users shouldn’t use a hard score threshold to determine accuracy or a successful match.
  2. If we sort the results by score in descending order, we can assume the first one is always the “best” match. “Best match” does not mean “exact match”, although it can be.

Are these correct?

I am searching using a query string search like the example below:

+title:'+snow +white +and +the +huntsman' +year:2012 +showtype:movie

(notice the plus sign (+) at the beginning of every word)

I found out that, with the current index, if I supply all three fields as in the example, I will most likely get one result, or an empty array if there is no match. Chances are that the one result is an exact match. I could add more indexed fields, like “country” and the director’s name, to increase the chance of an exact match, in case there are multiple movies with the same title released in the same year (but not likely by the same director).
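
For illustration, this is roughly how I build that kind of conjunctive query string (a simplified sketch; the field names come from my index):

```python
def build_query_string(title: str, year: int, showtype: str) -> str:
    # Require every title word, plus the year and the show type.
    title_terms = " ".join("+" + w for w in title.lower().split())
    return f"+title:'{title_terms}' +year:{year} +showtype:{showtype}"

print(build_query_string("Snow White and the Huntsman", 2012, "movie"))
# +title:'+snow +white +and +the +huntsman' +year:2012 +showtype:movie
```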

Is this a safe strategy?

Thanks.

Hey @mosabusan,

Yes, the takeaways are right. :slight_smile:
The results are always sorted in descending order of score by default (you can override this in a couple of ways), so that part is already taken care of. The first result should be the one with the highest score.
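
For example, one way to override the ordering is the “sort” field of the search request (assuming the field you sort on, year in this sketch, is indexed appropriately):

```python
search_request = {
    "query": {"query": "+title:braveheart +showtype:movie"},
    # Default ordering is by descending score, i.e. ["-_score"].
    # Override it, e.g. sort by year first and break ties by score:
    "sort": ["year", "-_score"],
}
```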

Is this a safe strategy? - Picking unique identifiers for documents depends on the nature of the documents and lies in the business solution space, but, as you pointed out, adding/indexing more fields that help to uniquely identify the documents/movies is certainly the way to get exact/better matches.

Your query string query contains terms like “and” or “the”, which are stop words and might not help during querying (if you used the standard analyser in the index definition).

Cheers!


Great.

I do use the standard analyzer and find it adequate for my purpose. I thought that analyzer already takes care of stop words? I could add a stop word filter in my code, but I am happy with the performance so far. I can even input non-English characters and it works fine.
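
If I ever need it, a client-side stop word filter could be as simple as this (the stop word list here is just an illustrative subset, not bleve’s actual list):

```python
# Illustrative subset of English stop words, not bleve's actual list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to"}

def strip_stop_words(title: str) -> str:
    return " ".join(w for w in title.lower().split() if w not in STOP_WORDS)

print(strip_stop_words("Snow White and the Huntsman"))
# snow white huntsman
```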

Well, this gives me confidence with the solution I have now.

Thank you for your help, Sreekanth.

Is it possible to convert the score to a percentage match representing the relevancy of the result to the search term(s)? If so, how?

Thanks

hi @jgcoding,

Can you please elaborate on your exact requirement? What does the percentage match mean? Is it for fuzzy/regex type queries, i.e. the percentage of the match against the search token? Or something else?

Either way, this mostly has to be handled at the client/application side as of today.
You may retrieve the original indexed field value from the stored fields and compute the percentage of the match explicitly.
Or
Given the total match count in the result, you can convert a hit’s position into a top percentage of the result, if that is what you are looking for.
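
For instance, a percentage relative to the top hit of the same search can be computed client-side like this (a client-side convention, not an FTS feature):

```python
def relative_percentages(hits):
    """Express each hit's score as a percentage of the top score.

    Only meaningful within a single search result, since scores from
    different searches are not comparable.
    """
    if not hits:
        return []
    top = max(hit["score"] for hit in hits)
    return [
        {**hit, "relative_pct": round(100.0 * hit["score"] / top, 1)}
        for hit in hits
    ]
```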

Cheers!

Yes, I am seeking a score representing the percentage of similarity (fuzziness) to the search term when compared to all the terms indexed within a specific field.

Our current solution is hand-rolled, and it seems we should be able to leverage FTS and maybe a “go words” list (the opposite of “stop words”).

Presently, we collect every name segment in the firstName and lastName fields. The name search is matched against this collection using a Damerau-Levenshtein algorithm, which yields a percentage match for each name combination from 0-100%. Users may apply a threshold to keep only the results that exceed, say, 50% similarity.

It is a lot of ugly code.

One possible solution I have considered is storing this name collection in the opposite of a “stop words” list.
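
For reference, the percentage-match step I described is roughly this (the optimal-string-alignment variant of Damerau-Levenshtein, normalised by the longer string; a simplified sketch, not our production code):

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Optimal-string-alignment variant of the Damerau-Levenshtein distance."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

def similarity_pct(query: str, candidate: str) -> float:
    """0-100% similarity, normalised by the longer of the two strings."""
    longest = max(len(query), len(candidate), 1)
    return 100.0 * (1 - damerau_levenshtein(query, candidate) / longest)

print(similarity_pct("jon", "john"))  # 75.0
```

As far as I understand, FTS fuzzy queries also work off an edit distance (the fuzziness parameter), but they still return the usual relative score rather than a percentage, so this normalisation would stay on the client side.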