I’m pretty new to CBLite. I inherited code that heavily uses FTS, and all looked good until we discovered that searching for some words containing a dash “-”, like “A-Something”, makes the query take 4-5 times longer than usual. I couldn’t figure out why this happens or how to work around it on my end (mobile CBLite on the iOS platform).
My second question is related: searching for some words that contain a parenthesis “(” does not work at all; the result is always empty for those items.
It’s been some years since I last worked on the FTS code in CBL, but here are some things that might help:
- I don’t think the tokenizer (the thing that breaks strings into words) considers a “-” part of a word. So “foo-bar” would be indexed as the two words “foo” and “bar”, and a search for “foo-bar” would just be looking for those two words. I could be wrong though; you could do some experiments to make sure.
- If the above is true, then “A-Something” would be tokenized as “A” and “Something”, and “A” would be ignored when indexing because it’s a “stop-word”, one of the super common words like “the” and “he” that get tossed out. That means the query would be searching for just “Something”.
- Parentheses are definitely not considered part of a word. So parens in a document string are ignored. Parens in a query will perform grouping of logical expressions as described in the SQLite FTS query language docs.
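To make the points above concrete, here is a rough Python sketch of that tokenization model. It is not CBL's actual tokenizer: the `tokenize` and `sanitize_query` helpers and the tiny `STOP_WORDS` set are hypothetical, and it assumes words are split on any character that is not a letter or digit. The second helper shows one possible workaround on the app side: strip punctuation from user input before building the FTS query, so characters like “(” are not treated as query syntax.

```python
import re

# Hypothetical, tiny stop-word list for illustration only; the real list
# used by the index is longer and depends on the index language.
STOP_WORDS = {"a", "an", "the", "he"}

# Assumption: a "word" is a run of letters/digits; everything else separates words.
WORD_RE = re.compile(r"[^\W_]+")


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stop-words."""
    words = WORD_RE.findall(text.lower())
    return [w for w in words if w not in STOP_WORDS]


def sanitize_query(user_input):
    """Keep only the words of the user's input, joined by single spaces,
    so punctuation such as '(' or '-' never reaches the FTS query parser."""
    return " ".join(WORD_RE.findall(user_input))


print(tokenize("foo-bar"))
print(tokenize("A-Something"))
print(sanitize_query("Widget (large)"))
```

Under this model “A-Something” reduces to the single term “something”, which matches the explanation above. Whether the real index behaves the same way is worth confirming with a few experiments against your own data.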
Hi Jens, many thanks for the prompt reply.
Regarding “-” not being treated as part of a word, that makes total sense to me, but I found one case where I can’t explain the FTS behavior:
One of my search items has a name like “a-a-s”. When I enter “a-a-” the search returns no results, which makes sense based on what you mentioned above. But when I enter the last character, “s” (a-a-s), it returns exactly my item! I was expecting the first two “a”s to be ignored and FTS to return all items containing or starting with “s”.