There isn’t such a function built in. UDFs are not indexable. Off-hand I don’t have a solution for such a requirement.
Your example looks more like prefixes than suffixes. (For suffixes I’d expect the results to be “john.”, “is john”, “name is john”, “my name is john”.)
What’s the use case? Are you looking to implement auto-completion? (In which case LIKE pattern matching will be simplest on a basic field index.)
@dh Thanks for the reply.
Yes, it is for real-time autocomplete.
The requirement is, for example: for a record containing “my name is john.”, users will search by “name is”, “john”, or “my nam”, but not by “y name” or “onh”.
Unfortunately the index built with SUFFIXES(“my name is john.”) is too big: over 160MiB per field.
So I just want to create an array index like [“my”, “my name”, “my name is”, “my name is john.”]. The reverse order would also be OK: [“my name is john.”, “my name is”, “my name”, “my”].
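For illustration, an array like that could be generated client-side before writing the document (a Python sketch; the function name is mine, not a N1QL feature):

```python
def word_prefixes(text):
    """Return the cumulative word prefixes of `text`,
    e.g. for populating an array field to index."""
    words = text.split()
    return [" ".join(words[:i]) for i in range(1, len(words) + 1)]

print(word_prefixes("my name is john."))
# → ['my', 'my name', 'my name is', 'my name is john.']
```

You would store this array alongside the source field and build an array index over it, then query with an exact match against the user's partial input.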
Is there any solution or recommendation?
Many thanks in advance
@jinchong as @vsr1 noted, FTS is probably the most likely to have an appropriate solution.
( You could build an array from SPLIT() results and index that, but some brief experimentation suggests only limited evaluation via an index is possible. (SPLIT() rather than TOKENS() so as to retain order.) At this point it is unlikely to meet your needs, so I wouldn’t pursue it further.
That said, if the words supplied could be in any order - i.e. match both “my name is john” and “john is my name” with [“is”, “john”] as the filter (note: individual tokens, not single string - but this could be generated in the statement) - then this could potentially work. Index may still be large though.
If they would always provide the prefix, then LIKE would be the way to go but with leading wild-cards there is no index evaluation. )
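To make the any-order case above concrete, the check being described amounts to a token-subset test; a minimal Python sketch of the idea (function name is mine, not an existing N1QL function):

```python
def matches_all_tokens(doc_text, query_tokens):
    """True if every query token appears somewhere among the document's
    whitespace-separated tokens, regardless of order."""
    return set(query_tokens) <= set(doc_text.split())

print(matches_all_tokens("my name is john", ["is", "john"]))  # → True
print(matches_all_tokens("john is my name", ["is", "john"]))  # → True
print(matches_all_tokens("my name is john", ["jane"]))        # → False
```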
I’d recommend looking into creating a custom analyzer using the shingle token filter, with white space as the separator and appropriate min and max settings.
This should help tokenize “my name is john” into …
my
name
my name
is
name is
my name is
john
is john
name is john
my name is john
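To make that behaviour concrete, here is a rough Python approximation of what a word-level shingle filter with min=2, max=4 and the original tokens kept would emit for this input (an illustration of the idea only, not the actual bleve implementation; emission order will differ):

```python
def shingles(text, min_n=2, max_n=4, output_original=True, separator=" "):
    """Approximate a word-level shingle token filter: emit the original
    tokens (optionally) plus every run of min_n..max_n adjacent tokens."""
    tokens = text.split()
    out = list(tokens) if output_original else []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(separator.join(tokens[i:i + n]))
    return out

print(sorted(shingles("my name is john")))
```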
For auto-complete we typically recommend the ngram token filter, which does the same as the above at the character level. You can test out all this analyzer capability here - https://bleveanalysis.couchbase.com/analysis
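As a rough illustration of the character-level idea, an edge-style ngram turns each token into its leading prefixes, which is what prefix autocomplete typically wants (a plain ngram filter would additionally emit mid-word grams); a Python sketch, not the bleve implementation:

```python
def edge_ngrams(token, min_n=1, max_n=10):
    """Leading character prefixes of a token, from min_n up to
    max_n characters (capped at the token length)."""
    upper = min(max_n, len(token))
    return [token[:n] for n in range(min_n, upper + 1)]

print(edge_ngrams("john"))
# → ['j', 'jo', 'joh', 'john']
```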