CB 4.5 N1QL Like '%word%' is slow

Using CB CE 4.1 and in the process of evaluating CB EE 4.5

Created secondary indexes on the attributed which are in where clause. However the select statement
select * from mybucket where firstname like ‘%bob%’ is very slow… I read that in some of the forum posts that you have fix in 4.5.1 or CB FT Developer Preview. Can I get a download link to try .

I don’t have information about possible performance improvements in newer versions, but generally speaking (not couch queries specifically), using a wildcard at the beginning is going to be relatively slow. Generally significantly slower than wildcards at the end.

I am trying this method and will see how it performs. Tokenizing and then doing wildcards at the end meet all your requirements with better performance. Only issue i have with this is tokenizing on the fly vs storing tokenized values directly in the index. I am slightly concerned about how this will perform but would be much more concerned about starting wildcard perfromance.

Anyways, here it is (tokenizing on whitespace, lowercasing for case insensitivity):

WHERE ANY t IN SPLIT(LOWER(`firstname`)) SATISFIES t LIKE 'bob%'

This will for example match “Jones Bobby”.

Hope this is helpful even though it doesn’t directly answer the question.

2 Likes

Thankyou @benbenwilde. Will try your suggestions and let you know.

Great suggestion @benbenwilde.

However, I don’t think SPLIT() will work. I think you need the new SUFFIXES() function in 4.5.1.

You can email @keshav_m using keshav at couchbase dot com and he can send you an early build of 4.5.1.

Hmm, SPLIT is working for me in the way I described. It splits on
whitespace right? I will have to check out the new SUFFIXES function.
Ben

SPLIT will not match ‘billybob’. SUFFIXES will.

Hi @geraldss I sent an email to @keshav_m to send 4.5.1 build download. I am still waiting for it. Can you please follow up on that

Hi @keshav_m, please send the download link.

@palmeti please send me a note again. email address is: keshav at couchbase dot com thanks.

Sent a note. Please check

Thanks @Keshav… Received the links

Regarding the new SUFFIXES function - want to make sure i understand it’s behavior correctly.

Is this correct?:
SUFFIXES(“car wash”) => [“car”, “ar”, “r”, “wash”, “ash”, “sh”, “h”]

Hi @benbenwilde,

Please see https://dzone.com/articles/a-couchbase-index-technique-for-like-predicates-wi

for a full explanation.

Thank you that blog entry is very helpful. I also didn’t realize that you could so easily index array entries produced by some function. I will still prefer using a method of tokenization vs suffixes since I only need to match prefixes for each token or word. This also requires far fewer index entries.

It is my view that queries equivalent to LIKE '%bob%' are usually unnecessary and would encourage avoiding it. In the case of “washington”, I’m not sure how often someone would search for it with “hington” or “ashing” or something like that, vs searching for “wash” or “washing” or some other prefix.

I think that prefix queries on a tokenized field are far more suitable for most cases. Next level would be to allow some level of fuzziness on top of it.

1 Like