CB 4.5 N1QL Like '%word%' is slow

palmeti · August 31, 2016, 2:42pm

I am using CB CE 4.1 and am in the process of evaluating CB EE 4.5.

I created secondary indexes on the attributes that appear in the WHERE clause. However, the statement
select * from mybucket where firstname like '%bob%' is very slow. I read in some forum posts that there is a fix in 4.5.1 or in the CB FT Developer Preview. Can I get a download link to try it?

benbenwilde · August 31, 2016, 3:39pm

I don’t have information about possible performance improvements in newer versions, but generally speaking (not Couchbase queries specifically), a wildcard at the beginning of a pattern is going to be relatively slow, usually significantly slower than a wildcard at the end.
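
To make the difference concrete, here is a minimal sketch, assuming a plain secondary index on firstname (the index name idx_firstname is made up for illustration; the bucket and field come from the question). A trailing wildcard gives the index a prefix to seek on, while a leading wildcard gives it no starting point, so the pattern does not narrow the scan.

```sql
/* Hypothetical index name; bucket and field are taken from the question. */
CREATE INDEX idx_firstname ON mybucket(firstname);

/* Trailing wildcard: can be answered with a narrow range scan starting at 'bob'. */
SELECT * FROM mybucket WHERE firstname LIKE 'bob%';

/* Leading wildcard: no usable prefix, so the pattern does not narrow the scan. */
SELECT * FROM mybucket WHERE firstname LIKE '%bob%';
```

Running EXPLAIN on each statement should show the difference in the index spans.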

I am trying this method and will see how it performs. Tokenizing and then using wildcards at the end meets all your requirements with better performance. The only issue I have with this is tokenizing on the fly versus storing tokenized values directly in the index. I am slightly concerned about how this will perform, but I would be much more concerned about leading-wildcard performance.

Anyways, here it is (tokenizing on whitespace, lowercasing for case insensitivity):

WHERE ANY t IN SPLIT(LOWER(`firstname`)) SATISFIES t LIKE 'bob%' END

This will for example match “Jones Bobby”.

Hope this is helpful even though it doesn’t directly answer the question.

palmeti · August 31, 2016, 3:50pm

Thank you @benbenwilde. I will try your suggestions and let you know.

geraldss · August 31, 2016, 8:31pm

Great suggestion @benbenwilde.

However, I don’t think SPLIT() will work. I think you need the new SUFFIXES() function in 4.5.1.

You can email @keshav_m using keshav at couchbase dot com and he can send you an early build of 4.5.1.

benbenwilde · September 1, 2016, 12:03am

Hmm, SPLIT is working for me in the way I described. It splits on whitespace right? I will have to check out the new SUFFIXES function.
Ben

geraldss · September 6, 2016, 4:47pm

SPLIT will not match ‘billybob’. SUFFIXES will.
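
A sketch of why, assuming the array-index syntax from 4.5 together with the SUFFIXES() function from 4.5.1 (the index name is hypothetical). SPLIT() turns 'billybob' into the single token 'billybob', which does not start with 'bob', whereas the suffixes of 'billybob' include 'bob' itself.

```sql
/* Hypothetical index name. Each suffix of the lowercased name becomes an index entry. */
CREATE INDEX idx_firstname_suffixes ON mybucket
  (DISTINCT ARRAY s FOR s IN SUFFIXES(LOWER(firstname)) END);

/* 'billybob' has the suffix 'bob', so a trailing-wildcard match on the suffixes finds it. */
SELECT * FROM mybucket
WHERE ANY s IN SUFFIXES(LOWER(firstname)) SATISFIES s LIKE 'bob%' END;
```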

palmeti · September 9, 2016, 1:17pm

Hi @geraldss, I sent an email to @keshav_m asking for the 4.5.1 build download, and I am still waiting for it. Can you please follow up on that?

geraldss · September 9, 2016, 1:51pm

Hi @keshav_m, please send the download link.

keshav_m · September 11, 2016, 2:06am

@palmeti, please send me a note again. My email address is keshav at couchbase dot com. Thanks.

palmeti · September 11, 2016, 3:55pm

Sent a note. Please check.

palmeti · September 12, 2016, 1:58pm

Thanks @Keshav… Received the links

benbenwilde · September 21, 2016, 7:41pm

Regarding the new SUFFIXES function, I want to make sure I understand its behavior correctly.

Is this correct?
SUFFIXES("car wash") => ["car", "ar", "r", "wash", "ash", "sh", "h"]

geraldss · September 21, 2016, 7:50pm

Hi @benbenwilde,

Please see the DZone article “A Couchbase Index Technique for LIKE Predicates With Wildcard” for a full explanation.
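
For a quick check of the behavior asked about above, a query like the following can be run on a build that includes SUFFIXES(). As the function is documented, it returns the suffixes of the whole string rather than of each word, so the result should differ from the list in the question.

```sql
/* SUFFIXES() works on the entire string, spaces included. */
SELECT SUFFIXES("car wash") AS suffixes;
/* Expected: ["car wash", "ar wash", "r wash", " wash", "wash", "ash", "sh", "h"] */
```

A pattern such as 'wash%' still matches, because 'wash' is itself one of the suffixes.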

benbenwilde · September 21, 2016, 8:17pm

Thank you, that blog entry is very helpful. I also didn’t realize that you could so easily index array entries produced by a function. I will still prefer tokenization over suffixes, since I only need to match prefixes of each token or word. This also requires far fewer index entries.

It is my view that queries equivalent to LIKE '%bob%' are usually unnecessary, and I would encourage avoiding them. In the case of “washington”, I’m not sure how often someone would search for it with “hington” or “ashing” or something like that, versus searching for “wash” or “washing” or some other prefix.

I think that prefix queries on a tokenized field are far more suitable for most cases. The next level would be to allow some degree of fuzziness on top of that.
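
A minimal sketch of the tokenized variant, assuming the same array-index syntax works with SPLIT() as with SUFFIXES() (the index name is hypothetical). The variable name and expression in the query are kept identical to the index definition so that the index can be selected.

```sql
/* Hypothetical index name. One entry per distinct lowercased word. */
CREATE INDEX idx_firstname_tokens ON mybucket
  (DISTINCT ARRAY t FOR t IN SPLIT(LOWER(firstname)) END);

/* Same predicate as earlier in the thread, now backed by the tokens stored in the index. */
SELECT * FROM mybucket
WHERE ANY t IN SPLIT(LOWER(firstname)) SATISFIES t LIKE 'bob%' END;
```

For a value like “washington” this stores one entry, where the suffix approach stores ten.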
