Diacritic insensitive in like query

couchbase_fan · March 3, 2020, 3:10pm

Hi,

I want to know how I can write a single query that returns all the matching results without distinguishing between characters that carry diacritical marks and their unmarked counterparts.

For example, if I want to find all the records that contain iphóne, iphoné, iphonè or iphone, I would like to search only with “%iphone%”. I want the query to be diacritic insensitive, but I have not found anything about this in the documentation.

Thanks

vsr1 · March 3, 2020, 4:52pm

N1QL does not do fuzzy matching. You could consider FTS (Full Text Search) instead.

abhinav · March 3, 2020, 7:19pm

Using FTS would certainly help you in this situation, since it can perform a fuzzy search (based on edit distance). Note that you’ll first need to define a Full Text Search index over your Couchbase bucket. You can either make the index use-case specific by defining it only for the field of interest, or set up a dynamic default index that indexes everything.

Here’s our documentation to aid you in this …
https://docs.couchbase.com/server/6.5/fts/fts-creating-indexes.html

Once you have an index defined, your full text query could look like this …

# Replace the <...> placeholders; single quotes keep the JSON body intact for the shell.
curl -XPOST -H "Content-Type: application/json" \
    "http://<username>:<password>@<ip>:8094/api/index/<fts_index_name>/query" \
    -d '{
        "query": {
            "match": "iphone",
            "field": "<field_name>",
            "fuzziness": 1
        }
    }'

The query above would match iphóne, iphoné, iphonè and iphone, because each of them differs from “iphone” by at most one character (an edit distance of 1).
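
To see why a fuzziness of 1 is enough here, the following minimal Python sketch computes the plain Levenshtein edit distance between the search term and each variant. The `levenshtein` helper is hypothetical and only illustrates the edit-distance idea; it is not how FTS evaluates fuzzy queries internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


# Each accented variant prints 1; "iphone" itself prints 0.
for variant in ["iphóne", "iphoné", "iphonè", "iphone"]:
    print(variant, levenshtein("iphone", variant))
```

This compares raw strings. FTS applies the field's analyser first, so the terms it actually compares depend on the analyser chosen for the index.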

Optionally, you can use the fr (French), es (Spanish) or en (English) analyzers when defining your index if you think they’ll better suit your use case.

sreeks · March 4, 2020, 2:30am

Hi @couchbase_fan,

A more text-specific and efficient (faster) way of doing this would be to use the “asciifolding” character filter.
You need to create a custom analyser from the FTS web console that uses this character filter, keeping only the minimum parts needed for this demo.

Then use this custom analyser for the field to be indexed.

This would make all those diacritic variations searchable.
Please note that the asciifolding character filter is available in the 6.5.0 release.
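
The idea behind ASCII folding can be approximated in a few lines of Python. This is a sketch under stated assumptions: `fold_diacritics` and `analyze` are hypothetical helpers, not part of Couchbase or its SDKs, and the real asciifolding filter uses its own character mapping rather than Unicode decomposition. As long as the indexed text and the query text go through the same analyser, “iphone” matches every accented variant exactly, with no edit distance involved.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Approximate ASCII folding: decompose characters, then drop combining marks."""
    # NFKD splits e.g. "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks have a non-zero canonical combining class; keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list[str]:
    """Toy analyser (hypothetical): fold diacritics, lowercase, split on whitespace."""
    return fold_diacritics(text).lower().split()


# Hypothetical field values; the same analyser runs on the documents and the query.
indexed = ["New iPhóne case", "iphoné charger", "IPHONÈ 11", "iphone stand"]
query_term = analyze("iphone")[0]
for doc in indexed:
    print(f"{doc!r} matches: {query_term in analyze(doc)}")
```

Characters without a Unicode decomposition, such as “ø”, are left unchanged by this sketch.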

The problem with the edit distance (fuzzy query) based approach is that it won’t scale when more diacritic characters are present in the search text (>2, which is quite common), since each accented character costs one edit. It also won’t give the fastest query-time performance.
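
To make the scaling argument concrete, this sketch compares both approaches on terms with one, two and three accented characters. `MAX_FUZZINESS = 2` is an assumed cap that mirrors the threshold of 2 mentioned above, and the helpers are hypothetical, repeated here so the block runs on its own.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fold(text: str) -> str:
    """Drop combining marks after NFKD decomposition (approximate ASCII folding)."""
    return "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c))


MAX_FUZZINESS = 2  # assumption: largest edit distance a fuzzy query accepts

# (plain query term, accented indexed term)
pairs = [("iphone", "iphóne"), ("brulee", "brûlée"), ("precede", "précédé")]
results = {}
for query, indexed in pairs:
    dist = levenshtein(query, indexed)  # one edit per accented character
    results[indexed] = (dist, dist <= MAX_FUZZINESS, fold(indexed) == query)
    print(f"{indexed}: edits={dist}, fuzzy match={dist <= MAX_FUZZINESS}, "
          f"folded match={fold(indexed) == query}")
```

The fuzzy approach runs out of budget at the third accent, while the folded comparison stays an exact match regardless of how many accents the term carries.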

Cheers!

couchbase_fan · March 9, 2020, 8:49am

Thanks, I’m trying FTS and it works.

Related topics

Topic	Category	Tags	Replies	Views	Activity
FTS search on a field that may contain diacritic symbols	Full Text Search	fts	5	1314	March 30, 2022
Couchbase FTS case insensitive search	Full Text Search	dot-net	3	1625	November 8, 2019
FTS case insensitive	Full Text Search		1	1910	February 27, 2019
Simple solution for searching in Couchbase Server?	Couchbase Server	40-rc	4	2953	April 11, 2016
Full text search configuration for French names with accents	Couchbase Server		10	2321	November 12, 2019