How to include all words in the search index, such as "from" and "to"

Hi all,
I set up FTS and I need to be able to search by prepositions such as “from” and “to”.
From my reading, the “stop_en” filter removes words from the token result based on a stop word dictionary, e.g. “is” and “a”.
So I made a custom analyzer with just Tokenizer: Unicode and Token Filters: to_lower, and then applied this custom analyzer to the text field I want to search.
But I still could not search by “from” or “to”.
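
To be concrete, the analyzer section of my index definition looks roughly like this (a sketch of what I configured; I named the analyzer “all-words”):

"analysis": {
  "analyzers": {
    "all-words": {
      "type": "custom",
      "tokenizer": "unicode",
      "token_filters": [
        "to_lower"
      ]
    }
  }
}

There is no “stop_en” (or any other stopword) filter in the chain, so I expected words like “from” and “to” to survive analysis.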

Is there any way to stop FTS from removing these words, or a way to add specific words that I want kept in the index?

Many thanks in advance.
Jin

@jinchong
thanks for reaching out.

I wanted to start by clarifying your question to make sure we’re on the same page.
When you say “failed to search by …”, do you mean that the post-analysis text still did not contain “from” and “to”, i.e. that even with your custom analyser those words were being filtered out?

I used the analysis wizard - https://bleveanalysis.couchbase.com/analysis

  1. I created a custom analyser based on your specs and found that the words are not filtered out.

  2. I used the default english analyser and found that the words are getting filtered out (as you mentioned).
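
To make the difference concrete: with an input such as “John was from Sydney.”, the two analysers produce roughly these token streams (my rough reproduction, assuming the standard English stop word list):

custom (unicode + to_lower): ["john", "was", "from", "sydney"]
default "en":                ["john", "sydney"]

So with the custom analyser, “from” stays searchable.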

@aditi.ahuja 's points are accurate.
@jinchong Did you assign the custom analyzer you created to the field or mapping you want analyzed? Just creating one will not suffice. If you did assign it to the field but are not specifying the field in your query, that could also be a problem. In situations like these, sharing your exact index definition and search request can help us help you better.
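
For illustration, the field-level assignment and a query that targets that field would look something like this - the names here (“all-words” for the analyzer, “value” for the field) are just placeholders for whatever you actually used:

"value": {
  "enabled": true,
  "dynamic": false,
  "fields": [
    {
      "name": "value",
      "type": "text",
      "analyzer": "all-words",
      "index": true
    }
  ]
}

and a search request that names the field explicitly:

{
  "query": {
    "match": "from",
    "field": "value"
  }
}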

Hi @aditi.ahuja and @abhinav
Thanks for your reply.
Yes, @aditi.ahuja, I confirmed that it works in the analysis wizard, but I don’t know why my FTS index doesn’t.

I assigned the custom analyzer to that field and specified the field in the query. Please refer to the FTS index definition and query below. FYI, the target field name is “value” and the custom analyzer is “all-words”.

{
  "type": "fulltext-index",
  "name": "test-value",
  "uuid": "223e9e7211a5670e",
  "sourceType": "gocbcore",
  "sourceName": "my-couchbase",
  "sourceUUID": "46d8fdb6a3b6c66d824b9bbf2598ff24",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {
        "analyzers": {
          "all-words": {
            "token_filters": [
              "to_lower"
            ],
            "tokenizer": "unicode",
            "type": "custom"
          }
        }
      },
      "default_analyzer": "all-words",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": false,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "trans.test-trans": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "value": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "all-words",
                  "index": true,
                  "name": "value",
                  "store": true,
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 15
    }
  },
  "sourceParams": {}
}

and the query is:

SELECT * FROM `my-couchbase`.`trans`.`test-trans` AS t1
WHERE SEARCH(t1, { "explain": false, "query": { "match": "from", "field": "value" } })

But this query still fails to find the document containing “John was from Sydney.”
Could you please help me find what I missed?

Many thanks, and I look forward to your advice.
Jin

Thanks for sharing these details @jinchong .

Your index definition looks good.

The only catch is that your N1QL query is requesting content that the search index does not actually hold - I’m referring to the SELECT *. This results in what we call a KV fetch phase during N1QL query execution, which fetches the documents from KV for the hits generated by your search index. Those documents are subject to revalidation, and because the query has no knowledge of the analyzer rules to apply, this revalidation fails.

You can quickly verify this by replacing the * with meta().id, which will return just the document IDs; revalidation is not required there because the KV fetches are avoided -

SELECT meta().id FROM `my-couchbase`.`trans`.`test-trans` AS t1
WHERE SEARCH(t1, { "explain": false, "query": { "match": "from", "field": "value" } })

To get the SELECT * to work, you simply need to specify the index from which to obtain the custom analyzer rules required during the N1QL revalidation -

SELECT * FROM `my-couchbase`.`trans`.`test-trans` AS t1
WHERE SEARCH(t1, { "explain": false, "query": { "match": "from", "field": "value" } }, {"index": "test-value"})

I believe this last query is what you’re looking for.

Thanks a lot @abhinav

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.