FTS seems to have a bug

I have an index built with Full Text Search:

{
  "type": "fulltext-index",
  "name": "i_fts",
  "uuid": "63f420271d739666",
  "sourceType": "gocbcore",
  "sourceName": "img_fts",
  "sourceUUID": "d10305c69fc1d9fcd0079d0ebffb211c",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": false,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "_default._default": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "np": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "index": true,
                  "name": "np",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "pp": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "index": true,
                  "name": "pp",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "s": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "index": true,
                  "name": "s",
                  "type": "number"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 15
    }
  },
  "sourceParams": {}
}

The document format is:

{
  "pp": "lazy brown fox",
  "np": "",
  "s": 7421065288944210239,
  "m": {
    "t": "fts"
  }
}

If I search for fox or lazy brown, I get results. But for a document like this:

{
  "pp": "honey bunny",
  "np": "",
  "s": 643734567345634576,
  "m": {
    "t": "fts"
  }
}

If I search the s field for “643734567345634576”, I get the document, so it seems that the document is indexed.

When I search (in the UI or in the PHP SDK) for honey, bunny, or honey bunny, I get no results.

What could be the problem, and how can I debug this further?

I have discovered that if I use fuzziness: 1, I get some results, which is strange because I search with the exact term (copy/pasted) from the document.

I am 100% sure that what I search for is exactly what is in the document I am looking for.

Is there a way to debug this further?

Hi @flaviu! The behavior you observe comes from which analyzer is applied at index time versus at query time.

First, let’s take a look at how the English (en) analyzer works. Here is how the en analyzer transforms each of the following words:

  1. lazy → lazi
  2. brown → brown
  3. fox → fox
  4. honey → honei
  5. bunny → bunni

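The transformations above can be approximated with a toy sketch. It assumes the only rule that matters for these five words is the Porter stemmer's "y → i" rule (step 1c: replace a trailing y with i when the rest of the word contains a vowel). The real en analyzer also applies a possessive filter, a full English stop-word list, and the rest of the Porter algorithm, none of which are reproduced here; STOP_WORDS is a hypothetical subset for illustration only.

```python
import re

# Hypothetical stop-word subset for illustration; the real en analyzer
# uses a much larger English stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}

VOWELS = set("aeiou")


def porter_step_1c(token: str) -> str:
    """Porter step 1c only: (*v*) Y -> I.

    Replace a trailing 'y' with 'i' when the preceding part of the word
    contains a vowel. This is one rule of the Porter stemmer, not all of it.
    """
    if token.endswith("y") and any(ch in VOWELS for ch in token[:-1]):
        return token[:-1] + "i"
    return token


def toy_en_analyzer(text: str) -> list[str]:
    """Rough stand-in for the en analyzer: split, lowercase, drop stop words, stem."""
    tokens = re.findall(r"\w+", text.lower())
    return [porter_step_1c(t) for t in tokens if t not in STOP_WORDS]


for word in ["lazy", "brown", "fox", "honey", "bunny"]:
    print(word, "->", toy_en_analyzer(word))
```

Running it prints lazy -> ['lazi'], brown -> ['brown'], fox -> ['fox'], honey -> ['honei'] and bunny -> ['bunni'], matching the list above.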
Note that you’re using the en analyzer to index field pp AND are also including its content in the _all field (to support field-agnostic searching). So the tokens listed above are what get indexed for both the pp and _all fields.

Now, when you perform a field-scoped search, like pp:honey, the search engine will:

  1. fetch the analyzer for the field pp
  2. apply the analyzer (en) over the search criteria, in this case honey, which generates the tokens: [“honei”]
  3. search for the generated tokens, which finds the expected hits
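The field-scoped steps above can be sketched as a toy in-memory lookup. This is not how Couchbase FTS is implemented; it assumes a toy en analyzer (lowercase plus the Porter "y → i" rule only), a hand-built inverted index for the honey bunny document, and a hypothetical document ID "doc2". Generated terms are OR-ed together.

```python
import re

VOWELS = set("aeiou")


def toy_en(text: str) -> list[str]:
    """Toy stand-in for the en analyzer: lowercase + Porter 'y -> i' rule only."""
    out = []
    for tok in re.findall(r"\w+", text.lower()):
        if tok.endswith("y") and any(ch in VOWELS for ch in tok[:-1]):
            tok = tok[:-1] + "i"
        out.append(tok)
    return out


ANALYZERS = {"en": toy_en}

# Field -> analyzer, mirroring the pp/np field mappings in the index definition.
FIELD_ANALYZER = {"pp": "en", "np": "en"}

# Toy inverted index (field -> term -> doc IDs) for the "honey bunny" document.
INDEX: dict[str, dict[str, set[str]]] = {"pp": {}, "_all": {}}
for field in ("pp", "_all"):
    for term in toy_en("honey bunny"):
        INDEX[field].setdefault(term, set()).add("doc2")


def field_scoped_search(field: str, text: str) -> set[str]:
    # 1. fetch the analyzer configured for the field
    analyzer = ANALYZERS[FIELD_ANALYZER[field]]
    # 2. apply it to the search criteria ("honey" -> ["honei"])
    terms = analyzer(text)
    # 3. look up each generated term and OR the hits together
    hits: set[str] = set()
    for term in terms:
        hits |= INDEX[field].get(term, set())
    return hits


print(field_scoped_search("pp", "honey"))
```

Because the same analyzer runs at index time and at query time, "honey" becomes "honei" on both sides and the document is found.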

However, if you DO NOT field-scope your search and search for just honey, the search engine will:

  1. detect that a field is NOT associated with your search
  2. fetch the default_analyzer from your index definition, which is standard
  3. apply the standard analyzer over the search criteria, which generates the tokens: [“honey”]
  4. search for the generated tokens, but “honey” is not an indexed token (only “honei” is), so there are no hits
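The unscoped path can be sketched the same way, and the same toy also suggests why fuzziness: 1 returned results in the question: "honey" is a single substitution away from the indexed "honei". This sketch assumes fuzziness acts as a Levenshtein edit-distance bound on terms; real engines use automata rather than comparing every indexed term, and the analyzers here are toy stand-ins (standard = lowercase only, en = lowercase plus the Porter "y → i" rule). The document ID "doc2" is hypothetical.

```python
import re

VOWELS = set("aeiou")


def toy_en(text):
    """Toy en analyzer: lowercase + Porter 'y -> i' rule only."""
    out = []
    for tok in re.findall(r"\w+", text.lower()):
        if tok.endswith("y") and any(ch in VOWELS for ch in tok[:-1]):
            tok = tok[:-1] + "i"
        out.append(tok)
    return out


def toy_standard(text):
    """Toy standard analyzer: lowercase, no stemming."""
    return re.findall(r"\w+", text.lower())


ANALYZERS = {"en": toy_en, "standard": toy_standard}

# _all field terms for the "honey bunny" document, produced by en at index time.
ALL_FIELD = {term: {"doc2"} for term in toy_en("honey bunny")}


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def unscoped_search(text, fuzziness=0, default_analyzer="standard"):
    # No field given: analyze the criteria with default_analyzer, search _all.
    terms = ANALYZERS[default_analyzer](text)
    hits = set()
    for term in terms:
        for indexed, docs in ALL_FIELD.items():
            if edit_distance(term, indexed) <= fuzziness:
                hits |= docs
    return hits


print(unscoped_search("honey"))                         # no hits
print(unscoped_search("honey", fuzziness=1))            # "honey" ~ "honei"
print(unscoped_search("honey", default_analyzer="en"))  # analyzers now agree
```

The last call corresponds to the recommendation below: once default_analyzer matches the analyzer used for the fields, no fuzziness is needed.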

To test how the analyzers process text, you can use this playground we host: https://bleveanalysis.couchbase.com/analysis

As for your situation, my recommendation (which you’ve probably already guessed by now) is to set default_analyzer to en in your index definition (instead of standard). Here’s your updated index definition:

{
  "type": "fulltext-index",
  "name": "i_fts",
  "uuid": "",
  "sourceType": "gocbcore",
  "sourceName": "img_fts",
  "sourceUUID": "d10305c69fc1d9fcd0079d0ebffb211c",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "en",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": false,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "_default._default": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "np": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "index": true,
                  "name": "np",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "pp": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "include_in_all": true,
                  "index": true,
                  "name": "pp",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "s": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "index": true,
                  "name": "s",
                  "type": "number"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 15
    }
  },
  "sourceParams": {}
}
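To catch this kind of mismatch before it bites, you could compare each text field's effective analyzer against default_analyzer. The helper below is a hypothetical sketch (mismatched_text_fields is not part of any Couchbase SDK). It assumes the index-definition JSON layout shown above, treats a missing analyzer as inherited from the type's or index's default_analyzer, and falls back to standard when default_analyzer is absent. If different fields use different analyzers, no single default_analyzer can satisfy all of them, so field-scoped queries are the safer option in that case.

```python
import json


def mismatched_text_fields(index_def: dict) -> list[str]:
    """Return names of text fields that feed _all with an analyzer other than
    the index default_analyzer; unscoped queries may miss their content."""
    mapping = index_def["params"]["mapping"]
    index_default = mapping.get("default_analyzer") or "standard"
    found = []

    def walk(doc_mapping: dict, inherited: str) -> None:
        # A type or sub-document mapping may override the default analyzer.
        analyzer_here = doc_mapping.get("default_analyzer") or inherited
        for prop in doc_mapping.get("properties", {}).values():
            for field in prop.get("fields", []):
                if field.get("type") != "text" or not field.get("include_in_all"):
                    continue
                effective = field.get("analyzer") or analyzer_here
                if effective != index_default:
                    found.append(field.get("name"))
            walk(prop, analyzer_here)  # recurse into nested properties

    for type_mapping in mapping.get("types", {}).values():
        walk(type_mapping, index_default)
    return sorted(found)


# Trimmed copy of the original definition (only the keys the check reads).
original = json.loads("""
{"params": {"mapping": {
  "default_analyzer": "standard",
  "types": {"_default._default": {"properties": {
    "np": {"fields": [{"analyzer": "en", "include_in_all": true, "name": "np", "type": "text"}]},
    "pp": {"fields": [{"analyzer": "en", "include_in_all": true, "name": "pp", "type": "text"}]},
    "s": {"fields": [{"include_in_all": true, "name": "s", "type": "number"}]}
  }}}
}}}
""")

print(mismatched_text_fields(original))  # np and pp are flagged
original["params"]["mapping"]["default_analyzer"] = "en"
print(mismatched_text_fields(original))  # nothing flagged after the fix
```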

Hope this helps!
