FTS index based on condition

Is it possible to have conditions in an FTS index? For example, I'm indexing documents of type='hotel', but I only want to index hotels that have a specific field value: hotels that have "freeParking": true in their JSON should be indexed, and hotels that don't have that field should be skipped.

Is this possible?


Hi lenix,

FTS does not currently support index-time filtering based on the value of another field.
As of now, indexing is driven entirely by the document type.

thanks,
Sreekanth

So what would be other alternatives to achieve what I need? I can see 2 options:

  1. Do the filtering after retrieving the result from FTS
  2. Switch to N1QL, but I'm not sure how I can achieve the same search behavior as FTS using N1QL. Can you advise on this?

Thanks
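Option 1 above (filtering after retrieval) could look roughly like the following. This is only a minimal sketch: it assumes the FTS hits carry the document key under "id" and that the matching documents have already been fetched into a dict keyed by that id. The function name filter_hits and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def filter_hits(hits: List[Dict[str, Any]],
                docs_by_id: Dict[str, Dict[str, Any]],
                field: str = "freeParking",
                wanted: bool = True) -> List[Dict[str, Any]]:
    """Keep only the hits whose source document has `field` equal to `wanted`."""
    kept = []
    for hit in hits:
        doc = docs_by_id.get(hit["id"], {})
        # A missing field never equals `wanted`, so such documents are dropped.
        if field in doc and doc[field] == wanted:
            kept.append(hit)
    return kept


# Hypothetical sample data: three hits, only one with freeParking set to true.
hits = [{"id": "hotel_1"}, {"id": "hotel_2"}, {"id": "hotel_3"}]
docs_by_id = {
    "hotel_1": {"type": "hotel", "freeParking": True},
    "hotel_2": {"type": "hotel", "freeParking": False},
    "hotel_3": {"type": "hotel"},
}
filtered = filter_hits(hits, docs_by_id)
print([h["id"] for h in filtered])
```

Note that filtering on the client side shrinks each page of results, so paging and hit counts no longer line up with what FTS reports.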

You could always make this one of the parameters of your FTS query: form a conjunction (AND query) that adds this extra requirement, freeParking: true, to your default search requirements (the first entry below).
e.g.:
{
  "conjuncts": [
    {"field": "field_name", "match": "search_term"},
    {"field": "freeParking", "bool": true}
  ]
}

refer: https://docs.couchbase.com/server/6.0/fts/fts-query-types.html
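As a rough illustration, the conjunction can be assembled programmatically around any existing query. The sketch below only builds the JSON request body; the helper name with_required_flag and the "size" value are assumptions made for the example, and the boolean clause only matches if the field is indexed with a boolean type.

```python
import json
from typing import Any, Dict


def with_required_flag(base_query: Dict[str, Any],
                       field: str = "freeParking",
                       value: bool = True) -> Dict[str, Any]:
    """Wrap `base_query` in a conjunction that also requires `field` == `value`."""
    return {
        "conjuncts": [
            base_query,                       # the default search requirement
            {"field": field, "bool": value},  # the extra boolean requirement
        ]
    }


base = {"field": "field_name", "match": "search_term"}
request_body = {"query": with_required_flag(base), "size": 10}
print(json.dumps(request_body, indent=2))
```

The printed body is what would be sent as the search request for the index.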

We also would like to see conditional index-time filtering as @lenix requested.
Our use case is similar: Our customers can store arbitrary objects / documents. We would like to keep our FTS index small by flagging the documents with something like “indexable=true”.

Going one step further:
We would like to specify the fields to index per document type based on a regex: for example, index only field names starting with a special prefix such as “identifier_” (but do not index fields starting with “longdescription_”, which are large text fields).

Main purpose of both features:
Have better control over FTS index size, by indexing only what you need.

Hi,

If you have the flexibility of adding fields/editing the documents, then things become easier.
One can carefully craft a new tagging field in the document so that it acts as the type field for the document. Any field can be used to define the "type" of the document when creating a type mapping.
ref: https://docs.couchbase.com/server/6.0/fts/fts-creating-indexes.html

e.g. the inserted type field could look something like this:

"indexable": "type1"
"indexable": "type2"

Then specify “indexable” as the document type identifier JSON field in the index definition, and define an explicit type mapping for both type1 and type2.
While defining a document type mapping, you can cherry-pick the fields to index and further tune the storage options of each indexed field based on your later query requirements.
ref: https://blog.couchbase.com/full-text-search-indexing-best-practices/

Now during indexing, only the documents whose “indexable” field has the value “type1” or “type2” are picked up (provided the default type mapping is disabled), and each one has its fields indexed according to the type mapping you defined for type1 or type2.

regards,
sreekanth
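To make the type-field approach a bit more concrete, here is a hedged sketch of what the relevant part of such an index definition might contain, built as a Python dict. The key names follow the index definition JSON as I understand it; please compare against a definition exported from your own cluster. The field names ("name", "city", "sku") and the helpers are hypothetical.

```python
import json
from typing import Any, Dict, List


def text_field(name: str) -> Dict[str, Any]:
    """Mapping for one explicitly indexed text field (not stored, to save space)."""
    return {
        "enabled": True,
        "dynamic": False,
        "fields": [{"name": name, "type": "text", "index": True, "store": False}],
    }


def type_mapping(field_names: List[str]) -> Dict[str, Any]:
    """Type mapping that indexes only the listed fields."""
    return {
        "enabled": True,
        "dynamic": False,  # ignore every field that is not listed below
        "properties": {n: text_field(n) for n in field_names},
    }


index_params = {
    # Use the "indexable" field of each document as its type.
    "doc_config": {"mode": "type_field", "type_field": "indexable"},
    "mapping": {
        # Documents without a matching type are skipped entirely.
        "default_mapping": {"enabled": False},
        "types": {
            "type1": type_mapping(["name", "city"]),
            "type2": type_mapping(["sku"]),
        },
    },
}
print(json.dumps(index_params, indent=2))
```

Documents that lack the “indexable” field, or carry a value other than type1 or type2, would then fall through to the disabled default mapping.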


Using N1QL+FTS via CURL is another option:

  1. Use Example 4 to produce the document keys.
  2. Use the result of step 1 as the LEFT side of a JOIN; on the right side, use a N1QL fetch (Example 3), filter on the right-side keyspace, and project the result.

The next release will make this simpler. cc @keshav_m

See some of the examples of how you can do this currently at:
https://blog.couchbase.com/developer-release-curl-n1ql/
https://blog.couchbase.com/curl-comes-n1ql-querying-external-json-data/

If your search requirements are SIMPLE, you can try using the TOKENS() function in N1QL and indexing its output.
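A rough sketch of the TOKENS() idea: an array index over the lower-cased tokens of a field, restricted by a WHERE clause so that only hotels with freeParking set to true are indexed. The helper names, the bucket name "hotels", and the field "name" are assumptions for illustration; the statements are only generated as strings here, not executed.

```python
# Options passed to TOKENS(): lower-case every token before indexing/comparing.
TOKEN_OPTS = '{"case": "lower"}'
# Predicate shared by the partial index and the query so the index can be used.
CONDITION = 'type = "hotel" AND freeParking = true'


def tokens_index_ddl(bucket: str, field: str, index_name: str) -> str:
    """CREATE INDEX statement for a partial array index over TOKENS(field)."""
    return (
        f"CREATE INDEX `{index_name}` ON `{bucket}` "
        f"(DISTINCT ARRAY t FOR t IN TOKENS(`{field}`, {TOKEN_OPTS}) END) "
        f"WHERE {CONDITION}"
    )


def tokens_query(bucket: str, field: str) -> str:
    """SELECT statement matching one token, supplied as the $term parameter."""
    return (
        f"SELECT META().id FROM `{bucket}` "
        f"WHERE {CONDITION} "
        f"AND ANY t IN TOKENS(`{field}`, {TOKEN_OPTS}) SATISFIES t = $term END"
    )


ddl = tokens_index_ddl("hotels", "name", "idx_hotel_name_tokens")
query = tokens_query("hotels", "name")
print(ddl)
print(query)
```

This only covers exact token matches; it does not give you stemming, fuzzy matching, or scoring the way FTS does.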

Would it theoretically be possible to specify the indexable fields as an array in the document?

That would allow us to have fine-grained control per document.

At a very high level, yes.
As I see it, this is just like indexing any array field that already exists in the document.
It's up to the user/document creator to decide what goes into the array field.

Ah ok. But just to clarify: I was talking about an array of field-names.

e.g.
DocA:
indexableFields : ['sku', 'price', 'description']

Would mean that only those 3 fields of the whole document should be put into the FTS.

DocB:
indexableFields : ['sku']

Means that only the sku should be indexed in DocB.

Were you referring to the same thing?

I wasn't sure whether you meant something different, e.g. putting all indexable content into a single array.
But I think that would have the disadvantage of duplicated fields and a larger document.

Hm… Actually, I didn't understand the rationale behind your original question about putting fields into an array.
I don't see a problem with getting all the required fields indexed directly, rather than going through this array-based custom approach.

Then what goes into the index really depends on the mapping you define.
If you have the right mapping, whatever is there in indexableFields: . should get indexed.

Sorry for the confusion. I am still wrapping my head around this. I think my question is a little bit like that too: FTS mapping strategy for dynamic fields

I guess in our case we basically need a lot of different type mappings then, because each customer can manage their own “schema” for their documents. Imagine that our customers can upload different Excel spreadsheets and each spreadsheet has different header columns. For each spreadsheet the customer should be able to define which header columns are FTS-indexable and configure this in our application. So every time this configuration is added or modified, we would need to add or update a type mapping. This is how I understand it. So there could be a very large number of type mappings: worst case, one type mapping per Excel file. And there will be thousands, and more.

  1. Is there a limit on the number of Type Mappings which can be created in an index?

  2. Is it possible to create / update / delete the type mappings with the Java SDK or the REST API? (This shows how to add an index.) How can I add more type mappings?

Ah… I see…
Having umpteen mappings isn't the solution there.

I think I now understand your original array question.
The control over what to index and what not to index is always specified through a type mapping. There is no way to specify it through any other medium or object within the document.

Irrespective of the user selections, you could keep copying those values into a searchable sub-object (indexable: {}) in your document and use dynamic mapping for that indexable sub-object. But there could be some duplication here.
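On the application side, the sub-object idea could be sketched as below: before writing a document, copy the fields the customer marked as indexable into an "indexable" sub-object, so a single dynamic mapping on that sub-object covers every customer schema. The function name with_indexable and the sample document are hypothetical.

```python
from typing import Any, Dict, Iterable


def with_indexable(doc: Dict[str, Any],
                   indexable_fields: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `doc` with the selected fields duplicated under "indexable"."""
    out = dict(doc)  # shallow copy; the original document is left untouched
    out["indexable"] = {k: doc[k] for k in indexable_fields if k in doc}
    return out


doc_a = {
    "sku": "A-1",
    "price": 9.5,
    "description": "short text",
    "longdescription_en": "a very long text that should stay out of the index",
}
stored = with_indexable(doc_a, ["sku", "price", "description"])
print(stored["indexable"])
```

The duplication mentioned above is visible here: every selected value is stored twice, once at the top level and once under "indexable".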


It's been a long time since I last visited this thread, but I just tried your conjunction suggestion and it works fine. One thing, though: what if I want to add a condition for when freeParking is missing from the document? For example, I want to make an FTS query that only matches documents where freeParking is false or freeParking is missing. Can you tell me how to do the "field is missing" part?

Hey @lenix, you would want to use a boolean query to perform searches for freeParking: false or when freeParking is missing. The must_not clause under it will do it for you.
Here’s an example …

{
	"query": {
		"must_not": {
			"disjuncts": [{
				"field": "free_parking",
				"bool": true
			}]
		}
	}
} 

So this will match all documents that do not have a “free_parking” field with the value true, i.e. documents where it is false or missing.
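The same request body can be generated for any boolean field. The sketch below also includes a small local predicate spelling out which documents the must_not clause is expected to let through; both function names are hypothetical, and the predicate only models the intended semantics rather than calling FTS.

```python
from typing import Any, Dict


def false_or_missing(field: str) -> Dict[str, Any]:
    """Request body that excludes documents where `field` is true."""
    return {
        "query": {
            "must_not": {
                "disjuncts": [{"field": field, "bool": True}]
            }
        }
    }


def expected_match(doc: Dict[str, Any], field: str) -> bool:
    """Model of the intended result: keep docs where `field` is false or absent."""
    return doc.get(field) is not True


body = false_or_missing("free_parking")
print(body)
print(expected_match({}, "free_parking"))                      # field missing
print(expected_match({"free_parking": False}, "free_parking"))
print(expected_match({"free_parking": True}, "free_parking"))
```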


@abhinav - I found this post seeking a solution highly similar to this one. I need to filter on objects in which a field “resolved”: false OR if “resolved” is missing.

When I run the query below, the results still return objects for which “resolved”: true. I want only documents in which “resolved” is missing OR (in very rare cases) “resolved”: false.

specs:
Nodejs SDK 2.6.9
Couchbase Server 6.04

{
    "query": {
        "must": {
            "conjuncts": [
                {
                    "start": "2020-03-25T04:00:00.000Z", 
                    "inclusive_start": true, 
                    "end": "2020-04-08T04:00:00.000Z", 
                    "inclusive_end": true, 
                    "field": "completed"
                }
            ]
        }, 
        "must_not": {
            "disjuncts": [
                {
                    "bool": true, 
                    "field": "resolved"
                }
            ]
        }
    }
}

Thanks JG

@abhinav - nevermind. As soon as I posted this I was reminded of “occam’s razor” and checked the index. I had failed to set the field data-type to boolean. I guess that helps, lol!

My apologies

JG
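For anyone hitting the same issue, a quick way to catch it is to check the declared type of the field in the index's type mapping. The sketch below assumes the mapping layout of the index definition JSON as I understand it (properties, then a "fields" list with a "type" entry); the helper name declared_type and the sample mappings are hypothetical.

```python
from typing import Any, Dict, Optional


def declared_type(type_mapping: Dict[str, Any], field_name: str) -> Optional[str]:
    """Return the type declared for `field_name`, or None if it is not mapped."""
    prop = type_mapping.get("properties", {}).get(field_name)
    if not prop:
        return None
    fields = prop.get("fields", [])
    # Report the first field entry; a property usually carries just one.
    return fields[0].get("type") if fields else None


# Hypothetical mappings: "resolved" indexed as text (the mistake) vs. boolean.
wrong = {"properties": {"resolved": {"fields": [{"name": "resolved", "type": "text", "index": True}]}}}
right = {"properties": {"resolved": {"fields": [{"name": "resolved", "type": "boolean", "index": True}]}}}

print(declared_type(wrong, "resolved"))
print(declared_type(right, "resolved"))
```

A "bool" clause in a query is only expected to match when the field's declared type is boolean.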