Filter and route records to Elasticsearch based on document field values

Hey Team,

I was going through the Elasticsearch (ES) connector 4.2 documentation. It describes a few use cases for filtering records and sending them to ES, but most of them are based on the document ID.

For instance, I have the following record:

{
  "field1": "foo",
  "field2": "bar"
}

I want to create an ES type mapping that filters records based on the values of field1 and field2. Something like this:

if (field1 == "foo" && field2 == "bar") ==> send the record to the ES index "index_1"

Alternatively, can we create an ES type mapping where I just supply a regex that dynamically computes the index name from the combination of field1 and field2? The record above should then go to the index "index_foo_bar".

Just to clarify, I want to avoid creating document IDs based on the combination of these fields.

Hi Parkash,

Welcome! Short answer is that deriving the index from the contents of the document is not supported.

It’s tricky because when a document is deleted, the connector does not know the contents of the document (it just knows the ID). If the index were derived from the document contents, the connector would not know which index to delete the document from.

Tracking this feature request as CBES-146 in case we find a way to do this in the future.

Thanks,
David

Ah, I see. Thanks for the quick response! It would be great if you could accommodate this request in an upcoming release. For what it's worth, maybe the connector could search all the indexes in ES and delete the document wherever it is found. Of course, this might have performance implications, but it would give users flexibility. The connector could use this option only as a fallback, when it cannot find a mapping between the document ID and an ES index.

For the time being, suppose I create document IDs as a combination of field1 and field2 plus some unique value such as a UUID, e.g. id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125. How can I create a dynamic mapping so that any document ID starting with "id_foo_bar" goes to the ES index "index_foo_bar"?

id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125 ==> "index_foo_bar"
id_abc_xyz_137c2d52-2184-11ea-978f-2e728ce88125 ==> "index_abc_xyz"

Also, would these longer, text-based document IDs have any performance impact on Couchbase or the Couchbase ES connector?

Can someone please reply to the question above?

Hi Parkash,

You can do something similar to that, but the current limitation is that the index name must be present in the Couchbase document ID so that it can be extracted by a capturing group in a regular expression. There’s an example of this in the sample config included with the connector:

[[elasticsearch.type]]
  # Index can be inferred from document ID by including a capturing group
  # named "index". This example matches IDs that start with one or more
  # characters followed by "::". It directs "user::alice" to index "user",
  # and "foo::bar::123" to index "foo".
  regex = '(?<index>.+?)::.*'
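To see what the "index" capturing group would extract, here is a minimal standalone Java sketch. It is not connector code. It assumes the regex has to match the whole document ID, which is a guess based on the trailing ".*" in the sample. The class name IndexFromDocId and the second pattern (for IDs shaped like id_<field1>_<field2>_<uuid>) are hypothetical. Note that the captured text itself becomes the index name, so such an ID would map to "foo_bar" rather than "index_foo_bar".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexFromDocId {
    // Regex copied from the connector's sample config.
    static final Pattern SAMPLE = Pattern.compile("(?<index>.+?)::.*");

    // Hypothetical pattern for IDs like id_<field1>_<field2>_<uuid>.
    // Everything between the "id_" prefix and the last "_" is captured.
    static final Pattern FIELDS = Pattern.compile("id_(?<index>.+)_[^_]+");

    // Returns the text captured by the group named "index",
    // or empty when the pattern does not match the whole ID.
    static Optional<String> indexFor(Pattern pattern, String documentId) {
        Matcher m = pattern.matcher(documentId);
        return m.matches() ? Optional.of(m.group("index")) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] ids = {
            "user::alice",
            "foo::bar::123",
            "id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125",
            "id_abc_xyz_137c2d52-2184-11ea-978f-2e728ce88125"
        };
        for (String id : ids) {
            // Try the sample pattern first, then the hypothetical one.
            String index = indexFor(SAMPLE, id)
                    .or(() -> indexFor(FIELDS, id))
                    .orElse("<no match>");
            System.out.println(id + " ==> " + index);
        }
    }
}
```

If the target index really must be called "index_foo_bar", that exact text would have to appear in the document ID so the group can capture it.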

It’s quite common to include UUIDs in document IDs. Couchbase supports IDs up to 250 bytes, and your examples are well under that limit. As to how much it affects performance, my expectation is that it would be hard to detect the difference, but that’s just a guess.
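As a quick sanity check on the size limit, the sketch below measures an ID's length in UTF-8 bytes and compares it with the 250-byte figure mentioned above. The class and method names are made up for illustration, and it assumes the limit applies to the UTF-8 encoding of the ID.

```java
import java.nio.charset.StandardCharsets;

class DocIdLength {
    // Limit quoted in this thread; assumed to apply to the UTF-8 encoded ID.
    static final int MAX_ID_BYTES = 250;

    // Length of the document ID in UTF-8 bytes (not in characters).
    static int utf8Length(String documentId) {
        return documentId.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean fits(String documentId) {
        return utf8Length(documentId) <= MAX_ID_BYTES;
    }

    public static void main(String[] args) {
        String id = "id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125";
        // 11 bytes of prefix plus a 36-byte UUID: 47 bytes in total.
        System.out.println(utf8Length(id) + " bytes, fits: " + fits(id));
    }
}
```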

Thanks,
David