Filter and route records to Elasticsearch based on document field values

perk61287 · December 17, 2019, 6:01pm

Hey Team,

I was going through the Elasticsearch (ES) connector 4.2 documentation. You have defined a few use cases for filtering records and sending them to ES, but most of them are based on the document ID.

For instance, I have the following record:

{
  "field1": "foo",
  "field2": "bar",
  …
}

I want to create an ES type mapping that filters records based on the values of field1 and field2, something like this:

if (field1 == "foo" && field2 == "bar") ==> send the record to the ES index "index_1"

Alternatively, can we create an ES type mapping with a regex that dynamically calculates the index name from the combination of field1 and field2, so that the above record goes to the index "index_foo_bar"?

Just to clarify, I want to avoid building the document ID from a combination of these fields.

david.nault · December 17, 2019, 8:55pm

Hi Parkash,

Welcome! Short answer is that deriving the index from the contents of the document is not supported.

It’s tricky because when a document is deleted, the connector does not know the contents of the document (it just knows the ID). If the index were derived from the document contents, the connector would not know which index to delete the document from.
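To make the deletion problem concrete, here is a minimal Java (16+) sketch. The ChangeEvent record and both routing methods are hypothetical illustrations, not connector code; the only assumption carried over from the explanation above is that a deletion event carries the document ID but no body.

```java
import java.util.Map;
import java.util.Optional;

class DeleteRoutingSketch {
    // Hypothetical change event: a mutation carries the document body,
    // a deletion carries only the ID (content is null).
    record ChangeEvent(String id, Map<String, String> content) {
        boolean isDeletion() {
            return content == null;
        }
    }

    // Content-based routing: needs field1 and field2, so it cannot
    // compute an index for a deletion.
    static Optional<String> indexFromContents(ChangeEvent event) {
        if (event.isDeletion()) {
            return Optional.empty(); // body is unavailable on delete
        }
        String f1 = event.content().get("field1");
        String f2 = event.content().get("field2");
        if (f1 == null || f2 == null) {
            return Optional.empty();
        }
        return Optional.of("index_" + f1 + "_" + f2);
    }

    // ID-based routing: the ID is present on every event, so this works
    // for deletions too. Here the index is whatever precedes "::".
    static Optional<String> indexFromId(ChangeEvent event) {
        int sep = event.id().indexOf("::");
        return sep > 0 ? Optional.of(event.id().substring(0, sep)) : Optional.empty();
    }

    public static void main(String[] args) {
        ChangeEvent upsert = new ChangeEvent("user::alice", Map.of("field1", "foo", "field2", "bar"));
        ChangeEvent delete = new ChangeEvent("user::alice", null);
        System.out.println(indexFromContents(upsert)); // Optional[index_foo_bar]
        System.out.println(indexFromContents(delete)); // Optional.empty
        System.out.println(indexFromId(delete));       // Optional[user]
    }
}
```

Because the ID survives the deletion, only an ID-based rule can tell the connector which index to delete from.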

Tracking this feature request as CBES-146 in case we find a way to do this in the future.

Thanks,
David

perk61287 · December 18, 2019, 11:13am

Ah, I see. Thanks for the quick response! It would be great if you could accommodate this request in an upcoming release. For whatever it's worth, maybe the connector could search all the indexes in ES and delete the document from whichever one holds it. Of course, this might have some performance implications, but it would give users the flexibility. The connector could invoke this option only when it can't find a mapping between the document ID and an ES index, as a kind of fallback.
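The proposed fallback could be expressed with Elasticsearch's delete-by-query API, which accepts an index pattern and an "ids" query. The following Java 11+ sketch only builds the request and never sends it; the base URL http://localhost:9200, the index_* pattern, and the class and method names are assumptions for illustration, and this is not something the connector does.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DeleteFallbackSketch {
    // Minimal JSON string escaping for the document ID.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Request body for delete-by-query, matching the document by ID.
    static String deleteBody(String docId) {
        return "{\"query\":{\"ids\":{\"values\":[" + jsonString(docId) + "]}}}";
    }

    // Builds (but does not send) a request that deletes the document
    // from every index matching indexPattern.
    static HttpRequest buildRequest(String esBaseUrl, String indexPattern, String docId) {
        return HttpRequest.newBuilder(URI.create(esBaseUrl + "/" + indexPattern + "/_delete_by_query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(deleteBody(docId)))
                .build();
    }

    public static void main(String[] args) {
        String id = "id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125";
        HttpRequest req = buildRequest("http://localhost:9200", "index_*", id);
        System.out.println(req.method() + " " + req.uri()); // POST http://localhost:9200/index_*/_delete_by_query
        System.out.println(deleteBody(id));
    }
}
```

Running a query against every matching index on each unmapped deletion is where the performance cost would come from.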

For the time being:

If I create document IDs from a combination of field1 and field2 plus a unique value such as a UUID (e.g., id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125), how can I create a dynamic mapping so that any document ID starting with "id_foo_bar" goes to the ES index "index_foo_bar"? For example:
id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125 ==> "index_foo_bar"
id_abc_xyz_137c2d52-2184-11ea-978f-2e728ce88125 ==> "index_abc_xyz"
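A minimal Java sketch of the ID scheme described here, assuming field values that contain no underscores. CompositeIdSketch and buildId are hypothetical names. Note that UUID.randomUUID() produces a random (version 4) UUID, whereas the example IDs appear to use time-based (version 1) UUIDs; either works as a unique suffix.

```java
import java.util.UUID;

class CompositeIdSketch {
    // Builds an ID of the form id_<field1>_<field2>_<uuid>.
    static String buildId(String field1, String field2) {
        return "id_" + field1 + "_" + field2 + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(buildId("foo", "bar")); // id_foo_bar_ followed by a random UUID
        System.out.println(buildId("abc", "xyz"));
    }
}
```

If field1 or field2 can contain underscores, an ID such as id_a_b_c_<uuid> becomes ambiguous, so a separator that never appears in the values (for example "::") is safer to parse later.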

Also, would these longer, text-based document IDs have any performance impact on Couchbase or on the Couchbase ES connector?

perk61287 · December 20, 2019, 5:29pm

Can someone please reply to the question above?

david.nault · December 20, 2019, 9:05pm

Hi Parkash,

You can do something similar to that, but the current limitation is that the index name must be present in the Couchbase document ID so that it can be extracted by a capturing group in a regular expression. There’s an example of this in the sample config included with the connector:

[[elasticsearch.type]]
  # Index can be inferred from document ID by including a capturing group
  # named "index". This example matches IDs that start with one or more
  # characters followed by "::". It directs "user::alice" to index "user",
  # and "foo::bar::123" to index "foo".
  regex = '(?<index>.+?)::.*'
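To check what that regex captures, it can be run through java.util.regex, which uses the same (?<name>...) named-group syntax. This sketch assumes the whole ID must match (the trailing .* in the sample suggests so, but that is an assumption about the connector); IndexFromIdSketch and indexFor are hypothetical names. The last case shows the limitation described above: with id_foo_bar_<uuid> IDs, a group can capture foo_bar but never index_foo_bar.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexFromIdSketch {
    // The regex from the sample config: "index" captures the text before the first "::".
    static final Pattern SAMPLE = Pattern.compile("(?<index>.+?)::.*");

    // Returns the "index" group if the whole ID matches the pattern.
    static Optional<String> indexFor(Pattern pattern, String docId) {
        Matcher m = pattern.matcher(docId);
        return m.matches() ? Optional.of(m.group("index")) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(indexFor(SAMPLE, "user::alice"));   // Optional[user]
        System.out.println(indexFor(SAMPLE, "foo::bar::123")); // Optional[foo]
        System.out.println(indexFor(SAMPLE, "no-separator"));  // Optional.empty

        // Hypothetical ID that contains the full index name before "::".
        System.out.println(indexFor(SAMPLE, "index_foo_bar::8e9df43a-2183-11ea-978f-2e728ce88125")); // Optional[index_foo_bar]

        // With "id_foo_bar_<uuid>" IDs, only "foo_bar" is available to capture.
        Pattern underscoreIds = Pattern.compile("id_(?<index>[^_]+_[^_]+)_.*");
        System.out.println(indexFor(underscoreIds, "id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125")); // Optional[foo_bar]
    }
}
```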

It’s quite common to include UUIDs in document IDs. Couchbase supports IDs up to 250 bytes, and your examples are well under that limit. As to how much it affects performance, my expectation is that it would be hard to detect the difference, but that’s just a guess.
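A quick Java sketch of checking an ID against the 250-byte limit, assuming IDs are measured as UTF-8 bytes (so a non-ASCII character counts for more than one byte). IdLengthSketch and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class IdLengthSketch {
    // Maximum Couchbase document ID length, in bytes.
    static final int MAX_ID_BYTES = 250;

    // Length in UTF-8 bytes, not in Java chars.
    static int idBytes(String docId) {
        return docId.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean fitsLimit(String docId) {
        return idBytes(docId) <= MAX_ID_BYTES;
    }

    public static void main(String[] args) {
        String id = "id_foo_bar_8e9df43a-2183-11ea-978f-2e728ce88125";
        System.out.println(idBytes(id) + " bytes, fits: " + fitsLimit(id)); // 47 bytes, fits: true
    }
}
```

Both example IDs in the question come to 47 bytes: an 11-byte prefix plus a 36-character UUID.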

Thanks,
David

Related topics

Topic	Category	Replies	Views	Activity
Couchbase Elastic Connector filter on different field than ID	Elasticsearch Connector	1	959	April 13, 2022
Feature request: Elasticsearch Connector should be able to filter documents	Elasticsearch Connector	10	2054	June 9, 2020
Couchbase elasticsearch plugin	Elasticsearch Connector	1	2321	October 20, 2015
Feature Request: Filter document fields	Elasticsearch Connector	3	1332	August 3, 2021
Elastic search replication and dynamic index	Other	0	2297	June 28, 2016