Feature Request: Filter document fields

Emre_Kumas · July 26, 2021, 7:48am

Hi,

we have a use case that requires transferring only some of a document’s fields to Elasticsearch (ES), not the whole document. For example, we have 30 fields in the document, but we want only 4 of them to be transferred because they’re the only ones we need. As far as we know, there is currently no way to do that.

After examining the connector’s source, we implemented this feature ourselves using TransformerChain. We add the field names to the .toml file, and only those fields are transferred to ES.

Example:

But this feature exists only in our modified copy of the current connector version. That means that whenever you release a new connector version, we can’t upgrade our project directly, because it may break without us noticing.

We’d like to know what you think about this request, because we have heard that many teams need this feature.

david.nault · July 27, 2021, 8:23pm

Hi Emre,

Adding new config options is a slippery slope, since there are so many possible ways to transform a document. The alternative we recommend is to use Elasticsearch ingest pipelines.

Ingest pipelines let you perform common transformations on your data before indexing. For example, you can use pipelines to remove fields, extract values from text, and enrich your data.
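As a rough illustration of the field-removal case, here is a minimal Python sketch that builds an ingest pipeline definition using the `remove` processor. The field names and the helper `build_remove_pipeline` are hypothetical, and the sketch assumes the unwanted fields are known in advance and sit at the top level of the indexed document; adjust the field paths if your connector configuration nests the document content.

```python
import json


def build_remove_pipeline(fields_to_drop):
    """Return an ingest pipeline body that drops the given fields."""
    return {
        "description": "Drop fields that are not needed in the index",
        "processors": [
            {
                "remove": {
                    # The remove processor accepts one field name or a list.
                    "field": list(fields_to_drop),
                    # Do not fail on documents that lack one of the fields.
                    "ignore_missing": True,
                }
            }
        ],
    }


# Hypothetical field names, for illustration only.
pipeline = build_remove_pipeline(["internalNotes", "auditTrail"])

# The JSON printed here is the kind of body sent to PUT _ingest/pipeline/<name>.
print(json.dumps(pipeline, indent=2))
```

Because this processor lists the fields to drop rather than the fields to keep, keeping 4 of 30 fields means listing the other 26.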

Arun Vijayraghavan wrote a blog article showing how it’s done: Using Elasticsearch Connector with Ingest Node Pipeline - The Couchbase Blog

Thanks,
David

Emre_Kumas · August 3, 2021, 12:19pm

Hi David,

thanks for your response. Your explanation sounds logical and doable to me. Using ingest pipelines can serve the same purpose.

I have just one question. If we use ingest pipelines, every document will be sent over the network from Couchbase to Elasticsearch with all of its fields. That surely won’t be a problem for a few thousand documents, but do you think it will cause performance degradation or added latency when working with millions of documents?

david.nault · August 3, 2021, 7:02pm

That’s a good insight. I would not expect performance or latency to degrade significantly. The network is typically not the bottleneck for writing to Elasticsearch; the work of indexing after the document is received is orders of magnitude slower. Of course, the only way to be sure is to measure.
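One quick way to start measuring is to compare the serialized size of a full document with the size of its filtered version. The Python sketch below is a back-of-the-envelope estimate, not a benchmark: the document shape (30 short string fields), the four kept field names, and the helper names are all made up, and it ignores compression, bulk request overhead, and indexing cost.

```python
import json


def json_size_bytes(doc):
    """Size of the document as compact UTF-8 JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def keep_fields(doc, allowed):
    """Return a copy of doc containing only the allowed top-level fields."""
    return {k: v for k, v in doc.items() if k in allowed}


# Made-up document: 30 fields with short string values.
full_doc = {f"field{i}": f"value-{i}" for i in range(30)}
allowed = {"field0", "field1", "field2", "field3"}
filtered_doc = keep_fields(full_doc, allowed)

full_size = json_size_bytes(full_doc)
filtered_size = json_size_bytes(filtered_doc)

# Bytes saved per document, scaled to one million documents, shown in MB.
saved_mb_per_million = (full_size - filtered_size) * 1_000_000 / 1e6

print(f"full: {full_size} B, filtered: {filtered_size} B")
print(f"saved per million documents: {saved_mb_per_million:.1f} MB")
```

Running the same comparison on a sample of real documents gives a more honest figure to weigh against measured indexing throughput.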

Another alternative, in case your bandwidth is metered and you’re paying by the byte, would be to use a Couchbase Eventing Function to filter the documents before they even reach the connector. The function could write the filtered documents to a separate bucket.

Thanks,
David
