We have a use case that requires transferring only some of a document’s fields to Elasticsearch, not the whole document. For example, a document has 30 fields, but we want to transfer only the 4 we actually need. As far as we know, there is currently no way to do that.
After examining the source project, we implemented this feature ourselves using TransformerChain. We now add the field names to the .toml file, and only those fields are transferred to Elasticsearch.
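To illustrate the idea, here is a minimal sketch of allow-list field filtering. It is written in Python for brevity and is not our actual TransformerChain code; the function name `keep_fields` and the sample field names are hypothetical.

```python
from typing import Any, Dict, Iterable


def keep_fields(document: Dict[str, Any], allowed: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the document containing only the allowed top-level fields."""
    allowed_set = set(allowed)
    # Fields named in the allow-list but absent from the document are simply ignored.
    return {key: value for key, value in document.items() if key in allowed_set}


# Hypothetical document and allow-list, for illustration only.
doc = {"name": "Ada", "email": "ada@example.com", "age": 36, "notes": "internal"}
print(keep_fields(doc, ["name", "age"]))
```

This only handles top-level fields; nested paths would need extra logic.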
Thanks for your response. Your explanation sounds logical and doable to me. Using ingest pipelines can serve the same purpose.
I have only one question. If we use ingest pipelines, every document will be sent over the network from Couchbase to Elasticsearch with all of its fields. That surely won’t be a problem for a few thousand documents, but do you think it will degrade performance or add latency when working with millions of documents?
That’s a good insight. I would not expect performance or latency to degrade significantly. The network is typically not the bottleneck for writing to Elasticsearch; the work of indexing after the document is received is orders of magnitude slower. Of course, the only way to be sure is to measure.
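As a starting point for measuring, you could compare the serialized size of a full document against a filtered one. This is only a rough sketch: the sample document and field names are made up, and it counts JSON body bytes only, not indexing time or protocol overhead.

```python
import json
from typing import Any, Dict, Iterable


def payload_bytes(document: Dict[str, Any]) -> int:
    """Size in bytes of the document serialized as compact UTF-8 JSON."""
    return len(json.dumps(document, separators=(",", ":")).encode("utf-8"))


def bytes_saved(document: Dict[str, Any], kept_fields: Iterable[str]) -> int:
    """Bytes that would not be sent if only kept_fields were transferred."""
    kept = set(kept_fields)
    filtered = {key: value for key, value in document.items() if key in kept}
    return payload_bytes(document) - payload_bytes(filtered)


# Made-up document with 30 fields, of which 4 are kept.
sample = {f"field_{i}": "x" * 20 for i in range(30)}
kept = ["field_0", "field_1", "field_2", "field_3"]

full = payload_bytes(sample)
saved = bytes_saved(sample, kept)
print(f"full={full} bytes, saved={saved} bytes ({saved / full:.0%})")
```

Multiplying the per-document saving by your document count and mutation rate gives a rough bandwidth estimate to weigh against the measured indexing time.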
Another alternative, in case your bandwidth is metered and you’re paying by the byte, would be to use a Couchbase Eventing Function to filter the documents before they even reach the connector. The function could write the filtered documents to a separate bucket.