Copying of documents from Couchbase to ElasticSearch before indexing
I am studying how Couchbase interacts with ElasticSearch and I have a question. Is there a way to avoid a full copy of the documents from CB to ES before indexing? Can ES store only the indexes and not the entire documents? As it stands, every document ends up duplicated, which looks pretty sad.
By default in elasticsearch, the _source (the document you indexed) is stored. This means when you search, you can get the actual document source back. Moreover, elasticsearch will automatically extract fields / objects from the _source and return them if you explicitly ask for them (and may also use the _source in other components, like highlighting).
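As a rough sketch of asking for only some fields out of the _source, the snippet below builds a search request body as a plain Python dict. The field names "title" and "author" and the helper name are made up for illustration, and the "_source" filter key in the request body is assumed to be available in your version (older releases spell this differently).

```python
import json


def search_with_source_filter(query_text, wanted_fields):
    """Build a search body that returns only selected fields from _source."""
    return {
        # "title" is a hypothetical field name, not taken from the question.
        "query": {"match": {"title": query_text}},
        # Ask elasticsearch to extract just these fields from the stored _source.
        "_source": wanted_fields,
    }


body = search_with_source_filter("couchbase", ["title", "author"])
print(json.dumps(body, indent=2))
```

The dict can be sent as the JSON body of a search request with whatever client you use.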
You can also specify that a particular field is stored. This means that the data for that field will be stored "on its own": if you ask for "field1" (which is stored), elasticsearch will identify that it is stored and load it from the index instead of extracting it from the _source (assuming _source is enabled).
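A minimal sketch of what such a mapping and request might look like, again as plain Python dicts. This assumes the typeless mapping layout and the "store" / "stored_fields" spellings of recent releases; older versions spell these differently, so check the documentation for yours. The "body" field and its type are hypothetical.

```python
import json

# Hypothetical index mapping: "field1" is stored on its own, "body" is not.
mapping = {
    "mappings": {
        "properties": {
            "field1": {"type": "keyword", "store": True},
            "body": {"type": "text"},
        }
    }
}

# Search body asking for the stored field directly rather than the _source.
search_body = {
    "query": {"match_all": {}},
    "stored_fields": ["field1"],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```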
When do you want to enable storing specific fields? Most of the time, you don't. Fetching the _source is fast, and extracting fields from it is fast as well. If you have very large documents, where the cost of storing the _source or the cost of parsing it is high, you can explicitly map some fields to be stored instead.
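For the case in the question (keep the index in ES, keep the documents in Couchbase), one possible setup is to disable the _source and store only the few fields you need back from a search. The helper below is a sketch under that assumption; the function name, field names, and the "text" type are illustrative, and the mapping layout assumes a recent release.

```python
import json


def index_only_mapping(stored_fields, indexed_only_fields):
    """Build a mapping with _source disabled and only selected fields stored."""
    properties = {}
    for name in stored_fields:
        # Searchable and retrievable from the index.
        properties[name] = {"type": "text", "store": True}
    for name in indexed_only_fields:
        # Searchable, but the original value cannot be returned by ES.
        properties[name] = {"type": "text"}
    return {
        "mappings": {
            "_source": {"enabled": False},
            "properties": properties,
        }
    }


mapping = index_only_mapping(["title"], ["body"])
print(json.dumps(mapping, indent=2))
```

With a mapping like this, a search hit still carries the document ID, so the full document can be fetched from Couchbase by that ID. Be aware that disabling the _source also gives up features that need the original document, such as reindexing from ES, partial updates, and highlighting on non-stored fields.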
Note that there is a cost to retrieving each stored field. For example, if you have a JSON document with 10 reasonably sized fields, map all of them as stored, and ask for all of them, each one has to be loaded separately (more disk seeks), compared to just loading the _source (which is a single field, possibly compressed).