Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.1
Fix Version/s: None
Component/s: documentation
Security Level: Public
Labels: None
Description
Hi Karen,
Here are 2 sections to incorporate into the manual somewhere. Feel free to edit the text as necessary, and if any of this doesn't make sense let me know.
marty
Understanding the Couchbase plugin for Elasticsearch Performance in Practice
----------------------------------------------------------------------------
The Couchbase plugin for Elasticsearch uses XDCR for the transport of data. One of the
most important parameters controlling the performance of XDCR is "xdcrMaxConcurrentReps".
This value represents the maximum number of replication operations that will take place
concurrently from each node in the Couchbase cluster and it defaults to 32.
In practice, this means that if I'm replicating from a 5-node Couchbase cluster to a
1-node Elasticsearch cluster, I may have up to 160 (5 x 32) concurrent replications
targeting a single Elasticsearch node. Each replication may require multiple TCP
connections, and this can end up overwhelming the Elasticsearch node.
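The arithmetic above can be sketched as a small helper. This is only an illustration:
the function names are hypothetical, and the even spread across Elasticsearch nodes in
the second function is an assumption, not something XDCR guarantees.

```python
def max_concurrent_replications(couchbase_nodes: int, max_concurrent_reps: int = 32) -> int:
    """Upper bound on replications running at once across the whole Couchbase cluster.

    max_concurrent_reps mirrors "xdcrMaxConcurrentReps", which applies per node
    and defaults to 32.
    """
    if couchbase_nodes < 1 or max_concurrent_reps < 1:
        raise ValueError("node count and xdcrMaxConcurrentReps must be positive")
    return couchbase_nodes * max_concurrent_reps


def replications_per_es_node(couchbase_nodes: int, es_nodes: int,
                             max_concurrent_reps: int = 32) -> float:
    """Rough per-node load, ASSUMING replications spread evenly over Elasticsearch nodes."""
    if es_nodes < 1:
        raise ValueError("es_nodes must be positive")
    return max_concurrent_replications(couchbase_nodes, max_concurrent_reps) / es_nodes


if __name__ == "__main__":
    # 5 Couchbase nodes at the default setting, 1 Elasticsearch node.
    print(max_concurrent_replications(5))
    print(replications_per_es_node(5, 1))
```

Each of these replications may itself open several TCP connections, so the number of
sockets the Elasticsearch node sees can be higher than this figure.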
Once an Elasticsearch node is overwhelmed, a variety of errors may occur. Some of them
are:

    Error replicating vbucket 7:
    {badmatch, {error,all_nodes_failed,
    <<"Failed to grab remote bucket info from any of known nodes">>}}

    Error replicating vbucket 7:
    {error,{error,timeout}}}

These errors occur because Couchbase is unable to communicate with Elasticsearch in a
reasonable amount of time. XDCR can recover from these types of errors, but your
replication may take longer to complete, or operate with higher latency because these
operations must be retried at a later time.
In circumstances such as this, it may help to lower the "xdcrMaxConcurrentReps" value so
that the total number of concurrent replications for the whole cluster (number of
Couchbase nodes x "xdcrMaxConcurrentReps") is one the Elasticsearch cluster can handle.
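As a rough sketch of how the setting might be lowered from a script: the code below only
builds the HTTP request and does not send it. It assumes the Couchbase Server 2.x
"/internalSettings" REST endpoint on admin port 8091; the host name, credentials and the
value 8 are placeholders, so check the endpoint against the documentation for your
server version before relying on it.

```python
import base64
import urllib.parse
import urllib.request


def build_set_max_concurrent_reps(host: str, value: int, user: str, password: str,
                                  port: int = 8091) -> urllib.request.Request:
    """Build (but do not send) a POST that sets xdcrMaxConcurrentReps.

    ASSUMPTION: the cluster exposes /internalSettings on the admin port,
    as in Couchbase Server 2.x.
    """
    if value < 1:
        raise ValueError("xdcrMaxConcurrentReps must be positive")
    body = urllib.parse.urlencode({"xdcrMaxConcurrentReps": value}).encode("ascii")
    req = urllib.request.Request(
        f"http://{host}:{port}/internalSettings", data=body, method="POST"
    )
    # HTTP Basic authentication with the cluster administrator credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    # Placeholder host and credentials; sending is left to the caller, e.g.
    # urllib.request.urlopen(req).
    req = build_set_max_concurrent_reps("cb1.example.com", 8, "Administrator", "password")
    print(req.get_method(), req.full_url, req.data)
```

With 5 Couchbase nodes, a value of 8 would cap the cluster at 40 concurrent
replications instead of 160.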
Initial Elasticsearch Indexing of an Existing Couchbase Bucket
--------------------------------------------------------------
Often you have an existing Couchbase bucket with a large number of documents in
production. When you first start to index this data with Elasticsearch, a large
number of documents will be transferred in bulk. While this should work with the default
settings, there are some settings which can be tweaked in Elasticsearch to make this
initial indexing phase complete faster.
The "refresh_interval" setting in Elasticsearch controls how frequently newly indexed
items become available in search results. During a bulk load, we can disable index
refresh, trading off access to the newly indexed items in exchange for faster overall
indexing time, and then re-enable it once the load completes.
For full details about disabling and re-enabling index refresh, see this section of the
Elasticsearch guide:
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
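As a hedged illustration of what that guide describes, the sketch below builds the two
update-settings requests: one that disables refresh ("-1") before the bulk load and one
that restores it afterwards. The Elasticsearch URL, the index name "my_index" and the
restored value "1s" are assumptions for the example, and the requests are only
constructed, not sent. Because these APIs change between releases, treat the guide as
the authority.

```python
import json
import urllib.request


def build_refresh_interval_request(es_url: str, index: str,
                                   interval: str) -> urllib.request.Request:
    """Build (but do not send) a PUT to the index update-settings API.

    interval "-1" disables refresh; a value such as "1s" re-enables it.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    req = urllib.request.Request(
        f"{es_url.rstrip('/')}/{index}/_settings", data=body, method="PUT"
    )
    req.add_header("Content-Type", "application/json")
    return req


if __name__ == "__main__":
    # Hypothetical local node and index name.
    disable = build_refresh_interval_request("http://localhost:9200", "my_index", "-1")
    restore = build_refresh_interval_request("http://localhost:9200", "my_index", "1s")
    for req in (disable, restore):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # Send with urllib.request.urlopen(disable) before starting replication,
    # and urllib.request.urlopen(restore) once the initial load has finished.
```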
Well, I actually started to include the commands in the text as well, but there are also several other related options. I'm just not sure we want to reproduce all that text in our manual. Given how frequently their APIs change, I can only assume that the configuration settings and related documentation change as well. It seems to me that if our users want to support document expiration, they have to read this section in the Elasticsearch guide.
----------------
I also just remembered that we'll need a section documenting how document expiration works.
In short, users will need to manually enable the "_ttl" field in their Elasticsearch index mapping.
See: http://www.elasticsearch.org/guide/reference/mapping/ttl-field.html
If they don't do this, then we end up relying on XDCR passing the document deletes across the wire, and this will happen substantially later than the document's TTL expires.
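For reference, a minimal sketch of the mapping body this implies, as described on the ttl-field page linked above. The mapping type name "couchbaseDocument" and the helper name are assumptions made for illustration; substitute whatever type your index actually uses, and check the linked guide for the current syntax.

```python
import json


def ttl_mapping(doc_type: str = "couchbaseDocument", default_ttl=None) -> dict:
    """Return a mapping body that enables the "_ttl" field for doc_type.

    ASSUMPTION: "couchbaseDocument" is the mapping type in use.
    default_ttl is optional, e.g. "1d".
    """
    ttl = {"enabled": True}
    if default_ttl is not None:
        ttl["default"] = default_ttl
    return {doc_type: {"_ttl": ttl}}


if __name__ == "__main__":
    # The printed JSON would be sent as the body of a put-mapping request.
    print(json.dumps(ttl_mapping(), indent=2))
```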