How does indexing the documents impact the other async operations in Couchbase?
Thu, 02/28/2013 - 16:25
Greetings,
I have a couple of questions regarding indexing in Couchbase buckets.
1) How is indexing related/proportional to the number of documents in the bucket?
2) Whenever the indexing happens (on a regular time interval or on stale=false when querying views), does Couchbase index ALL the documents in the bucket or only the documents that are recently added/updated/deleted?
Thanks for the input in advance!
Shivang
In Couchbase 2.0 the indexer and Couchbase engine are separate components. Let's go through two scenarios of how indexing would take place and I think that should give a good answer to your questions.
In the first scenario, let's say you just created a new Couchbase cluster and have not put any documents in the database yet, but you have already created a view. When you add your first document to Couchbase, that document must hit disk before it is eligible to be read by the indexer. When the document is written, it receives a unique sequence number. These sequence numbers start at 0 and increase monotonically each time a new or updated document is written to disk. The sequence number is also passed to the indexer, and this is how the indexer knows whether or not it has indexed the latest documents written to disk.
So in this example, when your cluster contained no documents but had a single view created, the indexer had sequence number 0 as the latest one it had indexed. If you called the view with stale=false at this point, it would return an empty result. When you add your first document and it is written to disk, the indexer is notified that the latest sequence number is 1. If you call the view with stale=false now, the indexer will index all of the items that it has not indexed yet, so in this case it will index the single document you added.
Indexing is incremental, so you only index the documents that have been added or updated since the last time the indexer ran.
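For reference, a stale=false view query like the one mentioned above can be sent as a plain HTTP GET to the view REST endpoint. The snippet below is only a sketch that builds such a URL. The host, bucket, design document and view names are made up for illustration, and it assumes the default view port 8092.

```python
from urllib.parse import urlencode


def view_query_url(host, bucket, design_doc, view, stale="false"):
    """Build a view query URL (hypothetical helper, not part of any SDK).

    stale="false" asks the indexer to catch up before results are returned.
    """
    params = urlencode({"stale": stale})
    # Assumption: views are served on the default port 8092.
    return f"http://{host}:8092/{bucket}/_design/{design_doc}/_view/{view}?{params}"


# All names below are placeholders for illustration.
url = view_query_url("localhost", "default", "example_ddoc", "example_view")
print(url)
```

Requesting that URL with any HTTP client would trigger the index update described above before the rows come back.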
Now for the second scenario, let's say you've been running your cluster for a while, you have thousands of documents, and you create a new view. In this case the indexer starts out at sequence number 0, but it knows that the latest sequence number is, say, 100,000. The indexer then has to process everything written up to sequence number 100,000 before the view is up to date.
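To make the two scenarios concrete, here is a toy model in Python. It is not how Couchbase is implemented; ToyStore and ToyIndexer are made-up names, and the "disk" is just a list whose position stands in for the sequence number. It only illustrates that each indexer run handles the entries written after the last sequence number it has seen.

```python
class ToyStore:
    """Stand-in for documents persisted to disk (illustrative only)."""

    def __init__(self):
        self.log = []  # the entry at position i has sequence number i + 1

    def write(self, doc_id, doc):
        """Persist a new or updated document and return its sequence number."""
        self.log.append((doc_id, doc))
        return len(self.log)

    @property
    def latest_seq(self):
        return len(self.log)

    def changes_since(self, seq):
        """Return only the entries written after sequence number seq."""
        return self.log[seq:]


class ToyIndexer:
    """Stand-in for a view indexer that remembers how far it has indexed."""

    def __init__(self):
        self.last_indexed_seq = 0
        self.index = {}

    def update(self, store):
        """Index everything not yet seen; return how many entries were processed."""
        pending = store.changes_since(self.last_indexed_seq)
        for doc_id, doc in pending:
            self.index[doc_id] = doc
        self.last_indexed_seq = store.latest_seq
        return len(pending)


# Scenario 1: the view exists before any documents are written.
store = ToyStore()
view = ToyIndexer()
first_run = view.update(store)            # nothing on disk yet
seq = store.write("doc1", {"name": "a"})  # first document reaches disk
second_run = view.update(store)           # only the new document is indexed

# Scenario 2: a new view is created on a bucket that already has data.
big = ToyStore()
for i in range(100_000):
    big.write(f"doc{i}", {"n": i})
new_view = ToyIndexer()
initial_build = new_view.update(big)      # has to catch up on everything
big.write("late", {"n": -1})
next_run = new_view.update(big)           # afterwards, only the one new write

print(first_run, second_run, initial_build, next_run)
```

The initial build of the new view is the only run that touches all 100,000 entries; every later run does work proportional to what changed since the previous one.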
So now to address your questions.
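1) Indexing is incremental. Couchbase only indexes documents that have not yet passed through a view's index. Updating an index over millions of documents is no more complex than updating an index over hundreds of documents, because the work depends on how many documents were written since the last run, not on the total number of documents in the bucket. Only the initial build of a new view has to go through everything already written.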
2) Only the documents added or updated since the indexer last ran are indexed, not all the documents in the bucket.