Imagine you have the following design document:
    {
      "meta": {"id": "_design/test"},
      "views": {
        "view1": {
          "map": "function(doc, meta) { emit(meta.id, doc.value); }"
        }
      }
    }
Suppose the bucket contains only two documents: doc1 with JSON value {"value": 1} and doc2 with JSON value {"value": 2}. You first query the view with stale=false and include_docs=true and get:
    shell> curl -s 'http://localhost:9500/default/_design/test/_view/view1?include_docs=true&stale=false' | json_xs
    {
       "total_rows" : 2,
       "rows" : [
          {
             "value" : 1,
             "doc" : {
                "json" : {
                   "value" : 1
                },
                "meta" : {
                   "flags" : 0,
                   "expiration" : 0,
                   "rev" : "1-000000367916708a0000000000000000",
                   "id" : "doc1"
                }
             },
             "id" : "doc1",
             "key" : "doc1"
          },
          {
             "value" : 2,
             "doc" : {
                "json" : {
                   "value" : 2
                },
                "meta" : {
                   "flags" : 0,
                   "expiration" : 0,
                   "rev" : "1-00000037b8a32e420000000000000000",
                   "id" : "doc2"
                }
             },
             "id" : "doc2",
             "key" : "doc2"
          }
       ]
    }
Later on you update both documents, so that doc1 has the JSON value {"value": 111111} and doc2 has the JSON value {"value": 222222}. You then query the view with stale=update_after (the default) or stale=ok and get:
    shell> curl -s 'http://localhost:9500/default/_design/test/_view/view1?include_docs=true' | json_xs
    {
       "total_rows" : 2,
       "rows" : [
          {
             "value" : 1,
             "doc" : {
                "json" : {
                   "value" : 111111
                },
                "meta" : {
                   "flags" : 0,
                   "expiration" : 0,
                   "rev" : "2-0000006657aeed6e0000000000000000",
                   "id" : "doc1"
                }
             },
             "id" : "doc1",
             "key" : "doc1"
          },
          {
             "value" : 2,
             "doc" : {
                "json" : {
                   "value" : 222222
                },
                "meta" : {
                   "flags" : 0,
                   "expiration" : 0,
                   "rev" : "2-00000067e3ee42620000000000000000",
                   "id" : "doc2"
                }
             },
             "id" : "doc2",
             "key" : "doc2"
          }
       ]
    }
The document included in each row no longer matches that row's value field: the included documents are the latest (updated) versions, but the index row values still reflect the previous (first) versions of the documents.
Why this behaviour? With include_docs=true, the server fetches the latest revision of each document from disk at query time, once for each row. There is no way to include a previous revision of a document. Previous revisions are not accessible through the latest MVCC snapshots of the vbucket databases (http://en.wikipedia.org/wiki/Multiversion_concurrency_control), and there is no efficient way to find which earlier MVCC snapshot of a vbucket database holds a specific revision of a document. Further, vbucket database compaction removes all previous MVCC snapshots (document revisions). In short, this is a deliberate design limitation of the database engine.
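The mismatch can be reproduced with a toy in-memory model. The sketch below is only an illustration under simplifying assumptions: store, buildIndex and query are hypothetical names, not Couchbase APIs, and a plain Map stands in for the bucket. It shows row values frozen at index-build time while the attached documents are read at query time.

```javascript
// Toy model only: a Map stands in for the bucket (hypothetical, not a Couchbase API).
const store = new Map([
  ["doc1", { value: 1 }],
  ["doc2", { value: 2 }],
]);

// Build index rows from the current documents, mirroring
// emit(meta.id, doc.value): each row value is copied at build time.
function buildIndex() {
  const index = [];
  for (const [id, doc] of store) {
    index.push({ id: id, key: id, value: doc.value });
  }
  return index;
}

// Answer a query from an existing index. With includeDocs set, the
// latest document is looked up in the store at query time.
function query(index, includeDocs) {
  return index.map((row) =>
    includeDocs ? { ...row, doc: { json: store.get(row.id) } } : { ...row }
  );
}

// The index is built once, as with the initial stale=false query.
const index = buildIndex();

// Both documents are then updated, but the index is not rebuilt.
store.set("doc1", { value: 111111 });
store.set("doc2", { value: 222222 });

// Querying the old index resembles stale=ok: old row values, new documents.
const staleRows = query(index, true);
console.log(JSON.stringify(staleRows, null, 2));
```

In this model each row's value still holds the number copied at build time, while doc.json holds the updated document, which is the same shape of inconsistency as in the second query above.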
The only way to ensure full consistency here is to include the documents themselves in the values emitted by the map function. Queries with stale=false are not 100% reliable either: just after the index is updated, while rows are being streamed from disk to the client, document updates and deletes can still happen, resulting in the same behaviour as in the example above.
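As a minimal sketch of the advice to emit the documents themselves, the map function below passes the whole doc as the row value instead of doc.value. The design document keeps the same shape as the example at the top of this section. The local emit function and the rows array are hypothetical stand-ins so the sketch can run outside the server; on the server, emit is provided by the view engine.

```javascript
// Hypothetical stand-in for the server-provided emit(); it collects
// rows in an array so the map function can be exercised locally.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function that emits the entire document as the row value, so the
// value and the document content are captured together at index time.
const map = function (doc, meta) { emit(meta.id, doc); };

// Design document with the map function stored as a string,
// in the same shape as the example design document.
const designDoc = {
  meta: { id: "_design/test" },
  views: { view1: { map: map.toString() } },
};

// Exercise the map function locally on one sample document.
map({ value: 1 }, { id: "doc1" });
console.log(JSON.stringify(designDoc));
console.log(JSON.stringify(rows));
```

With this approach include_docs=true is no longer needed, since each row value already carries the document as it was when indexed. The trade-off is a larger index, because every document is stored again inside the view.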