The query parameter debug=true can be used to
debug queries with stale=false that are not
returning all expected data or return unexpected data. This is
particularly useful when clients issue a
stale=false query right after being unblocked
by a memcached OBSERVE command. An example issue where this
happened is
MB-7161.
Here follows an example of how to debug this sort of issues on a
simple scenario where there's only 16 vbuckets (instead of 1024)
and 2 nodes. The tools couchdb_dump and
couchdb_info (from the couchstore git project)
are used to help analyze this type of issues (available under
install/bin directory).
Querying a view with debug=true will add an
extra field, named debug_info in the view
response. This field has one entry per node in the cluster (if no
errors happened, like down/timed out nodes for example). Example:
shell> curl -s 'http://localhost:9500/default/_design/test/_view/view1?stale=false&limit=5&debug=true' | json_xs { "debug_info" : { "local" : { "main_group" : { "passive_partitions" : [], "wanted_partitions" : [ 0, 1, 2, 3, 4, 5, 6, 7 ], "wanted_seqs" : { "0002" : 62500, "0001" : 62500, "0006" : 62500, "0005" : 62500, "0004" : 62500, "0000" : 62500, "0007" : 62500, "0003" : 62500 }, "indexable_seqs" : { "0002" : 62500, "0001" : 62500, "0006" : 62500, "0005" : 62500, "0004" : 62500, "0000" : 62500, "0007" : 62500, "0003" : 62500 }, "cleanup_partitions" : [], "stats" : { "update_history" : [ { "deleted_ids" : 0, "inserted_kvs" :00, "inserted_ids" :00, "deleted_kvs" : 0, "cleanup_kv_count" : 0, "blocked_time" : 0.000258, "indexing_time" : 103.222201 } ], "updater_cleanups" : 0, "compaction_history" : [], "full_updates" : 1, "accesses" : 1, "cleanups" : 0, "compactions" : 0, "partial_updates" : 0, "stopped_updates" : 0, "cleanup_history" : [], "update_errors" : 0, "cleanup_stops" : 0 }, "active_partitions" : [ 0, 1, 2, 3, 4, 5, 6, 7 ], "pending_transition" : null, "unindexeable_seqs" : {}, "replica_partitions" : [ 8, 9, 10, 11, 12, 13, 14, 15 ], "original_active_partitions" : [ 0, 1, 2, 3, 4, 5, 6, 7 ], "original_passive_partitions" : [], "replicas_on_transfer" : [] } }, "http://10.17.30.98:9501/_view_merge/" : { "main_group" : { "passive_partitions" : [], "wanted_partitions" : [ 8, 9, 10, 11, 12, 13, 14, 15 ], "wanted_seqs" : { "0008" : 62500, "0009" : 62500, "0011" : 62500, "0012" : 62500, "0015" : 62500, "0013" : 62500, "0014" : 62500, "0010" : 62500 }, "indexable_seqs" : { "0008" : 62500, "0009" : 62500, "0011" : 62500, "0012" : 62500, "0015" : 62500, "0013" : 62500, "0014" : 62500, "0010" : 62500 }, "cleanup_partitions" : [], "stats" : { "update_history" : [ { "deleted_ids" : 0, "inserted_kvs" :00, "inserted_ids" :00, "deleted_kvs" : 0, "cleanup_kv_count" : 0, "blocked_time" : 0.000356, "indexing_time" : 103.651148 } ], "updater_cleanups" : 0, "compaction_history" : [], "full_updates" : 1, "accesses" : 1, "cleanups" : 0, "compactions" : 0, "partial_updates" : 0, "stopped_updates" : 0, "cleanup_history" : [], "update_errors" : 0, "cleanup_stops" : 0 }, "active_partitions" : [ 8, 9, 10, 11, 12, 13, 14, 15 ], "pending_transition" : null, "unindexeable_seqs" : {}, "replica_partitions" : [ 0, 1, 2, 3, 4, 5, 6, 7 ], "original_active_partitions" : [ 8, 9, 10, 11, 12, 13, 14, 15 ], "original_passive_partitions" : [], "replicas_on_transfer" : [] } } }, "total_rows" : 1000000, "rows" : [ { "value" : { "ratio" : 1.8, "type" : "warrior", "category" : "orc" }, "id" : "0000014", "node" : "http://10.17.30.98:9501/_view_merge/", "partition" : 14, "key" : 1 }, { "value" : { "ratio" : 1.8, "type" : "warrior", "category" : "orc" }, "id" : "0000017", "node" : "local", "partition" : 1, "key" : 1 }, { "value" : { "ratio" : 1.8, "type" : "priest", "category" : "human" }, "id" : "0000053", "node" : "local", "partition" : 5, "key" : 1 }, { "value" : { "ratio" : 1.8, "type" : "priest", "category" : "orc" }, "id" : "0000095", "node" : "http://10.17.30.98:9501/_view_merge/", "partition" : 15, "key" : 1 }, { "value" : { "ratio" : 1.8, "type" : "warrior", "category" : "elf" }, "id" : "0000151", "node" : "local", "partition" : 7, "key" : 1 } ] }
For each node, there are 2 particular fields of interest when
debugging stale=false queries that apparently
miss some data:
wanted_seqs - This field has an object
(dictionary) value where keys are vbucket IDs and values are
vbucket database sequence numbers (see
Section D.6, “Data missing in query response or it's wrong (potentially due to server
issues)” for an
explanation of sequence numbers). This field tells us the
sequence number of each vbucket database file (at the
corresponding node) at the moment the query arrived at the
server (all these vbuckets are active
vbuckets).
indexable_seqs - This field has an object
(dictionary) value where keys are vbucket IDs and values are
vbucket database sequence numbers. This field tells us, for
each active vbucket database, up to which sequence the index
has processed/indexed documents (remember, each vbucket
database sequence number is associated with 1, and only 1,
document).
For queries with stale=false, all the sequences
in indexable_seqs must be greater or equal then
the sequences in wanted_seqs - otherwise the
stale=false option can be considered broken.
What happens behind the scenes is, at each node, when the query
request arrives, the value for wanted_seqs is
computed (by asking each active vbucket database for its current
sequence number), and if any sequence is greater than the
corresponding entry in indexable_seqs (stored
in the index), the client is blocked, the indexer is started to
update the index, the client is unblocked when the indexer
finishes updating the index, and finally the server starts
streaming rows to the client - note that at this point, all
sequences in indexable_seqs are necessarily
greater or equal then the corresponding sequences in
wanted_sequences, otherwise the
stale=false implementation is broken.