Project Left Ranger

Version 5 by ingenthr
on Apr 07, 2014 12:13.

compared with
Current by ingenthr
on Apr 18, 2014 12:46.

h2. {anchor:h.qyavz4xvmu66}Date of this Document:

11 March 2014
18 April 2014

h2. {anchor:h.emp6x568jpk4}Proposal Status:
This project would add a subset of the operations described on the RangeOps page: get, getq, delete and deleteq. It also proposes the same approach for the Couchbase extended operations touch and touchq. Finally, it proposes a new statistic, whose interface should be considered volatile, to be used alongside these range operations to report statistics for a given range based on what is on disk. As described on the RangeOps page, these range operations would use the [{color:#1155cc}{+}EXTRAS space in the memcached binary protocol{+}{color}|].
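To make "use the EXTRAS space" concrete, here is a minimal sketch of packing one range request into a memcached binary-protocol frame. The 24-byte header layout is the standard binary-protocol request header. Everything else is an assumption for illustration only: the opcode value `OP_RANGE_GET`, the helper `pack_range_request`, and the body layout (start key in the key field, end key in the value field, a 4-byte flags word plus a 4-byte result limit in EXTRAS) are hypothetical and not taken from the RangeOps draft.

```python
import struct

# Standard memcached binary-protocol request header (24 bytes, big-endian):
# magic, opcode, key length, extras length, data type, vbucket id,
# total body length, opaque, CAS.
HEADER_FMT = ">BBHBBHIIQ"
REQ_MAGIC = 0x80

# HYPOTHETICAL placeholder opcode for a range get; not an assigned value.
OP_RANGE_GET = 0x30


def pack_range_request(opcode, start_key, end_key, vbucket=0, opaque=0,
                       flags=0, max_results=0):
    """Build one request frame for a range operation (hypothetical layout).

    EXTRAS: 4-byte flags + 4-byte result limit (assumed).
    Key field: start of the range. Value field: end of the range (assumed).
    """
    extras = struct.pack(">II", flags, max_results)
    # Body order is fixed by the protocol: extras, then key, then value.
    body = extras + start_key + end_key
    header = struct.pack(HEADER_FMT, REQ_MAGIC, opcode, len(start_key),
                         len(extras), 0, vbucket, len(body), opaque, 0)
    return header + body
```

Because the opaque is echoed back in every response, a client can use it to tie the stream of results to this one request.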
Unlike the recommendations in the RangeOps document, this project SHOULD NOT implement the ASCII protocol equivalents, because Couchbase's direct protocol interface is binary only.
One key difference between these range operations and other query features, such as views or the upcoming N1QL queries, is that range operations run concurrently, return results that are neither ordered nor collated, and operate only on the key. This lets them address a different set of use cases.

h2. {anchor:h.pv8muelo6wul}Risks and Assumptions:
* stats: create a new stat which takes the string argument "range_disk_usage" and a range in the EXTRAS field.
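The proposed stat could be requested with the existing binary-protocol STAT opcode (0x10), passing "range_disk_usage" as the key and the range in EXTRAS. The sketch below assumes a HYPOTHETICAL EXTRAS layout (2-byte start-key length, then the start key, then the end key) and a hypothetical helper name `pack_range_stat`. Since the header's extras-length field is one byte, the range must fit in 255 bytes under this layout.

```python
import struct

HEADER_FMT = ">BBHBBHIIQ"  # standard 24-byte binary-protocol request header
REQ_MAGIC = 0x80
OP_STAT = 0x10             # existing binary-protocol STAT opcode


def pack_range_stat(start_key, end_key, opaque=0):
    """Request "range_disk_usage" for [start_key, end_key] (hypothetical layout)."""
    # Assumed EXTRAS layout: 2-byte start-key length, start key, end key.
    extras = struct.pack(">H", len(start_key)) + start_key + end_key
    if len(extras) > 255:
        raise ValueError("range does not fit in the 1-byte extras length")
    key = b"range_disk_usage"
    body = extras + key  # extras first, then key; STAT carries no value
    header = struct.pack(HEADER_FMT, REQ_MAGIC, OP_STAT, len(key),
                         len(extras), 0, 0, len(body), opaque, 0)
    return header + body
```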

To handle vbucket movement, a range operation would apply either to all vbuckets on a given node or to individually targeted vbuckets. The client application is responsible for deciding which vbuckets to target, and this proposal leaves undefined the behavior when a vbucket leaves a given node.
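A client deciding which vbuckets to target might group vbuckets by owning node and then either send everything a node owns or filter to a requested subset. This sketch assumes a simplified, HYPOTHETICAL vbucket map: a list where entry `i` is the index of the node that owns active vbucket `i`. Replicas are ignored, and the function names are illustrative.

```python
from collections import defaultdict


def vbuckets_by_node(vbucket_map):
    """Group active vbucket ids by owning node.

    vbucket_map[i] is the node index owning vbucket i (simplified, assumed form).
    """
    grouped = defaultdict(list)
    for vbid, node in enumerate(vbucket_map):
        grouped[node].append(vbid)
    return dict(grouped)


def targets_for(node, vbucket_map, wanted=None):
    """Vbuckets a range op sent to `node` should cover.

    wanted=None means "all vbuckets on this node"; otherwise only the
    requested vbuckets that this node currently owns.
    """
    owned = vbuckets_by_node(vbucket_map).get(node, [])
    if wanted is None:
        return owned
    wanted_set = set(wanted)
    return [vb for vb in owned if vb in wanted_set]
```

If the map changes while an operation is in flight, this client-side view goes stale; the proposal deliberately leaves that case undefined.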

For get operations, ep-engine would also use existing mechanisms to include incoming operations in the query result, and the query would not be considered complete until both the disk query and any changes in memory have been considered. Then, as suggested in the [{color:#1155cc}{+}RangeOps draft{+}{color}|], an empty response carrying the request's opaque would be sent to signal completion. One important extension to the RangeOps draft is that it must be possible to specify a single vbucket or a set of vbuckets. Also important, a flag will indicate whether the range query continues to run until explicitly terminated or terminates automatically. Note that, owing to [{color:#1155cc}{+}MB-10291{+}{color}|] "cbmcd connections cannot be efficiently used since operations are never interleaved", these features will probably be best used from dedicated connections.
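On the client side, completion detection could look like the following sketch: consume decoded responses, keep those whose opaque matches the request, and stop at the empty response. Wire decoding is omitted; the `(opaque, key, value)` tuple shape and the function name are assumptions for illustration. Results are yielded in arrival order, since range operations are neither ordered nor collated.

```python
from typing import Iterable, Iterator, Tuple

# A decoded response: (opaque, key, value). Decoding from bytes is omitted.
Response = Tuple[int, bytes, bytes]


def range_results(responses: Iterable[Response],
                  opaque: int) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs for one range request, in arrival order,
    until the empty response carrying the same opaque marks completion."""
    for op, key, value in responses:
        if op != opaque:
            continue  # belongs to another request on this connection
        if not key and not value:
            return  # empty response: this range request is complete
        yield key, value
```

For a query flagged to run until terminated, no empty response arrives, so the generator simply keeps yielding for as long as the (ideally dedicated) connection delivers results.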
Longer term, it may be appropriate to reconsider the primary index of items.

Another simple idea is to leverage the existing hashtable walker in situations where we know all metadata is in memory. This would not be O(log n), but it would be an O\(n) traversal of an in-memory data structure. That in-memory data structure could be kept ordered to further reduce runtime complexity, possibly at the cost of space.
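The trade-off can be illustrated in a few lines. Walking an unordered hash table must visit every entry (O\(n)), while a sorted copy of the keys answers a range lookup with two binary searches plus the matches (O(log n + k)), at the cost of storing a second copy of the keys. The half-open range `[start, end)` and the `OrderedKeys` name are assumptions for illustration; this sorted snapshot does not track later mutations, which a real in-memory index would have to handle.

```python
import bisect


def range_by_walk(table, start, end):
    """O(n): visit every hash-table entry, keep keys in [start, end)."""
    return [k for k in table if start <= k < end]


class OrderedKeys:
    """Sorted snapshot of the keys: O(log n + k) range lookups, extra space."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def range(self, start, end):
        lo = bisect.bisect_left(self._keys, start)  # first key >= start
        hi = bisect.bisect_left(self._keys, end)    # first key >= end
        return self._keys[lo:hi]
```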

h2. {anchor:h.2ln1yyxbjdfh}Issues/Issues to be Opened:
{color:#333333}Added question on 2014-03-26 by Michael.{color}
Updated a performance consideration section on 2014-04-07. Also added this to the open questions.
Updated on 2014-04-18. Added a note about concurrency/ordering to the summary. Added a section about vbucket handling. Added a note about optimizing the walk of memory.