
h1. {anchor:h.5tau7ssmpeod}{*}Project Left Ranger*


h1. {anchor:h.oe6mwkdri1gb}{*}Introduction*


h2. {anchor:h.46vgkkxq0rgt}Project/Component Working Name:

Left Ranger Matches Again
\\
For pop culture reference, see: [{color:#1155cc}{+}https://www.youtube.com/watch?v=jqg2bY2VoCo{+}{color}|https://www.youtube.com/watch?v=jqg2bY2VoCo]\\
Since this proposal "matches again" any incoming changes, and is expected to be used with left-bounded ranges, the riff on 'Lone Ranger Rides Again' seemed appropriate, if a bit dated.

h2. {anchor:h.vhl0wuibeszs}Name of Document Author/Supplier:

Matt Ingenthron <matt@couchbase.com>

h2. {anchor:h.qyavz4xvmu66}Date of this Document:

18 April 2014

h2. {anchor:h.emp6x568jpk4}Proposal Status:

DRAFT

h2. {anchor:h.6jqp7tiphpkq}Name of Major Document Customer(s)/Consumer(s):

Developers using Couchbase, SDK Development Team.

h2. {anchor:h.3yj7s7ggxd2b}Communications:

[{color:#1155cc}{+}couchbase@googlegroups.com{+}{color}|mailto:couchbase@googlegroups.com]

h2. {anchor:h.ab97zlc7frwy}Project Summary:

This project will extend the system to allow for range operations. It will specifically add support for range queries which will also consider incoming changes that match that same range while the range query is being executed.

h2. {anchor:h.1na6lbyvgez6}Project Description:

Since before the creation of Couchbase, developers have long requested range queries over keys. Many extensions of memcached, both direct extensions like memcachedb and logical extensions like redis, support various ways of doing range queries. This project will bring range operations to Couchbase.
Support for range operations would be added following a recommendation similar to the [{color:#1155cc}{+}RangeOps document on the memcached wiki{+}{color}|https://code.google.com/p/memcached/wiki/RangeOps]. That document was originally authored with the same binary protocol in mind that has been used successfully in Couchbase Server and is part of our existing server and SDKs.
This project would add a subset of the operations described on the RangeOps page: get, getq, delete, deleteq. It proposes a similar approach for the Couchbase extended operations touch and touchq, as well as a new statistic, whose interface should be considered volatile, to be used in conjunction with these range operations to get statistics for a given range based on what is on disk. As described there, these RangeOps would use [{color:#1155cc}{+}EXTRAS space in the memcached binary protocol{+}{color}|https://code.google.com/p/memcached/wiki/MemcacheBinaryProtocol].
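As a rough illustration of how a range might be carried in the EXTRAS space of a binary protocol request, the sketch below frames a request packet. The header layout is the real 24-byte memcached binary protocol header; the opcode value and the EXTRAS layout (end-key length plus inclusion flags) are hypothetical placeholders, since the actual encoding would be defined by the RangeOps draft and this proposal:

```python
import struct

REQ_MAGIC = 0x80      # memcached binary protocol request magic
CMD_RGET = 0x30       # hypothetical opcode; real assignment is undefined here

def build_range_request(opcode, start_key, end_key, vbucket=0, opaque=0):
    # Hypothetical EXTRAS layout: 2-byte end-key length plus two
    # one-byte inclusion flags; the start key sits in the key field
    # and the end key is appended to the body.
    extras = struct.pack(">HBB", len(end_key), 1, 1)
    body = extras + start_key + end_key
    header = struct.pack(">BBHBBHIIQ",
                         REQ_MAGIC, opcode,
                         len(start_key),   # key length
                         len(extras), 0,   # extras length, datatype
                         vbucket,
                         len(body),        # total body length
                         opaque, 0)        # opaque, CAS
    return header + body

pkt = build_range_request(CMD_RGET, b"user::1000", b"user::1999")
```

The point of the sketch is only that the fixed-size header is untouched and the range descriptor rides in EXTRAS plus the body, which is what lets existing opcodes be extended rather than replaced.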
Unlike the recommendations on the RangeOps document, this project SHOULD NOT implement the ASCII protocol equivalents. Couchbase's direct protocol interface is binary only.
One key difference between range operations and other query features, such as views or upcoming features like N1QL queries, is that these range operations are concurrent, are not ordered or collated, and operate only on the key. This allows a different set of use cases to be addressed.

h2. {anchor:h.pv8muelo6wul}Risks and Assumptions:

This project assumes that the extensions would only apply to ep-engine's implementation of these protocol operations. The memcached "default" engine would not be extended to support these new commands.
The project also assumes that initial support would be needed in the Couchbase C SDK (a.k.a. libcouchbase). Support for other SDKs would be added as time allows in those SDKs' development lifecycles.
As mentioned above, it's also assumed that there will be no changes made to moxi and the commands will not be available over the ASCII protocol.

h1. {anchor:h.mrxs7cuhlaen}{*}Business Summary*


h2. {anchor:h.qdm2l8aodb92}Problem Area:

Couchbase has grown into more significant deployments with much larger sets of data. Further, time in the field has shown that to meet our end users' high performance needs, we frequently need to model the application on a simpler key-value approach rather than using more advanced features such as Couchbase Views.
One such area is the need to work with a set of related data. Users may currently model these applications with a specific key format that uses some sort of prefix or delimiter. Couchbase has encouraged this approach in the development of many applications. However, it is of limited use if there is no mechanism by which all documents/items sharing a given key prefix can be retrieved, updated or deleted.
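The prefix/delimiter modeling style maps naturally onto a key range. A minimal sketch of how an application (the key scheme here is made up for illustration) could derive the half-open range covering every key under a prefix:

```python
def prefix_range(prefix: bytes):
    # Derive a half-open key range [start, end) covering exactly the
    # keys that begin with `prefix`, by incrementing the final byte
    # to form the exclusive upper bound. (An empty end bound means
    # the range is unbounded above, e.g. for an all-0xFF prefix.)
    end = bytearray(prefix)
    while end and end[-1] == 0xFF:
        end.pop()
    if end:
        end[-1] += 1
    return bytes(prefix), bytes(end)

# Hypothetical key scheme: "user::<id>::<field>"
start, end = prefix_range(b"user::1000::")
# any key k with start <= k < end belongs to user 1000
```

With such a helper, "fetch, touch, or delete everything for this entity" becomes a single range operation instead of a view query.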
Current approaches call for the creation of a Couchbase View to have an index. While this is close to meeting the functional requirements, it has a few problems.
First, building views is very expensive. Even for a simple view ordered by the key/id alone, the view requires the execution of a JavaScript function for every mutation persisted, plus a set of separate files for the view data. It must also do this for replicas in order to properly support the view after a rebalance.
Second, views may not have the expected sort order. Developers generally expect results to be sorted in value order. Existing views are Unicode collated and may require padding portions of the first argument to the emit to get the desired behavior.
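A small illustration of the padding issue (plain string sorting here, standing in for key collation; the key scheme is invented for the example):

```python
# Keys with unpadded numeric parts sort lexicographically, not numerically
unpadded = sorted(f"item::{n}" for n in (9, 10, 2))
# -> ['item::10', 'item::2', 'item::9']

# Zero-padding the numeric portion makes byte order match numeric order
padded = sorted(f"item::{n:06d}" for n in (9, 10, 2))
# -> ['item::000002', 'item::000009', 'item::000010']
```

The same padding discipline applies whether the ordering comes from a view's emit or from a key-range scan.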
Third, views are very late with respect to when changes are made and have no provision for handling changes that may be in memory but are not yet persisted.
By providing for range operations which can work with concurrent modifications and not yet persisted data, we can address this area of common demand for features.

h2. {anchor:h.n53vl3ft8kdi}Market/Requester:

There is quite a diverse market for this.

h2. {anchor:h.dp8uasufdogi}How will you know when you are done?

The functional requirements will be implemented and tested by both Couchbase Server and libcouchbase. Regular feature testing from libcouchbase will demonstrate feature completeness.

h1. {anchor:h.c5eavaxi8r7d}{*}Technical Description*


h2. {anchor:h.oarvn6vvllmg}Details:

This project touches a few different components and internal interfaces between those components. The general outline for how it would be done was developed in a set of meetings on February 18 and 19, 2014 led by Matt Ingenthron. Attendees included Chiyoung Seo, Aaron Miller, Cihan Biyikoglu, Mark Azad, Alex Ma and Kirk Kirkconnell.
The components in play are couchstore, ep-engine and the client libraries. Most of the work would be between components and on private interfaces. Looking at this from a bottom up perspective, the project calls for...
Updating the couchstore project to allow for a way to query the ID b-tree and a way to query disk-level statistics reduced on the interior nodes of that b-tree. This would allow other components, most notably ep-engine, to query the files and get aggregate statistics. Currently, couchstore maintains three fields on the ID b-tree: "deletedfalsecount", "deletedtruecount" and "disksize". These would all be queryable by range. Note that these statistics exist primarily for determining file fragmentation and disk usage for stored, compressed documents, which makes them suitable for making approximations about a set of items in a range but does not give exact information on the range.
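Conceptually, such a statistics query is a reduction of per-item stats over the keys in a range. The toy sketch below uses a flat sorted list rather than a real b-tree (couchstore keeps these reductions on interior b-tree nodes precisely so a range can be answered without visiting every leaf); the sample keys and numbers are invented:

```python
from bisect import bisect_left, bisect_right

# Toy stand-in for the by-ID index: sorted (key, per-item stats) pairs
index = [
    (b"beer::01",    {"disksize": 120, "deleted": 0}),
    (b"beer::02",    {"disksize": 300, "deleted": 1}),
    (b"brewery::01", {"disksize": 90,  "deleted": 0}),
]
keys = [k for k, _ in index]

def range_disk_usage(start: bytes, end: bytes) -> int:
    # Inclusive range reduction over the "disksize" field; a real
    # b-tree would sum precomputed reductions on whole subtrees
    lo, hi = bisect_left(keys, start), bisect_right(keys, end)
    return sum(s["disksize"] for _, s in index[lo:hi])
```

The proposed "range_disk_usage" stat would behave like this reduction, with the caveat noted above that the on-disk numbers are approximations for the range, not exact per-item accounting.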
Updating ep-engine to extend existing opcodes:
* get, getq: extend to allow for embedding range requests in the EXTRAS field and extended to return additional metadata. This may be done with a different opcode since the metadata being returned is different from a traditional get.
* delete, deleteq: extend to allow for embedding range requests in the EXTRAS field.
* touch, touchq: extend to allow for embedding range requests in the EXTRAS field.
* stats: create a new stat which takes the string argument "range_disk_usage" and a range in the EXTRAS field.

To be able to handle the movement of vbuckets, the range operations would either apply to all vbuckets on a given node or to individually targeted vbuckets. It would be up to the client application to determine how to target vbuckets and this proposal leaves undefined the behavior when a vbucket leaves a given node.
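One reason the "all vbuckets on a node" option matters: Couchbase clients place keys by a CRC32-based hash, so lexicographically adjacent keys land on unrelated vbuckets and a key range generally touches most vbuckets in the cluster. A sketch of the mapping (simplified; the constants mirror the commonly used client mapping but should be checked against libvbucket before relying on them):

```python
import zlib

NUM_VBUCKETS = 1024

def vbucket_id(key: bytes, num_vbuckets: int = NUM_VBUCKETS) -> int:
    # CRC32-based key -> vbucket mapping in the style of Couchbase clients
    return ((zlib.crc32(key) >> 16) & 0x7FFF) % num_vbuckets

# Sequential keys scatter widely, so a contiguous key range implies
# querying many vbuckets, and therefore every node in the cluster
keys = [f"user::{n:04d}".encode() for n in range(100)]
touched = {vbucket_id(k) for k in keys}
```

This is why per-node fan-out, rather than per-key routing, is the natural client-side strategy for range operations.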

For get operations, ep-engine would also use existing mechanisms to include incoming operations in the result of the query; the query would not be considered complete until both the disk query and any changes in memory have been considered. Then, as suggested in the [{color:#1155cc}{+}RangeOps draft{+}{color}|https://code.google.com/p/memcached/wiki/RangeOps], an empty response would be sent with an opaque that is associated with the request. One important extension to the RangeOps draft is that it must be possible to specify a single vbucket or a set of vbuckets. Also important, there will be a flag to indicate whether the range query would continue to run until terminated, or terminate automatically. Note that owing to [{color:#1155cc}{+}MB-10291{+}{color}|https://www.couchbase.com/issues/browse/MB-10291] ("cbmcd connections cannot be efficiently used since operations are never interleaved"), these features will probably be best used from dedicated connections.
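The client-side consumption of such a stream can be sketched as follows: read standard 24-byte response headers until the empty-bodied terminator carrying the request's opaque arrives. The header layout is the real binary protocol response header; everything else (terminator semantics, body layout) follows the RangeOps draft as described above:

```python
import struct

def parse_range_responses(stream, opaque):
    # Consume binary-protocol response packets for one range request.
    # Per the RangeOps draft, the server signals the end of the result
    # stream with an empty-bodied response carrying the request's opaque.
    while True:
        header = stream.read(24)
        (magic, op, keylen, extlen, datatype,
         status, bodylen, opq, cas) = struct.unpack(">BBHBBHIIQ", header)
        body = stream.read(bodylen)
        if opq != opaque:
            continue                  # traffic for other requests
        if bodylen == 0:
            return                    # terminator: range stream complete
        yield body[extlen + keylen:]  # this item's value
```

A "keep it open" query simply never yields the terminator until the client or server ends it, which again argues for running these on dedicated connections.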
It would be a goal in this project to add additional monitoring statistics for any range operations being executed, since they are potentially long running operations with memory management and disk IO considerations. The exact mechanism for exposing these statistics will not be defined here, but at a minimum it should include the _approximate number of items matching the range_ and the _number of items visited over the execution of the query_.
The libcouchbase client library would also be updated to support the range queries specified. Also, libcouchbase would need to expose a new broadcast operation on top of stats which would be sent to all servers with ranges embedded in the EXTRAS for the stats request.

h3. Performance Profile Discussion

As currently written, Left Ranger would use the by-ID B-Tree in the individual 'couchstore' files backing each vbucket. Of course, this is not the most efficient approach, as each cluster node will need to do a B-Tree walk for each vbucket active on that node.

Longer term, it may be appropriate to reconsider the primary index of items.

Also, one simple idea may be to leverage the existing hashtable walker in situations where we know we have all metadata in memory. This wouldn't be O(log n), but would be an O(n) traversal of an in-memory data structure. That data structure could be ordered to further reduce runtime complexity, possibly at a cost of space.
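A sketch of that trade-off (plain dict and sorted list standing in for the ep-engine hashtable and a hypothetical ordered index): filtering an unordered hash table costs O(n) per query regardless of how many items match, while a sorted auxiliary index allows an O(log n) seek to the range start at the cost of extra space and maintenance.

```python
from bisect import bisect_left, bisect_right

store = {f"k{n:03d}": n for n in range(1000)}   # unordered hashtable

def range_scan_hash(start, end):
    # O(n): the walker must visit every key in the table
    return sorted(k for k in store if start <= k <= end)

sorted_keys = sorted(store)                      # ordered index, extra space

def range_scan_index(start, end):
    # O(log n) seek plus O(m) to emit the m matching keys
    return sorted_keys[bisect_left(sorted_keys, start):
                       bisect_right(sorted_keys, end)]
```

Both return the same results; the difference only shows up in cost as the store grows and ranges stay narrow.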

h2. {anchor:h.2ln1yyxbjdfh}Issues/Issues to be Opened:

TBD.

h2. {anchor:h.lg1ceg4fl02y}Other Solutions Previously Considered:

Use of views, as discussed above.

h2. {anchor:h.9q05e7lv17el}In Scope:

Changes to ep-engine, couchstore and libcouchbase. Documentation of new extensions and all feature testing.

h2. {anchor:h.iwav601zqzlg}Out of Scope:

Changes to moxi to expose ASCII versions of these new operations.
Any changes to existing tools for backup, transfer, etc.

h2. {anchor:h.h45l1wbn2wst}Interfaces:

A new set of operations at the Couchbase memcached protocol level:
* get, getq
* delete, deleteq
* touch, touchq
* a new stat for "range_disk_usage"

h2. {anchor:h.o8kh7m8u313}Doc Impact:

Additional documentation for libcouchbase about how to use the new API.
New documentation on the new statistic.
Additional documentation on the Couchbase specific extensions to the memcached protocol. As of this writing, it's not clear where those are documented.
Documentation enhancements to the dev guide to describe how these new operations may be used in various application development patterns.

h2. {anchor:h.7n2onrzdaouw}Admin/Config Impact:

No Administration impact.
For configuration, there may be some sizing impact. Note that the new range operations may use a fair amount of disk IO as the ID b-tree is walked and then the individual items are fetched or background fetched. That said, this is not expected to be any worse than existing workload generated by querying views with the logical all_docs configuration.

h2. {anchor:h.ey9ruegzy0bm}Packaging and Delivery Impact:

New dot-minor (or greater) versions of all components mentioned would be required. Given that no existing interfaces are expected to be removed, there is no additional impact over a regular dot-minor upgrade.

h2. {anchor:h.59dk9gesyv4l}Security Impact:

None.

h2. {anchor:h.m8dpcfckgkks}Dependencies:

None outside the mentioned components. This spans many components on both the client and the server.

h2. {anchor:h.yaixkscms3qz}Reference Documents:

The [{color:#1155cc}{+}memcached RangeOps documentation{+}{color}|https://code.google.com/p/memcached/wiki/RangeOps] on the memcached wiki.
This should link to the private and public commands in memcached, but it's not clear where those are documented outside the code. Assistance is needed from the ep-engine/memcached teams.

h1. {anchor:h.812ur27q5v09}{*}Resources and Schedule*


h2. {anchor:h.h3tyv05kvh0r}Projected Availability:

This project would be delivered in Couchbase Server 3.0+.

h2. {anchor:h.hnl10gwid0q}Cost of Effort:

(unknown)

h1. {anchor:h.dkn6zf7ju86l}{*}Prototype Availability*


h2. {anchor:h.xno4q3plum87}Prototype Availability:

It is expected that builds leading up to its release, or separate builds, would deliver the first prototype, likely with the reference implementation and documentation being led by the libcouchbase project.

h2. {anchor:h.radmqv9tz7iq}Prototype Cost:

Since this is a committed change, cost has not been estimated.
\\

h1. {anchor:h.vl7wo5t7cx6d}{*}Open Questions*

# Do we need additional statistics on a per range-query basis to be able to monitor?
# What statistics should be added? The draft currently includes stats about approximate number of items and number of items visited.
# Should it be a new CMD_GET opcode?
# Open question: what is the input to the range? A pair of bytestrings and the inclusion flags seem reasonable.
# Are we comfortable with the "and keep it open" behavior of the range query response?
# (Michael): How are we dealing with that on the client side? Technically it's a single get, but this would span multiple vbuckets and nodes. Who does the gathering and aggregation of the results? I guess it's nearly impossible to determine all nodes in a range for a given key? Or ask all of them and merge them on the client?
# Would the disk-oriented performance be acceptable?

h1. {anchor:h.2kpu6sl12tai}{*}Changelog*

Initial draft posted on 2014-02-24.
Updated on 2014-02-25. Indicate metadata must be returned with range gets. Added list of Open Questions. Moved question on the range to the open questions. Added options on statistics.
Minor updates on 2014-03-11. Posted on community wiki.
Updated to indicate release is post 3.0 on 2014-03-11. This was always intended before publication, but it wasn't changed.
{color:#333333}Added question on 2014-03-26 by Michael.{color}
Updated a performance consideration section on 2014-04-07. &nbsp;Also added this to the open questions.
Updated on 2014-04-18. Added a note about concurrency/ordering to the summary. Added a section about vbucket handling. Added a note about optimizing the walk of memory.