Skip to end of metadata
Go to start of metadata
You are viewing an old version of this page. View the current version. Compare with Current  |   View Page History
The recommendations here are under development and may change before implementation.

Overview

The intent of the OBSERVE operation, as implemented by Couchbase client libraries, is to give developers the ability to observe the status of a key with respect to a specific change.  For example, a developer may want to somehow mutate a document behind the key identified by the string "foo".  Given that a Couchbase cluster is distributed and concurrent and acknowledges the request before the given document, there are situations where the application developer may want to observe whether the specific mutation they made is either:

  1. safely replicated to one or more nodes within the cluster
  2. persisted to the master, or persisted to one or more nodes within the cluster
  3. has been taken into consideration in any indexes

Request

  Byte/     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
    +---------------+---------------+---------------+---------------+
   0| 0x80          | 0xb1          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
   4| 0x00          | 0x00          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
   8| 0x00          | 0x00          | 0x00          | 0x14          |
    +---------------+---------------+---------------+---------------+
  12| 0xde          | 0xad          | 0xbe          | 0xef          |
    +---------------+---------------+---------------+---------------+
  16| 0x00          | 0x00          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
  20| 0x00          | 0x00          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
  24| 0x00          | 0x02          | 0x00          | 0x04          |
    +---------------+---------------+---------------+---------------+
  28| 0x00          | 0x05          | 0x68 ('h')    | 0x65 ('e')    |
    +---------------+---------------+---------------+---------------+
  32| 0x6c ('l')    | 0x6c ('l')    | 0x6f ('o')    | 0x00          |
    +---------------+---------------+---------------+---------------+
  36| 0x05          | 0x00          | 0x05          | 0x77 ('w')    |
    +---------------+---------------+---------------+---------------+
  40| 0x6f ('o')    | 0x72 ('r')    | 0x6c ('l')    | 0x64 ('d')    |
    +---------------+---------------+---------------+---------------+
observe command
Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0xb1
Key length   (2,3)  : 0x0000
Extra length (4)    : 0x00
Data type    (5)    : 0x00
Vbucket      (6,7)  : 0x0000
Total body   (8-11) : 0x00000014
Opaque       (12-15): 0xdeadbeef
CAS          (16-23): 0x0000000000000000
# Keys       (24-25): 0x0002
  Key #0
    vbucket  (26-27): 0x0004
    keylen   (28-29): 0x0005
             (30-34): "hello"
  Key #1
    vbucket  (35-36): 0x0005
    keylen   (37-38): 0x0005
             (39-43): "world"

Response

  Byte/     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
    +---------------+---------------+---------------+---------------+
   0| 0x81          | 0xb1          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
   4| 0x00          | 0x00          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
   8| 0x00          | 0x00          | 0x00          | 0x26          |
    +---------------+---------------+---------------+---------------+
  12| 0xde          | 0xad          | 0xbe          | 0xef          |
    +---------------+---------------+---------------+---------------+
  16| 0x00          | 0x00          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
  20| 0x00          | 0x00          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
  24| 0x00          | 0x02          | 0x00          | 0x04          |
    +---------------+---------------+---------------+---------------+
  28| 0x00          | 0x05          | 0x68 ('h')    | 0x65 ('e')    |
    +---------------+---------------+---------------+---------------+
  32| 0x6c ('l')    | 0x6c ('l')    | 0x6f ('o')    | 0x01          |
    +---------------+---------------+---------------+---------------+
  36| 0x00          | 0x00          | 0x00          | 0x00          |
    +---------------+---------------+---------------+---------------+
  40| 0x00          | 0x00          | 0x00          | 0x0a          |
    +---------------+---------------+---------------+---------------+
  44| 0x00          | 0x05          | 0x00          | 0x05          |
    +---------------+---------------+---------------+---------------+
  48| 0x77 ('w')    | 0x6f ('o')    | 0x72 ('r')    | 0x6c ('l')    |
    +---------------+---------------+---------------+---------------+
  52| 0x64 ('d')    | 0x00          | 0xde          | 0xad          |
    +---------------+---------------+---------------+---------------+
  56| 0xbe          | 0xef          | 0xde          | 0xad          |
    +---------------+---------------+---------------+---------------+
  60| 0xca          | 0xfe          |
    +---------------+---------------+
observe response
Field        (offset) (value)
Magic        (0)    : 0x81
Opcode       (1)    : 0xb1
Key length   (2,3)  : 0x0000
Extra length (4)    : 0x00
Data type    (5)    : 0x00
Status       (6,7)  : 0x0000
Total body   (8-11) : 0x00000026
Opaque       (12-15): 0xdeadbeef
CAS          (16-23): 0x0000000000000000
# Keys       (24-25): 0x0002
  Key #0
    vbucket  (26-27): 0x0004
    keylen   (28-29): 0x0005
             (30-34): "hello"
             (35-35): 0x01 (persisted)
    cas      (36-43): 000000000000000a
  Key #1
    vbucket  (44-45): 0x0005
    keylen   (46-47): 0x0005
             (48-52): "world"
             (53-53): 0x00 (not persisted)
    cas      (54-61): deadbeefdeadcafe

Notes

The client build a request packet once and put there all keys it interested in. After that it sends to all servers owning vbuckets (master and replicas). The servers should send back response with only keys he knows about.

Client API

Java

The Java client library, and it's dependencies, will require changes to return the CAS value through the OperationStatus object.

The basic idea is that given the OperationStatus object (or optionally, multiple OperationStatus objects), an Observer can report on whether or not the operation has been replicated, persisted, or processed for indexing.

OperationFuture<Boolean> setResult = cbc.set("foo", 0, "bar");

if (setResult.get()) {
  Observer watchit = new Observer(setResult.getStatus());
else {
  // do something because the set failed
}

// now that the observer knows what to observe...
ObserveBoolean observeSuccess = watchit.blockForReplication(1); // this will block
if (!observeSuccess) {
  // handle whatever we want to do for this change
  switch (observeSuccess.getStatus()) {
    case OBS_TIMEDOUT:
      // do something for the timeout case
      break;
    case OBS_MODIFIED:
      // key was either modified by someone else or failover occurred
      break;
  }
}

.NET

PHP

Ruby

The observe method will return false value in case of timeout and positive in case of success.

# observe single key and wait for 5 seconds 
# until it will be replicated to at least one replica
observe("foo", :cas => 6635827497922002944, :ttl => 5, :replicas => 1)
# returns true or false

# observe single key and wait for 5 seconds 
# until it will be replicated to at least one replica
observe({"foo" => 6635827497922002944, "bar" => 16213820143098331136},
        :ttl => 5, :replicas => 1)
# returns {"foo" => true, "bar" => true}

As alternative API all mutators (i.e. set/add/replace/append/prepend) should have two new options:

  • :num_replicas — how many replicas desired to consider mutator successful. as far as the client knows how many replicas configured, it could also raise ArgumentError exception if this option negative or excessive.
  • :timeout — the timeout for replication. In this case it would be better to raise exception instead of returning falsy value
set("foo", "bar", :num_replicas => 2, :timeout => 5) 

Or even better to combine these options into single option :observe. It will make more clear the fact of addition operation applied here.

set("foo", "bar", :observe => {:replicas => 2, :timeout => 5}) 

Because the code above written in synchronous fasion, it is ok to wait before method will return the value. But in asynchronous mode we can return immediately.

conn.run do |c|
  c.set("foo", "bar", :observe => {:replicas => 1, :persisted => 2})
  c.append("baz", "bar", :observe => {:persisted => 4})
  c.observe_and_wait
end

C

For C client (libcouchbase) it will be better to implement only low-level operation because of its asynchronous nature. This means the there should be libcouchbase_observe() function accepting the key and returning its status on all replicas into the corresponding callback.

Recommended Implementation

These recommendations are preliminary, and have not been reviewed.

Given that the item being mutated may be changed at any time by another actor in a deployment, the key idea here is for the client to use the CAS value returned from the mutation operation.  Since the client library knows at all times (though, asynchronously) which node is responsible for the vbucket for a given key, and knows which nodes are slaves for that key, the client library can use stats key as a method of determining what has happened with a key on this given node.

Implementation Approach

If, for example, application code wanted to check for "foo" with CAS value 12345, it would use STATS KEY against whichever nodes it needs to in order to report the status. This would be done in a loop, with some reasonable backoff (possibly guided by recommended polling times from server stats) until a reasonable or user specified timeout.

Looping steps would consist of...

First the client would check to ensure that "foo" 12345 is still the current value on the master for this vbucket via STATS KEY. If it is not the current value, it simply returns saying the item is modified.

If it is still the current value, the next step for the client depends whether the application code is simply checking for persistence or index processing. This can be evaluated from the STATS KEY response. If the application code is checking for replication or whether or not the modification has been persisted on multiple nodes, it can then proceed to check any slaves, as identified by the cluster configuration, for status of persistence or replication of "foo" with CAS 12345. If successful, it returns that response to the application code. If it is not successful, then it waits an interval and loops again, blocking the application code until status is determined or a timeout value has been reached.

Thus the possible set of return values would be either OBS_SUCCESS, OBS_MODIFIED, or OBS_TIMEDOUT.

Note that the status OBS_MODIFIED does not indicate monotonic forward mutation. For example, in one scenario a failover may have occurred and the item key "foo" being observed may have been reverted to a previous state. This state may even be some value prior to the initial fetch before the application code mutated the value of the document.

Implementation Constraints

OBSERVE is a binary protocol only operation. It could be implemented in ASCII, but that would currently be complicated by the fact that mutation operations do not return the new CAS value in ASCII protocol. Since Couchbase Server uses binary protocol exclusively, we do not implement that currently.

Implementation Questions

q: Is it better to return the values OBS_SUCCESS, OBS_MODIFIED, and OBS_TIMEDOUT instead of true/false and treating the timeout as an exceptional condition? Since OBS_MODIFIED and OBS_TIMEDOUT may effectively require the same error handling, it may be easier to switch on these or return some extended boolean with status.

a: ?

q: Is a C implementation of OBSERVE needed?

Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.