The recommendations here are under development and may change before implementation.

Overview

The intent of the REPLICA READ operation (known as CMD_GET_REPLICA in the server) is to let a client retrieve an item from one or more replica copies only. The result is an inconsistent read: a replica may lag behind the active copy, so the value returned can be stale. Although it could be used for nearly any purpose, the only common use case is expected to be reading during a failure, when the application knows an inconsistent read is acceptable.

Request


  Byte/     0       |       1       |       2       |       3       |
     /              |               |               |               |
    |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
    +---------------+---------------+---------------+---------------+
   0|  0x80         |  0x83         |  0x00         |  0x03         |
    +---------------+---------------+---------------+---------------+
   4|  0x00         |  0x00         |  0x00         |  0x00         |
    +---------------+---------------+---------------+---------------+
   8|  0x00         |  0x00         |  0x00         |  0x03         |
    +---------------+---------------+---------------+---------------+
  12|  0x00         |  0x00         |  0x00         |  0x00         |
    +---------------+---------------+---------------+---------------+
  16|  0x00         |  0x00         |  0x00         |  0x00         |
    +---------------+---------------+---------------+---------------+
  20|  0x00         |  0x00         |  0x00         |  0x00         |
    +---------------+---------------+---------------+---------------+
  24|  0x66 ('f')   |  0x6f ('o')   |  0x6f ('o')   |
    +---------------+---------------+---------------+

Field        (offset) (value)
Magic        (0)    : 0x80 (PROTOCOL_BINARY_REQ)
Opcode       (1)    : 0x83
Key length   (2,3)  : 0x0003 (3)
Extra length (4)    : 0x00
Data type    (5)    : 0x00
vbucket      (6,7)  : 0x0000 (0)
Total body   (8-11) : 0x00000003 (3)
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Key          (24-26): The textual string "foo"
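To make the request layout concrete, here is a minimal Java sketch that encodes it with java.nio.ByteBuffer (big-endian by default, matching network byte order). The class ReplicaGetRequest and its encode() method are hypothetical illustrations, not part of any Couchbase client; the field values follow the request table above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of any client library) that encodes a
 * REPLICA READ (CMD_GET_REPLICA, opcode 0x83) request.
 */
final class ReplicaGetRequest {
    static final byte MAGIC_REQ = (byte) 0x80;
    static final byte CMD_GET_REPLICA = (byte) 0x83;
    static final int HEADER_LEN = 24;

    static byte[] encode(String key, short vbucket, int opaque) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer defaults to big-endian, i.e. network byte order
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LEN + keyBytes.length);
        buf.put(MAGIC_REQ);                      // 0: magic
        buf.put(CMD_GET_REPLICA);                // 1: opcode
        buf.putShort((short) keyBytes.length);   // 2-3: key length
        buf.put((byte) 0);                       // 4: extras length (no extras)
        buf.put((byte) 0);                       // 5: data type
        buf.putShort(vbucket);                   // 6-7: vbucket id
        buf.putInt(keyBytes.length);             // 8-11: total body = extras + key + value
        buf.putInt(opaque);                      // 12-15: opaque
        buf.putLong(0L);                         // 16-23: CAS
        buf.put(keyBytes);                       // 24..: key
        return buf.array();
    }
}
```

For the example above, ReplicaGetRequest.encode("foo", (short) 0, 0) yields the 27-byte packet shown in the diagram.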

Response


   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0| 0x81          | 0x83          | 0x00          | 0x00          |
     +---------------+---------------+---------------+---------------+
    4| 0x04          | 0x00          | 0x00          | 0x00          |
     +---------------+---------------+---------------+---------------+
    8| 0x00          | 0x00          | 0x00          | 0x09          |
     +---------------+---------------+---------------+---------------+
   12| 0x00          | 0x00          | 0x00          | 0x00          |
     +---------------+---------------+---------------+---------------+
   16| 0x00          | 0x00          | 0x00          | 0x00          |
     +---------------+---------------+---------------+---------------+
   20| 0x00          | 0x00          | 0x00          | 0x01          |
     +---------------+---------------+---------------+---------------+
   24| 0xde          | 0xad          | 0xbe          | 0xef          |
     +---------------+---------------+---------------+---------------+
   28| 0x57 ('W')    | 0x6f ('o')    | 0x72 ('r')    | 0x6c ('l')    |
     +---------------+---------------+---------------+---------------+
   32| 0x64 ('d')    |
     +---------------+


Field        (offset) (value)
Magic        (0)    : 0x81 (PROTOCOL_BINARY_RES)
Opcode       (1)    : 0x83
Key length   (2,3)  : 0x0000
Extra length (4)    : 0x04
Data type    (5)    : 0x00
Status       (6,7)  : 0x0000
Total body   (8-11) : 0x00000009
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000001
Extras              :
  Flags      (24-27): 0xdeadbeef
Key                 : None
Value        (28-32): The textual string "World"
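A matching decoder, sketched in Java under the assumption that a successful response carries 4 bytes of flags in the extras, exactly as in the example. ReplicaGetResponse is a hypothetical class for illustration; on a non-zero status the body would hold error text rather than a value, and this sketch simply stores whatever body bytes follow.

```java
import java.nio.ByteBuffer;

/** Hypothetical decoder for a REPLICA READ (opcode 0x83) response. */
final class ReplicaGetResponse {
    final short status;   // 0x0000 means success
    final long cas;
    final int flags;      // from the 4-byte extras, if present
    final byte[] value;   // on a non-zero status this holds error text instead

    private ReplicaGetResponse(short status, long cas, int flags, byte[] value) {
        this.status = status;
        this.cas = cas;
        this.flags = flags;
        this.value = value;
    }

    static ReplicaGetResponse decode(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet);       // big-endian (network order)
        if (buf.get() != (byte) 0x81) {
            throw new IllegalArgumentException("not a binary protocol response");
        }
        if (buf.get() != (byte) 0x83) {
            throw new IllegalArgumentException("not a CMD_GET_REPLICA response");
        }
        int keyLen = buf.getShort() & 0xffff;           // 2-3: key length
        int extrasLen = buf.get() & 0xff;               // 4: extras length
        buf.get();                                      // 5: data type, unused here
        short status = buf.getShort();                  // 6-7: status
        int bodyLen = buf.getInt();                     // 8-11: total body
        buf.getInt();                                   // 12-15: opaque, unused here
        long cas = buf.getLong();                       // 16-23: CAS

        int flags = 0;
        if (extrasLen == 4) {
            flags = buf.getInt();                       // 24-27: flags
        } else {
            buf.position(buf.position() + extrasLen);   // skip unexpected extras
        }
        buf.position(buf.position() + keyLen);          // skip key (none in this example)

        byte[] value = new byte[bodyLen - extrasLen - keyLen];
        buf.get(value);
        return new ReplicaGetResponse(status, cas, flags, value);
    }
}
```

Applied to the 33-byte example packet, this yields status 0, CAS 1, flags 0xdeadbeef and the value "World".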

Client API

Java

Note: not sure if we like this yet...


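import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import net.spy.memcached.internal.GetFuture;

// The enclosing method must declare or handle InterruptedException and ExecutionException.
GetFuture<Object> resf = cbc.asyncGet("foo");
Object value;
boolean isReplicaRead = false;

try {
  // The timeout surfaces while waiting on the future, not when asyncGet() is called.
  value = resf.get(2500, TimeUnit.MILLISECONDS); // example timeout
} catch (TimeoutException ex) {
  // uh-oh, the active node didn't answer in time; fall back to a replica.
  resf.cancel(false);
  // asyncReplicaGet() is the proposed method name, not a shipped API.
  value = cbc.asyncReplicaGet("foo").get();
  isReplicaRead = true;
}

// do something useful with value and isReplicaRead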

.NET

PHP

Ruby

res = nil
is_replica_read = false

begin
  res = cbc.get("foo")
rescue Couchbase::Error::Timeout => ex
  res = cbc.get("foo", :replica => true)
  is_replica_read = true
ensure
  # do something useful
end

Recommended Implementation

These recommendations are preliminary and have not been reviewed.

The replica read command is intentionally simple. The client walks, in order, through the replicas listed for the key's vbucket in the configuration supplied by the cluster, requesting the key from each in turn until it is found or the replicas are exhausted.
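The in-order walk can be sketched in Java as a plain loop. This assumes a hypothetical getReplica function that sends CMD_GET_REPLICA to a single node and yields a value only on success (NOTFOUND, errors and timeouts all map to an empty result); node names are plain strings purely for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class SequentialReplicaRead {
    /**
     * Probes the replica nodes in configuration order and returns the first hit.
     * getReplica is a hypothetical stand-in for sending CMD_GET_REPLICA to one node.
     * An empty result means no replica had the key (report NOTFOUND to the caller).
     */
    static <T> Optional<T> read(List<String> replicaNodes, Function<String, Optional<T>> getReplica) {
        for (String node : replicaNodes) {
            Optional<T> result = getReplica.apply(node);
            if (result.isPresent()) {
                return result;         // stop at the first successful response
            }
        }
        return Optional.empty();       // every replica missed
    }
}
```

Stopping at the first success keeps only one response in memory, at the cost of possibly returning an older value than another replica holds.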

Implementation Constraints

REPLICA READ is a binary-protocol-only operation. An ASCII variant is possible, but it would require changes to moxi and to additional clients.

Implementation Questions

  1. May 14, 2012

    Sergey Avseyev says:

    Could you also post the packet format here? There is no such command definition in https://github.com/membase/memcached/blob/engine/include/memcached/protocol_binary.h

    When will it be accessible?

    1. May 15, 2012

      Matt Ingenthron says:

      It's in the command_ids.h for ep-engine in the master branch (for 2.0), but it's a valid question as to whether or not it should be in the protocol_binary.h. There are a few things we have in this engine that are extensions.

      Please check with Trond on how they handle this sort of thing-- I don't know without looking a bit deeper.

      1. May 15, 2012

        Sergey Avseyev says:

        Could you review command dissection I've just added?

      2. May 15, 2012

        Sergey Avseyev says:

        Asked Trond, and I think it is OK, because command_ids.h is public now.

  2. May 15, 2012

    Sergey Avseyev says:

    Is there a quiet variant of this command, to implement a pipelined get?

    1. May 15, 2012

      Matt Ingenthron says:

      There is not a quiet variant, no. That's a good point though.

  3. May 15, 2012

    Sergey Avseyev says:

    Case 1. The client sends requests to all replicas simultaneously

    1. Pick the vbucket array from the config
    2. Iterate over the array and schedule a CMD_GET_REPLICA with the given key for each replica in it
    3. Flush buffers / start network interaction
    4. Collect all responses, skipping errors, and find the CAS winner (the response with the most common CAS value)

    Pros:

    • the vbucket array is read from the config atomically
    • no need to track the vbuckets to which requests were sent (see Case 2)
    • more consistent value, because all replicas are queried
    • fewer network round trips with a pipelined implementation

    Cons:

    • requires memory to keep all replica responses (currently at most 3)
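Step 4 of Case 1 ("the most popular CAS version") can be sketched as a simple vote. The Reply class is a hypothetical per-replica result invented for illustration. Note that the sketch only compares CAS values for equality and never orders them, so it treats CAS as an opaque version tag rather than a clock; ties go to whichever CAS reached the top count first.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class CasVote {
    /** Hypothetical per-replica result: whether the read succeeded, plus CAS and value. */
    static final class Reply {
        final boolean ok;
        final long cas;
        final String value;

        Reply(boolean ok, long cas, String value) {
            this.ok = ok;
            this.cas = cas;
            this.value = value;
        }
    }

    /** Picks the reply whose CAS occurs most often; ties go to the CAS that reached the count first. */
    static Optional<Reply> winner(List<Reply> replies) {
        Map<Long, Integer> votes = new HashMap<>();
        Reply best = null;
        int bestVotes = 0;
        for (Reply r : replies) {
            if (!r.ok) {
                continue;                                // skip errors (NOTFOUND, timeouts, ...)
            }
            int n = votes.merge(r.cas, 1, Integer::sum); // count identical CAS values
            if (n > bestVotes) {
                bestVotes = n;
                best = r;
            }
        }
        return Optional.ofNullable(best);                // empty if every replica failed
    }
}
```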

    Case 2. The client iterates over the replicas and stops after the first successful response

    1. Pick the first replica vbucket from the config
    2. Increment the variable storing the next position
    3. Schedule a CMD_GET_REPLICA request with the given key
    4. Flush buffers / start network interaction
    5. In the response handler, check the status code: on success return the value to the user, otherwise continue
    6. Stop once the max replica count is reached and return NOTFOUND to the user
    7. Pick the next replica vbucket from the config
    8. Go to step 2

    Pros:

    • Fewer network round trips if pipelining isn't available
    • Less memory consumption, because only one response is stored

    Cons:

    • The config could be reloaded between tries
    • Isn't multiget-friendly
    • Less consistent value, because it takes the first successful response

    1. May 15, 2012

      Sergey Avseyev says:

      The question is: which approach is better? Or did I miss something, and are there other options?

    2. May 15, 2012

      Matt Ingenthron says:

      First off, there's a problem with case 1. You cannot rely on CAS as a monotonic clock.

      Even if it were, I think I prefer case 2.

      Given that a vbucket could move (replica read is intended for use during failures, and auto-failover would move the vbucket within a few seconds), we should walk the array of nodes for that vbucket, but also keep track of the config revision so that, if it's updated, we start walking the array from the beginning again. Otherwise, we're likely to just get not-my-vbucket replies.
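The suggestion above (walk the vbucket's node array, restarting from the beginning whenever the config revision changes) might look like the following Java sketch. ConfigSnapshot, currentConfig, getReplica and maxRestarts are all hypothetical names; maxRestarts is an added safeguard, not something discussed in this thread, that keeps a constantly changing config from looping forever.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

final class RevisionAwareReplicaRead {
    /** Hypothetical config view: a revision number plus the replica nodes for the key's vbucket. */
    static final class ConfigSnapshot {
        final long revision;
        final List<String> replicaNodes;

        ConfigSnapshot(long revision, List<String> replicaNodes) {
            this.revision = revision;
            this.replicaNodes = replicaNodes;
        }
    }

    static <T> Optional<T> read(Supplier<ConfigSnapshot> currentConfig,
                                Function<String, Optional<T>> getReplica,
                                int maxRestarts) {
        ConfigSnapshot cfg = currentConfig.get();
        int restarts = 0;
        int i = 0;
        while (i < cfg.replicaNodes.size()) {
            Optional<T> result = getReplica.apply(cfg.replicaNodes.get(i));
            if (result.isPresent()) {
                return result;
            }
            ConfigSnapshot latest = currentConfig.get();
            if (latest.revision != cfg.revision && restarts < maxRestarts) {
                cfg = latest;      // vbucket map changed (e.g. after auto-failover)
                i = 0;             // walk the new node array from the start
                restarts++;
            } else {
                i++;               // same config, or restart budget used up: try the next replica
            }
        }
        return Optional.empty();   // every replica missed: NOTFOUND
    }
}
```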

      1. May 15, 2012

        Sergey Avseyev says:

        OK, in this case the client will check no more than num_replicas times, just iterating indexes from 1 to num_replicas-1.