| The recommendations here are under development and may change before implementation. |
Overview
The REPLICA READ operation (known as CMD_GET_REPLICA in the server) lets a client retrieve an item from one or more replicas rather than from the active copy. Such a read is potentially inconsistent, because a replica may lag behind the active copy. Although it could be used for nearly any purpose, the only common use case is expected to be failure handling, when a knowingly inconsistent read is acceptable.
Request
Byte/ 0 | 1 | 2 | 3 |
/ | | | |
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
+---------------+---------------+---------------+---------------+
0| 0x80 | 0x83 | 0x00 | 0x03 |
+---------------+---------------+---------------+---------------+
4| 0x00 | 0x00 | 0x00 | 0x00 |
+---------------+---------------+---------------+---------------+
8| 0x00 | 0x00 | 0x00 | 0x03 |
+---------------+---------------+---------------+---------------+
12| 0x00 | 0x00 | 0x00 | 0x00 |
+---------------+---------------+---------------+---------------+
16| 0x00 | 0x00 | 0x00 | 0x00 |
+---------------+---------------+---------------+---------------+
20| 0x00 | 0x00 | 0x00 | 0x00 |
+---------------+---------------+---------------+---------------+
24| 0x66 ('f') | 0x6f ('o') | 0x6f ('o') |
+---------------+---------------+---------------+
Field (offset) (value)
Magic (0) : 0x80 (PROTOCOL_BINARY_REQ)
Opcode (1) : 0x83
Key length (2,3) : 0x0003 (3)
Extra length (4) : 0x00
Data type (5) : 0x00
vbucket (6,7) : 0x0000 (0)
Total body (8-11) : 0x00000003 (3)
Opaque (12-15): 0x00000000
CAS (16-23): 0x0000000000000000
Key (24-26): The textual string "foo"
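As an illustration only, the request layout above could be assembled as follows. This is a minimal sketch: the helper name `build_replica_read_request` and its keyword arguments are hypothetical and not part of any client library; only the field layout comes from the table above.

```ruby
# Hypothetical helper: builds a REPLICA READ request for the given key.
# The field layout follows the request table above; the method name and
# keyword arguments are illustrative only.
def build_replica_read_request(key, vbucket: 0, opaque: 0)
  header = [
    0x80,          # magic: PROTOCOL_BINARY_REQ
    0x83,          # opcode: CMD_GET_REPLICA
    key.bytesize,  # key length (16 bits, big-endian)
    0x00,          # extras length: the request carries no extras
    0x00,          # data type
    vbucket,       # vbucket id (16 bits, big-endian)
    key.bytesize,  # total body length: the key only
    opaque,        # opaque (32 bits)
    0              # CAS (64 bits), zero in the request
  ].pack("CCnCCnNNQ>")
  header + key.b
end

packet = build_replica_read_request("foo")
puts packet.unpack1("H*") # hex dump of the 24-byte header plus the key
```

For the key "foo" the packet is 27 bytes long: the 24-byte header followed by the three key bytes.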
Response
Byte/ 0 | 1 | 2 | 3 |
/ | | | |
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
+---------------+---------------+---------------+---------------+
0| 0x81 | 0x83 | 0x00 | 0x00 |
+---------------+---------------+---------------+---------------+
4| 0x04 | 0x00 | 0x00 | 0x00 |
+---------------+---------------+---------------+---------------+
8| 0x00 | 0x00 | 0x00 | 0x09 |
+---------------+---------------+---------------+---------------+
12| 0x00 | 0x00 | 0x00 | 0x00 |
+---------------+---------------+---------------+---------------+
16| 0x00 | 0x00 | 0x00 | 0x00 |
+---------------+---------------+---------------+---------------+
20| 0x00 | 0x00 | 0x00 | 0x01 |
+---------------+---------------+---------------+---------------+
24| 0xde | 0xad | 0xbe | 0xef |
+---------------+---------------+---------------+---------------+
28| 0x57 ('W') | 0x6f ('o') | 0x72 ('r') | 0x6c ('l') |
+---------------+---------------+---------------+---------------+
32| 0x64 ('d') |
+---------------+
Field (offset) (value)
Magic (0) : 0x81 (PROTOCOL_BINARY_RES)
Opcode (1) : 0x83
Key length (2,3) : 0x0000
Extra length (4) : 0x04
Data type (5) : 0x00
Status (6,7) : 0x0000
Total body (8-11) : 0x00000009
Opaque (12-15): 0x00000000
CAS (16-23): 0x0000000000000001
Extras :
Flags (24-27): 0xdeadbeef
Key : None
Value (28-32): The textual string "World"
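The response layout above could be decoded along these lines. This is a sketch under the assumption that the whole packet is already in memory; `parse_replica_read_response` and the keys of the returned hash are hypothetical names chosen for illustration.

```ruby
# Hypothetical helper: splits a REPLICA READ response into its fields.
# Offsets follow the response table above; the names are illustrative only.
def parse_replica_read_response(packet)
  magic, opcode, key_len, extras_len, _data_type, status, body_len, opaque, cas =
    packet.unpack("CCnCCnNNQ>")
  raise ArgumentError, "not a response packet" unless magic == 0x81
  raise ArgumentError, "unexpected opcode" unless opcode == 0x83

  body = packet.byteslice(24, body_len)
  raise ArgumentError, "truncated body" if body.nil? || body.bytesize < body_len

  # Flags occupy the 4-byte extras; the value follows the extras and the key.
  flags = extras_len >= 4 ? body.byteslice(0, 4).unpack1("N") : nil
  value = body.byteslice(extras_len + key_len, body_len - extras_len - key_len)
  { status: status, opaque: opaque, cas: cas, flags: flags, value: value }
end

# The sample bytes are copied from the response diagram above.
sample = [
  0x81, 0x83, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x09,
  0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x01,
  0xde, 0xad, 0xbe, 0xef,
  0x57, 0x6f, 0x72, 0x6c, 0x64
].pack("C*")
response = parse_replica_read_response(sample)
```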
Client API
Java
Note: not sure if we like this yet...
GetFuture resf;
boolean isReplicaRead;
try {
    resf = cbc.asyncGet("foo");
    isReplicaRead = false;
} catch (TimeoutException ex) {
    // uhoh, something went wrong, server isn't there!
    resf = cbc.asyncReplicaGet("foo");
    isReplicaRead = true;
} finally {
    // do something useful
}
.NET
PHP
Ruby
res = nil
is_replica_read = false
begin
  res = cbc.get("foo")
rescue Couchbase::Error::Timeout => ex
  res = cbc.get("foo", :replica => true)
  is_replica_read = true
ensure
  # do something useful
end
Recommended Implementation
| These recommendations are preliminary, and have not been reviewed. |
The replica read command is intentionally simple. The client iterates, in order, through the replicas designated by the cluster-supplied configuration, trying to get the specified key from each.
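The in-order walk could look roughly like the sketch below. It assumes the ordered replica list for the key's vbucket has already been taken from the cluster configuration, and it stands in for the network layer with a callable. The names `replica_read`, `replica_nodes` and `fetch` are hypothetical.

```ruby
# Hypothetical sketch: try each replica of a vbucket in configuration order.
# `replica_nodes` is the ordered list of replica nodes for the key's vbucket,
# and `fetch` is any callable that performs CMD_GET_REPLICA against one node,
# returning the value on success or nil on a miss or error.
def replica_read(key, replica_nodes, fetch)
  replica_nodes.each do |node|
    value = fetch.call(node, key)
    return value unless value.nil?
  end
  nil # every replica was tried without success
end

# Usage with an in-memory stand-in for the network layer.
stores = { "node-b" => {}, "node-c" => { "foo" => "World" } }
fetch = ->(node, key) { stores.fetch(node)[key] }
found = replica_read("foo", ["node-b", "node-c"], fetch)
missing = replica_read("bar", ["node-b", "node-c"], fetch)
```

Stopping at the first successful reply keeps the common case to a single round trip. The trade-off is that the client returns whichever replica answers first, with no attempt to compare versions across replicas.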
Implementation Constraints
REPLICA READ is a binary-protocol-only operation. It could be implemented in ASCII, but that would require changes in moxi and additional clients.
Comments (10)
May 14, 2012
Sergey Avseyev says:
Could you also post the packet format here? There is no such command definition in https://github.com/membase/memcached/blob/engine/include/memcached/protocol_binary.h
When will it be accessible?
May 15, 2012
Matt Ingenthron says:
It's in the command_ids.h for ep-engine in the master branch (for 2.0), but it's a valid question as to whether or not it should be in the protocol_binary.h. There are a few things we have in this engine that are extensions.
Please check with Trond on how they handle this sort of thing-- I don't know without looking a bit deeper.
May 15, 2012
Sergey Avseyev says:
Could you review the command dissection I've just added?
May 15, 2012
Sergey Avseyev says:
Asked Trond, and I think it is ok, because command_ids.h is public now
May 15, 2012
Sergey Avseyev says:
Is there a quiet variant of this command to implement pipelined get?
May 15, 2012
Matt Ingenthron says:
There is not a quiet variant, no. That's a good point though.
May 15, 2012
Sergey Avseyev says:
Case 1. The client sends requests to all replicas simultaneously
1. pick the vbucket array from the config
2. iterate over the array and schedule CMD_GET_REPLICA for each vbucket with given key
3. flush buffers / start network interaction
4. collect all requests, skipping errors and find the CAS winner (the request with the most popular CAS version)
Pros:
Cons:
Case 2. The client is iterating over the replicas and stops after first successful response
1. Pick first replica vbucket from config
2. Increment variable storing the next position
3. Schedule CMD_GET_REPLICA request with given key
4. flush buffers / start network interaction
5. in the response handler check the status code and return to the user, or continue otherwise
6. Stop once reached max replica count and return NOTFOUND to user
7. Pick next replica vbucket from config
8. Go to step 2
Pros:
Cons:
May 15, 2012
Sergey Avseyev says:
The question is which approach is better? Or maybe I missed something, or there are other options.
May 15, 2012
Matt Ingenthron says:
First off, there's a problem with case 1. You cannot rely on CAS as a monotonic clock.
Even if it were, I think I prefer case 2.
Given that a vbucket could move (it is intended to be used with failures, and auto-failover would have the vbucket move within a few seconds), we should walk the array of nodes for that vbucket, but also keep track of the config revision such that if it's updated, we start walking the array from the start again. Otherwise, we're likely to just get not-my-vbucket replies.
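A rough sketch of the walk described in this comment, restarting when the config revision changes. Everything here is assumed for illustration: `config` is a callable returning the current revision and replica list for the key's vbucket, `fetch` stands in for sending CMD_GET_REPLICA to one node, and `max_restarts` is an added safety bound that the discussion does not mention.

```ruby
# Hypothetical sketch: walk the replicas in order, restarting from the first
# replica whenever the configuration revision changes during the walk.
# `config` returns [revision, replica_nodes]; `fetch` returns the value or nil.
# `max_restarts` is an assumed safety bound, not part of the proposal.
def replica_read_tracking_config(key, config, fetch, max_restarts: 3)
  restarts = 0
  revision, nodes = config.call
  index = 0
  while index < nodes.size
    value = fetch.call(nodes[index], key)
    return value unless value.nil?

    latest_revision, latest_nodes = config.call
    if latest_revision != revision && restarts < max_restarts
      # The vbucket map changed mid-walk: start again from the first replica.
      revision = latest_revision
      nodes = latest_nodes
      index = 0
      restarts += 1
    else
      index += 1
    end
  end
  nil # no replica returned the key
end
```

Without the bound, a configuration that keeps changing could keep the client restarting indefinitely. Once the bound is reached, this sketch simply finishes walking the list it already has.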
May 15, 2012
Sergey Avseyev says:
Ok, in this case the client will check no more than num_replicas times, just iterating indexes from 1 to num_replicas-1.