Search:

Search all manuals
Search this manual
Manual
Membase Manual 1.7
Additional Resources
Community Wiki
Community Forums
Couchbase SDKs
Parent Section
6.2 Best practices
Chapter Sections
Chapters

6.2.2. Objects that refer to other objects

6.2.2. Nested Items
6.2.2. Simple Lists
6.2.2. Handling List Item Deletion
6.2.2. Handling Highly Contended List Item Deletion
6.2.2. Compressing Lists
6.2.2. Large Lists
6.2.2. Multi-GET

Although Membase is a key-value store and you can store any byte-array value that you wish, there are some common patterns for handling items that refer to other items. Some example use cases. For example: User 1234 is interested in topics A, B, X, W and belongs to groups 1, 3, 5

Nested Items

You can store serialized, nested structures in Membase, such as by using encodings like JSON or XML (or Google Protocol Buffers or Thrift). A user profile item stored in Membase can then track information such as user interests. For example, in JSON...

{ "key": "user-1234", "handle": "bobama", "avatarURL": ...,
"interests": [ "A", "B", "X", "W" ], "groups": [ 1, 3, 5 ], ...
        }

If the above is stored in Membase under key "user-1234", you can then know the interests for that user by doing a simple GET for user-1234 and decoding the JSON response.

Simple Lists

To handle reverse lookups (who are the users interested in topic X?), a common solution is to store simple lists. For example, under key "topic-X", you might have store the following list:

user-1234,user-222,user-987,

Such lists can be easily constructed by using Membase's APPEND or PREPEND operations, where you append/prepend values that look like "user-XXXXX,".

Note that the list is delimited by commas, but that can be any character you choose.

Handling List Item Deletion

The above works when a user registers her interest in a topic, but how can you handle when a user wants to unregister their interest (eg, unsubscribe or unfollow)?

One approach is to use the CAS identifiers to do atomic replacement. A client application first does a GET-with-caS (a "gets" request in the ascii protocol) of the current list for a topic. Then the client removes the given user from the list response, and finally does a SET-with-CAS-identifier operation (a "cas" request in the ascii protocol) while supplying the same CAS identifier that was returned with the earlier "gets" retrieval.

If the SET-with-CAS request succeeds, the client has successfully replaced the list item with a new, shorter list with the relevant list entry deleted.

The SET-with-CAS-identifier operation might fail, however, if another client mutated the list while the first client was attempting a deletion. In this case the first client can try to repeat the list item delete operation.

Under a highly contended or fast mutating list however (such as users trying to follow a popular user or topic), the deleting client will have a difficult time making progress. Some approaches to handle this situation are described next...

Handling Highly Contended List Item Deletion

Instead of performing a SET-with-CAS to perform list item deletion, one pattern is to explicitly track deleted items. This could be done using APPEND for list additions and PREPENDS for list deletions, with an additional "tombstone" deletion character. For example, anything before the "|" character is considered deleted:

user-222,|user-1234,user-222,user-987,

So, after the client library retrieves that list and does some post-processing, the effective, actual list of interested subscribers is user-1234 and user-987.

Care must be taken to count correctly, in case user-222 decides to add themselves again to the list (and her clicks are faster than whatever logic your application has to prevent duplicate clicks):

user-222,|user-1234,user-222,user-987,user-222

A similar encoding scheme would use '+' or '-' delimiter characters to the same effect, where the client sends an APPEND of "+ID" to add an entry to a list, and an APPEND of "-ID" to remove an entry from a list. The client application would still perform post-processing on the list response, tracking appropriate list entry counts. In this and other encodings, we must take care not to use the delimiter characters that were chosen:

+1234+222+987-222

Yet another variation on this would be store deleted items to a separate paired list. So your application might have two lists for a topic, such as a "follow-X" and "unfollow-X".

Compressing Lists

Eventually, your application may need to garbage collect or compress the lists. To do so, you might have your client application do so by randomly piggy-backing on other requests to retrieve the list.

Again, with heavily contended, fast mutating list, attempts to compress a list may be fruitless as SET-with-CAS attempts can fail. Some solutions, as with many in software engineering, involve adding a level of indirection. For example, you could keep two lists for each topic, and use marker items to signal to clients which list is considered active:

topic-X.a => +1234+222+987-222 topic-X.b => (empty)
topic-X.active => topic-X.a

A client could multi-GET on topic-X.a and topic-X.b, and the combined result would contain the full list. To mutate the list, the client would look at the "pointer" item of topic-X.active, and know to APPEND values to topic-X.a.

A randomly self-chosen client may choose to garbage-collect the active list when it sees the list length is large enough, by writing a compressed version of topic-X.a into topic-X.b (note: XXX) and by flipping the topic-X.active item to point to "b". New clients will start APPEND'ing values to topic-X.b. Old, concurrent clients might still be APPEND'ing values to the old active item of topic-X.a, so other randomly self-selected clients can choose to help continue to compress topic-X.a into topic-X.b so that topic-X.a will be empty and ready for the next flip.

An alternative to a separate "topic-X.active" pointer item would be instead to PREPEND a tombstone marker value onto the front of the inactivated list item. For example, if '^' was the tombstone marker character, all concurrent clients would be able to see in that a certain list should not be appended to:

topic-X.a => +1234+222+987-222 topic-X.b => ^+1234

There are concurrency holes in this "active flipping" scheme, such as if there's a client process failure at the step noted above at "XXX", so for periods of time there might be duplicates or reappearing list items.

In general, the idea is that independent clients try to make progress towards an eventually stabilized state. Please consider your application use cases as to whether temporary inconsistencies are survivable.

Large Lists

If your lists get large (e.g., some user has 200,000 followers), you may soon hit the default 1 megabyte value byte size limits of Membase. Again, a level of indirection is useful here, by have another item that lists the lists...

topic-X => +0+1 topic-X.0 => ... many actual items ...
topic-X.1 => ... more actual items ...

The "topic-X" item just lists pointers to items that have the actual lists.

In this approach, you could have randomly self-selected clients decide to add new topic sub-lists (topic-X.N) and APPEND'ing updated info to the "index" item (topic-X).

Other randomly self-chosen clients could attempt to compress topic sub-lists that are old.

Multi-GET

Once your client application has a list of keys, the highest performance approach to retrieve the actual items is to use a multi-GET request. Doing so allows for concurrent retrieval of items across your Membase cluster. This will perform better than a serial loop that tries to GET for each item individually and sequentially.