To handle reverse lookups (who are the users interested in topic X?), a common solution is to store simple lists. For example, under document ID "topic-X", you might have store the following list:
Such lists can be easily constructed by using Couchbase's APPEND or PREPEND operations, where you append/prepend values that look like "user-XXXXX,".
Note that the list is delimited by commas, but that can be any character you choose.
The above works when a user registers her interest in a topic, but how can you handle when a user wants to unregister their interest (eg, unsubscribe or unfollow)?
One approach is to use the CAS identifiers to do atomic replacement. A client application first does a GET-with-caS (a "gets" request in the ascii protocol) of the current list for a topic. Then the client removes the given user from the list response, and finally does a SET-with-CAS-identifier operation (a "cas" request in the ascii protocol) while supplying the same CAS identifier that was returned with the earlier "gets" retrieval.
If the SET-with-CAS request succeeds, the client has successfully replaced the list item with a new, shorter list with the relevant list entry deleted.
The SET-with-CAS-identifier operation might fail, however, if another client mutated the list while the first client was attempting a deletion. In this case the first client can try to repeat the list item delete operation.
Under a highly contended or fast mutating list however (such as users trying to follow a popular user or topic), the deleting client will have a difficult time making progress.
Instead of performing a SET-with-CAS to perform list item deletion, one pattern is to explicitly track deleted items. This could be done using APPEND for list additions and PREPENDS for list deletions, with an additional "tombstone" deletion character. For example, anything before the "|" character is considered deleted:
So, after the client library retrieves that list and does some post-processing, the effective, actual list of interested subscribers is user-1234 and user-987.
Care must be taken to count correctly, in case user-222 decides to add themselves again to the list (and her clicks are faster than whatever logic your application has to prevent duplicate clicks):
A similar encoding scheme would use '+' or '-' delimiter characters to the same effect, where the client sends an APPEND of "+ID" to add an entry to a list, and an APPEND of "-ID" to remove an entry from a list. The client application would still perform post-processing on the list response, tracking appropriate list entry counts. In this and other encodings, we must take care not to use the delimiter characters that were chosen:
Yet another variation on this would be store deleted items to a separate paired list. So your application might have two lists for a topic, such as a "follow-X" and "unfollow-X".
Eventually, your application may need to garbage collect or compress the lists. To do so, you might have your client application do so by randomly piggy-backing on other requests to retrieve the list.
Again, with heavily contended, fast mutating list, attempts to compress a list may be fruitless as SET-with-CAS attempts can fail. Some solutions, as with many in software engineering, involve adding a level of indirection. For example, you could keep two lists for each topic, and use marker items to signal to clients which list is considered active:
topic-X.a => +1234+222+987-222 topic-X.b => (empty) topic-X.active => topic-X.a
A client could multi-GET on topic-X.a and topic-X.b, and the combined result would contain the full list. To mutate the list, the client would look at the "pointer" item of topic-X.active, and know to APPEND values to topic-X.a.
A randomly self-chosen client may choose to garbage-collect the active list when it sees the list length is large enough, by writing a compressed version of topic-X.a into topic-X.b (note: XXX) and by flipping the topic-X.active item to point to "b". New clients will start APPEND'ing values to topic-X.b. Old, concurrent clients might still be APPEND'ing values to the old active item of topic-X.a, so other randomly self-selected clients can choose to help continue to compress topic-X.a into topic-X.b so that topic-X.a will be empty and ready for the next flip.
An alternative to a separate "topic-X.active" pointer item would be instead to PREPEND a tombstone marker value onto the front of the inactivated list item. For example, if '^' was the tombstone marker character, all concurrent clients would be able to see in that a certain list should not be appended to:
topic-X.a => +1234+222+987-222 topic-X.b => ^+1234
There are concurrency holes in this "active flipping" scheme, such as if there's a client process failure at the step noted above at "XXX", so for periods of time there might be duplicates or reappearing list items.
In general, the idea is that independent clients try to make progress towards an eventually stabilized state. Please consider your application use cases as to whether temporary inconsistencies are survivable.
If your lists get large (e.g., some user has 200,000 followers), you may soon hit the default 1 megabyte value byte size limits of Couchbase. Again, a level of indirection is useful here, by have another item that lists the lists...
topic-X => +0+1 topic-X.0 => ... many actual items ... topic-X.1 => ... more actual items ...
The "topic-X" item just lists pointers to items that have the actual lists.
In this approach, you could have randomly self-selected clients decide to add new topic sub-lists (topic-X.N) and APPEND'ing updated info to the "index" item (topic-X).
Other randomly self-chosen clients could attempt to compress topic sub-lists that are old.