Differences between Membase and CouchDB
Hello,
we are developing a social game and we decided to use a NoSQL solution.
I've read that the first "integrated" release of Couchbase (Membase + CouchDB) will be released soon but it's not yet ready.
So, what do you suggest to do in the meanwhile?
Should we develop the game with Membase? With CouchDB? Or both, if it's possible?
Both Membase and CouchDB are key-value database management systems, aren't they?
I ask this because in the Couchbase forum I've read a thread about using Membase *and* CouchDB together. Is this possible?
Which is the better solution?
Thanks.
Hi Perry,
thanks for your answer.
What do you mean by "use Membase to cache your CouchDB data"?
Membase already stores data to disk, doesn't it? How can I combine this data with CouchDB?
At the moment I've installed Membase and the php client, and I call the php functions with set/get
from Flash (our application is in Flash) with ZendAmf.
I have some questions about this:
- with ZendAmf we set a persistent connection with Membase, so each time I access Membase from Flash I
just call the php functions with set/get without reconnecting to Membase each time.
In the constructor of the php class for ZendAmf I do this:
$memcache = new Memcache;
$memcache->connect("localhost",11211);
$GLOBALS["memcache"] = $memcache; // global variable
And inside the php functions (called from Flash) I do this:
$GLOBALS["memcache"]->set("key", "value");
$GLOBALS["memcache"]->get("key");
Is this right?
Is "new Memcache" right? I'm a bit confused about the difference between "Memcache" and "Memcached".
With "set" the key/value is stored to disk, and the caching is managed transparently, is this right?
- With Membase I can set and get a key/value but I can't run MySQL-like queries, can I?
I've read that I have to use the TAP interface to query the data, but how can I do this from Flash?
What do you suggest I do?
Thanks again.
Andrea
Andrea,
-To "use Membase to cache your CouchDB data" I was referring to using Membase in the same way you would use memcached:
1) check to see if data is in Membase
2) If it is, you get it back fast!
3) If not, retrieve it from your database and put it into Membase
This will allow you to do any indexing/querying within CouchDB and use Membase to serve and scale your application. You're right that Membase stores the data to disk (which provides you a very reliable cache), but you've also correctly identified that the querying capabilities of Membase are somewhat lacking today. This is exactly what we're working very hard to greatly improve in 2.0.
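The three steps above can be sketched in PHP roughly as follows. This is only an illustration: ArrayCache is a hypothetical in-memory stand-in so the snippet runs without a server, and cached_fetch and the loader callback are made-up names. A real Memcache object would be used through the same get()/set() calls.

```php
<?php
// Hypothetical stand-in for a Memcache connection, so the sketch runs
// without a server. Like Memcache::get(), get() returns false on a miss.
class ArrayCache {
    private $items = array();
    public function get($key) {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }
    public function set($key, $value) {
        $this->items[$key] = $value;
        return true;
    }
}

// Cache-aside read: try the cache first, fall back to the database loader.
function cached_fetch($cache, $key, $loadFromDb) {
    $value = $cache->get($key);      // 1) check whether the data is in the cache
    if ($value !== false) {
        return $value;               // 2) hit: return it right away
    }
    $value = $loadFromDb($key);      // 3) miss: read from the database (e.g. CouchDB)...
    $cache->set($key, $value);       //    ...and store it for the next request
    return $value;
}

$cache = new ArrayCache();
$calls = 0;
$loader = function ($key) use (&$calls) {
    $calls++;                        // count how often the "database" is hit
    return "doc for " . $key;
};
$first  = cached_fetch($cache, "user_0", $loader);
$second = cached_fetch($cache, "user_0", $loader);   // served from the cache
```

One caveat with this shape: because a miss is signalled by false, a stored value of false cannot be told apart from a miss.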
To answer your other questions:
-Your code looks correct.
-The "new Memcache" is simply instantiating a new object for you to use later in the code.
-Don't get too confused with memcache versus memcached...they're basically the same. Within PHP, there are two different client libraries (one with the 'd' and one without). We recommend the "pecl-memcached" library since it is generally more stable and has more development going on.
-With Membase you can get/set a key-value...unfortunately there isn't a SQL-like interface to the data.
-There's no functionality out of the box to use TAP from Flash. There is a lot of development going on right now in this area, so it may make sense for you to use some of the existing TAP clients in the interim: http://techzone.couchbase.com/wiki/display/membase/TAP+Protocol
Of course you could also write your own for PHP/Flash!
Thanks so much Andrea, hope I was able to answer your questions, let me know if there's anything else you need.
Perry
Hi Perry,
thanks for your help! We decided to use only Memcache, because the queries in our game are very simple (except for the score ranking, which will be kept simpler until the release of the "integrated" Membase) and we can do all the work using only set/get operations.
Thanks again!
Andrea
Sounds good Andrea. Are you using the Membase Server version of memcached? It provides you a wealth of extra monitoring and management capabilities.
Perry
Hi Perry,
how can I find which version of Memcached I'm using?
I have one more question: is it possible to get values for multiple keys?
Here (http://www.php.net/manual/en/memcache.get.php) I've read that I can do it in this way:
$p_Values = $GLOBALS["memcache"]->get($p_Keys);
Where $p_Keys is an array of keys. I've tried this, but it doesn't work.
Where is the problem?
Thanks!
Andrea
You can find the version of Membase by going to the "About" box in the UI, but I was simply asking whether you are using memcached buckets inside of Membase Server instead of using just the regular memcached...sounds like you are.
As far as multi-gets go, they should definitely be working.
I assume a single-get from that same code works correctly? Are you getting any errors back from the multi-get?
Perry
Hi Perry,
during the installation of the Membase Server I've set "Bucket type" to "Membase", then I've manually installed
the following in Linux:
$ yum -y install libmemcached
$ yum -y install memcached
$ pecl install memcache
And I use "Memcache" inside PHP. Is this ok?
About the multi-get problem: it works now. I've found that the problem occurs when I try to pass an associative array between Flash and PHP, and I think the cause is ZendAMF. If I pass a normal array it works fine, but if I pass an associative array (from Flash to PHP or vice versa) it doesn't work.
Do you know why? Is this a limitation of ZendAMF?
I also have two questions about Membase:
- in MySQL we often need to add fields to tables. How can we do this in Membase?
Is there a way to "add a field" to our bucket?
In our game we are using a bucket to store user info. The key is the user id, the value is an associative array
with all the info. How can we add an entry to every associative array in the bucket?
For example:
Bucket:
0 -> { exp:10 , level_sp:0 }
1 -> { exp:20 , level_sp:1 }
2 -> { exp:30 , level_sp:2 }
And I need to add a new "field" called level_mp, set to a default value (0) for all users (keys)
0 -> { exp:10 , level_sp:0, level_mp:0 }
1 -> { exp:20 , level_sp:1, level_mp:0 }
2 -> { exp:30 , level_sp:2, level_mp:0 }
Is there a simple/fast way to do this (like I do in MySQL with PhpMyAdmin)?
- If I need to store data for a second entity, for example "objects", what's the best/cleanest way to do this?
Should I mix "users" and "objects" in the same bucket?
For example:
user_0 -> { ... }
user_1 -> { ... }
user_2 -> { ... }
obj_0 -> { ... }
obj_1 -> { ... }
obj_2 -> { ... }
Or should I use two buckets (one for the users and one for the objects)? In this case, how can I create a second bucket and how can I select/access the right one?
Thanks,
Andrea
Hi, Andrea,
I'm not sure about the ZendAMF framework -- perhaps you could find out about that via some PHP-specific forum.
The Memcache module is OK to use in PHP. While it doesn't have as many options and features as the Memcached module, it works and you should be fine using it.
I think you will do well to separate the RDBMS idea of a table from your mental picture when using a key-value store like Membase. There are no fields in the sense you're thinking of. The data you're storing is serialized from a PHP object into a string representation of that object, and then that string is stuffed into Membase as the value for a key. Membase itself doesn't know anything about the contents of that value, and nothing about the structure of your PHP object that it represents.
There are various ways to cope with this. Probably the simplest is to just define default values for various fields that your application uses. This way, if you retrieve an object from Membase which doesn't contain the new "level_mp" field, you simply assign the default value after you get the object. Instead of trying to update all the entries in Membase, you control this via your application code.
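A minimal sketch of this "defaults in application code" idea, assuming records come back from Membase as PHP associative arrays. The function names are made up for illustration, and the field names are the ones from the example earlier in the thread.

```php
<?php
// Defaults for every field the application currently knows about.
function user_defaults() {
    return array("exp" => 0, "level_sp" => 0, "level_mp" => 0);
}

// Fill in any field missing from a stored record. With the + operator the
// left-hand array wins, so values that were stored are never overwritten.
function with_defaults(array $stored) {
    return $stored + user_defaults();
}

$old  = array("exp" => 20, "level_sp" => 1);  // written before level_mp existed
$user = with_defaults($old);                  // level_mp is now present and set to 0
```

Records are then upgraded lazily: the next time the application writes $user back, the new field is stored along with everything else.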
If you'll often be interested in only one part of an object, then you might split the object into multiple key-value pairs when you store them. So instead of storing something like "foo" => "{ exp:10 , level_sp:1, level_mp:0 }", you might store three keys, like "foo.exp" => 10, "foo.level_sp" => 1, "foo.level_mp" => 0. This makes it more complicated when you need to delete a user -- you must do 3 deletes in this case, instead of 1. But it lets you retrieve just the info you need, which may be useful.
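As a rough illustration of the split layout, the helper below (a hypothetical name, not part of any client library) turns one record into one key-value pair per field. The resulting key list is also what you would loop over when deleting the user.

```php
<?php
// Turn one record into one pair per field: "foo" + "exp" => "foo.exp".
function split_record($baseKey, array $record) {
    $pairs = array();
    foreach ($record as $field => $value) {
        $pairs[$baseKey . "." . $field] = $value;
    }
    return $pairs;
}

$pairs = split_record("foo", array("exp" => 10, "level_sp" => 1, "level_mp" => 0));
$keys  = array_keys($pairs);   // the three keys to set, or to delete later
```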
You can find more tips on how to model your data at:
http://techzone.couchbase.com/wiki/display/membase/What+should+I+store+i...
http://techzone.couchbase.com/wiki/display/membase/Objects+that+refer+to...
Probably you will choose to mix "users", "objects" and whatever else for your single application into one bucket. In this sense, a bucket is really more like MySQL's "database" concept than "table". Usually you'll create separate buckets when you want different levels of access (e.g., different passwords) for some reason.
Regards,
Tim
Hi Tim,
thanks for your answers.
I've tried to access your links but I can't open them, I receive this message: "Page level restrictions have been applied that limit access to this page."
I have one more question: in our game we need a system to reset users' experience points every week and every month (for our ranking).
With MySQL we used a simple cron task with, for example: "update db.usertable set exp_week=0".
How can we do this with Membase? Is there a simple way to set a "field" of all keys to a specific value?
For example, in this situation:
0 -> { exp:10 , exp_week:10 }
1 -> { exp:10 , exp_week:10 }
2 -> { exp:10 , exp_week:10 }
At the end of each week the exp_week "field" must be set to 0:
0 -> { exp:10 , exp_week:0 }
1 -> { exp:10 , exp_week:0 }
2 -> { exp:10 , exp_week:0 }
Thanks again,
Andrea
Andrea,
I'm sorry about the permissions problem on those wiki pages. I will check with the admin regarding that.
The text of the first page is:
Membase is most suited towards fast-changing data items of relatively small size. By relatively small, Membase inherits Memcached's default configured 1 megabyte limit for each item value. For example, think shopping carts, user profile, user sessions, timelines, game states, pages, conversations and product catalog, instead of large audio or video media blobs.
How should I store an object?
Membase, similar to Memcached, can store any binary bytes, and the encoding is up to you or your client library. Some memcached client libraries, for example, offer convenience functions to serialize/deserialize objects from your favorite web application programming language (Java, Ruby, PHP, Python, etc) to a blob for storage. Please consult your client library API documentation for details.
An additional consideration on object encoding/serialization is whether your objects will need to be handled by multiple programming languages. For example, it might be inconvenient for a Java client application to decode a serialized PHP object. In these cases, consider cross-language encodings such as JSON, XML, Google Protocol Buffers or Thrift.
The latter two (Protocol Buffers and Thrift) have some advantages in providing more efficient object encodings than text-based encodings like JSON and XML. One key to Membase performance is to watch your working set size, so the more working set items you can fit into memory, the better.
On that note, some client libraries offer the additional feature of optionally compressing/decompressing objects stored into Membase. The CPU-time versus space tradeoff here should be considered, in addition to how you might want to version objects under changing encoding schemes. For example, you might consider using the 'flags' field in each item to denote the encoding kind and/or optional compression. When beginning application development, however, a useful mantra to follow is to just keep things simple.
The text of the second page is:
Although Membase is a key-value store and you can store any byte-array value that you wish, there are some common patterns for handling items that refer to other items. Some example use cases...
User 1234 is interested in topics A, B, X, W and belongs to groups 1, 3, 5
Shopping Cart 222 points to product-1432 and product-211
A Page has Comments, and each of those Comments has an Author. Each Author, in turn, has a "handle", an avatar image and a karma ranking.
Nested Items
You can store serialized, nested structures in Membase, such as by using encodings like JSON or XML (or Google Protocol Buffers or Thrift). A user profile item stored in Membase can then track information such as user interests. For example, in JSON...
{ "key": "user-1234",
"handle": "bobama",
"avatarURL": ...,
"interests": [ "A", "B", "X", "W" ],
"groups": [ 1, 3, 5 ],
...
}
If the above is stored in Membase under key "user-1234", you can then know the interests for that user by doing a simple GET for user-1234 and decoding the JSON response.
Simple Lists
To handle reverse lookups (who are the users interested in topic X?), a common solution is to store simple lists. For example, under key "topic-X", you might store the following list...
user-1234,user-222,user-987,
Such lists can be easily constructed by using Membase's APPEND or PREPEND operations, where you append/prepend values that look like "user-XXXXX,".
Note that the list is delimited by commas, but that can be any character you choose.
Handling List Item Deletion
The above works when a user registers her interest in a topic, but how can you handle when a user wants to unregister their interest (eg, unsubscribe or unfollow)?
One approach is to use the CAS identifiers to do atomic replacement. A client application first does a GET-with-CAS (a "gets" request in the ascii protocol) of the current list for a topic. Then the client removes the given user from the list response, and finally does a SET-with-CAS-identifier operation (a "cas" request in the ascii protocol) while supplying the same CAS identifier that was returned with the earlier "gets" retrieval.
If the SET-with-CAS request succeeds, the client has successfully replaced the list item with a new, shorter list with the relevant list entry deleted.
The SET-with-CAS-identifier operation might fail, however, if another client mutated the list while the first client was attempting a deletion. In this case the first client can try to repeat the list item delete operation.
Under a highly contended or fast mutating list however (such as users trying to follow a popular user or topic), the deleting client will have a difficult time making progress. Some approaches to handle this situation are described next...
Handling Highly Contended List Item Deletion
Instead of performing a SET-with-CAS to perform list item deletion, one pattern is to explicitly track deleted items. This could be done using APPEND for list additions and PREPEND for list deletions, with an additional "tombstone" deletion character. For example, anything before the "|" character is considered deleted...
user-222,|user-1234,user-222,user-987,
So, after the client library retrieves that list and does some post-processing, the effective, actual list of interested subscribers is user-1234 and user-987.
Care must be taken to count correctly, in case user-222 decides to add themselves again to the list (and her clicks are faster than whatever logic your application has to prevent duplicate clicks)...
user-222,|user-1234,user-222,user-987,user-222
A similar encoding scheme would use '+' or '-' delimiter characters to the same effect, where the client sends an APPEND of "+ID" to add an entry to a list, and an APPEND of "-ID" to remove an entry from a list. The client application would still perform post-processing on the list response, tracking appropriate list entry counts. In this and other encodings, we must take care that the IDs themselves never contain the delimiter characters that were chosen...
+1234+222+987-222
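The post-processing for the '+'/'-' encoding might look something like the sketch below. The function name is hypothetical, and it assumes IDs never contain '+' or '-'. It counts additions and removals per ID, so a user who re-adds themselves after unsubscribing shows up again.

```php
<?php
// Reduce a "+ID...-ID..." log to the IDs that are currently on the list.
function effective_list($log) {
    $counts = array();
    // Each match is a sign followed by an ID made of non-delimiter characters.
    preg_match_all('/([+-])([^+-]+)/', $log, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $id = $m[2];
        if (!isset($counts[$id])) {
            $counts[$id] = 0;
        }
        $counts[$id] += ($m[1] === "+") ? 1 : -1;   // +ID adds, -ID removes
    }
    // Keep only IDs with more additions than removals.
    $active = array_filter($counts, function ($n) { return $n > 0; });
    // PHP turns numeric-string keys into integers, so convert back to strings.
    return array_map('strval', array_keys($active));
}

$now     = effective_list("+1234+222+987-222");
$readded = effective_list("+1234+222+987-222+222");
```

IDs are returned in the order they first appeared in the log.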
Yet another variation on this would be to store deleted items in a separate paired list. So your application might have two lists for a topic, such as a "follow-X" and "unfollow-X".
Compressing Lists
Eventually, your application may need to garbage collect or compress the lists. To do so, you might have your client application randomly piggy-back on other requests that retrieve the list.
Again, with heavily contended, fast-mutating lists, attempts to compress a list may be fruitless as SET-with-CAS attempts can fail. Some solutions, as with many in software engineering, involve adding a level of indirection. For example, you could keep two lists for each topic, and use marker items to signal to clients which list is considered active:
topic-X.a => +1234+222+987-222
topic-X.b => (empty)
topic-X.active => topic-X.a
A client could multi-GET on topic-X.a and topic-X.b, and the combined result would contain the full list. To mutate the list, the client would look at the "pointer" item of topic-X.active, and know to APPEND values to topic-X.a.
A randomly self-chosen client may choose to garbage-collect the active list when it sees the list length is large enough, by writing a compressed version of topic-X.a into topic-X.b (note: XXX) and by flipping the topic-X.active item to point to "b". New clients will start APPEND'ing values to topic-X.b. Old, concurrent clients might still be APPEND'ing values to the old active item of topic-X.a, so other randomly self-selected clients can choose to help continue to compress topic-X.a into topic-X.b so that topic-X.a will be empty and ready for the next flip.
An alternative to a separate "topic-X.active" pointer item would be instead to PREPEND a tombstone marker value onto the front of the inactivated list item. For example, if '^' was the tombstone marker character, all concurrent clients would be able to see that a certain list should not be APPEND'ed to...
topic-X.a => +1234+222+987-222
topic-X.b => ^+1234
There are concurrency holes in this "active flipping" scheme, such as if there's a client process failure at the step noted above at "XXX", so for periods of time there might be duplicates or reappearing list items.
In general, the idea is that independent clients try to make progress towards an eventually stabilized state. Please consider your application use cases as to whether temporary inconsistencies are survivable.
Large Lists
If your lists get large (e.g., some user has 200,000 followers), you may soon hit the default 1 megabyte value size limit of Membase. Again, a level of indirection is useful here, by having another item that lists the lists...
topic-X => +0+1
topic-X.0 => ... many actual items ...
topic-X.1 => ... more actual items ...
The "topic-X" item just lists pointers to items that have the actual lists.
In this approach, you could have randomly self-selected clients decide to add new topic sub-lists (topic-X.N) and APPEND updated info to the "index" item (topic-X).
Other randomly self-chosen clients could attempt to compress topic sub-lists that are old.
Multi-GET
Once your client application has a list of keys, the highest performance approach to retrieve the actual items is to use a multi-GET request. Doing so allows for concurrent retrieval of items across your Membase cluster. This will perform better than a serial loop that tries to GET for each item individually and sequentially.
Regards,
Tim
Andrea,
To answer your last question, there's no built-in way to do this. Membase is a key-value store and doesn't provide query capabilities, although the forthcoming Couchbase 2.0 product will incorporate the rich query capabilities from the CouchDB product into Couchbase Server.
There are a number of ways to make this work well in your application, however. One simple approach would be to store the experience points with a key name that includes the week value.
"user_0;expweek;201118" (for week 18 of year 2011)
Increment that value when needed. When the week changes to week 19, the application will look for a new key, and can start from 0 (or whatever default) again.
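For what it's worth, building such a key in PHP could look like the sketch below. The function names and the "expmonth" key are assumptions for illustration. The format characters 'o' and 'W' give the ISO-8601 year and week number, which keeps the key consistent around New Year, when the calendar year and the ISO week year can differ.

```php
<?php
// Per-week counter key, e.g. "user_0;expweek;201118".
// 'o' = ISO-8601 week-numbering year, 'W' = zero-padded ISO week number.
function exp_week_key($userId, DateTimeInterface $when) {
    return "user_" . $userId . ";expweek;" . $when->format("oW");
}

// Same idea per month (hypothetical "expmonth" key), e.g. "user_0;expmonth;201105".
function exp_month_key($userId, DateTimeInterface $when) {
    return "user_" . $userId . ";expmonth;" . $when->format("Ym");
}

$weekKey  = exp_week_key(0, new DateTimeImmutable("2011-05-04"));
$monthKey = exp_month_key(0, new DateTimeImmutable("2011-05-04"));
// 1 January 2011 was a Saturday, so it still belongs to ISO week 52 of 2010.
$edgeKey  = exp_week_key(0, new DateTimeImmutable("2011-01-01"));
```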
There are variations on this, such as storing the week value in another key:
"user_0;week" => 3
"user_0;week;exp;3" => 10
Your app first does a GET for "user_0;week", to look up the current week for that user, then does another GET for the actual experience points. Then you can increment the week counter (to 4), and when the app looks up the experience points, it will find no such value and start over from the default (0).
When using this kind of system, you'll want to ensure that such keys are set to expire, so that they get removed from the cache when they're not needed anymore.
This is just one approach. The trick is to change the way you think about your data, and adapt it to a key-value system. This is a challenge at first, but it is also what enables a non-relational data store to scale so effortlessly across many servers.
Regards,
Tim
Hi Tim,
thanks a lot for your help.
What do you think is the best method?
The first one ("user_0;expweek;201118") or the second one (storing the week value in another key)?
I have to do this for exp week (reset exp_week each week) and for exp month too (reset exp_month each month).
I don't understand this part:
"When using this kind of system, you'll want to ensure that such keys are set to expire,
so that they get removed from the cache when they're not needed anymore."
What do you mean by "ensure that such keys are set to expire"?
I'm using Membase and I thought that the data was saved on disk with no expiry, isn't it?
By "removed from the cache" do you mean from the disk or from the RAM (memcache)?
How can I set such keys to expire?
Andrea
Hi Tim,
What do you think about this structure:
"user_0" => {exp:"900", expweek:"201118_200", expmonth:"201104_400", ...}
In this way I store info about week (year 2011, week 18) and month (year 2011, month 04) directly in the value of user, with no need of more keys.
For each exp update, I check the current week/month and compare it with the expweek and expmonth saved for the user. If they are different, I update the user's value with the new expweek and/or expmonth, with the counter reset to _0.
Is this ok?
I have one more question.
We are using Membase for a social game on Facebook. Can you tell me approximately how big the server should be to handle the game?
For example, what kind of server would we need to support a social game with 100,000 or 500,000 users?
Should we start with one server and then scale in the future? Or is it better to start with more than one server?
Thanks for your help.
Andrea
Andrea,
This should work fine as well. There's no set rule about this, and using aggregates like this may be simpler for your application design, and that's fine.
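To make the aggregate idea concrete, here is one way the update could be written. It is a sketch under the assumptions in Andrea's post (counters stored as "<period>_<points>" strings inside the user array), and the function names are invented for the example.

```php
<?php
// Add $points to a counter stored as "<period>_<points>", e.g. "201118_200".
// If the stored period is not the current one, the counter restarts from 0.
function bump_period($stored, $currentPeriod, $points) {
    list($period, $value) = explode("_", $stored, 2);
    if ($period !== $currentPeriod) {
        $value = 0;                       // new week or month: start over
    }
    return $currentPeriod . "_" . ((int) $value + $points);
}

// Apply an experience gain to the total and to both period counters.
function add_exp(array $user, $points, DateTimeInterface $now) {
    $user["exp"]      = (string) ((int) $user["exp"] + $points);
    $user["expweek"]  = bump_period($user["expweek"],  $now->format("oW"), $points);
    $user["expmonth"] = bump_period($user["expmonth"], $now->format("Ym"), $points);
    return $user;
}

$user = array("exp" => "900", "expweek" => "201118_200", "expmonth" => "201104_400");
// 4 May 2011 is still ISO week 18, but the month has rolled over to 05.
$user = add_exp($user, 50, new DateTimeImmutable("2011-05-04"));
```

In this run the weekly counter keeps accumulating while the monthly one restarts, since only the month changed.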
You asked about item expiration. Membase has a Time To Live (TTL) feature: when you issue a SET, you can specify a time to live for that key/value pair. After the TTL expires, the item will no longer be available from the cache. A background process will clean this item up from the disk eventually. So this is one way to manage what data is kept around. Simply store a value with a TTL, and then don't worry about removing it at a later date. It'll be removed as appropriate, and you don't even need to store a reference to it anywhere.
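One detail worth knowing when choosing a TTL: in the memcached protocol, an expiration value of up to 30 days is read as a number of seconds from now, while anything larger is read as an absolute Unix timestamp. The helper below is a hypothetical convenience (not part of the Memcache extension), and it assumes Membase follows the same memcached rule.

```php
<?php
// Longest TTL the memcached protocol treats as "seconds from now".
const MAX_RELATIVE_TTL = 2592000;   // 30 days

// Convert "keep for $ttlSeconds" into the value to pass as the expire argument.
function expire_arg($ttlSeconds, $now) {
    return $ttlSeconds > MAX_RELATIVE_TTL
        ? $now + $ttlSeconds        // too long for a relative TTL: use a timestamp
        : $ttlSeconds;              // short enough: pass the seconds as they are
}

$twoWeeks  = expire_arg(14 * 86400, 1000000);
$sixtyDays = expire_arg(60 * 86400, 1000000);

// With the Memcache extension used earlier in the thread, the call would be:
// $GLOBALS["memcache"]->set($key, $value, 0, expire_arg(14 * 86400, time()));
```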
As for sizing, you can use the sizing info here to help determine how much RAM you'll need:
http://techzone.couchbase.com/wiki/display/membase/Sizing+Guidelines
It makes sense to start with at least 2 servers, which can provide higher availability. You can grow your cluster later on if needed, as your game becomes more popular.
Tim
Hi Tim,
thank you very much!
Andrea
Hi,
I have one more question about Membase.
In our game a user can build a lot of objects and for each object we have a timer that represents the (real) time required for the construction, for example 4 hours.
At the end of each timer we need to update the user value in the bucket (user object built), so we have asynchronous writing to the same key.
How can we handle this situation of parallel writing to the same key in the bucket?
Is there a semaphore (or similar) in Membase to lock a key, so to prevent race condition?
Thanks.
Yes, you should be using the "cas" operation. Basically, when you perform a "get" of a key (using the "gets" operation) you will receive a cas ID. When you perform a "cas", you supply both the value you are trying to set and the cas ID. If the ID has changed (because some other process has already modified the same key) this operation will fail and it is up to the application to either move on or re-read the key and try again.
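The "re-read and try again" loop could be sketched like this. CasStore is a hypothetical in-memory stand-in with gets/cas methods shaped like the protocol operations, so the example runs without a server. The exact calls for fetching a cas ID differ between PHP client libraries, so check your client's documentation for the real equivalents.

```php
<?php
// Hypothetical in-memory store; every write bumps the item's cas ID.
class CasStore {
    private $items = array();                     // key => array(value, casId)
    public function set($key, $value) {
        $cas = isset($this->items[$key]) ? $this->items[$key][1] + 1 : 1;
        $this->items[$key] = array($value, $cas);
    }
    public function gets($key) {                  // returns array(value, casId)
        return $this->items[$key];
    }
    public function cas($key, $value, $casId) {   // write only if casId is unchanged
        if ($this->items[$key][1] !== $casId) {
            return false;
        }
        $this->items[$key] = array($value, $casId + 1);
        return true;
    }
}

// Read, modify, write back; retry from the read if someone else got in first.
function update_with_cas($store, $key, $mutate, $maxAttempts = 5) {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        list($value, $casId) = $store->gets($key);
        if ($store->cas($key, $mutate($value), $casId)) {
            return true;
        }
    }
    return false;                                 // gave up: caller decides what to do
}

$store = new CasStore();
$store->set("user_0", array("built" => 0));
$interfered = false;
$ok = update_with_cas($store, "user_0", function ($user) use ($store, &$interfered) {
    if (!$interfered) {                           // simulate one concurrent writer
        $interfered = true;
        $store->set("user_0", array("built" => 10));
    }
    $user["built"]++;
    return $user;
});
$result = $store->gets("user_0");
$final  = $result[0];
```

The first attempt fails because of the simulated concurrent write; the second attempt re-reads the newer value and increments that, so neither update is lost.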
Does that make sense?
Perry
Ok thanks!
Hi there, thanks for your interest!
You're right, the first integrated release will be available soon...
That release will keep the same memcached/Membase interface for data access and augment it with the CouchDB "view engine", so you're probably best off developing with the memcached protocol for the time being. You can also get comfortable with how to do indexes and map-reduce with CouchDB, and that won't change going forward.
In terms of a very short-term solution, you could certainly use Membase to "cache" your CouchDB data similar to how you would use memcached to cache any data.
Perry