Every now and then you get the opportunity to prove yourself on a big stage. Four weeks ago we got the chance. Couchbase user OMGPOP had launched Draw Something – a Pictionary-like game that was going viral. To capitalize on what would end up being an unprecedented growth opportunity, they absolutely had to scale their database. There were two non-negotiables: performance had to be sustained, and the game could never be taken offline, even as the number of users and games rapidly accelerated.
Scaling to support the fastest-growing mobile game of all time isn’t trivial. From a standing start, Draw Something has ballooned to more than 50 million downloads in a few short weeks (and is still adding roughly two million downloads a day). Daily Active Users (DAU) number 15 million, far surpassing the next closest game, Words with Friends. Tens of millions of games have been started. Over 3,000 drawings are generated every second, adding to the two billion that have already been created and stored.
And the growth hasn’t stopped. OMGPOP now has their sights set on 100 million downloads.
This kind of growth is every application developer’s dream, but if scalability is not planned for up front, it can become a developer’s worst nightmare. The fastest way to kill a growing game is to make users wait, or worse, to tell them to return when you can handle the load. Just ask EA.
Launched at roughly the same time as Draw Something, EA’s The Simpsons: Tapped Out was also getting tremendous adoption. Growing virally, it reached #2 on the Apple App Store in early March before running into scaling problems. Within a few days, the game was pulled from the App Store, leaving only existing players able to keep playing. A few weeks later, it still hasn’t returned.
Fortunately, OMGPOP had planned up front for scalability, selecting a NoSQL database as their primary data storage technology. In their case, they were using Couchbase Server, and things were humming right along – we weren’t even aware Couchbase was behind the game.
We became aware of our role in their success when they called us to ensure they were employing best practices in planning for growth of their database cluster. As the number of users, games, and drawings grew at an unprecedented rate, they were able to continuously add capacity to the cluster (growing to over 100 servers), while maintaining application performance and with zero application downtime. There was never a performance drop or a single moment when new players couldn’t join the party – even in the face of dying hardware! At one point, a motherboard issue with their selected hardware was taking cluster members down at a frightening pace. Couchbase took even those failures in stride without interrupting game operation or performance.
Developing innovative applications is, of course, the foundation of success for OMGPOP and other companies. While the scaling requirements of a hit social game may be more extreme than those of most applications, being able to scale easily without downtime while maintaining performance is critical to the success of most applications. It’s important that developers pick a database that not only supports the operation of the app itself but can also scale easily. Congratulations to OMGPOP for developing an innovative game that is a huge hit around the world. We’re happy Couchbase could play a part in the success.
Cool. That is something to brag about.
So how many documents are written and read per second with 100 nodes? Is it a cloud environment or dedicated hardware?
Hi Artur –
They are running Couchbase on dedicated hardware (managed by a hosting provider) and averaging about 120,000 document reads and writes per second in support of the game. The cluster is also multi-replicating data at similar rates across the cluster (which has allowed us to absorb many hardware failures in the last couple of weeks). Thanks.
A few (fun) questions:
How many servers has OMGPOP killed/taken down with Draw Something?
What’s their rate of server killing (drive/server failure)?
Is it more or less than the expected hardware failure rate for the systems they’re running on?
What does this level of hosting service cost OMGPOP/Zynga?
Wondering if fully hosted solutions such as Cloudant could be this responsive.
Or do you need to hire an admin to scale this readily?
It’s not immediately clear to me whether fully hosted solutions (such as Cloudant) are selling instances of a server, or an infrastructure of servers that can be scaled on demand.
You should talk with Cloudant to get insight into their offerings and capabilities. They are best prepared to answer those questions. Couchbase does support hosted deployment of our software on EC2 via RightScale. It is very easy to use. Just search for Couchbase and RightScale.
This isn’t the greatest place to talk about Cloudant or other non-Couchbase vendors, but I’ll address the specific questions you raise.
– Cloudant has similar customer stories of responding to load requirements.
– You would not need to hire an admin because Cloudant manages everything for you, whether it’s in your data center or Cloudant’s. You just interact with the API.
– You can scale the number of nodes in your Cloudant cluster.
Feel free to reach out to me if you want to talk more about Cloudant.
I wonder if they are having the same issue with the rebalance feature. Are they using Couchbase 1.8.x? We are using Membase 1.7.1 with roughly 200 million items on a 9-node AWS cluster, but we have a critical issue: rebalance keeps failing.
Hi screwdisk –
Sorry you are having trouble. Are you getting the help you need? I’m happy to get you connected with our support resources if you want to ping me by email. I’m james at couchbase.
We are in the process of looking for someone to help us build out our architecture for an upcoming social game. None of us here at the office has much experience running sites that handle millions of data operations per hour or more, or keeping them scaling without any downtime, which of course, as in EA’s case, bit them in the ass.
We communicate with our game client via JSON endpoints, write lots of data, and of course read that data back when the client starts up. We need to be able to handle social game traffic as our app grows on iOS and Facebook. Is RightScale with Couchbase a good solution, and one where you don’t need a full-time admin? Also, would you recommend running in the cloud (AWS) or on real metal hardware?
Do we know if they are using version 1.8 or 2.0?
I’m coming from Mongo, and I’ve been seeing some reports that have led me to search for other options for certain projects, such as this awesome example. After further research, this sounds like an awesome solution. I work with Node, and right now it seems that node-memcached is the only way to go. Having never used memcached, although it’s drop-in, are there important features I would miss out on?
While it’s true that node-memcached is the production-ready option today, we do have a project where we’re moving along with a libcouchbase-based Node client. Please email me (matt) at Couchbase if you’d like a preview.
Using node-memcached, you wouldn’t really be missing anything. You would have to interact with the view REST interface directly, but it’s well documented and simple HTTP, which Node makes really easy of course.
You can see some rough ideas for it here:
… and more discussion has been happening among contributors and the core developers both over email and IRC.
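As a rough illustration of talking to the view REST interface directly from Node, here is a minimal TypeScript sketch that only builds the query URL. The helper name `buildViewUrl`, the bucket, design document, and view names are all hypothetical, and the port (8092) and path layout are assumptions based on the Couchbase Server 2.0 view API, so check them against the documentation for your version before relying on them.

```typescript
// Hypothetical helper, not part of any Couchbase client library.
// Assumes the Couchbase Server 2.0 view URL layout:
//   http://<host>:8092/<bucket>/_design/<designDoc>/_view/<view>?<params>
type ViewParams = Record<string, unknown>;

function buildViewUrl(
  host: string,
  bucket: string,
  designDoc: string,
  view: string,
  params: ViewParams = {},
): string {
  // Percent-encode each path segment so unusual names stay valid in a URL.
  const path = [bucket, "_design", designDoc, "_view", view]
    .map((segment) => encodeURIComponent(segment))
    .join("/");

  // This sketch JSON-encodes every parameter value, which suits key,
  // startkey, endkey, limit, and descending. Parameters that expect a bare
  // word (for example stale=ok) are not handled here.
  const query = Object.keys(params)
    .map((name) => {
      const value = encodeURIComponent(JSON.stringify(params[name]));
      return `${encodeURIComponent(name)}=${value}`;
    })
    .join("&");

  const base = `http://${host}:8092/${path}`;
  return query ? `${base}?${query}` : base;
}

// Example: look up one key in a hypothetical "by_name" view.
const exampleUrl = buildViewUrl("localhost", "default", "players", "by_name", {
  key: "player:42",
  limit: 10,
});
```

The resulting URL can be passed to whatever HTTP client you already use in Node, and the response body is JSON. Regular reads and writes would still go through node-memcached as described above.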