Does Web Game Scalability Demand NoSQL?

Farmville Screen ShotThe database technology that holds all game variables in a server has are two paths to go down: First, the traditional Relational Database Management Systems (RDBMS) like MySQL, SQL Server and PostgreSQL, or second, the much trendier NoSQL databases such as membase, MongoDB and others.

Traditional databases (aka SQL databases) hold data in tables with each column defining an attribute (such as userid, password, email address, game number, etc.), and each row holding one value. Data is retrieved or updated using a special purpose query language called SQL. Columns have a specified type like “int” or “string,” which correspond to types in programming languages.

For example here is the users table. The first line shows the name of columns and the four rows are the actual data:

userid,password,email address,game number
120000,”xyzzy”,david@dice.com,1
120001,”uthry”,fred@dice.com,1
120002,”tyuhgt”,mike@dice.com,1
120003,”otomh”,dawn@dice.com,2

Note: passwords should always be stored encrypted, not in plain text like I’ve shown in the example!

Now, here is some SQL to retrieve some of that data:

select userid,emailaddress,gamenumber from users
where userid > 120000

That would return this:

120001,fred@dice.com,1
120002,mike@dice.com,1
120003,dawn@dice.com,2

The SQL returns three rows, each holding the email address, user ID and game number for all rows where userid is higher than 120,000. Another table called “cities” might hold information on all cities, say if it was a city conquest game. That might have columns called “gamenumber,” “cityid,” “cityname,” “ownerid.” Here is some more SQL:

select cityid,emailaddress from cities, users
where cities.gamenumber=users.gamenumber
and ownerid=userid
and userid=120000

Both cities and users have a column called “gamenumber,” so we need to prefix them with the table name.

That SQL returns a list of city ids all belonging to userid 120000, along with the same email address, david@dice.com. This is called a “join,” where data is extracted from two or more tables. There is another syntax for joins that I haven’t shown.

Joins are very powerful but also the reason why NoSQL databases have become popular. Doing joins is an expensive process and if you have millions of users, it could be very expensive and slow.

NoSQL Databases

There’s no standard way of holding data in NoSQL; that’s up to the individual database server. But what NoSQL does provide is read/write access to large quantities of data very quickly. Zynga’s Farmville uses membase, a NoSQL database that was developed by NorthScale, and project co-sponsors Zynga and NHN. This has now been replaced by Couchbase.

Crudely put, NoSQL databases store data in large key indexed arrays and are optimized to retrieve data very rapidly, at least ten times faster than relational databases. JavaScript Object notation (JSON) is often used to store data in a structured format.

So for the cities example, the city data retrieved for game:1;user:120000 (the key) might be this block of JSON:

[sourcecode language="javascript"]
{
"gameNum" : 1,
"ownerPlayerNumber": 120000,
"cityName": "DiceVille",
"level": 45,
"population": 25,
"making": {
"fighters": 10,
"bowmen": 25
}
}
[/sourcecode]

For social games in particular, NoSQL is seen as being much faster because players do lots of dragging or clicking on game objects, and each of these is a change to the game state and has to be stored away.

Bear in mind though that NoSQL is not a perfect replacement for RDBMS. The older relational databases were developed at a time when the data had to be safely stored.

The term for this is ACID — an acronym of Atomicity, Consistency, Isolation, Durability — which referred to how they handle transactions when making changes to multiple tables at one time. If one update failed, everything had to be rolled back to guarantee the data was in a consistent state. Think of it this way: You can’t take money from one bank account and then fail to update the target bank account!

NoSQL doesn’t give you quite the same cast iron guarantees that RDBMS does, but then they may not be needed so much. Though caching data in RAM is used extensively to massively increase speed, it may be minutes before the data gets written to disk. If power fails, the update is lost. This caching is frequently done using open source software such as memcached.

Couchbase’s website provides a nice comparison between NoSQL and RDBMS.

Conclusion

Up to a point, server games can handle many users using RDBMS, but once you get into very large numbers, it looks like NoSQL is the way to go. Unless you’re Facebook with MySQL — but I bet they don’t do many joins!

Related Links

Comments

  1. BY michael morse says:

    For any industry, include mobile/social gaming, i would think long and hard before committing
    to a nosql data store for your primary storage. You are simply painting yourself into a corner
    with the limitations and relative immaturity of nosql products, among them, security, lack of
    data integrity, monitoring tools, data architecture and community user experience. Nosql is great for getting a head start in implementing your application without a lot of data management experience, and for that very reason, can get companies into serious trouble down the road. It is certainly possible to find use cases for certain applications within a company where nosql will fit, but it is definitely not all purpose.

    Within gaming, you have several distinctly different types of data, statistics, game data, user data, etc.. you can not allocate the same priority of requirements to all of them.

    Sharded mysql keeps all security, monitoring and query options on the table. If you really need nosql type behavior, use a nosql protocol with mysql such as handlersocket to get the benefits of bypassing query planning and locking, which will result in better performance than nosql, yet keeping the ACID properties of innodb. There is a reason why Facebook (and Twitter and many other large web properties) are known for using sharded mysql, it’s a good all-purpose solution that reasonably covers the points above and indeed they have certain applications that require more complex queries. Now, they also use a number of other solutions to fit specific needs for various applications, among them, nosql.

    Regarding memcached, your should NEVER write to memcached with the idea that it has any durability, it should be used primary for reads on either lookup data that does not change often or data that is not need to be durable. Databases (mysql) use their own RAM to cache data and immediately commit changes to a log, so indeed it is durable, fulfilling the ‘D’ in ACID.

    Using mysql means a little more planning and tuning, and may require some research or help from a professional, but will be well worth the headaches that are avoided.

  2. BY Ben Ballard says:

    While it’s certainly true that Online Games need the kind of scalability that traditional RDBMSs struggle to provide, NoSQL is not the only alternative. There are many ways to tackle the issue of scalability.

    Many of the solutions for scalability involve some pain. Scaling up with more expensive hardware can be very expensive. Adding a caching technology can alleviate read-intensive workloads, but adds complexity to the application platform. NoSQL may address scalability at the expense of consistency, and also the loss of the relational data model which has many benefits. Sharding can also provide scalability, but adds complexity to the application. Switching from a familiar database technology to something with a different storage model and a unique data access API could mean a lot of development costs.

    There is no one solution that fits every use case, and we’re living in a world where the scale of data in terms of size, variety, and velocity has reached heights that no one database can adequately support. There is no “one size fits all” database anymore, if there ever was. So for many applications multiple databases should be used to handle data throughout the lifecycle from real-time individual transactions to historical aggregations and analytics.

    I work for VoltDB, which a fast NewSQL database. VoltDB may or may not be the best fit for your particular application, but it is an excellent fit for many online gaming applications as we’ve seen through our customers and community users. We have taken the approach of focusing on the high velocity OLTP use cases and building an architecture that is built around in-memory operations, eliminating the concurrency management bottlenecks of traditional RDBMSs, and automatically sharding or partitioning across nodes in a shared-nothing architecture. This solves the scalability issues for many high velocity use cases, without losing the benefits of SQL, ACID-compliance, and a relational model.

  3. BY jp says:

    Sounds a lot like InterSystems Caché. http://www.intersystems.com/cache/

  4. Pingback: A NoSQL Alternative — Disk Folders - Dice News

Post a Comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>