SQL vs. NoSQL: Which Is Better?

Not too long ago, this was the only data-storage device most companies needed. Those days are over.

For the past 40-some years, relational databases have ruled the data world. Relational models first appeared in the early 1970s thanks to the research of computer science pioneers such as E.F. Codd. Early versions of SQL-like languages were also developed in the early 70s, with modern SQL appearing in the late 1970s, and becoming popular by the mid-1980s.

For the past couple of years, the Internets have been filled with heated arguments regarding SQL vs. NoSQL. But is the fight even legitimate? NoSQL databases have grown up a bit (and some, such as Google’s BigTable, are now mature) and prove themselves worthy. And yet the fight continues.

Before I continue, however, I want to make a clarification. In this article, I’m using the term “SQL” rather loosely. Technically speaking, SQL is the language used for accessing a relational database. You can use other languages or techniques for accessing the data. When people argue “SQL vs NoSQL,” they’re actually talking about relational versus non-relational databases. That’s what I’m talking about here: Relational databases (such as Oracle, MySQL, and SQL Server) versus newer non-relational databases (such as MongoDB, CouchDB, BigTable, and others).

Now back to the fight. I spend a great deal of time reading about it, and I’ve noticed that, while the majority of the people simply discuss the merits and differences of the relational versus non-relational databases, a small percentage of the people remain extremely vocal—and even angry—with their adamant support for one over the other. Often, those most vocal proponents are DBAs.

But what about the programmers, who write the client code that access the databases? Where do the disagreements leave them? From a programming perspective, is SQL really that horrible and outdated? Or is the new NoSQL really that awful to work with? Perhaps they both have strengths and good points.

So I’d like to take the disagreement into a different direction: Let’s study the problem at the coding level. Let’s actually look at a couple languages and how they work with the different databases and see if we can draw some conclusions. Our space is limited here, so at the end I’ll offer further thoughts on where you can continue the exploration.

The Old Arguments

Before diving into the code, I want to list some of the more common arguments I’ve seen.

First, here are some arguments against relational databases, the anti-SQL arguments. Now please understand I’m not saying these are necessarily accurate—only that these are arguments others have made—and that I’m sharing them here so we can look into them:

  1. Joins in relational databases can slow the system down to a crawl, especially when millions of users are doing lookups against tables with millions of rows of data. Google and Amazon found this to be the case, and thus developed their own non-relational systems.
  2. Relational data doesn’t map well to typical programming structures that often consist of complex data types or hierarchical data. Data such as XML is especially difficult because of its hierarchical nature. Complex objects that contain objects and lists inside of them do not always map directly to a single row in a single table.
  3. Relational data doesn’t map well. Combine that with the need to handle the syntax of SQL, and writing client code for accessing SQL databases becomes difficult.

Are these valid? If you ask proponents of SQL, you’ll get some good answers about why they’re not. For example, Oracle is extremely powerful and can optimize joins and cache them. Huge banks that serve thousands of branches and millions of customers, including hundreds of thousands of online members, survive just fine using SQL. So here are some arguments I’ve seen for SQL in response to the three big arguments against SQL:

  1. You’re not as big a Google and you’re not searching as much data as Google. (This is one of the most common arguments, and it probably has merit.) Further, today’s relational systems such as Oracle are extremely powerful and mature and can handle joins at the performance level that most systems require. This ain’t your daddy’s Oracle.
  2. Most data can be mapped to a relational database, even XML and complex objects. Complex objects can be mapped to multiple rows divided up through normalization into multiple tables and subsequently retrieved using a View that implements the proper joins.
  3. Code libraries aid in the development of client code, and aren’t that difficult to use.

Also, here’s one more argument for SQL that I’ve seen, but I’m not really going to touch here other than make a very quick point:

  1. Most people who argue against SQL simply don’t understand it.

There may be instances where that’s true, but the majority of the people I’ve worked with who use NoSQL databases have made the transition from SQL, spent years working with SQL, and really do understand it. So I won’t touch that one any further.

There are arguments against NoSQL that aren’t simply defenses of SQL. However, the early pioneers of NoSQL (Google and Amazon especially) developed databases after determining that relational databases didn’t serve their needs—setting the ground for the direct assault on SQL.

But if we look at actual case studies, there are plenty of situations showing that both really do scale. Oracle especially scales well, and a lot of SlashDot readers like PostgreSQL. In light of that, the “won’t scale” argument really doesn’t hold up.

That’s why I want to focus less on the scalability arguments and more on the actual coding. When you learn to code both, you can decide which is a better fit programmatically—for example, a better fit because your objects and classes are too complex to fit nicely into SQL, or perhaps you have an extreme system that churns out millions of records that fit perfectly into SQL. Maybe SQL will work for your project; maybe NoSQL would be better. Meanwhile, a certain subset of people will endlessly proclaim that one is great while the other is horrible and “must die.”

So with that, let’s look at some actual coding examples.

Note: In the following code examples, my goal isn’t to teach you how to code with these libraries. Instead, I’m showing you the code to compare the complexity and features.

NoSQL vs. SQL in Node.JS

One of the somewhat newer up-and-coming platforms is Node.js, which provides for server-side JavaScript. Node.js is useful for writing web applications, and includes drivers for different databases, including both SQL and NoSQL. In general, the drivers have been written by third-party developers. Some do really well, while others leave a bit to be desired.

SQL works well in Node.js with the help of some of these drivers, but you run into a problem where JavaScript language’s objects are dynamic without fixed structures. You can take virtually any object in JavaScript and add members to it dynamically at runtime. This can be both a blessing and a curse; the dynamic nature can be used to create code that’s either powerful or sloppy and bug-ridden.

To help in the effort to create better code, third-party developers have created various model-oriented packages that work well in Node.js, both for SQL and NoSQL databases. Mongoose is a good example for NoSQL, which I’ll demonstrate shortly. For SQL, there are several, including an intriguing one called Patio.

Let’s look at two different SQL approaches in Node.js. Both will use MySQL, with one relying on a strongly-typed model system and the other on a weakly-typed one. For this I’ll use a package called node-db, specifically for the MySQL database; you can find info here.

Weak typing with SQL and Node.JS

Here’s some Node.JS code that uses weak typing:

var mysql = require(‘db-mysql’);

new mysql.Database({ hostname: ‘localhost’,  user: ‘root’,

password: ‘…’, database: ‘mydatabase’

}).connect(function(error) {

if (error) {

return console.log(‘CONNECTION error: ‘ + error);

}

this.query().

select(‘id, email’).

from(‘users’).

where(‘active = ?’, [ true ]).

order({‘timestamp’: false}).

execute(function(error, rows, cols) {

if (error) {

res.send(‘ERROR: ‘ + error);

return;

}

console.log(rows.length + ‘ ROWS found’);

console.log(rows);

res.send(rows);

});

});

This code is straightforward and actually quite elegant. It uses an interesting aspect of Node.js whereby Javascript’s closures can be put to use to handle the issue of asynchronicity, which is useful when dealing with databases. For example, suppose you write sequential code where the first step is saving to the database, and the second is writing out the identity field for the saved row. But you might run into a problem because the call to save to the database is asynchronous, meaning the next statement—writing out the identify field—will occur immediately after the call, whether the save to the database is finished or not. This can be overcome through a type of callback called a promise, which looks like this:

data.save().then(function() {

// do something after save

});

The chaining in the node-db package uses this technique.

Strong typing with SQL and Node.js

Now what about schemas? SQL tables have a fixed schema. Node.js uses the JavaScript language, which allows dynamic classes (meaning you can add any members you like, in contrast to strongly-typed languages such as C++). Before I show the strongly-typed one, I want to point out something interesting about the node-db driver. Notice that I didn’t actually use SQL itself, but I am accessing a relational database. This gets back to the confusion you can see going on in some forums regarding the name NoSQL: it’s a bit of a misnomer.

As mentioned earlier, NoSQL refers to non-relational databases. In this case, we’re not really even using SQL in our code—but we are using MySQL, a relational database. Either way, here’s the sample that attempts to strongly type the data in the weakly-typed JavaScript language. This code uses the Patio package.

For this code, I’m going to use one of their own examples from their Website:

connectAndCreateSchema()

.chain(defineModel, disconnectError)

.then(function(){

var User = patio.getModel(“user”);

var myUser = new User({

firstName : “Bob”,

lastName : “Yukon”,

password : “password”,

dateOfBirth : new Date(1980, 8, 29)

});

//save the user

myUser.save().then(function(user){

console.log(format(“%s %s’s id is %d”, user.firstName, user.lastName, user.id));

disconnect();

}, disconnectError);

}, disconnectError);

This relies on chaining as well, and you can see how the model is a constructor function. I should probably mention the Patio library is quite large, and I’m barely doing it a service by showing such a small amount of code. But you can see how you can create models and keep the data tightly wrapped within those models, more-or-less forcing you and other programmers to not stray from the model. It’s a bit more complex than some others, but it’s still pretty easy to use.

NoSQL and Node.js

Now here’s an example of connecting to MongoDB. For this I’m using a package called mongoose, which you can learn about here.

var PhoneSchema = new Schema({ type: String, number: String });

var ContactSchema = new Schema({ name: String, title: String, company: String, phones: [ PhoneSchema ] });

// Register the model

var ContactModel = mongoose.model(‘Names’, ContactSchema);

// Example of retrieving the data

Function test() {

return ContactModel.find(function(err, list) {

if (!err) {

handlers.success(list);

} else {

handlers.error(err);

}

});

}

This code creates a well-defined model called Contact, which includes a list of phones. Retrieving data involves calling a find function. In this case I’m just retrieving all the records, but you can also include a query, which the mongoose docs demonstrate.

What if you don’t want a strong schema, but want to use a NoSQL like MongoDB? Then it gets even easier. This is where some database purists might cringe, but there are legitimate reasons you might not want or need a fixed schema. As a simple example, you might have a Contacts database, and each person in the database could have varying fields. Some contacts might have several phone numbers of different types; some contacts might have several email addresses. Yes, you could normalize this into a relational database. But in this imagined application, rarely would you just get a list of phone numbers. Instead, most times you would want a name with all phone numbers associated with that name. Is it really necessary to force normalization into the picture and always store the phone numbers in a separate table, only to be joined during virtually every single lookup?

You can certainly do it either way: Store the phones in a separate phone table. Or you could use document storage and just group it all together, and then pull the data in as a single object. This is where a database such as MongoDB really shines. You can store ad-hoc records, and pull them into an object in one shot. You can index the collection on any fields you want (most likely name) and then pull the document into a single JavaScript object with one lookup.

Node.js Roundup

Let’s see where we stand: We’re able to do model-based data processing with both SQL and NoSQL in Node.js. Are there any strengths or weaknesses? Both sets of drivers work well and, although I didn’t show you any speed tests, both are fast. Any weaknesses or strengths are, for the most part, inherent in the databases themselves. MongoDB, for example, allows for a flexible document structure, but that can also open a can of worms and make for some horrible bugs. So far, I’m not convinced one is “better” than the other. But I am developing some personal feelings about when to use one or the other. So let’s move on to another language.

NoSQL vs SQL in Microsoft C#

This might seem like an odd choice of languages to demonstrate next, but there are two reasons I’ve chosen it: The MongoDB team has created a rather sophisticated driver for the strongly-typed C# language. It includes full object serialization. And second, Microsoft—love them or hate them (and I certainly have no love for them)—has actually created a powerful object-relational database library called Entity Framework. So for this example, I’m going to show two ways to access an SQL database in C# (one that uses the Entity Framework, and one that doesn’t), and a way to access a NoSQL database.

First, here’s some sample code for performing a query using the Entity Framework. To create this example, I used the driver created by the MySQL team that integrates with Microsoft’s own Entity Framework:

ProductEntities data = new ProductEntities();

List<Item> advisories = (from p in data.Items

where p.Category == 1

select p).ToList();

The Entity Framework includes several modeling tools that I won’t cover here; essentially, they work by attaching to the database, reading the schema information, and generating the classes for you. These classes include properties for accessing the fields in the row. Thus, you can query the SQL database, request a table or a view, include Where and Order clauses to filter and order your data, and the row or rows come back as a single object or a list of objects, respectively. If the table includes a primary key, you can then make changes to those objects, and write them back to the database. In this short code, I’m reading all rows whose Category column is set to 1, and I’m saving it into a list. (Normally, the library doesn’t read the entire result-set in at once, but it does here when I call the ToList() function.)

But if you prefer a more direct method, you can access the data directly using tables. The older ADO.NET framework includes classes for reading a table or view, for setting a filter to limit the rows in the result set, and for iterating through the rows in the table.

// connect is an object filled in earlier that contains the connection string

SqlCommand command = new SqlCommand(“select * from Person”, connect);

SqlDataAdapter da = new SqlDataAdapter(command);

DataTable dt = new DataTable();

da.Fill(dt);

foreach (DataRow row in dt.Rows) {

// pPeopleColumnNames was filled in with earlier code. It’s just a list of strings

foreach (string colname in pPeopleColumnNames) {

Console.WriteLine(row[colname].ToString());

}

}

connect.Close();

As you can see in this code, I just attached to the table and got back the rows I wanted, and then accessed the fields through an array notation. This is actually very similar to a way you can do it in PHP using the older-style PHP database routines. (There are newer ones also that are more OR/M based.)

Now here’s how you can access data in a MongoDB database. Again, I’m only showing a simple example, but it gives you an idea of what you can do with it.

MongoSiteSettings pCurrentSettings;

MongoCollection<MongoSiteSettings> settings =

mongolib.GetCollection<MongoSiteSettings>(“sitesettings”);

pCurrentSettings = settings.FindOne();

The GetCollection function is a helper function I had written for my own use, but it consists of only five lines that connect to the database, and return a collection object.

Notice this code has a slight similarity to the Entity Framework code: Using something similar to C++ templates (called generics in the .NET world), you can read the data from the MongoDB collection right into an object, in this case an object of type MongoSiteSettings. And you can make changes to the object, before saving it back to the collection, with a simple save function:

settings.name = “New Settings”;

settings.Save(pCurrentSettings);

C# Roundup

From a programming perspective, how do these compare? In terms of ease, they’re all close. The Entity Framework approach is especially comparable to the MongoDB code in that you get back a strongly-typed object. What I didn’t show you are the tools for generating the Entity Framework code; the code generation behinds the scenes uses a graphical interface that includes some nice visual modeling tools. The older style of reading the tables directly really isn’t all that awful either, although a bit dated in its approach.

Drawing Conclusions

Let’s be blunt: None of the code that I presented here was difficult. I’ve been doing SQL programming for about 20 years (I was on the team that developed the original ODBC drivers in the early 1990s). In the past, database client programming was kind of clunky and more than a bit painful. (Those ODBC drivers we developed included client libraries that were, frankly, pretty ugly.) But life has improved since then. There are frameworks that support OR/M and entities, and they’re really not that difficult to use. And they abstract away the actual SQL language, so if you don’t like that, you don’t have to deal with it.

And NoSQL is also pretty easy. I’ve worked with several NoSQL databases, and—as you probably realize at this point—when it comes to NoSQL, I personally prefer MongoDB. But the others aren’t very hard to use either. They have their quirks, but aren’t bad to program. Some allow for REST interfaces, and most include native libraries that make the coding easy.

So what can we conclude? Well, with the drivers here I focused primarily on ease-of-use. There are other factors that need to be considered, as well. Do they support connection pooling, for example? Do they cache? What about pulling in large amounts of data? (Hint: Most of the better drivers for most of the popular languages support cursors, so you don’t have to pull all the data in at once.) Those are factors you’ll need to investigate as you choose a driver for the language and database you’re using. But in general, virtually all the popular languages today, including Java, PHP, Python, PERL, and even C++, have nice libraries that make database programming far easier than it used to be.

As programmers we can embrace what we’re given. Most likely we’ll have our own opinions. I sure do. But if you’re doing something that lends itself nicely to SQL tables, then we can get our job done. If you’re doing something that requires complex data structures and lends itself nicely to a NoSQL database, then we can similarly get our job done.

But to wrap this up, I want to present a personal opinion that I’ve come to. You may agree or disagree; feel free to comment. With the right design, you can almost certainly map even the most complex data into a traditional, relational database such as Oracle. And with the right programming tools, you can easily pull that data into your objects and classes. But why? It’s even easier in a database such as MongoDB.

So I use both. For some projects, I use Oracle, MySQL, or SQL Server (yes, I’ve used all three on different projects) for clients who have large amounts of data that fit well into a tabular structure, and who will be doing queries that pull back thousands of records per query. (Think financial applications.) For software that’s more oriented towards pulling individual objects (such as an online organizer tool, or an online content management system that I’m presently working on), I opt for something like MongoDB.

MongoDB does great with large complex structures that are typically read in individually, while the large relational databases do well when I’m processing huge amounts of data. And no, my clients’ data needs are nowhere near as big as Google, so we don’t encounter any performance and scalability problems.

 

Image: Kuzmin Andrey/Shutterstock.com

Comments

  1. BY Mark M says:

    Does the need to perform reporting off from a database sway your choice of SQL vs. NoSQL?

    I find in the business world it is rare that we build an application that doesn’t require some sort of reporting on the data it contains. The application programming world seems to have caught up with libraries for entity based program access for NoSQL databases; but it doesn’t seem like the reporting technologies are quite up to the same level. Or maybe it’s just the pool of people that know how to use them for NoSQL databases is really small?

    I also wonder if you find there is a ‘danger’ in using NoSQL databases? For those of us that remember VB 6, the whole ‘no defined structure’ kind of smells like the old ‘variant’ data type. Sure it looks really great on the surface to have all that flexibility, but it can also get you into a whole ton or trouble with bugs that can be really hard to track down. We’d always have to sit down and have ‘the talk’ with new programmers about when and how to use them. Do you think that those kinds of risks exist with NoSQL? Or have the tools matured to a point to keep less experienced programmers out of trouble?

  2. BY Torrey Brown says:

    Great article! I love the idea of using NoSQL databases for programming environments with complex data structures vs. SQL databases for those applications leaning more towards ‘tabular’ structures. Thanks for the insights.

  3. BY WT SIT says:

    I like the article. However, I dont know anyone would be interested to try out some olddies. I worked on a project in 1988 which was shortly after the clash of the stock market in the US. The company layed off 3/4 of its staff (Mainframe COBOL/VSAM programmmers). In turn, it changed its new developement using 4GL & DBMS. There were 2 contentants Adabas and Oracle. Eventually they chose Adabas. I asked my manager while Adabas because Oracle was a big name. He told me that they did a transaction benchmark. The result of Mainframe Adabas was a lot lot lot … lot better than Oralce. In additon, Adabas’ Natural 4GL was an excellent programming tool compared to PL/SQL. Adabas is flexible inverted list structured database. I guess it still has some shops in the US using it. Try it out if you have the luxury time.

Post a Comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>