Drizzle developer Brian Aker weighs in on NoSQL:
MapReduce works as a solution when your queries are operating over a lot of data; Google sizes of data. Few companies have Google-sized datasets though. The average sites you see, they’re 10-20 gigs of data. Moving to a MapReduce solution for 20 gigs of data, or even for a terabyte or two of data, makes no sense. Using MapReduce with NoSQL solutions for small sites? This happens because people don’t understand how to pick the right tools.
Even more than that, I’d argue that people move to NoSQL in many cases because they don’t really understand SQL. Nearly every back end programmer working on Web sites uses a relational database on a daily basis. Those who aren’t using them now probably did have to write SQL queries at one time. And yet very few developers I interview have what I’d describe as strong SQL skills. If you’re not going to use the relational database effectively, you may as well choose a simpler tool, but don’t pretend it’s for technical reasons. It’s a people issue.
April 9, 2010 at 9:09 am
Hey. That makes sense, I’m not disagreeing. But if a simpler tool does the job OK, and it makes implementation easier or faster, isn’t that a valid choice? Or does NoSQL not necessarily make the task easier?
April 9, 2010 at 9:34 am
I think it’s a valid choice for sure. But what I wonder about is whether what looks like a labor saver up front ends up costing you time down the road when you need more robust reporting, for example. Using a key/value store to duplicate everything SQL gives you for selecting and aggregating data is a ton of work.
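To make the reporting point concrete, here is a minimal sketch in Python. The order data, the key naming scheme, and the function names are all made up for illustration, and a plain dict stands in for a key/value store. The same report (total spend per customer, keeping only customers at or above a threshold) is one declarative statement in SQL, while against the key/value store the grouping, summing, filtering, and sorting all have to be written by hand.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (order_id, customer, amount)
ORDERS = [
    (1, "alice", 30),
    (2, "bob", 15),
    (3, "alice", 20),
    (4, "carol", 5),
]


def totals_with_sql(orders, minimum=10):
    """Per-customer totals via a single SQL query (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer HAVING SUM(amount) >= ? ORDER BY customer",
        (minimum,),
    ).fetchall()
    conn.close()
    return rows


def totals_with_kv(orders, minimum=10):
    """The same report against a dict standing in for a key/value store."""
    # Each order is stored under its own key, as a key/value store would hold it.
    store = {
        f"order:{order_id}": {"customer": customer, "amount": amount}
        for order_id, customer, amount in orders
    }
    # No query language: scan every value and do the grouping ourselves.
    sums = defaultdict(int)
    for value in store.values():
        sums[value["customer"]] += value["amount"]
    # Filtering and ordering are also manual steps.
    return sorted((c, total) for c, total in sums.items() if total >= minimum)


if __name__ == "__main__":
    print(totals_with_sql(ORDERS))
    print(totals_with_kv(ORDERS))
```

Both functions return the same rows for this data. The difference shows up when the report changes: a new grouping or filter is an edit to the query string in the first case, and new application code in the second.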
April 9, 2010 at 9:51 am
I thought I knew relational databases, and then I took an Intro to Databases course in university (CSC343 at UofT). Through posing both hard relational algebra problems and hard SQL problems, it really expanded my ability to write and refactor effective queries.
I highly recommend taking a relational databases course if one has the opportunity and no formal background.
April 9, 2010 at 10:23 am
A tangent: when you and I were discussing this topic the other day, Rafe, I immediately jumped from my SQL skills to my knowledge of data modeling, which, I claimed, is better than my actual SQL skills.
I’ve been thinking about why I made this leap, and I concluded that to me, SQL doesn’t do you much good unless you can conceptualize the data that you want, how it should be stored, etc. If I can conceptualize the query, I can always learn the SQL.
April 9, 2010 at 10:25 am
But if a simpler tool does the job OK, and it makes implementation easier or faster, isn’t that a valid choice?
The question to answer, as Rafe said in his post, is what’s your primary use case? If it’s fast queries of gigantic, mostly static data sets, then a NoSQL store is probably best. If it’s data integrity for data that changes a lot, then an RDBMS is probably appropriate.
April 9, 2010 at 12:09 pm
It may be the case that some people use NoSQL as a crutch because they don’t know SQL well enough, but I disagree that you need to be at Google-size to benefit from it. You can use NoSQL without MapReduce, and vice versa.
I’ve used SQL in various contexts for years, and I use MongoDB in my projects when the data is well suited to it, which, on the web, is often. Working with it feels a lot like switching from a statically typed language to a dynamically typed one, and the productivity boost is comparable.
April 9, 2010 at 1:42 pm
Back in the day, my first big coding project (>500k lines of C) was a group health insurance quoting system which ran on a 640k 4.77MHz PC, and had locking and such so it could run on multi-user systems.
We spent a lot of time making sure that inserts happened in the right order so that the database was always correct and we could minimize locking, and we developed test tools so that we could replay user database interactions in case we got something wrong.
So I’ve got a rough idea of how challenging it is to build a good database system that doesn’t use SQL.
Later, I worked on a distributed database replacement for Oracle that didn’t use SQL, for an application that’s still serving a couple of million authenticated queries a day. I said then, and I still say, that rather than re-inventing the wheel we should have put the 5 or so person-years of development effort that went into that system into making PostgreSQL better.
So, yeah, total agreement that the main reason people use NoSQL is that they don’t understand what it really takes to get a database right, and that they’re going to end up just re-inventing the wheel eventually.
There are a few places where “tie %h, 'DB_File', 'myfile.db'” is the right solution, but I think that set is far, far smaller than the current bandwagon would suggest.