NoSQL – rc3.org

Drizzle developer Brian Aker weighs in on NoSQL:

MapReduce works as a solution when your queries are operating over a lot of data; Google sizes of data. Few companies have Google-sized datasets though. The average sites you see, they’re 10-20 gigs of data. Moving to a MapReduce solution for 20 gigs of data, or even for a terabyte or two of data, makes no sense. Using MapReduce with NoSQL solutions for small sites? This happens because people don’t understand how to pick the right tools.

Even more than that, I’d argue that people move to NoSQL in many cases because they don’t really understand SQL. Nearly every back-end programmer working on Web sites uses a relational database on a daily basis, and those who aren’t using one now have probably had to write SQL queries at some point. And yet very few developers I interview have what I’d describe as strong SQL skills. If you’re not going to use the relational database effectively, you may as well choose a simpler tool, but don’t pretend it’s for technical reasons. It’s a people issue.

Simon Willison turned up an interesting NoSQL use case in a crowdsourcing application he built for the Guardian. In the first application he built, he used MySQL’s ORDER BY RAND(), which is rather inefficient: MySQL has to generate a random value for every candidate row and sort on it just to return one. (For more on its inefficiency, see the comments on this blog post.) The next time around, he outsourced picking random results to the in-memory database Redis:

The system maintains a redis set of all IDs that needed to be reviewed for an assignment to be complete, and a separate set of IDs of all pages [that] had been reviewed. It then uses redis set intersection [strictly speaking, set difference] (the SDIFFSTORE command) to create a set of unreviewed pages for the current assignment and then SRANDMEMBER to pick one of those pages.

Clever.
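
To make the set logic concrete, here is a minimal sketch in Python that uses built-in sets as stand-ins for the Redis sets. The function and variable names are hypothetical and not taken from Simon's code; the comments note which Redis command each step roughly corresponds to.

```python
import random


def pick_unreviewed(required_ids, reviewed_ids):
    """Return a random ID that still needs review, or None if none remain.

    required_ids and reviewed_ids are plain Python sets standing in for
    the two Redis sets described above (hypothetical names).
    """
    # Roughly: SDIFFSTORE unreviewed required reviewed
    # (members of the first set that are not in the second)
    unreviewed = required_ids - reviewed_ids
    if not unreviewed:
        # SRANDMEMBER returns nil for an empty or missing set
        return None
    # Roughly: SRANDMEMBER unreviewed
    return random.choice(tuple(unreviewed))


# Illustrative data only
required = {"page:1", "page:2", "page:3", "page:4"}
reviewed = {"page:2", "page:4"}
choice = pick_unreviewed(required, reviewed)
print(choice)  # one of the pages that has not been reviewed yet
```

The appeal of the Redis version is that both steps happen inside the in-memory store, so the relational database never has to sort a table just to hand back a single random row.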


