rc3.org

Strong opinions, weakly held

The danger of SQL illiteracy

Despite growing interest in other storage media, relational databases are still the dominant data storage medium for Web applications. Indeed, as SQLite and other databases that can be embedded in applications grow in popularity, we’re seeing the use of relational databases expand even further.

It seems to me, though, that actual knowledge of SQL seems to be falling. I blame this on the growing popularity of persistence frameworks that abstract the database away, allowing developers to interact without databases without writing much (or any) SQL. Ruby on Rails has ActiveRecord, Django has its own persistence components, and there are many ORM frameworks for Java that work very well. There are even a number of persistence frameworks for PHP. As developers come to rely on these frameworks, their SQL skills grow rusty. Many developers don’t even learn SQL in depth, period.

This is a big liability. Even if you don’t have to produce SQL to include in your application, you still need to know it to write ad hoc queries. Knowing SQL lets you write a single query to answer questions like, “What’s the average total for orders received from customers in Texas over each of the last six months?” I’ve seen developers try to take care of those kinds of questions by pulling back individual records and aggregating the results in code. They ought to be using SQL efficiently instead.

In the short term, the answer for me is that I’m going to be asking more SQL questions in interviews. If you’re a developer and you’re not incredibly comfortable with SQL, you should probably hit the books. The other thing I’d add is that languages and frameworks come and go, but SQL, CSS, HTML, and JavaScript are going to be with us for a long, long time. Any investment you make in those areas will certainly pay off.

17 Comments

  1. A few days ago I had a strange problem with my WordPress blog.

    After some code digging, which brought me nowhere, I fired up phpmyadmin and just by looking at the list of tables gave me the clue to what might be the problem.

    And I am certainly not a programmer.

    So, yes, I absolutely agree.

  2. I agree, as well. I’ve always been a little weak at SQL, and, for almost as long, I’ve wanted to get better for the reasons you describe.

    Do you have any suggestions for good SQL books?

  3. Evan says: “Do you have any suggestions for good SQL books?”

    I bet he does < grin >.

  4. Wow. I never knew just how stupid I could feel after a well-targeted Amazon search. Thanks, John.

    Of course, if it turns out that you were referring to another book, I’m probably going to have to spend the weekend off of the internet, out of shame.

  5. I actually wouldn’t recommend my SQL book, a lot of it is out of date, and it’s way too large a book. I do think the core chapters on SQL itself are good.

    I think “The Practical SQL Handbook” by Bowman, Emerson, and Darnovsky is quite good.

  6. Thanks for the recommendation! I’ve ordered it from Amazon, and look forward to becoming more SQLiterate in the near future.

  7. Evan, I was referring to Rafe’s book but I wasn’t trying to make you feel stupid, just having a little fun poking at Rafe since I knew he did author an SQL book.

  8. I think it’s pretty funny that the persistence frameworks basically encourage you to treat highly-sophisticated relational databases as a flat record-based database and try to do all the relations in code.

    It’s not that I think relational databases are all that. Actually, the biggest problem I have with them is that they never bothered to solve the data-binding problem in a standard way, leaving the door open for all these half-assed ORM frameworks.

    Of course, I use Hibernate – I’m lazy, and I can’t be bothered to do all that object-relational mapping junk myself. But I gotta tell you, I still write a lot of SQL – some of it in the OO Hibernate version of SQL, but a fair amount in real, raw SQL, because you just can’t get a truly optimized query with a translation layer in the way.

    Makes me want to write my own ORM that’s the barest thin layer over the database. Of course, I’m sure that’s what everyone else thought they were doing too.

  9. The problem is that when SQL and relational databases were invented, most people were interfacing with them using COBOL.

  10. John, I just re-read my reply to your comment, and I think it comes off as way more sarcastic than I meant it. My thanks to you were sincere, for pointing me in the direction of Rafe’s book, and I definitely didn’t (and don’t!) think that you were trying to make me feel stupid.

    Sorry for the confusion!

  11. Hey, we still ARE interfacing with them using COBOL here!

    Not me personally, of course, but yeah, there’s a lot of COBOL out there still running against them. And SQL has a fair amount in common with COBOL – both were at least partially intended to be used by non-programmers, hence the verbose syntax, and the SQL datatypes have more in common with COBOL than with modern ideas.

    Relational databases are really clever, but you have to understand set theory somewhat – either explicitly or intuitively – to make good use of them. And for a lot of people that’s clearly not a natural way to think (including me, though I can do it).

    Normalization is another area where ordinary intuition about organizing information does you no good at all. (And I tend to think it’s overrated anyway (though obviously important still!) Data duplication is a fact of life to be dealt with, and you can’t make it go away by forcing an ever-less-intuitive structure on your data.

  12. Rather, can’t make it go away COMPLETELY through normalization, since it occurs in so many other places – caching, in-memory copies of data in running processes, replication, backups, work tables, foreign keys, import/exports, printouts, and that guy who always cut-and-pastes stuff into emails rather than pointing you to the appropriate record online. That doesn’t mean normalization isn’t important, but it does mean it’s not a magic bullet and it may not always be worth pushing it as far as it could possibly go. A little duplication in the database won’t kill you.

  13. SQL is one of my favorite things because I always look at writing one query to get all the data I need as a fun puzzle to be solved. I guess that means I grok set theory.

  14. I would think so.

    What was funny (in the end) to me was that I took a discrete math course 1st or 2nd semester 1st year CS undergrad. I already knew a fair amount of it, learnt some more, and just could not see what use it would be to what I knew about computer programming (which I thought was a lot, because I didn’t know enough to know what I didn’t know yet). Set theory? What does this have to do with the price of tea in China?

    So, 2nd year, along came a course in relational databases. And I was having some minor difficulties in getting the concepts until suddenly something went CLICK in this really rather visceral way and what I’d learnt in set theory and what I was trying to learn about relational databases suddenly all slotted in together into a perfect understanding of the subject. Love those moments of enlightenment.

    Then I got my first real job, which happened to involve building an application backed by a relational database, and that rapidly showed me that I didn’t know a damn thing about designing database applications.

  15. you’re not thinking that Big Tablish or LINQish designs will become the norm, eh? πŸ˜‰

  16. I believe in key-value databases (of which BigTable is, one). In fact, I’m kind of betting the farm on them.

    Relational databases don’t reflect the way people really think about data. Key-value databases do.

  17. Relational databases don’t reflect the way people really think about data. Key-value databases do.— Jacob Davies

    Most people don’t think about data, period. There is a skill in breaking down a set of data into components and then arranging them in a meaningful structure.

    What if you had a set of Football Scores? Would a key-value database work there? Oh, but wait, you want to also show the standings of each football team in its respective division, and tie those records back to the original team stats, and tie the team roster into individual game leaders as well as season leaders.

    Then I want to graph, slice, and dice the whole mess across seasons to show most improvement by player position, most consistent player, and best ROI on the coach (how much $$$ did a good coach cost for a winning record vs. the highest paid coach)?

    These are real world problems, btw. πŸ™‚

Leave a Reply

Your email address will not be published.

*

© 2024 rc3.org

Theme by Anders NorenUp ↑