rc3.org

Strong opinions, weakly held

Tag: SQL

Is ORM an anti-pattern?

This weekend, Tim Bray posted a link to an article at Seldo.com that argues that ORM is an anti-pattern. If you don’t know what ORM is, you probably won’t find this post very interesting, but I’ll provide a brief definition. ORM (object-relational mapping) is a layer of abstraction that stands between your database and code and allows you to treat entities in the database as native objects in whatever language you’re using. The advantage and disadvantage is that it substitutes native calls for SQL.

The linked article does a good job of going over the downsides of using ORM, but I think it ignores the upsides. His most compelling argument is that you should be using SQL appropriately in your application rather than trying to avoid it using ORM instead:

But in my experience, the best way to represent relational data in object-oriented code is still through a model layer: encapsulation of your data representation into a single area of your code is fundamentally a good idea. However, remember that the job of your model layer is not to represent objects but to answer questions. Provide an API that answers the questions your application has, as simply and efficiently as possible. Sometimes these answers will be painfully specific, in a way that seems “wrong” to even a seasoned OO developer, but with experience you will get better at finding points of commonality that allow you to refactor multiple query methods into one.

I think that he undersells the utility of ORM when it comes to simple queries and inserting and updating data. I work on a Web service that processes transactions. Each transaction involves a user logging in and the retrieval of a number of settings from different tables. When I’m done processing a transaction, I have to insert dozens of rows of data into at least half a dozen tables. All of the data that is retrieved and stored is already represented in the object model of my application. Hibernate, the ORM library we use, is remarkably good at pulling up the data when users log in using simple queries, and more importantly, saving all of that data using inserts and in some cases, updates.

Using ORM in the transaction processing context has saved me a massive amount of time over the years, and helped avoid stupid countless bugs involved with manually updating the SQL statements in the data access layer every time I update the object model. Coding that SQL by hand would offer nothing in the way of performance nor would it make the application any more understandable.

The main downside is that on the few occasions when I do have to write Hibernate code to ask questions of the database, I pretty much have to look up the proper approach every time. It would be much easier in SQL since I already know SQL very well.

The secret is, I think, to use the right tool for the job. The main manner in which we avoid the downsides of ORM is in not using it at all for the reporting part of our application. For reports, we use a data access layer that just uses SQL to access the database in a purely relational fashion. Just as writing SQL for all of the basic create/read/update/delete operations in our application would be painful, trying to write well-optimized reporting queries through the ORM layer would be as well.

Categorically rejecting a class of tools that are used productively on thousands of projects is just as silly as picking the wrong tool for the job. Either way, you give yourself more work than is necessary.

Making it easier to monitor slow queries

Ronald Bradford has a good idea for developers who are debugging slow queries — include comments in your SQL so that the queries are easier to identify in MySQL’s process list or in the slow query log. I’m wondering how easy it would be to include a setting for various database libraries (Hibernate, ActiveRecord, any of several PHP frameworks, and so on) to include the context in which a query is called in a comment when the query is run. For example, it would be great, if inside every Hibernate query, the class and method from which the query was called were included.

I’m going to look into hacking this functionality into Kohana, the PHP framework I’ve spent a lot of time with in 2009.

Beware excessive abstractions

jQuery creator John Resig warns developers away from libraries that hide JavaScript behind other programming languages, and makes a great point about relying overly much on abstractions to make your life “easier”:

When you use a pure-JavaScript library (such as jQuery, Prototype, Dojo, Yahoo UI, etc.) you are still programming using the JavaScript language. In the case of jQuery a large number of users, who have either never programmed JavaScript before or never programmed before, acquire a good grasp of how to use JavaScript – accented by the use of the library. A pure JavaScript library makes annoying tasks simple, the largest of which being cross-browser support. It does nothing to dilute the quality of the JavaScript-authoring experience. In fact I would argue that JavaScript libraries do much to give JavaScript a particular style and feel. Code written with Prototype feels very different from code written with jQuery – and this is fine since it’s giving the user the option to develop in the JavaScript language how they best see fit.

In the case of these language abstractions you are gaining none of the benefit of learning the JavaScript language. When a leak in the abstraction occurs (and it will occur – just as it’s bound to occur in any abstraction) what resources do you have, as a developer, to correct the problem? If you’ve learned nothing about JavaScript then you stand no chance in trying to repair, or work around, the issue.

This is a problem I often run into with developers who rely on persistence/ORM frameworks. They’re great, but if you’re dealing with relational databases, you really need to know SQL. These frameworks make a nice supplement, but they’re no substitute.

Update: Be sure to read the comments for a well thought out counterpoint.

The quickest route from database results to CSV

I’m wondering what the quickest route is from the results of a SELECT statement in the MySQL command line client to a CSV file that can easily be imported into Excel. Using INTO OUTFILE is an option but only if you have the proper privileges and have access to the filesystem on the server. It won’t select into a file on the system where the client is running. (This seems like a huge oversight to me.)

There are also GUIs (like phpMyAdmin) that provide this functionality, but lots of systems administrators avoid it because of security holes.

Usually, I just whip up a Perl script to query the database and dump the results to CSV, but I’m wondering if there isn’t an easier way.

By the way, in researching this blog post, I noticed that it’s really easy to set your MySQL prompt to something much more useful than the default. For more, see the mysql client documentation. For now I’m going with (\u@\h) [\d]>.

Update: mysql -H -e "select * from whatever" > outfile.html dumps the results of a query in HTML format, which can easily be loaded into Excel. (Thanks, Erik.)

Limit 1

Here’s a free tip. When you’re making ad hoc changes to a database directly from the database client, it is a good idea to stick LIMIT 1 on your UPDATE and DELETE statements if your intent is to only update one row. Even if you want to update many rows, I still recommend a dry run with LIMIT 1 first.

Then if you make an error in your boolean logic or leave something out of the WHERE clause, the wreckage will be limited. It’s much easier to deal with the loss or unwanted alteration of one record than a whole database.

Of course, wrapping your ad hoc queries inside a transaction is also a brilliant idea. I recommend that approach as well assuming it’s available.

© 2024 rc3.org

Theme by Anders NorenUp ↑