rc3.org

Strong opinions, weakly held

Testing the boundaries of ActiveRecord

I am a huge fan of object-relational mapping. One of the most tedious problems for developers who use object-oriented languages is transforming query results into objects and generating insert and update statements to save objects. There are a number of ways to deal with this problem, but the easiest way to handle it is to use somebody else’s ORM tool. Ruby on Rails includes an ORM library called ActiveRecord. This morning I ran into a problem with ActiveRecord that turns out to be an application design problem. I thought I’d write it all up in case anyone else runs into a similar problem.

Let’s say I want to retrieve all of the “question” objects stored in a database. With ActiveRecord, I just make the following call:

Question.find(:all)

It generates this query:

SELECT * FROM questions

If I want to retrieve the most recent 10 questions, I can use the following call:

Question.find(:all, :order => “created_at DESC”, :limit => 10)

That’s converted to:

SELECT * FROM questions ORDER BY created_at DESC LIMIT 10

Questions have categories. If I want to pull up the categories for questions along with the questions themselves, I can add the include parameter:

Question.find(:all, :order => “questions.created_at DESC”, :limit => 10, :include => :categories)

Here’s where the ORM power comes through:

SELECT id FROM questions ORDER BY questions.created_at DESC LIMIT 10

SELECT questions.id AS t0_r0, questions.current AS t0_r1, questions.status AS t0_r2, questions.created_at AS t0_r3, questions.updated_at AS t0_r4, questions.categorized AS t0_r5, categories.id AS t1_r0, categories.name AS t1_r1, categories.default_keyword AS t1_r2, categories.community_id AS t1_r3, categories.created_at AS t1_r4, categories.updated_at AS t1_r5, categories.parent_id AS t1_r6 FROM questions LEFT OUTER JOIN categories_questions ON categories_questions.question_id = questions.id LEFT OUTER JOIN categories ON categories.id = categories_questions.category_id WHERE questions.id IN (‘4839’, ‘4838’, ‘4837’, ‘4836’, ‘4835’, ‘4834’, ‘4833’, ‘4832’, ‘4831’, ‘4830’) ORDER BY questions.created_at DESC

In this case, ActiveRecord actually generates two queries. The first grabs the IDs of the ten most recently created questions, then it plugs those into the second query. This is the only approach that will work, because if you plugged the limit into the second query, you’d get back 10 rows that may or may not comprise 10 questions (depending on how many categories each question has assigned to it). I discussed this problem back on April 26, 2005. As you can see, ActiveRecord uses the very approach I proposed back then.

This is all great, except that I find myself running into one more problem. Let’s say I want to retrieve the most recent 10 questions in a particular category. I can add conditions to my query, like this:

Question.find(:all, :order => “questions.created_at DESC”, :include => :categories, :conditions => ‘category_id is null’)

This yields the following query:

SELECT questions.id AS t0_r0, questions.current AS t0_r1, questions.status AS t0_r2, questions.created_at AS t0_r3, questions.updated_at AS t0_r4, questions.categorized AS t0_r5, categories.id AS t1_r0, categories.name AS t1_r1, categories.default_keyword AS t1_r2, categories.community_id AS t1_r3, categories.created_at AS t1_r4, categories.updated_at AS t1_r5, categories.parent_id AS t1_r6 FROM questions LEFT OUTER JOIN categories_questions ON categories_questions.question_id = questions.id LEFT OUTER JOIN categories ON categories.id = categories_questions.category_id WHERE (category_id is null) ORDER BY questions.created_at DESC

Note that I’ve left out the limit parameter here. The reason I did so is that it doesn’t work. When you use limit, include, and conditions together in one call to find, ActiveRecord wants to apply the conditions to the initial query that retrieves the list of IDs, and in this case, the conditions apply not to the questions table but rather to the categories table. It’s not included in the initial query that retrieves the IDs, so an error is thrown. Here’s the broken query generated by such a call:

SELECT id FROM questions WHERE (category_id is null) ORDER BY questions.created_at DESC LIMIT 10

Bottom line: there’s no elegant way to retrieve the last 10 questions that aren’t assigned to any categories. If I wanted, I could retrieve all of the questions with no categories and then pull the first 10 out of the collection, but that’s a dangerous approach if the data set is large. Failing that, the only approach is guessing. For example, I could stick an arbitrary date in the conditions to limit the result set and then pull the first 10 questions from that set. However, if less than 10 questions have been created since that date, I’ll wind up with fewer than 10 questions in my result set. If the count were important to me, I could then just push the date further back incrementally and run the query again until I got 10 records, but that’s messy.

It’s probably better to just change the application to present questions from the past week rather than the past 10 questions. Anyone have any better ideas?

1 Comment

  1. Hmm. find_by_sql will let you write you own query, won’t it?

    My other thought was to use a Stored Proc and call it, but that would be the same thing, just moving the logic to the DB.

Leave a Reply

Your email address will not be published.

*

© 2024 rc3.org

Theme by Anders NorenUp ↑