Last week I posted about a problem I was having along with some potential solutions. A few people emailed me with their ideas, and I wanted to highlight one in particular. I won’t mention who sent the solution because I’m going to explain why it wouldn’t work in this case, and why I generally think it’s a bad idea.

Just to recap, the problem was creating a paginated report that was built using a query that included an outer join. You can read the details at the link above. Anyway, a reader sent along the idea of adding a column to the table on the left side of the outer join for reporting purposes. It would include all of the values from the right side of the outer join so that they could be included in reports without requiring the outer join query.

Denormalization for reporting purposes isn’t an unusual approach, but I think it should be done only as a last resort. It wouldn’t work in my case because what I didn’t mention was that my query actually used three queries, all connected using outer joins. Here’s what it really looks like:

select i.name, c.code, m.message
from items i left join codes c on (i.id = c.item_id)
left join messages m on (i.user_id = m.user_id and c.id = m.code_id)

That makes the denormalization a bit more complicated, since I want both the codes and messages for each item. The other problem is that if either the codes for an item or a user’s messages change, the denormalized reporting field becomes out of date (unless you add code that automatically updates the denormalized field whenever you update any of the data it’s associated with). In some cases, you may want a snapshot of what the messages and codes were when the record was created, and in those cases denormalization is a pretty good idea. If you want to maintain the relationships, though, denormalization becomes a hazard.

The other problem is that denormalization locks your database into a particular use case. Adding reporting fields may make things easier for the time being, but when your reporting requirements change, you’re stuck with legacy fields that are no longer useful, and then you face the same problem — create new reporting fields or attack the problem in a more relational manner.

Denormalization is a tempting solution to many problems, but I recommend only using it if you have a performance problem that cannot be addressed in any other way.