A database design question

February 8, 2009 / Rafe / 5 Comments

I’m looking for people to give me some ammunition to win an argument with myself. I have a database table that stores denormalized data for presentation in reports. One of the columns will hold some messages that are taken from three different tables.

The three tables are parents, children, and child_messages, which contains custom messages associated with the keys stored in children. To include the children in a report that lists a number of parents, you have to do all sorts of stupid tricks. Denormalization is easier, especially because on this report there are a number of similar scenarios that all involve different tables, making things even more complex than I’ve already described. (I’m planning a blog post to describe this denormalization approach for later.)

The question is how to store the denormalized data. On the Web page, it’s presented in an unordered list. The data can also be downloaded in CSV format.

Some options include:

Storing the messages in an unordered list. The nice thing here is that in the default case you don’t have to do anything to transform the data once it’s fetched. Just print it. If you do have to parse it, HTML is not the most difficult thing to parse.
Storing the messages in YAML. It’s lightweight and there are already plenty of libraries to parse it. The downside is that the reports would always have to parse and transform it. Also, we’re not talking about an arbitrary data structure but rather a list of messages. So the flexibility of YAML doesn’t provide any value.
Storing the messages in line-feed delimited format. Lighter and simpler than including HTML in the field, but again, must be transformed in all cases.

I also considered some other HTML options, but ultimately decided against all of them because there’s no point in using HTML if it’s not going to be the HTML that’s actually displayed on the page.

Opinions?

Commentary

database questions software development

5 Comments

Eamon
February 8, 2009 at 2:45 pm

If you’re never ever going to query the denormalized data, I’d keep it in plaintext or CSV. The transformation from either format to an unordered list is simpler than the reverse, so I’d stick with that.

If there are cases where you might need to query the data, consider XML. MySQL 5.1 offers ExtractValue, which could come in handy in an emergency:

http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html
Jason Weathered
February 8, 2009 at 3:58 pm

If the data format is simple enough, I’d go with single character delimited list. Throw in a runtime check to ensure that your items do not contain the delimiter character for safety.

Easy to convert to an array and then a small loop to output as an unordered list (XHTML). It will also require less storage in your rows which if you’re worrying about needing to denormalize, quite possibly matters.
Allan Beatty
February 10, 2009 at 12:21 am

If you are ever going to publish the list in some other form in addition to HTML, then the delimited list is simplest. It just stores the data itself without making any assumptions about presentation.
Ziad WAKIM
February 11, 2009 at 9:09 am

It depends if the content itself needs to be htmlized, ie: do the text of the message requires to be filtered for special characters adn/or html entities ?

If you’re going to have to check for < (& lt); é (& eacute;) or ª (& #170;) – you might as well keep it as a text delimited list and turn it into an html list at runtime.
Dlamini
February 5, 2012 at 11:15 pm

Hi Caroline, I like it. You’ve ptenesred a very good example where 3NF is preferable over higher forms. One thing I always come back to is the original intent of normalization: To reduce the number of data anomalies that can occur during INSERT, UPDATE, and DELETE operations. Normally, if tables are up to 3NF, then INSERT would never be a problem due to the constraints set up by primary and foreign key relationships. But UPDATEs and DELETEs do create potential issues. So, if you are not going to update or delete a row from a table (or there is little likelihood), then it is safe to denormalize an otherwise 4th or 5th model “down” to 3rd.I’d love to hear more about 2-and-a-half normal form too. And I admit, I have never used CRUD analysis for the purposes of spotting opportunities to denormalize. That’s a great tip!!

rc3.org

Strong opinions, weakly held

A database design question

5 Comments

Leave a Reply Cancel reply

Recent Posts

Details

rc3.org

Strong opinions, weakly held

A database design question

Previous post

Next post

5 Comments

Leave a Reply Cancel reply

Recent Posts

Details