Last week at work we were having a discussion about reporting, and I shared one of my principles when it comes to data collection, which is that logging is nearly always preferable to counting.
Let’s say I’m required to produce reports that show how often every entry on a weblog has been viewed. There are two obvious options for doing so. I could create a table called entry_views that has a column for the entry ID and a date field to record when each view occurred. If I had some identifying information for the viewers, I could store that in the table as well.
Another option is to just add a column called times_viewed to the entries table. Every time someone views the article, you just increment the field. Some people would argue that if all you care about is the count, this is all you should do: a table built just for logging views is going to get huge, your reports will be slow, and you’re recording lots of extra information.
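To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names entries, times_viewed, and entry_views come from the description above; the viewed_at column name, the sample entry, and the timestamps are assumptions made up for illustration, not a real schema.

```python
import sqlite3

# In-memory database; the schema is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        times_viewed INTEGER NOT NULL DEFAULT 0  -- counting approach
    );
    CREATE TABLE entry_views (                   -- logging approach
        entry_id INTEGER NOT NULL REFERENCES entries(id),
        viewed_at TEXT NOT NULL                  -- hypothetical column name
    );
""")
conn.execute("INSERT INTO entries (id, title) VALUES (1, 'Sample entry')")


def record_view_by_counting(entry_id):
    """Counting: bump a number and keep nothing else."""
    conn.execute(
        "UPDATE entries SET times_viewed = times_viewed + 1 WHERE id = ?",
        (entry_id,),
    )


def record_view_by_logging(entry_id, viewed_at):
    """Logging: keep one row per view."""
    conn.execute(
        "INSERT INTO entry_views (entry_id, viewed_at) VALUES (?, ?)",
        (entry_id, viewed_at),
    )


# Made-up sample views, recorded both ways for comparison.
for ts in ("2024-01-01 09:00:00", "2024-01-01 17:30:00", "2024-01-02 08:15:00"):
    record_view_by_counting(1)
    record_view_by_logging(1, ts)

# The counter can only answer "how many in total?"
counted = conn.execute(
    "SELECT times_viewed FROM entries WHERE id = 1"
).fetchone()[0]

# The log answers the same question...
logged = conn.execute(
    "SELECT COUNT(*) FROM entry_views WHERE entry_id = 1"
).fetchone()[0]

# ...and also questions nobody asked for up front, such as views per day.
views_per_day = conn.execute(
    "SELECT date(viewed_at), COUNT(*) FROM entry_views "
    "WHERE entry_id = 1 GROUP BY date(viewed_at) ORDER BY 1"
).fetchall()
```

Both approaches agree on the total, but only the log can be regrouped by day, by hour, or by viewer if you choose to store one.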
There are two things that you lose if you count rather than log. The first is that you lose all of the information that’s stored in the log. Take a look at Amazon.com sometime and you’ll see tons of features built on what someone once probably thought was extra information. You can see a list of the products you’ve viewed recently, products that were purchased by people who bought the product you’re looking at, and even products people purchased instead of the one you’re looking at. Keeping detailed logs is what makes all of those features possible. Collect the data now, and figure out what to do with it later.
The second thing you lose is an audit trail. A few years ago, I worked on a rewrite of a licensing system for a software company. When a customer purchased a number of licenses, a record was added to the database. When the customer came and generated those licenses, a column in that table was incremented. When the license count was equal to the number of licenses purchased, the system refused to generate any more licenses. The problem was that there was no audit trail. If a customer called and asked who had used up their licenses or when they were redeemed, the company couldn’t tell them.
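As a rough sketch of how that licensing system could have kept an audit trail, the example below logs one row per generated license and derives the limit check from the log instead of from an incremented column. Every name here (purchases, license_redemptions, redeem_license, the customer and email addresses) is hypothetical; this is not the schema of the system I worked on.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        licenses_purchased INTEGER NOT NULL
    );
    CREATE TABLE license_redemptions (
        purchase_id INTEGER NOT NULL REFERENCES purchases(id),
        redeemed_by TEXT NOT NULL,
        redeemed_at TEXT NOT NULL
    );
""")
conn.execute(
    "INSERT INTO purchases (id, customer, licenses_purchased) "
    "VALUES (1, 'Example Corp', 2)"
)


def redeem_license(purchase_id, redeemed_by, redeemed_at):
    """Log a redemption, or refuse once every purchased license is used."""
    purchased = conn.execute(
        "SELECT licenses_purchased FROM purchases WHERE id = ?",
        (purchase_id,),
    ).fetchone()[0]
    used = conn.execute(
        "SELECT COUNT(*) FROM license_redemptions WHERE purchase_id = ?",
        (purchase_id,),
    ).fetchone()[0]
    if used >= purchased:
        return False  # same refusal as before, now computed from the log
    conn.execute(
        "INSERT INTO license_redemptions (purchase_id, redeemed_by, redeemed_at) "
        "VALUES (?, ?, ?)",
        (purchase_id, redeemed_by, redeemed_at),
    )
    return True


# Made-up redemption attempts: the third exceeds the two purchased licenses.
results = [
    redeem_license(1, "alice@example.com", "2024-03-01 10:00:00"),
    redeem_license(1, "bob@example.com", "2024-03-04 14:20:00"),
    redeem_license(1, "carol@example.com", "2024-03-05 09:45:00"),
]

# When the customer calls, the questions "who?" and "when?" have answers.
audit_trail = conn.execute(
    "SELECT redeemed_by, redeemed_at FROM license_redemptions "
    "WHERE purchase_id = 1 ORDER BY redeemed_at"
).fetchall()
```

The system still refuses the third request, but the support team can now read back exactly who redeemed each license and when.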
Obviously, the person who failed to record the individual licenses made a really terrible decision since revenue and customer satisfaction were on the line, but the larger point is that hoarding the data is, to me, always a better decision than not recording something there’s a chance you may need later. Data storage is cheap. The only issue is how to handle reporting performance when you’ve logged tons of information. I generally file that in the category of problems that are nice to have.