The first order of people who don’t get data analysis are those who believe it’s impossible to make accurate predictions based on data models. They’ve been much discussed all week in light of the controversy over Nate Silver’s predictions about the Presidential campaign. If you want to catch up on this topic, Jay Rosen has a useful roundup of links.
There are, however, a number of other mistaken ideas about how data analysis works that are also problematic. For example, professional blowhard Henry Blodget argues in favor of using data-driven approaches, but then says the following:
If Romney wins, however, Silver’s reputation will go “poof.” And that’s the way it should be.
I agree that if Silver’s model turns out to be a poor predictor of the actual results, his reputation will take a major hit; that’s inevitable. However, Blodget puts himself on the same side as the Italian court that sent six Italian scientists to jail for their inaccurate earthquake forecast.
If Silver’s model fails in 2012, he’ll revisit it and create a new model that better fits the newly available data. That’s what forecasting is. Models can be judged on the performance of a single forecast, but analysts should be judged on how effectively they adapt their models to account for new data.
Another post that I felt missed the point was Natalia Cecire arguing that attempting to predict the winner of the election by whatever means is a childish waste of time:
A Nieman Lab defense of Silver by Jonathan Stray celebrates that “FiveThirtyEight has set a new standard for horse race coverage” of elections. That this can be represented as an unqualified good speaks to the power of puerility in the present epistemological culture. But we oughtn’t consider better horse race coverage the ultimate aim of knowledge; somehow we have inadvertently landed ourselves back in the world of sports. An election is not, in the end, a game. Coverage should not be reducible to who will win? Here are some other questions: How will the next administration govern? How will the election affect my reproductive health? When will women see equal representation in Congress? How will the U.S. extricate itself from permanent war, or will it even try? These are questions with real ethical resonance. FiveThirtyEight knows better than to try to answer with statistics. But we should still ask them, and try to answer them too.
I, of course, agree with her that these are the important questions about the election. When people decide who to vote for, it should be based on these criteria, and the press should be focused on getting accurate and detailed answers to these questions from the candidates.
The fact remains, however, that much of the coverage is still focused on the horse race. Furthermore, much of the horse race coverage is focused on topics that do not seem to matter when it comes to predicting who will win the election. This is where data-driven analysis can potentially save us.
If it can be shown that silly gaffes don’t affect the ultimate result of the election, there may be some hope that the press will stop fixating on them. One of the greatest benefits of data analysis is that it creates the opportunity to end pointless speculation about things that can in fact be accurately measured, and more importantly, to measure more things. That creates the opportunity to focus on matters of greater importance or of less certainty.
Don’t change sshd’s port
From Arabesque, my favorite blog for Unix geeks. I always change the sshd port, so I’m delighted to read a sound argument against doing so.