The first group of people who don’t get data analysis are those who believe it’s impossible to make accurate predictions based on data models. They’ve been much discussed all week in light of the controversy over Nate Silver’s predictions about the Presidential campaign. If you want to catch up on this topic, Jay Rosen has a useful roundup of links.
There are, however, a number of other mistaken ideas about how data analysis works that are also problematic. For example, professional blowhard Henry Blodget argues in favor of using data-driven approaches, but then says the following:
If Romney wins, however, Silver’s reputation will go “poof.” And that’s the way it should be.
I agree that if Silver’s model turns out to be a poor predictor of the actual results, his reputation will take a major hit; that’s inevitable. However, Blodget puts himself on the same side as the Italian court that sent six scientists to jail for their inaccurate earthquake forecast.
If Silver’s model fails in 2012, he’ll revisit it and create a new model that better fits the newly available data. That’s what forecasting is. Models can be judged on the performance of a single forecast, but analysts should be judged on how effectively they adapt their models to account for new data.
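As context for what judging a forecast’s performance even means for a probabilistic model like Silver’s: one standard measure (my example, not anything from Silver or Blodget) is the Brier score, which averages the squared error between predicted probabilities and what actually happened. Here’s a minimal sketch with made-up numbers, purely for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and binary
    outcomes (1 = the predicted event happened). Lower is better;
    always guessing 50% scores exactly 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical state-level win probabilities and results --
# invented numbers, not anyone's actual forecasts.
probs   = [0.92, 0.80, 0.65, 0.55, 0.30, 0.10]
results = [1,    1,    1,    0,    0,    0]
print(brier_score(probs, results))  # ~0.095, well below coin-flipping
```

The point of a score like this is that a probabilistic forecaster is evaluated across many calls, which is exactly why one missed call shouldn’t end a reputation.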
Another post that I felt missed the point was one from Natalia Cecire, arguing that attempting to predict the winner of the election by whatever means is a childish waste of time:
A Nieman Lab defense of Silver by Jonathan Stray celebrates that “FiveThirtyEight has set a new standard for horse race coverage” of elections. That this can be represented as an unqualified good speaks to the power of puerility in the present epistemological culture. But we oughtn’t consider better horse race coverage the ultimate aim of knowledge; somehow we have inadvertently landed ourselves back in the world of sports. An election is not, in the end, a game. Coverage should not be reducible to who will win? Here are some other questions: How will the next administration govern? How will the election affect my reproductive health? When will women see equal representation in Congress? How will the U.S. extricate itself from permanent war, or will it even try? These are questions with real ethical resonance. FiveThirtyEight knows better than to try to answer with statistics. But we should still ask them, and try to answer them too.
I, of course, agree with her that these are the important questions about the election. When people decide who to vote for, it should be based on these criteria, and the press should be focused on getting accurate and detailed answers to these questions from the candidates.
The fact remains, however, that much of the coverage is still focused on the horse race. Furthermore, much of that horse race coverage is focused on topics that do not seem to matter when it comes to predicting who will win the election. This is where data-driven analysis can potentially save us.
If it can be shown that silly gaffes don’t affect the ultimate result of the election, there may be some hope that the press will stop fixating on them. One of the greatest benefits of data analysis is that it creates the opportunity to end pointless speculation about things that can in fact be accurately measured, and more importantly, to measure more things. That frees us to focus on matters of greater importance or of less certainty.
Where have referrers gone?
This article in Business Insider is the first media mention I’ve seen discussing the disappearance of referrers on inbound traffic to Web sites. For people who work in analytics, especially on sites that make money by selling advertising, this is a really big deal. In many cases, analytics can be invasive from a privacy standpoint, but referrers generally don’t contain any information you’d just as soon not disclose. Hopefully this will spur a wider discussion of this change.
For what it’s worth, the article is wrong about why browsers strip referrers from traffic that originates on HTTPS sites. When you are viewing an encrypted page, browsers want to make sure that none of the encrypted information is sent over a non-encrypted link. So when you click on a link on an encrypted page that points to a non-encrypted page, the browser strips the referrer to avoid sending information that was encrypted over the non-encrypted connection. Referrers are not stripped when you click through from one encrypted page to another, even if they’re on different domains. Sites can potentially get referrers back by switching to HTTPS, but only if people link to the HTTPS URLs. So if I have a site that accepts both HTTP and HTTPS, and all of the links indexed by Google are HTTP links, the referrers will be stripped even if the user ultimately lands on a secure page. So in this case, it’s not really a choice on the part of browser vendors to protect user privacy, but rather one to respect the sanctity of encrypted information.
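To make that rule concrete, here’s a minimal sketch of the default decision browsers apply. The function name and example URLs are mine, purely for illustration:

```python
from urllib.parse import urlsplit

def referrer_sent(from_url, to_url):
    """Model the default browser rule described above: the Referer
    header is dropped only when navigating from an HTTPS page to a
    non-HTTPS page; HTTPS-to-HTTPS keeps it, even across domains."""
    from_scheme = urlsplit(from_url).scheme
    to_scheme = urlsplit(to_url).scheme
    return not (from_scheme == "https" and to_scheme != "https")

# HTTPS -> HTTP: referrer stripped
assert not referrer_sent("https://www.google.com/search?q=x", "http://example.com/")
# HTTPS -> HTTPS, different domains: still sent
assert referrer_sent("https://www.google.com/search?q=x", "https://example.com/")
# HTTP -> anywhere: sent
assert referrer_sent("http://example.com/a", "http://other.example/b")
```

Note that the decision turns only on the schemes of the two URLs, which is why a site that accepts both HTTP and HTTPS still loses referrers on any inbound HTTP links.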
Update: Also, apparently this discussion of traffic has been going on for a while.