rc3.org

Strong opinions, weakly held

Tag: statistics

John Myles White explains multi-armed bandit testing

If you’re interested in A/B testing on the Web, you should check out John Myles White’s talk at Tumblr on multi-armed bandit testing. You can learn a lot about standard A/B testing from the explanation he gives as a contrast to how multi-armed bandit tests work. I’ve read a lot of blog posts on multi-armed bandit tests, and this lecture is better than any of them in terms of explaining how this sort of testing actually works.
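For a rough feel of the contrast, here is a minimal sketch of one common bandit strategy, epsilon-greedy. It is not taken from the talk, which may cover other algorithms. The function name, the conversion rates passed in, and the default parameters are all hypothetical. Where a standard A/B test splits traffic evenly for the whole experiment, this approach sends most visitors to whichever variant currently looks best and keeps a small share for exploration.

```python
import random

def epsilon_greedy(true_rates, epsilon=0.1, trials=10_000, seed=0):
    """Simulate an epsilon-greedy bandit over page variants.

    true_rates: hypothetical conversion rate of each variant; the
    algorithm never reads these directly, it only sees outcomes.
    Returns (pulls, wins): times each variant was shown and converted.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    pulls = [0] * n
    wins = [0] * n
    for _ in range(trials):
        # Explore at random with probability epsilon, or while some
        # variant has never been shown (avoids dividing by zero below).
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n)
        else:
            # Exploit: show the variant with the best observed rate so far.
            arm = max(range(n), key=lambda i: wins[i] / pulls[i])
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls, wins

pulls, wins = epsilon_greedy([0.05, 0.10])
print(pulls, wins)
```

Setting epsilon to 1.0 makes every choice random, which is roughly what a plain A/B test does with its traffic.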

Why journalists should learn to program

Financial journalist Roland Legrand argues that journalists should learn to program. He says:

All of this takes time, and maybe you’ll never find enough of it to get good at all this stuff.

Still, we must try. The good news is that it doesn’t matter if you become proficient at the latest language. What is important, however, is that you’re able to comprehend the underpinnings of programming and interactivity — to be able to look at the world with a coder’s point of view.

I can’t help but wonder, though, if most journalists would benefit more from learning how to crunch numbers, compile statistics, and derive meaning from them than they would from learning HTML and CSS. They should be reading this book, not this book.

Running the numbers on Toyota safety

Robert Wright breaks down the numbers on Toyota’s safety record given all of the recent reports of uncontrolled acceleration:

My back-of-the-envelope calculations (explained in a footnote below) suggest that if you drive one of the Toyotas recalled for acceleration problems and don’t bother to comply with the recall, your chances of being involved in a fatal accident over the next two years because of the unfixed problem are a bit worse than one in a million — 2.8 in a million, to be more exact. Meanwhile, your chances of being killed in a car accident during the next two years just by virtue of being an American are one in 5,244.

This is the article I’ve been looking for since the mass hysteria about acceleration problems began. It strikes me as beyond doubt that a Toyota purchased today is significantly safer than most of the cars I’ve driven over the course of my life. I used to own a 1977 Ford pickup truck that caught fire under the hood more than once.
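To put the two figures from Wright’s quote on the same scale, here is a quick bit of arithmetic. It uses only the numbers quoted above; the function name is mine. Keep in mind that the two risks are not defined identically (being involved in a fatal accident versus being killed in one), so the ratio is only a rough guide.

```python
def compare_risks():
    """Put both quoted two-year risks on a per-million scale."""
    recall_risk = 2.8 / 1_000_000   # quoted risk from the unfixed defect
    baseline_risk = 1 / 5_244       # quoted risk of dying in any car accident
    baseline_per_million = baseline_risk * 1_000_000
    ratio = baseline_risk / recall_risk
    return baseline_per_million, ratio

per_million, ratio = compare_risks()
print(f"Baseline risk: {per_million:.1f} in a million")  # 190.7 in a million
print(f"Baseline is about {ratio:.0f} times the recall risk")  # about 68 times
```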

Update: It’s also worth mentioning that if the accelerator sticks on your car, there are several ways to stop.

Your 2009-2010 NBA Preview

The Wages of Wins blog has some bad news for most NBA fans:

Here is an interesting factoid about the NBA Finals. Since 1978 (the first year we can calculate Wins Produced) no team has won an NBA title without one regular player (minimum 41 games played, 24.0 minutes per game) posting at least a 0.200 WP48 [Wins Produced per 48 minutes]. Only one team – the 1978-79 Seattle Super Sonics [led by Gus Williams with a 0.208 WP48] – managed to win a title without a regular player crossing the 0.250 threshold. And only four other champions didn’t have at least one player surpass the 0.300 mark. This tells us – and hopefully this is not a surprise – that to be an elite team you must have at least one elite player.

Okay, now let’s connect this factoid to the draft. Since 1995, no player who posted a below average college PAWS40 [Position Adjusted Win Score per 40 minutes] his last year in college managed to post a career WP48 above the 0.200 mark (after five seasons, minimum 5,000 minutes played). So although college numbers are not a crystal ball (and really, college numbers are not perfect predictors of what a player will do in the NBA), it does seem like players who don’t play relatively well in college are not likely to become superstars in the NBA.

In short, if your favorite team doesn’t already have a truly great player, they’re highly unlikely to win a championship. And the odds are that they won’t find the great player they need in the draft.

This article also makes an important point about synthetic stats. PAWS40 is a stat that the Wages of Wins people made up. Its value is solely in its correlation with more tangible measures of success. Many people who are suspicious of quantitative analysis hate stats like these, but the proof is in the pudding. When you have a derived statistic that correlates this closely with something useful to measure (like championships or wins), that statistic carries more value than any of the more organic stats, like rebounds per game or shooting percentage.

New York magazine interviews Nate Silver

New York magazine has an interview with Nate Silver of fivethirtyeight.com, this year’s go-to polling analysis site.

I’ve been reading Nate’s baseball analysis for years and was thrilled to see that he was applying his analytic approach to political polling this year. The results have not been disappointing. This paragraph describes my general reaction when I found out who was running fivethirtyeight.com:

Silver’s site now gets about 600,000 visits daily. And as more and more people started wondering who he was, in May, Silver decided to unmask himself. To most people, the fact that Poblano turned out to be a guy named Nate Silver meant nothing. But to anyone who follows baseball seriously, this was like finding out that a guy anonymously running a high-fashion Website turned out to be Howard Cosell. At his day job, Silver works for Baseball Prospectus, a loosely organized think tank that, in the last ten years, has revolutionized the interpretation of baseball stats. Furthermore, Silver himself invented a system called PECOTA, an algorithm for predicting future performance by baseball players and teams. (It stands for “player empirical comparison and optimization test algorithm,” but is named, with a wink, after the mediocre Kansas City Royals infielder Bill Pecota.) Baseball Prospectus has a reputation in sports-media circles for being unfailingly rigorous, occasionally arrogant, and almost always correct.

There are two things that I find interesting about this. The first is that I’ve been reading quantitative analysis of sports for years and wondering how the lessons drawn from that analysis can be applied to other fields. Silver’s work is illustrating just how applicable those lessons are, and I wasn’t surprised to read that he is being invited to speak before business audiences on his work.

The second is that it shows yet again how the secret to being a successful blogger is producing excellent content. Anyone who’s thinking about starting a blog should look at the success Silver has had. He started the year with a diary on the Daily Kos and now he runs a political blog that gets millions of views. Do great things, and the audience will be there.

The lesson for long-time bloggers with small (but wonderful) audiences is self-evident, sadly.

Sports statistics analysts take over the world

Somehow, Nate Silver’s political Web site escaped my notice until today. Silver is using the same techniques he and others used in building improved baseball statistics to analyze the performance of pollsters in the 2008 elections, and to aggregate multiple polls into an accurate prediction of voter behavior.

The site provides a lot of interesting numbers, including the odds of various scenarios occurring, like “Obama wins all Kerry states” and “McCain loses OH/MI, wins election.” The site also provides return on investment rankings for the states, and the individual chance of the candidates winning each state.
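Scenario odds like these can be estimated by simulation. The sketch below is not Silver’s model, whose details aren’t described here. It assumes each state is decided independently, which a real forecast would not, and the function name, state labels, probabilities, and vote counts are all made up for illustration.

```python
import random

def simulate_win_probability(states, votes_needed, runs=10_000, seed=0):
    """Estimate how often a candidate reaches votes_needed.

    states: maps a state label to (win probability, electoral votes).
    Each state is drawn independently, a simplifying assumption.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(runs):
        # Add up the electoral votes of the states won in this run.
        total = sum(ev for p, ev in states.values() if rng.random() < p)
        if total >= votes_needed:
            wins += 1
    return wins / runs

# Hypothetical three-state example, not real polling data.
toy_states = {"A": (0.9, 10), "B": (0.5, 20), "C": (0.2, 15)}
print(simulate_win_probability(toy_states, votes_needed=23))
```

The same loop can count any scenario, such as a candidate losing two particular states and still winning overall, by changing the condition checked in each run.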

The reason this post has the subject it does, though, is that it’s fun to watch sports analysis go mainstream. Sports analysis is a perfect training ground for statistical analysis because of the discrete raw statistics that can be used, and the fact that predictions can very easily be compared to actual results.

Most sports analysis comes down to a simple question: “Which things help teams win?” So if I’m a football analyst, I may argue that average time of possession better predicts winning than average margin of victory. I can then process the historical data for as many seasons of football as I like and test that argument. It doesn’t matter how beautiful my theory is; the data will quickly show whether I’m right or wrong.
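One simple way to run that kind of test is to check how strongly each candidate stat correlates with wins. Here is a minimal sketch using the Pearson correlation. The helper names and the tiny dataset are invented for illustration, and a serious analysis would use real season data and more careful methods.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def better_predictor(wins, candidates):
    """Return the candidate stat whose correlation with wins is strongest."""
    return max(candidates, key=lambda name: abs(pearson(candidates[name], wins)))

# Made-up numbers for four hypothetical team seasons.
wins = [2, 4, 6, 8]
candidates = {
    "time_of_possession": [4, 1, 3, 2],
    "margin_of_victory": [1, 2, 3, 4],
}
print(better_predictor(wins, candidates))  # margin_of_victory
```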

It’s not surprising to me to see people who have cut their teeth in the world of sports analysis start applying their methods to other areas. The numbers may be different, but the discipline is the same. Silver is doing with polling numbers and election results what he did before with batting averages and baseball games.

If nothing else, it makes me feel like all of the time I’ve spent reading about quantitative analysis of sports hasn’t been a total waste.

If you’re into this sort of analysis, there’s also the Princeton Election Consortium, which posted a mild critique of Silver’s methodology. And for a more naive analysis that just looks at the latest poll result for each state, see electoral-vote.com.

Links for April 13

Links for April 3rd

Links for March 24

  • Emily Yoffe: Forget Juno. Out-of-wedlock births are a national catastrophe. Seemingly fact-based defense of marriage. I don’t have strong opinions on this either way, but it certainly seems like marriage is to be encouraged for people who would be parents. The number that stands out to me is that only 4% of mothers who are college graduates are unwed.
  • 10 Zen Monkeys: Can America Handle a Little Truth? Great essay on the Jeremiah Wright controversy.
  • FP Passport: McCain’s wars. John McCain’s transformation into a neocon on foreign policy issues.
  • New York Times review of Nicholson Baker’s pacifist argument against World War II, Human Smoke.

Links from March 13th

© 2024 rc3.org
