As you may or may not know, the NCAA Tournament starts this weekend, and people around you are probably doing their best to predict the winners of all of the games in the tournament. Most years I join one (or more) pools, but I don’t watch enough college basketball to make educated guesses about who will win each game. This year, I wrote a program to do it for me.
Pomeroy explains what his ratings mean: “My ratings produce a number that actually means something–it’s the chance of beating an average D-I team on a neutral floor. For instance, Michigan’s current rating of .8006 means that the Wolverines would win 8 out of 10 games against the average D-I team. Every March, I borrow Bill James’ log5 formula to take these ratings and compute probabilities for each team to win its conference tournament.”
I’m not sure how the log5 formula got its name, but the idea behind it is fairly intuitive. Think of a coin with one side labeled “win” and the other labeled “loss,” where the chance of the coin landing on “win” is the team’s rating. Each team flips its own coin. Log5 is the probability that a team’s coin lands on “win” and its opponent’s coin lands on “loss,” given that the two coins land on different sides. (If they land on the same side, you re-flip.)
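Written out, the coin argument gives the log5 formula: if team A has rating a and team B has rating b, A wins with probability a(1 - b) / (a(1 - b) + (1 - a)b). Here is a minimal Ruby sketch of it. The method name `log5` is my own choice for illustration, not necessarily what the bracketologist script uses.

```ruby
# Log5: probability that a team rated a beats a team rated b, where each
# rating is the chance of beating an average D-I team on a neutral floor.
# Ratings are assumed to lie strictly between 0 and 1.
def log5(a, b)
  win_loss = a * (1 - b)  # A's coin lands on "win", B's lands on "loss"
  loss_win = (1 - a) * b  # A's coin lands on "loss", B's lands on "win"
  # Same-side outcomes get re-flipped, so condition on the coins disagreeing.
  win_loss / (win_loss + loss_win)
end

puts log5(0.8006, 0.5).round(4) # => 0.8006
puts log5(0.6, 0.6).round(2)    # => 0.5
```

Against an average team (rating .500), log5 returns the team’s own rating, which matches Pomeroy’s definition: Michigan’s .8006 stays .8006. Two evenly rated teams come out at 50/50 no matter how good they both are.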
My script reads the page source of Pomeroy’s ratings page, pulls out every team and its rating, and then picks a random winner for each game, weighted by the log5 probability computed from the two teams’ ratings. Here’s the output from a hypothetical Final Four the script generated:
FINAL FOUR
Maryland has a 45% chance of beating Syracuse
Winner: Syracuse
Kentucky has a 56% chance of beating Baylor
Winner: Kentucky

NATIONAL CHAMPION
Syracuse has a 51% chance of beating Kentucky
Winner: Kentucky
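The game-picking step can be sketched the same way. This is a hypothetical reconstruction, not code from the bracketologist repository: the team names and ratings are made up and hard-coded rather than read from Pomeroy’s page, and the method names `play` and `play_round` are my own. Each game draws a uniform random number and gives the win to the first team when that number falls below its log5 probability; the printed lines mimic the output format above.

```ruby
# Hypothetical sketch of a log5-weighted bracket simulation; not the
# bracketologist source. Teams and ratings are invented for illustration.

# Probability that a team rated a beats a team rated b.
def log5(a, b)
  win_loss = a * (1 - b)
  loss_win = (1 - a) * b
  win_loss / (win_loss + loss_win)
end

# Pick a random winner of one game, weighted by log5, and print the result.
def play(team_a, team_b, ratings, rng)
  prob = log5(ratings[team_a], ratings[team_b])
  puts "#{team_a} has a #{(prob * 100).round}% chance of beating #{team_b}"
  winner = rng.rand < prob ? team_a : team_b # rand is uniform in [0, 1)
  puts "Winner: #{winner}"
  winner
end

# Pair adjacent teams in bracket order and return the winners.
# Assumes an even number of teams in the round.
def play_round(teams, ratings, rng)
  teams.each_slice(2).map { |a, b| play(a, b, ratings, rng) }
end

# Hypothetical four-team bracket, listed in bracket order.
ratings = {
  "Team A" => 0.92, "Team B" => 0.88,
  "Team C" => 0.90, "Team D" => 0.85
}
rng = Random.new(2012) # fixed seed so runs are repeatable

round = ratings.keys
round = play_round(round, ratings, rng) until round.size == 1
puts "Champion: #{round.first}"
```

Seeding the generator makes a run reproducible; using an unseeded `Random.new` instead produces a different bracket each time.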
The script has no way of knowing which teams will actually upset higher-seeded opponents; it can only give the teams Pomeroy’s system likes a better chance of winning. Over many games, though, it should pick roughly the number of upsets the odds predict.
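To make “roughly the number of upsets the odds predict” concrete, here is a hypothetical Monte Carlo check using made-up matchups (not real teams or Pomeroy’s numbers). The expected number of upsets per bracket is the sum of each underdog’s log5 win probability; averaging the upset count over many simulated brackets should land close to that sum, even though any single bracket can have more or fewer.

```ruby
# Hypothetical check: the matchups and ratings below are made up.

# Probability that a team rated a beats a team rated b.
def log5(a, b)
  win_loss = a * (1 - b)
  loss_win = (1 - a) * b
  win_loss / (win_loss + loss_win)
end

# Each pair is [favorite_rating, underdog_rating] for one game.
MATCHUPS = [[0.95, 0.55], [0.90, 0.70], [0.85, 0.75], [0.80, 0.78]].freeze

# Expected upsets per bracket: sum of the underdogs' log5 win probabilities.
expected = MATCHUPS.sum { |fav, dog| log5(dog, fav) }

# Simulate many brackets and average the number of upsets in each.
rng = Random.new(42)
trials = 100_000
total = 0
trials.times do
  total += MATCHUPS.count { |fav, dog| rng.rand < log5(dog, fav) }
end
average = total.to_f / trials

puts format("expected upsets per bracket: %.3f", expected)
puts format("simulated average:           %.3f", average)
```

The two numbers agree on average, but no single simulated bracket will match reality game by game; the simulation reproduces the model’s rate of upsets, not the specific ones.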
I’ve uploaded the script to a repository named bracketologist on GitHub if you want to play with it. It’s written in Ruby.