As you may or may not know, the NCAA Tournament starts this weekend, and people around you are probably doing their best to predict the winners of all of the games in the tournament. Most years I join one (or more) pools, but I don’t watch enough college basketball to make educated guesses about who will win each game. This year, I wrote a program to do it for me.
I started with Ken Pomeroy’s team ratings. Here’s his explanation for how the system works:
My ratings produce a number that actually means something–it’s the chance of beating an average D-I team on a neutral floor. For instance, Michigan’s current rating of .8006 means that the Wolverines would win 8 out of 10 games against the average D-I team. Every March, I borrow Bill James’ log5 formula to take these ratings and compute probabilities for each team to win its conference tournament.
I’m not sure how the log5 formula got its name, but it’s fairly intuitive. Think of a coin with one side labeled “win” and the other side labeled “loss.” The chance of the coin landing on “win” is the team’s rating. Log5 is derived from the probability that a team’s coin will land on win and its opponent’s coin will land on loss. (If they land on the same side, you re-flip.)
My script takes the source from Pomeroy’s ratings page and reads in all of the teams and their ratings, and then picks a random winner for each game based on the log5 comparison of Pomeroy’s ratings. Here’s the output for a hypothetical Final Four the script generated:
FINAL FOUR
Maryland has a 45% chance of beating Syracuse
Winner: Syracuse
Kentucky has a 56% chance of beating Baylor
Winner: Kentucky
NATIONAL CHAMPION
Syracuse has a 51% chance of beating Kentucky
Winner: Kentucky
The script has no way of knowing which teams will upset higher seeded opponents, other than by giving teams that Pomeroy’s system likes a better chance of winning, but it should pick roughly the right number of upsets based on the odds.
I’ve uploaded the script to a repository named bracketologist on GitHub if you want to play with it. It’s written in Ruby.
Why people are returning to Java
I am a huge fan of Ruby on Rails, but I was not terribly surprised to read that as Twitter’s code base has grown, they’ve found it more amenable to move to JVM-based languages for reasons having mostly to do with encapsulation. InfoQ interviews Twitter engineer Evan Weaver about how the company’s stack is evolving as they get bigger.
On one hand, they’re moving more services to the JVM for performance reasons. When they extract components from their main code base to optimize them, they generally migrate them to the JVM:
They’re also finding the rigidity of the JVM useful for productivity reasons:
Those are the kinds of reasons why I could not imagine rewriting the main application I work on as a Ruby on Rails application. I’m surprised he doesn’t also mention the maturity and power of the tools on the Java side. Text editors like TextMate and Vim are great, but when it comes to navigating through a large, complex code base, you cannot beat the state of the art Java IDEs like Eclipse and IDEA.
Update: Edd Dumbill lists seven reasons you should use Java again.