Eric Ries posts about continuous deployment at O’Reilly Radar. This is the second post on this approach I’ve seen lately, and I find it interesting to think about, even if I’m not ready to adopt it.
Point five in his five-step plan for migrating to continuous deployment is root cause analysis, or “the five whys”. He has a whole post just on that topic that’s definitely worth reading. The case for using root cause analysis is self-evident; the rest of the continuous deployment process is more interesting to discuss.
The basic idea is simple: deploy changes to the application as rapidly as possible, where “as possible” means “without breaking things”. This is a technique that clearly works for big sites. What I’d really like to do is spend a day or a week watching this process unfold, because it leads me to all sorts of questions.
My first question is, how does quality assurance play into this process? The process Eric describes involves lots of automated testing, but most shops have testers as well. Do they test on the production system as changes go live?
The second question is, what kind of version control strategy is being used to back up this process? I’d expect that each developer has their own branch, and that they merge from the production branch to their branch constantly, and only merge back to the production branch when they have a change they are ready to publish. That way developers can commit code frequently without worrying about it being deployed by mistake.
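To make that guess concrete, here is a toy Python sketch of the branching strategy I’m imagining. It is not how Eric’s team actually works, and it is not real version control: the Repo class, the branch names, and the commit labels are all hypothetical, and “merge” here just copies over missing commits in order. It only illustrates that work in progress stays off the production branch until the developer merges back.

```python
class Repo:
    """Toy model (an assumption for illustration): a branch is a list of commit labels."""

    def __init__(self):
        self.branches = {"production": []}

    def create_branch(self, name):
        # A developer branch starts as a copy of production.
        self.branches[name] = list(self.branches["production"])

    def commit(self, branch, label):
        self.branches[branch].append(label)

    def merge(self, source, target):
        # Append commits from source that target lacks, preserving order.
        existing = set(self.branches[target])
        for label in self.branches[source]:
            if label not in existing:
                self.branches[target].append(label)
                existing.add(label)


def deployed(repo):
    # In this sketch, only the production branch is ever deployed.
    return list(repo.branches["production"])


repo = Repo()
repo.create_branch("alice")
repo.create_branch("bob")

repo.commit("alice", "alice: half-finished feature")
repo.commit("bob", "bob: bug fix")
repo.merge("bob", "production")      # Bob publishes a finished change
repo.merge("production", "alice")    # Alice picks up what is live
repo.commit("alice", "alice: feature complete")

before_publish = deployed(repo)      # Alice's work is not live yet
repo.merge("alice", "production")    # Alice publishes when she is ready
after_publish = deployed(repo)
```

Alice can commit as often as she likes, and nothing of hers reaches production until that last merge.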
My third question is whether this process is more suited to some kinds of applications than others. Let’s take World of Warcraft for example: they have a big, heavy release process that involves lots of internal testing and lots of customer testing before important releases go out the door. Even with exhaustive testing, after every release there’s a round of tweaking to deal with all of the unintended consequences of their changes. A change to one constant in World of Warcraft (say, the amount of attack power a warrior gets per point of strength) has effects that ripple through the entire game. It also seems like they’d have to group changes to see how those changes interact with one another without inflicting them on players first.
They could try continuous deployment to the Public Test Realm, but everything that happens there is reported to and dissected by the player community. It’s risky.
I’d love to read a whole lot more about this kind of process and how it scales. It’s pretty much how everybody works on small applications that they maintain by themselves, but I’m very curious to hear more about how it works for big teams.
GM mortgaged the future
FiveThirtyEight.com has a simple explanation of why GM is dying: