It’s one a.m. and I’ve just deployed the latest version of your application to the production server. I tested it on my local computer, I saw it work on the development server. Now it’s broken and I’m trying to decide whether to try to quickly fix the problem or roll back, assuming I’ve left myself with an easy way to roll back. This has happened to me more times than I’d like to admit. Lately I’ve been trying to figure out ways to make the deployment process more predictably successful.
Here’s a brief catalog of common deployment issues that I’ve been burned by in the past. If you don’t use version control or you habitually edit your code on the production server while it’s deployed, there are whole classes of other problems you may see as well. I don’t do those things. (Or admit to it, anyway.)
The development and production databases are out of sync. This is probably number one on the list of reasons for botched deployments. You deploy your new code to the production server and find that you forgot to add that one little column to the database, which in turn causes everything to blow up. Fortunately this is often one of the easiest problems to fix on the fly. Rails migrations are one of the most elegant solutions to this problem, I’m still looking for equivalents for other platforms. I’ve also been trying to find an application that will compare two schemas and show the differences.
You deployed the wrong code. This is a problem that usually pertains to how you manage your code in your version control system. Depending on your release process, it can be easy to wind up with the wrong code sitting on the production server after deployment. I find this often happens when one step in the process involves merging code from a development branch to a release branch (or something similar). Often the code is not properly tested after the merge, and surprising things happen on deployment. This is where tools that aid in merging branches like svnmerge can be helpful. You can also just run a full diff of the production and development branches as a sanity check to make sure that you’re really deploying the code you intend to deploy.
Missing dependencies on production. This is a reason why I’m generally a fan on packaging required libraries with the application I’m deploying. For example, if I’m working on a Java application, I prefer to keep as many of the JAR files the application depends on in the WAR file for the application. If there’s a required library present on the development box that is missing on the production box, bad things happen. I’ve also seen more subtle problems arise when versions are slightly out of sync. For example, I’ve seen this happen with the MySQL JDBC driver at times. In order to configure your database resources using JNDI, the database driver needs to be available to Tomcat, so it lives in
$TOMCAT_HOME/common/lib rather than inside the application. That means that it’s possible for versions to differ from server to server, and some versions have bugs that others don’t. Beyond packaging libraries with the application being deployed, this problem is a tough one to keep under control.
Configuration files gone awry. It’s almost always the case that applications have some environment-specific configuration, even if it’s just to indicate which database the application needs to connect to. I mentioned in the previous item that I use JNDI for database configuration with Java applications. It enables me to refer to the same JNDI resource in my applications and keep the environment-specific stuff in the Tomcat configuration file. That exposes me to configuration-related risks. If you change your configuration requirements, but you haven’t updated the configuration on production to reflect those changes, you’ll run into problems at deployment. This problem is similar to having your database schemas out of sync. Another common problem is clobbering the production-specific configuration with the configuration from development. There are a number of ways to avoid this problem, but my favorite is to use whatever ignore facility your version control system provides judiciously. For example, for Rails applications, I never keep the
database.yml file in version control, and I put it on the ignore list so that it never gets added by mistake. Instead I create an example that every developer copies when they set up the application. My deployment script copies the production database configuration at deployment time. That prevents the production database configuration from being overwritten when I deploy new versions of my application.
Differences in scale. Let’s say you work on a content management system for a newspaper. On your local machine, you delete and regenerate your database all the time. It never has more than a day’s worth of stories in it. On the development server, the database has a few thousand articles in it. On the production server, it has every article the newspaper has published for the past ten years, plus all of the reader comments associated with those articles. You will almost certainly find that your application behaves very differently in production than it does in development. That missing index in the development database may not matter, but in production it’s going to make certain queries slower by a factor of twelve. When you discover that the new features that worked brilliantly until they got to the production server bring your site to their knees upon deployment, you almost always have to roll back. There are many cases where differences in scale change things radically. Are you running a single server for development and deploying to a cluster? Does heavy usage on the production server reveal lost update problems and race conditions that you didn’t encounter when testing in development? Does your application suddenly crash under load? There’s no silver bullet for this one. Figuring out how scaling up will affect your application before deployment is one of those hard (or expensive) problems to address.
Deployment problems basically take two forms. The first is the obvious problem. In other words, nothing works. These are my favorite kinds of problems, because usually you can roll back and regroup, or you can apply an immediate fix. The second is the subtle problem, which every software developer hates. There are many kinds of subtle problems, but the subtle deployment problem is the worst because usually you can’t duplicate the problem in other environments. If you can, it wasn’t a deployment problem. Identifying such problems as deployment problems and then figuring out what caused them isn’t fun, and once you’ve spotted the problem, you have to decide whether to duplicate the problem in another environment or engage in the dreaded “fix by trial and error” approach. Either way, your life is probably hell until it’s fixed.
The best tonic for deployment problems is better tools. In the Java world, WAR files were a great advancement in terms of providing for smoother deployments. They allow you to package up your application and distribute it cleanly, and generally deploying WAR files is very easy. Ruby on Rails is, I think, the most deployment-friendly framework out there today. (I say that without knowing how Django compares.) Capistrano is a wonderful tool for managing code deployment and it’s simple enough that it’s being applied to non-Rails applications. (Here’s an explanation of how it was applied to a Java application, and here’s a similar article on Capistrano and PHP.) I’ve already mentioned migrations, which are a Rails feature that I’d like to see available for non-Rails applications as well.
The other key tool in the world of deployment is your version control system. If you are still deploying directly from your main development branch, I would suggest that your release process probably needs some work. Your version control system is what enables you to work on new features that will go live in six months and bug fixes that will go live tomorrow simultaneously. It’s the tool that will enable you to roll back to a version of your application that definitely works when you find yourself in the weeds. Be sure to give it the love and attention it deserves.
That’s my primer on deployment problems. Always remember that an application that works on your workstation only works in theory.
You shouldn’t hate releasing code
You shouldn’t hate releasing software. Especially if your software is a Web site, not something that comes in an installer or disk image. And yet it’s really easy to get into a state where releases suck. I think that the thought of making releases suck less is the one of the main attractions of continuous deployment.
Here’s a partial list of reasons why releases can suck:
My theory is that releases don’t have to suck. The releases I’m involved with exhibit few of these problems, but my goal for 2010 is to eliminate the remaining problems to excise the remaining suckiness.
If you liked this, you may also like my post When deployments go wrong from a couple of years ago.