When deployments go wrong

It’s one a.m. and I’ve just deployed the latest version of my application to the production server. I tested it on my local computer and saw it work on the development server. Now it’s broken and I’m trying to decide whether to quickly fix the problem or roll back, assuming I’ve left myself an easy way to roll back. This has happened to me more times than I’d like to admit. Lately I’ve been trying to figure out ways to make the deployment process more predictably successful.

Here’s a brief catalog of common deployment issues that I’ve been burned by in the past. If you don’t use version control or you habitually edit your code on the production server while it’s deployed, there are whole classes of other problems you may see as well. I don’t do those things. (Or admit to it, anyway.)

The development and production databases are out of sync. This is probably number one on the list of reasons for botched deployments. You deploy your new code to the production server and find that you forgot to add that one little column to the database, which in turn causes everything to blow up. Fortunately this is often one of the easiest problems to fix on the fly. Rails migrations are one of the most elegant solutions to this problem, and I’m still looking for equivalents for other platforms. I’ve also been trying to find an application that will compare two schemas and show the differences.
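To make the schema comparison idea concrete, here’s a minimal sketch in Python. It assumes both databases are SQLite files (an assumption made purely so the example runs with the standard library), and the function names `table_columns` and `schema_diff` are my own inventions. A real tool would also need to look at indexes, constraints, and defaults.

```python
import sqlite3


def table_columns(conn):
    """Return {table_name: {column_name: declared_type}} for a SQLite database."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = {row[1]: row[2] for row in rows}
    return schema


def schema_diff(dev, prod):
    """List what the development schema has that production lacks or types differently."""
    problems = []
    for table, columns in sorted(dev.items()):
        if table not in prod:
            problems.append(f"missing table: {table}")
            continue
        for column, col_type in sorted(columns.items()):
            if column not in prod[table]:
                problems.append(f"missing column: {table}.{column}")
            elif prod[table][column] != col_type:
                problems.append(f"type mismatch: {table}.{column}")
    return problems


if __name__ == "__main__":
    # Two in-memory databases stand in for development and production.
    dev = sqlite3.connect(":memory:")
    prod = sqlite3.connect(":memory:")
    dev.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, byline TEXT)")
    prod.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    for problem in schema_diff(table_columns(dev), table_columns(prod)):
        print(problem)
```

Running something like this before deployment, and refusing to deploy when the list is non-empty, would catch the forgotten column before the application does.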

You deployed the wrong code. This is a problem that usually pertains to how you manage your code in your version control system. Depending on your release process, it can be easy to wind up with the wrong code sitting on the production server after deployment. I find this often happens when one step in the process involves merging code from a development branch to a release branch (or something similar). Often the code is not properly tested after the merge, and surprising things happen on deployment. This is where tools that aid in merging branches like svnmerge can be helpful. You can also just run a full diff of the production and development branches as a sanity check to make sure that you’re really deploying the code you intend to deploy.
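The sanity check can be as crude as comparing two checked-out directory trees. Here’s a rough sketch using Python’s standard `filecmp` module. The function name `tree_differences` and the idea of comparing a release checkout against the deployed directory are my assumptions, not part of any particular release process.

```python
import filecmp
import os


def tree_differences(release_dir, deployed_dir, ignore=(".svn", ".git")):
    """Return a sorted list of (kind, relative_path) differences between two trees."""
    differences = []

    def walk(rel):
        left = os.path.join(release_dir, rel)
        right = os.path.join(deployed_dir, rel)
        listing = filecmp.dircmp(left, right, ignore=list(ignore))
        for name in listing.left_only:
            differences.append(("only in release", os.path.join(rel, name)))
        for name in listing.right_only:
            differences.append(("only in deployed", os.path.join(rel, name)))
        # Compare file contents (shallow=False), not just size and timestamp.
        _, mismatch, errors = filecmp.cmpfiles(
            left, right, listing.common_files, shallow=False
        )
        for name in mismatch + errors:
            differences.append(("differs", os.path.join(rel, name)))
        for name in listing.common_dirs:
            walk(os.path.join(rel, name))

    walk("")
    return sorted(differences)
```

An empty list means the two trees match file for file. Anything else is worth a look before you call the deployment done.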

Missing dependencies on production. This is a reason why I’m generally a fan of packaging required libraries with the application I’m deploying. For example, if I’m working on a Java application, I prefer to keep as many of the JAR files the application depends on in the WAR file for the application. If there’s a required library present on the development box that is missing on the production box, bad things happen. I’ve also seen more subtle problems arise when versions are slightly out of sync. For example, I’ve seen this happen with the MySQL JDBC driver at times. In order to configure your database resources using JNDI, the database driver needs to be available to Tomcat, so it lives in $TOMCAT_HOME/common/lib rather than inside the application. That means that it’s possible for versions to differ from server to server, and some versions have bugs that others don’t. Beyond packaging libraries with the application being deployed, this problem is a tough one to keep under control.
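One cheap way to spot this kind of drift is to compare directory listings of the shared library directory on each server. The sketch below assumes JAR files follow a `name-version.jar` naming pattern, which is common but far from universal. The helper names and the version numbers in the example are made up for illustration.

```python
import re

# Assumed naming pattern: everything before the last "-<digit...>.jar" is the name.
JAR_NAME = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.jar$")


def jar_versions(filenames):
    """Map library name to version string (None when no version can be parsed)."""
    versions = {}
    for filename in filenames:
        match = JAR_NAME.match(filename)
        if match:
            versions[match.group("name")] = match.group("version")
        elif filename.endswith(".jar"):
            versions[filename[: -len(".jar")]] = None
    return versions


def compare_manifests(dev, prod):
    """Report libraries that production lacks or has at a different version."""
    report = {"missing_on_prod": [], "version_mismatch": []}
    for name, version in sorted(dev.items()):
        if name not in prod:
            report["missing_on_prod"].append(name)
        elif prod[name] != version:
            report["version_mismatch"].append((name, version, prod[name]))
    return report


if __name__ == "__main__":
    # Hypothetical listings of the shared library directory on each server.
    dev = jar_versions(["mysql-connector-java-5.0.5.jar", "log4j-1.2.14.jar"])
    prod = jar_versions(["mysql-connector-java-5.0.4.jar", "log4j-1.2.14.jar"])
    print(compare_manifests(dev, prod))
```

It won’t tell you which version has the bug, but it will at least tell you that the servers disagree.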

Configuration files gone awry. It’s almost always the case that applications have some environment-specific configuration, even if it’s just to indicate which database the application needs to connect to. I mentioned in the previous item that I use JNDI for database configuration with Java applications. It enables me to refer to the same JNDI resource in my applications and keep the environment-specific stuff in the Tomcat configuration file. That exposes me to configuration-related risks. If you change your configuration requirements, but you haven’t updated the configuration on production to reflect those changes, you’ll run into problems at deployment. This problem is similar to having your database schemas out of sync. Another common problem is clobbering the production-specific configuration with the configuration from development. There are a number of ways to avoid this problem, but my favorite is to use whatever ignore facility your version control system provides judiciously. For example, for Rails applications, I never keep the database.yml file in version control, and I put it on the ignore list so that it never gets added by mistake. Instead I create an example that every developer copies when they set up the application. My deployment script copies the production database configuration at deployment time. That prevents the production database configuration from being overwritten when I deploy new versions of my application.
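If the example configuration file is treated as the list of settings the application requires, a deployment script can check that production actually defines all of them. Here’s a small sketch of that check. It works on already-parsed configuration (plain nested dictionaries) so that it doesn’t depend on any particular file format, and the names `flatten_keys` and `missing_settings` are hypothetical.

```python
def flatten_keys(config, prefix=""):
    """Return the set of dotted key paths in a nested dictionary."""
    keys = set()
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            keys |= flatten_keys(value, path + ".")
        else:
            keys.add(path)
    return keys


def missing_settings(example, production):
    """List settings present in the example config but absent from production."""
    return sorted(flatten_keys(example) - flatten_keys(production))


if __name__ == "__main__":
    # Made-up settings; in practice these would be parsed from the real files.
    example = {"database": {"host": "localhost", "pool": 5}, "cache_ttl": 60}
    production = {"database": {"host": "db.internal"}}
    for setting in missing_settings(example, production):
        print("production config is missing:", setting)
```

The check only compares key names, never values, so the production-specific values stay where they belong.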

Differences in scale. Let’s say you work on a content management system for a newspaper. On your local machine, you delete and regenerate your database all the time. It never has more than a day’s worth of stories in it. On the development server, the database has a few thousand articles in it. On the production server, it has every article the newspaper has published for the past ten years, plus all of the reader comments associated with those articles. You will almost certainly find that your application behaves very differently in production than it does in development. That missing index in the development database may not matter, but in production it’s going to make certain queries slower by a factor of twelve. When you discover that the new features that worked brilliantly until they got to the production server bring your site to its knees upon deployment, you almost always have to roll back. There are many cases where differences in scale change things radically. Are you running a single server for development and deploying to a cluster? Does heavy usage on the production server reveal lost update problems and race conditions that you didn’t encounter when testing in development? Does your application suddenly crash under load? There’s no silver bullet for this one. Figuring out how scaling up will affect your application before deployment is one of those hard (or expensive) problems to address.
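The missing index, at least, can sometimes be caught without production-sized data by asking the database how it plans to run a query. The sketch below uses SQLite’s `EXPLAIN QUERY PLAN` only because it ships with Python. Other databases have their own `EXPLAIN` variants with different output, and the table, index, and function names here are invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from SQLite's query planner."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


def full_table_scans(plan):
    """Pick out the plan steps that read a whole table instead of using an index."""
    return [step for step in plan if step.startswith("SCAN")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
    sql = "SELECT title FROM articles WHERE author = ?"

    print("without index:", query_plan(conn, sql, ("smith",)))
    conn.execute("CREATE INDEX idx_articles_author ON articles (author)")
    print("with index:   ", query_plan(conn, sql, ("smith",)))
```

A full scan of a table with a dozen rows is harmless. The same plan against ten years of articles is the kind of thing that forces a rollback.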

Deployment problems basically take two forms. The first is the obvious problem. In other words, nothing works. These are my favorite kinds of problems, because usually you can roll back and regroup, or you can apply an immediate fix. The second is the subtle problem, which every software developer hates. There are many kinds of subtle problems, but the subtle deployment problem is the worst because usually you can’t duplicate the problem in other environments. If you can, it wasn’t a deployment problem. Identifying such problems as deployment problems and then figuring out what caused them isn’t fun, and once you’ve spotted the problem, you have to decide whether to duplicate the problem in another environment or engage in the dreaded “fix by trial and error” approach. Either way, your life is probably hell until it’s fixed.

The best tonic for deployment problems is better tools. In the Java world, WAR files were a great advancement in terms of providing for smoother deployments. They allow you to package up your application and distribute it cleanly, and generally deploying WAR files is very easy. Ruby on Rails is, I think, the most deployment-friendly framework out there today. (I say that without knowing how Django compares.) Capistrano is a wonderful tool for managing code deployment and it’s simple enough that it’s being applied to non-Rails applications. (Here’s an explanation of how it was applied to a Java application, and here’s a similar article on Capistrano and PHP.) I’ve already mentioned migrations, which are a Rails feature that I’d like to see available for non-Rails applications as well.

The other key tool in the world of deployment is your version control system. If you are still deploying directly from your main development branch, I would suggest that your release process probably needs some work. Your version control system is what enables you to work on new features that will go live in six months and bug fixes that will go live tomorrow simultaneously. It’s the tool that will enable you to roll back to a version of your application that definitely works when you find yourself in the weeds. Be sure to give it the love and attention it deserves.
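On the server side, rolling back is easiest when every deployed version lives in its own directory and a symbolic link points at the live one. Here’s a sketch of that layout, assuming a `releases/` directory of sortable release names and a `current` link. Both names, and the functions themselves, are assumptions for the sake of the example, and it presumes a Unix-like filesystem.

```python
import os


def activate(deploy_root, release_name):
    """Point the 'current' symlink at releases/<release_name>."""
    target = os.path.join("releases", release_name)
    if not os.path.isdir(os.path.join(deploy_root, target)):
        raise FileNotFoundError(f"no such release: {release_name}")
    tmp_link = os.path.join(deploy_root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Create the new link under a temporary name, then rename it over the old
    # one so there is never a moment when 'current' does not exist.
    os.symlink(target, tmp_link)
    os.replace(tmp_link, os.path.join(deploy_root, "current"))


def rollback(deploy_root):
    """Switch 'current' to the release just before the active one and return its name."""
    releases = sorted(os.listdir(os.path.join(deploy_root, "releases")))
    active = os.path.basename(os.readlink(os.path.join(deploy_root, "current")))
    index = releases.index(active)
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    activate(deploy_root, releases[index - 1])
    return releases[index - 1]
```

This only rolls back code. Database changes that went out with the bad release still have to be undone separately.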

That’s my primer on deployment problems. Always remember that an application that works on your workstation only works in theory.

5 Comments

bryan
April 4, 2007 at 6:16 pm

…which is why one would use a test environment (dev is not test) prior to deployment.

The test environment’s server is as close to production as humanly possible. For the data store, we usually do a snapshot of production.
The deploy script goes:
1. Developer gets code working on her box, then checks in to subversion and tags a pre-release.
2. The developer then deploys to test and makes sure there aren’t any obvious errors. If all goes well, she notifies QA.
3. QA runs their testing.
4. If the code gets a green light, then QA runs the roll-to-production script which copies the pre-release tag to a release tag and then deploys it to the server.

Jeff
April 4, 2007 at 10:10 pm

For database changes… Red Gate’s SQL Compare and SQL Data Compare are life savers! They aren’t cheap, but they aren’t very expensive either. If there is a better product out there, I’d like to know about it. I suppose since it doesn’t have any integration with a source code control system, it isn’t perfect.

http://redgate.com/

Asd
May 15, 2007 at 9:19 am

Here is my primer on deployment problems: get a QA department you nitwit.

one more
May 15, 2007 at 1:28 pm

I doubt this will happen to many others, but a few weeks ago I had a production deployment fail which I’ve performed at least 50 times, so it was quite surprising. Turns out, I had called my hosting company’s tech support a few weeks prior to the deployment, and their solution to my problem (unbeknownst to me) was to delete the production database user account. My application didn’t notice until it was stopped and started. Needless to say I kicked that hosting provider to the curb the very next week.

Marc
May 24, 2007 at 4:41 am

From http://www.netfocusconsulting.com/buildsystem.jsp I stumbled onto http://schemacrawler.sourceforge.net/ and remembered this blog entry where you asked for a tool to compare database schemas.

Here you go 🙂

Cheers Marc

rc3.org

Strong opinions, weakly held
