It’s one a.m. and I’ve just deployed the latest version of your application to the production server. I tested it on my local machine and saw it work on the development server. Now it’s broken, and I’m trying to decide whether to attempt a quick fix or roll back, assuming I’ve left myself an easy way to roll back. This has happened to me more times than I’d like to admit. Lately I’ve been trying to figure out ways to make the deployment process more predictably successful.
Here’s a brief catalog of common deployment issues that I’ve been burned by in the past. If you don’t use version control or you habitually edit your code on the production server while it’s deployed, there are whole classes of other problems you may see as well. I don’t do those things. (Or admit to it, anyway.)
The development and production databases are out of sync. This is probably number one on the list of reasons for botched deployments. You deploy your new code to the production server and find that you forgot to add that one little column to the database, which in turn causes everything to blow up. Fortunately, this is often one of the easiest problems to fix on the fly. Rails migrations are one of the most elegant solutions to this problem; I’m still looking for equivalents for other platforms. I’ve also been trying to find an application that will compare two schemas and show the differences.
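For anyone who hasn’t used them, a migration is just a small Ruby class that applies a schema change and knows how to reverse it. Here’s a minimal sketch that adds the hypothetical missing column from the scenario above (the table and column names are made up for illustration):

    # Hypothetical migration: adds the "one little column" that production
    # would otherwise be missing. Running rake db:migrate in each environment
    # keeps the schemas in step.
    class AddPublishedAtToStories < ActiveRecord::Migration
      def self.up
        add_column :stories, :published_at, :datetime
      end

      def self.down
        remove_column :stories, :published_at
      end
    end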
You deployed the wrong code. This is a problem that usually pertains to how you manage your code in your version control system. Depending on your release process, it can be easy to wind up with the wrong code sitting on the production server after deployment. I find this often happens when one step in the process involves merging code from a development branch to a release branch (or something similar). Often the code is not properly tested after the merge, and surprising things happen on deployment. This is where tools that aid in merging branches, like svnmerge, can be helpful. You can also just run a full diff of the production and development branches as a sanity check to make sure that you’re really deploying the code you intend to deploy.
Missing dependencies on production. This is one reason why I’m generally a fan of packaging required libraries with the application I’m deploying. For example, if I’m working on a Java application, I prefer to keep as many of the JAR files the application depends on as possible in the application’s WAR file. If a required library is present on the development box but missing on the production box, bad things happen. I’ve also seen more subtle problems arise when versions are slightly out of sync. For example, I’ve seen this happen with the MySQL JDBC driver at times. In order to configure your database resources using JNDI, the database driver needs to be available to Tomcat, so it lives in $TOMCAT_HOME/common/lib rather than inside the application. That means it’s possible for versions to differ from server to server, and some versions have bugs that others don’t. Beyond packaging libraries with the application being deployed, this problem is a tough one to keep under control.
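For reference, the JNDI wiring I’m talking about is just a Resource entry in Tomcat’s configuration. The resource name, URL, and credentials below are placeholders rather than anything from a real application:

    <!-- Sketch of a Tomcat context.xml datasource entry. Because Tomcat itself
         loads com.mysql.jdbc.Driver, the MySQL JAR has to live in
         $TOMCAT_HOME/common/lib, outside the deployed WAR. -->
    <Resource name="jdbc/myapp"
              auth="Container"
              type="javax.sql.DataSource"
              driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/myapp"
              username="appuser"
              password="secret"
              maxActive="20"
              maxIdle="5"/>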
Configuration files gone awry. It’s almost always the case that applications have some environment-specific configuration, even if it’s just to indicate which database the application needs to connect to. I mentioned in the previous item that I use JNDI for database configuration with Java applications. It enables me to refer to the same JNDI resource in my applications and keep the environment-specific stuff in the Tomcat configuration file. That exposes me to configuration-related risks: if you change your configuration requirements but haven’t updated the configuration on production to reflect those changes, you’ll run into problems at deployment. This problem is similar to having your database schemas out of sync. Another common problem is clobbering the production-specific configuration with the configuration from development. There are a number of ways to avoid this, but my favorite is to use whatever ignore facility your version control system provides judiciously. For example, for Rails applications, I never keep the database.yml file in version control, and I put it on the ignore list so that it never gets added by mistake. Instead I create an example file that every developer copies when they set up the application. My deployment script copies the production database configuration into place at deployment time, which prevents it from being overwritten when I deploy new versions of my application.
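With Capistrano (more on that below), the copy step can be a short task along these lines. The task name and paths here are a sketch, not my actual deploy script:

    # Sketch of a Capistrano 2-style task: after the code is updated on the
    # server, copy the production database.yml (kept outside version control
    # in the shared directory) into the new release.
    after "deploy:update_code", "deploy:copy_database_config"

    namespace :deploy do
      task :copy_database_config, :roles => :app do
        run "cp #{shared_path}/config/database.yml #{release_path}/config/database.yml"
      end
    end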
Differences in scale. Let’s say you work on a content management system for a newspaper. On your local machine, you delete and regenerate your database all the time. It never has more than a day’s worth of stories in it. On the development server, the database has a few thousand articles in it. On the production server, it has every article the newspaper has published for the past ten years, plus all of the reader comments associated with those articles. You will almost certainly find that your application behaves very differently in production than it does in development. That missing index in the development database may not matter, but in production it’s going to make certain queries slower by a factor of twelve. When you discover that the new features that worked brilliantly everywhere else bring your site to its knees upon deployment, you almost always have to roll back. There are many cases where differences in scale change things radically. Are you running a single server for development and deploying to a cluster? Does heavy usage on the production server reveal lost update problems and race conditions that you didn’t encounter when testing in development? Does your application suddenly crash under load? There’s no silver bullet for this one. Figuring out how scaling up will affect your application before deployment is one of those hard (or expensive) problems to address.
Deployment problems basically take two forms. The first is the obvious problem. In other words, nothing works. These are my favorite kinds of problems, because usually you can roll back and regroup, or you can apply an immediate fix. The second is the subtle problem, which every software developer hates. There are many kinds of subtle problems, but the subtle deployment problem is the worst because usually you can’t duplicate the problem in other environments. If you can, it wasn’t a deployment problem. Identifying such problems as deployment problems and then figuring out what caused them isn’t fun, and once you’ve spotted the problem, you have to decide whether to duplicate the problem in another environment or engage in the dreaded “fix by trial and error” approach. Either way, your life is probably hell until it’s fixed.
The best tonic for deployment problems is better tools. In the Java world, WAR files were a great advancement in terms of providing for smoother deployments. They allow you to package up your application and distribute it cleanly, and generally deploying WAR files is very easy. Ruby on Rails is, I think, the most deployment-friendly framework out there today. (I say that without knowing how Django compares.) Capistrano is a wonderful tool for managing code deployment and it’s simple enough that it’s being applied to non-Rails applications. (Here’s an explanation of how it was applied to a Java application, and here’s a similar article on Capistrano and PHP.) I’ve already mentioned migrations, which are a Rails feature that I’d like to see available for non-Rails applications as well.
The other key tool in the world of deployment is your version control system. If you are still deploying directly from your main development branch, I would suggest that your release process probably needs some work. Your version control system is what enables you to work on new features that will go live in six months and bug fixes that will go live tomorrow simultaneously. It’s the tool that will enable you to roll back to a version of your application that definitely works when you find yourself in the weeds. Be sure to give it the love and attention it deserves.
That’s my primer on deployment problems. Always remember that an application that works on your workstation only works in theory.
Setting Apache MaxClients for small servers
Every once in a while, the server this blog runs on chews up all of its RAM and swap space and becomes unresponsive, forcing a hard reboot. The problem is always the same: too many Apache workers running at the same time. It happened this morning, when there were well over 50 Apache workers running, each consuming about 15 megs of RAM. The server (a virtual machine provided by Linode) has 512 megs of RAM, so Apache was consuming all of the VM’s memory on its own.
At first I decided to attack the problem through monitoring. I had Monit running on the VM but it wasn’t actually monitoring anything. I figured that I’d just have it monitor Apache and restart it whenever it starts consuming too many resources. I did set that up, but I wondered how Apache was able to get itself into such a state in the first place.
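For what it’s worth, the Monit rule for this is only a few lines. The pidfile path and the 400-megabyte threshold below are illustrative rather than exactly what I ended up with:

    # Sketch of a Monit check: restart Apache if its total memory use
    # (parent plus workers) stays too high for several monitoring cycles.
    check process apache2 with pidfile /var/run/apache2.pid
      start program = "/etc/init.d/apache2 start"
      stop program  = "/etc/init.d/apache2 stop"
      if totalmem > 400 MB for 3 cycles then restart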
The problem was that Apache was configured very poorly for my VM. Because I’m running PHP apps with the PHP module, I’m running Apache using the prefork module. For more information on Apache’s Multi-Processing Modules, check out the docs. Basically, prefork doesn’t use threads, so you don’t have to make sure your applications and libraries are thread-safe.
Anyway, here are the default settings for Apache in Ubuntu when it comes to resource limits (the relevant prefork directives from the stock apache2.conf):
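    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        MaxClients          150
        MaxRequestsPerChild   0
    </IfModule>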
In prefork mode, Apache can handle one incoming request per process. So in this case, when Apache starts, it launches five worker processes. It also tries to keep five spare servers idle for incoming demand, and if it ends up with more than ten idle servers, it starts shutting down processes until the number of idle servers drops back to ten. Finally, MaxClients is the hard limit on the number of workers Apache is allowed to start. So on my little VM, Apache feels free to start up to 150 workers, at 15 megs of RAM apiece, using up to 2.25 gigabytes of RAM, which is more than enough to consume all of the machine’s RAM and swap space. That number is far, far, far too high for my machine. I had dealt with this once before, but when I migrated from Slicehost to Linode some time ago, I forgot to manually change the Apache settings. I wound up setting my machine to a relatively conservative MaxClients setting of 8. I’m still tweaking the other settings, but for a server that’s dedicated to Web hosting, you may as well set the StartServers setting to the same value as the MaxClients setting so that Apache never has to bother spinning up new server processes to meet increasing demand.

Currently my configuration looks roughly like this (MaxClients and StartServers are the settings I’m sure about; the rest I’m still tweaking):
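    <IfModule mpm_prefork_module>
        # Start all eight workers up front so Apache never needs to fork
        # new ones to meet demand; 8 workers at ~15 MB each fits comfortably
        # in 512 MB of RAM.
        StartServers          8
        MaxClients            8
        # Other prefork directives left at their defaults for now.
    </IfModule>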
The only danger with this low setting is that if there are more than 8 simultaneous incoming requests, the additional requests will wait until a worker becomes available, which could make the site really slow for users. Right now I only have about 60 megs of free RAM, though, so to increase capacity I’d need to either get a larger VM, move my static resources to S3, or set up a reverse proxy like Varnish and serve static resources that way.