The long journey toward production
3

The long journey toward production

Last week one of the data analysts at work asked me to help him out with a script he was writing. The script generates a CSV file and uploads it to an FTP server. He had one file containing a sequence of SQL queries, and another shell script that executes that script and then uploads the results via FTP. I thought it would be fun to write up what it took to convert those bits of code into something that meets the definition of production service in our environment.

The first, most obvious problem, was that the script was running on the analyst’s development VM, not on a production server. Relatedly, it was running from his personal crontab. The only traces that this production service even existed were in his personal space. That seemed wrong. It also queries tables in his personal schema and had the credentials for his database account hard coded in the script.

Fortunately we already have a production cron server that’s hooked into our deployment system along with a version controlled directory for the scripts to schedule cron jobs.

Relatedly, we are mostly a PHP shop, and we write cron jobs as command line PHP scripts. This may not be to your tastes (or mine), but it’s what we do. So the script needed to be ported to PHP. It also needed to extend our standard base class for crons. This provides basic features like logging to our centralized log management system, as well as conveniences like locking to prevent overlapping runs of the script and the ability to accept an email address to which to send alerts.

To get all of this working I had to rewrite the script in PHP, implementing the functionality to generate the CSV file and then send it via FTP. The SQL queries required to collect the data creates a couple of temporary tables and then runs a query against those tables. My first thought was that I would just run the queries natively from PHP through our database library, but the temporary tables only last the duration of a database session and there’s no guarantee that the queries will be run within the context of a single session, so the tables were disappearing before I could retrieve data from them.

Instead I had to put all of the queries into one variable and then run them through the command line client for the database using PHP’s proc_open function, piping the contents of the variable to the external process. I also switched things up to use the appropriate database credentials, which required the analyst to update the permissions for that table. Ideally, we’ll eventually change things up so that the data is stored in a production schema.

At that point, I had a script that would work but it didn’t have any error handling and it wasn’t a subclass of the base cron script we use. Adapting it to use the base cron script was pretty straightforward. Error handling for these types of scripts is a bit more complex. I opted to do one check to see whether the CSV file was created successfully, and then to catch any errors that occurred with FTP and alert. Fortunately, the base cron script makes it easy to send email when failures occur, so I didn’t have to write that part.

Finally, I just had to pick a time for the script to run, add the crontab entry, and then push the script through our deployment system. Or at least that was the idea. For whatever reason, the script works when I run it manually but it does not appear to be running through cron, so I’m running it manually every day for now. I also realized that if the script runs before the big data job that generates the data for it finishes, or that job fails for any reason, then the output of the script will be wrong. That means I need another layer of error handling to detect problems with the big data job and send an alert rather than uploading invalid data.

Why write this up? It’s to point out that for most projects, getting something to work is just a small, small part of building a production service. Exporting a CSV file from a database query and uploading it to an FTP server takes just a few minutes. Converting that into a service that runs within the standard infrastructure, and handles failure conditions smoothly takes hours.

There are a few takeaways here. The first is that anything we can do to make it easier to build production services is almost certainly worth the investment. Having a proper cron base script was really helpful. I’m creating a superclass of that base class that’s designed just for these specific kinds of jobs to make this work easier next time.

The second is an acknowledgement on the part of everyone involved in a project that getting something working is just the beginning, not the end of the project. The work of making a service production-ready isn’t fun or glamorous, but it’s what separates the hacker from the software engineer. Managers need to account for the time it takes to get something ready for production when allocating resources. And everybody needs to be smart about figuring out the level of reliability any service needs. If a service is going to run indefinitely, you need to be certain that it will work when the person who wrote it goes on vacation.

The third is that at any company, people are building services like this all the time outside the production context. You usually find out things went wrong with them at the worst possible time.

Don’t get stuck
17

Don’t get stuck

At Etsy, our engineering is well known for practicing continuous deployment. For all of the talk in the industry about continuous deployment, I don’t think that its impact on personal productivity is fully understood.

If you don’t work in a shop that does continuous deployment, you may assume that the core of it is that releases are not really planned. Code is pushed as soon as it’s finished, not according to some schedule, and that’s true, but there’s a much deeper truth at work. The real secret to the success of continuous deployment is that code is pushed before it’s done.

When people are practicing continuous deployment the Etsy way, they start out by adding a config flag to the code base behind which they can hide all of the code for their feature. As soon as the flag has been added, they add some conditional code to the application that creates a space that they can fill with the code for their new feature. At that point, they should be pushing code as frequently as is practical.

This is the core principle of continuous deployment. It doesn’t matter if the feature doesn’t work at all or there’s nothing really to show, you should be pushing code in small, digestible chunks. Ideally, you’ve written tests that are then part of the continuous integration suite, and you’re having people review that code before it goes out. So even though you don’t have a working feature, you’re confident the code you’re producing is robust because other people have looked at it, and it’s being tested every time anyone deploys or runs the suite of automated tests. You’re also reducing the chances of having to spend hours working through a painful merge scenario.

Many engineers are not prepared to work this way. There’s a strong urge to hold onto your code until you’ve made significant progress. On many teams, working on a feature for a week or two to build something real before you push it is completely normal. At Etsy, we see that as a risky thing to do. At the end of those two weeks you’re pushing a sizable chunk of code that has never been tested and has never run on the production servers out into the world. That chunk of code very well may be too big for another engineer to review carefully in a reasonable amount of time. It should have been broken up.

Pushing code frequently is the main factor that mitigates the risk of abandoning the traditional software release cycle. If you deploy continuously but the developers all treat the project like they’re developing in a more traditional fashion, it won’t work.

That’s the systems-based argument for pushing code at a rate that tends to make people uncomfortable, but what I want to talk about is how taking this approach improves personal productivity. I’m convinced that one thing that separates great developers from good developers is that great developers don’t allow themselves to get stuck. And if they do get stuck, they get stuck on design issues, and not on problem solving or debugging.

Thinking in terms of what code you can write that you can push immediately is one way to help keep from getting stuck. In fact, a mental exercise I use frequently when I’m blocked on solving a problem is to try to come up with the smallest thing I can do that represents progress. Focusing on deploying code frequently helps me stay in that mindset.

John Carmack’s beautiful code
0

John Carmack’s beautiful code

One of my coworkers sent this Kotaku article by game developer Shawn McGrath to our internal code readers mailing list. It’s a review of the Doom III source code, describing the beautiful coding style of Id Software founder John Carmack, one of the programmers who I most respect.

It’s a thoughtful article about thoughtfully written code, and really shows the value of reading code for the practicing programmer.

One of my favorite things about John Carmack is that he has always been more of a craftsman than a theoretician when it comes to developing software. To get an idea of what I mean, take a look at his blog post on functional programming in C++, which he linked to in his comment on the Kotaku article. Carmack’s post is a far better introduction to the benefits of functional programming than the one I linked to the other day.

Getting started with functional programming
0

Getting started with functional programming

I fully intend to write a post talking about stuff I learned in 2012, but in truth, I’ll probably never get around to it. The blog suffered last year because I was so busy stuffing new stuff into my head that I didn’t have the energy to write much of it down.

One of the big things I learned was that while I’ve programmed in a lot of languages, they all came from the same family and I used them all in the same way. That left a huge gaping hole in my experience called “functional programming.” There’s another hole involving functions as first class objects that I’m trying to fill up as well.

If you know nothing about functional programming, Uncle Bob has a short and useful introduction that’s worth reading. If you want to master the concepts, I recommend The Little Schemer.

I still don’t do much functional programming in my data to day life beyond the occasional bit of Scala hacking, but I find that functional concepts make it really easy to break down certain kinds of problems regardless of which language I’m using. For example, it’s really easy to write a binary search implementation using a functional approach.

In the larger scheme of things, I was able to get away with ignoring functional programming for a long time, but I don’t think that’s possible any more. Not only are functional languages picking up steam, but functional techniques are everywhere these days if you know where to look for them.

The era of the REPL
2

The era of the REPL

Coming from a background in higher-level languages like Ruby, Scheme, or Haskell, learning C can be challenging. In addition to having to wrestle with C’s lower-level features like manual memory management and pointers, you have to make do without a REPL. Once you get used to exploratory programming in a REPL, having to deal with the write-compile-run loop is a bit of a bummer.

From Alan O’Donnell’s article, Learning C with GDB. What I take away from this is that I’m not using REPLs nearly enough, especially when I’m learning new things.

Learning a language by porting from it
0

Learning a language by porting from it

I’ve been experimenting with a new approach to learning a programming language — porting code from a language I don’t know to one that I already do. In this case, I’m porting Scala code to Java.

The traditional method of learning a language is to find a project and start writing code in the new language, and I’ve used it many times in the past. In this case, I’m trying to get up to speed on an existing code base written in Scala so that I can help maintain and extend it. Part of the work is learning how to write Scala, and part of it is becoming familiar with the structure and style of this code base.

I’m finding it works really well. You can’t successfully port code to another language unless your really understand how it works, and you have to learn the language to get there. In the worst case, if I port everything and I still don’t know Scala, we can just use the ported code.

There is value in adapting to the world
2

There is value in adapting to the world

When I was younger, I was obsessed with the following maxim from George Bernard Shaw’s Maxims for Revolutionists:

The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.

That quote sprung to mind when I read this post by Sam Stephenson where he explains why he doesn’t invest too much effort in his dot files. Here’s the key bit:

What I discovered is that in many cases, my ability to adapt to a foreign environment without frustration is more important than the benefits of configuring a local environment to suit my whims. And that being able to quickly recreate my environment from scratch is an asset.

I fall more into Stephenson’s camp than I do into Shaw’s at this point, at least when it comes to my own tools. As software developers, we are trying to build things that adapt the world to ourselves, but with experience, I find it more and more important not to bind myself too tightly to any particular tool or process.