rc3.org

Strong opinions, weakly held

Month: January 2013

The state of blog comments in 2013

It seems like the recent trend has been toward publishing blogs without public comment sections. For one thing, people are sick of spam, Internet cranks, and the work required to have a comments section that adds to the site rather than detracting from it. As an example, here’s Matt Gemmell explaining why he turned off comments. At the same time, many writers are using static publishing engines or sites like Tumblr that don’t even support comments. (I know you can use third-party commenting tools like Disqus, but most people don’t.)

Here’s a chart that shows the comments per post broken down by month going back to the first month that I added comment support to the blog. As you can see, the number of comments per post has dropped, even though I’m publishing fewer posts. That said, the variability has been pretty high, and the months with high averages are a result of extreme outliers — blog posts that got tons of traffic and comments. A real statistician would also share the standard deviation for each data point.

[Chart: comments per post, broken down by month]

So I think that not only has publishing a blog that includes comments gone out of fashion, but commenting has as well. It’s easier to link to a post from Twitter or Google Plus and just discuss it in those venues, or respond to the author of a post on Twitter rather than in the comment section. I understand the impulse, but these venues are especially ephemeral as compared to blogs or even blog comments.

I used to be pretty reluctant to comment on blog posts on other blogs, but these days, if I have something to say and I’m not going to write a post of my own about it, I generally post a comment, especially if the publisher moderates the comments effectively. I do this because I see it as a way to signal my respect for the writer. I also think it’s more social.

Those of us with blogs have also always had the option of responding to posts on other blogs with posts on our own blogs. The upside is that you hopefully deliver some readers to the blog you’re linking to. The downside is that unless the writer links back to you, your commentary is probably lost on anyone who later finds the post you’re responding to.

Maybe it’s the Twitter effect, but I find that I crave more discussion when I post something. Looking at traffic numbers is far less interesting than having a productive or enlightening discussion. For that reason, I hate that commenting appears to be on the wane.

Why’s SQL injection so prevalent?

Why are SQL injection vulnerabilities so prevalent? Because most of the PHP/MySQL documentation uses examples with SQL injection vulnerabilities and no discussion of the potential risks.
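
To make that concrete, here’s a minimal illustration of the pattern (in Python with SQLite rather than PHP/MySQL, purely for brevity; the names are made up). The first query builds SQL by pasting in user input, which is the style those documentation examples teach; the second passes the input as a bound parameter.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

name = "anything' OR '1'='1"  # attacker-controlled input

# Vulnerable: the input becomes part of the SQL, so the OR '1'='1'
# clause is parsed as a condition and matches every row.
query = "SELECT email FROM users WHERE name = '%s'" % name
conn.execute(query)

# Safe: the driver binds the value as data; it is never parsed as SQL.
conn.execute("SELECT email FROM users WHERE name = ?", (name,))

The safe form is no harder to write; the introductory examples just rarely show it.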

John Carmack’s beautiful code

One of my coworkers sent this Kotaku article by game developer Shawn McGrath to our internal code readers mailing list. It’s a review of the Doom III source code, describing the beautiful coding style of Id Software founder John Carmack, one of the programmers I most respect.

It’s a thoughtful article about thoughtfully written code, and really shows the value of reading code for the practicing programmer.

One of my favorite things about John Carmack is that he has always been more of a craftsman than a theoretician when it comes to developing software. To get an idea of what I mean, take a look at his blog post on functional programming in C++, which he linked to in his comment on the Kotaku article. Carmack’s post is a far better introduction to the benefits of functional programming than the one I linked to the other day.

Aaron Swartz’s Wikipedia analytics

One less remarked-upon contribution Aaron Swartz made as an engineer was his 2006 post Who Writes Wikipedia? Having heard from Jimmy Wales that around 500 editors made over 50% of the edits to Wikipedia, Aaron performed his own analysis and found that while a core group of editors made most of the edits, a much larger group of people did most of the actual writing.

In the days before it was common for every site to perform quantitative analysis of user activity and publish the results, Aaron embarked on what was for its day a Big Data project, and published a surprising and interesting result that vastly changed people’s understanding of Wikipedia. This is the sort of work everyone in analytics aspires to do.

The world will miss Aaron Swartz

Add Aaron Swartz to the list of people I wish I’d met before it was too late. Cory Doctorow has written a great remembrance. When I was younger, I was obsessed with this famous George Bernard Shaw quote:

The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.

I never really lived it, though. Aaron Swartz did. The world never treats those people especially well.

Big Data and analytics link roundup

Here are a few things that have caught my eye lately from the world of Big Data and analytics.

Back in September, I explained why Web developers should care about analytics. This week I noticed a job opening for a Web developer at Grist that includes knowledge of analytics in the list of requirements. That doesn’t exactly make for a trend, but I expect to see a lot more of this going forward.

Also worth noting are the two data-related job openings at Rent the Runway. They have an opening for a data engineer and one for a data scientist. These two jobs are frequently conflated, and there is some overlap in the skill sets, but they’re not the same thing. For the most part what I do is data engineering, not data science.

If you do want to get started in data science, you could do worse than to read Hilary Mason’s short guide. Seth Brown has posted an excellent guide to basic data exploration in the Unix shell. I do this kind of stuff all the time.

Here are a couple of contrary takes on Big Data. In the New York Times, Steve Lohr has a trend piece on Big Data titled Sure, Big Data Is Great. But So Is Intuition. Maybe it’s different on Wall Street, but I don’t see too many people divorcing Big Data from intuition. Usually intuition leads us to ask a question, and then we try to answer that question using quantitative analysis. That’s Big Data to me. For a more technical take on the same subject, see Data-driven science is a failure of imagination from Petr Keil.

On a lighter note, Sean Taylor writes about the Statistics Software Signal.

Vim and ctags

I’ve been living in Vim for the past year or so, and doing so has yielded great rewards in terms of productivity and one decent post on the grammar of Vim. I’ve tried to be miserly with Vim plugins. There are tons of great plugins for Vim, but I feel like it’s more important to master the editor’s built-in features before I start adding more. For example, I’m still not a proficient user of registers, and I still miss opportunities to perform operations using the inside and around grammar.

Ctags is a really powerful external tool that I don’t feel violates my self-imposed plugin moratorium. It creates an index of your code so that you can navigate within and between files based on identifiers in the code. If you’re accustomed to using an IDE like Eclipse, this is functionality you come to take for granted. In the Vim world, you need to use Ctags to get it. I’ve gotten by all year (and for the past couple of decades of vi usage) without bothering with Ctags, but searching for a method definition with Ack for the billionth time finally pushed me to get serious about setting up Ctags for my projects.

Getting started with Ctags is easy. Once you’ve installed the Ctags package, just go to the directory with your source code and run Ctags like this:

ctags *.rb

This produces tags for all of the Ruby files in the current directory, helpfully storing them in a file named tags. It’s an ASCII file; you can view it if you want to. Once you’ve done so, you can open Vim (or another editor that supports Ctags) and use the tags to navigate around. You can use :tag to jump to a specific tag, or Control-] to jump to whatever tag is under the cursor. Once you’ve jumped to a tag, you can jump back to the previous location with Control-T. There’s a Vim tip that explains the tag-related commands.

The static nature of tag files means that some customization of your environment is necessary to get things working smoothly. As soon as you start changing your files or you pull in updates from version control, the tags are out of date, so you need a way to keep them up to date automatically. You also need to set things up so that Vim can find the tags for files in other directories.

Let’s talk about the second challenge first, because you need to figure out how you’re going to solve it before you can solve the first problem. You can generate tags recursively using the -R flag with ctags. To generate the tags for all of the files in the current directory and its subdirectories, you can run:

ctags -R .

This may seem like a good idea, but there are some issues that aren’t worth going into in this blog post.

Another option is to put a tags file in each directory. There are two reasons to prefer this approach. The first is that having tags in each directory makes it easier to keep your tags up to date automatically; I’ll discuss that shortly. The second relates to how Vim locates the tags in other directories. You can configure the locations Vim searches for tag files using the tags setting. By default, it looks like this:

tags=./tags,./TAGS,tags,TAGS

Vim keeps track of two “current” directories, which may be the same. The first is the current directory of the shell used to open Vim. You can find out what it is by using the :pwd command. The second is the directory of the file in the current buffer. You can show the full path to the current buffer with the command :echo expand('%:p'). In this default configuration, the ./tags entries tell Vim to look for a tags file in the directory of the file in the current buffer, and the plain tags entries tell it to look for one relative to the working directory.

This works fine for simple projects where all of the files are in the same directory. In many cases, though, projects span a nested directory structure, and you may want to look up a tag that’s defined in a source file in a different directory from the one you’re editing. Here’s my tags setting:

set tags=./tags,tags;

I got rid of the capitalized tag file names because I don’t roll that way. I also added a trailing semicolon to the setting, which tells Vim to keep searching upward through parent directories for tag files. The tag search path can be broadened as much as you like, but that’s sufficient for me. The only catch is that I have to open Vim from my project’s home directory if I want to be able to search the tags for the whole project.

This scheme works perfectly with the “one tags file per directory” approach. Now the trick is to generate all of the tag files and keep them up to date. There are a number of approaches you can take, helpfully listed in the Exuberant Ctags FAQ. I’m using strategy #3, because it’s the one the author of Ctags recommends and I don’t know anything he doesn’t.

I have a script called projtags that generates tag files in every directory under the current directory. When I want to use ctags for a project, I switch to the project directory and run this script. You can find it in this Gist.
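
The Gist has the real thing, but the idea is roughly this (a sketch of the approach, not the actual script; the *.rb glob and the .git exclusion are just placeholders):

#!/bin/sh
# Rough sketch of a projtags-style script: build a tags file in every
# directory under the current one. Adjust the glob for your languages.
find . -type d ! -path '*/.git*' | while read -r dir; do
  (cd "$dir" && ctags *.rb 2>/dev/null)
done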

To update the tags for a file when I save it in Vim, I use an autocommand in my Vim configuration. The source for that is in another Gist that you can copy. The function updates the tags whenever you save a file and there’s already a tags file in the directory of the file being saved, which prevents new tag files from being created in random directories that aren’t part of projects. It deletes the existing tags for the file being saved from the tags file using sed, and then uses ctags -a to append fresh tags for that file. This is faster than regenerating tags for all of the files in the directory. You can just paste the contents of the Gist into your .vimrc file.
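
For reference, a bare-bones version of that autocommand might look something like the following (a sketch of the approach just described, not the contents of the Gist, and it doesn’t guard against filenames containing regex metacharacters):

" Sketch only: re-tag a file on save if a tags file already exists beside it.
function! UpdateTagsForBuffer()
  let l:dir = expand('%:p:h')
  let l:file = expand('%:t')
  if filereadable(l:dir . '/tags')
    " Remove stale entries for this file, then append fresh ones.
    call system('cd ' . shellescape(l:dir) .
          \ ' && sed -i.bak "/' . escape(l:file, '.') . '/d" tags' .
          \ ' && ctags -a ' . shellescape(l:file) .
          \ ' && rm -f tags.bak')
  endif
endfunction

autocmd BufWritePost * call UpdateTagsForBuffer()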

I also want to update my tags whenever I pull in other people’s changes from version control. I could just run my projtags script when I pull new files, but for one of our projects, it takes about 40 seconds to run. Too slow. Instead, I have a script called updatetags that finds all of the directories where the tags file is not the newest file in the directory and regenerates the tags file for those directories. It also generates tags in directories that were added since the last run. (It’s in a Gist as well.)
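
Again, roughly (a sketch of the idea rather than the actual script):

#!/bin/sh
# Rough sketch of an updatetags-style script: rebuild tags only where the
# tags file is missing or older than another file in its directory.
find . -type d ! -path '*/.git*' | while read -r dir; do
  newer=$(find "$dir" -maxdepth 1 -type f ! -name tags -newer "$dir/tags" 2>/dev/null)
  if [ ! -f "$dir/tags" ] || [ -n "$newer" ]; then
    (cd "$dir" && ctags *.rb 2>/dev/null)   # same glob caveat as before
  fi
done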

The final step is invoking the script. There are a lot of ways to do so, but I use Git, and I want the script to run automatically after I pull in code from remote repositories. To cover all cases, I run the following commands (from the home directory of the repository):

ln -s $SCRIPT_DIR/updatetags .git/hooks/post-checkout
ln -s $SCRIPT_DIR/updatetags .git/hooks/post-merge

The $SCRIPT_DIR variable is just a placeholder for the actual directory where updatetags lives.

One special bonus of having your tags set up properly is that you can tell Vim to open a tag rather than a file from the command line, using the -t flag. So if you want to open a class named UserHistory, you can just type:

vim -t UserHistory

I immediately found this to be fantastically efficient.

This system of managing tag files may be grossly inefficient. If you have a better way of managing your tags, I’d love to hear about it in the comments.


G. K. Chesterton on software development

John D. Cook blogs a great quotation from G. K. Chesterton that advises caution before removing something. In essence, he challenges people who would remove something unnecessary to go and figure out why it was erected in the first place. Needless to say, this is an issue we deal with a lot in writing software.

In the comments of the post, an interesting debate plays out about the role of code, automated tests, and documentation in helping people figure out why the code that they want to delete was originally written.

Written properly, your code should be self-documenting in terms of its basic workings. Donald Knuth’s literate programming takes this approach to an extreme, but we can move a long way toward it just by naming things well and structuring our programs to emphasize readability. Comments still have their place in areas where efficiency trumps readability, but generally, we should always prefer writing our code so that it’s comprehensible without them.
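
To pick a trivial, made-up example (the names here are invented for illustration), the difference is often just a matter of pulling an expression out into a well-chosen name:

from datetime import datetime, timedelta

ACTIVITY_WINDOW = timedelta(days=7)

# Instead of a bare expression plus a comment explaining what it means...
#     if (now - user.last_login) < timedelta(days=7):  # user is still active
# ...give the idea a name, so the call site reads the way the comment would have:
def is_recently_active(last_login, now=None):
    now = now or datetime.utcnow()
    return (now - last_login) < ACTIVITY_WINDOW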

Unit tests verify that code is functioning properly. They have some use as documentation, but their purpose is to enable you to refactor your code with the assurance that you haven’t broken anything as long as the tests still pass.

Documentation in comments or elsewhere should be the story of why the code works the way it does. Why was a particular algorithm chosen? What compromises did you have to make for performance? What’s likely to fail under strain? These are the sorts of questions that are difficult to answer in the code itself, but are important to anyone who’s expected to maintain the code in the future.

For another take on learning about why things are as they are before making changes, check out my post from last February on getting off to a successful start at a new job. My advice to myself served me pretty well in 2012.

More on functional programming

Yesterday I linked to Uncle Bob’s intro to functional programming. There are some interesting reactions floating around today. Tim Bray agrees that FP is important, but doesn’t like the magical example or the fact that it’s written in Lisp. The reliably cranky Cédric Beust pushes back on the article with vigor, mainly on the point that functional programming is the “next big thing” in software engineering.

For what it’s worth, I think that functional programming is worth learning more about because it will make you a better programmer in whatever language you use, not because one day you’ll be using FP and abandoning the use of mutable variables.

Getting started with functional programming

I fully intend to write a post talking about stuff I learned in 2012, but in truth, I’ll probably never get around to it. The blog suffered last year because I was so busy cramming new things into my head that I didn’t have the energy to write much of it down.

One of the big things I learned was that while I’ve programmed in a lot of languages, they all came from the same family and I used them all in the same way. That left a huge gaping hole in my experience called “functional programming.” There’s another hole involving functions as first class objects that I’m trying to fill up as well.

If you know nothing about functional programming, Uncle Bob has a short and useful introduction that’s worth reading. If you want to master the concepts, I recommend The Little Schemer.

I still don’t do much functional programming in my day-to-day life beyond the occasional bit of Scala hacking, but I find that functional concepts make it really easy to break down certain kinds of problems regardless of which language I’m using. For example, it’s really easy to write a binary search implementation using a functional approach, as in the sketch below.
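
Here’s the kind of thing I mean, sketched in Python (an illustration, not code from any particular project): no loops and no mutation, just a function that recurses on a narrower range until it finds the target or runs out of room.

def binary_search(items, target, lo=0, hi=None):
    """Return the index of target in the sorted list items, or None."""
    if hi is None:
        hi = len(items)
    if lo >= hi:                 # empty range: target isn't here
        return None
    mid = (lo + hi) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search(items, target, mid + 1, hi)   # search the right half
    return binary_search(items, target, lo, mid)           # search the left half

print(binary_search([1, 3, 5, 7, 11], 7))  # prints 3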

In the larger scheme of things, I was able to get away with ignoring functional programming for a long time, but I don’t think that’s possible any more. Not only are functional languages picking up steam, but functional techniques are everywhere these days if you know where to look for them.
