rc3.org

Strong opinions, weakly held

The argument for strict coding standards

cbloom rants on the value of strict coding standards:

Strict coding standards are actually an intellectual relief because they remove all those decisions and give you a specific way to do the syntax. (The same of course goes for reading other people’s code – your eyes can immediately start looking at the functionality, not try to figure out the current syntax)

Don’t forget that adherence to coding standards also produces higher quality diffs.

Using HMAC to authenticate Web service requests

One weakness of many Web services that require authentication, including the ones I’ve built in the past, is that the username and password of the user making the request are simply included as request parameters. Alternatively, some use basic authentication, which transmits the username and password in an HTTP header encoded using Base64. Basic authentication obscures the password, but doesn’t encrypt it.

This week I learned that there’s a better way — using a Hash-based Message Authentication Code (or HMAC) to sign service requests with a private key. An HMAC is the product of a hash function applied to the body of a message along with a secret key. So rather than sending the username and password with a Web service request, you send some identifier for the private key and an HMAC. When the server receives the request, it looks up the user’s private key and uses it to create an HMAC for the incoming request. If the HMAC submitted with the request matches the one calculated by the server, then the request is authenticated.
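
To make the flow concrete, here's a rough sketch of the server-side check in Java, built from the same pieces as the Java signing example later in this post. The variable names (privateKey, requestBody, submittedHmac) are mine, standing in for the key looked up for the user and the relevant parts of the incoming request:

import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Hex;

// Recompute the signature over the incoming request using the user's
// private key, then compare it to the signature the client submitted.
Mac mac = Mac.getInstance("HmacSHA1");
mac.init(new SecretKeySpec(privateKey.getBytes(), "HmacSHA1"));
String expected = Hex.encodeHexString(mac.doFinal(requestBody.getBytes()));

// MessageDigest.isEqual is preferable to String.equals for comparing digests.
boolean authenticated = MessageDigest.isEqual(
    expected.getBytes(), submittedHmac.getBytes());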

There are two big advantages. The first is that the HMAC allows you to verify the password (or private key) without requiring the user to embed it in the request, and the second is that the HMAC also verifies the basic integrity of the request. If an attacker manipulated the request in any way in transit, the signatures would not match and the request would not be authenticated. This is a huge win, especially if the Web service requests are not being made over a secure HTTP connection.

There’s one catch that complicates things.

For the signatures to match, not only must the private keys used at both ends of the transaction match, but the message body must also match exactly. URL encoding is somewhat flexible. For example, you may choose to encode spaces in a query string as %20. I may prefer to use the + character. Furthermore, in most cases browsers and Web applications don’t care about the order of HTTP parameters.

foo=one&bar=two&baz=three

and

baz=three&bar=two&foo=one

are functionally the same, but their cryptographic signatures will not match.
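
One way to cope with this, which the OAuth and Amazon rules discussed below formalize, is to canonicalize the parameters on both ends before signing: decode each name and value, sort by name, and re-encode everything in one consistent way. Here's a rough sketch of the idea in Java; the encoding choices are just one consistent option picked for illustration, and repeated parameter names aren't handled:

import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.Map;
import java.util.TreeMap;

// Decode each pair, sort by parameter name, and re-encode consistently so
// that foo=one&bar=two&baz=three and baz=three&bar=two&foo=one yield the
// same string to sign.
public static String canonicalize(String query) throws Exception {
    Map<String, String> params = new TreeMap<String, String>();
    for (String pair : query.split("&")) {
        String[] kv = pair.split("=", 2);
        params.put(URLDecoder.decode(kv[0], "UTF-8"),
                   kv.length > 1 ? URLDecoder.decode(kv[1], "UTF-8") : "");
    }
    StringBuilder out = new StringBuilder();
    for (Map.Entry<String, String> e : params.entrySet()) {
        if (out.length() > 0) out.append('&');
        out.append(URLEncoder.encode(e.getKey(), "UTF-8")).append('=')
           .append(URLEncoder.encode(e.getValue(), "UTF-8"));
    }
    return out.toString();
}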

Another open question is where to store the signature in the request. By the time the request is submitted to the server, the signature derived from the contents of the request will be mixed in with the data that is used to generate the signature. Let’s say I decide to include the HMAC as a request parameter. I start with this request body:

foo=one&bar=two&baz=three

I wind up with this one:

foo=one&bar=two&baz=three&hmac=de7c9b8 ...

In order to calculate the HMAC on the server, I have to remove the incoming HMAC parameter from the request body and calculate the HMAC using the remaining parameters. This is where the previous issue comes into play. If the HMAC were not in the request, I could simply calculate the signature based on the raw incoming request. Once I start manipulating the incoming request, the chances of reconstructing it imperfectly rise, possibly introducing cases where the signatures don’t match even though the request is valid.
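
In code, that cleanup step amounts to filtering the signature parameter out of the raw body before recomputing. A minimal sketch, with queryString holding the raw request body and hmac being the parameter name from the example above:

// Rebuild the query string without the hmac parameter; the signature
// itself is not part of the signed data.
StringBuilder signedData = new StringBuilder();
for (String pair : queryString.split("&")) {
    if (pair.startsWith("hmac=")) continue;
    if (signedData.length() > 0) signedData.append('&');
    signedData.append(pair);
}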

This is an issue that everyone implementing HMAC-based authentication for a Web service has to deal with, so I started looking into how other projects handled it. OAuth uses HMAC, with the added wrinkle that the signature must be applied to POST parameters in the request body, query string parameters, and the OAuth HTTP headers included with the request. For OAuth, the signature can be included with the request as an HTTP header or as a request parameter.

This is a case where added flexibility in one respect puts an added burden on the implementor in others. To make sure that the signatures match, OAuth has very specific rules for encoding and ordering the request data. It’s up to the implementor to gather all of the parameters from the query string, request body, and headers, get rid of the oauth_signature parameter, and then organize them based on rules in the OAuth spec.

Amazon S3’s REST API also uses HMAC signatures for authentication. Amazon embeds the user’s public key and HMAC signature in an HTTP header, eliminating the need to extract it from the request body. In Amazon’s case, the signed message is assembled from the HTTP verb, metadata about the resource being manipulated, and the “Amz” headers in the request. All of this data must be canonicalized and added to the message data to be signed. Any bug in the translation of those canonicalization rules into your own code means that none of your requests will be authenticated by Amazon.com.

Amazon uses the Authorization header to store the public key and HMAC. This is also the approach that Microsoft recommends. I think it’s superior to the parameter-based approach taken by OAuth. It should be noted that the Authorization header is part of the HTTP specification and if you’re going to use it, you should do so in a way that complies with the standard.

For my service, which is simpler than Amazon S3 or OAuth, I’ll be using the Authorization header and computing the HMAC based on the raw incoming request.
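
As a sketch of what that might look like from the client side, with a made-up scheme label and with keyId and signature standing in for the user’s key identifier and the computed HMAC:

import java.net.HttpURLConnection;
import java.net.URL;

// Send the key identifier and the HMAC of the raw request body in the
// Authorization header; the server recomputes the HMAC and compares.
HttpURLConnection conn = (HttpURLConnection)
    new URL("https://example.com/service").openConnection();
conn.setRequestProperty("Authorization", "HMAC " + keyId + ":" + signature);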

I realize that HMAC may not be new to many people, but it is to me. Now that I understand it, I can’t imagine using any of the older approaches to build an authenticated Web service.

Regardless of which side of the Web service transaction you’re implementing, calculating the actual HMAC is easy. Normally the SHA-1 or MD5 hashing algorithms are used, and it’s up to the implementor of the service to decide which of those they will support. Here’s how you create HMAC-SHA1 signatures using a few popular languages.

PHP has a built-in HMAC function:

hash_hmac('sha1', "Message", "Secret Key");

In Java, it’s not much more difficult:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Hex;

Mac mac = Mac.getInstance("HmacSHA1");
SecretKeySpec secret =
    new SecretKeySpec("Secret Key".getBytes(), "HmacSHA1");

mac.init(secret);
byte[] digest = mac.doFinal("Message".getBytes());

String hmac = Hex.encodeHexString(digest);

In that case, the Hex class is the hexadecimal encoder provided by Apache’s Commons Codec project.

In Ruby, you can use the HMAC method provided with the OpenSSL library:

require 'openssl'
require 'base64'

DIGEST = OpenSSL::Digest.new('sha1')

Base64.encode64(OpenSSL::HMAC.digest(DIGEST,
  "Secret Key", "Message"))

There are also libraries like crypto-js that provide HMAC support for JavaScript.

I regret that I didn’t use Google Code Search more

Miguel de Icaza writes about the bad news that Google is shutting down Code Search. In it, he lists a number of things Code Search was useful for that never really occurred to me. I hate missing out. I particularly regret not taking advantage of it when I was wrestling with connection and socket timeouts with Commons HttpClient a while back.

The purposes of punishment

On the Advanced NFL Stats blog, Brian Burke has written an interesting post about the philosophy of punishment. First, he lists the reasons why you punish people:

… punishment has at least five possible purposes: incapacitation, restitution, rehabilitation, deterrent, and prevention of retribution.

He then goes on to elaborate on each of those. I particularly liked his explanation of the last purpose:

Another purpose of punishment, one that I think we’ve lost touch with in modern society, is to prevent a cycle of vigilante retribution. The ancient verse eye for an eye, tooth for a tooth is widely understood as recommending retaliation. In tribal societies absent of a central authority, it was common for cycles of retribution to spiral out of control. If one party knocks out another’s tooth, pretty soon the victim’s cousins would be exacting revenge on the offender’s family. I understand the verse to mean hey dummy, don’t take an eye in exchange for a tooth. Knock out the other guy’s tooth and let that be the end of it. Otherwise we’ve got the Hatfields and McCoys, Montagues and Capulets, or Bosnians and Serbs.

Given the pervasiveness of vigilantism as a theme in fiction, I don’t think we’ve lost touch with this aspect of punishment to the degree that he supposes.

Why do good people build bad applications?

Lots of people are commenting on the Gun.io blog post The Government’s $200,000 Useless Android Application. Android developer Rich Jones stumbled across an application provided by the Occupational Safety and Health Administration (an agency of the US government). In the end, he discovers that the government paid $96,000 to have a contractor build this buggy application that would take him around 6 hours to write. You really should read the whole thing — the steps he went through to get the details of the contract are interesting.

The obvious response to this is to point out the pervasiveness of government waste, but everybody already knows that the government wastes a lot of money. In fact, pretty much any institution that has lots of money and lots of bureaucracy wastes a lot of money. For example, here’s a recent Tweet from Horace Dediu:

HP spent $1.2 billion to buy Palm, earned $600 million in losses and $1.5 billion to shut it down.

When I look at the OSHA app, what I wonder about is what kind of process led to something so bad being built and released. Who thought it was a good idea to spend $96,000 to build something so simple? What did the actual project entail?

I’ve seen seemingly simple tasks mushroom into huge projects in large organizations. Last year at work we were talking to a customer about integrating with our Web service. I can write the code to integrate with every feature we provide in a day or two — their internal estimate for the entire project was 1,000 to 2,000 hours, mainly because the customer bundled a lot of needed internal changes into the project. They did take on the project, and I have no idea how long it took them in the end.

I figured that this was a case where honest people made their best effort to produce something good, and wound up with a subpar result. In the end, though, I wasn’t so sure. The original blog post mentioned that the application was buggy, and I wanted to figure out how you could pay $96,000 for a simple, buggy application.

You can download the source to the application on the OSHA Web site. The Android version of the application contains 2134 lines of Java code, plus various layout elements. Unsurprisingly, there are no tests at all in the packaged source. There’s no documentation, either.

Just for fun, I decided to try to track down the bug mentioned in the Gun.io post — the application showed the current temperature in Boston as 140 degrees.

The application retrieves weather data from a NOAA Web service and then parses it using a SAX content handler that they wrote. Unfortunately, the code that constructs the URL for the Web service has been removed, probably because there’s no authentication. Lacking any example data, I looked at the code that processes the data instead.

The first thing that stood out to me was that the variable name of the SAX content handler is myExampleHandler. A quick Google search revealed that they just copied that part of the code from this blog post and didn’t bother to change the variable names or the comments. That’s a pretty clear indicator that the code was not written by a professional who cares about their work.

The content handler itself is really badly written. For example, the developer sets a bunch of boolean variables to keep track of which elements they’re processing, but then never actually uses them. Instead, they use an entirely separate set of integer variables as booleans. Why did they create two sets of variables for roughly the same purpose? I have no idea, but it could be because multiple people worked on it and didn’t even bother to try to figure out what was going on before they started adding code.

In the end, I wasn’t able to find the bug. My original guess was that the service returned the temperature in Fahrenheit and the developer converted it from Celsius to Fahrenheit mistakenly, but that’s not the case. Now I think it’s presenting the Heat Index and labeling it as the temperature, but I don’t understand how Android layouts work so I’m not sure that’s it either.

In any case, the application was probably not built by an experienced Java developer. They didn’t follow any normal conventions with regard to naming methods or variables. When you do not camel-case the names of your accessor methods, you’re clearly not used to reading other people’s Java code. Furthermore, there are plenty of other signs that the application was written without a lot of care. The code isn’t even properly indented, and it looks as though the developer is not very comfortable with data structures.

For example, temperature and humidity values are extracted from the XML returned by the Web service and stored as Vectors of strings in a data transfer object. When the developer goes to use them, they construct new arrays and copy all of the values in the Vectors into plain old arrays, converting them to numeric values at that time. Why not parse the numbers at extraction time? I have no idea. Why not store the corresponding humidity and temperature values in a data structure rather than just keeping them in two separate indexed data structures? I have no idea. But these sorts of rookie mistakes are a good indication that the developer who wrote this was in far over their head.
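
For illustration only (this is not the OSHA code), the kind of structure being described is a few lines of Java:

// Parse the values once at extraction time and keep each reading's
// related values together instead of in parallel collections of strings.
public class Reading {
    public final double temperature;
    public final double humidity;

    public Reading(String temperature, String humidity) {
        this.temperature = Double.parseDouble(temperature);
        this.humidity = Double.parseDouble(humidity);
    }
}

The parsed readings can then live in a single List<Reading> rather than two Vectors of strings that have to be kept in sync by index.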

My guess is that OSHA hired Eastern Research Group to build this application because they were already on an approved contractor list and the business development person from ERG told them that they were capable of doing it. The application was then built in-house at ERG by a developer who had no clue what they were doing, probably under a tight deadline, or it was outsourced to some other firm that was fleecing ERG the same way they were fleecing OSHA. Clearly there was nobody on the OSHA side capable of performing even the rudimentary inspection of the deliverable that took me 30 minutes or so.

I have worked as a consultant before and I see this a lot. People who outsource software development simply lack the expertise to assess the applications that are built for them. They don’t know how much they should cost, what to look for in a vendor, or how to evaluate what’s delivered to them to make sure they got their money’s worth.

I went into this thinking that maybe everybody involved was honest and the bad result was due to flaws in the process, but now I think it’s pretty clear that ERG sold OSHA a bill of goods and wound up fleecing them pretty badly. I hope it’s not too late to get their money back.

The image embedded in this blog post is from the source for the application — it represents heat stroke. Maybe some of the money was spent on illustrations.

Steve Silberman profiles Susan Kare

Susan Kare is the artist who created the original icons for the Macintosh. She started at Apple by designing proportional fonts but graduated to icon design. The degree to which her work made the original Macintosh software easier for humans to relate to can’t be overstated. The blog post features work from her original sketchbook, in which she designed icons on graph paper by using the squares as pixels.

Big Data demands better shell skills

At work, I’ve been experimenting with Apache Solr to see whether it’s the best choice for searching a very large data set that we need to access. The first step was to just set it up and put a little bit of data into it in order to make sure that it meets our current and anticipated future requirements. Once I’d figured that out, the next step was to start loading lots of data into Solr to see how well it performs, and to test out import performance as well.

Before I could do that, though, I generated about 33 million records to import, which take up about 10 gigabytes of disk space. That’s not even 5% of the space that the full data set will take up, but it’s a start.

What I’m quickly learning is that when it comes to dealing with Big Data, knowledge of the Unix shell is a huge advantage. To give an example, I’m currently using Solr’s CSV import feature to import the test data. If we wind up using it in production, writing our own DataImportHandler will certainly be the way to go, but I’m just trying to get things done right now.

Here’s the command the documentation suggests you use to load a CSV file into Solr:

curl http://localhost:8983/solr/update/csv --data-binary @books.csv \
    -H 'Content-type:text/plain; charset=utf-8'

I quickly found out that when you tell curl to post a 10 gigabyte file to a URL, it runs out of memory, at least on my laptop.

These are the kinds of problems for which Unix provides ready solutions. I used the split command to split my single 10 gigabyte file into 33 files, each a million lines long. split helpfully named them things like xaa, xab, etcetera, all the way through xbh. You can use command line arguments to tell split to use more meaningful names. Anyway, then I used a for loop to iterate over each of the files, using curl to submit them:

for file in x* ; do
    curl http://localhost:8983/solr/update/csv --data-binary @"$file" \
        -H 'Content-type:text/plain; charset=utf-8'
done

That would have worked brilliantly, except that Solr wants you to list the fields in the file on the first row of your CSV file, so only the first file imported successfully. I wound up opening all of the others in vim* and copying the headers over rather than writing a script, proving that I need to brush up on my shell skills as well, because prepending a line to a file is easy if not elegant.

Once the files were updated, I used the loop above to import the data.

When it comes to working with big data sets, there are many, many tasks like these. Just being able to use pipes to make sure that your very large data files are always compressed can be a life-saver. Understanding shell scripting is the difference between accomplishing a lot in a day through automation and doing lots of manual work that makes you hate your job.

* I should add that MacVim gets extra credit for opening 33 252-megabyte files at once without complaining. I just typed mvim x* and up popped a MacVim window with 33 buffers. Unix is heavy duty.

Facebook is on the Web but not of the Web

It’s becoming increasingly clear that while Facebook is a Web site, they don’t want to join the other Web sites in the pool we know as the Web. Anil Dash has the details and a way to encourage Facebook to change its behavior. First, he makes the case that Facebook is trying to drive its users away from the larger Web:

Facebook has moved from merely being a walled garden into openly attacking its users’ ability and willingness to navigate the rest of the web. The evidence that this is true even for sites which embrace Facebook technologies is overwhelming, and the net result is that Facebook is gaslighting users into believing that visiting the web is dangerous or threatening.

This is, to me, the latest front in the battle for users on the Web. Ultimately, Facebook wants users to view ads on Facebook pages, not on your Web site. Furthermore, they want to be able to observe the behavior of their users wherever they go in order to serve up ads that users are more likely to click on. Publishers want access to Facebook’s user base. Currently, Facebook is forcing them to give up an awful lot in order to get it, but hopefully that can be changed.

It’s for these sorts of reasons that I sort of passively resist Facebook. I am still using Facebook only in Chrome’s Incognito mode so that they can’t track me across the Web, and I still refuse to use services that require you to sign in using a Facebook account. I just don’t want to cede more control to Facebook.

The privacy risks of using Google Analytics

Did you know that there’s a reverse lookup for Google Analytics IDs? I didn’t. Andy Baio has the details.

The only exercise advice you really need

The New York Times ran an article this week about how beginning runners are not well served by the massive amounts of advice being offered on running form and running shoes. What do the doctors say?

When it comes to running form, Dr. Bredeweg said, “we don’t know what is the right thing to do.” For example, he noted, forefoot strikers place less stress on their knees but more on their calves and Achilles tendons.

“We tell people we don’t know a thing about the best technique,” he said. He tells runners to use the form they naturally adopt.

The problem of excessive advice is pervasive in the world of fitness. Everyone is trying to sell an exercise routine that they claim is the best. Whether it’s weight training, Crossfit, yoga, pilates, or running, people are evangelists of what they do, and professionals are even worse.

For people who aren’t exercising regularly, the most important thing is to start doing something. It doesn’t even matter what it is. If you don’t like what you’re doing, try something else, but keep exercising. The idea that there’s one master program is completely false. If some exercise doesn’t feel good, find something else.

Eventually, once you’ve been exercising for a while, you may set goals that your exercise routine isn’t helping you meet, and you’ll need to find a coach, do more research, or just up your intensity, but it’s not worth worrying about before you reach that point.

The truth is that Nike has always provided the best advice when it comes to working out — just do it. If you can consistently challenge yourself over a long period of time, almost everything else will take care of itself.
