Today the New York Times has another Edward Snowden story, this one by David Sanger and Eric Schmitt. It discusses the means he used to harvest millions of documents from the NSA’s internal network, and runs under the headline “Snowden Used Low-Cost Tool to Best NSA.” Good security reporting focuses on the conflicting goals that come into play when designing secure systems, rather than retreating into counterfactual thinking.
Here’s the sort of counterfactual thinking I’m talking about:
Mr. Snowden’s “insider attack,” by contrast, was hardly sophisticated and should have been easily detected, investigators found.
Agency officials insist that if Mr. Snowden had been working from N.S.A. headquarters at Fort Meade, Md., which was equipped with monitors designed to detect when a huge volume of data was being accessed and downloaded, he almost certainly would have been caught.
Officials say web crawlers are almost never used on the N.S.A.’s internal systems, making it all the more inexplicable that the one used by Mr. Snowden did not set off alarms as it copied intelligence and military documents stored in the N.S.A.’s systems and linked through the agency’s internal equivalent of Wikipedia.
When telling a story about security, or about any system, three aspects are involved: the intended functionality, security (and safety in general), and cost. Here’s the only sentence from the story that even hints at these tradeoffs:
But he was also aided by a culture within the N.S.A., officials say, that “compartmented” relatively little information.
The NSA built a system for sharing information internally out of off-the-shelf Web technology (which almost certainly lowered costs substantially), and provided broad access to it for the same reason that any organization tries to improve communication through transparency. They wound up with a system that was no doubt difficult to secure from people like Edward Snowden.
While the crawler Snowden used might have been easy to detect, writing a crawler that is difficult to detect is not particularly challenging. A Web crawler is pretty straightforward. It downloads a Web page, extracts all of the links, and then follows the links and repeats the process. It recursively finds every Web page reachable from the page where it starts. On the real Internet, well-behaved crawlers identify themselves and follow the robots exclusion standard, a voluntary code of conduct for people who write programs to crawl the Web. There’s no reason it has to be that way, though. Browsers (or crawlers) identify themselves with a user agent, and when you request a Web page, you can use any user agent you want.
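To make the “download, extract links, follow, repeat” loop concrete, here is a minimal sketch in Python. It says nothing about the tool Snowden actually used. The `crawl` function, the `fetch` callable, and the three-page `SITE` dictionary are all made up for illustration, and the sketch walks the link graph with a queue rather than literal recursion so that it runs without touching a network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=1000):
    """Return the URLs reachable from start_url, in the order visited.

    fetch is a hypothetical callable: fetch(url) returns the page's HTML
    as a string, or None if the page could not be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page site held in memory, standing in for a real wiki.
SITE = {
    "http://wiki.example/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://wiki.example/b": '<a href="/a">A</a> <a href="/c#top">C</a>',
    "http://wiki.example/c": "<p>No links here.</p>",
}

pages = crawl("http://wiki.example/a", SITE.get)
```

Nothing in that loop identifies the program as a crawler. In a real client the user agent is just a request header that the client fills in, so each individual request can look like it came from an ordinary browser.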
The point is that there’s nothing about any specific request from a crawler that would make it easy to detect. Secondarily, there’s the nature of the traffic. The recursive pattern of the requests from the crawler might also be suspicious, but detecting that sort of thing is a lot more difficult, and those patterns could be obfuscated as well. If you have months to work, there are a lot of options for disguising your Web crawling activity.
Finally, there’s the sheer volume of data Snowden downloaded. Snowden literally requested millions of URLs from within the NSA. There are ways to hide this too, especially if you can run the crawler from multiple systems, but if you’re going to download over a million pages, it’s difficult to disguise the fact that you have done so. Detecting such activity would still require some system to monitor traffic volumes.
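As a rough illustration of what “some system to monitor traffic volumes” could mean, here is a minimal sliding-window counter in Python. It is a sketch under my own assumptions, not a description of anything the NSA runs. The `VolumeMonitor` class, its threshold, and its window size are all hypothetical.

```python
from collections import defaultdict, deque


class VolumeMonitor:
    """Flags clients that make more than max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(deque)

    def record(self, client, timestamp):
        """Record one request; return True if the client is over the threshold."""
        times = self._timestamps[client]
        times.append(timestamp)
        # Drop requests that have aged out of the window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests


# Hypothetical policy: more than 3 requests in 60 seconds is suspicious.
monitor = VolumeMonitor(max_requests=3, window_seconds=60)

# A burst of four requests in four seconds trips the alarm on the fourth.
burst = [monitor.record("fast-crawler", t) for t in (0, 1, 2, 3)]

# The same policy never fires for a client that waits 30 seconds between requests.
patient = [monitor.record("slow-crawler", t) for t in range(0, 600, 30)]
```

Even this toy version shows the tradeoff. Set the threshold low and you flag ordinary heavy users of an internal wiki; set it high and a patient crawler, or one spread across several machines, stays under it for months.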
Somehow Snowden also managed to take the information his crawler gathered out of the NSA. That seems like another interesting breakdown in the NSA’s security protocols.
The article has plenty of discussion of why Snowden should have been detected, but very little about why he wasn’t, and even less about how the desire to secure the system is at odds with the other goals for it. The thing is, the journalists involved didn’t need to rely on the NSA to give them any of this information. Anyone familiar with these sorts of systems could have walked them through the issues.
Any article about security (or safety) should focus on the conflicts that make building a secure system challenging. The only way to learn from these kinds of incidents is by understanding those conflicts. One thing I do agree with in the story is that Snowden’s approach wasn’t novel or innovative. That’s why the story of the tradeoffs inherent in the system is the only interesting story to tell.