Strong opinions, weakly held

Minute fingerprints

Bruce Schneier has a fascinating essay on how easy it is to discern someone’s identity from anonymous data. Researchers at the University of Texas discovered that by comparing movie ratings in the data set for the Netflix challenge to movie ratings from IMDb users, you can figure out who rated the movies in the anonymous Netflix data. And as it turns out, you don’t need all that many ratings to do it:

With only eight movie ratings (of which two may be completely wrong), and dates that may be up to two weeks in error, they can uniquely identify 99 percent of the records in the dataset. After that, all they need is a little bit of identifiable data: from the IMDb, from your blog, from anywhere. The moral is that it takes only a small named database for someone to pry the anonymity off a much larger anonymous database.

What interests me about this is how little data it takes to uniquely identify a person. Schneier provides a number of other examples in this vein as well. I imagine you could do the same thing with records of a person’s doctor visits or even dental visits, and I expect that you could pretty easily identify me among all Amazon.com customers based only on the purchases I made in 2007. We really do live in the age of data mining.
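
To make the matching idea concrete, here is a rough Python sketch. It is not the researchers’ actual algorithm, just a naive version of the tolerance described in the quote: a handful of known ratings, a couple of which may be wrong, with dates allowed to be off by up to two weeks. The data and the names ANONYMOUS, KNOWN, matches, and candidates are all made up for illustration.

```python
from datetime import date

# Made-up "anonymous" data set: pseudonym -> {movie: (stars, date rated)}
ANONYMOUS = {
    "anon_1": {
        "Brazil": (5, date(2005, 3, 1)),
        "Heat": (4, date(2005, 3, 9)),
        "Fargo": (5, date(2005, 4, 2)),
        "Alien": (3, date(2005, 5, 20)),
    },
    "anon_2": {
        "Brazil": (2, date(2005, 1, 15)),
        "Heat": (4, date(2005, 8, 1)),
        "Fargo": (1, date(2005, 4, 3)),
    },
    "anon_3": {
        "Alien": (3, date(2005, 5, 21)),
        "Fargo": (4, date(2004, 11, 30)),
    },
}

# Made-up public ratings tied to a named person (hypothetical profile data)
KNOWN = {
    "Brazil": (5, date(2005, 3, 6)),
    "Heat": (4, date(2005, 3, 10)),
    "Fargo": (4, date(2005, 4, 2)),
    "Alien": (3, date(2005, 5, 25)),
}


def matches(known, record, max_wrong=2, max_days=14):
    """True if record agrees with the known ratings, tolerating a few misses."""
    wrong = 0
    for movie, (stars, when) in known.items():
        hit = record.get(movie)
        # A miss is a missing movie, a different star rating,
        # or a date more than max_days away from the known date.
        if hit is None or hit[0] != stars or abs((hit[1] - when).days) > max_days:
            wrong += 1
    return wrong <= max_wrong


def candidates(known, dataset, **tolerances):
    """Pseudonyms whose records are consistent with the known ratings."""
    return [uid for uid, record in dataset.items()
            if matches(known, record, **tolerances)]


print(candidates(KNOWN, ANONYMOUS))
```

If exactly one pseudonym comes back, the named ratings have pinned down a single "anonymous" record. With a real data set you would need more known ratings than this toy example uses, but the quote suggests the number is still surprisingly small.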

1 Comment

  1. This is indeed amazing, though perhaps less and less unexpected as time goes on. I liken it to the cliched concept of “the singularity” that futurists like to bring up. What we have here, in a social sense, is nearly indistinguishable from the common definition of the singularity: the point at which the rate of technological change exceeds a society’s capacity to fully understand it. We don’t even know the right questions to ask about the social impact of this sort of thing yet, and the capability of software systems to mine and assemble personal profiles from the various breadcrumbs left across the internet is increasing rapidly.

© 2019 rc3.org
