In a discussion of Maureen Dowd’s plagiarism of Josh Marshall, he mentions that each sentence is a little snowflake:
An eminent technologist once explained to me that any specific ordering of a relatively brief sequence of words — I forget the exact number, but it was certainly no more than nine — is distinct enough that (unless it is some boilerplate phrase that gets repeated over and over in some type of document) it can be used as a unique fingerprint for the entire document. He demonstrated this for me with Google searches. (Try it yourself, using the “exact phrase” setting — search string in quote marks.) It’s a pretty nifty idea with all sorts of implications.
I did in fact try it, and it’s true. I took a few unremarkable fragments of sentences from my blog and searched for them on Google, and sure enough, my blog posts were the only matches. How unlikely it is for a sentence to be duplicated by accident is vastly underrated, I think.
Some examples: “It’s impossible to escape the conclusion that the Pentagon” or “five promises that Obama made regarding gay rights issues” or “if you’re dealing with relational databases you really need to know SQL”.
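If you want to play with the idea without Google, here is a minimal sketch of it in Python. It uses what is usually called word n-gram "shingling": cut each document into every run of n consecutive words, index those runs, and look a phrase up by its words. Everything here is an assumption for illustration. The three-document corpus is made up (doc1 just reuses the SQL sentence above), the names `words`, `shingles`, `build_index`, `lookup` and `shared_fraction` are hypothetical, the tokenizer is deliberately crude, and none of this is a claim about how Google actually implements exact-phrase search. The value n = 8 is only a guess prompted by the "no more than nine" in the quote.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus, made up for illustration; doc1 reuses a sentence from the post.
DOCS = {
    "doc1": "if you’re dealing with relational databases you really need to know SQL",
    "doc2": "you really need to know how to cook",
    "doc3": "relational databases are everywhere",
}


def words(text: str) -> list[str]:
    """Lowercase, turn curly apostrophes into straight ones, split into word tokens."""
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z0-9']+", text)


def shingles(text: str, n: int) -> set[tuple[str, ...]]:
    """Every contiguous run of n words ("shingle") in the text; empty if it is shorter than n."""
    toks = words(text)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def build_index(docs: dict[str, str], n: int) -> dict[tuple[str, ...], set[str]]:
    """Map each n-word shingle to the ids of the documents containing it."""
    index: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for sh in shingles(text, n):
            index[sh].add(doc_id)
    return index


def lookup(index: dict[tuple[str, ...], set[str]], phrase: str, n: int) -> set[str]:
    """Exact-phrase search for a phrase of exactly n words (the index must use the same n)."""
    toks = words(phrase)
    if len(toks) != n:
        raise ValueError(f"phrase must be exactly {n} words, got {len(toks)}")
    return set(index.get(tuple(toks), set()))


def shared_fraction(index: dict[tuple[str, ...], set[str]]) -> float:
    """Fraction of distinct shingles that occur in more than one document."""
    if not index:
        return 0.0
    shared = sum(1 for ids in index.values() if len(ids) > 1)
    return shared / len(index)


if __name__ == "__main__":
    # Short sequences collide across documents; longer ones become "fingerprints".
    print(lookup(build_index(DOCS, 3), "need to know", 3))
    print(lookup(build_index(DOCS, 8), "dealing with relational databases you really need to", 8))
    for n in (1, 3, 8):
        print(n, shared_fraction(build_index(DOCS, n)))
```

On this tiny corpus, a three-word run like "need to know" turns up in two documents, while the eight-word run only matches doc1, and the share of n-grams found in more than one document falls to zero by n = 8. Real text is far larger and messier, so treat the numbers as a toy; the building one index per n is just the simplest way to show the effect, and a serious phrase search would more likely store word positions instead.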