Dale Dougherty at O’Reilly asked a number of Internet luminaries whether we’re winning the war on email spam, and if not, how we can win it. I wasn’t asked, probably because I’m not a luminary, but that’s OK, I’m going to give my opinion anyway.
To me, there are two struggles against spam. One is the Internet-wide struggle. I think it is more or less lost. Email as we know it today is permanently and irreversibly polluted and the only way to fix it is to start over. For the foreseeable future, we’re going to have to deal with the fact that most of the mail being exchanged is produced by spammers, and exists only to soak up bandwidth and hopefully be filtered out.
Given that the email ecosystem is so thoroughly polluted, how goes the individual struggle against spam? I think the news is better on that front. For me, the answer has been Gmail. I get at most a handful of spam messages in my Gmail inbox every day (no more than 10), and right now nearly all of my email is forwarded to it. My personal email address at rc3.org gets several hundred spams a day, and all of the spam filters on my server and the filter in Thunderbird were only able to catch perhaps half of it. The rest I had to delete by hand, every day. I’ve since forwarded that account to Gmail as well, and it’s doing a much better job.
I’m not sure what Google is doing beyond what I could accomplish with Amavisd and the countermeasures in my Postfix configuration, but whatever they’re doing is working. So in terms of wading through spam on a daily basis, I’m in pretty good shape right now.
Of course, there are two other costs of spam aside from deleting it manually. The first is the problem of legitimate mail being marked as spam and tossed out, and the other is my outgoing mail being thrown away before it’s read by the recipients. Currently, I’m just ignoring those issues. I don’t go through my spam folder and look for mail that’s not spam, and I send mail hoping that it will get to its recipient, but beyond that I don’t worry about it.
Once you’ve managed to get past the daily work of dealing with spam in your inbox, the next step is, I think, to manage your expectations. We used to be able to expect that if we sent someone an email, it would be delivered and read. That’s no longer the case, and we have to compensate in other ways.
March 7, 2007 at 2:59 am
My guess is that Google works so well because it can piggyback instantaneously off all its users’ human spam filtering capacities. You don’t “mark” spam with Gmail; you “report” it, and once ten or a hundred people have done so, all the millions of other copies can be deleted instantaneously. That has to be the quickest possible blacklist mechanism.
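To make that guess concrete, here is a minimal sketch of a report-count blacklist. It is not a description of what Gmail actually does; the class name `ReportBlacklist`, the threshold of ten, and the whitespace/case normalization are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical threshold: how many distinct users must report a message
# before every other copy is treated as spam.
REPORT_THRESHOLD = 10


class ReportBlacklist:
    """Toy collaborative filter keyed on a hash of the message body."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        # fingerprint -> set of users who reported that message
        self.reporters = defaultdict(set)

    @staticmethod
    def fingerprint(body):
        # Assumed normalization: lowercase and collapse runs of whitespace.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, user, body):
        # A set means repeat reports from one user count only once.
        self.reporters[self.fingerprint(body)].add(user)

    def is_blacklisted(self, body):
        return len(self.reporters.get(self.fingerprint(body), ())) >= self.threshold
```

Counting distinct reporters rather than raw reports keeps a single user from blacklisting a message alone, which matters when people disagree about what counts as spam.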
March 7, 2007 at 7:57 am
My datapoint (I guess I should put this on my own blog…): I get around 1100-1200 messages a day to the ~7 domains I own. I’ve got “catchall” addresses turned on for all of them. Of that mail stream, about 100-150 messages a day are “ham”; the rest is junk. SpamAssassin (which is the only thing I run, unless my hosting providers are doing something at the MTA layer before passing the mail to me) has a false negative rate of 1 or 2 messages a day, and a false positive rate that’s so low I don’t bother to look at the contents of the spam bucket before emptying it out.
March 7, 2007 at 9:52 am
Thanks for chiming in, Rafe (thanks also for Rafe’s Law 😉)
One thing for Andrew: the technique of using an entire ISP’s population to mark and detect spam is not new; Razor has been doing it since 2001, and AOL since about 2002 (I’m not sure of the exact year). It’s not as easy as you think, though, due to (a) people who don’t agree on what spam is, (b) spammer evasion through hashbusting, and (c) the “race against time” factor. In our tests in SpamAssassin, Razor alone typically catches only about 30% of spam nowadays.
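Hashbusting, point (b), is easy to illustrate with a naive exact-match hash. The sketch below is not Razor's actual signature scheme; the function name `exact_fingerprint` and the sample messages are made up for the example.

```python
import hashlib


def exact_fingerprint(body: str) -> str:
    """Naive fingerprint: a SHA-256 digest of the exact message text."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# The same pitch, with a different random token appended to each copy.
base = "Cheap pills, click here"
copy_a = base + " xkqjz"
copy_b = base + " pwmvt"

# Each copy hashes differently, so a report against one copy
# says nothing about the other.
fingerprints = {exact_fingerprint(copy_a), exact_fingerprint(copy_b)}
```

A few junk characters per message are enough to give every copy its own fingerprint, which is why shared-hash systems need fuzzier signatures than this.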
March 8, 2007 at 6:34 pm
Personally, I think the whole approach of content-oriented spam filtering is a pointless waste of time. All we need to do is focus on the sender’s point of origin, which Gmail is (IIRC) unique in NOT allowing you to do. What heuristics they’re using, I don’t know, but it’s trivial to find spam samples in my Gmail inbox that I’ve had rules for here for years. Not knocking Gmail overall, but the fact that they allow spammers to hide their origins is a big black mark for me.
Most of the worst stuff comes from hosts with generic rDNS. So, if you know how folks name their PTRs, you can block most of the junk you wouldn’t have otherwise been able to block by HELO/EHLO checks and other sanity measures. If you don’t know how this works, you can always use the Spamhaus PBL, which is a subset of the hosts I track.
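As a rough sketch of both ideas: a generic PTR name often embeds the octets of the IP address it points to, and a DNSBL such as the PBL is queried by reversing the octets and appending the list's zone. The helper names below are hypothetical, the octet heuristic is only one of many naming patterns, and `pbl.spamhaus.org` is assumed as the zone name. No DNS lookup is performed here.

```python
import re
from collections import Counter


def looks_generic(ptr: str, ip: str) -> bool:
    """Heuristic: True if every octet of `ip` appears as a number in `ptr`.

    Catches names like adsl-69-12-34-56.dsl.example.net; it misses
    hex-encoded or partially encoded addresses.
    """
    octets = Counter(int(o) for o in ip.split("."))
    numbers = Counter(int(n) for n in re.findall(r"\d+", ptr))
    # Counter subtraction keeps only octets not matched by a number in the name.
    return not (octets - numbers)


def dnsbl_query_name(ip: str, zone: str = "pbl.spamhaus.org") -> str:
    """Build the hostname to look up for an IPv4 address in a DNSBL zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone
```

For example, `dnsbl_query_name("192.0.2.1")` builds `1.2.0.192.pbl.spamhaus.org`; an A record returned for that name would mean the address is listed.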
We regularly block several thousand messages a day here, and I can count all the false positives we’ve had since the new year on one hand (two different addresses). And we’ve let in maybe a couple hundred spam messages in that time, mostly pump and dump crud and the occasional 419 scam, because I’d gotten slack about checking for open relays and proxies.
The thing to remember, though, is that the battle over spam isn’t about the inbox – if it’s there, or even in your spam folder, it’s too late and you lost. The battle, if we’re smart, needs to be taken aggressively to the perimeter defenses; saying you’re healthy because you’re only mildly sick all the time isn’t being healthy, it’s admitting your immune system sucks. Saying you’re healthy because you don’t get sick, now that’s something else.