Poisoning the well

Overall, the volume of spam attempts on my server has been down lately. Oh, I still get a steady stream: I delete over 100 comment spams (caught by my filters) each day. But I’ve seen fewer of the massive, server-squashing spam runs that hammer my web service with too many simultaneous connections, blocking out legitimate users.

On the other hand, I’m seeing a lot more attempts by spammers to poison the well. By that I mean they are submitting bogus comments full of non-spammy (but more-or-less random) content and links to legitimate web sites. For example:

Name: Adam Baumann

Hi. Just letting you know that I enjoyed your site. when Soldier Double Game Lose: , to Bet Opponents you should be very Faithful Big Gnome becomes Industrious Plane in final , Superb Opponents becomes Superb Soldier in final Faithful is feature of White Circle

The comment is obviously gibberish, right? And the links are all to perfectly normal (in fact, popular) sites. You might wonder why a spammer would bother posting it. The idea is to poison the well for any site that uses Bayesian techniques to classify content as spam or not. When you mark the gibberish as spam, the filter learns that its ordinary words and legitimate links are spam indicators. By tricking sites into classifying “good” content as “spam”, they (theoretically) can reduce the effectiveness of the spam filters.

With enough poisoning, your spam filter may start producing false positives: legitimate messages that have incorrectly been tagged as spam. And if you get enough false positives, you’ll lose faith in your spam filter and disable it. At least, that’s what the spammers are trying to accomplish.
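
To make the poisoning effect concrete, here is a toy sketch of a word-level naive Bayes filter. It is not the code behind any real filter; the TinyBayes class, the training sentences, and the count of 20 poisoned comments are all made up for illustration. It scores one legitimate comment before and after a batch of gibberish comments has been marked as spam.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy word-level naive Bayes filter with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count every word of the comment under the label the moderator chose.
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.words["spam"]) | set(self.words["ham"]))
        spam_total = sum(self.words["spam"].values())
        ham_total = sum(self.words["ham"].values())
        # Start from the (smoothed) prior odds, then add evidence per word.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for word in text.lower().split():
            p_spam = (self.words["spam"][word] + 1) / (spam_total + vocab_size)
            p_ham = (self.words["ham"][word] + 1) / (ham_total + vocab_size)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.train("i enjoyed your site thanks for the post", "ham")
bayes.train("great article about the soldier game", "ham")
bayes.train("cheap pills buy now", "spam")
bayes.train("casino bonus buy cheap", "spam")

legit_comment = "i enjoyed your site"
before = bayes.spam_probability(legit_comment)

# The poisoning run: gibberish built from innocent words, each copy marked as spam.
for _ in range(20):
    bayes.train("enjoyed your site soldier game faithful industrious", "spam")

after = bayes.spam_probability(legit_comment)
print(f"spam probability before: {before:.2f}, after: {after:.2f}")
```

In this toy setup the legitimate comment starts out scored as ham and ends up scored as spam, which is exactly the false positive the spammers are hoping for. A real filter has far more training data, so it would take far more poison to move it.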

Will their plan work? I guess that depends on your particular spam filters. I’m betting that systems like Akismet, which collect data from a wide variety of sources, will probably be able to defend against Bayes poisoning. How? Well, there’s this thing called an IP address. Even though the spammers submit their garbage via an army of anonymous proxy servers and zombie machines, they still only have access to a finite number of hosts, and so a limited number of IP addresses. It won’t take long for those IPs to be statistically classified as sources of spam. Any one of those IPs will be flagged as a spam indicator far sooner than the words “Industrious” and “Soldier”.
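
Here is a rough simulation of that argument. I have no idea how Akismet actually weighs IP addresses, so treat this purely as a sketch: the five proxy addresses (taken from the 203.0.113.0/24 range reserved for documentation), the 2000-word dictionary, and the 200 poisoned comments are all assumptions of mine. Each comment comes from one proxy and carries eight random dictionary words, and the script counts how often each IP and each word shows up in spam.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# Assumptions for illustration only: a spammer with 5 proxy hosts and a
# 2000-word dictionary of innocent filler words.
PROXIES = [f"203.0.113.{n}" for n in range(1, 6)]
DICTIONARY = [f"word{n}" for n in range(2000)]

ip_counts = Counter()
word_counts = Counter()

for _ in range(200):  # 200 poisoned comments
    # Every comment has to come from one of the few hosts the spammer controls...
    ip_counts[random.choice(PROXIES)] += 1
    # ...but its filler words are spread thinly across a large dictionary.
    word_counts.update(random.sample(DICTIONARY, 8))

least_seen_ip = min(ip_counts.values())
most_seen_word = max(word_counts.values())
print("least-seen IP:", least_seen_ip, "| most-seen word:", most_seen_word)
```

Under these assumptions even the least-used proxy piles up spam evidence much faster than the most-repeated filler word, because the pool of IPs is so much smaller than the pool of words.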

So once again I say, thank you, spammers. We’re learning more about you every day.

