There’s been more talk about comment spammers around the net lately, and especially some traffic around Jay Allen’s MT-Blacklist plugin for Movable Type. We’re going to be doing some similar things for WordPress.
I’ve commented in a few discussions about an idea of mine that would (hopefully) be more of a deterrent than a blocking mechanism. It would work by encoding URLs in submitted comments, randomly replacing characters with their numeric entity equivalents. The change is invisible in browsers, but the idea is that Google would index the URL differently every time it was posted in a blog’s comments, which would bust the ranking for that link. The downside is that it would also bust the Google ranking for non-spammers. I plan to run some experiments later to verify whether Google automatically decodes the entities before indexing links.
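Here is a minimal sketch of the encoding idea, written in Python purely for illustration (a real WordPress version would be PHP). The function name `encode_url`, the `probability` parameter, and the choice to always encode `&`, `<`, `>`, and quote characters are my assumptions, not part of any existing plugin.

```python
import html
import random

# Characters that are always encoded so the result stays safe inside HTML.
# (Assumption for this sketch; the idea only requires random encoding.)
ALWAYS_ENCODE = set("&<>\"'")


def encode_url(url, probability=0.5, rng=None):
    """Randomly replace characters of `url` with numeric character references.

    Each character is encoded with the given probability, so repeated calls
    on the same URL usually produce different markup that a browser decodes
    back to the same address.
    """
    rng = rng or random
    pieces = []
    for ch in url:
        if ch in ALWAYS_ENCODE or rng.random() < probability:
            pieces.append(f"&#{ord(ch)};")  # e.g. "a" becomes "&#97;"
        else:
            pieces.append(ch)
    return "".join(pieces)


if __name__ == "__main__":
    original = "http://example.com/?a=1&b=2"
    for _ in range(3):
        encoded = encode_url(original)
        # A browser (here simulated with html.unescape) sees the same URL.
        print(encoded, "->", html.unescape(encoded))
```

Each run prints different encoded strings that all decode to the same URL. Whether a search engine treats them as different links is exactly the open question the experiments would need to answer.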
If this technique does bust the ranking, we might still be able to use it without punishing regular users too much. We could use a scoring technique to attempt to decide whether a comment is spam or not. Comments that are definitely spam would be blocked completely, and comments that are definitely not spam would go through unchanged. Comments that score in a gray area would be encoded, but the admin would be able to retroactively whitelist them.
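The three-way decision could be sketched roughly like this, again in Python for illustration only. The scoring rule (count links, add weight for suspect words), the word list, and both thresholds are hypothetical placeholders. A real scorer would need tuning against actual spam.

```python
import re

# Hypothetical thresholds: below LOW passes, at or above HIGH is blocked,
# anything in between is the gray area.
LOW_THRESHOLD = 2
HIGH_THRESHOLD = 5

# Hypothetical word list, purely for illustration.
SUSPECT_WORDS = ("casino", "viagra")


def score_comment(text):
    """Toy spam score: one point per link, three per suspect word present."""
    links = len(re.findall(r"https?://", text, flags=re.IGNORECASE))
    lowered = text.lower()
    suspect = sum(1 for word in SUSPECT_WORDS if word in lowered)
    return links + 3 * suspect


def classify(text):
    """Return "pass", "encode", or "block" for a comment."""
    score = score_comment(text)
    if score >= HIGH_THRESHOLD:
        return "block"   # definitely spam: reject outright
    if score < LOW_THRESHOLD:
        return "pass"    # definitely not spam: publish unchanged
    return "encode"      # gray area: publish with entity-encoded URLs


if __name__ == "__main__":
    print(classify("Nice post!"))
    print(classify("See http://example.com and http://example.org"))
    print(classify("casino http://a.example viagra http://b.example"))
```

Retroactive whitelisting would then amount to storing the original comment alongside the encoded one, so the admin can swap the unencoded version back in.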