There’s been more talk about comment spammers around the net lately, and especially some traffic around Jay Allen’s MT-Blacklist plugin for MovableType. We’re going to be doing some similar things for WordPress.
I’ve commented in a few discussions about an idea of mine that would (hopefully) be more of a deterrent than a blocking mechanism. It would work by encoding URLs in submitted comments, randomly replacing characters with their numeric entity equivalents. This change is invisible to readers, because browsers decode the entities, but the idea is that Google would index the URL differently every time it was posted in a blog’s comments, which would bust the ranking for that link. The downside is that it would also bust the Google ranking for links posted by non-spammers. I plan to run some experiments later to verify whether Google automatically decodes the entities before indexing links; if it does, the technique would have no effect.
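To make the encoding idea concrete, here is a minimal sketch in Python rather than WordPress’s own PHP. The names `URL_PATTERN`, `encode_url`, `encode_comment_urls`, and the `rate` parameter are made up for illustration, and the URL pattern is deliberately crude. It assumes the comment text is raw, not yet HTML-escaped.

```python
import html
import random
import re

# Hypothetical, deliberately simple pattern: an http(s) URL runs until
# whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def encode_url(url, rate=0.5, rng=random):
    """Replace a random subset of characters with decimal numeric entities.

    An ampersand is always encoded, so a leftover "&" can never combine
    with the characters after it to form a different entity.
    """
    out = []
    for ch in url:
        if ch == "&" or rng.random() < rate:
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


def encode_comment_urls(text, rate=0.5, rng=random):
    """Encode every URL found in a comment; other text is left untouched."""
    return URL_PATTERN.sub(lambda m: encode_url(m.group(0), rate, rng), text)


if __name__ == "__main__":
    comment = "Cheap stuff at http://spam.example/buy now"
    encoded = encode_comment_urls(comment)
    print(encoded)
    # Decoding the entities, as a browser would, gives back the original text.
    print(html.unescape(encoded) == comment)
```

Because the choice of characters is random, the same URL will usually come out as a different string each time it is posted. Whether a search engine treats those strings as different links is exactly the open question above.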
If this technique does bust the ranking, we might still be able to use it without punishing regular users too much. We could use a scoring technique to try to decide whether a comment is spam or not. Comments that are definitely spam would be blocked completely. Comments that are definitely not spam would go through unchanged. Comments that score in a gray area would be encoded, but the admin would be able to retroactively whitelist them.
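The three-way decision could be sketched roughly like this, again in Python for illustration. The scoring rule here (counting links and a few keywords) and the thresholds are placeholders invented for the example, not a proposed design; any real scorer could be dropped in behind `spam_score`.

```python
import re
from enum import Enum


class Action(Enum):
    PASS = "pass"      # definitely not spam: publish unchanged
    ENCODE = "encode"  # gray area: publish with entity-encoded URLs
    BLOCK = "block"    # definitely spam: reject


# Hypothetical keyword list and link pattern, for illustration only.
SPAM_WORDS = {"casino", "pills", "poker"}
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def spam_score(comment):
    """Toy score: two points per spammy keyword, one point per link."""
    words = re.findall(r"[a-z]+", comment.lower())
    keyword_hits = sum(1 for w in words if w in SPAM_WORDS)
    link_count = len(LINK_PATTERN.findall(comment))
    return 2 * keyword_hits + link_count


def decide(comment, low=2, high=5):
    """Map a score onto one of the three actions (thresholds are assumed)."""
    score = spam_score(comment)
    if score >= high:
        return Action.BLOCK
    if score >= low:
        return Action.ENCODE
    return Action.PASS
```

For retroactive whitelisting to work, the original unencoded comment would need to be stored, with the encoding applied only on output, so the admin can switch a gray-area comment back to plain links later.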
Thoughts?
3 Comments
I don’t know if Mike Renzmann (otaku42) has been in touch with you, but I have suggested a comment spam detection idea to him using neuro-fuzzy networks. The code is already implemented in C++ and needs to be ported. I would be very interested in knowing what you guys think about that.