SpamValve
=========

WHAT IS IT?
-----------

SpamValve is an automated spammer IP blocking system with a configurable incident count threshold and inactivity expiration.

HUH?
----

Okay, we all know what spam is, right? And you probably know that it comes in a variety of flavors: email spam, usenet spam, wiki spam, forum spam, blog comment spam, web referer spam, etc. If there's a way to get data into your system from an external source, spammers are going to try to inject their junk into it. If you are reading this, then you're probably interested in stopping that spam[1].

One problem that we run into when trying to stop spam is that (generally speaking) each different service that you run on your server requires its own anti-spam add-ons. To fight email spam, you configure your MTA to use various RBLs and/or you install something like SpamAssassin or procmail filters. To fight blog comment spam, you use blogware plugins like MT-Blacklist, SpamKarma, or RefererKarma. Each of these solutions has its own set of requirements, configuration data, and methods of detecting spam.

SOME BACKGROUND
---------------

Generally, there are two ways to decide if an incoming chunk of data is spam: by content or by source.

It's often pretty simple to examine the content and see if it contains some sort of spam indicator. The spam indicator might be an obvious word or phrase like "viagra" or "hot XXX movies". Or it might be a link to a web site that contains the spammy material (www.buy-my-crap.com).

Detecting by source means that you know (by some means or another) that the IP number connecting to your machine has sent spam in the past, and is likely to do so again.

So, knowing this, you may wonder which method is better. Well, neither one is "better", really. Being able to block on content when the spammer is sending you "www.buy-my-crap.com" spam is all fine and dandy until he changes to "www.buy-my-other-crap.com". Then you add that to your filters.
Then he starts sending you "www.buy-some-different-crap.com" links. Round and round and round it goes.

Okay, so maybe blocking by IP is the answer, then? Well, not really. You know all those viruses, trojans, and spyware programs you keep hearing about? A lot of those come from spammers. What they do is infect people's computers, giving spammers a way to control them remotely. These armies of "zombie" computers form meshes of hundreds or even thousands of hosts from which the spammer can indirectly spew his garbage. So when you block one host, the next batch just comes from another one. Round and round.

SO, THERE'S NO WAY TO STOP SPAM?
--------------------------------

I didn't exactly say that, but it's probably true. It's certainly not an *easy* problem, or it wouldn't have become a problem at all, would it? We may never be able to completely stop spam. But we can do a lot to reduce the flow. One of the best ways is to use both content and source methods to detect the spam.

Detection via content can be quite effective, when done right. Complex pattern matching and Bayesian statistical methods can often catch 99% of the incoming junk. The problem is that your computer first has to *read* every piece of data, then it has to *analyze* the data to determine if it's junk. On a busy machine, or when a particularly rabid spammer attacks, all that analysis can crush your server's performance.

That's where source matching comes in handy. If you already know that the host which is sending you data is a proven source of spam, you can just ignore anything it sends you. This saves you a lot of analysis work. But often, you are still using a lot of resources just to get to the point where you can determine that the connection is from a spammy host. The host has already opened a network socket to your email or web service. Maybe your web server has to launch a CGI process which does the work of checking the host's IP number against the list of spammers.
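
The application-level source check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of SpamValve: the KNOWN_SPAMMERS set (filled with documentation-range addresses) and the is_known_spammer() and handle_comment() helpers are hypothetical. It shows the cheap lookup short-circuiting the expensive content analysis, while still running only after the connection has been accepted and a handler started.

```python
import ipaddress

# Hypothetical blocklist of known spam sources (documentation-range IPs).
KNOWN_SPAMMERS = {
    ipaddress.ip_address("192.0.2.15"),
    ipaddress.ip_address("198.51.100.7"),
}


def is_known_spammer(remote_addr: str) -> bool:
    """Return True if the client address is on the blocklist.

    Unparseable addresses are treated as "not listed" instead of raising.
    """
    try:
        addr = ipaddress.ip_address(remote_addr.strip())
    except ValueError:
        return False
    return addr in KNOWN_SPAMMERS


def handle_comment(remote_addr: str, body: str) -> str:
    """Hypothetical CGI-style handler for an incoming blog comment."""
    # Cheap source check first: a listed host never reaches content analysis.
    if is_known_spammer(remote_addr):
        return "rejected"
    # Expensive content analysis of `body` would go here (omitted).
    return "accepted"
```
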
So, even though you're blocking the spam at an earlier stage, your server might still have to do a lot of work (albeit less than before) before it tosses the spam in the trash.

BLOCKING EARLY: FIREWALLS
-------------------------

But what if you could block the spam even earlier in this process? Before the spam host even connects to your server's email or web port? That would save your server a lot of trouble, wouldn't it?

That's what firewalls do. Many servers can also act as firewalls, typically via packet filters such as ipfw, ipchains, ipfilter, netfilter, or iptables. These programs hook into the server's networking functions at a very low level, and blocking packets from a particular host at this level is very efficient. If we've determined that a particular host is a spam source, we could just toss its IP number into our firewall rules, and never see packets from that host again. Beautiful, right? Before you know it, we'll have blocked a huge number of spam hosts, and the spammers will get less and less spam through. It will slow to a trickle! And with no harmful side-effects, right?

Welllllll...

THE PROS AND CONS OF IP BLOCKING
--------------------------------

As we've determined, blocking spammers by IP number is very efficient, and saves our servers from doing a lot of unnecessary computation. The problem is that we're dealing with a moving target. Host 'A' might be a spam source this week, but next week, the owner of that machine might detect the problem and clean it up. But since everybody has blocked his IP number, he now finds that he has problems communicating with lots of other sites.

Worse yet, host 'B' might be an infected user who gets her IP number dynamically from her ISP. You block her IP when the trojan spamware on her computer blasts you. But an hour later, she goes offline and host 'C' picks up her now-unused IP number.
Host 'C' doesn't have any spamware, but because it's using a blocked IP number, it can't reach everything it should be able to.

The ephemeral nature of IP numbers (in many cases) means that we shouldn't block them forever. If we do, we're almost guaranteed to block some innocent bystander eventually. So, we should somehow timestamp our firewall rules, and expire them after some "reasonable" amount of time. But you won't find many firewalls that provide a way to track timestamps. That means we'll have to do it ourselves.

ENTER SPAMVALVE
---------------

So, we finally arrive at a real explanation of what SpamValve is. It's pretty simple, really: SpamValve dynamically creates and deletes firewall rules based on entries in an external database.

The core of the system is the 'svmanage' program, which runs periodically (typically via cron). It reads the database and the existing firewall rules, then adds or removes rules as needed. The current version is written in Perl, and it requires a few moderately common modules: DBI, Date::Manip, and Config::IniFiles.

We've got the piece that reads the database and manipulates the firewall rules, so all we need is some way to get information *into* the database. The SpamValve distribution also comes with a utility called 'svupdate', which is one way to do so. This one is also a Perl script. Any service which can launch an external script and pass it a parameter should be able to use svupdate to add entries to the database.

These two programs are really quite simple, and it wouldn't be hard for a programmer to translate them into their favorite language (Python, Ruby, PHP, etc.). All that's required is that you be able to deal with the database and run the system's external firewall program (for now, it's hardcoded to work with 'ipfw').

It's not required that you use the svupdate program, though. If your application has native database access, you can directly modify the SpamValve database table.
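
To make the threshold-and-expiration idea concrete, here is a rough Python sketch of the two roles described above. It is not SpamValve's code (which is Perl, talks to MySQL through DBI, and whose table layout isn't documented here). Everything in it is an assumption: the use of Python's built-in sqlite3 module, the spamvalve table schema, the THRESHOLD of 3 incidents, and the helper names record_incident(), parse_ipfw_show() and plan_changes(). record_incident() plays the svupdate role. plan_changes() plays the svmanage role: it reads `ipfw show` output (rule number, packet count, byte count, rule text) and returns ipfw commands instead of executing them. Resetting a rule's counter with `ipfw zero` after seeing fresh packets is one guess at how "monitoring the packet counters" could work; it means a host that keeps knocking gets its timestamp refreshed and stays blocked. The 48-hour window matches the figure this README gives for blocked hosts. Actually running the returned commands (e.g. with subprocess.run(cmd, check=True)) would require root, just like svmanage.

```python
import re
import sqlite3
import time

# Hypothetical schema; SpamValve's real table layout may differ.
SCHEMA = """CREATE TABLE IF NOT EXISTS spamvalve (
    ip        TEXT PRIMARY KEY,
    hits      INTEGER NOT NULL,
    last_seen REAL    NOT NULL)"""

THRESHOLD = 3            # incidents before a host is blocked (assumed value)
EXPIRE_SECS = 48 * 3600  # forget a host after 48 hours of inactivity

# Assumes our rules are written as "deny ip from <addr> to any".
# Each `ipfw show` line starts with: rule number, packet count, byte count.
RULE_RE = re.compile(r"^(\d+)\s+(\d+)\s+\d+\s+deny ip from (\S+) to any$")


def record_incident(conn, ip, now=None):
    """svupdate-like step: count one more spam incident from `ip`."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE spamvalve SET hits = hits + 1, last_seen = ? WHERE ip = ?",
        (now, ip))
    if cur.rowcount == 0:  # first incident from this host
        conn.execute(
            "INSERT INTO spamvalve (ip, hits, last_seen) VALUES (?, 1, ?)",
            (ip, now))
    conn.commit()


def parse_ipfw_show(text):
    """Map each blocked IP to (rule_number, packet_count)."""
    rules = {}
    for line in text.splitlines():
        m = RULE_RE.match(line.strip())
        if m:
            rules[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return rules


def plan_changes(conn, ipfw_show_text, now=None):
    """svmanage-like pass: return ipfw argv lists to run (not executed here)."""
    now = time.time() if now is None else now
    rules = parse_ipfw_show(ipfw_show_text)
    cmds = []
    # A blocked host that is still sending packets stays blocked longer.
    for ip, (rule_no, packets) in rules.items():
        if packets > 0:
            conn.execute("UPDATE spamvalve SET last_seen = ? WHERE ip = ?",
                         (now, ip))
            cmds.append(["ipfw", "-q", "zero", str(rule_no)])
    rows = conn.execute("SELECT ip, hits, last_seen FROM spamvalve").fetchall()
    for ip, hits, last_seen in rows:
        if now - last_seen > EXPIRE_SECS:
            # Inactive long enough: forget the host and unblock it.
            conn.execute("DELETE FROM spamvalve WHERE ip = ?", (ip,))
            if ip in rules:
                cmds.append(["ipfw", "-q", "delete", str(rules[ip][0])])
        elif hits >= THRESHOLD and ip not in rules:
            cmds.append(["ipfw", "-q", "add", "deny", "ip",
                         "from", ip, "to", "any"])
    conn.commit()
    return cmds
```
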
The SpamValve distribution includes a WordPress plugin as a proof-of-concept.

THE GOOD, THE BAD, AND THE UGLY
-------------------------------

The Good: All I can really tell you is that SpamValve is working pretty well for me. I'm using it on my weblog. When I detect comment or referer spam, I add the host IP to the SpamValve table. After I receive more than a few spam attempts from a given host, it gets blocked for at least 48 hours. The svmanage utility monitors the packet counters in the firewall, so further contact from a spammy host merely prolongs its lifetime in the rules.

The Bad: Right now, it's hardcoded to work with the 'ipfw' firewall system, which is mainly available on BSD-based servers. Modifying it to work with other firewall systems shouldn't be terribly difficult, though. Also, the 'svmanage' utility requires root access in order to manipulate the firewall rules.

The Ugly: The code is not beautiful. I hacked it until it worked for me, then I did some minimal cleanup. It would probably be better if I made it more object-oriented. It needs to be refactored so that it's easy to swap in different firewall systems. It currently assumes a MySQL database, but changing to something else should be as simple as modifying the DBI->connect() call. There's also an IP whitelist hardcoded into the svupdate program; it should be moved into the configuration file.

IN CLOSING
----------

Let me say it again: it works for me. It might not work for you. It might block innocent bystanders. It might cause your server to forget how to talk to the Internets. It might tell you that it will respect you in the morning, but never call back. It might cause unsightly blemishes. Some settling may occur during shipping. Do not use SpamValve if you are allergic to nuts or shellfish. Caveat emptor (which is Latin for "The Emperor lives in a cave").

SpamValve will not completely eliminate spam.
As currently implemented, and as part of a larger anti-spam strategy, it *can* slow spammers down. If enough people slow down enough spam, perhaps it will hurt their cash flow enough to make a few of them give up. Yeah, it's a pipe dream, and I don't even smoke a pipe. But let a guy dream, will ya?

FOOTNOTES
---------

[1] Unless you *are* a spammer, and you're reading this in order to figure out a way to bypass the anti-spam system. In which case, let me tell you here and now that you are a scumbag. No, no, don't give me that "it's just business" crap. I don't want to hear your half-assed justifications for what you do. It's not just business. It's just annoying. It's just crap. In most cases, it's just downright criminal. Get a real job, before the cops find you, loser.

--
SpamValve
Copyright 2005, Dougal Campbell
Released under the GNU General Public License