SpamValve
=========


WHAT IS IT?
-----------

SpamValve is an automated spammer IP blocking system with configurable
incident count threshold and inactivity expiration.


HUH?
----

Okay, we all know what spam is, right? And you probably know that it can
come in a variety of flavors: email spam, usenet spam, wiki spam, forum
spam, blog comment spam, web referer spam, etc. If there's a way to get data
into your system from an external source, spammers are going to try to
inject their junk into it.

If you are reading this, then you're probably interested in stopping that
spam[1]. One problem we run into when trying to stop spam is that,
generally speaking, each service you run on your server requires its own
anti-spam add-ons. To fight email spam, you configure your MTA to consult
various RBLs and/or you install something like
SpamAssassin or procmail filters. To fight blog comment spam, you employ
blogware plugins like MT-Blacklist, SpamKarma, or RefererKarma. Each of
these solutions has its own set of requirements, configuration data, and
methods of detecting spam.


SOME BACKGROUND
---------------

Generally, there are two ways to decide if an incoming chunk of data is
spam: by content or by source. It's often pretty simple to examine the
content and see if it contains some sort of spam indicator. The spam
indicator might be an obvious word or phrase like "viagra" or "hot XXX
movies". Or it might be a link to a web site that contains the spammy
material (www.buy-my-crap.com). Detecting by source means that you know (by
some means or another) that the IP number connecting to your machine has
sent spam in the past, and is likely to do so again. 
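
To make the two approaches concrete, here is a rough Python sketch. It is
not part of SpamValve; the pattern list, the set of known sources, and both
function names are made up for illustration.

```python
import re

# Hypothetical spam indicators; a real filter would load a maintained list.
SPAM_PATTERNS = [
    re.compile(r"\bviagra\b", re.IGNORECASE),
    re.compile(r"buy-my-crap\.com", re.IGNORECASE),
]

# Hypothetical IP numbers that have sent spam before (documentation range).
KNOWN_SPAM_SOURCES = {"192.0.2.10", "192.0.2.44"}


def is_spam_by_content(text):
    """Detect by content: does the text contain a known spam indicator?"""
    return any(pattern.search(text) for pattern in SPAM_PATTERNS)


def is_spam_by_source(ip):
    """Detect by source: has this IP number sent spam in the past?"""
    return ip in KNOWN_SPAM_SOURCES
```

Note that the content check has to read the whole message, while the source
check only needs the IP number of the connecting host.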

So, knowing this, you may wonder which method is better. Well, neither one
is "better", really. Being able to block on content when the spammer is
sending you "www.buy-my-crap.com" spam is all fine and dandy until he
changes to "www.buy-my-other-crap.com". Then you add that to your filters.
Then he starts sending you "www.buy-some-different-crap.com" links. Round
and round and round it goes. Okay, so maybe blocking by IP is the answer,
then? Well, not really. You know all those viruses, trojans, and spyware
programs you keep hearing about? A lot of those come from spammers. And what
they do is to infect people's computers, giving spammers a method of
controlling them remotely. These armies of "zombie" computers form meshes of
hundreds, or even thousands of hosts from which the spammer can indirectly
spew his garbage. So when you block one host, the next time it will just
come from another one. Round and round.


SO, THERE'S NO WAY TO STOP SPAM?
--------------------------------

I didn't exactly say that, but it's probably true. It's certainly not an
*easy* problem, or it wouldn't have become a problem at all, would it? We
may never be able to completely stop spam. But we can do a lot to reduce the
flow. One of the best ways is to use both content and source methods to
detect the spam. Detection via content can be quite effective, when done
right. 

Complex pattern matching and Bayesian statistical methods can often catch
99% of the incoming junk. The problem is that your computer first has to
*read* every piece of data, then it has to *analyze* the data to determine
if it's junk. On a busy machine, or when a particularly rabid spammer
attacks, all that analysis can crush your server's performance.

That's where source matching comes in handy. If you already know that the
host which is sending you data is a proven source of spam, you can just
ignore anything it sends you. This saves you a lot of analysis work. But
often, you are still using a lot of resources just to get to the point where
you can determine that the connection is from a spammy host. They've already
opened up a network socket to your email or web service. Maybe your web
server has to launch a CGI process which does the work of checking the
host's IP number against the list of spammers.

So, even though you're blocking the spam at an earlier stage, your server
might still have to do a lot of work (albeit less than before) before
it tosses it in the trash. 


BLOCKING EARLY: FIREWALLS
-------------------------

But, what if you could block the spam even earlier in this process? Before
the spam host even connects to your server's email or web port? That would
save your server a lot of trouble, wouldn't it? That's what firewalls do. 

Many servers have the ability to also act as firewalls. This is typically
via systems such as ipfw, ipchains, ipfilter, netfilter, or iptables. These
programs hook into the server's networking functions at a very low level.
Blocking packets from a particular host at this level is very efficient.
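
With ipfw, for example, blocking one host comes down to a single deny rule.
The Python sketch below only builds the command line and does not run it.
The function name and the rule number are assumptions for illustration, not
something SpamValve prescribes.

```python
import ipaddress


def ipfw_deny_command(rule_number, ip):
    """Build (but do not run) an ipfw command that drops traffic from ip."""
    # Validate first so that junk never reaches the firewall command line.
    # A malformed address raises ValueError.
    addr = ipaddress.IPv4Address(ip)
    return ["ipfw", "add", str(rule_number),
            "deny", "ip", "from", str(addr), "to", "any"]
```

Actually installing the rule would mean handing that list to something like
subprocess.run() as root.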

If we've determined that a particular host is a spam source, we could just
toss its IP number into our firewall rules, and never see packets from that
host again. Beautiful, right? Before you know it, we'll have blocked a huge
number of spam hosts, and the spammers will get less and less spam through.
It will slow to a trickle! And with no harmful side-effects, right?

Welllllll...


THE PROS AND CONS OF IP BLOCKING
--------------------------------

As we've determined, blocking spammers by IP number is very efficient, and
saves our servers from doing a lot of unnecessary computations. The problem
is that we're dealing with a moving target. Host 'A' might be a spam source
this week, but next week, the owner of that machine might detect the problem
and clean it up. But since everybody has blocked his IP number, he now finds
that he has problems communicating with lots of other sites. Worse yet, host
'B' might be an infected user who gets her IP number dynamically from her
ISP. You block her IP when the trojan spamware on her computer blasts you.
But an hour later, she goes offline and host 'C' picks up her now-unused IP
number. Host 'C' doesn't have any spamware, but because it's using a blocked
IP number, it can't reach everything it should be able to.

The ephemeral nature of IP numbers (in many cases) means that we shouldn't
block them forever. If we do, we're almost guaranteed to block some innocent
bystander eventually. So, we should somehow timestamp our firewall rules,
and expire them after some "reasonable" amount of time. But you won't find
many firewalls that provide a way to track timestamps. That means we'll have
to do it ourselves.
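
The bookkeeping itself is not complicated. Here is a minimal Python sketch,
assuming we keep a last-seen timestamp for every blocked IP number. The
48-hour lifetime and the function name are placeholders, not settings taken
from the SpamValve code.

```python
from datetime import datetime, timedelta

# Assumed expiry window; pick whatever counts as "reasonable" for you.
BLOCK_LIFETIME = timedelta(hours=48)


def expired_blocks(last_seen_by_ip, now, lifetime=BLOCK_LIFETIME):
    """Return the IPs whose last recorded activity is older than lifetime."""
    return sorted(
        ip for ip, last_seen in last_seen_by_ip.items()
        if now - last_seen > lifetime
    )
```

Anything this returns is a candidate for removal from the firewall rules.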


ENTER SPAMVALVE
---------------

So, we finally arrive at a real explanation of what SpamValve is. It's
pretty simple, really: SpamValve dynamically creates and deletes firewall
rules based on entries in an external database. The core of the system is
the 'svmanage' program, which runs periodically (typically via cron). It
reads the database and the existing firewall rules, then updates everything
appropriately. The current version is written in Perl, and it requires a few
moderately common modules: DBI, Date::Manip, and Config::IniFiles.
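
One way to picture that reconciliation step is as a comparison of two sets.
This Python sketch is not the actual svmanage code, and the function name is
hypothetical.

```python
def plan_firewall_changes(wanted_ips, current_rule_ips):
    """Compare the database's block list with the live firewall rules.

    Returns (to_add, to_delete): IPs that need a new rule, and IPs whose
    existing rule should be removed.
    """
    wanted = set(wanted_ips)
    current = set(current_rule_ips)
    return sorted(wanted - current), sorted(current - wanted)
```

Each run then only touches the rules that actually changed since last time.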

We've got the piece that reads the database and manipulates the firewall
rules, so all we need is some way to get information *into* the database.
The SpamValve distribution also comes with a utility called 'svupdate',
which is one way to do so. This one is also a Perl script. Any service
which can launch an external script and pass it a parameter should be able
to use svupdate to add entries to the database.
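
To give a feel for what such an update involves, here is a rough Python
sketch. It uses an in-process SQLite database as a stand-in, and the table
and column names are invented for the example. The real SpamValve schema may
look quite different.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS incidents (
    ip        TEXT PRIMARY KEY,
    hits      INTEGER NOT NULL,
    last_seen TEXT NOT NULL
)
"""


def record_incident(conn, ip, when):
    """Count one more spam incident for ip, creating its row if needed."""
    conn.execute(SCHEMA)
    cur = conn.execute(
        "UPDATE incidents SET hits = hits + 1, last_seen = ? WHERE ip = ?",
        (when, ip),
    )
    if cur.rowcount == 0:
        # First time we have seen this host.
        conn.execute(
            "INSERT INTO incidents (ip, hits, last_seen) VALUES (?, 1, ?)",
            (ip, when),
        )
    conn.commit()
```

A wrapper script would take the IP number as its one command-line parameter
and pass it to a function like this.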

These two programs are really quite simple, and it wouldn't be hard for a
programmer to translate them into their favorite language (Python, Ruby,
PHP, etc). All that's required is that you be able to deal with the
database and run the system's external firewall program (initially, it's
hardcoded to work with 'ipfw').

It's not required that you use the svupdate program, though. If your
application has native database access, you can directly modify the
SpamValve database table. The SpamValve distribution includes a WordPress
plugin as a proof-of-concept.


THE GOOD, THE BAD, AND THE UGLY
-------------------------------

The Good: All I can really tell you is that SpamValve is working pretty well
for me. I'm using it in my weblog. When I detect comment or referer spam, I
add the host IP to the SpamValve table. After I receive more than a few spam
attempts from a given host, it gets blocked for at least 48 hours. The
svmanage utility monitors the packet counters in the firewall, so further
contact from a spammy host merely prolongs its lifetime in the rules.
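
In rough Python terms, that policy might look like the sketch below. The
threshold of 3 is my stand-in for "more than a few", and both function names
are hypothetical. This is not lifted from svmanage.

```python
from datetime import datetime

# Assumed value; the actual threshold is configurable.
INCIDENT_THRESHOLD = 3


def should_block(hits, threshold=INCIDENT_THRESHOLD):
    """Block a host only once its incident count exceeds the threshold."""
    return hits > threshold


def refresh_last_seen(last_seen, old_packets, new_packets, now):
    """Prolong a block if its rule's packet counter grew since the last run.

    A growing counter means the host is still knocking, so its inactivity
    clock restarts at now. Otherwise the old timestamp is kept.
    """
    return now if new_packets > old_packets else last_seen
```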

The Bad: Right now, it's hardcoded to work with the 'ipfw' firewall system,
which is mainly available on BSD-based servers. Modifying it to work with
other firewall systems shouldn't be terribly difficult, though. Also, the
'svmanage' utility requires root access in order to manipulate the firewall
rules.

The Ugly: The code is not beautiful. I hacked it until it worked for me,
then I did some minimal cleanup. It would probably be better if I made it
more object-oriented. It needs to be refactored in such a way that it will
be easy to swap out different firewall systems. It currently assumes a MySQL
database, but changing to something else should be as simple as modifying
the DBI->connect() call. There's currently an IP whitelist which is
hardcoded into the svupdate program. This should be moved into the
configuration file.
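
For what it's worth, a config-file whitelist could be as small as the Python
sketch below. The [whitelist] section, the "addresses" key, and the function
names are all invented for this example. SpamValve's real configuration file
does not necessarily look like this.

```python
import configparser
import ipaddress

# Hypothetical INI layout, for illustration only.
EXAMPLE_CONFIG = """
[whitelist]
addresses = 127.0.0.1, 192.0.2.0/24
"""


def load_whitelist(config_text):
    """Parse a comma-separated list of addresses or networks from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("whitelist", "addresses", fallback="")
    return [
        ipaddress.ip_network(item.strip())
        for item in raw.split(",")
        if item.strip()
    ]


def is_whitelisted(ip, whitelist):
    """Return True if ip falls inside any whitelisted address or network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in whitelist)
```

The update script would check is_whitelisted() before recording an incident,
so that you never firewall yourself by accident.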


IN CLOSING
----------

Again, I reiterate: it works for me. It might not work for you. It might
block innocent bystanders. It might cause your server to forget how to talk
to the Internets. It might tell you that it will respect you in the morning,
but never call back. It might cause unsightly blemishes. Some settling may
occur during shipping. Do not use SpamValve if you are allergic to nuts or
shellfish. Caveat emptor (which is Latin for "The Emperor lives in a cave").

SpamValve will not completely eliminate spam. As currently implemented, and
as part of a larger anti-spam strategy, it *can* slow spammers down. If
enough people slow down enough spam, perhaps it will impact the cash flow
enough to cause a few of them to give up. Yeah, it's a pipe dream, and I
don't even smoke a pipe. But let a guy dream, will ya?


FOOTNOTES
---------

[1] Unless you *are* a spammer, and you're reading this in order to try to
figure out a way to bypass the anti-spam system. In which case, let me tell
you here and now that you are a scumbag. No, no, don't give me that "it's
just business" crap. I don't want to hear your half-assed justifications for
what you do. It's not just business. It's just annoying. It's just crap. In
most cases, it's just downright criminal. Get a real job, before the cops
find you, loser.

-- 
SpamValve Copyright 2005, Dougal Campbell <dougal@gunters.org>
Released under the GNU General Public License