I installed Akismet back in November 2005 as the comment spam I was getting was becoming unmanageable (flattering at first when it was only one or two a month, but it quickly became annoying).
Since then, it has stopped 587 spam comments from ever appearing on the site, without incorrectly flagging a single legitimate one. 587 might not seem like a lot for a six-month period (the popular blogs probably get that in a day or, I’d wager, an hour), but what has been remarkable, or perhaps more frightening, and the reason for this post, is that just over half of those 587 spam comments accumulated between yesterday and today.
Yesterday I got 130. What tipped me off was that one made it into the moderation queue, which triggers an email notification to me. Today, I’ve received 176, and another made it into the queue. I’ve since marked both of the queued comments as spam. The majority of them seem to be about appetite suppressants.
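For anyone who wants to check the "just over half" claim, here is the arithmetic as a quick sketch. The numbers are the ones quoted above; the variable names are only for illustration.

```python
# Figures quoted in this post.
total_caught = 587   # everything Akismet has stopped since November 2005
yesterday = 130
today = 176

two_day_total = yesterday + today
share = two_day_total / total_caught

print(f"{two_day_total} of {total_caught} = {share:.1%}")
# prints "306 of 587 = 52.1%"
```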
I’m going away for the long weekend, so we’ll see what the next three days hold in store for my little blog.
In related news, I recently switched spam-filtering solutions on my personal mailserver. During the initial years of running my mailserver, I never ran any server-side filtering. I left it all up to my client’s filter (in this case, Apple Mail), which worked for a while, but it seemed you had to retrain it every six months to a year. Then I stumbled onto SpamAssassin, and it worked fairly well, although it required a lot of upkeep, diligence, configuring, and tweaking. On the advice of a former coworker, I tried switching over to DSPAM. Using Bayesian filtering techniques, it is supposed to be a holy grail of operability (near-zero configuration, near-100% accuracy). In my case, it fluctuated wildly between 85% and 95% accuracy over the year or so that I used it. I’m not blaming DSPAM at all for that low score, as I know it was totally my fault. I never managed to install and configure DSPAM properly enough for it to unlearn a message it had misclassified.
That’s my problem, you see. I’m not a mailserver guru. I know enough to get a basic system up and running. And no, I don’t know enough to be dangerous, because what I do know, I know enough about to know that I don’t know enough about what I know, you know?
Four weeks ago I switched a third time, to Bogofilter, and so far it has been working like a dream. It’s another Bayesian solution. I’m still training it, but for about the last week, I think I’ve only had a couple of spam messages get through (which weren’t technically spam so much as they were “wrong numbers”). DSPAM was set up to place a unique key in each email received. When a spam message was incorrectly identified as “ham” (a false negative), you would forward it to yourself at a specially crafted email address, which told DSPAM to correct its Bayesian training data when it saw the message a second time (and vice versa for legit email it identified as spam, aka a false positive, which is far worse to get). That’s the part I couldn’t get working correctly.
With Bogofilter, I instead take a much simpler route. I created three new mail folders: one to store emails Bogofilter has identified as spam, one to correct false negatives, and one to correct false positives (the last two are scanned by a couple of cron jobs). All the management is done by moving messages between the folders, and once those initial cron jobs were set up, no more command-line tinkering was necessary. During the four weeks of use, it has caught 702 spam messages (some of which had to be hand-trained at the start) with only one being a false positive (which was caught very early on and hasn’t happened since). I’d like to report a nice fancy percentage number showing its accuracy, but I don’t comprehend the statistical reporting features yet. I’ll take a guess and say it’s way higher than DSPAM ever was for me. And if it isn’t, it will be shortly.
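To give a rough idea of what one of those cron jobs might look like, here is a minimal sketch in Python. It is not the actual script running on my server. It assumes Maildir-style folders with one file per message, and every folder name in it is hypothetical. The only Bogofilter features it relies on are the `-s` (register as spam) and `-n` (register as non-spam) options, with the message fed in on standard input.

```python
import subprocess
from pathlib import Path

# Hypothetical folder layout; adjust to wherever the mail client keeps things.
MAILDIR = Path.home() / "Maildir"
FALSE_NEGATIVES = MAILDIR / ".Train-Spam" / "cur"  # spam that slipped through
FALSE_POSITIVES = MAILDIR / ".Train-Ham" / "cur"   # good mail flagged as spam
SPAM_FOLDER = MAILDIR / ".Spam" / "cur"            # where confirmed spam ends up
INBOX = MAILDIR / "cur"                            # where rescued mail goes back


def run_bogofilter(flag, raw_message):
    """Feed one raw message to bogofilter on stdin with a training flag."""
    subprocess.run(["bogofilter", flag], input=raw_message, check=True)


def train_folder(folder, flag, destination, runner=run_bogofilter):
    """Train on each message file in folder, then move it to destination.

    Returns the number of messages processed. The runner argument exists
    so the training step can be swapped out, for example in a dry run.
    """
    count = 0
    if not folder.is_dir():
        return count
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        runner(flag, path.read_bytes())
        # Move the message out so the next cron run does not retrain on it.
        path.rename(destination / path.name)
        count += 1
    return count


if __name__ == "__main__":
    train_folder(FALSE_NEGATIVES, "-s", SPAM_FOLDER)  # missed spam
    train_folder(FALSE_POSITIVES, "-n", INBOX)        # wrongly flagged mail
```

One caveat on the flags: if Bogofilter is set up to update its word lists automatically as it classifies mail, a misclassified message has already been registered under the wrong label, and the combined options `-Ns` and `-Sn` (unregister from one list, register in the other) would be the ones to use instead. Which applies depends on the setup, so treat the flags above as an assumption.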