SpamAssassin
Note: this page is obsolete. The page formerly named SpamAssassin is now spamassasin. If you are viewing this page, please update the link that brought you here accordingly. -ES
SpamAssassin is a mail filter to identify spam. It is an intelligent email filter which uses a diverse range of tests to identify unsolicited bulk email, more commonly known as Spam. These tests are applied to email headers and content to classify email using advanced statistical methods. In addition, SpamAssassin has a modular architecture that allows other technologies to be quickly wielded against spam and is designed for easy integration into virtually any email system.
This page is designed to give you an overview of how QmailToaster goes about configuring SpamAssassin.
Configuration and Rules
The SpamAssassin-Toaster uses the following configuration files:
- /etc/mail/spamassassin/local.cf
- /etc/mail/spamassassin/v310.pre
- /etc/mail/spamassassin/v312.pre
- /usr/share/spamassassin/*.cf
The local.cf file contains basic settings, such as the score a message must reach before it is considered spam, how the subject line is rewritten when that score is reached (e.g. adding ***SPAM*** to the subject), and whether Bayes scoring is used. These settings apply to all users on your system.
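To see how these settings are currently configured, a small helper can pull them out of the file. This is a minimal sketch: the function name show_basic_settings is made up for illustration, and it assumes the three settings are expressed with the standard SpamAssassin directives required_score, rewrite_header and use_bayes.

```shell
#!/bin/bash
# Print the site-wide settings discussed above from a SpamAssassin config file.
# Defaults to the local.cf path used in this article; pass another path to override.
# (show_basic_settings is a hypothetical helper name, not part of SpamAssassin.)
show_basic_settings() {
    local cf="${1:-/etc/mail/spamassassin/local.cf}"
    # Match only active (uncommented) directives
    grep -E '^[[:space:]]*(required_score|rewrite_header|use_bayes)[[:space:]]' "$cf"
}
```

Running show_basic_settings with no argument reads /etc/mail/spamassassin/local.cf. Lines of the form "required_score 5.0", "rewrite_header Subject ***SPAM***" and "use_bayes 1" are what to look for; those particular values are only examples.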
The two .pre files tell SpamAssassin which plugins to load for applying different tests. Each entry has the format:
loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
You can find a list of available plugins on CPAN. Installing a plugin using CPAN goes like this:
# cpan
cpan> install Mail::SpamAssassin::Plugin::URIDNSBL
cpan> quit
Here's how to find out what perl modules you have. If you are using the latest version of SpamAssassin-toaster then everything you need should already be installed.
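One quick way to check whether a particular module is available is to ask Perl to load it. The sketch below assumes perl is on the PATH; the function names have_perl_module and check_plugins are made up for illustration.

```shell
#!/bin/bash
# Succeeds (exit status 0) if the named Perl module can be loaded.
# (have_perl_module and check_plugins are hypothetical helper names.)
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Print "ok NAME" or "missing NAME" for each module given as an argument.
check_plugins() {
    local mod
    for mod in "$@"; do
        if have_perl_module "$mod"; then
            echo "ok $mod"
        else
            echo "missing $mod"
        fi
    done
}
```

For example, check_plugins Mail::SpamAssassin::Plugin::URIDNSBL Mail::SpamAssassin::Plugin::MIMEHeader reports on the two plugins named on this page. A module is also reported as missing when one of its own dependencies fails to load.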
The /usr/share/spamassassin/*.cf files are custom rule sets designed for catching spam using your installed modules. How each of them will add (or subtract) points from the mail's spam score is set by 50_scores.cf. If you are, for instance, a pharmaceutical retailer, you probably want to lower the scores for the rules in the various drug-related .cf files.
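Before lowering a score it helps to see which rules exist and what they currently score. The following is a sketch, assuming scores are set with lines that begin with the "score" directive; find_rule_scores is a made-up helper name.

```shell
#!/bin/bash
# List "score" lines whose rule name contains a pattern (case-insensitive).
# $1 = pattern, $2 = rules directory (defaults to the path used in this article).
# (find_rule_scores is a hypothetical helper name.)
find_rule_scores() {
    local pattern="$1" dir="${2:-/usr/share/spamassassin}"
    awk -v pat="$pattern" '
        # $1 is the directive, $2 the rule name; leading indentation is ignored
        $1 == "score" && index(toupper($2), toupper(pat)) { print FILENAME ": " $0 }
    ' "$dir"/*.cf
}
```

For example, find_rule_scores drugs lists every scored rule with "drugs" in its name. A rule found this way can then be overridden by adding a line such as "score RULE_NAME 0.5" to /etc/mail/spamassassin/local.cf, where RULE_NAME and 0.5 are placeholders. Putting the override in local.cf rather than editing 50_scores.cf should keep it in place when the stock rule files are replaced by an upgrade.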
Some of the files will only be used if the appropriate module is loaded, for instance 25_uribl.cf will only run if you have added
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
to one of your .pre files.
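To see which plugins are actually enabled, you can collect the loadplugin lines from the .pre files. This is a minimal sketch; list_loaded_plugins is a made-up helper name, and it assumes each active entry starts with the word loadplugin as in the format shown above.

```shell
#!/bin/bash
# Print the plugin named on every active loadplugin line in DIR/*.pre.
# $1 = config directory (defaults to the path used in this article).
# (list_loaded_plugins is a hypothetical helper name.)
list_loaded_plugins() {
    local dir="${1:-/etc/mail/spamassassin}"
    # Commented-out lines start with "#", so their first field is never "loadplugin"
    awk '$1 == "loadplugin" { print $2 }' "$dir"/*.pre | sort -u
}
```

If Mail::SpamAssassin::Plugin::URIDNSBL does not appear in the output, the rules in 25_uribl.cf will not run.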
You can find lots of alternative rule sets at Rules Emporium, and you might want to join a SpamAssassin mailing list to keep yourself up to date on the fight against spam while you are at it.
Whenever you change the SpamAssassin configuration, whether by adding rules (creating a new .cf file in /usr/share/spamassassin), adding a module to a .pre file so that new rules are applied, or editing any other configuration file, you must check that all the syntax is OK:
# spamassassin -D --lint
If you see any errors, correct them before you restart the spamd service! The most likely errors are missing Perl modules. Add them using CPAN as shown above.
After you make any changes you need to restart the SpamAssassin service. You can do this using Jake's spamd script or by doing:
# qmailctl stop
# qmailctl start
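The lint check and the restart can be tied together so that a broken configuration never gets loaded. The sketch below relies on spamassassin --lint returning a non-zero exit status when it finds errors; the function name lint_then_restart, and the idea of passing the restart command as arguments, are illustrative choices rather than part of QmailToaster.

```shell
#!/bin/bash
# Run the restart command given as arguments only if the lint check passes.
# (lint_then_restart is a hypothetical helper name.)
lint_then_restart() {
    if spamassassin --lint; then
        "$@"
    else
        echo "spamassassin --lint reported errors; not restarting" >&2
        return 1
    fi
}
```

It could be called as lint_then_restart sh -c 'qmailctl stop && qmailctl start'. When the check fails, run spamassassin -D --lint by hand to see the detailed output.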
Bayesian Statistical Scoring
SpamAssassin can score messages based on the words they contain, because certain words are more likely to turn up in spam and others are more likely to show up in ham.
For this to be effective you need to train SpamAssassin with a collection of spam messages and a collection of ham messages. One way to do this is to set up two email accounts on your server, spam@yourqmailtoaster.com and notspam@yourqmailtoaster.com, and forward spam to one and non-spam mail to the other. You might not want to forward all of your real mail, but the more ham SpamAssassin has for comparison, the better. You should encourage your users to forward spam to the spam address and any false positives to the notspam address. You might want to implement Squirrelmail Spam Buttons to make this easier.
Now create a script that looks like this:
#!/bin/bash
# SpamAssassin Bayes training
# Each cd stops the script on failure, so sa-learn never runs in the wrong directory.

# Learn spam!
cd /home/vpopmail/domains/yourqmailtoaster.com/spam/Maildir/cur || exit 1
/usr/bin/sa-learn --spam ./*
rm -rf /home/vpopmail/domains/yourqmailtoaster.com/spam/Maildir/cur/*
cd /home/vpopmail/domains/yourqmailtoaster.com/spam/Maildir/new || exit 1
/usr/bin/sa-learn --spam ./*
rm -rf /home/vpopmail/domains/yourqmailtoaster.com/spam/Maildir/new/*

# Learn ham!
cd /home/vpopmail/domains/yourqmailtoaster.com/notspam/Maildir/cur || exit 1
/usr/bin/sa-learn --ham ./*
rm -rf /home/vpopmail/domains/yourqmailtoaster.com/notspam/Maildir/cur/*
cd /home/vpopmail/domains/yourqmailtoaster.com/notspam/Maildir/new || exit 1
/usr/bin/sa-learn --ham ./*
rm -rf /home/vpopmail/domains/yourqmailtoaster.com/notspam/Maildir/new/*

# Update the Bayes DB
/usr/bin/sa-learn --sync
Test it, save it as /usr/local/bin/learnSpam.sh (for example), and use cron to run the script daily, like this:
0 2 * * * sudo -u vpopmail -H /usr/local/bin/learnSpam.sh >/dev/null
It is important to run the script through sudo as user vpopmail with the -H option, so that it updates vpopmail's Bayes database (/home/vpopmail/.spamassassin/*) rather than root's (/root/.spamassassin).
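To confirm that training is reaching vpopmail's database, you can look at the message counts reported by sa-learn --dump magic. The sketch below parses that output; bayes_counts is a made-up helper name, and it assumes the usual dump layout in which the count is the third column of the "nspam" and "nham" lines.

```shell
#!/bin/bash
# Read `sa-learn --dump magic` output on stdin and print "nspam nham".
# Assumed line layout (count in the third column), for example:
#   0.000          0       1145          0  non-token data: nspam
# (bayes_counts is a hypothetical helper name.)
bayes_counts() {
    awk '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham  = $3 }
        END { print spam + 0, ham + 0 }
    '
}
```

It would be used as sudo -u vpopmail -H /usr/bin/sa-learn --dump magic | bayes_counts. With default settings SpamAssassin does not apply Bayes scores until it has learned at least 200 spam and 200 ham messages, so both numbers need to pass that mark before training has any effect.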
Further Info
- SimScan is used by QMailToaster to run incoming mail through ClamAV and SpamAssassin. It is configured by the settings in /var/qmail/control/simcontrol. See Simscan for more details.
- The SpamAssassin daemon is started by the /var/qmail/supervise/spamd/run script. See man spamd for other options you can set there.
- SpamAssassin can be set up to check the body of messages against Spam URI Realtime Blocklists. See SURBL for more details.
- You can also check incoming mail against Realtime Black Lists before the mail even reaches SpamAssassin. See RBLs for more details.