JDunphy-SA-RuleWriting

Writing Rules to Improve Spamassassin Effectiveness

   KB 23864        Last updated on 2020-02-11  






Introduction

Zimbra provides a great start, but with a few additional concepts it is possible to improve the security and spam detection effectiveness of email delivery to your users. The ideal solution is one tailored to your individual environment. Writing your own SpamAssassin (SA) rules is easy once you know how, and the goal of this article is to provide a few key concepts for someone not well versed in SA. It is important to note that spam classification is based on score. If the score reaches your spam threshold (5 by default), the email is placed in the user's junk folder. If the score is over a second, higher threshold (15 by default), the message is not delivered at all. Zimbra user filters operate only on email that isn't classified as junk, unless you run the filter manually and choose the junk folder. The net result is that your custom rules determine whether an email is delivered to the inbox, delivered to junk, or dropped.
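The score-based classification described above can be pictured with a small Python sketch. The function name classify and its parameter names are made up for illustration and are not part of Zimbra or SA. Treating a score exactly at a threshold as having reached it is also an assumption of this sketch.

```python
def classify(score, junk_threshold=5.0, drop_threshold=15.0):
    """Return where a message with this SA score would end up.

    The defaults mirror the thresholds quoted in this article (5 and 15).
    Treating "equal to the threshold" as a hit is an assumption here.
    """
    if score >= drop_threshold:
        return "dropped"   # not delivered at all
    if score >= junk_threshold:
        return "junk"      # delivered to the user's junk folder
    return "inbox"         # delivered normally; user filters may then apply


if __name__ == "__main__":
    for s in (2.968, 10.903, 20.0):
        print(s, classify(s, junk_threshold=4.8))
```

A custom rule therefore only matters through the points it adds or subtracts: it moves a message across one of these two boundaries or it changes nothing.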

In Zimbra, amavisd-new (a perl program/daemon) calls the SpamAssassin modules in addition to clamav for virus/malware signature detection. You can debug your rules directly by calling SpamAssassin in debug mode with any email. One of the great features that comes with Zimbra is the convenience for your users of training email as spam, or training misidentified email as not spam. Those emails are available for study provided you pull them before Zimbra trains on them each day and removes them. For the sake of this article, we will focus on that source, but you could just as easily 'view original' on any email and save it via copy/paste into a file for further investigation.

A few high-level references to begin with

Locations of key SA files and directories

  • /opt/zimbra/data/spamassassin/state/3.004001/updates_spamassassin_org - rules and score
  • /opt/zimbra/common/lib/perl5/Mail/SpamAssassin - SA code written in perl
  • /opt/zimbra/data/spamassassin/localrules/sauser.cf - local changes to override or extend SA rules
  • /opt/zimbra/conf/amavisd.conf - configurations

Key Concepts

  • Each email contains the SA rules that fired in the X-Spam-Status header
  • Any local SA changes should be placed in your /opt/zimbra/data/spamassassin/localrules/sauser.cf
  • If you use an external MX, you need to tell SA where the last trusted IP ("a server that will not have forged its received line") is, or your rules will be off. Depending on your architecture, this could be trusted_networks or internal_networks that you add to your sauser.cf file. Generally, if you define internal_networks, SA will attempt to infer trusted_networks, and vice versa if you define trusted_networks. In Zimbra, trusted_networks is built initially from mynetworks if you don't define it any further. SPF, however, is checked on the internal network boundary. Another way of thinking of it is that trusted_networks is your internal_networks plus any IPs that you trust not to forge their headers. They might relay spam but not initiate spam.

X-Spam-Status Header

Every email that goes through Zimbra and SA will include this header. It allows you to determine what rules fired and the score which contributed to classification.

X-Spam-Flag: YES
X-Spam-Score: 10.903
X-Spam-Level: **********
X-Spam-Status: Yes, score=10.903 required=4.8 tests=[BAYES_99=4,
	BAYES_999=0.2, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
	HTML_MESSAGE=0.001, HTTP_IN_BODY=0.1, J_BELOW_FOLD=1.5,
	J_BIG_PADDING_1=0.5, J_IMG_NO_EXTENS=0.1, J_OBFUSCATED_URL=2.5,
	J_URI_DOMAIN_BAD=0.1, MIME_QP_LONG_LINE=0.001, RCVD_IN_IVMSIP24=2,
	SPF_HELO_NONE=0.001] autolearn=no autolearn_force=no

As a convention, my local rules have the prefix J_, which helps distinguish them from core SA rules. In the header above, the message was classified as junk because its score of 10.903 exceeded the required threshold of 4.8. One rule, J_BELOW_FOLD, contributed 1.5 points to that total. Scores can also be negative, which reduces the overall score and effectively whitelists particular email or individuals. You can also whitelist at other layers, including amavisd and postfix.
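When studying many messages it can help to break the X-Spam-Status header into its parts. The following Python sketch is not one of the author's scripts. The names parse_spam_status and local_rules are hypothetical, and it assumes the header layout shown in the example above (score=, required=, and a bracketed tests= list).

```python
import re

# The example header from this article, including its folded lines.
SAMPLE_HEADER = (
    "X-Spam-Status: Yes, score=10.903 required=4.8 tests=[BAYES_99=4,\n"
    "\tBAYES_999=0.2, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,\n"
    "\tHTML_MESSAGE=0.001, HTTP_IN_BODY=0.1, J_BELOW_FOLD=1.5,\n"
    "\tJ_BIG_PADDING_1=0.5, J_IMG_NO_EXTENS=0.1, J_OBFUSCATED_URL=2.5,\n"
    "\tJ_URI_DOMAIN_BAD=0.1, MIME_QP_LONG_LINE=0.001, RCVD_IN_IVMSIP24=2,\n"
    "\tSPF_HELO_NONE=0.001] autolearn=no autolearn_force=no\n"
)


def parse_spam_status(header):
    """Split an X-Spam-Status header into score, threshold and per-rule scores."""
    flat = re.sub(r"\s+", " ", header)  # unfold the continuation lines
    m = re.search(
        r"score=(-?[\d.]+) required=(-?[\d.]+) tests=\[([^\]]*)\]", flat
    )
    if m is None:
        raise ValueError("not an X-Spam-Status header in the expected layout")
    tests = {}
    for item in m.group(3).split(","):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")
            tests[name] = float(value)
    return {
        "score": float(m.group(1)),
        "required": float(m.group(2)),
        "tests": tests,
    }


def local_rules(status, prefix="J_"):
    """Names of the rules that fired and carry the local prefix, in header order."""
    return [name for name in status["tests"] if name.startswith(prefix)]


if __name__ == "__main__":
    status = parse_spam_status(SAMPLE_HEADER)
    print(status["score"], status["required"])
    for name in local_rules(status):
        print(name, status["tests"][name])
```

The per-rule values add up to the reported score, so the same dictionary shows at a glance how much of the total came from local rules and how much from core SA rules.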

sauser.cf

You can override any default score like this:

 score SPF_HELO_NONE 1

A basic rule in SA looks like this:

rawbody J_HIDING_TEXT           /text-indent:\s*-\d{2,}/
score J_HIDING_TEXT             1.5
describe  J_HIDING_TEXT   hiding text by shoving off the screen

The rawbody rule type lets us look at the raw HTML, and in this case the sender is using negative indentation to hide some text from the viewer. If this rule fires, 1.5 points are added to the overall score. The describe line documents what the rule is attempting to accomplish. A good way to learn about rules is to study the core rules and scores that are updated every night from SA.

header __J_FROM_BODY From =~ /example\.com|example\.net/i
header __J_FROM_SMTP Return-Path =~ /\@example\.com|\@example\.net/i

Here are two more rules, each looking at a single header but without assigning a score. The two underscores in front of the rule name mean you do not have to specify a score. Think of such a rule as a variable to be used later in a more elaborate rule. The two rules above track the From address in both the SMTP envelope (Return-Path) and the From header of the message itself. Both can be spoofed, but what a spammer cannot do is digitally sign with our key. If we detect that anomaly, we have a potential spam signature.

If the sender claims to be from one of our domains, then any legitimate email would be signed by us.

meta J_SPOOFED_FROM_1 (!__J_FROM_SMTP && __J_FROM_BODY && !DKIM_VALID_AU)
score  J_SPOOFED_FROM_1 7
describe J_SPOOFED_FROM_1 Spoofed from and not signed

The net effect is a score of 7 would force any email with a spoofed from address to be delivered to the user junk folder. A simpler version of this can also be done as follows:

meta J_SPOOFED_FROM (__J_FROM_BODY && !DKIM_VALID_AU)
score  J_SPOOFED_FROM 7
describe J_SPOOFED_FROM Not DKIM signed

The problem with this simpler rule is that DKIM_VALID_AU only says the message was digitally signed by the author's domain. It is possible for the Return-Path to be the spammer's own address while the From address in the message is still forged. That is why there are two variants of this rule.

Here is an example of a whitelist rule:

# whitelist if it's from us
meta J_WHITELISTUS (!J_SPOOFED_FROM && __J_FROM_SMTP && DKIM_VALID_AU)
score J_WHITELISTUS -10
describe J_WHITELISTUS Digitally signed by our senders

Blacklists are useful for IP reputation checks but they lead to false positives. A better technique is to score them with a laddering approach. For example, Sorbs lists public freemail sites like gmail which can lead to false positives. Note: in these examples, an external MX is adding a header with the value of 'Blacklisted'. How about something like this:

header __X_SORBS_MILTER         X-SORBS-MILTER =~ /Blacklisted/
meta J_SORBS_BL    (__X_SORBS_MILTER && !DKIM_SIGNED)
score J_SORBS_BL 0.1
describe J_SORBS_BL listed on sorbs and not digitally signed

meta J_TRACKING_SORBS (J_SORBS_BL && J_IMG_NO_EXTENS)
score J_TRACKING_SORBS 0.5
describe J_TRACKING_SORBS Sorbs blacklist and tracking

meta J_FOREIGN_SORBS_1 ((__X_SORBS_MILTER && BLACKLIST_COUNTRY) && !(RCVD_IN_MSPIKE_H2 || RCVD_IN_DNSWL_NONE))
score J_FOREIGN_SORBS_1 2.0
describe J_FOREIGN_SORBS_1 Blacklist and foreign country spam

The first rule will fire when a sender is on Sorbs and the message is not digitally signed. That prevents freemail sites like gmail from tripping this rule but still allows us to use Sorbs. Next we check whether the sender is doing some tracking by hiding an img src statement without any extension in the message body. That adds another 0.5 to the score. Finally, we maintain a list of countries that are normal for our mail server. Should the sender come from outside that IP space and not be on a few acceptable whitelists, we add 2 more points. The effect is that we can use Sorbs for a possible 2.6 points (0.1 + 0.5 + 2.0), even though it is normally too dangerous to score on a Sorbs listing alone.
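The arithmetic of the ladder can be checked with a small Python model of the three meta rules. This is only an illustration of how the points accumulate, not how SA evaluates rules internally. The function and parameter names are invented for the sketch, and on_whitelist stands in for (RCVD_IN_MSPIKE_H2 || RCVD_IN_DNSWL_NONE).

```python
def sorbs_ladder_score(on_sorbs, dkim_signed, img_no_extension,
                       blacklist_country, on_whitelist):
    """Points contributed by the three Sorbs ladder rules for one message."""
    score = 0.0

    # J_SORBS_BL: listed on Sorbs and not DKIM signed
    sorbs_bl = on_sorbs and not dkim_signed
    if sorbs_bl:
        score += 0.1

    # J_TRACKING_SORBS: J_SORBS_BL plus an image without an extension
    if sorbs_bl and img_no_extension:
        score += 0.5

    # J_FOREIGN_SORBS_1: on Sorbs, unusual country, and not on a whitelist
    if on_sorbs and blacklist_country and not on_whitelist:
        score += 2.0

    return round(score, 3)


if __name__ == "__main__":
    # Worst case: every rung of the ladder fires.
    print(sorbs_ladder_score(True, False, True, True, False))
    # A signed freemail sender that happens to be listed on Sorbs.
    print(sorbs_ladder_score(True, True, False, False, True))
```

Each rung is too weak to push a message into junk by itself, which is the point: a Sorbs listing only becomes expensive when it coincides with other suspicious signals.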

Another technique is to count the number of blacklists the sender's IP address appears on. The more lists it appears on, the higher the probability that this is a compromised host issuing spam.

header __X_SORBS_MILTER         X-SORBS-MILTER =~ /Blacklisted/
header __X_PSBL_SURRIEL_MILTER  X-PSBL-SURRIEL-MILTER =~ /Blacklisted/
header __X_DNSRBL_MILTER        X-DNSRBL-MILTER =~ /Blacklisted/
meta J_DNSBL_MILTER_META ((__X_SORBS_MILTER + __X_PSBL_SURRIEL_MILTER + __X_DNSRBL_MILTER) > 1)
score J_DNSBL_MILTER_META 0.3
describe J_DNSBL_MILTER_META      IP listed in at least 2 Blacklists
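To see which saved messages would trip this counting rule without running SA, the same test can be approximated in Python with the standard email module. This is a sketch under assumptions: the header names are the ones used in the rules above, the helper names are hypothetical, and the "Not listed" value in the sample is invented because the article only shows the 'Blacklisted' value.

```python
from email import message_from_string

# Headers added by the external MX, as used in the rules above.
MILTER_HEADERS = ("X-SORBS-MILTER", "X-PSBL-SURRIEL-MILTER", "X-DNSRBL-MILTER")

# Hypothetical message; "Not listed" is an invented placeholder value.
SAMPLE = (
    "X-SORBS-MILTER: Blacklisted\n"
    "X-PSBL-SURRIEL-MILTER: Not listed\n"
    "X-DNSRBL-MILTER: Blacklisted\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def blacklist_hits(raw_email):
    """Count how many of the milter headers contain 'Blacklisted'."""
    msg = message_from_string(raw_email)
    hits = 0
    for name in MILTER_HEADERS:
        values = msg.get_all(name, [])
        if any("Blacklisted" in value for value in values):
            hits += 1
    return hits


def dnsbl_meta_fires(raw_email):
    """Mirror of the meta rule: fire when more than one list reports a hit."""
    return blacklist_hits(raw_email) > 1


if __name__ == "__main__":
    print(blacklist_hits(SAMPLE), dnsbl_meta_fires(SAMPLE))
```

In the meta rule each sub-rule counts as 1 when it matches and 0 when it does not, so the sum being greater than 1 means at least two lists agree.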

Typical Workflow

Our workflow is the following:

  • pull the user trained email each day or right click and view original email
  • run the email through spamassassin in debug mode to see which rules fired or didn't fire
  • adjust our rules in the sauser.cf file
  • re-test the same email
  • run spamassassin --lint to test for any mistakes
  • let the normal SA rule update cycle put the rules into production

Useful Scripts

Over time, you will want to develop a set of convenience aliases and small scripts for navigation. Here are a few from my github page to give you some ideas.

  • check_sa.sh - given an email, will run spamassassin in debug mode so you can investigate the rules that fired.
  • zmcopyTrain - pulls the user trained ham and spam every day
  • dump_trained_spam.sh - pulls all the daily user training created by zmcopyTrain and displays the Return-Path, X-Spam-Status, From, To, and Subject
  • vi-email.sh - edit an individual email given some pattern to search
  • inspect_mail.pl - print Return-Path, X-Spam-Status, From, To, and Subject headers

and some aliases (csh):

  • #spamassasin
  • alias vi-spam 'sudo vi /opt/zimbra/data/spamassassin/localrules/sauser.cf'
  • alias view-spam 'view /opt/zimbra/data/spamassassin/localrules/sauser.cf'
  • alias g-spam 'pushd /opt/zimbra/data/spamassassin/state/3.004001/updates_spamassassin_org'
  • alias g-spamassasin 'pushd /opt/zimbra/common/lib/perl5/Mail/SpamAssassin'

Sample Scenario

The file email.txt contains the email we wish to inspect.

% inspect_mail.pl < email.txt
Return-Path: rosenjason789@gmail.com
X-Spam-Status: No, score=2.968 required=4.8 tests=[BAYES_80=2,
	DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
	FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001,
	HTML_MESSAGE=0.001, J_DNSBL_MILTER_META=0.3, J_DOCTYPE_MISSING=0.5,
	J_RCVD_IN_HOSTKARMA_YEL=0.003, RCVD_IN_DNSWL_NONE=-0.0001,
	RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001,
	T_GB_FREEMAIL_DISPTO=0.01] autolearn=no autolearn_force=no
From: "Jason Rosen" <rosenjason789@gmail.com>
To: <user@example.com>
Subject: Health and Safety Professionals Email List

We could investigate the rules that fired or gather additional information by running that email through SA in debug mode.

# su - zimbra
# spamassassin -D -L < email.txt > /dev/null 2> email.txt.out

or from any user account with this command

% check_sa.sh email.txt

At the end of the file email.txt.out is the subtests line ... here is a snippet

...
...
Jul  3 08:07:18.375 [11901] dbg:check:subtests=__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__BODY_TEXT_LINE,__BODY_TEXT_LINE,__BODY_TEXT_LINE,__BUGGED_IMG,__CLICK_HERE,
__COMMENT_EXISTS,__CT,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__CTYPE_MULTIPART_ANY,__DKIM_EXISTS,__DOS_HAS_ANY_URI,__DOS_HAS_LIST_UNSUB,
__DOS_RCVD_TUE,__DOS_SINGLE_EXT_RELAY,__FB_COST,__FORGED_SENDER,__FSL_HAS_LIST_UNSUB,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_DATE,__HAS_DKIM_SIGHD,
__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_RCVD,__HAS_REPLY_TO,__HAS_SUBJECT,__HAS_TO,__HAS_URI,__HIGHBITS,__HTML_LINK_IMAGE,__HTML_TITLE_120,
__J_BLACKLISTS,__J_CLASS_IDS_CNT,__J_DOCTYPE_EXISTS,__J_HTML_EXISTS,__J_STYLE_IN_HEAD,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,
__LIST_PARTIAL,__LOCAL_PP_NONPPURL,__LONGLINE,__L_BODY_8BITS,__MIME_HTML,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MSGID_OK_DIGITS,__MSGID_OK_HEX,
__MSOE_MID_WRONG_CASE,__NAKED_TO,__NONEMPTY_BODY,__NOT_A_PERSON,__RCD_RDNS_MAIL,__RCD_RDNS_MAIL_MESSY,__RDNS_LONG,__REPLYTO_EXISTS,
__RP_MATCHES_RCVD,__SANE_MSGID,__SUBJECT_ENCODED_B64,__SUBJECT_UTF8_B_ENCODED,__SUBJ_NOT_SHORT,__SUBSCRIPTION_INFO,__TAG_EXISTS_BODY,
__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__TAG_EXISTS_META,__TAG_EXISTS_STYLE,__TOCC_EXISTS,__TO_NO_ARROWS_R,__TO___LOWER,__TVD_MIME_ATT_TP,
__UNPARSEABLE_RELAY_COUNT,__UNSUB_LINK,__URI_MAILTO,__URI_MAILTO,__USING_VERP1,__X_SORBS_MILTER
Jul  3 08:07:18.376 [11901] dbg: timing: total 11936 ms - init: 1461 (12.2%), parse: 7 (0.1%), extract_message_metadata: 180 (1.5%), get_uri_detail_list: 10 (0.1%), tests_pri_-1000: 11 (0.1%), compile_gen: 119 (1.0%), compile_eval: 28 (0.2%), tests_pri_-950: 7 (0.1%), tests_pri_-900: 8 (0.1%), tests_pri_-90: 7 (0.1%), tests_pri_0: 10144 (85.0%), tests_pri_10: 7 (0.1%), tests_pri_20: 7 (0.1%), tests_pri_30: 7 (0.1%), tests_pri_500: 76 (0.6%)
Jul  3 08:07:18.378 [11901] dbg: plugin: Mail::SpamAssassin::Plugin::MIMEHeader=HASH(0x32e65d0) implements 'finish_tests', priority 0
Jul  3 08:07:18.379 [11901] dbg: plugin: Mail::SpamAssassin::Plugin::Check=HASH(0x342f240) implements 'finish_tests', priority 0
Jul  3 08:07:18.386 [11901] dbg: netset: cache trusted_networks hits/attempts: 4/7, 57.1 %

Look at some of those "double underscore" rules in /opt/zimbra/data/spamassassin/state/3.004001/updates_spamassassin_org to understand what they attempt to do and see if we might incorporate that rule for a custom rule.
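Picking sub-rules out of the debug output by eye gets tedious, so a short helper can list them. The Python sketch below is not one of the author's scripts and its function names are hypothetical. It assumes the subtests list appears on a single line of the debug output; the listing above is wrapped across several lines.

```python
import re
from typing import List


def extract_subtests(debug_output: str) -> List[str]:
    """Return the sub-rule names from the first 'subtests=' debug line.

    Duplicates (a sub-rule can be listed more than once) are dropped
    while the original order is kept.
    """
    for line in debug_output.splitlines():
        m = re.search(r"subtests=(\S+)", line)
        if m:
            names = [n for n in m.group(1).split(",") if n]
            return list(dict.fromkeys(names))
    return []


def with_prefix(names: List[str], prefix: str = "__J_") -> List[str]:
    """Keep only the sub-rules that carry the given prefix, e.g. local ones."""
    return [n for n in names if n.startswith(prefix)]


if __name__ == "__main__":
    # Shortened sample line built from names that appear in the output above.
    sample = (
        "Jul  3 08:07:18.375 [11901] dbg:check:subtests="
        "__BODY_TEXT_LINE,__BODY_TEXT_LINE,__J_BLACKLISTS,"
        "__NOT_A_PERSON,__X_SORBS_MILTER\n"
    )
    names = extract_subtests(sample)
    print(names)
    print(with_prefix(names))
```

Saving the debug output to a file, as with email.txt.out in this article, and passing its contents to extract_subtests gives a list that is easy to compare between a spam sample and a ham sample.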

Let's say we wanted to make sure this type of email goes to the junk folder in the future. Here is a rule we could put in sauser.cf:

meta J_FORCE_TO_JUNK    (((__X_SORBS_MILTER + __J_BLACKLISTS) > 1) && __NOT_A_PERSON)
score J_FORCE_TO_JUNK 10
describe J_FORCE_TO_JUNK On a blacklist and wasn't a person that sent this

and then verify it via the lint option

/opt/zimbra/common/bin/spamassassin --lint

If lint reports no problems with the new rule, re-run that email through spamassassin in debug mode and check the score and that your new rule fires.

Final thoughts: if we didn't want this email delivered at all, we could raise the score past 15. This is only an example, but for malware or dangerous attachments that you are 100% positive you don't want delivered even to the junk folder, you might do this. In our experience, adding a little scoring is better than adding a lot. Our rules tend to be scored quite low, but the laddering approach gradually adds more and more points, which keeps the number of false positives low. Finally, unless a rule is critical, we don't refresh the spam system in real time but let the normal /opt/zimbra/libexec/zmsaupdate cycle that runs every night handle it. If you need the rules immediately, run zmantispamctl restart and they will be live.

More articles written by me, https://wiki.zimbra.com/wiki/JDunphy-Notes
