JDunphy-Amavisd-Debugging

Revision as of 21:13, 26 October 2020 by JDunphy (talk | contribs) (zmantispamctl)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Debugging Amavisd Problems

   KB 24179        Last updated on 2020-10-26  




0.00
(0 votes)


Background

Amavisd runs as a daemon process and listens on TCP port (10024 - amavisd label in zimbra.log) for incoming SMTP connections. Postfix will send all in coming email to that port. Amavisd then runs all configured virus and spam tests then decides what to do with the email. If it considers the email clean it will pass it back to Postfix - but on a different TCP port (10025 - postfix/amavisd label in zimbra.log) where Postfix will accept any email for delivery. See $notify_method and $forward_method for this re-injection configuration in amavisd.conf. Postfix (smtp-amavis - see master.cf) uses a unix domain socket with a timeout of 20 mins to inject email to amavisd. It has max_use=20 before the instance terminates and a new instance is spawned (master). It utilizes XFORWARD to pass additional information such as name,address, protocol and HELO name of original client. Amavisd can use this information for switching policy banks, p0f fingerprinting, etc.


     .......................................
      :                Postfix              :
   ----->smtpd \                            :
      :         -pre-cleanup-\       /local---->
   ---->pickup /              -queue-       :
      :             -cleanup-/   |   \smtp----->
      :     bounces/    ^        v          :
      : and locally     |        v          :
      :   forwarded   smtpd  smtp-amavis     :
      :    messages   10025      |          :
      ...........................|...........
                        ^        |
                        |        v
            ............|...............................
            :           |   $inet_socket_port=10024    :
            :           |                              :
            : $forward_method='smtp:[127.0.0.1]:10025' :
            : $notify_method ='smtp:[127.0.0.1]:10025' :
            :                                          :
            :    amavisd-new                           :
            ............................................
                         Message flow 
  • Messages enter the Postfix system at the regular smtpd or pickup service.
  • The pre-cleanup cleanup service performs transformations and checks on these messages.
  • The qmgr service schedules the messages to be sent to the amavisd-new content filter.
  • amavisd-new performs various tests on the messages.
  • Messages are re-injected into the Postfix mail system, sending them to a dedicated, local smtpd service.

Flow

$ telnet localhost 10024
220 [127.0.0.1] ESMTP amavisd-new service ready
HELO localhost
250 [127.0.0.1]
MAIL FROM: <>
250 2.1.0 Sender  OK
RCPT TO: <postmaster>
250 2.1.5 Recipient postmaster OK
DATA
354 End data with <CR><LF>.<CR><LF>
From: virus-tester
To: undisclosed-recipients:;
Subject: amavisd test - simple - no spam test pattern

This is a simple test message from the amavisd-new test-messages.
.
250 2.6.0 Ok, id=30897-02, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as 079474CE44
QUIT
221 2.0.0 [127.0.0.1] amavisd-new closing transmission channel

The maillog shows the delivery path. Here's an excerpt from a successful delivery process:

Nov 1 11:28:10 mail postfix/smtpd[30986]: connect from localhost[127.0.0.1] <-- 1
Nov 1 11:28:10 mail postfix/smtpd[30986]: 079474CE44: client=localhost[127.0.0.1]
Nov 1 11:28:10 mail postfix/cleanup[30980]: 079474CE44: message-id=<20061101102810.079474CE44@mail.example.com>
Nov 1 11:28:10 mail postfix/qmgr[20432]: 079474CE44: from=<>, size=822, nrcpt=1 (queue active)
Nov 1 11:28:10 mail amavis[30897]: (30897-02) Passed BAD-HEADER, <> -> <postmaster>, quarantine: badh-le5gjszxowBk, mail_id: le5gjszxowBk, Hits: -1.76, queued_as: 079474CE44, 39505 ms <-- 2
Nov 1 11:28:10 mail postfix/smtpd[30986]: disconnect from localhost[127.0.0.1] Nov 1 11:28:10 mail postfix/local[30987]: 079474CE44: to=<postmaster@example.com>, relay=local, delay=0.27, delays=0.14/0.05/0/0.08, dsn=2.0.0, status=sent (delivered to mailbox: postmaster) <-- 3
Nov 1 11:28:10 mail postfix/qmgr[20432]: 079474CE44: removed

1 - amavisd connects with Postfix dedicated smtpd-daemon and hands over the e-mail that had been sent during the telnet session. smtpd gives a queue-id of 079474CE44 that can be tracked throughout the maillog.

2 - amavisd notices it has checked and sent an e-mail to <postmaster>.

3 - Postfix' local-service logs it successfully delivered an e-mail with queue-id 079474CE44 to the mailbox of postmaster.

Introduction

Zimbra uses Amavisd as a content filter. In general unfiltered email will arrive to amavisd (perl program) at port 10024 and amavisd will call clamav and perform SA classification before returning control back to postifx for delivery. For every amavisd configured, you can expect to see one folder representing a running amavisd in /opt/zimbra/data/amavisd/tmp. The contents under these directories represent the last processed email message. Zimbra cron will clean these directories out a few times per day as will zmantispamctl if you issue a start.

# su - zimbra
% zmprov gs `zmhostname` zimbraAmavisMaxServers 
# name mail.example.com
zimbraAmavisMaxServers: 6

amavisd Logging is sent to /var/log/zimbra.log and additional logging can be enabled with this command with levels 1-5:

% su - zimbra
% zmprov ms `zmhostname` zimbraAmavisLogLevel 3
% zmantispamctl restart

Debug mode is available and can be set by modifying /opt/zimbra/common/sbin/amavsid and locating the DEBUG statement and changing the value to 1 from 0 then restarting with zmantispamctl.

%  grep -n 'DEBUG = ' /opt/zimbra/common/sbin/amavsid
720:  $DEBUG = 0;

This will present very detailed information of why something is failing which is especially difficult when a SA module or rules is causing the problem.

zmantispamctl

This script will start antispam-mysql.server (start mariadb instance with tables supporting spam) and call zmamavisdctl (start amavisd-new). zmamavisdctl could also be called directly if you didn't want to restart the mariadb component and were just updating rules that you needed live immediately.

% ls /opt/zimbra/data/amavisd/
amavisd.sock  db  mysql  quarantine  tmp  var

It then does the following in this order:

sudo /opt/zimbra/common/sbin/amavis-mc
create directories if they don't exist: quarantine, tmp, var
clean out any old directories and files in tmp
/opt/zimbra/common/sbin/amavisd -X no_conf_file_writable_check -c /opt/zimbra/conf/amavisd.conf

Given this, it is possible to run amavisd from the command line if you were debugging the program directly. Finally, if you are modifying amavisd.conf and don't want zmconfigd to overwrite your file and rebuild it then use a second argument. Most of the zimbra start scripts follow this paradigm. Here is the amavisd example:

% zmamavisdctl start nozmconfigd

It doesn't matter what you put for the second argument. Note if you do run it by hand from the command line there are these options that might be useful [reload|debug|debug-sa|foreground]. Usage would be like this:

# su - zimbra
% /opt/zimbra/common/sbin/amavisd -X no_conf_file_writable_check -c /opt/zimbra/conf/amavisd.conf debug-sa

Reload simply sends a SIGHUP to amavisd to reload the SA rules.

Directories

/opt/zimbra/data/amavisd - where it works from
/opt/zimbra/common/sbin - daemons
/opt/zimbra/common/bin   - amvis-logwatch

Note: amavis-logwatch can parse /var/log/zimbra.log and make some sense of the logs.

% amavis-logwatch /var/log/zimbra.log

Sample Problems

Postfix will pass email for filtering to amavisd on port 10024 and the resulting filtered email with headers added for SA score, etc will be sent back to postfix (amavsid/smtpd in logs). A common error is when postfix notices that the connection isn't finishing correctly. It is expecting a socket close or an exit. You will see something like this even though amavisd is still working on the message.

% grep END-OF-MESSAGE /var/log/maillog 
Oct 17 13:02:03 relay3 postfix/amavisd/smtpd[5726]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
% grep 13:02:03 /var/log/maillog
Oct 17 13:02:03 relay3 postfix/amavisd/smtpd[5726]: disconnect from localhost[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 commands=4

If you see an amavisd that is 100% cpu bound, there is a good chance that it is a runaway. In the worse examples, an email has triggered a rule in SA that doesn't finish. Sometimes upgrading SA to a newer version will clear the problem unless you can find the rule causing the issue. In situations like this, you will lose the amavisd and it can take approx 903 seconds (15mins - see $active_ttl in amavis-services) before it dies and is available for new email processing. The email it was working on will be moved to the deferred queue by postfix. The corresponding log message can be found in /var/log/zimbra.log with a message like this:

Oct 17 12:17:21 relay3 postfix/smtp[1720]: B5D38380E5: to=<user@example.com>, orig_to=<user@mail.example.net>, relay=127.0.0.1[127.0.0.1]:10024, delay=133927, delays=133023/0/0.01/903, dsn=4.4.2, status=deferred (lost connection with 127.0.0.1[127.0.0.1] while sending end of data -- message may be sent more than once)

If you have an email that will not complete and cycles between deferred and the active queue, place it in the hold queue. That has the advantage that it is removed from the defer and deferred directories and is self-contained in a single file. If you have a debugging machine, you can copy this single file and test on that machine. You can also create another queue to make a copy of your input so you can try and re-try. For example:

# cd /opt/zimbra/data/postfix/spool
# mkdir jhold
# cd hold
# mv * ../jhold

Next setup debugging with amavisd, cp the message from jhold to hold and requeue the message. If it fails, rule out any local customization first with sauser.cf and repeat. If that fails, update the SA version as there are a few CVE against the SA shipped by Zimbra. It is rare to have this happen but mail bombs exist and if enough of them hit they can disrupt the queuing as amavisd can not process new email until they timeout (approx 15mins). One can view messages in these hold queues with postcat. Specify the -q option to view in any of the postfix queues and use it directly (no option) to view in your own queue (ie. jhold). Finally, if you absolutely need to get that message delivered, use postcat and grab the mail message and then insert it via zmlmtpinject.

You can issue a kill -s term (SIGTERM) to the stuck amavisd process if $active_ttl (15 mins) is too long to wait for amavsid-services to do the same thing. If you are hit with an email bomb, it doesn't take too many stuck amavisd's before all incoming email will be in the deferred queue while you are waiting for your amavisd's to time-out.

Scenario 2 (Quick workaround to buy time): If the problem persists and the steps above don't help and you have identified this to an individual email type and want to receive that email but doing so would cause amavisd to timeout (after 15 mins) with the message bounces between retention and active queues. Consider whitelisting with amavisd where the SA checks would short circuit SA testing/classification... otherwise one could whitelist with postfix. See this link for how to do that. https://wiki.zimbra.com/wiki/King0770-Notes-Whitelist-Spamassassin-MTA. Note: unfortunately, the problem could return should an internal user reply to that message as that also would cause the same problem described above. Consider disabling outgoing spam (see: ORIGINATING_POST policy bank to disable) processing for outgoing email until you have a solution why amavisd is locking up or getting stuck by an incoming email message. Similarly, blacklist if you don't want the message and have no solution to buy addition time while you resolve the stuck amavisd processes.

Unfortunately, if these" email bomb" messages come in bunches they will effectively break postfix queueing and everything will end up in the retention queue with very little real email making it through the active queues. It is the long $active_ttl that does the damage here as amavisd becomes non responsive as postfix it has to requeue these bad messages moved from the deferred queue back to the active queue by postfix's qmgr. Things will slowly grind to a halt with very real email getting through. In the worst scenarios, if you strace the pid of the stuck amavisd you will see it locked in epoll and 100% cpu by each amavisd.

Through all of this, no email will be lost unless an admin issues a delete of the email. The first level of attack is to manually move deferred email to the hold queue to allow the system to get on top of the active queue but that can be exhausted fairly quickly as bunches of incoming and deferred email bombs take out your amavisd's. You can also increase the number of amavsid's with the goal of having more of them to process real email than email bombs arriving every 15 mins. Consider creating your own hold queue folder at the same level as postfix's hold queue and moving (mv command) all the emails to it until you can get on top of this. Then gradually moving email's from your own hold queue via a simple 'mv' command line back to the postfix hold queue. From there you can requeue the email from the zimbra admin interface under queue monitoring and management later to successful delivered delayed email.

Utility Tools

Check various queues and system problems.

#!/bin/bash 

# Author: Jim Dunphy <jad aesir.com>
# License (ISC): It's yours. Enjoy
# Date: 10/14/2020

export PATH="/opt/zimbra/common/sbin:/opt/zimbra/common/bin:/opt/zimbra/libexec:/opt/zimbra/bin:$PATH"

#
# usage: check_queue.sh 
# 

# Reference:
#   http://www.postfix.org/postcat.1.html 
#       postcat -h file (-q searches through queues)
#   http://logreporters.sourceforge.net/faq.html#percentiles
#       postfix-logwatch --limit delivered=1.10 /var/log/zimbra.log  # top 10 list
#
# Message delivery time stamps
# delays=a/b/c/d, where
#   a = time before queue manager, including message transmission
#   b = time in queue manager
#   c = connection setup including DNS, HELO and TLS;
#   d = message transmission time.
#
# Bugs: Need to fix postfix_logwatch (change cut to =cut)
#


. /opt/zimbra/.bashrc

usage() {
  echo "

    *** Example usage: ***

     $ check_queue.sh --list
     $ check_queue.sh --qshape   #active queue
     $ check_queue.sh --qshaped  #deferred queue
     $ check_queue.sh --ldl
     $ check_queue.sh -v QueueID
     $ check_queue.sh --all
     $ check_queue.sh --lh listHoldQueue
     $ check_queue.sh --vi QueueID (from hold queue only)

    *** PostFix Options - need to run as root: ***

       View Email Queued being Processed
       -v|--view QueueId
       --vi QueueId (from hold queue only)

       List All Queues and Counts
       --all

       List Active Queue
       -l|--la|--list   

       List Hold Queue
       --lh

    *** Amavis Options: ***

       Amavis Stats
       -a|--amavis

    *** Log Options: ***
       --ldl      #list deferred logs (reason)
       --prob     #reject|warning|error|fatal|panic
       --delay    # delay=a/b/c/d
       --plog     # postfix_log

    -h|--help

  "
}

#
args=$(getopt -l "view:,vi:,delay,help,ldl,lh,la,list,plog,prob,amavis,all,qshape,qshaped" -o "v:hla" -- "$@")

eval set -- "$args"

while [ $# -ge 1 ]; do
        case "$1" in
                --)
                    # No more options left.
                    usage
                    shift
                    break
                   ;;
                -l|--la|--list)
                        ls /opt/zimbra/data/postfix/spool/active
                        exit 0
                        ;;
                --lh)
                        ls /opt/zimbra/data/postfix/spool/hold
                        exit 0
                        ;;
                --ldl)
			grep status=deferred /var/log/zimbra.log
                        exit 0
                        ;;
                -a|--amavis)
			amavisd-status
                        exit 0
                        ;;
                --all)
                        /opt/zimbra/libexec/zmqstat
                        exit 0
                        ;;
                --plog)
                        postfix-logwatch --detail 6 /var/log/zimbra.log
                        exit 0
                        ;;
                --prob)
                        egrep '(reject|warning|error|fatal|panic):' /var/log/zimbra.log
                        exit 0
                        ;;
                --delay)
                        grep -i smtp /var/log/zimbra.log | grep -ioE 'to=.*status=.*[ \t] ' | tail -10
                        #grep -i smtp /var/log/zimbra.log | grep -ioE 'to=.*status=.*[ \t] ' 
                        exit 0
                        ;;
                --qshape)
                        su - zimbra -c "qshape"
                        exit 0
                        ;;
                --qshaped)
                        su - zimbra -c "qshape deferred"
                        exit 0
                        ;;
		-v|--view)
                        view="$2"
                        postcat -q $view
                        shift
                        exit
                        ;;
		--vi)
                        view="$2"
                        vi /opt/zimbra/data/postfix/spool/hold/$view
                        shift
                        exit
                        ;;

                -h|--help)
                        usage
                        exit 0
                        ;;
        esac

        shift
done

exit 0

Reference

https://www.ijs.si/software/amavisd/#faq-trouble
https://www.ijs.si/software/amavisd/amavisd-new-magdeburg-20050519.pdf
https://www.ijs.si/software/amavisd/amavisd-new-docs.html#max_requests
https://wiki.zimbra.com/index.php?title=Troubleshooting_Course_Content_Rough_Drafts-Zimbra_Logs#Postfix
https://wiki.zimbra.com/wiki/IP_Address_whitelisting

More articles written by me, https://wiki.zimbra.com/wiki/JDunphy-Notes

Jump to: navigation, search