Difference between revisions of "JDunphy-Amavisd-Debugging"

m (Background)
m (Introduction)
Line 9: Line 9:
  
 
= Introduction =
 
= Introduction =
Zimbra uses Amavisd as a content filter.  In general unfiltered email will arrive to amavisd (perl program) at port 10024 and amavisd will call clamav and perform SA classification before returning control back to postifx (amavisd/smtpd) for delivery. For every amavisd configured, you can expect to see one folder representing a running amavisd in /opt/zimbra/data/amavisd/tmp. The contents under these directories represent the last processed email message. Zimbra cron will clean these directories out a few times per day as will zmantispamctl if you issue a start.
+
Zimbra uses Amavisd as a content filter.  In general unfiltered email will arrive to amavisd (perl program) at port 10024 and amavisd will call clamav and perform SA classification before returning control back to postifx for delivery. For every amavisd configured, you can expect to see one folder representing a running amavisd in /opt/zimbra/data/amavisd/tmp. The contents under these directories represent the last processed email message. Zimbra cron will clean these directories out a few times per day as will zmantispamctl if you issue a start.
 
<pre>
 
<pre>
 
# su - zimbra
 
# su - zimbra

Revision as of 19:08, 23 October 2020

Debugging Amavisd Problems

   KB 24179        Last updated on 2020-10-23  




0.00
(0 votes)


Background

Amavisd runs as a daemon process and listens on TCP port (10024 - amavisd label in zimbra.log) for incoming SMTP connections. Postfix will send all in coming email to that port. Amavisd then runs all configured virus and spam tests then decides what to do with the email. If it considers the email clean it will pass it back to Postfix - but on a different TCP port (10025 - postfix/amavisd label in zimbra.log) where Postfix will accept any email for delivery.

Introduction

Zimbra uses Amavisd as a content filter. In general unfiltered email will arrive to amavisd (perl program) at port 10024 and amavisd will call clamav and perform SA classification before returning control back to postifx for delivery. For every amavisd configured, you can expect to see one folder representing a running amavisd in /opt/zimbra/data/amavisd/tmp. The contents under these directories represent the last processed email message. Zimbra cron will clean these directories out a few times per day as will zmantispamctl if you issue a start.

# su - zimbra
% zmprov gs `zmhostname` zimbraAmavisMaxServers 
# name mail.example.com
zimbraAmavisMaxServers: 6

amavisd Logging is sent to /var/log/zimbra.log and additional logging can be enabled with this command with levels 1-5:

% su - zimbra
% zmprov ms `zmhostname` zimbraAmavisLogLevel 3
% zmantispamctl restart

Debug mode is available and can be set by modifying /opt/zimbra/common/sbin/amavsid and locating the DEBUG statement and changing the value to 1 from 0 then restarting with zmantispamctl.

%  grep -n 'DEBUG = ' /opt/zimbra/common/sbin/amavsid
720:  $DEBUG = 0;

This will present very detailed information of why something is failing which is especially difficult when a SA module or rules is causing the problem.

zmantispamctl

This script will start antispam-mysql.server (start mariadb instance with tables supporting spam) and call zmamavisdctl (start amavisd-new). zmamavisdctl could also be called directly if you didn't want to restart the mariadb component and were just updating rules that you needed live immediately.

% ls /opt/zimbra/data/amavisd/
amavisd.sock  db  mysql  quarantine  tmp  var

It then does the following in this order:

sudo /opt/zimbra/common/sbin/amavis-mc
create directories if they don't exist: quarantine, tmp, var
clean out any old directories and files in tmp
/opt/zimbra/common/sbin/amavisd -X no_conf_file_writable_check -c /opt/zimbra/conf/amavisd.conf

Given this, it is possible to run amavisd from the command line if you were debugging the program directly. Finally, if you are modifying amavisd.conf and don't want zmconfigd to overwrite your file and rebuild it then use a second argument. Most of the zimbra start scripts follow this paradigm. Here is the amavisd example:

% zmamavisdctl start nozmconfigd

It doesn't matter what you put for the second argument.

Directories

/opt/zimbra/data/amavisd - where it works from
/opt/zimbra/common/sbin - daemons
/opt/zimbra/common/bin   - amvis-logwatch

Note: amavis-logwatch can parse /var/log/zimbra.log and make some sense of the logs.

% amavis-logwatch /var/log/zimbra.log

Sample Problems

Postfix will pass email for filtering to amavisd on port 10024 and the resulting filtered email with headers added for SA score, etc will be sent back to postfix (amavsid/smtpd in logs). A common error is when postfix notices that the connection isn't finishing correctly. It is expecting a socket close or an exit. You will see something like this even though amavisd is still working on the message.

% grep END-OF-MESSAGE /var/log/maillog 
Oct 17 13:02:03 relay3 postfix/amavisd/smtpd[5726]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
% grep 13:02:03 /var/log/maillog
Oct 17 13:02:03 relay3 postfix/amavisd/smtpd[5726]: disconnect from localhost[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 commands=4

If you see an amavisd that is 100% cpu bound, there is a good chance that it is a runaway. In the worse examples, an email has triggered a rule in SA that doesn't finish. Sometimes upgrading SA to a newer version will clear the problem unless you can find the rule causing the issue. In situations like this, you will lose the amavisd and it can take approx 903 seconds (15mins - see $active_ttl in amavis-services) before it dies and is available for new email processing. The email it was working on will be moved to the deferred queue by postfix. The corresponding log message can be found in /var/log/zimbra.log with a message like this:

Oct 17 12:17:21 relay3 postfix/smtp[1720]: B5D38380E5: to=<user@example.com>, orig_to=<user@mail.example.net>, relay=127.0.0.1[127.0.0.1]:10024, delay=133927, delays=133023/0/0.01/903, dsn=4.4.2, status=deferred (lost connection with 127.0.0.1[127.0.0.1] while sending end of data -- message may be sent more than once)

If you have an email that will not complete and cycles between deferred and the active queue, place it in the hold queue. That has the advantage that it is removed from the defer and deferred directories and is self-contained in a single file. If you have a debugging machine, you can copy this single file and test on that machine. You can also create another queue to make a copy of your input so you can try and re-try. For example:

# cd /opt/zimbra/data/postfix/spool
# mkdir jhold
# cd hold
# mv * ../jhold

Next setup debugging with amavisd, cp the message from jhold to hold and requeue the message. If it fails, rule out any local customization first with sauser.cf and repeat. If that fails, update the SA version as there are a few CVE against the SA shipped by Zimbra. It is rare to have this happen but mail bombs exist and if enough of them hit they can disrupt the queuing as amavisd can not process new email until they timeout (approx 15mins). One can view messages in these hold queues with postcat. Specify the -q option to view in any of the postfix queues and use it directly (no option) to view in your own queue (ie. jhold). Finally, if you absolutely need to get that message delivered, use postcat and grab the mail message and then insert it via zmlmtpinject.

You can issue a kill -s term (SIGTERM) to the stuck amavisd process if $active_ttl (15 mins) is too long to wait for amavsid-services to do the same thing. If you are hit with an email bomb, it doesn't take too many stuck amavisd's before all incoming email will be in the deferred queue while you are waiting for your amavisd's to time-out.

Scenario 2 (Quick workaround to buy time): If the problem persists and the steps above don't help and you have identified this to an individual email type and want to receive that email but doing so would cause amavisd to timeout (after 15 mins) with the message bounces between retention and active queues. Consider whitelisting with amavisd where the SA checks would short circuit SA testing/classification... otherwise one could whitelist with postfix. See this link for how to do that. https://wiki.zimbra.com/wiki/King0770-Notes-Whitelist-Spamassassin-MTA. Note: unfortunately, the problem could return should an internal user reply to that message as that also would cause the same problem described above. Consider disabling outgoing spam (see: ORIGINATING_POST policy bank to disable) processing for outgoing email until you have a solution why amavisd is locking up or getting stuck by an incoming email message. Similarly, blacklist if you don't want the message and have no solution to buy addition time while you resolve the stuck amavisd processes.

Unfortunately, if these" email bomb" messages come in bunches they will effectively break postfix queueing and everything will end up in the retention queue with very little real email making it through the active queues. It is the long $active_ttl that does the damage here as amavisd becomes non responsive as postfix it has to requeue these bad messages moved from the deferred queue back to the active queue by postfix's qmgr. Things will slowly grind to a halt with very real email getting through. In the worst scenarios, if you strace the pid of the stuck amavisd you will see it locked in epoll and 100% cpu by each amavisd.

Through all of this, no email will be lost unless an admin issues a delete of the email. The first level of attack is to manually move deferred email to the hold queue to allow the system to get on top of the active queue but that can be exhausted fairly quickly as bunches of incoming and deferred email bombs take out your amavisd's. You can also increase the number of amavsid's with the goal of having more of them to process real email than email bombs arriving every 15 mins. Consider creating your own hold queue folder at the same level as postfix's hold queue and moving (mv command) all the emails to it until you can get on top of this. Then gradually moving email's from your own hold queue via a simple 'mv' command line back to the postfix hold queue. From there you can requeue the email from the zimbra admin interface under queue monitoring and management later to successful delivered delayed email.

Utility Tools

Check various queues and system problems.

#!/bin/bash 

# Author: Jim Dunphy <jad aesir.com>
# License (ISC): It's yours. Enjoy
# Date: 10/14/2020

export PATH="/opt/zimbra/common/sbin:/opt/zimbra/common/bin:/opt/zimbra/libexec:/opt/zimbra/bin:$PATH"

#
# usage: check_queue.sh 
# 

# Reference:
#   http://www.postfix.org/postcat.1.html 
#       postcat -h file (-q searches through queues)
#   http://logreporters.sourceforge.net/faq.html#percentiles
#       postfix-logwatch --limit delivered=1.10 /var/log/zimbra.log  # top 10 list
#
# Message delivery time stamps
# delays=a/b/c/d, where
#   a = time before queue manager, including message transmission
#   b = time in queue manager
#   c = connection setup including DNS, HELO and TLS;
#   d = message transmission time.
#
# Bugs: Need to fix postfix_logwatch (change cut to =cut)
#


. /opt/zimbra/.bashrc

usage() {
  echo "

    *** Example usage: ***

     $ check_queue.sh --list
     $ check_queue.sh --qshape   #active queue
     $ check_queue.sh --qshaped  #deferred queue
     $ check_queue.sh --ldl
     $ check_queue.sh -v QueueID
     $ check_queue.sh --all
     $ check_queue.sh --lh listHoldQueue
     $ check_queue.sh --vi QueueID (from hold queue only)

    *** PostFix Options - need to run as root: ***

       View Email Queued being Processed
       -v|--view QueueId
       --vi QueueId (from hold queue only)

       List All Queues and Counts
       --all

       List Active Queue
       -l|--la|--list   

       List Hold Queue
       --lh

    *** Amavis Options: ***

       Amavis Stats
       -a|--amavis

    *** Log Options: ***
       --ldl      #list deferred logs (reason)
       --prob     #reject|warning|error|fatal|panic
       --delay    # delay=a/b/c/d
       --plog     # postfix_log

    -h|--help

  "
}

#
args=$(getopt -l "view:,vi:,delay,help,ldl,lh,la,list,plog,prob,amavis,all,qshape,qshaped" -o "v:hla" -- "$@")

eval set -- "$args"

while [ $# -ge 1 ]; do
        case "$1" in
                --)
                    # No more options left.
                    usage
                    shift
                    break
                   ;;
                -l|--la|--list)
                        ls /opt/zimbra/data/postfix/spool/active
                        exit 0
                        ;;
                --lh)
                        ls /opt/zimbra/data/postfix/spool/hold
                        exit 0
                        ;;
                --ldl)
			grep status=deferred /var/log/zimbra.log
                        exit 0
                        ;;
                -a|--amavis)
			amavisd-status
                        exit 0
                        ;;
                --all)
                        /opt/zimbra/libexec/zmqstat
                        exit 0
                        ;;
                --plog)
                        postfix-logwatch --detail 6 /var/log/zimbra.log
                        exit 0
                        ;;
                --prob)
                        egrep '(reject|warning|error|fatal|panic):' /var/log/zimbra.log
                        exit 0
                        ;;
                --delay)
                        grep -i smtp /var/log/zimbra.log | grep -ioE 'to=.*status=.*[ \t] ' | tail -10
                        #grep -i smtp /var/log/zimbra.log | grep -ioE 'to=.*status=.*[ \t] ' 
                        exit 0
                        ;;
                --qshape)
                        su - zimbra -c "qshape"
                        exit 0
                        ;;
                --qshaped)
                        su - zimbra -c "qshape deferred"
                        exit 0
                        ;;
		-v|--view)
                        view="$2"
                        postcat -q $view
                        shift
                        exit
                        ;;
		--vi)
                        view="$2"
                        vi /opt/zimbra/data/postfix/spool/hold/$view
                        shift
                        exit
                        ;;

                -h|--help)
                        usage
                        exit 0
                        ;;
        esac

        shift
done

exit 0

Reference

https://www.ijs.si/software/amavisd/#faq-trouble
https://www.ijs.si/software/amavisd/amavisd-new-magdeburg-20050519.pdf
https://www.ijs.si/software/amavisd/amavisd-new-docs.html#max_requests
https://wiki.zimbra.com/index.php?title=Troubleshooting_Course_Content_Rough_Drafts-Zimbra_Logs#Postfix

More articles written by me, https://wiki.zimbra.com/wiki/JDunphy-Notes

Jump to: navigation, search