From Zimbra :: Wiki

Attention: This article is NOT official Zimbra documentation. It is a user contribution and may include unsupported customizations, references, suggestions, or information.


Testing & Debugging Tools

Actual Testing & Debugging Homepage

Please see Ajcody-Testing-Debugging

Other Debug References

When Was A ZCS Service Enabled Or Disabled

See Ajcody-Notes-Archive-Discovery#When_Was_A_ZCS_Service_Enabled_Or_Disabled

Ajcody Logging Wiki

See Ajcody-Logging

Other Zimbra Wiki Pages

RFE's Related To Better Logging And Historical Data Of Systems

  • Bug 93954 : RFE: admin console services status page and service enable/disable page improvements
  • Bug 93950 : RFE: History files written when changes to individual ZCS services have taken place
  • Bug 93784 : Proxy check script
  • Bug 93656 : zmdiaglog to collect history or diffs of server variables changes via backups
  • Bug 90598 : RFE: Add option to let the MTA log a mail's subject
  • Bug 90400 : Provide User's Mailbox Summary/Reporting for the end user in ZWC/email report
  • Bug 90032 : zmdiaglog: don't attempt to process mailboxd collection activities on non-mailbox nodes
  • Bug 89860 : gather additional system information in zmdiaglog
  • Bug 89700 : proxy logs goto logger host
  • Bug 89450 : Include postfix-logwatch and amavis-logwatch
  • Bug 89008 : Source Port Logging in Jetty For EU Legal Requirements
    • This bug was initially created as a clone of Bug #85073, include client (source) ports for all mail related connections, in addition to client IP. Need this done for Jetty, nginx and postfix already complete.
  • Bug 88414 : expand 3rd party performance and diagnostic software support
  • Bug 88412 : virtualization env. specific checks and diagnostic scripts
  • Bug 88410 : zmdiaglog gets moved to it's own package, zimbra-diagnostics and we include public package repro to update
  • Bug 88408 : expand zmrcd commands and/or zmdiaglog to do more specific level of log collection
  • Bug 88406 : proxy diagnostic script
  • Bug 88404 : post-installer check script
  • Bug 85523 : Individual logs for zmlocalconfig edits and all services start, stop, restarts
  • Bug 85520 : All backups should include the localconfig.xml
  • Bug 85073 : Source Port Logging For EU Legal Requirements
  • Bug 84313 : zmcontrol should give more details when a service isn't in a running state
  • Bug 84273 : Expand zmdiaglog to better fit as a support case creation tool
    • I have referenced a lot of other bugs/RFEs in this one as well.
    • Private Bug
  • Bug 84272 : Host our own ZCS instance for customers support accounts
    • Private Bug
  • Bug 84271 : Integration of account logins for ftp for support cases
    • Private Bug
  • Bug 84142 : Add script for detecting compromised accounts
  • Bug 83779 : zmdiaglog: include all ZCS configs which are different from the default value
  • Bug 83227 : *.trace.log - what's it for, how to control growth, better retention policy
  • Bug 81201 : zmcontrol start --debug offer verbose logging for all services and notes logs
  • Bug 79883 : Expand Monitor > MobileSync Statistics reporting
  • Bug 78251 : RFE: Display Date And Time for each command
  • Bug 77076 : All Ext. Auth options should have documentation about limitations and expectations
  • Bug 77075 : Flowchart of ext. auth and where log events occur
  • Bug 76540 : RFE: Priovide ability to view major logs, edit configs and terminal login within admin console
  • Bug 75502 : Logging should state if any stop/start/restart of service is manual or automated
  • Bug 62315 : RFE: integrate log analysis project - AWstats or Analog, Webalizer, etc.
  • Bug 62095 : Incorrect graphs in Administration Monitoring Server Statistics message count, message volume
  • Bug 57871 : RFE: replace the current syslog + flash based log system to a Hyperic HQ + HTML5 based log solution
  • Bug 54034 : RFE: zimbra-attrs.xml & zmprov desc state if immediate, zmprov fc , or service restart needed
  • Bug 51100 : RFE: zmprov variable edits should output if a service restart is need or flush cache
  • Bug 49557 : RFE: Login Format Option other than user@domain ,ex. user+domain
  • Bug 45383 : RFE: Enhance documentation as to where connection logging should be expected - esp Active Sync
  • Bug 28934 : More logging options via admin console
  • Bug 12196 : support for real time counters (snmp, jmx, etc)

Various Errors And Log Items One Might See


See Ajcody-Logging for more complete logging information.

mailbox.log and zimbra.log

Zimbra Support will likely need to see the /opt/zimbra/log/mailbox.log file and possibly /var/log/zimbra.log from the system. These files contain the basic operation data from the system, and can tell us if the server has something seriously wrong with it. They can also be correlated against other collected data to give a complete picture of the workings of the system.

mysql_error.log and myslow.log

These files are both located in /opt/zimbra/log/ on the mail store server. They contain information about the health of the mysql database. If there is data corruption or another problem causing direct mysql errors, events will be logged in mysql_error.log. If certain search requests are taking longer to complete than others, they will be logged in myslow.log.


zmmailboxd.out

This file contains startup information for mailboxd and thread dumps created whenever mailboxd is shut down. If a server goes completely unresponsive and is restarted, the thread dump captured here will tell us whether certain threads are blocking other threads' access to critical data elements. Frequently, slow behavior is caused by these thread locks.
In some cases, it may be necessary to monitor garbage collection and other operations at the Java VM level. To enable this logging, add the following to mailboxd_java_options.
To get current options:
zmlocalconfig mailboxd_java_options
Then append the following to your current set of options: "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime". PLEASE NOTE: in the command below, "YOUR EXISTING OPTIONS" should be replaced with the output from above.
zmlocalconfig -e mailboxd_java_options="YOUR EXISTING OPTIONS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime"
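A minimal shell sketch of doing this append in one step, so the existing options are not retyped by hand. The append_opts helper is ours (hypothetical, not a Zimbra tool); it skips flags that are already present, so re-running it will not duplicate options. The commented application at the end assumes zmlocalconfig's "-m nokey" output mode, which prints just the bare value.

```shell
# Hypothetical helper: append each flag to an option string only if it
# is not already present, so the operation is safe to repeat.
append_opts() {
    existing="$1"; shift
    for flag in "$@"; do
        case " $existing " in
            *" $flag "*) ;;                   # already set, skip
            *) existing="$existing $flag" ;;  # append the missing flag
        esac
    done
    printf '%s\n' "$existing"
}

# On a live mailstore (as zimbra) you would feed it the real current value:
#   NEW=$(append_opts "$(zmlocalconfig -m nokey mailboxd_java_options)" \
#       -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
#       -XX:+PrintGCApplicationStoppedTime)
#   zmlocalconfig -e mailboxd_java_options="$NEW"
```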

Connections And Or Threads Mentioned In mailbox.log

Tuning Variables Related To Threads And Connections

Resources to "tune" the below variables:

  • ZCS 5.0 has a single thread pool for both HTTP and HTTPS
    • To see current setting
      • zmprov gs `zmhostname` zimbraHttpNumThreads
    • To modify setting
      • zmprov ms `zmhostname` zimbraHttpNumThreads 500
  • POP3 thread settings. A setting of 300 can support a few tens of thousands of users checking mail every 8 minutes.
    • To see current setting
      • zmprov gs `zmhostname` zimbraPop3NumThreads
    • To modify setting
      • zmprov ms `zmhostname` zimbraPop3NumThreads 300
  • IMAP thread settings.
    • To see current setting
      • zmprov gs `zmhostname` zimbraImapNumThreads
    • To modify setting
      • zmprov ms `zmhostname` zimbraImapNumThreads 500
  • LMTP thread settings.
    • To see current setting
      • zmprov gs `zmhostname` zimbraLmtpNumThreads
    • To modify setting
      • zmprov ms `zmhostname` zimbraLmtpNumThreads 40
  • JVM (Java Virtual Machine) options, ZCS 5.0 and later
    • To see current setting
      • zmlocalconfig mailboxd_java_options
    • To modify setting
      • zmlocalconfig -e mailboxd_java_options="-client -XX:NewRatio=2 -Djava.awt.headless=true -XX:MaxPermSize=128m -XX:SoftRefLRUPolicyMSPerMB=1"
        • The above is using the "default" options as an example to modify.
    • Other various configurations for mailboxd_java_options. See sections related to heap and thread dumps below for more details.
      • mailboxd_java_options="-client -XX:NewRatio=2 -XX:MaxPermSize=128m -Djava.awt.headless=true -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseParallelGC"
      • mailboxd_java_options="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/some/directory/that/exists/and/is/zimbra/writable"
  • JVM Cache setting
    • To see current setting
      • zmlocalconfig mailboxd_java_heap_memory_percent
    • To modify setting
      • zmlocalconfig -e mailboxd_java_heap_memory_percent=40
  • Mailbox message cache. The mailbox server maintains a cache of message bodies (blobs on disk in /opt/zimbra/store). This cache speeds up retrieval of message content for mail clients such as Mail.app.
    • To see current setting
      • zmprov gs `zmhostname` zimbraMessageCacheSize
    • To modify setting
      • zmprov ms `zmhostname` zimbraMessageCacheSize 104857600
  • zimbra_mysql_connector_maxActive
    • To see current setting
      • zmlocalconfig zimbra_mysql_connector_maxActive
    • To modify setting
      • zmlocalconfig -e zimbra_mysql_connector_maxActive=100

Example Of mailbox.log event:

dbconn - Connection pool is 75% utilized (88 connections out of a maximum of 100 in use).  
   Turn on debug logging for zimbra.dbconn to see stack traces of connections not returned to the pool.
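When watching for this message, it can help to pull just the utilization percentage out of matching lines. A small sketch; pool_pct is a hypothetical helper, not part of ZCS, and the log path in the usage comment is the one from this article:

```shell
# Hypothetical helper: extract the utilization percentage from dbconn
# pool-warning lines in mailbox.log.
pool_pct() {
    sed -n 's/.*Connection pool is \([0-9]\{1,\}\)% utilized.*/\1/p'
}

# Usage on a live system:
#   tail -f /opt/zimbra/log/mailbox.log | pool_pct
```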

There is a potential problem with XMPP [Instant Messaging]. XMPP is a beta feature and is not supported in a production environment today. If you're not using XMPP anyway, disabling it is strongly recommended.

  • Confirm if variable is set to False or True
    • zmprov gacf zimbraXMPPEnabled
  • Set to FALSE to turn off
    • zmprov mcf zimbraXMPPEnabled FALSE
  • Then at some point restart the mailstores.

Bug reference:

Zimbra Tools And Commands

zmstat and zmstat-chart


Down And Dirty Example zmstat

zmstat runs constantly on all current Zimbra systems. It monitors various Zimbra components, as well as the whole system, to give a good picture of how the system is performing over time. Stats charts are extremely useful for troubleshooting performance issues and can often point to a bottleneck on the system itself or to specific problems in the mailboxd Java VM.

To generate a stat chart, run the following:

zmstat-chart -d <output directory> -s /opt/zimbra/zmstat/<date> --title "<server> <date>"

The date is in the format 'YYYY-MM-DD'. Generally when troubleshooting a performance problem, Zimbra Support will need to see several days' worth of statistics data, as well as the log files and possibly thread dumps from the same time period.
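For repeat use, the template above can be filled in by a tiny wrapper. This is a hypothetical convenience (chart_cmd is not a Zimbra tool); it just prints the zmstat-chart command for a given host and day so you can review it before running it:

```shell
# Hypothetical wrapper: print the zmstat-chart command for one host/day.
chart_cmd() {
    host="$1"; day="$2"; out="$3"
    printf 'zmstat-chart -d %s -s /opt/zimbra/zmstat/%s --title "%s %s"\n' \
        "$out" "$day" "$host" "$day"
}

# Review the printed command, then run it, e.g.:
#   chart_cmd `zmhostname` 2009-01-15 /tmp/chart-2009-01-15
```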


See also : Zmdiaglog

Running a full zmdiaglog collects many diagnostics, such as zmstat data, a thread dump, lsof output, a core dump, and a heap dump. The heap dump portion can take longer than the other steps and is invasive on the system; if possible, it should only be run late in the evening or when the server is not responding anyway. Prior to 5.0.16, this tool may run a heap dump, which would be disruptive on a running server. From 5.0.16, a heap dump requires "zmdiaglog -a". Also, you can CTRL-C out of zmdiaglog during the heap dump process without affecting the rest of the dump; you will just need to compress the files manually.

As root:

/opt/zimbra/libexec/zmdiaglog

This will create a zip file with a lot of relevant data for troubleshooting issues.

zmdiaglog To Get Full Heap Dump

To get a full heap dump, you must first make sure that heap dumps are enabled in the JVM options. To do this, first check your mailboxd_java_options on that server:

# su - zimbra
$ zmlocalconfig mailboxd_java_options

Then, append these additional arguments:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/some/directory/that/exists/and/is/zimbra/writable

It is very important that the HeapDumpPath chosen is large enough to dump a multi-GB core file and heap dump. It is also very important to make sure that these options are appended to your existing options.

Example [You might need to adjust based upon your existing mailboxd_java_options]:

$ zmlocalconfig mailboxd_java_options
mailboxd_java_options = -client -XX:NewRatio=2 -Djava.awt.headless=true -XX:MaxPermSize=128m -XX:SoftRefLRUPolicyMSPerMB=1
$ zmlocalconfig -e mailboxd_java_options="-client -XX:NewRatio=2 -Djava.awt.headless=true -XX:MaxPermSize=128m -XX:SoftRefLRUPolicyMSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"

Then restart ZCS on that mailstore.

Caution: If the HeapDumpPath location (/tmp in the example above) does not have enough space, the dump could fill the filesystem and cause very serious problems!

Then, if any problem with performance occurs again, run the zmdiaglog as root:

[Prior to 5.0.16]

/opt/zimbra/libexec/zmdiaglog
[For 5.0.16 and forward]

/opt/zimbra/libexec/zmdiaglog -a

Note: zmdiaglog has optional flags that allow it to write to locations other than /tmp; again, it is very important not to fill up the filesystem. Run zmdiaglog -h to get the list of options:

# /opt/zimbra/libexec/zmdiaglog -h
Usage: zmdiaglog [-dth]
   -d    - Log destination (Default /tmp/zmdiaglog.PID)
   -t    - Timeout in seconds for hanging commands (Default 120)
   -h    - This help message

This may take a few minutes to complete, as it is writing a multi-GB file. Once complete, restart mailboxd:

# su - zimbra
$ zmmailboxdctl restart

You will then provide the zmdiaglog output written to /tmp to Zimbra via FTP. Our developers, rather than support staff, will most likely need to debug the information provided. Please be patient; support will pass along feedback as they hear conclusions from the developers.

There are lots of steps requested here, but this information is critical if we are to diagnose this reported problem. The data provided by zmdiaglog is very important.

Using JDK 1.6 For zmdiaglog

See Ajcody-Testing-Debugging#jmap_And_JDK_1.6

Note About Memory And Linux

In general, memory used by the JVM can occasionally spike above the maximum heap size under normal usage, but we would expect that it would go back down again eventually. Also, please note that all Linux systems will utilize all memory available on the system - this is because the kernel will cache any data it can in order for possible reuse; this is perfectly normal, and in the case that memory is needed, older cache will be deleted from memory and memory freed for other use. No Linux system will show much memory as "free" after being up for some period of time, for example when using top or vmstat. All memory will get utilized by the system, and this is a good thing for performance - you paid for the memory, might as well use it. Where it could be a problem is if one process grows uncontrolled, or if swapping is occurring.

Thread Dumps And Specific Commands To Use

A thread dump is a printout of the status of all the running threads in the mailboxd process at a specific point in time. A thread dump allows Zimbra engineers to see how the system is operating, what each thread is doing, and what data elements are being accessed by individual threads. If a performance bottleneck is not identifiable from the stat charts alone, it may be necessary to generate a periodic thread dump.

zmthrdump - ZCS 5.0.10+

The thread dump from zmthrdump shows which threads are currently running within the JVM. zmthrdump is useful when you want to grab several thread dumps at short intervals in order to track down locked threads. Typically, thread dumps are useful for tracking what is waiting on what in the thread stack. If the server is also experiencing GC [garbage collection] issues, it is generally better to just run a full zmdiaglog (with a valid Java memory heap dump) along with the rest of the data it collects.


  • zmthrdump help
    • zmthrdump [-h] [-i] [-t <timeout seconds>] [-p <pid file>] [-f <file>] [-o <out-file>]

zmmailboxdmgr And Threaddump - Prior to 5.0.10

Run this command as root

for i in 1 2 3; do /opt/zimbra/libexec/zmmailboxdmgr threaddump; sleep 30; done

/opt/zimbra/log/zmmailboxd.out will reflect the thread dumps.
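Once you have captured dumps, a quick way to gauge contention is to count how many threads are blocked waiting on a monitor. A sketch: blocked_count is our own helper, and the "waiting to lock" marker is standard Java thread-dump output:

```shell
# Our own quick check: count threads blocked on a monitor in a dump file.
blocked_count() {
    grep -c -e '- waiting to lock' "$1"
}

# Usage:
#   blocked_count /opt/zimbra/log/zmmailboxd.out
```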

Thread Dump Script Example 1

Usage of the below script:

$0 FILENAME -dl [-frames #_ stack_frames] [-id REGEXP] [-all]
$0 FILENAME -dt [-frames #_ stack_frames] [-sort id|state] [-stack REGEXP] [-id REGEXP] [-state REGEXP] [-waiting NUMBER]
frames: controls the # lines of stack trace included in the output
id: only include where the id matches REGEXP
all: include ALL locks (not just ones that other threads are blocked-on)
sort: controls the order threads are printed out (locks always printed in lock ID order)
stack: only include threads where the thread's stack output matches REGEXP
state: only include threads where the thread state (e.g. RUNNABLE) matches REGEXP
waiting: only include threads that are blocking other threads


$0 threads.txt -dt -stack MailboxIndex.java -state RUNNABLE -f 20
-- dumps all RUNNABLE threads with MailboxIndex.java on the stack (1st 20 lines of the stack)
$0 threads.txt -dl -f 0
-- dumps the list of locks in the system that are blocking at least one thread
$0 threads.txt -dt -f 0 -sort state
-- dumps a list of all the threads in the system and tells you their run state
$0 threads.txt -dt -f 20 -w 1
-- dumps a list of all the threads in the system that are blocking other threads (quick way to hunt for contention)

Working Example:

threadDumpAnalyzer.pl zmmailboxd.out -dl -f 100 ~/Desktop
threadDumpAnalyzer.pl Script

#!/usr/bin/perl -w
# Zimbra Collaboration Suite Server
# Copyright (C) 2004, 2005, 2006, 2007 Zimbra, Inc.
#
# The contents of this file are subject to the Yahoo! Public License
# Version 1.0 ("License"); you may not use this file except in
# compliance with the License.  You may obtain a copy of the License at
# http://www.zimbra.com/license.
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied.
# ***** END LICENSE BLOCK *****
use strict;
use warnings;
use Getopt::Long;

my %locks;      # by lockId - hash w/ 'owner','type'
my %threads;    # by threadId - hash w/ 'state', 'waitingOnLock', 'stack'

my $filename = $ARGV[0];
my ($dumpLocks, $dumpThreads, $searchThreadStack, $searchThreadId, $stackFrames, $sort, $filterByState, $allLocks);
my ($waiting);

$stackFrames = 10;
$sort = "state";

GetOptions("dl" => \$dumpLocks,
           "dt" => \$dumpThreads,
           "frames=s" => \$stackFrames,
           "sort=s" => \$sort,
           "id=s" => \$searchThreadId,
           "state=s" => \$filterByState,
           "stack=s" => \$searchThreadStack,
           "waiting=s" => \$waiting,
           "all" => \$allLocks,
          );

sub dumpLocks();
sub dumpThreads();
sub usage();
sub readFile($);
sub getBlockedThreads($);

if (!defined $filename) {
  usage();
}

readFile($filename);

if (defined $dumpLocks) {
  dumpLocks();
} elsif (defined $dumpThreads) {
  dumpThreads();
} else {
  usage();
}

sub usage() {
  my $usage = <<END_OF_USAGE;
    $0 FILENAME -dl [-frames #_stack_frames] [-id REGEXP] [-all]
    $0 FILENAME -dt [-frames #_stack_frames] [-sort id|state] [-stack REGEXP] [-id REGEXP] [-state REGEXP] [-waiting NUMBER]

    frames:  controls the # lines of stack trace included in the output
    id:      only include where the id matches REGEXP

    all:     include ALL locks (not just ones that other threads are blocked-on)

    sort:    controls the order threads are printed out (locks always printed in lock ID order)
    stack:   only include threads where the thread's stack output matches REGEXP
    state:   only include threads where the thread state (e.g. RUNNABLE) matches REGEXP
    waiting: only include threads that are blocking other threads

    $0 threads.txt -dt -stack MailboxIndex.java -state RUNNABLE -f 20
        -- dumps all RUNNABLE threads with MailboxIndex.java on the stack (1st 20 lines of the stack)

    $0 threads.txt -dl -f 0
        -- dumps the list of locks in the system that are blocking at least one thread

    $0 threads.txt -dt -f 0 -sort state
        -- dumps a list of all the threads in the system and tells you their run state

    $0 threads.txt -dt -f 20 -w 1
        -- dumps a list of all the threads in the system that are blocking other threads (quick way to hunt for contention)
END_OF_USAGE

  die $usage;
}

sub mySort($$) {
  my ($a, $b) = @_;
  if ($sort eq "state") {
    my $state1 = $threads{$a}{state};
    my $state2 = $threads{$b}{state};
    if (!defined $state1) {
      return 1;
    } elsif (!defined $state2) {
      return -1;
    }
    return $state1 cmp $state2;
  } else {
    return $a cmp $b;
  }
}

sub padToWidth($$) {
  my ($str, $width) = @_;
  if (!defined($str)) {
    $str = "";
  }
  return sprintf "%-*s", $width, $str;
}

sub formatStackTrace($$) {
  my ($stack, $indent) = @_;
  my $ret = "";
  if (!defined $stack) { return ""; }
  if ($stackFrames == 0) { return ""; }
  my $num = $stackFrames;
  foreach my $line (split /\n/, $stack) {
    if ($num <= 0) {
      return $ret;
    }
    $num--;
    $ret .= $indent.$line."\n";
  }
  return $ret;
}

sub formatLock($) {
  my $lockId = shift;
  my $output = "$lockId - ";
  my $numData = 0;
  while (my ($k,$v) = each %{ $locks{$lockId} }) {
    $output .= " $k=$v";
    $numData++;
  }
  if ($numData == 0) { $output .= " UNKNOWN"; }
  return $output;
}

sub formatThread($) {
  my $threadId = shift;
  if (!defined $threadId) { return ""; }
  my $foo = padToWidth($threadId, 50);
  if (!defined($foo) || $foo eq "") {
    $foo = "ASDF";
  }
  my $bar = $threads{$threadId}{state};
  if (!defined($bar) || $bar eq "") {
    $bar = "HJKL";
  }
  my $ret = $foo.$bar."\n";
  if (defined $threads{$threadId}{waitingOnLock}) {
    $ret .= "\tWaiting for: ".formatLock($threads{$threadId}{waitingOnLock})."\n";
  }

  my @blockedThreads = getBlockedThreads($threadId);
  for my $blockedThread (sort @blockedThreads) {
    $ret .= "\t$blockedThread is waiting on this thread\n";
  }
  $ret .= formatStackTrace($threads{$threadId}{stack}, "\t  ");
  if ($stackFrames > 0) {
    $ret .= "\n";
  }
  return $ret;
}

# given a lockId, return a list of the threads that are blocked on it
sub getLockWaiters($) {
  my $lockId = shift;
  my @ret;

  while (my ($threadId,$thread) = each %threads ) {
    if (defined $thread->{waitingOnLock} &&
        $thread->{waitingOnLock} eq $lockId) {
      push @ret, $threadId;
    }
  }
  return @ret;
}

sub getLockOwner($) {
  my $lockId = shift;
  if (defined $locks{$lockId} &&
      defined $locks{$lockId}{owner}) {
    return $locks{$lockId}{owner};
  } else {
    return "";
  }
}

# given a threadId, get a list of all other threads that are blocked
# on locks it is holding
sub getBlockedThreads($) {
  my $threadId = shift;
  my @ret;
  foreach my $lockId ( sort keys %locks ) {
    if (getLockOwner($lockId) eq $threadId) { # a lock we own
      my @blockedThreads = getLockWaiters($lockId);
      foreach my $blockedThread (@blockedThreads) {
        push @ret, $blockedThread;
        push @ret, getBlockedThreads($blockedThread); # recurse!
      }
    }
  }
  return @ret;
}

sub dumpLocks() {
  foreach my $lockId ( sort keys %locks ) {
    my $ret = "";
    my $numWaiters = 0;
    $ret .=  "LOCK: $lockId   ";
    while (my ($k,$v) = each %{ $locks{$lockId} }) {
      $ret .=  "$k=$v, ";
    }
    $ret .= "\n";
    while (my ($threadId,$thread) = each %threads ) {
      if (defined $thread->{waitingOnLock} &&
          $thread->{waitingOnLock} eq $lockId) {
        $ret .= "\tThread $threadId is waiting for this lock\n";
        $numWaiters++;
      }
    }
    $ret .= formatStackTrace($threads{getLockOwner($lockId)}{stack}, "\t");

    if (!defined $searchThreadId || $lockId =~ /$searchThreadId/) {
      if ($numWaiters > 0 || defined $allLocks) {
        if ($stackFrames > 0) {
          $ret .= "\n";
        }
        print $ret;
      }
    }
  }
}

sub dumpThreads() {
  foreach my $threadId ( sort { mySort($a, $b) } keys %threads ) {
    if (!defined $threadId) {
      # skip
    } elsif (defined $searchThreadStack && $threads{$threadId}{stack} !~ /$searchThreadStack/) {
      # skip
    } elsif (defined $searchThreadId && $threadId !~ /$searchThreadId/) {
      # skip
    } elsif (defined $filterByState && ($threads{$threadId}{state} || '') !~ /$filterByState/) {
      # skip
    } elsif (defined $waiting && getBlockedThreads($threadId) < $waiting) {
      # skip
    } else {
      print formatThread($threadId);
    }
  }
}

sub readFile($) {
  my $filename = shift;
  open IN, '<', $filename or die "couldn't open $filename";
  my @curThread;

  while (<IN>) {
    chomp;
    if ($_ eq "") {
      if (@curThread) {
        my $threadId;
        my @locksHeld;
        my $waitingOnLock;
        my $threadState;
        my $output = "";
        # 1st line
        my $line = shift @curThread;
        $output .= "$line\n";
        if ($line =~ /"(.*)"/) {
          $threadId = $1;
          if ($line =~ /nid=0x[0-9a-f]+\s([a-zA-Z\s\.()]+)/) {
            $threads{$threadId}{state} = $1;
          }
        } else {
          $threadId = $line;
        }
        if ($threadId eq "") { $threadId = "none"; }
        # 2nd line
        $line = shift @curThread;
        if (defined $line) {
          $output .= "$line\n";
          if ($line =~ /State: ([A-Z_]+)/) {
            $threadState = $1;
            $threads{$threadId}{state} = $1;
          }
          foreach $line (@curThread) {
            $output .= "$line\n";
            if ($line =~ /locked <(0x[0-9a-f]+)>\s?(.*)?/) {
              push @locksHeld, $1;
              $locks{$1}{owner} = $threadId;
              $locks{$1}{type} = $2;
            } elsif ($line =~ /- waiting to lock <(0x[0-9a-f]+)>/) {
              $waitingOnLock = $1;
              $threads{$threadId}{waitingOnLock} = $1;
            } elsif ($line =~ /- waiting on <(0x[0-9a-f]+)>/) {
              $waitingOnLock = $1;
              $threads{$threadId}{waitingOnLock} = $1;
            }
          }
        } else {
          $threads{$threadId}{state} = "unknown";
        }
        $threads{$threadId}{stack} = $output;
        @curThread = ();
      }
    } else {
      push @curThread, $_;
    }
  }
  close IN;
}

Thread Dump Script Example 2

The following script generates five thread dumps within one minute:

#!/bin/bash
# Dump 5 thread dumps and proc stats for the mailboxd Java PID in one minute.
# Daily output written to /tmp/zmperiodic-YYYYMMDD
# Execute the script with cron
# crontab: * * * * * /tmp/zmperiodic.sh

DUMPDIR="/tmp/zmperiodic-$(date '+%Y%m%d')"

if [ ! -d $DUMPDIR ]; then
    mkdir $DUMPDIR
fi

for ((i=0; i<5; i++)); do
    # Truncate the log so each capture contains only its own thread dump
    echo "" > /opt/zimbra/log/zmmailboxd.out
    STAMP=$(date '+%Y%m%d.%H%M%S')
    JPID=$(cat /opt/zimbra/log/zmmailboxd_java.pid)
    kill -3 $JPID    # SIGQUIT tells the JVM to write a thread dump
    sleep 1

    cp /opt/zimbra/log/zmmailboxd.out $DUMPDIR/zmmailboxd.out-$STAMP
    cat /proc/$JPID/task/*/stat > $DUMPDIR/proc-stats-$STAMP

    if [ $i -ne 4 ]; then
        sleep 11
    fi
done

It should be placed in cron so that it runs every minute. The output data will be placed in a directory named /tmp/zmperiodic-YYYYMMDD. This data will consist of thread dumps along with thread data from /proc.

Heap Dump Specific Commands

jmap And JDK 1.6

Go here:



  • Java SE Development Kit (JDK)
  • JDK 6 Update 13

Choose the following (this example uses the x64 binary; adjust if you are using x86):

  • Linux x64
  • jdk-6u13-linux-x64.bin *DO NOT choose RPM download*

Place jdk-6u13-linux-x64.bin in /tmp on the ZCS server.

Install JDK 1.6 as root. Executing the bin file prompts you to agree to a license agreement. The installer will create a directory named jdk1.6.0_13 in the current directory and extract its contents there. This process does not interfere with ZCS.

cd /opt/zimbra
sh /tmp/jdk-6u13-linux-x64.bin
chown -R zimbra:zimbra jdk1.6.0_13

Next, stop ZCS services with zmcontrol, destroy the java symlink and symlink JDK 1.6 to java.

su - zimbra
zmcontrol stop
rm -rf java
ln -s jdk1.6.0_13 java
zmcontrol start

You can now run your zmdiaglog with greater efficiency. Alternatively, you can generate heap dumps manually with jmap (as zimbra this time).

su - zimbra -c '/opt/zimbra/java/bin/jmap -dump:live,file=/tmp/heapdump.live $(cat /opt/zimbra/log/zmmailboxd_java.pid)'
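If a full dump is too heavy for the moment, a class histogram from jmap -histo is a lighter first look at what is filling the heap. A sketch that filters histogram lines by a class-name pattern; histo_grep is a hypothetical helper, and it assumes the usual four-column histogram layout (rank, instances, bytes, class name):

```shell
# Hypothetical helper: keep only histogram rows whose class name ($4)
# matches a pattern. Assumed columns: rank, #instances, #bytes, class name.
histo_grep() {
    awk -v pat="$1" '$4 ~ pat { print }'
}

# Usage on a live mailstore (as zimbra):
#   /opt/zimbra/java/bin/jmap -histo:live \
#       $(cat /opt/zimbra/log/zmmailboxd_java.pid) | histo_grep zimbra
```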

Rollback at any time by shutting down services with zmcontrol and pointing the java symlink back to jdk1.5.



Trouble Shooting Memory Issues

Basic Commands To Gather Data


Try adjusting the ps output:

  • ps -eo pid,ppid,rss,vsize,pcpu,pmem,cmd -ww --sort=pid
    • RSS (Resident Set Size) - The portion of a process that exists in physical memory (RAM). The rest of the program exists in swap. If the computer has not used swap, this number will be equal to VSIZE. RSS per unit time will show how much RAM the process is using over time.
    • VSIZE (Virtual memory SIZE) - The amount of memory the process is currently using. This includes the amount in RAM and the amount in swap. VSIZE per unit time will show how large the process is over time.


Using Free:

  • free -o
    • [ -o ] Disables the display of a "buffer adjusted" line. If the -o option is not specified, free subtracts buffer memory from the used memory and adds it to the free memory reported.
    • "Memory Used" per unit time will show the "Memory Used" asymptotically approach the total amount of memory in the system under heavy use. This is normal, as RAM unused is RAM wasted.
    • "Memory Used - Memory Buffered - Memory Cached" per unit time will give a good sense of the memory use of your applications minus the effects of your operating system. As you start new applications, this value should go up. As you quit applications, this value should go down. If an application has a severe memory leak, this line will have a positive slope.
    • "Swap Used" per unit time will display the swap usage. When the system is low on RAM, a program called kswapd will swap parts of process if they haven't been used for some time. If the amount of swap continues to climb at a steady rate, you may have a memory leak or you might need more RAM.

You could also use watch with free to poll it:

  • watch free
    • or
  • watch -n 1 -d free


top is nice, but you want to save its output. Use the -b (batch) and -n (number of iterations) options:

top -b -n1 > /tmp/top-output.txt

You could add this to cron to capture top output every 5 minutes, for example (you would need to confirm that the directory structure exists and that the permissions allow writing to it):

crontab -e
 ** add the following line
*/5 * * * * top -b -n1 > /opt/zimbra/support/data/tops/top-`date +\%F-\%H-\%M`.txt

This is a nice addition to the Ajcody-Logging#Like_To_Have_zmstat-chart_Data_Integrated_With_Zimbra setup, where you would place the data underneath the root http directory.


vmstat gives a good overview as well:

  • The process-related fields are:
    • r — The number of runnable processes waiting for access to the CPU
    • b — The number of processes in an uninterruptible sleep state
    • w — The number of processes swapped out, but runnable
  • The memory-related fields are:
    • swpd — The amount of virtual memory used
    • free — The amount of free memory
    • buff — The amount of memory used for buffers
    • cache — The amount of memory used as page cache
  • The swap-related fields are:
    • si — The amount of memory swapped in from disk
    • so — The amount of memory swapped out to disk
  • The I/O-related fields are:
    • bi — Blocks received from a block device (blocks read in)
    • bo — Blocks sent to a block device (blocks written out)
  • The system-related fields are:
    • in — The number of interrupts per second
    • cs — The number of context switches per second
  • The CPU-related fields are:
    • us — The percentage of the time the CPU ran user-level code
    • sy — The percentage of the time the CPU ran system-level code
    • id — The percentage of the time the CPU was idle

Polling data with vmstat:

  • vmstat 1 10
    • Displays one new line per second, but only for the next ten seconds
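vmstat lines carry no timestamps, which makes them hard to correlate with log events later. A small sketch that prefixes each line with the current time as it arrives (stamp is a hypothetical helper):

```shell
# Hypothetical helper: prefix every line read on stdin with a timestamp.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Usage:
#   vmstat 1 10 | stamp >> /tmp/vmstat-$(date '+%Y%m%d').log
```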


See below for description of the output for meminfo:

Featured Article: /proc/meminfo Explained

Do the following:

cat /proc/meminfo
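If you only want one or two fields rather than the whole file, a small sketch can pull them out (meminfo_kb is our own helper; it assumes the usual "Key: value kB" layout of /proc/meminfo):

```shell
# Our own helper: pull one numeric field (in kB) out of /proc/meminfo.
meminfo_kb() {
    awk -v k="$1:" '$1 == k { print $2 }'
}

# Usage:
#   meminfo_kb MemTotal < /proc/meminfo
#   meminfo_kb Cached   < /proc/meminfo
```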

Non-Zimbra Tools

Caution : None of this is supported by the Zimbra Support team. However it might prove useful, and depending on how much I can document these tools - who knows, maybe we can start using them more directly in the support process.

Debugging Tools

  • Opensourcetesting.org - General Site with News and Tools
  • MAT - Memory Analyzer Tool
    • MAT w/ Eclipse
      • http://www.eclipse.org/mat/
        • Adding this to the startup settings of zimbra
          • -XX:+HeapDumpOnOutOfMemoryError
          • -XX:HeapDumpPath=/opt/zimbra/log/dumps/
        • This dumps the heap on an out-of-memory error; you can then load that file into MAT.
  • Linux /proc/sys/kernel/core_pattern
    • By setting /proc/sys/kernel/core_pattern, one can run a user-level program before a process dies to gather additional debugging information.
    • A pipe is opened and attached to your script. The kernel writes the coredump to the pipe. Your script can read it and write it to storage.
    • Your script can attach GDB, get a backtrace, run pstack, dump open file descriptors, etc.
    • For detailed explanation + script: http://timetobleed.com/an-obscure-kernel-feature-to-get-more-info-about-dying-processes/
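A minimal sketch of such a handler, under the assumptions above. The path /usr/local/bin/core-handler and the handler itself are hypothetical; the kernel streams the core image to the handler's stdin and substitutes %p and %e on its command line:

```shell
# Hypothetical core-dump handler. Enable with (as root):
#   echo '|/usr/local/bin/core-handler %p %e' > /proc/sys/kernel/core_pattern
core_handler() {
    pid="$1"; exe="$2"; dir="${3:-/tmp}"
    out="$dir/core.$exe.$pid"
    cat > "$out"              # save the core image streamed by the kernel
    printf '%s\n' "$out"      # report where it was written
    # From here you could also attach gdb for a backtrace, e.g.:
    #   gdb /path/to/$exe "$out" -batch -ex bt
}
```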

Automate Testing / Load Testing
