Ajcody-Testing-Debugging: Difference between revisions

Line 124: Line 124:


See [[Ajcody-Testing-Debugging#jmap_And_JDK_1.6]]
See [[Ajcody-Testing-Debugging#jmap_And_JDK_1.6]]
====Note About Memory And Linux====
In general, memory used by the JVM can occasionally spike above the maximum heap size under normal usage, but we would expect that it would go back down again eventually. Also, please note that all Linux systems will utilize all memory available on the system - this is because the kernel will cache any data it can in order for possible reuse; this is perfectly normal, and in the case that memory is needed, older cache will be deleted from memory and memory freed for other use. No Linux system will show much memory as "free" after being up for some period of time, for example when using top or vmstat. All memory will get utilized by the system, and this is a good thing for performance - you paid for the memory, might as well use it. Where it could be a problem is if one process grows uncontrolled, or if swapping is occurring.


===Thread Dump Specific Commands===
===Thread Dump Specific Commands===

Revision as of 16:31, 20 May 2009

Attention.png - This article is NOT official Zimbra documentation. It is a user contribution and may include unsupported customizations, references, suggestions, or information.

Testing & Debugging Tools

Actual Testing & Debugging Homepage

Please see Ajcody-Testing-Debugging

Various Errors And Log Items One Might See

Logging

See Ajcody-Logging for more complete logging information.

Connections And Or Threads Mentioned In mailbox.log

Tuning Variables Related To Threads And Connections

Resources to "tune" the below variables:

zimbraHttpNumThreads
  • ZCS 5.0 has a single thread pool for both HTTP and HTTPS
    • zmprov gs `zmhostname` zimbraHttpNumThreads
    • zmprov ms `zmhostname` zimbraHttpNumThreads 500
zimbraPop3NumThreads
  • This setting of 300 is able to support a few 10s of thousands of users checking mailing every 8 minutes.
    • zmprov gs `zmhostname` zimbraPop3NumThreads
    • zmprov ms `zmhostname` zimbraPop3NumThreads 300
zimbraImapNumThreads
  • IMAP thread settings.
    • zmprov gs `zmhostname` zimbraImapNumThreads
    • zmprov ms `zmhostname` zimbraImapNumThreads 500
zimbraLmtpNumThreads
  • LMTP thread settings.
    • zmprov gs `zmhostname` zimbraLmtpNumThreads
    • zmprov ms `zmhostname` zimbraLmtpNumThreads 40
mailboxd_java_options
  • JVM (java Virtual Machine options) ZCS 5.0 and later
    • zmlocalconfig mailboxd_java_options
    • zmlocalconfig mailboxd_java_options="-client -XX:NewRatio=2 -Djava.awt.headless=true -XX:MaxPermSize=128m -XX:SoftRefLRUPolicyMSPerMB=1"
        • The above is using the "default" options as an example to modify.
    • Other various configurations for mailboxd_java_options
      • mailboxd_java_options="-client -XX:NewRatio=2 -XX:MaxPermSize=128m -Djava.awt.headless=true -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseParallelGC"
      • mailboxd_java_options="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/some/directory/that/exists/and/is/zimbra/writable"
mailboxd_java_heap_memory_percent
  • JVM Cache setting
    • zmlocalconfig mailboxd_java_heap_memory_percent
    • zmlocalconfig -e mailboxd_java_heap_memory_percent=40
zimbraMessageCacheSize
  • Mailbox message cache. The mailbox server maintains a cache of message bodies (blobs on disk from /opt/zimbra/store). This cache speeds up retrieval of message content for mailapps such as mail.app
    • zmprov gs `zmhostname` zimbraMessageCacheSize
    • zmprov ms `zmhostname` zimbraMessageCacheSize 104857600
zimbra_mysql_connector_maxActive
  • zimbra_mysql_connector_maxActive
    • zmlocalconfig zimbra_mysql_connector_maxActive
    • zmlocalconfig -e zimbra_mysql_connector_maxActive=100

Example Of mailbox.log event:

dbconn - Connection pool is 75% utilized (88 connections out of a maximum of 100 in use).  
   Turn on debug logging for zimbra.dbconn to see stack traces of connections not returned to the pool.
zimbraXMPPEnabled

There is a potential problem with XMPP [Instant Messaging]. XMPP is a beta feature and is not supported. If you're not using XMPP anyways, disabling it is strongly recommended, and XMPP is a beta feature and not supported in a production environment today.

  • Confirm if variable is set to False or True
    • zmprov gacf zimbraXMPPEnabled
  • Set to FALSE to turn off
    • zmprov mcf zimbraXMPPEnabled FALSE
  • Then at some point restart the mailstores.

Bug reference:

Zimbra Tools And Commands

zmstat and zmstat-chart

Resources:

Down And Dirty Example:

zmstat-chart -d /path/to/dest -s /opt/zimbra/zmstat --title "<server> <date>"


zmdiaglog

zmdiaglog which grabs threaddumps, lsof, zmstats, and possibly a heap dump. Prior to 5.0.16, this tool may run a heap dump, which can be interruptive on a running server. From 5.0.16, a heap dump requires "zmdiaglog -a". Also, you can CTRL-C to exit out of the zmdiaglog during the heap dump process without affecting the rest of the dump. You will just need to compress the files manually.

As root:

/opt/zimbra/libexec/zmdiaglog

This will create a zip file with allot of relevant data for troubleshooting issues.

Using JDK 1.6 For zmdiaglog

See Ajcody-Testing-Debugging#jmap_And_JDK_1.6

Note About Memory And Linux

In general, memory used by the JVM can occasionally spike above the maximum heap size under normal usage, but we would expect that it would go back down again eventually. Also, please note that all Linux systems will utilize all memory available on the system - this is because the kernel will cache any data it can in order for possible reuse; this is perfectly normal, and in the case that memory is needed, older cache will be deleted from memory and memory freed for other use. No Linux system will show much memory as "free" after being up for some period of time, for example when using top or vmstat. All memory will get utilized by the system, and this is a good thing for performance - you paid for the memory, might as well use it. Where it could be a problem is if one process grows uncontrolled, or if swapping is occurring.

Thread Dump Specific Commands

zmthrdump - ZCS 5.0.10+

The thread dump from zmthrdump will show which threads are currently in process within the jvm. zmthrdump is useful when you want to grab several thread dumps in short intervals in order to track and look for locked threads. Typically, the thread dumps are useful for tracking what is waiting on what in the thread stack. If the server is also experiencing GC [garbage collection] issue, it is generally better to just have a full zmdiaglog (with valid java memory heap dump) along with the rest of the data it collects.

Resources:

  • zmthrdump help
    • zmthrdump [-h] [-i] [-t <timeout seconds>] [-p <pid file>] [-f <file>] [-o <out-file>]

zmmailboxdmgr And Threaddump - Prior to 5.0.10

Run this command as root

for i in 1 2 3; do /opt/zimbra/libexec/zmmailboxdmgr threaddump; sleep 30; done

/opt/zimbra/log/zmmailboxd.out will reflect the thread dumps.

A Better Way To Thread Dump?

Usage of the below script:

$0 FILENAME -dl [-frames #_ stack_frames] [-id REGEXP] [-all]
$0 FILENAME -dt [-frames #_ stack_frames] [-sort id|state] [-stack REGEXP] [-id REGEXP] [-state REGEXP] [-waiting NUMBER]
frames: controls the # lines of stack trace included in the output
id: only include where the id matches REGEXP
all: include ALL locks (not just ones that other threads are blocked-on)
sort: controls the order threads are printed out (locks always printed in lock ID order)
stack: only include threads where the thread's stack output matches REGEXP
state: only include threads where the thread state (e.g. RUNNABLE) matches REGEXP
waiting: only inclure threads are blocking other threads

Examples:

$0 threads.txt -dt -stack MailboxIndex.java -state RUNNABLE -f 20
-- dumps all RUNNABLE threads with MailboxIndex.java on the stack (1st 20 lines of the stack)
$0 threads.txt -dl -f 0
-- dumps the list of locks in the system that are blocking at least one thread
$0 threads.txt -dt -f 0 -sort state
-- dumps a list of all the threads in the system and tells you their run state
$0 threads.txt -dt -f 20 -w 1
-- dumps a list of all the threads in the system that are blocking other threads (quick way to hunt for contention)

Working Example:

threadDumpAnalyzer.pl zmmailboxd.out -dl -f 100 ~/Desktop
threadDumpAnalyzer.pl Script
#!/usr/bin/perl -w
# 
# ***** BEGIN LICENSE BLOCK *****
# 
# Zimbra Collaboration Suite Server
# Copyright (C) 2004, 2005, 2006, 2007 Zimbra, Inc.
# 
# The contents of this file are subject to the Yahoo! Public License
# Version 1.0 ("License"); you may not use this file except in
# compliance with the License.  You may obtain a copy of the License at
# http://www.zimbra.com/license.
# 
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied.
# 
# ***** END LICENSE BLOCK *****
# 
use strict;
use warnings;
use Getopt::Long;

my %locks;                      # by LockID - hash w/ 'owner','type'
my %threads;           # by threadId - hash w/ 'state' 'waitingOnLock'

my $filename = $ARGV[0];
my ($dumpLocks, $dumpThreads, $searchThreadStack, $searchThreadId, $stackFrames, $sort, $filterByState, $allLocks);
my ($waiting);

$stackFrames = 10;
$sort = "state";

GetOptions(
           "dl" => \$dumpLocks,
           "dt" => \$dumpThreads,
           "frames=s" => \$stackFrames,
           "sort=s" => \$sort,
           "id=s" => \$searchThreadId,
           "state=s" => \$filterByState,
           "stack=s" => \$searchThreadStack,
           "waiting=s" => \$waiting,
           "all" => \$allLocks,
          );

sub dumpLocks();
sub handleThread();
sub dumpThreads();
sub usage();
sub readFile($);
sub getBlockedThreads($);

if (!defined $filename) {
  usage();
}

readFile($filename);

if (defined $dumpLocks) {
  dumpLocks();
} elsif (defined $dumpThreads) {
  dumpThreads();
} else {
  usage();
}

exit(0);

sub usage() {
  my $usage = <<END_OF_USAGE;
    
USAGE:
    $0 FILENAME -dl [-frames #_ stack_frames] [-id REGEXP] [-all]
    $0 FILENAME -dt [-frames #_ stack_frames] [-sort id|state] [-stack REGEXP] [-id REGEXP] [-state REGEXP] [-waiting NUMBER]

    frames:  controls the # lines of stack trace included in the output
    id:      only include where the id matches REGEXP

    all:     include ALL locks (not just ones that other threads are blocked-on)

    sort:    controls the order threads are printed out (locks always printed in lock ID order)
    stack:   only include threads where the thread's stack output matches REGEXP
    state:   only include threads where the thread state (e.g. RUNNABLE) matches REGEXP
    waiting: only inclure threads are blocking other threads

 Examples:

    $0 threads.txt -dt -stack MailboxIndex.java -state RUNNABLE -f 20
        -- dumps all RUNNABLE threads with MailboxIndex.java on the stack (1st 20 lines of the stack)

    $0 threads.txt -dl -f 0
        -- dumps the list of locks in the system that are blocking at least one thread

    $0 threads.txt -dt -f 0 -sort state
        -- dumps a list of all the threads in the system and tells you their run state

    $0 threads.txt -dt -f 20 -w 1
        -- dumps a list of all the threads in the system that are blocking other threads (quick way to hunt for contention)
    

END_OF_USAGE
  die $usage;
}

sub mySort($$) {
  my ($a, $b) = @_;
  if ($sort eq "state") {
    my $state1 = $threads{$a}{state};
    my $state2 = $threads{$b}{state};
    if (!defined $state1) {
      return 1;
    } elsif (!defined $state2) {
      return -1;
    }
    return $state1 cmp $state2;
  } else {
    return $a cmp $b;
  }
}

sub padToWidth($$) {
  (my $str, my $width) = @_;
  if (!defined($str)) {
    $str = "";
  }
  return sprintf "%-".$width.".".$width."s", $str;
}

sub formatStackTrace($$) {
  my ($stack, $indent) = @_;
  my $ret;
  if ($stackFrames == 0) { return ""; }
  my $num = $stackFrames;
  
  foreach my $line (split /\n/, $stack) {
    if ($num <= 0) {
      return $ret;
    }
    $ret .= $indent.$line."\n";
    $num--;
  }
  return $ret;
}

sub formatLock($) {
  my $lockId = shift;
  my $output = "$lockId - ";
  my $numData = 0;
  for my $data (keys %{ $locks{$lockId}}) {
    $output .= " $data=";
    $output .= $locks{$lockId}{$data};
    $numData++;
  }
  if ($numData == 0) { $output .= " UNKNOWN"; }
  return $output;
}

sub formatThread($) {
  my $threadId = shift;
  if (!defined $threadId) { return ""; }
  my $foo = padToWidth($threadId, 50);
  if (!defined($foo) || $foo eq "") {
    $foo = "ASDF";
  }
  my $bar = $threads{$threadId}{state};
  if (!defined($bar) || $bar eq "") {
    $bar = "HJKL";
  }
  my $ret = $foo.$bar."\n";
  if (defined $threads{$threadId}{waitingOnLock}) {
    $ret .= "\tWaiting for: ".formatLock($threads{$threadId}{waitingOnLock})."\n";
  }

  my @blockedThreads = getBlockedThreads($threadId);
  for my $blockedThread (sort @blockedThreads) {
    $ret .= "\t$blockedThread is waiting on this thread\n";
  }
  
  $ret .= formatStackTrace($threads{$threadId}{stack}, "\t  ");
  if ($stackFrames > 0) {
    $ret .= "\n";
  }
  return $ret;
}

# given a lockId, return a list of the threads that are blocked on it
sub getLockWaiters($) {
  my $lockId = shift;
  my @ret;

  foreach my $threadId ( keys %threads ) {
    if (defined $threads{$threadId}{waitingOnLock}) {
      if ($threads{$threadId}{waitingOnLock} eq $lockId) {
        push @ret, $threadId;
      }
    }
  }
  return @ret;
}

sub getLockOwner($) {
  my $lockId = shift();
  if (defined $locks{$lockId}) {
    if (defined $locks{$lockId}{owner}) {
      return $locks{$lockId}{owner};
    } else {
      return "";
    }
  } else {
    return "";
  }
}

# given a threadId, get a list of all other threads that are blocked
# on locks it is holding
sub getBlockedThreads($) {
  my $threadId = shift;
  my @ret;
  foreach my $lockId ( sort keys %locks ) {
    if (getLockOwner($lockId) eq $threadId) { # a lock we own
      my @blockedThreads = getLockWaiters($lockId); 
      foreach my $blockedThread (@blockedThreads) {
        push @ret, $blockedThread;
        push @ret, getBlockedThreads($blockedThread); #recurse!
      }
    }
  }
  return @ret;
}

sub dumpLocks() {
  foreach my $lockId ( sort keys %locks ) {
    my $ret = "";
    my $numWaiters = 0;
    $ret .=  "LOCK: $lockId   ";
    for my $data (keys %{ $locks{$lockId}}) {
      $ret .=  "$data=";
      $ret .=  $locks{$lockId}{$data};
      $ret .=  ", ";
    }
    $ret .= "\n";
    foreach my $threadId ( keys %threads ) {
      if (defined $threads{$threadId}{waitingOnLock}) {
        if ($threads{$threadId}{waitingOnLock} eq $lockId) {
          $ret .= "\tThread $threadId is waiting for this lock\n";
          $numWaiters++;
        }
      }
    }
    $ret .= formatStackTrace($threads{getLockOwner($lockId)}{stack}, "\t");

    if ((!defined $searchThreadId) || ($lockId =~ /$searchThreadId/)) {
      if ($numWaiters > 0 || defined $allLocks) {
        if ($stackFrames > 0) {
          $ret .= "\n";
        }
        print $ret;
      }
    }
  }
}

sub dumpThreads() {
  foreach my $threadId ( sort { mySort($a, $b) } keys %threads ) {
    if (!defined $threadId) {
      # continue
    } elsif (defined $searchThreadStack && !($threads{$threadId}{stack} =~ /$searchThreadStack/)) {
      # continue
    } elsif (defined $searchThreadId && !($threadId =~ /$searchThreadId/)) {
      # continue
    } elsif (defined $filterByState && !($threads{$threadId}{state} =~ /$filterByState/)) {
      # continue
    } elsif (defined $waiting && (getBlockedThreads($threadId) < $waiting)) {
      # continue
    } else {
      print formatThread($threadId);
    }
  }
}

sub readFile($) {
  my $filename = shift;
  
  open IN, "<$filename" or die "couldn't open $filename";
  
  my @curThread;

  while (<>) {
    chomp;
    s/\r//g;
    if ($_ eq "") {
      if (@curThread) {
        my $threadId;
        my @locksHeld;
        my $waitingOnLock;
        my $threadState;
        my $output;
        
        my $firstLineState;
        
        # 1stline
        my $line = shift @curThread;
        $output .= $line."\n";
        if ($line =~ /"(.*)"/) {
          $threadId = $1;
          if ($line =~/nid=0x[0-9a-f]+\s([a-zA-Z\s\.()]+)/) {
            $threads{$threadId}{state} = $1;
          }

        } else {
          $threadId = $line;
        }
        if ($threadId eq "") { $threadId = "none"; }
        
        # 2nd line
        $line = shift @curThread;
        if (defined $line) {
          $output .= $line."\n";
          if ($line =~ /State: ([A-Z_]+)/) {
            $threadState = $1;
            $threads{$threadId}{state} = $1;
          }
          
          foreach $line (@curThread) {
            $output .= $line."\n";
            if ($line =~ /locked <(0x[0-9a-f]+)>\s?(.*)?/) {
              push @locksHeld, $1;
              
              $locks{$1}{owner} = $threadId;
              $locks{$1}{type} = $2;
              
            } elsif ($line =~ /- waiting to lock <(0x[0-9a-f]+)>/) {
              $waitingOnLock = $1;
              $threads{$threadId}{waitingOnLock} = $1;
            } elsif ($line =~ /- waiting to lock <(0x[0-9a-f]+)>/) {
              $waitingOnLock = $1;
              $threads{$threadId}{waitingOnLock} = $1;              
            } 
          }
        } else {
          $threads{$threadId}{state} = "unknown";
        }
        
        $threads{$threadId}{stack} = $output;
        undef @curThread;
      }
    } else {
      push @curThread, $_;
    }
  }
  
  close IN;
}

Heap Dump Specific Commands

jmap And JDK 1.6

Go here:

http://java.sun.com/javase/downloads/index.jsp

Select:

  • Java SE Development Kit (JDK)
  • JDK 6 Update 13

Choose Example Is Using x64 Binary - Adjust If Using x32:

  • Linux x64
  • jdk-6u13-linux-x64.bin *DO NOT choose RPM download*

Place jdk-6u13-linux-x64.bin in /tmp on the ZCS server.

Install JDK 1.6 as root. Executing the bin file prompt you to agree to a license agreement. The installer will create a directory named jdk1.6.0_13 and extract contents to that directory. This process does not interfere with ZCS.

cd /opt/zimbra
/tmp/jdk-6u13-linux-x64.bin
chown -R zimbra:zimbra jdk1.6.0_13

Next, stop ZCS services with zmcontrol, destroy the java symlink and symlink JDK 1.6 to java.

su - zimbra
zmcontrol stop
exit
rm -rf java
ln -s jdk1.6.0_13 java
zmcontrol start

You can now run your zmdiaglog with greater efficiency. Alternatively, you can generate heap dumps manually with jmap (as zimbra this time).

su - zimbra -c '/opt/zimbra/java/bin/jmap -dump:live,file=/tmp/heapdump.live $(cat /opt/zimbra/log/zmmailboxd_java.pid)'

Rollback at any time by shutting down services with zmcontrol and pointing the java symlink back to jdk1.5.

zmdbintegrityreport

Resources:

Non-Zimbra Tools

Caution : None of this is supported or used directly when engaged with the support team. It might prove useful still the same though and depending on how much I can document around these tools - who know, maybe we can start using them more directly in the support process.

Debugging Tools

  • Opensourcetesting.org - General Site with News and Tools
  • MAT - Memory Analyzer Tool
    • MAT w/ Eclipse
      • http://www.eclipse.org/mat/
        • Adding this to the startup settings of zimbra
          • -XX:+HeapDumpOnOutOfMemoryError
          • -XX:HeapDumpPath=/opt/zimbra/log/dumps/
        • dumps out of mem heap. That file you can then load into MAT.

Automate Testing / Load Testing

Jump to: navigation, search