Performance Tuning Guidelines for Large Deployments

Revision as of 07:07, 31 January 2008 by Anand

ZCS 4.5 Article


To achieve top ZCS performance, you may need to modify the operating system (OS) settings on your server and fine-tune ZCS after it is installed. Note: The Best Practices described in this guide are designed for large ZCS deployments, usually sites with more than 2000 users.

Hardware and Operating System

RAM and CPU

ZCS, like all messaging and collaboration systems, is an IO-bound application. Three core components of ZCS, (a) the Java-based message store (aka mailbox server; Tomcat in 4.5.x, Jetty from 5.0 onwards), (b) the MySQL instances used for metadata, and (c) the LDAP servers (master and replica), all rely heavily on caching data in RAM to improve performance and reduce disk IO. For all large installations we recommend at least 8GB of RAM. Our testing shows that a system with the same CPUs and disks supports more users when upgraded from 8GB to 16GB of RAM.

We recommend an x86_64 system with two dual-core CPUs, at a clock speed near the sweet spot of the price/performance curve. Disable hyper-threading if your CPU supports it, because it makes performance monitoring data unreliable. At this time, we have not tested on systems with two quad-core CPUs (coming soon).

Almost all recent 'x86' server CPUs can run either a 32-bit (i686) or a 64-bit (x86_64) version of Linux. We strongly recommend installing the 64-bit version if you have more than 4GB of RAM. With 4GB of RAM or less, it is unclear whether the 64-bit version will boost performance, but you shouldn't really be running a large install on a system with less than 8GB of RAM anyway. If you anticipate adding more RAM during a future maintenance window, install the 64-bit version now: upgrading from 32-bit to 64-bit is possible, but it is a lot of work.

A word of caution is in order about 32-bit kernels and 8GB of RAM. In 32-bit mode, the CPU can address only 4GB of RAM even if you paid for 8GB worth of memory sticks, and, by default, a 32-bit Linux kernel only allows each process to address 2GB of space. With PAE (Physical Address Extension), a feature available in some CPUs, and a special 32-bit kernel that supports a large per-process address space, it is possible to run a 32-bit kernel that really uses more than 4GB of RAM and gives each process a 3-4GB address range. Please avoid this hassle. When there is plenty of RAM, the CPU performs better in 64-bit mode: more CPU registers are available, there is no overhead from PAE, and you get a tested platform.

Disk

Zimbra mailbox servers are read/write intensive, and even with enough RAM/cache, the message store will generate a lot of disk activity. LDAP is read heavy and light on writes, is able to use caches a lot more effectively, and does not generate the type of disk activity that mailbox servers do.

In a mailbox server, most IO activity is generated by these three sources, in decreasing order of load:

  • Lucene search index managed by the Java mailbox process
  • MySQL instance that runs on each message store server, and stores metadata (folders, tags, flags, etc)
  • Blob store managed by the Java mailbox process

MySQL, Lucene and blob stores generate random IO and therefore have to be serviced by a fast disk subsystem. Below are some guidelines around selecting a disk system. Contact pre-sales support for more detailed guidance.

  • NO RAID5. RAID5 (and similar parity-based RAID schemes) gives you capacity, but takes away IO performance. Do not assume that the peak streaming-file throughput quoted for a RAID5 system translates into good performance when it stores a database.
  • NO NFS. In our experience, the world is full of poor NFS implementations (server and client), and sometimes the disks backing the NFS mount are not performant to boot. Also note that many upstream OSS components of Zimbra (BDB, OpenLDAP, MySQL, Lucene) discourage the use of NFS to store binary or mmap'ed data.
  • NO SATA. SATA drives are great for large capacities, and you can even get models with MTBFs that match SCSI and FC drives, but they do not perform as well as the 15K RPM or 10K RPM SCSI/FC options. ZCS Network Edition supports Hierarchical Storage Management (HSM), which can be used to store older message blobs on a slightly slower subsystem. You can consider SATA for the HSM destination volume, but make sure the HSM destination is not so slow that the HSM process fails to complete or takes too long.
  • Don't just size for capacity. For example, 2 x 147 GB drives will perform better than 1 x 300 GB drive.
  • Use SANs. The best disk performance today still comes from large SANs with lots of cache (eg, 32 GB).
  • Use NVRAM. SANs use non-volatile RAM to speed up disk writes. For internal disks, some SCSI controllers support a Battery Backup Unit (BBU) that provides the same functionality.
  • NO Drive Caches. Make sure your disk system/controller disables the write caches in the drives themselves. Write caches in the drives could cause permanent and unrecoverable corruption if their contents are lost in a power failure.

Services to Disable

Linux distributions tend to enable services by default that are not really required in a production ZCS server. We recommend identifying and disabling such services to reduce risk of exposure to vulnerabilities in services that are enabled but not really needed/used, and to avoid any unintended performance interference.

  • Use chkconfig --list to get information about services on your system at various boot/run levels.
  • Examine the output of ps -ef and make sure there are no processes running that shouldn't be.

The list below gives a few examples of services that may be installed by your Linux distribution and that you might consider disabling. Use it as a guide; it is not an exhaustive or prescriptive list. Some services may be required for the proper functioning of your system, so exercise caution when disabling them.

  • autofs, netfs: Services that make remote filesystems available.
  • cups: Print services.
  • xinetd, vsftpd: UNIX services (eg, telnet) that may not be required.
  • nfs, smb, nfslock: Services that export local filesystems to remote hosts.
  • portmap, rpcsvcgssd, rpcgssd, rpcidmapd: UNIX RPC services usually used in conjunction with network file systems.
  • dovecot, cyrus-imapd, sendmail, exim, postfix, ldap: Packages installed by the distribution that duplicate functionality provided by ZCS.

Services to Enable

Certain services are either required or useful when running ZCS:

  • sshd: Secure SHell remote login service, required by ZCS tools. Also used by administrators (ie, people) to log in to the server. Consider disabling root login and password authentication.
  • syslog: Handles logging of system events. On a multi-node install, designate a single dedicated server as the syslog server. Logs are auto-rotated and will not fill your hard drive.
  • sysstat: System performance monitoring tools for Linux. Includes iostat, which is required by the ZCS zmstats service.
  • ntpd: Network Time Protocol server that adjusts for drifts in your system clock.
  • xfs: Font server for X Windows. Turning off xfs prevents the virtual X process from starting on the server. Do not run GUI desktops on your server. ZCS needs a virtual X process for attachment conversions in the Network Edition, but that should be just one process, and it is not started through the init/service mechanism.

Diagnostic Tools

Please install, and become familiar with, at least the following operating system monitoring tools.

  • lsof: Show files and network connections in use.
  • tcpdump: Sniff network traffic.
  • iostat: Monitor IO statistics. -x option is particularly useful.
  • vmstat: Monitor CPU/memory use.
  • pstack: Get stack trace from a running process (for a Java process a JVM generated thread dump is usually more interesting.)
  • strace: Trace system calls.

Some of these tools are part of the procps and sysstat packages.

The zmstat service shipped since ZCS 4.5.9 requires at least the IO/CPU/memory monitoring tools to record performance data periodically. It is a good idea to make sure all your servers have the 'zmstats' service enabled.

Open File Descriptors Limit

The mailbox server (especially the Lucene search index) might need to operate on a large number of files at the same time. The Zimbra installer modifies /etc/security/limits.conf to set the maximum number of file descriptors that the 'zimbra' UNIX user is allowed to open concurrently. Until ZCS 5.0.2, the installer set this limit to 10,000, and this wiki page used to advise that large installs raise it to 100,000.

As of ZCS 5.0.2, the installer sets the max file descriptor limit to 524,288 (2^19). See bug 23211 for details. Installations upgrading from earlier releases of ZCS should verify that their /etc/security/limits.conf contains the following lines after the upgrade to ZCS 5.0.2:

zimbra soft nofile 524288
zimbra hard nofile 524288
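As a quick post-upgrade check, the Python sketch below parses the text of limits.conf and reports whether both the soft and hard nofile limits for the zimbra user are at least 524,288. It is a minimal illustration that assumes the simple four-field "domain type item value" layout shown above; the function names are hypothetical, and it does not look at any other files the system may read limits from.

```python
# Sketch: check that limits.conf gives the 'zimbra' user the ZCS 5.0.2 nofile limit.
REQUIRED_NOFILE = 524288  # 2^19, the value the ZCS 5.0.2 installer writes


def zimbra_nofile_limits(limits_conf_text: str) -> dict:
    """Return {'soft': n, 'hard': n} for the zimbra nofile entries in the text."""
    found = {}
    for raw in limits_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        fields = line.split()
        if len(fields) != 4 or fields[0] != "zimbra" or fields[2] != "nofile":
            continue
        if not fields[3].isdigit():
            continue  # e.g. 'unlimited' is not handled by this sketch
        # a '-' type sets both the soft and the hard limit
        kinds = ("soft", "hard") if fields[1] == "-" else (fields[1],)
        for kind in kinds:
            found[kind] = int(fields[3])
    return found


def limits_ok(limits_conf_text: str, required: int = REQUIRED_NOFILE) -> bool:
    """True only if both the soft and hard limits are at least `required`."""
    found = zimbra_nofile_limits(limits_conf_text)
    return all(found.get(kind, 0) >= required for kind in ("soft", "hard"))
```

For example, call limits_ok(open("/etc/security/limits.conf").read()) on each upgraded server; a False result means one of the two lines above is missing or too low.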

File System

We recommend the ext3 file system for Linux deployments. It is tried and true; for random IO, performance with other file systems is a wash, and their gains are limited to the blob store.

We suggest the following options as a guideline when creating an ext3 file system with the mke2fs command. Do consult the ext3 documentation. Caution: Running mke2fs will wipe all data from the partition. Make sure that you create the file system on the correct partition.

-j Create the file system with an ext3 journal.
-L SOME_LABEL Create a new volume label. Refer to the labels in /etc/fstab
-O dir_index Use hashed b-trees to speed up lookups in large directories.
-m 2 Only 2% needs to be reserved for root on large filesystems.
-i 10240 For message store, option -i should be the expected average message size. Estimate this conservatively, as no. of inodes can not be changed after creation.
-J size=400 Create a large journal.
-b 4096 Block size in bytes.
-R stride=16 Stride is used to tell the file system about the size of the RAID configuration. Stride * block size should be equal to RAID stripe size. For example 4k blocks, 128k RAID stripes would set stride=32.
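The stride and inode arithmetic above can be sketched in Python as follows. This assumes, as the example in the list does, that the RAID value you divide by the block size is the per-disk stripe unit (the mke2fs documentation calls it the chunk size). The device path and label are placeholders, and the sketch only builds the argument list; it deliberately does not run mke2fs, since that wipes the partition.

```python
# Sketch: derive the mke2fs values discussed above from a few inputs.
def ext3_stride(raid_chunk_bytes: int, block_bytes: int = 4096) -> int:
    """stride = RAID chunk (per-disk stripe unit) size / file system block size."""
    if raid_chunk_bytes % block_bytes != 0:
        raise ValueError("RAID chunk size must be a multiple of the block size")
    return raid_chunk_bytes // block_bytes


def approx_inode_count(partition_bytes: int, bytes_per_inode: int = 10240) -> int:
    """Roughly how many inodes mke2fs -i creates; this cannot be changed later."""
    return partition_bytes // bytes_per_inode


def mke2fs_argv(device: str, label: str, raid_chunk_bytes: int,
                bytes_per_inode: int = 10240, block_bytes: int = 4096) -> list:
    """Build (but do not run) an mke2fs command line with the suggested options."""
    stride = ext3_stride(raid_chunk_bytes, block_bytes)
    return ["mke2fs", "-j", "-L", label, "-O", "dir_index", "-m", "2",
            "-i", str(bytes_per_inode), "-J", "size=400",
            "-b", str(block_bytes), "-R", f"stride={stride}", device]
```

With 4k blocks, ext3_stride(128 * 1024) reproduces the stride=32 example above, and a 64k value gives the stride=16 shown in the option list.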

Network Ports

Perform a port scan of your servers from a remote host and from localhost (eg, use nmap). Only ports that you need to have open should be open.

The following ports are used by ZCS. If you have any other services running on these ports, turn them off.

Port ZCS Service
25 Postfix
80 HTTP
110 POP3
143 IMAP
389 LDAP
443 HTTPS
465 SMTP SSL (since ZCS 4.5.5)
993 IMAP SSL
995 POP3 SSL
7025 LMTP
7047 Conversion Server
7110 Backend POP3 (if proxy configured)
7143 Backend IMAP (if proxy configured)
7306 MySQL
7307 Logger MySQL
7993 Backend IMAP SSL (if proxy configured) (not used with nginx since 5.0?)
7995 Backend POP3 SSL (if proxy configured) (not used with nginx since 5.0?)
10024 amavisd-new
10025 Postfix answering amavisd-new
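To complement an nmap scan, the following Python sketch tries a TCP connection to each port you give it and flags any open port that is not in the table above. It is an illustrative helper (the names are hypothetical), it checks only the ports you pass in, and it is not a substitute for a full port scan from a remote host.

```python
import socket

# Ports listed in the table above.
ZCS_PORTS = {
    25: "Postfix", 80: "HTTP", 110: "POP3", 143: "IMAP", 389: "LDAP",
    443: "HTTPS", 465: "SMTP SSL", 993: "IMAP SSL", 995: "POP3 SSL",
    7025: "LMTP", 7047: "Conversion Server", 7110: "Backend POP3",
    7143: "Backend IMAP", 7306: "MySQL", 7307: "Logger MySQL",
    7993: "Backend IMAP SSL", 7995: "Backend POP3 SSL",
    10024: "amavisd-new", 10025: "Postfix answering amavisd-new",
}


def open_tcp_ports(host: str, ports, timeout: float = 0.5) -> list:
    """Return, sorted, the ports on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:  # refused, unreachable or timed out
            pass
    return sorted(found)


def unexpected_ports(open_ports, expected=ZCS_PORTS) -> list:
    """Open ports that are not in the ZCS port table."""
    return [port for port in open_ports if port not in expected]
```

For example, unexpected_ports(open_tcp_ports("localhost", range(1, 1025))) lists low-numbered ports that are listening but not used by ZCS, which are candidates for the services discussed earlier.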

Network Stack

The Linux kernel makes TCP/IP network tunables available in the /proc/sys/net/ipv4 directory. These files can be modified directly or with the sysctl command to make kernel configuration changes on the fly, but changes made this way do not persist across reboots. We recommend adding the settings below to /etc/sysctl.conf so they are permanent. If you need your edits to sysctl.conf to take effect right away, run sysctl -p.

net.ipv4.tcp_fin_timeout=15
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1

The above settings allow ZCS servers to handle a lot of short-lived connections. When TCP/IP was designed, networks were lossy and had high latency. On today's networks, it is common practice to configure the above options so port numbers are not stuck in the TIME_WAIT state. Various RFCs specify how long to wait, and even the kernel docs caution you against changing the defaults, but it can pay to be aggressive. Documentation for these settings is in the kernel-doc package, in the file networking/ip-sysctl.txt.
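The following Python sketch parses the key=value lines of a sysctl.conf file and reports which of the three recommended settings are missing or set differently. It is a minimal illustration with hypothetical function names; it only reads the text you give it and does not touch /proc. Note that newer Linux kernels (4.12 and later) removed net.ipv4.tcp_tw_recycle, so on those systems that key no longer applies.

```python
# Settings recommended in this section.
RECOMMENDED = {
    "net.ipv4.tcp_fin_timeout": "15",
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.tcp_tw_recycle": "1",
}


def parse_sysctl_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_or_different(text: str, recommended=RECOMMENDED) -> dict:
    """Map each recommended key that is absent or different to its current value."""
    current = parse_sysctl_conf(text)
    return {key: current.get(key)
            for key, wanted in recommended.items()
            if current.get(key) != wanted}
```

An empty result means /etc/sysctl.conf already carries all three settings; sysctl -p is still needed to apply them without a reboot.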

Zimbra MTA

Postfix

Change the Postfix LMTP concurrency setting (in the ZCS Postfix directory, /opt/zimbra/postfix) from the default of 20 using this formula: X = greater of (20, 20*MTA/MTB), where MTA is the number of MTA servers and MTB is the number of mailbox servers. For example, with 3 MTA servers and 2 mailbox servers, the result is 20*3/2 = 30.

postconf -e lmtp_destination_concurrency_limit=30
postconf -e default_destination_concurrency_limit=30
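The concurrency formula is easy to get wrong with fractional ratios, so here is a small Python sketch of it. Rounding a fractional result up to the next whole number is an assumption on our part; the formula above does not say how to round. The function names are hypothetical.

```python
import math


def lmtp_concurrency(mta_servers: int, mailbox_servers: int, base: int = 20) -> int:
    """Greater of 20 and 20 * MTA / MTB, rounded up (rounding is an assumption)."""
    if mta_servers < 1 or mailbox_servers < 1:
        raise ValueError("need at least one MTA server and one mailbox server")
    return max(base, math.ceil(base * mta_servers / mailbox_servers))


def postconf_commands(limit: int) -> list:
    """The two postconf invocations from this section, for a given limit."""
    return [
        f"postconf -e lmtp_destination_concurrency_limit={limit}",
        f"postconf -e default_destination_concurrency_limit={limit}",
    ]
```

For the 3 MTA / 2 mailbox server example, lmtp_concurrency(3, 2) gives 30, matching the commands shown above; with more mailbox servers than MTAs the result stays at the floor of 20.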

Note: If you change this Postfix setting, you may need to set the zimbraLmtpNumThreads setting higher.

amavisd-new

IMPORTANT: Changes to Amavisd should only be made on dedicated MTA servers.

Change the following amavisd settings in /opt/zimbra/postfix/conf/master.cf.

smtp-amavis unix - - n - 20 smtp

Change the following amavisd settings in /opt/zimbra/conf/amavisd.conf.in.

$max_servers = 20;

Zimbra Mailbox Server

Connection Handling

Each ZCS mailbox server is an HTTP, IMAP and POP3 server rolled into one process. The server is highly multi-threaded and uses pools of threads to service incoming connections for these services. An important part of connection handling configuration is sizing these thread pools. Modern JVMs and kernels can support a lot of threads (we have tested as high as 3000); however, too many threads can cause memory pressure on the server.

These thread pool sizes can be configured on a per-server basis. However, if you have or will have a multi-node install and all your servers will have a similar configuration, you can set the size in global config, so any new servers you add will get the right defaults for your environment. You can always override the global config setting on any server by setting the value in the server object. Below, we note in a comment which attributes can be set in either global config or server, but the examples modify the server. Use zmprov modifyServer (ms) to modify server attributes, and zmprov modifyConfig (mcf) to modify global config.

  • HTTP. In ZCS 5.0, the HTTP stack used by the mailbox server is provided by Jetty (a Java application container). Earlier releases used Apache Tomcat. In both cases, a thread is dedicated to servicing an HTTP request. Jetty also supports idle but long-lived HTTP connections without a dedicated thread (see Zimbra blog article, link?). Since HTTP connections are not usually long-lived, size the HTTP thread pool to accommodate the concurrent connections at the busiest time of the day for the server. In most cases, we have found 250 threads each for HTTP and HTTPS to be sufficient.
# ZCS 5.0 has a single thread pool for both HTTP and HTTPS
# global config or server OK (global config: zmprov mcf zimbraHttpNumThreads 500)
$ zmprov ms <localservername> zimbraHttpNumThreads 500
# ZCS 4.5.x and earlier had distinct thread pools for HTTP and HTTPS
# global config or server OK
$ zmprov ms <localservername> zimbraHttpNumThreads 250
$ zmprov ms <localservername> zimbraHttpSSLNumThreads 250

If HTTP, IMAP or POP3 clients are getting connection refused errors from the server, and the server otherwise appears to be running OK, initiate a thread dump on the Java mailbox server process. Use the ~zimbra/libexec/zmtomcatmgr threaddump command before 5.0, or the ~zimbra/libexec/zmmailboxdmgr threaddump command in 5.0 or later. The thread dump shows what each thread pool thread is doing. If they are just idling (usually blocked on a monitor, waiting), this is not a thread pool problem. If all threads are busy doing something else, then either (a) you have hit a bug where the process has wedged itself (hey, we do find bugs), or (b) the threads are all busy doing disk IO. Report (a) to us; for (b), consider faster disks, more RAM, or another server for your load.

  • POP3. If POP3 service is refused or times out, increase the number of POP3 threads. The default is 20. Set it to X = 1.5*(number of POP3 clients at peak connection time). For example, with a peak of 200 POP3 connections at the busiest time of day, the setting would be 300.
zmprov ms <localservername> zimbraPop3NumThreads 300


  • IMAP. If IMAP service is refused or times out, increase the number of IMAP threads. The default is 200. One thread is required per connection, and different IMAP clients open different numbers of connections, so it is difficult to estimate an exact number. As a rule of thumb, start with 4*(number of IMAP users) and adjust upward if needed. Do not set it above 2000 threads; even with a large amount of memory, you run the risk of an OutOfMemoryError (OOME) if all threads are in use.
zmprov ms <localservername> zimbraImapNumThreads 500
  • LMTP. If mail is backing up in the queue, increase the LMTP threads to handle a larger number of simultaneous connections from your MTA servers. The default is 10. For high-volume sites, you may need to set this to 100. Sites with large deployments should make this change before going live. In general, zimbraLmtpNumThreads should be 5 more than the value computed for the Postfix LMTP concurrency.
zmprov ms <localservername> zimbraLmtpNumThreads 35
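The three sizing rules of thumb above (POP3, IMAP and LMTP) can be collected into a small Python helper that prints the corresponding zmprov commands. This is a sketch: the server name is a placeholder, rounding the POP3 value up is an assumption, and the results are only starting points to adjust under real load.

```python
import math


def pop3_threads(peak_pop3_connections: int) -> int:
    """1.5 x the peak number of concurrent POP3 connections, rounded up."""
    return math.ceil(1.5 * peak_pop3_connections)


def imap_threads(imap_users: int, cap: int = 2000) -> int:
    """Starting point of 4 x IMAP users, never above the 2000-thread ceiling."""
    return min(4 * imap_users, cap)


def lmtp_threads(postfix_lmtp_concurrency: int) -> int:
    """5 more than the Postfix LMTP concurrency computed for the MTAs."""
    return postfix_lmtp_concurrency + 5


def zmprov_commands(server: str, peak_pop3: int, imap_users: int,
                    lmtp_concurrency: int) -> list:
    """zmprov ms commands for one server (run them on each mailbox server)."""
    return [
        f"zmprov ms {server} zimbraPop3NumThreads {pop3_threads(peak_pop3)}",
        f"zmprov ms {server} zimbraImapNumThreads {imap_threads(imap_users)}",
        f"zmprov ms {server} zimbraLmtpNumThreads {lmtp_threads(lmtp_concurrency)}",
    ]
```

For instance, 200 peak POP3 connections, 125 IMAP users and a Postfix LMTP concurrency of 30 reproduce the 300, 500 and 35 values used in the examples above.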

Java Settings for Tomcat

Change the Java Heap Memory Percentage for Tomcat to 40

zmlocalconfig -e tomcat_java_heap_memory_percent=40

Several Java options can be enabled for Tomcat. If you enable more than one, they must all be set in the same command, because each invocation overwrites the previous setting. The options are:
  • -XX:+PrintGCDetails -XX:+PrintGCTimeStamps: enable verbose garbage collection logging. These two options do not need to be set in Zimbra 5.0 or later.
  • -XX:SoftRefLRUPolicyMSPerMB=1: purge softly referenced objects aggressively. The JVM default value is 1,000.
  • -XX:+UseParallelGC: let the garbage collector run in parallel on multiple CPUs. Enable this option ONLY on multi-processor systems.

ZCS 4.5 or earlier: zmlocalconfig -e tomcat_java_options="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:SoftRefLRUPolicyMSPerMB=1"
ZCS 5.0 or later: zmlocalconfig -e mailboxd_java_options=-XX:SoftRefLRUPolicyMSPerMB=1
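Because each zmlocalconfig invocation overwrites the previous value, all Java options must go into one quoted value. The Python sketch below builds such a command with proper shell quoting (shlex.join, Python 3.8 or later) and estimates the heap size for a given heap percentage. It assumes the percentage is applied to total physical RAM; the helper names are hypothetical.

```python
import shlex


def heap_mb(total_ram_mb: int, percent: int = 40) -> int:
    """Approximate heap size if `percent` is applied to total physical RAM."""
    return total_ram_mb * percent // 100


def java_options_command(key: str, flags: list) -> str:
    """One zmlocalconfig call carrying every flag; a second call would overwrite it."""
    value = " ".join(flags)
    return shlex.join(["zmlocalconfig", "-e", f"{key}={value}"])
```

For example, java_options_command("tomcat_java_options", ["-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps", "-XX:SoftRefLRUPolicyMSPerMB=1"]) quotes the whole value as a single argument, which is what the ZCS 4.5 command above needs.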

MySQL

To improve performance, you may need to modify the following settings.

Increase the maximum number of database connections. This should be set to 105 in /opt/zimbra/conf/my.cnf

max_connections = 105

Evaluate the MySQL memory usage. To do this run /opt/zimbra/bin/mysql -e "show innodb status" and search for the Buffer pool hit rate. If the hit rate is less than 995/1000 the MySQL buffer pool size needs to be increased in /opt/zimbra/conf/my.cnf. The following would increase the buffer pool size to 450 MB. It should never be set to a value higher than 60% of total memory.

innodb_buffer_pool_size = 471859200

Decrease the dirty pages percent for MySQL. This should be set to 10 in /opt/zimbra/conf/my.cnf

innodb_max_dirty_pages_pct = 10

Change the flush method for MySQL. This should be set to O_DIRECT in /opt/zimbra/conf/my.cnf

innodb_flush_method = O_DIRECT

Increase the maximum number of active MySQL connectors to max_connections - 5 (here, 105 - 5 = 100):

zmlocalconfig -e zimbra_mysql_connector_maxActive=100
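The MySQL rules above (a buffer pool no larger than 60% of RAM, a hit rate of at least 995/1000, and a connector pool of max_connections - 5) can be checked with a short Python sketch. It assumes the status output contains a line of the form "Buffer pool hit rate N / M"; the helper names are hypothetical.

```python
import re

MB = 1024 * 1024


def buffer_pool_bytes(target_mb: int, total_ram_mb: int, max_percent: int = 60) -> int:
    """innodb_buffer_pool_size in bytes, capped at max_percent of total RAM."""
    ceiling = total_ram_mb * MB * max_percent // 100
    return min(target_mb * MB, ceiling)


def parse_hit_rate(innodb_status: str):
    """Return (hits, out_of) from a 'Buffer pool hit rate N / M' line, or None."""
    match = re.search(r"Buffer pool hit rate (\d+) / (\d+)", innodb_status)
    return (int(match.group(1)), int(match.group(2))) if match else None


def needs_bigger_pool(hits: int, out_of: int = 1000) -> bool:
    """True when the hit rate is below 995/1000 (integer cross-multiplication)."""
    return hits * 1000 < 995 * out_of


def connector_max_active(max_connections: int) -> int:
    """zimbra_mysql_connector_maxActive = max_connections - 5."""
    return max_connections - 5
```

buffer_pool_bytes(450, 8192) reproduces the 471859200 value shown above, and connector_max_active(105) gives the 100 used in the zmlocalconfig command.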

Changing Index settings

Some ZCS deployments may need to have the default index LRU and flush settings modified. With a small LRU (Least Recently Used) setting, you reduce Java heap consumption at the expense of increased IO writes to Lucene index directory/volume. If you have fast disks, you should tend toward smaller LRU. But if your disks are slow, increased index flushing can overwhelm the disks and the server becomes IO bound.

In short, configure a smaller LRU if memory is the bottleneck. Use a larger LRU if disk (for Lucene files) is the bottleneck.

Because many factors, such as amount of RAM, disk count/speed, load characteristics, etc, are involved in setting the index LRU size for best performance, we cannot recommend specific settings. Optimum value can only be determined through trial and error.

Change the settings with the following commands:

zmlocalconfig -e zimbra_index_lru_size=<number>
zmlocalconfig -e zimbra_index_idle_flush_time=<number>


Backup and Recovery

The Network Edition of ZCS includes full backup and restore functionality. When ZCS is installed, a backup schedule is automatically added to the cron table. You can change the schedule, but you should not disable it. Backing up the server on a regular basis can help you quickly restore your mail service if an unexpected crash occurs.

The default full backup is scheduled for 1:00 a.m. every Sunday, and the default incremental backups are scheduled for 1:00 a.m. Monday through Saturday. Backups are stored in /opt/zimbra/backup. Make sure that this backup location is on a different disk and partition from your data, and set up a process to automatically copy the backups offsite, to a different machine, or to tape, to minimize the possibility of unrecoverable data loss if the backup disk fails.

Backup and restore is documented in the Administrator’s Guide and more information can be found on the Zimbra wiki.

Zimbra OpenLDAP Server

As a best practice, we recommend that you set up one LDAP replica for each MTA. The following settings will need to be on both the master LDAP server and the replica servers.

For peak performance, the following settings in the /opt/zimbra/conf/slapd.conf.in file may need to be modified.

Add a directive to set the thread count to 8 (the default is 16). Type the directive on the line above the pidfile line.

threads 8


Change the cachesize. The value should be the number of configured active accounts plus the number of configured active domains. The default is 10000. To find this directive in slapd.conf.in, look for the following line and change the cachesize.

# number of entries to keep in memory
cachesize 50000

Set the idlcachesize to the same value as the cachesize setting. To do this, find the cachesize directive in slapd.conf.in and add the idlcachesize setting.

idlcachesize 50000


Important: You must restart the LDAP server after you make these changes.

If you have more than 100 domains, we suggest adjusting the following localconfig LDAP cache settings:

ldap_cache_domain_maxsize. This sets the maximum number of domains held in the server's LDAP cache. The default is 100. If more than 100 domains are configured, set this to the lower of the number of configured domains and 30,000. For example, with 45,000 domains, set ldap_cache_domain_maxsize=30000.

zmlocalconfig -e ldap_cache_domain_maxsize=30000
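The LDAP cache arithmetic from this section can be sketched in Python as follows: cachesize and idlcachesize are the sum of active accounts and active domains, and ldap_cache_domain_maxsize is the lower of the domain count and 30,000 once you exceed 100 domains. The function names are hypothetical, and the output lines mirror the slapd.conf.in directives shown above.

```python
def slapd_cache_lines(active_accounts: int, active_domains: int,
                      threads: int = 8) -> list:
    """slapd.conf.in directives: threads, cachesize and a matching idlcachesize."""
    cachesize = active_accounts + active_domains
    return [f"threads {threads}",
            f"cachesize {cachesize}",
            f"idlcachesize {cachesize}"]


def ldap_cache_domain_maxsize(domains: int, default: int = 100,
                              cap: int = 30000) -> int:
    """Keep the default up to 100 domains, else min(domains, 30000)."""
    if domains <= default:
        return default
    return min(domains, cap)
```

With 45,000 domains, ldap_cache_domain_maxsize(45000) returns 30000, matching the zmlocalconfig example above.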

Configuring the BDB subsystem to increase LDAP server performance

BDB is the underlying high-performance transactional database used to store the LDAP data. Proper configuration of this database is essential to maintaining a performant LDAP service. There are several parameters involved in tuning the underlying BDB database. This always involves editing the DB_CONFIG file. Modifications to the DB_CONFIG file require a restart of the LDAP server before they are picked up, and should be made to both master and replica servers.

You can increase LDAP server performance by adjusting the BDB backend cache size to be at or near the size of your data set. This is subject to the limit of 4 GB for 32 bit and 10 TB for 64 bit, and the amount of RAM you have. The size of the data set is the sum of the Berkeley DataBase (BDB) files in /opt/zimbra/openldap-data. To increase the cache size, add (or replace) the following line to the DB_CONFIG file in /opt/zimbra/openldap-data/. The following would set the database in-memory cachesize to 500MB.

set_cachesize 0 524288000 1

Note: The format for the set_cachesize command is <gigabytes> <bytes> <segments>

Note: On 32-bit systems, when setting a cachesize greater than 2 GB, the cache must be split across multiple segments such that no one segment is larger than 2 GB. For example, to split a 4 GB cache across two segments, you would type

set_cachesize 4 0 2
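The set_cachesize format can be produced mechanically. The Python sketch below splits a byte count into the <gigabytes> <bytes> <segments> triple and, for 32-bit systems, adds segments so that none exceeds 2 GB. It follows the limits stated in this section; the function name is hypothetical.

```python
GB = 1024 ** 3


def set_cachesize_line(total_bytes: int, is_32bit: bool = False) -> str:
    """Format a DB_CONFIG set_cachesize line: <gigabytes> <bytes> <segments>."""
    if is_32bit and total_bytes > 4 * GB:
        raise ValueError("BDB cache is limited to 4 GB on 32-bit systems")
    gigabytes, remainder = divmod(total_bytes, GB)
    segments = 1
    if is_32bit and total_bytes > 2 * GB:
        # ceiling division so that no single segment is larger than 2 GB
        segments = -(-total_bytes // (2 * GB))
    return f"set_cachesize {gigabytes} {remainder} {segments}"
```

set_cachesize_line(500 * 1024 * 1024) yields the 500MB line shown earlier, and set_cachesize_line(4 * GB, is_32bit=True) yields the two-segment 4 GB line above.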

To check that the cache setting has taken effect, run /opt/zimbra/sleepycat/bin/db_stat -m -h /opt/zimbra/openldap-data | head -n 11. This shows the current cache setting, as well as other important information.

Example Output:
500MB Total cache size
1 Number of caches
500MB Pool individual cache size
0 Requested pages mapped into the process' address space.
3437M Requested pages found in the cache (100%).
526125 Requested pages not found in the cache.
47822 Pages created in the cache.
526125 Pages read into the cache.
27M Pages written from the cache to the backing file.
0 Clean pages forced from the cache.
0 Dirty pages forced from the cache.

The above output shows that there is a 500MB total cache in a single segment all of which is allocated to the cache pool. The other important data to evaluate are the Requested pages found in the cache, the Clean pages forced from the cache and the Dirty pages forced from the cache. For optimal performance, the Requested pages found in the cache should be above 95%, and the pages forced from the cache should be 0.
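To automate this check, the Python sketch below reads the text printed by db_stat -m and applies the two rules just described. It assumes the line labels match the example output above; the helper names are hypothetical.

```python
import re


def cache_health(db_stat_m_output: str) -> dict:
    """Pull the hit percentage and forced-page counts out of db_stat -m text."""
    hit = re.search(r"Requested pages found in the cache \((\d+)%\)", db_stat_m_output)

    def first_field(label: str):
        # db_stat prints the value first, then the label
        for line in db_stat_m_output.splitlines():
            if label in line:
                return line.split()[0]
        return None

    return {
        "hit_pct": int(hit.group(1)) if hit else None,
        "clean_forced": first_field("Clean pages forced from the cache"),
        "dirty_forced": first_field("Dirty pages forced from the cache"),
    }


def cache_ok(db_stat_m_output: str) -> bool:
    """Hit rate above 95% and no clean or dirty pages forced from the cache."""
    health = cache_health(db_stat_m_output)
    return (health["hit_pct"] is not None and health["hit_pct"] > 95
            and health["clean_forced"] == "0"
            and health["dirty_forced"] == "0")
```

Piping the db_stat -m output into cache_ok gives a single pass/fail answer suitable for a periodic monitoring script.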

As part of the transaction interface, BDB uses a number of locks, lockers, and lock objects. The default value for each of these parameters is 1000. How many of each are used depends on the number of entries and indices in the BDB database. Use /opt/zimbra/sleepycat/bin/db_stat -c -h /opt/zimbra/openldap-data | head -n 12 to determine current usage.

Example Output:
5634 Last allocated locker ID.
2147M Current maximum unused locker ID.
9 Number of lock modes.
3000 Maximum number of locks possible.
1500 Maximum number of lockers possible.
1500 Maximum number of lock objects possible.
93 Number of current locks.
1921 Maximum number of locks at any one time.
483 Number of current lockers.
485 Maximum number of lockers at any one time.
93 Number of current lock objects.
1011 Maximum number of lock objects at any one time.

The above output shows that there are a maximum of 3000 locks, 1500 lockers, and 1500 lock objects available to the BDB database located in /opt/zimbra/openldap-data. Of those, there are currently 93 locks, 483 lockers, and 93 lock objects in use. Over the lifetime of the database, the highest recorded values have been 1921 locks, 485 lockers, and 1011 lock objects. As long as usage stays below 85% of the configured value(s), it should not be necessary to modify the settings.

The following entries in DB_CONFIG would increase the number of locks to 3000, the lock objects to 1500, and the lockers to 1500.

set_lk_max_locks 3000
set_lk_max_objects 1500
set_lk_max_lockers 1500
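Similarly, the 85% rule for locks, lockers and lock objects can be applied to db_stat -c output with a short Python sketch. It compares each "at any one time" high-water mark to the configured maximum; it assumes the line labels match the example output above, and the helper names are hypothetical.

```python
def lock_usage(db_stat_c_output: str) -> dict:
    """Peak usage / configured maximum for locks, lockers and lock objects."""
    values = {}
    for line in db_stat_c_output.splitlines():
        parts = line.strip().split(None, 1)
        # skip lines whose value is abbreviated (e.g. '2147M') or non-numeric
        if len(parts) == 2 and parts[0].isdigit():
            values[parts[1].rstrip(".")] = int(parts[0])
    report = {}
    for kind in ("locks", "lockers", "lock objects"):
        configured = values.get(f"Maximum number of {kind} possible")
        peak = values.get(f"Maximum number of {kind} at any one time")
        if configured and peak is not None:
            report[kind] = peak / configured
    return report


def needs_increase(report: dict, threshold: float = 0.85) -> list:
    """Kinds whose recorded peak has reached 85% of the configured value."""
    return sorted(kind for kind, ratio in report.items() if ratio >= threshold)
```

On the example output above, every ratio stays under 0.85, so needs_increase returns an empty list and the DB_CONFIG values can stay as they are.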

Monitoring

You can monitor the mail queues for delivery problems from the administration console, on the Monitoring Mail Queues page. To view the queues from the command line, as the zimbra user run sudo ~/libexec/zmqstat.

You should install port monitoring software to monitor IMAP and POP3 performance.
