Difference between revisions of "Performance Tuning Guidelines for Large Deployments"


Revision as of 19:12, 29 June 2007

To achieve top ZCS performance, you may need to modify the operating system (OS) settings on your server and fine-tune ZCS after it is installed. Note: The Best Practices described in this guide are designed for large ZCS deployments, usually sites with more than 2000 users.

Operating System Configuration and Hardening

General Note about using 32-bit or 64-bit Operating System

Under a 32-bit Linux kernel, a process can have a maximum address space of 4GB. With PAE (Physical Address Extension), a 32-bit kernel can address more than 4GB, but this is not useful for ZCS because only 4GB is allowed per process.

By default for ZCS, only 2GB of user space is allowed in a 32-bit kernel. This limits performance in large deployments.

Therefore, Zimbra recommends 64-bit operating systems for sites with more than 2000 users so that both MySQL and JVM can be given larger memory to work with if needed. Zimbra recommends 64-bit operating systems be used on the LDAP servers for the increased RAM addressing ability.

OS packages

When you install the operating system, you should only install packages that are required. Zimbra highly recommends that you review the packages installed on the server and turn off those that you do not need. If you do not recognize a service, consult your Linux documentation.

To see a list of all services on your server, run chkconfig --list. Review the list and turn off or remove services you do not need, including NFS, RPC, and mail services.

The following services must be OFF

autofs

AutoFS automounter. Automatically mounts file systems on demand.

cups

Common UNIX Printing System.

netfs

Used in support of exporting NFS shares.

rpcgssd, rpcidmapd

These are NFS daemons used for NFS and Samba.

sendmail

All mail services, such as exim, sendmail, system postfix, IMAP and Qmail should be off or removed.

The following services must be ON

sshd

SSH daemon.

syslog

Handles logging of system events. Logs autorotate and will not fill your hard drive.

sysstat

System performance tool for Linux.

xfs

Font server for X Windows. Turning off xfs prevents the virtual X process from starting on the server. Note: Zimbra does not recommend running GUI processes on server boxes. Zimbra does need a virtual X process for attachment conversions in the Network version, but that should be just one process, and it is not started through the init/service mechanism.

NFS Consideration

Network File Systems (NFS) should not be used with ZCS services, as it will greatly impact performance. For security, Zimbra recommends that NFS be turned off.

Increase maximum open file descriptors

For high load conditions, the default limit of 1024 file descriptors is insufficient. We recommend that the following two lines be added to /etc/security/limits.conf on all servers configured with ZCS software to increase the limit.

  • zimbra soft nofile 10000
  • zimbra hard nofile 10000

Avoid port conflicts

The following ports are used by ZCS. If you have any other services running on these ports, turn them off.

Service                                   Port
Postfix                                   25
HTTP                                      80
POP3                                      110
IMAP                                      143
LDAP                                      389
HTTPS                                     443
Secure SMTP (Beginning with ZCS 4.5.5)    465
Tomcat IMAP SSL                           993
Tomcat POP SSL                            995
Tomcat LMTP                               7025
convertd                                  7047
mysqld                                    7306
logger mysqld                             7307
amavisd                                   10024
postfix answering amavis                  10025
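As a rough way to spot conflicts before installing ZCS, the sketch below filters a listing of listening sockets against the ports above. It assumes the input looks like the output of ss -ltn (one header line, local address and port in the fourth column); the function name zcs_ports_in_use is made up for this example.

```shell
#!/bin/sh
# Ports taken from the table above.
ZCS_PORTS="25 80 110 143 389 443 465 993 995 7025 7047 7306 7307 10024 10025"

# Reads "ss -ltn"-style lines on stdin and prints each ZCS port that
# already has a listener. Assumes the local address:port is in column 4
# and that the first line is a header.
zcs_ports_in_use() {
    awk -v ports="$ZCS_PORTS" '
        BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
        NR > 1 {
            port = $4
            sub(/.*:/, "", port)    # keep only what follows the last colon
            if ((port in want) && !(port in seen)) { seen[port] = 1; print port }
        }'
}

# Usage on a live host: ss -ltn | zcs_ports_in_use
```

Any port printed belongs to a service that should be turned off or moved before ZCS is started.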

Install diagnostic tools

The following are recommended diagnostic tools.

lsof

Lists information about files opened by processes.

tcpdump

Prints out headers of packets on a network interface.

iostat

Used to monitor system input/output activities.

vmstat

Used to report virtual memory statistics.

pstack

Used to trace the thread status of a process and for deadlock detection.

strace

Traces system calls and signals.

File System Tuning

We recommend the ext3 file system for Linux deployments. The following file system tuning is required for your mail servers and recommended for the other servers. Options and performance characteristics change all the time. Please read the latest ext3 documentation.

To create a file system, use mke2fs and the following arguments.

Note: Running mke2fs will wipe out any files on a partition. Make sure that you create the file system on the correct partition.

-j

Create the file system with an ext3 journal.

-L SOME_LABEL

Create a new volume label. Refer to the labels in /etc/fstab.

-O dir_index

dir_index - Use hashed b-trees to speed up lookups in large directories.

-m 2

Only 2% needs to be reserved for root.

-i 10240

For the message store, set -i (bytes per inode) to the average message size. A value of 10240 creates one inode for every 10 KB of data. The larger the bytes-per-inode ratio, the fewer inodes are created. Note: It is not possible to increase the number of inodes on a file system after it is created.

-J size=400

Create a large journal.

-b 4096

Specifies the block size in bytes.

-R stride=16

The -R stride flag tells the file system the size of the RAID stripes. Knowing the stripe size allows mke2fs to allocate the block and inode bitmaps so that they do not all end up on the same physical drive. Stride * block size should equal the RAID stripe size. For example, 4k blocks with 128k RAID stripes would set stride=32.
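The stride rule and the options listed above can be combined as in the following sketch. It only prints the mke2fs command and does not run it. The device name /dev/sdX1 and the helper names are placeholders for illustration, not values from this guide.

```shell
#!/bin/sh
# stride = RAID stripe size / file system block size (both in bytes).
raid_stride() {
    stripe_bytes=$1
    block_bytes=$2
    echo $(( stripe_bytes / block_bytes ))
}

# Prints (does not run) an mke2fs command built from the options above.
# $1 = device (placeholder), $2 = volume label, $3 = RAID stripe size in bytes.
mke2fs_command() {
    stride=$(raid_stride "$3" 4096)
    echo "mke2fs -j -L $2 -O dir_index -m 2 -i 10240 -J size=400 -b 4096 -R stride=$stride $1"
}

# 128k stripes with 4k blocks give stride=32.
mke2fs_command /dev/sdX1 SOME_LABEL 131072
```

Review the printed command against the correct partition before running it by hand, since mke2fs destroys existing data.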

Important: Do not configure RAID5. RAID5 is not acceptable for use with ZCS in production environments.

Network Stack Tuning

TCP/IP configuration values are stored in the /proc/sys/net/ipv4 directory and typically accept a value or are turned on or off with "1" (on) or "0" (off).

For ZCS, the TCP default settings should be changed as described below.

Note: "tw" in the setting names refers to “TIME_WAIT”. When a TCP connection ends without an explicit close, the OS puts the connection into TIME_WAIT state. Various RFCs specify how long to wait, but in practice, it can pay to be more aggressive.

net.ipv4.tcp_fin_timeout=15

net.ipv4.tcp_tw_reuse=1

net.ipv4.tcp_tw_recycle=1

These changes should be added to /etc/sysctl.conf so that they are applied at each boot.

ZCS Tuning

For better performance, change the following application defaults. After you make these changes, you will need to restart the server (zmcontrol stop; zmcontrol start).

Important: When you upgrade ZCS, you will need to make these changes again. Upgrading overrides your changes.

Zimbra MTA Settings

Postfix

Change the Postfix LMTP concurrency setting in the ZCS postfix directory, /opt/zimbra/postfix, from the default of 20 using this formula: X = the greater of 20 and 20*MTA/MTB, where MTA is the number of MTA servers and MTB is the number of mailbox servers. For example, with 3 MTA servers and 2 mailbox servers, the result is 20*3/2 = 30.

postconf -e lmtp_destination_concurrency_limit=30
postconf -e default_destination_concurrency_limit=30
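The formula above can be worked through with a small helper, shown here as a sketch. The function name lmtp_concurrency is invented for this example, and the postconf commands are only printed, not executed. Shell arithmetic is integer-only, so fractional results are truncated.

```shell
#!/bin/sh
# Greater of 20 and 20 * MTA / MTB (integer division).
# $1 = number of MTA servers, $2 = number of mailbox servers.
lmtp_concurrency() {
    mta=$1
    mailbox=$2
    value=$(( 20 * mta / mailbox ))
    if [ "$value" -lt 20 ]; then
        value=20
    fi
    echo "$value"
}

# Example from the text: 3 MTA servers and 2 mailbox servers.
limit=$(lmtp_concurrency 3 2)
echo "postconf -e lmtp_destination_concurrency_limit=$limit"
echo "postconf -e default_destination_concurrency_limit=$limit"
```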

Note: If you change this Postfix setting, you may need to set the zimbraLmtpNumThreads setting higher.

Amavisd

IMPORTANT: Changes to Amavisd should only be made on dedicated MTA servers.

Change the following amavisd settings in /opt/zimbra/postfix/conf/master.cf.

smtp-amavis unix - - n - 20 smtp

Change the following amavisd settings in /opt/zimbra/conf/amavisd.conf.in.

$max_servers = 20

Zimbra Mailbox Server Settings

Tomcat Settings

To improve performance the following may need to be modified. These settings can be set on both the global and server level. You should change the settings on the server.

POP3 threads. If POP3 service is refused or times out, increase the total number of assigned POP3 threads upward. The default is 20. It should be set to X=1.5*(Number of POP clients at peak connection time). For example, if there was a peak value of 200 POP connections at the busiest time of day, the setting would be 300.

zmprov ms <localservername> zimbraPop3NumThreads 300

IMAP threads. If IMAP service is refused or times out, increase the total number of assigned IMAP threads upward. The default is 200. It should be set to X=10*(Number of IMAP clients at peak connection time). For example, if there was a peak value of 50 IMAP connections at the busiest time of day, the setting would be 500.

zmprov ms <localservername> zimbraImapNumThreads 500

LMTP threads. If mail is backing up in the queue, adjust the LMTP threads upwards to handle a larger number of simultaneous connections from your MTA servers. The default is 10. For large volume sites, you may need to set this at 100. Sites with large deployments should make this change before going live. In general, the value for zimbraLmtpNumThreads should be 5 more than the value computed for postfix LMTP Concurrency.

zmprov ms <localservername> zimbraLmtpNumThreads 35
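The three sizing rules above reduce to simple arithmetic. The sketch below reproduces the examples from the text. The helper names and the server name mail.example.com are placeholders, and the zmprov commands are printed rather than run.

```shell
#!/bin/sh
# Thread-count rules of thumb from the text above. Integer arithmetic only.
pop3_threads() { echo $(( ($1 * 3 + 1) / 2 )); }   # 1.5 * peak POP connections, rounded up
imap_threads() { echo $(( $1 * 10 )); }            # 10 * peak IMAP connections
lmtp_threads() { echo $(( $1 + 5 )); }             # Postfix LMTP concurrency + 5

# Placeholder server name; substitute your own.
server=mail.example.com

# Print the commands rather than run them.
echo "zmprov ms $server zimbraPop3NumThreads $(pop3_threads 200)"
echo "zmprov ms $server zimbraImapNumThreads $(imap_threads 50)"
echo "zmprov ms $server zimbraLmtpNumThreads $(lmtp_threads 30)"
```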

Note: If the system runs out of memory and crashes, this means that Tomcat, specifically, is running out of memory.

Important: You must stop and restart the Zimbra server after making these changes.

Java Settings for Tomcat

Change the Java Heap Memory Percentage for Tomcat to 40

zmlocalconfig -e tomcat_java_heap_memory_percent=40

Several Java options can be enabled for Tomcat. If you enable more than one, set them all in the same command, because each invocation overwrites the previous setting. The options are:

  • -XX:+PrintGCDetails -XX:+PrintGCTimeStamps enable verbose garbage collection logging. These two options do not need to be set in Zimbra 5.0 or later.
  • -XX:SoftRefLRUPolicyMSPerMB=1 aggressively purges softly referenced objects from the LRU cache. The default value is 1000.
  • -XX:+UseParallelGC allows the garbage collector to run in parallel. Enable this option ONLY on multi-processor systems.

ZCS 4.5 or earlier: zmlocalconfig -e tomcat_java_options="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:SoftRefLRUPolicyMSPerMB=1"
ZCS 5.0 or later: zmlocalconfig -e mailboxd_java_options="-XX:SoftRefLRUPolicyMSPerMB=1"

MySQL settings

To improve performance the following may need to be modified.

Increase the maximum number of database connections. This should be set to 105 in /opt/zimbra/conf/my.cnf

max_connections = 105

Evaluate the MySQL memory usage. To do this run /opt/zimbra/bin/mysql -e "show innodb status" and search for the Buffer pool hit rate. If the hit rate is less than 995/1000 the MySQL buffer pool size needs to be increased in /opt/zimbra/conf/my.cnf. The following would increase the buffer pool size to 450 MB. It should never be set to a value higher than 60% of total memory.

innodb_buffer_pool_size = 471859200
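The byte value above and the 60% ceiling can be checked with a short calculation. This is a sketch with invented helper names. It assumes the shell uses 64-bit integer arithmetic and that total memory is supplied in MB by the administrator.

```shell
#!/bin/sh
# Convert megabytes to bytes.
mb_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

# Succeeds when the hit rate (hits per 1000) is below the 995/1000 threshold.
needs_bigger_pool() { [ "$1" -lt 995 ]; }

# $1 = requested pool size in MB, $2 = total RAM in MB.
# Prints the requested size in bytes, capped at 60% of total memory.
buffer_pool_bytes() {
    requested=$(mb_to_bytes "$1")
    cap=$(( $(mb_to_bytes "$2") * 60 / 100 ))
    if [ "$requested" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$requested"
    fi
}

# 450 MB on a host with 4096 MB of RAM stays under the cap.
echo "innodb_buffer_pool_size = $(buffer_pool_bytes 450 4096)"
```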

Decrease the dirty pages percent for MySQL. This should be set to 10 in /opt/zimbra/conf/my.cnf

innodb_max_dirty_pages_pct = 10

Change the flush method for MySQL. This should be set to O_DIRECT in /opt/zimbra/conf/my.cnf

innodb_flush_method = O_DIRECT

Increase the maximum number of active connectors to max_connections - 5. With max_connections set to 105, this is 100.

zmlocalconfig -e zimbra_mysql_connector_maxActive=100

Zimbra LDAP Server Settings

As a best practice, we recommend that you set up one LDAP replica for each MTA. The following settings will need to be made on both the master LDAP server and the replica servers.

For peak performance, the following settings in the /opt/zimbra/conf/slapd.conf.in file may need to be modified.

Add a command to set the thread count to 8. The default is 16. Type the directives on the line above the pidfile... line

threads 8

Add a command to set the idletimeout to 5. Idletimeout is disabled by default (that is, it is set to 0).

idletimeout 5

Change the cachesize. The number set should be the number of configured active accounts and the number of configured active domains. The default is 10000. To find this command in slapd.conf.in, look for the following line and change the cachesize.

# number of entries to keep in memory
cachesize 50000

Set the idlcachesize. The number set should be the same as the cachesize setting. To find this command in slapd.conf.in, look for the cachesize parameter and add the idlcachesize setting.

idlcachesize 50000


Important: You must restart the LDAP server after you make these changes.

If you have more than 100 domains, we suggest adjusting the following localconfig LDAP cache settings:

ldap_cache_domain_maxsize. This sets the cache of the number of domains in the server. The default is 100. If more than 100 domains are configured, you should adjust this to the lower of the number of domains you have configured and 30,000. For example, with 45,000 domains, set as ldap_cache_domain_maxsize=30000.

zmlocalconfig -e ldap_cache_domain_maxsize=30000
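The rule for ldap_cache_domain_maxsize can be written as a small clamp, sketched below. The function name is invented for this example, and the zmlocalconfig command is printed rather than run.

```shell
#!/bin/sh
# $1 = number of configured domains.
# Keeps the default of 100 for small sites, otherwise uses the lower of
# the domain count and 30000.
ldap_domain_cache_size() {
    domains=$1
    if [ "$domains" -le 100 ]; then
        echo 100
    elif [ "$domains" -gt 30000 ]; then
        echo 30000
    else
        echo "$domains"
    fi
}

# Example from the text: 45,000 domains.
echo "zmlocalconfig -e ldap_cache_domain_maxsize=$(ldap_domain_cache_size 45000)"
```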

Configuring the BDB subsystem to increase LDAP server performance

BDB is the underlying high-performance transactional database used to store the LDAP data. Proper configuration of this database is essential to maintaining a performant LDAP service. There are several parameters involved in tuning the underlying BDB database. This always involves editing the DB_CONFIG file. Modifications to the DB_CONFIG file require a restart of the LDAP server before they are picked up, and should be made to both master and replica servers.

You can increase LDAP server performance by adjusting the BDB backend cache size to be at or near the size of your data set. This is subject to the limit of 4 GB for 32 bit and 10 TB for 64 bit, and the amount of RAM you have. The size of the data set is the sum of the Berkeley DataBase (BDB) files in /opt/zimbra/openldap-data. To increase the cache size, add (or replace) the following line to the DB_CONFIG file in /opt/zimbra/openldap-data/. The following would set the database in-memory cachesize to 500MB.

set_cachesize 0 524288000 1

Note: The format for the set_cachesize command is <gigabytes> <bytes> <segments>

Note: On 32 bit systems, when setting cachesize greater than 2 GB, the cachesize must be split across multiple segments, such that no one segment is larger than 2 GB. For example, for 4 GB, to split across multiple segments, you would type

set_cachesize 4 0 2
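The set_cachesize line can be derived from a target size as in the following sketch. It splits the size into whole gigabytes plus remaining bytes and, on 32 bit systems, picks enough segments that none exceeds 2 GB. The function name and its arguments are assumptions made for this example.

```shell
#!/bin/sh
# $1 = cache size in MB, $2 = 32 or 64 (defaults to 64).
# Prints a DB_CONFIG line in the form: set_cachesize <gigabytes> <bytes> <segments>
set_cachesize_line() {
    size_mb=$1
    arch_bits=${2:-64}
    gb=$(( size_mb / 1024 ))
    bytes=$(( (size_mb % 1024) * 1024 * 1024 ))
    segments=1
    # On 32 bit systems, keep every segment at or below 2 GB (2048 MB).
    if [ "$arch_bits" -eq 32 ] && [ "$size_mb" -gt 2048 ]; then
        segments=$(( (size_mb + 2047) / 2048 ))
    fi
    echo "set_cachesize $gb $bytes $segments"
}

set_cachesize_line 500        # the 500MB example above
set_cachesize_line 4096 32    # the 4 GB, 32 bit example above
```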

To check that the cache setting has taken effect, and to find other important information, use the following command: /opt/zimbra/sleepycat/bin/db_stat -m -h /opt/zimbra/openldap-data | head -n 11

Example Output:
500MB Total cache size
1 Number of caches
500MB Pool individual cache size
0 Requested pages mapped into the process' address space.
3437M Requested pages found in the cache (100%).
526125 Requested pages not found in the cache.
47822 Pages created in the cache.
526125 Pages read into the cache.
27M Pages written from the cache to the backing file.
0 Clean pages forced from the cache.
0 Dirty pages forced from the cache.

The above output shows that there is a 500MB total cache in a single segment all of which is allocated to the cache pool. The other important data to evaluate are the Requested pages found in the cache, the Clean pages forced from the cache and the Dirty pages forced from the cache. For optimal performance, the Requested pages found in the cache should be above 95%, and the pages forced from the cache should be 0.

As part of the transaction interface, BDB uses a number of locks, lockers, and lock objects. The default value for each of these parameters is 1000. How many of each are being used depends on the number of entries and indices in the BDB database. The following command can be used to determine current usage: /opt/zimbra/sleepycat/bin/db_stat -c -h /opt/zimbra/openldap-data | head -n 12

Example Output:
5634 Last allocated locker ID.
2147M Current maximum unused locker ID.
9 Number of lock modes.
3000 Maximum number of locks possible.
1500 Maximum number of lockers possible.
1500 Maximum number of lock objects possible.
93 Number of current locks.
1921 Maximum number of locks at any one time.
483 Number of current lockers.
485 Maximum number of lockers at any one time.
93 Number of current lock objects.
1011 Maximum number of lock objects at any one time.

The above output shows that there are a maximum of 3000 locks, 1500 lockers, and 1500 lock objects available to the BDB database located in /opt/zimbra/openldap-data. Of those, there are currently 93 locks in use, 483 lockers in use, and 93 lock objects in use. Over the lifetime of the database, the highest recorded values have been 1921 locks used, 485 lockers used, and 1011 lock objects used. As long as peak usage stays below 85% of the configured values, it should not be necessary to modify the settings.

The following entries in DB_CONFIG would increase the number of locks to 3000, the lock objects to 1500, and the lockers to 1500.

set_lk_max_locks 3000
set_lk_max_objects 1500
set_lk_max_lockers 1500
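The 85% guideline can be checked against the db_stat figures with a few lines of arithmetic. The sketch below uses the peak and configured values from the example output above. The helper names are invented for this example.

```shell
#!/bin/sh
# Succeeds when peak usage exceeds 85% of the configured maximum.
# $1 = peak usage, $2 = configured maximum.
over_threshold() { [ $(( $1 * 100 )) -gt $(( $2 * 85 )) ]; }

# $1 = name, $2 = peak usage, $3 = configured maximum.
check() {
    if over_threshold "$2" "$3"; then
        echo "$1: increase"
    else
        echo "$1: ok"
    fi
}

check locks 1921 3000
check lockers 485 1500
check objects 1011 1500
```

With the example figures, all three values are under the threshold, so no change to DB_CONFIG would be needed.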

Changing Index settings

Some ZCS deployments may need to have the default index LRU and flush settings modified. With a small LRU (Least Recently Used) setting, you reduce Java heap consumption at the expense of increased IO writes to the Lucene index directory/volume. If you have fast disks, you should tend toward a smaller LRU. But if your disks are slow, increased index flushing can overwhelm the disks, and the server becomes IO bound.

In short, configure a smaller LRU if memory is the bottleneck. Use a larger LRU if disk (for Lucene files) is the bottleneck.

Because many factors, such as amount of RAM, disk count/speed, load characteristics, etc, are involved in setting the index LRU size for best performance, we cannot recommend specific settings. Optimum value can only be determined through trial and error.

Change the settings with the following commands:

zmlocalconfig -e zimbra_index_lru_size=<number>
zmlocalconfig -e zimbra_index_idle_flush_time=<number>

Monitoring

You can monitor the mail queues for delivery problems from the administration console, Monitoring Mail Queues page. To view the queues from the command line, as zimbra, type sudo ~/libexec/zmqstat.

You should install port monitoring software to monitor IMAP and POP3 performance.

Backup and Recovery

The Network Edition of ZCS includes full backup and restore functionality. When ZCS is installed, a backup schedule is automatically added to the cron table. You can change the schedule, but you should not disable it. Backing up the server on a regular basis can help you quickly restore your mail service if an unexpected crash occurs.

The default full backup is scheduled for 1:00 a.m. every Sunday, and the default incremental backups are scheduled for 1:00 a.m. Monday through Saturday. Backups are stored in /opt/zimbra/backup. Make sure that this backup is on a different disk and partition than your data. Also set up a process to automatically copy the backups offsite, to a different machine, or to tape, to minimize the possibility of unrecoverable data loss if the backup disk fails.

Backup and restore is documented in the Administrator’s Guide and more information can be found on the Zimbra wiki.
