Talk:Performance Tuning Guidelines for Large Deployments
From Zimbra :: Wiki
Areas for Potential Enhancements
Disk selection
Section should be updated to mention SAS and Nearline SAS disks. Very few people use SCSI/FC disks anymore.
General comments on memory allocation
In the "JVM Options" section we say, "The mailbox server Java process uses caches extensively. By default, we try to reserve 30% of system memory for use by this process, and 40% for use by MySQL. On systems with > 4GB of memory, you can reserve even more for use by the JVM without risking starving the OS."
In the "MySQL" section we say, "The amount of memory you assign to MySQL and JVM together should not exceed 80% of system memory, and should be lower if you are running other services on the system as well."
The message seems unclear, and the information is scattered. Perhaps we should provide guidance, with context, on both allocations in the same place and/or cross-reference the two sections. Combining the potential for over-allocation with vm.swappiness=0 seems pretty deadly. Perhaps more importantly, unless JVM tuning becomes better for large memory allocations, there should be some caveats about allocating more than X (4?) GB of memory to the JVM. Since the JVM allocation is a percentage of total system memory, it would be great to provide clearer guidance for large (>8GB?) memory systems.
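A rough sketch of the arithmetic behind the two quoted statements, assuming the 30% / 40% defaults and the 80% ceiling. The helper name zimbra_mem_budget is made up for illustration and is not a Zimbra tool:

```shell
# Hypothetical helper (not part of ZCS): print a rough memory split in MB.
# Usage: zimbra_mem_budget TOTAL_MB [JVM_PCT] [MYSQL_PCT]
# Defaults follow the 30% / 40% figures quoted from the guide.
zimbra_mem_budget() {
    local total_mb=$1 jvm_pct=${2:-30} mysql_pct=${3:-40}
    # The guide's ceiling: JVM + MySQL should not exceed 80% of system memory.
    if [ $((jvm_pct + mysql_pct)) -gt 80 ]; then
        echo "JVM + MySQL share is over 80% of system memory" >&2
        return 1
    fi
    echo "jvm_mb=$((total_mb * jvm_pct / 100)) mysql_mb=$((total_mb * mysql_pct / 100))"
}

# Example: a 16 GB host with the default split.
zimbra_mem_budget 16384   # prints: jvm_mb=4915 mysql_mb=6553
```

On a live Linux host the total could be read with awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo. The function only checks the combined share; it says nothing about whether a very large JVM heap is a good idea, which is the open question above.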
From L. Mark Stone :: 3 October 2010:
We have found from testing that setting vm.swappiness=10 works best for us on systems having 12GB or more. The backup process uses a lot of RAM, as do some operating system update utilities (like YaST on SLES), so allowing some easy swapping is not a bad idea, especially if the system has been modified to allocate more RAM than standard to Zimbra. You can always run swapoff -a; swapon -a as root if needed to clear out the swap file safely while the system is running. On voluminous mailbox servers, we allocate more RAM first to the InnoDB buffer pool to ensure the buffer pool size is bigger than the InnoDB database size. Only then do we allocate more RAM to Java. We also have found that more than six cores doesn't help much, and too many cores just results in lots of context switching with no noticeable performance gain.
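Since swapoff has to page everything back into RAM, it may be worth checking first that what is swapped out would actually fit. A minimal sketch, assuming a Linux /proc/meminfo layout; swap_fits_in_ram is a hypothetical helper, and treating free memory plus buffers and page cache as available is only a rough approximation:

```shell
# Hypothetical check: exit 0 if swap currently in use would fit into
# free memory plus buffers and page cache. Reads a /proc/meminfo-style
# file (values in kB); defaults to the live /proc/meminfo.
swap_fits_in_ram() {
    local meminfo=${1:-/proc/meminfo}
    awk '
        /^MemFree:/   { free = $2 }
        /^Buffers:/   { buffers = $2 }
        /^Cached:/    { cached = $2 }
        /^SwapTotal:/ { swap_total = $2 }
        /^SwapFree:/  { swap_free = $2 }
        END {
            if (swap_total - swap_free <= free + buffers + cached) exit 0
            exit 1
        }
    ' "$meminfo"
}

# Example, to be run as root and not executed here:
# swap_fits_in_ram && swapoff -a && swapon -a
```

If the check fails, cycling swap is likely to push the box into heavy memory pressure, so it is probably better to wait for a quieter period.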
Connection handling: LMTP
If mail is backing up in the queue for local delivery, you may want to adjust the LMTP threads upwards to handle a larger number of simultaneous connections from your MTA servers. For larger volume sites, you may want to set this at 50, 100 or potentially higher. Sites with large deployments can tune this incrementally higher to meet typical usage demands, but not so high as to overwhelm the server under extreme loads.
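A sketch of how the thread count might be raised on one mailbox server. It assumes the zimbraLmtpNumThreads attribute is settable per server on your release and that zmprov and zmhostname are on the PATH of the zimbra user; set_lmtp_threads is a hypothetical wrapper, not a Zimbra command:

```shell
# Hypothetical wrapper: set the LMTP thread count on the local mailbox server.
# Assumes it runs as the zimbra user with zmprov and zmhostname available.
set_lmtp_threads() {
    local threads=$1
    # Reject anything that is not a plain non-negative integer.
    case $threads in
        ''|*[!0-9]*)
            echo "usage: set_lmtp_threads COUNT" >&2
            return 2
            ;;
    esac
    zmprov ms "$(zmhostname)" zimbraLmtpNumThreads "$threads"
}

# Example, not executed here: step up gradually and watch the queue in between.
# set_lmtp_threads 50
# postqueue -p | tail -n 1
```

A mailboxd restart may be needed before the new value takes effect; check this against your ZCS version before relying on it.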
JVM Options
Notes or recommendations on Concurrent Mark GC yet?
WARNING
-XX:ErrorFile does not work on ZCS 5.0.18: Unrecognized VM option 'ErrorFile=/opt/zimbra/log'
Other Topics
O_DIRECT for MySQL can be detrimental on SANs (plus Linus on O_DIRECT)
Note: section Performance_Tuning_Guidelines_for_Large_Deployments#MySQL mentions O_DIRECT and Zimbra recommendations. If you are doing your own investigation here are some references to get you started:
See MySQL bug 21947, which says:
While on a EMC SAN connected via Fibre Channel through a QLogic FC adapter (qla2300), for example, using O_DIRECT can slow down simple SELECTs 3 times.
Linus's take on O_DIRECT is amusing too:
The right way to do it is to just not use O_DIRECT. The whole notion of "direct IO" is totally braindamaged. Just say no.
--Csamuel 17:29, 17 February 2008 (PST)
EXT3, not RAID-5, may be the cause of performance issues
I would suggest that many of the problems attributed to RAID-5 might instead be related to ext3's poor performance under high multiple writer load.
On our CentOS Zimbra system we appear to have mitigated the painful impact of ext3 by remounting /opt and /tmp as ext2 instead.
We now see far better performance out of Zimbra than we ever had before and are contemplating using the CentOS XFS kernel modules to move to XFS for those partitions to regain the benefits of a journalled file system.
We saw similar issues when we tried to run a RHEL NFS server for our HPC clusters: we ended up with huge load averages and I/O waits due to ext3 bottlenecking on kjournald. A paper presented at Linux.Conf.Au a few years ago indicated that the single-threaded ext3 kjournald is responsible. Moving to XFS under Fedora on more limited hardware solved our NFS problems (we now run Debian with XFS).
--Csamuel 17:38, 17 February 2008 (PST)
problems with sysctl.conf settings
We had an issue after changing the sysctl.conf parameters suggested here.
net.ipv4.tcp_fin_timeout=15
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
We had several Linux clients that were unable to connect to the server after making this change.
- We had a similar issue. Symptom: clients in an office behind NAT could no longer connect to Zimbra, but clients in other locations could. I assumed the issue was related to lots of NATted connections being created and dropped, set tcp_tw_recycle back to 0 (while keeping suggested values for tcp_fin_timeout and tcp_tw_reuse) and the problem has not reoccurred.
- Vvardja 12:25, 15 June 2010 (UTC)
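For anyone auditing a server after hitting this, a small sketch that reports whether a sysctl.conf-style file turns tcp_tw_recycle on. The function name tw_recycle_enabled is hypothetical, and it assumes one setting per line:

```shell
# Hypothetical check: succeed if a sysctl.conf-style file enables tcp_tw_recycle.
# Commented-out lines are ignored; the last assignment in the file is taken
# as the effective one, since settings are applied in file order.
tw_recycle_enabled() {
    local conf=${1:-/etc/sysctl.conf} value
    value=$(sed -n 's/^[[:space:]]*net\.ipv4\.tcp_tw_recycle[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | tail -n 1)
    [ "$value" = "1" ]
}

# Example, not executed here:
# if tw_recycle_enabled; then
#     echo "tcp_tw_recycle is on; clients behind NAT may fail to connect"
# fi
```

This only looks at the file, not the running kernel. Note also that tcp_tw_recycle was removed from Linux in 4.12, so on newer kernels the setting no longer exists at all.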
zimbraMessageCacheSize
As of Zimbra 6.0.x, zimbraMessageCacheSize is a number of messages rather than a size in MB. The maximum number of messages to store in the cache is 10000; the default is 2000. So trying
# can be set on global config or server
zmprov ms zimbraMessageCacheSize 104857600
will result in the message cache being set to the maximum of 10000 messages. Depending on your system's resources, this might not be your intention.
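One way to avoid passing an old byte-style value by accident is to wrap the call in a guard. This is only a sketch: set_message_cache_size is a hypothetical helper, the 10000 cap is the maximum reported above for 6.0.x, and it assumes zmprov and zmhostname are available to the zimbra user:

```shell
# Hypothetical wrapper: set zimbraMessageCacheSize as a message count.
# The 10000 cap is the maximum reported above for ZCS 6.0.x.
set_message_cache_size() {
    local count=$1 max=10000
    # Reject anything that is not a plain non-negative integer.
    case $count in
        ''|*[!0-9]*)
            echo "usage: set_message_cache_size COUNT" >&2
            return 2
            ;;
    esac
    if [ "$count" -gt "$max" ]; then
        echo "refusing $count: the maximum is $max messages (was this a byte size?)" >&2
        return 1
    fi
    zmprov ms "$(zmhostname)" zimbraMessageCacheSize "$count"
}

# Example, not executed here: keep the 6.0.x default explicitly.
# set_message_cache_size 2000
```

Refusing the value rather than silently clamping it is a design choice here, so that a leftover setting from a pre-6.0 config gets noticed.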
Open File Descriptors Limit for root
This tip is related to the bug "Bad file descriptor" SocketException causes system failure every several hours (solved in post #81), with errors like "java.net.SocketException: Bad file descriptor" and "com.zimbra.common.service.ServiceException: system failure: ZimbraLdapContext". I found it in an installation based on ZCS 6 and Ubuntu 10.04 amd64.
The Open File Descriptors Limit section of the Performance Tuning Guidelines for Large Deployments only talks about raising the limits for the zimbra user, but Java used to crash for the root user too, so in the file /etc/security/limits.conf you should add:
root soft nofile 524288
root hard nofile 524288
We should also add the command "ulimit -n 524288" to the root user's .bashrc.
--Julian M 01:42, 31 May 2011 (UTC)
