One of the coolest things about working with ZCS is that it exposes you to many technologies such as Java, Postfix, OpenLDAP, and MySQL. A ZCS administrator should have a working knowledge of these technologies in order to monitor the system and solve performance problems.
The ZCS server collects many performance-related statistics. The data is stored in CSV files in /opt/zimbra/zmstat and its subdirectories:
- cpu.csv: CPU utilization
- fd.csv: file descriptor count
- mailboxd.csv: ZCS server and JVM statistics
- mtaqueue.csv: Postfix queue
- proc.csv: process statistics
- soap.csv: SOAP request processing time
- threads.csv: JVM thread counts
- vm.csv: Linux VM statistics (from the vmstat command)
These files are in a standard CSV format that can be loaded into Excel for viewing and charting.
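For quick scripted analysis outside of a spreadsheet, the same files can be read with a few lines of Python. The following is a minimal sketch, not part of ZCS: the function name `load_stats` is hypothetical, and the column names in the sample (`timestamp`, `sys`, `user`, `idle`) are illustrative assumptions, so check the header row of your own files.

```python
import csv
import io


def load_stats(csv_text):
    """Parse CSV text into a list of dicts keyed by column name.

    Whitespace around header names and values is stripped, since
    some CSV writers put a space after each comma.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        rows.append({name: value.strip() for name, value in zip(header, record)})
    return rows


# Illustrative sample only; real zmstat files may use different columns.
SAMPLE = """timestamp, sys, user, idle
04/03/2008 08:59:30, 2.0, 10.5, 87.5
04/03/2008 09:00:00, 6.0, 71.0, 23.0
"""

rows = load_stats(SAMPLE)
print(rows[1]["user"])
```

To read a real file, pass its contents to the function, for example `load_stats(open(path).read())`. Values come back as strings, so convert them with `float()` before doing arithmetic.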
Zimbra provides a command-line utility called zmstat-chart that generates charts from the CSV data. The following command:
$ zmstat-chart -s /opt/zimbra/zmstat/2008-04-03 -d ~/charts
will read data from the CSV files in /opt/zimbra/zmstat/2008-04-03 and write HTML and PNG files to the ~/charts directory. Default chart parameters are specified in /opt/zimbra/conf/zmstat-chart.xml. An alternate chart configuration file can be specified with the -c option.
CPU utilization is tracked at both the server level and the process level. Here's a sample server CPU graph:
This chart shows server CPU utilization rising in the morning as users come to work, with a spike at 9:00 AM. To investigate further, you could look at other charts or the server logs to determine what happened at 9:00 AM to cause the heightened system load.
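When the charts are not at hand, the same kind of spike can be found by scanning the samples for values above a threshold. This is only a sketch: `find_spikes` is a hypothetical helper, the sample values are made up, and the 80 percent threshold is an arbitrary assumption to tune for your own system.

```python
def find_spikes(samples, threshold):
    """Return (timestamp, value) pairs whose value exceeds threshold.

    samples: list of (timestamp string, CPU busy percent) tuples.
    """
    return [(ts, value) for ts, value in samples if value > threshold]


# Hypothetical CPU busy percentages sampled through the morning.
samples = [
    ("08:00", 12.0),
    ("08:30", 25.0),
    ("09:00", 93.0),
    ("09:30", 41.0),
]

spikes = find_spikes(samples, threshold=80.0)
print(spikes)
```

The timestamps that come back give you a narrow window to search for in the server logs.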
Disk utilization is tracked for each disk partition:
This chart shows that disk activity rises in step with the utilization shown in the CPU chart, and that the sda partition is under more load than the others. When laying out disk partitions for a ZCS installation, it's a good idea to put different system components (/opt/zimbra/store, /opt/zimbra/db, /opt/zimbra/index) on separate partitions. This makes it much easier to determine which component is responsible for the disk activity.
JVM Garbage Collection
ZCS tracks the percentage of time that the Java Virtual Machine spends on garbage collection:
If the JVM is spending more than a few percent of its time on garbage collection, consider increasing the amount of memory allocated to the server Java process.
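The percentage itself is simple arithmetic: the time spent collecting during an interval divided by the length of that interval. The sketch below assumes you have cumulative GC time in milliseconds at two sample points. The function name and the sample numbers are hypothetical, and this is not necessarily how ZCS computes the chart.

```python
def gc_time_percent(gc_ms_start, gc_ms_end, interval_seconds):
    """Percentage of wall-clock time spent in GC over one interval.

    gc_ms_start / gc_ms_end: cumulative GC time (ms) at the start
    and end of the interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    gc_ms = gc_ms_end - gc_ms_start
    return 100.0 * gc_ms / (interval_seconds * 1000.0)


# Hypothetical: 1,500 ms of GC during a 30-second sampling interval.
percent = gc_time_percent(120_000, 121_500, 30)
print(f"{percent:.1f}%")
```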
InnoDB Buffer Pool Hit Rate
This chart tracks the buffer pool hit rate for the InnoDB storage engine in MySQL:
Higher numbers indicate that MySQL is able to get data from memory instead of going to disk. The hit rate is reported out of 1000, so if yours is below 990, MySQL is hitting the disk harder than it should. Try the following:
- Consider increasing the buffer pool size (innodb_buffer_pool_size) in my.cnf.
- Run EXPLAIN on some of the SQL statements in /opt/zimbra/log/myslow.log to see if they are causing InnoDB to read a large amount of data into memory.
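For reference, a hit rate per 1000 requests can be estimated from two MySQL status counters: `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that had to go to disk). This is a commonly used approximation, offered as a sketch rather than the exact calculation behind the ZCS chart. The function name and sample values are hypothetical.

```python
def buffer_pool_hit_rate(read_requests, disk_reads):
    """Approximate InnoDB buffer pool hit rate per 1000 requests.

    read_requests: Innodb_buffer_pool_read_requests (logical reads)
    disk_reads:    Innodb_buffer_pool_reads (reads served from disk)
    """
    if read_requests <= 0:
        return None  # no activity yet, hit rate is undefined
    return 1000.0 * (1.0 - disk_reads / read_requests)


# Hypothetical counter values taken from SHOW GLOBAL STATUS.
rate = buffer_pool_hit_rate(2_000_000, 30_000)
print(round(rate))
```

Because both counters are cumulative since server start, the difference between two readings taken some minutes apart gives a better picture of current behavior than the raw totals.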