Zmstats: Difference between revisions
(Adding category) |
Koichi kato (talk | contribs) No edit summary |
||
(14 intermediate revisions by 6 users not shown) | |||
Line 1: | Line 1: | ||
{{BC|Community Sandbox}} | |||
__FORCETOC__ | |||
<div class="col-md-12 ibox-content"> | |||
=Zmstats= | |||
{{KB|{{Unsupported}}|{{ZCS 8.6}}|{{ZCS 5.0}}|}} | |||
{{WIP}} | |||
= Zmstats =
Zmstats is how Zimbra exposes its performance metrics and statistics to the world. The information covers a wide array of data: disk usage, CPU utilization, Java statistics, Zimbra counters and beyond.
Line 7: | Line 13: | ||
* zmstatctl (all zmstat collection scripts are located in /opt/zimbra/libexec) | * zmstatctl (all zmstat collection scripts are located in /opt/zimbra/libexec) | ||
** zmstat-allprocs (since ZCS 6.0) | |||
** zmstat-convertd | |||
** zmstat-cpu | |||
** zmstat-df (since ZCS 6.0) | |||
** zmstat-fd | ** zmstat-fd | ||
** zmstat-io | ** zmstat-io | ||
** zmstat-ldap (since ZCS 8.0)
** zmstat-mtaqueue | ** zmstat-mtaqueue | ||
** zmstat-mysql | ** zmstat-mysql | ||
** zmstat-nginx (since ZCS 6.0) | |||
** zmstat-proc | |||
** zmstat-vm | |||
* zmstat-chart | * zmstat-chart | ||
** zmstat-chart-config | ** zmstat-chart-config | ||
== Running zmstats == | |||
* On the server where the stats were produced, make sure that zmstat-chart.xml is available. When running [[zmdiaglog]], the zmstats data and the zmstat-chart-config output are produced automatically.
<pre> | |||
# su - zimbra | |||
$ zmstat-chart-config > /tmp/zmstat-chart-`zmhostname`.xml | |||
</pre> | |||
* Make a charts directory: | |||
<pre> | |||
$ mkdir ~/zmstat/2010-06-01/charts | |||
</pre> | |||
* Then produce the stats: | |||
<pre> | |||
$ zmstat-chart -c /tmp/zmstat-chart-`zmhostname`.xml -s ~/zmstat/2010-06-01 -d ~/zmstat/2010-06-01/charts
</pre> | |||
== zmstatctl == | == zmstatctl == | ||
Line 53: | Line 80: | ||
See /opt/zimbra/conf/zmstat-chart.xml for more examples | See /opt/zimbra/conf/zmstat-chart.xml for more examples | ||
= Individual csv files and their related counters =
=== allprocs.csv (since ZCS 6.0) ===
Written by zmstat-allprocs<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: proc(5) man page. /proc/[pid]/stat, /proc/[pid]/io
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| process || process name | |||
|- | |||
| utime || total user time | |||
|- | |||
| stime || total system time | |||
|- | |||
| cputime || user + system time total | |||
|- | |||
| rchar || bytes read (*1) | |||
|- | |||
| wchar || bytes written (*1) | |||
|- | |||
| read_bytes || bytes read from disk (*1) | |||
|- | |||
| write_bytes || bytes written to disk (*1) | |||
|- | |||
| rss || resident-set-size memory usage (kiloBytes) | |||
|- | |||
| processes || number of processes | |||
|- | |||
| threads || number of threads | |||
|- | |||
|} | |||
(*1) Only if IO Accounting is enabled in the Linux kernel | |||
=== convertd.csv === | |||
Collects CPU statistics for the convertd process (NE only). | |||
Written by zmstat-convertd<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: proc(5) man page. /proc/[pid]/stat, /proc/[pid]/io
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| utime || user time for convertd | |||
|- | |||
| stime || system time for convertd | |||
|- | |||
| cputime || user + system time total | |||
|- | |||
| rchar || bytes read (*1) | |||
|- | |||
| wchar || bytes written (*1) | |||
|- | |||
| read_bytes || bytes read from disk (*1) | |||
|- | |||
| write_bytes || bytes written to disk (*1) | |||
|- | |||
| rss || resident-set-size memory usage (kiloBytes) | |||
|- | |||
| processes || number of processes | |||
|- | |||
| threads || number of threads
|- | |||
|} | |||
(*1) Only if IO Accounting is enabled in the Linux kernel | |||
=== cpu.csv === | |||
Written by zmstat-cpu<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: proc(5) man page. /proc/stat | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| cpu:user || total user time | |||
|- | |||
| cpu:nice || total nice process time | |||
|- | |||
| cpu:sys || total system time | |||
|- | |||
| cpu:idle || total idle time | |||
|- | |||
| cpu:iowait || total time in iowait | |||
|- | |||
| cpu:irq || total time in irq | |||
|- | |||
| cpu:softirq || total time in softirq | |||
|- | |||
| cpu-N:XXX || same as above, but per individual core/cpu | |||
|- | |||
|} | |||
=== df.csv (since ZCS 6.0) === | |||
Captures disk usage | |||
Written by zmstat-df<br> | |||
Interval: LC.zmstat_disk_interval<br> | |||
Reference: df(1) man page | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| path || mount point | |||
|- | |||
| disk || device | |||
|- | |||
| disk_use || space used (kiloBytes) | |||
|- | |||
| disk_space || total space (kiloBytes) | |||
|- | |||
| disk_pct_used || percentage used | |||
|- | |||
|} | |||
=== fd.csv ===
Captures file descriptor usage on the system
Written by zmstat-fd<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: proc(5) man page. /proc/sys/fs/file-nr, /proc/[pid]/fd/
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| fd_count || current number of open file descriptors | |||
|- | |||
| mailboxd_fd_count || current number of open file descriptors by mailboxd | |||
|- | |||
|} | |||
=== imap.csv === | |||
Written by mailboxd<br> | |||
Interval: 1 minute | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| command || executed command | |||
|- | |||
| exec_count || number of executions | |||
|- | |||
| exec_ms_avg || average execution time | |||
|- | |||
|} | |||
=== io.csv and io-x.csv === | |||
Written by zmstat-io<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: iostat(1) man page | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| dev:tps || transactions per second | |||
|- | |||
| dev:kB_read/s || read rate | |||
|- | |||
| dev:kB_wrtn/s || write rate | |||
|- | |||
| dev:kB_read || kilobytes read
|-
| dev:kB_wrtn || kilobytes written
|- | |||
| dev:rrqm/s || read requests merged per second queued to device | |||
|- | |||
| dev:wrqm/s || write requests merged per second queued to device | |||
|- | |||
| dev:r/s || reads per second | |||
|- | |||
| dev:w/s || writes per second | |||
|- | |||
| dev:rkB/s || read rate | |||
|- | |||
| dev:wkB/s || write rate | |||
|- | |||
| dev:avgrq-sz || average size (sectors) of requests | |||
|- | |||
| dev:avgqu-sz || average queue length | |||
|- | |||
| dev:await || average wait time for requests to be served | |||
|- | |||
| dev:svctm || average time to service requests | |||
|- | |||
| dev:%util || percentage of CPU time / bandwidth utilization of device | |||
|- | |||
|} | |||
=== ldap.csv (since ZCS 8.0) === | |||
<u>LDAP server</u> | |||
Written by zmstat-ldap<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: OpenLDAP Software 2.4 Administrator's Guide: Monitoring http://www.openldap.org/doc/admin24/monitoringslapd.html | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| abandon_ops || number of completed Abandon operations | |||
|- | |||
| add_ops || number of completed Add operations | |||
|- | |||
| bind_ops || number of completed Bind operations | |||
|- | |||
| bytes_sent || bytes sent by the server | |||
|- | |||
| compare_ops || number of completed Compare operations | |||
|- | |||
| completed_ops || See [http://bugzilla.zimbra.com/show_bug.cgi?id=100731 Bug 100731] | |||
|- | |||
| connections || number of connections | |||
|- | |||
| delete_ops || number of completed Delete operations | |||
|- | |||
| entries_sent || entries sent by the server | |||
|- | |||
| extended_ops || number of completed Extended operations | |||
|- | |||
| initiated_ops || See [http://bugzilla.zimbra.com/show_bug.cgi?id=100731 Bug 100731] | |||
|- | |||
| modify_ops || number of completed Modify operations | |||
|- | |||
| modrdn_ops || number of completed Modrdn operations | |||
|- | |||
| read_waiters || number of current read waiters | |||
|- | |||
| referrals_sent || referrals sent by the server | |||
|- | |||
| search_ops || number of completed Search operations | |||
|- | |||
| unbind_ops || number of completed Unbind operations | |||
|- | |||
| write_waiters || number of current write waiters | |||
|- | |||
|} | |||
<u>Mailbox server</u> | |||
Written by mailboxd<br> | |||
Interval: 1 minute | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| command || executed command | |||
|- | |||
| exec_count || number of executions | |||
|- | |||
| exec_ms_avg || average execution time | |||
|- | |||
|} | |||
Known bug: [http://bugzilla.zimbra.com/show_bug.cgi?id=99936 Bug 99936] - mailboxd overwrites ldap.csv generated by zmstat-ldap
=== mailboxd.csv === | |||
Written by mailboxd<br> | |||
Interval: 1 minute | |||
{| | |||
| account_cache_hit_rate || LDAP account cache hit rate | |||
|- | |||
| account_cache_size || LDAP account cache size | |||
|- | |||
| acl_cache_hit_rate || LDAP ACL cache hit rate | |||
|- | |||
| bis_read || Number of times that the file descriptor cache read message data from disk | |||
|- | |||
| bis_seek_rate || Percentage of file descriptor cache disk reads that required a seek | |||
|- | |||
| calcache_hit || Hit rate of calendar summary cache, counting cache hit from both memory and file | |||
|- | |||
| calcache_lru_size || Number of calendars (folders) in the calendar summary cache LRU in Java heap | |||
|- | |||
| calcache_mem_hit || Hit rate of calendar summary cache, counting cache hit from memory only | |||
|- | |||
| cos_cache_hit_rate || LDAP COS cache hit rate | |||
|- | |||
| cos_cache_size || LDAP COS cache size | |||
|- | |||
| db_conn_count || Number of times that the server got a database connection from the pool | |||
|- | |||
| db_conn_ms_avg || Average latency (ms) of getting a database connection from the pool | |||
|- | |||
| db_pool_size || Number of database connections in use | |||
|- | |||
| domain_cache_hit_rate || LDAP domain cache hit rate | |||
|- | |||
| domain_cache_size || LDAP domain cache size | |||
|- | |||
| ews_syncstate_cache_hit_rate || EWS Syncstate Cache Hit Rate | |||
|- | |||
| ews_syncstate_cache_size || EWS Syncstate Cache Size | |||
|- | |||
| fd_cache_hit_rate || File descriptor cache hit rate | |||
|- | |||
| fd_cache_size || Number of open file descriptors that reference message content | |||
|- | |||
| gc_concurrentmarksweep_count || Number of times that concurrentmarksweep GC was invoked | |||
|- | |||
| gc_concurrentmarksweep_ms || Time (ms) spent on concurrentmarksweep GC | |||
|- | |||
| gc_major_count || Number of times that major GC was invoked | |||
|- | |||
| gc_major_ms || Time (ms) spent on major GC | |||
|- | |||
| gc_minor_count || Number of times that minor GC was invoked | |||
|- | |||
| gc_minor_ms || Time (ms) spent on minor GC | |||
|- | |||
| gc_parnew_count || Number of times that parnew GC was invoked | |||
|- | |||
| gc_parnew_ms || Time (ms) spent on parnew GC | |||
|- | |||
| group_cache_hit_rate || LDAP group cache hit rate | |||
|- | |||
| group_cache_size || LDAP group cache size | |||
|- | |||
| heap_free || Number of bytes free in the entire JVM heap | |||
|- | |||
| heap_used || Number of bytes used in the entire JVM heap | |||
|- | |||
| http_idle_threads || Number of HTTP idle threads | |||
|- | |||
| http_threads || Number of HTTP threads | |||
|- | |||
| idx_bytes_read || Accumulated bytes read by Lucene | |||
|- | |||
| idx_bytes_read_avg || Average of idx_bytes_read | |||
|- | |||
| idx_bytes_written || Accumulated bytes written by Lucene | |||
|- | |||
| idx_bytes_written_avg || Average of idx_bytes_written | |||
|- | |||
| idx_wrt_avg || Average number of concurrent index writers | |||
|- | |||
| idx_wrt_opened || Accumulated number of index writers opened | |||
|- | |||
| idx_wrt_opened_cache_hit || Accumulated number of cache hits when opening an index writer | |||
|- | |||
| imap_conn || Number of cleartext IMAP connections | |||
|- | |||
| imap_count || Number of IMAP requests received | |||
|- | |||
| imap_ms_avg || Average processing time (ms) of IMAP requests | |||
|- | |||
| imap_ssl_conn || Number of SSL IMAP connections | |||
|- | |||
| imap_ssl_threads || Number of SSL IMAP threads | |||
|- | |||
| imap_threads || Number of IMAP threads | |||
|- | |||
| innodb_bp_hit_rate || InnoDB buffer pool hit rate | |||
|- | |||
| ldap_dc_count || Number of times that the server got an LDAP directory context | |||
|- | |||
| ldap_dc_ms_avg || Average latency (ms) of getting an LDAP directory context | |||
|- | |||
| lmtp_conn || Number of LMTP connections | |||
|- | |||
| lmtp_dlvd_bytes || Number of bytes of data delivered to mailboxes as a result of LMTP delivery | |||
|- | |||
| lmtp_dlvd_msgs || Number of messages delivered to mailboxes as a result of LMTP delivery | |||
|- | |||
| lmtp_rcvd_bytes || Number of bytes received over LMTP | |||
|- | |||
| lmtp_rcvd_msgs || Number of messages received over LMTP | |||
|- | |||
| lmtp_rcvd_rcpt || Number of LMTP recipients | |||
|- | |||
| lmtp_threads || Number of LMTP threads | |||
|- | |||
| mbox_add_msg_count || Number of messages that were added to a mailbox | |||
|- | |||
| mbox_add_msg_ms_avg || Average latency (ms) of adding a message to a mailbox | |||
|- | |||
| mbox_cache || Mailbox cache hit rate | |||
|- | |||
| mbox_cache_size || Number of mailboxes cached in memory | |||
|- | |||
| mbox_get_count || Number of times that the server got a mailbox from the cache | |||
|- | |||
| mbox_get_ms_avg || Average latency (ms) of getting a mailbox from the cache | |||
|- | |||
| mbox_item_cache || Item cache hit rate | |||
|- | |||
| mbox_msg_cache || Message cache hit rate | |||
|- | |||
| mobile_ping_cache_hit_rate || mobile_ping_cache_hit_rate | |||
|- | |||
| mobile_ping_cache_size || mobile_ping_cache_size | |||
|- | |||
| mobile_syncstate_cache_hit_rate || mobile_syncstate_cache_hit_rate | |||
|- | |||
| mobile_syncstate_cache_size || mobile_syncstate_cache_size | |||
|- | |||
| mpool_cms_old_gen_free || mpool_cms_old_gen_free | |||
|- | |||
| mpool_cms_old_gen_used || mpool_cms_old_gen_used | |||
|- | |||
| mpool_code_cache_free || Number of bytes free in the code cache memory pool | |||
|- | |||
| mpool_code_cache_used || Number of bytes used in the code cache memory pool | |||
|- | |||
| mpool_compressed_class_space_free || mpool_compressed_class_space_free | |||
|- | |||
| mpool_compressed_class_space_used || mpool_compressed_class_space_used | |||
|- | |||
| mpool_metaspace_free || mpool_metaspace_free | |||
|- | |||
| mpool_metaspace_used || mpool_metaspace_used | |||
|- | |||
| mpool_par_eden_space_free || Number of bytes free in the eden space memory pool | |||
|- | |||
| mpool_par_eden_space_used || Number of bytes used in the eden space memory pool | |||
|- | |||
| mpool_par_survivor_space_free || Number of bytes free in the survivor space memory pool | |||
|- | |||
| mpool_par_survivor_space_used || Number of bytes used in the survivor space memory pool | |||
|- | |||
| msg_cache_size || Number of message structures cached in memory | |||
|- | |||
| pop_conn || Number of cleartext POP3 connections | |||
|- | |||
| pop_count || Number of POP3 requests received | |||
|- | |||
| pop_ms_avg || Average processing time (ms) of POP3 requests | |||
|- | |||
| pop_ssl_conn || Number of SSL POP3 connections | |||
|- | |||
| pop_ssl_threads || Number of SSL POP3 threads | |||
|- | |||
| pop_threads || Number of POP3 threads | |||
|- | |||
| server_cache_hit_rate || LDAP server cache hit rate | |||
|- | |||
| server_cache_size || LDAP server cache size | |||
|- | |||
| soap_count || Number of SOAP requests received | |||
|- | |||
| soap_ms_avg || Average processing time (ms) of SOAP requests | |||
|- | |||
| soap_sessions || Number of SOAP sessions | |||
|- | |||
| timestamp || Time when sample was collected | |||
|- | |||
| ucservice_cache_hit_rate || ucservice_cache_hit_rate | |||
|- | |||
| ucservice_cache_size || ucservice_cache_size | |||
|- | |||
| xmpp_cache_hit_rate || LDAP XMPP cache hit rate | |||
|- | |||
| xmpp_cache_size || LDAP XMPP cache size | |||
|- | |||
| zimlet_cache_hit_rate || LDAP zimlet cache hit rate | |||
|- | |||
| zimlet_cache_size || LDAP zimlet cache size | |||
|- | |||
|} | |||
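As a rough illustration of how these counters can be combined, the following Python sketch derives JVM heap utilisation from heap_used and heap_free. Only the column names come from the table above; the sample rows, the timestamp format and the function name are made up for the example.

```python
import csv
import io

# Made-up sample in the shape of mailboxd.csv. Only the column names
# (timestamp, heap_used, heap_free) are taken from the table above.
SAMPLE = """timestamp,heap_used,heap_free
06/01/2010 00:01:00,600000000,400000000
06/01/2010 00:02:00,750000000,250000000
"""


def heap_utilisation(csv_text):
    """Return (timestamp, used fraction of the heap) pairs, one per row."""
    rows = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    result = []
    for row in rows:
        used = float(row["heap_used"])
        free = float(row["heap_free"])
        result.append((row["timestamp"], used / (used + free)))
    return result


for ts, fraction in heap_utilisation(SAMPLE):
    print(ts, f"{fraction:.0%}")
```

The same pattern should work for any pair of *_used / *_free columns, such as the mpool_* counters.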
=== mtaqueue.csv === | |||
Written by zmstat-mtaqueue<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: postqueue(1) man page http://www.postfix.org/postqueue.1.html | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| KBytes || kilobytes queued by the mta | |||
|- | |||
| requests || number of items queued by the mta | |||
|- | |||
|} | |||
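A minimal Python sketch of one way to use the two counters together: dividing KBytes by requests gives the average size of a queued item. The function name and the sample numbers are hypothetical.

```python
def average_item_kb(kbytes, requests):
    """Average size (kilobytes) of a queued item, or None for an empty queue."""
    if requests == 0:
        return None
    return kbytes / requests


# Made-up values for one mtaqueue.csv row: 300 kB queued across 12 items.
print(average_item_kb(300.0, 12))
print(average_item_kb(0.0, 0))
```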
=== mysql.csv ===
Columns for mysql.csv are derived from the values of the query "SHOW GLOBAL STATUS". Refer to the MySQL administration manual for the meaning of each counter.
Written by zmstat-mysql<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: | |||
: Server Status Variables - MariaDB Knowledge Base https://mariadb.com/kb/en/mariadb/server-status-variables/ | |||
: MySQL :: MySQL 5.0 Reference Manual :: 5.1.6 Server Status Variables https://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html | |||
=== nginx.csv (since ZCS 6.0) ===
Written by zmstat-nginx<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: proc(5) man page. /proc/[pid]/stat, /proc/[pid]/io | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| utime || user time for nginx
|-
| stime || system time for nginx
|- | |||
| cputime || user + system time total | |||
|- | |||
| rchar || bytes read (*1) | |||
|- | |||
| wchar || bytes written (*1) | |||
|- | |||
| read_bytes || bytes read from disk (*1) | |||
|- | |||
| write_bytes || bytes written to disk (*1) | |||
|- | |||
| rss || resident-set-size memory usage (kiloBytes) | |||
|- | |||
| processes || number of processes | |||
|- | |||
| threads || number of threads | |||
|- | |||
|} | |||
(*1) Only if IO Accounting is enabled in the Linux kernel | |||
=== pop3.csv === | |||
Written by mailboxd<br> | |||
Interval: 1 minute | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| command || executed command | |||
|- | |||
| exec_count || number of executions | |||
|- | |||
| exec_ms_avg || average execution time | |||
|- | |||
|} | |||
=== proc.csv === | |||
Written by zmstat-proc<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: proc(5) man page. /proc/stat, /proc/[pid]/stat, /proc/[pid]/statm | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| system || label | |||
|- | |||
| user || user time (percent) | |||
|- | |||
| sys || system time (percent) | |||
|- | |||
| idle || idle time (percent) | |||
|- | |||
| iowait || iowait time (percent) | |||
|- | |||
| PROC || PROC name | |||
|- | |||
| PROC-total-cpu || user + system time total for PROC | |||
|- | |||
| PROC-utime || user time for PROC | |||
|- | |||
| PROC-stime || system time for PROC | |||
|- | |||
| PROC-totalMB || total memory footprint for PROC | |||
|- | |||
| PROC-rssMB || resident-set-size of PROC | |||
|- | |||
| PROC-sharedMB || shared memory of PROC | |||
|- | |||
| PROC-process-count || number of threads/subprocesses | |||
|- | |||
|} | |||
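Because the per-process columns share the PROC- naming pattern, a header row can be grouped by process before further analysis. The sketch below assumes the suffixes listed in the table above; the process names "mailbox" and "mysql" in the sample header are placeholders, not a statement about what zmstat-proc actually monitors.

```python
# Suffixes taken from the table above (PROC-total-cpu, PROC-utime, ...).
SUFFIXES = (
    "total-cpu", "utime", "stime",
    "totalMB", "rssMB", "sharedMB", "process-count",
)


def group_by_process(header):
    """Map each process name to the list of counters found for it."""
    groups = {}
    for column in header:
        for suffix in SUFFIXES:
            marker = "-" + suffix
            if column.endswith(marker):
                proc = column[: -len(marker)]
                groups.setdefault(proc, []).append(suffix)
                break
    return groups


# Hypothetical header row; the process names are placeholders.
header = ["timestamp", "system", "user", "sys", "idle", "iowait",
          "mailbox", "mailbox-total-cpu", "mailbox-utime",
          "mysql", "mysql-total-cpu", "mysql-rssMB"]
print(group_by_process(header))
```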
=== soap.csv === | |||
Written by mailboxd<br> | |||
Interval: 1 minute | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| command || executed command | |||
|- | |||
| exec_count || number of executions | |||
|- | |||
| exec_ms_avg || average execution time | |||
|- | |||
|} | |||
=== sql.csv (since ZCS 8.5) === | |||
Written by mailboxd<br> | |||
Interval: 1 minute | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| command || executed command | |||
|- | |||
| exec_count || number of executions | |||
|- | |||
| exec_ms_avg || average execution time | |||
|- | |||
|} | |||
=== sync.csv (since ZCS 8.0) === | |||
Written by mailboxd<br> | |||
Interval: 1 minute | |||
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| command || executed command | |||
|- | |||
| exec_count || number of executions | |||
|- | |||
| exec_ms_avg || average execution time | |||
|- | |||
|} | |||
=== threads.csv === | |||
Written by mailboxd<br> | |||
Interval: 1 minute<br> | |||
Related configuration: zimbraStatThreadNamePrefix (since ZCS 6.0)
{| | |||
| timestamp || Time when sample was collected | |||
|- | |||
| THREAD || number of threads | |||
|- | |||
| total || total number of threads | |||
|- | |||
|} | |||
=== vm.csv === | |||
Records the output of vmstat, together with memory and load values read from /proc, for later review.
Written by zmstat-vm<br> | |||
Interval: LC.zmstat_interval<br> | |||
Reference: | |||
: vmstat(8) man page | |||
: proc(5) man page. /proc/meminfo, /proc/loadavg | |||
vmstat | |||
* r | |||
* b | |||
* swpd | |||
* free | |||
* buff | |||
* cache | |||
* si | |||
* so | |||
* bi | |||
* bo | |||
* in | |||
* cs | |||
* us | |||
* sy | |||
* id | |||
* wa | |||
* st | |||
* other counters reported by vmstat | |||
proc | |||
* MemTotal | |||
* MemFree | |||
* Buffers | |||
* Cached | |||
* SwapCached | |||
* Active | |||
* Inactive | |||
* Active(anon) | |||
* Inactive(anon) | |||
* Active(file) | |||
* Inactive(file) | |||
* Unevictable | |||
* Mlocked | |||
* SwapTotal | |||
* SwapFree | |||
* Dirty | |||
* Writeback | |||
* AnonPages | |||
* Mapped | |||
* Shmem | |||
* Slab | |||
* SReclaimable | |||
* SUnreclaim | |||
* KernelStack | |||
* PageTables | |||
* NFS_Unstable | |||
* Bounce | |||
* WritebackTmp | |||
* CommitLimit | |||
* Committed_AS | |||
* VmallocTotal | |||
* VmallocUsed | |||
* VmallocChunk | |||
* HardwareCorrupted | |||
* AnonHugePages | |||
* HugePages_Total | |||
* HugePages_Free | |||
* HugePages_Rsvd | |||
* HugePages_Surp | |||
* Hugepagesize | |||
* DirectMap4k | |||
* DirectMap2M | |||
* DirectMap1G | |||
* Loadavg | |||
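As an example of combining the /proc/meminfo counters, the Python sketch below estimates the share of memory in use after discounting free memory, buffers and page cache. This is a common rule of thumb rather than anything zmstat-vm computes itself; the sample values and the function name are made up.

```python
def used_fraction(row):
    """Fraction of MemTotal not accounted for by MemFree, Buffers or Cached."""
    total = float(row["MemTotal"])
    reclaimable = (
        float(row["MemFree"]) + float(row["Buffers"]) + float(row["Cached"])
    )
    return (total - reclaimable) / total


# Made-up values (kB) for one vm.csv row; only the column names are real.
sample_row = {
    "MemTotal": "8000000",
    "MemFree": "1000000",
    "Buffers": "500000",
    "Cached": "2500000",
}
print(f"{used_fraction(sample_row):.0%}")
```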
Related: [[Server Monitoring]] | Related: [[Server Monitoring]] | ||
Line 172: | Line 810: | ||
[[Category:Monitoring]] | [[Category:Monitoring]] | ||
[[Category:Command Line Interface]] | [[Category:Command Line Interface]] | ||
[[Category:ZCS 8.6]] | |||
[[Category:ZCS 8.5]] | |||
[[Category:ZCS 8.0]] | |||
[[Category:ZCS 7.0]] | |||
[[Category:ZCS 6.0]] | |||
[[Category:ZCS 5.0]] | [[Category:ZCS 5.0]] |
Latest revision as of 06:37, 14 August 2015
Zmstats
Zmstats is how Zimbra exposes its performance metrics and statistics to the world. The information covers a wide array of data: disk usage, cpu utilization, java statistics, zimbra counters and beyond.
Zmstats consists of the following components and scripts:
- zmstatctl (all zmstat collection scripts are located in /opt/zimbra/libexec)
- zmstat-allprocs (since ZCS 6.0)
- zmstat-convertd
- zmstat-cpu
- zmstat-df (since ZCS 6.0)
- zmstat-fd
- zmstat-io
- zmstat-ldap (since ZCS 8.0)
- zmstat-mtaqueue
- zmstat-mysql
- zmstat-nginx (since ZCS 6.0)
- zmstat-proc
- zmstat-vm
- zmstat-chart
- zmstat-chart-config
Running zmstats
- On the server where the stats were produced, make sure that zmstat-chart.xml is available. When running zmdiaglog, the zmstats data and the zmstat-chart-config output are produced automatically.
# su - zimbra
$ zmstat-chart-config > /tmp/zmstat-chart-`zmhostname`.xml
- Make a charts directory:
$ mkdir ~/zmstat/2010-06-01/charts
- Then produce the stats:
$ zmstat-chart -c /tmp/zmstat-chart-`zmhostname`.xml -s ~/zmstat/2010-06-01 -d ~/zmstat/2010-06-01/charts
zmstatctl
zmstatctl is used to start and stop the various zmstat-* data logging scripts.
zmstat-chart
zmstat-chart reads an XML configuration file generated by zmstat-chart-config and generates a set of HTML and PNG graph images suitable for rapidly diagnosing problems and load issues.
zmstat-chart-config.xml
Used to control what is graphed and how
Examples:
<chart title="Mailboxd: JVM Heap Used" category="mailboxd" infile="mailboxd.csv" outfile="mboxd-heap-used.png" yAxis="MB">
  <plot data="heap_used" legend="total" divisor="1m"/>
</chart>
The above defines a chart that reads the counter "heap_used" from mailboxd.csv and graphs it to the file "mboxd-heap-used.png". The resulting graph has a y-axis labelled "MB" and plots "heap_used" divided by one million (that is, in megabytes) over time.
Multiple plots can be placed onto a single chart through the use of additional <plot> elements.
See /opt/zimbra/conf/zmstat-chart.xml for more examples
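To make the roles of the chart attributes concrete, here is a small Python sketch that parses the example chart definition and applies its divisor to a raw counter value. It is not how zmstat-chart itself is implemented. The mapping of "1m" to one million follows the description above; other divisor spellings are not covered, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

CHART_XML = """
<chart title="Mailboxd: JVM Heap Used" category="mailboxd" infile="mailboxd.csv"
       outfile="mboxd-heap-used.png" yAxis="MB">
  <plot data="heap_used" legend="total" divisor="1m"/>
</chart>
"""

# Assumption: "1m" means one million, as the text above describes.
DIVISORS = {"1m": 1_000_000}


def describe_chart(xml_text):
    """Return (infile, outfile, [(column, legend, divisor), ...])."""
    chart = ET.fromstring(xml_text.strip())
    plots = [
        (plot.get("data"), plot.get("legend"), DIVISORS[plot.get("divisor")])
        for plot in chart.findall("plot")
    ]
    return chart.get("infile"), chart.get("outfile"), plots


infile, outfile, plots = describe_chart(CHART_XML)
column, legend, divisor = plots[0]
print(f"{infile}:{column} / {divisor} -> {outfile}")
# A made-up heap_used sample of 734,000,000 bytes would be plotted as 734 MB.
print(734_000_000 / divisor)
```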
allprocs.csv (since ZCS 6.0)
Written by zmstat-allprocs
Interval: LC.zmstat_interval
Reference: proc(5) man page. /proc/[pid]/stat, /proc/[pid]/io
timestamp | Time when sample was collected |
process | process name |
utime | total user time |
stime | total system time |
cputime | user + system time total |
rchar | bytes read (*1) |
wchar | bytes written (*1) |
read_bytes | bytes read from disk (*1) |
write_bytes | bytes written to disk (*1) |
rss | resident-set-size memory usage (kiloBytes) |
processes | number of processes |
threads | number of threads |
(*1) Only if IO Accounting is enabled in the Linux kernel
convertd.csv
Collects CPU statistics for the convertd process (NE only).
Written by zmstat-convertd
Interval: LC.zmstat_interval
Reference: proc(5) man page. /proc/[pid]/stat, /proc/[pid]/io
timestamp | Time when sample was collected |
utime | user time for convertd |
stime | system time for convertd |
cputime | user + system time total |
rchar | bytes read (*1) |
wchar | bytes written (*1) |
read_bytes | bytes read from disk (*1) |
write_bytes | bytes written to disk (*1) |
rss | resident-set-size memory usage (kiloBytes) |
processes | number of processes |
threads | number of threads |
(*1) Only if IO Accounting is enabled in the Linux kernel
cpu.csv
Written by zmstat-cpu
Interval: LC.zmstat_interval
Reference: proc(5) man page. /proc/stat
timestamp | Time when sample was collected |
cpu:user | total user time |
cpu:nice | total nice process time |
cpu:sys | total system time |
cpu:idle | total idle time |
cpu:iowait | total time in iowait |
cpu:irq | total time in irq |
cpu:softirq | total time in softirq |
cpu-N:XXX | same as above, but per individual core/cpu |
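A minimal Python sketch of turning one cpu.csv row into a single "busy" figure: the share of non-idle time among all listed states. It works on the ratio within a row, so it does not depend on whether the values are ticks or percentages, but whether they are cumulative or per-interval is not stated here and should be checked against real data. The sample values are made up, and "cpu-0" as a per-core prefix is an assumption based on the cpu-N:XXX pattern.

```python
FIELDS = ("user", "nice", "sys", "idle", "iowait", "irq", "softirq")


def busy_share(row, prefix="cpu"):
    """Share of non-idle time among all states for the given column prefix."""
    values = {field: float(row[f"{prefix}:{field}"]) for field in FIELDS}
    total = sum(values.values())
    if total == 0:
        return 0.0
    return (total - values["idle"]) / total


# Made-up row; only the column names follow the table above.
sample_row = {
    "cpu:user": "20", "cpu:nice": "0", "cpu:sys": "5", "cpu:idle": "70",
    "cpu:iowait": "3", "cpu:irq": "1", "cpu:softirq": "1",
}
print(f"{busy_share(sample_row):.0%}")
```

Passing prefix="cpu-0" would apply the same calculation to a single core, assuming such columns are present.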
df.csv (since ZCS 6.0)
Captures disk usage
Written by zmstat-df
Interval: LC.zmstat_disk_interval
Reference: df(1) man page
timestamp | Time when sample was collected |
path | mount point |
disk | device |
disk_use | space used (kiloBytes) |
disk_space | total space (kiloBytes) |
disk_pct_used | percentage used |
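The disk_pct_used column lends itself to a simple threshold check. The Python sketch below lists mount points at or above a chosen percentage. It assumes one row per mount point per sample; the sample data, the 90% default and the function name are made up for illustration.

```python
import csv
import io

# Made-up sample in the shape of df.csv; only the column names are real.
SAMPLE = """timestamp,path,disk,disk_use,disk_space,disk_pct_used
06/01/2010 00:00:00,/,/dev/sda1,9500000,10000000,95
06/01/2010 00:00:00,/opt,/dev/sdb1,2000000,10000000,20
"""


def nearly_full(csv_text, threshold_pct=90.0):
    """Return mount points whose disk_pct_used is at or above the threshold."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row["path"]
        for row in rows
        if float(row["disk_pct_used"]) >= threshold_pct
    ]


print(nearly_full(SAMPLE))
```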
fd.csv
Captures file descriptor usage on the system
Written by zmstat-fd
Interval: LC.zmstat_interval
Reference: proc(5) man page. /proc/sys/fs/file-nr, /proc/[pid]/fd/
timestamp | Time when sample was collected |
fd_count | current number of open file descriptors |
mailboxd_fd_count | current number of open file descriptors by mailboxd |
imap.csv
Written by mailboxd
Interval: 1 minute
timestamp | Time when sample was collected |
command | executed command |
exec_count | number of executions |
exec_ms_avg | average execution time |
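Multiplying exec_count by exec_ms_avg approximates the total time spent in each command, which is often more telling than either column alone. The Python sketch below ranks commands that way. The sample rows are made up; since pop3.csv, soap.csv, sql.csv and sync.csv list the same columns, the same function should apply to them.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the shape of imap.csv; only the column names are real.
SAMPLE = """timestamp,command,exec_count,exec_ms_avg
06/01/2010 00:01:00,FETCH,100,2.5
06/01/2010 00:01:00,SELECT,10,30
06/01/2010 00:02:00,FETCH,50,3.0
"""


def total_ms_by_command(csv_text):
    """Return (command, total ms) pairs, largest total first."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["command"]] += int(row["exec_count"]) * float(row["exec_ms_avg"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for command, total_ms in total_ms_by_command(SAMPLE):
    print(command, total_ms)
```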
io.csv and io-x.csv
Written by zmstat-io
Interval: LC.zmstat_interval
Reference: iostat(1) man page
timestamp | Time when sample was collected |
dev:tps | transactions per second |
dev:kB_read/s | read rate |
dev:kB_wrtn/s | write rate |
dev:kB_read | kilobytes read |
dev:kB_wrtn | kilobytes written |
dev:rrqm/s | read requests merged per second queued to device |
dev:wrqm/s | write requests merged per second queued to device |
dev:r/s | reads per second |
dev:w/s | writes per second |
dev:rkB/s | read rate |
dev:wkB/s | write rate |
dev:avgrq-sz | average size (sectors) of requests |
dev:avgqu-sz | average queue length |
dev:await | average wait time for requests to be served |
dev:svctm | average time to service requests |
dev:%util | percentage of CPU time / bandwidth utilization of device |
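The columns follow a dev:metric pattern, so a header row can be split into devices and their metrics before analysis. This Python sketch assumes that "dev" stands for the device name; the device names "sda" and "sdb" are placeholders.

```python
def split_columns(header):
    """Map each device name to the metrics reported for it."""
    devices = {}
    for column in header:
        device, sep, metric = column.partition(":")
        if sep:  # skip columns without a device prefix, e.g. timestamp
            devices.setdefault(device, []).append(metric)
    return devices


# Hypothetical header row; device names are placeholders.
header = ["timestamp", "sda:tps", "sda:kB_read/s", "sdb:tps"]
print(split_columns(header))
```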
ldap.csv (since ZCS 8.0)
LDAP server
Written by zmstat-ldap
Interval: LC.zmstat_interval
Reference: OpenLDAP Software 2.4 Administrator's Guide: Monitoring http://www.openldap.org/doc/admin24/monitoringslapd.html
timestamp | Time when sample was collected |
abandon_ops | number of completed Abandon operations |
add_ops | number of completed Add operations |
bind_ops | number of completed Bind operations |
bytes_sent | bytes sent by the server |
compare_ops | number of completed Compare operations |
completed_ops | See Bug 100731 |
connections | number of connections |
delete_ops | number of completed Delete operations |
entries_sent | entries sent by the server |
extended_ops | number of completed Extended operations |
initiated_ops | See Bug 100731 |
modify_ops | number of completed Modify operations |
modrdn_ops | number of completed Modrdn operations |
read_waiters | number of current read waiters |
referrals_sent | referrals sent by the server |
search_ops | number of completed Search operations |
unbind_ops | number of completed Unbind operations |
write_waiters | number of current write waiters |
Mailbox server
Written by mailboxd
Interval: 1 minute
timestamp | Time when sample was collected |
command | executed command |
exec_count | number of executions |
exec_ms_avg | average execution time |
Known bug: Bug 99936 - mailboxd overwrites ldap.csv generated by zmstat-ldap
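Since two writers can produce a file named ldap.csv, it can help to check which layout a given file has before interpreting it. The Python sketch below guesses the writer from the header columns listed above; the function name and the choice of marker columns are assumptions made for illustration.

```python
def ldap_csv_writer(header):
    """Guess which component wrote an ldap.csv from its header columns."""
    columns = {name.strip() for name in header}
    if {"command", "exec_count", "exec_ms_avg"} <= columns:
        return "mailboxd"
    if "search_ops" in columns or "bind_ops" in columns:
        return "zmstat-ldap"
    return "unknown"


print(ldap_csv_writer(["timestamp", "command", "exec_count", "exec_ms_avg"]))
print(ldap_csv_writer(["timestamp", "bind_ops", "search_ops", "connections"]))
```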
mailboxd.csv
Written by mailboxd
Interval: 1 minute
account_cache_hit_rate | LDAP account cache hit rate |
account_cache_size | LDAP account cache size |
acl_cache_hit_rate | LDAP ACL cache hit rate |
bis_read | Number of times that the file descriptor cache read message data from disk |
bis_seek_rate | Percentage of file descriptor cache disk reads that required a seek |
calcache_hit | Hit rate of calendar summary cache, counting cache hit from both memory and file |
calcache_lru_size | Number of calendars (folders) in the calendar summary cache LRU in Java heap |
calcache_mem_hit | Hit rate of calendar summary cache, counting cache hit from memory only |
cos_cache_hit_rate | LDAP COS cache hit rate |
cos_cache_size | LDAP COS cache size |
db_conn_count | Number of times that the server got a database connection from the pool |
db_conn_ms_avg | Average latency (ms) of getting a database connection from the pool |
db_pool_size | Number of database connections in use |
domain_cache_hit_rate | LDAP domain cache hit rate |
domain_cache_size | LDAP domain cache size |
ews_syncstate_cache_hit_rate | EWS Syncstate Cache Hit Rate |
ews_syncstate_cache_size | EWS Syncstate Cache Size |
fd_cache_hit_rate | File descriptor cache hit rate |
fd_cache_size | Number of open file descriptors that reference message content |
gc_concurrentmarksweep_count | Number of times that concurrentmarksweep GC was invoked |
gc_concurrentmarksweep_ms | Time (ms) spent on concurrentmarksweep GC |
gc_major_count | Number of times that major GC was invoked |
gc_major_ms | Time (ms) spent on major GC |
gc_minor_count | Number of times that minor GC was invoked |
gc_minor_ms | Time (ms) spent on minor GC |
gc_parnew_count | Number of times that parnew GC was invoked |
gc_parnew_ms | Time (ms) spent on parnew GC |
group_cache_hit_rate | LDAP group cache hit rate |
group_cache_size | LDAP group cache size |
heap_free | Number of bytes free in the entire JVM heap |
heap_used | Number of bytes used in the entire JVM heap |
http_idle_threads | Number of HTTP idle threads |
http_threads | Number of HTTP threads |
idx_bytes_read | Accumulated bytes read by Lucene |
idx_bytes_read_avg | Average of idx_bytes_read |
idx_bytes_written | Accumulated bytes written by Lucene |
idx_bytes_written_avg | Average of idx_bytes_written |
idx_wrt_avg | Average number of concurrent index writers |
idx_wrt_opened | Accumulated number of index writers opened |
idx_wrt_opened_cache_hit | Accumulated number of cache hits when opening an index writer |
imap_conn | Number of cleartext IMAP connections |
imap_count | Number of IMAP requests received |
imap_ms_avg | Average processing time (ms) of IMAP requests |
imap_ssl_conn | Number of SSL IMAP connections |
imap_ssl_threads | Number of SSL IMAP threads |
imap_threads | Number of IMAP threads |
innodb_bp_hit_rate | InnoDB buffer pool hit rate |
ldap_dc_count | Number of times that the server got an LDAP directory context |
ldap_dc_ms_avg | Average latency (ms) of getting an LDAP directory context |
lmtp_conn | Number of LMTP connections |
lmtp_dlvd_bytes | Number of bytes of data delivered to mailboxes as a result of LMTP delivery |
lmtp_dlvd_msgs | Number of messages delivered to mailboxes as a result of LMTP delivery |
lmtp_rcvd_bytes | Number of bytes received over LMTP |
lmtp_rcvd_msgs | Number of messages received over LMTP |
lmtp_rcvd_rcpt | Number of LMTP recipients |
lmtp_threads | Number of LMTP threads |
mbox_add_msg_count | Number of messages that were added to a mailbox |
mbox_add_msg_ms_avg | Average latency (ms) of adding a message to a mailbox |
mbox_cache | Mailbox cache hit rate |
mbox_cache_size | Number of mailboxes cached in memory |
mbox_get_count | Number of times that the server got a mailbox from the cache |
mbox_get_ms_avg | Average latency (ms) of getting a mailbox from the cache |
mbox_item_cache | Item cache hit rate |
mbox_msg_cache | Message cache hit rate |
mobile_ping_cache_hit_rate | Mobile ping cache hit rate |
mobile_ping_cache_size | Mobile ping cache size |
mobile_syncstate_cache_hit_rate | Mobile syncstate cache hit rate |
mobile_syncstate_cache_size | Mobile syncstate cache size |
mpool_cms_old_gen_free | Number of bytes free in the CMS old generation memory pool |
mpool_cms_old_gen_used | Number of bytes used in the CMS old generation memory pool |
mpool_code_cache_free | Number of bytes free in the code cache memory pool |
mpool_code_cache_used | Number of bytes used in the code cache memory pool |
mpool_compressed_class_space_free | Number of bytes free in the compressed class space memory pool |
mpool_compressed_class_space_used | Number of bytes used in the compressed class space memory pool |
mpool_metaspace_free | Number of bytes free in the metaspace memory pool |
mpool_metaspace_used | Number of bytes used in the metaspace memory pool |
mpool_par_eden_space_free | Number of bytes free in the eden space memory pool |
mpool_par_eden_space_used | Number of bytes used in the eden space memory pool |
mpool_par_survivor_space_free | Number of bytes free in the survivor space memory pool |
mpool_par_survivor_space_used | Number of bytes used in the survivor space memory pool |
msg_cache_size | Number of message structures cached in memory |
pop_conn | Number of cleartext POP3 connections |
pop_count | Number of POP3 requests received |
pop_ms_avg | Average processing time (ms) of POP3 requests |
pop_ssl_conn | Number of SSL POP3 connections |
pop_ssl_threads | Number of SSL POP3 threads |
pop_threads | Number of POP3 threads |
server_cache_hit_rate | LDAP server cache hit rate |
server_cache_size | LDAP server cache size |
soap_count | Number of SOAP requests received |
soap_ms_avg | Average processing time (ms) of SOAP requests |
soap_sessions | Number of SOAP sessions |
timestamp | Time when sample was collected |
ucservice_cache_hit_rate | UC service cache hit rate |
ucservice_cache_size | UC service cache size |
xmpp_cache_hit_rate | LDAP XMPP cache hit rate |
xmpp_cache_size | LDAP XMPP cache size |
zimlet_cache_hit_rate | LDAP zimlet cache hit rate |
zimlet_cache_size | LDAP zimlet cache size |
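The heap_used and heap_free columns above can be combined into a heap utilization percentage. The following is a minimal sketch, not an official Zimbra tool. It assumes mailboxd.csv starts with a header row that uses the column names listed above, and it treats the timestamp as an opaque string. The function name and the sample data are made up for illustration.

```python
import csv
import io


def heap_utilization(csv_text):
    """Return (timestamp, percent_used) pairs from mailboxd.csv-style text."""
    # skipinitialspace tolerates a space after each comma (an assumption
    # about how the file may be formatted).
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    results = []
    for row in reader:
        used = float(row["heap_used"])
        free = float(row["heap_free"])
        total = used + free
        if total > 0:
            results.append((row["timestamp"], 100.0 * used / total))
    return results


# Made-up sample containing only the columns needed here.
sample = """timestamp,heap_used,heap_free
06/01/2010 00:00:00,300,700
06/01/2010 00:01:00,500,500
"""

for ts, pct in heap_utilization(sample):
    print(ts, f"{pct:.1f}%")
```

To run it against real data, read the contents of a mailboxd.csv file and pass the text to heap_utilization.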
mtaqueue.csv
Written by zmstat-mtaqueue
Interval: LC.zmstat_interval
Reference: postqueue(1) man page http://www.postfix.org/postqueue.1.html
timestamp | Time when sample was collected |
KBytes | Kilobytes queued by the MTA |
requests | Number of items queued by the MTA |
mysql.csv
Columns for mysql.csv are derived from the output of the query "SHOW GLOBAL STATUS". Refer to the MySQL administration manual for the meaning of each counter.
Written by zmstat-mysql
Interval: LC.zmstat_interval
Reference:
- Server Status Variables - MariaDB Knowledge Base https://mariadb.com/kb/en/mariadb/server-status-variables/
- MySQL :: MySQL 5.0 Reference Manual :: 5.1.6 Server Status Variables https://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html
nginx.csv (since ZCS 6.0)
Written by zmstat-nginx
Interval: LC.zmstat_interval
Reference: proc(5) man page. /proc/[pid]/stat, /proc/[pid]/io
timestamp | Time when sample was collected |
utime | user time for nginx |
stime | system time for nginx |
cputime | user + system time total |
rchar | bytes read (*1) |
wchar | bytes written (*1) |
read_bytes | bytes read from disk (*1) |
write_bytes | bytes written to disk (*1) |
rss | resident-set-size memory usage (kilobytes) |
processes | number of processes |
threads | number of threads |
(*1) Only if IO Accounting is enabled in the Linux kernel
pop3.csv
Written by mailboxd
Interval: 1 minute
timestamp | Time when sample was collected |
command | executed command |
exec_count | number of executions |
exec_ms_avg | average execution time (ms) |
proc.csv
Written by zmstat-proc
Interval: LC.zmstat_interval
Reference: proc(5) man page. /proc/stat, /proc/[pid]/stat, /proc/[pid]/statm
timestamp | Time when sample was collected |
system | label |
user | user time (percent) |
sys | system time (percent) |
idle | idle time (percent) |
iowait | iowait time (percent) |
PROC | PROC name |
PROC-total-cpu | user + system time total for PROC |
PROC-utime | user time for PROC |
PROC-stime | system time for PROC |
PROC-totalMB | total memory footprint for PROC |
PROC-rssMB | resident-set-size of PROC |
PROC-sharedMB | shared memory of PROC |
PROC-process-count | number of threads/subprocesses |
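In the table above, PROC stands for a process name, so the file holds one group of PROC-* columns per monitored process. The sketch below averages every column whose name ends in -total-cpu. It assumes a header row that follows this naming pattern. The process names (mailbox, mysql), the values, and the function name are made up for illustration.

```python
import csv
import io


def mean_total_cpu(csv_text):
    """Average each PROC-total-cpu column over all samples."""
    suffix = "-total-cpu"
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    sums = {}
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            if column.endswith(suffix):
                # Strip the suffix to recover the process name.
                name = column[: -len(suffix)]
                sums[name] = sums.get(name, 0.0) + float(value)
    if rows == 0:
        return {}
    return {name: total / rows for name, total in sums.items()}


# Made-up sample; real files carry more columns per process.
sample = """timestamp,system,user,sys,idle,iowait,mailbox,mailbox-total-cpu,mysql,mysql-total-cpu
06/01/2010 00:00:00,system,5,2,90,3,mailbox,4.0,mysql,2.0
06/01/2010 00:00:30,system,6,2,89,3,mailbox,6.0,mysql,1.0
"""

print(mean_total_cpu(sample))
```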
soap.csv
Written by mailboxd
Interval: 1 minute
timestamp | Time when sample was collected |
command | executed command |
exec_count | number of executions |
exec_ms_avg | average execution time (ms) |
sql.csv (since ZCS 8.5)
Written by mailboxd
Interval: 1 minute
timestamp | Time when sample was collected |
command | executed command |
exec_count | number of executions |
exec_ms_avg | average execution time (ms) |
sync.csv (since ZCS 8.0)
Written by mailboxd
Interval: 1 minute
timestamp | Time when sample was collected |
command | executed command |
exec_count | number of executions |
exec_ms_avg | average execution time (ms) |
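pop3.csv, soap.csv, sql.csv and sync.csv share the same four columns, so one routine can summarize any of them. Because exec_ms_avg is a per-interval average, the overall average per command has to be weighted by exec_count rather than taken as a plain mean. The sketch below assumes a header row with the column names listed above. The function name, the command names, and the figures in the sample are illustrative only.

```python
import csv
import io
from collections import defaultdict


def average_ms_per_command(csv_text):
    """Return the exec_count-weighted average execution time per command."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    total_count = defaultdict(int)
    total_ms = defaultdict(float)
    for row in reader:
        count = int(row["exec_count"])
        total_count[row["command"]] += count
        # Recover the total time spent in this interval from its average.
        total_ms[row["command"]] += count * float(row["exec_ms_avg"])
    return {
        command: total_ms[command] / count
        for command, count in total_count.items()
        if count > 0
    }


# Made-up sample in the shared four-column layout.
sample = """timestamp,command,exec_count,exec_ms_avg
06/01/2010 00:01:00,SearchRequest,10,20
06/01/2010 00:01:00,NoOpRequest,5,2
06/01/2010 00:02:00,SearchRequest,30,40
"""

print(average_ms_per_command(sample))
```

A plain mean of the two SearchRequest averages in the sample would be 30 ms. Weighting by exec_count gives 35 ms, because the slower interval handled three times as many requests.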
threads.csv
Written by mailboxd
Interval: 1 minute
Related configuration: zimbraStatThreadNamePrefix (since ZCS 6.0)
timestamp | Time when sample was collected |
THREAD | number of threads whose name starts with the prefix THREAD |
total | total number of threads |
vm.csv
Records the output of vmstat, together with values from /proc/meminfo and /proc/loadavg, so that the statistics can be reviewed later.
Written by zmstat-vm
Interval: LC.zmstat_interval
Reference:
- vmstat(8) man page
- proc(5) man page. /proc/meminfo, /proc/loadavg
vmstat
- r
- b
- swpd
- free
- buff
- cache
- si
- so
- bi
- bo
- in
- cs
- us
- sy
- id
- wa
- st
- other counters reported by vmstat
proc
- MemTotal
- MemFree
- Buffers
- Cached
- SwapCached
- Active
- Inactive
- Active(anon)
- Inactive(anon)
- Active(file)
- Inactive(file)
- Unevictable
- Mlocked
- SwapTotal
- SwapFree
- Dirty
- Writeback
- AnonPages
- Mapped
- Shmem
- Slab
- SReclaimable
- SUnreclaim
- KernelStack
- PageTables
- NFS_Unstable
- Bounce
- WritebackTmp
- CommitLimit
- Committed_AS
- VmallocTotal
- VmallocUsed
- VmallocChunk
- HardwareCorrupted
- AnonHugePages
- HugePages_Total
- HugePages_Free
- HugePages_Rsvd
- HugePages_Surp
- Hugepagesize
- DirectMap4k
- DirectMap2M
- DirectMap1G
- Loadavg
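The /proc/meminfo counters listed above can be combined to estimate how much memory is in use, not counting buffers and page cache. The sketch below assumes vm.csv has a header row containing a timestamp column and the MemTotal, MemFree, Buffers and Cached names exactly as listed, with values in kilobytes as in /proc/meminfo. The function name and the sample data are made up, and the formula is a common approximation rather than anything zmstat computes itself.

```python
import csv
import io


def memory_in_use_kb(csv_text):
    """Return (timestamp, used_kb) where used = total - free - buffers - cached."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    results = []
    for row in reader:
        used = (
            int(row["MemTotal"])
            - int(row["MemFree"])
            - int(row["Buffers"])
            - int(row["Cached"])
        )
        results.append((row["timestamp"], used))
    return results


# Made-up sample containing only the columns needed here.
sample = """timestamp,MemTotal,MemFree,Buffers,Cached
06/01/2010 00:00:00,8000,1000,500,2500
06/01/2010 00:00:30,8000,500,500,2500
"""

for ts, used in memory_in_use_kb(sample):
    print(ts, used, "kB")
```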
Related: Server Monitoring