Troubleshooting Course Content Rough Drafts-Recover Missing Data - User: Difference between revisions

Bug & RFE's I Made While Researching This Draft Write Up

  • [story] Ability to search data within backup and do "item" restores or identify locations of search results
  • admin console backup label view doesn't list accounts in the all accounts tab
  • admin console restore - doesn't autocomplete / suggest account matches when filling out email address box
  • document new restore functions / options with ZCS 8+ for admin console restore
  • admin console restore - rename "Selected Servers" panel to "Restore Options"
  • admin console restore - if only one mailstore in env. then state such in second panel of restore about "server for the restored accounts"
  • admin console restore - expand restore To options - To full backup label, To incremental target
  • admin console restore - "restore to the latest backup" incorrectly described / broken
  • admin console restore - unable to restore individual accounts [sort of]
  • admin console restore - reuse GAL/Contact Picker Window for "restore individual accounts"
  • Default COS value [ zimbraPrefShowSelectionCheckbox ] for checkboxes in lists should be TRUE and improve 'select all' functionality


Revision as of 15:26, 29 January 2015

Need To Do

  • review training materials for sys admin course
  • determine what should be 'video' and how it can all be built upon preceding steps in a hands on training manner

General Overview Of Backup Environment

Backup Directory Partition & Filesystem

dedicated backup partition - only use supported methods:

NFS Use For Backups

Please see this RFE:

This is the proposed statement to be included in the release notes following the RFE:

Zimbra will support customers that store backups (e.g. /opt/zimbra/backup) on an NFS-mounted partition. Please note that this does not relieve the customer from the responsibility of providing a storage system with a performance level appropriate to their desired backup and restore times. In our experience, network-based storage access is more likely to encounter latency or disconnects than is equivalent storage attached by fiber channel or direct SCSI.
Zimbra continues to view NFS storage of other parts of the system as unsupported. Our testing has shown poor read and write performance for small files over NFS implementations, and as such we view it unlikely that this policy will change for the index and database stores. We will continue to evaluate support for NFS for the message store as customer demand warrants.
When working with Zimbra Support on related issues, the customer must please disclose that the backup storage used is NFS.
Things To Check When Using NFS
  • Check the /var/log/messages on both the zimbra server and the nfs server for nfs related errors during the time frame of your backup.
  • Check /opt/zimbra/log/mailbox.log for error messages about folders/files not being able to be written or missing directory errors.
  • Is root_squash configured on the nfs server? If it's changed to no_root_squash , does the behavior of the backup change?
  • Is the */backup directory owned by zimbra:zimbra with at least 750 or 755 permissions?
    • This parent directory as given in:
      • zmprov gs `zmhostname` zimbraBackupTarget
  • Does zimbraBackupTarget contain at least the sessions and tmp subdirectories, and are they owned by zimbra:zimbra with 750 or 755 permissions?
    • If not, try manually creating them and then running a test backup.
    • If your backup session directory shows something like drwxrwx---+ 2 zimbra zimbra 4096 Sep 14 00:00 TO_DELETE-full-XXXXXX , the + sign indicates that extended ACLs are in use.
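The ownership and permission checks in the list above can be scripted. This is only a minimal sketch, not a Zimbra tool: check_backup_dir is a hypothetical helper name, it assumes GNU stat (for the -c option), and it only looks at the target directory plus its sessions and tmp subdirectories.

```shell
# check_backup_dir DIR [OWNER]
# Hypothetical helper: reports whether DIR, DIR/sessions and DIR/tmp exist,
# are owned by OWNER (default zimbra:zimbra) and have mode 750 or 755.
# Assumes GNU stat (-c). Returns non-zero if any check fails.
check_backup_dir() {
    target=$1
    owner=${2:-zimbra:zimbra}
    rc=0
    for dir in "$target" "$target/sessions" "$target/tmp"; do
        if [ ! -d "$dir" ]; then
            echo "MISSING $dir"
            rc=1
            continue
        fi
        actual_owner=$(stat -c '%U:%G' "$dir")
        actual_mode=$(stat -c '%a' "$dir")
        if [ "$actual_owner" != "$owner" ]; then
            echo "OWNER $dir is $actual_owner, expected $owner"
            rc=1
        fi
        case $actual_mode in
            750|755) ;;
            *) echo "MODE $dir is $actual_mode, expected 750 or 755"; rc=1 ;;
        esac
    done
    return $rc
}

# Example: check_backup_dir /opt/zimbra/backup
```

It prints nothing when everything matches, so any output is something worth including in a support case.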
Debugging Example When Using NFS

Steps I wrote for one customer. Saving the output as you walk through all the commands should give enough information [hopefully] to submit a good RFE/bug:

 1. make a test partition on nfs server - /nfs-test

 2. mount on zimbra server
 2A. mkdir /nfs-test
 2B. chmod 755 /nfs-test
 2C. mount nfs-server:/nfs-test /nfs-test
 2D. ls -la /nfs-test
 2E. mkdir /nfs-test/backup
 2F. chown zimbra:zimbra /nfs-test/backup
 2G. chmod 755 /nfs-test/backup
 2H. su - zimbra -c "touch /nfs-test/backup/testfile"
 2I. ls -laR /nfs-test/
 2J. rm /nfs-test/backup/testfile

 3. Set zimbraBackupTarget
 3A. zmprov ms `zmhostname` zimbraBackupTarget /nfs-test/backup

 4. Run a full backup against one account
 4A. ex. zmbackup -f -a

 5. ls -laR /nfs-test/

 6. If you run into the same problem again, you can repeat the backup after increasing the backup
    logging level for the account you're trying to back up. If you didn't run into the same problem, it might
    have had to do with the initial setup of the NFS mount and the permissions used during directory creation.
 6A. zmprov aal zimbra.backup debug
 6B. logging will show up in /opt/zimbra/log/mailbox.log
 6C. Remove account logging when you're done.  zmprov ral zimbra.backup

 7. Change zimbraBackupTarget back to your production path.
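Steps 2H through 2J above (create a file as zimbra, list it, remove it) can be wrapped in one function so the result is easy to save and attach to a case. A minimal sketch; nfs_write_test is a hypothetical name and the temporary file name is made up for illustration.

```shell
# nfs_write_test DIR
# Hypothetical helper mirroring steps 2H-2J: create a file in DIR, list it,
# then remove it. Run it as the zimbra user so the result reflects the
# permissions the backup itself will have. Returns non-zero on failure.
nfs_write_test() {
    dir=$1
    testfile="$dir/testfile.$$"
    if ! touch "$testfile" 2>/dev/null; then
        echo "FAIL cannot create $testfile as $(id -un)"
        return 1
    fi
    ls -la "$testfile"
    if ! rm -f "$testfile"; then
        echo "FAIL cannot remove $testfile"
        return 1
    fi
    echo "OK $dir is writable by $(id -un)"
}

# Example (as zimbra): nfs_write_test /nfs-test/backup
```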
Setup A Fast Test NOT Using NFS

A way to "avoid" the NFS issues for testing purposes would be to set up a new zimbraBackupTarget and try doing a full backup of a couple of user accounts. I DON'T recommend this if you're using auto-group for zimbraBackupMode [ zmprov gs `zmhostname` zimbraBackupMode ] , only if you're using Standard mode.

[as root]
** adjust your new "backup" directory path as needed - mine is just an example**
mkdir /mnt/usb1/backup-test
chown zimbra:zimbra /mnt/usb1/backup-test
chmod 750 /mnt/usb1/backup-test
su - zimbra
**confirm backup mode as standard**
zmprov gs `zmhostname` zimbraBackupMode
**if not Standard, please stop**
zmprov ms `zmhostname` zimbraBackupTarget /mnt/usb1/backup-test
**Find a couple of test accounts to run a full backup for. Make sure you have enough free space for the backup.**
zmbackup -f -a
**Watch and confirm status of backup you just started.**
**Confirm files were backed up in right location**
ls /mnt/usb1/backup-test/sessions/
**Failed backups will most likely leave a leftover directory in the tmp directory**
ls /mnt/usb1/backup-test/tmp/
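
The "confirm backup mode as standard" step above can be turned into a guard so the rest of the test is skipped on an auto-group server. This is a sketch under assumptions: is_standard_mode is a hypothetical name, and it assumes zmprov prints the attribute as a "zimbraBackupMode: <value>" line with Standard as the value for standard mode.

```shell
# is_standard_mode
# Reads `zmprov gs <server> zimbraBackupMode` output on stdin and succeeds
# only if the value is Standard (compared case-insensitively). The
# "attribute: value" line format is an assumption about zmprov output.
is_standard_mode() {
    mode=$(awk -F': *' '$1 == "zimbraBackupMode" { print $2 }' | tr 'A-Z' 'a-z')
    [ "$mode" = "standard" ]
}

# Example (as the zimbra user):
#   zmprov gs `zmhostname` zimbraBackupMode | is_standard_mode \
#       || echo "Not Standard mode, stop here"
```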

Standard Backup or Autogroup

Backup Schedule


Backup CLI Commands

zmbackup zmbackupquery

Monitoring And Maintenance Of Backups

monitoring backups and disk usage

Server Issues

Questions To Ask & Address Prior To Doing A Server Restore

  • Should I shut down ZCS services or block client access until the issue and resolution are identified?
  • What data needs to really be restored - can you restore just that or does it need a full restore?

Server Restores

Server Restores - Disaster Recoveries

For Full Single ZCS Server DR Restores

Please see Network_Edition_Disaster_Recovery

Some additional notes I have on it - Ajcody-Disaster-Recovery-Specific-Notes

For Multi-Server DR Restore Specifics

Along with the above references, please see Ajcody-Notes-Multi-Server-Restore-DR

To Restore Just The LDAP Data

Let's say your LDAP data was 'lost/destroyed' but everything else was intact. In that case, you should look at the zmrestoreldap command.

This section needs more precautionary and background information.

The basics:

  • To list the LDAP session labels, use -lbs.
    • zmrestoreldap -lbs
  • Restore the complete LDAP directory server
    • zmrestoreldap -lb full20061130135236
  • Restore LDAP data for specific accounts
    • zmrestoreldap -lb full20061130135236 -a
To Restore Just The Mysql DB

Option available from the following RFE work:

  • "Allow backup of only primary message volume"
      • Options to exclude types of data as described in the 6.0.6 Admin Guide:
        • Search index : If you do not restore the search index data, the mailbox will have to be reindexed after the restore.
          • zmrestore <all or account> --exclude-search-index
        • Blobs : This is a useful option when all blobs for the mailbox being restored already exist.
          • zmrestore <all or account> --exclude-blobs
        • HSM-blobs : This is useful when all HSM blobs for the mailbox being restored already exist.
          • zmrestore <all or account> --exclude-hsm-blobs
Steps To Practice And Test A DR Situation - System And Mysql Data Only

This test should be done against a non-production server. Either install a 'dummy' ZCS server or create a clone of your production server to perform this test against.

Let's say your mysql data was 'lost/destroyed' but everything else was intact. This might be the solution for the situation:

The steps below are to REPRODUCE A DR situation to test.

  1. zmcontrol stop
  2. Caution - step to reproduce DR situation to test against
    1. mv ~/db/data ~/db/data.OLD
  3. Getting old mysql passwords
    • zmlocalconfig -s mysql_root_password
    • zmlocalconfig -s zimbra_mysql_password
  4. [as zimbra] /opt/zimbra/libexec/zmmyinit
  5. ldap start
  6. zmconvertctl start
  7. mysql is already running at this point, started by zmmyinit. Use mysql.server status to check.
  8. zmrestoreoffline -a all -br --systemData --excludeSearchIndex --excludeBlobs --excludeHsmBlobs
    • Note - you most likely will want to use the -br or -rf option for this situation.
      • -br,--backedupRedologsOnly : Replays the redo logs in backup only, which excludes archived and current redo logs of the system
      • -rf,--restoreFullBackupOnly : Restores to last full backup only, which excludes incremental backups.
    • -sys,--systemData : Restores global tables and local config.
    • Note the use of -a all , you could also do this for one account first to confirm operation is successful for your circumstances.
  9. If you used -rf or -br with your zmrestoreoffline, you might also need to use zmplayredo afterwards to get the most complete restore.

Server Cloning - Server Moves/Testing Purposes

User Issues

Questions To Ask & Address Prior To Doing A User Restore

  • Can data be restored from the user's dumpster [if enabled] ?
  • Is there a copy of the data in a third party client ? [POP email client that downloaded it, for example]
  • Does the data maybe exist in another user's account ? [For example, the senders account or others in the To/CC]
  • Does the data reside in a redolog/s ?
  • Finally, restore from backups ?
    • Do I need to overwrite the existing account with the restore or should I restore to a new account name?
      • Using the option -ca [create account] and -pre restore_ [prefix to the new account name, restore_user@ ]
    • What are your retention policies, and how do they determine which backup set you need?

User Restores

  • Admin Console
    • Restore/Backup
    • Redirected restore : -ca -pre restored_
    • View Mail > TGZ export/import
  • Crossmailboxsearch
  • CLI
    • zmrestore
      • to latest point in time
      • to specific point in time in past
      • to incremental session label
      • to full backup label
  • Importing Restored Data Into Parent Account

Trouble-shooting Backup/Restore Issues And Other General Questions

Backup And Restore Compatibility Between ZCS Versions

Need to create summary statement




Basic Backup Information To Submit To Support

Disk Space Usage Issues
Trend Data

If there are concerns about disks/partitions getting full, this command is helpful for trending data on your server. Send support the resulting df.tar file. Note - the -n 20 option to tail sets how many files are collected; adjust it if you want more than 20 days' worth of trending data.

[zimbra@zcs806 tmp]$ cd /tmp

[zimbra@zcs806 tmp]$ tar cvf /tmp/df.tar `find /opt/zimbra/zmstat -name df.cs\* | sort | tail -n 20`
tar: Removing leading `/' from member names
 [cut - Ajcody]

[zimbra@zcs806 tmp]$ ls -lah /tmp/df.tar
-rw-r----- 1 zimbra zimbra 80K Mar 29 06:44 /tmp/df.tar

[zimbra@zcs806 tmp]$ tar tvf /tmp/df.tar
-rw-r----- zimbra/zimbra  2566 2014-03-11 00:00 opt/zimbra/zmstat/2014-03-10/df.csv.gz
-rw-r----- zimbra/zimbra  2553 2014-03-12 00:00 opt/zimbra/zmstat/2014-03-11/df.csv.gz
 [cut - Ajcody]
-rw-r----- zimbra/zimbra  2513 2014-03-28 00:00 opt/zimbra/zmstat/2014-03-27/df.csv.gz
-rw-r----- zimbra/zimbra  2531 2014-03-29 00:00 opt/zimbra/zmstat/2014-03-28/df.csv.gz
-rw-r----- zimbra/zimbra  8013 2014-03-29 06:40 opt/zimbra/zmstat/df.csv
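
If you end up collecting this more than once, the same find/sort/tail/tar pipeline can be wrapped so the number of files and the output path are parameters. A minimal sketch: collect_df_stats is a hypothetical name, and it assumes GNU tar (-T - reads the file list from stdin) and file names without newlines.

```shell
# collect_df_stats STATDIR COUNT OUTFILE
# Hypothetical wrapper around the find/sort/tail/tar pipeline above:
# archives the COUNT last df.cs* files (in sorted path order) found under
# STATDIR into OUTFILE. Assumes GNU tar and file names without newlines.
collect_df_stats() {
    statdir=$1
    count=$2
    outfile=$3
    find "$statdir" -name 'df.cs*' | sort | tail -n "$count" |
        tar -cf "$outfile" -T -
}

# Example (as zimbra): collect_df_stats /opt/zimbra/zmstat 20 /tmp/df.tar
```

Because the dated directories sort before the current df.csv, the newest data is always included.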
Directory Sizes In /opt/zimbra

Please see the following and provide the output to support. Note, even though this method is faster than doing a du, it can still take a while.

* Ajcody-Server-Misc-Topics#Faster_Way_To_Get_Directory_Size_On_Filesytem_-_find_vs_du
Adjusting The Disk Alert Threshold

Note - zmlocalconfig smtp_notify must return yes if you want to receive the notifications.

If you just need to adjust the disk alert threshold, then see the following:

See current values:

 zmlocalconfig | grep zmdisklog

Example adjustment:

 su - zimbra
 zmlocalconfig -e zmdisklog_critical_threshold=98
 zmlocalconfig -e zmdisklog_warn_threshold=95

To exclude a partition from the checks [example of two being excluded]:

 su - zimbra
 zmlocalconfig -e zmstat_df_excludes="/mount/point:/mount/point2"
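
To see ahead of time which partitions a given threshold and exclude list would flag, something like the following can be run against df output. This is only an illustration of what the two settings mean, not how Zimbra itself performs the check; over_threshold is a hypothetical name, and it assumes `df -P` output with mount points that contain no spaces.

```shell
# over_threshold PERCENT [EXCLUDES]
# Reads `df -P` output on stdin and prints "mountpoint percent" for every
# filesystem at or above PERCENT. EXCLUDES is a colon-separated list of
# mount points to skip, in the same style as zmstat_df_excludes.
over_threshold() {
    threshold=$1
    excludes=":${2:-}:"
    awk -v t="$threshold" -v ex="$excludes" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (index(ex, ":" $6 ":") == 0 && pct + 0 >= t + 0)
                print $6, pct
        }'
}

# Example: df -P | over_threshold 85 "/mount/point:/mount/point2"
```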

There might be a bug here, where you keep getting email until a logrotate happens [zimbra.log?].

Some things to check, as the zimbra user, to confirm this and share with support or in a bug report:

su - zimbra

ls -la /var/log/zimbra.log

df -h
                      5.5G  3.5G  1.7G  68% /
 tmpfs                 939M     0  939M   0% /dev/shm
 /dev/sda1             485M   79M  381M  18% /boot
 /dev/sdb1              30G  6.2G   23G  22% /opt


zmlocalconfig | grep zmdisklog
 zmdisklog_critical_threshold = 80
 zmdisklog_warn_threshold = 85  

zmlocalconfig -e zmdisklog_critical_threshold=95

zmlocalconfig -e zmdisklog_warn_threshold=90

zmlocalconfig | grep zmdisklog
 zmdisklog_critical_threshold = 95
 zmdisklog_warn_threshold = 90

zmstatctl restart


ps -eaf | grep zmstat-df

ls -la /var/log/zimbra.log

date ; grep "Disk warning" /var/log/zimbra* ; zmmailbox -z -m admin@`zmhostname` s -l 100 -t message "Subject: Disk and after:yesterday"

##Note - Emails by default go out every 10 minutes - for example:

[zimbra@zcs803 ~]$ date ; grep "Disk warning" /var/log/zimbra* ; zmmailbox -z -m admin@`zmhostname` s -l 100 -t message "Subject: Disk and after:yesterday"
Thu May 22 09:40:08 PDT 2014
/var/log/zimbra.log:May 22 08:30:00 zcs803 zimbramon[18826]: 18826:err: Disk warning: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 08:40:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 08:50:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:00:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: / on device /dev/mapper/vg_rhel664-lv_root at 82%
  ## Note - I had readjusted the variable to not warn during this time segment ##
/var/log/zimbra.log:May 22 09:20:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:30:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:40:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: / on device /dev/mapper/vg_rhel664-lv_root at 82%
num: 7, more: false

     Id  Type   From                  Subject                                             Date
   ----  ----   --------------------  --------------------------------------------------  --------------
1.  328  mess   admin                 Disk / at 82% on           05/22/14 09:40
2.  327  mess   admin                 Disk / at 82% on           05/22/14 09:30
3.  326  mess   admin                 Disk / at 82% on           05/22/14 09:20
  ## Note - I had readjusted the variable to not warn during this time segment ##
4.  325  mess   admin                 Disk / at 82% on           05/22/14 09:00
5.  324  mess   admin                 Disk / at 82% on           05/22/14 08:50
6.  323  mess   admin                 Disk / at 82% on           05/22/14 08:40
7.  320  mess   admin                 Disk / at 82% on           05/22/14 08:31

Continue to monitor your zmmailbox search results for an hour.
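
While monitoring, it can help to reduce the grep output to one line per mount point. A small sketch, assuming log lines laid out like the sample above ("Disk warning: <mount> on device <device> at <pct>%"); disk_warning_summary is a hypothetical name.

```shell
# disk_warning_summary
# Reads syslog-style lines on stdin and prints, for each mount point,
# how many "Disk warning" lines were seen and the last reported percentage.
# The line layout is taken from the sample output above.
disk_warning_summary() {
    awk '
        /Disk warning:/ {
            for (i = 1; i <= NF; i++)
                if ($i == "warning:") { mount = $(i + 1); break }
            count[mount]++
            last[mount] = $NF
        }
        END {
            for (m in count) print m, count[m], last[m]
        }' | sort
}

# Example: grep "Disk warning" /var/log/zimbra.log | disk_warning_summary
```

If the count keeps growing after you raised the thresholds and restarted zmstat, that is the detail to include in the bug.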

The Basic Information Support Needs

as root:

  • cat /etc/fstab
    • Shows us what is mounted upon boot
  • cat /proc/mounts
    • Shows us what is currently mounted and its status - you can see if a mount is read-only here.
  • df -hT
    • Lists current mounts using human-readable size information and also notes the filesystem type.

as zimbra:

  • zmprov -l gs `zmhostname` | egrep 'Back|Redo'
    • Will show us a number of variables related to backup and redologs. It will also tell us if you're using auto-group or the default method.
  • du -sh /opt/zimbra/redolog
    • We might notice that your redologs aren't rolling over, causing a possible issue.
  • ls -latr /opt/zimbra/backup
    • This is the default backup target, please adjust this path here and below if you are using a different zimbraBackupTarget value.
    • zmprov gs `zmhostname` zimbraBackupTarget
    • We'll be able to confirm permissions are right.
  • ls -latr /opt/zimbra/backup/tmp
    • This will show us if you have failed backup jobs and confirm tmp is being cleaned appropriately after the backup is done.
  • ls -latr /opt/zimbra/backup/sessions
    • This will show us what backup sessions are available and confirm permissions are correct.
    • Adjust path if your zimbraBackupTarget value is not the default path.
  • Some directory sizes in the backup directory:
    • Default path first
      • du -sh `find /opt/zimbra/backup -maxdepth 2 -type d`
    • If you're using a different backup target, check that directory also. Replace /opt/zimbra/backup above with your backup path.
  • zmbackupquery
    • This should match what's in the sessions directory. It will also tell us the status of each backup and how many accounts were done.
  • crontab -l | grep -i back
    • This will show us when backups are supposed to run and what options they run with.
  • zmlocalconfig | grep -i back
    • This is useful to see a number of backup options not exposed in the crontab, things related to the zip options.
  • zmvolume -l
    • This is useful to see how many volumes are being used, if HSM is being used, and if compression is being done at the volume level.
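
To gather all of the above into one file for the support case, a small wrapper that records each command with its output works well. A minimal sketch; run_and_log and the /tmp/backup-info.txt path are hypothetical names used for illustration.

```shell
# run_and_log LOGFILE COMMAND [ARGS...]
# Hypothetical helper: appends a header naming the command, then the
# command's stdout and stderr, then its exit status to LOGFILE. It always
# returns 0 so one failing command does not stop the collection.
run_and_log() {
    logfile=$1
    shift
    {
        echo "=== $* ==="
        status=0
        "$@" 2>&1 || status=$?
        echo "--- exit status: $status ---"
        echo
    } >> "$logfile"
    return 0
}

# Example, using commands from the list above (run as the zimbra user):
#   out=/tmp/backup-info.txt
#   run_and_log "$out" df -hT
#   run_and_log "$out" ls -latr /opt/zimbra/backup/sessions
#   run_and_log "$out" zmbackupquery
#   run_and_log "$out" zmvolume -l
```

Commands that use a pipe, such as the egrep and grep examples above, need to be passed through a shell, for example run_and_log "$out" sh -c 'crontab -l | grep -i back'.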
Additional Log Files Support Might Need

And send the following logs:

  • /var/log/messages
    • Filesystem issues often times are noted here and also in syslog. This might explain an interruption in the backup process. Server restarts, filesystem going full, filesystem going read-only, etc.
  • /var/log/syslog
  • /opt/zimbra/log/mailbox.log
    • The backup activity is logged here.
    • And any other mailbox.log file that would cover the event

Additional Checks For Performance Specific Issues

If You're Using a SAN or NFS For Your Backup Target - Please Check Your IOWait

Ideally, you would compare iowait and performance data from the target backup host as well as the stats available on the ZCS servers. To get graphs and stats on this from ZCS, please see Ajcody-Testing-Debugging#zmstat_and_zmstat-chart . You should submit this data and iowait conclusions if you still need to submit a support case about backup performance issues.

Is HSM Running During Your Backup Window
Are You Using --zipStore

--zipStore stores the blobs in zip files rather than keeping them as individual files; it does not compress them. For most circumstances, this will give the best performance, especially with NFS. This should be the default behavior of backups; the following RFE is where it became the default [ZCS6+] :

To see whether zips are being used for backups, look in the backup session directory for .zip files, for example:

zimbra$ ls

To see if the zip file is using compression [-Z option for unzip will indicate whether or not the archive is actually compressed] :

unzip -Z
293 files, 5982984 bytes uncompressed, 5982984 bytes compressed:  0.0%

Also, if your zmvolume has compression enabled, the blobs remain compressed within the zip upon backup. The point being, they are not uncompressed before being put into the zip file when the backup is using --zipStore.
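
The unzip -Z summary line can also be checked mechanically, which is handy when looking at many session zips. A sketch that assumes the summary line looks like the sample above ("N files, X bytes uncompressed, Y bytes compressed: P%"); zip_is_stored is a hypothetical name.

```shell
# zip_is_stored
# Reads `unzip -Z` output on stdin and succeeds when the summary line shows
# the same byte count uncompressed and compressed, i.e. the archive is
# stored without compression. The summary format is assumed from the
# sample output above.
zip_is_stored() {
    awk '
        /bytes uncompressed/ { found = 1; stored = ($3 == $6) }
        END { if (found && stored) exit 0; exit 1 }'
}

# Example: unzip -Z <zip file> | zip_is_stored && echo "stored, no compression"
```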

Performance Issues And Time To Complete

RFEs for increasing backup performance:

What Might Be Wrong?

Need to expand this section or refer to the part above about all the log/data gathering that support would normally ask for

First thing to check is the log file and see if anything notable stands out.

grep -i backup /opt/zimbra/log/mailbox.log
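
On a busy server that grep returns a lot of routine lines. One way to narrow it to the notable ones is to keep only warnings and errors. A sketch, assuming the log level appears in each line as a separate WARN, ERROR or FATAL word; backup_problems is a hypothetical name.

```shell
# backup_problems LOGFILE
# Prints lines from LOGFILE that mention "backup" (case-insensitive) and
# carry a WARN, ERROR or FATAL level. The level-as-a-separate-word layout
# is an assumption about the mailbox.log format.
backup_problems() {
    grep -i 'backup' "$1" |
        grep -E '(^|[[:space:]])(WARN|ERROR|FATAL)([[:space:]]|$)'
}

# Example: backup_problems /opt/zimbra/log/mailbox.log
```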
Use Auto-Group Backups Rather Than Default Style Backup

Having trouble completing that entire full backup during off-hours? Enter the hybrid auto-grouped mode, which combines the concept of full and incremental backup functions - you’re completely backing up a target number of accounts daily rather than running incrementals.

Auto-grouped mode automatically pulls in the redologs since the last run so you get incremental backups of the remaining accounts; although the incremental accounts captured via the redologs are not listed specifically in the backup account list. This still allows you to do a point in time restore for any account.

Please see the following for more detailed information:

Need To Write Fewer Files - Add The Zip Option To Your Backup Commands

Using the zip option bundles the thousands of individual files that exist under a user's backup into zip files, reducing the performance problems that come from writing out thousands of small files rather than a few large ones. These problems are often seen when one is:

  • Using NFS for the backup directory
  • Copying/rsyncing backups to a remote server
  • Using third-party backup software (to tape) to archive/back up the Zimbra backup sessions

Please see the following for more information about using the Zip option:

SAN Snapshots For Backups

Please see:

Cloud Backups

Please see:

Tape Backups

I would then use the non-rsync network ports for your traditional network backup software to run over to dump the data to tape. This way that activity doesn't affect production performance at all. A full DR would use the backup/ data anyway (offsite DR). I've created another section that deals with this in more detail, specifically handling the hard links that are used by Zimbra.

Please see:

Verified Against: Zimbra Collaboration Suite 8.6 Date Created: 01/22/2015
Article ID: Date Modified: 2015-01-29
