Troubleshooting Course Content Rough Drafts-Recover Missing Data - User
Need To Do
- review training materials for sys admin course
- determine what should be 'video' and how it can all be built upon preceding steps in a hands-on training manner
General Overview Of Backup Environment
Backup Directory Partition & Filesystem
dedicated backup partition - only use supported methods:
- no cifs, etc
- nfs - limited support
NFS Use For Backups
Please see this RFE:
- "Need clarity on supporting nfs mounted zimbra directories - report error/msg if nfs mount is present"
This is the proposed statement to be included in the release notes following the RFE:
- ZCS & NFS:
- Zimbra will support customers that store backups (e.g. /opt/zimbra/backup) on an NFS-mounted partition. Please note that this does not relieve the customer from the responsibility of providing a storage system with a performance level appropriate to their desired backup and restore times. In our experience, network-based storage access is more likely to encounter latency or disconnects than is equivalent storage attached by fiber channel or direct SCSI.
- Zimbra continues to view NFS storage of other parts of the system as unsupported. Our testing has shown poor read and write performance for small files over NFS implementations, and as such we view it unlikely that this policy will change for the index and database stores. We will continue to evaluate support for NFS for the message store as customer demand warrants.
- When working with Zimbra Support on related issues, the customer must please disclose that the backup storage used is NFS.
Things To Check When Using NFS
- Check the /var/log/messages on both the zimbra server and the nfs server for nfs related errors during the time frame of your backup.
- Check /opt/zimbra/log/mailbox.log for error messages about folders/files not being able to be written or missing directory errors.
- Is root_squash configured on the nfs server? If it's changed to no_root_squash , does the behavior of the backup change?
- Is the */backup directory owned by zimbra:zimbra with at least 750 or 755 permissions?
- This parent directory as given in:
- zmprov gs `zmhostname` zimbraBackupTarget
- Does zimbraBackupTarget have at least the subdirectories sessions and tmp, and are they owned by zimbra:zimbra with 750 or 755 permissions?
- If not, try manually creating them and then running a test backup.
- IF USING A NAS - MAKE SURE YOU'RE NOT USING EXTENDED ACLS OR THAT YOU HAVE THEM CONFIGURED PROPERLY
- If your backup session directory shows something like : drwxrwx---+ 2 zimbra zimbra 4096 Sep 14 00:00 TO_DELETE-full-XXXXXX , that + sign indicates extended ACLs are in use.
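The ownership, permission, and ACL checks above can be scripted. The following is a minimal sketch, not a Zimbra tool: `has_acl_marker` and `check_backup_dir` are hypothetical helper names, the path you pass in stands for your own zimbraBackupTarget value, and the `stat -c` format assumes GNU coreutils on Linux.

```shell
# Hypothetical helpers (not part of ZCS): report owner, mode, and ACL
# marker for a backup target and its sessions/tmp subdirectories.

# Succeeds if a mode string from `ls -ld` ends in "+", the marker ls
# uses when an ACL is set on the entry.
has_acl_marker() {
    case "$1" in
        *+) return 0 ;;
        *)  return 1 ;;
    esac
}

# $1 = backup target directory (assumed; use your zimbraBackupTarget)
check_backup_dir() {
    local target="$1" dir mode
    for dir in "$target" "$target/sessions" "$target/tmp"; do
        if [ ! -d "$dir" ]; then
            echo "MISSING $dir"
            continue
        fi
        # First field of `ls -ld` is the mode string, e.g. drwxr-x---+
        mode=$(ls -ld "$dir" | awk '{print $1}')
        # GNU stat: %U owner, %G group, %a octal permissions
        echo "OK $dir $(stat -c '%U:%G %a' "$dir") $mode"
        if has_acl_marker "$mode"; then
            echo "ACL $dir"
        fi
    done
}
```

For example, `check_backup_dir /opt/zimbra/backup` prints one OK or MISSING line per directory, plus an ACL line for any directory whose mode string carries the + sign. The output is meant to be read by eye or attached to a support case. It does not decide whether the permissions are acceptable.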
Debugging Example When Using NFS
Steps I wrote for one customer. Saving the output as you walk through all the commands should [hopefully] give enough information to submit a good RFE/bug:
1. Make a test partition on the NFS server: /nfs-test
2. Mount it on the Zimbra server.
2A. mkdir /nfs-test
2B. chmod 755 /nfs-test
2C. mount nfs-server:/nfs-test /nfs-test
2D. ls -la /nfs-test
2E. mkdir /nfs-test/backup
2F. chown zimbra:zimbra /nfs-test/backup
2G. chmod 755 /nfs-test/backup
2H. su - zimbra -c "touch /nfs-test/backup/testfile"
2I. ls -laR /nfs-test/
2J. rm /nfs-test/backup/testfile
3. Set zimbraBackupTarget.
3A. zmprov ms `zmhostname` zimbraBackupTarget /nfs-test/backup
4. Run a full backup against one account.
4A. ex. zmbackup -f -a user@domain.com
5. ls -laR /nfs-test/
6. If you run into the same problem again, repeat the backup after increasing the backup logging level for the account you're trying to back up. If you didn't run into the same problem, the original failure might have had to do with the initial setup of the NFS mount and the permissions in use when the directories were created.
6A. zmprov aal user@domain.com zimbra.backup debug
6B. Logging will show up in /opt/zimbra/log/mailbox.log
6C. Remove account logging when you're done: zmprov ral user@domain.com zimbra.backup
7. Change zimbraBackupTarget back to your production path.
Setup A Fast Test NOT Using NFS
A way to "avoid" the NFS issues for testing purposes is to set up a new zimbraBackupTarget and try a full backup of a couple of user accounts. I DON'T recommend this if you're using auto-group for zimbraBackupMode [ zmprov gs `zmhostname` zimbraBackupMode ] , only if you're using Standard mode.
[as root]
** Adjust the new "backup" directory path as needed - mine is just an example **
mkdir /mnt/usb1/backup-test
chown zimbra:zimbra /mnt/usb1/backup-test
chmod 750 /mnt/usb1/backup-test
su - zimbra
** Confirm the backup mode is Standard **
zmprov gs `zmhostname` zimbraBackupMode
** If it is not Standard, please stop **
zmprov ms `zmhostname` zimbraBackupTarget /mnt/usb1/backup-test
** Find a couple of test accounts to run a full backup for. Make sure you have enough free space for the backup. **
zmbackup -f -a user1@domain.com user2@domain.com user3@domain.com
** Watch and confirm the status of the backup you just started. **
zmbackupquery
** Confirm files were backed up in the right location **
ls /mnt/usb1/backup-test/sessions/
** A failed backup would most likely leave a leftover directory in the tmp directory **
ls /mnt/usb1/backup-test/tmp/
Standard Backup or Autogroup
Backup Schedule
crontab
Backup CLI Commands
zmbackup zmbackupquery
Monitoring And Maintenance Of Backups
monitoring backups and disk usage
Server Issues
Questions To Ask & Address Prior To Doing A Server Restore
- Should I shut down ZCS services or block client access until the issue and its resolution are identified?
- What data really needs to be restored? Can you restore just that, or does it need a full restore?
Server Restores
Server Restores - Disaster Recoveries
Need to create summary statement
References:
- Upgrades And Compatibility Of Older Backups
- "Backups must be compatible across patch releases"
- "Incorrect upgrade documentation regarding backups"
- "support for restore across major versions"
- "Add conversion tool to upgrade backup versions to allow restore on later zcs versions"
- From 5.0.7 - 5.0.10 You Might See minor version upgrades moving your backups into a subdirectory
- "upgrade incorrectly invalidates backups."
Server Cloning - Server Moves/Testing Purposes
User Issues
Questions To Ask & Address Prior To Doing A User Restore
- Can data be restored from the user's dumpster [if enabled] ?
- Is there a copy of the data in a third party client ? [POP email client that downloaded it, for example]
- Does the data maybe exist in another user's account ? [For example, the senders account or others in the To/CC]
- Does the data reside in a redolog/s ?
- Finally, restore from backups ?
- Do I need to overwrite the existing account with the restore or should I restore to a new account name?
- Using the option -ca [create account] and -pre restore_ [prefix to the new account name, restore_user@ ]
- What are your retention policies, and how might they determine the backup set you need?
User Restores
- Admin Console
- Restore/Backup
- Redirected restore : -ca -pre restored_
- View Mail > TGZ export/import
- Crossmailboxsearch
- CLI
- zmrestore
- to latest point in time
- to specific point in time in past
- to incremental session label
- to full backup label
- Importing Restored Data Into Parent Account
Trouble-shooting Backup/Restore Issues And Other General Questions
Restore Compatibility Between ZCS Versions
Trouble-shooting
Ref:
Basic Backup Information To Submit To Support
Disk Space Usage Issues
Trend Data
If there are concerns about disks/partitions getting full, this command is helpful for gathering trend data on your server. Send support the resulting df.tar file. Note - adjust the -n 20 option of the tail command if you want more than 20 days' worth of trending data.
[zimbra@zcs806 tmp]$ /tmp
[zimbra@zcs806 tmp]$ tar cvf /tmp/df.tar `find /opt/zimbra/zmstat -name df.cs\* | sort | tail -n 20`
tar: Removing leading `/' from member names
/opt/zimbra/zmstat/2014-03-10/df.csv.gz
/opt/zimbra/zmstat/2014-03-11/df.csv.gz
[cut - Ajcody]
/opt/zimbra/zmstat/2014-03-27/df.csv.gz
/opt/zimbra/zmstat/2014-03-28/df.csv.gz
/opt/zimbra/zmstat/df.csv
[zimbra@zcs806 tmp]$ ls -lah /tmp/df.tar
-rw-r----- 1 zimbra zimbra 80K Mar 29 06:44 /tmp/df.tar
[zimbra@zcs806 tmp]$ tar tvf /tmp/df.tar
-rw-r----- zimbra/zimbra 2566 2014-03-11 00:00 opt/zimbra/zmstat/2014-03-10/df.csv.gz
-rw-r----- zimbra/zimbra 2553 2014-03-12 00:00 opt/zimbra/zmstat/2014-03-11/df.csv.gz
[cut - Ajcody]
-rw-r----- zimbra/zimbra 2513 2014-03-28 00:00 opt/zimbra/zmstat/2014-03-27/df.csv.gz
-rw-r----- zimbra/zimbra 2531 2014-03-29 00:00 opt/zimbra/zmstat/2014-03-28/df.csv.gz
-rw-r----- zimbra/zimbra 8013 2014-03-29 06:40 opt/zimbra/zmstat/df.csv
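If you gather trend data often, the file selection can be wrapped in a small function so the count is a parameter. This is only a sketch of the same find, sort, and tail pipeline used in the trend-data command. `recent_df_files` is a hypothetical name, and the directory argument is assumed to be your zmstat directory.

```shell
# Hypothetical helper: list the newest N zmstat df files under a stats
# directory. The dated subdirectories sort chronologically by name, and
# the live df.csv at the top level sorts after them.
# $1 = stats directory (for example /opt/zimbra/zmstat)
# $2 = number of files to keep (defaults to 20)
recent_df_files() {
    find "$1" -name 'df.cs*' | sort | tail -n "${2:-20}"
}
```

A possible use, as the zimbra user: tar cvf /tmp/df.tar $(recent_df_files /opt/zimbra/zmstat 30). Note that the count is a number of files, so the current df.csv takes one of the slots.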
Directory Sizes In /opt/zimbra
Please see the following and provide the output to support. Note that even though this method is faster than doing a du, it can still take a while.
- Ajcody-Server-Misc-Topics#Faster_Way_To_Get_Directory_Size_On_Filesytem_-_find_vs_du
Adjusting The Disk Alert Threshold
Note - zmlocalconfig smtp_notify must return yes if you want to receive the notifications.
If you just need to adjust the disk alert threshold, then see the following:
See current values:
zmlocalconfig | grep zmdisklog
Example adjustment:
su - zimbra
zmlocalconfig -e zmdisklog_critical_threshold=98
zmlocalconfig -e zmdisklog_warn_threshold=95
zmstatctl restart
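Before changing the thresholds, it can help to see which filesystems would trip a given pair of values. The sketch below only illustrates the comparison. It is not how zmstat or zmdisklog is implemented, and whether the real check uses "greater than or equal" is an assumption. `disk_level` and `report_disk_levels` are hypothetical names.

```shell
# Hypothetical helper: classify a usage percentage against warn and
# critical thresholds (assumes a >= comparison).
# $1 = usage percent, $2 = warn threshold, $3 = critical threshold
disk_level() {
    if [ "$1" -ge "$3" ]; then
        echo critical
    elif [ "$1" -ge "$2" ]; then
        echo warn
    else
        echo ok
    fi
}

# Print every mounted filesystem with its level.
# $1 = warn threshold, $2 = critical threshold
# Uses POSIX `df -P`: column 5 is the capacity, column 6 the mount point.
# Mount points containing spaces are not handled in this sketch.
report_disk_levels() {
    df -P | awk 'NR > 1 { sub(/%/, "", $5); if ($5 ~ /^[0-9]+$/) print $5, $6 }' |
    while read -r pct mount; do
        echo "$mount ${pct}% $(disk_level "$pct" "$1" "$2")"
    done
}
```

For example, report_disk_levels 95 98 shows which partitions would still warn after the example adjustment. Partitions listed in zmstat_df_excludes are not filtered out here.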
To exclude a partition from the checks [example of two being excluded]:
su - zimbra
zmlocalconfig -e zmstat_df_excludes="/mount/point:/mount/point2"
zmstatctl restart
There might be a bug here, where you'll keep getting email until a logrotate happens [zimbra.log?].
- Changing Zmstat-df values do not take affect until logrotate
Some things to do to confirm, and to share with support or in the bug. As zimbra:
su - zimbra
ls -la /var/log/zimbra.log
df -h
/dev/mapper/vg_rhel664-lv_root 5.5G 3.5G 1.7G 68% /
tmpfs 939M 0 939M 0% /dev/shm
/dev/sda1 485M 79M 381M 18% /boot
/dev/sdb1 30G 6.2G 23G 22% /opt
date
zmlocalconfig | grep zmdisklog
zmdisklog_critical_threshold = 80
zmdisklog_warn_threshold = 85
zmlocalconfig -e zmdisklog_critical_threshold=95
zmlocalconfig -e zmdisklog_warn_threshold=90
zmlocalconfig | grep zmdisklog
zmdisklog_critical_threshold = 95
zmdisklog_warn_threshold = 90
zmstatctl restart
date
ps -eaf | grep zmstat-df
ls -la /var/log/zimbra.log
date ; grep "Disk warning" /var/log/zimbra* ; zmmailbox -z -m admin@`zmhostname` s -l 100 -t message "Subject: Disk and after:yesterday"
## Note - Emails by default go out every 10 minutes - for example:
[zimbra@zcs803 ~]$ date ; grep "Disk warning" /var/log/zimbra* ; zmmailbox -z -m admin@`zmhostname` s -l 100 -t message "Subject: Disk and after:yesterday"
Thu May 22 09:40:08 PDT 2014
/var/log/zimbra.log:May 22 08:30:00 zcs803 zimbramon[18826]: 18826:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 08:40:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 08:50:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:00:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
## Note - I had readjusted the variable to not warn during this time segment ##
/var/log/zimbra.log:May 22 09:20:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:30:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:40:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
num: 7, more: false
Id Type From Subject Date
---- ---- -------------------- -------------------------------------------------- --------------
1. 328 mess admin Disk / at 82% on zcs803.DOMAIN.com: 05/22/14 09:40
2. 327 mess admin Disk / at 82% on zcs803.DOMAIN.com: 05/22/14 09:30
3. 326 mess admin Disk / at 82% on zcs803.DOMAIN.com: 05/22/14 09:20
## Note - I had readjusted the variable to not warn during this time segment ##
4. 325 mess admin Disk / at 82% on zcs803.DOMAIN.com: 05/22/14 09:00
5. 324 mess admin Disk / at 82% on zcs803.DOMAIN.com: 05/22/14 08:50
6. 323 mess admin Disk / at 82% on zcs803.DOMAIN.com: 05/22/14 08:40
7. 320 mess admin Disk / at 82% on zcs803.DOMAIN.com: 05/22/14 08:31
Continue to monitor your zmmailbox search results for an hour.
The Basic Information Support Needs
as root:
- cat /etc/fstab
- Shows us what is mounted upon boot
- cat /proc/mounts
- Shows us what is currently mounted and its status - you can see if a mount is read-only here.
- df -hT
- Lists current mounts using human-readable size information and also notes the filesystem type.
as zimbra:
- zmprov -l gs `zmhostname` | egrep 'Back|Redo'
- Will show us a number of variables related to backup and redologs. It also tells us whether you're using auto-group or the default method.
- du -sh /opt/zimbra/redolog
- We might notice that your redologs aren't rolling over, causing a possible issue.
- ls -latr /opt/zimbra/backup
- This is the default backup target, please adjust this path here and below if you are using a different zimbraBackupTarget value.
- zmprov gs `zmhostname` zimbraBackupTarget
- We'll be able to confirm permissions are right.
- ls -latr /opt/zimbra/backup/tmp
- This will show us if you have failed backup jobs and confirm tmp is being cleaned appropriately after the backup is done.
- ls -latr /opt/zimbra/backup/sessions
- This will show us what backup sessions are available and confirm permissions are correct.
- Adjust path if your zimbraBackupTarget value is not the default path.
- Some directory sizes in the backup directory:
- Default path first
- du -sh `find /opt/zimbra/backup -maxdepth 2 -type d`
- If you're using a different backup target, check that directory also. Replace /opt/zimbra/backup above with your backup path.
- zmbackupquery
- This should match what's in the sessions directory. It will also tell us the status of each backup and how many accounts were done.
- crontab -l | grep -i back
- This will show us when backups are supposed to run and what options they run with.
- zmlocalconfig | grep -i back
- This is useful to see a number of backup options not exposed in the crontab, things related to the zip options.
- zmvolume -l
- This is useful to see how many volumes are being used, if HSM is being used, and if compression is being done at the volume level.
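To gather the output of the commands above into one file for a support case, something like the following could be used. It is a sketch under assumptions: `collect` is a hypothetical helper name and the report path in the usage comments is only an example. The Zimbra commands themselves are the ones listed above.

```shell
# Hypothetical helper: run one command and append a labeled copy of its
# output (stdout and stderr) to a report file. A failing command is
# recorded with its exit status instead of stopping the collection.
# $1 = report file, remaining arguments = command to run
collect() {
    local report="$1"
    shift
    {
        echo "=== $* ==="
        "$@" 2>&1 || echo "(exit status $?)"
        echo
    } >> "$report"
}

# Example use as the zimbra user (the report path is an assumption):
#   report=/tmp/backup-info.txt
#   collect "$report" zmprov gs "$(zmhostname)" zimbraBackupTarget
#   collect "$report" du -sh /opt/zimbra/redolog
#   collect "$report" ls -latr /opt/zimbra/backup/sessions
#   collect "$report" zmbackupquery
#   collect "$report" zmvolume -l
# Pipelines need a shell wrapper:
#   collect "$report" sh -c 'crontab -l | grep -i back'
```

Keeping each command's output under its own header makes it easier for support to match what was run to what was returned. Adjust the /opt/zimbra/backup paths if your zimbraBackupTarget is not the default.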
Additional Log Files Support Might Need
And send the following logs:
- /var/log/messages
- Filesystem issues often times are noted here and also in syslog. This might explain an interruption in the backup process. Server restarts, filesystem going full, filesystem going read-only, etc.
- /var/log/syslog
- /opt/zimbra/log/mailbox.log
- The backup activity is logged here.
- And any other mailbox.log file that would cover the event
Additional Checks For Performance Specific Issues
If You're Using a SAN or NFS For Your Backup Target - Please Check Your IOWait
Ideally, you would compare iowait and performance data from the target backup host as well as the stats available on the ZCS servers. To get graphs and stats on this from ZCS, please see Ajcody-Testing-Debugging#zmstat_and_zmstat-chart . You should submit this data and iowait conclusions if you still need to submit a support case about backup performance issues.
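For a quick spot check outside of zmstat, iowait can also be read from /proc/stat on Linux. The sketch below is not part of ZCS and does not replace the zmstat charts. `iowait_pct` and `sample_iowait` are hypothetical names, and the 5-second default interval is an arbitrary choice.

```shell
# Hypothetical helper: percentage of CPU time spent in iowait between
# two samples of the aggregate "cpu" line from /proc/stat.
# Fields after the "cpu" label: user nice system idle iowait irq softirq
# steal (guest fields are skipped because they are already counted in user).
# $1 = first sample line, $2 = second sample line
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6          # $6 is the iowait counter
        }
        END {
            d = t[2] - t[1]
            if (d <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / d
        }'
}

# Sample the live counters some seconds apart.
# $1 = interval in seconds (defaults to 5)
sample_iowait() {
    local first second
    first=$(grep '^cpu ' /proc/stat)
    sleep "${1:-5}"
    second=$(grep '^cpu ' /proc/stat)
    iowait_pct "$first" "$second"
}
```

Running sample_iowait 10 a few times during the backup window, on both the ZCS server and the storage host, gives a rough number to compare against the zmstat data.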
Is HSM Running During Your Backup Window
- Are you running HSM? HSM should not be run during your backup window.
- "RFE: HSM and backup should not run at the same time if initated."
Are You Using --zipStore
--zipStore zips the blobs instead of keeping them as individual files, and it does not compress them. In most circumstances this gives the best performance, especially with NFS. It should already be the default backup behavior; the following RFE is where it became the default [ZCS6+] :
- "backup: default to the zip option"
- https://bugzilla.zimbra.com/show_bug.cgi?id=31836#c6
- Link to comment that explains options and default behavior.
To see whether zips are being used for backups, look in the backup sessions directory for .zip files. For example:
mail:~/backup/sessions/full-20080820.160003.770/accounts/115/988/11598896-a89b-4b9d-bedb-1ed1afcb6c87/blobs zimbra$ ls
blobs-1.zip blobs-2.zip blobs-3.zip blobs-4.zip
To see if the zip file is using compression [-Z option for unzip will indicate whether or not the archive is actually compressed] :
unzip -Z blobs-4.zip
293 files, 5982984 bytes uncompressed, 5982984 bytes compressed: 0.0%
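When there are many blob zips to check, the summary line can be read by a small filter. This is a sketch that assumes the summary line has the same layout as the example output. `zip_storage_mode` is a hypothetical name.

```shell
# Hypothetical helper: read `unzip -Z` output on stdin and report whether
# the archive members are stored as-is or compressed, based on the
# summary line, for example:
#   293 files, 5982984 bytes uncompressed, 5982984 bytes compressed: 0.0%
zip_storage_mode() {
    awk '
        /bytes uncompressed/ {
            # $3 is the uncompressed byte count, $6 the compressed count
            if ($3 == $6) print "stored"; else print "compressed"
        }'
}
```

A possible use: unzip -Z blobs-4.zip | zip_storage_mode. Equal byte counts are treated as "stored", which matches the 0.0% case shown in the example output.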
Also, if your zmvolume has compression enabled, the blobs remain compressed within the zip after the backup. The point being, they are not uncompressed before being put into a zip file when the backup is using --zipStore.