Troubleshooting Course Content Rough Drafts-Recover Missing Data - User

Revision as of 21:54, 2 March 2015 by Gayle Billat (talk | contribs) (Setup A Fast Test NOT Using NFS)

Need To Do

  • review training materials for sys admin course
  • determine what should be 'video' and how it can all be built upon preceding steps in a hands on training manner

Bug & RFE's I Made While Researching This Draft Write Up

General Overview Of Backup Environment

Backup Directory Partition & Filesystem

Confirm You Are Using Support Partition & Filesystem

It's recommended to have a dedicated partition for backups and to use only a supported partition/filesystem type:

NFS Use For Backups

Please see this RFE:

This is the proposed statement to be included in the release notes following the RFE:

ZCS & NFS:
Zimbra will support customers that store backups (e.g. /opt/zimbra/backup) on an NFS-mounted partition. Please note that this does not relieve the customer from the responsibility of providing a storage system with a performance level appropriate to their desired backup and restore times. In our experience, network-based storage access is more likely to encounter latency or disconnects than is equivalent storage attached by fiber channel or direct SCSI.
Zimbra continues to view NFS storage of other parts of the system as unsupported. Our testing has shown poor read and write performance for small files over NFS implementations, and as such we view it unlikely that this policy will change for the index and database stores. We will continue to evaluate support for NFS for the message store as customer demand warrants.
When working with Zimbra Support on related issues, the customer must please disclose that the backup storage used is NFS.
Things To Check When Using NFS
  • Check the /var/log/messages on both the zimbra server and the nfs server for nfs related errors during the time frame of your backup.
  • Check /opt/zimbra/log/mailbox.log for error messages about folders/files not being able to be written or missing directory errors.
  • Is root_squash configured on the nfs server? If it's changed to no_root_squash , does the behavior of the backup change?
  • Is the */backup directory owned by zimbra:zimbra with at least 750 or 755 permissions?
    • This parent directory as given in:
      • zmprov gs `zmhostname` zimbraBackupTarget
  • Does zimbraBackupTarget have at least the subdirectories of : sessions and tmp  : and are owned by zimbra:zimbra with 750 or 755 permissions?
    • If not, try manually creating them and then running a test backup.
  • IF USING A NAS - MAKE SURE YOUR NOT USING EXTENDED ACLS OR THAT YOU HAVE THEM CONFIGURED PROPERLY
    • If your backup session directory shows something like : drwxrwx---+ 2 zimbra zimbra 4096 Sep 14 00:00 TO_DELETE-full-XXXXXX , that + sign indicated extended acls are in use.
Debugging Example When Using NFS

Steps I wrote for one customer, where saving out the information as you walk through all the commands would give enough information [hopefully] to submit a good rfe/bug:

 1. make a test partition on nfs server - /nfs-test

 2. mount on zimbra server
 2A. mkdir /nfs-test
 2B. chmod 755 /nfs-test
 2C. mount nfs-server:/nfs-test /nfs-test
 2D. ls -la /nfs-test
 2E. mkdir /nfs-test/backup
 2F. chown zimbra:zimbra /nfs-test/backup
 2G. chmod 755 /nfs-test/backup
 2H. su - zimbra ; touch /nfs-test/backup/testfile 
 2I. ls -laR /nfs-test/
 2J. rm /nfs-test/backup/testfile

 3. Set zimbraBackupTarget
 3A. zmprov ms `zmhostname` zimbraBackupTarget /nfs-test/backup

 4. Run a full backup against one account
 4A. ex. zmbackup -f -a user@domain.com

 5. ls -laR /nfs-test/

 6. If you again, run into the same problem. You could also repeat the backup after increasing the backup 
    logging variable for the account your trying to backup. If you didn't run into the same problem, it might 
    had to do with the initial setup of the nfs mount and permissions being used during the directory creation.
 6A. zmprov aal user@domain.com zimbra.backup debug
 6B. logging will show up in /opt/zimbra/log/mailbox.log
 6C. Remove account logging when your done.  zmprov ral user@domain.com zimbra.backup

 8. Change zimbraBackupTarget back to your production path.
Setup A Fast Test NOT Using NFS

A way to "avoid" the NFS issues for testing purposes would be to setup a new zimbraBackupTarget to try doing a full backup of a couple of user accounts. I DON'T recommend this if you are using auto-group for zimbraBackupMode [ zmprov gs `zmhostname` zimbraBackupMode ] , only if you are using Standard mode.

[as root]
** adjust your new "backup" directory path as needed - mine is just an example**
mkdir /mnt/usb1/backup-test
chown zimbra:zimbra /mnt/usb1/backup-test
chmod 750 /mnt/usb1/backup-test
su - zimbra
**confirm backup mode as standard**
zmprov gs `zmhostname` zimbraBackupMode
**if not stand, please stop**
zmprov ms `zmhostname` zimbraBackupTarget /mnt/usb1/backup-test
**Find a couple of test accounts to do a full for. Make sure you'll have space to do the backup regards to need free space.**
zmbackup -f -a user1@domain.com user2@domain.com user3@domain.com
**Watch and confirm status of backup you just started.**
zmbackupquery
**Confirm files were backed up in right location**
ls /mnt/usb1/backup-test/sessions/
**Failed backups would most likely results in left over directory in tmp directory**
ls /mnt/usb1/backup-test/tmp/

Standard Backup or Autogroup

Backup Schedule

crontab

Backup CLI Commands

zmbackup zmbackupquery

Monitoring And Maintenance Of Backups

monitoring backups and disk usage

Server Issues

Questions To Ask & Address Prior To Doing A Server Restore

  • Should I shutdown ZCS services or block client access until issue and resolution are identified?
  • What data needs to really be restored - can you restore just that or does it need a full restore?

Server Restores

Server Restores - Disaster Recoveries

For Full Single ZCS Server DR Restores

Please see Network_Edition_Disaster_Recovery

Some additional notes I have on it - Ajcody-Disaster-Recovery-Specific-Notes

For Multi-Server DR Restore Specifics

Along with the above references, please see Ajcody-Notes-Multi-Server-Restore-DR

To Restore Just The LDAP Date

Let's say your ldap data was 'lost/destoyed' but everything else was intact, you should look at the zmrestoreldap command.

This section should have more precaution and background information to handle this section.

The basics:

  • To find the LDAP session labels type -lbs.
    • zmrestoreldap -lbs
  • Restore the complete LDAP directory server
    • zmrestoreldap -lb full20061130135236
  • Restore LDAP data for specific accounts
    • zmrestoreldap -lb full20061130135236 -a tac@DOMAIN.com jane@DOMAIN.com
To Restore Just The Mysql DB

Option available from the following RFE work:

  • "Allow backup of only primary message volume"
    • https://bugzilla.zimbra.com/show_bug.cgi?id=35278
      • Options to exclude types of data as described in the 6.0.6 Admin Guide:
        • Search index : If you do not restore the search index data, the mailbox will have to be reindexed after the restore.
          • zmrestore <all or account> --exclude-search-index
        • Blobs : This is a useful option when all blobs for the mailbox being restored already exists.
          • zmrestore <all or account>|--exclude-blobs
        • HSM-blobs : This is useful when all HSM blobs for the mailbox being restored already exists.
          • zmrestore <all or account> --exclude-hsm-blobs
Steps To Practice And Test A DR Situation - System And Mysql Data Only

This test should be done against a non-production server. Either install a 'dummy' ZCS server or create a clone of your production server to perform this test against.

Let's say your mysql data was 'lost/destoyed' but everything else was intact. This might be the solution for the situation:

The steps below are to REPRODUCE A DR situation to test.

  1. zmcontrol stop
  2. Caution - step to reproduce DR situation to test against
    1. mv ~/db/data ~/db/data.OLD
  3. Getting old mysql passwords
    • zmlocalconfig -s mysql_root_password
    • zmlocalconfig -s zimbra_mysql_password
  4. [as zimbra] /opt/zimbra/libexec/zmmyinit
  5. ldap start
  6. zmconvertctl start
  7. mysql is already running per the zmmyinit - mysql.server status - to check.
  8. zmrestoreoffline -a all -br --systemData --excludeSearchIndex --excludeBlobs --excludeHsmBlobs
    • Note - you most likely will want to use the -br or -rf option for this situation.
      • -br,--backedupRedologsOnly : Replays the redo logs in backup only, which excludes archived and current redo logs of the system
      • -rf,--restoreFullBackupOnly : Restores to last full backup only, which excludes incremental backups.
    • -sys,--systemData : Restores global tables and local config.
    • Note the use of -a all , you could also do this for one account first to confirm operation is successful for your circumstances.
  9. If you used -rf or -br with your zmestoreoffline, you might also need to use zmplayredo to finish it up to get the most complete restore.

Server Cloning - Server Moves/Testing Purposes

User Issues

Questions To Ask & Address Prior To Doing A User Restore

  • Can data be restored from the user's dumpster [if enabled] ?
  • Is there a copy of the data in a third party client ? [POP email client that downloaded it, for example]
  • Does the data maybe exist in another user's account ? [For example, the senders account or others in the To/CC]
  • Does the data reside in a redolog/s ?
  • Finally, restore from backups ?
    • Do I need to overwrite the existing account with the restore or should I restore to a new account name?
      • Using the option -ca [create account] and -pre restore_ [prefix to the new account name, restore_user@ ]
    • What are your retention policies and how it might determine the backup set you need?

User Restores

  • Admin Console
    • Restore/Backup
    • Redirected restore : -ca -pre restored_
    • View Mail > TGZ export/import
  • Crossmailboxsearch
  • CLI
    • zmrestore
      • to latest point in time
      • to specific point in time in past
      • to incremental session label
      • to full backup label
  • Importing Restored Data Into Parent Account

Trouble-shooting Backup/Restore Issues And Other General Questions

Backup And Restore Compatibility Between ZCS Versions

Need to create summary statement

References:

Trouble-shooting

Ref:

Basic Backup Information To Submit To Support

Disk Space Usage Issues
Trend Data

If there is concerns about disks/partitions getting full, this command would be helpful for trending data on your server. Send support the resulting df.tar file . Note - adjust the tail command if you want more than 20 day's worth of trending data, the -n 20 option.

[zimbra@zcs806 tmp]$ /tmp

[zimbra@zcs806 tmp]$ tar cvf /tmp/df.tar `find /opt/zimbra/zmstat -name df.cs\* | sort | tail -n 20`
tar: Removing leading `/' from member names
/opt/zimbra/zmstat/2014-03-10/df.csv.gz
/opt/zimbra/zmstat/2014-03-11/df.csv.gz
 [cut - Ajcody]
/opt/zimbra/zmstat/2014-03-27/df.csv.gz
/opt/zimbra/zmstat/2014-03-28/df.csv.gz
/opt/zimbra/zmstat/df.csv

[zimbra@zcs806 tmp]$ ls -lah /tmp/df.tar
-rw-r----- 1 zimbra zimbra 80K Mar 29 06:44 /tmp/df.tar

[zimbra@zcs806 tmp]$ tar tvf /tmp/df.tar
-rw-r----- zimbra/zimbra  2566 2014-03-11 00:00 opt/zimbra/zmstat/2014-03-10/df.csv.gz
-rw-r----- zimbra/zimbra  2553 2014-03-12 00:00 opt/zimbra/zmstat/2014-03-11/df.csv.gz
 [cut - Ajcody]
-rw-r----- zimbra/zimbra  2513 2014-03-28 00:00 opt/zimbra/zmstat/2014-03-27/df.csv.gz
-rw-r----- zimbra/zimbra  2531 2014-03-29 00:00 opt/zimbra/zmstat/2014-03-28/df.csv.gz
-rw-r----- zimbra/zimbra  8013 2014-03-29 06:40 opt/zimbra/zmstat/df.csv
Directory Sizes In /opt/zimbra

Please see the following and provide the output to support. Note, even though this method is faster than doing a du it still can take awhile.

* Ajcody-Server-Misc-Topics#Faster_Way_To_Get_Directory_Size_On_Filesytem_-_find_vs_du
Adjusting The Disk Alert Threshold

Note - zmlocalconfig smtp_notify must return yes if you want to receive the notifications.

If you just need to adjust the disk alert threshold, then see the following:

See current values:

 zmlocalconfig | grep zmdisklog

Example adjustment:

 su - zimbra
 zmlocalconfig -e zmdisklog_critical_threshold=98
 zmlocalconfig -e zmdisklog_warn_threshold=95
 zmstatctl

To exclude a partition from the checks [example of two being excluded]:

 su - zimbra
 zmlocalconfig -e zmstat_df_excludes="/mount/point:/mount/point2"
 zmstatctl

They might be a bug on this, where you'll keep getting email until a logrotate happens [zimbra.log?].

Some things to do to confirm and share with support or in bug. As zimbra

su - zimbra

ls -la /var/log/zimbra.log

df -h
 /dev/mapper/vg_rhel664-lv_root
                      5.5G  3.5G  1.7G  68% /
 tmpfs                 939M     0  939M   0% /dev/shm
 /dev/sda1             485M   79M  381M  18% /boot
 /dev/sdb1              30G  6.2G   23G  22% /opt

date

zmlocalconfig | grep zmdisklog
 zmdisklog_critical_threshold = 80
 zmdisklog_warn_threshold = 85  

zmlocalconfig -e zmdisklog_critical_threshold=95

zmlocalconfig -e zmdisklog_warn_threshold=90

zmlocalconfig | grep zmdisklog
 zmdisklog_critical_threshold = 95
 zmdisklog_warn_threshold = 90

zmstatctl restart

date

ps -eaf | grep zmstat-df

ls -la /var/log/zimbra.log

date ; grep "Disk warning" /var/log/zimbra* ; zmmailbox -z -m admin@`zmhostname` s -l 100 -t message "Subject: Disk and after:yesterday"

##Note - Emails by default go out every 10 minutes - for example:

[zimbra@zcs803 ~]$ date ; grep "Disk warning" /var/log/zimbra* ; zmmailbox -z -m admin@`zmhostname` s -l 100 -t message "Subject: Disk and after:yesterday"
Thu May 22 09:40:08 PDT 2014
/var/log/zimbra.log:May 22 08:30:00 zcs803 zimbramon[18826]: 18826:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 08:40:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 08:50:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:00:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
  ## Note - I had readjusted the variable to not warn during this time segment ##
/var/log/zimbra.log:May 22 09:20:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:30:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:40:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
num: 7, more: false

     Id  Type   From                  Subject                                             Date
   ----  ----   --------------------  --------------------------------------------------  --------------
1.  328  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 09:40
2.  327  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 09:30
3.  326  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 09:20
  ## Note - I had readjusted the variable to not warn during this time segment ##
4.  325  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 09:00
5.  324  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 08:50
6.  323  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 08:40
7.  320  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 08:31

Continue to monitor your zmmailbox search results for an hour.

The Basic Information Support Needs

as root:

  • cat /etc/fstab
    • Shows us what is mounted upon boot
  • cat /proc/mounts
    • Shows us what is currently mounted and its status - you can see if a mount is read-only here.
  • df -hT
    • Lists current mounts using human-readable size information and also notes the filesystem type.

as zimbra:

  • zmprov -l gs `zmhostname` | egrep 'Back|Redo'
    • Will show us a number of variables related to backup and redologs. Also tell us if your using auto-group or the default method.
  • du -sh /opt/zimbra/redolog
    • Will might notice your redolog logs aren't rolling over, causing a possible issue.
  • ls -latr /opt/zimbra/backup
    • This is the default backup target, please adjust this path here and below if you are using a different zimbraBackupTarget value.
    • zmprov gs `zmhostname` zimbraBackupTarget
    • We'll be able to confirm permissions are right.
  • ls -latr /opt/zimbra/backup/tmp
    • This will show us if you have failed backup jobs and confirm tmp is being cleaned appropriately after the backup is done.
  • ls -latr /opt/zimbra/backup/sessions
    • This will show us what backup sessions are available and confirm permissions are correct.
    • Adjust path if your zimbraBackupTarget value is not the default path.
  • Some directory sizes in the backup directory:
    • Default path first
      • du -sh `find /opt/zimbra/backup -maxdepth 2 -type d`
    • If your using a different backup target, check that directory also. Replace /opt/zimbra/backup above with your backup path.
  • zmbackupquery
    • This should match what's in the sessions directory and it will also tell us if status of each backup and how many accts were done.
  • crontab -l | grep -i back
    • This will show use when backups are support to run and with what options they are running with.
  • zmlocalconfig | grep -i back
    • This is useful to see a number of backup options not exposed in the crontab, things related to the zip options.
  • zmvolume -l
    • This is useful to see how many volumes are being used, if HSM is being used, and if compression is being done at the volume level.
Additional Log Files Support Might Need

And send the following logs:

  • /var/log/messages
    • Filesystem issues often times are noted here and also in syslog. This might explain an interruption in the backup process. Server restarts, filesystem going full, filesystem going read-only, etc.
  • /var/log/syslog
  • /opt/zimbra/log/mailbox.log
    • The backup activity is logged here.
    • And any other mailbox.log file that would cover the event

Additional Checks For Performance Specific Issues

If Your Using a SAN or NFS For Your Backup Target - Please Check Your IOWait

Ideally, you would compare iowait and performance data from the target backup host as well as the stats available on the ZCS servers. To get graphs and stats on this from ZCS, please see Ajcody-Testing-Debugging#zmstat_and_zmstat-chart . You should submit this data and iowait conclusions if you still need to submit a support case about backup performance issues.

Is HSM Running During Your Backup Window
Are You Using --zipStore

--zipStore zips the blobs vs. keeping the blobs as individual files. --zipStore does not use compression either. For most circumstances, this will give the best performance, especially with NFS. This should be the default behavior of the backups, the following RFE is when it became the default [ZCS6+] :

To see if zip's are being used for backups for example, in the backup/session directory you'll know if it is by seeing .zip files:

mail:~/backup/sessions/full-20080820.160003.770/accounts/115/988/11598896-a89b-4b9d-bedb-1ed1afcb6c87/blobs 
zimbra$ ls blobs-1.zip    blobs-2.zip    blobs-3.zip    blobs-4.zip

To see if the zip file is using compression [-Z option for unzip will indicate whether or not the archive is actually compressed] :

unzip -Z blobs-4.zip
293 files, 5982984 bytes uncompressed, 5982984 bytes compressed:  0.0%

Also, if your zmvolume has compression enabled the blobs will remain compressed within the zip also upon backup. The point being, they are uncompressed to be then put into a zip file when the backup is using --zipStore.

Performance Issues And Time To Complete

RFE's for increase backup performance:

What Might Be Wrong?

Need to expand this section or refer to the part above about all the log/data gathering that support would normally ask for

First thing to check is the log file and see if anything notable stands out.

grep -i backup /opt/zimbra/log/mailbox.log
Use Auto-Group Backups Rather Than Default Style Backup

Having trouble completing that entire full backup during off-hours? Enter the hybrid auto-grouped mode, which combines the concept of full and incremental backup functions - you’re completely backing up a target number of accounts daily rather than running incrementals.

Auto-grouped mode automatically pulls in the redologs since the last run so you get incremental backups of the remaining accounts; although the incremental accounts captured via the redologs are not listed specifically in the backup account list. This still allows you to do a point in time restore for any account.

Please see the following for more detailed information:

Need To Write Fewer Files - Add The Zip Option To Your Backup Commands

Using the zip option will compress all those thousands of single files that exist under a user's backup, decreasing performance issues that arise from writing out thousands of small files as compared to large ones. This is often seen when one is :

  • Using nfs for the backup directory
  • Copying/rsyncing backups to a remote server
  • Are using some third party backup software (to tape) to archive/backup the zimbra backup sessions.

Please see the following for more information about using the Zip option:

SAN Snapshots For Backups

Please see:

Cloud Backups

Please see:

Tape Backups

Use a 'secondary' network card to run your backups over if possible. This way the backup activity doesn't effect ZCS network performance over the default network card. Also, please review the following section on Zimbra and hard links and confirm if your third party backup software or process will handle restoring hard links correctly.

Please see:

Quiz

  1. What are three methods you can locate which backups a user is in?
  2. To stop a restore from playing all the way up to the current time, you would need to use one of these two zmrestore variables - what are they?
  3. If you see an error about being unable to restore because of redologs, what option might you need to include?
  4. What is the primary log file on the ZCS server you'll monitor or review for restore or backup issues?
  5. Provide two methods to increase logging/debug output when attempting a restore?
  6. Can you perform a full system restore from the backups done on an older ZCS version?
  7. What are the two identify variables that zmbackup/zmrestore uses for an individual?
  8. What user variable would you use to locate the directory a users data is stored in within a backup session?
Verified Against: Zimbra Collaboration Suite 8.6 Date Created: 01/22/2015
Article ID: https://wiki.zimbra.com/index.php?title=Troubleshooting_Course_Content_Rough_Drafts-Recover_Missing_Data_-_User Date Modified: 2015-03-02



Try Zimbra

Try Zimbra Collaboration with a 60-day free trial.
Get it now »

Want to get involved?

You can contribute in the Community, Wiki, Code, or development of Zimlets.
Find out more. »

Looking for a Video?

Visit our YouTube channel to get the latest webinars, technology news, product overviews, and so much more.
Go to the YouTube channel »

Jump to: navigation, search