Troubleshooting Course Content Rough Drafts-Recover Missing Data - User

Need To Do

  • review training materials for sys admin course
  • determine what should be 'video' and how it can all build upon preceding steps in a hands-on training manner

Bugs & RFEs I Made While Researching This Draft Write-Up

General Overview Of Backup Environment

Backup Directory Partition & Filesystem

Confirm You Are Using A Supported Partition & Filesystem

It's recommended to have a dedicated partition for backups and to use only a supported partition/filesystem type:

NFS Use For Backups

Please see this RFE:

Statement that was included in the release notes following the RFE:

ZCS & NFS:
Zimbra will support customers that store backups (e.g. /opt/zimbra/backup) on an NFS-mounted partition. Please note that this does not relieve the customer from the responsibility of providing a storage system with a performance level appropriate to their desired backup and restore times. In our experience, network-based storage access is more likely to encounter latency or disconnects than is equivalent storage attached by fiber channel or direct SCSI.
Zimbra continues to view NFS storage of other parts of the system as unsupported. Our testing has shown poor read and write performance for small files over NFS implementations, and as such we view it unlikely that this policy will change for the index and database stores. We will continue to evaluate support for NFS for the message store as customer demand warrants.
When working with Zimbra Support on related issues, the customer must please disclose that the backup storage used is NFS.
Things To Check When Using NFS
  • Check /var/log/messages on both the Zimbra server and the NFS server for NFS-related errors during your backup window.
  • Check /opt/zimbra/log/mailbox.log for errors about folders/files that could not be written, or about missing directories.
  • Is root_squash configured on the NFS server? If you change it to no_root_squash, does the behavior of the backup change?
  • Is the */backup directory owned by zimbra:zimbra with 750 or 755 permissions?
    • This is the parent directory returned by:
      • zmprov gs `zmhostname` zimbraBackupTarget
  • Does zimbraBackupTarget contain at least the sessions and tmp subdirectories, owned by zimbra:zimbra with 750 or 755 permissions?
    • If not, try manually creating them and then running a test backup.
  • IF USING A NAS, MAKE SURE YOU'RE NOT USING EXTENDED ACLS, OR THAT YOU HAVE THEM CONFIGURED PROPERLY
    • If your backup session directory shows a + sign at the end of the permission string, as in the example below, it's a sign you're using extended ACLs.
      •  drwxrwx---+  2 zimbra zimbra  4096 Sep 14 00:00 TO_DELETE-full-XXXXXX
      • The example above is from a customer reporting that their backups weren't being deleted and were filling up their backup partition. The old backup sessions were being renamed using the TO_DELETE-full-XXXXXX naming scheme but were never deleted. The root cause was that extended ACLs on their NAS server were preventing the delete operation from completing.
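The ownership, permission, and extended-ACL checks above can be scripted. Below is a minimal bash sketch, assuming GNU coreutils on a Linux server (`stat -c`, and `ls` marking extended ACLs with a trailing `+`). `check_backup_target` is a hypothetical helper, not a Zimbra command; pass it your zimbraBackupTarget path and, optionally, an expected owner other than zimbra.

```shell
#!/usr/bin/env bash
# check_backup_target DIR [OWNER]
# Hypothetical helper (not part of ZCS). Checks that DIR, DIR/sessions and
# DIR/tmp exist, are owned by OWNER (default: zimbra), have mode 750 or 755,
# and do not carry extended ACLs. Prints one line per problem found;
# returns 0 if everything looks fine, 1 otherwise.
check_backup_target() {
  local dir="$1" owner="${2:-zimbra}" rc=0 d mode user perms
  for d in "$dir" "$dir/sessions" "$dir/tmp"; do
    if [ ! -d "$d" ]; then
      echo "MISSING: $d"
      rc=1
      continue
    fi
    mode=$(stat -c '%a' "$d")   # octal permissions, e.g. 755 (GNU stat)
    user=$(stat -c '%U' "$d")   # owner name
    if [ "$user" != "$owner" ]; then
      echo "OWNER: $d is owned by $user, expected $owner"
      rc=1
    fi
    case "$mode" in
      750|755) ;;
      *) echo "MODE: $d has mode $mode, expected 750 or 755"; rc=1 ;;
    esac
    # GNU ls appends "+" to the permission string when extended ACLs exist,
    # as in the drwxrwx---+ example above.
    perms=$(ls -ld "$d" | awk '{print $1}')
    if [ "${perms:10:1}" = "+" ]; then
      echo "ACL: $d has extended ACLs"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, as zimbra: check_backup_target "$(zmprov gs "$(zmhostname)" zimbraBackupTarget | awk '/zimbraBackupTarget:/ {print $2}')". It only looks at the target and its sessions and tmp directories, not at every session directory beneath them.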
Troubleshooting Example When Using NFS

Steps to set up your ZCS server to use an NFS mount and to confirm you can save a backup to the NFS partition mount:

 1. make a test partition on nfs server - /nfs-test

 2. mount on zimbra server
 2A. mkdir /nfs-test
 2B. chmod 755 /nfs-test
 2C. mount nfs-server:/nfs-test /nfs-test
 2D. ls -la /nfs-test
 2E. mkdir /nfs-test/backup
 2F. chown zimbra:zimbra /nfs-test/backup
 2G. chmod 755 /nfs-test/backup
 2H. su - zimbra -c "touch /nfs-test/backup/testfile"
 2I. ls -laR /nfs-test/
 2J. rm /nfs-test/backup/testfile

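 3. Set zimbraBackupTarget (note the current value first, so you can put it back in the last step)
 3A. zmprov gs `zmhostname` zimbraBackupTarget
 3B. zmprov ms `zmhostname` zimbraBackupTarget /nfs-test/backup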

 4. Run a full backup against one account
 4A. ex. zmbackup -f -a user@domain.com

 5. ls -laR /nfs-test/

 6. If you run into the same problem again, repeat the backup after increasing the backup
    logging level for the account you're trying to back up. If you didn't run into the problem, it most
    likely had to do with the initial setup of the NFS mount and the permissions used when the directories were created.
 6A. zmprov aal user@domain.com zimbra.backup debug
 6B. Logging will show up in /opt/zimbra/log/mailbox.log
 6C. Remove the account logging when you're done: zmprov ral user@domain.com zimbra.backup

 7. Change zimbraBackupTarget back to your production path.
Set Up A Fast Test NOT Using NFS

One way to "avoid" the NFS issues for testing purposes is to set up a new zimbraBackupTarget and try a full backup of a couple of user accounts. I DON'T recommend this if zimbraBackupMode is set to Auto-Grouped [ zmprov gs `zmhostname` zimbraBackupMode ]; use it only if you are using Standard mode.

[as root]
** adjust your new "backup" directory path as needed - mine is just an example**
mkdir /mnt/usb1/backup-test
chown zimbra:zimbra /mnt/usb1/backup-test
chmod 750 /mnt/usb1/backup-test
su - zimbra
**confirm backup mode as standard**
zmprov gs `zmhostname` zimbraBackupMode
**if not Standard, please stop**
zmprov ms `zmhostname` zimbraBackupTarget /mnt/usb1/backup-test
**Pick a couple of test accounts to run a full backup for. Make sure the new target has enough free space for the backup.**
zmbackup -f -a user1@domain.com user2@domain.com user3@domain.com
**Watch and confirm status of backup you just started.**
zmbackupquery
**Confirm files were backed up in right location**
ls /mnt/usb1/backup-test/sessions/
**A failed backup will most likely leave a directory behind in the tmp directory**
ls /mnt/usb1/backup-test/tmp/
**When you're done testing, set zimbraBackupTarget back to your production path**

Standard Backup Or Autogroup

Knowing what zimbraBackupMode is set to on the server is important when troubleshooting any backup/restore issue. The options for zimbraBackupMode are Standard and Auto-Grouped.

  • The Standard backup method is to run a weekly full backup session and daily incremental backup sessions to back up all mailboxes daily. The standard backup method is appropriate for enterprise deployments where full backups are run during non-working days.
  • The Auto-Grouped backup method is recommended for large ZCS environments where running a full backup of all accounts at one time would take too long. The auto-grouped backup method runs a full backup session for a different group of mailboxes at each scheduled backup. The system administrator configures the interval at which backups should run and the number of groups that backups are made up of. ZCS then automatically backs up mailboxes in groups over the interval specified.

zimbraBackupMode can be set at the Global and Server levels; if it's set at the server level, it overrides the Global setting. To see what it's currently set to:

zmprov gcf zimbraBackupMode 
zmprov gs `zmhostname` zimbraBackupMode 

Your backup schedule and crontab should be set appropriately for the zimbraBackupMode method you're using, and any change to zimbraBackupMode should prompt a review of the backup schedule and crontab setup.

Auto-Grouped Wants To Run A Full Backup Against All Users On The First Run

One issue that can occur when switching from Standard to Auto-Grouped is that the first Auto-Grouped backup job attempts to back up all users. This can fill your disk partition if you're unprepared for it. To avoid this first full backup of all users when switching to the Auto-Grouped method, consult the following bug for a workaround.

Backup Schedule

The backup schedule can be set with the zmschedulebackup command or by modifying the zimbra user's crontab file.

The default backup schedule for a server using the Standard backup method is below. The example shows what's in the crontab and also the output of zmschedulebackup's query option, -q, for the current schedule:

[zimbra@mail-172 sessions]$ crontab -l | grep backup
0 1 * * 6 /opt/zimbra/bin/zmbackup -f -a all    --mail-report
0 1 * * 0-5 /opt/zimbra/bin/zmbackup -i  --mail-report
0 0 * * * /opt/zimbra/bin/zmbackup -del 1m --mail-report

[zimbra@mail-172 ~]$ zmschedulebackup -q
Current Schedule:

        f 0 1 * * 6 -a all --mail-report
        i 0 1 * * 0-5 --mail-report
        d 1m 0 0 * * * --mail-report

zmschedulebackup -D sets the default backup schedule based on what zimbraBackupMode is set to on the server. The example below shows a server with zimbraBackupMode set to Auto-Grouped, using zmschedulebackup to set the default Auto-Grouped schedule in the zimbra crontab.

[zimbra@mail-172 sessions]$ zmschedulebackup -D
Default schedule set

Current Schedule:

        f 0 1 * * 0-6
        d 1m 0 0 * * *
[zimbra@mail-172 sessions]$ crontab -l | grep backup
0 1 * * 0-6 /opt/zimbra/bin/zmbackup -f
0 0 * * * /opt/zimbra/bin/zmbackup -del 1m

Notice that the Auto-Grouped method runs a "full" every day and drops the line for incremental backups. It also drops the "-a all" option, since that option is inappropriate when using Auto-Grouped. There is currently a bug that also causes the --mail-report option to be dropped; you can add it back by manually editing the crontab.
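Because each mode expects a different crontab, a quick consistency check can catch a schedule that wasn't updated after a mode change. Below is a minimal bash sketch; `check_backup_cron` is a hypothetical helper, and the patterns it looks for (`-f -a all`, `-i`, `--mail-report`) are taken from the two default schedules shown above, so adjust them if your crontab differs.

```shell
#!/usr/bin/env bash
# check_backup_cron MODE  < output of: crontab -l
# Hypothetical helper. MODE is the zimbraBackupMode value (Standard or
# Auto-Grouped). Prints a WARN line for each mismatch with the default
# schedules shown above; returns 0 if none, 1 if any, 2 for an unknown MODE.
check_backup_cron() {
  local mode cron rc=0
  mode=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  cron=$(grep -E 'zmbackup( |$)' || true)   # keep zmbackup lines only
  if [ -z "$cron" ]; then
    echo "WARN: no zmbackup entries found"
    return 1
  fi
  # Every backup line should send a mail report.
  if printf '%s\n' "$cron" | grep -vq -- '--mail-report'; then
    echo "WARN: some zmbackup entries lack --mail-report"
    rc=1
  fi
  case "$mode" in
    auto-grouped)
      # Default Auto-Grouped schedule: daily -f plus -del; no -i, no "-a all".
      if printf '%s\n' "$cron" | grep -Eq 'zmbackup +-i( |$)|-a +all'; then
        echo "WARN: Auto-Grouped schedule still contains -i or -a all"
        rc=1
      fi
      ;;
    standard)
      if ! printf '%s\n' "$cron" | grep -Eq 'zmbackup +-f +-a +all'; then
        echo "WARN: no full backup entry with -a all"
        rc=1
      fi
      if ! printf '%s\n' "$cron" | grep -Eq 'zmbackup +-i( |$)'; then
        echo "WARN: no incremental (-i) entry"
        rc=1
      fi
      ;;
    *)
      echo "WARN: unknown backup mode: $1"
      return 2
      ;;
  esac
  return "$rc"
}
```

Usage, as zimbra: crontab -l | check_backup_cron "$(zmprov gs "$(zmhostname)" zimbraBackupMode | awk '/zimbraBackupMode:/ {print $2}')"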

Backup And Restore CLI Commands You Should Be Familiar With

  • zmschedulebackup : This command is used to schedule full backups, incremental backups, and deletion of old backups.
  • zmbackup : This command executes full or incremental backup of the mail server. This is run on a live server, while the mailboxd process and the mailbox server are running. This command also has an option to manually delete old backups when they are no longer needed.
  • zmbackupabort : This command stops a full backup that is in process.
    • zmbackupabort -r : This command stops an ongoing restore.
  • zmbackupquery : This command lists the information about ongoing and completed backups, including labels and dates.
  • zmrestore : This command executes a full or incremental restore to the Zimbra mail server. The zmrestore command is performed on a server that is running.
  • zmrestoreoffline : This command restores the Zimbra mail server when the mailboxd process is stopped.
  • zmrestoreldap : This command restores the complete LDAP directory server, including accounts, domains, servers, COS and other data.
  • zmplayredo : This command allows you to play/replay a redolog file/s. It requires the mailbox service to be stopped though.
  • zmredodump : This command dumps the contents of one or more redolog files so you can see the transactions that were done. This is useful if you're trying to confirm whether an action was or wasn't done, what time it was done, and so forth. This does not require the mailbox service to be stopped.

Monitoring And Maintenance Of Backups

Monitoring backups and disk usage

  • Confirm that the backup entries in the zimbra crontab include the --mail-report option.
  • Make sure you're monitoring your partition space appropriately. You should have the stats service installed and enabled on your servers so it can monitor your partitions and trigger an alert when usage exceeds a set percentage.
[zimbra@mail-172 ~]$ zmcontrol status | grep stats
        stats                   Running

[zimbra@mail-172 ~]$ zmlocalconfig | egrep -i 'zmstat_d|zmdisk'
zmdisklog_critical_threshold = 95
zmdisklog_warn_threshold = 85
zmstat_df_excludes =
zmstat_disk_interval = 600
  • Monitor the following directories as well; they are often the reason a disk partition fills up.
    • Redolog archive directory, default path /opt/zimbra/redolog/archive/ [set by zimbraRedoLogArchiveDir: zmprov gs `zmhostname` zimbraRedoLogArchiveDir]
      • Growth here could indicate your backups are failing, and therefore your archived redologs aren't being purged by a successful backup.
    • Backup tmp directory, default path /opt/zimbra/backup/tmp/ [under zimbraBackupTarget: zmprov gs `zmhostname` zimbraBackupTarget]
      • Failed backups stay in the tmp directory and are not moved into the sessions directory.
  • Monitor and review the output of zmbackupquery, and confirm that the "Number of accounts:" line falls within expectations for your system, most notably for full backups. Do NOT assume that full backup labels in your backup directory indicate your system is backing up all your users.
    • Case in point, from a real support case: if the system no longer has enough space to do a full backup of all your users, it will still back up new users, and the label will be a full. The system administrator had assumed full backups were being done correctly because they simply listed the contents of the sessions directory and saw numerous full backup directories. What they didn't realize was that those were just for new users, and each full backup session only held one or two accounts. The full backup of all users was failing because the server ran out of space, terminating the backup before it completed.
[zimbra@mail-172 ~]$ zmbackupquery --type full
Label:   full-20150228.060012.892
Type:    full
Status:  completed
Started: Sat, 2015/02/28 01:00:12.892 EST
Ended:   Sat, 2015/02/28 01:00:29.751 EST
Redo log sequence range: 7 .. 7
Number of accounts: 10 out of 10 completed

[zimbra@mail-172 ~]$ zmprov -l gaa | wc -l
10
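Checking the "Number of accounts:" lines by eye is easy to skip, so here is a minimal awk-based sketch that flags any full backup whose completed-account count is below what you expect. `check_full_counts` is a hypothetical helper and assumes the zmbackupquery output format shown above. An expected count taken from `zmprov -l gaa | wc -l` may include system accounts, so decide which number is right for your server.

```shell
#!/usr/bin/env bash
# check_full_counts EXPECTED  < output of: zmbackupquery --type full
# Hypothetical helper. For each "Label:" block, reads the
# "Number of accounts: N out of M completed" line and prints
#   OK <label> N/EXPECTED   or   SHORT <label> N/EXPECTED
# Returns 1 if any full backup completed fewer than EXPECTED accounts.
check_full_counts() {
  awk -v want="$1" '
    BEGIN { want += 0 }
    /^Label:/ { label = $2 }
    /^Number of accounts:/ {
      n = $4 + 0                         # N in "N out of M completed"
      status = (n < want) ? "SHORT" : "OK"
      if (status == "SHORT") bad = 1
      printf "%s %s %d/%d\n", status, label, n, want
    }
    END { exit (bad ? 1 : 0) }'
}
```

Usage, as zimbra: zmbackupquery --type full | check_full_counts "$(zmprov -l gaa | wc -l)". A run of SHORT lines on the scheduled full backups matches the "fulls that only hold new users" case described above.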

When you're using the Standard backup method, incremental backup runs also run a full backup of any 'new' users that don't yet have a full backup. The listing below shows those full backups of "new" accounts alongside each incremental run:

drwxr-x--- 6 zimbra zimbra 4096 Aug 20 01:04 incr-20130819.193041.672
drwxr-x--- 6 zimbra zimbra 4096 Aug 20 01:04 full-20130819.193420.421
drwxr-x--- 6 zimbra zimbra 4096 Aug 21 01:04 incr-20130820.193044.443
drwxr-x--- 6 zimbra zimbra 4096 Aug 21 01:04 full-20130820.193424.551
drwxr-x--- 6 zimbra zimbra 4096 Aug 22 01:04 incr-20130821.193039.818
drwxr-x--- 6 zimbra zimbra 4096 Aug 22 01:04 full-20130821.193427.056
drwxr-x--- 6 zimbra zimbra 4096 Aug 23 01:03 incr-20130822.193037.145
drwxr-x--- 6 zimbra zimbra 4096 Aug 23 01:04 full-20130822.193352.394

Server Issues

Questions To Ask & Address Prior To Doing A Server Restore

  • Should I shut down ZCS services or block client access until the issue and resolution are identified?
  • What data needs to really be restored - can you restore just that or does it need a full restore?

Server Restores

Server Restores - Disaster Recoveries

For A ZCS Server DR Restore

Please see Network_Edition_Disaster_Recovery for doing a full system restore.

To Restore Just The LDAP Data

Let's say your LDAP data was 'lost/destroyed' but everything else was intact; in that case, look at the zmrestoreldap command.

This section still needs more precautions and background information.

The basics:

  • To list the LDAP session labels, use -lbs:
    • zmrestoreldap -lbs
  • Restore the complete LDAP directory server
    • zmrestoreldap -lb full20061130135236
  • Restore LDAP data for specific accounts
    • zmrestoreldap -lb full20061130135236 -a tac@DOMAIN.com jane@DOMAIN.com
To Restore Just The Mysql DB

These options are available as a result of the following RFE:

  • "Allow backup of only primary message volume"
    • https://bugzilla.zimbra.com/show_bug.cgi?id=35278
      • Options to exclude types of data, as described in the 6.0.6 Admin Guide:
        • Search index : If you do not restore the search index data, the mailbox will have to be reindexed after the restore.
          • zmrestore -a <all|account> --exclude-search-index
        • Blobs : This is useful when all blobs for the mailbox being restored already exist.
          • zmrestore -a <all|account> --exclude-blobs
        • HSM-blobs : This is useful when all HSM blobs for the mailbox being restored already exist.
          • zmrestore -a <all|account> --exclude-hsm-blobs
Steps To Practice And Test A DR Situation - System And Mysql Data Only

This test should be done against a non-production server. Either install a 'dummy' ZCS server or create a clone of your production server to perform this test against.

Let's say your MySQL data was 'lost/destroyed' but everything else was intact. This might be the solution for that situation:

The steps below are to REPRODUCE A DR situation to test.

  1. zmcontrol stop
  2. Caution - step to reproduce DR situation to test against
    1. mv ~/db/data ~/db/data.OLD
  3. Getting old mysql passwords
    • zmlocalconfig -s mysql_root_password
    • zmlocalconfig -s zimbra_mysql_password
  4. [as zimbra] /opt/zimbra/libexec/zmmyinit
  5. ldap start
  6. zmconvertctl start
  7. MySQL is already running after zmmyinit; check with: mysql.server status
  8. zmrestoreoffline -a all -br --systemData --exclude-search-index --exclude-blobs --exclude-hsm-blobs
    • Note - you will most likely want to use the -br or -rf option for this situation.
      • -br,--backedupRedologsOnly : Replays the redo logs in backup only, which excludes archived and current redo logs of the system
      • -rf,--restoreFullBackupOnly : Restores to last full backup only, which excludes incremental backups.
    • -sys,--systemData : Restores global tables and local config.
    • Note the use of -a all; you could also do this for one account first to confirm the operation succeeds in your circumstances.
  9. If you used -rf or -br with zmrestoreoffline, you might also need to use zmplayredo to finish up and get the most complete restore.

Server Cloning - Server Moves/Testing Purposes

User Issues

Questions To Ask & Address Prior To Doing A User Restore

  • Can data be restored from the user's dumpster [if enabled] ?
  • Is there a copy of the data in a third party client ? [POP email client that downloaded it, for example]
  • Does the data maybe exist in another user's account ? [For example, the senders account or others in the To/CC]
  • Does the data reside in a redolog/s ?
  • Finally, restore from backups ?
    • Do I need to overwrite the existing account with the restore or should I restore to a new account name?
      • Using the option -ca [create account] and -pre restore_ [prefix to the new account name, restore_user@ ]
    • What are your retention policies, and how do they determine which backup set you need?

User Restores

  • Admin Console
    • Restore/Backup
    • Redirected restore : -ca -pre restored_
    • View Mail > TGZ export/import
  • Crossmailboxsearch
  • Importing Restored Data Into Parent Account

Restoring A User To A Specific Full Backup, Incremental Backup, Or Point In Time

One of the more common support issues around restoring a user account is the complaint that the restore didn't roll back the account to the desired state. Almost always, this is because the proper options were not passed to the zmrestore command for the restore to stop at the desired time frame. If you don't give the proper options to zmrestore, it will restore the account all the way to the most recent redolog operations on the server. For example, it replays the delete operations that caused the user to request the restore in the first place.

You MUST use the -lb [full backup label name] option when you're trying to restore to any state that ISN'T meant to include the latest information for the mailbox. The -lb argument should specify a full backup that took place before the point in time you wish to restore to.

Find Out What Backup Session Labels You Need First

To find out which backups are associated with a particular account, run:

zmbackupquery -a user@domain

Note the most recent full backup that occurs before the point in time you want to restore to, and the incremental that immediately follows that point in time.

Backup labels (-lb) for fulls can be found with the following [include the -v option if you want to see a list of the user accounts within each backup]:

zmbackupquery --type full

Backup labels (-restoreToIncrLabel) for incrementals can be found by:

zmbackupquery --type incremental
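Picking the right -lb label by hand is error-prone when there are many sessions. The bash sketch below uses a hypothetical helper, `pick_full_label`, that reads full backup labels on stdin and prints the newest one whose timestamp is strictly earlier than a cutoff given as YYYYMMDD.HHMMSS. It relies on an assumption worth verifying on your own server: the zmbackupquery example earlier (label full-20150228.060012.892 with a start time of 01:00:12 EST) suggests label timestamps are in GMT/UTC, so give the cutoff in UTC as well.

```shell
#!/usr/bin/env bash
# pick_full_label CUTOFF   (CUTOFF as YYYYMMDD.HHMMSS, same clock as the labels)
# Hypothetical helper. Reads backup labels (one per line) on stdin, ignores
# anything that isn't a full-* label, and prints the newest full label whose
# timestamp is strictly earlier than CUTOFF. Returns 1 if none qualifies.
pick_full_label() {
  local LC_ALL=C                 # plain byte-wise string comparison
  local cutoff="$1" label ts best="" best_ts=""
  while read -r label; do
    [[ "$label" == full-* ]] || continue
    ts="${label#full-}"          # 20080726.050017.306
    ts="${ts%.*}"                # 20080726.050017 (drop milliseconds)
    # Fixed-width timestamps, so string order equals chronological order.
    if [[ "$ts" < "$cutoff" && "$ts" > "$best_ts" ]]; then
      best="$label"
      best_ts="$ts"
    fi
  done
  [[ -n "$best" ]] || return 1
  printf '%s\n' "$best"
}
```

Usage, as zimbra: zmbackupquery --type full | awk '/^Label:/ {print $2}' | pick_full_label 20080801.081800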
Command Syntax Example For Restores On The CLI

Example: restore to a full backup label and stop:

zmrestore -a USER@DOMAIN.com -lb full-20080726.050017.306 -rf -ca -pre restore_

Example: restore to an incremental backup label and stop:

zmrestore -a USER@DOMAIN.com -restoreToIncrLabel incr-20080731.060007.644 -lb full-20080726.050017.306 -br -ca -pre restore_

Example: restore to a specific time and stop:

zmrestore -a USER@DOMAIN.com -restoreToTime 20080801011800 -lb full-20080726.050017.306 -br -ca -pre restore_


Important Options You Might Want Or Need To Include

--ignoreRedoErrors : If you attempt a restore and you see an error about problems related to playing the redolog, you'll want to run the restore command again and include this option.


--skipDeletes : Please see http://bugzilla.zimbra.com/show_bug.cgi?id=31824#c5 for details on this.


-t /path/to/backup_dir : If you are restoring from another backup directory besides your current default path.


Options that take a TIME rather than a LABEL (for example -restoreToTime) should use this syntax (from zmrestore --help):

Specify date/time in one of these formats:

    2008/08/06 09:55:50
    2008/08/06 09:55:50 572
    2008/08/06 09:55:50.572
    2008/08/06-09:55:50-572
    2008/08/06-09:55:50
    20080806.095550.572
    20080806.095550
    20080806095550572
    20080806095550

Specify year, month, date, hour, minute, second, and optionally millisecond.
Month/date/hour/minute/second are 0-padded to 2 digits, millisecond to 3 digits.
Hour must be specified in 24-hour format, and time is in local time zone.
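When the target time comes from a ticket or a user's description, converting it by hand is a common source of off-by-one-hour mistakes. The sketch below assumes GNU date (`-d`); both function names are hypothetical. `to_restore_time` turns a free-form local time into the compact YYYYMMDDHHMMSS format from the list above. `label_to_restore_time` converts a backup label into local time for -restoreToTime, under the assumption (inferred from the zmbackupquery example earlier, not from documentation) that label timestamps are UTC.

```shell
#!/usr/bin/env bash
# Requires GNU date. Hypothetical helpers, not ZCS commands.

# to_restore_time "DATE STRING"
# Prints a local date/time as YYYYMMDDHHMMSS, one of the formats above.
to_restore_time() {
  date -d "$1" '+%Y%m%d%H%M%S'
}

# label_to_restore_time LABEL   e.g. full-20150228.060012.892
# ASSUMPTION: the label's timestamp is UTC. Prints it as local-time
# YYYYMMDDHHMMSS, since zmrestore expects local time.
label_to_restore_time() {
  local ts="${1#*-}"            # 20150228.060012.892
  local d="${ts%%.*}"           # 20150228
  local t="${ts#*.}"            # 060012.892
  t="${t%%.*}"                  # 060012
  date -d "${d:0:4}-${d:4:2}-${d:6:2} ${t:0:2}:${t:2:2}:${t:4:2} UTC" \
    '+%Y%m%d%H%M%S'
}
```

For example: zmrestore -a USER@DOMAIN.com -restoreToTime "$(to_restore_time 'yesterday 14:00')" -lb full-20080726.050017.306 -br -ca -pre restore_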

Trouble-shooting Backup/Restore Issues And Other General Questions

Backup And Restore Compatibility Between ZCS Versions

  • You can use backups from older versions of ZCS for account restores.
  • You can NOT use backups from old versions of ZCS for system restores though. Disaster Recovery restores have to be done with the same version of ZCS as the backups were done with.

References:

Trouble-shooting

Ref:

Basic Backup Information To Submit To Support

Disk Space Usage Issues
Trend Data

If there are concerns about disks/partitions getting full, this command is helpful for gathering trending data from your server. Send support the resulting df.tar file. Note - adjust the tail command, the -n 20 option, if you want more than 20 days' worth of trending data.

[zimbra@zcs806 tmp]$ cd /tmp

[zimbra@zcs806 tmp]$ tar cvf /tmp/df.tar `find /opt/zimbra/zmstat -name df.cs\* | sort | tail -n 20`
tar: Removing leading `/' from member names
/opt/zimbra/zmstat/2014-03-10/df.csv.gz
/opt/zimbra/zmstat/2014-03-11/df.csv.gz
 [cut - Ajcody]
/opt/zimbra/zmstat/2014-03-27/df.csv.gz
/opt/zimbra/zmstat/2014-03-28/df.csv.gz
/opt/zimbra/zmstat/df.csv

[zimbra@zcs806 tmp]$ ls -lah /tmp/df.tar
-rw-r----- 1 zimbra zimbra 80K Mar 29 06:44 /tmp/df.tar

[zimbra@zcs806 tmp]$ tar tvf /tmp/df.tar
-rw-r----- zimbra/zimbra  2566 2014-03-11 00:00 opt/zimbra/zmstat/2014-03-10/df.csv.gz
-rw-r----- zimbra/zimbra  2553 2014-03-12 00:00 opt/zimbra/zmstat/2014-03-11/df.csv.gz
 [cut - Ajcody]
-rw-r----- zimbra/zimbra  2513 2014-03-28 00:00 opt/zimbra/zmstat/2014-03-27/df.csv.gz
-rw-r----- zimbra/zimbra  2531 2014-03-29 00:00 opt/zimbra/zmstat/2014-03-28/df.csv.gz
-rw-r----- zimbra/zimbra  8013 2014-03-29 06:40 opt/zimbra/zmstat/df.csv
Directory Sizes In /opt/zimbra

Please see the following and provide the output to support. Note that even though this method is faster than running du, it can still take a while.

* Ajcody-Server-Misc-Topics#Faster_Way_To_Get_Directory_Size_On_Filesytem_-_find_vs_du
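If that page isn't at hand, here is a rough find-based sketch of the same idea; it is not necessarily the method the referenced page describes. It assumes GNU find (`-printf`) and sums apparent file sizes in bytes, which can differ from du's block-based usage (sparse files, and hard links counted once per link). `dir_bytes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# dir_bytes DIR
# Hypothetical helper: total apparent size, in bytes, of the regular files
# under DIR. Uses GNU find's -printf; "%.0f" avoids overflow in awk
# implementations whose "%d" is limited to 32 bits.
dir_bytes() {
  find "$1" -type f -printf '%s\n' 2>/dev/null |
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

For example, to list the top-level directories under /opt/zimbra by size: for d in /opt/zimbra/*/; do printf '%s %s\n' "$(dir_bytes "$d")" "$d"; done | sort -n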
Adjusting The Disk Alert Threshold

Note - zmlocalconfig smtp_notify must return yes if you want to receive the notifications.

If you just need to adjust the disk alert threshold, then see the following:

See current values:

 zmlocalconfig | grep zmdisklog

Example adjustment:

 su - zimbra
 zmlocalconfig -e zmdisklog_critical_threshold=98
 zmlocalconfig -e zmdisklog_warn_threshold=95
 zmstatctl restart

To exclude a partition from the checks [example of two being excluded]:

 su - zimbra
 zmlocalconfig -e zmstat_df_excludes="/mount/point:/mount/point2"
 zmstatctl restart
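To preview which mounts a given pair of thresholds would flag before you change them, here is a rough awk sketch. It is not how zmstat implements the check; the >= comparisons, the colon-separated exclude list (modeled on zmstat_df_excludes), and the helper name `check_disk_thresholds` are assumptions for illustration. It reads `df -P` output, whose fifth column is the use percentage and sixth the mount point; mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# check_disk_thresholds WARN CRIT [EXCLUDES]  < output of: df -P
# Hypothetical helper. EXCLUDES is a colon-separated list of mount points
# to skip. Prints "CRITICAL <mount> <pct>%" or "WARNING <mount> <pct>%".
check_disk_thresholds() {
  awk -v warn="$1" -v crit="$2" -v excl="${3:-}" '
    BEGIN {
      warn += 0; crit += 0
      n = split(excl, e, ":")
      for (i = 1; i <= n; i++) skip[e[i]] = 1
    }
    NR == 1 { next }                     # skip the df header line
    {
      pct = $5; sub(/%$/, "", pct); pct += 0
      if ($6 in skip) next
      if (pct >= crit)      printf "CRITICAL %s %d%%\n", $6, pct
      else if (pct >= warn) printf "WARNING %s %d%%\n", $6, pct
    }'
}
```

For example: df -P | check_disk_thresholds 85 95 "/mount/point:/mount/point2"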

Note, depending on your version of ZCS, you might be hitting a bug where you'll keep getting emails until a logrotate happens.

Some things to run, as the zimbra user, to confirm the problem and share with support or in a bug report:

su - zimbra

ls -la /var/log/zimbra.log

df -h
 /dev/mapper/vg_rhel664-lv_root
                      5.5G  3.5G  1.7G  68% /
 tmpfs                 939M     0  939M   0% /dev/shm
 /dev/sda1             485M   79M  381M  18% /boot
 /dev/sdb1              30G  6.2G   23G  22% /opt

date

zmlocalconfig | grep zmdisklog
 zmdisklog_critical_threshold = 80
 zmdisklog_warn_threshold = 85  

zmlocalconfig -e zmdisklog_critical_threshold=95

zmlocalconfig -e zmdisklog_warn_threshold=90

zmlocalconfig | grep zmdisklog
 zmdisklog_critical_threshold = 95
 zmdisklog_warn_threshold = 90

zmstatctl restart

date

ps -eaf | grep zmstat-df

ls -la /var/log/zimbra.log

date ; grep "Disk warning" /var/log/zimbra* ; zmmailbox -z -m admin@`zmhostname` s -l 100 -t message "Subject: Disk and after:yesterday"

##Note - Emails by default go out every 10 minutes - for example:

[zimbra@zcs803 ~]$ date ; grep "Disk warning" /var/log/zimbra* ; zmmailbox -z -m admin@`zmhostname` s -l 100 -t message "Subject: Disk and after:yesterday"
Thu May 22 09:40:08 PDT 2014
/var/log/zimbra.log:May 22 08:30:00 zcs803 zimbramon[18826]: 18826:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 08:40:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 08:50:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:00:00 zcs803 zimbramon[22970]: 22970:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
  ## Note - I had readjusted the variable to not warn during this time segment ##
/var/log/zimbra.log:May 22 09:20:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:30:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
/var/log/zimbra.log:May 22 09:40:00 zcs803 zimbramon[8322]: 8322:err: Disk warning: zcs803.DOMAIN.com: / on device /dev/mapper/vg_rhel664-lv_root at 82%
num: 7, more: false

     Id  Type   From                  Subject                                             Date
   ----  ----   --------------------  --------------------------------------------------  --------------
1.  328  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 09:40
2.  327  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 09:30
3.  326  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 09:20
  ## Note - I had readjusted the variable to not warn during this time segment ##
4.  325  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 09:00
5.  324  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 08:50
6.  323  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 08:40
7.  320  mess   admin                 Disk / at 82% on zcs803.DOMAIN.com:           05/22/14 08:31

Continue to monitor your zmmailbox search results for an hour.

The Basic Information Support Needs

as root:

  • cat /etc/fstab
    • Shows us what is mounted upon boot
  • cat /proc/mounts
    • Shows us what is currently mounted and its status - you can see if a mount is read-only here.
  • df -hT
    • Lists current mounts using human-readable size information and also notes the filesystem type.

as zimbra:

  • zmprov -l gs `zmhostname` | egrep 'Back|Redo'
    • Shows us a number of variables related to backups and redologs. It also tells us whether you're using Auto-Grouped or the default (Standard) method.
  • du -sh /opt/zimbra/redolog
    • We might notice your redologs aren't rolling over, which could cause problems.
  • ls -latr /opt/zimbra/backup
    • This is the default backup target, please adjust this path here and below if you are using a different zimbraBackupTarget value.
    • zmprov gs `zmhostname` zimbraBackupTarget
    • We'll be able to confirm permissions are right.
  • ls -latr /opt/zimbra/backup/tmp
    • This will show us if you have failed backup jobs and confirm tmp is being cleaned appropriately after the backup is done.
  • ls -latr /opt/zimbra/backup/sessions
    • This will show us what backup sessions are available and confirm permissions are correct.
    • Adjust path if your zimbraBackupTarget value is not the default path.
  • Some directory sizes in the backup directory:
    • Default path first
      • du -sh `find /opt/zimbra/backup -maxdepth 2 -type d`
    • If your using a different backup target, check that directory also. Replace /opt/zimbra/backup above with your backup path.
  • zmbackupquery
    • This should match what's in the sessions directory, and it also tells us the status of each backup and how many accounts were backed up.
  • crontab -l | grep -i back
    • This shows us when backups are supposed to run and with what options.
  • zmlocalconfig | grep -i back
    • This is useful to see a number of backup options not exposed in the crontab, things related to the zip options.
  • zmvolume -l
    • This is useful to see how many volumes are being used, if HSM is being used, and if compression is being done at the volume level.
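If you're gathering the items above for a support case, a small wrapper can save each output to its own file. This is a minimal bash sketch; `collect_backup_info` and its file names are hypothetical. It assumes it runs as the zimbra user with the ZCS commands on the PATH; a command that isn't found is noted in its output file instead of stopping the script, and if any file shows a permission error, rerun that command as root.

```shell
#!/usr/bin/env bash
# collect_backup_info OUTDIR [BACKUP_TARGET]
# Hypothetical helper: saves the outputs listed above into OUTDIR, one file
# per check. BACKUP_TARGET defaults to the ZCS default /opt/zimbra/backup.
collect_backup_info() {
  local out="$1" target="${2:-/opt/zimbra/backup}"
  mkdir -p "$out" || return 1

  # _save NAME CMD [ARGS...]: run CMD, capturing stdout and stderr in
  # OUTDIR/NAME.txt; if CMD is not installed, record that instead.
  _save() {
    local name="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
      "$@" >"$out/$name.txt" 2>&1
    else
      echo "command not found: $1" >"$out/$name.txt"
    fi
  }

  _save fstab        cat /etc/fstab
  _save proc_mounts  cat /proc/mounts
  _save df           df -hT
  _save server_attrs sh -c 'zmprov -l gs "$(zmhostname)" | grep -E "Back|Redo"'
  _save redolog_du   du -sh /opt/zimbra/redolog
  _save target_ls    ls -latr "$target"
  _save tmp_ls       ls -latr "$target/tmp"
  _save sessions_ls  ls -latr "$target/sessions"
  _save target_du    find "$target" -maxdepth 2 -type d -exec du -sh {} +
  _save backupquery  zmbackupquery
  _save crontab      sh -c 'crontab -l | grep -i back'
  _save localconfig  sh -c 'zmlocalconfig | grep -i back'
  _save volumes      zmvolume -l
  return 0
}
```

For example: collect_backup_info /tmp/backup-info && tar czf /tmp/backup-info.tar.gz -C /tmp backup-info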
Additional Log Files Support Might Need

Also send the following logs:

  • /var/log/messages
    • Filesystem issues often times are noted here and also in syslog. This might explain an interruption in the backup process. Server restarts, filesystem going full, filesystem going read-only, etc.
  • /var/log/syslog
  • /opt/zimbra/log/mailbox.log
    • The backup activity is logged here.
    • And any other mailbox.log file that would cover the event

Additional Checks For Performance Specific Issues

If You're Using A SAN Or NFS For Your Backup Target - Please Check Your IOWait

Ideally, you would compare iowait and performance data from the target backup host as well as the stats available on the ZCS servers. To get graphs and stats on this from ZCS, please see Ajcody-Testing-Debugging#zmstat_and_zmstat-chart . You should submit this data and iowait conclusions if you still need to submit a support case about backup performance issues.

Is HSM Running During Your Backup Window
Are You Using --zipStore

--zipStore stores the blobs in zip files instead of keeping them as individual files. --zipStore does not use compression. In most circumstances this gives the best performance, especially with NFS. This should be the default behavior of backups; the following RFE is when it became the default [ZCS6+]:

To see whether zips are being used for backups, look in the backup sessions directory; if they are, you'll see .zip files. For example:

mail:~/backup/sessions/full-20080820.160003.770/accounts/115/988/11598896-a89b-4b9d-bedb-1ed1afcb6c87/blobs
zimbra$ ls
blobs-1.zip    blobs-2.zip    blobs-3.zip    blobs-4.zip

To see if the zip file is using compression [-Z option for unzip will indicate whether or not the archive is actually compressed] :

unzip -Z blobs-4.zip
293 files, 5982984 bytes uncompressed, 5982984 bytes compressed:  0.0%

Also, if your zmvolume has compression enabled, the blobs remain compressed inside the zip after the backup. In other words, they are not uncompressed before being put into the zip file when the backup uses --zipStore.

Some Specific Example Restore/Backup Issues

NO_SUCH_ACCOUNT_BACKUP

Example errors:

Error code: backup.NO_SUCH_ACCOUNT_BACKUP 
Message: no such backup for account: Missing full backup earlier than restore-to time for account 3c397edc-c015-4d82-ae8a-1084692f8a93 
Details:soap:Sender 

Error occurred: no such backup for account: 3c397edc-c015-4d82-ae8a-1084692f8a93

Message: no such backup for account: Missing full backup earlier than restore-to time for account 3c397edc-c015-4d82-ae8a-1084692f8a93 

In these examples, 3c397edc-c015-4d82-ae8a-1084692f8a93 is the zimbraId of the user user1@mail-172.example.com.

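Confirm that the current email address maps to that zimbraId, then check the mailbox database, accounts.xml, and the session.xml files for the address and the ID: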

$ zmprov -l ga user1@mail-172.example.com zimbraId
# name user1@mail-172.example.com
zimbraId: 3c397edc-c015-4d82-ae8a-1084692f8a93

$ mysql -e 'SELECT * FROM zimbra.mailbox WHERE comment LIKE "user1@mail-172.example.com"\G'

$ mysql -e 'SELECT * FROM zimbra.mailbox WHERE account_id="3c397edc-c015-4d82-ae8a-1084692f8a93"\G'

$ grep -C1 3c397edc-c015-4d82-ae8a-1084692f8a93 /opt/zimbra/backup/accounts.xml

   [using the default path; /opt/zimbra/backup/accounts.xml]

$ egrep "user1@mail-172.example.com|3c397edc-c015-4d82-ae8a-1084692f8a93" /opt/zimbra/backup/accounts.xml

$ egrep "user1@mail-172.example.com|3c397edc-c015-4d82-ae8a-1084692f8a93" /opt/zimbra/backup/sessions/*/session.xml
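The session search can be wrapped in a small function that prints only the names of the sessions whose session.xml mentions the account, which directly answers "which backups is this user in?". It assumes the default backup root /opt/zimbra/backup with one directory per session under sessions/, as in the commands above, and matches a fixed string rather than a regex. The function name is hypothetical.

```shell
#!/bin/sh
# sessions_for_account BACKUP_ROOT TERM
# Print the name of each backup session whose session.xml contains TERM
# (an email address or a zimbraId), matched as a fixed string.
sessions_for_account() {
  for xml in "$1"/sessions/*/session.xml; do
    [ -f "$xml" ] || continue              # no sessions matched the glob
    if grep -qF -- "$2" "$xml"; then
      basename "$(dirname "$xml")"
    fi
  done
}
```

Example usage:

  sessions_for_account /opt/zimbra/backup 3c397edc-c015-4d82-ae8a-1084692f8a93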

Restore encountered redo log sequence error: Found gap in redo log sequence

Example error:

Error occurred: system failure: Restore encountered redo log sequence error: 
 Found gap in redo log sequence; missing 3899 through 3895; To avoid future restore problems, 
 discard all existing backups and take a full backup of all accounts; If this error occurred 
 during restore, try the --ignoreRedoErrors option

The error message itself points to the fix: "To avoid future restore problems, discard all existing backups and take a full backup of all accounts; If this error occurred during restore, try the --ignoreRedoErrors option".

exception during auth

Example error:

LDAP backup failed: system failure: exception during auth 
 {RemoteManager: zimbramail.example.com->zimbra@zimbramail.example.com:22}
 com.zimbra.common.service.ServiceException: system failure: LDAP backup failed: 
 system failure: exception during auth {RemoteManager: zimbramail.example.com->
 zimbra@zimbramail.example.com:22}

This indicates that the SSH keys are not configured correctly in the environment. To resolve it, regenerate and update the keys as the zimbra user:

su - zimbra
zmsshkeygen 
zmupdateauthkeys
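To confirm afterwards that the key is actually authorized, compare the public key with the zimbra user's authorized_keys file. This sketch assumes the key pair from zmsshkeygen is at /opt/zimbra/.ssh/zimbra_identity.pub and the authorized keys are in /opt/zimbra/.ssh/authorized_keys; verify both paths on your installation. It matches only the base64 key body, so differing comments or option prefixes do not matter. The function name is hypothetical.

```shell
#!/bin/sh
# key_is_authorized PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Succeed (exit 0) if the base64 key body from PUBKEY_FILE appears in
# AUTHORIZED_KEYS_FILE; fail otherwise.
key_is_authorized() {
  body=$(awk 'NF >= 2 { print $2; exit }' "$1")   # field 2 = key body
  [ -n "$body" ] || return 1
  grep -qF -- "$body" "$2"
}
```

Example usage, as the zimbra user:

  key_is_authorized /opt/zimbra/.ssh/zimbra_identity.pub /opt/zimbra/.ssh/authorized_keys && echo "key authorized"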

LICENSE_ERROR_Message: AccountsLimits exceeded

Example error:

LICENSE_ERROR_Message: AccountsLimits exceeded

You should check your license and confirm it's valid and active:

su - zimbra
zmlicense -p

Then confirm how many licenses are in use: first flush the license cache on all servers, then query the license via SOAP.

zmprov fc -a license
zmsoap -z GetLicenseRequest

Four lines in the output are generally of interest. The first two show how many accounts you are licensed for; the last two show how many are currently in use:

 <attr name="AccountsLimit">1000</attr>
 <attr name="ArchivingAccountsLimit">1000</attr>

 <attr name="TotalAccounts">3</attr>
 <attr name="ArchivingAccounts">3</attr>
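The comparison can also be scripted. This sketch reads the zmsoap output on standard input, extracts the four attributes from the <attr name="...">value</attr> lines shown above, and flags any count that is at or over its limit. Other attributes in the response are ignored, and a pair is skipped if either attribute is missing. The function name is hypothetical.

```shell
#!/bin/sh
# license_usage: read "zmsoap -z GetLicenseRequest" output on stdin and print
# used/limit for regular and archiving accounts. Exits 1 if any count is at
# or over its limit.
license_usage() {
  sed -n 's/.*<attr name="\([A-Za-z]*\)">\([0-9][0-9]*\)<\/attr>.*/\1 \2/p' |
  awk '{ v[$1] = $2 }
       END {
         n = split("TotalAccounts:AccountsLimit ArchivingAccounts:ArchivingAccountsLimit", pairs, " ")
         rc = 0
         for (i = 1; i <= n; i++) {
           split(pairs[i], p, ":")          # p[1] = used, p[2] = limit
           if (!(p[1] in v) || !(p[2] in v)) continue
           status = "ok"
           if (v[p[1]] + 0 >= v[p[2]] + 0) { status = "LIMIT REACHED"; rc = 1 }
           printf "%s %d/%d %s\n", p[1], v[p[1]], v[p[2]], status
         }
         exit rc
       }'
}
```

Example usage:

  zmsoap -z GetLicenseRequest | license_usage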

Performance Issues And Time To Complete

RFEs for increasing backup performance:

What Might Be Wrong?

Need to expand this section or refer to the part above about all the log/data gathering that support would normally ask for

The first thing to check is the mailbox.log file, to see if anything notable stands out.

grep -i backup /opt/zimbra/log/mailbox.log
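A plain grep -i backup can return a lot of routine output. This sketch narrows it to warning and error lines, assuming the standard mailbox.log layout in which the log level (INFO, WARN, ERROR, FATAL) follows the date and time. Stack-trace lines that follow an error are not included. The function name is hypothetical.

```shell
#!/bin/sh
# backup_problems FILE...
# Print mailbox.log lines that mention "backup" (case-insensitive) and are
# logged at WARN, ERROR or FATAL level.
backup_problems() {
  grep -ih backup "$@" | grep -E '^[0-9-]+ [0-9:,]+ +(WARN|ERROR|FATAL) '
}
```

Example usage:

  backup_problems /opt/zimbra/log/mailbox.log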

Use Auto-Group Backups Rather Than Default Style Backup

Having trouble completing that entire full backup during off-hours? Consider the hybrid auto-grouped mode, which combines full and incremental backups: each day it fully backs up a target number of accounts rather than running incrementals.

Auto-grouped mode also pulls in the redo logs written since the last run, so the remaining accounts are covered incrementally, although those accounts are not listed individually in the backup's account list. This still lets you do a point-in-time restore of any account.
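To size the nightly window, it helps to estimate how many accounts each auto-grouped run will fully back up. As a rough approximation, each run covers about 1/N of the accounts, where N is the number of groups (the zimbraBackupAutoGroupedNumGroups attribute; the mode itself is set with zimbraBackupMode). This sketch only does that arithmetic, rounding up; ZCS decides which accounts go into each run, so treat the result as an estimate. The function name is hypothetical.

```shell
#!/bin/sh
# accounts_per_run TOTAL_ACCOUNTS NUM_GROUPS
# Approximate number of accounts fully backed up per auto-grouped run,
# rounded up (ceiling division).
accounts_per_run() {
  total=$1 groups=$2
  if [ "$groups" -le 0 ]; then
    echo "NUM_GROUPS must be positive" >&2
    return 1
  fi
  echo $(( (total + groups - 1) / groups ))
}
```

For example, look up the group count with zmprov gs $(zmhostname) zimbraBackupAutoGroupedNumGroups, then run:

  accounts_per_run 1000 7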

Please see the following for more detailed information:

Need To Write Fewer Files - Add The Zip Option To Your Backup Commands

The zip option packs the thousands of individual files under a user's backup into a few zip files, avoiding the performance problems that come from writing many small files instead of a few large ones. This is most often seen when:

  • The backup directory is on NFS
  • Backups are copied or rsynced to a remote server
  • Third-party backup software (for example, to tape) is used to archive the Zimbra backup sessions

Please see the following for more information about using the Zip option:

SAN Snapshots For Backups

Please see:

Cloud Backups

Please see:

Tape Backups

If possible, run your backups over a secondary network card so that backup traffic does not affect ZCS network performance on the primary interface. Also, review the following section on Zimbra and hard links, and confirm whether your third-party backup software or process restores hard links correctly.
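To see whether a backup tree contains hard links at all, and therefore whether your tape software's hard-link handling matters, count the regular files with more than one link. A minimal sketch; point it at your backup directory (by default /opt/zimbra/backup). The function name is hypothetical.

```shell
#!/bin/sh
# count_hardlinked DIR
# Count regular files under DIR whose link count is greater than one.
count_hardlinked() {
  find "$1" -type f -links +1 | grep -c . || true   # grep -c prints 0 on no match
}
```

Example usage:

  count_hardlinked /opt/zimbra/backup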

Please see:

Quiz

  1. What are three methods you can use to locate which backups a user is in?
  2. To stop a restore from playing all the way up to the current time, you would need one of two zmrestore options. What are they?
  3. If you see an error about being unable to restore because of redologs, what option might you need to include?
  4. What is the primary log file on the ZCS server you'll monitor or review for restore or backup issues?
  5. What are two methods to increase logging/debug output when attempting a restore?
  6. Can you perform a full system restore from the backups done on an older ZCS version?
  7. What are the two identifying variables that zmbackup/zmrestore use for an individual account?
  8. What user variable would you use to locate the directory a user's data is stored in within a backup session?
Verified Against: Zimbra Collaboration Suite 8.6 Date Created: 01/22/2015
Article ID: https://wiki.zimbra.com/index.php?title=Troubleshooting_Course_Content_Rough_Drafts-Recover_Missing_Data_-_User Date Modified: 2015-03-03


