Restoring a Single User from Backups Archived to Tape

Revision as of 23:05, 1 December 2010 by Gayle


This article discusses how to speed up the restore of a single user from backups archived to tape by minimizing the amount of data retrieved from tape.

Short Version vs. Long Version

The short version: You need the full backup's session.xml file, the directory corresponding to the user under the accounts subdirectory, and the entire shared_blobs directory. If you want incremental restore as well, you'll also need the session.xml file and the redologs directory from every subsequent incremental backup. There is a slight variation if auto-grouped backup mode is used.

The long version requires a discussion of the backup types and their directory structure. There are two backup modes in ZCS: standard backup and auto-grouped backup.

Standard Backup

In standard backup, the typical setup is to run a weekly full backup and a daily incremental backup on all the other days. For example, you might run a full backup Saturday night and incremental backups Sunday through Friday. A full backup is a logical data dump of all users, and an incremental backup consists of the redo logs that capture changes made since the last incremental backup.

Let's look at the directory structure of full and incremental backups. By default, all backups are stored under the /opt/zimbra/backup directory. This is called the backup target and can be changed in the configuration or in each backup/restore request. The backup target directory has:

accounts.xml file listing all users and their latest full backup
sessions directory containing the full and incremental backups
tmp directory used during a backup or a restore

Backups under the sessions directory are deleted after being archived.

Under the sessions directory are subdirectories for full and incremental backups. Each directory contains one full or incremental backup. Example:


The directories are named “full-” (full backup) or “incr-” (incremental backup) followed by the backup timestamp. The timestamp is in the GMT time zone and uses the year-month-date.hour-minute-second.millisecond format. Let's look at the contents of each backup type.
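Because the timestamp is fixed-width, a plain text sort on the part after the prefix should also be a chronological sort. The sketch below relies on that to list the backups that follow a chosen one. The function names are hypothetical helpers, not part of ZCS, and the "incr-" labels in the usage note are made-up examples.

```shell
# Hypothetical helpers for working with backup labels; not part of ZCS.

# Print the backup type encoded in a label: "full" or "incr".
backup_kind() {
    case "$1" in
        full-*) echo full ;;
        incr-*) echo incr ;;
        *) echo unknown; return 1 ;;
    esac
}

# Read labels on stdin and print them oldest first.
# The sort key is everything after the first "-", i.e. the timestamp.
sort_labels() {
    LC_ALL=C sort -t- -k2
}

# Read labels on stdin and print those with a timestamp later than
# the label given as $1, oldest first.
labels_after() {
    sort_labels |
        awk -v start="${1#*-}" '{
            ts = substr($0, index($0, "-") + 1)   # timestamp part of the label
            if (ts > start) print
        }'
}
```

For example, `ls /myrestore/sessions | labels_after full-20101129.224845.750` would list only the sessions taken after that full backup, in the order their redo logs were produced.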

Under the full backup directory we have:

accounts root directory for per-user data
ldap directory containing system-wide ldap dump
session.xml file containing metadata about the backup, including the list of users in the backup
shared_blobs root directory for shared blobs (e.g. emails sent to two or more people)
sys directory containing dump of global database tables

The incremental backup directory has:

accounts root directory for per-user data
ldap directory containing system-wide ldap dump
redologs directory containing redo logs since the previous incremental backup
session.xml file containing metadata about the backup, including the list of users in the backup
sys directory containing dump of global database tables

The ldap and sys directories contain global data and are used only when restoring the entire system. They are not needed for single-user restores. The accounts and shared_blobs directories are important and will be discussed shortly.

Auto-Grouped Backup

The auto-grouped backup mode is used when a standard full backup of all users takes too long and cannot be completed during off hours. The idea is to split the users into groups and back up one group per night. A typical setup uses a one-week interval. Keeping a week's worth of backups guarantees restorability for any user. The system automatically selects the group of users to back up each night, hence the name auto-grouped backup.

As is done in standard backup mode, auto-grouped backups are stored under /opt/zimbra/backup/sessions directory. Each backup is named “full-<timestamp>” just like full backups in standard mode. The contents of an auto-grouped backup are:

accounts root directory for per-user data
ldap directory containing system-wide ldap dump
redologs directory containing redo logs since the previous backup
session.xml file containing metadata about the backup, including the list of users in the backup
shared_blobs root directory for shared blobs (e.g. emails sent to two or more people)
sys directory containing dump of global database tables

An auto-grouped backup is a full backup of the users chosen for the group plus the redo logs generated since the previous backup. The directory structure reflects that.

The accounts Directory

Each user account has a subdirectory under the accounts directory of a full, incremental, or auto-grouped backup. The subdirectory has a three-level path based on the account's zimbraId string. The zimbraId is the globally unique ID assigned to each user by the system. Let's use an example zimbraId: 36d4d92f-c0fe-4578-869e-929c4b6b52aa

The account's three-level path is formed by taking the first three characters of the zimbraId, the next three characters, and the entire zimbraId value. For our example, the directory is: 36d/4d9/36d4d92f-c0fe-4578-869e-929c4b6b52aa. In full backup and auto-grouped backup, this directory has subdirectories for the user's database dump, blobs, search index, etc. The details are not important for our purpose, but here's an example listing:
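The path rule above is simple enough to script. Here is a minimal sketch; the function name is a hypothetical helper, not a ZCS command.

```shell
# Hypothetical helper: print the three-level accounts path for a zimbraId,
# relative to the backup's accounts directory.
account_rel_path() {
    id=$1
    first=$(printf '%s\n' "$id" | cut -c1-3)    # characters 1-3
    second=$(printf '%s\n' "$id" | cut -c4-6)   # characters 4-6
    printf '%s/%s/%s\n' "$first" "$second" "$id"
}

account_rel_path 36d4d92f-c0fe-4578-869e-929c4b6b52aa
# prints 36d/4d9/36d4d92f-c0fe-4578-869e-929c4b6b52aa
```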

blobs directory containing blobs for messages and other items
db directory containing database dump
index directory containing search index files
ldap.xml ldap data for the account at the time of backup
ldap_latest.xml hard link to ldap.xml file for the account in the most recent incremental backup
meta.xml metadata about the account's backup

In an incremental backup, the account's directory has only the ldap.xml file.

The shared_blobs Directory

The shared_blobs directory can contain either a collection of zip files or a directory structure, depending on the zip option used in the backup. If zip was used there will be zip files named, where X is a single character. The zip file contains all shared blobs whose blob digest begins with the lowercase of that character. If zip was not used during backup, the shared_blobs directory contains a six-level directory structure for each shared blob file. Example:


The first five levels are the lowercase of two characters from the start of the blob digest, the sixth and final subdirectory is the blob digest value, and finally the file itself which is always named blob.dat.

The shared_blobs directory can be very large because it contains shared blobs for all users. This is a problem for our goal of minimizing data to retrieve from tape because the contents are not organized per user. There is no easy way to extract a portion of the shared_blobs directory applicable to a single user. This means you will need to retrieve a lot more data from tape than the expected size of the user being restored. This could even be orders of magnitude larger than the user's mailbox size.

The account's per-user blob directory contains pointers to the shared blobs used by the account. This information can be used to minimize the shared blobs to pull from tape. But this technique is not useful if the zip option was used during backup, because then the shared_blobs directory will have 30 or so very large zip files. You might as well pull all of them from tape. Zip mode is used by default and is the preferred way because it greatly reduces the number of files produced by the backup, allowing efficient backup archival/deletion. In other words, you'll have to be prepared to retrieve the entire shared_blobs data set.

The Restore Process

We've talked about the backup process. Let's talk about restore a bit. Command line examples shown in this section are there for discussion only. Actual restore instructions come in the next section.

A user account is restored by first initializing the ldap entry, database, blobs and search index using the data from the full backup, then optionally replaying the changes in the redo logs from the incremental backups to bring the data up to date. In auto-grouped backup, the redo log replay uses logs from the chosen “full” backup and all ensuing backups.

The basic command to restore a user looks like this:

    zmrestore -a -br

The -a option names the user to be restored. The -br option will be explained a bit later.

The zmrestore command uses the account's latest full backup (or auto-grouped backup, which can be thought of as a full backup) as the starting point. This information comes from the /opt/zimbra/backup/accounts.xml file. A different starting backup can be specified with the -lb option. This is most likely what you'll be doing when using backups pulled from tape. Example:

    zmrestore -a -br -lb full-20101129.224845.750

The above command will include replaying redo logs from the incremental backups. If you want to restore the user only to the state as of the full backup, use the -rf option:

    zmrestore -a -br -lb full-20101129.224845.750 -rf

Other zmrestore options are available to manage the redo log replay and restore the user account to a specific point in time.

When using data from tape, you might have put the retrieved data under a different backup target than the default /opt/zimbra/backup. In fact, it's a good idea to do so to avoid corrupting your live backup area by mistake. Let's say the custom target is /myrestore. The retrieved full and incremental backups must be placed under /myrestore/sessions. You can tell zmrestore about the new target with the -t option:

    zmrestore -a -br -lb full-20101129.224845.750 -t /myrestore

Notice the “sessions” subdirectory is not included in the command line.

Now about the -br option. By default the restore process brings the user account to the most recent state, which is the current time. After replaying the redo logs from the incremental backups, replay continues with the redo logs under /opt/zimbra/redolog/archive and finally the current redo log file, /opt/zimbra/redolog/redo.log. This requires all redo logs to be available in the backup. But this is often not the case when working with an older backup archived on tape, or when using a custom backup target directory, because replay goes straight to /opt/zimbra/redolog after processing the logs from the custom target, skipping any logs in the default backup target. This is where the -br option comes in. It tells restore to stop the replay after processing all redo logs available in the backup target directory. The -br option is not needed when using the -rf option (restore to full backup only) because -rf means no redo log replay at all.
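To keep the options straight, here is a small sketch that assembles the command line from the pieces discussed so far. It only prints the command and does not run it. The function name and the "replay"/"full-only" mode words are hypothetical, and user@example.com in the usage note is a placeholder account; the zmrestore options are the ones described in this section.

```shell
# Hypothetical helper: print (do not run) a zmrestore command line.
# $1: account to restore
# $2: label of the starting backup, e.g. full-20101129.224845.750
# $3: backup target directory, e.g. /myrestore (without "sessions")
# $4: "replay" (default) or "full-only"
build_restore_cmd() {
    account=$1
    label=$2
    target=$3
    mode=${4:-replay}
    if [ "$mode" = full-only ]; then
        stop_opt=-rf    # restore to the full backup only, no redo log replay
    else
        stop_opt=-br    # replay only the redo logs found in the backup target
    fi
    printf 'zmrestore -a %s %s -lb %s -t %s\n' \
        "$account" "$stop_opt" "$label" "$target"
}
```

Calling `build_restore_cmd user@example.com full-20101129.224845.750 /myrestore` prints `zmrestore -a user@example.com -br -lb full-20101129.224845.750 -t /myrestore`. Printing first and running by hand is a deliberate choice here, since a restore overwrites live account data.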

Finally, Getting the Data from Tape and Running Restore

Now that we have covered all the background information, it should be easy to determine what set of data to get from tape in order to run a single-user restore. First you need to know the zimbraId of the user. You can get this from the admin console, the zmprov command, or by searching the /opt/zimbra/backup/accounts.xml file.
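As a sketch of the zmprov route: `zmprov ga` (get account) followed by an attribute name prints that attribute for the account. The helper below picks the ID out of that output. It assumes the attribute is printed as a "zimbraId: value" line; the function name is hypothetical and user@example.com is a placeholder.

```shell
# Hypothetical helper: read "zmprov ga <account> zimbraId" output on stdin
# and print only the ID.
# Assumption: the attribute appears as a line of the form "zimbraId: <value>".
extract_zimbra_id() {
    awk '$1 == "zimbraId:" { print $2; exit }'
}

# Usage on a ZCS server, as the zimbra user:
#   zmprov ga user@example.com zimbraId | extract_zimbra_id
```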

Next you need to know which full (or auto-grouped) backup contains the user you want to restore. This is obvious from looking at the directory listing if using standard backup mode. You can use the most recent directory named “full-<timestamp>”. For auto-grouped backup, you need to look at each backup's session.xml file to see if your user is listed.
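For auto-grouped backups, the check can be scripted once the session.xml files are on disk (they are small, so pulling all of them from tape first is cheap). The sketch below assumes the user's zimbraId appears as plain text inside session.xml; the exact layout of that file is not described here, so verify against one of your own backups. The function name is a hypothetical helper.

```shell
# Hypothetical helper: print the labels of the backups under
# <target>/sessions whose session.xml mentions the given string.
# $1: backup target directory, e.g. /myrestore
# $2: string to look for, e.g. the user's zimbraId
# Assumption: session.xml contains that string as plain text.
sessions_listing_user() {
    target=$1
    needle=$2
    for f in "$target"/sessions/full-*/session.xml; do
        [ -f "$f" ] || continue          # glob matched nothing
        if grep -qF -- "$needle" "$f"; then
            basename "$(dirname "$f")"   # the "full-<timestamp>" label
        fi
    done
}
```

The glob expands in sorted order, so the labels come out oldest first and the last line is the most recent backup that contains the user.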

Let's say the full backup chosen is called full-XYZ. Using the custom backup target directory /myrestore, create the directory to hold the data pulled from tape. The full backup's directory should be:

    /myrestore/sessions/full-XYZ
The /myrestore directory should be owned by the zimbra unix user.
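Setting this up can be sketched as follows. The function name is a hypothetical helper. The ownership change is only attempted when the script runs as root on a host that actually has a zimbra user, which is an assumption about how you run it.

```shell
# Hypothetical helper: create <target>/sessions/<label> for data pulled
# from tape and give the tree to the zimbra unix user.
# $1: backup target directory, e.g. /myrestore
# $2: backup label, e.g. full-XYZ
prepare_restore_target() {
    target=$1
    label=$2
    mkdir -p "$target/sessions/$label" || return 1
    # Only root can change ownership, and only if the zimbra user exists.
    if [ "$(id -u)" -eq 0 ] && id zimbra >/dev/null 2>&1; then
        chown -R zimbra "$target"
    fi
}

# Example: prepare_restore_target /myrestore full-XYZ
```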

Next, retrieve the session.xml file and the shared_blobs directory. Remember, the shared_blobs directory can be very large and you need all of its contents even though you won't use most of it. Then, retrieve the accounts subdirectory for the user's zimbraId. Make sure the directory structure under full-XYZ remains the same on disk and on tape. This is all the data you need if restoring the user to the point of full backup. Let's spell out the directories, using the earlier example zimbraId 36d4d92f-c0fe-4578-869e-929c4b6b52aa:

    /myrestore/sessions/full-XYZ/accounts/36d/4d9/36d4d92f-c0fe-4578-869e-929c4b6b52aa/*
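The complete list of items to pull for a restore to the full backup can be generated with a sketch like this one. It prints paths relative to the backup target, which you would then feed to whatever tool retrieves files from your tape archive. The function name is a hypothetical helper.

```shell
# Hypothetical helper: print the paths, relative to the backup target,
# needed to restore one user to the state of a full backup.
# $1: backup label, e.g. full-XYZ
# $2: the user's zimbraId
full_restore_paths() {
    label=$1
    id=$2
    a=$(printf '%s\n' "$id" | cut -c1-3)   # first three characters
    b=$(printf '%s\n' "$id" | cut -c4-6)   # next three characters
    printf 'sessions/%s/session.xml\n' "$label"
    printf 'sessions/%s/shared_blobs\n' "$label"
    printf 'sessions/%s/accounts/%s/%s/%s\n' "$label" "$a" "$b" "$id"
}

full_restore_paths full-XYZ 36d4d92f-c0fe-4578-869e-929c4b6b52aa
# prints:
#   sessions/full-XYZ/session.xml
#   sessions/full-XYZ/shared_blobs
#   sessions/full-XYZ/accounts/36d/4d9/36d4d92f-c0fe-4578-869e-929c4b6b52aa
```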

Run the restore and you're done:

    zmrestore -a -br -lb full-XYZ -t /myrestore

We could have used the -rf option instead of the -br option. Since we only have the one backup under /myrestore/sessions, the effect of both options is the same: to avoid replaying the redo logs under /opt/zimbra/redolog.

If you want to bring the account up to date with redo log replay, you need to get more data from tape. If full-XYZ was a standard full backup, retrieve the subsequent incremental backups. For each incremental backup, you only need the session.xml file and the redologs subdirectory.

If full-XYZ was an auto-grouped backup, retrieve its redologs subdirectory. Also get the subsequent backups, but for these you only need the session.xml file and the redologs subdirectory.
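The two cases differ only in whether the chosen backup's own redologs directory is needed. A minimal sketch of that rule, again printing paths relative to the backup target; the function name, the "standard"/"auto-grouped" mode words, and the incr-A, incr-B and full-ABC labels in the comments are hypothetical.

```shell
# Hypothetical helper: print the extra paths needed for redo log replay.
# $1: "standard" or "auto-grouped"
# $2: label of the chosen full (or auto-grouped) backup
# remaining arguments: labels of the subsequent backups, oldest first
replay_paths() {
    mode=$1
    full=$2
    shift 2
    # An auto-grouped backup carries its own redo logs.
    if [ "$mode" = auto-grouped ]; then
        printf 'sessions/%s/redologs\n' "$full"
    fi
    # Every later backup contributes only its metadata and redo logs.
    for label in "$@"; do
        printf 'sessions/%s/session.xml\n' "$label"
        printf 'sessions/%s/redologs\n' "$label"
    done
}

# Example: replay_paths standard full-XYZ incr-A incr-B
# Example: replay_paths auto-grouped full-XYZ full-ABC
```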

Once all data has been retrieved to /myrestore/sessions, run the restore:

    zmrestore -a -br -lb full-XYZ -t /myrestore

With the account restored, the /myrestore directory can be discarded.
