Network Edition Disaster Recovery

Revision as of 01:38, 31 October 2010 by Ajcody (talk | contribs) (Restoring to the new server)

Admin Article

Article Information

This article applies to the following ZCS versions.

ZCS 6.0, ZCS 5.0, ZCS 4.5

This article describes the steps to replace a failed server in a single-server ZCS Network Edition configuration running version 4.5.x, 5, or 6. Distinctions between versions are noted where they apply.

Important: The ZCS release you install on the new server must be the same release as installed on the old server. The server can have a different operating system.

The new server hardware must meet the requirements described in the Installation Prerequisites section of the ZCS Single Server Installation Guide. Install the new operating system, making any necessary OS configuration modifications as described in the installation guide.

Other References

Other Documentation And Options

If you want the older version of this wiki page, from before the latest major update, see:

Outstanding RFEs That Will Affect This Document

When Samba, Posix, And LDAP Customizations Were Used

ZCS 6.0.7 And After

  • "zmrestoreldap should restore config database"
    • Summary
      • With 6.0.7, I've added the config database to the nightly backups. We should have zmrestoreldap restore this database as well, so that the entire configuration of the server as it was (including any and all customizations, like additional schema, etc) are restored at the same time as the database is.

Before ZCS 6.0.7

Disaster Recovery - Changing Servers

You do the following to restore to a new server:

  • Prepare the new server
  • Block client access to the old server’s IP address with firewall rules
  • Mount any volumes that were in use on the older server
  • Delete the MySQL data that is set up in the initial installation of ZCS
  • Copy the backup files to the new server
  • Run zmrestoreldap to restore the global LDAP data
  • Run zmrestoreoffline to restore account data from the backup sessions
  • Prepare and run a new backup

Old Server Status

There are two disaster recovery scenarios: either the server has died and the ZCS files cannot be accessed, or ZCS is still running but the server needs to be replaced.

If the server is not running:

  1. Block client access to the server IP address with firewall rules.
  2. Find the latest full ZCS backup session to use.

If ZCS is still running, to prepare the move to the new server:

  1. Block client access to the server’s IP address with firewall rules.
  2. Run a full backup of the old service, or if the backup is recent, run an incremental backup to get the most current incremental backup session.
  3. Run zmcontrol stop, to stop ZCS. In order to restore to the most current state, no new mail should be received after the last incremental backup has run.
  4. Change the hostname and IP address on the old server to something else. Do not turn off the server.
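The firewall step is not spelled out in this article. As a minimal sketch, assuming a Linux host with iptables and assuming the usual client-facing ports (25, 80, 110, 143, 443, 993, 995; adjust for your site), the hypothetical helper below only prints the rules so that you can review them before applying anything as root:

```shell
#!/bin/sh
# Hypothetical helper, not part of ZCS: print (do not apply) iptables rules
# that reject client connections addressed to the given server IP.
# The port list is an assumption; edit it to match the services you expose.
print_block_rules() {
    server_ip=$1
    for port in 25 80 110 143 443 993 995; do
        echo "iptables -I INPUT -p tcp -d $server_ip --dport $port -j REJECT"
    done
}
```

For example, print_block_rules 192.0.2.10 prints one rule per port. Including port 25 also stops inbound mail, which matches the requirement that no new mail arrives after the last incremental backup.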

Preparing the New Server

Before you begin, make sure that the new server is correctly configured with the IP address and hostname and that ZCS is installed and configured with the same domain, hostname, passwords, etc. as the previous server. See the Single-Server Installation Guide for more information about preparing the server. Before you begin to install ZCS, note the information you need from the old server including: admin account name and password, spam training and non-spam training user account names, exact domain name, and the global document account name.

Installing ZCS on new server

  1. If you're doing a restore exclusively from a backup and don't recall the exact version, you can find the version of ZCS within the full backup session.
    • cd /opt/zimbra/backup/sessions/full-[YOUR LABEL]/
    • grep zcsRelease session.xml
    • You'll see a line like this - notice the version is given:
    • <backupSet label="full-20100330.201810.579" zcsRelease="6.0.4_GA_2038.RHEL5 20091214172206 20091214-1723 NETWORK" startTime="1269980290579" endTime="1269980316645" minRedoSeq="44" maxRedoSeq="44" sharedBlobsZipped="true" sharedBlobsZipNameDigestChars="1" sharedBlobsDirectoryDepth="5" sharedBlobsCharsPerDirectory="2" type="full" accountsDirectoryDepth="2">
  2. Make sure your TIME and TIMEZONE are set correctly! See Time_Zones_in_ZCS#The_server_OS
  3. Ensure that the old hostname and MX DNS records resolve to the new server
    • Ajcody-Hostname-DNS has the commands you can run to confirm resolution and also basic setup instructions if you need to configure BIND on this DR server for DNS resolution issues.
  4. If the new server was used for testing in the past then there may be remnants of previous installations that need to be removed.
    Warning - the following commands will completely destroy any existing Zimbra installation:
    • sudo -u zimbra /opt/zimbra/bin/zmcontrol stop
    • If you still have the tarball extracted directory available, you can do the following to uninstall zimbra, as root:
      • zcs/ -u
    • If not, remove all the zimbra packages manually. As root :
      • rpm -qa 'zimbra*'
      • Now paste in just the package names for removal. For example:
      • rpm -e zimbra-spell zimbra-ldap zimbra-mta zimbra-logger zimbra-snmp zimbra-apache zimbra-store zimbra-core
      • And remove the old and unneeded zimbra directory:
      • rm -Rf /opt/zimbra
    Non-rpm based OS's can review UnInstall_Zimbra
  5. Have the ZCS installation tarball available on the new server and extracted; for example: zcs-NETWORK-6.0.4_GA_2038.RHEL5.20091214172206.tgz .
  6. Run the zcs/ from the ZCS tarball of the same version of ZCS that was running on your old server.
    1. Allow it to install all modules.
  7. When the configuration menu appears open up a new terminal window.
    1. License: Copy your ZCSLicense.xml file to /opt/zimbra/conf and then change ownership and rights:
      • cp ZCSLicense.xml /opt/zimbra/conf
      • chown zimbra:zimbra /opt/zimbra/conf/ZCSLicense.xml
      • chmod 444 /opt/zimbra/conf/ZCSLicense.xml
    2. Gather Variables From Old Server Config File To Use To Configure New Server During Setup:
      1. If you have access to the old server's /opt/zimbra/conf directory, copy the localconfig.xml from the old server or old location to /tmp on the new server, then run:
        • /opt/zimbra/bin/zmlocalconfig -c /tmp/localconfig.xml -s > /tmp/OLD-localconfig.xml
      2. If you only have access to full backup sessions, you'll be able to find a localconfig.xml in the subdirectory of the full backup.
        • For example:
        • /opt/zimbra/bin/zmlocalconfig -c /opt/<old-zimbra-dir>/backup/sessions/full-<latest-label>/sys/localconfig.xml -s > /tmp/OLD-localconfig.xml
      3. You now have a text file [ /tmp/OLD-localconfig.xml ] that has all the variables you need for the next part. It also shows the passwords for the various accounts.
        • Because it contains passwords, delete this file once you're finished with it.
  8. Return to the configuration menu, which should show 10 main categories that you can configure.
A. Select 1 for Common Configuration , you'll have the following sub-options.
  1. Confirm hostname [ zimbra_server_hostname ], ldap master host [ ldap_host ] , and timezone.
  2. Set Ldap Admin password using the information from the /tmp/OLD-localconfig.xml you made.
    • Variable that has password is : [ ldap_root_password ]
B. Select 2 for zimbra-ldap , you'll have the following sub-options.
  1. Confirm that Create Domain = Yes
  2. Confirm that Domain to create is correct. Note (common mistake): this value should be your old initial domain, not a subdomain of it. If it's wrong, revisit the above section for Common Configuration.
    • For example: [bad] vs. [correct]
    • A number of variables from the OLD-localconfig.xml can be double checked for this, one that probably changes the least is [ av_notify_domain ] .
  3. Set the following passwords using the variables from the OLD-localconfig.xml
    • Password Asked For || Variable Name From OLD-localconfig.xml
    • Ldap root password = ldap_root_password
    • Ldap replication password = ldap_replication_password
    • Ldap postfix password = ldap_postfix_password
    • Ldap amavis password = ldap_amavis_password
    • Ldap nginx password = ldap_nginx_password
C. Select 3 for zimbra-store, you'll have the following sub-options.
  1. Create Admin User should be yes.
  2. Double-check the value for Admin user to create, confirming it's not an improper subdomain.
    • Old variables to look at: [ av_notify_users ] , [ smtp_destination ] , [ smtp_source ] , [ zimbra_ldap_userdn ]
  3. Set Admin Password : [ this is the admin password you would use to login to admin console ]
  4. The Spam training user and Non-spam (Ham) training user values from the old server can be found in the ldap.bak files from your full session backups.
    • Example of path location for ldap.bak:
    cd /opt/<old-zimbra-dir>/backup/sessions/full-<latest-label>/ldap/
    To get values : egrep -i 'zimbraSpamIsNotSpamAccount|zimbraSpamIsSpamAccount' ldap.bak
    • Example output :
  5. Global Documents Accounts equals the [ wiki_user ] variable plus your domain.
    • For example: [ wiki_user = wiki ] so I would set this option to using the domain example above.
  6. SMTP host: can be confirmed by checking the ldap.bak like we did above.
    • Example of path location for ldap.bak:
    cd /opt/<old-zimbra-dir>/backup/sessions/full-<latest-label>/ldap/
    To get value : grep zimbraSmtpHostname ldap.bak
  7. Adjust the remaining variables if you need to; most customers will not. An exception might be those using the proxy services or needing to enable the Instant Messaging feature.
D. Change any other settings on the new server to match the configuration on the original server.
E. Disable auto-backup and starting of servers after configuration in the configuration menu:
  • Enable default backup schedule: no
  • Start servers after configuration: no
F. Apply the configuration changes. You will see one or two of the following warnings; it's fine to ignore them:
  • "WARNING: Document and Zimlet initialization skipped because Application Server was not configured to start."
  • "WARNING: Convertd version 2 migration skipped because Application Server was not configured to start".
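Two of the lookups in the installation steps above (the zcsRelease value in session.xml and the passwords in /tmp/OLD-localconfig.xml) can be scripted. The helpers below are a sketch, not part of ZCS: the function names are hypothetical, and the second one assumes the dump is plain zmlocalconfig -s output with one "key = value" pair per line.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ZCS.

# Print the zcsRelease attribute from a backup session's session.xml.
# $1: path to session.xml
zcs_release_from_session() {
    sed -n 's/.*zcsRelease="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Print the value of one key from a "key = value" dump such as
# /tmp/OLD-localconfig.xml (assumed format: plain zmlocalconfig -s output).
# $1: path to the dump, $2: key name, for example ldap_root_password
localconfig_value() {
    sed -n "s/^$2 = //p" "$1" | head -n 1
}
```

For example, localconfig_value /tmp/OLD-localconfig.xml ldap_root_password prints only that password, which makes it less likely that you copy the wrong line into the installer. The output is still sensitive, so avoid redirecting it to a file.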

Restoring to the new server

  1. Confirm the new server isn't running zimbra. It shouldn't be if you selected the right option above during the installation. As zimbra type:
    • zmcontrol stop
    • confirm there's no 'zimbra services' processes running [you might see some related to swatch, which you can ignore].
    • ps -ef | grep zimbra
  2. Create Any Additional Storage Volumes
    1. Note: Common Problem : This step is skipped/forgotten by many customers that were using additional storage volumes on their old server.
      • Configure your extra disk/san/nfs mounts.
        • Review your /etc/fstab file from your old server if you have access to it.
        • Double check your old server, if possible, for any symbolic links going outside of /opt/zimbra . Common reason is sym linking the backup directory < /opt/zimbra/backup > to some other location < /other/location/backup >.
        • ls -la /opt/zimbra/
  3. Create Any Additional Zimbra Volumes Now Or Adjust Default Paths
    1. Note: Common Problem : This step is skipped/forgotten by many customers that were using additional zimbra volumes or had changed the default paths on their old server.
      • Create your additional zimbra volumes now.
        • Common reason for additional zimbra volumes is for HSM , Archive & Discover [A&D], or large mailstores because of sizing issues.
        • Note: Need to find various ways customers can track down the old information about volumes and include it here.
        • Types are: primaryMessage, secondaryMessage, or index
  4. Delete the mysql data and re initialize an empty data directory. If you do not do this, zmrestoreoffline will have errors. As zimbra, type:
    • rm -rf /opt/zimbra/db/data/*
    • /opt/zimbra/libexec/zmmyinit
    • Note: You might see events of "* Failed to connect to mysql...retrying", this is normal. If it goes over 10 occurrences, this might be an indication of something being wrong. Check the following logs:
      • /var/log/zimbra.log
      • ls -latr /opt/zimbra/log/
      • If you think there's an issue, did you confirm your passwords? Look for something like the following in /var/log/zimbra.log :
        • Oct 20 17:39:33 mail zimbramon[9745]: 9745:info: zmmtaconfig: gacf ERROR: service.FAILURE (system failure: unable to get config) (cause: javax.naming.AuthenticationException [LDAP: error code 49 - Invalid Credentials])
        • See the section below to fix: "9. Double check your LDAP password"
        • Review the most recent logs first and then check older ones. The most likely candidates: zmmyinit.log , myslow.log , mysql_error.log.
    • The MySQL service should now be running, a simple check would be:
    • ps -ef | grep mysql
  5. Copy the backup files from the old server or from an archive location to /opt/zimbra/backup/sessions/ or use the -t /PATH/TO/backup option for the zmrestore* commands below.
  6. To find the LDAP session label to restore, as zimbra type:
    • zmrestoreldap -lbs
    or using the -t /PATH/TO/backup option
    • Example: zmrestoreldap -t /opt/<old-zimbra-dir>/backup/ -lbs
      • I see that this is my latest one: incr-20100315.050022.428
  7. To restore the LDAP data, run one of the zmrestoreldap commands below as zimbra :
    • zmrestoreldap -lb <latest_label>
    or with the nohup option if you are restoring a large number of accounts:
    • Example: cd /tmp ; nohup zmrestoreldap -t /opt/<old-zimbra-dir>/backup/ -lb <latest_label> &
      • Output is redirected to the nohup.out file that is created in the directory you run the command in.
      • Note: You [ as the zimbra user ] need write permission for the directory the nohup.out file is written to.
    A. Note: Observe whether the LDAP server has started successfully after the restore, it must be running for the next steps. Simple checks are:
    • Confirm output to the screen or to the nohup.out file ended with:
    • Starting ldap...Started slapd: pid ####
    • done.
    • Confirm ldap process is still running: ps -ef | grep ldap
    B. Note: The zmrestoreldap script included in ZCS 4.5.7 through ZCS 4.5.10 and ZCS 5.0 through ZCS 5.0.1 is broken.
    D. Note: On zimbra 5.0.7 this failed with the error "ERROR: Failed to move existing ldap data: 256"
    • This is because the directory /opt/zimbra/openldap-data/ is empty and the script is trying to backup the contents to /opt/zimbra/openldap-data/.priv and failing.
    • As a work around for this I placed a text file in that directory and the restore proceeded fine.
    E. Can't Do zmrestoreldap? - an alternative, depending on your situation and zcs version, would be to follow LDAP_data_import_export .
  8. Start convertd. This is required before running the zmrestoreoffline step below, which will error if convertd isn't running. As zimbra, type:
    • zmconvertctl start
      • Note: Skip the above step if you're running ZCS 4.x or 5.x on Mac. Convertd isn't supported on Mac until ZCS 6.x - bug 29453
  9. Double check that the LDAP password [ zimbra_ldap_password ] from the old server's localconfig.xml (or from /tmp/OLD-localconfig.xml, if you made it in the preceding steps) matches the new server's LDAP config.
    • Run the following to see the new server's zimbra_ldap_password:
    • zmlocalconfig -s zimbra_ldap_password
    • If you have to change it, run the following, replacing <password> with the needed password:
    • zmlocalconfig -f -e zimbra_ldap_password=<password>
  10. Note, A Common Problem Described Here And A Change In The Order Of The Old DR Steps:
    • To avoid the "No appenders could be found for logger (zimbra.misc) / Please initialize log4j" error and other related problems that have happened during zmrestoreoffline, we'll start and then stop the mailboxd service:
    • zmmailboxdctl start
    • This could take a couple of minutes before it comes up. Monitor activity by : tail -f /var/log/zimbra.log
      • If you think there's an issue, did you confirm your passwords? Look for something like the following in /var/log/zimbra.log :
        • Oct 20 17:39:33 mail zimbramon[9745]: 9745:info: zmmtaconfig: gacf ERROR: service.FAILURE (system failure: unable to get config) (cause: javax.naming.AuthenticationException [LDAP: error code 49 - Invalid Credentials])
        • See the section above to fix: "9. Double check your LDAP password"
    • zmmailboxdctl stop
    • This will create the necessary log files, such as mailbox.log .
    Last check before doing system [-sys] and user data restores [-a all]
    Before you attempt the zmrestoreoffline below, make sure mailboxd isn't running and that only the necessary services are running (ldap, mysql.server, zmconvertctl [unless Mac w/ ZCS 4.x or 5.x]). To confirm [as zimbra user] :
      • zmcontrol status [note which services are running and which aren't]
      • mysql.server status
      • ldap status
      • zmconvertctl status
  11. To start the offline restore, we have two methods:
    Option A - Traditional : This syntax will do a full restore from your latest FULL backup session label and then proceeds through your incremental sessions.
    A1. Type the following to start your restore, or use the nohup variant further below:
    Note: Use -c on the command line so that accounts will be restored even if some accounts encounter errors during the offline restore process.
    • zmrestoreoffline -sys -a all -c -br
    or with the nohup option and the -t /PATH/TO/backup . Remember to adjust for your session label:
    • Get latest full session label you want: zmrestoreldap -t /opt/<old-zimbra-dir>/backup/ -lbs | grep full
    • Example : cd /tmp ; nohup zmrestoreoffline -t /opt/<old-zimbra-dir>/backup/ -lb full-20100314.060050.268 -sys -a all -c -br &
    A2. To watch the progress, tail -f /opt/zimbra/log/mailbox.log [if it exists] and your nohup.out file if you used the nohup command.
    Note: There is a current bug about the output from zmrestoreoffline showing jvm gc events, this would explain the odd log events you might be witnessing. bug 41516
    A3. Proceed to next step, skipping the Option B entry below.
    Option B - Faster & More Incremental In Restore Steps : This syntax will do a full restore reading from the specified FULL backup session label you give it. Then you'll use zmplayredo to play 'incremental' data you have that came after that full. This is to avoid the performance issues described in bug 33606 : Improve zmrestore & zmrestoreoffline performance .
    B1. We'll first restore using ONLY the data in the full backup session. Type:
    Note: Use -c on the command line so that accounts will be restored even if some accounts encounter errors during the offline restore process.
    • zmrestoreoffline -sys -a all -c -rf -lb <full backup session label>
    or with the nohup option:
    • cd /tmp ; nohup zmrestoreoffline -sys -a all -c -rf -lb <full backup session label> &
    B2. To watch the progress, tail /opt/zimbra/log/mailbox.log and your nohup.out file if you used the nohup command.
    Note: There is a current bug about the output from zmrestoreoffline showing jvm gc events, this would explain the odd log events you might be witnessing. bug 41516
    B3. Now we'll restore the data from the redologs you have from your incremental sessions /opt/zimbra/backup/sessions/incr-<label>/redologs/ that are after the full session you used. An advanced option would be to also use the redologs from /opt/zimbra/redolog & /opt/zimbra/redolog/archives directories from the old zimbra installation. This would restore data after your last backup. When replaying redologs, you play from the oldest log to the newest.
    • zmplayredo --logfiles <arg>
      • From the --help output: "Replay these logfiles, in order".
  12. Because some ZCS services are running at this point and we'll want to stop them before proceeding, type:
    • zmcontrol stop
  13. Now start ZCS, type :
    • zmcontrol start
    • You might want to monitor the following log files in various shells/terminals:
    • tail -f /var/log/zimbra.log
    • tail -f /opt/zimbra/log/zmmailboxd.out
    • tail -f /opt/zimbra/log/mailbox.log
  14. Check and confirm status of all zimbra services with:
    • zmcontrol status
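Picking the latest full session and the incremental sessions that follow it (steps 6 and 11 above) comes down to sorting labels. The sketch below is not part of ZCS: the function names are hypothetical, and it relies only on the label format shown in this article (full-YYYYMMDD.HHMMSS.mmm and incr-YYYYMMDD.HHMMSS.mmm), whose fixed-width timestamps sort correctly as plain strings.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ZCS. They only read directory names.

# Print the newest full session label found in a sessions directory.
# $1: sessions directory, for example /opt/zimbra/backup/sessions
latest_full_label() {
    ls -1 "$1" | grep '^full-' | LC_ALL=C sort | tail -n 1
}

# Print, oldest first, the incremental labels newer than a given full label.
# $1: sessions directory, $2: full label, for example full-20100314.060050.268
incr_labels_after() {
    full_stamp=${2#full-}
    ls -1 "$1" | sed -n 's/^incr-//p' | LC_ALL=C sort |
        awk -v s="$full_stamp" '($0 "") > (s "") { print "incr-" $0 }'
}
```

The oldest-first order from incr_labels_after matches the rule in Option B that redologs are replayed from the oldest log to the newest; you would still build the zmplayredo --logfiles argument yourself from the redologs directory of each listed session.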

Something go wrong? See Ajcody-Disaster-Recovery-Specific-Notes for current issues being investigated or documented.
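If services fail to come up, the "error code 49 - Invalid Credentials" message quoted in the steps above is the quickest thing to rule out. A minimal sketch, using a hypothetical helper that is not part of ZCS, counts those lines in a log file you name:

```shell
#!/bin/sh
# Hypothetical helper, not part of ZCS: count log lines that report
# LDAP "error code 49" (invalid credentials) in the given log file.
# $1: path to a log file, for example /var/log/zimbra.log
count_ldap_auth_errors() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c 'LDAP: error code 49' "$1" || true
}
```

A non-zero count from count_ldap_auth_errors /var/log/zimbra.log suggests revisiting step 9 (the zimbra_ldap_password check) before looking elsewhere. Older entries from before the fix will still be counted.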

Post Restore And Confirmation Steps:

  1. Here are some fairly fast and simple commands from the CLI to get a quick status of the situation. You can grep for or include a single user's account if you're confident of what it should be reported as.
    1. Confirm your accounts and domains are present and show active, as zimbra run:
      • zmaccts
    2. Confirm accounts show mailbox quota usage appropriately:
      • zmprov gqu `zmhostname`
        • Columns are for : User, Quota, Used
    3. Confirm data is in the various zimbra volumes and all that are needed are present:
      • zmvolume -l
      • Confirm all needed zimbra volumes are present. Some customers might have different volumes for older primaryMessage stores, older indexes, HSM, A&D and so forth.
      • Run the following command against your volume paths and press ctrl+c once you're confident things look right:
      • ls -laR /opt/zimbra/store [or some other volume path that you have]
  2. To test from the shell/terminal basic html client usability, you could install and run lynx.
    1. If you need to install lynx do the following as root:
      • [rhel] yum install lynx
      • [ubuntu/debian] apt-get install lynx
    2. To test, run the following. Use the arrow and Tab keys to navigate the page. Run it as root, because the zimbra $HOME directory isn't writable for zimbra at the top level :
      • lynx http://<zimbra servername>
  3. Special Items To Confirm:
    1. Proxy Operations
    2. Self-Signed and Commercial Certificates
    3. HSM
    4. A&D [Archive and Discovery]
    5. Mobile Phones and BES
    6. Backup Method And Cron Schedule
      1. Review the output, paying special attention to the zimbraBackupTarget path and the zimbraBackupMode variable. As zimbra:
        • zmprov gs `zmhostname` | grep -i backup
      2. Confirm what the schedule is for the backups. As zimbra:
        • crontab -l | grep -i backup
      3. If empty, this will set up the default backup schedule that is normally set and confirm crontab for it. As zimbra:
        • zmschedulebackup -D; crontab -l | grep backup
  4. Remove any old backup sessions, because these sessions are no longer valid. Either delete them or move them out of the way:
    1. To delete:
      • rm -rf /opt/zimbra/redolog/* /opt/zimbra/backup/*
    2. To move [make sure target location has space]:
      • rm -rf /opt/zimbra/redolog/*
      • mv /opt/zimbra/backup /TMP/PATH/
        • Of course you'll need to adjust /TMP/PATH above for your situation.
      • mkdir -p /opt/zimbra/backup/{sessions,tmp} ; chown -R zimbra:zimbra /opt/zimbra/backup
  5. Remove the firewall rules and allow client access to the new server.
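The quota check in step 1 above produces a long list on a large server. As a rough sketch, assuming the three-column layout this article gives for zmprov gqu (User, Quota, Used), a hypothetical filter can list the accounts that report zero usage so you can spot-check them:

```shell
#!/bin/sh
# Hypothetical helper, not part of ZCS: read "user quota used" lines on
# stdin and print the users whose used value is 0.
zero_usage_accounts() {
    awk 'NF >= 3 && $3 == 0 { print $1 }'
}
```

It would be used as zmprov gqu `zmhostname` | zero_usage_accounts, run as zimbra. Zero usage is not an error in itself, since new or unused accounts are legitimately empty; treat the output as a list of mailboxes worth confirming against what you expect.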


Rev 1.1 731/2008

Verified Against: 6.0.4 using Option B [3/16/2010] Date Created: 4/12/2007
Article ID: Date Modified: 2010-10-31
