Zimbra Next Generation Modules/DR/Disaster Recovery - MariaDB data loss or corruption

WORK IN PROGRESS

Introduction

This step-by-step guide contains all the information needed to restore lost or corrupted MariaDB data. It applies to both single-server and multi-server infrastructures, provided they have only a single MariaDB server.

WARNING: Volume information for all items is stored in the database, which in this scenario has been lost, so your current volumes will be of no use. Make sure you have enough disk space before you continue; if you do not, either delete your current volumes or move them to another storage location.
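
The disk space check mentioned in the warning can be scripted. The following is a minimal sketch, not an official Zimbra tool: free_kb and has_enough_space are hypothetical helper names, and using the size of the backup path as the required amount is only a rough estimate assumed for illustration.

```shell
#!/bin/sh
# Sketch only: free_kb and has_enough_space are hypothetical helpers,
# not part of Zimbra or Backup NG.

# Print the free space (in KiB) of the filesystem that holds path $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem that holds path $2 has at least $1 KiB free.
has_enough_space() {
    required_kb=$1
    available_kb=$(free_kb "$2") || return 1
    [ -n "$available_kb" ] && [ "$available_kb" -ge "$required_kb" ]
}

# Example (commented out). Assumption: the size of the backup path is
# used as a rough estimate of the space the restore will need.
# backup_kb=$(du -sk /opt/zimbra/backup | awk '{ print $1 }')
# has_enough_space "$backup_kb" /opt/zimbra || echo "Not enough free space"
```

The example lines are commented out so that loading the helpers has no side effects; adjust the paths to your own backup and store locations.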

Step 1: Database re-initialisation

The first step in restoring the database to a consistent state is to re-initialise it.

To do this:

  • Stop all services by running zmcontrol stop as the zimbra user.
  • Rename *or* empty the current database folder (default: /opt/zimbra/db/data/):
    • Rename: mv /opt/zimbra/db/data/ /opt/zimbra/db/data_old/
    • Empty: rm -rf /opt/zimbra/db/data/*
  • After cleaning up the database, re-initialise it by running /opt/zimbra/libexec/zmmyinit as the zimbra user:
* Creating required directories
* Generating mysql config /opt/zimbra/conf/my.cnf
* Creating database in /opt/zimbra/db/data
* Starting mysql server
* Loading schema /opt/zimbra/db/db.sql
* Loading version from /opt/zimbra/db/versions-init.sql
* Loading version from /opt/zimbra/db/backup-version-init.sql
* Setting random passwd for mysql root user in zimbra localconfig
* Setting random passwd for mysql zimbra user in zimbra localconfig
* Changing mysql root user password
* Changing mysql zimbra user password
* Changed zimbra mysql user password

Database passwords as well as mailbox IDs will not be changed.

  • After re-initialising the database, start all services by running zmcontrol start as the zimbra user
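
The sequence above (using the "Rename" variant) can be sketched as a small script. This is an illustration under stated assumptions rather than a supported procedure: run, reinit_db and the DRY_RUN switch are hypothetical names introduced here, and only the commands and the default path come from this guide. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: meant to be run as the zimbra user.
# DATA_DIR is the default database folder named in this guide.
DATA_DIR="${DATA_DIR:-/opt/zimbra/db/data}"
# Hypothetical safety switch: 1 = print the commands, 0 = execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop services, move the old data aside, re-initialise, start services.
reinit_db() {
    if [ -e "${DATA_DIR}_old" ]; then
        echo "refusing to continue: ${DATA_DIR}_old already exists" >&2
        return 1
    fi
    run zmcontrol stop || return 1
    run mv "$DATA_DIR" "${DATA_DIR}_old" || return 1
    run /opt/zimbra/libexec/zmmyinit || return 1
    run zmcontrol start
}
```

Calling reinit_db with DRY_RUN left at 1 lists the four commands in order; setting DRY_RUN=0 executes them and stops at the first failure.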

If everything succeeded, once all services are initialised you should see that the mailbox groups (the mboxgroup directories) have been created anew:

ls -la /opt/zimbra/db/data/
total 1099868
drwxr-x--- 16 zimbra zimbra      4096 Aug 28 12:09 .
drwxrwxr-x  3 zimbra zimbra      4096 Aug 25 18:03 ..
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 chat
-rw-rw----  1 zimbra zimbra 524288000 Aug 28 12:11 ib_logfile0
-rw-rw----  1 zimbra zimbra 524288000 Aug 28 12:03 ib_logfile1
-rw-rw----  1 zimbra zimbra  77594624 Aug 28 12:11 ibdata1
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup1
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup2
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup3
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup4
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup5
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup6
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup7
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup8
drwx------  2 zimbra zimbra      4096 Aug 28 12:09 mboxgroup9
-rw-rw----  1 zimbra zimbra         0 Aug 28 12:03 multi-master.info
drwx------  2 zimbra zimbra      4096 Aug 28 12:03 mysql
drwx------  2 zimbra zimbra      4096 Aug 28 12:03 performance_schema
-rw-rw----  1 zimbra zimbra     24576 Aug 28 12:03 tc.log
drwx------  2 zimbra zimbra      4096 Aug 28 12:03 test
drwx------  2 zimbra zimbra      4096 Aug 28 12:03 zimbra
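
The same check can be done without reading the listing by eye. The sketch below counts the mboxgroup directories in the data folder; count_mboxgroups is a hypothetical helper name, and the default path is the one used in this guide.

```shell
#!/bin/sh
# Sketch only: count_mboxgroups is a hypothetical helper.
# Print the number of mboxgroup* directories found in the data folder
# given as $1 (default: /opt/zimbra/db/data).
count_mboxgroups() {
    data_dir=${1:-/opt/zimbra/db/data}
    count=0
    for d in "$data_dir"/mboxgroup*/; do
        # An unmatched glob stays literal, so test that it is a directory.
        [ -d "$d" ] && count=$((count + 1))
    done
    echo "$count"
}
```

A result of 0 after the services have started would suggest that the re-initialisation did not complete as expected.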

Step 2: Data Restore

At this point the database is up and running, but mostly empty. You can check this by looking at any mailbox's content using zmmailbox:

zmmailbox -z -m rick@domain.local gaf
        Id  View      Unread   Msg Count  Path
----------  ----  ----------  ----------  ----------
         1  unkn           0           0  /
        16  docu           0           0  /Briefcase
        10  appo           0           0  /Calendar
        14  mess           0           0  /Chats
         7  cont           0           0  /Contacts
         6  mess           0           0  /Drafts
        13  cont           0           0  /Emailed Contacts
         2  mess           0           0  /Inbox
         4  mess           0           0  /Junk
         5  mess           0           0  /Sent
        15  task           0           0  /Tasks
         3  unkn           0           0  /Trash
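
To compare a mailbox before and after the restore without reading the whole table, the "Msg Count" column can be summed. This is a minimal sketch that assumes the column layout shown above (Id, View, Unread, Msg Count, Path); total_msg_count is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: total_msg_count is a hypothetical helper.
# Read `zmmailbox ... gaf` output on stdin and print the sum of the
# "Msg Count" column. Header and separator rows are skipped because
# their first field is not a number.
total_msg_count() {
    awk '$1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { sum += $4 } END { print sum + 0 }'
}

# Example (commented out):
# zmmailbox -z -m rick@domain.local gaf | total_msg_count
```

For a mailbox in the state shown above, every folder has a message count of 0, so the helper prints 0.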

To restore the data, the doExternalRestore feature of Backup NG will be used.

  • To start the restore, run a complete External Restore with zxsuite backup doExternalRestore [your backup path here] as the zimbra user
    • e.g. if your backup path is /opt/zimbra/backup/ then run zxsuite backup doExternalRestore /opt/zimbra/backup/
  • Once the restore has started, you'll find an "External Restore Started" notification in the "Notifications" section of the NG Administration Zimlet.
    • That notification, along with the output of the command, will include the "Operation ID" of the restore - make sure to keep it at hand.
  • To verify the progress of the restore, you can check the dedicated restore log found in /opt/zimbra/logs/ called op_ExternalRestore_[operation ID].log
  • After the External Restore has been completed, you will receive an "External Restore Completed" notification.
    • You can now verify that data has been restored by checking the content of a mailbox with zmmailbox:
zmmailbox -z -m rick@domain.local gaf
        Id  View      Unread   Msg Count  Path
----------  ----  ----------  ----------  ----------
         1  unkn           0           0  /
        16  docu           0           0  /Briefcase
        10  appo           0           0  /Calendar
        14  mess           0           0  /Chats
         7  cont           0           0  /Contacts
         6  mess           0           0  /Drafts
        13  cont           0           0  /Emailed Contacts
         2  mess        5159        6258  /Inbox
         4  mess           0           0  /Junk
         5  mess           0         118  /Sent
        15  task           0           0  /Tasks
         3  unkn          28          29  /Trash
  • In many cases a single run of the External Restore is not enough to restore all items, so run the very same restore command a second time (this run will have a new Operation ID and a new restore log).
  • Once the second restore is completed, all data will have been restored.
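
Because each run has its own Operation ID and log file, a small helper that builds the log path can make the progress check less error-prone. This is a sketch only: restore_log_path is a hypothetical helper name, and the directory and file name pattern are the ones given in the steps above.

```shell
#!/bin/sh
# Sketch only: restore_log_path is a hypothetical helper.
# LOG_DIR is the log directory named in this guide.
LOG_DIR="${LOG_DIR:-/opt/zimbra/logs}"

# Print the path of the restore log for the Operation ID given as $1.
restore_log_path() {
    if [ -z "$1" ]; then
        echo "usage: restore_log_path OPERATION_ID" >&2
        return 1
    fi
    printf '%s/op_ExternalRestore_%s.log\n' "$LOG_DIR" "$1"
}

# Example (commented out): follow the log of a running restore, where
# op_id holds the Operation ID taken from the command output.
# tail -f "$(restore_log_path "$op_id")"
```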

Step 3: Volume Deduplication and aftermath

After the data restore is completed, all items will be stored in the Current Primary volume of your server. This step describes some follow-up operations that can be useful to get your volume configuration back to optimal (and save some disk space in the process).

Run a Volume Deduplication

When items are restored using the External Restore, there is a very high chance that the cache-based deduplication will miss many duplicates. You can fix this by running a volume-wide deduplication on your primary volume with the zxsuite hsm doDeduplicate [volume ID] command, where [volume ID] is the ID of the volume you wish to scan for duplicates.
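
If more than one volume needs to be scanned, the commands can be generated first and reviewed before they are run. The sketch below only prints the commands; dedup_commands is a hypothetical helper name, and the check that a volume ID consists of digits only is an assumption of this example.

```shell
#!/bin/sh
# Sketch only: dedup_commands is a hypothetical helper. It prints the
# deduplication command for each volume ID given as an argument and
# executes nothing.
dedup_commands() {
    for volume_id in "$@"; do
        case "$volume_id" in
            # Assumption: volume IDs are plain non-negative integers.
            ''|*[!0-9]*)
                echo "invalid volume ID: $volume_id" >&2
                return 1
                ;;
        esac
        echo "zxsuite hsm doDeduplicate $volume_id"
    done
}

# Example (commented out): print the commands for volumes 1 and 3.
# dedup_commands 1 3
```

Run the printed commands as the zimbra user once you have confirmed the volume IDs.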

Create your Secondary Volumes and apply the HSM Policy

If you used to have Secondary Volumes on your system, you can now create them again and apply the HSM policy to move all of your data. To run the HSM policy from the command line, use the zxsuite hsm doMoveBlobs command as the zimbra user.

Compress your data

If you don't have Secondary Volumes but want to compress your primary volume's data to save disk space, you need to:

  • Create a new Primary Volume, flagging it as both "Compressed" and "Current"
  • Use the zxsuite hsm doVolumeToVolumeMove command as the zimbra user to move all contents of the original volume to the new one. This will compress items in the process, since the new volume is marked as Compressed.
  • Once the Volume-To-Volume Move is completed, delete the original volume, which should now be empty