Difference between revisions of "Mysql Crash Recovery (alternate method)"
(Method for dumping mysql data while skipping known corrupt users)
Revision as of 20:00, 8 April 2011
Mysql Crash Recovery (alternate method)
In some cases, it is difficult to obtain a clean mysqldump of the entire database, even with innodb_force_recovery enabled. In these cases, it is sometimes advisable to identify and isolate problem mailboxes, dump out all of the unaffected mailboxes in their mboxgroup, and restore the databases from that dump. The advantages to this approach are that it often lets administrators dump out full databases without using innodb_force_recovery at all, and it allows administrators to identify and disable problem mailboxes so that they will not cause further database crashes until a maintenance window can be scheduled for the database to be repaired. The primary disadvantage is that the affected mailbox will usually need to be fully restored from a backup.
Identifying a Problem mboxgroup Database
Frequently, Mysql corruption issues are caused by a single bad line in a table within one of the mboxgroup databases. Which mboxgroup database is affected is often logged in the /opt/zimbra/log/mysql_error.log file. If it is, then skip ahead to Identifying a Problem Mailbox.
In order to identify a problem mboxgroup, the easiest way to start is to attempt to dump out all of the databases as part of a full Mysql dump/recovery as described in the Mysql Crash Recovery wiki page. Any database that fully dumps out is not the problem. If a database fails, then that database will be used to identify the problem mailbox.
Identifying a Problem Mailbox
To identify a problem mailbox, you will need a list of all mailboxes within the previously identified mboxgroup. To create this list, use the following command (as the zimbra user):
mysql --skip-column-names -e "SELECT id FROM zimbra.mailbox WHERE group_id=XX" > /tmp/XXlist
Substitute the number of the problem mboxgroup for XX. The /tmp/XXlist file will contain a list of mailbox ID numbers. You will also need a list of tables in the mboxgroup. You can create that with this command:
mysql --skip-column-names -e "CONNECT mboxgroupXX;SHOW TABLES" > /tmp/tables
The /tmp/tables file will contain a list of tables within the affected mboxgroup. If you are working with multiple mboxgroups, it is not necessary to re-generate this list every time as the list of tables within the mboxgroup should be the same for all mboxgroups in the mysql database. You can then use a simple bash script to dump each table in the mboxgroup for each user individually. The actual output is not needed for this step, but the stderr output is, so the stdout will be redirected to /dev/null.
mkdir /tmp/XX for table in $(cat /tmp/tables); do for i in $(cat /tmp/XXlist); do mysql -e "SELECT * FROM mboxgroupXX.mail_item WHERE mailbox_id=$i" > /dev/null 2> /tmp/XX/user-$i.err;done;done
This will create .err files in the /tmp/XX directory for each table and user. Delete the ones with a file size that is zero:
find /tmp/XX -size 0 | xargs rm -f
The remaining files will have error codes inside them. They are likely to be these two:
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/opt/zimbra/db/mysql.sock' (2)
The first error is most likely related to a problem account. Record that account's id number. The second is likely a search that was executed after the problem account crashed the database, but before the database was able to recover. The problem account as well as any accounts that did not generate a non-zero-length error file can be removed from the /tmp/XXlist file and the 'for' command can be re-run. This should be repeated until the command can be run without generating any non-zero-length error files. When this is complete, all problem accounts should be identified. If it is not possible to shut down mailboxd to drop the tables and reload them immediately, these accounts can be put into maintenance mode until the database can be repaired. This will cut off access for those users, but will also prevent these users from crashing mysql (and possibly mailboxd) by attempting to access bad data elements in the Mysql database. To find the account names that match the already existing account id numbers, you can query mysql:
mysql -e "select comment from zimbra.mailbox where id=XX"
Dumping the Database
Now that the problem mailbox has been identified, the database can be dumped, dropped and reloaded. Before moving forward, confirm that a backup exists for the problem user account. This can be found using zmbackupquery:
zmbackupquery -a <account>
If a valid backup exists, then the database dump/restore can proceed. If it does not, the dump/restore can still be performed, but recovering the problem user will be more problematic. It may be necessary to recover that account by injecting the data from the user's store directory directly into the account, which will lose the account's folder structure. At this point an administrator will have to make a decision about the best way to proceed. It may be possible to identify and remove a specific line in the mysql database that is causing the problem for that user, but that is beyond the scope of this document.
Before dumping the database, mailboxd must be stopped so no new data is written to the database:
Find the 'zimbra' user's mysql password:
zmlocalconfig -s zimbra_mysql_password
It is also a good idea to make a copy of the mysql database before dropping databases. Copy the /opt/zimbra/db/data directory somewhere safe that has enough capacity. It will not be needed after this operation is complete, but could be very useful in case of typos or errors. To dump the mboxgroup without including the bad account, the following statement can be used:
/opt/zimbra/mysql/bin/mysqldump --socket=/opt/zimbra/db/mysql.sock -u zimbra -p mboxgroupXX --where=mailbox_id!=<account_id> > /tmp/dbdump.out
When prompted for a password, enter the zimbra user's mysql password. This will dump all data that does not include the problem mailbox's id number and store it in /tmp/dbdump.out. If there is not enough space in /tmp on this server, place the file somewhere else. If the dump finishes cleanly and with no errors, you can drop the affected mboxgroup in the database, then reload using the data from the dump:
mysql -e "DROP DATABASE mboxgroupXX" mysql -e "CREATE DATABASE mboxgroupXX DEFAULT CHARACTER SET utf-8" mysql < /tmp/dbdump.out
If this completes with no errors, then the database is restored and clean, but all entries related to the problem mailbox no longer exist. If the backup for the problem mailbox is intact, then the account can be restored:
zmrestoreoffline -a <account>
Alternatively, you could start mailboxd and then use 'zmrestore' in place of 'zmrestoreoffline'. For a single account, it won't take significantly longer and service will be restored that much sooner. If the account does not have a clean backup and you want to restore from the blob files, you will need to remove the user's account from the zimbra.mailbox table. That will force the system to create a new empty mailbox entry for the user. If this is not done, the mailbox will not be usable because the root folder and the rest of the folder structure for that account will not exist. To delete this line from zimbra.mailbox, run the following:
mysql -e "DELETE FROM zimbra.mailbox WHERE id=<account_id>"
When the account is next accessed, the new mailbox will be created. You can import the old mail data into the new box using zmmailbox. To do this, first find the user's store directory. It will be under /opt/zimbra/store/X/<id>/msg. X is a hash with a value of 4096. Divide the user's account number by 4096 to determine which folder the user's account will be under. For example, account_id 5000 will be in folder 1, while account_id 4090 will be in folder 0. Go to this directory. There will be at least one numbered subfolder. Each subfolder will contain mail. to add them to the mailbox, use the following commands:
zmmailbox -z -m <account email address> createFolder /Recovery zmmailbox -z -m <account email address> addMessage /Recovery 0 1 2 ...
Include each of the numbered subfolders in the 'zmmailbox addMessage' command. This will put all of this mail in the /Recovery folder for the new mailbox. The mail will no longer be sorted, but it will be searchable and the user's inconvenience will be minimized. It is obviously better to avoid this if possible, and to ensure that backups are taken on a regular basis.
When all of these steps are complete, the database will no longer be corrupt, users' mail will be restored, and mailboxd will be stable.