Difference between revisions of "Server Live sync"

m (Redolog)
(Link to Gitlab project)
 
(45 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
{{Unsupported}}
 
{{Unsupported}}
 +
 +
{|  width="100%" border="0"
 +
|  bgcolor="orange" | [[Image:Attention.png]]
 +
 +
This article describes the steps to move a ZCS server to a new physical or virtual server. '''This wiki article is NOT supported by the Zimbra Support team for Network Edition customers. The only two supported methods are those described in the [[Network_Edition_Disaster_Recovery]] and [[Ajcody-Notes-Server-Move]] wiki pages.''' Server moves that do not follow one of these two wiki pages will not be supported by the Zimbra Support team.
 +
|}
 +
 +
==Zimbra version and platform==
 +
This script was developed and tested on Zimbra releases 7 and 8, both Open Source and Network editions, on Red Hat/CentOS 5 and Ubuntu 10.04 LTS and 12.04 LTS.
 +
 +
 +
'''The original author has discontinued support for this solution. Development has been taken over and is available on GitLab:''' https://gitlab.com/yetopen/zimbra-live-sync
 +
 
==Introduction==
 
==Introduction==
This is an experimental solution for providing near-live synchronisation between two Zimbra servers so that one of them is live and the other is kept in a cold or warm standby state.
+
This is an experimental solution for providing near-live synchronisation between two Zimbra servers so that one of them is live and the other is kept in a warm or very warm standby state.
  
 
The system is symmetrical. The sync can work in reverse when the mirror server becomes the active server. This allows easy fall-back to the original server once the failover condition is resolved.
 
The system is symmetrical. The sync can work in reverse when the mirror server becomes the active server. This allows easy fall-back to the original server once the failover condition is resolved.
Line 13: Line 26:
 
* Operates as "zimbra" user
 
* Operates as "zimbra" user
 
* Works on both Open Source and Network editions
 
* Works on both Open Source and Network editions
* Tested on CentOS/Redhat 5.x
 
* Developed on Zimbra 7.x
 
 
* Sync can work in either direction but only one way at a time
 
* Sync can work in either direction but only one way at a time
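
Because the daemon operates as the "zimbra" user, a wrapper could refuse to continue under any other account. The following is a minimal sketch only: the helper name <code>running_as</code> is hypothetical and is not part of live_syncd, and nothing in this excerpt shows the script performing such a check itself.

```shell
#!/bin/sh
# Hypothetical guard, not part of live_syncd: succeeds only when the
# current effective user name equals the expected account name.
running_as () {
  [ "$( id -un )" = "$1" ]
}

# Example use: warn when not running as the "zimbra" user.
if ! running_as "zimbra"; then
  echo "Warning: expected to run as zimbra, running as $( id -un )" >&2
fi
```

The sketch only warns; whether to abort instead is a design choice left to the administrator.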
  
Line 84: Line 95:
 
# Title      :  live_syncd
 
# Title      :  live_syncd
 
# Author    :  Simon Blandford <simon -at- onepointltd -dt- com>
 
# Author    :  Simon Blandford <simon -at- onepointltd -dt- com>
# Date      :  2011-03-30
+
# Date      :  2013-03-12
 
# Requires  :  zimbra sync_commands inotify-tools
 
# Requires  :  zimbra sync_commands inotify-tools
 
# Category  :  Administration
 
# Category  :  Administration
# Version    :  1.0.0
+
# Version    :  2.1.5
 
# Copyright  :  Simon Blandford, Onepoint Consulting Limited
 
# Copyright  :  Simon Blandford, Onepoint Consulting Limited
 
# License    :  GPLv3 (see above)
 
# License    :  GPLv3 (see above)
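
The "Requires" line in the header names inotify-tools, and the script body calls rsync and ssh. A pre-flight check along the following lines could confirm that these tools are on the PATH before the daemon is started. This is a sketch under assumptions: the helper name <code>missing_tools</code> is hypothetical, and <code>inotifywait</code> is used here only as one of the commands that inotify-tools provides.

```shell
#!/bin/sh
# Print the name of every command in the argument list that cannot be
# found on PATH. Prints nothing when all of them are present.
missing_tools () {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# inotifywait ships with inotify-tools; rsync and ssh are used for transfers.
missing=$( missing_tools inotifywait rsync ssh )
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line
  echo "Missing tools:" $missing >&2
fi
```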
Line 97: Line 108:
  
 
#******************************************************************************
 
#******************************************************************************
#********************** Globals ***********************************************
+
#********************** Constants *********************************************
 
#******************************************************************************
 
#******************************************************************************
  
base_dir="/opt/zimbra/live_sync"
+
LOG_LEVEL=5
locking_dir="$base_dir""/lock"
+
REDO_LOG_HISTORY_DAYS=10
pid_dir="$base_dir""/pid"
+
ERROR_CLEAR_MINUTES=10
log_dir="$base_dir""/log"
+
LDAP_CHECK_MINUTES_INTERVAL=10
ldap_dir="$base_dir""/ldap"
 
status_dir="$base_dir""/status"
 
 
 
  
SSH="ssh -i /opt/zimbra/.ssh/live_sync -o StrictHostKeyChecking=no -o CheckHostIP=no"\
+
ZIMBRA_DIR="/opt/zimbra"
 +
BASE_DIR="$ZIMBRA_DIR""/live_sync"
 +
LOCKING_DIR="$BASE_DIR""/lock"
 +
PID_DIR="$BASE_DIR""/pid"
 +
LOG_DIR="$BASE_DIR""/log"
 +
LOG_FILE="$LOG_DIR""/live_sync.log"
 +
LDAP_TEMP_DIR="$BASE_DIR""/ldap"
 +
LDAP_TEMP_LDIF="$BASE_DIR""/ldif.bak"
 +
STATUS_DIR="$BASE_DIR""/status"
 +
SSH_IDENTITY_FILE="$ZIMBRA_DIR""/.ssh/live_sync"
 +
REDOLOG_DIR="$ZIMBRA_DIR""/redolog"
 +
REDO_LOG_FILE="$REDOLOG_DIR""/redo.log"
 +
ARCHIVE_DIR="$REDOLOG_DIR""/archive"
 +
LIVE_SYNC_ARCHIVE_DIR="$REDOLOG_DIR""/live_sync_archives"
 +
LDAP_DATA_DIR="$ZIMBRA_DIR""/data/ldap/"
 +
BACKUP_DIR="$ZIMBRA_DIR""/backup"
 +
SYNC_COMMANDS_SCRIPT="$BASE_DIR""/sync_commands"
 +
SSH="ssh -i ""$SSH_IDENTITY_FILE"" -o StrictHostKeyChecking=no -o CheckHostIP=no"\
 
" -o PreferredAuthentications=hostbased,publickey"
 
" -o PreferredAuthentications=hostbased,publickey"
lock_dir="$locking_dir""/live_sync.lock"
+
LOCK_STATE_DIR="$LOCKING_DIR""/live_sync.lock"
stop_file="$status_dir""/live_sync.stop"
+
STOP_FILE="$STATUS_DIR""/live_sync.stop"
watches_file="$status_dir""/watches"
+
LAST_GOOD_REDO_REPLAY="$STATUS_DIR""/last_good_redo_replay"
log_file="$log_dir""/live_sync.log"
+
LAST_GOOD_REDO_SYNC="$STATUS_DIR""/last_good_redo_sync"
pid_file_ldap="$pid_dir""/ldap_live_sync.pid"
+
LAST_GOOD_REDO_STREAM="$STATUS_DIR""/last_good_redo_stream"
pid_file_redo="$pid_dir""/redo_log_live_sync.pid"
+
LAST_GOOD_LDAP_SYNC="$STATUS_DIR""/last_good_ldap_sync"
conf_file="$base_dir""/live_sync.conf"
+
LAST_GOOD_LDAP_START="$STATUS_DIR""/last_good_ldap_start"
 +
WATCHES_FILE="$STATUS_DIR""/watches"
 +
PID_FILE_LDAP="$PID_DIR""/ldap_live_sync.pid"
 +
PID_FILE_REDO="$PID_DIR""/redo_log_live_sync.pid"
 +
CONF_FILE="$BASE_DIR""/live_sync.conf"
  
 
#******************************************************************************
 
#******************************************************************************
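
The <code>LOG_LEVEL</code> constant controls how much the script's <code>logit</code> function writes: a message is emitted only when its level is less than or equal to <code>LOG_LEVEL</code>, and levels 1 (Error) and 2 (Warning) go to standard error. The following stand-alone sketch illustrates that idea. The function name <code>log_msg</code> and the threshold of 3 are illustrative assumptions, not the script's own code.

```shell
#!/bin/sh
LOG_LEVEL=3   # illustrative threshold; the script itself sets 5

# log_msg LEVEL MESSAGE...
# Levels follow the script: 1=Error, 2=Warning, 3=Info, 4=Debug.
# Messages above LOG_LEVEL are dropped; errors and warnings go to stderr.
log_msg () {
  level=$1
  shift
  [ "$level" -le "$LOG_LEVEL" ] || return 0
  case $level in
    1) label="Error" ;;
    2) label="Warning" ;;
    3) label="Info" ;;
    4) label="Debug" ;;
    *) label="Level $level" ;;
  esac
  if [ "$level" -le 2 ]; then
    echo "$( date ) :$label :$*" >&2
  else
    echo "$( date ) :$label :$*"
  fi
}

log_msg 3 "Replaying redologs..."
log_msg 4 "this debug line is suppressed at LOG_LEVEL=3"
```

Routing errors and warnings to standard error lets a caller separate them from routine progress messages even when both end up in the same log file.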
Line 122: Line 151:
 
#******************************************************************************
 
#******************************************************************************
  
#Ensure ldap and mysql servers are running and then replay redo logs
+
#Format for log output with errors and warnings going to >&2
 +
logit () {
 +
  logit_1 () {
 +
    echo -n "$( date ) :"
 +
    case $msg_level in
 +
      1)
 +
        echo -n "Error :"
 +
        ;;
 +
      2)
 +
        echo -n "Warning :"
 +
        ;;
 +
      3)
 +
        echo -n "Info :"
 +
        ;;
 +
      4)
 +
        echo -n "Debug :"
 +
        ;;
 +
    esac
 +
    echo $@
 +
  }
 +
  local msg_level output_chan
 +
  if [ $1 -le $LOG_LEVEL ]; then
 +
    msg_level=$1
 +
    shift
 +
    if [ $msg_level -le 2 ]; then
 +
      logit_1 $@ >&2
 +
    else
 +
      logit_1 $@
 +
    fi
 +
  fi
 +
}
 +
 
 +
#Detect HSM
 +
detect_hsm () {
 +
  local retval
 +
  #LDAP must be running
 +
  ldap status &>/dev/null || ldap start &>/dev/null
 +
  #MySQL must be running
 +
  mysql.server status &>/dev/null || mysql.server start &>/dev/null
 +
  #Preserve mailbox running state
 +
  zmmailboxdctl status &>/dev/null
 +
  prev_zmmailbox_status=$?
 +
  zmmailboxdctl start &>/dev/null
 +
  zmvolume -l | grep "type: secondaryMessage" >/dev/null
 +
  retval=$?
 +
  if [ $prev_zmmailbox_status -ne 0 ]; then
 +
    zmmailboxdctl stop &>/dev/null
 +
  fi
 +
  return $retval
 +
}
 +
 
 +
#Ensure ldap, convertd and mysql servers are running and then replay redo logs
 
replay_redo_logs () {
 
replay_redo_logs () {
 +
  local server_failed
 +
 
   ldap status &>/dev/null || ldap start &>/dev/null
 
   ldap status &>/dev/null || ldap start &>/dev/null
   mysql.server status status &>/dev/null || mysql.server start &>/dev/null
+
   mysql.server status &>/dev/null || mysql.server start &>/dev/null
   if ! ldap status &>/dev/null || ! mysql.server status status &>/dev/null; then
+
  server_failed=0
     echo "Start of local ldap/mysql servers failed" >&2
+
   if ! ldap status &>/dev/null; then
 +
     logit 1 "Start of local ldap server failed"
 
     ldap status >&2
 
     ldap status >&2
 +
    #Return error to trigger a "break" in while loop
 +
    server_failed=1
 +
  fi
 +
  if ! mysql.server status &>/dev/null; then
 +
    logit 1 "Start of local mysql server failed"
 
     mysql.server status >&2
 
     mysql.server status >&2
     break
+
     #Return error to trigger a "break" in while loop
 +
    server_failed=1
 +
  fi
 +
  if [ "x""$convertd_enabled" == "xtrue" ]; then
 +
    #Make sure indexing works while replaying redo log
 +
    zmconvertctl status &>/dev/null || zmconvertctl start &>/dev/null
 +
    if ! zmconvertctl status &>/dev/null; then
 +
      logit 2 "Start of local convertd servers failed"
 +
      zmconvertctl status >&2
 +
    fi
 
   fi
 
   fi
   echo -n "$( date ) :"
+
   [ $server_failed -eq 1 ] && return 1
   echo "Replaying redologs..."
+
   logit 3 "Replaying redologs..."
 
   if ! zmplayredo >/dev/null; then
 
   if ! zmplayredo >/dev/null; then
     echo "Replay of redolog failed" >&2
+
     logit 2 "Replay of redolog failed"
 +
    #No error returned here since "break" is not necessary
 +
  else
 +
    #If no errors then archive redo log files
 +
    if ! mkdir -p "$LIVE_SYNC_ARCHIVE_DIR"; then
 +
      logit 1 "Unable to create directory $LIVE_SYNC_ARCHIVE_DIR"
 +
      exit 1
 +
    fi
 +
    mv -f "$ARCHIVE_DIR""/"* "$LIVE_SYNC_ARCHIVE_DIR""/" 2>/dev/null
 +
    touch "$LAST_GOOD_REDO_REPLAY"
 
   fi
 
   fi
   echo -n "$( date ) :"
+
   logit 3 "Replaying redologs done"
  echo "Replaying redologs done"
+
  return 0
 
}
 
}
  
 
#The redo log sync daemon
 
#The redo log sync daemon
 
redo_log_live_sync () {
 
redo_log_live_sync () {
   local stream_pid archived_file
+
   local stream_pid archived_file i archived_redo_log_file prev_zmmailbox_status secondary_storage
  
   echo -n "$( date ) :"
+
   logit 3 "Starting redo log live sync process"
  echo "Starting redo log live sync process"
+
 
   #Mailbox process must not be running now
+
   #Wait for lock directory to be successfully created
   if zmmailboxdctl status &>/dev/null; then
+
   while ! mkdir "$LOCK_STATE_DIR" &>/dev/null; do
     zmmailboxdctl stop &>/dev/null
+
     sleep 2
   fi
+
   done
   if zmmailboxdctl status &>/dev/null; then
+
   logit 3 "Detecting if HSM used"
     echo "Unable to stop local Zimbra mailbox service" >&2
+
  if detect_hsm; then
     break
+
     logit 3 "HSM Detected"
 +
    secondary_storage="yes"
 +
  else
 +
     logit 3 "No HSM Detected"
 
   fi
 
   fi
   echo "Incremental backups enabled : $incremental_backups"
+
   rmdir "$LOCK_STATE_DIR"
 
    
 
    
   while [ ! -f "$stop_file" ]; do
+
   while [ ! -f "$STOP_FILE" ]; do
     while [ ! -f "$stop_file" ]; do
+
     while [ ! -f "$STOP_FILE" ]; do
 
       #Wait for lock directory to be successfully created
 
       #Wait for lock directory to be successfully created
       while ! mkdir "$lock_dir" &>/dev/null; do
+
       while ! mkdir "$LOCK_STATE_DIR" &>/dev/null; do
 
         sleep 2
 
         sleep 2
 
       done
 
       done
       [ -f "$stop_file" ] && break
+
       [ -f "$STOP_FILE" ] && break
       #Replay redo logs also at this point if incremental backups are happening in
+
      logit 3 "Syncing redologs..."
       #case redo log archives have now suddenly disappeared due to incremental backup
+
       #If incremental backups are enabled then gather redo logs from backups and copy
       if [ "x""$incremental_backups" == "xtrue" ]; then
+
       #to local archive directory
         replay_redo_logs
+
      redo_sync_fail="false"
      fi
+
       for archived_redo_log_file in $( echo "gather""$REDO_LOG_HISTORY_DAYS" | \
      echo -n "$( date ) :"
+
          $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" ); do
       echo "Syncing redologs..."
+
        if [ -f "$LIVE_SYNC_ARCHIVE_DIR""/""$( basename "$archived_redo_log_file" )" ]; then
       if ! rsync -e "$SSH" -aHz --force --delete \
+
          logit 4 "Already processed so skipping: $archived_redo_log_file"
         "$remote_address"":/opt/zimbra/redolog/" "/opt/zimbra/redolog"; then
+
         else
        echo "Rsync of redolog failed" >&2
+
          logit 4 "Syncing incremental backup file: $archived_redo_log_file"
         break
+
          if ! rsync -z -e "$SSH" --size-only "$remote_address"":""$archived_redo_log_file" \
 +
              "$ARCHIVE_DIR""/".; then
 +
            logit 2 "Rsync of a redolog, $archived_redo_log_file, failed"
 +
            redo_sync_fail="true"
 +
          fi
 +
        fi
 +
      done
 +
 
 +
      #Suspend if HSM is running
 +
       if which zmhsm >/dev/null && zmhsm -u | grep "Currently running" >/dev/null; then
 +
        logit 3 "Replaying redologs is suspended while HSM process is active"
 +
      else
 +
     
 +
        #Mailbox process must not be running now. Preserve state and stop.
 +
        zmmailboxdctl status &>/dev/null
 +
        prev_zmmailbox_status=$?
 +
        if [ $prev_zmmailbox_status -eq 0 ]; then
 +
          zmmailboxdctl stop &>/dev/null
 +
        fi
 +
        sleep 2
 +
        if zmmailboxdctl status &>/dev/null; then
 +
          logit 1 "Unable to stop local Zimbra mailbox service"
 +
          return 1
 +
        fi
 +
        
 +
        logit 4 "Syncing $REDO_LOG_FILE"
 +
        if ! rsync -e "$SSH" -z \
 +
          "$remote_address"":$REDO_LOG_FILE" "$REDO_LOG_FILE"; then
 +
          logit 2 "Rsync of $REDO_LOG_FILE failed"
 +
          redo_sync_fail="true"
 +
        fi
 +
        logit 4 "Syncing $REDO_LOG_FILE done"
 +
        if [ "x""$redo_sync_fail" == "xfalse" ]; then
 +
          touch "$LAST_GOOD_REDO_SYNC"
 +
        else
 +
          break
 +
        fi
 +
        logit 4 "Syncing redologs done"
 +
        logit 4 "Purging redolog directory and archives"
 +
        #Purge local redolog directory
 +
        find $REDOLOG_DIR -mtime +$REDO_LOG_HISTORY_DAYS -type f -exec rm {} \;
 +
        #Purge any interrupted rsync files
 +
        find $REDOLOG_DIR -name '.redo*' -type f -exec rm {} \;
 +
        logit 4 "Purge redolog directory and archives done"
 +
        replay_redo_logs || break
 +
     
 +
        #Restore mailboxd to previous running state or start if HSM is being used
 +
         if [ $prev_zmmailbox_status -eq 0 ] || \
 +
            [ "x""$secondary_storage" == "xyes" ] >/dev/null; then
 +
          logit 4 "Re-starting Zimbra mailbox service"
 +
          zmmailboxdctl start &>/dev/null
 +
          if ! zmmailboxdctl status &>/dev/null; then
 +
            logit 2 "Unable to re-start local Zimbra mailbox service"
 +
          fi
 +
         fi
 
       fi
 
       fi
       echo -n "$( date ) :"
+
        
      echo "Syncing redologs done"
 
      replay_redo_logs
 
 
       #If there are no incremental backups then remote archive directory will need purging
 
       #If there are no incremental backups then remote archive directory will need purging
 
       if [ "x""$incremental_backups" != "xtrue" ]; then
 
       if [ "x""$incremental_backups" != "xtrue" ]; then
         echo purge | \
+
        logit 4 "Purging remote redolog directory"
           $SSH "$remote_address" "/opt/zimbra/live_sync/sync_commands"
+
         echo "purge""$REDO_LOG_HISTORY_DAYS" | \
 +
           $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT"
 +
        logit 4 "Purging remote redolog directory done"
 
       fi
 
       fi
 
       #Establish copy-and-live-stream of current redo.log file
 
       #Establish copy-and-live-stream of current redo.log file
 +
      logit 4 "Live streaming redolog"
 
       echo stream | \
 
       echo stream | \
 
         $SSH "$remote_address" \
 
         $SSH "$remote_address" \
         "/opt/zimbra/live_sync/sync_commands" >"/opt/zimbra/redolog/redo.log" 2>/dev/null &
+
         "$SYNC_COMMANDS_SCRIPT" >"$REDO_LOG_FILE" &
 
       stream_pid=$!
 
       stream_pid=$!
 +
      disown $stream_pid
 +
      #Delay as PID was sometimes not being found if checked immediately
 +
      sleep 5
 
       #If successfully established stream then sit and wait for move to archive
 
       #If successfully established stream then sit and wait for move to archive
       if ps $stream_pid | grep "/opt/zimbra/live_sync/sync_commands" &>/dev/null; then
+
       if ps $stream_pid | grep "$SYNC_COMMANDS_SCRIPT" &>/dev/null; then
 +
        logit 4 "Live streaming redolog established"
 +
        touch "$LAST_GOOD_REDO_STREAM"
 
         #Remove lock file, this is resting point
 
         #Remove lock file, this is resting point
         rmdir "$lock_dir" &>/dev/null
+
         rmdir "$LOCK_STATE_DIR" &>/dev/null
 
         #Wait for the name of the new archive file to be passed back after redo.log is moved on the remote server
 
         #Wait for the name of the new archive file to be passed back after redo.log is moved on the remote server
 
         #This is normal resting point of this process
 
         #This is normal resting point of this process
 
         archived_file=$( echo wait_redo | \
 
         archived_file=$( echo wait_redo | \
           $SSH "$remote_address" "/opt/zimbra/live_sync/sync_commands" 2>/dev/null | \
+
           $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" | \
           tail -n 1 | egrep -o "redo-.*log" )
+
           tail -n 1 | grep -Eo "redo-.*log" )
 
         #Kill stream
 
         #Kill stream
         kill -KILL $( ps aux | grep "/opt/zimbra/live_sync/sync_commands" | \
+
         kill -KILL $( ps aux | grep "$SYNC_COMMANDS_SCRIPT" | \
 
           grep -v grep | awk '{print $2}' ) &>/dev/null
 
           grep -v grep | awk '{print $2}' ) &>/dev/null
 
         #Mirror move operation on local server
 
         #Mirror move operation on local server
         if echo "$archived_file" | egrep "redo-.*log" &>/dev/null; then
+
         if echo "$archived_file" | grep -E "redo-.*log" &>/dev/null; then
           echo "Moving redo.log to $archived_file"
+
           logit 4 "Moving redo.log to $archived_file"
           mv -f "/opt/zimbra/redolog/redo.log" "/opt/zimbra/redolog/archive/""$archived_file"
+
           mv -f "$REDO_LOG_FILE" "$ARCHIVE_DIR""/""$archived_file" 2>/dev/null
 
         else
 
         else
           echo "Archive file name not found" >&2
+
           logit 2 "Archive file name not found"
 
         fi
 
         fi
         [ -f "$stop_file" ] && break
+
         [ -f "$STOP_FILE" ] && break
 
       else
 
       else
         echo "Failed to start redolog streaming" >&2
+
         logit 2 "Failed to start redolog streaming, PID=$stream_pid"
 
         break
 
         break
 
       fi
 
       fi
 
     done
 
     done
     rmdir "$lock_dir" &>/dev/null
+
     rmdir "$LOCK_STATE_DIR" &>/dev/null
     #Wait 10 minutes for error to clear
+
     #Wait $ERROR_CLEAR_MINUTES minutes for error to clear
     [ ! -f "$stop_file" ] && sleep 600
+
     i=0
 +
    while [ $(( i++ )) -lt 60 ] && [ ! -f "$STOP_FILE" ]; do
 +
      sleep $ERROR_CLEAR_MINUTES
 +
    done
 
   done
 
   done
   echo -n "$( date ) :"
+
   logit 3 "Ending redo log live sync process"
  echo "Ending redo log live sync process"
 
 
}
 
}
  
 
#The ldap sync daemon
 
#The ldap sync daemon
 
ldap_live_sync () {
 
ldap_live_sync () {
   local ldap_wait_pid
+
   local ldap_wait_pid i last_ldap_success_state
  
   echo -n "$( date ) :"
+
   last_ldap_success_state="false"
   echo "Starting ldap live sync process"
+
    
   while [ ! -f "$stop_file" ]; do
+
  logit 3  "Starting ldap live sync process"
     while [ ! -f "$stop_file" ]; do
+
   while [ ! -f "$STOP_FILE" ]; do
 +
     while [ ! -f "$STOP_FILE" ]; do
 
       #Wait for lock directory to be successfully created
 
       #Wait for lock directory to be successfully created
       while ! mkdir "$lock_dir" &>/dev/null; do
+
       while ! mkdir "$LOCK_STATE_DIR" &>/dev/null; do
 
         sleep 3
 
         sleep 3
 
       done
 
       done
       echo -n "$( date ) :"
+
       if [ $zimbra_version -lt 8 ]; then
      echo "Syncing ldap"
+
        logit 3 "Syncing ldap using rsync"
      while [ 1 ]; do
+
        #Use rsync for Zimbra older than version 8
        #Check for changes during ldap sync operation
+
        while [ 1 ]; do
        echo wait_ldap | \
+
          #Check for changes during ldap sync operation
          $SSH "$remote_address" "/opt/zimbra/live_sync/sync_commands" &>"$watches_file" &
+
          echo wait_ldap | \
        ldap_wait_pid=$!
+
            $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" &>"$WATCHES_FILE" &
        if ! ps "$ldap_wait_pid" &>/dev/null; then
+
          ldap_wait_pid=$!
          echo "Unable to establish watch on remote LDAP directory, no ldap sync performed"
+
          disown $ldap_wait_pid
 +
          if ! ps "$ldap_wait_pid" &>/dev/null; then
 +
            logit 2 "Unable to establish watch on remote LDAP directory, no ldap sync performed"
 +
            break
 +
          fi
 +
          #Wait for watches to be established
 +
          while ! grep "established" "$WATCHES_FILE" &>/dev/null && \
 +
              ps "$ldap_wait_pid" &>/dev/null; do
 +
            sleep 1
 +
          done
 +
          #Echo out status
 +
          cat "$WATCHES_FILE"
 +
          rm -f "$WATCHES_FILE"
 +
         
 +
         
 +
          #Rsync remote server to temporary local ldap directory
 +
          if ! rsync -e "$SSH" -aHz --sparse --force --delete \
 +
            "$remote_address"":$LDAP_DATA_DIR""/" "$LDAP_TEMP_DIR""/"; then
 +
            logit 2 "Rsync of ldap failed"
 +
            break
 +
          else
 +
            touch "$LAST_GOOD_LDAP_SYNC"
 +
          fi
 +
          ps $ldap_wait_pid &>/dev/null && break
 +
          logit 3 "Ldap changed during rsync. Re-syncing."
 +
          sleep 10
 +
        done
 +
        kill -KILL $ldap_wait_pid &>/dev/null
 +
      else
 +
        #Use ldif export for Zimbra 8 and over
 +
        logit 3 "Syncing ldap using ldif"
 +
        if ! echo dump_ldap | \
 +
          $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" >"$LDAP_TEMP_LDIF"; then
 +
          logit 2 "Unable to fetch remote LDIF, no LDAP sync performed"
 
           break
 
           break
 +
        else
 +
          touch "$LAST_GOOD_LDAP_SYNC"
 
         fi
 
         fi
        #Wait for watches to be established
+
      fi
        while ! grep "established" "$watches_file" &>/dev/null && \
+
      if which zmhsm >/dev/null && zmhsm -u | grep "Currently running" >/dev/null; then
            ps "$ldap_wait_pid" &>/dev/null; do
+
         logit 3 "LDAP update is suspended while HSM process is active"
          sleep 1
+
      else
         done
+
         #Stop ldap
        rm -f "$watches_file"
+
         ldap status &>/dev/null && ldap stop &>/dev/null
         #Rsync remote server to temporary local ldap directory
+
        if ldap status &>/dev/null; then
         if ! rsync -e "$SSH" -aHz --force --delete \
+
           logit 1 "Unable to stop local ldap server"
          "$remote_address"":/opt/zimbra/data/ldap/" "$ldap_dir""/"; then
 
           echo "Rsync of ldap failed" >&2
 
 
           break
 
           break
 
         fi
 
         fi
         ps $ldap_wait_pid &>/dev/null && break
+
         if [ $zimbra_version -lt 8 ]; then
        echo "Ldap changed during rsync. Re-syncing."
+
          #Use rsync for Zimbra older than version 8
         sleep 10
+
          #rsync temporary local ldap directory to real local ldap directory
      done
+
          rsync -aH --sparse "$LDAP_TEMP_DIR""/" "$LDAP_DATA_DIR""/"
      kill -KILL $ldap_wait_pid &>/dev/null
+
         else
      #Stop ldap
+
          #Use LDIF import for Zimbra 8 and over
      ldap status &>/dev/null && ldap stop &>/dev/null
+
          rm -rf "$LDAP_DATA_DIR""/mdb" && \
      if ldap status &>/dev/null; then
+
          mkdir -p "$LDAP_DATA_DIR""/mdb/db" && \
        echo "Unable to stop local ldap server" >&2
+
          mkdir -p "$LDAP_DATA_DIR""/mdb/log" && \
        break
+
          /opt/zimbra/libexec/zmslapadd "$LDAP_TEMP_LDIF"
      fi
+
          if [ $? != 0 ]; then
      #rsync temporary local ldap directory to real local ldap directory
+
            logit 2 "Unable to import LDIF into local LDAP"
      rsync -aH "$ldap_dir""/" "/opt/zimbra/data/ldap/"
+
            break
      #Restart ldap
+
          fi
      ldap status &>/dev/null || ldap start &>/dev/null
+
        fi
      if ! ldap status &>/dev/null; then
+
        #Restart ldap
        echo "Unable to restart local ldap server" >&2
+
        ldap status &>/dev/null || ldap start &>/dev/null
 +
        if ! ldap status &>/dev/null; then
 +
          logit 1 "Unable to restart local ldap server"
 +
          last_ldap_success_state="false"
 +
        else
 +
          last_ldap_success_state="true"
 +
        fi
 +
        logit 4 "Syncing LDAP done"
 
       fi
 
       fi
      echo -n "$( date ) :"
+
       rmdir "$LOCK_STATE_DIR" &>/dev/null
      echo "Syncing LDAP done"
+
       [ -f "$STOP_FILE" ] && break
       rmdir "$lock_dir" &>/dev/null
+
       #Wait for change in remote ldap over $LDAP_CHECK_MINUTES_INTERVAL minute intervals
       [ -f "$stop_file" ] && break
 
       #Wait for change in remote ldap over 10 minute intervals
 
 
       echo wait_ldap | \
 
       echo wait_ldap | \
         $SSH "$remote_address" "/opt/zimbra/live_sync/sync_commands" 2>/dev/null &
+
         $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" &
 
       ldap_wait_pid=$!
 
       ldap_wait_pid=$!
       while [ 1 ]; do
+
      disown $ldap_wait_pid
 +
       while [ ! -f "$STOP_FILE" ]; do
 +
        logit 4 "Start new LDAP monitor period"
 +
        #Repeat last ldap success so that a period without ldap changes is not
 +
        #interpreted by Nagios as a lack of ldap success.
 +
        if [ "x""$last_ldap_success_state" == "xtrue" ]; then
 +
          touch "$LAST_GOOD_LDAP_START"
 +
        fi
 
         #Restart wait for ldap change if required
 
         #Restart wait for ldap change if required
 
         if ! ps $ldap_wait_pid &>/dev/null; then
 
         if ! ps $ldap_wait_pid &>/dev/null; then
 
           echo wait_ldap | \
 
           echo wait_ldap | \
             $SSH "$remote_address" "/opt/zimbra/live_sync/sync_commands" 2>/dev/null &
+
             $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" &
 
           ldap_wait_pid=$!
 
           ldap_wait_pid=$!
 +
          disown $ldap_wait_pid
 
         fi
 
         fi
         sleep 600
+
         #Wait $LDAP_CHECK_MINUTES_INTERVAL minutes
 +
        i=0
 +
        while [ $(( i++ )) -lt 60 ] && [ ! -f "$STOP_FILE" ]; do
 +
          sleep $LDAP_CHECK_MINUTES_INTERVAL
 +
        done
 
         #If wait process is not still running then there was a change
 
         #If wait process is not still running then there was a change
 
         ps $ldap_wait_pid &>/dev/null || break
 
         ps $ldap_wait_pid &>/dev/null || break
 
       done
 
       done
 
     done
 
     done
     rmdir "$lock_dir" &>/dev/null
+
     rmdir "$LOCK_STATE_DIR" &>/dev/null
     #Wait 10 minutes for error to clear
+
     #Wait $ERROR_CLEAR_MINUTES minutes for error to clear
     [ ! -f "$stop_file" ] && sleep 600
+
     i=0
 +
    while [ $(( i++ )) -lt 60 ] && [ ! -f "$STOP_FILE" ]; do
 +
      sleep $ERROR_CLEAR_MINUTES
 +
    done
 
   done
 
   done
   echo -n "$( date ) :"
+
   logit 3 "Ending ldap live sync process"
   echo "Ending ldap live sync process"
+
}
 +
 
 +
get_zimbra_config_globals () {
 +
  #Query whether incremental backups are enabled
 +
  incremental_backups=$( echo "query_incremental" | \
 +
    $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" )
 +
   
 +
  #Query whether convertd is installed and enabled
 +
  ldap status &>/dev/null || ldap start &>/dev/null
 +
  if ! ldap status &>/dev/null; then
 +
    logit 1 "Unable to start local ldap server"
 +
    exit 1
 +
  fi
 +
  if [ $( zmprov -l  gs `zmhostname` | \
 +
          grep -E "(zimbraServiceInstalled|zimbraServiceEnabled):[[:space:]]*convertd" | \
 +
          wc -l  ) -eq 2 ]; then
 +
    convertd_enabled="true"
 +
   else
 +
    convertd_enabled="false"
 +
  fi
 
}
 
}
  
 
kill_everything () {
 
kill_everything () {
   touch "$stop_file"
+
   touch "$STOP_FILE"
   kill -KILL $( head -n 1 "$pid_file_ldap" 2>/dev/null ) &>/dev/null
+
   kill -KILL $( head -n 1 "$PID_FILE_LDAP" 2>/dev/null ) &>/dev/null
   kill -KILL $( head -n 1 "$pid_file_redo" 2>/dev/null ) &>/dev/null
+
   kill -KILL $( head -n 1 "$PID_FILE_REDO" 2>/dev/null ) &>/dev/null
 
   kill -KILL $( ps aux | grep "live_syncd start" | grep -v grep | awk '{print $2}' ) &>/dev/null
 
   kill -KILL $( ps aux | grep "live_syncd start" | grep -v grep | awk '{print $2}' ) &>/dev/null
 
   kill -KILL $( ps aux | grep "redo_log_live_sync" | grep -v grep | awk '{print $2}' ) &>/dev/null
 
   kill -KILL $( ps aux | grep "redo_log_live_sync" | grep -v grep | awk '{print $2}' ) &>/dev/null
 
   kill -KILL $( ps aux | grep "ldap_live_sync" | grep -v grep | awk '{print $2}' ) &>/dev/null
 
   kill -KILL $( ps aux | grep "ldap_live_sync" | grep -v grep | awk '{print $2}' ) &>/dev/null
 
   kill -KILL $( ps aux | \
 
   kill -KILL $( ps aux | \
     grep "/opt/zimbra/live_sync/sync_commands" | grep -v grep | awk '{print $2}' ) &>/dev/null
+
     grep "$SYNC_COMMANDS_SCRIPT" | grep -v grep | awk '{print $2}' ) &>/dev/null
   rm -f "$stop_file"
+
  kill -KILL $( ps aux | grep "rsync" | grep -E "$REDOLOG_DIR""|""$LDAP_DATA_DIR""|""$BACKUP_DIR" | \
   rm -f "$pid_file_ldap"
+
    awk '{print $2}' ) &>/dev/null
   rm -f "$pid_file_redo"
+
  #Kill redolog playback if running
   rmdir "$lock_dir" &>/dev/null
+
  kill -KILL $( ps aux | grep -E "zimbra.*java.*PlaybackUtil" | grep -v grep | \
 +
    awk '{print $2}' ) &>/dev/null
 +
   rm -f "$STOP_FILE"
 +
   rm -f "$PID_FILE_LDAP"
 +
   rm -f "$PID_FILE_REDO"
 +
   rmdir "$LOCK_STATE_DIR" &>/dev/null
 
}
 
}
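
The daemons stop cooperatively: <code>kill_everything</code> creates the stop file, and the wait loops in the script check for it between short sleeps. Sixty sleeps of <code>$ERROR_CLEAR_MINUTES</code> seconds each add up to <code>$ERROR_CLEAR_MINUTES</code> minutes, while a stop request is noticed within one slice. The following sketch shows the same pattern in isolation. The function name <code>wait_unless_stopped</code> and its parameters are hypothetical.

```shell
#!/bin/sh
# wait_unless_stopped STOP_FILE SLICES SLICE_SECONDS
# Sleeps SLICES times SLICE_SECONDS seconds, checking for STOP_FILE before
# each slice. Returns 0 after the full wait, 1 if the stop file exists.
wait_unless_stopped () {
  stop_file=$1
  slices=$2
  slice_seconds=$3
  i=0
  while [ "$i" -lt "$slices" ]; do
    [ -f "$stop_file" ] && return 1
    sleep "$slice_seconds"
    i=$(( i + 1 ))
  done
  [ ! -f "$stop_file" ]
}

# With 60 slices of 10 seconds this waits 10 minutes unless stopped:
# wait_unless_stopped "/opt/zimbra/live_sync/status/live_sync.stop" 60 10
```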
  
Line 324: Line 573:
 
   trap - INT TERM SIGINT SIGTERM
 
   trap - INT TERM SIGINT SIGTERM
 
   echo 'kill -KILL $( ps aux | grep live_syncd | grep -v grep | awk '"'"'{print $2}'"'"' ) &>/dev/null' | \
 
   echo 'kill -KILL $( ps aux | grep live_syncd | grep -v grep | awk '"'"'{print $2}'"'"' ) &>/dev/null' | \
     at now && sleep 1 && rmdir "$lock_dir" &>/dev/null
+
     at now && sleep 1 && rmdir "$LOCK_STATE_DIR" &>/dev/null
 
   exit
 
   exit
 
}
 
}
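
The redo log and LDAP daemons take turns by creating and removing the directory <code>$LOCK_STATE_DIR</code>. Creating a directory with <code>mkdir</code> either succeeds or fails as a single step, so it can serve as a simple mutex between processes. The following sketch shows the technique on its own. The helper names <code>acquire_lock</code> and <code>release_lock</code> and the demo path are assumptions, not part of the script.

```shell
#!/bin/sh
# acquire_lock DIR [RETRY_SECONDS]
# mkdir either creates the directory or fails, atomically, so only one
# process at a time can get past this loop.
acquire_lock () {
  while ! mkdir "$1" 2>/dev/null; do
    sleep "${2:-2}"
  done
}

# release_lock DIR
# rmdir removes the (empty) lock directory so another process can take it.
release_lock () {
  rmdir "$1" 2>/dev/null
}

lock_dir="${TMPDIR:-/tmp}/live_sync_demo.$$.lock"   # demo path, not the real one
acquire_lock "$lock_dir"
echo "lock held, doing exclusive work"
release_lock "$lock_dir"
```

A stale lock directory left behind by a killed process blocks every later caller, which is why the stop and kill paths of the script remove it explicitly.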
Line 338: Line 587:
 
fi
 
fi
  
mkdir -p "$locking_dir"
+
mkdir -p "$LOCKING_DIR"
mkdir -p "$pid_dir"
+
mkdir -p "$PID_DIR"
mkdir -p "$log_dir"
+
mkdir -p "$LOG_DIR"
mkdir -p "$ldap_dir"
+
mkdir -p "$LDAP_TEMP_DIR"
mkdir -p "$status_dir"
+
mkdir -p "$STATUS_DIR"
 +
chmod 755 "$STATUS_DIR"
  
if [ ! -f "$conf_file" ]; then
+
if [ ! -f "$CONF_FILE" ]; then
   echo "Configuration file, $conf_file, not found" >&2
+
   echo "Configuration file, $CONF_FILE, not found" >&2
 
   exit 1
 
   exit 1
 
fi
 
fi
  
source "$conf_file"
+
source "$CONF_FILE"
  
 
#Find all local addresses
 
#Find all local addresses
 
server_addresses=$( /sbin/ifconfig | grep inet | \
 
server_addresses=$( /sbin/ifconfig | grep inet | \
   egrep -io "addr:[[:space:]]*(([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5})" | \
+
   grep -Eio "addr:[[:space:]]*(([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5})" | \
 
   sed "s/addr://" | tr -d " \t" )
 
   sed "s/addr://" | tr -d " \t" )
  
 
#Check configured server addresses are valid
 
#Check configured server addresses are valid
 
if ! echo "$server1" | \
 
if ! echo "$server1" | \
     egrep -i "([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5}" &>/dev/null; then
+
     grep -Ei "([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5}" &>/dev/null; then
 
   echo "No valid IP address found for server1 in configuration file" >&2
 
   echo "No valid IP address found for server1 in configuration file" >&2
 
   exit 1
 
   exit 1
 
fi
 
fi
 
if ! echo "$server2" | \
 
if ! echo "$server2" | \
     egrep -i "([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5}" &>/dev/null; then
+
     grep -Ei "([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5}" &>/dev/null; then
 
   echo "No valid IP address found for server2 in configuration file" >&2
 
   echo "No valid IP address found for server2 in configuration file" >&2
 
   exit 1
 
   exit 1
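
The checks on <code>server1</code> and <code>server2</code> accept a value when it contains something shaped like an IPv4 or IPv6 address. The following sketch wraps the same regular expression in a function so its behaviour can be tried on sample values. The function name <code>looks_like_ip</code> and the sample addresses are illustrative assumptions.

```shell
#!/bin/sh
# looks_like_ip ADDRESS
# Uses the same extended regular expression as the configuration checks:
# a dotted IPv4 quad, or a hex group followed by five colon-separated groups.
# The pattern is unanchored, so it accepts any string that contains a match.
looks_like_ip () {
  echo "$1" | \
    grep -Ei "([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5}" >/dev/null 2>&1
}

for candidate in "192.0.2.10" "2001:db8:0:0:0:1" "mail.example.com"; do
  if looks_like_ip "$candidate"; then
    echo "$candidate: accepted"
  else
    echo "$candidate: rejected"
  fi
done
```

Host names are rejected by this pattern, so the configuration file is expected to hold IP addresses rather than DNS names.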
Line 383: Line 633:
  
 
#Check remote server is OK
 
#Check remote server is OK
remote_server_status=$( echo "test" |
+
remote_server_status=$( echo "test" | \
   $SSH "$remote_address" "/opt/zimbra/live_sync/sync_commands" )
+
   $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" )
  
 
if [ "x""$remote_server_status" == "xbusy" ]; then
 
if [ "x""$remote_server_status" == "xbusy" ]; then
Line 397: Line 647:
 
fi
 
fi
  
incremental_backups=$( echo "query_incremental" |
+
#Get major Zimbra version
  $SSH "$remote_address" "/opt/zimbra/live_sync/sync_commands" )
+
zimbra_version=$( zmcontrol -v | grep -Eo "[0-9][^\.]*" | head -n 1 )
 +
if [ ${#zimbra_version} -lt 1 ]; then
 +
zimbra_version=0
 +
fi
  
 
case $1 in
 
case $1 in
 
   start)
 
   start)
     if [ -f $pid_file_redo ] || [ -f $pid_file_ldap ]; then
+
    #Check for processes from this script and also redolog replay. Don't count PID files older than uptime.
 +
     if [ -f "$PID_FILE_REDO" ] && \
 +
        [ $(( $( date +%s ) - $( stat -c '%Y' "$PID_FILE_REDO" ) )) -lt $( cat /proc/uptime | grep -Eo "[0-9]+" | head -n 1 ) ]; then
 +
      pid_found="yes"
 +
    fi
 +
    if [ -f "$PID_FILE_LDAP" ] && \
 +
        [ $(( $( date +%s ) - $( stat -c '%Y' "$PID_FILE_LDAP" ) )) -lt $( cat /proc/uptime | grep -Eo "[0-9]+" | head -n 1 ) ]; then
 +
      pid_found="yes"
 +
    fi
 +
    if [ $pid_found ] || \
 +
        ps aux | grep -E "zimbra.*java.*PlaybackUtil" | grep -v grep &>/dev/null; then
 
       echo "Process already running"
 
       echo "Process already running"
 
     else
 
     else
 
       echo -n "Starting processes..."
 
       echo -n "Starting processes..."
       ldap_live_sync >>"$log_file" 2>&1 &
+
      get_zimbra_config_globals
       echo $! >"$pid_file_ldap"
+
      echo "***************************************" >>"$LOG_FILE"
       redo_log_live_sync >>"$log_file" 2>&1 &
+
      logit 3 "Starting $( basename "$0" )" >>"$LOG_FILE"
       echo $! >"$pid_file_redo"
+
      logit 3 "Incremental backups enabled : $incremental_backups" >>"$LOG_FILE"
 +
      logit 3 "Convertd enabled : $convertd_enabled" >>"$LOG_FILE"
 +
 
 +
       ldap_live_sync >>"$LOG_FILE" 2>&1 &
 +
       echo $! >"$PID_FILE_LDAP"
 +
       redo_log_live_sync >>"$LOG_FILE" 2>&1 &
 +
       echo $! >"$PID_FILE_REDO"
 
       echo "done"
 
       echo "done"
 
     fi
 
     fi
 
     ;;
 
     ;;
 
   stop)
 
   stop)
     touch "$stop_file"
+
     touch "$STOP_FILE"
     [ -d "$lock_dir" ] && echo "Waiting for sync operations to complete..."
+
     [ -d "$LOCK_STATE_DIR" ] && echo "Waiting for sync operations to complete..."
     while [ -d "$lock_dir" ]; do
+
     while [ -d "$LOCK_STATE_DIR" ]; do
 
       sleep 5
 
       sleep 5
 
     done
 
     done
     rm -f "$stop_file"
+
     rm -f "$STOP_FILE"
 
     replay_redo_logs
 
     replay_redo_logs
 
     kill_everything
 
     kill_everything
Line 425: Line 694:
 
     ;;
 
     ;;
 
   status)
 
   status)
     if [ -f  $pid_file_redo ] && ps $( head -n 1 $pid_file_redo 2>/dev/null ) &>/dev/null; then
+
    if ps aux | grep -E "zimbra.*java.*PlaybackUtil" | grep -v grep &>/dev/null; then
 +
      echo "redolog is being replayed"
 +
      replay_stat=0
 +
    else
 +
      replay_stat=3
 +
    fi
 +
     if [ -f  $PID_FILE_REDO ] && ps $( head -n 1 $PID_FILE_REDO 2>/dev/null ) &>/dev/null; then
 
       echo "redo log sync process OK"
 
       echo "redo log sync process OK"
 
       redo_stat=0
 
       redo_stat=0
Line 432: Line 707:
 
       redo_stat=3
 
       redo_stat=3
 
     fi
 
     fi
     if [ -f  $pid_file_ldap ] && ps $( head -n 1 $pid_file_ldap 2>/dev/null ) &>/dev/null; then
+
     if [ -f  $PID_FILE_LDAP ] && ps $( head -n 1 $PID_FILE_LDAP 2>/dev/null ) &>/dev/null; then
 
       echo "ldap sync process OK"
 
       echo "ldap sync process OK"
 
       ldap_stat=0
 
       ldap_stat=0
Line 439: Line 714:
 
       ldap_stat=3
 
       ldap_stat=3
 
     fi
 
     fi
     [ $ldap_stat == 3 ] && [ $redo_stat == 3 ] && exit 3
+
     [ $ldap_stat == 3 ] && [ $redo_stat == 3 ] && [ $replay_stat == 3 ] && exit 3
 
     [ $ldap_stat == 0 ] && [ $redo_stat == 0 ] && exit 0
 
     [ $ldap_stat == 0 ] && [ $redo_stat == 0 ] && exit 0
 
     exit 1
 
     exit 1
Line 449: Line 724:
 
     trap quitting INT TERM SIGINT SIGTERM
 
     trap quitting INT TERM SIGINT SIGTERM
 
     if ps aux | grep "redo_log_live_sync" | grep -v grep  &>/dev/null || \
 
     if ps aux | grep "redo_log_live_sync" | grep -v grep  &>/dev/null || \
         ps aux | grep "ldap_live_sync" | grep -v grep  &>/dev/null; then
+
         ps aux | grep "ldap_live_sync" | grep -v grep  &>/dev/null || \
 +
        ps aux | grep -E "zimbra.*java.*PlaybackUtil" | grep -v grep &>/dev/null; then
 
       echo "Process already running"
 
       echo "Process already running"
 
     else
 
     else
 
       echo "Starting processes in realtime"
 
       echo "Starting processes in realtime"
 +
      get_zimbra_config_globals
 +
      logit 3 "Incremental backups enabled : $incremental_backups"
 +
      logit 3 "Convertd enabled : $convertd_enabled"
 
       ldap_live_sync &
 
       ldap_live_sync &
       echo $! >"$pid_file_ldap"
+
       echo $! >"$PID_FILE_LDAP"
 
       redo_log_live_sync &
 
       redo_log_live_sync &
       echo $! >"$pid_file_redo"
+
       echo $! >"$PID_FILE_REDO"
 
       while [ 1 ]; do sleep 10; done
 
       while [ 1 ]; do sleep 10; done
 
     fi
 
     fi
 +
    ;;
 
esac
 
esac
 
</pre>
 
</pre>
Line 484: Line 764:
 
# Title      :  sync_commands
 
# Title      :  sync_commands
 
# Author    :  Simon Blandford <simon -at- onepointltd -dt- com>
 
# Author    :  Simon Blandford <simon -at- onepointltd -dt- com>
# Date      :  2011-03-30
+
# Date      :  2013-03-12
# Requires  :  zimbra live_syncd inotify-tools
+
# Requires  :  zimbra sync_commands inotify-tools
 
# Category  :  Administration
 
# Category  :  Administration
# Version    :  1.0.0
+
# Version    :  2.1.3
 
# Copyright  :  Simon Blandford, Onepoint Consulting Limited
 
# Copyright  :  Simon Blandford, Onepoint Consulting Limited
 
# License    :  GPLv3 (see above)
 
# License    :  GPLv3 (see above)
Line 494: Line 774:
 
# Keep two Zimbra servers synchronised in near-realtime, local agent
 
# Keep two Zimbra servers synchronised in near-realtime, local agent
 
##########################################################################
 
##########################################################################
 +
 +
 +
#******************************************************************************
 +
#********************** Main Program ******************************************
 +
#******************************************************************************
 +
  
 
if [ "$( whoami )" != "zimbra" ]; then
 
if [ "$( whoami )" != "zimbra" ]; then
Line 503: Line 789:
 
if echo "$SSH_ORIGINAL_COMMAND" | \
 
if echo "$SSH_ORIGINAL_COMMAND" | \
 
   grep "rsync" | \
 
   grep "rsync" | \
     egrep "/opt/zimbra/redolog/|/opt/zimbra/data/ldap/" &>/dev/null; then
+
     grep -E "/opt/zimbra/redolog/|/opt/zimbra/data/ldap/|/opt/zimbra/backup/sessions/incr" &>/dev/null; then
 
   case "$SSH_ORIGINAL_COMMAND" in
 
   case "$SSH_ORIGINAL_COMMAND" in
 
   *\&*)
 
   *\&*)
Line 547: Line 833:
 
   }
 
   }
  
 +
  #Extract numeric parameter from command name
 +
  param=$( echo "$command" | grep -Eo "[0-9]+" )
 +
  command=$( echo "$command" | grep -Eio "[a-z_]+" )
 +
 
 
   case $command in
 
   case $command in
 
     test)
 
     test)
Line 558: Line 848:
 
       #Wait for redo log roll-over
 
       #Wait for redo log roll-over
 
       check_inotify
 
       check_inotify
 +
      kill -KILL $( ps aux | grep "inotifywait -r /opt/zimbra/redolog" | \
 +
                    grep -v grep | awk '{print $2}' ) &>/dev/null
 
       inotifywait -r /opt/zimbra/redolog -e moved_to
 
       inotifywait -r /opt/zimbra/redolog -e moved_to
 
       ;;
 
       ;;
 
     wait_ldap)
 
     wait_ldap)
       #Wait for ldap changes
+
       #Wait for ldap changes. Ignore log changes.
 
       check_inotify
 
       check_inotify
 +
      kill -KILL $( ps aux | grep "inotifywait -r /opt/zimbra/data/ldap" | \
 +
                    grep -v grep | awk '{print $2}' ) &>/dev/null
 
       inotifywait -r /opt/zimbra/data/ldap -e modify \
 
       inotifywait -r /opt/zimbra/data/ldap -e modify \
 
         -e attrib -e close_write -e moved_to -e moved_from \
 
         -e attrib -e close_write -e moved_to -e moved_from \
 +
        --exclude "logs\/log\.|accesslog" \
 
         -e move -e delete -e delete_self
 
         -e move -e delete -e delete_self
 +
      ;;
 +
    dump_ldap)
 +
      #Extract the LDIF database and stream it
 +
      /opt/zimbra/libexec/zmslapcat  "/tmp/zimbraldif"
 +
      cat "/tmp/zimbraldif/ldap.bak"
 +
      rm -rf "/tmp/zimbraldif"
 
       ;;
 
       ;;
 
     stream)
 
     stream)
 
       #Live-stream redolog
 
       #Live-stream redolog
 +
      #Kill any hanging previous tail commands
 +
      kill -KILL $( ps aux | grep "tail -c +0 -f /opt/zimbra/redolog/redo.log" | \
 +
                    grep -v grep | awk '{print $2}' ) &>/dev/null
 
       tail -c +0 -f /opt/zimbra/redolog/redo.log
 
       tail -c +0 -f /opt/zimbra/redolog/redo.log
 +
      ;;
 +
    gather)
 +
      #Gather list of recent redologs from incremental backups and archive
 +
      find '/opt/zimbra/backup/sessions/incr-'*'/redologs' \
 +
          '/opt/zimbra/redolog/archive' \
 +
            -name 'redo*.log' -type f -mtime -$param -print 2>/dev/null | \
 +
            sort
 
       ;;
 
       ;;
 
     purge)
 
     purge)
 
       #Remove old archives
 
       #Remove old archives
       find /opt/zimbra/redolog/archive -mtime +1 -exec rm {} \;
+
       find /opt/zimbra/redolog/archive -type f -mtime +$param -exec rm {} \;
 
       ;;
 
       ;;
 
     query_incremental)
 
     query_incremental)
Line 579: Line 890:
 
       if which zmschedulebackup &>/dev/null && \
 
       if which zmschedulebackup &>/dev/null && \
 
           zmschedulebackup -q | \
 
           zmschedulebackup -q | \
           egrep -o "i([[:space:]]+[0-9\*\-]+){5}" &>/dev/null; then
+
           grep -Eo "i([[:space:]]+[0-9\*\-]+){5}" &>/dev/null; then
 
         echo "true"
 
         echo "true"
 
       else
 
       else
Line 622: Line 933:
 
This will rollover if the size of the redo log is over 1KB after 30 mins, which is very likely unless the mail server is not sending or receiving any mail at all during this time.
 
This will rollover if the size of the redo log is over 1KB after 30 mins, which is very likely unless the mail server is not sending or receiving any mail at all during this time.
  
You may want to reduce the zimbraRedoLogRolloverMinFileAge even further while setting and testing up this script just so you don't have to wait too long to see stuff happening between the severs.
+
You may want to reduce the zimbraRedoLogRolloverMinFileAge even further while setting up and testing this script so that you don't have to wait too long to see activity between the servers.
  
 
==Mirror Server==
 
==Mirror Server==
 
The mirror server should ideally have the same operating system as the live server and must have exactly the same version of Zimbra installed.
 
The mirror server should ideally have the same operating system as the live server and must have exactly the same version of Zimbra installed.
 +
 +
The hostname ''must'' also be exactly the same.
  
 
=====Install inotify-tools=====
 
=====Install inotify-tools=====
Line 655: Line 968:
  
 
The following rsync command is run on the '''mirror''' server. Substitute the hostname or IP address of the live server as required in the command below.
 
The following rsync command is run on the '''mirror''' server. Substitute the hostname or IP address of the live server as required in the command below.
 +
 +
Note: Copying the sparse files used by LDAP in Zimbra 8+ takes a very long time even though the files are small.
  
 
As user '''root''':
 
As user '''root''':
 
<pre>
 
<pre>
 
service zimbra stop
 
service zimbra stop
rsync -aHz --force --delete live_server:/opt/zimbra/ /opt/zimbra/
+
rsync -aHz --force --delete --sparse live_server:/opt/zimbra/ /opt/zimbra/
 
</pre>
 
</pre>
  
Line 669: Line 984:
  
 
On the '''mirror''' server as user '''root''':
 
On the '''mirror''' server as user '''root''':
<pre>rsync -aHz --force --delete live_server:/opt/zimbra/ /opt/zimbra/</pre>
+
<pre>rsync -aHz --force --delete --sparse live_server:/opt/zimbra/ /opt/zimbra/</pre>
  
 
On the '''live''' server as user '''root''':
 
On the '''live''' server as user '''root''':
Line 682: Line 997:
 
Not only have we copied all the Zimbra data from the live to mirror server, we have also copied the script and SSH keys. We should now be able to try running the script.
 
Not only have we copied all the Zimbra data from the live to mirror server, we have also copied the script and SSH keys. We should now be able to try running the script.
  
As user '''zimbra''':
+
On the '''mirror''' server as user '''zimbra''':
 
<pre>
 
<pre>
 
cd /opt/zimbra/live_sync
 
cd /opt/zimbra/live_sync
Line 692: Line 1,007:
 
<pre>tail -f log/live_sync.log</pre>
 
<pre>tail -f log/live_sync.log</pre>
 
(CTRL-C to exit tail command)
 
(CTRL-C to exit tail command)
 +
 +
===Init script===
 +
I have also created an init script that may be useful on Ubuntu and Redhat/CentOS.
 +
==== CentOS/Redhat ====
 +
Copy the below script as /etc/init.d/zimbra_live_sync.
 +
 +
On both the live and the mirror server make the script executable and add the script to chkconfig.
 +
<pre>chmod 755 /etc/init.d/zimbra_live_sync
 +
chkconfig --add zimbra_live_sync</pre>
 +
On the live server make sure it doesn't start on boot.
 +
<pre>chkconfig zimbra_live_sync off</pre>
 +
On the mirror server ensure that Zimbra doesn't start on boot but the live sync script does.
 +
<pre>chkconfig zimbra off
 +
chkconfig zimbra_live_sync on</pre>
 +
 +
==== Ubuntu ====
 +
Copy the below script as /etc/init.d/zimbra_live_sync.
 +
 +
On both the live and the mirror server make the script executable.
 +
<pre>chmod 755 /etc/init.d/zimbra_live_sync</pre>
 +
On the mirror server ensure Zimbra doesn't start on boot but the live sync script does.
 +
<pre>update-rc.d -f zimbra remove
 +
update-rc.d zimbra_live_sync defaults</pre>
 +
 +
==== The init script ====
 +
 +
<pre>
 +
#!/bin/bash
 +
#
 +
# ***** BEGIN LICENSE BLOCK *****
 +
#    This program is free software: you can redistribute it and/or modify
 +
#    it under the terms of the GNU General Public License as published by
 +
#    the Free Software Foundation, either version 3 of the License, or
 +
#    (at your option) any later version.
 +
 +
#    This program is distributed in the hope that it will be useful,
 +
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
 +
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +
#    GNU General Public License for more details.
 +
 +
#    You should have received a copy of the GNU General Public License
 +
#    along with this program.  If not, see <http://www.gnu.org/licenses/>
 +
# ***** END LICENSE BLOCK *****
 +
#
 +
#
 +
# Init file for zimbra live sync
 +
#
 +
# chkconfig: 345 99 01
 +
# description: Zimbra live sync service
 +
#
 +
### BEGIN INIT INFO
 +
# Provides:      zimbra_live_sync
 +
# Required-Start: $network $remote_fs $syslog $time nscd cron
 +
# Required-Stop:  $network $remote_fs $syslog $time
 +
# Default-Start:  3 5
 +
# Description:    Zimbra live sync service
 +
### END INIT INFO
 +
 +
case "$1" in
 +
        gentle-restart)
 +
                su - zimbra -c "live_sync/live_syncd stop"
 +
                su - zimbra -c "live_sync/live_syncd start"
 +
                RETVAL=$?
 +
                ;;
 +
        restart)
 +
                su - zimbra -c "live_sync/live_syncd kill"
 +
                su - zimbra -c "live_sync/live_syncd start"
 +
                RETVAL=$?
 +
                ;;
 +
        start)
 +
                su - zimbra -c "live_sync/live_syncd start"
 +
                RETVAL=$?
 +
                ;;
 +
        gentle-stop)
 +
                su - zimbra -c "live_sync/live_syncd stop"
 +
                RETVAL=$?
 +
                ;;
 +
        stop)
 +
                su - zimbra -c "live_sync/live_syncd kill"
 +
                RETVAL=$?
 +
                ;;
 +
        status)
 +
                su - zimbra -c "live_sync/live_syncd status"
 +
                RETVAL=$?
 +
                ;;
 +
        *)
 +
                echo $"Usage: $0 {start|stop|restart|gentle-stop|gentle-restart|status}"
 +
                RETVAL=1
 +
                ;;
 +
esac
 +
exit $RETVAL
 +
</pre>
  
 
==Failover==
 
==Failover==
  
If the live server fails then the procedure on the mirror server is simply to stop the live_sync script then start Zimbra.
+
If the live server fails then the procedure on the '''mirror''' server is simply to stop the live_sync script then start Zimbra.
  
 
<pre>su - zimbra
 
<pre>su - zimbra
Line 702: Line 1,109:
 
zmcontrol start</pre>
 
zmcontrol start</pre>
  
==Failback==
+
==Fallback==
  
 
Simply run the script on the server you are failing back to, i.e. the live and mirror roles are now reversed.
 
Simply run the script on the server you are failing back to, i.e. the live and mirror roles are now reversed.
  
As user '''zimbra''':
+
As user '''zimbra''' (on the ex-live server that is being restored to live):
 
<pre>
 
<pre>
 
cd /opt/zimbra/live_sync
 
cd /opt/zimbra/live_sync
Line 723: Line 1,130:
  
 
==Warm or very warm standby==
 
==Warm or very warm standby==
To replay redo logs only requires that the mailbox process is stopped. This is done automatically by the script. The script will work whether Zimbra has been started on the mirror or not as it will enable or disable services as and when it needs them. Keep the rest of Zimbra running will drastically reduce the time it takes to fail over. This is only an advantage when access to the server domain can be quickly flipped or has a failover mechanism.
+
To replay redo logs only requires that the mailbox process is stopped. This is done automatically by the script. The script will work whether Zimbra has been started on the mirror or not as it will enable or disable services as and when it needs them. Keeping the rest of Zimbra running will drastically reduce the time it takes to fail over. This is only an advantage when access to the server domain can be quickly flipped or has a failover mechanism.
  
 
==How and why it all works==
 
==How and why it all works==
Line 734: Line 1,141:
 
If the redolog can be piped to a mirror server in real time then all the mirror server has to do is keep replaying the logs every so often and it will keep the same state as the live server. The only other thing to keep up to date is the LDAP database. Fortunately, the LDAP database doesn't change that often so it is quite easy to keep it synced on a directory level.
 
If the redolog can be piped to a mirror server in real time then all the mirror server has to do is keep replaying the logs every so often and it will keep the same state as the live server. The only other thing to keep up to date is the LDAP database. Fortunately, the LDAP database doesn't change that often so it is quite easy to keep it synced on a directory level.
  
The easiest way to transfer the redologs is to use rsync. The only problem with that is that rsync does not run continuously. It also won't handle the archiving of redo.log very efficiently. When redo.log is renamed and moved to the archive rsync will delete it at the remote location then transfer it all over again to its new location in the archive. If we can catch this move taking place then we can move and rename the file on the mirror server before running rsync. Then rsync has very little to do, in theory, nothing. Another issue with rsync is that the file may be in the process of being written to when it is copied. This results in an incomplete file at the mirror. However, redologs are only ever appended to and so only the last record will be corrupted. Zimbra is designed to be tolerant of redolog corruption otherwise it would be of limited use as a disaster recovery tool.
+
The easiest way to transfer the redologs is to use rsync. The only problem with that is that rsync does not run continuously. It also won't handle the archiving of redo.log very efficiently. When redo.log is renamed and moved to the archive rsync will delete it at the remote location then transfer it all over again to its new location in the archive. If we can catch this move taking place then we can move and rename the file on the mirror server before running rsync. Then rsync has very little to do, in theory, nothing except delete any files that have been purged. Another issue with rsync is that the file may be in the process of being written to when it is copied. This results in an incomplete file at the mirror. However, redologs are only ever appended to and so only the last record will be corrupted. Zimbra is designed to be tolerant of redolog corruption otherwise it would be of limited use as a disaster recovery tool.
 +
 
 +
To keep the redolog live, the "tail -f" command is used over ssh to pipe the file to the mirror. By calling "tail -f -c +0" it tails right back to the zeroth byte of the file, effectively a copy-then-stream command.
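As a minimal sketch of the copy-then-stream behaviour, the following self-contained bash function tails a temporary file from its first byte and keeps following it while more data is appended. The function name copy_then_stream_demo, the temporary files and the sleep intervals are illustrative assumptions and not part of live_syncd; the real script runs tail over ssh against /opt/zimbra/redolog/redo.log.

```shell
#!/bin/bash
# Illustrative only: demonstrates "tail -c +0 -f" on throwaway temp files.
copy_then_stream_demo () {
  local src dst tail_pid
  src=$( mktemp )
  dst=$( mktemp )
  # Data that already exists before tail starts (the "copy" part)
  printf 'record1\n' >"$src"
  # -c +0 starts at the first byte of the file, -f keeps following it
  tail -c +0 -f "$src" >"$dst" 2>/dev/null &
  tail_pid=$!
  sleep 1
  # Data appended while tail is running (the "stream" part)
  printf 'record2\n' >>"$src"
  sleep 2
  kill "$tail_pid" 2>/dev/null
  wait "$tail_pid" 2>/dev/null
  cat "$dst"
  rm -f "$src" "$dst"
}

copy_then_stream_demo
```

Both the record written before tail started and the record appended afterwards should end up in the destination file, which is the property the sync relies on.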
 +
 
 +
=====Redolog purging=====
 +
If a Network edition is detected and incremental backups are enabled then the redologs are replayed before any rsync is performed as well as after. This ensures that everything is replayed before the files all disappear to the backup directory.
  
To keep the redolog live, the "tail -f" command is used over ssh to pipe the file to the mirror. By call "tail -f -c +0" it tails right back to the zeroth byte of the file, effectively a copy-then-stream command.
+
For the open source edition, or Network edition with no incremental backups scheduled, the redologs are purged if they are more than a day old and have been replayed. If the mirror server is down then the redologs will just accumulate on the live server to be replayed when the sync process is restarted before being purged.
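The age-based purge can be sketched with the -mtime test of find, which is the test the purge branch of sync_commands uses. The function name purge_older_than and the scratch directory are assumptions for illustration only. Note that find counts whole 24-hour periods and discards fractions, so "-mtime +1" only matches files last modified at least two full days ago.

```shell
#!/bin/bash
# Illustrative only: purge redolog-style files by age in a scratch directory.
purge_older_than () {
  local dir="$1" days="$2"
  # Fractions of a day are discarded, so +1 means "at least two days old"
  find "$dir" -type f -name 'redo*.log' -mtime +"$days" -exec rm -f {} \;
}

scratch=$( mktemp -d )
touch -d '3 days ago' "$scratch/redo-old.log"   # GNU touch date syntax
touch "$scratch/redo-new.log"
purge_older_than "$scratch" 1
ls "$scratch"
rm -rf "$scratch"
```

Restricting the match to regular files named redo*.log keeps the archive directory itself, and anything unrelated in it, out of reach of the rm.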
  
 
=====LDAP=====
 
=====LDAP=====
LDAP stores it's data in /opt/zimbra/data/ldap. This can be copied using rsync to the mirror as long as no changes take place during the copy. The directory is monitored for this during the rsync operation and repeated if there was any change during that time. I am slightly concerned as to how this may pan out on really busy servers but seems viable on the low hundreds of users that I have tested.
+
LDAP stores its data in /opt/zimbra/data/ldap. This can be copied using rsync to the mirror as long as no changes take place during the copy. On versions of Zimbra older than 8.0, the directory is monitored for this during the rsync operation and the copy is repeated if there was any change during that time. For Zimbra 8+ the directory is monitored as before but changes are transferred using an LDIF export and import. This is necessitated by the long time it takes to rsync the sparse files that LDAP now uses to store data. Yes, it would be neater for old versions to use the LDIF method too but it wasn't broke so I didn't risk fixing it yet.
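The copy-and-repeat idea used for the older Zimbra versions can be sketched as below. This is not the author's implementation: live_syncd watches the directory with inotify and copies with rsync, whereas this sketch swaps in a checksum fingerprint and cp so that it runs without extra tools. The names dir_fingerprint and copy_until_stable and the attempt limit are hypothetical.

```shell
#!/bin/bash
# Illustrative only: repeat a directory copy until the source did not change during it.
dir_fingerprint () {
  # Checksum over every file's checksum, size and relative path, in a stable order
  ( cd "$1" && find . -type f -exec cksum {} + | sort | cksum )
}

copy_until_stable () {
  local src="$1" dst="$2" max_attempts="${3:-5}" attempt before after
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    before=$( dir_fingerprint "$src" )
    mkdir -p "$dst"
    cp -a "$src/." "$dst/"
    after=$( dir_fingerprint "$src" )
    # An unchanged fingerprint means the copy is assumed to be consistent
    [ "$before" == "$after" ] && return 0
  done
  # Source kept changing for every attempt
  return 1
}
```

A caller would run something like copy_until_stable "$source_dir" "$target_dir" and treat a non-zero return as "try again later". Unlike rsync with --delete, cp does not remove files from the target that have disappeared from the source, so this only illustrates the retry logic.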
 +
 
 +
==Known issues==
 +
* If the connection breaks at the very moment that the live stream of the redo.log starts, before the tail command reaches the point where it is tailing instead of cataloguing the file, then some of the redo.log will not make it to the mirror, resulting in some loss of transactions. Fortunately, this is only ever likely just after the log has rolled over so the worst-case losses should be minimal.
 +
* LDAP is only checked every ten minutes so some losses are possible if the connection breaks in that time. However, LDAP isn't expected to change very often unless something major like a batch account migration is taking place.
 +
* If any redologs go missing, or can't be replayed successfully for any reason, then there will be gaps in the synced email events. Mail may go missing on the mirror server. Check the live_sync logs for any log sequence numbers that appear not to have been transferred and processed. In the case of suspected data loss, stop all services and repeat the initial rsync process.
 +
 
 +
==Nagios Integration==
 +
The script now generates useful status information for Nagios. The time since the last successful operations is measured and Nagios can raise an alert if any part of the script appears not to have succeeded for longer than expected.
 +
 
 +
The Nagios script is to be run on the currently Live server and can be run as any user as long as that user has read access to the status files created by live_syncd. These are in /opt/zimbra/live_sync/status.
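The decision the plug-in makes for each status file can be sketched as follows: the age of the file is rounded down to whole hours and compared with the warning and critical thresholds. The function name status_for_age is hypothetical, and this simplified sketch handles a single file, whereas the plug-in itself combines the results for all five status files.

```shell
#!/bin/bash
# Illustrative only: map an age in hours to a Nagios status name and exit code.
status_for_age () {
  local age_hours="$1" warn_hours="$2" crit_hours="$3"
  if [ "$age_hours" -ge "$crit_hours" ]; then
    echo "Critical"
    return 2
  elif [ "$age_hours" -ge "$warn_hours" ]; then
    echo "Warning"
    return 1
  fi
  echo "OK"
  return 0
}

# Age of a freshly created file in whole hours (integer division by 3600)
probe=$( mktemp )
age=$(( ( $( date +%s ) - $( stat -c '%Y' "$probe" ) ) / 3600 ))
status_for_age "$age" 2 6
rm -f "$probe"
```

Because the age is truncated to whole hours, a threshold of 2 only triggers once a status file is at least two full hours old.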
 +
 
 +
<pre>
 +
#!/bin/bash
 +
#    This program is free software: you can redistribute it and/or modify
 +
#    it under the terms of the GNU General Public License as published by
 +
#    the Free Software Foundation, either version 3 of the License, or
 +
#    (at your option) any later version.
 +
 
 +
#    This program is distributed in the hope that it will be useful,
 +
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
 +
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +
#    GNU General Public License for more details.
 +
 
 +
#    You should have received a copy of the GNU General Public License
 +
#    along with this program.  If not, see <http://www.gnu.org/licenses/>
 +
 
 +
##########################################################################
 +
# Title      :  check_zimbra_live_sync
 +
# Author    :  Simon Blandford <simon -at- onepointltd -dt- com>
 +
# Date      :  2012-08-25
 +
# Requires  :  live_syncd
 +
# Category  :  Administration
 +
# Version    :  2.0.0
 +
# Copyright  :  Simon Blandford, Onepoint Consulting Limited
 +
# License    :  GPLv3 (see above)
 +
##########################################################################
 +
# Description
 +
# Nagios plug-in for Zimbra live sync script
 +
##########################################################################
 +
 
 +
#******************************************************************************
 +
#********************** Constants *********************************************
 +
#******************************************************************************
 +
 
 +
ZIMBRA_DIR="/opt/zimbra"
 +
BASE_DIR="$ZIMBRA_DIR""/live_sync"
 +
STATUS_DIR="$BASE_DIR""/status"
 +
 
 +
#Files that need age testing
 +
LAST_GOOD_REDO_REPLAY="$STATUS_DIR""/last_good_redo_replay"
 +
LAST_GOOD_REDO_SYNC="$STATUS_DIR""/last_good_redo_sync"
 +
LAST_GOOD_REDO_STREAM="$STATUS_DIR""/last_good_redo_stream"
 +
LAST_GOOD_LDAP_SYNC="$STATUS_DIR""/last_good_ldap_sync"
 +
LAST_GOOD_LDAP_START="$STATUS_DIR""/last_good_ldap_start"
 +
 
 +
#******************************************************************************
 +
#********************** Functions *********************************************
 +
#******************************************************************************
 +
 
 +
usage () {
 +
  echo "Usage: check_zimbra_live_sync -w <hours> -c <hours>"
 +
  exit 3
 +
}
 +
 
 +
file_age () {
 +
  echo $(( ($( date +%s) - $( stat -c "%Y" "$1" )) / 3600 ))
 +
}
 +
 
 +
#Extract name of function being tested from file name
 +
function_id () {
 +
  echo "$1" | grep -Eo "[^_]*_[^_]*$"
 +
}
 +
 
 +
file_report () {
 +
  local file_under_test
 +
  file_under_test="$1"
 +
  age_of_file_under_test=$( file_age "$file_under_test" 2>/dev/null )
 +
 
 +
  #No file returns UNKNOWN status
 +
  if [ ! -f "$file_under_test" ]; then
 +
    if [ ${#affected_function_list} -gt 1 ] && [ $status_code -eq 3 ]; then
 +
      affected_function_list="$affected_function_list"",""$( function_id "$file_under_test" )"
 +
    else
 +
      affected_function_list="$( function_id "$file_under_test" )"
 +
    fi
 +
    status_code=3
 +
    status_msg="Unknown"
 +
    return
 +
  fi
 +
 
 +
  #Test for files older than critical number of hours. Only test if no unknown status
 +
  if [ $status_code -lt 3 ]; then
 +
    if [ $age_of_file_under_test -ge $c ]; then
 +
      if [ ${#affected_function_list} -gt 1 ] && [ $status_code -eq 2 ]; then
 +
        affected_function_list="$affected_function_list"",""$( function_id "$file_under_test" )"
 +
        affected_function_list="$affected_function_list""(""$age_of_file_under_test"" hours)"
 +
      else
 +
        affected_function_list="$( function_id "$file_under_test" )""(""$age_of_file_under_test"" hours)"
 +
      fi
 +
      status_code=2
 +
      status_msg="Critical"
 +
      return 
 +
    fi
 +
  fi
 +
 
 +
  #Test for files older than warning number of hours. Only test if no unknown or critical status
 +
  if [ $status_code -lt 2 ]; then
 +
    if [ $age_of_file_under_test -ge $w ]; then
 +
      if [ ${#affected_function_list} -gt 1 ] && [ $status_code -eq 1 ]; then
 +
        affected_function_list="$affected_function_list"",""$( function_id "$file_under_test" )"
 +
        affected_function_list="$affected_function_list""(""$age_of_file_under_test"" hours)"
 +
      else
 +
        affected_function_list="$( function_id "$file_under_test" )""(""$age_of_file_under_test"" hours)"
 +
      fi
 +
      status_code=1
 +
      status_msg="Warning"
 +
      return 
 +
    fi
 +
  fi
 +
}
 +
 
 +
#******************************************************************************
 +
#********************** Main Program ******************************************
 +
#******************************************************************************
 +
 
 +
status_code=0
 +
status_msg="OK"
 +
affected_function_list="Live sync up to date"
 +
 
 +
for i in $@; do
 +
  if [ $get_w ]; then
 +
    w=$( echo "$i" | grep -Eo "[0-9]+" );
 +
    unset get_w
 +
  fi
 +
  if [ $get_c ]; then
 +
    c=$( echo "$i" | grep -Eo "[0-9]+" );
 +
    unset get_c
 +
  fi
 +
  [ "x""$i" == "x-w" ] && get_w=1
 +
  [ "x""$i" == "x-c" ] && get_c=1
 +
done
 +
 
 +
if [ ! $w ] || [ ! $c ]; then
 +
  usage
 +
fi
 +
 
 +
file_report "$LAST_GOOD_REDO_REPLAY"
 +
file_report "$LAST_GOOD_REDO_SYNC"
 +
file_report "$LAST_GOOD_REDO_STREAM"
 +
file_report "$LAST_GOOD_LDAP_SYNC"
 +
file_report "$LAST_GOOD_LDAP_START"
 +
 
 +
echo "$status_msg"" : ""$affected_function_list"
 +
exit $status_code
 +
</pre>
 +
 
 +
I use the following line in /etc/nagios/nrpe.cfg or in the /etc/nagios/nrpe.d directory to allow the script to be called by nrpe.
 +
<pre>command[check_zimbra_live_sync]=/usr/lib/nagios/plugins/contrib/check_zimbra_live_sync -w $ARG1$ -c $ARG2$</pre>
 +
You may need to adjust the path to the nagios plugin script depending on where the contributed scripts are stored.
 +
 
 +
==Change log==
 +
<pre>
 +
Version 2.1.5 2016-05-18
 +
 
 +
1) egrep and fgrep are deprecated. Change to grep -E and grep -F where used.
 +
 
 +
Version 2.1.4 2013-03-12
 +
 
 +
1) HSM detection needs ldap, mailbox and mysql. Now done once at start.
 +
 
 +
Version 2.1.2 2013-03-12
 +
 
 +
1) Added check to ensure mailboxd is kept running if HSM is being used
 +
 
 +
2) Suspend ldap/mailbox updates while HSM job is running
 +
 
 +
Version 2.1.1 2013-02-12
 +
 
 +
1) Change grep -o "[0-9]*" to grep -o "[0-9]+" for RHEL in sync_commands
 +
 
 +
Version 2.1.0 2013-02-11
 +
 
 +
1) Compatible with Zimbra version 8
 +
 
 +
2) Kill hanging sync_commands process on live server
 +
 
 +
3) Rename LDAP_DIR constant to LDAP_TEMP_DIR
 +
 
 +
4) Make rsync of ldap database aware of sparse files
 +
 
 +
5) For Zimbra version 8 and above use ldif export/import instead of rsync
 +
 
 +
Version 2.0.1 2013-19-12
 +
 
 +
1) Preserve mailboxd running state to allow warmer standby and functioning HSM
 +
 
 +
2) Don't refuse to start because of stale pid files after a system reboot
 +
 
 +
 
 +
Version 2.0.0 2012-08-29
 +
 
 +
1) Started changes log :)
 +
 
 +
2) Constants in upper case, variables in lower case
 +
 
 +
3) Exclude ldap log changes from inotify watch
 +
 
 +
4) Nagios plug-in to monitor time since last successful operation
 +
 
 +
5) Redologs are archived once successfully replayed
 +
 
 +
6) Redologs are retrieved from incremental backups on Zimbra Network edition
 +
 
 +
7) Redologs are purged independently after specified time on both servers
 +
 
 +
8) Magic path names and numbers now defined in constants
 +
 
 +
9) Log reporting by level and with filtering
 +
 
 +
10) Script aware of the java redolog replay process
 +
 
 +
11) convertd runs during redolog replay if enabled so indexing will work
 +
</pre>
  
 
[[Category:Backup and Restore]]
 
[[Category:Backup and Restore]]
 +
[[Category:Disaster Recovery]]
 +
[[Category:Administration]]

Latest revision as of 11:35, 6 November 2018



This article describes the steps to move a ZCS server to a new physical or virtual server. This wiki article is NOT supported by the Zimbra Support team for Network Edition Customers. The only two supported methods are those described on the Network_Edition_Disaster_Recovery and Ajcody-Notes-Server-Move wiki pages. Server moves that do not follow these two wiki pages will not be supported by the Zimbra Support team.

Zimbra version and platform

This script was developed and tested on Release 7 and 8, both open source and network editions on Redhat/CentOS 5 and Ubuntu 10.04LTS and 12.04LTS.


The original author has discontinued support for this solution. Development has been taken over and is now available on GitLab: https://gitlab.com/yetopen/zimbra-live-sync

Introduction

This is an experimental solution for providing near-live synchronisation between two Zimbra servers, so that one of them is live and the other is kept in a warm or very warm standby state.

The system is symmetrical. The sync can work in reverse when the mirror server becomes the active server. This allows easy fall-back to the original server once the failover condition is resolved.

Features:

  • LDAP, message store, indexes and metadata kept in sync
  • Mirror server can be brought online in a few minutes
  • Live sync of redolog
  • Minimum bandwidth, only changes to ldap and redolog transferred
  • All communication and data over SSH with unique key
  • Operates as "zimbra" user
  • Works on both Open Source and Network editions
  • Sync can work in either direction but only one way at a time

Preparation

Exactly the same version of Zimbra must be installed on both the live and mirror server. To start with we work on the live server. There is no need to stop Zimbra for most of the install. Only a short amount of down-time will need to be scheduled later to perform a final rsync operation between the two servers.

Install inotify-tools

For Redhat/Centos this is...

As user root:

yum install -y inotify-tools

For Ubuntu this is...

As user root:

apt-get install inotify-tools

Create log rotation

The script will create a log file which can be handled by logrotate.

As user root:

echo "/opt/zimbra/live_sync/log/live_sync.log {
    daily
    missingok
    copytruncate
    rotate 7
    notifempty
    compress
}">/etc/logrotate.d/zimbra_live_sync
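Before relying on the new rule, it may be worth checking that logrotate accepts it. The following is a minimal sketch and not part of the original scripts: the function name dry_run_rule is hypothetical, and it relies on logrotate's -d (debug) option, which reports what would be done without rotating anything.

```shell
# Dry-run a logrotate rule file without touching any logs.
# dry_run_rule is a hypothetical helper used only for this illustration.
dry_run_rule () {
  local conf="$1"
  if ! command -v logrotate >/dev/null 2>&1; then
    echo "logrotate is not installed" >&2
    return 1
  fi
  if [ ! -f "$conf" ]; then
    echo "rule file $conf not found" >&2
    return 1
  fi
  # -d turns on debug mode: nothing is rotated and no state is saved
  logrotate -d "$conf"
}

# Run as root after creating the rule; "|| true" keeps the sketch harmless
# on machines where the rule has not been created yet.
dry_run_rule /etc/logrotate.d/zimbra_live_sync || true
```

If the rule is accepted, logrotate describes how it would handle live_sync.log; syntax errors in the rule are reported instead.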

Create application directory

The script will live under the /opt/zimbra directory.

As user root:

mkdir /opt/zimbra/live_sync
chown zimbra:zimbra /opt/zimbra/live_sync

SSH key

Create the SSH key and just press return every time you are prompted for a passphrase.

As user zimbra:

cd /opt/zimbra/.ssh
ssh-keygen -b 4096 -f live_sync
echo "command=\"/opt/zimbra/live_sync/sync_commands\" $( cat live_sync.pub )">>authorized_keys
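The command= prefix is what restricts this key: OpenSSH runs the named program for every login made with the key, whatever the client asked for, and passes the client's request in the SSH_ORIGINAL_COMMAND variable, which sync_commands inspects. A minimal sketch of how such an entry is put together follows; forced_entry and the key text are placeholders for illustration only.

```shell
# Build an authorized_keys entry that forces every login made with the
# given public key to run one fixed command (OpenSSH "command=" option).
# forced_entry is a hypothetical helper, not part of the original scripts.
forced_entry () {
  local forced_cmd="$1" pubkey="$2"
  printf 'command="%s" %s\n' "$forced_cmd" "$pubkey"
}

# Placeholder key material, not a real key.
demo_key="ssh-rsa AAAAB3NzaExample zimbra@live"
forced_entry "/opt/zimbra/live_sync/sync_commands" "$demo_key"
```

The printed line has the same shape as the one appended to authorized_keys above: the forced command in quotes, followed by the unmodified public key.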

Main script

The following script should be saved as live_syncd in the /opt/zimbra/live_sync directory. This should be owned by user zimbra and made executable.

#!/bin/bash
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.

#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.

#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>

##########################################################################
# Title      :  live_syncd
# Author     :  Simon Blandford <simon -at- onepointltd -dt- com>
# Date       :  2013-03-12
# Requires   :  zimbra sync_commands inotify-tools
# Category   :  Administration
# Version    :  2.1.5
# Copyright  :  Simon Blandford, Onepoint Consulting Limited
# License    :  GPLv3 (see above)
##########################################################################
# Description
# Keep two Zimbra servers synchronised in near-realtime
##########################################################################


#******************************************************************************
#********************** Constants *********************************************
#******************************************************************************

LOG_LEVEL=5
REDO_LOG_HISTORY_DAYS=10
ERROR_CLEAR_MINUTES=10
LDAP_CHECK_MINUTES_INTERVAL=10

ZIMBRA_DIR="/opt/zimbra"
BASE_DIR="$ZIMBRA_DIR""/live_sync"
LOCKING_DIR="$BASE_DIR""/lock"
PID_DIR="$BASE_DIR""/pid"
LOG_DIR="$BASE_DIR""/log"
LOG_FILE="$LOG_DIR""/live_sync.log"
LDAP_TEMP_DIR="$BASE_DIR""/ldap"
LDAP_TEMP_LDIF="$BASE_DIR""/ldif.bak"
STATUS_DIR="$BASE_DIR""/status"
SSH_IDENTITY_FILE="$ZIMBRA_DIR""/.ssh/live_sync"
REDOLOG_DIR="$ZIMBRA_DIR""/redolog"
REDO_LOG_FILE="$REDOLOG_DIR""/redo.log"
ARCHIVE_DIR="$REDOLOG_DIR""/archive"
LIVE_SYNC_ARCHIVE_DIR="$REDOLOG_DIR""/live_sync_archives"
LDAP_DATA_DIR="$ZIMBRA_DIR""/data/ldap/"
BACKUP_DIR="$ZIMBRA_DIR""/backup"
SYNC_COMMANDS_SCRIPT="$BASE_DIR""/sync_commands"
SSH="ssh -i ""$SSH_IDENTITY_FILE"" -o StrictHostKeyChecking=no -o CheckHostIP=no"\
" -o PreferredAuthentications=hostbased,publickey"
LOCK_STATE_DIR="$LOCKING_DIR""/live_sync.lock"
STOP_FILE="$STATUS_DIR""/live_sync.stop"
LAST_GOOD_REDO_REPLAY="$STATUS_DIR""/last_good_redo_replay"
LAST_GOOD_REDO_SYNC="$STATUS_DIR""/last_good_redo_sync"
LAST_GOOD_REDO_STREAM="$STATUS_DIR""/last_good_redo_stream"
LAST_GOOD_LDAP_SYNC="$STATUS_DIR""/last_good_ldap_sync"
LAST_GOOD_LDAP_START="$STATUS_DIR""/last_good_ldap_start"
WATCHES_FILE="$STATUS_DIR""/watches"
PID_FILE_LDAP="$PID_DIR""/ldap_live_sync.pid"
PID_FILE_REDO="$PID_DIR""/redo_log_live_sync.pid"
CONF_FILE="$BASE_DIR""/live_sync.conf"

#******************************************************************************
#********************** Functions *********************************************
#******************************************************************************

#Format for log output with errors and warnings going to >&2
logit () {
  logit_1 () {
    echo -n "$( date ) :"
    case $msg_level in
      1)
        echo -n "Error :"
        ;;
      2)
        echo -n "Warning :"
        ;;
      3)
        echo -n "Info :"
        ;;
      4)
        echo -n "Debug :"
        ;;
    esac
    echo $@
  }
  local msg_level output_chan
  if [ $1 -le $LOG_LEVEL ]; then
    msg_level=$1
    shift
    if [ $msg_level -le 2 ]; then
      logit_1 $@ >&2
    else
      logit_1 $@
    fi
  fi
}

#Detect HSM
detect_hsm () {
  local retval
  #LDAP must be running
  ldap status &>/dev/null || ldap start &>/dev/null
  #MySQL must be running
  mysql.server status &>/dev/null || mysql.server start &>/dev/null
  #Preserve mailbox running state
  zmmailboxdctl status &>/dev/null
  prev_zmmailbox_status=$?
  zmmailboxdctl start &>/dev/null
  zmvolume -l | grep "type: secondaryMessage" >/dev/null
  retval=$?
  if [ $prev_zmmailbox_status -ne 0 ]; then
    zmmailboxdctl stop &>/dev/null
  fi
  return $retval
}

#Ensure ldap, convertd and mysql servers are running and then replay redo logs
replay_redo_logs () {
  local server_failed

  ldap status &>/dev/null || ldap start &>/dev/null
  mysql.server status &>/dev/null || mysql.server start &>/dev/null
  server_failed=0
  if ! ldap status &>/dev/null; then
    logit 1 "Start of local ldap server failed"
    ldap status >&2
    #Return error to trigger a "break" in while loop
    server_failed=1
  fi
  if ! mysql.server status &>/dev/null; then
    logit 1 "Start of local mysql server failed"
    mysql.server status >&2
    #Return error to trigger a "break" in while loop
    server_failed=1
  fi
  if [ "x""$convertd_enabled" == "xtrue" ]; then
    #Make sure indexing works while replaying redo log 
    zmconvertctl status &>/dev/null || zmconvertctl start &>/dev/null
    if ! zmconvertctl status &>/dev/null; then
      logit 2 "Start of local convertd servers failed"
      zmconvertctl status >&2
    fi
  fi
  [ $server_failed -eq 1 ] && return 1
  logit 3 "Replaying redologs..."
  if ! zmplayredo >/dev/null; then
    logit 2 "Replay of redolog failed"
    #No error returned here since "break" is not necessary
  else
    #If no errors then archive redo log files
    if ! mkdir -p "$LIVE_SYNC_ARCHIVE_DIR"; then
      logit 1 "Unable to create directory $LIVE_SYNC_ARCHIVE_DIR"
      exit 1
    fi
    mv -f "$ARCHIVE_DIR""/"* "$LIVE_SYNC_ARCHIVE_DIR""/" 2>/dev/null
    touch "$LAST_GOOD_REDO_REPLAY"
  fi
  logit 3 "Replaying redologs done"
  return 0
}

#The redo log sync daemon
redo_log_live_sync () {
  local stream_pid archived_file i archived_redo_log_file prev_zmmailbox_status secondary_storage

  logit 3 "Starting redo log live sync process"

  #Wait for lock directory to be successfully created
  while ! mkdir "$LOCK_STATE_DIR" &>/dev/null; do
    sleep 2
  done
  logit 3 "Detecting if HSM used"
  if detect_hsm; then
    logit 3 "HSM Detected"
    secondary_storage="yes"
  else
    logit 3 "No HSM Detected"
  fi
  rmdir "$LOCK_STATE_DIR"
  
  while [ ! -f "$STOP_FILE" ]; do
    while [ ! -f "$STOP_FILE" ]; do
      #Wait for lock directory to be successfully created
      while ! mkdir "$LOCK_STATE_DIR" &>/dev/null; do
        sleep 2
      done
      [ -f "$STOP_FILE" ] && break
      logit 3 "Syncing redologs..."
      #If incremental backups are enabled then gather redo logs from backups and copy
      #to local archive directory
      redo_sync_fail="false"
      for archived_redo_log_file in $( echo "gather""$REDO_LOG_HISTORY_DAYS" | \
          $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" ); do
        if [ -f "$LIVE_SYNC_ARCHIVE_DIR""/""$( basename "$archived_redo_log_file" )" ]; then
          logit 4 "Already processed so skipping: $archived_redo_log_file"
        else
          logit 4 "Syncing incremental backup file: $archived_redo_log_file"
          if ! rsync -z -e "$SSH" --size-only "$remote_address"":""$archived_redo_log_file" \
              "$ARCHIVE_DIR""/".; then
            logit 2 "Rsync of a redolog, $archived_redo_log_file, failed"
            redo_sync_fail="true"
          fi
        fi
      done

      #Suspend if HSM is running
      if which zmhsm >/dev/null && zmhsm -u | grep "Currently running" >/dev/null; then
        logit 3 "Replaying redologs is suspended while HSM process is active"
      else
      
        #Mailbox process must not be running now. Preserve state and stop.
        zmmailboxdctl status &>/dev/null
        prev_zmmailbox_status=$?
        if [ $prev_zmmailbox_status -eq 0 ]; then
          zmmailboxdctl stop &>/dev/null
        fi
        sleep 2
        if zmmailboxdctl status &>/dev/null; then
          logit 1 "Unable to stop local Zimbra mailbox service"
          return 1
        fi
      
        logit 4 "Syncing $REDO_LOG_FILE"
        if ! rsync -e "$SSH" -z \
          "$remote_address"":$REDO_LOG_FILE" "$REDO_LOG_FILE"; then
          logit 2 "Rsync of $REDO_LOG_FILE failed"
          redo_sync_fail="true"
        fi
        logit 4 "Syncing $REDO_LOG_FILE done"
        if [ "x""$redo_sync_fail" == "xfalse" ]; then
          touch "$LAST_GOOD_REDO_SYNC"
        else
          break
        fi
        logit 4 "Syncing redologs done"
        logit 4 "Purging redolog directory and archives"
        #Purge local redolog directory
        find $REDOLOG_DIR -mtime +$REDO_LOG_HISTORY_DAYS -type f -exec rm {} \;
        #Purge any interrupted rsync files
        find $REDOLOG_DIR -name '.redo*' -type f -exec rm {} \;
        logit 4 "Purge redolog directory and archives done"
        replay_redo_logs || break
      
        #Restore mailboxd to previous running state or start if HSM is being used
        if [ $prev_zmmailbox_status -eq 0 ] || \
            [ "x""$secondary_storage" == "xyes" ] >/dev/null; then
          logit 4 "Re-starting Zimbra mailbox service"
          zmmailboxdctl start &>/dev/null
          if ! zmmailboxdctl status &>/dev/null; then
            logit 2 "Unable to re-start local Zimbra mailbox service"
          fi
        fi
      fi
      
      #If there are no incremental backups then remote archive directory will need purging
      if [ "x""$incremental_backups" != "xtrue" ]; then
        logit 4 "Purging remote redolog directory"
        echo "purge""$REDO_LOG_HISTORY_DAYS" | \
          $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT"
        logit 4 "Purging remote redolog directory done"
      fi
      #Establish copy-and-live-stream of current redo.log file
      logit 4 "Live streaming redolog"
      echo stream | \
        $SSH "$remote_address" \
        "$SYNC_COMMANDS_SCRIPT" >"$REDO_LOG_FILE" &
      stream_pid=$!
      disown $stream_pid
      #Delay as PID was sometimes not being found if checked immediately
      sleep 5
      #If successfully established stream then sit and wait for move to archive
      if ps $stream_pid | grep "$SYNC_COMMANDS_SCRIPT" &>/dev/null; then
        logit 4 "Live streaming redolog established"
        touch "$LAST_GOOD_REDO_STREAM"
        #Remove lock file, this is resting point
        rmdir "$LOCK_STATE_DIR" &>/dev/null
        #Wait for name to be passed of new archive file after redo.log is moved on remote server
        #This is normal resting point of this process
        archived_file=$( echo wait_redo | \
          $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" | \
          tail -n 1 | grep -Eo "redo-.*log" )
        #Kill stream
        kill -KILL $( ps aux | grep "$SYNC_COMMANDS_SCRIPT" | \
          grep -v grep | awk '{print $2}' ) &>/dev/null
        #Mirror move operation on local server
        if echo "$archived_file" | grep -E "redo-.*log" &>/dev/null; then
          logit 4 "Moving redo.log to $archived_file"
          mv -f "$REDO_LOG_FILE" "$ARCHIVE_DIR""/""$archived_file" 2>/dev/null
        else
          logit 2 "Archive file name not found"
        fi
        [ -f "$STOP_FILE" ] && break
      else
        logit 2 "Failed to start redolog streaming, PID=$stream_pid"
        break
      fi
    done
    rmdir "$LOCK_STATE_DIR" &>/dev/null
    #Wait $ERROR_CLEAR_MINUTES minutes for error to clear
    i=0
    while [ $(( i++ )) -lt 60 ] && [ ! -f "$STOP_FILE" ]; do
      sleep $ERROR_CLEAR_MINUTES
    done
  done
  logit 3 "Ending redo log live sync process"
}

#The ldap sync daemon
ldap_live_sync () {
  local ldap_wait_pid i last_ldap_success_state

  last_ldap_success_state="false"
  
  logit 3  "Starting ldap live sync process"
  while [ ! -f "$STOP_FILE" ]; do
    while [ ! -f "$STOP_FILE" ]; do
      #Wait for lock directory to be successfully created
      while ! mkdir "$LOCK_STATE_DIR" &>/dev/null; do
        sleep 3
      done
      if [ $zimbra_version -lt 8 ]; then
        logit 3 "Syncing ldap using rsync"
        #Use rsync for Zimbra older than version 8
        while [ 1 ]; do
          #Check for changes during ldap sync operation
          echo wait_ldap | \
            $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" &>"$WATCHES_FILE" &
          ldap_wait_pid=$!
          disown $ldap_wait_pid
          if ! ps "$ldap_wait_pid" &>/dev/null; then
            logit 2 "Unable to establish watch on remote LDAP directory, no ldap sync performed"
            break
          fi
          #Wait for watches to be established
          while ! grep "established" "$WATCHES_FILE" &>/dev/null && \
              ps "$ldap_wait_pid" &>/dev/null; do
            sleep 1
          done
          #Echo out status
          cat "$WATCHES_FILE"
          rm -f "$WATCHES_FILE"
          
          
          #Rsync remote server to temporary local ldap directory
          if ! rsync -e "$SSH" -aHz --sparse --force --delete \
            "$remote_address"":$LDAP_DATA_DIR""/" "$LDAP_TEMP_DIR""/"; then
            logit 2 "Rsync of ldap failed"
            break
          else
            touch "$LAST_GOOD_LDAP_SYNC"
          fi
          ps $ldap_wait_pid &>/dev/null && break
          logit 3 "Ldap changed during rsync. Re-syncing."
          sleep 10
        done
        kill -KILL $ldap_wait_pid &>/dev/null
      else
        #Use ldif export for Zimbra 8 and over
        logit 3 "Syncing ldap using ldif"
        if ! echo dump_ldap | \
          $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" >"$LDAP_TEMP_LDIF"; then
          logit 2 "Unable to fetch remote LDIF, no LDAP sync performed"
          break
        else
          touch "$LAST_GOOD_LDAP_SYNC"
        fi
      fi
      if which zmhsm >/dev/null && zmhsm -u | grep "Currently running" >/dev/null; then
        logit 3 "LDAP update is suspended while HSM process is active"
      else
        #Stop ldap
        ldap status &>/dev/null && ldap stop &>/dev/null
        if ldap status &>/dev/null; then
          logit 1 "Unable to stop local ldap server"
          break
        fi
        if [ $zimbra_version -lt 8 ]; then
          #Use rsync for Zimbra older than version 8
          #rsync temporary local ldap directory to real local ldap directory
          rsync -aH --sparse "$LDAP_TEMP_DIR""/" "$LDAP_DATA_DIR""/"
        else
          #Use LDIF import for Zimbra 8 and over
          rm -rf "$LDAP_DATA_DIR""/mdb" && \
          mkdir -p "$LDAP_DATA_DIR""/mdb/db" && \
          mkdir -p "$LDAP_DATA_DIR""/mdb/log" && \
          /opt/zimbra/libexec/zmslapadd "$LDAP_TEMP_LDIF"
          if [ $? != 0 ]; then
            logit 2 "Unable to import LDIF into local LDAP"
            break
          fi
        fi
        #Restart ldap
        ldap status &>/dev/null || ldap start &>/dev/null
        if ! ldap status &>/dev/null; then
          logit 1 "Unable to restart local ldap server"
          last_ldap_success_state="false"
        else
          last_ldap_success_state="true"
        fi
        logit 4 "Syncing LDAP done"
      fi
      rmdir "$LOCK_STATE_DIR" &>/dev/null
      [ -f "$STOP_FILE" ] && break
      #Wait for change in remote ldap over $LDAP_CHECK_MINUTES_INTERVAL intervals
      echo wait_ldap | \
        $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" &
      ldap_wait_pid=$!
      disown $ldap_wait_pid
      while [ ! -f "$STOP_FILE" ]; do
        logit 4 "Start new LDAP monitor period"
        #Refresh the last-good timestamp so that a period without ldap
        #changes is not interpreted by Nagios as a lack of ldap success.
        if [ "x""$last_ldap_success_state" == "xtrue" ]; then
          touch "$LAST_GOOD_LDAP_START"
        fi
        #Restart wait for ldap change if required
        if ! ps $ldap_wait_pid &>/dev/null; then
          echo wait_ldap | \
            $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" &
          ldap_wait_pid=$!
          disown $ldap_wait_pid
        fi
        #Wait $LDAP_CHECK_MINUTES_INTERVAL minutes
        i=0
        while [ $(( i++ )) -lt 60 ] && [ ! -f "$STOP_FILE" ]; do
          sleep $LDAP_CHECK_MINUTES_INTERVAL
        done
        #If wait process is not still running then there was a change
        ps $ldap_wait_pid &>/dev/null || break
      done
    done
    rmdir "$LOCK_STATE_DIR" &>/dev/null
    #Wait $ERROR_CLEAR_MINUTES minutes for error to clear
    i=0
    while [ $(( i++ )) -lt 60 ] && [ ! -f "$STOP_FILE" ]; do
      sleep $ERROR_CLEAR_MINUTES
    done
  done
  logit 3 "Ending ldap live sync process"
}

get_zimbra_config_globals () {
  #Query whether incremental backups are enabled
  incremental_backups=$( echo "query_incremental" | \
    $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" )
    
  #Query whether convertd is installed and enabled
  ldap status &>/dev/null || ldap start &>/dev/null
  if ! ldap status &>/dev/null; then
    logit 1 "Unable to start local ldap server"
    exit 1
  fi
  if [ $( zmprov -l  gs `zmhostname` | \
          grep -E "(zimbraServiceInstalled|zimbraServiceEnabled):[[:space:]]*convertd" | \
          wc -l  ) -eq 2 ]; then
    convertd_enabled="true"
  else
    convertd_enabled="false"
  fi
}

kill_everything () {
  touch "$STOP_FILE"
  kill -KILL $( head -n 1 "$PID_FILE_LDAP" 2>/dev/null ) &>/dev/null
  kill -KILL $( head -n 1 "$PID_FILE_REDO" 2>/dev/null ) &>/dev/null
  kill -KILL $( ps aux | grep "live_syncd start" | grep -v grep | awk '{print $2}' ) &>/dev/null
  kill -KILL $( ps aux | grep "redo_log_live_sync" | grep -v grep | awk '{print $2}' ) &>/dev/null
  kill -KILL $( ps aux | grep "ldap_live_sync" | grep -v grep | awk '{print $2}' ) &>/dev/null
  kill -KILL $( ps aux | \
    grep "$SYNC_COMMANDS_SCRIPT" | grep -v grep | awk '{print $2}' ) &>/dev/null
  kill -KILL $( ps aux | grep "rsync" | grep -E "$REDOLOG_DIR""|""$LDAP_DATA_DIR""|""$BACKUP_DIR" | \
    awk '{print $2}' ) &>/dev/null
  #Kill redolog playback if running
  kill -KILL $( ps aux | grep -E "zimbra.*java.*PlaybackUtil" | grep -v grep | \
    awk '{print $2}' ) &>/dev/null
  rm -f "$STOP_FILE"
  rm -f "$PID_FILE_LDAP"
  rm -f "$PID_FILE_REDO"
  rmdir "$LOCK_STATE_DIR" &>/dev/null
}

quitting () {
  echo "Quitting"
  #Kill any hanging processes
  kill_everything
  trap - INT TERM SIGINT SIGTERM
  echo 'kill -KILL $( ps aux | grep live_syncd | grep -v grep | awk '"'"'{print $2}'"'"' ) &>/dev/null' | \
    at now && sleep 1 && rmdir "$LOCK_STATE_DIR" &>/dev/null
  exit
}


#******************************************************************************
#********************** Main Program ******************************************
#******************************************************************************

if [ "$( whoami )" != "zimbra" ]; then
  echo "Must run as zimbra user" >&2
  exit 1
fi

mkdir -p "$LOCKING_DIR"
mkdir -p "$PID_DIR"
mkdir -p "$LOG_DIR"
mkdir -p "$LDAP_TEMP_DIR"
mkdir -p "$STATUS_DIR"
chmod 755 "$STATUS_DIR"

if [ ! -f "$CONF_FILE" ]; then
  echo "Configuration file, $CONF_FILE, not found" >&2
  exit 1
fi

source "$CONF_FILE"

#Find all local addresses
server_addresses=$( /sbin/ifconfig | grep inet | \
  grep -Eio "addr:[[:space:]]*(([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5})" | \
  sed "s/addr://" | tr -d " \t" )

#Check configured server addresses are valid
if ! echo "$server1" | \
    grep -Ei "([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5}" &>/dev/null; then
  echo "No valid IP address found for server1 in configuration file" >&2
  exit 1
fi
if ! echo "$server2" | \
    grep -Ei "([0-9]+\.){3}[0-9]+|[0-9a-f]+(:[0-9a-f]*){5}" &>/dev/null; then
  echo "No valid IP address found for server2 in configuration file" >&2
  exit 1
fi

#Deduce local address and assume other address is remote machine
if echo "$server_addresses" | grep "$server1" &>/dev/null; then
  local_address="$server1"
  remote_address="$server2"
else
  if echo "$server_addresses" | grep "$server2" &>/dev/null; then
    local_address="$server2"
    remote_address="$server1"
  else
    echo "Unable to identify local server address and assume remote address" >&2
    exit 1
  fi
fi

#Check remote server is OK
remote_server_status=$( echo "test" | \
  $SSH "$remote_address" "$SYNC_COMMANDS_SCRIPT" )

if [ "x""$remote_server_status" == "xbusy" ]; then
  echo "Remote server appears to have live_syncd process running" >&2
  echo "This can not run on both servers" >&2
  exit 1
fi

if [ "x""$remote_server_status" != "xOK" ]; then
  echo "Unable to run commands on remote server" >&2
  exit 1
fi

#Get major Zimbra version
zimbra_version=$( zmcontrol -v | grep -Eo "[0-9][^\.]*" | head -n 1 )
if [ ${#zimbra_version} -lt 1 ]; then
 zimbra_version=0
fi

case $1 in
  start)
    #Check for processes from this script and also redolog replay. Don't count PID files older than uptime.
    if [ -f "$PID_FILE_REDO" ] && \
        [ $(( $( date +%s ) - $( stat -c '%Y' "$PID_FILE_REDO" ) )) -lt $( cat /proc/uptime | grep -Eo "[0-9]+" | head -n 1 ) ]; then
      pid_found="yes"
    fi
    if [ -f "$PID_FILE_LDAP" ] && \
        [ $(( $( date +%s ) - $( stat -c '%Y' "$PID_FILE_LDAP" ) )) -lt $( cat /proc/uptime | grep -Eo "[0-9]+" | head -n 1 ) ]; then
      pid_found="yes" 
    fi
    if [ $pid_found ] || \
        ps aux | grep -E "zimbra.*java.*PlaybackUtil" | grep -v grep &>/dev/null; then
      echo "Process already running"
    else
      echo -n "Starting processes..."
      get_zimbra_config_globals
      echo "***************************************" >>"$LOG_FILE"
      logit 3 "Starting $( basename "$0" )" >>"$LOG_FILE"
      logit 3 "Incremental backups enabled : $incremental_backups" >>"$LOG_FILE"
      logit 3 "Convertd enabled : $convertd_enabled" >>"$LOG_FILE"
  
      ldap_live_sync >>"$LOG_FILE" 2>&1 &
      echo $! >"$PID_FILE_LDAP"
      redo_log_live_sync >>"$LOG_FILE" 2>&1 &
      echo $! >"$PID_FILE_REDO"
      echo "done"
    fi
    ;;
  stop)
    touch "$STOP_FILE"
    [ -d "$LOCK_STATE_DIR" ] && echo "Waiting for sync operations to complete..."
    while [ -d "$LOCK_STATE_DIR" ]; do
      sleep 5
    done
    rm -f "$STOP_FILE"
    replay_redo_logs
    kill_everything
    echo "done"
    ;;
  status)
    if ps aux | grep -E "zimbra.*java.*PlaybackUtil" | grep -v grep &>/dev/null; then
      echo "redolog is being replayed"
      replay_stat=0
    else
      replay_stat=3
    fi
    if [ -f  $PID_FILE_REDO ] && ps $( head -n 1 $PID_FILE_REDO 2>/dev/null ) &>/dev/null; then
      echo "redo log sync process OK"
      redo_stat=0
    else
      echo "redolog sync process stopped"
      redo_stat=3
    fi
    if [ -f  $PID_FILE_LDAP ] && ps $( head -n 1 $PID_FILE_LDAP 2>/dev/null ) &>/dev/null; then
      echo "ldap sync process OK"
      ldap_stat=0
    else
      echo "ldap sync process stopped"
      ldap_stat=3
    fi
    [ $ldap_stat == 3 ] && [ $redo_stat == 3 ] && [ $replay_stat == 3 ] && exit 3
    [ $ldap_stat == 0 ] && [ $redo_stat == 0 ] && exit 0
    exit 1
    ;;
  kill)
    kill_everything
    ;;
  *)
    trap quitting INT TERM SIGINT SIGTERM
    if ps aux | grep "redo_log_live_sync" | grep -v grep  &>/dev/null || \
        ps aux | grep "ldap_live_sync" | grep -v grep  &>/dev/null || \
        ps aux | grep -E "zimbra.*java.*PlaybackUtil" | grep -v grep &>/dev/null; then
      echo "Process already running"
    else
      echo "Starting processes in realtime"
      get_zimbra_config_globals
      logit 3 "Incremental backups enabled : $incremental_backups"
      logit 3 "Convertd enabled : $convertd_enabled"
      ldap_live_sync &
      echo $! >"$PID_FILE_LDAP"
      redo_log_live_sync &
      echo $! >"$PID_FILE_REDO"
      while [ 1 ]; do sleep 10; done
    fi
    ;;
esac
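
The status action reports its result through the exit code as well as through the printed messages: 0 when both the ldap and redolog sync processes are running, 3 when both are stopped and no redolog replay is in progress, and 1 for anything in between. A minimal sketch of how a wrapper might interpret that code follows; describe_status is a hypothetical helper, not part of live_syncd.

```shell
# Translate the exit code of "live_syncd status" into a short description.
# describe_status is a hypothetical helper used only for this illustration.
describe_status () {
  case "$1" in
    0) echo "ldap and redolog sync processes are both running" ;;
    3) echo "all sync processes are stopped" ;;
    *) echo "partially running, check the log" ;;
  esac
}

# Typical call as user zimbra (commented out so the sketch runs anywhere):
# /opt/zimbra/live_sync/live_syncd status; describe_status $?
describe_status 3
```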

Remote command script

The following script should be saved as sync_commands in the /opt/zimbra/live_sync directory. This should be owned by user zimbra and made executable.

#!/bin/bash
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.

#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.

#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>

##########################################################################
# Title      :  sync_commands
# Author     :  Simon Blandford <simon -at- onepointltd -dt- com>
# Date       :  2013-03-12
# Requires   :  zimbra sync_commands inotify-tools
# Category   :  Administration
# Version    :  2.1.3
# Copyright  :  Simon Blandford, Onepoint Consulting Limited
# License    :  GPLv3 (see above)
##########################################################################
# Description
# Keep two Zimbra servers synchronised in near-realtime, local agent
##########################################################################


#******************************************************************************
#********************** Main Program ******************************************
#******************************************************************************


if [ "$( whoami )" != "zimbra" ]; then
  echo "Must run as zimbra user" >&2
  exit 1
fi

#Check for rsync of redolog or ldap
if echo "$SSH_ORIGINAL_COMMAND" | \
  grep "rsync" | \
    grep -E "/opt/zimbra/redolog/|/opt/zimbra/data/ldap/|/opt/zimbra/backup/sessions/incr" &>/dev/null; then
  case "$SSH_ORIGINAL_COMMAND" in
  *\&*)
    echo "Rejected"
    ;;
  *\(*)
    echo "Rejected"
    ;;
  *\{*)
    echo "Rejected"
    ;;
  *\;*)
    echo "Rejected"
    ;;
  *\<*)
    echo "Rejected"
    ;;
  *\`*)
    echo "Rejected"
    ;;
  rsync\ --server*)
    $SSH_ORIGINAL_COMMAND
    ;;
  *)
    echo "Rejected"
    ;;
  esac
else
  #Not rsync
  case "$#" in
  0) read command
    ;;
  *) command=$1
    ;;
  esac

  check_inotify () {
    if ! which inotifywait &>/dev/null; then
      echo "inotifywait not found" >&2
      echo "Please install inotify-tools" >&2
      exit 1
    fi
  }

  #Extract numeric parameter from command name
  param=$( echo "$command" | grep -Eo "[0-9]+" )
  command=$( echo "$command" | grep -Eio "[a-z_]+" )
  
  case $command in
    test)
      if ps aux | grep "live_syncd" | grep -v grep &>/dev/null; then
        echo "busy"
      else
        echo "OK"
      fi
      ;;
    wait_redo)
      #Wait for redo log roll-over
      check_inotify
      kill -KILL $( ps aux | grep "inotifywait -r /opt/zimbra/redolog" | \
                    grep -v grep | awk '{print $2}' ) &>/dev/null
      inotifywait -r /opt/zimbra/redolog -e moved_to
      ;;
    wait_ldap)
      #Wait for ldap changes. Ignore log changes.
      check_inotify
      kill -KILL $( ps aux | grep "inotifywait -r /opt/zimbra/data/ldap" | \
                    grep -v grep | awk '{print $2}' ) &>/dev/null
      inotifywait -r /opt/zimbra/data/ldap -e modify \
        -e attrib -e close_write -e moved_to -e moved_from \
        --exclude "logs\/log\.|accesslog" \
        -e move -e delete -e delete_self
      ;;
    dump_ldap)
      #Extract the LDIF database and stream it
      /opt/zimbra/libexec/zmslapcat  "/tmp/zimbraldif"
      cat "/tmp/zimbraldif/ldap.bak"
      rm -rf "/tmp/zimbraldif"
      ;;
    stream)
      #Live-stream redolog
      #Kill any hanging previous tail commands
      kill -KILL $( ps aux | grep "tail -c +0 -f /opt/zimbra/redolog/redo.log" | \
                    grep -v grep | awk '{print $2}' ) &>/dev/null
      tail -c +0 -f /opt/zimbra/redolog/redo.log
      ;;
    gather)
      #Gather list of recent redologs from incremental backups and archive
      find '/opt/zimbra/backup/sessions/incr-'*'/redologs' \
           '/opt/zimbra/redolog/archive' \
            -name 'redo*.log' -type f -mtime -$param -print 2>/dev/null | \
            sort
      ;;
    purge)
      #Remove old archives
      find /opt/zimbra/redolog/archive -type f -mtime +$param -exec rm {} \;
      ;;
    query_incremental)
      #Query whether incremental backups are scheduled
      if which zmschedulebackup &>/dev/null && \
          zmschedulebackup -q | \
          grep -Eo "i([[:space:]]+[0-9\*\-]+){5}" &>/dev/null; then
        echo "true"
      else
        echo "false"
      fi
      ;;
    *)
      rsync
      ;;
  esac
fi

Configuration File

The configuration file simply contains the IP addresses of the live and mirror servers. The order is not important: the script works out which is which by checking which IP address is assigned to the local machine. The file is named live_sync.conf, is saved in the /opt/zimbra/live_sync directory and must be readable by user zimbra. The following is an example; use the real IP addresses of your own live and mirror servers.

server1="192.168.108.10"
server2="192.168.108.11"
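The role detection described above can be sketched as follows. This is only an illustration, not the code used by live_syncd: pick_roles is a hypothetical helper, and the list of local addresses is passed in as arguments so the logic can be tried without touching the network.

```shell
#!/bin/bash
# Sketch only: decide which configured address belongs to this machine.
# pick_roles is a hypothetical helper, not part of live_syncd.
# Arguments: server1 server2, then every address assigned to this host.
# Prints "local=<ip> remote=<ip>", or returns 1 if neither address matches.
pick_roles () {
  local server1="$1" server2="$2" ip local_ip="" remote_ip=""
  shift 2
  for ip in "$@"; do
    if [ "$ip" = "$server1" ]; then
      local_ip="$server1"; remote_ip="$server2"
    elif [ "$ip" = "$server2" ]; then
      local_ip="$server2"; remote_ip="$server1"
    fi
  done
  [ -n "$local_ip" ] || return 1
  echo "local=$local_ip remote=$remote_ip"
}
```

On a system where "hostname -I" is available, it could be called as pick_roles "$server1" "$server2" $( hostname -I ) after sourcing live_sync.conf.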

Enabling redo.log

For the Network edition, redo logs are already being created and are periodically moved to create incremental backups. For the open source version, redo log archiving must be enabled.

To see the current redo log related settings type the following as user zimbra

zmprov gacf | grep "RedoLog"

To enable redo log rollover on the open source version, type...

zmprov mcf zimbraRedoLogDeleteOnRollover FALSE
zmprov mcf zimbraRedoLogEnabled TRUE

You may also want to make the redo log rotation more frequent to guarantee a file-system consistent redo log on the mirror server at least up to the last, say, thirty minutes. The live-streamed redo.log may not be consistent although it is unlikely this will ever be a problem except with the very last record in the log.

For example, to force rollover every half an hour, type...

zmprov mcf zimbraRedoLogRolloverFileSizeKB 1
zmprov mcf zimbraRedoLogRolloverMinFileAge 30

This will roll over the redo log if it is larger than 1KB after 30 minutes, which is very likely unless the mail server is not sending or receiving any mail at all during this time.
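To confirm that a setting took effect, the "attribute: value" lines printed by zmprov gacf can be filtered for a single attribute. The helper below is a hypothetical convenience, not part of the live sync scripts, and reads the output on standard input.

```shell
#!/bin/bash
# redo_setting: hypothetical helper. Reads "attribute: value" lines on stdin
# (the format zmprov gacf prints) and echoes the value of one attribute.
# Returns non-zero if the attribute is not present in the input.
redo_setting () {
  awk -v key="$1" -F': ' '$1 == key { print $2; found = 1 } END { exit !found }'
}
```

As user zimbra this could be used as: zmprov gacf | redo_setting zimbraRedoLogRolloverMinFileAge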

You may want to reduce zimbraRedoLogRolloverMinFileAge even further while setting up and testing this script, just so you don't have to wait too long to see activity between the servers.

Mirror Server

The mirror server should ideally have the same operating system as the live server and must have exactly the same version of Zimbra installed.

The hostname must also be exactly the same.

Install inotify-tools

For Redhat/Centos this is...

As user root:

yum install -y inotify-tools

For Ubuntu this is...

As user root:

apt-get install inotify-tools

Create log rotation

The logrotate configuration also needs to be done on the mirror server

As user root:

echo "/opt/zimbra/live_sync/log/live_sync.log {
    daily
    missingok
    copytruncate
    rotate 7
    notifempty
    compress
}">/etc/logrotate.d/zimbra_live_sync

First rsync between servers

We now perform the first copy of the zimbra directory between the live and mirror server. On the mirror server we must stop Zimbra. We leave Zimbra running on the live server for now to reduce downtime.

The following rsync command is run on the mirror server. Substitute the hostname or IP address of the live server as required in the command below.

Note: Copying the sparse files used by LDAP in Zimbra 8+ takes a very long time even though the files are small.

As user root:

service zimbra stop
rsync -aHz --force --delete --sparse live_server:/opt/zimbra/ /opt/zimbra/

Second rsync between servers

This is where we need to stop Zimbra on the live server so that we can copy a consistent /opt/zimbra directory from the live to the mirror server. This is the only downtime required.

On the live server as user root:

service zimbra stop

On the mirror server as user root:

rsync -aHz --force --delete --sparse live_server:/opt/zimbra/ /opt/zimbra/

On the live server as user root:

service zimbra start

On the mirror server as user root (just to make sure we have a viable copy of Zimbra):

service zimbra start
service zimbra status
service zimbra stop

Running the script

Not only have we copied all the Zimbra data from the live to mirror server, we have also copied the script and SSH keys. We should now be able to try running the script.

On the mirror server as user zimbra:

cd /opt/zimbra/live_sync
./live_syncd start

All being well the script has started without any complaints and we can now tail the log file to see that it is syncing as expected.

tail -f log/live_sync.log

(CTRL-C to exit tail command)

Init script

I have also created an init script that may be useful for Ubuntu and Redhat/CentOS.

CentOS/Redhat

Copy the below script as /etc/init.d/zimbra_live_sync.

On both the live and the mirror server make the script executable and add the script to chkconfig.

chmod 755 /etc/init.d/zimbra_live_sync
chkconfig --add zimbra_live_sync

On the live server make sure it doesn't start on boot.

chkconfig zimbra_live_sync off

On the mirror server ensure that Zimbra doesn't start on boot but the live sync script does.

chkconfig zimbra off
chkconfig zimbra_live_sync on

Ubuntu

Copy the below script as /etc/init.d/zimbra_live_sync.

On both the live and the mirror server make the script executable.

chmod 755 /etc/init.d/zimbra_live_sync

On the mirror server ensure Zimbra doesn't start on boot but the live sync script does.

update-rc.d -f zimbra remove
update-rc.d zimbra_live_sync defaults

The init script

#!/bin/bash
# 
# ***** BEGIN LICENSE BLOCK *****
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.

#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.

#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>
# ***** END LICENSE BLOCK *****
# 
#
# Init file for zimbra live sync
#
# chkconfig: 345 99 01
# description: Zimbra live sync service
#
### BEGIN INIT INFO
# Provides:       zimbra_live_sync
# Required-Start: $network $remote_fs $syslog $time nscd cron
# Required-Stop:  $network $remote_fs $syslog $time
# Default-Start:  3 5
# Description:    Zimbra live sync service
### END INIT INFO

case "$1" in
        gentle-restart)
                su - zimbra -c "live_sync/live_syncd stop"
                su - zimbra -c "live_sync/live_syncd start"
                RETVAL=$?
                ;;
        restart)
                su - zimbra -c "live_sync/live_syncd kill"
                su - zimbra -c "live_sync/live_syncd start"
                RETVAL=$?
                ;;
        start)
                su - zimbra -c "live_sync/live_syncd start"
                RETVAL=$?
                ;;
        gentle-stop)
                su - zimbra -c "live_sync/live_syncd stop"
                RETVAL=$?
                ;;
        stop)
                su - zimbra -c "live_sync/live_syncd kill"
                RETVAL=$?
                ;;
        status)
                su - zimbra -c "live_sync/live_syncd status"
                RETVAL=$?
                ;;
        *)
                echo $"Usage: $0 {start|stop|restart|gentle-stop|gentle-restart|status}"
                RETVAL=1
                ;;
esac
exit $RETVAL

Failover

If the live server fails then the procedure on the mirror server is simply to stop the live_sync script then start Zimbra.

su - zimbra
cd live_sync
./live_syncd stop
zmcontrol start

Fallback

Simply run the script on the server you want to fail back to; the live and mirror roles are now reversed.

As user zimbra (on ex-live server to be restored back to live):

cd /opt/zimbra/live_sync
./live_syncd start

Once the script has caught up and the two servers are in sync, stop Zimbra on the other server.

As user zimbra on mirror (failover server)

zmcontrol stop

As user zimbra on live (restored server)

cd /opt/zimbra/live_sync
./live_syncd stop
zmcontrol start

Warm or very warm standby

Replaying redo logs only requires that the mailbox process be stopped, which the script does automatically. The script will work whether or not Zimbra has been started on the mirror, as it enables or disables services as and when it needs them. Keeping the rest of Zimbra running will drastically reduce the time it takes to fail over. This is only an advantage when access to the server domain can be quickly flipped or has a failover mechanism.

How and why it all works

Introduction

Zimbra employs several different databases to store messages, message indexes, meta-data, account information and configuration. Although it is possible to synchronise two Zimbra servers at the disk level using DRBD or vSphere, the number of disk operations from all these databases that need to be replicated would probably take up a lot of bandwidth, which may be debilitating and/or expensive if the two servers are in remote locations.

Fortunately, Zimbra keeps a log of almost all its transactions in the redolog. The only things not logged here are changes to the LDAP database. An incremental backup is made up of an LDAP dump and a collection of redologs. An incremental backup can be used to bring a backup server up to date if the last full backup of the backup server was more recent than the oldest log in the redolog.

Redolog

If the redolog can be piped to a mirror server in real time then all the mirror server has to do is keep replaying the logs every so often and it will keep the same state as the live server. The only other thing to keep up to date is the LDAP database. Fortunately, the LDAP database doesn't change that often so it is quite easy to keep it synced on a directory level.

The easiest way to transfer the redologs is to use rsync. The only problem with that is that rsync does not run continuously. It also won't handle the archiving of redo.log very efficiently. When redo.log is renamed and moved to the archive rsync will delete it at the remote location then transfer it all over again to its new location in the archive. If we can catch this move taking place then we can move and rename the file on the mirror server before running rsync. Then rsync has very little to do, in theory, nothing except delete any files that have been purged. Another issue with rsync is that the file may be in the process of being written to when it is copied. This results in an incomplete file at the mirror. However, redologs are only ever appended to and so only the last record will be corrupted. Zimbra is designed to be tolerant of redolog corruption otherwise it would be of limited use as a disaster recovery tool.
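The "move before rsync" idea can be sketched as below. This is an assumption-laden illustration rather than the code in live_syncd: mirror_rollover is a hypothetical helper, and it assumes the mirror has already been told the archive file name that the live server gave to the rolled-over log.

```shell
#!/bin/bash
# mirror_rollover: hypothetical helper, run on the mirror before rsync.
# $1 = local redolog directory
# $2 = archive file name used by the live server for the log that has
#      just been rolled over (how this name is obtained is not shown here)
# Renames the local copy of redo.log into the archive so that rsync finds
# the file already in place and has nothing to re-transfer.
mirror_rollover () {
  local redo_dir="$1" archived_name="$2"
  [ -f "$redo_dir/redo.log" ] || return 1
  mkdir -p "$redo_dir/archive"
  mv "$redo_dir/redo.log" "$redo_dir/archive/$archived_name"
}
```

After this step an rsync of the redolog directory should only need to verify the archived file and remove anything that has been purged on the live server.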

To keep the redolog live, the "tail -f" command is used over ssh to pipe the file to the mirror. By calling "tail -f -c +0" it tails right back to the zeroth byte of the file, effectively a copy-then-stream command.
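The copy-then-stream behaviour can be tried locally without ssh. The sketch below uses a hypothetical helper, stream_copy, that starts the same tail command in the background and writes to a local file; in the real setup the output would travel over the ssh connection instead.

```shell
#!/bin/bash
# stream_copy: hypothetical helper showing copy-then-stream on one machine.
# $1 = file to follow, $2 = file to write the stream to.
# "tail -c +0 -f" first outputs the existing content from the first byte,
# then keeps outputting whatever is appended afterwards.
# Prints the PID of the background tail so the caller can stop it later.
stream_copy () {
  tail -c +0 -f "$1" > "$2" 2>/dev/null &
  echo $!
}
```

For example, pid=$( stream_copy /tmp/source.log /tmp/copy.log ) starts the stream and kill "$pid" stops it; both paths here are placeholders.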

Redolog purging

If a Network edition is detected and incremental backups are enabled then the redologs are replayed before any rsync is performed as well as after. This ensures that everything is replayed before the files all disappear to the backup directory.

For the open source edition, or Network edition with no incremental backups scheduled, the redologs are purged if they are more than a day old and have been replayed. If the mirror server is down then the redologs will just accumulate on the live server to be replayed when the sync process is restarted before being purged.
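The purge rule for this case (old enough and already replayed) can be sketched as follows. The helper name and the idea of a plain text file listing replayed log names are assumptions made for the example; live_syncd may track replayed logs differently.

```shell
#!/bin/bash
# purge_replayed: hypothetical helper. Deletes archived redologs that are
# older than $2 days AND whose file name appears in the list file $3.
# $1 = archive directory, $2 = age in days (as for find -mtime +N),
# $3 = text file with one replayed file name per line (an assumption).
purge_replayed () {
  local archive_dir="$1" days="$2" replayed_list="$3" f
  find "$archive_dir" -type f -name 'redo*.log' -mtime +"$days" -print |
  while IFS= read -r f; do
    # Only remove logs that are recorded as replayed
    if grep -Fxq "$(basename "$f")" "$replayed_list"; then
      rm -f "$f"
    fi
  done
}
```

Logs that are old but not yet listed as replayed are left alone, which matches the behaviour described above when the mirror has been down for a while.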

LDAP

LDAP stores its data in /opt/zimbra/data/ldap. This can be copied using rsync to the mirror as long as no changes take place during the copy. On versions of Zimbra older than 8.0, the directory is monitored for changes during the rsync operation and the copy is repeated if anything changed during that time. For Zimbra 8+ the directory is monitored as before but changes are transferred using an LDIF export and import. This is necessitated by the long time it takes to rsync the sparse files that LDAP now uses to store data. Yes, it would be neater for old versions to use the LDIF method too but it wasn't broken so I didn't risk fixing it yet.
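The "repeat the copy if anything changed" loop used for older versions can be sketched as below. Note the substitutions: the real script watches the directory with inotifywait and copies with rsync, whereas this sketch compares a fingerprint of file names, sizes and modification times before and after a plain cp, purely so it can be run on one machine. Both helper names are hypothetical, and find -printf requires GNU find.

```shell
#!/bin/bash
# dir_fingerprint and copy_until_stable are hypothetical helpers.
# A checksum of file names, sizes and mtimes stands in for the inotify watch.
dir_fingerprint () {
  ( cd "$1" && find . -type f -printf '%p %s %T@\n' | sort | cksum )
}

# $1 = source directory, $2 = destination, $3 = maximum attempts (default 5)
# Returns 0 once a copy completes with no change seen in the source.
copy_until_stable () {
  local src="$1" dst="$2" tries="${3:-5}" before after
  while [ "$tries" -gt 0 ]; do
    before=$( dir_fingerprint "$src" )
    rm -rf "$dst" && mkdir -p "$dst" && cp -a "$src/." "$dst/"
    after=$( dir_fingerprint "$src" )
    # Unchanged fingerprint means the copy is taken to be consistent
    [ "$before" = "$after" ] && return 0
    tries=$(( tries - 1 ))
  done
  return 1
}
```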

Known issues

  • If the connection breaks at the very moment that the live stream of the redo.log starts, before the tail command reaches the point where it is tailing instead of cataloguing the file, then some of the redo.log will not make it to the mirror, resulting in some loss of transactions. Fortunately, this is only likely just after the log has rolled over so the worst-case losses should be minimal.
  • LDAP is only checked every ten minutes so some losses are possible if the connection breaks in that time. However, LDAP isn't expected to change very often unless something major like a batch account migration is taking place.
  • If any redologs go missing, or can't be replayed successfully for any reason, then there will be gaps in the synced email events. Mail may go missing on the mirror server. Check the live_sync logs for any log sequence numbers that appear not to have been transferred and processed. In the case of suspected data loss, stop all services and repeat the initial rsync process.
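One way to look for missing sequence numbers is to list the archived redolog file names and check for gaps. The sketch below assumes the archive names contain the sequence number as "seq" followed by digits; that naming, and the helper name missing_sequences, are assumptions for this example and should be checked against the files on your own server.

```shell
#!/bin/bash
# missing_sequences: hypothetical helper. Reads redolog file names on stdin,
# extracts the number after "seq" and prints every number missing from the
# range between the lowest and highest sequence seen.
missing_sequences () {
  grep -Eo 'seq[0-9]+' | grep -Eo '[0-9]+' | sort -n | uniq |
  awk 'NR > 1 { for (i = prev + 1; i < $1; i++) print i } { prev = $1 }'
}
```

It could be fed with something like: ls /opt/zimbra/redolog/archive | missing_sequences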

Nagios Integration

The script now generates useful status information for Nagios. The time since the last successful operation is measured and Nagios can raise an alert if any part of the script appears not to have been successful for longer than expected.

The Nagios script is to be run on the currently Live server and can be run as any user as long as that user has read access to the status files created by live_syncd. These are in /opt/zimbra/live_sync/status.

#!/bin/bash
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.

#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.

#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>

##########################################################################
# Title      :  check_zimbra_live_sync
# Author     :  Simon Blandford <simon -at- onepointltd -dt- com>
# Date       :  2012-08-25
# Requires   :  live_syncd
# Category   :  Administration
# Version    :  2.0.0
# Copyright  :  Simon Blandford, Onepoint Consulting Limited
# License    :  GPLv3 (see above)
##########################################################################
# Description
# Nagios plug-in for Zimbra live sync script
##########################################################################

#******************************************************************************
#********************** Constants *********************************************
#******************************************************************************

ZIMBRA_DIR="/opt/zimbra"
BASE_DIR="$ZIMBRA_DIR""/live_sync"
STATUS_DIR="$BASE_DIR""/status"

#Files that need age testing
LAST_GOOD_REDO_REPLAY="$STATUS_DIR""/last_good_redo_replay"
LAST_GOOD_REDO_SYNC="$STATUS_DIR""/last_good_redo_sync"
LAST_GOOD_REDO_STREAM="$STATUS_DIR""/last_good_redo_stream"
LAST_GOOD_LDAP_SYNC="$STATUS_DIR""/last_good_ldap_sync"
LAST_GOOD_LDAP_START="$STATUS_DIR""/last_good_ldap_start"

#******************************************************************************
#********************** Functions *********************************************
#******************************************************************************

usage () {
  echo "Usage: check_zimbra_live_sync -w <hours> -c <hours>" 
  exit 3
}

file_age () {
  echo $(( ($( date +%s) - $( stat -c "%Y" "$1" )) / 3600 ))
}

#Extract name of function being tested from file name
function_id () {
  echo "$1" | grep -Eo "[^_]*_[^_]*$"
}

file_report () {
  local file_under_test
  file_under_test="$1"
  
  #No file returns UNKNOWN status
  if [ ! -f "$file_under_test" ]; then
    if [ ${#affected_function_list} -gt 1 ] && [ $status_code -eq 3 ]; then
      affected_function_list="$affected_function_list"",""$( function_id "$file_under_test" )"
    else
      affected_function_list="$( function_id "$file_under_test" )"
    fi
    status_code=3
    status_msg="Unknown"
    return
  fi
  
  #Only measure the age once the file is known to exist
  age_of_file_under_test=$( file_age "$file_under_test" )
  
  #Test for files older than critical number of hours. Only test if no unknown status
  if [ $status_code -lt 3 ]; then
    if [ $age_of_file_under_test -ge $c ]; then
      if [ ${#affected_function_list} -gt 1 ] && [ $status_code -eq 2 ]; then
        affected_function_list="$affected_function_list"",""$( function_id "$file_under_test" )"
        affected_function_list="$affected_function_list""(""$age_of_file_under_test"" hours)"
      else
        affected_function_list="$( function_id "$file_under_test" )""(""$age_of_file_under_test"" hours)"
      fi
      status_code=2
      status_msg="Critical"
      return   
    fi
  fi
  
  #Test for files older than warning number of hours. Only test if no unknown or critical status
  if [ $status_code -lt 2 ]; then
    if [ $age_of_file_under_test -ge $w ]; then
      if [ ${#affected_function_list} -gt 1 ] && [ $status_code -eq 1 ]; then
        affected_function_list="$affected_function_list"",""$( function_id "$file_under_test" )"
        affected_function_list="$affected_function_list""(""$age_of_file_under_test"" hours)"
      else
        affected_function_list="$( function_id "$file_under_test" )""(""$age_of_file_under_test"" hours)"
      fi
      status_code=1
      status_msg="Warning"
      return   
    fi
  fi
}

#******************************************************************************
#********************** Main Program ******************************************
#******************************************************************************

status_code=0
status_msg="OK"
affected_function_list="Live sync up to date"

for i in $@; do
  if [ $get_w ]; then
    w=$( echo "$i" | grep -Eo "[0-9]+" );
    unset get_w
  fi
  if [ $get_c ]; then
    c=$( echo "$i" | grep -Eo "[0-9]+" );
    unset get_c
  fi
  [ "x""$i" == "x-w" ] && get_w=1
  [ "x""$i" == "x-c" ] && get_c=1
done

if [ ! $w ] || [ ! $c ]; then
  usage
fi

file_report "$LAST_GOOD_REDO_REPLAY"
file_report "$LAST_GOOD_REDO_SYNC"
file_report "$LAST_GOOD_REDO_STREAM"
file_report "$LAST_GOOD_LDAP_SYNC"
file_report "$LAST_GOOD_LDAP_START"

echo "$status_msg"" : ""$affected_function_list"
exit $status_code

I use the following line in /etc/nagios/nrpe.cfg or in the /etc/nagios/nrpe.d directory to allow the script to be called by nrpe.

command[check_zimbra_live_sync]=/usr/lib/nagios/plugins/contrib/check_zimbra_live_sync -w $ARG1$ -c $ARG2$

You may need to adjust the path to the nagios plugin script depending on where the contributed scripts are stored.
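To help choose values for -w and -c, the threshold comparison applied to each status file can be reduced to the sketch below. classify_age is a hypothetical helper written for illustration only; it leaves out the Unknown state and the way the plug-in combines results across several files.

```shell
#!/bin/bash
# classify_age: hypothetical helper mirroring the age thresholds.
# $1 = age of a status file in whole hours
# $2 = warning threshold in hours, $3 = critical threshold in hours
# Prints the Nagios state name and returns the matching exit code.
classify_age () {
  local age="$1" warn="$2" crit="$3"
  if [ "$age" -ge "$crit" ]; then
    echo "Critical"; return 2
  elif [ "$age" -ge "$warn" ]; then
    echo "Warning"; return 1
  fi
  echo "OK"; return 0
}
```

Because ages are truncated to whole hours, a status file that is 1 hour 59 minutes old counts as 1 hour and would not yet trigger a warning threshold of 2.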

Change log

Version 2.1.5 2016-05-18

1) egrep and fgrep are deprecated. Changed to grep -E and grep -F where used.

Version 2.1.4 2013-03-12

1) HSM detection needs ldap, mailbox and mysql. Now done once at start.

Version 2.1.2 2013-03-12

1) Added check to ensure mailboxd is kept running if HSM is being used

2) Suspend ldap/mailbox updates while HSM job is running

Version 2.1.1 2013-02-12

1) Change grep -o "[0-9]*" to grep -o "[0-9]+" for RHEL in sync_commands

Version 2.1.0 2013-02-11

1) Compatible with Zimbra version 8

2) Kill hanging sync_commands process on live server

3) Rename LDAP_DIR constant to LDAP_TEMP_DIR

4) Make rsync of ldap database aware of sparse files

5) For Zimbra version 8 and above use ldif export/import instead of rsync

Version 2.0.1 2013-19-12

1) Preserve mailboxd running state to allow warmer standby and functioning HSM

2) Don't refuse to start because of stale pid files after a system reboot


Version 2.0.0 2012-08-29

1) Started changes log :)

2) Constants in upper case, variables in lower case

3) Exclude ldap log changes from inotify watch

4) Nagios plug-in to monitor time since last successful operation

5) Redologs are archived once successfully replayed

6) Redologs are retrieved from incremental backups on Zimbra Network edition

7) Redologs are purged independently after specified time on both servers

8) Magic path names and numbers now defined in constants

9) Log reporting by level and with filtering

10) Script aware of the java redolog replay process

11) convertd runs during redolog replay if enabled so indexing will work