Ajcody-Clustering

{{NotOfficial}}
{{BC|Zeta Alliance}}                         <!-- Note, this will also add [[Category: Zeta Alliance]] to bottom of wiki page. -->
__FORCETOC__                              <!-- Will force a TOC regardless of article size. Use __NOTOC__ if no TOC is wanted. -->
<div class="col-md-12 ibox-content">
==Clustering Topics==            <!-- Normally will reflect page title. Is listed at very top of page. -->
{{KB|{{ZETA}}|{{ZCS 8.5}}|{{ZCS 8.0}}|{{ZCS 7.0}}|}}            <!-- Can only handle 3 ZCS versions. -->
{{WIP}}       


===Actual Clustering Topics Homepage===
----
Please see [[Ajcody-Clustering]]

===My Other Clustering Pages===
----
* [[Ajcody-Notes-Upgrade-Options#Upgrade_Steps_for_Multi-Servers_with_Clustered_Mailstores]]
* [[Ajcody-Notes-Of-Customer-Cluster-Upgrade]]
RFE I made based upon the experience above:
* "QA & Corrections For Cluster Upgrade Document"
** http://bugzilla.zimbra.com/show_bug.cgi?id=31026
===Clustering For ZCS 8 And Above===
----
====ZCS 8 Specifics====
From the ZCS 8.0.5 Release Notes:
'''''Red Hat Cluster Suite is not available with ZCS 8.0. To streamline support efforts, we will only test and certify in-house availability solutions. Today VMware offers clustering failover and automated recovery through VMware HA. Zimbra integrates with the VMware HA cluster infrastructure to heartbeat Zimbra application services and provide automated recovery in the event of a service failure. (Bugs 72215/72216)'''''
* Private bugs about the EOL of RHCS
* Deprecate RHCS in 7.2
** http://bugzilla.zimbra.com/show_bug.cgi?id=72215
* EOL RHCS
** http://bugzilla.zimbra.com/show_bug.cgi?id=72216
'''''If you use Zimbra Clustering'''''
''Zimbra Clustering is no longer available for ZCS 8.0. VMware provides integrated high availability between Zimbra and VMware HA for automated recovery of critical Zimbra application services and server components in the event of an application or infrastructure failure. Third-party solutions such as network load balancers, storage mirroring, or OS clustering solutions like Red Hat Cluster Suite may be used in your deployment, but are not specifically tested or certified by VMware. (Bug 75821)''
* Clustering not in ZCS 8.0
** http://bugzilla.zimbra.com/show_bug.cgi?id=75821
=====Statement On Zimbra Support And "Clustering" For ZCS 8+ [In My Own Words - Ajcody]=====
'''''I've sent this off to our PM team to review and to provide formal comments and documentation updates on it. - Ajcody'''''
'''''Official''''' and '''''Full support''''' are '''NOT''' terms Zimbra Support applies to Red Hat RHCS or VMware's clustering options, since our team does not support either of those two options directly. We do not support customers in configuring or troubleshooting the clustering components of Red Hat's RHCS or VMware's clustering.
Can RHCS and VMware's clustering work with ZCS? Yes. Does our team offer support in configuring, troubleshooting, or diagnosing the clustering component of the setup? '''No'''. You would need to contact Red Hat if you went with RHEL and RHCS, or VMware and your Linux OS distribution's support channel if you went with VMware clustering.
'''Note''' - The Zimbra clustering zimlets are completely removed in ZCS 8.6. The VMware clustering monitoring scripts are a separate issue that falls outside of general clustering support.
====Future Releases After ZCS 8====
Please see the following blog posts about our '''Always ON, Carrier Grade Architecture HA''' features we are hoping to have in future releases:
* http://blog.zimbra.com/blog/archives/2013/09/project-always-on.html
* http://blog.zimbra.com/blog/archives/2013/04/zimbra-judaspriest-release-update-1.html
===Vmware Virtualization and Clustering===
----
====Overview====
From the ZCS 8.0.5 Release Notes:
'''''Red Hat Cluster Suite is not available with ZCS 8.0. To streamline support efforts, we will only test and certify in-house availability solutions. Today VMware offers clustering failover and automated recovery through VMware HA. Zimbra integrates with the VMware HA cluster infrastructure to heartbeat Zimbra application services and provide automated recovery in the event of a service failure. (Bugs 72215/72216)'''''
* Private bugs about the EOL of RHCS
* Deprecate RHCS in 7.2
** http://bugzilla.zimbra.com/show_bug.cgi?id=72215
* EOL RHCS
** http://bugzilla.zimbra.com/show_bug.cgi?id=72216
'''''VMware Heartbeat Service'''''
''ZCS utilizes the VMware application programming interface that allows software providers to deploy application monitoring components inside a VMware guest OS and inform VMware HA when problems arise. The VMware-heartbeat service provides information to the VMware HA components on the health and availability of the ZCS. If you have VMware HA components installed, the VMware HA service is shown as enabled on the Server>Services page.''
'''''General Description of VMware HA'''''
''VMware HA provides a simple, reliable way to increase the availability of virtual machines hosting critical applications. VMware HA is a virtualization-based distributed infrastructure service of VMware vSphere 4.1+, which monitors the health of virtual machines and the VMware ESX® hosts upon which they reside. If a fault is detected, the virtual machine is automatically restarted on another ESX host with adequate capacity to host it. VMware HA is included in all vSphere editions and can be enabled on a VMware cluster with a single check box. As VMware HA utilizes the storage and network connectivity already in place to support vMotion, enabling high availability is as simple as ensuring you have adequate server capacity to handle failure of one or more ESX hosts.''
====ZCS Scripts For Vmware-HA====
The three scripts for vmware-ha are:
* /opt/zimbra/bin/zmhactl
* /opt/zimbra/libexec/vmware-heartbeat
* /opt/zimbra/libexec/vmware-appmonitor
We have an open documentation RFE to better explain these:
* "Documentation for VMware HA scripts in Zimbra"
** http://bugzilla.zimbra.com/show_bug.cgi?id=76386
Some older bugs describing a little more about the vmware-heartbeat check:
* "VMware HA does not work with ZCS8 or ZCA8"
** http://bugzilla.zimbra.com/show_bug.cgi?id=76780
* "Service level control of VMware HA Clustering heartbeat"
** http://bugzilla.zimbra.com/show_bug.cgi?id=56716
====Enable Vmware-HA Service====
To confirm the service is available to enable:
[As the zimbra user]
<pre>zmprov gs `zmhostname` zimbraServiceInstalled | grep vmware-ha
zimbraServiceInstalled: vmware-ha
</pre>
To enable vmware-ha on a server:
[Starting as root; the first command switches to the zimbra user]
<pre>
su - zimbra
zmprov ms `zmhostname` +zimbraServiceEnabled vmware-ha
zmhactl start
</pre>
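The check and the enable steps above can be combined so that vmware-ha is only enabled on a server that reports it as installed. This is a minimal sketch meant to run as the zimbra user; the function name enable_vmware_ha is hypothetical, and it uses only the zmhostname, zmprov gs/ms, and zmhactl start commands shown above.

```sh
#!/bin/sh
# Hypothetical helper: enable the vmware-ha service on this server only if
# zimbraServiceInstalled lists it. Run as the zimbra user.
enable_vmware_ha() {
  local host
  host="$(zmhostname)" || return 1

  # Step 1: confirm the service is installed on this server.
  if ! zmprov gs "$host" zimbraServiceInstalled | grep -qw 'vmware-ha'; then
    echo "vmware-ha is not installed on $host; nothing to enable" >&2
    return 1
  fi

  # Step 2: only add the value if it is not already enabled.
  if zmprov gs "$host" zimbraServiceEnabled | grep -qw 'vmware-ha'; then
    echo "vmware-ha already enabled on $host"
  else
    zmprov ms "$host" +zimbraServiceEnabled vmware-ha || return 1
  fi

  # Step 3: start the VMware HA heartbeat control script.
  zmhactl start
}
```

Checking zimbraServiceEnabled first means running the function twice does not try to add the value again.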
====Vmware Performance Recommendations For Vmware Clustering====
See
* [[Performance_Recommendations_for_Virtualizing_Zimbra_with_VMware_vSphere_4#vSphere_Cluster_Recommendations]]
====VMware Host Based Replication (HBR)====
Questions should be addressed to VMware Support. This feature operates outside of Zimbra and is not QA tested by us.
Private Bug [Referenced here in case it's later made public]:
* Qualification of ZCS with VMware Host Based Replication (HBR)
** http://bugzilla.zimbra.com/show_bug.cgi?id=44992
====Other References====
See also:
* "Zimbra on NFS Storage through VMware ESX"
** http://bugzilla.zimbra.com/show_bug.cgi?id=50635
** This is NFS for the virtualized storage within ESX, not within the OS.
*** For the current NFS policy when the NFS mount is done within the OS, see:
**** http://bugzilla.zimbra.com/show_bug.cgi?id=33221#c8
* "No support for RHEL Clustering on top of VMware"
** http://bugzilla.zimbra.com/show_bug.cgi?id=53728
====Webinar On Virtualizing Zimbra with VMware vSphere and NetApp NFS====
Learn how the combination of VMware vSphere and NetApp file-based storage provides full support for virtualization and all of vSphere's advanced features (vMotion, Storage vMotion, DRS), simplifies high availability through the use of VMware HA and server consolidation.
View Webinar Now (Duration: 45 minutes)
* http://www.zimbra.com/about/webinar_form.php?showID=21
===Critical Bugs/RFE's - False Restarts And So Forth===
====Log Rotation Causes Cluster Failover====
See:
* "Log rotation causes cluster failover"
** http://bugzilla.zimbra.com/show_bug.cgi?id=36042
Recommendations to work around bug until fix is released - 2 methods:
* First Method - remove the zmmtaconfigctl restart from log rotation
** Remove the zmmtaconfigctl restart from the /etc/logrotate.d/zimbra file so that it will not attempt to restart the service that is being detected as down.  We can remove the line that reads:
<pre>su - zimbra -c "/opt/zimbra/bin/zmmtaconfigctl restart"</pre>
** That will stop these failures, but will affect the logging for zmmtaconfigctl, probably causing it to write to a nonexistent file.  We haven't seen problems in zmmtaconfig for a long time, so this is a pretty low risk workaround.
* Second Method - disable software monitoring in general
** See [[Ajcody-Clustering#Software_Monitoring_Causes_Problems]] below.
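For the first method, commenting the line out instead of deleting it keeps the change easy to revert. A minimal sketch, assuming GNU sed and that the restart command sits on its own line inside /etc/logrotate.d/zimbra as quoted above; the function name disable_zmmtaconfig_restart is hypothetical. Run it as root; GNU sed's -i.bak leaves the previous version of the file in a .bak copy.

```sh
#!/bin/sh
# Hypothetical helper: comment out the "zmmtaconfigctl restart" line in a
# logrotate file (default /etc/logrotate.d/zimbra). Assumes GNU sed.
disable_zmmtaconfig_restart() {
  local file="${1:-/etc/logrotate.d/zimbra}"
  [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }

  # Skip lines that are already comments; on the remaining lines that
  # restart zmmtaconfigctl, prefix "# ". The original is kept as FILE.bak.
  sed -i.bak '/^[[:space:]]*#/!{/zmmtaconfigctl restart/s/^/# /;}' "$file"
}
```

Because already-commented lines are skipped, running it a second time does not add another "#".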
====Software Monitoring Causes Problems====
* "Request Modification to zmcluctl to support hardware only failover with Redhat Cluster Manager"
** http://bugzilla.zimbra.com/show_bug.cgi?id=25456
*** '''''Note:''' Beginning in ZCS 6.0, hardware only failover with Redhat Cluster Manager is supported.''
* To disable software monitoring:
** This will prevent failover if zmcluctl finds a service down. It will not prevent failover if there is a hardware fault detected by the cluster software.
*** $ zmcluctl mode hardware
*** or
*** $ zmlocalconfig -e zimbra_cluster_mode=hardware
*** Note from the bug/RFE: a switch for software-based or hardware-only failover was added, implemented as the localconfig attribute zimbra_cluster_mode. The default mode is software, meaning any non-zero exit status from zmcontrol status will trigger a failover. Hardware mode means that zmcluctl status will always return a zero exit status and will not consult zmcontrol status for the state.
** To capture more information in the log about what might be going on:
*** In /opt/zimbra-cluster/bin/zmcluctl you should see a line like:
**** my @output = `su - zimbra -c 'zmcontrol status'`;
*** Change that to:
**** my @output = `su - zimbra -c 'date >> /opt/zimbra/log/zmcluster-status.log ; zmcontrol status >> /opt/zimbra/log/zmcluster-status.log 2>> /opt/zimbra/log/zmcluster-status.log'`;
*** That should give us more logging. I believe zmcluctl is read every time from disk when it does the check, so no restart of services should be needed.
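The bug note above describes two modes: in software mode a non-zero exit status from zmcontrol status triggers a failover, while in hardware mode zmcluctl status always returns zero. The sketch below reproduces that decision in shell and adds timestamped logging like the Perl change above, but uses tee so the status output still reaches the caller. It is not the actual zmcluctl code (which is Perl); the function name cluster_status, the default log path, and passing the mode as an argument are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical sketch (not the real zmcluctl) of the status decision
# described in the bug note:
#   hardware mode -> always exit 0, so only hardware faults cause failover
#   software mode -> exit with the exit code of "zmcontrol status"
# In software mode the output is timestamped and appended to a log file,
# and is also passed through to the caller. Intended to run as zimbra.
cluster_status() {
  local mode="${1:-software}"
  local log="${2:-/opt/zimbra/log/zmcluster-status.log}"
  local out rc

  if [ "$mode" = "hardware" ]; then
    return 0
  fi

  date >> "$log"
  # Capture stdout and stderr together while keeping zmcontrol's exit code.
  if out=$(zmcontrol status 2>&1); then rc=0; else rc=$?; fi
  printf '%s\n' "$out" | tee -a "$log"
  return "$rc"
}
```

On a real node the mode would come from localconfig, for example with zmlocalconfig -m nokey zimbra_cluster_mode, rather than from an argument.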
====RHEL 5 Clusters And Cisco Switches====
Please see the following:
* "Openais appears to fail, causing cluster member to fence"
** https://bugzilla.redhat.com/show_bug.cgi?id=469874
*** The last comment mentions a Cisco issue as the cause (Cisco switches are used internally for IBM blades).
*** Comment 9 states that the most likely cause is a hardware configuration issue with the switch or iptables.
* "Cman kills first node in initial cluster setup"
** https://bugzilla.redhat.com/show_bug.cgi?id=485026
====Mysql Related Items Impacting Cluster====
Please see:
* "Mysql crash recovery causes repeated software failover"
** http://bugzilla.zimbra.com/show_bug.cgi?id=36690
*** "finding root cause to failed failover situations as secondary issue from bug 36690"
**** http://bugzilla.zimbra.com/show_bug.cgi?id=37830
* "Flush dirty innodb pages in mysql prior to shutting down."
** http://bugzilla.zimbra.com/show_bug.cgi?id=37231
====Other Misc Bug/RFEs====
Other bugs/RFEs you might be interested in:
* "zmcluctl status can return errors even when services are up"
** http://bugzilla.zimbra.com/show_bug.cgi?id=24868
====Failover Occurring Prior To Mysql Shutting Down - RHCS====
Here is the summary of the situation that was sent for a support case on the matter.
: Regarding RHCS desired configuration -
: In particular, it looks like mysqld can at times take quite a long time to shut down, especially when write I/O is limited on the MySQL db partition and mysqld must safely write cached data to disk before shutting down. We believe this should be reflected in the "rc=1" output from "zmcontrol stop" when mysqld does not stop within 60 seconds via /opt/zimbra/bin/zmcontrol (via /opt/zimbra/bin/mysql.server). What appears to happen next is that RHCS either tries to unmount the filesystem and/or simply sends kill -KILL to the running process, which in turn causes mysqld to die ungracefully and forces the mysqld startup on the other node to go through a long log replay. The net result is one of two scenarios:
:: (a) mysqld takes longer to start up than it would have taken to shut down (a normal shutdown can take 0-10 minutes if mysqld has to flush a lot of data to disk)
:: (b) mysqld is still running when RHCS tries to unmount the filesystem, which then fails, and therefore the failover fails
: What we'd like to be able to recommend is a more robust failover logic, such as:
:: 1. Run zmcontrol stop, check output
:: 2. If failed, wait X seconds (e.g., 120 seconds)
:: 3. Run zmcontrol stop again
:: 4. If failed, wait X seconds
:: 5. Run zmcontrol stop a third time
:: 6. If failed third time, try a kill -KILL and force unmount
: If RHCS can be configured to do this, it would be the best way to handle this situation. The problem with building this logic into zmcontrol is that zmcontrol should not be given the decision to wait indefinitely in case mysqld has a critical problem and is not shutting down. Having an external process manage this logic allows additional triangulation when determining the situation, and allows additional capabilities such as monitoring the files in /opt/zimbra/db/data to confirm that they are changing. For example, if mysqld is still running and no files have changed in /opt/zimbra/db/data for a significant amount of time, it may indicate that mysqld is locked up, and other steps may be required. zmcontrol would not be able to handle all of this logic internally because it needs monitoring from outside the zmcontrol process, including checking the return code from the script.
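The six-step stop logic in the summary above could live in an external script that the cluster resource calls for its stop action. This is a minimal sketch under several assumptions: the function names are hypothetical; the attempt count and wait come from ZM_STOP_ATTEMPTS and ZM_STOP_WAIT (defaulting to 3 attempts and the 120-second example above); the kill step targets all processes owned by the zimbra user; and ZM_MOUNT (/opt/zimbra here) is the clustered mount point. How RHCS invokes such a script and treats its exit code should be confirmed against Red Hat's documentation.

```sh
#!/bin/sh
# Hypothetical stop wrapper following the six steps: try "zmcontrol stop"
# up to ZM_STOP_ATTEMPTS times with ZM_STOP_WAIT seconds between tries,
# then fall back to kill -KILL and a forced unmount.
: "${ZM_STOP_ATTEMPTS:=3}"     # number of "zmcontrol stop" attempts
: "${ZM_STOP_WAIT:=120}"       # seconds to wait between attempts
: "${ZM_MOUNT:=/opt/zimbra}"   # assumption: the clustered filesystem

# One graceful stop attempt; returns zmcontrol's exit code.
zm_stop_once() {
  su - zimbra -c 'zmcontrol stop'
}

# Last resort (step 6). Assumption: kill every zimbra-owned process, then
# force the unmount. Returns the umount exit code.
zm_force_stop() {
  pkill -KILL -u zimbra || true   # pkill returns 1 when nothing matched
  sleep 2                         # give the killed processes time to exit
  umount -f "$ZM_MOUNT"
}

zm_stop_with_retries() {
  local attempt=1
  while [ "$attempt" -le "$ZM_STOP_ATTEMPTS" ]; do
    if zm_stop_once; then
      echo "zmcontrol stop succeeded on attempt $attempt"
      return 0
    fi
    echo "zmcontrol stop failed (attempt $attempt of $ZM_STOP_ATTEMPTS)" >&2
    # Wait before the next attempt, but not after the last one.
    if [ "$attempt" -lt "$ZM_STOP_ATTEMPTS" ]; then
      sleep "$ZM_STOP_WAIT"
    fi
    attempt=$((attempt + 1))
  done
  echo "all attempts failed; forcing kill and unmount" >&2
  zm_force_stop
}
```

The function returns 0 if zmcontrol stop eventually succeeded or if the forced unmount succeeded, on the reasoning that the filesystem is then free for the other node. Keeping the loop outside zmcontrol matches the point made in the summary: the outer process, not zmcontrol, decides when to stop waiting.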
RFE that was made about Zimbra adjusting our script and its wait time:
* "mysql.server should wait longer to stop mysqld"
** http://bugzilla.zimbra.com/show_bug.cgi?id=52490
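The summary above also suggests watching /opt/zimbra/db/data: if mysqld is still running but nothing in that directory has changed for a while, it may be locked up. A minimal sketch of that check, assuming GNU find (for -mmin and -quit) and pgrep, with a hypothetical function name and a threshold in minutes chosen by the caller; what counts as "a significant amount of time" depends on your write load.

```sh
#!/bin/sh
# Hypothetical check: is mysqld running while nothing under the data
# directory has been modified in the last N minutes?
# Returns 0 ("looks stuck") or 1 (not running, or files still changing).
mysqld_looks_stuck() {
  local data_dir="${1:-/opt/zimbra/db/data}"
  local minutes="${2:-10}"

  # Not running at all: nothing to wait for.
  pgrep -x mysqld > /dev/null || return 1

  # -mmin -N matches files modified less than N minutes ago;
  # -print -quit stops at the first match.
  if [ -n "$(find "$data_dir" -type f -mmin "-$minutes" -print -quit)" ]; then
    return 1   # files still changing, so mysqld is probably still flushing
  fi
  return 0
}
```

For example, a stop script could call mysqld_looks_stuck /opt/zimbra/db/data 15 between stop attempts to decide whether waiting longer is worthwhile.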


===Good Summary For RHEL Clustering===
----
This is a good solid summary about RHEL clustering:
* http://www.linuxjournal.com/article/9759
===Active-Active Clustering===
----
There is a bug (RFE) for an active-active configuration. Please see:
* http://bugzilla.zimbra.com/show_bug.cgi?id=19700
===Non-San Based Fail Over HA/Cluster Type Configuration===
----
This RFE covers issues when you want a "copy" of the data to reside on an independent server over a LAN/WAN.
Please see:
*"Disaster recovery through server to server sync (beta)"
** http://bugzilla.zimbra.com/show_bug.cgi?id=11423
*"HA/DR through Log Shipping"
** http://bugzilla.zimbra.com/show_bug.cgi?id=42231
===RFE's/Bug Related To Supporting Clustering Options===
----
* "Add VCS cluster support for Suse ES 10"
* "Add clustering support for Mac OS X"
** http://bugzilla.zimbra.com/show_bug.cgi?id=23731
===Other Clustering RFE's And Bugs===
----
* "RFE: How-to for adding additional mounts on existing cluster deployment"
** http://bugzilla.zimbra.com/show_bug.cgi?id=48141
* "RFE: Expand documentation or installer for 1+1 to X+1"
** http://bugzilla.zimbra.com/show_bug.cgi?id=52507
===HA-Linux (Heartbeat)===
----
{{:Ajcody-Notes-HA-Linux-How-To}}
----
[[Category: Community Sandbox]]
[[Category: Author:Ajcody]]
[[Category: Zeta Alliance]]

Latest revision as of 15:37, 20 June 2016 (KB 2700).


====HA-Linux How-To For Testing And Educational Use====
References:
=====Actual HA-Linux How-To For Testing And Educational Use Homepage=====
Please see [[Ajcody-Notes-HA-Linux-How-To]]
=====Motive Behind How-To=====
I hope this gives administrators who currently have no clustering experience an easy way to work through some clustering concepts and gain real-world experience. I plan on walking through each "function" that is behind clustering rather than jumping to an end setup (Linux-HA, shared storage, and Zimbra).
The structure will be:
* Set up two machines (physical or virtual).
** Emphasize the physical hostname and IP address vs. the hostname and IP address that will be used for HA.
* Set up a virtual hostname and IP address for HA.
** Explain and do IP failover between the two machines.
* Set up a disk mount; we'll probably use an NFS export from a third machine.
** This will give us an example of expanding the HA conf files to move beyond IP address failover.
** Adjust the HA conf files to now export via NFS a local directory from each server. This will not be a shared physical disk, of course.
* Set up a shared disk between the two servers and include it in the HA conf files.
** We can use DRBD, or maybe figure out a way to share a virtual disk between the two VMs.
* Set up a very simple application to include between the two machines, something like Apache or CUPS.
* Go back and readjust all variables between monitoring-type (automatic) failover and simple manually initiated failover.
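For the "Explain and do IP failover" step, it can help to move the virtual IP by hand before handing the job to Heartbeat. A minimal sketch, assuming the iproute2 ip command and iputils arping; the address 192.168.1.100/24 and interface eth0 are placeholders for your test network, and the function names are hypothetical. Run release_vip on the node giving up the address, then takeover_vip on the other node, both as root.

```sh
#!/bin/sh
# Hypothetical helpers for moving a virtual (HA) IP address between two
# test machines by hand. VIP and IFACE are placeholders.
: "${VIP:=192.168.1.100/24}"
: "${IFACE:=eth0}"

# Remove the virtual IP from this node (ignore the error if it is absent).
release_vip() {
  ip addr del "$VIP" dev "$IFACE" 2>/dev/null || true
}

# Add the virtual IP to this node, then send unsolicited (gratuitous) ARP so
# neighbours and switches learn which MAC address now owns it.
takeover_vip() {
  ip addr add "$VIP" dev "$IFACE" || return 1
  # ${VIP%/*} strips the prefix length; the ARP step is best-effort, so its
  # exit code does not fail the takeover.
  arping -U -c 3 -I "$IFACE" "${VIP%/*}" || true
}
```

Doing this by hand first makes it easier to see roughly what Heartbeat's IP address resource automates once it is configured.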

