Ajcody-Clustering
Revision as of 19:21, 12 March 2010
This article is NOT official Zimbra documentation. It is a user contribution and may include unsupported customizations, references, suggestions, or information.
Clustering Topics
Actual Clustering Topics Homepage
Please see Ajcody-Clustering
My Other Clustering Pages
- Ajcody-Notes-Upgrade-Options#Upgrade_Steps_for_Multi-Servers_with_Clustered_Mailstores
- Ajcody-Notes-Of-Customer-Cluster-Upgrade
RFE I made based upon the experience above:
- "QA & Corrections For Cluster Upgrade Document"
- http://bugzilla.zimbra.com/show_bug.cgi?id=31026
Critical Bugs/RFE's - False Restarts And So Forth
Log Rotation Causes Cluster Failover
See:
- "Log rotation causes cluster failover"
Recommendations to work around the bug until a fix is released (two methods):
- First Method - disable log rotation
- Remove the zmmtaconfigctl restart from the /etc/logrotate.d/zimbra file so that it will not attempt to restart the service that is being detected as down. We can remove the line that reads:
su - zimbra -c "/opt/zimbra/bin/zmmtaconfigctl restart"
- That will stop these failures, but will affect the logging for zmmtaconfigctl, probably causing it to write to a nonexistent file. We haven't seen problems in zmmtaconfig for a long time, so this is a pretty low risk workaround.
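As a sketch of the first method, assuming the restart line appears in /etc/logrotate.d/zimbra exactly as quoted above, the edit can be scripted like this (demonstrated here on a throwaway copy rather than the live file):

```shell
# Demonstration on a throwaway copy; on a live node the target file is
# /etc/logrotate.d/zimbra, and you should back it up first.
f=$(mktemp)
cat > "$f" <<'EOF'
    su - zimbra -c "/opt/zimbra/bin/zmmtaconfigctl restart"
EOF
# Comment the restart out (rather than deleting it) so the change is easy
# to revert once a fixed build is installed.
sed -i 's|^\([[:space:]]*\)su - zimbra|\1# su - zimbra|' "$f"
cat "$f"
```

Commenting the line out, instead of removing it, keeps a record of what the stock file contained.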
- Second Method - disable software monitoring in general
Software Monitoring Causes Problems
- "Request Modification to zmcluctl to support hardware only failover with Redhat Cluster Manager"
- http://bugzilla.zimbra.com/show_bug.cgi?id=25456
- Note: Beginning in ZCS 6.0, hardware only failover with Redhat Cluster Manager is supported.
- To disable software monitoring:
- This will prevent failover if zmcluctl finds a service down. It will not prevent failover if there is a hardware fault detected by the cluster software.
- To disable it, check /opt/zimbra-cluster/bin/zmcluctl and find the 'status' section. It's the last of the three (start, stop, status). You will need to find the lines that read 'exit($rc);' and change them to read 'exit(0);'.
- To increase the chance of getting more information in the log about what might be going on:
- In /opt/zimbra-cluster/bin/zmcluctl you should see a line like:
- my @output = `su - zimbra -c 'zmcontrol status'`;
- Change that to:
- my @output = `su - zimbra -c 'date >> /opt/zimbra/log/zmcluster-status.log ; zmcontrol status >> /opt/zimbra/log/zmcluster-status.log 2>> /opt/zimbra/log/zmcluster-status.log'`;
- That should give us more logging. I believe zmcluctl is read every time from disk when it does the check, so no restart of services should be needed.
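The two zmcluctl edits above can be sketched as a script. This demonstration works on a throwaway copy that mimics the quoted lines, since the real /opt/zimbra-cluster/bin/zmcluctl varies by release; note that the sed below changes every 'exit($rc);' occurrence, so on the real file edit only the 'status' section by hand.

```shell
# Work on a scratch copy that mimics the two lines the article references.
f=$(mktemp)
cat > "$f" <<'EOF'
my @output = `su - zimbra -c 'zmcontrol status'`;
exit($rc);
EOF

# 1) Disable software monitoring: always report success to the cluster manager.
sed -i 's/exit(\$rc);/exit(0);/' "$f"

# 2) Capture timestamps and zmcontrol output to a log for troubleshooting.
sed -i "s|'zmcontrol status'|'date >> /opt/zimbra/log/zmcluster-status.log ; zmcontrol status >> /opt/zimbra/log/zmcluster-status.log 2>> /opt/zimbra/log/zmcluster-status.log'|" "$f"

cat "$f"
```

Verify the result with grep before putting the node back into service; as noted above, zmcluctl is read from disk on each check, so no service restart should be needed.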
RHEL 5 Clusters And Cisco Switches
Please see the following:
- "Openais appears to fail, causing cluster member to fence"
- https://bugzilla.redhat.com/show_bug.cgi?id=469874
- The last comment mentions a Cisco issue being the cause (Cisco switches are used internally for IBM blades).
- Comment 9 states it is most likely a hardware configuration issue with the switch or iptables.
- "Cman kills first node in initial cluster setup"
Other Misc Bug/RFEs
Other bugs/rfe's you might be interested in looking at:
- "zmcluctl status can return errors even when services are up"
Good Summary For RHEL Clustering
This is a good solid summary about RHEL clustering:
http://www.linuxjournal.com/article/9759
Active-Active Clustering
There is a bug(rfe) for active-active configuration. Please see:
http://bugzilla.zimbra.com/show_bug.cgi?id=19700
Non-San Based Fail Over HA/Cluster Type Configuration
This RFE covers the case where you want a "copy" of the data to reside on an independent server over the LAN/WAN.
Please see:
- "Disaster recovery through server to server sync (beta)"
RFE's/Bug Related To Supporting Clustering Options
- "Add VCS cluster support for Suse ES 10"
- http://bugzilla.zimbra.com/show_bug.cgi?id=24303
- SuSE Clustering resources - These might be useful.
- "Cluster Configuration on SLES"
- "Clustering Your Novell Groupwise Servers - Xen & Heartbeat2 on SLES"
- "SuSE paper about HA and Virtual Servers - PDF"
- What SLES has for clustering - Linux Virtual Server
- "Add support for other cluster software lifekeeper and mc/service guard"
- "Add clustering support for Mac OS X"
HA-Linux (Heartbeat)
HA-Linux How-To For Testing And Educational Use
References:
- HA-Linux Project Homepage
- Howto: Highly available Zimbra cluster using Heartbeat and DRBD
- DRBD Homepage
- DRBD is currently unsupported. Related RFEs:
- 1. "disaster recovery through server to server sync (beta)"
- 2. "add active-active support to zcs", marked as a duplicate of the above.
Actual HA-Linux How-To For Testing And Educational Use Homepage
Please see Ajcody-Notes-HA-Linux-How-To
Motive Behind How-To
I hope this gives administrators with no prior clustering experience an easy way to work through the core clustering concepts and gain some real-world experience. I plan on walking through each "function" behind clustering rather than jumping straight to an end setup (Linux-HA, shared storage, and Zimbra).
The structure will be:
- Set up two machines (physical or virtual).
- Emphasize the physical hostname/IP vs. the hostname and IP address that will be used for HA.
- Set up the virtual hostname and IP address for HA.
- Explain and do IP failover between the two machines.
- Set up a disk mount; we'll probably use an NFS export from a third machine.
- This will give us an example of expanding the HA conf files to move beyond IP address failover.
- Adjust the HA confs to export a local directory from each server via NFS. This will not be a shared physical disk, of course.
- Set up a shared disk between the two servers and include it in the HA conf files.
- We can use DRBD, or maybe figure out a way to share a virtual disk between the two VMs.
- Set up a very simple application to include between the two machines. Something like apache or cups.
- Go back and readjust all the variables, comparing monitored (automatic) failover with simple manually initiated failover.
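As a preview of the IP-failover step in the plan above, here is a minimal Heartbeat v1 style configuration sketch; the hostnames, interface, and address are hypothetical placeholders, not values from this article:

```
# /etc/ha.d/ha.cf - minimal two-node setup (hypothetical values)
logfacility local0
keepalive 2              # heartbeat interval in seconds
deadtime 30              # declare the peer dead after 30s of silence
bcast eth0               # send heartbeats via broadcast on eth0
auto_failback off        # don't fail back automatically after recovery
node node1.example.com
node node2.example.com

# /etc/ha.d/haresources - node1 normally owns the floating IP
node1.example.com IPaddr::192.168.1.100/24/eth0
```

Both nodes also need a matching /etc/ha.d/authkeys (mode 600). The later steps in the plan would extend the haresources line with Filesystem and service resources.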