Ajcody-Notes-Of-Customer-Cluster-Upgrade: Difference between revisions

Revision as of 21:47, 14 August 2008

Context Of Notes

The following are some notes I took while following the steps listed in Upgrade_Steps_for_Multi-Servers_with_Clustered_Mailstores with a customer.


Configuration of customer's site

    • 2 LDAP servers
    • 3 MTAs
    • 3 Node Cluster
      • 2 are separate mailstores
      • 1 is the failover for either of the two mailstores

RFE (Internal/Website): Zimbra Homepage should have Document link tab

    • The homepage should give a documentation button more prominent placement, for example a tab next to Products and Support in the “Community, Forums, Products, Support, Partners, About, Buy” area.

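RFE (Internal/Website): Download page should state binary good for 32 & 64 bit

    • Download links should state that the binary is good for both 32-bit and 64-bit.
      • http://www.zimbra.com/products/downloads.html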

RFE (Documentation) Cross reference of cluster install/upgrade documents and options

    • The cluster documentation and the Release Notes (Upgrade section) should reference the "upgrade" documents and the "rolling upgrade" option.
      • Also add a statement about multi-server configurations with regard to 32/64-bit, i.e. mixed platform types across servers.

RFE (Documentation) Add comments in upgrade document about the precautionary backup.

    • The documents should comment on when one can start the precautionary backup relative to the upgrade.
      • Some full backups can take many hours. If the customer waits until the backups finish completely, they might not be able to upgrade within the downtime window they are constrained by. Advice on how to optimize this would be helpful.

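RFE (Documentation Updates & QA needed for below) New Document For Upgrade Process Planning

    • Please see the following:
      • http://wiki.zimbra.com/index.php?title=Ajcody-Server-Topics#How_Do_I_Make_Sure_I_Don.27t_Lose_Emails_During_Upgrade.3F_If_I_Need_To_Fail_Back_To_Old_Version.3F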

RFE (Documentation – Cluster Upgrade) Clarity around steps in upgrade doc for “ip addr del”

    • The upgrade document should explain more clearly the process of bringing down the production IP (ip addr del), i.e. the IP that the cluster service is using. It should also state that the IP/interface will come back when the cluster is re-enabled.
    • When the active node has been upgraded, stop the server, remove the service IP and the mount. Type the following:
      • zmcontrol stop
      • ip addr del <service ip> dev eth0
      • umount /opt/zimbra-cluster/mountpoints/<clusterservicename.com>
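The three steps above could be collected into a dry-run helper that only prints the commands, which may be a safer way to review them before touching a node. This is a minimal sketch, not the documented procedure: the function name and the example IP, device, and service name are hypothetical, and running zmcontrol through `su - zimbra` mirrors what was actually typed later in these notes rather than anything the upgrade document states. Note that these notes also record a node rebooting after `ip addr del`, so the sketch deliberately executes nothing.

```shell
#!/bin/sh
# Dry-run sketch: prints the node shutdown commands, executes none of them.
print_node_stop_steps() {
  service_ip=$1    # IP used by the cluster service (placeholder value below)
  device=$2        # interface that holds the service IP, e.g. eth0
  service_name=$3  # cluster service name

  # zmcontrol is run as the zimbra user (assumption taken from these notes).
  printf "su - zimbra -c 'zmcontrol stop'\n"
  # The remaining steps are assumed to be run as root.
  printf 'ip addr del %s dev %s\n' "$service_ip" "$device"
  printf 'umount /opt/zimbra-cluster/mountpoints/%s\n' "$service_name"
}

# Example with placeholder values.
print_node_stop_steps 192.0.2.10 eth0 mail.example.com
```

Reviewing the printed commands with the customer first leaves room to substitute the real interface (for example a bonded device) before anything is run.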

RFE (Documentation) or Bug: Error output during upgrade about keystore

    • During the upgrade on LDAP, we received this message:
      • cp: cannot stat `/opt/zimbra/.saveconfig/keystore': No such file or directory
      • chown: cannot access `/opt/zimbra/conf/keystore': No such file or directory
    • The upgrade progressed past it, though.

RFE (Documentation) Upgrade documents don’t mention the MTA Auth Host variable or give guidance.

    • The upgrade docs give no explanation of this:
      • MTA Auth Host. ** marked unconfigured
    • The single-server cluster install document, though, did mention:
      • zimbra-mta - MTA Auth host:
      • Change the MTA’s auth host name to the cluster service host (mail.example.com)

RFE (Documentation) Docs might mention saslauthd errors one might encounter due to the upgrade process one picks

    • The MTA upgrades had:
      • Starting mta...FAILED
      • saslauthd[8779] :set_auth_mech   : failed to initialize mechanism zimbra
      • zmsaslauthdctl failed to start
    • Caused by iptables on the MTAs blocking the connection to LDAP; resolved once the upgrades were done and iptables was opened again.
      • The customer used iptables to block delivery of mail during the upgrade.

RFE (Documentation) Upgrade documentation doesn’t mention the “cluster service name” or explain it for different cluster configurations.

    • Documentation/upgrade confusion (upgrade on cluster/mailstore):
      • Customer asks, “Why does the script ask for the cluster service name this box is running as? Shouldn't it know, or be able to find out that information?”
    • Enter the active cluster service name for this node: [maila.DOMAIN.edu] zclusta.DOMAIN.edu
    • http://bugzilla.zimbra.com/show_bug.cgi?id=28982
      • “cluster install should default to a valid cluster service name”


RFE (Documentation) or Bug? Upgrade can’t determine the number of users.

    • Documentation upgrade script concern:
Warning: Could not determine the number of users on this system.
If you exceed the number of licensed users (18000) then you will not be able to create new users.
Do you wish to continue? [N]

RFE (Documentation) or Bug? Upgrade output sometimes gives odd error when checking version

    • Documentation upgrade script concern:
Checking 5.0.5_GA
Updating from 5.0.5_GA
ERROR: account.NO_SUCH_SERVER (no such server: maila.DOMAIN.edu)
Checking 5.0.6_GA
Updating from 5.0.6_GA
Checking 5.0.7_GA
    • It continued with the process and then finished the configuration.

RFE (Documentation) Upgrade document doesn’t mention or explain “Choose a Zimbra service to upgrade”

    • Documentation upgrade script concern (not documented at all in the upgrade doc):
Setting up zimbra crontab...done.
Moving /tmp/zmsetup.01001900-000011271728.log to /opt/zimbra/log
Configuration complete - press return to exit
Choose a Zimbra service to upgrade:
1) zclusta.DOMAIN.edu
2) zclustb.DOMAIN.edu
Choose from above (1-2):
    • This is VERY confusing, and the upgrade document has no reference to it. We got this prompt on all nodes (2 active and 1 failover, which can fail over for either of the active ones).

RFE (Documentation) Errors concerning “Choose a Zimbra service to upgrade” part of the upgrade.

    • Customer says, “Is this supposed to happen? The only way you can "stop" the installer script is Ctrl-C.”
Choose a Zimbra service to upgrade:
1) zclusta.DOMAIN.edu
2) zclustb.DOMAIN.edu
Choose from above (1-2): 2
... OCF_RESKEY_service_name=zclustb.DOMAIN.edu /opt/zimbra-cluster/bin/zmcluctl stop
... ip addr del 199.17.81.81 dev eth0
RTNETLINK answers: Cannot assign requested address
... umount /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/conf
umount: /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/conf: not found
... umount /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/db/data
umount: /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/db/data: not found
... umount /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/index
umount: /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/index: not found
... umount /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/log
umount: /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/log: not found
... umount
    • Customer, “And then we're back at the prompt. Now we're thoroughly confused, and nervous.”
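The "not found" errors above come from unmounting paths that are not mounted on the node where the script ran. One way an administrator doing these steps by hand might avoid that noise is to check the mount table first. The following is only a sketch under assumptions: the function names are hypothetical, the sub-mounts (conf, db/data, index, log) are the ones shown in the output above, the sample device names are invented, and nothing here describes what the installer itself does.

```shell
#!/bin/sh
# Sketch: decide which cluster mount points are really mounted before
# unmounting. Prints a plan only; it never calls umount itself.

is_mounted() {
  # $1: mount point, $2: mount table text in /proc/mounts format
  printf '%s\n' "$2" | awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }'
}

plan_unmounts() {
  # $1: cluster service name, $2: mount table text
  base=/opt/zimbra-cluster/mountpoints/$1
  for sub in conf db/data index log; do
    if is_mounted "$base/$sub" "$2"; then
      echo "umount $base/$sub"
    else
      echo "# skip $base/$sub (not mounted on this node)"
    fi
  done
}

# Sample table with hypothetical devices: only conf and log are mounted.
sample_mounts='/dev/sdb1 /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/conf ext3 rw 0 0
/dev/sdb4 /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/log ext3 rw 0 0'

plan_unmounts zclustb.DOMAIN.edu "$sample_mounts"
```

On a live node the second argument could be supplied as `"$(cat /proc/mounts)"`.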

RFE (Documentation – Cluster Upgrade) Upgrade document should state whether to execute as root or zimbra

    • The document doesn't say whether the commands should be run as root and/or zimbra.
2. When the active node has been upgraded, stop the server, remove the service IP and the mount. Type the following:
a. zmcontrol stop
b. ip addr del <service ip> dev eth0
c. umount /opt/zimbra-cluster/mountpoints/<clusterservicename.com>

RFE (Documentation – Clusters) Upgrade docs should include necessary commands to check status of cluster.

    • Documentation should reference the clustat command to confirm the status of the nodes before moving on to the next node upgrade:
[root@maila ~]# clustat
Member Status: Quorate

  Member Name              Status
  ------ ----              ------
  maila.DOMAIN.edu         Online, Local, rgmanager
  mailb.DOMAIN.edu         Online, rgmanager
  mailc.DOMAIN.edu         Online, rgmanager

  Service Name             Owner (Last)             State
  ------- ----             ----- ------             -----
  zclusta.DOMAIN.edu       (maila.DOMAIN.edu)       disabled
  zclustb.DOMAIN.edu       mailb.DOMAIN.edu         started
No iscsi mounts mounted.
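As a rough illustration of the check suggested above, the clustat output can be filtered for services that are not in the "started" state before moving on to the next node. This is a sketch that assumes the output layout shown in these notes (a "Service Name" header followed by three whitespace-separated columns); the function name is hypothetical and other clustat versions may format their output differently.

```shell
#!/bin/sh
# Sketch: list cluster services whose state is not "started".
# Reads clustat output on stdin; assumes the layout shown in these notes.
services_not_started() {
  awk '
    $1 == "Service" && $2 == "Name" { in_services = 1; next }
    # Service rows have exactly three fields: name, owner, state.
    in_services && NF == 3 && $3 != "started" { print $1, $3 }
  '
}

# Sample taken from the output recorded above.
sample_clustat='Member Status: Quorate
Member Name Status
------ ---- ------
maila.DOMAIN.edu Online, Local, rgmanager
mailb.DOMAIN.edu Online, rgmanager
mailc.DOMAIN.edu Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
zclusta.DOMAIN.edu (maila.DOMAIN.edu) disabled
zclustb.DOMAIN.edu mailb.DOMAIN.edu started
No iscsi mounts mounted.'

printf '%s\n' "$sample_clustat" | services_not_started
```

On a cluster node this would be used as `clustat | services_not_started`; empty output would suggest every service is started.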

RFE (Documentation) or Bug – Following upgrade document instructions for ip addr del, the cluster node reboots

    • Documentation - ip addr del caused a cluster node reboot:
su - zimbra -c 'zmcontrol stop'
ip addr del 199.17.81.70 dev bond1
    • Then we lost the connection, and the machine rebooted.

RFE (Documentation) Documentation should mention log files to view while upgrade process is ongoing

    • Documentation - reference log for clustering activity:
    • By default, all cluster messages go into the RHEL log messages file (/var/log/messages) unless /etc/syslog.conf shows differently - like [ daemon.* /var/log/cluster ]
    • Logs to watch: /var/log/messages, /var/log/zimbra.log, /opt/zimbra/log/mailbox.log and zmmailboxd.log, and /tmp/zmsetup.log
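To follow these logs during an upgrade, one option is to filter the list down to the files that actually exist on the node and hand the result to tail. This is a small sketch, not part of any Zimbra tooling: the function name is hypothetical, and the full path used for zmmailboxd.log is an assumption based on the directory given for mailbox.log.

```shell
#!/bin/sh
# Sketch: print only those log files from the argument list that are readable.
existing_logs() {
  for f in "$@"; do
    if [ -r "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Candidate logs from these notes (zmmailboxd.log path is assumed).
existing_logs \
  /var/log/messages \
  /var/log/zimbra.log \
  /opt/zimbra/log/mailbox.log \
  /opt/zimbra/log/zmmailboxd.log \
  /tmp/zmsetup.log
```

With GNU tail, the result could then be followed with something like `tail -F $(existing_logs /var/log/messages /var/log/zimbra.log)`, run as a user allowed to read those files.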

RFE (Documentation) Documentation for basic cluster usage and config files.

  • For administrators who are taking over administration of a cluster and might not have the basic skill set.
    • Reference /etc/cluster and add a basic comment about the config file.

FOLLOW-UP with customer of this case

    • Follow up with the customer to confirm logger is running on the right node:
      • Customer, "It wants to install logger. Logger should only be on maila. That is where the db-data mount point is allocated for the logger store."

RFE or BUG – Install script update, active or standby cluster service name?

  • Script update:
    • ./install.sh --cluster standby
Installing a cluster standby node.

Enter the active cluster service name for this node: [mailc.DOMAIN.edu]

  • Should it say "Enter the standby cluster service name"? This line is confusing to the administrator.

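RFE or BUG – Install script update, output warning about backups being invalidated before continuation

  • Script update:
    • Before the upgrade starts, it should output a warning about older backups being invalidated.

In 5.0.x you can restore from an older minor version backup (i.e. an older 5.0.x backup, but not a 4.5.x backup).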

There was a bug about zmrestore not handling db schema changes, but that had been fixed in 5.0.5.”


RFE (Documentation – Cluster Upgrade) – Documentation should show how to confirm failover as a verification step before the final steps that bring everything back into production.

    • Documentation - confirm failover steps:
    • Try relocating the service from its node to the standby:
      • clusvcadm -r zclusta.DOMAIN.edu -m mailc.DOMAIN.edu
      • clustat
      • clustat -i 1
    • Then relocate back and forth between the standby and each active node. End with the services back on their primary nodes.
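The back-and-forth relocation described above can be written out as a dry-run plan, so the sequence is agreed on before any service is moved. This is a sketch under assumptions: the function name and the `service=primary` argument format are hypothetical, the node and service names are the ones appearing in these notes, and the relocation tool is assumed to be the Red Hat Cluster Suite command clusvcadm with its -r (relocate) and -m (member) options. Nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch: print a failover verification plan, one service at a time.
print_failover_test() {
  standby=$1
  shift
  # Remaining arguments are service=primary pairs.
  for pair in "$@"; do
    service=${pair%%=*}
    primary=${pair#*=}
    printf 'clusvcadm -r %s -m %s\n' "$service" "$standby"
    printf 'clustat\n'
    printf 'clusvcadm -r %s -m %s\n' "$service" "$primary"
    printf 'clustat\n'
  done
}

print_failover_test mailc.DOMAIN.edu \
  zclusta.DOMAIN.edu=maila.DOMAIN.edu \
  zclustb.DOMAIN.edu=mailb.DOMAIN.edu
```

Each service is relocated to the standby, checked with clustat, and moved back to its primary node before the next one is tried, so the services end up where they started.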

RFE (Documentation – Cluster Upgrade – My wiki page steps)

  • Double-check my wiki page about these steps/checks.
    • Once the store is done, do backups, and then remove ipchains, open access, and start services (etc.) on the MTAs.
    • Confirm cert issues are resolved once the MTAs can speak to the LDAP servers.