Ajcody-Notes-Of-Customer-Cluster-Upgrade

{{Archive}}


=Context Of Notes=
'''This was for ZCS 6, I believe; it was when ZCS had its own internal HA processes/scripts.'''


The following are some notes I took while following the steps listed in [[Ajcody-Notes-Upgrade-Options#Upgrade_Steps_for_Multi-Servers_with_Clustered_Mailstores|Upgrade_Steps_for_Multi-Servers_with_Clustered_Mailstores]] with a customer. I have not created any bugs or RFE's from this yet.


Bug filed for Documentation and QA issues:
* "QA & Corrections For Cluster Upgrade Document"
** http://bugzilla.zimbra.com/show_bug.cgi?id=31026


==Configuration of customer's site==
* 2 LDAP servers
* 3 MTA's
* 3 Node Cluster
** 2 are separate mailstores
** 1 is the failover for either of the two mailstores




==RFE (Internal/Website): Zimbra Homepage should have Document link tab==
* The homepage should give more prominent placement to a documentation button, e.g. a "Documentation" tab next to Products & Support in the "Community, Forums, Products, Support, Partners, About, Buy" area.




==RFE (Internal/Website): Download page should state binary good for Clusters as well==
* Download links should state the binary is good for cluster installs as well.
** http://www.zimbra.com/products/downloads.html
* I already have an internal RFE submitted to include the service pack version of the OS that we QA against and/or support.




==RFE (Documentation) Cross reference of cluster install/upgrade documents and options==
* References to "upgrade" documents and "rolling upgrade" within the cluster documentation as well as the Release Notes (Upgrade section)
** Also, a statement about multi-server configurations in regards to 32/64 bit (mixed platform types throughout the servers).


===Dev Comments===


This isn't appropriate for a cluster upgrade document. Familiarity with the ZCS upgrade guides and release notes should be a prerequisite to cluster upgrades.
==RFE (Documentation) Add comments in upgrade document about the precautionary backup.==
* Comment in the documents about when one can start the precautionary backups.
** Some full backups can take many hours, and if the customer waits for the backups to completely finish first, they might not be able to upgrade within the downtime window they are constrained by. Advice on how to optimize this would be helpful.
 
===Dev Comments===
 
This isn't appropriate for a cluster upgrade document. Timing is going to vary greatly by deployment size; customers should be responsible for their own operational time frames.
 
==RFE (Documentation Updates & QA needed for below) New Document For Upgrade Process Planning==
* Please see the following:
** http://wiki.zimbra.com/index.php?title=Ajcody-Server-Topics#How_Do_I_Make_Sure_I_Don.27t_Lose_Emails_During_Upgrade.3F_If_I_Need_To_Fail_Back_To_Old_Version.3F
 
 
==RFE (Documentation – Cluster Upgrade) Clarity around steps in upgrade doc for “ip addr del”==
* Give better clarity to the process of bringing down the production IP (ip addr del) in the upgrade document, i.e. the IP that the cluster service is using, and note that the IP/interface will come back when the cluster service is re-enabled.
* When the active node has been upgraded, stop the server, remove the service IP and the mount. Type the following:
<pre>
zmcontrol stop
ip addr del <service ip> dev eth0
umount /opt/zimbra-cluster/mountpoints/<clusterservicename.com>
</pre>
 
===Dev Comments===
 
The active node upgrade script handles this automatically.  Page 2 Step 2 can be removed from both upgrade guides.
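Either way, admins who want to verify the state of the service IP by hand can do so; a minimal sketch, assuming the eth0 device from the steps above (run as root):
<pre>
ip addr show dev eth0   # after the node is stopped, the service IP should no longer be listed
</pre>
The cluster re-adds the service IP when the service is re-enabled, so there should be no need to restore it manually.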
 
==RFE (Documentation) or Bug: Error output during upgrade about keystore==
* During the upgrade on an LDAP node, we received this message:
<pre>
cp: cannot stat `/opt/zimbra/.saveconfig/keystore': No such file or directory
chown: cannot access `/opt/zimbra/conf/keystore': No such file or directory
</pre>
 
* The upgrade progressed past it, though.
 
===Dev Comments===
 
This was fixed in 5.0.11 upgrades and is not directly related to cluster installs.
 
==RFE (Documentation) Upgrade documents don’t mention the MTA Auth Host variable or give guidance==
* No explanation in the upgrade docs in regards to this:
**  <pre>MTA Auth Host. ** marked unconfigured</pre>
* Though, the single-server cluster install document did mention:
** zimbra-mta - MTA Auth host:
** Change the MTA’s auth host name to the cluster service host (mail.example.com)
 
===Dev Comments===
 
This is a user error. There is a clear warning during the initial install that they need to correct zimbraMtaAuthHost after they install their first mailstore.
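A sketch of checking and correcting the value with zmprov, run as the zimbra user (mta1.example.com is a hypothetical MTA hostname; mail.example.com is the cluster service host from the install document's example):
<pre>
zmprov gs mta1.example.com zimbraMtaAuthHost                    # show the current value
zmprov ms mta1.example.com zimbraMtaAuthHost mail.example.com   # point it at the cluster service host
</pre>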
 
==RFE (Documentation) Docs might mention the saslauthd errors one might encounter, depending on which upgrade process one picks==
* MTA upgrades had:
 
<pre>
saslauthd[8779] :set_auth_mech  : failed to initialize mechanism zimbra
zmsaslauthdctl failed to start
</pre>
 
* (Caused by iptables on the MTAs; resolved when the upgrades were done and iptables was opened again [connection to LDAP].)
** The customer used iptables to block delivery of mail during the upgrade.
 
===Dev Comments===
 
This is user error. We cannot possibly anticipate all the inappropriate things customers do.
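For context, a hypothetical sketch of the kind of rules involved; the customer's actual rules are not in these notes. The failure above happened because the rules also cut the MTAs off from LDAP, which saslauthd needs:
<pre>
iptables -I INPUT -p tcp --dport 25 -j REJECT   # stop new mail delivery during the upgrade
# LDAP (port 389) must stay reachable from the MTAs, or zmsaslauthdctl will fail as above
iptables -D INPUT -p tcp --dport 25 -j REJECT   # re-open delivery once the upgrade is done
</pre>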
 
==RFE (Documentation) Upgrade documentation doesn’t mention “cluster service name” or explain it based upon different cluster configurations==
* Documentation/upgrade confusion (upgrade on cluster/mailstore):
** Customer asks, “Why does the script ask for the cluster service name this box is running as? Shouldn't it know, or be able to find out, that information?”
* <pre>Enter the active cluster service name for this node: [maila.DOMAIN.edu] zclusta.DOMAIN.edu</pre>
* http://bugzilla.zimbra.com/show_bug.cgi?id=28982
** “cluster install should default to a valid cluster service name”
 
 
==RFE (Documentation) or Bug? Upgrade can’t determine the number of users==
* Documentation upgrade script concern:
<pre>
Warning: Could not determine the number of users on this system.
If you exceed the number of licensed users (18000) then you will not be able to create new users.
Do you wish to continue? [N]
</pre>
 
===Dev Comments===
 
Please file a separate bug for this.
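If you want to sanity-check the account count yourself, a rough sketch, run as the zimbra user (this counts every account, including system accounts, so it slightly overstates the licensed-user count):
<pre>
zmprov -l gaa | wc -l
</pre>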
 
==RFE (Documentation) or Bug? Upgrade output sometimes gives an odd error when checking versions==
* Documentation upgrade script concern:
<pre>
Checking 5.0.5_GA
Updating from 5.0.5_GA
ERROR: account.NO_SUCH_SERVER (no such server: maila.DOMAIN.edu)
Checking 5.0.6_GA
Updating from 5.0.6_GA
Checking 5.0.7_GA
...
</pre>
* It continues on with the process and eventually finishes the configuration.
===Dev Comments===
This is likely a misconfiguration on the end user's part: localconfig zimbra_server_hostname should be set to the cluster service name, not the local hostname.
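A sketch of checking and correcting that with zmlocalconfig, run as the zimbra user (zclusta.DOMAIN.edu is the cluster service name from the examples in these notes):
<pre>
zmlocalconfig zimbra_server_hostname                         # show the current value
zmlocalconfig -e zimbra_server_hostname=zclusta.DOMAIN.edu   # set it to the cluster service name
</pre>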
==RFE (Documentation) Upgrade document doesn’t mention or explain “Choose a Zimbra service to upgrade”==
* Documentation upgrade script concern (not documented at all in the upgrade doc):
<pre>
Setting up zimbra crontab...done.
Moving /tmp/zmsetup.01001900-000011271728.log to /opt/zimbra/log
Configuration complete - press return to exit
Choose a Zimbra service to upgrade:
1) zclusta.DOMAIN.edu
2) zclustb.DOMAIN.edu
Choose from above (1-2):
</pre>
* This is VERY confusing and has no documentation references in the upgrade document. You get this prompt on all nodes (2 active and 1 failover [the failover for either of the active ones]).


==RFE (Documentation) Errors concerning the “Choose a Zimbra service to upgrade” part of the upgrade==
* Customer says, “Is this supposed to happen? The only way you can "stop" the installer script is ctrl-c.”
<pre>
Choose a Zimbra service to upgrade:
1) zclusta.DOMAIN.edu
2) zclustb.DOMAIN.edu
Choose from above (1-2): 2
... OCF_RESKEY_service_name=zclustb.DOMAIN.edu /opt/zimbra-cluster/bin/zmcluctl stop
... ip addr del 199.17.81.81 dev eth0
RTNETLINK answers: Cannot assign requested address
... umount /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/conf
umount: /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/conf: not found
... umount /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/db/data
umount: /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/db/data: not found
... umount /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/index
umount: /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/index: not found
... umount /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/log
umount: /opt/zimbra-cluster/mountpoints/zclustb.DOMAIN.edu/log: not found
... umount
</pre>
* Customer, “And then we're back at the prompt. Now we're thoroughly confused, and nervous.”
 
===Dev Comments===
 
Choose the service name for the active cluster node.
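If it isn't obvious which service name that is, clustat can tell you; a minimal sketch (run as root; sample clustat output appears later in these notes):
<pre>
clustat   # the Owner column shows which node currently runs each service;
          # answer the prompt with the service name owned by the node being upgraded
</pre>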


==RFE (Documentation – Cluster Upgrade) Upgrade document should state whether to execute as root or zimbra==
* Doesn't state whether the commands should be run as root and/or zimbra.
<pre>
2. When the active node has been upgraded, stop the server, remove the service IP and the mount. Type the following:
   a. zmcontrol stop
   b. ip addr del <service ip> dev eth0
   c. umount /opt/zimbra-cluster/mountpoints/<clusterservicename.com>
</pre>
 
===Dev Comments===
 
The step is no longer necessary and can be removed from the documentation.
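For what it's worth, a sketch of how those commands would split across users; zmcontrol runs as the zimbra user, while the IP and mount steps need root (the su form appears elsewhere in these notes):
<pre>
su - zimbra -c 'zmcontrol stop'                                  # as root; or run zmcontrol directly as zimbra
ip addr del <service ip> dev eth0                                # root only
umount /opt/zimbra-cluster/mountpoints/<clusterservicename.com>  # root only
</pre>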


==RFE (Documentation – Clusters) Upgrade docs should include the necessary commands to check the status of the cluster==
* Documentation should include a reference to the clustat command, to confirm the status of the nodes before moving on to the next node's upgrade:
<pre>
[root@maila ~]# clustat
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  maila.DOMAIN.edu                         Online, Local, rgmanager
  mailb.DOMAIN.edu                         Online, rgmanager
  mailc.DOMAIN.edu                         Online, rgmanager

  Service Name        Owner (Last)                  State
  ------- ----        ----- ------                  -----
  zclusta.DOMAIN.edu  (maila.DOMAIN.edu)            disabled
  zclustb.DOMAIN.edu  mailb.DOMAIN.edu              started

No iscsi mounts mounted.
</pre>


===Dev Comments===

This is covered under "After all active node and standby node upgrades are complete".


==RFE (Documentation) or Bug – Following upgrade document instructions for ip addr del, the cluster node reboots==
* Documentation - ip addr del caused a cluster reboot:
<pre>
su - zimbra -c 'zmcontrol stop'
ip addr del 199.17.81.70 dev bond1
</pre>
* Then we lost the connection, and the machine rebooted.
 
===Dev Comments===
 
Log into the boxes via the admin interfaces, not the service IPs. This is clearly end-user error. Also, bond devices are not supported; there may be an RFE for that already.
 
==RFE (Documentation) Documentation should mention log files to view while the upgrade process is ongoing==
* Documentation - reference the logs for clustering activity:
** By default, all cluster messages go into the RHEL messages log (/var/log/messages) unless /etc/syslog.conf says differently - e.g. [ daemon.* /var/log/cluster ]
** /var/log/messages, /var/log/zimbra.log, /opt/zimbra/log/mailbox.log & zmmailboxd.log, and /tmp/zmsetup.log (see the sketch below)
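A minimal sketch of watching those logs during an upgrade, from a second root shell (tail -F keeps following across log rotation):
<pre>
tail -F /var/log/messages /var/log/zimbra.log /opt/zimbra/log/mailbox.log
</pre>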


===Dev Comments===
 
This is covered by the normal ZCS upgrade guide (prerequisite reading) and is not necessary to duplicate in the cluster upgrade guide.
 
==RFE (Documentation) Documentation for basic cluster usage and config files.==
* For administrators who are taking over administration of a cluster and might not have the basic skill sets.
** Make reference to /etc/cluster and a basic comment about the config file.


===Dev Comments===

RHCS configuration and administration is not within the scope of the ZCS cluster upgrade guide or administration guides. We recommend the customer become familiar with RHCS documentation and/or training before taking over administration responsibilities.


==FOLLOW-UP with customer of this case==
* Follow up with the customer: is logger running on the right node?
** Customer: "It wants to install logger. Logger should only be on maila. That is where the db-data mount point is allocated for the logger store."
 
 
==RFE or BUG – Install script update, active or standby cluster service name?==
* Script update:
<pre>
./install.sh --cluster standby
Installing a cluster standby node.
Enter the active cluster service name for this node: [mailc.DOMAIN.edu]
</pre>
* Should it say "Enter the standby cluster service name"? This line is confusing to the administrator.


===Dev Comments===
 
Please file a separate bug.
 
==RFE or BUG – Install script update, output warning about backups being invalidated before continuation==
* Script update:
** Before the upgrade has started, it should output a warning about older backups being invalidated.
*** http://files.zimbra.com/website/docs/ZCS_Cluster_Upgrade_for_Multi-Node_Configuration_from_5.0.x_to_5.0.x.pdf
**** '''"Important: After the upgrade, run a full backup immediately! Changes in this release invalidate all old backups. In 5.0.x you can restore from an older minor version backup (i.e. an older 5.0.x backup, but not a 4.5.x backup). There was a bug about zmrestore not handling db schema changes, but that had been fixed in 5.0.5."'''
*** Incorrect upgrade documentation regarding backups:
**** See - http://bugzilla.zimbra.com/show_bug.cgi?id=30218
*** See - http://bugzilla.zimbra.com/show_bug.cgi?id=26804
*** See - http://bugzilla.zimbra.com/show_bug.cgi?id=26624#c5
**** See - http://bugzilla.zimbra.com/show_bug.cgi?id=27819
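For reference, a sketch of kicking off that immediate post-upgrade full backup, run as the zimbra user on the mailstore (assuming the default backup target):
<pre>
zmbackup -f -a all   # full backup of all accounts
</pre>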


===Dev Comments===

This is covered in the upgrade guide as the first and last step. The upgrade script clearly indicates that backups are moved when incompatible versions are detected.
 
==RFE (Documentation – Cluster Upgrade) – Documentation should show how to confirm failover==
* Documentation should show how to confirm failover, as a verification step before finalizing the last steps of bringing everything back into production status.
* Try relocating the service on a node to the standby:
<pre>
clusvcadm -r zclusta.DOMAIN.edu -m mailc.DOMAIN.edu
clustat
clustat -i 1
</pre>
* And then back and forth between the standby and each active node, ending up with the services back on their primary nodes when done (see the sketch below).
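A sketch of that back-and-forth for this site's layout (maila/mailb active, mailc standby, per the configuration above; run as root):
<pre>
clusvcadm -r zclusta.DOMAIN.edu -m mailc.DOMAIN.edu   # relocate zclusta to the standby
clustat                                               # confirm it started on mailc
clusvcadm -r zclusta.DOMAIN.edu -m maila.DOMAIN.edu   # move it back to its primary
clusvcadm -r zclustb.DOMAIN.edu -m mailc.DOMAIN.edu   # repeat for zclustb
clusvcadm -r zclustb.DOMAIN.edu -m mailb.DOMAIN.edu
clustat                                               # both services back on their primary nodes
</pre>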


===Dev Comments===
 
This is covered under "Testing the Cluster Setup". Additional detail can be found in the RHCS guides.
 
==RFE (Documentation – Cluster Upgrade – My wiki page steps)==
* Double-check my wiki page about these steps/checks:
** [[Ajcody-Notes-Upgrade-Options#Upgrade_Steps_for_Multi-Servers_with_Clustered_Mailstores|Upgrade_Steps_for_Multi-Servers_with_Clustered_Mailstores]]
*** Once the store is done, do backups, and then remove the iptables rules, open access, and start the services (etc.) on the MTAs.
*** Confirm the cert issues are resolved once the MTAs can speak to the LDAP servers.
 
----


[[Category: Community Sandbox]]
[[Category:Upgrade]]
[[Category:Cluster]]
[[Category: Author:Ajcody]]
[[Category: Zeta Alliance]]
