|
|
Line 1: |
Line 1: |
− | <div class="col-md-12"><br></div>
| + | #REDIRECT [[Zimbra_NG_Modules/Zimbra_NG_HSM/Item_Deduplication]] |
− | <div class="col-md-12"><br></div>
| |
− | __NOTOC__
| |
− | <div class="col-md-12"><br /></div>
| |
− | <div class="col-md-9">
| |
− | <h2 class="title-header" style="padding-bottom: 9px; border-bottom: 4px solid #0087c3;">Zimbra NG HSM - Item Deduplication</h2>
| |
− | <div class="col-md-12">
| |
− | <div class="ibox-content">
| |
− | <div class="post animated fadeInLeft animation-delay-8" style="padding-top:5px">
| |
− | <div class="panel panel-default">
| |
− | <div class="panel-body">
| |
− | <div class="row">
| |
− | == What is Item Deduplication ==
| |
− | Item Deduplication is a technique that saves disk space by storing a single copy of an item and referencing it multiple times, instead of storing multiple copies of the same item and referencing each copy only once.
| |
− | | |
− | This might seem a minor improvement in theory, but in practice it can make a huge difference. Think of the user who sends an unnecessary 15 MB "motivational" or "funny" presentation to a hundred-odd recipients, all listed in the "To:" field.
| |
− | | |
− | === Item Deduplication in Zimbra ===
| |
− | Item Deduplication is performed by Zimbra at the moment of storing a new item in the [[Zimbra_Next_Generation_Modules/Zimbra_NG_HSM/Zimbra_Stores|Primary Volume]].
| |
− | | |
− | When a new item is created, its "message ID" is compared against a list of cached items. If there is a match, a hard link to the cached message's BLOB is created instead of a whole new BLOB for the message.
| |
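The hard-link mechanism described above can be sketched in a few lines of Python. This is an illustration only, not Zimbra's code: the function name <code>store_message</code>, the in-memory <code>cache</code> dictionary and the file naming scheme are all hypothetical.

```python
import os
import tempfile


def store_message(store_dir, mailbox, message_id, body, cache):
    """Store a message BLOB, hard-linking to a cached copy when possible.

    `cache` maps a message ID to the path of the first BLOB stored for it
    (a hypothetical stand-in for the dedupe cache).
    """
    path = os.path.join(store_dir, f"{mailbox}-{len(os.listdir(store_dir))}.msg")
    cached = cache.get(message_id)
    if cached is not None and os.path.exists(cached):
        # Match: create a second name for the same data instead of a new copy.
        os.link(cached, path)
    else:
        # No match: write a whole new BLOB and remember it.
        with open(path, "wb") as fh:
            fh.write(body)
        cache[message_id] = path
    return path


def demo():
    """Deliver the same message to two mailboxes and inspect the result."""
    with tempfile.TemporaryDirectory() as store:
        cache = {}
        body = b"x" * 1024
        first = store_message(store, "alice", "<id-1@example.com>", body, cache)
        second = store_message(store, "bob", "<id-1@example.com>", body, cache)
        return os.path.samefile(first, second), os.stat(first).st_nlink
```

Both paths point at the same data on disk, so the second delivery costs a directory entry rather than another full copy of the message.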
− | | |
− | The dedupe cache is managed in Zimbra 8 through the following config attributes:
| |
− | | |
− | '''zimbraPrefDedupeMessagesSentToSelf'''
| |
− | | |
− | Used to set the deduplication behaviour for sent-to-self messages.
| |
− | <pre>
| |
− | <attr id="144" name="zimbraPrefDedupeMessagesSentToSelf" type="enum" value="dedupeNone,secondCopyifOnToOrCC,dedupeAll" cardinality="single"
| |
− | optionalIn="account,cos" flags="accountInherited,domainAdminModifiable">
| |
− | <defaultCOSValue>dedupeNone</defaultCOSValue>
| |
− | <desc>dedupeNone|secondCopyIfOnToOrCC|moveSentMessageToInbox|dedupeAll</desc>
| |
− | </attr>
| |
− | </pre>
| |
− | | |
− | '''zimbraMessageIdDedupeCacheSize'''
| |
− | | |
− | Number of cached Message IDs.
| |
− | <pre>
| |
− | <attr id="334" name="zimbraMessageIdDedupeCacheSize" type="integer" cardinality="single" optionalIn="globalConfig" min="0">
| |
− | <globalConfigValue>3000</globalConfigValue>
| |
− | <desc>
| |
− | Number of Message-Id header values to keep in the LMTP dedupe cache.
| |
− | Subsequent attempts to deliver a message with a matching Message-Id
| |
− | to the same mailbox will be ignored. A value of 0 disables deduping.
| |
− | </desc>
| |
− | </attr>
| |
− | </pre>
| |
− | | |
− | '''zimbraPrefMessageIdDedupingEnabled'''
| |
− | | |
− | Enables or disables deduplication at the account or COS level.
| |
− | <pre>
| |
− | <attr id="1198" name="zimbraPrefMessageIdDedupingEnabled" type="boolean" cardinality="single" optionalIn="account,cos" flags="accountInherited"
| |
− | since="8.0.0">
| |
− | <defaultCOSValue>TRUE</defaultCOSValue>
| |
− | <desc>
| |
− | Account-level switch that enables message deduping. See zimbraMessageIdDedupeCacheSize for more details.
| |
− | </desc>
| |
− | </attr>
| |
− | </pre>
| |
− | | |
− | '''zimbraMessageIdDedupeCacheTimeout'''
| |
− | | |
− | Timeout for each entry in the dedupe cache.
| |
− | <pre>
| |
− | <attr id="1340" name="zimbraMessageIdDedupeCacheTimeout" type="duration" cardinality="single" optionalIn="globalConfig" since="7.1.4">
| |
− | <globalConfigValue>0</globalConfigValue>
| |
− | <desc>
| |
− | Timeout for a Message-Id entry in the LMTP dedupe cache. A value of 0 indicates no timeout.
| |
− | zimbraMessageIdDedupeCacheSize limit is ignored when this is set to a non-zero value.
| |
− | </desc>
| |
− | </attr>
| |
− | </pre>
| |
− | Older Zimbra versions might use different attributes or lack some of them.
| |
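To make the interaction between the cache size and timeout attributes easier to follow, here is a minimal sketch of a cache that behaves the way the attribute descriptions above read. It is not Zimbra's implementation: the class name <code>MessageIdDedupeCache</code> is hypothetical, and it assumes that a size of 0 disables deduping even when a timeout is set, which the descriptions do not state explicitly.

```python
import time
from collections import OrderedDict


class MessageIdDedupeCache:
    """Illustrative model of the size/timeout semantics described above."""

    def __init__(self, size=3000, timeout=0.0, clock=time.monotonic):
        self.size = size        # modeled on zimbraMessageIdDedupeCacheSize
        self.timeout = timeout  # modeled on zimbraMessageIdDedupeCacheTimeout, in seconds
        self.clock = clock
        self.entries = OrderedDict()  # (mailbox, message_id) -> insertion time

    def is_duplicate(self, mailbox, message_id):
        if self.size == 0:
            return False  # assumption: a size of 0 always disables deduping
        now = self.clock()
        if self.timeout > 0:
            # Drop entries older than the timeout.
            expired = [k for k, t in self.entries.items() if now - t >= self.timeout]
            for key in expired:
                del self.entries[key]
        key = (mailbox, message_id)
        seen = key in self.entries
        if not seen:
            self.entries[key] = now
            if self.timeout == 0:
                # The size limit only applies when no timeout is set.
                while len(self.entries) > self.size:
                    self.entries.popitem(last=False)
        return seen
```

With a timeout of 0 the oldest entries are evicted once the size limit is exceeded; with a non-zero timeout the size limit is ignored and entries leave the cache only when they expire.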
− | | |
− | == Item Deduplication and Zimbra NG HSM ==
| |
− | The Zimbra NG HSM module features a "doDeduplicate" operation that parses a target volume to find and deduplicate any duplicated item.
| |
− | | |
− | This saves even more disk space: while Zimbra's automatic deduplication is bound to a limited cache, Zimbra NG HSM deduplication also finds and deduplicates multiple copies of the same email regardless of any cache or timing.
| |
− | | |
− | Running the "doDeduplicate" operation is also strongly recommended after a migration or a large data import, in order to optimize your storage usage.
| |
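Conceptually, a volume-wide deduplication pass groups files by content digest and replaces later copies with hard links to the first one. The Python sketch below illustrates that idea only; it is not how "doDeduplicate" is implemented. The function name <code>deduplicate_volume</code>, the choice of SHA-256 and the statistics keys are assumptions made for the example.

```python
import hashlib
import os


def deduplicate_volume(root, dry_run=False):
    """Hard-link files with identical content under `root` and return stats."""
    first_seen = {}  # digest -> path of the first file seen with that content
    stats = {"deduplicated": 0, "already_deduplicated": 0, "space_saved": 0}
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            original = first_seen.setdefault(digest, path)
            if original == path:
                continue  # first copy of this content, nothing to do
            if os.path.samefile(original, path):
                stats["already_deduplicated"] += 1  # already a hard link
                continue
            stats["deduplicated"] += 1
            stats["space_saved"] += os.path.getsize(path)
            if not dry_run:
                # Link under a temporary name, then atomically replace the copy.
                tmp = path + ".dedupe-tmp"
                os.link(original, tmp)
                os.replace(tmp, path)
    return stats
```

Passing <code>dry_run=True</code> reports what would be saved without touching any file, mirroring the purpose of the <code>dry_run</code> parameter of the real command.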
− | | |
− | === Running a Volume Deduplication ===
| |
− | ==== Via the Zimbra Next Generation Modules Administration Zimlet ====
| |
− | To run a volume deduplication via the Zimbra Next Generation Modules Administration Zimlet, click on the "Zimbra NG HSM" tab, select the volume you wish to deduplicate and press the "Deduplicate" button.
| |
− | | |
− | | |
− | ==== Via the Zimbra Next Generation Modules CLI ====
| |
− | <pre>
| |
− | zimbra@mailserver:~$ zxsuite powerstore doDeduplicate
| |
− | | |
− | command doDeduplicate requires more parameters
| |
− | | |
− | Syntax:
| |
− | zxsuite powerstore doDeduplicate {volume_name} [attr1 value1 [attr2 value2...]]
| |
− | | |
− | PARAMETER LIST
| |
− | | |
− | NAME TYPE EXPECTED VALUES DEFAULT
| |
− | volume_name(M) String[,..]
| |
− | dry_run(O) Boolean true|false false
| |
− | | |
− | (M) == mandatory parameter, (O) == optional parameter
| |
− | | |
− | Usage example:
| |
− | | |
− | zxsuite powerstore dodeduplicate secondvolume
| |
− | Starts a deduplication on volume secondvolume
| |
− | </pre>
| |
− | | |
− | To list all available volumes, you can use the ''`zxsuite powerstore getAllVolumes`'' command.
| |
− | | |
− | | |
− | === "doDeduplicate" stats ===
| |
− | The "doDeduplicate" operation is a valid target for the "monitor" command, meaning that you can watch the command's statistics while it's running through the `zxsuite powerstore monitor [operationID]` command.
| |
− | | |
− | ''Sample Output''
| |
− | <pre>
| |
− | Current Pass (Digest Prefix): 63/64
| |
− | Checked Mailboxes: 148/148
| |
− | Deduplicated/duplicated Blobs: 64868/137089
| |
− | Already Deduplicated Blobs: 71178
| |
− | Skipped Blobs: 0
| |
− | Invalid Digests: 0
| |
− | Total Space Saved: 21.88 GB
| |
− | </pre>
| |
− | | |
− | * "Current Pass (Digest Prefix)" - The "doDeduplicate" command analyzes the BLOBs in groups based on the first character of their digest (name).
| |
− | * "Checked Mailboxes" - The number of mailboxes analyzed for the current pass.
| |
− | * "Deduplicated/duplicated Blobs" - Number of BLOBs deduplicated by the current operation / Number of total duplicated items on the volume.
| |
− | * "Already Deduplicated Blobs" - Number of deduplicated blobs on the volume (duplicated blobs that have been deduplicated by a previous run).
| |
− | * "Skipped Blobs" - BLOBs that have not been analyzed, usually because of a read error or missing file.
| |
− | * "Invalid Digests" - BLOBs with a bad digest (name different from the actual digest of the file).
| |
− | * "Total Space Saved" - Amount of disk space freed by the doDeduplicate operation.
| |
− | | |
− | | |
− | Looking at the sample output above we can see that:
| |
− | * The operation is running the second-to-last pass on the last mailbox.
| |
− | * 137089 duplicated BLOBs have been found, 71178 of which were already deduplicated by a previous run.
| |
− | * The current operation deduplicated 64868 BLOBs, for a total disk space saving of 21.88 GB.
| |
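The reading of the sample output above can be reproduced with a short script. This is a sketch that assumes the monitor output keeps the "Name: value" layout shown in the sample; the function names <code>parse_monitor_output</code> and <code>summarize</code> are hypothetical helpers, not part of the zxsuite CLI.

```python
SAMPLE = """\
Current Pass (Digest Prefix): 63/64
Checked Mailboxes: 148/148
Deduplicated/duplicated Blobs: 64868/137089
Already Deduplicated Blobs: 71178
Skipped Blobs: 0
Invalid Digests: 0
Total Space Saved: 21.88 GB
"""


def parse_monitor_output(text):
    """Turn 'Name: value' lines into a dictionary of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def summarize(stats):
    """Extract the figures discussed in the text from the parsed stats."""
    current, passes = (int(n) for n in stats["Current Pass (Digest Prefix)"].split("/"))
    done, total = (int(n) for n in stats["Deduplicated/duplicated Blobs"].split("/"))
    return {
        "passes_left": passes - current,
        "deduplicated_now": done,
        "duplicated_total": total,
        "already_deduplicated": int(stats["Already Deduplicated Blobs"]),
        "space_saved": stats["Total Space Saved"],
    }
```

Calling <code>summarize(parse_monitor_output(SAMPLE))</code> yields one pass left, 64868 BLOBs deduplicated by the current run out of 137089 duplicates, and 71178 BLOBs deduplicated earlier.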
− | </div>
| |
− | </div>
| |
− | <div class="col-md-9">
| |
− | <div class="panel-footer">
| |
− | <p><i class="fa fa-clock-o"></i> Aug 25, 2016 - [https://www.zimbra.com/email-server-software/ Know more »]</p>
| |
− | </div>
| |
− | </div>
| |
− | </div>
| |
− | </div>
| |
− | </div>
| |
− | </div>
| |
− | </div>
| |
− | <div class="col-md-3"><br /></div>
| |
− | <div class="col-md-3">
| |
− | <div class="panel panel-zimbrared-light-border">
| |
− | <div class="panel-heading">
| |
− | <h3 class="panel-title"><i class="fa fa-gear pull-left"></i> Zimbra Next Generation Modules</h3>
| |
− | </div>
| |
− | <div class="panel-body">
| |
− | {{ZNG}}
| |
− | </div>
| |
− | </div>
| |
− | </div>
| |
− | <div class="col-md-3">
| |
− | <div class="panel panel-primary-light-border">
| |
− | <div class="panel-heading">
| |
− | <h3 class="panel-title"><i class="fa fa-info-circle pull-left"></i> Zimbra Next Generation Modules Resources</h3>
| |
− | </div>
| |
− | <div class="panel-body">
| |
− | {{ZNGL}}
| |
− | </div>
| |
− | </div>
| |
− | </div>
| |
− | <div class="clearfix"></div>
| |
− | <div class="col-md-12"><br></div>
| |
− | {{FH}}
| |