Zimbra Next Generation Modules/Zimbra NG HSM/Item Deduplication

__NOTOC__
<div class="col-md-12"><br /></div>
<div class="col-md-9">
    <h2 class="title-header" style="padding-bottom: 9px; border-bottom: 4px solid #0087c3;">Zimbra NG HSM - Item Deduplication</h2>
    <div class="col-md-12">
        <div class="ibox-content">
            <div class="post animated fadeInLeft animation-delay-8" style="padding-top:5px">
                <div class="panel panel-default">
                    <div class="panel-body">
                        <div class="row">
== What is Item Deduplication ==
Item Deduplication is a technique that saves disk space by storing a single copy of an item and referencing it multiple times, instead of storing multiple copies of the same item and referencing each copy only once.
 
This might seem like a minor improvement in theory, but in practice it can make a huge difference. Think of the user who sends an unnecessary 15 MB "motivational" or "funny" presentation to a hundred-odd recipients, all listed in the "To:" field.
 
=== Item Deduplication in Zimbra ===
Item Deduplication is performed by Zimbra at the moment of storing a new item in the [[Zimbra_Next_Generation_Modules/Zimbra_NG_HSM/Zimbra_Stores|Primary Volume]].
 
When a new item is created, its "message ID" is compared against a list of cached items. If there is a match, a hard link to the cached message's BLOB is created instead of a whole new BLOB for the message.
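
The hard link mechanism can be seen with ordinary files. The sketch below is not Zimbra code, and the file names are made up for illustration. It creates one BLOB-like file, gives it a second name with <code>ln</code>, and checks that both names refer to the same data on disk. Note that hard links only work within a single filesystem.

```shell
# Illustration only: hypothetical file names, not real Zimbra store paths.
workdir=$(mktemp -d)
printf 'Subject: quarterly slides\n\n(imagine a 15 MB attachment here)\n' > "$workdir/blob_a.msg"

# A hard link adds a second directory entry for the same data;
# no additional copy of the content is written.
ln "$workdir/blob_a.msg" "$workdir/blob_b.msg"

# -ef is true when both names refer to the same device and inode.
if [ "$workdir/blob_a.msg" -ef "$workdir/blob_b.msg" ]; then
  same_file=yes
else
  same_file=no
fi
echo "same file on disk: $same_file"

# Removing one name does not remove the data; it survives until the last name is gone.
rm "$workdir/blob_a.msg"
if [ -s "$workdir/blob_b.msg" ]; then
  survives=yes
else
  survives=no
fi
echo "second name still readable: $survives"

rm -rf "$workdir"
```

This is why a deduplicated message can be deleted from one mailbox without affecting the other mailboxes that reference the same BLOB.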
 
The dedupe cache is managed in Zimbra 8 through the following config attributes:
 
'''zimbraPrefDedupeMessagesSentToSelf'''
 
Used to set the deduplication behaviour for sent-to-self messages.
<pre>
<attr id="144" name="zimbraPrefDedupeMessagesSentToSelf" type="enum" value="dedupeNone,secondCopyifOnToOrCC,dedupeAll" cardinality="single"
optionalIn="account,cos" flags="accountInherited,domainAdminModifiable">
  <defaultCOSValue>dedupeNone</defaultCOSValue>
  <desc>dedupeNone|secondCopyIfOnToOrCC|moveSentMessageToInbox|dedupeAll</desc>
</attr>
</pre>
 
'''zimbraMessageIdDedupeCacheSize'''
 
Number of cached Message IDs.
<pre>
<attr id="334" name="zimbraMessageIdDedupeCacheSize" type="integer" cardinality="single" optionalIn="globalConfig" min="0">
  <globalConfigValue>3000</globalConfigValue>
  <desc>
    Number of Message-Id header values to keep in the LMTP dedupe cache.
    Subsequent attempts to deliver a message with a matching Message-Id
    to the same mailbox will be ignored.  A value of 0 disables deduping.
  </desc>
</attr>
</pre>
 
'''zimbraPrefMessageIdDedupingEnabled'''
 
Enables or disables deduplication at the account or COS level.
<pre>
<attr id="1198" name="zimbraPrefMessageIdDedupingEnabled" type="boolean" cardinality="single" optionalIn="account,cos" flags="accountInherited"
since="8.0.0">
  <defaultCOSValue>TRUE</defaultCOSValue>
  <desc>
    Account-level switch that enables message deduping.  See zimbraMessageIdDedupeCacheSize for more details.
  </desc>
</attr>
</pre>
 
'''zimbraMessageIdDedupeCacheTimeout'''
 
Timeout for each entry in the dedupe cache.
<pre>
<attr id="1340" name="zimbraMessageIdDedupeCacheTimeout" type="duration" cardinality="single" optionalIn="globalConfig" since="7.1.4">
  <globalConfigValue>0</globalConfigValue>
  <desc>
    Timeout for a Message-Id entry in the LMTP dedupe cache. A value of 0 indicates no timeout.
    zimbraMessageIdDedupeCacheSize limit is ignored when this is set to a non-zero value.
  </desc>
</attr>
</pre>
Older Zimbra versions might use different attributes or lack some of them.
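
The current values of these attributes can be read with <code>zmprov</code>. The following is a minimal sketch, assuming a standard installation where <code>zmprov</code> is on the path of the <code>zimbra</code> user. The function name, the <code>ZMPROV</code> override variable, the account <code>user@example.com</code>, and the COS name <code>default</code> are placeholders chosen for illustration.

```shell
# Sketch: read the four dedupe attributes described above.
# ZMPROV can be overridden (for example ZMPROV=echo) to preview the commands
# without contacting a server. The variable and function names are illustrative.
ZMPROV="${ZMPROV:-zmprov}"

show_dedupe_settings() {
  account="$1"          # placeholder account, e.g. user@example.com
  cos="${2:-default}"   # COS to inspect; "default" is an assumption

  # Global configuration attributes (optionalIn="globalConfig").
  "$ZMPROV" gcf zimbraMessageIdDedupeCacheSize
  "$ZMPROV" gcf zimbraMessageIdDedupeCacheTimeout

  # COS-level and account-level attributes (optionalIn="account,cos").
  "$ZMPROV" gc "$cos" zimbraPrefMessageIdDedupingEnabled
  "$ZMPROV" ga "$account" zimbraPrefDedupeMessagesSentToSelf
}
```

Run as the <code>zimbra</code> user, <code>show_dedupe_settings user@example.com</code> prints each attribute with its current value. The sketch only reads values; it does not modify the configuration.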
 
== Item Deduplication and Zimbra NG HSM ==
The Zimbra NG HSM module features a "doDeduplicate" operation that parses a target volume to find and deduplicate any duplicated item.
 
This saves even more disk space: Zimbra's automatic deduplication is bound to a limited cache, whereas Zimbra NG HSM deduplication finds and handles multiple copies of the same email regardless of any cache or timing.
 
Running the "doDeduplicate" operation is also strongly recommended after a migration or a large data import, in order to optimize storage usage.
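
To illustrate the difference between cache-bound deduplication at delivery time and a pass over a whole volume, the sketch below replaces byte-identical files under a directory with hard links to a single copy. It is a conceptual illustration only and is not how "doDeduplicate" is implemented. The function name is hypothetical, and the sketch uses <code>cksum</code> plus <code>cmp</code> rather than Zimbra's BLOB digests. It assumes paths without newlines or leading spaces.

```shell
# Conceptual sketch only: NOT the doDeduplicate implementation.
# Replaces byte-identical files under a directory with hard links to one copy
# and prints how many files were relinked.
dedupe_dir() {
  dir="$1"
  # cksum prints "checksum size path"; sorting puts candidates next to each other.
  find "$dir" -type f -exec cksum {} + | LC_ALL=C sort | {
    prev_key=""; prev_path=""; linked=0
    while read -r sum size path; do
      key="$sum $size"
      # Relink only if the checksum and size match, the two names are not
      # already the same file, and a byte-by-byte comparison confirms equality.
      if [ "$key" = "$prev_key" ] \
         && [ ! "$path" -ef "$prev_path" ] \
         && cmp -s "$prev_path" "$path"; then
        ln -f "$prev_path" "$path"
        linked=$((linked + 1))
      else
        prev_key="$key"; prev_path="$path"
      fi
    done
    echo "$linked"
  }
}
```

Because the scan looks at every file that exists at the time it runs, it finds duplicates no matter when they were stored, which is the property that makes a volume-wide pass useful after a migration or bulk import.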
 
=== Running a Volume Deduplication ===
==== Via the Zimbra Next Generation Modules Administration Zimlet ====
To run a volume deduplication via the Zimbra Next Generation Modules Administration Zimlet, click the "Zimbra NG HSM" tab, select the volume you wish to deduplicate, and press the "Deduplicate" button.
 
 
==== Via the Zimbra Next Generation Modules CLI ====
<pre>
zimbra@mailserver:~$ zxsuite powerstore doDeduplicate
 
command doDeduplicate requires more parameters
 
Syntax:
  zxsuite powerstore doDeduplicate {volume_name} [attr1 value1 [attr2 value2...]]
 
PARAMETER LIST
 
NAME              TYPE          EXPECTED VALUES    DEFAULT
volume_name(M)    String[,..]                     
dry_run(O)        Boolean        true|false        false
 
(M) == mandatory parameter, (O) == optional parameter
 
Usage example:
 
zxsuite powerstore dodeduplicate secondvolume
Starts a deduplication on volume secondvolume
</pre>
 
To list all available volumes, use the <code>zxsuite powerstore getAllVolumes</code> command.
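
The syntax above accepts optional attribute/value pairs after the volume name, so the <code>dry_run</code> parameter can be passed as <code>dry_run true</code>. The following is a minimal sketch of listing the volumes and then starting a dry run. The function name and the <code>ZXSUITE</code> override variable are illustrative. What a dry run reports is not described on this page, so check its output on your own system before relying on it.

```shell
# Sketch: list volumes, then start a deduplication in dry-run mode.
# ZXSUITE can be overridden (for example ZXSUITE=echo) to preview the commands.
# The variable and function names are illustrative.
ZXSUITE="${ZXSUITE:-zxsuite}"

dedupe_dry_run() {
  volume="$1"   # volume name as shown by getAllVolumes, e.g. secondvolume

  # Show the available volumes so the name can be double-checked.
  "$ZXSUITE" powerstore getAllVolumes

  # Optional parameters follow the volume name as "name value" pairs.
  "$ZXSUITE" powerstore doDeduplicate "$volume" dry_run true
}
```

For example, <code>dedupe_dry_run secondvolume</code>, run as the <code>zimbra</code> user, uses the volume name from the usage example above.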
 
 
=== "doDeduplicate" stats ===
The "doDeduplicate" operation is a valid target for the "monitor" command, so you can watch its statistics while it is running with the <code>zxsuite powerstore monitor [operationID]</code> command.
 
''Sample Output''
<pre>
Current Pass (Digest Prefix):  63/64
Checked Mailboxes:            148/148
Deduplicated/duplicated Blobs: 64868/137089
Already Deduplicated Blobs:    71178
Skipped Blobs:                0
Invalid Digests:              0
Total Space Saved:            21.88 GB
</pre>
 
* "Current Pass (Digest Prefix)" - The "doDeduplicate" command analyzes the BLOBs in groups based on the first character of their digest (name).
* "Checked Mailboxes" - The number of mailboxes analyzed for the current pass.
* "Deduplicated/duplicated Blobs" - Number of BLOBs deduplicated by the current operation / Number of total duplicated items on the volume.
* "Already Deduplicated Blobs" - Number of deduplicated BLOBs on the volume (duplicated BLOBs that have been deduplicated by a previous run).
* "Skipped Blobs" - BLOBs that have not been analyzed, usually because of a read error or missing file.
* "Invalid Digests" - BLOBs with a bad digest (name different from the actual digest of the file).
* "Total Space Saved" - Amount of disk space freed by the doDeduplicate operation.
 
 
Looking at the sample output above we can see that:
* The operation is running the second-to-last pass (63 of 64) on the last mailbox (148 of 148).
* 137089 duplicated BLOBs have been found, 71178 of which had already been deduplicated by a previous run.
* The current operation has deduplicated 64868 BLOBs, for a total disk space saving of 21.88 GB.
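
The same reading can be done mechanically. The sketch below parses the sample output with <code>awk</code> and derives how many duplicated BLOBs are still to be processed. The "remaining" figure is an interpretation based on the field descriptions above (total duplicates minus those handled now or in a previous run); it is not a value the monitor reports itself.

```shell
# Sample monitor output copied from the section above.
monitor_output='Current Pass (Digest Prefix):  63/64
Checked Mailboxes:            148/148
Deduplicated/duplicated Blobs: 64868/137089
Already Deduplicated Blobs:    71178
Skipped Blobs:                0
Invalid Digests:              0
Total Space Saved:            21.88 GB'

# Split each line at the colon, pick out the counters, and derive the remainder.
summary=$(printf '%s\n' "$monitor_output" | awk -F': *' '
  /^Deduplicated\/duplicated Blobs/ { split($2, parts, "/"); done_now = parts[1]; total = parts[2] }
  /^Already Deduplicated Blobs/     { already = $2 }
  END {
    printf "deduplicated_now=%d already=%d total=%d remaining=%d\n",
           done_now, already, total, total - done_now - already
  }
')
echo "$summary"
```

For the sample output this leaves 137089 - 64868 - 71178 = 1043 duplicated BLOBs not yet deduplicated, which is consistent with the operation still being on its second-to-last pass.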
                        </div>
                    </div>
                    <div class="col-md-9">
                        <div class="panel-footer">
                            <p><i class="fa fa-clock-o"></i> Aug 25, 2016 - [https://www.zimbra.com/email-server-software/ Know more »]</p>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>
<div class="col-md-3"><br /></div>
<div class="col-md-3">
    <div class="panel panel-zimbrared-light-border">
        <div class="panel-heading">
            <h3 class="panel-title"><i class="fa fa-gear pull-left"></i> Zimbra Next Generation Modules</h3>
        </div>
        <div class="panel-body">
            {{ZNG}}
        </div>
    </div>
</div>
<div class="col-md-3">
    <div class="panel panel-primary-light-border">
        <div class="panel-heading">
            <h3 class="panel-title"><i class="fa fa-info-circle pull-left"></i> Zimbra Next Generation Modules Resources</h3>
        </div>
        <div class="panel-body">
            {{ZNGL}}
        </div>
    </div>
</div>
<div class="clearfix"></div>
<div class="col-md-12"><br></div>
{{FH}}

Latest revision as of 13:21, 29 November 2017
