Latest revision as of 00:56, 8 September 2016
Zimbra HSM Plus - Item Deduplication
What is Item Deduplication
Item Deduplication is a technique that saves disk space by storing a single copy of an item and referencing it multiple times, instead of storing multiple copies of the same item and referencing each copy only once.
This might seem a minor improvement in theory, but in practice it can make a huge difference. Think about that user, the one who improperly sends a nice but unnecessary 15 MB "motivational" or "funny" presentation to a-hundred-and-something recipients, all in the "To:" field: without deduplication, every one of those mailboxes gets its own copy of the file.
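To put rough numbers on that example, here is a back-of-the-envelope calculation. `RECIPIENTS = 120` is a hypothetical stand-in for "a-hundred-and-something". The sketch assumes that all recipients are local mailboxes on the same volume, and it ignores metadata and compression.

```python
# Back-of-the-envelope estimate: one 15 MB attachment delivered to many local mailboxes.
# RECIPIENTS = 120 is a hypothetical value standing in for "a-hundred-and-something".
ATTACHMENT_MB = 15
RECIPIENTS = 120


def storage_mb(size_mb: float, copies: int, deduplicated: bool) -> float:
    """Disk space used by `copies` deliveries of one item, with or without deduplication."""
    # With deduplication a single BLOB is stored and referenced `copies` times;
    # without it, every delivery gets its own BLOB.
    return size_mb if deduplicated else size_mb * copies


without = storage_mb(ATTACHMENT_MB, RECIPIENTS, deduplicated=False)
with_dedup = storage_mb(ATTACHMENT_MB, RECIPIENTS, deduplicated=True)
print(f"without deduplication: {without} MB")   # without deduplication: 1800 MB
print(f"with deduplication:    {with_dedup} MB")  # with deduplication:    15 MB
```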
Item Deduplication in Zimbra
Item Deduplication is performed by Zimbra at the moment of storing a new item in the Primary Volume.
When a new item is being created, its Message-ID is compared to a list of cached items. If there is a match, Zimbra creates a hard link to the cached message's BLOB instead of writing a whole new BLOB for the message.
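The hard-link mechanism can be illustrated with a small toy store. `DeliveryStore`, its directory layout and the `<item_id>.msg` file naming are all hypothetical and are not Zimbra's actual code or on-disk format. The sketch only shows how a Message-ID cache hit turns into a new directory entry pointing at the same data.

```python
import os
import tempfile


class DeliveryStore:
    """Hypothetical toy store: hard-links BLOBs for messages whose Message-ID is cached."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.blob_by_message_id: dict[str, str] = {}  # stands in for the "list of cached items"

    def deliver(self, mailbox: str, item_id: int, message_id: str, content: bytes) -> str:
        box_dir = os.path.join(self.root, mailbox)
        os.makedirs(box_dir, exist_ok=True)
        path = os.path.join(box_dir, f"{item_id}.msg")
        cached = self.blob_by_message_id.get(message_id)
        if cached is not None:
            os.link(cached, path)  # cache hit: new directory entry, same data on disk
        else:
            with open(path, "wb") as f:  # cache miss: write a brand-new BLOB
                f.write(content)
            self.blob_by_message_id[message_id] = path
        return path


with tempfile.TemporaryDirectory() as root:
    store = DeliveryStore(root)
    body = b"Subject: motivational slides\r\n\r\n" + b"x" * 1024
    paths = [store.deliver(f"user{n}", 257, "<abc@example.com>", body) for n in range(3)]
    print(os.stat(paths[0]).st_nlink)  # 3: one BLOB on disk, three directory entries
```

Hard links can only point to files on the same filesystem. That is consistent with deduplication being done within a volume.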
The dedupe cache is managed in Zimbra 8 through the following config attributes:
zimbraPrefDedupeMessagesSentToSelf
Used to set the deduplication behaviour for sent-to-self messages.
<attr id="144" name="zimbraPrefDedupeMessagesSentToSelf" type="enum" value="dedupeNone,secondCopyifOnToOrCC,dedupeAll" cardinality="single" optionalIn="account,cos" flags="accountInherited,domainAdminModifiable">
  <defaultCOSValue>dedupeNone</defaultCOSValue>
  <desc>dedupeNone|secondCopyIfOnToOrCC|moveSentMessageToInbox|dedupeAll</desc>
</attr>
zimbraMessageIdDedupeCacheSize
Number of cached Message IDs.
<attr id="334" name="zimbraMessageIdDedupeCacheSize" type="integer" cardinality="single" optionalIn="globalConfig" min="0">
  <globalConfigValue>3000</globalConfigValue>
  <desc>
    Number of Message-Id header values to keep in the LMTP dedupe cache.
    Subsequent attempts to deliver a message with a matching Message-Id
    to the same mailbox will be ignored. A value of 0 disables deduping.
  </desc>
</attr>
zimbraPrefMessageIdDedupingEnabled
Enables or disables deduplication at the account or COS level.
<attr id="1198" name="zimbraPrefMessageIdDedupingEnabled" type="boolean" cardinality="single" optionalIn="account,cos" flags="accountInherited" since="8.0.0">
  <defaultCOSValue>TRUE</defaultCOSValue>
  <desc>
    Account-level switch that enables message deduping.
    See zimbraMessageIdDedupeCacheSize for more details.
  </desc>
</attr>
zimbraMessageIdDedupeCacheTimeout
Timeout for each entry in the dedupe cache.
<attr id="1340" name="zimbraMessageIdDedupeCacheTimeout" type="duration" cardinality="single" optionalIn="globalConfig" since="7.1.4">
  <globalConfigValue>0</globalConfigValue>
  <desc>
    Timeout for a Message-Id entry in the LMTP dedupe cache. A value of 0 indicates no timeout.
    zimbraMessageIdDedupeCacheSize limit is ignored when this is set to a non-zero value.
  </desc>
</attr>
(older Zimbra versions might use different attributes or lack some of them)
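The way these attributes interact can be modelled with a small cache class. This is a hypothetical sketch built only from the attribute descriptions above, not Zimbra's implementation. Three behaviours are assumptions: oldest-first eviction, keys of the form (mailbox, Message-Id), and treating a cache size of 0 as "off" even when a timeout is set.

```python
import time
from collections import OrderedDict
from typing import Callable


class MessageIdDedupeCache:
    """Toy model of the LMTP Message-Id dedupe cache, based only on the attribute descriptions.

    size    -> zimbraMessageIdDedupeCacheSize (0 disables deduping; assumed to win over timeout)
    timeout -> zimbraMessageIdDedupeCacheTimeout, in seconds (0 means no timeout;
               a non-zero timeout makes the size limit ignored)
    """

    def __init__(self, size: int = 3000, timeout: float = 0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.size = size
        self.timeout = timeout
        self.clock = clock
        self._seen: OrderedDict = OrderedDict()  # (mailbox, message_id) -> time first seen

    def is_duplicate(self, mailbox: str, message_id: str, dedupe_enabled: bool = True) -> bool:
        """Return True if this delivery should be ignored; otherwise remember it and return False."""
        # dedupe_enabled stands in for the account's zimbraPrefMessageIdDedupingEnabled
        if not dedupe_enabled or self.size == 0:
            return False
        now = self.clock()
        if self.timeout > 0:
            # Entries are stored in arrival order, so expired ones sit at the front.
            while self._seen and next(iter(self._seen.values())) <= now - self.timeout:
                self._seen.popitem(last=False)
        key = (mailbox, message_id)
        if key in self._seen:
            return True
        self._seen[key] = now
        if self.timeout == 0 and len(self._seen) > self.size:
            self._seen.popitem(last=False)  # size-bounded mode: drop the oldest entry
        return False
```

Passing a fake `clock` makes the timeout behaviour easy to exercise without waiting in real time.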
Item Deduplication and Zimbra HSM Plus
The Zimbra HSM Plus module features a "doDeduplicate" operation that parses a target volume to find and deduplicate any duplicated items.
This saves even more disk space: while Zimbra's automatic deduplication is bound to a limited cache, Zimbra HSM Plus deduplication also finds and takes care of multiple copies of the same email regardless of any cache or timing.
Running the "doDeduplicate" operation is also highly recommended after a migration or a large data import, in order to optimize your storage usage.
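The general idea of a cache-independent, volume-wide pass can be sketched as follows. The sketch hashes every file under a directory, groups byte-identical files, and replaces each extra copy with a hard link to a single survivor. It is an illustrative toy, not the Zimbra HSM Plus algorithm. Its `dry_run` flag only mirrors the CLI parameter of the same name. It assumes the directory sits on one filesystem and that no hard links point into it from outside; otherwise the space estimate would be too high.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Hex SHA-256 of a file's content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def deduplicate_volume(root: str, dry_run: bool = False) -> dict:
    """Replace byte-identical files under `root` with hard links to one copy (toy example)."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            groups[sha256_of(path)].append(path)

    stats = {"deduplicated": 0, "already_deduplicated": 0, "bytes_saved": 0}
    for paths in groups.values():
        # Files sharing an inode are already hard-linked (e.g. by an earlier run).
        by_inode = defaultdict(list)
        for p in paths:
            st = os.stat(p)
            by_inode[(st.st_dev, st.st_ino)].append(p)
        inodes = list(by_inode.values())
        stats["already_deduplicated"] += len(paths) - len(inodes)
        keep = inodes[0][0]
        for other in inodes[1:]:
            # The other inode's data is freed once all of its links point at `keep`.
            stats["bytes_saved"] += os.stat(other[0]).st_size
            for dup in other:
                stats["deduplicated"] += 1
                if not dry_run:
                    tmp = dup + ".dedup-tmp"
                    os.link(keep, tmp)
                    os.replace(tmp, dup)  # atomically swap the copy for a hard link
    return stats
```

Running it a second time reports the earlier work under `already_deduplicated`, much like the "Already Deduplicated Blobs" counter described further down.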
Running a Volume Deduplication
Via the Zimbra Suite Plus Administration Zimlet
To run a volume deduplication via the Zimbra Suite Plus Administration Zimlet, simply click the "Zimbra HSM Plus" tab, select the volume you wish to deduplicate and press the "Deduplicate" button.
Via the Zimbra Suite Plus CLI
zimbra@mailserver:~$ zxsuite powerstore doDeduplicate
command doDeduplicate requires more parameters
Syntax:
   zxsuite powerstore doDeduplicate {volume_name} [attr1 value1 [attr2 value2...]]
PARAMETER LIST
NAME              TYPE           EXPECTED VALUES    DEFAULT
volume_name(M)    String[,..]
dry_run(O)        Boolean        true|false         false
(M) == mandatory parameter, (O) == optional parameter
Usage example:
zxsuite powerstore dodeduplicate secondvolume
Starts a deduplication on volume secondvolume
To list all available volumes, you can use the `zxsuite powerstore getAllVolumes` command.
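When the command is scripted, it helps to build the argument list from the syntax printed above. The sketch below does exactly that. `dodeduplicate_command` is a hypothetical helper, and it assumes that `String[,..]` means a comma-separated list of volume names. The resulting list could then be passed to `subprocess.run` on the mailbox server as the zimbra user.

```python
import shlex


def dodeduplicate_command(volumes: list[str], dry_run: bool = False) -> list[str]:
    """Build argv for `zxsuite powerstore doDeduplicate` following the printed syntax (hypothetical helper)."""
    if not volumes:
        raise ValueError("volume_name is a mandatory parameter")
    # volume_name is String[,..]; a comma-separated list is assumed here.
    argv = ["zxsuite", "powerstore", "doDeduplicate", ",".join(volumes)]
    if dry_run:
        argv += ["dry_run", "true"]  # optional attr/value pair; the default is false
    return argv


print(shlex.join(dodeduplicate_command(["secondvolume"], dry_run=True)))
# zxsuite powerstore doDeduplicate secondvolume dry_run true
```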
"doDeduplicate" stats
The "doDeduplicate" operation is a valid target for the "monitor" command, meaning that you can follow the operation's statistics while it is running by using the `zxsuite powerstore monitor [operationID]` command.
Sample Output
Current Pass (Digest Prefix): 63/64
Checked Mailboxes: 148/148
Deduplicated/duplicated Blobs: 64868/137089
Already Deduplicated Blobs: 71178
Skipped Blobs: 0
Invalid Digests: 0
Total Space Saved: 21.88 GB
- "Current Pass (Digest Prefix)" - The "doDeduplicate" command analyzes the BLOBs in groups based on the first character of their digest (name); the value shows the current pass out of the total number of passes.
- "Checked Mailboxes" - The number of mailboxes analyzed for the current pass.
- "Deduplicated/duplicated Blobs" - Number of BLOBs deduplicated by the current operation / total number of duplicated BLOBs on the volume.
- "Already Deduplicated Blobs" - Number of duplicated BLOBs on the volume that had already been deduplicated by a previous run.
- "Skipped Blobs" - BLOBs that have not been analyzed, usually because of a read error or missing file.
- "Invalid Digests" - BLOBs with a bad digest (name different from the actual digest of the file).
- "Total Space Saved" - Amount of disk space freed by the doDeduplicate operation.
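To collect these statistics in a script, the monitor output can be parsed into numbers. The sketch assumes the output prints one `Label: value` pair per line, as in the sample above. That format is taken from the sample only and is not a documented contract. `parse_dedupe_stats` is a hypothetical helper.

```python
def parse_dedupe_stats(text: str) -> dict:
    """Parse 'Label: value' lines from doDeduplicate monitor output (format assumed from the sample)."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Label: value' line
        label, value = label.strip(), value.strip()
        number, _, unit = value.partition(" ")
        if "/" in number:
            done, total = number.split("/")
            stats[label] = (int(done), int(total))  # e.g. pass 63 of 64
        elif unit:
            stats[label] = (float(number), unit)  # e.g. (21.88, "GB")
        else:
            stats[label] = int(number)
    return stats


SAMPLE = """\
Current Pass (Digest Prefix): 63/64
Checked Mailboxes: 148/148
Deduplicated/duplicated Blobs: 64868/137089
Already Deduplicated Blobs: 71178
Skipped Blobs: 0
Invalid Digests: 0
Total Space Saved: 21.88 GB
"""

print(parse_dedupe_stats(SAMPLE)["Total Space Saved"])  # (21.88, 'GB')
```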
Looking at the sample output above, we can see that:
- The operation is running the second-to-last pass (63 of 64) and is on the last mailbox (148 of 148).
- 137089 duplicated BLOBs have been found, 71178 of which had already been deduplicated previously.
- The current operation has deduplicated 64868 BLOBs, for a total disk space saving of 21.88 GB.
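The same figures also give a rough progress estimate. The sketch assumes that the three BLOB counters describe the same set of BLOBs. Under that assumption, about 1043 duplicated BLOBs were not yet handled when the snapshot was taken, presumably because the final pass had not run. `dedupe_progress` is a hypothetical helper.

```python
def dedupe_progress(deduplicated_now: int, duplicated_total: int, already_deduplicated: int,
                    current_pass: int, total_passes: int) -> dict:
    """Derive progress figures from doDeduplicate counters (assumes they cover the same BLOBs)."""
    handled = already_deduplicated + deduplicated_now
    return {
        "pending": duplicated_total - handled,              # duplicates not handled yet
        "passes_after_current": total_passes - current_pass,
        "share_done": handled / duplicated_total,
    }


progress = dedupe_progress(64868, 137089, 71178, current_pass=63, total_passes=64)
print(progress["pending"])               # 1043
print(progress["passes_after_current"])  # 1
print(f"{progress['share_done']:.1%}")   # 99.2%
```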