Zimbra NG HSM - Item Deduplication
What is Item Deduplication
Item Deduplication is a technique that saves disk space by storing a single copy of an item and referencing it multiple times, instead of storing multiple copies of the same item and referencing each copy only once.
This may seem a minor improvement in theory, but in practice it can make a huge difference. Think of that one user who sends unnecessary 15 MB "motivational" or "funny" presentations to a hundred-odd recipients, all of them in the "To:" field.
Item Deduplication in Zimbra
Item Deduplication is performed by Zimbra at the moment of storing a new item in the Primary Volume.
When a new item is created, its "message ID" is compared to a list of cached items. If there is a match, a hard link to the cached message's BLOB is created instead of a whole new BLOB for the message.
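The mechanism described above can be illustrated with a minimal Python sketch. This is not Zimbra's code: the function `store_blob`, the BLOB file names and the plain dictionary used as a cache are all hypothetical, chosen only to show what "create a hard link instead of a new BLOB" means on disk.

```python
import os
import tempfile


def store_blob(store_dir, blob_name, message_id, data, cache):
    """Store a message BLOB, hard-linking to a cached copy when possible.

    `cache` is a hypothetical dict mapping Message-ID -> path of a BLOB
    that has already been written. Returns the path of the stored BLOB.
    """
    path = os.path.join(store_dir, blob_name)
    cached = cache.get(message_id)
    if cached is not None and os.path.exists(cached):
        # Match: add a second directory entry pointing at the same data.
        os.link(cached, path)
    else:
        # No match: write a brand new BLOB and remember where it lives.
        with open(path, "wb") as fh:
            fh.write(data)
        cache[message_id] = path
    return path


def demo():
    """Store the same message twice and report (same inode?, link count)."""
    with tempfile.TemporaryDirectory() as store:
        cache = {}
        body = b"x" * 1024
        first = store_blob(store, "1.msg", "<abc@example.com>", body, cache)
        second = store_blob(store, "2.msg", "<abc@example.com>", body, cache)
        a, b = os.stat(first), os.stat(second)
        return a.st_ino == b.st_ino, a.st_nlink
```

Both paths end up sharing one inode, so the message body occupies disk space only once even though two files are visible in the store.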
The dedupe cache is managed in Zimbra 8 through the following config attributes:
zimbraPrefDedupeMessagesSentToSelf
Used to set the deduplication behaviour for sent-to-self messages.
<attr id="144" name="zimbraPrefDedupeMessagesSentToSelf" type="enum" value="dedupeNone,secondCopyifOnToOrCC,dedupeAll" cardinality="single" optionalIn="account,cos" flags="accountInherited,domainAdminModifiable">
  <defaultCOSValue>dedupeNone</defaultCOSValue>
  <desc>dedupeNone|secondCopyIfOnToOrCC|moveSentMessageToInbox|dedupeAll</desc>
</attr>
zimbraMessageIdDedupeCacheSize
Number of cached Message IDs.
<attr id="334" name="zimbraMessageIdDedupeCacheSize" type="integer" cardinality="single" optionalIn="globalConfig" min="0">
  <globalConfigValue>3000</globalConfigValue>
  <desc>
    Number of Message-Id header values to keep in the LMTP dedupe cache.
    Subsequent attempts to deliver a message with a matching Message-Id to the same mailbox will be ignored.
    A value of 0 disables deduping.
  </desc>
</attr>
zimbraPrefMessageIdDedupingEnabled
Enables or disables deduplication at the account or COS level.
<attr id="1198" name="zimbraPrefMessageIdDedupingEnabled" type="boolean" cardinality="single" optionalIn="account,cos" flags="accountInherited" since="8.0.0">
  <defaultCOSValue>TRUE</defaultCOSValue>
  <desc>
    Account-level switch that enables message deduping.
    See zimbraMessageIdDedupeCacheSize for more details.
  </desc>
</attr>
zimbraMessageIdDedupeCacheTimeout
Timeout for each entry in the dedupe cache.
<attr id="1340" name="zimbraMessageIdDedupeCacheTimeout" type="duration" cardinality="single" optionalIn="globalConfig" since="7.1.4">
  <globalConfigValue>0</globalConfigValue>
  <desc>
    Timeout for a Message-Id entry in the LMTP dedupe cache.
    A value of 0 indicates no timeout.
    zimbraMessageIdDedupeCacheSize limit is ignored when this is set to a non-zero value.
  </desc>
</attr>
(older Zimbra versions might use different attributes or lack some of them)
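To make the interaction between the cache size and the cache timeout easier to follow, here is a small Python model of the semantics stated in the attribute descriptions. It is only a sketch: the class `DedupeCache` is hypothetical, the oldest-first eviction policy is an assumption (the descriptions do not say which entry is dropped when the cache is full), and so is the choice that a size of 0 disables deduping even when a timeout is set.

```python
import time
from collections import OrderedDict


class DedupeCache:
    """Toy model of an LMTP-style Message-Id dedupe cache (not Zimbra code)."""

    def __init__(self, size=3000, timeout=0.0, clock=time.monotonic):
        self.size = size          # models zimbraMessageIdDedupeCacheSize
        self.timeout = timeout    # models zimbraMessageIdDedupeCacheTimeout, in seconds
        self.clock = clock
        self._seen = OrderedDict()  # (mailbox, message_id) -> time first seen

    def is_duplicate(self, mailbox, message_id):
        """Return True if this Message-Id was already seen for this mailbox."""
        if self.size == 0:
            return False  # a size of 0 disables deduping (assumed to win over timeout)
        now = self.clock()
        if self.timeout > 0:
            # Entries are kept in insertion order, so expired ones are at the front.
            while self._seen:
                stamp = next(iter(self._seen.values()))
                if now - stamp < self.timeout:
                    break
                self._seen.popitem(last=False)
        key = (mailbox, message_id)
        if key in self._seen:
            return True
        self._seen[key] = now
        if self.timeout == 0:
            # The size limit applies only when no timeout is configured.
            while len(self._seen) > self.size:
                self._seen.popitem(last=False)  # assumed policy: evict oldest
        return False
```

With `size=2` and no timeout, a third distinct entry pushes the oldest one out, so a later redelivery of that first message is no longer recognised as a duplicate. This is the limitation that a volume-wide deduplication does not have.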
Item Deduplication and Zimbra NG HSM
The Zimbra NG HSM module features a "doDeduplicate" operation that parses a target volume to find and deduplicate any duplicated item.
This saves even more disk space: while Zimbra's automatic deduplication is bound to a limited cache, Zimbra NG HSM deduplication finds and takes care of multiple copies of the same email regardless of any cache or timing.
Running the "doDeduplicate" operation is also strongly recommended after a migration or a large data import, in order to optimize your storage usage.
Running a Volume Deduplication
Via the Zimbra NG Modules Administration Zimlet
To run a volume deduplication via the Zimbra NG Modules Administration Zimlet, click on the "Zimbra NG HSM" tab, select the volume you wish to deduplicate and press the "Deduplicate" button.
Via the Zimbra NG Modules CLI
zimbra@mailserver:~$ zxsuite powerstore doDeduplicate

command doDeduplicate requires more parameters

Syntax:
   zxsuite powerstore doDeduplicate {volume_name} [attr1 value1 [attr2 value2...]]

PARAMETER LIST

NAME              TYPE           EXPECTED VALUES    DEFAULT
volume_name(M)    String[,..]
dry_run(O)        Boolean        true|false         false

(M) == mandatory parameter, (O) == optional parameter

Usage example:

zxsuite powerstore dodeduplicate secondvolume
Starts a deduplication on volume secondvolume
To list all available volumes, you can use the `zxsuite powerstore getAllVolumes` command.
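If you want to launch the operation from a script, the syntax shown above can be assembled programmatically. The following Python sketch only builds the argument list, using the `volume_name` and `dry_run` parameters from the parameter list. The helper name `build_dedupe_command` is hypothetical, and the exact effect of `dry_run` is not described on this page, so check it on a test volume first.

```python
import shlex


def build_dedupe_command(volume_name, dry_run=False):
    """Return the argv list for a doDeduplicate run on `volume_name`."""
    argv = ["zxsuite", "powerstore", "doDeduplicate", volume_name]
    if dry_run:
        # Optional attributes follow the volume name as "name value" pairs.
        argv += ["dry_run", "true"]
    return argv


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(shlex.join(build_dedupe_command("secondvolume", dry_run=True)))
```

The resulting list can be passed to `subprocess.run` when the script is executed as the `zimbra` user on the mailbox server.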
"doDeduplicate" stats
The "doDeduplicate" operation is a valid target for the "monitor" command, meaning that you can watch the command's statistics while it's running through the `zxsuite powerstore monitor [operationID]` command.
Sample Output
Current Pass (Digest Prefix): 63/64
Checked Mailboxes: 148/148
Deduplicated/duplicated Blobs: 64868/137089
Already Deduplicated Blobs: 71178
Skipped Blobs: 0
Invalid Digests: 0
Total Space Saved: 21.88 GB
- "Current Pass (Digest Prefix)" - The "doDeduplicate" command analyzes the BLOBs in groups based on the first character of their digest (name).
- "Checked Mailboxes" - The number of mailboxes analyzed for the current pass.
- "Deduplicated/duplicated Blobs" - Number of BLOBs deduplicated by the current operation / Total number of duplicated items on the volume.
- "Already Deduplicated Blobs" - Number of BLOBs on the volume that were already deduplicated (duplicated BLOBs that have been deduplicated by a previous run).
- "Skipped Blobs" - BLOBs that have not been analyzed, usually because of a read error or missing file.
- "Invalid Digests" - BLOBs with a bad digest (name different from the actual digest of the file).
- "Total Space Saved" - Amount of disk space freed by the doDeduplicate operation.
Looking at the sample output above we can see that:
- The operation is on its second-to-last pass (63 of 64) and has checked all 148 mailboxes for that pass.
- 137089 duplicated BLOBs have been found, 71178 of which had already been deduplicated by a previous run.
- The current operation has deduplicated 64868 BLOBs, for a total disk space saving of 21.88 GB.
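The same reading of the sample output can be done by a script, for instance to log progress during a long run. The Python sketch below assumes that the monitor output consists of one `Label: value` pair per line, as in the sample. The function names `parse_monitor_output` and `summarize` are hypothetical, and the real output format may differ between versions.

```python
SAMPLE = """\
Current Pass (Digest Prefix): 63/64
Checked Mailboxes: 148/148
Deduplicated/duplicated Blobs: 64868/137089
Already Deduplicated Blobs: 71178
Skipped Blobs: 0
Invalid Digests: 0
Total Space Saved: 21.88 GB
"""


def parse_monitor_output(text):
    """Split 'Label: value' lines into a dict of raw strings."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        if not sep:
            continue  # skip lines without a label/value separator
        stats[label.strip()] = value.strip()
    return stats


def summarize(stats):
    """Convert the counters needed to follow the operation into integers."""
    current_pass, passes = (int(n) for n in stats["Current Pass (Digest Prefix)"].split("/"))
    done, total = (int(n) for n in stats["Deduplicated/duplicated Blobs"].split("/"))
    return {
        "pass": current_pass,
        "passes": passes,
        "deduplicated_now": done,
        "duplicates_total": total,
        "already_deduplicated": int(stats["Already Deduplicated Blobs"]),
    }


if __name__ == "__main__":
    print(summarize(parse_monitor_output(SAMPLE)))
```

Keeping the raw strings in the first step and converting only selected fields in the second means that an unexpected extra line in the output does not break the parser.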