Ajcody-Lucene-Topics

Lucene

   KB 3857        Last updated on 06/20/2016  




This is Zeta Alliance Certified Documentation. The content has been tested by the Community.

Actual Lucene Topics Homepage

Please see Ajcody-Lucene-Topics

Other References to Lucene Index

Please see:

Some General Questions On The Lucene Indexing

Index Directory Numbering

We know the directory under the index volume path is like the following:

/opt/zimbra/index/ "X" / "Y" / index / "Z" /

We believe that "X" is the number obtained by bit-shifting the mailbox_id to the right by 12 bits, and that "Y" is the mailbox_id of the user. However, how do you get "Z"?

Answer: It's always '0'.
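As a rough illustration of the layout described above, the following sketch builds the expected index directory from a mailbox ID. It assumes the index volume is rooted at /opt/zimbra/index and that "X" really is the mailbox ID shifted right by 12 bits, as believed in the question; the function name index_dir_for_mailbox is made up for this example.

```shell
# Sketch: build the expected index directory for a mailbox ID.
# Assumptions: volume root /opt/zimbra/index, X = mailbox_id >> 12, Z = 0.
index_dir_for_mailbox() {
    local mailbox_id="$1"
    local index_root="${2:-/opt/zimbra/index}"
    # "X": the mailbox ID shifted right by 12 bits (groups of 4096 mailboxes)
    local group=$(( $mailbox_id >> 12 ))
    printf '%s/%s/%s/index/0\n' "$index_root" "$group" "$mailbox_id"
}

index_dir_for_mailbox 6       # /opt/zimbra/index/0/6/index/0
index_dir_for_mailbox 4096    # /opt/zimbra/index/1/4096/index/0
```

For mailbox ID 6 this gives the same path that appears in the walkthrough further down.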

When Is User Message Store Directory Created

When is the directory for an account's message data (/opt/zimbra/store/0/...) created? When a message is stored for the first time? That would also mean it is not created if no message data exists, correct?

Answer: Yes.

When Is User Index Directory And Index Files Created

Concerning the index directory: we know the index directory already exists even when mail data does not exist [see question above]. When is the index directory created? With the account's first login?

Answer: The directory is created with user creation. The index files that will exist in the user's directory are created with the first indexing or search event.

Example Walk Through

On my 5.0.24 test box.

Create a test account, then check its mailbox info:

[zimbra@mail37 ~]$ zmprov gmi index-test@`zmhostname`
mailboxId: 6
quotaUsed: 0

Notice that the 'store' directory ISN'T automatically created for the user upon user creation:

[zimbra@mail37 ~]$ ls -latr /opt/zimbra/store/0
total 20
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 14:57 1
drwxr-xr-x 4 zimbra zimbra 4096 Sep 13 14:57 ..
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 14:57 2
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 15:05 3
drwxr-x--- 5 zimbra zimbra 4096 Sep 13 15:05 .

Notice that the 'index' directory IS automatically created for the user upon user creation, but the actual index files are not:

[zimbra@mail37 ~]$ ls -latr /opt/zimbra/index/0/6/index/0/
total 8
drwxr-x--- 3 zimbra zimbra 4096 Oct 21 12:11 ..
drwxr-x--- 2 zimbra zimbra 4096 Oct 21 12:11 .

Let's see what changes when the user is sent an email BUT has still not logged in:

[zimbra@mail37 ~]$ mail index-test@`zmhostname`
Subject: test from localhost
test
.
Cc: 

Checking the relevant user directory for changes. Notice we now have a 6 directory, matching the user's mailboxId. We don't see any index files under their index directory, though:

[zimbra@mail37 ~]$ zmprov gmi index-test@`zmhostname`
mailboxId: 6
quotaUsed: 1563
[zimbra@mail37 ~]$ ls -latr /opt/zimbra/store/0/
total 24
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 14:57 1
drwxr-xr-x 4 zimbra zimbra 4096 Sep 13 14:57 ..
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 14:57 2
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 15:05 3
drwxr-x--- 3 zimbra zimbra 4096 Oct 21 12:13 6
drwxr-x--- 6 zimbra zimbra 4096 Oct 21 12:13 .

[zimbra@mail37 ~]$ ls -latr /opt/zimbra/index/0/6/index/0/
total 8
drwxr-x--- 3 zimbra zimbra 4096 Oct 21 12:11 ..
drwxr-x--- 2 zimbra zimbra 4096 Oct 21 12:11 .

Let's see if logging into the webclient as the user changes anything. Log into the webclient and then check the user directories again. Still no change, no index files created.

[zimbra@mail37 ~]$ zmprov gmi index-test@`zmhostname`
mailboxId: 6
quotaUsed: 1563
[zimbra@mail37 ~]$ ls -latr /opt/zimbra/store/0/
total 24
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 14:57 1
drwxr-xr-x 4 zimbra zimbra 4096 Sep 13 14:57 ..
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 14:57 2
drwxr-x--- 3 zimbra zimbra 4096 Sep 13 15:05 3
drwxr-x--- 3 zimbra zimbra 4096 Oct 21 12:13 6
drwxr-x--- 6 zimbra zimbra 4096 Oct 21 12:13 .

[zimbra@mail37 ~]$ ls -latr /opt/zimbra/index/0/6/index/0/
total 8
drwxr-x--- 3 zimbra zimbra 4096 Oct 21 12:11 ..
drwxr-x--- 2 zimbra zimbra 4096 Oct 21 12:11 .

Let's do a manual index of the user account and confirm index files are made.

[zimbra@mail37 ~]$ zmprov rim index-test@`zmhostname` start
status: started
[zimbra@mail37 ~]$ ls -latr /opt/zimbra/index/0/6/index/0/
total 20
drwxr-x--- 3 zimbra zimbra 4096 Oct 21 12:11 ..
-rw-r----- 1 zimbra zimbra   45 Oct 21 12:15 segments_2
-rw-r----- 1 zimbra zimbra   20 Oct 21 12:15 segments.gen
-rw-r----- 1 zimbra zimbra 2455 Oct 21 12:15 _0.cfs
drwxr-x--- 2 zimbra zimbra 4096 Oct 21 12:15 .

So far, we've confirmed that user creation doesn't create the store directory; it appears only once a message or something similar is processed. The user's index directory path is created with user creation, but the index files are not. The index files aren't created when the user first logs in, but they are created by a manual index [zmprov rim user@domain start].

Let's confirm whether a 'search' creates the index files. First, I'll remove the existing index files. Then I'll log into the webclient as the user and do an email search, and confirm afterwards that the search created the index files, which it does.

[zimbra@mail37 ~]$ cd /opt/zimbra/index/0/6/index/0/
[zimbra@mail37 0]$ ls
_0.cfs  segments.gen  segments_2
[zimbra@mail37 0]$ rm -rf *
[zimbra@mail37 0]$ ls

Perform webclient search and check index directory again.

[zimbra@mail37 0]$ ls
segments.gen  segments_1
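The repeated ls checks in this walkthrough could be wrapped in a small helper. This is only a sketch: it assumes that a file matching segments* marks a populated index, which is what the listings above show, and the function name has_index_files is made up for this example.

```shell
# Sketch: report whether a mailbox index directory already holds Lucene files.
# Assumption: a segments* file marks a populated index, as in the listings above.
has_index_files() {
    local dir="$1"
    local f
    for f in "$dir"/segments*; do
        # An unmatched glob stays literal, so test that the file really exists.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

if has_index_files /opt/zimbra/index/0/6/index/0; then
    echo "index files present"
else
    echo "no index files yet"
fi
```

Run before and after a search or a manual reindex, it should flip from the second message to the first.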

Delete Flag

Does Lucene create a delete flag when an index is deleted?

Answer: Yes.

Delete Flag Operational Details

If so, we believe that it creates only a delete flag, and that the files holding the actual index data (such as segment files) are deleted, freeing disk space for reuse, only when segments are merged or an optimization function is called. Is this correct?

Answer: Yes.

Update

See also this bug/rfe:
"Index data needs to reclaim disk space after deletes"
http://bugzilla.zimbra.com/show_bug.cgi?id=54969

Is It The Same For zmmailboxmove With purgeOld

Is the above behavior the same when executing zmmailboxmove with purgeOld?

Answer: No, it physically deletes the files entirely.

Cleaning Up Or Shrinking Index For Users

From the ZCS 8 Release Notes:

  1. "large mail volume DOS's lucene"
    • http://bugzilla.zimbra.com/show_bug.cgi?id=76414
      • Index data for mailboxes is never deleted so a mailbox index can become very large over time and might be consuming excess disk space because of the large index data. In 8.0, a new zmprov CLI, compactIndexMailbox (cim) was created to compact index data. This command can be used to reclaim disk space when the index volume starts to become full. To compact a mailbox’s index, type:
      • zmprov cim <name@domain|id> start
        • Note - Depending on the size of the mailbox and the number of deletes, this might take a while. It might also require additional free space on the index directory.
        • You can run this command concurrently. It is recommended to run this command during off peak hours. You cannot cancel the command once it is started.
        • To see the status of index compaction on a mailbox, type:
      • zmprov getIndexStats <name@domain|id>
  2. "Sorting by recipient does not appear to work correctly"
    • http://bugzilla.zimbra.com/show_bug.cgi?id=74521
      • Customers currently on ZCS 7.x who upgrade to the latest version of ZCS must fully re-index their mailboxes for the sort-by-recipient feature to work properly. Without a full re-index of the mailbox, sorting by the "To" field in the "Sent" folder message view will omit from the sorted results all messages added before the upgrade. Note: re-indexing a mailbox is an expensive operation, so if this feature is NOT required, mailbox re-indexing is NOT recommended.
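Since compaction cannot be cancelled once started and is best run off peak, it may help to generate the commands first and review them before running anything. The sketch below only prints the zmprov cim and getIndexStats commands quoted in item 1 for a list of accounts; the account names and the function name print_compaction_commands are placeholders for this example.

```shell
# Sketch: print (not run) the compaction and status commands for some accounts.
# The zmprov subcommands are the ones quoted from the release notes above;
# the account names below are placeholders.
print_compaction_commands() {
    local account
    for account in "$@"; do
        printf 'zmprov cim %s start\n' "$account"
        printf 'zmprov getIndexStats %s\n' "$account"
    done
}

print_compaction_commands user1@example.com user2@example.com
```

The printed lines can be saved to a file and run as the zimbra user during a quiet period.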

Manually Deleting Lucene Index Directories

Please see King0770-Notes#Manually_Delete_Index_Directories

Performance Tuning

Please see Performance_Tuning_Guidelines_for_Large_Deployments#Lucene_Index

Some smaller notes:

  1. Upgrade to 6.0.8:
  2. The following two settings will decrease indexing overhead, but obviously with a loss of functionality:
    1. set zimbraPrefAutoAddAddressEnabled to FALSE
    2. set zimbraAttachmentsIndexingEnabled to FALSE
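One way the two settings above might be applied is at the class-of-service level with zmprov mc (modifyCos). The sketch below only prints the commands for review. Applying them per COS rather than per account is an assumption, and the COS name "default" and the function name print_index_tuning_commands are placeholders for this example.

```shell
# Sketch: print the zmprov commands that would disable the two features on a COS.
# Assumption: the attributes are set per class of service via "zmprov mc";
# "default" is a placeholder COS name.
print_index_tuning_commands() {
    local cos="${1:-default}"
    printf 'zmprov mc %s zimbraPrefAutoAddAddressEnabled FALSE\n' "$cos"
    printf 'zmprov mc %s zimbraAttachmentsIndexingEnabled FALSE\n' "$cos"
}

print_index_tuning_commands default
```

Remember the trade-off noted above: with these off, contacts are no longer auto-added and attachment contents are no longer searchable.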
