How the Zimbra "Search" Feature Functions for non-Western Languages

Revision as of 22:43, 2 November 2011 by Gayle

When you use the Zimbra "Search" feature with non-Western languages, some characters, such as Japanese characters, may not be searchable. This page answers frequently asked questions about these restrictions and points to the specification of the Zimbra Search feature in ZimbraAnalyzer.

Frequently Asked Questions

1. How do I search for single-byte characters? English text is tokenized word by word, so each search term is matched as a whole word.

2. How do I search for double-byte (fullwidth) letters or numbers? The Zimbra tokenizer converts fullwidth numbers to their halfwidth equivalents during tokenization.

3. How do I search for double-byte characters other than letters or numbers, in particular Japanese characters? CJK text is tokenized into bigrams, that is, pairs of adjacent characters.

http://en.wikipedia.org/wiki/Bigram

4. How is an asterisk "*" handled? An asterisk "*" is treated as a wildcard only when it appears at the end of a single-term query or a phrase query. Anywhere else, it is ignored.

5. Are there character strings that cannot be searched? Only terms that are indexed are searchable. Because text is tokenized either by word or by bigram, as described above, there is no definitive answer to "which characters are searchable?".

6. Does the same specification apply to both the advanced and the standard search? Yes.

7. Does the same specification apply to both mail search and address book search? No, the address book uses a different tokenizer.
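
The tokenization rules in questions 1 to 3 can be illustrated with a small Python sketch. This is not the ZimbraAnalyzer code. The function names (`is_cjk`, `char_kind`, `tokenize`), the character ranges used to detect CJK text, the use of Unicode NFKC normalization to fold fullwidth forms (which also folds fullwidth letters, something this page does not state), and the choice to emit a lone CJK character as a single token are all assumptions made for illustration.

```python
import unicodedata
from itertools import groupby


def is_cjk(ch):
    """Rough CJK check (assumed ranges, not Zimbra's actual rule)."""
    code = ord(ch)
    return (
        0x3040 <= code <= 0x30FF      # hiragana and katakana
        or 0x4E00 <= code <= 0x9FFF   # CJK unified ideographs
    )


def char_kind(ch):
    """Classify a character as CJK, part of a word, or a separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def tokenize(text):
    """Split text into whole words and CJK bigrams."""
    # NFKC maps fullwidth digits such as "１２３" to halfwidth "123".
    text = unicodedata.normalize("NFKC", text)
    tokens = []
    for kind, group in groupby(text, key=char_kind):
        run = "".join(group)
        if kind == "word":
            # Non-CJK text: one token per word.
            tokens.append(run)
        elif kind == "cjk":
            if len(run) == 1:
                # Assumption: a lone CJK character becomes its own token.
                tokens.append(run)
            else:
                # CJK text: every pair of adjacent characters.
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(tokenize("Zimbra search"))  # ['Zimbra', 'search']
print(tokenize("１２３"))          # ['123']
print(tokenize("日本語検索"))      # ['日本', '本語', '語検', '検索']
```

In this sketch the string 日本語検索 is indexed only as the four bigrams shown, so a query that tokenizes to something other than those bigrams (for example the single character 本) finds no matching term. That is one way to read the answer to question 5: whether a string is searchable depends on the tokens that were indexed, not on the characters themselves.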


For a specification of the Zimbra Search feature in ZimbraAnalyzer, go to ZimbraAnalyzer query syntax.
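
As a rough illustration of the asterisk rule in question 4, the Python sketch below interprets the text of a single term or phrase. The function name `interpret_asterisks` is hypothetical, the input is assumed to be the term or phrase text without any surrounding quotation marks, and dropping the ignored asterisks (rather than treating them as separators) is also an assumption. The actual query parser may behave differently.

```python
def interpret_asterisks(query):
    """Return (text, is_wildcard) for the text of a term or phrase query.

    A trailing asterisk marks a wildcard search; asterisks anywhere
    else are ignored (here: simply dropped, which is an assumption).
    """
    query = query.strip()
    text = query.replace("*", "")
    # A wildcard needs a trailing asterisk and some remaining text.
    is_wildcard = query.endswith("*") and bool(text)
    return text, is_wildcard


print(interpret_asterisks("zim*"))         # ('zim', True)
print(interpret_asterisks("z*mbra"))       # ('zmbra', False)
print(interpret_asterisks("zimbra sea*"))  # ('zimbra sea', True)
```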
