Word forms dictionary. Optional, default is empty.
Word forms are applied after tokenizing the incoming text by charset_table rules. They essentially let you replace one word with another. Normally, that would be used to bring different word forms to a single normal form (e.g. to normalize all the variants such as "walks", "walked", "walking" to the normal form "walk"). It can also be used to implement stemming exceptions, because stemming is not applied to words found in the forms list.
Dictionaries are used to normalize incoming words both during indexing
and searching. Therefore, to pick up changes in the wordforms file,
you must reindex and restart searchd.
Word forms support in Sphinx is designed to support big dictionaries well.
They moderately affect indexing speed: for instance, a dictionary with 1 million
entries slows down indexing about 1.5 times. Searching speed is not affected at all.
Additional RAM impact is roughly equal to the dictionary file size,
and dictionaries are shared across indexes: i.e. if the very same 50 MB wordforms
file is specified for 10 different indexes, additional searchd
RAM usage will be about 50 MB, not 500 MB.
The dictionary file should be in a simple plain text format. Each line should contain source and destination word forms, in exactly the same encoding as specified in charset_type, separated by a "greater than" sign (">"). Rules from the charset_table will be applied when the file is loaded. So basically it's as case sensitive as your other full-text indexed data, i.e. typically case insensitive. Here's a sample of the file contents:
walks > walk
walked > walk
walking > walk
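To make the file format concrete, here is a minimal Python sketch of how such a file could be read and applied to already tokenized text. It is an illustration only, not Sphinx's implementation: the function names load_wordforms and normalize are hypothetical, and plain lowercasing stands in for whatever charset_table rules your index actually uses.

```python
def load_wordforms(lines):
    """Parse 'source > destination' lines into a dict (single-word forms only)."""
    forms = {}
    for line in lines:
        line = line.strip()
        if not line or ">" not in line:
            continue  # skip blank or malformed lines
        source, destination = (part.strip().lower() for part in line.split(">", 1))
        if source and destination:
            forms[source] = destination
    return forms


def normalize(tokens, forms):
    """Lowercase each token (assumed charset_table stand-in) and map it if listed."""
    return [forms.get(token.lower(), token.lower()) for token in tokens]


sample = ["walks > walk", "walked > walk", "walking > walk"]
forms = load_wordforms(sample)
print(normalize("She walked while he walks".split(), forms))
# prints ['she', 'walk', 'while', 'he', 'walk']
```

The same mapping has to be applied to both documents and queries, which is why a changed file only takes effect after reindexing and restarting searchd.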
There is a bundled spelldump utility that helps you create a dictionary
file in the format Sphinx can read from source .dict and .aff dictionary
files in ispell or MySpell format (as bundled with OpenOffice).
Starting with version 0.9.9-rc1, you can map several source words to a single destination word. Because the work happens on tokens, not the source text, differences in whitespace and markup are ignored.
core 2 duo > c2d
e6600 > c2d
core 2duo > c2d
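The token-level matching described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Sphinx's actual algorithm: tokenize, load_multiforms and apply_multiforms are hypothetical names, the tokenizer (lowercase, split on anything that is not an ASCII letter or digit) is only a stand-in for charset_table, and the longest-match-first strategy is an illustrative choice. It shows whitespace and punctuation differences being ignored; markup stripping is not modeled.

```python
import re


def tokenize(text):
    # Assumed stand-in for charset_table: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def load_multiforms(lines):
    """Map tuples of source tokens to lists of destination tokens."""
    forms = {}
    for line in lines:
        if ">" not in line:
            continue  # skip malformed lines
        source, destination = line.split(">", 1)
        src_tokens = tuple(tokenize(source))
        dst_tokens = tokenize(destination)
        if src_tokens and dst_tokens:
            forms[src_tokens] = dst_tokens
    return forms


def apply_multiforms(tokens, forms):
    """Replace token sequences, trying the longest candidate first."""
    max_len = max((len(key) for key in forms), default=1)
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in forms:
                out.extend(forms[key])
                i += n
                break
        else:
            out.append(tokens[i])  # no mapping starts at this token
            i += 1
    return out


multiforms = load_multiforms(["core 2 duo > c2d", "e6600 > c2d", "core 2duo > c2d"])
print(apply_multiforms(tokenize("Intel Core   2-Duo E6600"), multiforms))
# prints ['intel', 'c2d', 'c2d']
```

Because matching runs on the token stream, "Core   2-Duo" and "core 2 duo" produce the same three tokens and therefore hit the same mapping.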
wordforms = /usr/local/sphinx/data/wordforms.txt