

How to index dash char in sphinx?


oscardb

Name: Oscar Del Ben
Posts: 2

2009-01-20 11:23:45


Hello, I read that Sphinx treats the dash character ('-') as a word separator. If this
is true, how can I override this behavior? Thanks

Oscar Del Ben

Arantor

Name: Pete Spicer
Posts: 4444

to: oscardb, 2009-01-20 12:06:37


> Hello, I read that Sphinx treats the dash character ('-') as a word separator. If this
> is true, how can I override this behavior? Thanks

There are three things you can do:

1. You can ignore the character, so hyphenated words get de-hyphenated (e.g. blu-ray
becomes bluray): just add it to the ignore_chars directive. Any standard hyphen is then
dropped rather than treated as a word break (this does not cover special em-dashes or
en-dashes).

2. You can make it one of the characters that form words: add it to charset_table.
Note that a standalone - will then be indexed as a word too.

3. You can override specific words by defining each one as an exception. (See the
exceptions directive for more.)
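
As a rough sketch of how the three options might look in sphinx.conf (the index name
myindex and the exceptions file path are placeholders, not taken from this thread, and
the charset_table shown is only a basic ASCII table that you would extend to match your
own data):

```
index myindex
{
    # ... source, path and other settings omitted ...

    # Option 1: drop the hyphen, so "blu-ray" is indexed as "bluray"
    ignore_chars = U+2D

    # Option 2 (use instead of option 1): make the hyphen a word character
    # charset_table = 0..9, A..Z->a..z, _, a..z, U+2D

    # Option 3: keep selected tokens intact via an exceptions file
    # (hypothetical path; the file holds one mapping per line,
    #  for example:  blu-ray => blu-ray)
    # exceptions = /path/to/exceptions.txt
}
```

Pick one of options 1 and 2 for the hyphen rather than enabling both; option 3 can be
combined with either.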

Note that in all cases you'll need to reindex your data, and if you're not using
0.9.9 you'll need to restart searchd as well.

Also, options 1 and 2 disable the - negation syntax (e.g. word -word2, which finds
documents that contain word but not word2), but you can use ! instead (i.e.
word !word2).

oscardb

Name: Oscar Del Ben
Posts: 2

to: Arantor, 2009-01-20 12:50:46


Thank you, very exhaustive and helpful.



rmarscher

Name: Rob Marscher
Posts: 3

to: oscardb, 2009-03-18 20:49:00


> Thank you, very exhaustive and helpful.
+1 I needed to figure this out too. Thanks!
