$cl->BuildExcerpts doesn't work for Russian text.
Common forum
ethaniel
Name: Ethaniel
2006-08-26 17:52:29

$cl->BuildExcerpts doesn't work when we input Russian CP-1251 text. It just removes all the Russian characters. Converting to UTF or KOI8 doesn't help either.
shodan
Name: Andrew Aksyonoff
to: ethaniel, 2006-08-27 01:36:30

> Converting to UTF or KOI8 doesn't help either.

It doesn't support encodings other than UTF-8, but UTF-8 really should work. Are you positive you convert both $docs and query $words to UTF-8?
ethaniel
Name: Ethaniel
to: shodan, 2006-08-27 07:02:54

> Are you positive you convert both $docs and query $words to UTF-8?

First of all, I would like to thank you for this wonderful program. It is just what I have wanted to create for a long, long time, and now it will really help me out.

Now, regarding UTF: I did convert the docs and the query to UTF. My opts are:

    $opts = array (
        "before_match"    => "<b>",
        "after_match"     => "</b>",
        "chunk_separator" => " ... ",
        "limit"           => 400,
        "around"          => 3
    );

It returns the same text I enter; it doesn't enclose the query words in <b></b>. http://search.nightparty.ru/np.php
shodan
Name: Andrew Aksyonoff
to: ethaniel, 2006-08-27 09:27:19

> Now, regarding UTF: I did convert the docs and the query to UTF.

Managed to reproduce that on one of my servers. Will check and fix; thanks for the report!
ethaniel
Name: Ethaniel
to: shodan, 2006-08-28 16:55:01

> Managed to reproduce that on one of my servers. Will check and fix; thanks for the report!

Can't wait for the new version.
shodan
Name: Andrew Aksyonoff
to: shodan, 2006-08-29 11:41:01

> Managed to reproduce that on one of my servers.

It turns out that the charset_table for the index was configured for an SBCS (single-byte character set) encoding, so the excerpts code picked it up and, obviously, failed, as it only supports UTF-8 at the moment.

To work around this in 0.9.6, you can either use UTF-8 everywhere, or set up a fake index with UTF-8 encoding and a proper charset_table, and use that fake index for excerpt generation only.

I've scheduled SBCS support for the excerpts generator; it will be fixed in an upcoming release.
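The failure is easier to picture once you look at the raw bytes: Russian text in cp1251 is, in practice, not valid UTF-8, so code that only decodes UTF-8 cannot make sense of it. The sketch below is illustrative PHP, not Sphinx code; `looks_like_utf8` is a hypothetical helper that relies on `preg_match()` returning false for an invalid UTF-8 subject when the `/u` modifier is set.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false (not 0) on invalid UTF-8 input.
function looks_like_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// "Привет" in cp1251: one byte per Cyrillic letter (all in the 0xC0..0xFF range).
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";
// The same word in UTF-8: two bytes per Cyrillic letter.
$utf8 = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

var_dump(looks_like_utf8($cp1251)); // bool(false)
var_dump(looks_like_utf8($utf8));   // bool(true)
```

Bytes 0xC0..0xFF are never UTF-8 continuation bytes, so a run of cp1251 Cyrillic letters cannot form valid multi-byte sequences.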
ethaniel
Name: Ethaniel
to: shodan, 2006-09-02 16:45:29

> To work around this in 0.9.6, you can either use UTF-8 everywhere, or set up a fake index with UTF-8 encoding and a proper charset_table, and use that fake index for excerpt generation only.

My databases are in cp1251, on MySQL 4.0.24 (no collations or anything like that). I set UTF-8 in the config file and reindexed, and now the search returns zero results. Any ideas? This fix is rather important.
shodan
Name: Andrew Aksyonoff
to: ethaniel, 2006-09-03 19:02:10

> My databases are in cp1251, on MySQL 4.0.24 (no collations or anything like that).
> I set UTF-8 in the config file and reindexed, and now the search returns zero results.

If Sphinx expects UTF-8, you need to make MySQL provide UTF-8-encoded data to Sphinx when indexing as well. Something like sql_query_pre = SET CHARACTER_SET_RESULTS = utf8 should help.
shodan
Name: Andrew Aksyonoff
to: shodan, 2006-09-04 06:34:54

> If Sphinx expects UTF-8, you need to make MySQL provide UTF-8-encoded data to Sphinx when indexing as well.

I've just been told that 4.0.24 does not support UTF-8. In this case, you'll have to set up the main Sphinx index to use cp1251 (and query it in cp1251), and a fake index to generate excerpts in UTF-8 (and pass document data and the query to it in UTF-8).
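A minimal sketch of the PHP side of this two-index setup, assuming the page itself works in cp1251 and that the UTF-8 excerpts index is named "utf8" (the index name used in the BuildExcerpts call later in this thread). `build_utf8_excerpts` is a hypothetical helper; the argument order (docs, index, words, opts) follows that same call, and PHP's `iconv()` stands in for a hand-written converter.

```php
<?php
// Hypothetical helper: the main index stays in cp1251, but excerpts are built
// against a separate UTF-8 index, so documents and query are converted first.
function build_utf8_excerpts($client, array $docs, $query, array $opts, $index = 'utf8')
{
    $toUtf8 = function ($s) {
        return iconv('windows-1251', 'UTF-8', $s);
    };
    $excerpts = $client->BuildExcerpts(array_map($toUtf8, $docs), $index, $toUtf8($query), $opts);
    if (!is_array($excerpts)) {
        return $excerpts; // assumed error value from the client; passed through unchanged
    }
    // Convert the snippets back to cp1251 so they can be printed on the page.
    return array_map(function ($s) {
        return iconv('UTF-8', 'windows-1251', $s);
    }, $excerpts);
}
```

Keeping the conversion in one place means the main cp1251 index and its queries stay untouched; only excerpt generation sees UTF-8.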
ethaniel
Name: Ethaniel
to: shodan, 2006-09-04 14:36:10

> In this case, you'll have to set up the main Sphinx index to use cp1251 (and query it in cp1251), and a fake index to generate excerpts in UTF-8 (and pass document data and the query to it in UTF-8).

Thanks a lot, I guess that should work. When will you release the main fix for this problem? I'd love to use your system in production.
ethaniel
Name: Ethaniel
to: ethaniel, 2006-09-05 10:09:57

> When will you release the main fix for this problem? I'd love to use your system in production.

It didn't work:

    $text = array(win2utf($text));
    $res = $cl->BuildExcerpts($text, "utf8", win2utf($q), $opts);

$res is empty. I use the following function:

    // Converts a Windows-1251 (cp1251) string to UTF-8, Cyrillic letters only.
    function win2utf($s)
    {
        $c209 = chr(209); $c208 = chr(208); $c129 = chr(129);
        $t = '';                                                     // output buffer
        for ($i = 0; $i < strlen($s); $i++) {
            $c = ord($s[$i]);
            if ($c >= 192 && $c <= 239) $t .= $c208 . chr($c - 48);  // А..п -> D0 90..D0 BF
            elseif ($c > 239)           $t .= $c209 . chr($c - 112); // р..я -> D1 80..D1 8F
            elseif ($c == 184)          $t .= $c209 . chr(145);      // ё -> D1 91
            elseif ($c == 168)          $t .= $c208 . $c129;         // Ё -> D0 81
            else                        $t .= $s[$i];                // other bytes copied as-is (correct for ASCII)
        }
        return $t;
    }
ethaniel
Name: Ethaniel
to: ethaniel, 2006-09-05 10:25:12

> It didn't work:
> $res is empty.

PLEASE DISREGARD THIS COMMENT. I FORGOT TO RESTART searchd.

Now there is an additional problem. For example, I query "blagodaru" in Russian, and the search returns all results including "blagodara" (which is correct too), but BuildExcerpts doesn't wrap "blagodara" in <b></b>. (I'm in UTF-8 mode.)
shodan
Name: Andrew Aksyonoff
to: ethaniel, 2006-09-05 12:06:29

> the search returns all results including "blagodara" (which is correct too),
> but BuildExcerpts doesn't wrap "blagodara" in <b></b>.

This is another feature missing from the excerpts generator: as of 0.9.6, it doesn't support stemming. It will hopefully be fixed in the next release as well.
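To illustrate why the other word form goes unmarked, here is a toy highlighter that, like an excerpts generator without stemming, only wraps words that exactly match a query word. It is a hypothetical sketch (`toy_highlight` is not part of Sphinx), it skips case folding, and it uses the transliterated words from this thread; a stemming-aware generator would presumably compare the stems of both words instead.

```php
<?php
// Toy exact-match highlighter (illustrative only, not Sphinx's algorithm).
// Words are runs of Unicode letters; only exact, case-sensitive matches are wrapped.
function toy_highlight($text, $query, $before = '<b>', $after = '</b>')
{
    $words = preg_split('/\P{L}+/u', $query, -1, PREG_SPLIT_NO_EMPTY);
    // A capturing split keeps both the words and the separators between them.
    $parts = preg_split('/(\p{L}+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if (in_array($part, $words, true)) {
            $parts[$i] = $before . $part . $after;
        }
    }
    return implode('', $parts);
}

echo toy_highlight('blagodara vsem', 'blagodaru'), "\n"; // prints: blagodara vsem
echo toy_highlight('blagodara vsem', 'blagodara'), "\n"; // prints: <b>blagodara</b> vsem
```

The search side matched both forms because the index was built with stemming, while this kind of highlighter compares surface forms only.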
ethaniel
Name: Ethaniel
to: shodan, 2006-09-05 13:44:45

> It will hopefully be fixed in the next release as well.

Thanks a lot :) When are you planning to make the next release? I'd even love to donate someday; your search is perfect for the time being.
shodan
Name: Andrew Aksyonoff
to: ethaniel, 2006-09-06 07:06:09

> When are you planning to make the next release?

Sometime this month.

> your search is perfect for the time being.

Thanks :)
dweis
Name: Tristan
to: shodan, 2006-11-05 11:51:43

> > When are you planning to make the next release?
>
> Sometime this month.

I haven't been able to try it yet: are there any improvements to excerpts in 0.9.7-rc1?
shodan
Name: Andrew Aksyonoff
to: dweis, 2006-11-06 06:58:05

> are there any improvements to excerpts in 0.9.7-rc1?

I fixed SBCS excerpts after 0.9.7-rc1. The patch is available upon request. :)
dweis
Name: Tristan
to: shodan, 2006-11-06 14:38:12

> I fixed SBCS excerpts after 0.9.7-rc1.

Thanks, that's good news ;)

> The patch is available upon request. :)

I'll wait for the 0.9.7 final :)