Hebrew character support

Common forum | 1 | 2 | 3 | 4 | 5 | ... | 512 | 513 | 514 | 515 | next »» | Create new thread

danielil

Name: Daniel Haviv
Posts: 7

2008-03-04 13:25:24


Hello,
I'm trying to use Sphinx for our MediaWiki, which has Hebrew articles.
I've added "05D0..05EA" to the charset_table, but my searches still return no results.
Both my DB and sphinx.conf are set to use UTF-8.

Can anyone please help?
Thanks!
Daniel

Nordic

Posts: 299

to: danielil, 2008-03-05 16:26:02


In your source definition did you do something like:

source common {
    # set the MySQL connection charset to UTF-8 before the main query runs
    sql_query_pre = SET NAMES 'utf8'
}

This may or may not help ;-)

danielil

Name: Daniel Haviv
Posts: 7

to: Nordic, 2008-03-05 19:07:53


> In your source definition did you do something like:
>
> source common {
> sql_query_pre = SET NAMES 'utf8'
> }
>
> This may or may not help ;-)

Tried that... it doesn't work.
Thanks, though.

shodan

Name: Andrew Aksyonoff
Posts: 4360

to: danielil, 2008-03-06 03:24:47


> > sql_query_pre = SET NAMES 'utf8'
> tried that ... doesn't work.

Did you reindex and restart searchd after that change?

Nordic

Posts: 299

to: danielil, 2008-03-06 13:46:08


Some of my work on character code maps might be of help to you.

My Sphinx-based applications support Hebrew search using my character maps,
available at http://speeple.com/unicode-maps.txt

The specific mappings and range for Hebrew are:
# Hebrew*
U+FB1D->U+05D9, U+FB1F->U+05F2, U+FB20->U+05E2, U+FB21->U+05D0, U+FB22->U+05D3,
U+FB23->U+05D4, U+FB24->U+05DB, U+FB25->U+05DC, U+FB26->U+05DD, U+FB27->U+05E8,
U+FB28->U+05EA, U+FB2A->U+05E9, U+FB2B->U+05E9, U+FB2C->U+05E9, U+FB2D->U+05E9,
U+FB2E->U+05D0, U+FB2F->U+05D0, U+FB30->U+05D0, U+FB31->U+05D1, U+FB32->U+05D2,
U+FB33->U+05D3, U+FB34->U+05D4, U+FB35->U+05D5, U+FB36->U+05D6, U+FB38->U+05D8,
U+FB39->U+05D9, U+FB3A->U+05DA, U+FB3B->U+05DB, U+FB3C->U+05DC, U+FB3E->U+05DE,
U+FB40->U+05E0, U+FB41->U+05E1, U+FB43->U+05E3, U+FB44->U+05E4, U+FB46->U+05E6,
U+FB47->U+05E7, U+FB48->U+05E8, U+FB49->U+05E9, U+FB4A->U+05EA, U+FB4B->U+05D5,
U+FB4C->U+05D1, U+FB4D->U+05DB, U+FB4E->U+05E4, U+FB4F->U+05D0, U+05D0..U+05F2

* Signifies that improvements could possibly be made (my understanding of Hebrew is limited).
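
One way to read the U+FB1D..U+FB4F entries above: each one points a Hebrew presentation form (a letter with a dagesh, a wide variant, a ligature) at a plain base letter. The Python sketch below is only an illustration of that idea, not how the map was produced. It assumes the target is the first code point of the character's Unicode compatibility decomposition (NFKD); `base_letter` and `as_charset_entry` are hypothetical helper names.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Return the first code point of the NFKD decomposition of ch.

    For Hebrew presentation forms this is the plain letter, with any
    following points (dagesh, shin dot, ...) or ligature partner dropped.
    """
    return unicodedata.normalize("NFKD", ch)[0]


def as_charset_entry(ch: str) -> str:
    """Format one mapping in the 'U+XXXX->U+YYYY' style used in the post."""
    return f"U+{ord(ch):04X}->U+{ord(base_letter(ch)):04X}"


# A few presentation forms that also appear in the map above.
for ch in ("\uFB21", "\uFB2A", "\uFB31", "\uFB4F"):
    print(as_charset_entry(ch), unicodedata.name(ch))
```

Entries generated this way can be compared against the hand-written map; the plain letters themselves are covered by the final U+05D0..U+05F2 range.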

danielil

Name: Daniel Haviv
Posts: 7

to: Nordic, 2008-03-06 23:01:38


That did the trick!
Thanks a lot, Nordic. I was ready to give up on Sphinx :)


Nordic

Posts: 299

to: danielil, 2008-03-07 11:53:53


> That did the trick!
> thanks a lot Nordic, I was ready to give up sphinx :)

No problem.

Are you a fluent Hebrew speaker? Does my character map provide OK results?

danielil

Name: Daniel Haviv
Posts: 7

to: Nordic, 2008-03-07 13:10:35


> > That did the trick!
> > thanks a lot Nordic, I was ready to give up sphinx :)
>
> No problem.
>
> Are you a fluent Hebrew speaker? Does my character map provide OK results?
Yes. I'm from Israel.

Your mapping (I only took the range definition, U+05D0..U+05F2) suits my needs perfectly
(I embedded Sphinx into MediaWiki).
If I encounter any mismatches, I'll let you know.

thanks again!

artomb

Name: Albert Lombarte
Posts: 1

to: Nordic, 2012-12-20 09:44:49


The charset table worked for me, but I'd like to add a comment.

If your indexed text contains non-Hebrew characters, it won't be possible to find those
documents by searching for them. So if you want to find English names or similar, you
should consider adding Latin characters as well. A basic example (without letters like
Â, é, etc.) would be:

0..9, A..Z->a..z, _, -, a..z, U+FB1D->U+05D9, U+FB1F->U+05F2, U+FB20->U+05E2,
U+FB21->U+05D0, U+FB22->U+05D3, U+FB23->U+05D4, U+FB24->U+05DB, U+FB25->U+05DC,
U+FB26->U+05DD, U+FB27->U+05E8, U+FB28->U+05EA, U+FB2A->U+05E9, U+FB2B->U+05E9,
U+FB2C->U+05E9, U+FB2D->U+05E9, U+FB2E->U+05D0, U+FB2F->U+05D0, U+FB30->U+05D0,
U+FB31->U+05D1, U+FB32->U+05D2, U+FB33->U+05D3, U+FB34->U+05D4, U+FB35->U+05D5,
U+FB36->U+05D6, U+FB38->U+05D8, U+FB39->U+05D9, U+FB3A->U+05DA, U+FB3B->U+05DB,
U+FB3C->U+05DC, U+FB3E->U+05DE, U+FB40->U+05E0, U+FB41->U+05E1, U+FB43->U+05E3,
U+FB44->U+05E4, U+FB46->U+05E6, U+FB47->U+05E7, U+FB48->U+05E8, U+FB49->U+05E9,
U+FB4A->U+05EA, U+FB4B->U+05D5, U+FB4C->U+05D1, U+FB4D->U+05DB, U+FB4E->U+05E4,
U+FB4F->U+05D0, U+05D0..U+05F2


Notice that I simply added "0..9, A..Z->a..z, _, -, a..z" at the beginning. That makes it
possible to find strings like "Call of Duty נחשב כיום למשחק המלחמה".

Thank you
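
The effect of adding the Latin entries can be sketched in Python. This is a simplified model, not Sphinx's actual tokenizer: it assumes that any character missing from the charset_table acts as a word separator, and it handles only the syntax used in this thread (single characters, `U+XXXX` codes, `a..z` ranges and `->` mappings). `parse_charset_table` and `tokenize` are hypothetical names.

```python
import re


def _codepoint(token: str) -> int:
    """Turn 'U+05D0' or a single literal character such as 'a' into a code point."""
    token = token.strip()
    if re.fullmatch(r"U\+[0-9A-Fa-f]+", token):
        return int(token[2:], 16)
    if len(token) == 1:
        return ord(token)
    raise ValueError(f"unsupported token: {token!r}")


def _span(text: str) -> range:
    """Turn 'a' or 'a..z' into an inclusive range of code points."""
    start, _, end = text.partition("..")
    low = _codepoint(start)
    high = _codepoint(end) if end else low
    return range(low, high + 1)


def parse_charset_table(table: str) -> dict:
    """Map every accepted code point to the code point it is folded to."""
    folding = {}
    for entry in table.split(","):
        entry = entry.strip()
        if not entry:
            continue
        source, _, target = entry.partition("->")
        src = _span(source)
        dst = _span(target) if target else src
        if len(src) != len(dst):
            raise ValueError(f"range sizes differ in {entry!r}")
        folding.update(zip(src, dst))
    return folding


def tokenize(text: str, folding: dict) -> list:
    """Split text into folded words; unmapped characters act as separators."""
    tokens, current = [], []
    for ch in text:
        mapped = folding.get(ord(ch))
        if mapped is None:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(chr(mapped))
    if current:
        tokens.append("".join(current))
    return tokens


HEBREW_ONLY = "U+05D0..U+05F2"
WITH_LATIN = "0..9, A..Z->a..z, _, -, a..z, U+05D0..U+05F2"

# "Call of Duty" followed by the Hebrew word for "game" (mem, shin, het, qof).
sample = "Call of Duty \u05de\u05e9\u05d7\u05e7"

print(tokenize(sample, parse_charset_table(HEBREW_ONLY)))
print(tokenize(sample, parse_charset_table(WITH_LATIN)))
```

Under this model the Hebrew-only table keeps just the Hebrew word, while the extended table also yields the lowercased Latin words, which is why a query for the English title only matches with the second table.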

