In a project at work I needed Sphinx to treat accented characters and their unaccented versions the same, e.g. é
is equivalent to e. I took this list I found on the Sphinx wiki and transformed it into a Sphinx-friendly charset_table.
Now when I do a search for the string "Héctor Lavoe" it matches the string "Hector Lavoe". Awesome!
Edit: fixed missing "é" in first sentence.
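For anyone who wants to build a similar table from scratch, here is a rough sketch in Python. It is not the table from the wiki list mentioned above, just one way to generate comparable entries. It assumes the `U+XXXX->x` mapping syntax that charset_table accepts, and the helper names (`fold`, `charset_table_entries`) and the code point range (Latin-1 Supplement through Latin Extended-A) are my own choices for illustration.

```python
import unicodedata


def fold(ch):
    """Return the lowercase ASCII base letter of an accented character,
    or None if the character has no such canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0].lower()
    # Only keep characters that split into an ASCII letter plus marks.
    if len(decomposed) > 1 and base.isascii() and base.isalpha():
        return base
    return None


def charset_table_entries(start=0x00C0, end=0x017F):
    """Build 'U+XXXX->x' entries for accented letters in a code point range.
    The default range is an assumption, not taken from the original table."""
    entries = []
    for cp in range(start, end + 1):
        target = fold(chr(cp))
        if target is not None:
            entries.append(f"U+{cp:04X}->{target}")
    return entries


if __name__ == "__main__":
    # Basic ASCII rules first, then the generated accent folding rules.
    base_rules = ["0..9", "A..Z->a..z", "_", "a..z"]
    print("charset_table = " + ", ".join(base_rules + charset_table_entries()))
```

Letters with no canonical decomposition (æ, ø, ß and so on) are skipped by this sketch, so they would still need hand-written entries.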
7 comments:
Thank you. Just made my day.
Thanks, I tried the same but somehow I couldn't get the format right, dunno why, but copy-pasting it from your example works fine. Probably an afternoon thing, but still helpful!
Works like a charm, made my day too. Thanks for the tips!
Thanks for this. But what about searching with punctuation? Currently I am trying to search for: aaa (bbb)
.. and it doesn't match with anything. It should.
Do I need to add quotes, brackets etc to the charset_table or ignore_words?
How would this scale with very large systems? Do you think it would cause significant delays in fetching?
Thanks a lot... I was just using Sphinx on a multi-language site [Chinese, Armenian, Japanese...] and was going nuts until I found this nice resource. Thanks a lot.
hi!
could you please repost your charset_table? (link in post is broken)
thanks!