Tuesday, May 27, 2008

sphinx charset_table with unicode character folding

In a project at work I needed Sphinx to treat accented characters and their unaccented versions the same, e.g. é is equivalent to e. I took this list I found on the Sphinx wiki and transformed it into a Sphinx-friendly charset_table.
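For anyone who would rather generate such a table than copy one, here is a minimal sketch of one way to do it. It is not the transformation I used on the wiki list: it relies on Python's Unicode decomposition data instead, and the helper names (`fold`, `charset_table_entries`) and the default code point range (Latin-1 Supplement through Latin Extended-A) are assumptions chosen for illustration. The output uses the `U+XXXX->x` mapping form that charset_table accepts.

```python
import unicodedata


def fold(ch):
    """Return the lowercase ASCII base letter for an accented character.

    Returns None when the character does not decompose into an ASCII
    letter followed only by combining marks (so it is left unmapped).
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    if (
        marks
        and base.isascii()
        and base.isalpha()
        and all(unicodedata.combining(m) for m in marks)
    ):
        return base.lower()
    return None


def charset_table_entries(start=0x00C0, end=0x017F):
    """Build charset_table entries that fold accented letters to ASCII.

    The default range is an assumption for this sketch; widen it if
    your data needs more scripts.
    """
    # Baseline entries: digits, lowercase letters, underscore,
    # and uppercase folded to lowercase.
    entries = ["0..9", "a..z", "_", "A..Z->a..z"]
    for cp in range(start, end + 1):
        target = fold(chr(cp))
        if target is not None:
            entries.append(f"U+{cp:04X}->{target}")
    return entries


if __name__ == "__main__":
    # Paste the printed line into the index section of sphinx.conf.
    print("charset_table = " + ", ".join(charset_table_entries()))
```

Characters with no canonical decomposition (æ, ø, ß and friends) are skipped by this approach, so they would still need hand-written entries like the ones in the wiki list.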

Now when I do a search for the string "Héctor Lavoe" it matches the string "Hector Lavoe". Awesome!

Edit: fixed missing "é" in first sentence.

3 comments:

Jökull said...

Thank you. Just made my day.

datadevil said...

thanks, I tried the same but somehow I couldn't get the format right, dunno why, but copy pasting it from your example works fine..probably an afternoon thing but still helpful!

MrBrown said...

Works like a charm, made my day too. Thanks for the tips!