Perform Unicode normalization #3

Open
mrumpf opened this issue Jun 25, 2012 · 0 comments

mrumpf commented Jun 25, 2012

Perform Unicode normalization: http://ahinea.com/en/tech/accented-translate.html
Normalization should happen before files are added to the Lucene index. Otherwise searching is hard for languages that use diacritics, because a query typed without accents will not match the accented form stored in the index.

File names in the file system could also be written without diacritics, to make file searches easier.

This can be implemented with a Normalizer class: java.text.Normalizer (part of the JDK since Java 6) or, on Java 5, the Normalizer class from the ICU4J library.
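A minimal sketch of the idea, assuming the JDK's `java.text.Normalizer` is available (Java 6 or later). The class name `DiacriticFolding` and the method `fold` are hypothetical names chosen for illustration and are not part of this project. The text is decomposed to NFD so that each accented letter becomes a base letter followed by combining marks, and the marks are then removed:

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the project: strips diacritics from a string.
final class DiacriticFolding {

    // Decompose to NFD, then drop all combining marks (Unicode category M).
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "Müller café", written with Unicode escapes to avoid source-encoding issues
        System.out.println(fold("M\u00fcller caf\u00e9"));
    }
}
```

The same folding would have to be applied both to the text being indexed and to the query string, otherwise the two forms will not match. Letters that have no canonical decomposition, such as ø or ß, pass through unchanged and would need a separate mapping if they should also be folded.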

@ghost ghost assigned amandel Jun 25, 2012