Combine decomposibility check and combining class lookup #41

Open
harendra-kumar opened this issue May 8, 2020 · 5 comments

@harendra-kumar
Member

Currently we need to do three lookups:

  • is it decomposable?
  • if not decomposable:
    • is it combining?
    • combining class when reordering

We can have a single lookup table storing decomposability and combining class. This will get us all the information in one memory access. We may have to store the combining class in the buffer along with the char for later use when reordering is actually done.

This could speed up both NFD and NFC normalization. Whether it actually does, and by how much, remains to be measured.

@harendra-kumar
Member Author

If needed, in the reorder buffer we can store the char together with its combining class in the higher-order bits of a Word32 or Word, so that a simple word comparison can be used to sort.
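
A minimal sketch of that packing, assuming a 21-bit code point in the low bits and the combining class above it. The names `packEntry`, `entryClass`, `entryChar` and `reorder` are hypothetical and are not part of the library. The sketch compares only the class bits and relies on a stable sort, because canonical ordering must keep marks of equal combining class in their original order, whereas comparing the whole word would also order them by code point.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.List (sortOn)
import Data.Word (Word8, Word32)

-- | Pack a combining class (bits 21 and up) with a code point (low 21 bits).
-- Hypothetical helper, for illustration only.
packEntry :: Word8 -> Char -> Word32
packEntry cls c = (fromIntegral cls `shiftL` 21) .|. fromIntegral (ord c)

-- | Recover the combining class from a packed entry.
entryClass :: Word32 -> Word8
entryClass w = fromIntegral (w `shiftR` 21)

-- | Recover the character from a packed entry.
entryChar :: Word32 -> Char
entryChar w = chr (fromIntegral (w .&. 0x1FFFFF))

-- | Reorder a run of combining marks by class.
-- 'sortOn' is stable, so marks with the same class keep their input order.
reorder :: [Word32] -> [Word32]
reorder = sortOn entryClass
```

For example, U+0301 (class 230) followed by U+0323 (class 220) comes out as U+0323, U+0301, while two marks that both have class 230 stay in the order they arrived.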

@Bodigrim
Collaborator

Bodigrim commented May 8, 2020

Sounds like a good idea. We can store "is decomposable" as combining class 255, so that it all boils down to a single long byte array, ~128 KB. Still fits in CPU cache, I believe.
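
A rough sketch of how one byte per character could answer all three questions, assuming 255 is free to use as a sentinel (the combining classes currently assigned in Unicode stop at 240). The `sampleByte` function is a hand-written stand-in for the real byte array, and `CharInfo` and `classify` are hypothetical names, not the library's API.

```haskell
import Data.Word (Word8)

-- | Sentinel byte meaning "has a canonical decomposition".
decomposableMark :: Word8
decomposableMark = 255

-- | Stand-in for the real byte array: a few hand-picked entries.
-- Every other character is treated here as a non-decomposable starter.
sampleByte :: Char -> Word8
sampleByte c = case c of
  '\x00E9' -> decomposableMark  -- LATIN SMALL LETTER E WITH ACUTE, decomposes to e + U+0301
  '\x0301' -> 230               -- COMBINING ACUTE ACCENT
  '\x0323' -> 220               -- COMBINING DOT BELOW
  _        -> 0

-- | Everything the normalizer needs to know, decoded from one byte.
data CharInfo
  = Decomposable      -- ^ must be decomposed before anything else
  | Starter           -- ^ combining class 0
  | Combining Word8   -- ^ non-zero combining class, kept for reordering
  deriving (Eq, Show)

-- | One table access, then a plain branch on the byte value.
classify :: Char -> CharInfo
classify c = case sampleByte c of
  255 -> Decomposable
  0   -> Starter
  n   -> Combining n
```

With this encoding the caller branches once on the result instead of performing separate decomposability and combining-class lookups.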

@harendra-kumar
Member Author

Do you want to try this out? It will be exciting to see where we can go with this.

@Bodigrim
Collaborator

I can probably migrate getCombiningClass, isCombining and isDecomposable to look up in the same byte array, but I would not have time to go further and switch Data.Unicode.Internal.NormalizeStream to a single lookup.

@harendra-kumar
Member Author

I can try that. You can push your changes to a branch in this repo, and we can collaborate on it there.
