Release a new version of unicode-data #123

Open
adithyaov opened this issue Jun 11, 2024 · 11 comments

@adithyaov
Member

unicode-data-0.4.0.1's test cases seem to break with newer GHCs (newer base versions).
See: #118
I can confirm that this is the case for 9.10 and 9.8.
But the CIs for the latest master are passing so the problem seems to have been fixed.

The version bounds for base on Hackage are incorrect. With base-4.20, the above-mentioned tests fail.

Can we release a newer version of unicode-data with the fix included?
We can then update the dependent packages accordingly.
Should we also revise the version bounds on Hackage?

@wismill
Collaborator

wismill commented Jun 11, 2024

I am working on further improvements, but if you are in a hurry you can release a minor version.

@adithyaov
Member Author

I'll make a minor release for the time being. What do you suggest we do about the incorrect version bounds on hackage for v0.4.0.1?
Should we re-revise the version bounds or deprecate the version?

@wismill
Collaborator

wismill commented Jun 12, 2024

About the tests: they probably fail because you are comparing against base, which has a different Unicode version. I fixed these tests so that they pass when characters are unassigned or have changed General Category. They will display a warning in such cases.
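To make the idea concrete, here is a minimal sketch of that kind of tolerance, using only base. It is not the actual test code from the repository: the names `Mismatch`, `mismatches` and `triage` are hypothetical, and the policy (every mismatch becomes a warning when the two sides are known to use different Unicode versions) is an assumption made for illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- | One disagreement between the reference lookup and the lookup under test.
-- (Hypothetical type, not taken from the unicode-data test suite.)
data Mismatch = Mismatch
  { mismatchChar :: Char
  , expected     :: GeneralCategory  -- value from the reference, e.g. base
  , actual       :: GeneralCategory  -- value from the implementation under test
  } deriving (Eq, Show)

-- | Collect every character on which the two lookups disagree.
mismatches
  :: (Char -> GeneralCategory)  -- reference lookup
  -> (Char -> GeneralCategory)  -- lookup under test
  -> [Char]
  -> [Mismatch]
mismatches ref test cs =
  [ Mismatch c (ref c) (test c) | c <- cs, ref c /= test c ]

-- | Split mismatches into (warnings, failures).
-- Assumed policy: if the two sides implement different Unicode versions,
-- disagreements are expected and only reported; otherwise they are failures.
triage :: Bool -> [Mismatch] -> ([Mismatch], [Mismatch])
triage versionsDiffer ms
  | versionsDiffer = (ms, [])
  | otherwise      = ([], ms)
```

In a real test suite, the second lookup would be the `generalCategory` of the library under test, and the warnings would be printed rather than returned.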

If you re-generate using ucd2haskell and bump Unicode to 15.1, the latest release, tests should pass with base-4.20. So the release is not broken per se, only the test suite.

I am improving the lib before bumping to Unicode 15.1. Notably, I would like to reduce the Addr# blobs and to check the inlining pragmas.

@adithyaov
Member Author

> If you re-generate using ucd2haskell and bump Unicode to 15.1, the latest release, tests should pass with base-4.20. So the release is not broken per se, only the test suite.

Gotcha, I'll make a minor release then.
Should I deprecate unicode-data-0.4.0.1? The version bounds are too lax and might result in undefined behaviour if anyone uses Unicode primitives from both base and unicode-data simultaneously.

@wismill
Collaborator

wismill commented Jun 12, 2024

> So it makes sense to completely keep unicode-data in sync with base. We can possibly make the version bounds for the base dependency restrictive.

I am leaning towards this too, because this may trigger much trickier bugs in workflows. I added tracking of Unicode version in the README, because comments in the code are not very discoverable.

The thing is, text uses case mappings from Unicode 14.0, independently of the version of base. So there is precedent, although this is not a good situation.

Well, the solution would be for everyone to use unicode-data, obviously 😅. Part of unicode-data has been merged into base (now in ghc-internal). Now I am thinking we could move this out of ghc-internal to create unicode-data-core as a new boot/core GHC library. But we should make base depend on it, so that what decides the Unicode version is no longer base directly, but only unicode-data-core. Thus every package using base and unicode-data would share the same Unicode version. If we include complex case mappings, then text should depend on unicode-data-core as well.

That’s a huge change though, and it will have to go through the CLC. But since there are already bits of unicode-data in ghc-internal, and text is already out of sync for case mappings, I guess there will be no strong objection.

We already planned to change the versioning scheme to closely follow that of Unicode. So I can see the following happening:

  • unicode-data-core-15.0.0
  • unicode-data-15.0, depends on unicode-data-core >= 15.0.0 && < 15.1.0
  • unicode-data-names-15.0, etc.

base, on the contrary, should have lax bounds on unicode-data-core. I do not expect the core API to change anytime soon, so something like unicode-data-core >= 15.0.0 may be enough.
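To illustrate the difference between the tight and lax bounds above, here is a small sketch using only `Data.Version` from base. The `Range` type and the names `inRange`, `tight` and `lax` are hypothetical helpers for illustration; real packages would express these bounds in their `.cabal` files, and Cabal's own range semantics are richer than this.

```haskell
import Data.Version (Version, makeVersion)

-- | A version range with an inclusive lower bound and an optional
-- exclusive upper bound. (Hypothetical helper, not Cabal's VersionRange.)
data Range = Range
  { lower :: Version
  , upper :: Maybe Version
  }

-- | Does a version fall inside the range?
inRange :: Range -> Version -> Bool
inRange (Range lo hi) v = v >= lo && maybe True (v <) hi

-- | The tight bound proposed for unicode-data-15.0:
-- unicode-data-core >= 15.0.0 && < 15.1.0
tight :: Range
tight = Range (makeVersion [15, 0, 0]) (Just (makeVersion [15, 1, 0]))

-- | The lax bound proposed for base:
-- unicode-data-core >= 15.0.0
lax :: Range
lax = Range (makeVersion [15, 0, 0]) Nothing
```

With these definitions, a hypothetical unicode-data-core-15.1.0 satisfies `lax` but not `tight`, so base would accept it while unicode-data-15.0 would not.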

Finally, if we go down that road, unicode-data-core cannot depend on base anymore.


Will probably have to open a dedicated issue for this, sorry for the wall of text 😅

@wismill
Collaborator

wismill commented Jun 12, 2024

> Should I deprecate unicode-data-0.4.0.1? The version bounds are too lax and might result in undefined behaviour if anyone uses Unicode primitives from both base and unicode-data simultaneously.

I would just fix the version bounds for base. I am just restarting development of this lib after a long pause, so I am not sure it is in a state for a release. I mean, if you must, do it, but I am not satisfied with some changes I made a year ago.

@adithyaov
Member Author

@wismill Looks like I somehow managed to delete a comment I made.

Re-writing the essence of the comment for context:

Unicode version of base and unicode-data should be in sync as using both
unicode-data and base at once might have unexpected behaviour.
The end user does not care about the unicode version and would use primitives
from both unicode-data and base.
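As a rough illustration of how such a mismatch could be detected, the sketch below compares the Unicode version reported by base with the version another library was generated from. It assumes `GHC.Unicode.unicodeVersion`, which to my knowledge is available in recent base versions (4.15 and later); `libraryUnicodeVersion` is a hypothetical placeholder, not a value read from unicode-data.

```haskell
import Data.Version (Version, makeVersion, showVersion, versionBranch)
import GHC.Unicode (unicodeVersion)

-- | Major and minor components of a Unicode version, e.g. [15,1].
majorMinor :: Version -> [Int]
majorMinor = take 2 . versionBranch

-- | True when both versions agree on major and minor components.
sameUnicode :: Version -> Version -> Bool
sameUnicode a b = majorMinor a == majorMinor b

-- | Hypothetical: the Unicode version the other library was generated from.
libraryUnicodeVersion :: Version
libraryUnicodeVersion = makeVersion [15, 0, 0]

-- | Human-readable report on whether base and the library are in sync.
syncReport :: String
syncReport
  | sameUnicode unicodeVersion libraryUnicodeVersion =
      "in sync: Unicode " ++ showVersion unicodeVersion
  | otherwise =
      "out of sync: base uses Unicode " ++ showVersion unicodeVersion
        ++ ", library uses " ++ showVersion libraryUnicodeVersion
```

The result of `syncReport` depends on the base version in use, which is exactly the concern here: the same program can see different Unicode data depending on the compiler.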

Looks like there is already a lot of thought put into keeping packages in sync.
Once we decide on how we want to do this, you can possibly offload some tasks
to me.

> I would just fix the version bounds for base. I am just restarting development of this lib after a long pause, so I am not sure it is in a state for a release. I mean, if you must, do it, but I am not satisfied with some changes I made a year ago.

I will fix the version bounds for base in 0.4.0.1 and make a minor release 0.4.0.2
branching off 0.4.0.1 and updating the unicode version. The minor release is
required for the time being as we need to get streamly working with ghc > 9.4.

Again, thank you for the amazing work!

@wismill
Collaborator

wismill commented Jun 12, 2024

> updating the unicode version

@adithyaov this is a breaking change. You should bump to 0.5 then.

@Bodigrim
Collaborator

To unblock downstream developments I made a revision: https://hackage.haskell.org/package/unicode-data-0.4.0.1/revisions/

@wismill
Collaborator

wismill commented Jul 3, 2024

I opened an issue with the CLC about a new core library, unicode-data-core.
