Fix Unicode output in cabal-testsuite #10387

Open
wants to merge 2 commits into
base: master

9999years
Collaborator

@9999years 9999years commented Sep 27, 2024

System.Process.createPipe calls (through many intermediaries) GHC.IO.Handle.FD.fdToHandle, whose documentation says:

Makes a binary Handle. This is for historical reasons; it should
probably be a text Handle with the default encoding and newline
translation instead.

The documentation for System.IO.hSetBinaryMode says:

This has the same effect as calling hSetEncoding with char8, together
with hSetNewlineMode with noNewlineTranslation.

But this is a lie, and Unicode written to or read from binary handles is always encoded or decoded as Latin-1, which is always the wrong choice.

Therefore, we explicitly set the output to UTF-8 to keep it consistent between platforms and correct on all modern computers.

See: https://gitlab.haskell.org/ghc/ghc/-/issues/25307
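For illustration only (this is not the PR's actual diff), here is a minimal sketch of the idea: create a pipe with `System.Process.createPipe`, explicitly set both ends to UTF-8 with `hSetEncoding`, and round-trip a Unicode string. The helper name `roundTripUtf8` is hypothetical, and the sketch assumes the payload is small enough to fit in the pipe buffer.

```haskell
import System.IO
import System.Process (createPipe)

-- | Hypothetical helper: write a string into a fresh pipe and read it back,
-- with both ends explicitly set to UTF-8 instead of the binary default.
roundTripUtf8 :: String -> IO String
roundTripUtf8 s = do
  (readEnd, writeEnd) <- createPipe
  -- createPipe hands back binary handles; override the encoding on both ends.
  hSetEncoding readEnd utf8
  hSetEncoding writeEnd utf8
  hPutStr writeEnd s
  hClose writeEnd                 -- flush and signal end-of-input to the reader
  out <- hGetContents readEnd
  length out `seq` hClose readEnd -- force the lazy read before closing
  pure out

main :: IO ()
main = do
  hSetEncoding stdout utf8        -- so the demo itself prints correctly
  out <- roundTripUtf8 "λ → ✓"
  putStrLn out
```

Dropping the two `hSetEncoding` calls on the pipe ends is a quick way to observe the behaviour described above on a given platform.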


  • Patches conform to the coding conventions.
  • Is this a PR that fixes CI? If so, it will need to be backported to older cabal release branches (ask maintainers for directions).

@9999years
Collaborator Author

Note: We set LC_ALL=C in testEnv here:

[ ("LC_ALL", Just "C")

We might want to set GHC_NO_UNICODE=1 to avoid Unicode output entirely in some cases:

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/using.html#envvar-GHC_NO_UNICODE
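As a rough sketch of what that could look like, assuming environment overrides are expressed as `(name, Maybe value)` pairs like the `LC_ALL` entry above: the names `testOverrides` and `applyOverrides` are hypothetical and not taken from cabal-testsuite, and the demo assumes `ghc` is on the `PATH`.

```haskell
import System.Environment (getEnvironment)
import System.Process (CreateProcess (..), proc, readCreateProcess)

-- Hypothetical overrides: Just sets a variable, Nothing unsets it.
testOverrides :: [(String, Maybe String)]
testOverrides =
  [ ("LC_ALL", Just "C")
  , ("GHC_NO_UNICODE", Just "1")
  ]

-- | Apply overrides on top of a base environment.
applyOverrides :: [(String, Maybe String)] -> [(String, String)] -> [(String, String)]
applyOverrides overrides base =
  [ (k, v) | (k, Just v) <- overrides ]                      -- variables to set
    ++ [ kv | kv@(k, _) <- base, k `notElem` map fst overrides ] -- untouched variables

main :: IO ()
main = do
  base <- getEnvironment
  -- Run a child process with the overridden environment.
  let cp = (proc "ghc" ["--version"]) { env = Just (applyOverrides testOverrides base) }
  out <- readCreateProcess cp ""
  putStr out
```

`ghc --version` is only a stand-in command here; the point is how the overridden environment reaches the child process.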

@9999years 9999years marked this pull request as ready for review September 27, 2024 18:25
@ulysses4ever
Collaborator

> We set LC_ALL=C in testEnv

I wonder if we should drop it (or set a UTF-8-based locale explicitly). It's 2024, and going out of our way to avoid Unicode doesn't sound like a good idea.

Collaborator

@geekosaur geekosaur left a comment

This doesn't affect any test output?

3 participants