-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Length of the "White Heavy Check Mark" #109055
Comments
Tagging subscribers to this area: @dotnet/area-system-console |
The character U+2705 is a single Code Unit in length, and .NET's The others cannot be represented as a single 16-bit code unit. They need to be represented by two (a surrogate pair). Let's use 🆕 as an example. In UTF-16 code units, it gets encoded as 0xD83C, 0xDD95. We can prove this by changing your example to Console.WriteLine($"The length of \u2705 is {"\u2705".Length}");
Console.WriteLine($"The length of \uD83C\uDD95 is {"\uD83C\uDD95".Length}"); In the console, this will print
So we can see that the "white heavy check mark" only takes a single 16-bit code unit to encode. The 🆕 requires two - so they are encoded as a surrogate pair. |
Length is determined by the number of UTF16 code units required to encode the character, it has nothing to do with the visual width or visual representation of the data. ✅ is Any Unicode code point in the range Not all visual glyphs themselves are represented as a single code point either. There existing combining characters which frequently get used, especially with emoji. For example, |
The |
Tagging subscribers to this area: @dotnet/area-system-text-encoding |
Description
The symbol ✅ (U+2705) is reported as having a length of 1, other visually-wide unicode characters have a length of 2. Abnormal behavior is observed in higher-layer components such as PowerShell/Windows-Terminal due to this behavior.
Reproduction Steps
In a dotnet core console application run:
Additionally, in a Windows Terminal + PowerShell Core instance run:
Confirm that the lengths between the C# and pwsh tests are in agreement.
Paste the line
'✅'.Length
back into the console and move the cursor to the end of the line. Notice that the cursor is placed before theh
character.Expected behavior
I suspect that ✅ should report a length of 2 like the other characters in order to resolve this issue.
Actual behavior
The length of ✅ will be reported as 1, while other characters (such as 🆕) are reported as 2.
When interacting on the console with a powershell statement containing ✅ (such as
'✅'.Length
) it will become apparent when navigating the cursor through this line, that there is a discrepancy between the presentation of the line and the lower level processing (readline).My understanding of this issue suggests that the correct behavior is for this symbol to be treated as two characters like the other symbols mentioned above, this would resolve the odd behavior seen at the terminal level, but perhaps this goes against standards or other complications.
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
Tested on:
The text was updated successfully, but these errors were encountered: