Json.parse does not handle surrogate pairs in string literals #5445

Open
3 tasks done
eric-wieser opened this issue Sep 24, 2024 · 0 comments
Labels
bug Something isn't working P-high We will work on this issue

Comments

@eric-wieser
Contributor

eric-wieser commented Sep 24, 2024

Description

`Lean.Json.parse` does not combine escaped UTF-16 surrogate pairs in string literals into a single code point.

Context

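JSON serialized with Python is ASCII-only by default (`ensure_ascii=True`), so any character outside the Basic Multilingual Plane is written as an escaped surrogate pair. Such strings are corrupted when loaded in Lean.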
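For illustration, a small sketch of how such input arises on the Python side, using only the standard `json` module. The character chosen is the one from the reproduction below; the variable names are just for this example.

```python
import json

# U+13F00 lies outside the Basic Multilingual Plane.
ch = "\U00013f00"

# json.dumps defaults to ensure_ascii=True, so the character is
# written as an escaped UTF-16 surrogate pair.
encoded = json.dumps(ch)
print(encoded)  # "\ud80f\udf00"

# With ensure_ascii=False the character is emitted unescaped,
# giving a 3-character string: quote, character, quote.
raw = json.dumps(ch, ensure_ascii=False)
print(len(raw))  # 3

# Python's own parser round-trips the escaped form.
assert json.loads(encoded) == ch
```

Passing `ensure_ascii=False` on the Python side avoids the escapes, but it does not help with JSON produced by other tools that emit surrogate pairs.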

Steps to Reproduce

  1. Run the following:
import Lean

-- actual output: ok: "\u0000\u0000"
#eval Lean.Json.parse "\"\\ud80f\\udf00\""
  2. Compare to Python:
>>> import json
>>> json.loads("\"\\ud80f\\udf00\"")
'\U00013f00'

Expected behavior: The escaped UTF-16 surrogate pair should be combined into a single Unicode code point. From the JSON RFC:

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
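The combination rule the RFC refers to is the standard UTF-16 decoding arithmetic. A minimal Python sketch of it follows; `combine_surrogates` is a hypothetical helper written for this issue, not part of Lean or of Python's `json` module.

```python
def combine_surrogates(high: int, low: int) -> str:
    """Combine a UTF-16 high/low surrogate pair into one character."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits of the code point's offset from 0x10000.
    code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(code_point)


# The RFC's G clef example and the pair from the reproduction above.
assert combine_surrogates(0xD834, 0xDD1E) == "\U0001D11E"
assert combine_surrogates(0xD80F, 0xDF00) == "\U00013F00"
```

A parser would apply this when a `\uXXXX` escape in the high-surrogate range is immediately followed by one in the low-surrogate range.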

Actual behavior: Each escaped surrogate is decoded to U+0000, so the parsed string is "\u0000\u0000" instead of the single character U+13F00.

Versions

4.12.0-rc1

Impact

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.

@eric-wieser eric-wieser added the bug Something isn't working label Sep 24, 2024
@leanprover-bot leanprover-bot added the P-high We will work on this issue label Sep 27, 2024