Sorting properties as unescaped UTF-16 #1

Open
clehner opened this issue Dec 29, 2020 · 0 comments

Hi,

RFC 8785 says that JCS sorts property names by their UTF-16 code units:

Property name strings to be sorted are formatted as arrays of
UTF-16 [UNICODE] code units. The sorting is based on pure value
comparisons, where code units are treated as unsigned integers,
independent of locale settings.

The sort should also use the raw (unescaped) value of each name rather than its escaped form:

The sorting process is applied to property name strings in their
"raw" (unescaped) form. That is, a newline character is treated
as U+000A.

https://tools.ietf.org/html/rfc8785#section-3.2.3
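
To make the quoted rule concrete, here is a minimal sketch of a comparison on raw property names by UTF-16 code units. The function name `cmp_utf16` is hypothetical and is not part of serde_jcs; it only uses `str::encode_utf16` and `Iterator::cmp` from the standard library.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: compare two raw (unescaped) property names
/// by their UTF-16 code units, each treated as an unsigned integer.
fn cmp_utf16(a: &str, b: &str) -> Ordering {
    // `encode_utf16` yields `u16` code units; `Iterator::cmp` compares
    // the two sequences lexicographically.
    a.encode_utf16().cmp(b.encode_utf16())
}
```

This ordering can differ from Rust's default `str` ordering (UTF-8 bytes): a character outside the Basic Multilingual Plane such as U+1F602 starts with the surrogate 0xD83D in UTF-16, so it sorts before U+FB33, while in UTF-8 byte order it sorts after.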

I think our sorting comes from BTreeMap, but the key type is Vec<u8>. So I think the properties are being sorted as UTF-8 bytes, and maybe in their escaped form; I'm not sure.
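
As a rough illustration of that suspicion (not the actual serde_jcs code), the sketch below puts keys into a `BTreeMap<Vec<u8>, ()>` and reads them back in map order. The helper name `byte_sorted` is made up for this example, and whether serde_jcs really stores escaped keys is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: return the keys in the order a
/// `BTreeMap<Vec<u8>, _>` would iterate them (plain byte order).
fn byte_sorted(keys: &[&str]) -> Vec<String> {
    let map: BTreeMap<Vec<u8>, ()> = keys
        .iter()
        .map(|k| (k.as_bytes().to_vec(), ()))
        .collect();
    map.into_keys()
        .map(|k| String::from_utf8(k).expect("keys were valid UTF-8"))
        .collect()
}
```

If the keys are stored in escaped form, a newline key is the two bytes `\` (0x5C) and `n`, which sort after `1` (0x31) and `<` (0x3C). Stored raw, the newline is the single byte 0x0A and sorts first.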

I noticed this via a test case "Test tjs13 Transform JSON literal with wierd canonicalization" from the W3C Transform JSON-LD to RDF test cases:
https://w3c.github.io/json-ld-api/tests/toRdf-manifest.html#tjs13

I added that test case (not passing) to the repo here:
clehner@1130b21

Expected output:

{"\n":"Newline","\r":"Carriage Return","1":"One","</script>":"Browser Challenge","�":"Control�","ö":"Latin Small Letter O With Diaeresis","€":"Euro Sign","😂":"Smiley"}

Actual output:

{"1":"One","</script>":"Browser Challenge","\n":"Newline","\r":"Carriage Return","�":"Control�","ö":"Latin Small Letter O With Diaeresis","€":"Euro Sign","😂":"Smiley"}

I confirmed that the expected output is generated by a different JCS implementation, the json-canonicalize npm module.

I tried to make serde_jcs sort keys with UTF-16, without success:
clehner@2426dae

In my other JCS implementation in Rust I have it working using this line to sort object properties:

entries.sort_by_cached_key(|(key, _)| key.encode_utf16().collect::<Vec<u16>>());
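
For context, here is a small self-contained sketch around that line. The function name `sort_entries_utf16` and the `(String, V)` entry type are assumptions for illustration; the other implementation may store entries differently.

```rust
/// Hypothetical wrapper: sort object entries by the UTF-16 code units
/// of their raw (unescaped) keys, as RFC 8785 section 3.2.3 describes.
fn sort_entries_utf16<V>(entries: &mut [(String, V)]) {
    // The UTF-16 encoding of each key is computed once and cached
    // for the duration of the sort.
    entries.sort_by_cached_key(|(key, _)| key.encode_utf16().collect::<Vec<u16>>());
}
```

Applied to the raw keys from the test case above, this should give the same key order as the expected output, with the newline and carriage return keys first.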

Thanks for making serde_jcs.
