Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decoding into a bytearray then using python's .decode() method #43

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

willlahr
Copy link

No description provided.

@willlahr
Copy link
Author

fix for #42

@ccrighton
Copy link

ccrighton commented Jun 28, 2024

This PR handles the #42 Multibyte utf-8 characters are urldecoded incorrectly bug. However, it will throw exceptions on non unicode encoding like %XX and invalid unicode encoding. I think it is worth making a bit more robust so that it doesn't throw exceptions on spurious input. I've attempted to do that with the following branch: https://github.com/ccrighton/phew/tree/utf8-urldecode.

Currently, this branch uses the official unicode replacement character 0xFFFD (? in diamond) in place of invalid utf-8 encoding. Perhaps it could ignore it. Either way it seems better than throwing an exception.

See https://github.com/ccrighton/phew/blob/utf8-urldecode/tests/urldecode_test.py for more detail on behaviour in the event of bad input.

I've also ensured that this code will run on micropython and cpython. PR #43 isn't cpython compatible.

If the consensus that throwing an exception is a better option, I'd still suggest making it cpython compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants