Update irc.py to catch invalid UTF-8, and then try Latin-1 #4

alicetries · 2022-06-26T00:35:48Z

Catches invalid UTF-8, for example when a multi-byte Unicode character gets cut off.
If "\xF0\x9F\x92\x9C" is truncated so that only the first byte arrives, it will fail to decode as UTF-8, but Latin-1 will always succeed, since every byte value maps to a Latin-1 character.

This should fix issue #3, following a similar pattern to how irctokens handles fallback encodings https://github.com/jesopo/irctokens/blob/master/irctokens/line.py#L105-L108
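
A minimal sketch of the fallback described above, not the actual change to irc.py. The function name `decode_line` and its parameters are hypothetical and chosen only for illustration:

```python
def decode_line(raw: bytes, encoding: str = "utf-8", fallback: str = "latin-1") -> str:
    """Decode raw bytes, falling back to a second encoding on invalid input.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Latin-1 maps each of the 256 byte values to a code point,
        # so this decode cannot raise.
        return raw.decode(fallback)


heart = b"\xF0\x9F\x92\x9C"

# The complete 4-byte sequence is valid UTF-8.
print(decode_line(heart))

# A truncated sequence is invalid UTF-8 and goes through the fallback.
print(repr(decode_line(heart[:1])))
```

With this approach a truncated character is decoded as Latin-1 text rather than raising, so the line is still processed, though the affected bytes will not display as the original character.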


alicetries closed this by deleting the head repository Feb 17, 2023
