

zbar incorrectly detects iso-8859-1 encoded QRCodes as big5 on musl libc #281

Open
mgorny opened this issue Feb 3, 2024 · 4 comments

Comments

@mgorny

mgorny commented Feb 3, 2024

When running on musl libc, zbar incorrectly detects iso-8859-1 encoded QR codes as "big5". I originally noticed this through a test failure in the segno package.

An example QR code file is test.png.

On a glibc system this is decoded correctly:

$ zbarimg test.png 
QR-Code:Märchenbücher
scanned 1 barcode symbols from 1 images in 0 seconds

On a musl system, it gets decoded as:

$ zbarimg test.png 
QR-Code:M酺chenb𡡷her
scanned 1 barcode symbols from 1 images in 0 seconds

From debugging, I've established that the problem is that zbar tries Big5 first and expects iconv() to fail for this string, as it does on glibc:

$ iconv -f utf8 -t iso-8859-1 <<<'Märchenbücher' | iconv -f big5 -t utf8
M酺chenbiconv: illegal input sequence at position 8

However, it doesn't fail on musl libc:

$ iconv -f utf8 -t iso-8859-1 <<<'Märchenbücher' | iconv -f big5 -t utf8
M酺chenb𡡷her

Confirmed with zbar as of a549566, musl 1.2.3 (Gentoo) and 1.2.4_git20230717 (Alpine).
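The mechanism described above (try a candidate charset, move on only if iconv() fails) can be sketched in C, the language zbar is written in. This is a minimal sketch assuming POSIX iconv(); decodes_cleanly() and pick_charset() are hypothetical names, not zbar's actual functions, and zbar's real candidate list and order may differ.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not zbar's code): true if the len bytes at `in`
 * convert from `charset` to UTF-8 without iconv() reporting an error. */
static bool decodes_cleanly(const char *charset, const char *in, size_t len)
{
    assert(charset != NULL && in != NULL);

    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)-1)
        return false;              /* charset not supported by this libc */

    char out[256];
    char *inp = (char *)in;        /* POSIX iconv() takes char **, not const */
    size_t inleft = len;
    bool ok = true;

    while (inleft > 0) {
        char *outp = out;
        size_t outleft = sizeof out;
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
            if (errno == E2BIG)
                continue;          /* output full: discard it and keep going */
            ok = false;            /* EILSEQ/EINVAL: invalid or truncated input */
            break;
        }
    }
    iconv_close(cd);
    return ok;
}

/* Hypothetical: return the first candidate that decodes cleanly, or NULL. */
static const char *pick_charset(const char *const *candidates, size_t n,
                                const char *in, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (decodes_cleanly(candidates[i], in, len))
            return candidates[i];
    return NULL;
}
```

With candidates {"BIG5", "ISO-8859-1"} in that order, the answer for Märchenbücher hinges entirely on whether the libc's Big5 decoder rejects the bytes, which is the glibc/musl difference shown in the transcripts above.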

@mgorny
Author

mgorny commented Feb 3, 2024

Apparently the difference is that glibc rejects codes for "user-defined" Big5 characters, whereas musl accepts them. If I shorten the string to Märchen, I can reproduce the same problem on a glibc system.

@tormodvolden

Just to spell it out (please correct me if I am wrong): "ä" (a with umlaut) is 0xe4 in iso-8859-1, which is a valid lead byte for Big5 (in the "Less frequently used characters" set). If the following byte is 0x40-0x7e or 0xa1-0xfe, the pair can be valid Big5, so e.g. "är" will pass as Big5, whereas "ä." (or a string ending with "ä") will fail.

So zbar will favour Big5 in such cases, even though it should favour iso-8859-1, which is the default encoding for QR codes per the standard.

"ü" (u with umlaut) is 0xfc in iso-8859-1, which is a valid lead byte for Big5 in the "Reserved for user-defined characters" set. Sequences starting with it fail on glibc but pass as Big5 on musl (independently of the following byte?).
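The byte-level rule spelled out above can be written as a small structural check. A minimal sketch, assuming the lead range 0x81-0xfe (an assumption, not stated in the comment) and the trail ranges quoted above; looks_like_big5() is a hypothetical helper that deliberately ignores which codes are actually assigned, which is exactly why it accepts Märchenbücher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trail byte ranges quoted in the comment: 0x40-0x7e or 0xa1-0xfe. */
static bool big5_trail_ok(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

/* Hypothetical loose check (not zbar's code): bytes below 0x80 are ASCII,
 * a lead byte 0x81-0xfe (assumed range) must be followed by a valid trail. */
static bool looks_like_big5(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        if (b < 0x80) {
            i += 1;                /* plain ASCII byte */
        } else if (b >= 0x81 && b <= 0xFE) {
            if (i + 1 >= len || !big5_trail_ok(s[i + 1]))
                return false;      /* truncated pair or invalid trail byte */
            i += 2;
        } else {
            return false;          /* 0x80 and 0xff never start a Big5 pair */
        }
    }
    return true;
}
```

In iso-8859-1, "Märchenbücher" is the bytes M, 0xe4, r, c, h, e, n, b, 0xfc, c, h, e, r; both high bytes are followed by a byte in 0x40-0x7e, so this loose rule accepts the whole string as Big5.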

@tormodvolden

And if we were to ignore Big5, "är" would pass as valid SJIS, which zbar currently favours over iso-8859-1.
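The same kind of structural check shows why "är" also looks like Shift_JIS. A minimal sketch, assuming the common simplified Shift_JIS ranges (lead 0x81-0x9f or 0xe0-0xef, trail 0x40-0x7e or 0x80-0xfc) and ignoring vendor extensions; sjis_double_byte_at() is a hypothetical helper, not zbar's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified Shift_JIS lead bytes (assumed; vendor extensions ignored). */
static bool sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF);
}

/* Simplified Shift_JIS trail bytes. */
static bool sjis_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Hypothetical: does a Shift_JIS double-byte character start at s[i]? */
static bool sjis_double_byte_at(const unsigned char *s, size_t len, size_t i)
{
    assert(s != NULL && i < len);
    return sjis_lead(s[i]) && i + 1 < len && sjis_trail(s[i + 1]);
}
```

Since 0xe4 falls in the 0xe0-0xef lead range and "r" (0x72) is a valid trail byte, "är" forms a structurally valid SJIS pair, just as it does under the Big5 rule.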

tormodvolden added a commit to tormodvolden/zbar that referenced this issue Aug 23, 2024
Correct the range checking to evaluate both bytes together.

The ranges are from the table at https://en.wikipedia.org/wiki/Big5

References mchehab#281

Signed-off-by: Tormod Volden <[email protected]>
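Evaluating both bytes together means classifying the combined 16-bit code against the table, rather than judging the lead byte alone: lead byte 0xc6, for example, starts frequently used characters with trail bytes up to 0x7e but user-defined ones from 0xa1. A minimal sketch, assuming the range boundaries below match the Wikipedia Big5 table the commit cites; big5_classify() and big5_accept() are hypothetical names, not the commit's code, and the strict flag only models the glibc/musl difference described in this thread, not either libc's real tables.

```c
#include <assert.h>
#include <stdbool.h>

enum big5_class {
    BIG5_INVALID,
    BIG5_USER_DEFINED,
    BIG5_GRAPHICAL,
    BIG5_RESERVED,
    BIG5_FREQUENT,
    BIG5_LESS_FREQUENT
};

/* Hypothetical classifier; range boundaries assumed from the Wikipedia table. */
static enum big5_class big5_classify(unsigned char lead, unsigned char trail)
{
    bool trail_ok = (trail >= 0x40 && trail <= 0x7E) ||
                    (trail >= 0xA1 && trail <= 0xFE);
    if (lead < 0x81 || lead > 0xFE || !trail_ok)
        return BIG5_INVALID;

    /* Evaluate both bytes together as one 16-bit code. */
    unsigned code = ((unsigned)lead << 8) | trail;
    assert(code >= 0x8140 && code <= 0xFEFE);

    if (code <= 0xA0FE) return BIG5_USER_DEFINED;  /* 0x8140-0xA0FE */
    if (code <= 0xA3BF) return BIG5_GRAPHICAL;     /* 0xA140-0xA3BF */
    if (code <= 0xA3FE) return BIG5_RESERVED;      /* 0xA3C0-0xA3FE */
    if (code <= 0xC67E) return BIG5_FREQUENT;      /* 0xA440-0xC67E */
    if (code <= 0xC8FE) return BIG5_USER_DEFINED;  /* 0xC6A1-0xC8FE */
    if (code <= 0xF9D5) return BIG5_LESS_FREQUENT; /* 0xC940-0xF9D5 */
    return BIG5_USER_DEFINED;                      /* 0xF9D6-0xFEFE */
}

/* strict = reject user-defined codes (glibc-like per this thread);
 * non-strict accepts them (musl-like per this thread). */
static bool big5_accept(unsigned char lead, unsigned char trail, bool strict)
{
    enum big5_class c = big5_classify(lead, trail);
    if (c == BIG5_INVALID)
        return false;
    if (strict && c == BIG5_USER_DEFINED)
        return false;
    return true;
}
```

Under these assumed ranges, "är" (0xe472) lands in the less frequently used set and passes even in strict mode, while "üc" (0xfc63) is user-defined and passes only in non-strict mode, matching the behaviour reported above.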
@tormodvolden

The incorrect detection as Big5 is also reported in #212.
