zbar incorrectly detects iso-8859-1 encoded QRCodes as big5 on musl libc #281

mgorny · 2024-02-03T17:05:26Z

When running on musl libc, zbar incorrectly detects iso-8859-1 encoded QR codes as "big5". I originally noticed this through a test failure in the segno package.

An example QRcode file is:

On a glibc system this is decoded correctly:

$ zbarimg test.png 
QR-Code:Märchenbücher
scanned 1 barcode symbols from 1 images in 0 seconds

On a musl system, it gets decoded as:

$ zbarimg test.png 
QR-Code:M酺chenb𡡷her
scanned 1 barcode symbols from 1 images in 0 seconds

From debugging, I've established that the problem lies in zbar trying big5 first, and expecting iconv() to fail for this string, as it does on glibc:

$ iconv -f utf8 -t iso-8859-1 <<<'Märchenbücher' | iconv -f big5 -t utf8
M酺chenbiconv: illegal input sequence at position 8

However, it doesn't fail on musl libc:

$ iconv -f utf8 -t iso-8859-1 <<<'Märchenbücher' | iconv -f big5 -t utf8
M酺chenb𡡷her

Confirmed with zbar as of a549566, musl 1.2.3 (Gentoo) and 1.2.4_git20230717 (Alpine).


mgorny · 2024-02-03T17:13:07Z

Apparently the difference is that glibc rejects codes for "user-defined" Big5 characters, whereas musl accepts them. If I shorten the string to Märchen, I can reproduce the same problem on a glibc system.

tormodvolden · 2024-08-22T08:15:59Z

Just to spell it out (please correct me if I am wrong), "ä" (a with umlaut) is 0xe4 in iso-8859-1, which is a valid start byte for Big5 (in the "Less frequently used characters" set). If the following byte is in 0x40-0x7e or 0xa1-0xfe, the pair can be a valid Big5 sequence, so e.g. "är" will pass as Big5, whereas "ä." (or a string ending with "ä") will fail.
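To make the byte arithmetic above concrete, here is a minimal Python sketch of a purely structural Big5 check. It is not zbar's code and not what iconv does: the function names are hypothetical, the trail-byte ranges are the ones quoted above, and the lead-byte range 0x81-0xfe is an assumption taken from the Wikipedia Big5 table.

```python
def is_big5_pair(lead: int, trail: int) -> bool:
    """Structural check only; says nothing about whether a character is assigned."""
    # Assumption: lead bytes span 0x81-0xfe (Wikipedia Big5 table).
    lead_ok = 0x81 <= lead <= 0xFE
    # Trail-byte ranges as quoted in the comment above.
    trail_ok = 0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE
    return lead_ok and trail_ok


def looks_like_big5(data: bytes) -> bool:
    """Return True if every non-ASCII byte starts a structurally valid pair."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1  # plain ASCII byte, single-byte character
            continue
        # A high byte needs a following trail byte in the allowed ranges.
        if i + 1 >= len(data) or not is_big5_pair(data[i], data[i + 1]):
            return False
        i += 2
    return True


for text in ("är", "ä.", "Mä", "Märchenbücher"):
    print(repr(text), looks_like_big5(text.encode("latin-1")))
```

Under these assumptions "är" and the full "Märchenbücher" payload pass the structural check, while "ä." and a string ending in "ä" do not, which matches the description above.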

So zbar will favour Big5 in such cases although it should have favoured iso-8859-1 which is the default for QR codes per the standard.

"ü" (u with umlaut) is 0xfc in iso-8859-1, which is a valid start byte for Big5 in the "Reserved for user-defined characters" set. A sequence starting with it fails on glibc but passes as Big5 on musl (independently of the following byte?).
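As a rough illustration of the two sets mentioned above, the following Python sketch maps a lead/trail pair to a region of the Big5 code space. The region boundaries are transcribed from the Wikipedia Big5 table and should be treated as an assumption; the names `BIG5_REGIONS` and `big5_region` are hypothetical, and the sketch does not model what glibc or musl actually implement.

```python
from typing import Optional

# Assumption: region boundaries as listed in the Wikipedia Big5 table.
BIG5_REGIONS = [
    (0x8140, 0xA0FE, "user-defined"),
    (0xA140, 0xA3BF, "graphical"),
    (0xA3C0, 0xA3FE, "reserved"),
    (0xA440, 0xC67E, "frequently used"),
    (0xC6A1, 0xC8FE, "user-defined"),
    (0xC940, 0xF9D5, "less frequently used"),
    (0xF9D6, 0xFEFE, "user-defined"),
]


def big5_region(lead: int, trail: int) -> Optional[str]:
    """Return the region name for a byte pair, or None if it fits no region."""
    # The trail byte must fall in one of the two allowed ranges.
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return None
    code = (lead << 8) | trail
    for low, high, name in BIG5_REGIONS:
        if low <= code <= high:
            return name
    return None


print(big5_region(0xE4, 0x72))  # "är" in iso-8859-1
print(big5_region(0xFC, 0x63))  # "üc" in iso-8859-1
```

With this table, 0xe4 0x72 lands in the "less frequently used" region and 0xfc 0x63 in a "user-defined" region. Any valid trail byte after 0xfc stays in that user-defined region, so a decoder that rejects user-defined codes would reject all of them, and one that accepts them would accept all of them.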

tormodvolden · 2024-08-22T11:27:27Z

And if we were to ignore Big5, "är" would pass as valid SJIS which zbar currently favours over iso-8859-1.
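The order dependence described here can be sketched in a few lines of Python. This is only a model of a "first encoding that decodes without error wins" strategy, not zbar's implementation; `first_decodable` is a hypothetical helper, and Python's built-in codecs are independent of libc, so their verdicts on a given byte sequence may differ from both glibc and musl iconv.

```python
from typing import Optional, Sequence, Tuple


def first_decodable(data: bytes, encodings: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate rejected the bytes, try the next one
    return None


payload = "är".encode("latin-1")  # b'\xe4r'

# Multi-byte candidates first: they get the first chance to claim the bytes.
print(first_decodable(payload, ["big5", "shift_jis", "latin-1"]))

# iso-8859-1 first: it accepts any byte sequence, so it always wins.
print(first_decodable(payload, ["latin-1", "big5", "shift_jis"]))
```

Because iso-8859-1 accepts every byte sequence, putting it first means the later candidates are never consulted, while putting it last makes it a pure fallback. Any fix has to either reorder the candidates or make the multi-byte checks stricter.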

Correct the range checking to evaluate both bytes together. The ranges are from the table at https://en.wikipedia.org/wiki/Big5

References mchehab#281

Signed-off-by: Tormod Volden <[email protected]>

tormodvolden · 2024-08-23T08:33:25Z

The wrong detection as big5 is also reported in #212.

tormodvolden mentioned this issue Aug 23, 2024

Allow Sbinary and --xml together #226

Open
