Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework libpng error handling to catch version mismatch errors as well #134

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mosra
Copy link
Owner

@mosra mosra commented Nov 26, 2022

Before 7e55228, there was an assert where even patch version jump caused the plugin to abort. Which happens pretty regularly, last with 1.6.38 -> 39, and is nothing that should be treated as a critical failure (I wonder what "libpng tutorial" was that taken from, sigh). The above-referenced commit removed this assert in order to unblock users, however it meant that a hypotetical upgrade from 1.6.x to 1.7 would cause the png_create_{read,write}_struct() to print for example

libpng warning: Application built with libpng-1.7.0 but running with 1.6.39

directly to the stdout (thus, impossible to see on e.g. Android), and then the plugin immediately does on CORRADE_INTERNAL_ASSERT(png.file). This attempts to fix it properly.

The png_create_{read,write}_struct() API accepts error/warning callbacks on its own, so supplying them there instead of in png_set_error_fn() allows it to direct the version mismatch error to us. Which is how it
should have been done in the first place (and again, I have no idea why the official libpng documentation doesn't show that, FFS).

Unfortunately, libpng directs the major/minor version mismatch fatal error to the warning callback, which is extremely confusing, and there's no way to detect and fixup the case apart from comparing the
string itself.

So that's what the first commit in this PR does. But then the changes cause a total breakage on clang-cl, and only there, with longjmp/setjmp causing the program to jump somewhere totally insane, hitting assertions in Debug destructor code paths that only ever get hit when !Debug is used (which it isn't). Random ideas:

  • https://en.cppreference.com/w/cpp/utility/program/longjmp says the following. The plugin attempts to have everything constructed before the first longjmp and destructed after the last setjmp, but maybe not correctly?

    If replacing of std::longjmp() with throw and setjmp() with catch would execute a non-trivial destructor for any automatic object, the behavior of such std::longjmp is undefined.

  • The following patch from @Squareys seems to make the test run again, however we don't yet understand why. I refuse to accept that the assertions, which may construct a Debug instance on stack, are what's causing problems -- that'd mean having to redesign the whole debug output printing...

diff --git a/src/MagnumPlugins/PngImporter/PngImporter.cpp b/src/MagnumPlugins/PngImporter/PngImporter.cpp
index a1b4b893..8c6d60c7 100644
--- a/src/MagnumPlugins/PngImporter/PngImporter.cpp
+++ b/src/MagnumPlugins/PngImporter/PngImporter.cpp
@@ -156,6 +156,10 @@ Containers::Optional<ImageData2D> PngImporter::doImage2D(UnsignedInt, UnsignedIn
     png.info = png_create_info_struct(png.file);
     CORRADE_INTERNAL_ASSERT(png.info);
 
+    // FFS
+    if (setjmp(jmpbuf))
+        return {};
+
     /* Set functions for reading */
     Containers::ArrayView<char> input = _in;
     png_set_read_fn(png.file, &input, [](const png_structp file, const png_bytep data, const png_size_t length) {
@@ -181,6 +185,8 @@ Containers::Optional<ImageData2D> PngImporter::doImage2D(UnsignedInt, UnsignedIn
         /* Types that can be used without conversion */
         case PNG_COLOR_TYPE_GRAY:
             CORRADE_INTERNAL_ASSERT(channels == 1);
+            if (setjmp(jmpbuf))
+                return {};
 
             /* Convert to 8-bit */
             if(bits < 8) {
@@ -221,6 +227,9 @@ Containers::Optional<ImageData2D> PngImporter::doImage2D(UnsignedInt, UnsignedIn
         /* LCOV_EXCL_STOP */
     }
 
+    if (setjmp(jmpbuf))
+        return {};
+
     /* Convert transparency mask to alpha */
     if(png_get_valid(png.file, png.info, PNG_INFO_tRNS)) {
         png_set_tRNS_to_alpha(png.file);
@@ -245,6 +254,8 @@ Containers::Optional<ImageData2D> PngImporter::doImage2D(UnsignedInt, UnsignedIn
 
     /* Initialize data array, align rows to four bytes */
     CORRADE_INTERNAL_ASSERT(bits >= 8);
+    if (setjmp(jmpbuf))
+        return {};
     const std::size_t stride = ((size.x()*channels*bits/8 + 3)/4)*4;
     data = Containers::Array<char>{stride*std::size_t(size.y())};
  • This all worked well before, which means there has to be something wrong in the patch? Is there a middle-ground version of it that behaves like before, but is also able to catch the initial version mismatch error message so it doesn't go to stdout?
  • libpng calls setjmp() on its own inside the create function (to catch the version error and return nullptr), could this be the sole reason for everything catching fire? why is that a problem just for clang-cl?
  • The second commits attempts to use my own jmpbuf because at the point where longjmp is called from the version mismatch error, png.file isn't filled in yet and thus png_jmpbuf(file) can't be called. But it makes no difference to the error on clang-cl.
  • It's not an issue specific to clang-cl 15+, it happens also on clang-cl 12. It was just that libpng got installed via vcpkg before, and vcpkg is a broken crap so it got gradually disabled for most of the CI jobs. This is fixed as of 1366ec5.
  • It doesn't seem to be an issue related to some minor ABI issue between MSVC and clang-cl. It happens even with libpng compiled with clang-cl.
  • There doesn't seem to be any known Windows-specific clang bug related to longjmp. The only vaguely relevant bugs are:
  • __builtin_frame_address(0) returning RSP for x86_64-windows (instead of RBP) llvm/llvm-project#51055
  • __builtin_setjmp saves frame pointer incorrectly for x86_64-w64-windows-gnu (MinGW) target llvm/llvm-project#49486

mosra and others added 4 commits November 26, 2022 08:59
Before 7e55228, there was an assert
where even patch version jump caused the plugin to abort. Which happens
pretty regularly, last with 1.6.38 -> 39, and is nothing that should be
treated as a critical failure (I wonder what "libpng tutorial" was that
taken from, sigh). The above-referenced commit removed this assert in
order to unblock users, however it meant that a hypotetical upgrade from
1.6.x to 1.7 would cause the png_create_{read,write}_struct() to print
for example

    libpng warning: Application built with libpng-1.7.0 but running with 1.6.39

directly to the stdout (thus, impossible to see on e.g. Android), and
then the plugin immediately does on CORRADE_INTERNAL_ASSERT(png.file).
This commit attempts to fix it properly.

The png_create_{read,write}_struct() API accepts error/warning callbacks
on its own, so supplying them there instead of in png_set_error_fn()
allows it to direct the version mismatch error to us. Which is how it
should have been done in the first place (and again, I have no idea why
the *official* libpng documentation doesn't show that, FFS).

Unfortunately, libpng directs the major/minor version mismatch *fatal
error* to the warning callback, which is extremely confusing, and
there's no way to detect and fixup the case apart from comparing the
string itself. Sigh.

Co-authored-by: Squareys <[email protected]>
…sues.

This makes the code no longer use png_jmpbuf() in an attempt to fix the
clang-cl error, as I suspected the misbehavior was due to longjmp being
possibly called even before setjmp() is called in our code.

But, going through libpng sources, it seems that for the version
mismatch check the png_create_*_struct() itself calls setjmp(), so there
should be no problem, and this whole extra complication is not needed.

And, indeed, it has no effect on the behavior -- the clang-cl build
still crashes or asserts due to some stack corruption.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant