a specific png is being compressed into a jpeg... #27

Open
RamKromberg opened this issue Apr 24, 2022 · 4 comments

Comments

@RamKromberg

RamKromberg commented Apr 24, 2022

I've found an odd case where a specific png was being converted into a jpeg when going through img2pdf and pdfScale.sh. I've uploaded it and a test script showcasing the issue over here: https://github.com/RamKromberg/pdfScale.sh_is_lossy

To be clear, I'm not sure it's even a bug, seeing how there's no talk of losslessness in the pdfScale.sh docs... And I'm not clear where the issue lies, since both img2pdf and pdfScale.sh show some odd behavior with this specific sample... But I figured I'd ask you first, since I can still extract a png from img2pdf's conversion (albeit an oddly small one...) but not from pdfScale.sh's pdf.

Hopefully not wasting your time...

@RamKromberg
Author

I've put together an ad-hoc pymupdf script that I'm using as an in-place replacement for single-page inputs, and it doesn't show the bug: https://github.com/RamKromberg/pdfScale.sh_is_lossy/blob/main/pdfScale.py

I've updated the shell script to reflect it.

Anyhow, this should be enough to show the bug is in pdfScale.sh (or ghostscript) rather than img2pdf.

@tavinus
Owner

tavinus commented May 28, 2022

Hi. The conversion is done by Ghostscript. We use -sDEVICE=pdfwrite, which has its own settings for PDF generation. The documentation does not provide much detail on how it treats images. I am guessing this mode will transform all images into JPG.

We do have settings for the resizing and resolutions though:

 --image-downsample <gs-downsample-method>
             Ghostscript Image Downsample Method
             Default: bicubic
             Options: subsample, average, bicubic
 --image-resolution <dpi>
             Resolution in DPI of color and grayscale images in output
             Default: 300

It does not seem like we can tell GS to use PNG instead of JPG in this mode though. If we change the -sDEVICE to one of the PNG modes, it will generate a PNG file, instead of a PDF file.

This post has good info on the problem. He also posted this little snippet to get all the possible options for -sDEVICE=pdfwrite:

 gs -sDEVICE=pdfwrite -o /dev/null -c "currentpagedevice { exch ==only ( ) print == } forall"
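The dump from that command is long, so a small filter can help when hunting for image-related keys. The sketch below is a convenience built on assumptions, not part of pdfScale.sh: `filter_image_params` is a hypothetical helper, and the search terms are guesses at what such keys might contain, so an empty result does not prove that no relevant option exists.

```shell
# Hypothetical helper (not part of pdfScale.sh or Ghostscript).
# Reads the parameter dump on stdin and keeps only lines whose text
# mentions images, filters or downsampling (case-insensitive).
# The search terms are guesses, not an official list of key names.
filter_image_params() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE 'image|filter|downsample' || true
}

# Usage (needs Ghostscript installed), left commented out:
# gs -sDEVICE=pdfwrite -o /dev/null \
#    -c "currentpagedevice { exch ==only ( ) print == } forall" \
#    | filter_image_params
```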

I went through all the options printed and could not find any option for the image format.

He does mention a few options you could try adding to the GS call in order to avoid processing the images, but some of them will use raw images (which are quite big).
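The linked post is not quoted in this thread, so the switches below are not necessarily the ones it lists. They are documented pdfwrite distiller parameters that ask Ghostscript to store color and grayscale images with lossless Flate compression and to skip downsampling. `lossless_image_flags` is a hypothetical helper, and whether these switches fix the sample from this issue is untested here.

```shell
# Hypothetical helper: prints, one per line, pdfwrite switches that
# request lossless (Flate) compression and no downsampling for color
# and grayscale images. It only prints the switches; it does not run gs.
lossless_image_flags() {
    for kind in Color Gray; do
        printf '%s\n' \
            "-dAutoFilter${kind}Images=false" \
            "-d${kind}ImageFilter=/FlateEncode" \
            "-dDownsample${kind}Images=false"
    done
}

# Usage (needs Ghostscript; file names are placeholders), commented out:
# gs -sDEVICE=pdfwrite -o output.pdf $(lossless_image_flags) input.pdf
```

Turning off the automatic filter choice matters here, because the explicit filter setting is only honored when auto-filtering is disabled. Flate-compressed photos are usually larger than JPEG-compressed ones, so the output size is worth checking.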

You can tell pdfScale to print its GS call and then add/edit the options you want to test.

 --dry-run, --simulate
             Just simulate execution. Will not run ghostscript
 --print-gs-call, --gs-call
             Print GS call to stdout. Will print at the very end between markers
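As a rough sketch of that workflow, the helper below only assembles the command line, assuming the two flags can be combined in one invocation. `pdfscale_debug_cmd` is a hypothetical name, and the `-s 0.95` scale factor and file names in the example are placeholders.

```shell
# Hypothetical helper: prints (does not run) a pdfScale.sh command line
# that combines --dry-run with --print-gs-call, followed by whatever
# arguments the caller passes through. Arguments are printed verbatim,
# without shell quoting, so paths with spaces need manual quoting.
pdfscale_debug_cmd() {
    printf 'pdfScale.sh --dry-run --print-gs-call'
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Example: show the command for a 95% scale of input.pdf
pdfscale_debug_cmd -s 0.95 input.pdf output.pdf
```

The printed GS call can then be copied, edited with the extra switches under test, and run by hand.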

I would be very interested to get any info from your tests and add options to pdfscale if needed.

@RamKromberg
Author

> The conversion is done by Ghostscript.

Yeah I suspected that must be the case.

> I would be very interested to get any info from your tests and add options to pdfscale if needed.

I believe I found the underlying issue:

> The ColorConversionStrategy switch can now be set to LeaveColorUnchanged, Gray, RGB, CMYK or UseDeviceIndependentColor. Note that, particularly for ps2write, LeaveColorUnchanged may still need to convert colors into a different space (ICCbased colors cannot be represented in PostScript for example). ColorConversionStrategy can be specified either as; a string by using the -s switch (-sColorConversionStrategy=RGB) or as a name using the -d switch (-dColorConversionStrategy=/RGB).

( https://www.ghostscript.com/doc/9.54.0/VectorDevices.htm )
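The two spellings described in that passage differ only in the switch letter and the leading slash. A minimal sketch, where `color_strategy_flag` is a hypothetical helper and the list of accepted values is copied from the quoted documentation:

```shell
# Hypothetical helper: prints a ColorConversionStrategy switch in either
# the string form (-s) or the name form (-d, with a leading slash).
# Usage: color_strategy_flag <strategy> [string|name]
color_strategy_flag() {
    strategy=$1
    form=${2:-string}
    # Accept only the values listed in the quoted Ghostscript docs.
    case $strategy in
        LeaveColorUnchanged|Gray|RGB|CMYK|UseDeviceIndependentColor) ;;
        *) echo "unknown strategy: $strategy" >&2; return 1 ;;
    esac
    case $form in
        string) printf '%s\n' "-sColorConversionStrategy=$strategy" ;;
        name)   printf '%s\n' "-dColorConversionStrategy=/$strategy" ;;
        *) echo "unknown form: $form" >&2; return 1 ;;
    esac
}

# Example: print both spellings for LeaveColorUnchanged
color_strategy_flag LeaveColorUnchanged string
color_strategy_flag LeaveColorUnchanged name
```

Either printed switch could be added to a pdfwrite call to test how each strategy affects the embedded image; that effect on the sample from this issue is not verified here.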

That is, PostScript itself (the language and format, as opposed to the Ghostscript implementation) doesn't support ICC profiles and leaves the color space conversion to the implementation. So even if the Ghostscript developers were kind enough to treat this as a bug or feature request, apply the color profile to the png, and output a png so that we don't suffer compression artifacts in that case, jpegs can also carry embedded color profiles, and applying those can't be done losslessly...

In short, unless I'm missing something, I think I've hit a dead end when it comes to Ghostscript.

Anyhow, I'll add the raw ghostscript command line to the test script to show it's not pdfScale.sh's fault, and throw in a magick comparison demonstrating the introduction of compression artifacts.

@RamKromberg
Author

p.s. I've updated the script at https://github.com/RamKromberg/pdfScale.sh_is_lossy and added the output samples and diffs with a README.md that explains the issue. Hopefully it will be of some use.


2 participants