PDF issues: PDF Linearization data has bad errors #204
Comments
To solve this issue, I'm trying to understand how to catch errors in PDF Linearization data...
Linearization data checking by
For the PDF
WARNING: rice-en.final.presentation.pdf: end of first page section (/E) mismatch: /E = 103552; computed = 132030..132031
WARNING: rice-en.final.presentation.pdf: first page object offset mismatch
WARNING: rice-en.final.presentation.pdf: object count mismatch for page 0: hint table = 47; computed = 49
WARNING: rice-en.final.presentation.pdf: page 1: shared object 9: in computed list but not hint table
WARNING: rice-en.final.presentation.pdf: page 1: shared object 10: in computed list but not hint table
WARNING: rice-en.final.presentation.pdf: page 1: shared object 104: in computed list but not hint table
WARNING: rice-en.final.presentation.pdf: page 1: shared object 105: in computed list but not hint table
...
WARNING: rice-en.final.presentation.pdf: page 1: shared object 146: in computed list but not hint table
WARNING: rice-en.final.presentation.pdf: page 1: shared object 147: in computed list but not hint table
qpdf: operation succeeded with warnings
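The warnings quoted above fall into a handful of categories, and a long run of per-object lines can hide the more fundamental mismatches. Here is a minimal sketch for tallying them, assuming the `WARNING: <file>: <message>` line shape seen in this log. The function name `summarize_warnings` and the category labels are hypothetical, and any message that does not match the shared-object pattern is simply grouped by the text before its first colon.

```python
import re
from collections import Counter

# Pattern taken from the shared-object lines quoted in this thread only;
# other qpdf messages are grouped generically below.
_SHARED = re.compile(
    r"page (\d+): shared object (\d+): in computed list but not hint table"
)


def summarize_warnings(text):
    """Tally qpdf linearization warnings by category (illustrative helper)."""
    categories = Counter()
    missing_shared = {}  # page number -> shared object numbers absent from hints
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("WARNING: "):
            continue
        # Drop the "WARNING: " prefix, then split off the file name.
        _filename, _, msg = line[len("WARNING: "):].partition(": ")
        shared = _SHARED.fullmatch(msg)
        if shared:
            categories["shared object missing from hint table"] += 1
            page, obj = int(shared.group(1)), int(shared.group(2))
            missing_shared.setdefault(page, []).append(obj)
        else:
            # Group by the part before the first colon, e.g.
            # "object count mismatch for page 0".
            categories[msg.partition(":")[0]] += 1
    return {"categories": categories, "missing_shared": missing_shared}
```

Feeding the log above through this would show a few one-off mismatches (the /E offset, the first page object offset, the page 0 object count) alongside many shared-object entries for page 1, which suggests checking the first-page section before the shared object hint table.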
I ran a simple experiment to check the PDF generation by Adobe Acrobat vs.
No linearization ( word_simple_pdf_linearized.pdf
I.e., there are two cases:
@petervwyatt to be on the same track, which tool did you use for PDF Linearization checking?
The easiest is probably QPDF (https://github.com/qpdf/qpdf/releases) using

Given that standards are usually official "documents of record" and that all versions of PDF/A explicitly prohibit Linearized PDF for valid technical reasons, I would strongly recommend not outputting it at all. A lot of implementations just ignore it anyway because (a) it is often wrong or out of date; (b) what was previously documented and what major vendors actually implemented differed anyway (only corrected in the PDF 2.0 spec); (c) there is no requirement for PDF processors to implement it (i.e., it is entirely optional); and (d) it is a known source of "parser differential" vulnerabilities. With today's super-fast internet speeds (vs. 25 years ago when it was invented!) and a modern, efficient PDF (i.e., compressed cross-reference streams and compressed object streams), 95% of PDFs won't benefit.
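For anyone wanting to script this check, here is a rough sketch. The exact QPDF option is cut off in the comment above, so the use of `qpdf --check-linearization` here is an assumption based on QPDF's documented command-line options, not a quote from this thread. The helper names `looks_linearized` and `check_linearization` are hypothetical. The first helper only tests for a `/Linearized` key in the first 1024 bytes, which is where the PDF specification requires the linearization parameter dictionary to sit; it says nothing about whether the hint data is actually valid.

```python
import shutil
import subprocess
from typing import List, Tuple


def looks_linearized(head: bytes) -> bool:
    """Heuristic only: is a /Linearized key present in the first 1024 bytes?"""
    return b"/Linearized" in head[:1024]


def check_linearization(path: str, qpdf: str = "qpdf") -> Tuple[int, List[str]]:
    """Run qpdf's linearization check and return (exit code, WARNING lines).

    Assumes a qpdf binary is on PATH and that it accepts
    --check-linearization (an assumption; the thread elides the option).
    """
    if shutil.which(qpdf) is None:
        raise FileNotFoundError(f"{qpdf} not found on PATH")
    proc = subprocess.run(
        [qpdf, "--check-linearization", path],
        capture_output=True,
        text=True,
    )
    # Collect warnings from both streams rather than assuming which one
    # qpdf writes them to.
    output = proc.stdout + proc.stderr
    warnings = [ln for ln in output.splitlines() if ln.startswith("WARNING: ")]
    return proc.returncode, warnings
```

A generator that follows the advice above and stops emitting linearization data could use `looks_linearized` on the first bytes of its output as a cheap regression check, and keep `check_linearization` for files that are still meant to be linearized.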
@petervwyatt thank you!
PDF Linearization data has bad errors - this is optional data, so you could simply not generate it until the bugs are fixed.
From #201