Skip to content
This repository has been archived by the owner on Dec 22, 2023. It is now read-only.

Port PDF generation to reportlab #59

Merged
merged 24 commits into from
Sep 21, 2023

Conversation

janopae
Copy link

@janopae janopae commented Mar 13, 2023

Since Trelby's PDF generation was implemented in the mid-2000s, it created PDFs by concatenating strings that contained raw PDF commands, and writing the concatenated string into the output file. This PR makes Trelby use the Reportlab PDF Toolkit instead, which offers higher-level abstractions (like drawString or drawCrircle) and solves a lot of problems underneath. This includes handling of non-latin characters in fonts, which is an old feature request that has been reported to Trelby multiple times over the years (most recently in #61). It also enables us to throw away a lot of code that was very hard to maintain, not only because it was very complicated, but also because it required a lot of in-depth knowledge about the PDF standard in order to understand it.

This also fixes #9.

Testing

When developing, I used a script to generate a bunch of different PDFs with the old code and with the code I developed. Whenever I made a change, I compared both versions to each other (both visually and in a text editor) and tried to resolve all differences. Despite developing with a certain scrutiny, this is a huge change that affects a huge complex chunk of the program, so I'm a bit worried whether I missed some edge cases.

If you want to help testing, you could check out this branch and generate some PDFs with trelby, or your could use my script to generate PDFs, and cherry-pick the script on master and do the same. By editing the tests/fixtures/test.trelby file, you can test other scenarios with the script. However, it might be easier to test certain scenarios in the application itself, such as changing fonts etc.

If you don't want to check out my code and execute it, you could still help testing, by checking if you notice any differences in these test PDFs my test script generated with the old and the new code:

Before this PR: screenplay_custom_font.pdf, screenplay.pdf, scenereport.pdf, dialoguechart.pdf

With reportlab: screenplay_custom_font.pdf, screenplay.pdf, scenereport.pdf, dialoguechart.pdf

Note that in all cases, the file sizes are smaller now.

Known limitations

While this resolves #61 for all text on the screenplay itself (including title pages), I gave up with my attempts at trying to position the watermarks using reportlabs higher-level functions, and just passed the literal PDF commands generated by the old code through reportlab. This means that special characters in watermarks (Tools > Generate watermarked PDFs) still have the same problems with special chars as before.

Also, some characters are just not present in the fonts included into the PDF standard. But they can be generated using a custom font, which is proven by this example file: screenplay_custom_font.pdf

Known Regressions

OpenAction

Originally, Trelby inserts an \OpenAction command in some cases, that should make the PDF jump directly to a specific page when it is opened (this is useful e. g. for previewing the current page). This sadly does not seem to be easily achievable in reportlab – it does not support \OpenAction. However, even with PDFs generated before that change that have the \OpenAction, it didn't work in any of the PDF readers I tried (Evince, Okular, pdf.js in Firefox). I think it is likely that this feature only ever worked in Adobe Acrobat.

I could either try to implement this by somehow inserting the \OpenAction into the PDF after it was generated (either using another library or by hacking it into the generated binary file). However, I'm a little drawn to abolish this feature instead. I'd like to hear some feedback by users and the maintainer on how they think about this.

Custom font name without font file

Also previously, it was possible to set a custom font name without specifying a path to the font file. This lead to a PDF file that referenced the font, butonly worked if the PDF file was displayed on a system where that font is present. This is not that easy to do with reportlab. But I think a file that only works if a font is installed on the system goes against everything PDF is about (creating portable files). So I think it's fine if we stop supporting this. However, this needs refinement in the settings GUI, preventing you from setting a font name without a file path.

It seems like we don't need the file name if we don't save the file, and we can also get the string from the reportlab canvas. This is closer to what we do now and doesn't require us to restructure a lot of the code.
…rough

We don't yet take any advantage of reportlab, and the PDFs are broken as reportlab creates its own instructions at the beginning and ignores some of the ones we pass to it
When using reportlab, this doesn't have an effect anyway, as reportlab takes care of the xref table. It seems to ignore all xref related commands inserted manually.
@@ -1037,7 +1021,6 @@ def generatePMLPage(self, pager, pageNr, forPDF, doExtra):
s = text

to.toc = pml.TOCItem(s, to)
pager.doc.addTOC(to.toc)
Copy link
Author

@janopae janopae Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding all table of contents items to the document is not necessary any more – it is now enough to have them on the text objects that should appear in the table of contents, as reportlab automatically takes care of adding new TOC entries to the beginning of the file.

@limburgher
Copy link
Owner

Thank you, this is promising. I get the following when I try to print or export with the code at current.

[gwyn@gwythsefyll trelby]$ ./bin/trelby
11:46:12: Debug: Adding duplicate image handler for 'PNG file'
11:46:12: Debug: Adding duplicate image handler for 'JPEG file'
11:46:15: Debug: window wxMenuBar@0x563d9b8fa800 ("menubar") lost focus even though it didn't have it
11:46:17: Debug: window wxMenuBar@0x563d9b8fa800 ("menubar") lost focus even though it didn't have it
Traceback (most recent call last):
File "/home/gwyn/dev/trelby/janopae/trelby/src/trelby.py", line 2267, in OnExportScript
self.panel.ctrl.OnExportScript()
File "/home/gwyn/dev/trelby/janopae/trelby/src/trelby.py", line 1123, in OnExportScript
data = sp.generatePDF(True)
^^^^^^^^^^^^^^^^^^^^
File "/home/gwyn/dev/trelby/janopae/trelby/src/screenplay.py", line 830, in generatePDF
return pdf.generate(self.generatePML(isExport))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gwyn/dev/trelby/janopae/trelby/src/pdf.py", line 21, in generate
return tmp.generate()
^^^^^^^^^^^^^^
File "/home/gwyn/dev/trelby/janopae/trelby/src/pdf.py", line 219, in generate
initialFontName=self.getFontForFlags(pml.NORMAL),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gwyn/dev/trelby/janopae/trelby/src/pdf.py", line 291, in getFontForFlags
raise Exception('Font name %s is not known and no font file name provided' % customFontInfo.name)
Exception: Font name CourierScreenplay is not known and no font file name provided

@janopae
Copy link
Author

janopae commented Mar 13, 2023

Thanks for trying :) Could it be that you have configured the font to be "CourierScreenplay" without providing a path for the font file?

I just read #9, and saw that setting a font name without a file path was provided there as a workaround. It looks like the old code would use a system font then. I didn't know that setting system fonts by leaving out the file path was possible and did not implement it. So this is another regression, I'm glad that you found it.

pe.mm2points(pmlOp.width),
pe.mm2points(pmlOp.height),
pmlOp.fillType == pml.NO_FILL or pmlOp.fillType == pml.STROKE_FILL,
pmlOp.fillType == pml.FILL or pmlOp.fillType == pml.STROKE_FILL
Copy link
Author

@janopae janopae Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag fillType with the options NO_FILL, STROKE_FILL and FILL on the PML Operation doesn't make so much sense with the reportlab API – 2 seperate boolans for showing the stroke and showing a fill would not only save us from this translation from a flag to 2 bools, it would also make PML more flexible (and probably easier to understand). Maybe this could also be done in this PR.

@Cerabbite
Copy link

I could either try to implement this by somehow inserting the \OpenAction into the PDF after it was generated (either using another library or by hacking it into the generated binary file).

I think it is better to just leave the feature out. It is most likely going to be a lot of work to implement it and I don't think that many people use it anyways especially since you cannot get it to work with the current implementation.

@limburgher
Copy link
Owner

I could either try to implement this by somehow inserting the \OpenAction into the PDF after it was generated (either using another library or by hacking it into the generated binary file).

I think it is better to just leave the feature out. It is most likely going to be a lot of work to implement it and I don't think that many people use it anyways especially since you cannot get it to work with the current implementation.

I concur.

This was referenced Mar 19, 2023
1.5 was the version we previously generated, but the "literal" PDF instruction was ignored by reportlab
Passing the canvas additionally to the output string, we can smoothly transition all operations from rendering to the output string to rendering to the reportlab canvas.

A smooth transitions allows to test the effects of each change individually.

Of course, we will later-on remove  the output string parameter when it's not needed anymore.
This was the last operation 🎉
Doesn't make a visible difference, but if you opened the PDF in a text editor before, you'd see Helvetica as an additional BaseFont
# Conflicts:
#	src/characterreport.py
#	src/dialoguechart.py
#	src/gutil.py
#	src/locationreport.py
#	src/pdf.py
#	src/scenereport.py
#	src/screenplay.py
#	src/scriptreport.py
@janopae janopae force-pushed the port-pdf-generation-to-reportlab branch from a081abc to f4651ea Compare March 26, 2023 22:08
@janopae janopae marked this pull request as ready for review September 17, 2023 14:01
@janopae
Copy link
Author

janopae commented Sep 17, 2023

@limburgher When exporting with old settings that only set a font name (and not a font file), instead of printing an error to the console, Trelby should now show a message window that explains what's wrong and what to do, and then continue exporting by falling back to the default font.

Could you try if it handles the situation gracefully for you now?

@limburgher
Copy link
Owner

It does!

@limburgher
Copy link
Owner

Is this ready to merge?

@janopae
Copy link
Author

janopae commented Sep 21, 2023

Yes it is!

@limburgher limburgher merged commit ff5ae31 into limburgher:main Sep 21, 2023
0 of 4 checks passed
@limburgher
Copy link
Owner

Thank you!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support non-Latin1 characters can't change PDF fonts
3 participants