PDF tagging: Some bad semantics: footnote, P tags, etc #1003

Closed
ronaldtse opened this issue May 29, 2023 · 17 comments

@ronaldtse

Various other bad semantics here and there, e.g. a footnote indicator marked as “Note”, P tags covering more than one paragraph, invalid reading order in various places (seemingly associated with cross-document links), etc.

From: metanorma/mn2pdf#201

@Intelligent2013

Example P in P:
image

This occurs because the block element note is nested inside the block element p:

<p id="_24f2eb3c-d04a-ae7f-b7f6-d7508208e30d">The mass fraction of extraneous matter and defective kernels in husked and milled rice, whether or not parboiled, determined in accordance with <xref target="AnnexA">
		<span class="citeapp">Annex A</span>
	</xref>, shall not be greater than the values specified in <xref target="table1">
		<span class="citetbl">Table 1</span>
	</xref>.<note id="_37eebd65-fcfd-1ff3-167d-42106bc02dc2">
		<name>NOTE</name>
		<p id="_cffbd604-0bed-6331-229b-1db2259caaa8">Lower mass fractions of moisture are sometimes needed for certain destinations depending on the climate, duration of transport and storage. For further details, see <xref type="inline" target="ISO6322-1">ISO 6322-1</xref>, <xref type="inline" target="ISO6322-2">ISO 6322-2</xref> and <xref type="inline" target="ISO6322-3">ISO 6322-3</xref>.</p>
	</note>
</p>

The fo:block for note should be marked with role="Note".

@Intelligent2013

> The fo:block for note should be marked with role="Note".

No, it's wrong.

From the Tagged PDF Best Practice Guide:
image

I.e. the note should be moved after p (outside of p). For that I would have to fully refactor the XSLT (with more testing, visual checking, etc.).
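
The restructuring described here (the note becomes the next sibling of its p instead of a child) can be pictured as a small tree transformation. The following is only a sketch in Python using the standard xml.etree.ElementTree module on the source XML, not the project's XSLT. The function names hoist_notes and _detach_keeping_tail are hypothetical, and it assumes that notes are direct children of p and that p has a parent element.

```python
import xml.etree.ElementTree as ET


def _detach_keeping_tail(p, note):
    """Remove note from p, keeping any text that followed it inside p."""
    children = list(p)
    i = children.index(note)
    tail = note.tail or ""
    if i == 0:
        p.text = (p.text or "") + tail
    else:
        prev = children[i - 1]
        prev.tail = (prev.tail or "") + tail
    note.tail = None
    p.remove(note)


def hoist_notes(root):
    """Move every <note> that is a direct child of a <p> to just after that <p>."""
    # Snapshot the elements first, because the tree is modified while walking.
    for parent in list(root.iter()):
        for p in [c for c in parent if c.tag == "p"]:
            notes = [c for c in p if c.tag == "note"]
            for offset, note in enumerate(notes, start=1):
                _detach_keeping_tail(p, note)
                position = list(parent).index(p) + offset
                parent.insert(position, note)
    return root
```

Applied to a clause containing the paragraph above, the result is a p followed by a sibling note, which matches the "note after p" layout from the guide.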

@Intelligent2013

Maybe there is a simpler solution: post-process the PDF tag tree with PDFBox after Apache FOP has generated the PDF. I'll study it.

@Intelligent2013

Intelligent2013 commented Jun 15, 2023

Div...Div:
image
This occurs due to the helper fo:block-container elements used for alignment:

<fo:block-container margin-left="0mm">
	<fo:block-container margin-left="0mm" margin-right="0mm">

To do:

  • add role="SKIP" for such fo:block-containers
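
As a rough illustration of this to-do item, the sketch below adds role="SKIP" to fo:block-container elements in an XSL-FO tree using Python's standard library. It assumes, purely for illustration, that every fo:block-container without an explicit role is a layout helper; the function name skip_helper_containers is hypothetical, and whether Apache FOP honours the role on this element is a separate question.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def skip_helper_containers(root):
    """Set role="SKIP" on each fo:block-container that has no role yet.

    Assumption: a container without an explicit role exists only for layout.
    Returns the number of containers that were marked.
    """
    count = 0
    for container in root.iter(f"{{{FO_NS}}}block-container"):
        if "role" not in container.attrib:
            container.set("role", "SKIP")
            count += 1
    return count
```

Containers that already carry a role (for example role="Note") are left untouched.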

@Intelligent2013

Intelligent2013 commented Jun 23, 2023

  • empty tags should be removed
    For instance:
    image

These empty tags occur because of empty XSL-FO fo:block elements, for two reasons:

  • the fo:block is intended only for PDF layout
  • the fo:block ends up without text under some condition

There are two approaches to removing empty tags:

  • don't add such tags, or pre-process the tag tree - needs a deep dive into the Apache FOP sources
  • add role="SKIP" to a fo:block if it doesn't contain text - needs a big XSL-FO refactoring

The 1st approach is preferable, because the XSL-FO contains a lot of workarounds for Apache FOP issues.
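
To make the second approach concrete, here is a minimal sketch that marks empty fo:block elements with role="SKIP" in an already generated XSL-FO tree. It is written in Python with the standard library rather than in XSLT, and it assumes that "empty" means no child elements and no non-whitespace text; the names is_empty_block and skip_empty_blocks are hypothetical.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def is_empty_block(block):
    """Assumed definition of empty: no child elements and only whitespace text."""
    return len(block) == 0 and not (block.text or "").strip()


def skip_empty_blocks(root):
    """Add role="SKIP" to empty fo:block elements; return how many were marked."""
    marked = 0
    for block in root.iter(f"{{{FO_NS}}}block"):
        if is_empty_block(block) and "role" not in block.attrib:
            block.set("role", "SKIP")
            marked += 1
    return marked
```

A block that only wraps an inline or a graphic has child elements, so this definition deliberately leaves it alone.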

@Intelligent2013

> empty tags should be removed

In Apache FOP there is a special attribute:

To remove empty fo:blocks from the structure tree, you can set the keep-empty-tags to false.
<accessibility keep-empty-tags="false">true</accessibility>

Added.
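
For anyone applying the same setting to an existing FOP configuration file, the change could be scripted roughly as follows. This is a sketch only: it assumes the accessibility element is a direct child of the configuration root, and the function name drop_empty_tags is hypothetical.

```python
import xml.etree.ElementTree as ET


def drop_empty_tags(xconf_text):
    """Return the configuration text with keep-empty-tags="false" set.

    Assumption: <accessibility> sits directly under the root element.
    Note that this also turns accessibility on if it was off or missing.
    """
    root = ET.fromstring(xconf_text)
    accessibility = root.find("accessibility")
    if accessibility is None:
        accessibility = ET.SubElement(root, "accessibility")
    accessibility.text = "true"
    accessibility.set("keep-empty-tags", "false")
    return ET.tostring(root, encoding="unicode")
```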

@Intelligent2013

Current footnote tags:
image

Expected structure:

<P> {
	<Reference> {
		<Lbl>
	}
	<Note> {
		<Lbl>
		<P>
	}
}
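
The expected structure can also be written down as a small check, which may help when comparing tag trees by hand. The sketch below models a tag tree with a hypothetical Tag class and tests only the shape shown above (a Reference holding a Lbl, directly followed by a Note holding a Lbl and a P); it does not read a real PDF.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for one node of the PDF structure tree."""
    name: str
    kids: list = field(default_factory=list)


def has_expected_footnote(p):
    """True if p is a P containing Reference{Lbl} directly followed by Note{Lbl, P}."""
    if p.name != "P":
        return False
    for ref, note in zip(p.kids, p.kids[1:]):
        ref_ok = ref.name == "Reference" and [k.name for k in ref.kids] == ["Lbl"]
        note_ok = note.name == "Note" and [k.name for k in note.kids] == ["Lbl", "P"]
        if ref_ok and note_ok:
            return True
    return False
```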

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 25, 2023
@Intelligent2013

The tag tree for footnotes is almost done:
image

For some reason, Apache FOP still adds a Div tag for the fo:block-container, even though it is marked with role="SKIP":

<fo:footnote-body role="Note">
	<fo:block-container text-indent="0" start-indent="0" role="SKIP">
		<fo:block font-weight="normal" font-style="normal" text-indent="0" start-indent="0" font-size="10pt" margin-bottom="12pt" role="SKIP">
			<fo:inline keep-with-next.within-line="always" padding-right="3mm" role="Lbl" id="footnote_en_1_2">2)</fo:inline>
			<fo:inline role="P">Formerly denoted as 15 % (<fo:inline role="SKIP" keep-together.within-line="always">m/m</fo:inline>).</fo:inline>
		</fo:block>
	</fo:block-container>
</fo:footnote-body>

Intelligent2013 added a commit to metanorma/mn2pdf that referenced this issue Jun 25, 2023
@Intelligent2013

Intelligent2013 commented Jun 27, 2023

  • the tag tree of the inner cover page contains 5 nested Divs:
    image

@Intelligent2013

Intelligent2013 commented Jun 27, 2023

  • wrong tag structure for Figure:
    image

Expected:
image

@Intelligent2013

Intelligent2013 commented Jun 27, 2023

  • each Bibliography item is represented as a standalone list (L) with a single list item (LI):
    image
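
To illustrate the structure one would presumably want instead (a single L holding all the LI entries), the sketch below merges adjacent L siblings in a simplified in-memory tag tree. The Tag class and merge_adjacent_lists are hypothetical and only model the shape of the tree; this is not how the issue was fixed in mn-native-pdf.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for one node of the PDF structure tree."""
    name: str
    kids: list = field(default_factory=list)


def merge_adjacent_lists(parent):
    """Fold each run of consecutive L children into the first L of the run."""
    merged = []
    for kid in parent.kids:
        if kid.name == "L" and merged and merged[-1].name == "L":
            merged[-1].kids.extend(kid.kids)  # move the LI items over
        else:
            merged.append(kid)
    parent.kids = merged
    return parent
```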

@Intelligent2013

Intelligent2013 commented Jun 29, 2023

  • outer Div for list L:
    image

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 29, 2023
@Intelligent2013

Intelligent2013 commented Jun 29, 2023

  • P inside H1:
    image

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 29, 2023
@Intelligent2013

Intelligent2013 commented Jun 29, 2023

  • empty Span between section number and title
    image

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 29, 2023
@Intelligent2013

Intelligent2013 commented Jun 29, 2023

  • wrong tag structure for term title:
    image

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 29, 2023
@Intelligent2013

Intelligent2013 commented Jun 29, 2023

  • wrong Span tag instead of P for inline paragraph:
    image

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 29, 2023
Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 29, 2023
Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jul 9, 2023
Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jul 9, 2023
@Intelligent2013

The note is now moved under the paragraph:
image
