Adopted documents: support "stitch" command for "alongside wrap" of two languages #742

Open
ronaldtse opened this issue Feb 13, 2023 · 17 comments
Labels: enhancement (New feature or request)

Comments

@ronaldtse

This issue is to support "alongside wrap" detailed at https://github.com/metanorma/metanorma-bsi/issues/2#issuecomment-852269964

Alongside wrap. A wraps B. B is unmodified. A provides additional content C where C corresponds to a transformation of B (e.g. a translation of B). A provides additional content outside of B and C.

  • EVS EN ISO (alongside wrap). It provides the Estonian translation of the ISO content, ISO Foreword, etc., and also a translation of the EN Foreword, etc.

The aligning/stitching command is called stitch below.
This is all speculative so the actual commands can differ in real usage.

Alongside wrap

en-iso-44001-english.adoc

= EN ISO 44001
:lang: english

.EN Foreword
...

en-iso-44001-estonian.adoc

= EN ISO 44001
:lang: estonian

.EN Foreword in Estonian
...

evs-en-iso-44001.adoc

= EVS EN ISO 44001

.EVS Foreword
...

stitch::[en-iso-44001-english.adoc,en-iso-44001-estonian.adoc]
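Since the syntax is still speculative, here is only a minimal Python sketch of how such a line could be recognised. The `stitch::[...]` form is taken from the example above; the function name `parse_stitch` and the regular expression are assumptions for illustration, not part of Metanorma.

```python
import re

# Hypothetical pattern for the speculative macro: stitch::[file1,file2,...]
STITCH_RE = re.compile(r"^stitch::\[(?P<files>[^\]]*)\]$")

def parse_stitch(line):
    """Return the list of files named in a stitch line, or None if the line is not one."""
    match = STITCH_RE.match(line.strip())
    if match is None:
        return None
    # Split on commas and drop empty entries and surrounding whitespace.
    return [name.strip() for name in match.group("files").split(",") if name.strip()]

files = parse_stitch("stitch::[en-iso-44001-english.adoc,en-iso-44001-estonian.adoc]")
print(files)
```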

Images

Cover:
Screenshot 2023-02-14 at 12 47 03 AM

National foreword with side by side translation:
Screenshot 2023-02-14 at 12 47 13 AM

European cover:
Screenshot 2023-02-14 at 12 47 31 AM

TOC side by side:
Screenshot 2023-02-14 at 12 47 46 AM

European foreword side by side:
Screenshot 2023-02-14 at 12 48 39 AM

Content side by side:
Screenshot 2023-02-14 at 12 48 56 AM

Annex with Estonian table first:
Screenshot 2023-02-14 at 12 49 22 AM

Table continued in Estonian:
Screenshot 2023-02-14 at 12 50 08 AM

English table:
Screenshot 2023-02-14 at 12 50 25 AM

Bibliography with heading in 2 languages, content only English:

Screenshot 2023-02-14 at 12 51 07 AM

Index in Estonian:

Screenshot 2023-02-14 at 12 51 54 AM

Index in English:
Screenshot 2023-02-14 at 12 52 02 AM

Back cover:
Screenshot 2023-02-14 at 12 52 14 AM

Originally posted by @ronaldtse in https://github.com/metanorma/metanorma-bsi/issues/2#issuecomment-852269964

@ronaldtse ronaldtse added the enhancement New feature or request label Feb 13, 2023
@opoudjis

opoudjis commented Apr 6, 2023

So, stitch::[] is a minimally thought-out request to embed two documents simultaneously, where the second document is a translation of the first, and the alignment between the two is to be realised by magic.

It will not be realised by magic. Of the features specified in #420, the multilingual-rendering attributes will still need to be inserted into the two stitched documents, as will :align-cross-elements:; that is, these documents will be marked up for bilingual alignment. At most, the stitching will assume that the clause structure of the two documents is identical, and where it is, it will insert tags in the two corresponding clauses to line them up explicitly.

@ronaldtse

"Magic" is defined here as creating an element correspondence.

The element correspondence between two languages is either manually encoded (e.g. anchors "id1@en" matches "id2@jp") or automatically matched according to sequence.

There are exceptions, such as:

  • One Document-A element corresponds to multiple, sequential Document-B elements, and vice versa
  • One Document-A element corresponds to no element in Document-B (i.e. it is not translated into Document-B), and vice versa

@opoudjis

I will get some bioinformatics algorithm or other to do a best-case match between the two sequences.
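As an illustration of what such a match could look like, here is a minimal sketch of Needleman-Wunsch global alignment, one well-known sequence alignment algorithm from bioinformatics. It is not necessarily the algorithm that would be used. Comparing elements by their kind (`title`, `p`, `table`) and the scoring values are assumptions made for this example. An element with no counterpart is paired with `None`; a one-to-many correspondence would show up as one match followed by gaps.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b.

    Returns a list of (index_in_a, index_in_b) pairs, where None marks a gap,
    i.e. an element with no counterpart in the other document.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two elements
                score[i - 1][j] + gap,            # a[i-1] has no counterpart
                score[i][j - 1] + gap,            # b[j-1] has no counterpart
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs

# Document A has one paragraph that was not translated in Document B.
doc_a = ["title", "p", "p", "table"]
doc_b = ["title", "p", "table"]
print(align(doc_a, doc_b))
```

Matching on element kind alone cannot tell which of two adjacent paragraphs is the untranslated one; manually encoded anchors would be needed to settle such cases.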

@opoudjis

This is going to have to be processed as a collection:

  • For Preface + stitch(A, B), Preface + A is a valid document; Preface + A + B is not
  • Anchors will be shared between A and B, and they will need to be rewritten with prefixes to remain unambiguous --- which collections already do.
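The anchor rewriting mentioned above can be sketched as follows. This is a hedged illustration in Python, not the collection code itself: the function name `prefix_anchors`, the `_` separator, and the assumption that cross-references are carried in a `target` attribute are all choices made for this example, and namespaces are ignored.

```python
import xml.etree.ElementTree as ET

def prefix_anchors(root, prefix, ref_attrs=("target",)):
    """Prefix every id in the tree, and every reference that points at one of those ids."""
    # Collect the ids first, so references are only rewritten if they are internal.
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    for el in root.iter():
        if el.get("id"):
            el.set("id", f"{prefix}_{el.get('id')}")
        for attr in ref_attrs:
            if el.get(attr) in ids:
                el.set(attr, f"{prefix}_{el.get(attr)}")
    return root

# Both language versions use the same anchor, as a translation normally would.
en = ET.fromstring('<doc><clause id="scope"><xref target="scope"/></clause></doc>')
et = ET.fromstring('<doc><clause id="scope"><xref target="scope"/></clause></doc>')
prefix_anchors(en, "en")
prefix_anchors(et, "et")
print(ET.tostring(en, encoding="unicode"))
print(ET.tostring(et, encoding="unicode"))
```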

What we actually want here is a preprocessing mode in collection processing, which

  • identifies the shared initial text in the collection
  • injects the appropriate tags and multilingual-rendering attributes in the two language documents, to keep them aligned

@opoudjis

It appears that, to date, multilingual-rendering has been implemented in rendering only for JCGM.

@opoudjis

I am not very enthusiastic about this, but I'm going to implement it in the metanorma gem as a postprocessing of Presentation XML. There is code from @Intelligent2013 in JCGM XSLT to handle these tags, but I need to generalise this to HTML and DOC anyway. I will be following his code to arrange elements, including his use of the cross-align element. In fact, I'm going to try to use his XSLT in preprocessing.

@opoudjis

In collections processing, we really need to do without generating PDF of the individual documents; they will not be reused, and are just dead time for document compilation.

@opoudjis

The JCGM XSLT has a model of iterating through the first document in the collection, as a master, and all other documents in the collection, as (slaves) ahem, dependents. We cannot do that, because the first document is likeliest to be a preface: we will need markup in the manifest on the status of each document with bilingual alignment.

Elements to be aligned are rendered inside <cross-align/>, which is populated as an XSL:FO table in JCGM. We will retain that, and process cross-align in HTML (and DOC?)

@ronaldtse

ronaldtse commented Apr 12, 2023

@opoudjis the only true JCGM bilingual document is JCGM 200:

JCGM 100 is in both English and French but they are published separately.

ISO 2533 ADD 2 is trilingual and presented in three columns:
Screenshot 2023-04-12 at 4 03 33 PM

But it is not yet encoded:

@opoudjis

I am making up a bilingual out of JCGM 100 at the moment, to see how far I can get in reusing Alex's XSLT, and I will be tinkering with that document.

The JCGM 200 document alternating between one and two columns is irritating, but sadly realistic.

@opoudjis

The excerpted XSLT works (though it is generating an XSL:FO table, and not the <cross-align><align-cell></align-cell><align-cell></align-cell></cross-align> I want to end up with). Its performance is abysmal, because libxslt's node-set() is so much slower than xalan:nodeset(). But this is not a concern for me, as I will simply be running this in Ruby with Nokogiri, and each nodeset is in fact a single document in Nokogiri, which I will keep around as a variable.
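A rough sketch of the target structure, written in Python with the standard library rather than Ruby with Nokogiri, so it should be read as an illustration only. The `cross-align` and `align-cell` element names come from the comment above; the function name `cross_align`, the `sections` wrapper, and the input format (a list of already paired clauses, with `None` for a missing counterpart) are assumptions.

```python
import copy
import xml.etree.ElementTree as ET

def cross_align(pairs):
    """Wrap each (left, right) pair of elements in cross-align/align-cell."""
    out = ET.Element("sections")
    for left, right in pairs:
        row = ET.SubElement(out, "cross-align")
        for side in (left, right):
            cell = ET.SubElement(row, "align-cell")
            if side is not None:
                # Copy, so the source documents stay intact and can be kept as variables.
                cell.append(copy.deepcopy(side))
    return out

# Each parsed document is held in a variable and reused, instead of being re-parsed.
en = ET.fromstring("<doc><clause id='en_1'/><clause id='en_2'/></doc>")
fr = ET.fromstring("<doc><clause id='fr_1'/></doc>")
aligned = cross_align([(en[0], fr[0]), (en[1], None)])
print(ET.tostring(aligned, encoding="unicode"))
```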

@opoudjis

Performance is better but not great: 47 sec for JCGM 100 (12 MB) to move text in place in parallel columns. Parking code in PR, not yet rendered.

Both I and @Intelligent2013 will need to render Presentation XML cross-align/align-cell into parallel table cells. I will need to do so in a single HTML file (since that is what parallel columns end up requiring).

opoudjis added a commit to metanorma/metanorma that referenced this issue Apr 12, 2023
opoudjis added a commit to metanorma/isodoc that referenced this issue Apr 13, 2023
@opoudjis

For <cross-align> rendering to work in HTML, we are finally going to have to bite the bullet and parse whatever is in Presentation XML in sequence, rather than by query. cross-align takes priority over clauses: it contains them.
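To make the contrast concrete, here is a minimal sketch of sequential rendering: a recursive walk that emits each element in the order it is encountered, so that a cross-align wrapper naturally encloses whatever clauses it contains. This is Python for illustration, not isodoc code; the tag-to-HTML mapping in `HTML_TAGS` and the function name `render` are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from Presentation XML elements to HTML open/close tags.
HTML_TAGS = {
    "cross-align": ("<table><tr>", "</tr></table>"),
    "align-cell": ("<td>", "</td>"),
    "clause": ("<div>", "</div>"),
    "title": ("<h1>", "</h1>"),
    "p": ("<p>", "</p>"),
}

def render(el):
    """Render an element and its children strictly in document order."""
    parts = [escape(el.text or "")]
    for child in el:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))
    # Unknown elements are passed through without a wrapper.
    open_tag, close_tag = HTML_TAGS.get(el.tag, ("", ""))
    return open_tag + "".join(parts) + close_tag

xml = (
    "<sections><cross-align>"
    "<align-cell><clause><title>Scope</title><p>Text</p></clause></align-cell>"
    "<align-cell><clause><title>Domaine</title><p>Texte</p></clause></align-cell>"
    "</cross-align></sections>"
)
print(render(ET.fromstring(xml)))
```

A query-based renderer that selects all clauses first would lose the information that both clauses sit inside one cross-align row; the walk keeps it without special handling.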

opoudjis added a commit to metanorma/isodoc that referenced this issue Apr 13, 2023
opoudjis added a commit to metanorma/metanorma that referenced this issue Apr 13, 2023
opoudjis added a commit to metanorma/metanorma that referenced this issue Apr 13, 2023
opoudjis added a commit to metanorma/metanorma that referenced this issue Apr 13, 2023
@opoudjis

opoudjis commented Apr 13, 2023

The proof of concept collection is JCGM 100 EN + FR. Am getting bi-column output, but it needs a lot of care, and parsing now needs to be a lot more dumb in just spitting out what it receives in Presentation XML, rather than being opinionated.

Archive.zip

@opoudjis

opoudjis commented Jul 8, 2024

This issue has been on hold too long. I will update the PRs to reflect current code, and merge them. Will document as experimental functionality.

opoudjis added a commit to metanorma/metanorma that referenced this issue Jul 10, 2024
@opoudjis

Got metanorma working, but it's based on wrapping clauses inside /cross-align/align-cell. Since we uniformly expect clauses to be children of preface and sections, and we render based on displayorder attributes of clauses, it is more sensible to nest cross-align within clauses.

I won't do that for now, but I will copy displayorder to cross-align from the clause it contains, as its absence is otherwise preventing rendering at all. But nesting clauses within align-cell is counterproductive; I would rather the alignment be based on attributes, as it is in preliminary processing.
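The displayorder workaround can be sketched like this. It is a Python illustration only: the function name `copy_displayorder` is hypothetical, namespaces are ignored, and taking the value from the first contained clause that has one is an assumption about which clause should win.

```python
import xml.etree.ElementTree as ET

def copy_displayorder(root):
    """Give each cross-align the displayorder of the first clause it contains."""
    for cross in root.iter("cross-align"):
        clause = cross.find("./align-cell/clause[@displayorder]")
        # Leave an existing value alone; only fill in a missing one.
        if clause is not None and cross.get("displayorder") is None:
            cross.set("displayorder", clause.get("displayorder"))
    return root

doc = ET.fromstring(
    "<sections><cross-align>"
    "<align-cell><clause displayorder='3'/></align-cell>"
    "<align-cell><clause displayorder='3'/></align-cell>"
    "</cross-align></sections>"
)
copy_displayorder(doc)
print(ET.tostring(doc, encoding="unicode"))
```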

The code is currently specific to JCGM, and I think it will need to be rethought. But walking away from this, now that the PRs are being resolved.

@opoudjis

cross-align/align-cell is resulting in titles being stranded away from their clause parents; that's why the HTML rendering does not understand any titles at the moment, because isodoc expects to find titles only as children of clauses:

<clause>
  <cross-align>
    <align-cell>
      <title>

Will make processing titles more flexible.
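One way a more flexible title lookup might work, sketched in Python under stated assumptions: the function name `clause_titles` is hypothetical, namespaces are ignored, and the fallback path simply follows the nesting shown above. This is not the isodoc implementation.

```python
import xml.etree.ElementTree as ET

def clause_titles(clause):
    """Return the titles of a clause, whether direct children or stranded in align-cells."""
    direct = clause.findall("./title")
    if direct:
        return direct
    # Fallback: titles separated from the clause by cross-align/align-cell.
    return clause.findall("./cross-align/align-cell/title")

plain = ET.fromstring("<clause><title>Scope</title></clause>")
aligned = ET.fromstring(
    "<clause><cross-align>"
    "<align-cell><title>Scope</title></align-cell>"
    "<align-cell><title>Domaine</title></align-cell>"
    "</cross-align></clause>"
)
print([t.text for t in clause_titles(plain)])
print([t.text for t in clause_titles(aligned)])
```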

Projects
Status: 🏝 Low priority