serialize PAGE text #55

Open
bertsky opened this issue Nov 26, 2021 · 2 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@bertsky
Member

bertsky commented Nov 26, 2021

In addition to ALTO (text/xml) files, we should support PAGE (application/vnd.prima.page+xml) files.

(One scenario could be OCR-D processed material.)
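To make the request more concrete, here is a minimal sketch of what reading line-level text out of a PAGE file could look like. It is not this project's implementation: the function name `page_lines` and the sample document are hypothetical, the sample is simplified (no `Coords`, so it is not schema-valid), and lines are taken in document order, ignoring any `ReadingOrder`. The `{*}` namespace wildcard requires Python 3.8 or later and is used so the sketch does not depend on a particular PAGE schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified PAGE sample (Coords omitted for brevity).
SAMPLE_PAGE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="sample.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Word id="w1"><TextEquiv><Unicode>Hello</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <TextEquiv><Unicode>second line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def page_lines(xml_text):
    """Return the line-level text of a PAGE document, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    # "{*}" matches the element in any namespace (Python 3.8+).
    for line in root.findall(".//{*}TextLine"):
        # Only the TextEquiv that is a direct child of the line,
        # not the ones nested inside its Word elements.
        unicode_el = line.find("{*}TextEquiv/{*}Unicode")
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return lines


if __name__ == "__main__":
    print("\n".join(page_lines(SAMPLE_PAGE)))
```

A real serializer would probably also have to respect the reading order and fall back to word-level or region-level text when a line has no `TextEquiv` of its own.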

@bertsky
Member Author

bertsky commented Dec 1, 2021

Workaround in the meantime: convert PAGE to ALTO with https://github.com/kba/page-to-alto, which is also included in ocrd-fileformat-transform as the page alto conversion. You may have to pass extra options to page-to-alto via script-args, e.g. --dummy-word --no-check-words --no-check-border.
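For scripting the workaround, a small sketch of how the command line might be assembled from Python. The helper name `page_to_alto_argv` and the file name are hypothetical, and passing the input file as a positional argument after the options is an assumption, so check `page-to-alto --help` before relying on it. Only the three flags quoted above come from this thread.

```python
import shlex

# Flags quoted in the workaround above; adjust to your material.
DEFAULT_ARGS = "--dummy-word --no-check-words --no-check-border"


def page_to_alto_argv(page_file, extra_args=DEFAULT_ARGS):
    """Build an argv list for page-to-alto (hypothetical helper).

    Assumes the input file is given as a positional argument
    after the options.
    """
    return ["page-to-alto", *shlex.split(extra_args), page_file]


if __name__ == "__main__":
    # Print the command instead of running it, since page-to-alto
    # may not be installed; it could be passed to subprocess.run().
    print(shlex.join(page_to_alto_argv("page.xml")))
```

Building a list rather than a single string keeps file names with spaces intact when the result is later handed to `subprocess.run`.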

@bertsky
Member Author

bertsky commented Dec 1, 2021

For inspiration: https://github.com/dariok/page2tei/blob/master/page2tei-0.xsl

EDIT: but we would have to coordinate such a conversion with the DTA base format, see https://www.deutsches-textarchiv.de/doku/basisformat/

@bertsky bertsky added enhancement New feature or request help wanted Extra attention is needed labels Dec 13, 2021