Broken superscripts (references to bibliography items, footnotes, author affiliations, etc.) #274

Open
XZF0 opened this issue Aug 29, 2024 · 0 comments

XZF0 commented Aug 29, 2024

Superscript elements do not appear to be handled in any particular way: they are simply fused with the word they follow. This is a bug, since it creates invalid words and loses the information carried by the superscript element. It probably affects the majority of scientific-paper PDFs. Mangled author names in particular look rather disrespectful :)

  • Separating superscript text from the preceding word with a whitespace would already be a substantial improvement.

  • A configurable representation for superscripts would be even better (escaped square brackets might be a reasonable default, or a <sup> tag, which many Markdown viewers support).

(Handling the semantics of references etc. is probably out of scope for a document parser; the downstream logic should be able to do that, given a reasonable representation.)
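
To make the two suggestions above concrete, here is a minimal sketch of what a configurable superscript representation could look like. It is not based on this project's code: the `Span` type, the `SUP_STYLES` table, and `render_spans` are all hypothetical names, and the sketch assumes the parser can already tell which text spans are superscripts.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical unit of extracted text; not the parser's real type."""
    text: str
    is_superscript: bool = False


# Hypothetical style table: each template wraps the superscript text.
SUP_STYLES = {
    "space": " {}",            # just separate it from the preceding word
    "brackets": r"\[{}\]",     # escaped square brackets for Markdown output
    "html": "<sup>{}</sup>",   # inline HTML, rendered by many Markdown viewers
}


def render_spans(spans, style="brackets"):
    """Join spans into one string, formatting superscripts with the chosen style."""
    template = SUP_STYLES[style]
    parts = []
    for span in spans:
        if span.is_superscript:
            parts.append(template.format(span.text))
        else:
            parts.append(span.text)
    return "".join(parts)


authors = [
    Span("Smith"), Span("1,2", is_superscript=True),
    Span(" and Jones"), Span("3", is_superscript=True),
]

print(render_spans(authors, style="space"))     # Smith 1,2 and Jones 3
print(render_spans(authors, style="brackets"))  # Smith\[1,2\] and Jones\[3\]
print(render_spans(authors, style="html"))      # Smith<sup>1,2</sup> and Jones<sup>3</sup>
```

Any of the three styles keeps the word and the superscript distinguishable, so downstream logic can still recover references or affiliations from the output.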
