Superscript elements do not seem to be handled in any particular way - they are just fused with the word they follow. This is a bug, since it creates invalid words and loses the information carried by the superscript element. It probably affects the majority of scientific paper PDFs. Mangled author names in particular look rather disrespectful :)
Separating superscript text from the preceding word with a whitespace would already be a substantial improvement.
A configurable representation for superscripts would be even better (escaped square brackets might be a reasonable default, or a `<sup>` tag, which many Markdown viewers support).
(Handling the semantics of references etc. is probably out of scope for a document parser - the downstream logic should be able to do that, given a reasonable representation.)
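To make the proposal concrete, here is a minimal Python sketch of what a configurable representation might look like. It assumes the parser can already tell which text spans are superscript; `Span`, `render_spans`, and the `style` option are hypothetical names for illustration, not part of any existing API.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical text span; `superscript` is assumed to come from the parser."""
    text: str
    superscript: bool = False


def render_spans(spans, style="brackets"):
    """Join spans into one string, marking superscripts according to `style`.

    style="space"    -> separate the superscript with a single space
    style="brackets" -> wrap it in escaped square brackets
    style="sup"      -> wrap it in <sup>...</sup>
    """
    parts = []
    for span in spans:
        if not span.superscript:
            parts.append(span.text)
        elif style == "space":
            parts.append(" " + span.text)
        elif style == "brackets":
            parts.append("\\[" + span.text + "\\]")
        elif style == "sup":
            parts.append("<sup>" + span.text + "</sup>")
        else:
            raise ValueError(f"unknown superscript style: {style!r}")
    return "".join(parts)


# An author line as it might appear in a paper header.
authors = [
    Span("Jane Doe"),
    Span("1,2", superscript=True),
    Span(" and John Smith"),
    Span("3", superscript=True),
]

print(render_spans(authors, style="space"))
print(render_spans(authors, style="brackets"))
print(render_spans(authors, style="sup"))
```

With any of the three styles the author names stay intact and the affiliation markers remain recoverable by downstream logic, instead of being fused into the names.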