extended schema for XMP for PDF #118

Open
kmccurley opened this issue Feb 24, 2023 · 2 comments
Assignees: kmccurley
Labels: enhancement (New feature or request), long term issue

Comments

@kmccurley
Member

The standard XMP schemas are fairly minimal and omit things like ORCID IDs, affiliations, funding agencies, and citations. It would be desirable to extend these schemas so that we can embed more metadata in the PDF, but that requires us to define a schema for the extension. According to this document, extension schemas must be included inline in the PDF. The schema description could become as large as the metadata itself, but standards are standards, I guess. Notably, Springer Nature appears to violate this requirement by declaring extended namespaces as follows:

xmlns:sn="http://springernature.com/ns/xmpExtensions/2.0/"
xmlns:author="http://springernature.com/ns/xmpExtensions/2.0/authorinfo/"

I am unable to locate any schema for these, and the URLs don't resolve to anything (they are not required to, but pointing them at an XSD is encouraged). How they use the namespaces in their PDFs is pretty clear:

<sn:authorInfo>
    <rdf:Bag>
        <rdf:li rdf:parseType="Resource">
            <author:name>Ngoc Khanh Nguyen</author:name>
            <author:orcid>http://orcid.org/0000-0001-8240-6167</author:orcid>
        </rdf:li>
    </rdf:Bag>
</sn:authorInfo>
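
To illustrate how a consumer could read this structure, here is a minimal Python sketch using only the standard library. It assumes the fragment sits inside an rdf:Description that declares the three prefixes; the wrapper element and the helper name extract_authors are my own choices for illustration, not anything Springer Nature publishes.

```python
import xml.etree.ElementTree as ET

# rdf is the standard RDF namespace; sn and author are the URIs quoted above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sn": "http://springernature.com/ns/xmpExtensions/2.0/",
    "author": "http://springernature.com/ns/xmpExtensions/2.0/authorinfo/",
}

# The fragment from the issue, wrapped in rdf:Description (an assumption)
# so that every prefix is declared and the text parses on its own.
SAMPLE = """\
<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sn="http://springernature.com/ns/xmpExtensions/2.0/"
    xmlns:author="http://springernature.com/ns/xmpExtensions/2.0/authorinfo/">
  <sn:authorInfo>
    <rdf:Bag>
      <rdf:li rdf:parseType="Resource">
        <author:name>Ngoc Khanh Nguyen</author:name>
        <author:orcid>http://orcid.org/0000-0001-8240-6167</author:orcid>
      </rdf:li>
    </rdf:Bag>
  </sn:authorInfo>
</rdf:Description>
"""


def extract_authors(xml_text):
    """Return one {"name": ..., "orcid": ...} dict per rdf:li in sn:authorInfo."""
    root = ET.fromstring(xml_text)
    authors = []
    for item in root.findall(".//sn:authorInfo/rdf:Bag/rdf:li", NS):
        authors.append({
            "name": item.findtext("author:name", default=None, namespaces=NS),
            "orcid": item.findtext("author:orcid", default=None, namespaces=NS),
        })
    return authors


if __name__ == "__main__":
    print(extract_authors(SAMPLE))
```

A missing author:orcid simply yields None for that author, so the same loop works for authors without an ORCID.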

I propose that we define our own schema starting from a subset of JATS to promote interoperability, create an XSD for it, and store it at http://iacr.org/ns/xmpExtensions/1.0/. Alternatively, we could use a JATS schema itself and embed something like jats:article-meta, referencing their schema. That would at least cover ORCID IDs, affiliations, and funding, since contrib-group may contain aff and orcid, and funding-group may contain funding sources. I prefer this second option to inventing our own schema.
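
As a rough sketch of what the JATS-based option might carry, the Python below builds an article-meta fragment with a contrib-group and a funding-group. The element names (contrib, contrib-id, name, surname, given-names, xref, aff, award-group, funding-source, award-id) are JATS elements, with the ORCID placed in contrib-id using contrib-id-type="orcid". Everything else is an assumption: build_article_meta and the input dictionaries are hypothetical, all values are placeholders, the elements are left un-namespaced, and the output has not been validated against the JATS DTD (a real article-meta needs more children than this).

```python
import xml.etree.ElementTree as ET


def build_article_meta(authors, affiliations, funders):
    """Build a partial JATS-style <article-meta> element (illustrative only)."""
    meta = ET.Element("article-meta")

    group = ET.SubElement(meta, "contrib-group")
    for author in authors:
        contrib = ET.SubElement(group, "contrib", {"contrib-type": "author"})
        if author.get("orcid"):
            contrib_id = ET.SubElement(
                contrib, "contrib-id", {"contrib-id-type": "orcid"}
            )
            contrib_id.text = author["orcid"]
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = author["surname"]
        ET.SubElement(name, "given-names").text = author["given"]
        # Link the author to affiliations by id, as JATS does with <xref>.
        for aff_id in author.get("affs", []):
            ET.SubElement(contrib, "xref", {"ref-type": "aff", "rid": aff_id})

    for aff_id, text in affiliations.items():
        ET.SubElement(group, "aff", {"id": aff_id}).text = text

    funding = ET.SubElement(meta, "funding-group")
    for funder in funders:
        award = ET.SubElement(funding, "award-group")
        ET.SubElement(award, "funding-source").text = funder["source"]
        if funder.get("award_id"):
            ET.SubElement(award, "award-id").text = funder["award_id"]

    return meta


if __name__ == "__main__":
    # Placeholder data, not a real author, institution, or grant.
    meta = build_article_meta(
        authors=[{
            "surname": "Example",
            "given": "Alice",
            "orcid": "https://orcid.org/0000-0000-0000-0000",
            "affs": ["aff1"],
        }],
        affiliations={"aff1": "Example University"},
        funders=[{"source": "Example Funding Agency", "award_id": "EX-0000"}],
    )
    print(ET.tostring(meta, encoding="unicode"))
```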

It remains to be seen how we would include this in the XMP itself, since it's not clear how much control we have over what hyperxmp or xmpincl emit.

kmccurley added the enhancement (New feature or request) and long term issue labels on Feb 24, 2023
kmccurley self-assigned this on Feb 24, 2023
@kmccurley
Member Author

It appears that bibliographic citations are also relatively easy: we could use <ref-list>, which is normally part of <back> in JATS. Each entry can use <element-citation>, which contains a <pub-id> that can hold the DOI of the citation, along with other structured elements such as authors, title, and journal.
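
A minimal sketch of such a citation list, again in Python with the standard library. The JATS element names (ref, element-citation, person-group, name, article-title, source, year, pub-id) are real, but build_ref_list and the shape of its input are hypothetical, and the citation data below is a placeholder rather than a real reference.

```python
import xml.etree.ElementTree as ET


def build_ref_list(citations):
    """Build a JATS-style <ref-list> with one <element-citation> per entry."""
    ref_list = ET.Element("ref-list")
    for index, cite in enumerate(citations, start=1):
        ref = ET.SubElement(ref_list, "ref", {"id": f"ref{index}"})
        citation = ET.SubElement(
            ref,
            "element-citation",
            {"publication-type": cite.get("type", "journal")},
        )
        if cite.get("authors"):
            people = ET.SubElement(
                citation, "person-group", {"person-group-type": "author"}
            )
            for surname, given in cite["authors"]:
                name = ET.SubElement(people, "name")
                ET.SubElement(name, "surname").text = surname
                ET.SubElement(name, "given-names").text = given
        # Optional simple fields: dict key -> JATS element name.
        for key, tag in (("title", "article-title"),
                         ("journal", "source"),
                         ("year", "year")):
            if cite.get(key):
                ET.SubElement(citation, tag).text = str(cite[key])
        if cite.get("doi"):
            pub_id = ET.SubElement(citation, "pub-id", {"pub-id-type": "doi"})
            pub_id.text = cite["doi"]
    return ref_list


if __name__ == "__main__":
    # Placeholder citation; the DOI is not a registered identifier.
    refs = build_ref_list([{
        "authors": [("Example", "Alice")],
        "title": "A placeholder title",
        "journal": "Journal of Examples",
        "year": 2023,
        "doi": "10.0000/example",
    }])
    print(ET.tostring(refs, encoding="unicode"))
```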

@kmccurley
Member Author

I retract what I said about Springer Nature's schema: they do include extension schemas inline in their XMP, under <pdfaExtension:schemas>.
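
For reference, here is a sketch of what an inline extension schema description for our own namespace might look like, generated with Python's standard library. The pdfaExtension, pdfaSchema, and pdfaProperty namespaces are the ones used for PDF/A extension schemas. The rest is assumed for illustration: the iacr prefix, the single orcid property, its Text value type, and the helper build_extension_schema are all hypothetical, and a structured author record would additionally need a value type definition that this sketch omits.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pdfaExtension": "http://www.aiim.org/pdfa/ns/extension/",
    "pdfaSchema": "http://www.aiim.org/pdfa/ns/schema#",
    "pdfaProperty": "http://www.aiim.org/pdfa/ns/property#",
}
# Make serialization use the conventional prefixes instead of ns0, ns1, ...
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def q(prefix, local):
    """Return the Clark-notation name {uri}local for a known prefix."""
    return "{%s}%s" % (NS[prefix], local)


def build_extension_schema(description, namespace_uri, prefix, properties):
    """Build a <pdfaExtension:schemas> element describing one schema."""
    schemas = ET.Element(q("pdfaExtension", "schemas"))
    bag = ET.SubElement(schemas, q("rdf", "Bag"))
    schema = ET.SubElement(bag, q("rdf", "li"),
                           {q("rdf", "parseType"): "Resource"})
    ET.SubElement(schema, q("pdfaSchema", "schema")).text = description
    ET.SubElement(schema, q("pdfaSchema", "namespaceURI")).text = namespace_uri
    ET.SubElement(schema, q("pdfaSchema", "prefix")).text = prefix

    prop_list = ET.SubElement(schema, q("pdfaSchema", "property"))
    seq = ET.SubElement(prop_list, q("rdf", "Seq"))
    for prop in properties:
        item = ET.SubElement(seq, q("rdf", "li"),
                             {q("rdf", "parseType"): "Resource"})
        ET.SubElement(item, q("pdfaProperty", "name")).text = prop["name"]
        ET.SubElement(item, q("pdfaProperty", "valueType")).text = prop["value_type"]
        ET.SubElement(item, q("pdfaProperty", "category")).text = prop["category"]
        ET.SubElement(item, q("pdfaProperty", "description")).text = prop["description"]
    return schemas


if __name__ == "__main__":
    # Hypothetical schema: the namespace URI is the one proposed earlier in
    # this thread; the prefix and property are made up for the example.
    element = build_extension_schema(
        description="IACR extension schema (illustrative)",
        namespace_uri="http://iacr.org/ns/xmpExtensions/1.0/",
        prefix="iacr",
        properties=[{
            "name": "orcid",
            "value_type": "Text",
            "category": "external",
            "description": "ORCID iD of an author",
        }],
    )
    print(ET.tostring(element, encoding="unicode"))
```

Even for one simple property the description is several times longer than the value it describes, which is the size concern raised in the original post.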
