Failure importing http://purl.obolibrary.org/obo/upheno/upheno_root_alignments.owl #929

Open
haideriqbal opened this issue Feb 5, 2024 · 4 comments
@haideriqbal
Hi Team,

Upheno isn't being loaded in the OLS at the moment because the import of http://purl.obolibrary.org/obo/upheno/upheno_root_alignments.owl is failing. Below is the exception raised in our pipeline:

org.apache.jena.riot.RiotException: [line: 1, col: 1 ] Content is not allowed in prolog.
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:153)
    at org.apache.jena.riot.lang.ReaderRIOTRDFXML$ErrorHandlerBridge.fatalError(ReaderRIOTRDFXML.java:313)
    at org.apache.jena.rdfxml.xmlinput.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:47)
    at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:199)
    at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.fatalError(XMLHandler.java:229)
    at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:181)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1471)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:978)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:541)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1224)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
    at org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:101)
    at org.apache.jena.rdfxml.xmlinput.ARP.load(ARP.java:118)
    at org.apache.jena.riot.lang.ReaderRIOTRDFXML.parse(ReaderRIOTRDFXML.java:188)
    at org.apache.jena.riot.lang.ReaderRIOTRDFXML.read(ReaderRIOTRDFXML.java:86)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:353)
    at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:322)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:296)
    at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:540)

Looking at the file, it does not appear to be in the XML format that the OLS pipeline expects.
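
For illustration only, here is a minimal Python sketch of why an XML parser fails at the very first character when it is handed a non-XML document. It uses the standard-library SAX parser rather than Jena, and both payloads are hypothetical stand-ins: the sketch does not assume what upheno_root_alignments.owl actually contains, only that some OWL serializations (functional syntax is used here as an example) are not XML at all.

```python
import xml.sax


def xml_prolog_error(data: bytes):
    """Return the parser's error message if `data` is not well-formed XML, else None."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return f"line {exc.getLineNumber()}, col {exc.getColumnNumber()}: {exc.getMessage()}"
    return None


# Hypothetical payloads, NOT the real contents of the failing import.
rdf_xml = (
    b'<?xml version="1.0"?>'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
)
functional = b"Prefix(:=<http://example.org/>)\nOntology(<http://example.org/demo>)"

print(xml_prolog_error(rdf_xml))     # well-formed XML, so no error is reported
print(xml_prolog_error(functional))  # fails immediately: the input does not start with XML
```

The wording of the error differs between parsers (the Xerces message in the trace above is "Content is not allowed in prolog"), but the cause is the same kind of mismatch: the parser expected an XML document and found something else before any markup.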

An earlier issue #919 mentions that this would be fixed in the newer version of upheno so not sure if this has been fixed yet or not.

Please let me know if you need any further information.

@matentzn
Collaborator

matentzn commented Feb 5, 2024

Hmmmm. Not so sure this is a uPheno error per se - it is totally fine that an ontology imports a non-rdfxml import. @jamesamcl what do you think? You could change the serialisation of the file to RDFXML just to satisfy this OLS requirement because we can, but it's not strictly speaking "right" :P

@jamesamcl
Collaborator

> it is totally fine that an ontology imports a non-rdfxml import

This in itself is fine, but what I don't think is fine is that the HTTP headers returned for the OWL file have content-type as text/plain, so there is actually no way to determine the encoding. OWLAPI gets around this by brute-forcing all the loaders one by one until one of them doesn't throw an exception, which I do not think is really in the spirit of "semantic web" :P.
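
As a rough sketch of the "try every loader until one doesn't throw" strategy described above (this is not OWLAPI's code; the function name `load_by_trial` and both loaders are hypothetical stand-ins written for this example):

```python
import xml.etree.ElementTree as ET


def load_xml(data: str):
    """Stand-in for an XML-based loader (e.g. RDF/XML); raises on non-XML input."""
    return ET.fromstring(data)


def load_functional_toy(data: str):
    """Toy stand-in for a functional-syntax loader: only checks the opening keyword."""
    if not data.lstrip().startswith(("Prefix(", "Ontology(")):
        raise ValueError("does not look like OWL functional syntax")
    return data


def load_by_trial(data: str, loaders):
    """Try each (name, loader) pair in order; return the first that does not raise."""
    errors = {}
    for name, loader in loaders:
        try:
            return name, loader(data)
        except Exception as exc:  # deliberately broad: any failure means "try the next one"
            errors[name] = exc
    raise ValueError(f"no loader accepted the input: {errors}")


LOADERS = [("xml", load_xml), ("functional-toy", load_functional_toy)]

name, _ = load_by_trial("Ontology(<http://example.org/demo>)", LOADERS)
print(name)  # the XML loader raised, so the second loader was used
```

The sketch makes the trade-off visible: the format is discovered only as a side effect of parse failures, not from anything the server declares about the file.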

AFAIK we can't fix this on GitHub either. The problem is that the .owl file extension does not indicate anything about the actual encoding of the contents, hence the webserver returning a text/plain content type.

Because the vast majority of OWL files in the wild today are RDF/XML, I think OLS assuming RDF/XML in the absence of any other information is the only sensible default. So yes this is an OLS issue because OLS only loads RDF - but also it's a upheno issue because if upheno provides a serialization of the ontology as a plain file with no metadata, it should probably choose the most common OWL representation rather than a less commonly used one (imo).
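
A minimal sketch of that default, assuming a loader that consults the Content-Type header first and falls back to RDF/XML when the header is missing or uninformative (such as text/plain). This is not OLS code; the function and table names are hypothetical, and the media types listed are the ones I understand to be registered or recommended for these syntaxes:

```python
# Media type -> syntax name. The table is illustrative, not exhaustive.
MEDIA_TYPE_TO_SYNTAX = {
    "application/rdf+xml": "RDF/XML",
    "application/owl+xml": "OWL/XML",
    "text/turtle": "Turtle",
    "text/owl-functional": "OWL Functional Syntax",
    "text/owl-manchester": "Manchester Syntax",
}

# Assumed fallback when the server gives no usable hint.
DEFAULT_SYNTAX = "RDF/XML"


def syntax_from_content_type(content_type):
    """Map a Content-Type header value to a syntax name, defaulting to RDF/XML."""
    if not content_type:
        return DEFAULT_SYNTAX
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return MEDIA_TYPE_TO_SYNTAX.get(media_type, DEFAULT_SYNTAX)


print(syntax_from_content_type("text/plain"))                 # falls back to the default
print(syntax_from_content_type("text/turtle; charset=utf-8"))  # recognised media type
```

With a text/plain response, as in this issue, the lookup has nothing to go on and the fallback decides the outcome, which is why the choice of serialization for the published file matters.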

@matentzn
Collaborator

matentzn commented Feb 5, 2024

Good argument 😜 ok will you make the change? Or assign Ray.

@cmungall
Member

cmungall commented Feb 5, 2024 via email
