NEVER MERGE: Test Release ODK 1.5.2 #3300

Draft
wants to merge 1 commit into
base: testreleaseodk151

matentzn
Contributor

This shows the updated release "files" when running

ODK_TAG=v1.5.2 ./run.sh make uberon DEPLOY_GH=false

The log file reflects that indeed, ODK 1.5.2 was used.

@matentzn matentzn changed the base branch from master to testreleaseodk151 June 14, 2024 18:35
@gouttegd
Collaborator

Err, I don’t know what this “test release” is supposed to show. It certainly does not show all the consequences of the OWLTools update in ODK 1.5.2.

I did two mock releases in parallel: one with ODK 1.5.1 and the other with ODK 1.5.2 (and thus with the new OWLTools). Both releases were made from a clean state, so all files were rebuilt (unlike this PR, which seems to have rebuilt only the uberon.obo file). Then I compared the release artifacts between the two releases.
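For anyone who wants to repeat that comparison, here is a minimal sketch of how the differing files could be listed. The directory names `odk151` and `odk152` follow the diffs in this thread; the helper `changed_files` is hypothetical and not part of the Uberon pipeline.

```python
import filecmp
from pathlib import Path


def changed_files(old_dir, new_dir):
    """Return relative paths of files in old_dir that are missing from,
    or differ byte-for-byte from, their counterpart in new_dir.

    Files that exist only in new_dir are not reported.
    """
    old_root, new_root = Path(old_dir), Path(new_dir)
    changed = []
    for old_file in sorted(old_root.rglob("*")):
        if not old_file.is_file():
            continue
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        # shallow=False forces a content comparison instead of trusting
        # size and modification time, which always differ between two builds.
        if not new_file.is_file() or not filecmp.cmp(old_file, new_file, shallow=False):
            changed.append(rel.as_posix())
    return changed
```

Calling `changed_files("odk151", "odk152")` would give the list of artifacts worth diffing by hand.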

In the “main” release files (basically all files in the top-level directory), the only differences are the same as what is shown here in uberon.obo: some tags have been reordered. For example in collected-metazoan.obo:

--- odk151/collected-metazoan.obo       2024-06-15 12:28:46.421533921 +0100
+++ odk152/collected-metazoan.obo       2024-06-15 13:50:00.670285900 +0100
@@ -8817,8 +8817,8 @@
 subset: 3_STAR
 synonym: ">C=O" RELATED [IUPAC]
 synonym: "carbonyl" EXACT IUPAC_NAME [IUPAC]
-synonym: "carbonyl group" EXACT [ChEBI]
 synonym: "carbonyl group" EXACT [UniProt]
+synonym: "carbonyl group" EXACT [ChEBI]
 is_a: CHEBI:51422 ! organodiyl group
 
 [Term]
@@ -1000509,8 +1000509,8 @@
 namespace: uberon
 def: "A zone of skin that is part of a hindlimb [Automatically generated definition]." [OBOL:automatic]
 synonym: "hind limb skin" EXACT [OBOL:automatic]
-synonym: "lower limb skin" EXACT [FMA:23102]
 synonym: "lower limb skin" EXACT [https://orcid.org/0000-0002-0819-0473]
+synonym: "lower limb skin" EXACT [FMA:23102]
 synonym: "skin of hind limb" EXACT [OBOL:automatic]
 synonym: "skin of hindlimb" EXACT [OBOL:automatic]
 synonym: "skin of lower extremity" EXACT [OBOL:automatic]
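A rough way to confirm that two OBO files differ only by tag order is to compare each stanza as a sorted list of lines. This is a sketch under the assumption that stanzas are separated by blank lines and appear in the same order in both files; the function names are hypothetical.

```python
import re


def stanzas(text):
    """Split OBO text into stanzas, each returned as a list of lines."""
    return [block.splitlines() for block in re.split(r"\n\s*\n", text.strip()) if block]


def differs_only_by_tag_order(old_text, new_text):
    """True if every stanza holds the same lines, possibly in another order."""
    old, new = stanzas(old_text), stanzas(new_text)
    if len(old) != len(new):
        return False
    return all(sorted(a) == sorted(b) for a, b in zip(old, new))
```

If this returns True for a pair of release files, the diff between them consists only of reorderings like the ones shown above.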

There are many more differences, however, in the subsets files. This was to be expected, given that most subsets files are produced by OWLTools (with commands such as --make-ontology-from-results or --extract-ontology-subset, which have no direct ROBOT equivalents).

The most abundant differences are of this type:

--- odk151/subsets/appendicular-minimal.owl     2024-06-15 12:29:02.632369072 +0100
+++ odk152/subsets/appendicular-minimal.owl     2024-06-15 13:49:52.540365925 +0100
@@ -653,12 +653,7 @@
                 <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0001423"/>
             </owl:Restriction>
         </rdfs:subClassOf>
-        <rdfs:subClassOf>
-            <owl:Restriction>
-                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
-                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0004413"/>
-            </owl:Restriction>
-        </rdfs:subClassOf>
+        <rdfs:subClassOf rdf:nodeID="genid14"/>
         <obo:IAO_0000115>The narrow part of the shaft of the radius just below the head.</obo:IAO_0000115>
         <oboInOwl:hasDbXref>FMA:23479</oboInOwl:hasDbXref>
         <oboInOwl:hasDbXref>SCTID:181942005</oboInOwl:hasDbXref>
@@ -670,15 +665,14 @@
         <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/uberon/core#pheno_slim"/>
         <rdfs:label>neck of radius</rdfs:label>
     </owl:Class>
+    <owl:Restriction rdf:nodeID="genid14">
+        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
+        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0004413"/>
+    </owl:Restriction>
     <owl:Axiom>
         <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0000199"/>
         <owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
-        <owl:annotatedTarget>
-            <owl:Restriction>
-                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
-                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0004413"/>
-            </owl:Restriction>
-        </owl:annotatedTarget>
+        <owl:annotatedTarget rdf:nodeID="genid14"/>
         <oboInOwl:source>FMA</oboInOwl:source>
     </owl:Axiom>
     <owl:Axiom>

where owl:Restriction elements are no longer “inlined” where they are needed, but instead declared separately as blank nodes and referenced where they are needed.

Not sure why this is happening, but I don’t think this is incorrect. This might be touching the limits of my familiarity with RDF/XML, though.

Another type of difference, found mostly in the subsets/life-stages-core.obo file, is entirely expected. It looks like this:

--- odk151/subsets/life-stages-core.obo 2024-06-15 12:29:02.697368411 +0100
+++ odk152/subsets/life-stages-core.obo 2024-06-15 13:49:52.609365246 +0100
@@ -51,7 +51,7 @@
 relationship: part_of UBERON:0000092 ! post-embryonic stage
 relationship: preceded_by UBERON:0000111 ! organogenesis stage
 relationship: precedes UBERON:0000071 ! death stage
-property_value: seeAlso https://github.com/obophenotype/uberon/issues/566 xsd:anyURI
+property_value: seeAlso "https://github.com/obophenotype/uberon/issues/566" xsd:anyURI
 
 [Term]
 id: UBERON:0000068
@@ -826,7 +826,7 @@
 relationship: part_of UBERON:0018378 ! crustacean larval stage
 relationship: preceded_by UBERON:8200002 ! copepodite stage 1
 property_value: http://purl.org/dc/terms/contributor https://orcid.org/0000-0002-2908-3327
-property_value: http://purl.org/dc/terms/date 2021-07-26T15:07:18Z xsd:dateTime
+property_value: http://purl.org/dc/terms/date "2021-07-26T15:07:18Z" xsd:dateTime
 
 [Term]
 id: UBERON:8200004
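To filter this expected noise out of a comparison, typed property_value lines could be normalized to the quoted form before diffing. The sketch below is an assumption-laden helper (hypothetical name, values without spaces or embedded quotes only), not something the release pipeline does.

```python
import re

# property_value: <property> <value, optionally quoted> <xsd datatype>
_TYPED_PV = re.compile(r'^property_value: (\S+) "?([^"\s]+)"? (xsd:\S+)$')


def normalize_property_value(line):
    """Rewrite a typed property_value line so that its value is quoted.

    Lines that do not match the simple pattern are returned unchanged.
    """
    match = _TYPED_PV.match(line)
    if match is None:
        return line
    prop, value, datatype = match.groups()
    return f'property_value: {prop} "{value}" {datatype}'
```

Applying it to both files before diffing leaves only the differences that are not about quoting.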

The following difference (in the same subsets/life-stages-core.obo file), however, is wrong:

@@ -239,7 +239,7 @@
 xref: EFO:0001322
 xref: EHDAA:27
 xref: FBdv:00005288
-xref: IDOMAL:0000302 {https://w3id.org/sssom/mapping_justification="https://w3id.org/semapv/vocab/ManualMappingCuration", https://w3id.org/sssom/author_id="https://orcid.org/0000-0003-4423-4370", https://w3id.org/sssom/mapping_provider="https:/>
+xref: IDOMAL:0000302 {mapping:justification="https://w3id.org/semapv/vocab/ManualMappingCuration", author:id="https://orcid.org/0000-0003-4423-4370", mapping:provider="https://github.com/biopragmatics/biomappings"}
 xref: NCIT:C12601
 xref: PdumDv:0000100
 xref: VHOG:0000745

The serialiser seems to be behaving as if it didn’t know the prefix name to use for condensing SEMAPV IRIs, so it condenses them as if they were OBO IRIs.

The subsets/life-stages-core.obo file is derived from the main uberon.owl file, which does contain a prefix declaration for SEMAPV. ROBOT has no trouble using that prefix declaration to correctly convert the uberon.owl file to OBO, so it’s an OWLTools problem.
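The observed behaviour can be reproduced with a toy model of IRI condensation. This is only an illustration of the symptom, not the actual serialiser code: the `condense` function and its fallback rule (split the last path segment on its last underscore, as for an OBO-style IRI) are assumptions.

```python
def condense(iri, prefixes):
    """Condense an IRI into a CURIE-like string.

    prefixes maps prefix names to namespace IRIs. When no declared
    namespace matches, the IRI is treated as if it were OBO-style.
    """
    # Try declared namespaces first, longest namespace first.
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    # Fallback (assumed): read ".../PREFIX_LOCAL" as "PREFIX:LOCAL".
    local = iri.rsplit("/", 1)[-1]
    if "_" in local:
        prefix, ident = local.rsplit("_", 1)
        return f"{prefix}:{ident}"
    return iri


declared = {"sssom": "https://w3id.org/sssom/"}
iri = "https://w3id.org/sssom/mapping_justification"
print(condense(iri, {}))        # mapping:justification
print(condense(iri, declared))  # sssom:mapping_justification
```

Without the declaration the fallback yields the wrong tag seen in the diff; with it, the tag keeps its proper prefix.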

@gouttegd
Collaborator

The following difference (in the same subsets/life-stages-core.obo file), however, is wrong:

One way to fix that problem in Uberon would be to only use OWLTools to generate the life-stages-core.owl file, and then use ROBOT to derive the OBO version (by a simple convert operation), instead of using the same OWLTools call to generate both the OWL and OBO versions:

--- a/src/ontology/uberon.Makefile
+++ b/src/ontology/uberon.Makefile
@@ -918,11 +918,8 @@ subsets/life-stages-composite.owl: composite-metazoan.owl
        $(ROBOT) annotate --input $@ --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) \
                          -o [email protected] && mv [email protected] $@
 
-subsets/life-stages-core.obo: uberon.owl
-       $(OWLTOOLS) $< --reasoner-query -r elk -l 'life cycle stage' \
-                   --make-ontology-from-results $(URIBASE)/uberon/$@ \
-                   --add-ontology-annotation $(DC)/description "Life cycle stage subset of uberon core (gene>
-                   -o -f obo $@ --reasoner-dispose 2>&1 > [email protected]
+subsets/life-stages-core.obo: subsets/life-stages-core.owl
+       $(ROBOT) convert -i $< --check false -o $@
 
 subsets/life-stages-core.owl: uberon.owl
        $(OWLTOOLS) $< --reasoner-query -r elk -l 'life cycle stage' \
                    --make-ontology-from-results $(URIBASE)/uberon/$@ \
                    --add-ontology-annotation $(DC)/description "Life cycle stage subset of uberon core (gene>
                    -o file://`pwd`/$@ --reasoner-dispose 2>&1 > [email protected]

(But that wouldn’t fix the more general problem that the new version of OWLTools generates incorrectly condensed qualifier tags.)

@balhoff
Member

balhoff commented Jun 17, 2024

where owl:Restriction are no longer “inlined” where they are needed, but instead declared separately as blank nodes and referenced where they are needed.
Not sure why this is happening, but I don’t think this is incorrect. This might be touching the limits of my familiarity with RDF/XML, though.

@gouttegd I do think it's incorrect, but it's a purposeful change in OWL API. There is some discussion here: ontodev/robot#1129

@balhoff
Member

balhoff commented Jun 17, 2024

The following difference (in the same subsets/life-stages-core.obo file), however, is wrong

This seems to be a problem in OWL API. ☹️ I find this with ROBOT as well.

Input:

Prefix(:=<http://example.org/ontologies/2024/5/17/untitled-ontology-2217/>)
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Prefix(rdf:=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
Prefix(xml:=<http://www.w3.org/XML/1998/namespace>)
Prefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)
Prefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)


Ontology(<http://purl.obolibrary.org/obo/foo.owl>

Declaration(Class(<http://purl.obolibrary.org/obo/FOO_1>))
Declaration(AnnotationProperty(<http://www.geneontology.org/formats/oboInOwl#hasDbXref>))
Declaration(AnnotationProperty(<https://w3id.org/sssom/author_id>))
Declaration(AnnotationProperty(<https://w3id.org/sssom/mapping_justification>))


############################
#   Classes
############################

# Class: <http://purl.obolibrary.org/obo/FOO_1> (foo)

AnnotationAssertion(Annotation(<https://w3id.org/sssom/author_id> "https://orcid.org/0000-0003-4423-4370") Annotation(<https://w3id.org/sssom/mapping_justification> "https://w3id.org/semapv/vocab/ManualMappingCuration") <http://www.geneontology.org/formats/oboInOwl#hasDbXref> <http://purl.obolibrary.org/obo/FOO_1> "IDOMAL:0000302")
AnnotationAssertion(rdfs:label <http://purl.obolibrary.org/obo/FOO_1> "foo")
)

Run: robot convert -i foo.ofn -o foo.obo

Output:

format-version: 1.2
ontology: foo

[Term]
id: FOO:1
name: foo
xref: IDOMAL:0000302 {mapping:justification="https://w3id.org/semapv/vocab/ManualMappingCuration", author:id="https://orcid.org/0000-0003-4423-4370"}

@gouttegd
Collaborator

This seems to be a problem in OWL API. ☹️ I find this with ROBOT as well.

But in your example, this is expected, because your OFN input file does not define the SSSOM namespace. If you add

Prefix(sssom:=<https://w3id.org/sssom/>)

to the input file, then ROBOT is able to correctly condense the qualifier tags:

xref: IDOMAL:0000302 {sssom:mapping_justification="https://w3id.org/semapv/vocab/ManualMappingCuration", sssom:author_id="https://orcid.org/0000-0003-4423-4370"}

@balhoff
Member

balhoff commented Jun 17, 2024

That's good! I don't think the serializer should be making up prefixes for non-OBO namespaces, but it's good it can be worked around. So I'm guessing owltools needs to do what ROBOT does and reuse the existing prefix manager when saving.

@gouttegd
Collaborator

I don't think the serializer should be making up prefixes for non-OBO namespaces

This used not to be much of a problem, because IRI condensation (“CURIEfication”) happened only in a few places. Now that it occurs more frequently (e.g. in qualifier tags, in values of consider or replaced_by tags, etc.), there is a greater risk of incorrect condensation if people are not careful to declare their prefixes.
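One way to catch a missing declaration early is to scan an OFN file for annotation properties whose namespace has no matching Prefix line. The sketch below is a plain text scan, not an OWL parser; the function name and the list of namespaces assumed to be handled by the serialiser itself are hypothetical.

```python
import re

# Namespaces assumed (for this sketch) to need no explicit declaration.
BUILT_IN = (
    "http://purl.obolibrary.org/obo/",
    "http://www.geneontology.org/formats/oboInOwl#",
)


def undeclared_annotation_properties(ofn_text, known=BUILT_IN):
    """List declared annotation properties not covered by any Prefix line."""
    namespaces = re.findall(r"^Prefix\([^:\s]*:=<([^>]+)>\)", ofn_text, flags=re.M)
    namespaces.extend(known)
    props = re.findall(r"Declaration\(AnnotationProperty\(<([^>]+)>\)\)", ofn_text)
    return sorted(p for p in props if not any(p.startswith(ns) for ns in namespaces))
```

On the example input above it would report the two SSSOM properties; once the `sssom` prefix line is added, the list is empty.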
