Trace entities that do not create nice URIs (i.e. compact well to CURIEs in our pipeline) in NEO #88

Closed
kltm opened this issue Apr 7, 2022 · 2 comments


kltm commented Apr 7, 2022

Recently (#82 (comment)), we noticed a number of oddities in NEO.

In the newest NEO load (and maybe some of these are in the older one), some entities were not correctly converted to CURIEs: 1350337 in total. Some of those are probably not practically important, as nobody would be curating to them, but some seem important.

We would like to trace these back to their source files and try and figure out what is going on.

Important-seeming anomalies (pattern : count):

http://purl.obolibrary.org/obo/AGI_LocusCode_XYZ : 28986
http://identifiers.org/wormbase/XYZ : 152
http://identifiers.org/uniprot/XYZ : 49
http://purl.bioontology.org/ontology/provisional/XYZ : 17
http://identifiers.org/mgi/MGI:XYZ : 4

Samples of complete list:

alters_location_of
anastomoses_with
anteriorly_connected_to
attached_to
channel_for
channels_from
...
synapsed_by
Tmp_new_group
transitively_anteriorly_connected_to
...
transitively_proximally_connected_to
trunk_part_of
TS01
...
TS28
xunion_of
http://identifiers.org/mgi/MGI:106910
http://identifiers.org/uniprot/A0A5F9CQZ0
http://identifiers.org/wormbase/B0035.8%7CWB%3AF54E12.4%7CWB%3AF55G1.3%7CWB%3AH02I12.6
http://purl.bioontology.org/ontology/provisional/1ddd2e2d-2ace-4c87-8ec6-d3b5730b3e7c
http://purl.obolibrary.org/obo/D96882F1-8709-49AB-BCA9-772A67EA6C33
http://semanticscience.org/resource/SIO_000658
http://www.geneontology.org/formats/oboInOwl#Subset
http://www.w3.org/2002/07/owl#topObjectProperty
http://xmlns.com/foaf/0.1/image
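
A tally like the one above could be produced roughly as follows. This is only a sketch, not the pipeline's actual code: `PREFIX_MAP`, `compact`, and `tally_failures` are hypothetical names, the prefix map is a made-up two-entry stand-in for whatever CURIE map the NEO load really uses, and the "replace the trailing identifier with XYZ" rule is a crude generalization chosen to match the patterns listed in this issue.

```python
import re
from collections import Counter

# Hypothetical prefix map; the real pipeline's CURIE map is not shown in this issue.
PREFIX_MAP = {
    "http://purl.obolibrary.org/obo/GO_": "GO:",
    "http://identifiers.org/wormbase/": "WB:",
}

def compact(iri):
    """Return a CURIE if a known base matches and the local part is clean, else None."""
    for base, prefix in PREFIX_MAP.items():
        if iri.startswith(base):
            local = iri[len(base):]
            # Reject local parts with percent-encoding or other unexpected characters.
            if re.fullmatch(r"[A-Za-z0-9._-]+", local):
                return prefix + local
    return None

def tally_failures(iris):
    """Count entities that did not compact, grouped by a generalized pattern."""
    counts = Counter()
    for iri in iris:
        if compact(iri) is None:
            # Replace the trailing run of characters after the last '/', '#', '_' or ':' with XYZ.
            pattern = re.sub(r"[^/#_:]+$", "XYZ", iri)
            counts[pattern] += 1
    return counts

sample = [
    "http://purl.obolibrary.org/obo/GO_0008150",
    "http://identifiers.org/mgi/MGI:106910",
    "http://identifiers.org/uniprot/A0A5F9CQZ0",
    "http://identifiers.org/wormbase/B0035.8%7CWB%3AF54E12.4%7CWB%3AF55G1.3%7CWB%3AH02I12.6",
]
for pattern, n in tally_failures(sample).items():
    print(pattern, ":", n)
```

Grouping by pattern rather than by full IRI is what makes it possible to see at a glance which source (TAIR, WormBase, UniProt, MGI) contributes most of the anomalies.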

One spin-off from this for MGI is here: geneontology/go-annotation#4105

This is not considered a blocking issue for moving forward with this project and closing it. If closed before completing this project, we can bump this over to the QC one.


kltm commented Apr 9, 2022

AGI_LocusCode

gene_association.tair.gz contains AGI_LocusCode. Like a lot.

UniProtKB

At least some of the anomalous UniProtKB identifiers seem to appear exclusively in col 8 of uniprot_reviewed.gpi.gz. Not sure why being in a different column would throw this off... possibly due to "has_gene_template" in gpi2obo.pl?

MGI

MGI spoken for at geneontology/go-annotation#4105

WB

Traced anomaly back to c_elegans.PRJNA13758.current.gene_product_info.gpi.gz:

WB CE05165 HIS-48 HIStone CELE_B0035.8 protein taxon:6239 WB:B0035.8|WB:F54E12.4|WB:F55G1.3|WB:H02I12.6 UniProtKB:Q27876

It appears that a parser is taking "WB:B0035.8|WB:F54E12.4|WB:F55G1.3|WB:H02I12.6" and trying to turn the "B0035.8|WB:F54E12.4|WB:F55G1.3|WB:H02I12.6" part into an identifier. That's pretty wild. How/why is GPI column 8 getting parsed? Looks like gpi2obo.pl and it would go into parent and then OBO as relationship: has_gene_template $parent. The code seems wrong there, but I'm not familiar enough with the OBO format and the intention here to make a call on whether that should be dropped or split.
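
To make the suspected failure concrete, here is a small sketch. It is not taken from gpi2obo.pl: the function names are hypothetical, the GPI row is assumed to be tab-separated (the rendering above shows only spaces), and whether column 8 should be split or dropped is still the open question. The first function mimics treating the whole column as a single CURIE; the second shows what splitting on "|" would emit instead.

```python
from urllib.parse import quote

# The WB row from above, with tabs assumed between columns.
GPI_LINE = (
    "WB\tCE05165\tHIS-48\tHIStone\tCELE_B0035.8\tprotein\ttaxon:6239\t"
    "WB:B0035.8|WB:F54E12.4|WB:F55G1.3|WB:H02I12.6\tUniProtKB:Q27876"
)

def naive_parent_iri(parent_field, base="http://identifiers.org/wormbase/"):
    """Mimic the suspected bug: strip the first prefix and treat the rest as one ID."""
    _prefix, local = parent_field.split(":", 1)
    # Percent-encode everything outside the unreserved set, so '|' -> %7C and ':' -> %3A.
    return base + quote(local, safe="")

def split_parents(parent_field):
    """Split a pipe-separated column 8 value into individual CURIEs."""
    return [p for p in parent_field.split("|") if p]

def has_gene_template_lines(gpi_line):
    """Emit one OBO relationship line per parent found in column 8."""
    cols = gpi_line.rstrip("\n").split("\t")
    parent_field = cols[7] if len(cols) > 7 else ""
    return ["relationship: has_gene_template " + p for p in split_parents(parent_field)]

print(naive_parent_iri(GPI_LINE.split("\t")[7]))
for line in has_gene_template_lines(GPI_LINE):
    print(line)
```

The naive version reproduces the odd WormBase IRI seen in the sample list, which supports the idea that the whole pipe-separated value is being handled as one identifier.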


kltm commented Apr 21, 2022

From the managers' discussion: the important things have been traced and documented, so this is now closed.
