Extend GPI format for more detailed information transfer (e.g. withdrawn/merged/split) from resources to GO Central #83

Open
kltm opened this issue Sep 12, 2023 · 5 comments
Labels
Needs LA approval (needs final approval from the Lead Architect), Needs PI, Needs PM approval (needs final approval from the Project Manager), Needs PO, Needs tech doc, Needs TL

Comments

@kltm
Member

kltm commented Sep 12, 2023

Project link

https://github.com/orgs/geneontology/projects/TBD

Project description

Currently, the only information the GPI format can carry is whether or not an identifier exists. We need to be able to QC and triage items in this window and, where possible, automatically apply merges and track splits and withdrawn identifiers.

The high-level problem here is that the source of truth (SoT) for identifiers is the resources/MODs, but the SoT for annotations and models is GO Central. This introduces a lag, and a non-automated cleanup step, that were not previously an issue.

The initial discussion/proposal is at https://docs.google.com/document/d/1GXSKLKWJMNZmUOBbuMhgvPf4WCjDaKLCooOcUF0UMoE/edit

Originally from geneontology/pipeline#239

Current ameliorating actions underway at:

geneontology/go-site#2066
geneontology/go-site#2061

PI

TBD

Product owner (PO)

TBD

Technical lead (TL)

TBD

Other personnel (OP)

TBD

Technical specs

TBD (template: https://docs.google.com/document/d/111UqtS3G0aJZpAijZYI3Da0t94OQpGePlPJsqZE4Tio/edit)

Other comments

N/A

@kltm added the Needs LA approval, Needs PM approval, Needs tech doc, Needs PI, Needs PO, and Needs TL labels on Sep 12, 2023
@kltm
Member Author

kltm commented Sep 13, 2023

From @cmungall: write the data dictionary as LinkML; maybe @sierra-moxon can help with finding the right properties and making the schema.

@pgaudet

pgaudet commented Sep 13, 2023

@kltm plans to have a proposal ready for the next GO meeting.

@kltm
Member Author

kltm commented Oct 5, 2023

After talking to @cmungall, will propose the following, borrowed from the ontology:

is_obsolete: bool
replaced_by: id (merge)
consider: id, repeated n times (split)
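
To make the intended reading of these three fields concrete, here is a minimal Python sketch of how a consumer might triage one identifier. It is only an illustration under assumptions: the `GpiStatus` class, the `triage` and `auto_update` functions, and the four category names are hypothetical and not part of any agreed spec; only the field names come from the proposal above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GpiStatus:
    """Hypothetical per-identifier status record; field names follow the proposal."""
    entity_id: str
    is_obsolete: bool = False
    replaced_by: Optional[str] = None                   # set when the identifier was merged
    consider: List[str] = field(default_factory=list)   # candidate identifiers after a split


def triage(status: GpiStatus) -> str:
    """Classify an identifier as 'current', 'merged', 'split', or 'withdrawn'."""
    if not status.is_obsolete:
        return "current"
    if status.replaced_by:
        return "merged"      # single successor: a candidate for automatic update
    if status.consider:
        return "split"       # several candidates: needs manual review
    return "withdrawn"       # obsolete with no successor given


def auto_update(annotated_id: str, status: GpiStatus) -> str:
    """Return the successor identifier for a merge; otherwise leave the identifier as-is."""
    if triage(status) == "merged":
        return status.replaced_by
    return annotated_id
```

With made-up identifiers, `triage(GpiStatus("EX:0002", is_obsolete=True, replaced_by="EX:0001"))` returns `"merged"`, while an obsolete entry carrying only `consider` values is reported as `"split"` and left for a curator rather than rewritten automatically.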

@kltm
Member Author

kltm commented Oct 17, 2023

Comment from @tberardini that we may want to consider adding a "date" component to these additions.

@pgaudet

pgaudet commented Sep 25, 2024

Other format issues to be considered (outside the scope of this specific project, but we need to consider the purposes and specs of GPI and GAF before implementing what is proposed here):

From a GPI file thread on Slack:

  • Annotatable entities vs all possible entities in the GPI file
    • protein-coding gene, ncRNA-coding gene?
  • Isoform annotation in Noctua - return to the Column 17 paradigm?
    • Could still use an autocomplete
  • Review use of GAF vs GPAD/GPI
    • Why is each file needed?
    • Redundancy of information in GAF, i.e. file sizes
    • Any cost implications?
    • How are they currently used?
    • File downloads vs APIs
    • Computationally amenable vs human readable files
    • Accuracy of annotation (granularity) - what entity is annotated
    • Implications for downstream analyses
    • % of genome annotated
    • enrichment
    • Annotation display in AmiGO, etc.

Projects
Status: Creation (initial requirements document)
Development

No branches or pull requests

2 participants