Representation of an unknown enabler in Reactome and GO-CAM #303

Open
huaiyumi opened this issue Nov 30, 2023 · 6 comments

Comments

@huaiyumi
Collaborator

huaiyumi commented Nov 30, 2023

There is an email trail among Huaiyu, Peter and David discussing this. Here is a summary.

In the Reactome reaction R-HSA-1980118, the "catalyst" has an xref to a ChEBI ID, CHEBI:36080, which is the generic term "protein". This is because the catalyst has not been identified, and UniProt doesn't have a term (or ID) for an unknown protein. As a workaround, Reactome assigns a ChEBI ID to it.
In the GO-CAM conversion, "Ubiquitin ligase" is used as the enabler. That is the label of the catalyst in Reactome, but the actual identity is unknown, so the enabler label in GO-CAM is misleading.
The correct approach is probably to leave the enabler blank. In the GO-CAM spec, the cardinality of enabler can be 0, meaning unknown.
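
As a rough illustration of the proposal above, here is a minimal Python sketch of how a converter might decide whether to emit an enabler. It is not the actual Reactome-to-GO-CAM conversion code: the `Catalyst` class, the `enabler_for` function and the UniProt-style accession in the usage lines are hypothetical, and the only identifier taken from this issue is CHEBI:36080.

```python
from dataclasses import dataclass
from typing import Optional

# CHEBI:36080 is the generic "protein" xref that Reactome uses as a
# placeholder when the catalyst has not been identified (per this issue).
GENERIC_PROTEIN_XREF = "CHEBI:36080"


@dataclass
class Catalyst:
    """Hypothetical, simplified view of a Reactome catalyst."""
    label: str
    xref: str


def enabler_for(catalyst: Optional[Catalyst]) -> Optional[str]:
    """Return the xref to use as the GO-CAM enabler, or None to leave it blank.

    None corresponds to an enabler cardinality of 0, i.e. "unknown".
    """
    if catalyst is None or catalyst.xref == GENERIC_PROTEIN_XREF:
        return None
    return catalyst.xref


# Usage: the placeholder catalyst yields no enabler; an identified one is kept.
unknown = Catalyst(label="Ubiquitin ligase", xref="CHEBI:36080")
known = Catalyst(label="Some ligase", xref="UniProtKB:P12345")  # illustrative ID
print(enabler_for(unknown))
print(enabler_for(known))
```

The point of returning `None` rather than the Reactome label is that the label alone ("Ubiquitin ligase") would not be carried over as if it were an identified gene product.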

Here is the full e-mail trail, including digressions on SBML notation and on a similar problem with generic versions of other genome-encoded entities (various RNAs and DNAs):

unusual reaction.docx

@ukemi

ukemi commented Nov 30, 2023

Interesting. In other places in the imports, I've seen enablers that are labeled something like 'unknown ubiquitin ligase'.

@nataled
Collaborator

nataled commented Nov 30, 2023

@ukemi at least that restricts the possibilities! Just having 'protein' as enabler, seems to me, would be most accurately interpreted as "any protein will do".

@deustp01
Collaborator

But to restrict in a consistent way, we need an ontology structure and curators and users trained in its use. It's probably easier to train curators and users to understand that "unknown protein" means just that: the activity has an enabler whose identity has not yet been discovered, and not that anything can enable it.

@ukemi

ukemi commented Nov 30, 2023

Yes, and in the OWL instance world I believe that having protein there means some protein, not all proteins.

@ukemi

ukemi commented Nov 30, 2023

@deustp01 If a reaction uses a protein that has isoforms, but you don't know which specific isoform is being used, do you annotate to the generic protein identifier?

@deustp01
Collaborator

@ukemi by default we always annotate to the canonical / default isoform specified by UniProt unless there is experimental evidence that specifies the use of a different isoform. So, yes, in effect we are annotating to the generic identifier, because we haven't really examined the possibility of isoform usage.

This is also our rationale for not routinely making sets of all of the isoforms of a UniProt entry that should be able to enable a particular function and using that set as the enabler, instead of a single EWAS instance corresponding to the canonical UniProt isoform. There is also a biology issue here, like the one for paralogs. We assume that all paralogs / all isoforms are equally competent enablers, by default ignoring differences in tissue- or state-specific expression of these variant forms that might point to real differences in function (as in the case of the sets of glycolytic enzymes, where all set members have the same catalytic activity but are expressed in different tissues and subject to different regulators of their activity).

This leads to problems when UniProt re-edits a SwissProt entry to change which isoform is the canonical / default one: our numbering of positions in the protein sequence, e.g. start and end coordinates and the coordinates of specific modified residues, can then be thrown off. @nataled's QA tests have enabled us to clean up (almost) all of the 20-year legacy mess caused by these UniProt - Reactome branchings, and we have just introduced a new QA check whereby any change in the checksum of a UniProt entry (which should be triggered by any of these changes in the identity of the canonical sequence) causes all EWASs / proteoforms that refer to the changed UniProt to be flagged for manual review.
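
For readers unfamiliar with this kind of check, the checksum-based flagging described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not Reactome's actual QA code: the function name, the dictionary layout, and the accessions, checksums and EWAS identifiers in the usage lines are all made up.

```python
from typing import Dict, List


def ewas_to_review(
    old_checksums: Dict[str, str],
    new_checksums: Dict[str, str],
    ewas_by_uniprot: Dict[str, List[str]],
) -> List[str]:
    """Return the EWAS identifiers whose UniProt entry changed checksum.

    old_checksums / new_checksums map a UniProt accession to the sequence
    checksum recorded at the previous and current release (assumed layout).
    ewas_by_uniprot maps an accession to the EWAS instances that refer to it.
    """
    flagged: List[str] = []
    for accession, new_sum in new_checksums.items():
        old_sum = old_checksums.get(accession)
        # Only entries seen before can have "changed"; new entries are skipped.
        if old_sum is not None and old_sum != new_sum:
            flagged.extend(ewas_by_uniprot.get(accession, []))
    return sorted(flagged)


# Usage with made-up identifiers: only ACC1 changed, so only its EWASs are flagged.
old = {"ACC1": "aaaa", "ACC2": "bbbb"}
new = {"ACC1": "cccc", "ACC2": "bbbb", "ACC3": "dddd"}
ewas = {"ACC1": ["EWAS-2", "EWAS-1"], "ACC2": ["EWAS-3"]}
print(ewas_to_review(old, new, ewas))
```

The sketch flags every EWAS that refers to a changed entry, leaving it to manual review to decide whether coordinates actually need updating.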

4 participants