gorule-0000001.md

layout: rule
id: GORULE:0000001
title: Basic GAF checks
contact:
status: Implemented
fail_mode: hard
implementations:

The following basic checks ensure that submitted gene association files conform to the GAF specification. They come from the original GAF check script.

  • Each line of the GAF file is checked for the correct number of columns, for the cardinality of each column, and for leading or trailing whitespace
  • Col 1 and all DB abbreviations must be in db-xrefs.yaml (see below)
  • All GO IDs must be extant in the current ontology
  • Qualifier, evidence, aspect and DB object columns must be within the list of allowed values
  • DB:Reference, Taxon and GO ID columns are checked for minimal form
  • Date must be in YYYYMMDD format
  • All IEA annotations more than a year old are removed
  • Taxa with a 'representative' group (e.g. MGI for Mus musculus, FlyBase for Drosophila) must be submitted by that group only
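
As an illustration of the per-line checks listed above, here is a minimal Python sketch covering only three of them: column count, leading or trailing whitespace, and the YYYYMMDD date format. The function names are hypothetical, and the defaults (17 tab-separated columns, with the date in column 14) are assumptions based on the GAF 2.x layout; this is not the original GAF check script.

```python
import re
from datetime import datetime


def is_valid_gaf_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYYMMDD."""
    # Require exactly eight ASCII digits before parsing, because strptime
    # alone would also accept shorter strings such as "2020101".
    if not re.fullmatch(r"[0-9]{8}", value):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def check_gaf_line(line: str, expected_columns: int = 17, date_column: int = 14) -> list:
    """Return a list of error messages for one GAF annotation line.

    expected_columns and date_column are assumptions (GAF 2.x layout),
    exposed as parameters so they can be adjusted.
    """
    errors = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != expected_columns:
        errors.append(f"expected {expected_columns} columns, found {len(fields)}")
        return errors  # column positions are unreliable, so stop here
    for number, value in enumerate(fields, start=1):
        if value != value.strip():
            errors.append(f"column {number} has leading or trailing whitespace")
    if not is_valid_gaf_date(fields[date_column - 1]):
        errors.append(f"column {date_column} is not a valid YYYYMMDD date")
    return errors
```

A line that passes all three checks yields an empty list; every other check in the list above (allowed values, GO ID lookup, IEA age, taxon ownership) would need its own reference data and is left out of this sketch.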

Additional notes on identifiers

In some contexts an identifier is represented using two fields, for example col1 (prefix) and col2 (local id) of a GAF or GPAD. The global id is formed by concatenating these with a colon (:). In other contexts, such as the "With/from" field, a global ID is specified, which MUST always be prefixed.
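
The two representations can be sketched in Python as follows. The helper names `make_global_id` and `split_global_id` are hypothetical, chosen only for this illustration.

```python
def make_global_id(prefix: str, local_id: str) -> str:
    """Join a col1 prefix and a col2 local id into a global id."""
    return f"{prefix}:{local_id}"


def split_global_id(global_id: str) -> tuple:
    """Split a global id into (prefix, local_id) at the first colon.

    Raises ValueError when the id is not prefixed, since a global ID
    MUST always carry a prefix.
    """
    prefix, separator, local_id = global_id.partition(":")
    if not separator or not prefix or not local_id:
        raise ValueError(f"global id is not prefixed: {global_id!r}")
    return prefix, local_id
```

Splitting at the first colon only is a design choice of this sketch: it keeps a local id that itself contains a colon intact, so the round trip through `make_global_id` returns the original string.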

In all cases, the prefix MUST be in db-xrefs.yaml. The prefix SHOULD be identical (case-sensitive match) to the database field. If it does not match then it MUST be identical (case-sensitive) to one of the synonyms.

When consuming association files, programs SHOULD repair by replacing prefix synonyms with the canonical form, in addition to reporting on the mismatch. For example, when the association files are released, legacy uses of 'UniProt' in the submitted files should be replaced with 'UniProtKB'.
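
A minimal sketch of the repair-and-report behaviour follows. It assumes the registry has already been loaded into a list of entries with `database` and `synonyms` keys; the two sample entries are an illustrative stand-in rather than the real contents of db-xrefs.yaml, and the function names are hypothetical.

```python
# Hypothetical, hand-written stand-in for entries loaded from db-xrefs.yaml.
SAMPLE_REGISTRY = [
    {"database": "UniProtKB", "synonyms": ["UniProt"]},
    {"database": "MGI", "synonyms": []},
]


def build_prefix_index(registry: list) -> dict:
    """Map every canonical prefix and every synonym to its canonical prefix."""
    index = {}
    # Register canonical names first so a synonym can never shadow one.
    for entry in registry:
        index[entry["database"]] = entry["database"]
    for entry in registry:
        for synonym in entry.get("synonyms", []):
            index.setdefault(synonym, entry["database"])
    return index


def repair_prefix(prefix: str, index: dict) -> tuple:
    """Return (canonical_prefix_or_None, report_message_or_None).

    Matching is case-sensitive. An unknown prefix yields (None, message),
    a synonym yields the canonical form plus a message, and a canonical
    prefix yields (prefix, None).
    """
    canonical = index.get(prefix)
    if canonical is None:
        return None, f"unknown prefix: {prefix}"
    if canonical != prefix:
        return canonical, f"replaced synonym {prefix} with {canonical}"
    return canonical, None
```

With the sample registry, `repair_prefix("UniProt", build_prefix_index(SAMPLE_REGISTRY))` returns the canonical `UniProtKB` together with a message that can be logged as the mismatch report.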