Practical guide to editing options

Tamsin Jones edited this page Dec 14, 2017 · 7 revisions

So you want to edit the DAO:

Version control/Continuous Integration

You will need a GitHub account and will need to fork your own copy of the drosophila-anatomy-developmental-ontology to your own account.

You will also need a Jenkins Job for your fork. This should be made by cloning the FBbt_GH job and redirecting the repo checkout to your forked version of the fbbt repo. This job will run a range of tests and roll a variety of OBO and OWL versions every time you commit to the repo.

The master version of the DAO lives almost entirely in OBO format, in a file called fbbt-edit.obo. For most purposes this should be the only file that is edited. However, we do keep a small amount of content in OWL. A file of GCIs declaring disjointness between anonymous classes, fbbt-ext.owl, is used to add consistency checks. An additional file of annotation axioms on the ontology, fbbt_auth_attrib_licence.owl, contains information about authorship, licensing and attribution. Ontology editors are referred to in this file via their ORCID. Please add yours to this file if you edit the ontology.

OBO to OWL conversion

Whatever editing strategy you use, you should view and test the results in Protege regularly: test for contradictions, check that the class hierarchy and inferred anonymous classes are as expected, and run test queries. It is especially important to run test queries that correspond to those used on VFB. If you are developing directly in OBO, via OBO-Edit or by hand editing the OBO file, you will need to convert to OWL to do this. If you are developing in OWL, you will need to regularly do the reverse in order to check the master (OBO) version into the repository.

Jenkins will convert between OBO and OWL for you, but it can also be useful to have software for converting between OBO and OWL installed locally (this is essential if you plan to use Protege for editing).

OORT and OBO to OWL conversion GUIs can be obtained from the OWLtools download page. For easy command-line conversion between OBO and OWL follow these instructions. To run OORT and OWLtools from the command line, check out the owltools repo and follow the README instructions to compile using Maven.

ID management

It is essential that you do not re-use existing IDs. As development often proceeds in parallel branches, you will therefore need an IDspace. Please use the IDspace log to record this for future reference e.g.

YourName:FBbt	300000-301000   namespace-id-rule: * FBbt:$sequence(8,300000,301000)$

(The namespace ID rule has a format that can be pasted into the OBO-Edit ID manager.)
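If you are not using the OBO-Edit ID manager, the same rule can be applied by a small script. The following is a minimal Python sketch, not part of the DAO tooling: the function names are hypothetical, and it assumes that the rule's arguments are (digits, min, max), that the range is inclusive, and that every used ID appears on an `id:` or `alt_id:` line.

```python
import re


def used_numbers(obo_text, prefix="FBbt"):
    """Collect the numeric part of every id:/alt_id: line with the given prefix."""
    pattern = re.compile(
        r"^(?:id|alt_id):\s*" + re.escape(prefix) + r":(\d+)", re.MULTILINE
    )
    return {int(m.group(1)) for m in pattern.finditer(obo_text)}


def next_free_id(obo_text, start, end, prefix="FBbt", width=8):
    """Return the first unused ID in [start, end], zero-padded to `width` digits.

    Assumption: the range is inclusive at both ends.
    """
    used = used_numbers(obo_text, prefix)
    for n in range(start, end + 1):
        if n not in used:
            return f"{prefix}:{n:0{width}d}"
    raise ValueError(f"No free {prefix} IDs left in {start}-{end}")
```

For the example range above you would call `next_free_id(open("fbbt-edit.obo").read(), 300000, 301000)`.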

Expressiveness

The DAO is sufficiently large and complicated that reasoning with a full DL reasoner is slow, as is post-classification querying. Applications of VFB also depend on web-speed querying. We therefore limit expressiveness to OWL2 EL. Using OBO format (ignoring recent, largely unsupported extensions for cardinality) keeps us almost entirely within EL, the two clear exceptions being inverse object properties and range. This means it is safe* to rely on the EL reasoner ELK. Using OBO format also prevents the use of nested class expressions. This limitation is occasionally frustrating, but serves to improve readability and ease of editing. We can still refer indirectly to nested class expressions by using macro expansion (see Osumi-Sutherland et al., 2012 for examples).

(* mostly - it is possible to miss some inferences due to inverse object properties. Range is (or should be) used only in conjunction with disjoints for consistency checking and so using ELK misses some inconsistencies. To cope with these, we run a Jenkins DL consistency checking job on the trunk version of the ontology).

Editing in OBO-Edit

This has historically been our main means of editing. It is still useful for bulk editing tasks and provides the most efficient tools for searching. However, it is quite clunky for many editing tasks that are simple in Protege or by hand editing the OBO file.

Setting up OE

You will need to be running 64-bit Java - most machines do nowadays - and you will need plenty of RAM. Recommended RAM allocation on installation: 4GB.

ID management

Add your namespace ID rule, as recorded in the IDspace log, to the ID management tool.

Editing in OBO-Edit

For a guide to editing in OE - please see these slides

Editing in Protege

With recent advances in OBO to OWL round-tripping this is now possible, although it should be considered experimental. Ideally this will become the standard way to edit - even if the master stays in OBO. An OBO-plugin for Protege is being discussed at GO and may provide this in the near future.

Setup

You will need to be running 64-bit Java - most machines do nowadays - and you will need plenty of RAM. Set max RAM to at least 4GB. On a Mac this can be done by editing info.plist in Protege.app/Contents/info.plist. Here is an example with max set to 8GB:

                <key>VMOptions</key>
                <array>
                        <string>-server</string>
                        <string>-Xms1024M</string>
                        <string>-Xmx8192M</string>
                </array>

Plugins

Note - unlinked ones can be found via default Protege repo, accessible through Preferences.

Essential plugins:

  • URI tools - display URI of entity via Views->Misc->Entity URI
  • Annotation search - Views->Misc->Search annotations
  • ELK reasoner - EL reasoner that is orders of magnitude faster than other OWL reasoners.

Other potentially useful plugins:

  • OBO lint - Provides a report of untranslatable axioms as flagged by OWLtools. Report may not be complete (?).
  • Image depictions view - Useful if you have image data modelled as individuals. Use in combination with ELK.
  • Existential tree plugin - 'reverse existential tree' is useful for viewing part hierarchies
  • JFACT - DL reasoner - slightly more stable Java native version of FaCT++
  • MoRE - DL reasoner that combines ELK with HermiT (another DL reasoner) or JFACT

ID management / New entities

Edit the content of the 'New entity' tab in Preferences. You will need to specify the OBO foundry standard base URL as well as ID prefix and range.
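The IRIs that Protege generates for new classes need to convert back to FBbt IDs in OBO. As a rough illustration of that mapping, here is a minimal Python sketch. It assumes the standard OBO Foundry PURL base, http://purl.obolibrary.org/obo/, and the function names are hypothetical.

```python
# Assumption: the standard OBO Foundry PURL base.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def obo_id_to_iri(obo_id):
    """Map an OBO ID such as FBbt:00300000 to its OBO Foundry style IRI."""
    prefix, local = obo_id.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local}"


def iri_to_obo_id(iri):
    """Map an OBO Foundry style IRI back to PREFIX:LOCAL form."""
    if not iri.startswith(OBO_BASE):
        raise ValueError(f"Not an OBO Foundry style IRI: {iri}")
    prefix, local = iri[len(OBO_BASE):].split("_", 1)
    return f"{prefix}:{local}"
```

If a new class in Protege has an IRI that `iri_to_obo_id` cannot turn into an ID inside your range, the 'New entity' preferences are probably not set as intended.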

OBO<->OWL

Remember, the master version still lives in OBO format, so you should convert back to OBO locally and check in the converted version regularly - reviewing diffs on SourceForge to track editing. To generate an OWL file for editing:

 obolib-obo2owl fbbt-edit.obo -o fbbt-edit.owl

and to convert back:

 obolib-owl2obo fbbt-edit.owl -o fbbt-edit.obo

Gotchas

There is currently limited tooling support to make sure you stay inside OBO expressiveness, although Jenkins checks via OORT will catch a range of problems. Also, untranslatable axioms end up in an OBO header tag, so watch for the appearance of these in diffs when you commit.
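Watching diffs can be backed up with a quick scripted check of the converted OBO file. This is a minimal Python sketch, not part of the existing Jenkins checks. It assumes the header tag is named `owl-axioms`, as in the OBO 1.4 format specification; confirm this against the output of your converter. The function name is hypothetical.

```python
def header_tag_values(obo_text, tag="owl-axioms"):
    """Return the values of `tag` lines found in the OBO header.

    The header is taken to be everything before the first stanza,
    i.e. before the first line that starts with "[".
    Assumption: untranslatable axioms are written to an `owl-axioms` tag.
    """
    values = []
    for line in obo_text.splitlines():
        if line.startswith("["):
            break  # first stanza reached, header is over
        if line.startswith(tag + ":"):
            values.append(line[len(tag) + 1:].strip())
    return values
```

An empty list from `header_tag_values(open("fbbt-edit.obo").read())` suggests that nothing fell outside OBO expressiveness during conversion.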

A core set of OBO tags (name, definition, comment) have a cardinality of 1: any term can have at most one of each. No such cardinality constraint exists for these when they are translated to OWL annotation property axioms. However, OWL to OBO translation will fail if you accidentally add multiple labels (OBO name), so these are easy to catch.
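Repeated definitions or comments are not described above as failing in the same way, so a scripted check on the OBO file can help. The following is a minimal Python sketch with hypothetical function names. It assumes a simple one-tag-per-line layout and the tag names `name`, `def` and `comment`, and it only looks at [Term] stanzas.

```python
from collections import Counter

# OBO tags that may appear at most once per term.
SINGLE_VALUED = ("name", "def", "comment")


def term_stanzas(obo_text):
    """Split OBO text into [Term] stanzas, each a list of (tag, value) pairs."""
    stanzas, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Start collecting only for [Term]; skip [Typedef] and others.
            current = [] if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
        elif current is not None and ":" in line and not line.startswith("!"):
            tag, value = line.split(":", 1)
            current.append((tag.strip(), value.strip()))
    return stanzas


def cardinality_violations(obo_text, tags=SINGLE_VALUED):
    """Return {term_id: {tag: count}} for single-valued tags used more than once."""
    violations = {}
    for stanza in term_stanzas(obo_text):
        term_id = next((v for t, v in stanza if t == "id"), "<no id>")
        counts = Counter(t for t, _ in stanza if t in tags)
        repeated = {t: n for t, n in counts.items() if n > 1}
        if repeated:
            violations[term_id] = repeated
    return violations
```

Running `cardinality_violations` on the converted file before committing gives a list of terms to fix in Protege.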

The values of annotation properties, for example of labels, should all be set to type 'string'. Unfortunately this differs from the default so you will need to set this for every new annotation property axiom.

Care needs to be taken in adding References. These are stored in annotations of annotations.

Limitations

There are no mechanisms for efficient bulk edits to the class hierarchy in Protege. Such edits are better done in OE or using scripted support.

Editing in Aquamacs

OBO format is very hackable and often the most efficient way to edit the ontology is to do so by hand.

Make sure you have OBO mode installed.

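;; Add to your Emacs init file.
;; Adjust this path to wherever obo-mode.el lives on your machine.
(load-file "/Users/djs93/elisp/obo-mode.el")
(autoload 'obo-mode "obo-mode" "Major mode for obo" t)
;; Open files ending in .obo or .OBO in obo-mode.
(add-to-list
 'auto-mode-alist
 '("\\.\\(obo\\|OBO\\)\\'" . obo-mode))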

Dangers:

As with all editing strategies, ID management is essential. If you accidentally add two terms with the same ID, or re-use an existing ID, any software that processes the file will treat the two terms as one, merging them. In order to manage IDs, you can either use a command-line tool for ID management (details TBA), or open OE with your ontology and use it simply as a source of new IDs.
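Until a command-line tool is documented, accidental duplicates can be caught with a few lines of script. This is a minimal Python sketch with hypothetical function names, not the tool referred to above. It assumes each term declares its ID on a line beginning `id:`.

```python
import re
from collections import Counter


def declared_ids(obo_text):
    """Return every ID declared on an `id:` line, in file order."""
    return re.findall(r"^id:\s*(\S+)", obo_text, flags=re.MULTILINE)


def duplicate_ids(obo_text):
    """Return IDs declared by more than one stanza in the same file."""
    counts = Counter(declared_ids(obo_text))
    return sorted(i for i, n in counts.items() if n > 1)


def clashing_ids(branch_text, master_text):
    """Return IDs declared in both texts, e.g. new branch terms vs. the master."""
    return sorted(set(declared_ids(branch_text)) & set(declared_ids(master_text)))
```

`duplicate_ids` checks a single file. `clashing_ids` is meant for comparing a file containing only your newly added terms against the master.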

If you obsolete a term - you will need to find and replace every usage in relationships by hand.
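Finding the usages can at least be scripted, even if the replacements are made by hand. The following is a minimal Python sketch with a hypothetical function name. It lists every line that mentions the ID other than `id:` lines, so it reports is_a and relationship lines as well as any other line that mentions the ID.

```python
import re


def find_usages(obo_text, term_id):
    """Return (line_number, line) pairs that mention term_id.

    Lines starting with `id:` are skipped so the term's own declaration
    is not reported. Line numbers are 1-based.
    """
    # (?!\d) stops a shorter ID matching as a prefix of a longer one.
    pattern = re.compile(re.escape(term_id) + r"(?!\d)")
    hits = []
    for number, line in enumerate(obo_text.splitlines(), start=1):
        if line.startswith("id:"):
            continue
        if pattern.search(line):
            hits.append((number, line))
    return hits
```

Each hit gives the line to visit in the editor, for example with `for n, line in find_usages(text, "FBbt:00000001"): print(n, line)`.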

Scripty solutions

The OWL-API is a bit heavyweight for scripting. A number of scripting environments have been built on top of it for easier programmatic interaction with OWL ontologies and knowledge bases.

These are listed as useful links for those who want to explore. They have not been used for editing the DAO except experimentally.

  • Strix - Scala based scripting on the OWL-API
  • Tawny-OWL - Complete programmatic environment for building and interacting with OWL ontologies and KBs in Clojure.
  • Brain - facade in Java over the OWL-API for working inside EL. Has ELK built in. Used underneath VFB.