Refactor dumps pipeline to remove bottleneck caused by connectomics data #24

dosumis opened this issue Jan 25, 2022 · 1 comment

dosumis commented Jan 25, 2022

To be documented:

  • What content needs to be added where in dumps to preserve current VFB functionality

Q. What axioms need to be present for automated classification of individuals?
A. (I think) KB content + ontologies is currently sufficient. We don't classify by neuron:neuron connectivity (@Clare72 is this true?), and classification by neuron:region connectivity is currently opaque to data-driven recording of connectivity (although that could change).

Q. What axioms need to be present for SPARQL generated neo labels?
A: All connectivity - but this step does not require reasoning (see the SPARQL sketch at the end of this comment).

Q. What axioms need to be present for reasoning generated neo labels?
A: KB + ontologies (I think)

Q. What A-box axioms need to be loaded into ELK to drive reasoning?
A. Currently only neuron-region connectivity is needed in addition to KB content + ontologies (although check API)

  • TODO Document how long each step takes.
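For reference, here is a minimal sketch of what the SPARQL-generated neo label step mentioned above could look like, written as a SPARQL Update against the triplestore. The `vfbex:` prefix, the connectivity property, and the label property are placeholders, not the IRIs the pipeline actually uses.

```sparql
# Sketch only: flag every individual that has at least one neuron:region
# connectivity edge so the annotation can be turned into a neo4j label at
# load time. All IRIs below are placeholders.
PREFIX vfbex: <http://example.org/vfb-placeholder/>

INSERT { ?neuron vfbex:neo_label "has_region_connectivity" }
WHERE  { ?neuron vfbex:connected_to_region ?region }
```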

dosumis commented Jan 25, 2022

Short term solutions:

Nico's suggestion - use named graphs to exclude connectomics data from the reasoning step

this will still require connectomics to be loaded into and dumped from the triplestore, which is slow(ish). A faster solution would be to load connectomics OWL files later in the pipeline. However, this approach would require quite a bit of re-engineering of the Makefile. This is because the SPARQL-based neo: label addition runs against the triplestore and is used in a patsubst to structure the Makefile and direct content to be loaded into the various endpoints.
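For illustration, a minimal sketch of the named-graph exclusion, assuming connectomics triples were loaded into a dedicated named graph (the graph IRI below is made up): the reasoning input is built from everything except that graph.

```sparql
# Sketch only: dump all named graphs except a dedicated connectomics graph
# as the input for the reasoning step. The graph IRI is a placeholder.
CONSTRUCT { ?s ?p ?o }
WHERE {
  GRAPH ?g { ?s ?p ?o }
  FILTER (?g != <http://example.org/vfb-placeholder/graph/connectomics>)
}
```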

Proposal for a clean quick-ish fix.

Dump named graphs separately in initial step.

graph 1: everything except connectomics
graph 2: neuron:neuron connectomics only
graph 3: neuron:region connectomics only
label_graphs: preferred_roots, deprecation_label, has_image, has_neuron_connectivity, has_region_connectivity

Reasoning can be done with graph 1 alone

Neo4j needs all 3 graphs + label graphs, as in the current build
OWLERY needs graph 1 + graph 3; (pre-)reasoning is not needed. It needs to retain the filter step that removes annotation axioms.
SOLR needs graph 1 + label graphs, as in the current build.

For this approach to work, we need to be able to distinguish neuron:neuron connectivity from neuron:region connectivity, which would require edits to the code here: https://github.com/VirtualFlyBrain/VFB_connectomics_import (needs @admclachlan) - this may take some time.
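To illustrate the distinction the import code would need to support, here is a hedged sketch of how the graph 3 content (neuron:region connectivity) could be selected once the target's type is recoverable from the data; every IRI here is a placeholder for whatever VFB_connectomics_import actually writes, and neuron:neuron edges would simply be the complement.

```sparql
# Sketch only: keep connectivity edges whose target is typed as a
# (subclass of a) region/neuropil class - these would go to graph 3;
# the remaining edges (neuron:neuron) would go to graph 2.
# All IRIs below are placeholders.
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX vfbex: <http://example.org/vfb-placeholder/>

CONSTRUCT { ?neuron vfbex:connected_to ?target }
WHERE {
  ?neuron vfbex:connected_to ?target .
  ?target rdf:type/rdfs:subClassOf* vfbex:synaptic_neuropil_domain .
}
```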

Super quick and dirty fix to get the pipeline running again:

Remove loading of connectomics data into the triplestore
Merge in all connectomics OWL files at the dumps steps - Owlery and Neo4j get all connectomics.
Add an additional script to add connectomics flag neo4j:labels directly in PDB & side-load these to SOLR.

We will do this. @Robbie1977 will edit the Makefile -> PR for us to review.

Experiment worth doing

Change the reasoning step and Owlery to use the Whelk reasoner.

hkir-dev added a commit that referenced this issue Jan 31, 2022