mdrishti/integrate_trydb_globi_enpkg

This repository contains scripts for integrating species and their trait data from TRY-db with taxonomic ids from gbif, otol, ncbi and wikidata. At the moment, data for only 25 traits have been downloaded from TRY-db. The trait metadata was then retrieved from the TRY-db website, along with a subset of enpkg. The retrieved csv files were converted to duckdb (advantage: an on-disk approach for SQL queries).

The TRY-db dataset with 25 traits has multiple columns ('data/trydbtemp_Ontop/trydbAll.csv'). These columns have a complex relationship, depicted in the diagram below.

[Diagram: TryDbAll_relationsExplained]

NOTE: the trydbAll table holds only a subset of the full TRY-db data.

I. Prerequisites:

  1. For smooth running of the scripts (R, shell), install R (version 4.1.2) and the following R packages:

a) For accessing taxonomic ids from wikidata, with mappings from gbif and ncbi (taxizedb) and from the open tree of life (rotl):

install.packages(c("taxizedb", "rotl"))

b) For data manipulation, install dplyr and dbplyr (a backend wrapper that converts dplyr code into SQL):

install.packages(c("dplyr", "dbplyr"))

c) For the on-disk approach of accessing and querying databases, install duckdb's API client for R:

install.packages("duckdb")

and duckdb itself (the command-line client is used in section III).

d) For building a Virtual Knowledge Graph (VKG), download the Ontop-cli/Ontop-protege bundle (version 5.1.2).

  2. For converting ontology files between formats (e.g. owl to ttl), install robot.
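Before running the scripts, it can help to confirm that the command-line prerequisites are reachable. The following is a minimal sketch, not part of the repository: `check_tools` is a hypothetical helper, and the executable names `Rscript`, `duckdb`, `ontop` and `robot` are assumptions about how the tools are installed on your PATH. It does not check the R packages.

```sh
#!/bin/sh
# Hypothetical helper: print each command that is missing from PATH,
# one per line, and return 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names; adjust them to match your installation.
if missing_list=$(check_tools Rscript duckdb ontop robot); then
    echo "all command-line prerequisites found"
else
    echo "missing from PATH:"
    echo "$missing_list"
fi
```

If Ontop was unpacked without adding it to PATH, the check reports `ontop` as missing even though the bundle is present, so treat the output as a hint rather than a verdict.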

II. Script to map the TRY plant species names to the gbif, ncbi, wikidata and otol ids

Rscript matchTaxonomy.R

To plot the distribution of the TRY-db species matched with ids from ott, ncbi, gbif and wikidata, run

Rscript distTaxonomicIds.R

[Plot: distributionDB]

III. Script to build a duckdb database for Ontop

duckdb data/Ontop_input.db -c "IMPORT DATABASE 'data/trydbtemp_Ontop'"

or

sh run_duckdb.sh
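DuckDB's `IMPORT DATABASE` replays a directory written by `EXPORT DATABASE`, which contains `schema.sql`, `load.sql` and the table data files. The sketch below is an illustration only and is not the content of `run_duckdb.sh`: `is_duckdb_export` is a hypothetical helper that checks for those two files, and the script also skips the import when the database file already exists, on the assumption that re-importing the same tables into it would fail.

```sh
#!/bin/sh
# Hypothetical helper: return 0 if the directory looks like a DuckDB
# export, i.e. it contains both schema.sql and load.sql.
is_duckdb_export() {
    [ -f "$1/schema.sql" ] && [ -f "$1/load.sql" ]
}

EXPORT_DIR="data/trydbtemp_Ontop"   # directory from the command above
DB_FILE="data/Ontop_input.db"       # database file from the command above

if ! is_duckdb_export "$EXPORT_DIR"; then
    echo "skipping: $EXPORT_DIR lacks schema.sql or load.sql"
elif [ -e "$DB_FILE" ]; then
    echo "skipping: $DB_FILE already exists; remove it to re-import"
elif ! command -v duckdb >/dev/null 2>&1; then
    echo "skipping: duckdb not found on PATH"
else
    duckdb "$DB_FILE" -c "IMPORT DATABASE '$EXPORT_DIR'"
fi
```

Run it from the repository root so that the relative paths resolve.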

The relations between the tables are depicted in this diagram.

[Diagram: TableRelations_ER_diagram]

IV. Script to build the knowledge graph in Ontop

Set the path in data/Ontop_config/duckdb.properties, then run

sh run_ontop.sh
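For orientation, an Ontop properties file for DuckDB typically carries a JDBC URL and driver class. The sketch below assumes that "the path" to set is the location of the database built in section III inside `jdbc.url`; the repository's actual `duckdb.properties` may contain other keys. `duckdb_properties` is a hypothetical helper, and the output goes to a scratch file so that the repository's own file is left untouched.

```sh
#!/bin/sh
# Hypothetical helper: print a minimal Ontop properties file for a
# DuckDB database located at the path given as $1.
# jdbc.url and jdbc.driver are standard Ontop keys; the repository's
# real file may set additional ones.
duckdb_properties() {
    printf 'jdbc.url=jdbc:duckdb:%s\n' "$1"
    printf 'jdbc.driver=org.duckdb.DuckDBDriver\n'
}

# Assumption: the database from section III, referenced by absolute path.
DB_PATH="$(pwd)/data/Ontop_input.db"

# Written to a scratch file so the repository's own file is not overwritten.
OUT_FILE="${TMPDIR:-/tmp}/duckdb.properties.example"
duckdb_properties "$DB_PATH" > "$OUT_FILE"
cat "$OUT_FILE"
```

Compare the printed lines with data/Ontop_config/duckdb.properties and copy over only the path portion if the rest already matches.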

V. Disclaimer

The mappings in the Ontop virtual knowledge graph are faulty at the moment, so SPARQL queries do not yet return correct results. Work in progress...

About

Integrate TRY db and GLOBI db data with minimal subset of enpkg
