Clinical Hypergraph Query

Past due by over 4 years 0% complete

Implement a unified Spark backend pipeline, based on the FHIR PIT pipeline tool, for querying the knowledge graph; ETL of clinical, environmental, and socio-economic data; and machine learning model training and serving. The unified Spark pipeline allows us to provide an end-to-end workflow:

  1. Start from raw clinical, environmental, and socio-economic data and
    curated knowledge graph data
  2. Extract features for model training
  3. Serve model output via a query
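
The three steps above can be sketched in plain Python as a minimal illustration. This is not the FHIR PIT or Spark implementation: the function names (`etl`, `extract_features`, `train`, `serve`), the record fields, the PM2.5 threshold, and the sample records are all hypothetical, and the "model" is just a per-group rate standing in for a trained estimator.

```python
from collections import defaultdict

def etl(clinical, environmental):
    """Step 1: join raw records from two sources on a shared patient_id."""
    env_by_patient = {r["patient_id"]: r for r in environmental}
    return [{**c, **env_by_patient.get(c["patient_id"], {})} for c in clinical]

def extract_features(records):
    """Step 2: derive binned, model-ready features (bins are assumptions)."""
    return [
        {
            "age_bin": "0-17" if r["age"] < 18 else "18+",
            "high_pm25": r.get("pm25", 0.0) > 12.0,  # hypothetical threshold
            "asthma": r["asthma"],
        }
        for r in records
    ]

def train(features):
    """Toy 'model': asthma rate within each (age_bin, high_pm25) group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for f in features:
        key = (f["age_bin"], f["high_pm25"])
        totals[key] += 1
        positives[key] += int(f["asthma"])
    return {key: positives[key] / totals[key] for key in totals}

def serve(model, age_bin, high_pm25):
    """Step 3: answer a query from the model; None if the group was never seen."""
    return model.get((age_bin, high_pm25))

# Hypothetical raw inputs
clinical = [
    {"patient_id": 1, "age": 10, "asthma": True},
    {"patient_id": 2, "age": 12, "asthma": False},
    {"patient_id": 3, "age": 40, "asthma": False},
]
environmental = [
    {"patient_id": 1, "pm25": 15.0},
    {"patient_id": 2, "pm25": 14.0},
    {"patient_id": 3, "pm25": 5.0},
]

model = train(extract_features(etl(clinical, environmental)))
print(serve(model, "0-17", True))
```

In a Spark setting, each function would presumably map to a DataFrame transformation, with the join in `etl` becoming a distributed join.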

The machine learning model will provide a way to transform relational tables into n-ary predicates by learning a function estimator of the joint distribution of the rows in the table, which can be used to incorporate data from ICEES and other data sources such as COHD and clinical profile. This in turn allows us to encode contextual information about our knowledge, such as cohort definition, in a uniform manner. For example, ICEES's KGS API and COHD's KGS API currently depend on the ad hoc "query_options" field for contextual information such as cohort definition and cohort selection. Even though the query graph and the generated knowledge graph are interoperable between the two services, TranQL had to include service-specific code to handle "query_options", which is not interoperable. With n-ary predicates, we can further generalize the representation of contextual information and handle it in a uniform manner in TranQL. For example, to say that features B and C are associated for patients of age A, we can simply denote the relation as P(A, B, C), because age is already a feature. This can be generalized to other clinical data sets for which patient-level or visit-level data is available.
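
As a rough sketch of the idea of P(A, B, C), the snippet below builds an n-ary predicate from a small relational table. It substitutes a plain empirical frequency count for the learned joint-distribution estimator described above, so it only illustrates the interface, not the model. The helper `make_predicate`, the column names, and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]

def make_predicate(rows: List[Row], columns: Sequence[str]) -> Callable[..., float]:
    """Return P(*values): the empirical joint probability that `columns`
    take `values` in `rows` (a stand-in for a learned estimator)."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    total = len(rows)

    def predicate(*values: Hashable) -> float:
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} arguments, got {len(values)}")
        # Counter returns 0 for combinations that never occur in the table
        return counts[tuple(values)] / total if total else 0.0

    return predicate

# Hypothetical patient-level table: age is just another column (feature A)
rows = [
    {"age_bin": "0-17", "asthma": True, "high_pm25": True},
    {"age_bin": "0-17", "asthma": True, "high_pm25": True},
    {"age_bin": "0-17", "asthma": False, "high_pm25": False},
    {"age_bin": "18-64", "asthma": False, "high_pm25": True},
]

P = make_predicate(rows, ["age_bin", "asthma", "high_pm25"])
print(P("0-17", True, True))  # 2 of 4 rows match -> 0.5
```

Because the cohort-defining attribute (here `age_bin`) is an ordinary argument of the predicate, no separate "query_options"-style field is needed to carry it.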
