Niema-Lab/SEPIA

SEPIA - Simulation-based Evaluation of PrIoritization Algorithms

SEPIA is a framework for comparing the accuracies of algorithms that prioritize individuals by risk of transmitting HIV (Human Immunodeficiency Virus).

Table of Contents

  1. Software Dependencies
  2. Installation Guide
  3. Methods
  4. Example Execution

Software Dependencies

SEPIA is written in Python 3 and requires the following dependencies:

sudo apt-get update
sudo apt-get install python3-pip
pip3 install numpy
pip3 install scipy
pip3 install matplotlib
pip3 install seaborn

efficacy_functions.py also imports the modules shown below. Of these, gzip, sys, and itertools are part of the Python standard library, so nothing beyond the packages installed above is needed:

from gzip import open as gopen
from sys import stderr
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from itertools import repeat

Installation Guide

SEPIA is designed to be run from a Bash command line. To install SEPIA, clone the repository to the desired location:

git clone https://github.com/Moshiri-Lab/SEPIA.git

Methods

Functions

SEPIA.py matches each individual in the user's ordering with the number of people that individual infected during a specified period of time. It then computes the Kendall Tau-b correlation coefficient between the user's ordering and the optimal ordering.

usage: SEPIA.py [-h] -m METRIC [-i INPUT] [-t TRANSMISSIONHIST]
                [-c CONTACTNET] -s START [-e END] [-v]

File takes in a prioritization ordering and runs through the SEPIA workflow to
output the Kendall Tau B correlation coefficient between their ordering and
the most optimal ordering, as generated by the chosen metric. If verbose flag
is specified, intermediate data in the process can be outputted to stderr.

  -h, --help            show this help message and exit
  -m METRIC, --metric METRIC
                        Metric of prioritization (1-6) (default: None)
  -i INPUT, --input INPUT
                        Input File - User's Ordering (default: stdin)
  -t TRANSMISSIONHIST, --transmissionHist TRANSMISSIONHIST
                        Transmission History File (default: )
  -c CONTACTNET, --contactNet CONTACTNET
                        Contact History File (default: )
  -s START, --start START
                        Time Start (default: None)
  -e END, --end END     Time End (default: inf)
  -v, --verbose         Print Intermediate List with Individuals Matched to
                        Counts (default: False)
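
To make the comparison step concrete, here is a minimal pure-Python sketch of Kendall's Tau-b between a user's ordering and per-individual counts. It is not SEPIA's own code: the function name `kendall_tau_b`, the example ordering, and the example counts are hypothetical, and the convention of comparing each individual's position against their negated count is an assumption. `scipy.stats.kendalltau` computes the Tau-b variant by default and would normally be used in place of the hand-written loop.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's Tau-b for two equal-length sequences, with tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied in both sequences: ignored by Tau-b
        elif dx == 0:
            ties_x += 1         # tied only in x
        elif dy == 0:
            ties_y += 1         # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Hypothetical input: a user's ordering (highest priority first) and counts.
user_order = ["A", "D", "C", "E", "B"]
counts = {"A": 4, "D": 2, "C": 1, "E": 1, "B": 0}

# Assumed convention: position 0 is the highest priority, so compare positions
# against negated counts; perfect agreement then yields a positive coefficient.
positions = list(range(len(user_order)))
negated_counts = [-counts[person] for person in user_order]
tau = kendall_tau_b(positions, negated_counts)
print(tau)
```

Because C and E share the same count, that pair is tied in one sequence only, which is the case Tau-b corrects for in its denominator.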

efficacyFunctions.py defines several functions used in the scripts above.

Metrics

We have implemented six distinct metrics to generate optimal orderings, with each defining a unique way of calculating the count values of individuals such that individuals with higher counts will have higher priority in the ordering.

1. Direct Transmissions

In this metric, each individual's count is calculated as the number of individuals that they have directly (one edge away) transmitted HIV to.

The below figure illustrates an example transmission network, with arrows indicating a transmission from one person (node) to another:

In this example, Person A has four outgoing edges, indicating that Person A transmitted HIV to four people and has a direct transmission count of 4. Similarly, Person B has no outgoing edges, so Person B's count is 0.
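
As an illustration of this count, the sketch below tallies outgoing edges per individual within a time window. The `(source, target, time)` tuple layout, the transmission times, the inclusive window bounds, and the function name `direct_counts` are all assumptions made for the example, not SEPIA's actual file format or API.

```python
from collections import Counter

# Hypothetical transmission history as (source, target, time) tuples.
# The edges follow the example in the text; the times are invented.
transmissions = [
    ("A", "B", 1.0), ("A", "C", 2.0), ("A", "D", 2.5), ("A", "E", 4.0),
    ("D", "G", 3.0), ("D", "H", 5.0),
]

def direct_counts(transmissions, start, end=float("inf")):
    """Count outgoing transmissions per source with start <= time <= end."""
    counts = Counter()
    for source, _target, time in transmissions:
        if start <= time <= end:
            counts[source] += 1
    return counts

counts = direct_counts(transmissions, start=0.0)
# A Counter returns 0 for individuals with no outgoing edges, such as B.
print(counts["A"], counts["B"])
```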

2. Best Fit Graph

In this metric, each individual's count is calculated as the slope of a best-fit line through a step graph of the individual's outgoing transmissions, with the specified time period on the horizontal axis. The line of best fit starts at the individual's first transmission of HIV to someone else; this prioritizes individuals who transmit HIV to more people over a short time period, as they will have steeper slopes.

This metric reflects the idea that individuals who transmitted HIV to others more recently should have higher priority than individuals whose transmissions occurred longer ago.

The following figure shows the resulting lines of best-fit for two cases:

The graph on the left represents a case in which the individual started transmitting HIV more recently, whereas the graph on the right represents a case in which the individual had multiple outgoing transmissions early in the time period but stopped towards the middle. This design thus gives higher priority to the individual in the left graph, who has multiple recent outgoing transmissions and therefore a greater slope.
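
One possible reading of this metric is sketched below, under several assumptions that the text does not confirm: the step graph is taken to be the cumulative number of outgoing transmissions, it is sampled at each transmission time plus once at the end of the window, and an ordinary least-squares line is fitted to those points. The function name `best_fit_slope` is hypothetical, and `end` must be finite.

```python
def best_fit_slope(times, start, end):
    """Least-squares slope of cumulative transmissions over [start, end].

    Assumed sampling: one point per transmission (time, running total) plus
    one point at the end of the window, so the fit starts at the first
    transmission and flattens if the individual stops transmitting.
    """
    times = sorted(t for t in times if start <= t <= end)
    if not times:
        return 0.0                      # no transmissions in the window
    xs = times + [end]
    ys = list(range(1, len(times) + 1)) + [len(times)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0                      # all points share one time value
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical cases mirroring the two graphs described in the text.
recent = best_fit_slope([8, 9, 10], start=0, end=10)   # transmissions late
early = best_fit_slope([1, 2, 3], start=0, end=10)     # transmissions early
print(recent > early)
```

Under these assumptions the individual with recent transmissions gets the steeper slope, matching the behavior described for the left graph.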

3. Indirect Transmissions

This metric extends Metric 1 in order to analyze an individual's greater impact on the community.

Each individual's count is calculated as the cumulative number of individuals they indirectly (more than 1 edge away) transmitted HIV to for up to any given number of degrees away.

For instance, in the example transmission network from (1), Person A (highlighted in red) directly transmitted HIV to Persons B, C, D, and E (highlighted in yellow), who then transmitted HIV to Persons F, G, H, and I (highlighted in blue), who then transmitted HIV to Persons J, K, L, M, N, and O (highlighted in green). Thus, given the number of degrees away = 2 (2 edges away in the figure), Person A's indirect transmissions are F, G, H, and I, which sums to a count of 4. Similarly, given the number of degrees away = 3, Person A's indirect transmissions are F, G, H, I, J, K, L, M, N, and O, for a count of 10.
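
The worked example can be reproduced with a breadth-first search that groups individuals by their distance from the source. This is only a sketch: the function names are hypothetical, and because the figure is not reproduced here, the edges beyond those stated in the text (which of C and E infected F and I, and who infected J through O) are assumed.

```python
from collections import defaultdict

edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),  # stated in the text
    ("D", "G"), ("D", "H"),                          # stated in the text
    ("C", "F"), ("E", "I"),                          # assumed assignment
    ("F", "J"), ("F", "K"), ("G", "L"),              # assumed assignment
    ("H", "M"), ("I", "N"), ("I", "O"),              # assumed assignment
]

def people_by_degree(edges, source, max_degree):
    """Return a list whose entry i holds the people i + 1 edges from source."""
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
    levels, frontier, seen = [], [source], {source}
    for _ in range(max_degree):
        next_frontier = []
        for u in frontier:
            for v in children[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        levels.append(next_frontier)
        frontier = next_frontier
    return levels

def indirect_count(edges, source, max_degree):
    """Count people between 2 and max_degree edges away from source."""
    levels = people_by_degree(edges, source, max_degree)
    return sum(len(level) for level in levels[1:])  # skip direct (degree 1)

print(indirect_count(edges, "A", 2), indirect_count(edges, "A", 3))
```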

4. Total Transmissions

This metric merges Metrics 1 and 3 to take into account each individual's direct and indirect transmissions.

Each individual's count is calculated as the cumulative number of individuals that they have directly (1 edge away) and indirectly (2+ edges away) transmitted HIV to for up to any given number of degrees away.

In the example network from (1), at 2 degrees away, Person A (highlighted in red) has 4 direct transmissions (to Persons B, C, D, and E) and 4 indirect transmissions (to Persons F, G, H, and I) for a total count of 8. Similarly, at 3 degrees away, Person A has 4 direct transmissions and 10 indirect transmissions (Persons F, G, H, I, J, K, L, M, N, O) for a total count of 14.
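
Since each individual is infected at most once, a transmission network can be treated as a forest, and the total count can then be written as a short depth-limited recursion. The sketch below relies on that assumption; the name `total_count` and the adjacency dictionary are hypothetical, and the edges not stated in the text are assumed.

```python
# Hypothetical adjacency map: person -> people they directly infected.
# Edges from A and D follow the text; the remaining assignments are assumed.
children = {
    "A": ["B", "C", "D", "E"],
    "C": ["F"], "D": ["G", "H"], "E": ["I"],
    "F": ["J", "K"], "G": ["L"], "H": ["M"], "I": ["N", "O"],
}

def total_count(children, person, max_degree):
    """Count everyone within max_degree edges downstream of person.

    Assumes the network is a forest (no one is infected twice), so no
    visited set is needed.
    """
    if max_degree == 0:
        return 0
    return sum(
        1 + total_count(children, child, max_degree - 1)
        for child in children.get(person, ())
    )

print(total_count(children, "A", 2), total_count(children, "A", 3))
```

With a maximum degree of 1 the same function reduces to the direct transmission count of Metric 1.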

5. Number of Contacts

This metric measures an individual's priority based on their number of contacts, with an edge in a contact network existing between any two individuals who have a relationship through which HIV may be transmitted.

The figure below illustrates an example contact network corresponding to the transmission network in previous examples:

In this example, Person A has undirected edges between themself and Persons B, C, D, and E, so Person A has a count of 4. Similarly, Person B has undirected edges between themself and Persons A, R, and S, so Person B has a count of 3.
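
In graph terms this count is the degree of each node in the undirected contact network. A minimal sketch follows, using only the contact edges named in the text; the edge-list layout and the name `contact_counts` are assumptions for illustration.

```python
from collections import defaultdict

# Undirected contact edges named in the text for Persons A and B.
contact_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "R"), ("B", "S"),
]

def contact_counts(contact_edges):
    """Return each person's number of distinct contacts (node degree)."""
    neighbors = defaultdict(set)
    for u, v in contact_edges:
        neighbors[u].add(v)   # an undirected edge counts for both endpoints
        neighbors[v].add(u)
    return {person: len(others) for person, others in neighbors.items()}

counts = contact_counts(contact_edges)
print(counts["A"], counts["B"])
```

Storing neighbors in sets means a contact listed twice, or in both directions, is still counted once.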

6. Number of Contacts and Transmissions

This metric combines Metrics 1 and 5 in order to take into account each individual's number of direct transmissions and number of contacts.

In the example transmission and contact networks from (1) and (5), Person D has direct transmissions to Persons G and H, and is in contact with Persons A, G, H, and P, so Person D has a total count of 6.
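
The sum for Person D, and the step from counts to a prioritization ordering, can be sketched as follows. The data covers only the edges named in the text; the function name `combined_counts`, the requirement that each contact edge is listed once, and the alphabetical tie-break are assumptions for this example.

```python
from collections import Counter

# Directed transmissions and undirected contacts named in the text.
transmissions = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("D", "G"), ("D", "H"),
]
contact_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "R"), ("B", "S"),
    ("D", "G"), ("D", "H"), ("D", "P"),
]

def combined_counts(transmissions, contact_edges):
    """Direct transmissions plus contacts for each person.

    Assumes each undirected contact edge appears exactly once.
    """
    counts = Counter(source for source, _target in transmissions)
    for u, v in contact_edges:
        counts[u] += 1
        counts[v] += 1
    return counts

counts = combined_counts(transmissions, contact_edges)

# Higher counts get higher priority; ties are broken alphabetically here.
ordering = sorted(counts, key=lambda person: (-counts[person], person))
print(counts["D"], ordering[:3])
```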
