Remove nuisance variables #2

Merged
merged 70 commits into from
Dec 4, 2023
Changes from 15 commits
Commits
ae8f273
Adding logic and tests to discover and mark latent nodes
pnavada Sep 28, 2023
739a59a
Merge branch 'main' into simplify
cthoyt Sep 29, 2023
9587a24
Update
cthoyt Sep 29, 2023
0052d55
added initial documentations for discover_latent_nodes
srtaheri Oct 1, 2023
53c858e
Add extra notes
cthoyt Oct 1, 2023
1c22b56
Update discover_latent_nodes.py
cthoyt Oct 1, 2023
d747117
Update discover_latent_nodes.py
cthoyt Oct 1, 2023
bae81a6
Merge branch 'main' into simplify
cthoyt Oct 1, 2023
5f7b4d1
Update discover_latent_nodes.py
cthoyt Oct 1, 2023
8c1ca0c
Adding examples from case studies in the paper, fixing tests cases, s…
pnavada Oct 7, 2023
0a00989
Adding test case for multiple outcomes
pnavada Oct 8, 2023
3c46a8f
added necessary import and description to docstring
srtaheri Oct 9, 2023
8e601c8
reformatting
pnavada Oct 9, 2023
e57377c
reformatting code block
pnavada Oct 9, 2023
a77aba8
Update discover_latent_nodes.py
cthoyt Oct 10, 2023
efa2adb
small change
srtaheri Oct 10, 2023
0976774
fixed the code typo in the docstring
srtaheri Oct 10, 2023
77e10a9
Removing coli and adding test cases for a simple graph
pnavada Oct 10, 2023
b0f0f43
completed the module by calling the simplification function and conve…
srtaheri Oct 11, 2023
50cb458
Merge branch 'simplify' of https://github.com/y0-causal-inference/eli…
srtaheri Oct 11, 2023
07dac3c
we don't need mark_latent function anymore because y0 is able to mark…
srtaheri Oct 11, 2023
f07a74b
fixed a small error in doc string code
srtaheri Oct 11, 2023
4daa063
Add high level function
cthoyt Oct 12, 2023
a06422a
Adding the fourth simplification rule, a test to verify simplificatio…
pnavada Oct 16, 2023
e310110
Adding more test cases
pnavada Oct 17, 2023
83ab084
refactoring and adding more test cases for latent simplification
pnavada Oct 23, 2023
2aa7812
adding the framework for testing finding nuisance variables and remov…
pnavada Oct 23, 2023
d876812
Fixing the bug wrt marking nuisance variables as latent , adding test…
pnavada Oct 24, 2023
aac0145
Fixed the failing test cases for simplifying latent dag
pnavada Oct 24, 2023
1a44dc9
Added documentation for remove_latent_variables
pnavada Oct 24, 2023
477008f
Adding documentation for remove_latent_variables
pnavada Oct 24, 2023
25f804b
Adding documentation for test_find_nuisance_variables_for_simple_graph
pnavada Oct 24, 2023
b8e0bf9
updated doc string and function docs. change one function output to a…
srtaheri Oct 25, 2023
8815956
updated the name of a function test
srtaheri Oct 25, 2023
a08be29
small modification
srtaheri Oct 25, 2023
5a493f9
Update init.py
pnavada Oct 25, 2023
e629ca0
Add back all-important high level function
cthoyt Oct 25, 2023
d19fc2a
Update discover_latent_nodes.py
cthoyt Oct 25, 2023
7f0a82d
correct name inconsistently
srtaheri Oct 25, 2023
9e119cf
changing back to remove_latent_variables from mark_nuisance_variables…
pnavada Oct 25, 2023
0fed850
modified docstring according to the suggested todo
srtaheri Oct 25, 2023
7ff08ea
documenting params and return type for remove_latent_variables
pnavada Oct 25, 2023
04aae4d
Merge branch 'simplify' of https://github.com/y0-causal-inference/eli…
pnavada Oct 25, 2023
4fb6846
Adding back Charlie's comment as it was accidentally removed
pnavada Oct 25, 2023
9aa35e7
Update discover_latent_nodes.py
pnavada Oct 25, 2023
c6cc644
module docstring update for discover_latent_nodes
pnavada Oct 25, 2023
bf7c8d1
added the new workflow
srtaheri Oct 25, 2023
dcc6ca5
Merge branch 'simplify' of https://github.com/y0-causal-inference/eli…
srtaheri Oct 25, 2023
a1b968c
Resolving issue with generating docs
pnavada Oct 27, 2023
6deeafc
Adding newline to end of file
pnavada Oct 27, 2023
62f07e2
Use upstreamed implementation
cthoyt Nov 6, 2023
3259768
Add TODOs for examples
cthoyt Nov 6, 2023
b341d3f
Update docs and add todos
cthoyt Nov 6, 2023
3a62c90
Merge branch 'main' into simplify
cthoyt Nov 6, 2023
10dabd2
Update docs
cthoyt Nov 7, 2023
ffe3e34
Adding minimal examples for evans rules
pnavada Nov 14, 2023
e2686a4
updated the docstring
srtaheri Nov 23, 2023
902cc8d
small change
srtaheri Nov 25, 2023
fa43d27
fixing image paths
pnavada Nov 26, 2023
e0fed15
fixing image paths
pnavada Nov 26, 2023
63bd4d6
fix : duplicate object description of eliater.frontdoor_backdoor
pnavada Nov 26, 2023
87dd5be
fix - citation not found
pnavada Nov 26, 2023
cdbcc36
Update
cthoyt Dec 4, 2023
6f0b238
Revert rename
cthoyt Dec 4, 2023
779f0d2
Rename
cthoyt Dec 4, 2023
eed8274
Add TODOs and update the docs with correct examples
cthoyt Dec 4, 2023
e5342c0
Update discover_latent_nodes.py
cthoyt Dec 4, 2023
aaab817
Update discover_latent_nodes.py
cthoyt Dec 4, 2023
4342d87
Update MANIFEST.in
cthoyt Dec 4, 2023
e3c0cfe
Update discover_latent_nodes.py
cthoyt Dec 4, 2023
6 changes: 6 additions & 0 deletions docs/source/usage.rst
@@ -3,3 +3,9 @@ Frontdoor-Backdoor Example
.. automodapi:: eliater.frontdoor_backdoor
    :no-heading:
    :include-all-objects:

Discover Latent Nodes
=====================
.. automodapi:: eliater.discover_latent_nodes
    :no-heading:
    :include-all-objects:
145 changes: 145 additions & 0 deletions src/eliater/discover_latent_nodes.py
@@ -0,0 +1,145 @@
"""This module contains methods to discover and mark nuisance nodes in a network.

Given an acyclic directed mixed graph (ADMG), along with the treatment and the outcome
of interest, certain observable variables can be regarded as nuisance variables. They
have no impact on the outcome and should not be involved in the estimation of the
treatment's effect on the outcome. Specifically, nuisance variables are the descendants
of the variables on causal paths that are not themselves ancestors of the outcome.
A causal path, in this context, is a directed path that starts at the treatment and ends
at the outcome, such that all the arrows on the path point in the same direction.
This module is designed to identify these variables and produce a new ADMG in which they are
designated as latent.

This process enables us to concentrate on the variables needed to estimate the
treatment's impact on the outcome. This focus can result in more precise estimates with
reduced variance and bias.

Here is an example of an ADMG where X is the treatment and Y is the outcome. This ADMG has
only one causal path from X to Y, which is X -> M1 -> M2 -> Y. The descendants of these variables
that are not ancestors of the outcome are R1, R2, and R3. The goal of this example is to identify
these nuisance variables and mark them as latent.

.. code-block:: python

    import eliater
    import y0
    from y0.algorithm.identify import identify_outcomes
    from y0.dsl import Variable, X, Y
    from y0.graph import NxMixedGraph
    from eliater.discover_latent_nodes import mark_latent

    M1 = Variable("M1")
    M2 = Variable("M2")
    R1 = Variable("R1")
    R2 = Variable("R2")
    R3 = Variable("R3")

    graph = NxMixedGraph.from_edges(
        directed=[
            (X, M1),
            (M1, M2),
            (M2, Y),
            (M1, R1),
            (R1, R2),
            (R2, R3),
            (Y, R3),
        ],
        undirected=[
            (X, Y),
        ],
    )

    # mark_latent takes the keyword arguments ``treatments`` and ``outcomes``
    new_graph = mark_latent(graph, treatments=X, outcomes=Y)

    # FIXME some unknown magic happens

    estimand = identify_outcomes(new_graph, treatments=X, outcomes=Y)


The new graph now has R1, R2, and R3 marked as latent, so these variables are excluded
from the estimation of the causal query. Excluding them is intended to decrease the
estimation variance and improve the accuracy of the query estimate.

.. todo::

    I still don't see what happens after you mark nodes as latent. This needs to explicitly show all of the steps
    required to get to identification.
"""

from typing import Set, Union

import networkx as nx

from y0.dsl import Variable
from y0.graph import NxMixedGraph, set_latent

__all__ = [
    "find_all_nodes_in_causal_paths",
    "mark_latent",
]


def find_all_nodes_in_causal_paths(
    graph: NxMixedGraph,
    treatments: Union[Variable, Set[Variable]],
    outcomes: Union[Variable, Set[Variable]],
) -> Set[Variable]:
    """Find all the nodes in causal paths from treatments to outcomes.

    :param graph: an NxMixedGraph
    :param treatments: a treatment or a set of treatments
    :param outcomes: an outcome or a set of outcomes
    :return: the nodes on all causal paths from treatments to outcomes.
    """
    # Normalize single variables to one-element sets
    if isinstance(treatments, Variable):
        treatments = {treatments}
    if isinstance(outcomes, Variable):
        outcomes = {outcomes}
    nodes = set()
    for treatment in treatments:
        for outcome in outcomes:
            # Only directed edges are considered, so every path found is a causal path
            for causal_path in nx.all_simple_paths(graph.directed, treatment, outcome):
                for node in causal_path:
                    nodes.add(node)
    return nodes


def mark_latent(
    graph: NxMixedGraph,
    treatments: Union[Variable, Set[Variable]],
    outcomes: Union[Variable, Set[Variable]],
) -> NxMixedGraph:
    """Mark the latent nodes in the graph.

    Marks the descendants of nodes in all causal paths that are not ancestors of the outcome variables as latent
    nodes.

    :param graph: an NxMixedGraph
    :param treatments: a treatment or a set of treatments
    :param outcomes: an outcome or a set of outcomes
    :returns: The modified graph marked with latent nodes.
    """
    if isinstance(treatments, Variable):
        treatments = {treatments}
    if isinstance(outcomes, Variable):
        outcomes = {outcomes}
    # Find the nodes on causal paths
    nodes_on_causal_paths = find_all_nodes_in_causal_paths(
        graph=graph, treatments=treatments, outcomes=outcomes
    )
    # Find the descendants for the nodes on the causal paths
    descendants_of_nodes_on_causal_paths = graph.descendants_inclusive(nodes_on_causal_paths)
    # Find the ancestors of outcome variables
    ancestors_of_outcome = graph.ancestors_inclusive(outcomes)
    # Descendants of nodes on causal paths that are not ancestors of outcome variables
    descendants_not_ancestors = descendants_of_nodes_on_causal_paths.difference(
        ancestors_of_outcome
    )
    # Remove treatments and outcomes
    descendants_not_ancestors = descendants_not_ancestors.difference(treatments.union(outcomes))
    # Mark nodes as latent
    # FIXME this operation is currently meaningless in ADMGs, it's supposed to be used on graphs
    #  going through the Latent DAG workflow
    if descendants_not_ancestors:
        set_latent(graph.directed, descendants_not_ancestors)
    return graph
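The set arithmetic in `mark_latent` can be checked by hand on the toy graph from the module docstring. The following is a minimal sketch using only the standard library, not the y0 or eliater API: the helper names (`simple_paths`, `reachable`, `reverse`, `nuisance_variables`) are hypothetical, nodes are plain strings, and the bidirected edge X <-> Y is left out on the assumption that only directed edges matter for this rule.

```python
from typing import Dict, Iterator, List, Optional, Set

# Directed edges of the toy graph from the module docstring,
# written as parent -> children (assumption: plain strings stand in for Variables).
DIRECTED: Dict[str, List[str]] = {
    "X": ["M1"],
    "M1": ["M2", "R1"],
    "M2": ["Y"],
    "R1": ["R2"],
    "R2": ["R3"],
    "Y": ["R3"],
    "R3": [],
}


def simple_paths(
    adj: Dict[str, List[str]], source: str, target: str, _prefix: Optional[List[str]] = None
) -> Iterator[List[str]]:
    """Yield every simple directed path from source to target (depth-first)."""
    prefix = (_prefix or []) + [source]
    if source == target:
        yield prefix
        return
    for child in adj.get(source, []):
        if child not in prefix:  # never revisit a node on the current path
            yield from simple_paths(adj, child, target, prefix)


def reachable(adj: Dict[str, List[str]], starts: Set[str]) -> Set[str]:
    """Return the start nodes plus everything reachable from them."""
    seen = set(starts)
    stack = list(starts)
    while stack:
        node = stack.pop()
        for child in adj.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def reverse(adj: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip every edge, so reachability in the result gives ancestors."""
    rev: Dict[str, List[str]] = {node: [] for node in adj}
    for parent, children in adj.items():
        for child in children:
            rev.setdefault(child, []).append(parent)
    return rev


def nuisance_variables(adj: Dict[str, List[str]], treatment: str, outcome: str) -> Set[str]:
    """Descendants of causal-path nodes that are not ancestors of the outcome."""
    on_paths = {node for path in simple_paths(adj, treatment, outcome) for node in path}
    descendants = reachable(adj, on_paths)
    ancestors = reachable(reverse(adj), {outcome})
    return descendants - ancestors - {treatment, outcome}


print(sorted(nuisance_variables(DIRECTED, "X", "Y")))  # ['R1', 'R2', 'R3']
```

The only causal path is X -> M1 -> M2 -> Y, its inclusive descendants add R1, R2, and R3, and none of those three is an ancestor of Y, so they are the nodes that would be marked latent.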
101 changes: 101 additions & 0 deletions src/eliater/examples/ecoli.py
@@ -0,0 +1,101 @@
"""Examples for E. coli K-12."""

from y0.graph import NxMixedGraph

graph = NxMixedGraph.from_str_adj(
    directed={
        "appY": ["appA", "appB", "appX", "hyaA", "hyaB", "hyaF"],
        "arcA": [
            "rpoS",
            "fnr",
            "dpiA",
            "aceE",
            "appY",
            "citX",
            "cydD",
            "dpiB",
            "gcvB",
            "hyaA",
            "hyaB",
            "hyaF",
            "mdh",
        ],
        "btsR": ["mdh"],
        "chiX": ["dpiA", "dpiB"],
        "citX": ["dpiB"],
        "cra": ["cyoA"],
        "crp": [
            "dpiA",
            "cirA",
            "dcuR",
            "oxyR",
            "fis",
            "fur",
            "aceE",
            "citX",
            "cyoA",
            "dpiB",
            "exuT",
            "gadX",
            "mdh",
            "srIR",
        ],
        "cspA": ["hns"],
        "dcuR": ["dpiA", "dpiB"],
        "dpiA": ["appY", "citC", "citD", "citX", "dpiB", "exuT", "mdh"],
        "dsrA": ["hns", "lrp", "rpoS"],
        "fis": ["cyoA", "gadX", "hns", "hyaA", "hyaB", "hyaF"],
        "fnr": [
            "dcuR",
            "dpiA",
            "narL",
            "aceE",
            "amtB",
            "aspC",
            "citX",
            "cydD",
            "cyoA",
            "dpiB",
            "gadX",
            "hcp",
        ],
        "fur": ["fnr", "amtB", "aspC", "cirA", "cyoA"],
        "gadX": ["amtB", "hns"],
        "gcvB": ["lrp", "oxyR", "ydeO"],
        "hns": ["appY", "srIR", "ydeO", "yjjQ"],
        "ihfA": ["crp", "fnr", "ihfB"],
        "ihfB": ["fnr"],
        "iscR": ["hyaA", "hyaB", "appX"],
        "lrp": ["soxS", "aspC"],
        "modE": ["narL"],
        "narL": ["citX", "cydD", "dpiB", "hcp", "hyaA", "hyaB", "hyaF", "dcuR", "dpiA"],
        "narP": ["hyaA", "hyaB", "hyaF"],
        "oxyR": ["fur", "hcp"],
        "phoB": ["cra"],
        "rpoD": [
            "arcA",
            "cirA",
            "crp",
            "dcuR",
            "fis",
            "fnr",
            "fur",
            "ihfB",
            "lrp",
            "narL",
            "oxyR",
            "phoB",
            "rpoS",
            "soxS",
            "aceE",
            "ydeO",
            "hns",
            "yjjQ",
        ],
        "rpoH": ["cra"],
        "rpoS": ["aceE", "appY", "hyaA", "hyaB", "hyaF", "ihfA", "ihfB", "oxyR"],
        "soxS": ["fur"],
        "srIR": ["gutM"],
        "ydeO": ["hyaA", "hyaF", "hyaB"],
    }
)
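The example modules in this PR describe graphs in two shapes: an adjacency dictionary passed to `NxMixedGraph.from_str_adj` (here and in the T cell example) and an edge list passed to `NxMixedGraph.from_str_edges` (in the SARS-CoV-2 example). As a rough illustration of how the two shapes relate, the sketch below converts between them on a three-entry excerpt of the dictionary above. The helpers `adjacency_to_edges` and `edges_to_adjacency` are hypothetical names introduced for this note and are not part of y0.

```python
from typing import Dict, List, Tuple

# A small excerpt of the E. coli adjacency dictionary (source -> list of targets).
ADJACENCY: Dict[str, List[str]] = {
    "chiX": ["dpiA", "dpiB"],
    "citX": ["dpiB"],
    "dcuR": ["dpiA", "dpiB"],
}


def adjacency_to_edges(adjacency: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten {source: [targets]} into a list of (source, target) pairs."""
    return [(source, target) for source, targets in adjacency.items() for target in targets]


def edges_to_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source, target) pairs back into {source: [targets]}, keeping order."""
    adjacency: Dict[str, List[str]] = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    return adjacency


edges = adjacency_to_edges(ADJACENCY)
print(edges)
# The round trip recovers the original dictionary
print(edges_to_adjacency(edges) == ADJACENCY)  # True
```

Note that a node with no outgoing edges has no key in the adjacency dictionary, so it only appears as a target.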
34 changes: 34 additions & 0 deletions src/eliater/examples/sars.py
@@ -0,0 +1,34 @@
"""Examples for SARS-CoV-2 and COVID19."""

from y0.graph import NxMixedGraph

graph = NxMixedGraph.from_str_edges(
    directed=[
        ("SARS_COV2", "ACE2"),
        ("ACE2", "Ang"),
        ("Ang", "AGTR1"),
        ("AGTR1", "ADAM17"),
        ("ADAM17", "EGF"),
        ("ADAM17", "TNF"),
        ("ADAM17", "Sil6r"),
        ("SARS_COV2", "PRR"),
        ("PRR", "NFKB"),
        ("EGFR", "NFKB"),
        ("TNF", "NFKB"),
        ("Sil6r", "IL6STAT3"),
        ("Toci", "Sil6r"),
        ("NFKB", "IL6AMP"),
        ("IL6AMP", "cytok"),
        ("IL6STAT3", "IL6AMP"),
        ("EGF", "EGFR"),
        ("Gefi", "EGFR"),
    ],
    undirected=[
        ("SARS_COV2", "Ang"),
        ("ADAM17", "Sil6r"),
        ("PRR", "NFKB"),
        ("EGF", "EGFR"),
        ("EGFR", "TNF"),
        ("EGFR", "IL6STAT3"),
    ],
)
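On a graph of this size, the nuisance-variable rule from `mark_latent` can also be sketched without enumerating paths. Assuming the directed part is acyclic, a node lies on some causal path exactly when it is both a descendant of the treatment and an ancestor of the outcome, so two reachability searches are enough. The code below is a standard-library sketch of that idea, not the PR's implementation. The helper names are hypothetical, and the choice of SARS_COV2 as treatment and EGFR as outcome is made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Directed edges copied from the SARS-CoV-2 example graph above.
DIRECTED_EDGES: List[Tuple[str, str]] = [
    ("SARS_COV2", "ACE2"), ("ACE2", "Ang"), ("Ang", "AGTR1"),
    ("AGTR1", "ADAM17"), ("ADAM17", "EGF"), ("ADAM17", "TNF"),
    ("ADAM17", "Sil6r"), ("SARS_COV2", "PRR"), ("PRR", "NFKB"),
    ("EGFR", "NFKB"), ("TNF", "NFKB"), ("Sil6r", "IL6STAT3"),
    ("Toci", "Sil6r"), ("NFKB", "IL6AMP"), ("IL6AMP", "cytok"),
    ("IL6STAT3", "IL6AMP"), ("EGF", "EGFR"), ("Gefi", "EGFR"),
]


def build_maps(edges: Iterable[Tuple[str, str]]):
    """Return (children, parents) lookup tables for a directed edge list."""
    children: Dict[str, Set[str]] = defaultdict(set)
    parents: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        children[source].add(target)
        parents[target].add(source)
    return children, parents


def reachable(adjacency: Dict[str, Set[str]], starts: Set[str]) -> Set[str]:
    """Return the start nodes plus everything reachable from them."""
    seen = set(starts)
    stack = list(starts)
    while stack:
        node = stack.pop()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def nuisance_variables(edges, treatment: str, outcome: str) -> Set[str]:
    """Descendants of causal-path nodes that are not ancestors of the outcome."""
    children, parents = build_maps(edges)
    ancestors_of_outcome = reachable(parents, {outcome})
    # In a DAG: on a causal path <=> descendant of treatment AND ancestor of outcome
    on_causal_paths = reachable(children, {treatment}) & ancestors_of_outcome
    descendants = reachable(children, on_causal_paths)
    return descendants - ancestors_of_outcome - {treatment, outcome}


print(sorted(nuisance_variables(DIRECTED_EDGES, "SARS_COV2", "EGFR")))
# ['IL6AMP', 'IL6STAT3', 'NFKB', 'PRR', 'Sil6r', 'TNF', 'cytok']
```

For this illustrative query, everything downstream of the path SARS_COV2 -> ACE2 -> Ang -> AGTR1 -> ADAM17 -> EGF -> EGFR that does not feed back into EGFR is flagged. Gefi and Toci are not flagged because they are not descendants of any node on the path.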
16 changes: 16 additions & 0 deletions src/eliater/examples/t_cell_signaling_pathway.py
@@ -0,0 +1,16 @@
"""Examples for T cell signaling pathway."""

from y0.graph import NxMixedGraph

graph = NxMixedGraph.from_str_adj(
    directed={
        "PKA": ["Raf", "Mek", "Erk", "Akt", "Jnk", "P38"],
        "PKC": ["Mek", "Raf", "PKA", "Jnk", "P38"],
        "Raf": ["Mek"],
        "Mek": ["Erk"],
        "Erk": ["Akt"],
        "Plcg": ["PKC", "PIP2", "PIP3"],
        "PIP3": ["PIP2", "Akt"],
        "PIP2": ["PKC"],
    }
)
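Since the module docstring assumes an acyclic directed mixed graph, it can be worth sanity-checking that the directed part of a hand-written example has no cycles. One way to do this, sketched below with the standard-library `graphlib` module (Python 3.9+), is to attempt a topological sort of the adjacency dictionary above. The helper `topological_order` is a hypothetical name for this note, and the check is not something this PR performs.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Directed adjacency copied from the T cell signaling example (parent -> children).
T_CELL: Dict[str, List[str]] = {
    "PKA": ["Raf", "Mek", "Erk", "Akt", "Jnk", "P38"],
    "PKC": ["Mek", "Raf", "PKA", "Jnk", "P38"],
    "Raf": ["Mek"],
    "Mek": ["Erk"],
    "Erk": ["Akt"],
    "Plcg": ["PKC", "PIP2", "PIP3"],
    "PIP3": ["PIP2", "Akt"],
    "PIP2": ["PKC"],
}


def topological_order(adjacency: Dict[str, List[str]]) -> List[str]:
    """Return a parents-before-children ordering, or raise CycleError."""
    # TopologicalSorter expects node -> predecessors, so invert the adjacency
    predecessors: Dict[str, Set[str]] = {}
    for parent, children in adjacency.items():
        predecessors.setdefault(parent, set())
        for child in children:
            predecessors.setdefault(child, set()).add(parent)
    return list(TopologicalSorter(predecessors).static_order())


try:
    order = topological_order(T_CELL)
    print("acyclic; one valid order:", order)
except CycleError as error:
    # The second element of args holds the nodes of the detected cycle
    print("cycle found:", error.args[1])
```

For this graph the sort succeeds, which is consistent with the chain Plcg -> PIP3 -> PIP2 -> PKC -> PKA -> Raf -> Mek -> Erk -> Akt having no edge pointing back upstream.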