Remove nuisance variables #2

Merged
merged 70 commits into from
Dec 4, 2023
Changes from 63 commits
70 commits
ae8f273
Adding logic and tests to discover and mark latent nodes
pnavada Sep 28, 2023
739a59a
Merge branch 'main' into simplify
cthoyt Sep 29, 2023
9587a24
Update
cthoyt Sep 29, 2023
0052d55
added initial documentations for discover_latent_nodes
srtaheri Oct 1, 2023
53c858e
Add extra notes
cthoyt Oct 1, 2023
1c22b56
Update discover_latent_nodes.py
cthoyt Oct 1, 2023
d747117
Update discover_latent_nodes.py
cthoyt Oct 1, 2023
bae81a6
Merge branch 'main' into simplify
cthoyt Oct 1, 2023
5f7b4d1
Update discover_latent_nodes.py
cthoyt Oct 1, 2023
8c1ca0c
Adding examples from case studies in the paper, fixing tests cases, s…
pnavada Oct 7, 2023
0a00989
Adding test case for multiple outcomes
pnavada Oct 8, 2023
3c46a8f
added necessary import and description to docstring
srtaheri Oct 9, 2023
8e601c8
reformatting
pnavada Oct 9, 2023
e57377c
reformatting code block
pnavada Oct 9, 2023
a77aba8
Update discover_latent_nodes.py
cthoyt Oct 10, 2023
efa2adb
small change
srtaheri Oct 10, 2023
0976774
fixed the code typo in the docstring
srtaheri Oct 10, 2023
77e10a9
Removing coli and adding test cases for a simple graph
pnavada Oct 10, 2023
b0f0f43
completed the module by calling the simplification function and conve…
srtaheri Oct 11, 2023
50cb458
Merge branch 'simplify' of https://github.com/y0-causal-inference/eli…
srtaheri Oct 11, 2023
07dac3c
we don't need mark_latent function anymore because y0 is able to mark…
srtaheri Oct 11, 2023
f07a74b
fixed a small error in doc string code
srtaheri Oct 11, 2023
4daa063
Add high level function
cthoyt Oct 12, 2023
a06422a
Adding the fourth simplification rule, a test to verify simplificatio…
pnavada Oct 16, 2023
e310110
Adding more test cases
pnavada Oct 17, 2023
83ab084
refactoring and adding more test cases for latent simplification
pnavada Oct 23, 2023
2aa7812
adding the framework for testing finding nuisance variables and remov…
pnavada Oct 23, 2023
d876812
Fixing the bug wrt marking nuisance variables as latent , adding test…
pnavada Oct 24, 2023
aac0145
Fixed the failing test cases for simplifying latent dag
pnavada Oct 24, 2023
1a44dc9
Added documentation for remove_latent_variables
pnavada Oct 24, 2023
477008f
Adding documentation for remove_latent_variables
pnavada Oct 24, 2023
25f804b
Adding documentation for test_find_nuisance_variables_for_simple_graph
pnavada Oct 24, 2023
b8e0bf9
updated doc string and function docs. change one function output to a…
srtaheri Oct 25, 2023
8815956
updated the name of a function test
srtaheri Oct 25, 2023
a08be29
small modification
srtaheri Oct 25, 2023
5a493f9
Update init.py
pnavada Oct 25, 2023
e629ca0
Add back all-important high level function
cthoyt Oct 25, 2023
d19fc2a
Update discover_latent_nodes.py
cthoyt Oct 25, 2023
7f0a82d
correct name inconsistently
srtaheri Oct 25, 2023
9e119cf
changing back to remove_latent_variables from mark_nuisance_variables…
pnavada Oct 25, 2023
0fed850
modified docstring according to the suggested todo
srtaheri Oct 25, 2023
7ff08ea
documenting params and return type for remove_latent_variables
pnavada Oct 25, 2023
04aae4d
Merge branch 'simplify' of https://github.com/y0-causal-inference/eli…
pnavada Oct 25, 2023
4fb6846
Adding back Charlie's comment as it was accidentally removed
pnavada Oct 25, 2023
9aa35e7
Update discover_latent_nodes.py
pnavada Oct 25, 2023
c6cc644
module docstring update for discover_latent_nodes
pnavada Oct 25, 2023
bf7c8d1
added the new workflow
srtaheri Oct 25, 2023
dcc6ca5
Merge branch 'simplify' of https://github.com/y0-causal-inference/eli…
srtaheri Oct 25, 2023
a1b968c
Resolving issue with generating docs
pnavada Oct 27, 2023
6deeafc
Adding newline to end of file
pnavada Oct 27, 2023
62f07e2
Use upstreamed implementation
cthoyt Nov 6, 2023
3259768
Add TODOs for examples
cthoyt Nov 6, 2023
b341d3f
Update docs and add todos
cthoyt Nov 6, 2023
3a62c90
Merge branch 'main' into simplify
cthoyt Nov 6, 2023
10dabd2
Update docs
cthoyt Nov 7, 2023
ffe3e34
Adding minimal examples for evans rules
pnavada Nov 14, 2023
e2686a4
updated the docstring
srtaheri Nov 23, 2023
902cc8d
small change
srtaheri Nov 25, 2023
fa43d27
fixing image paths
pnavada Nov 26, 2023
e0fed15
fixing image paths
pnavada Nov 26, 2023
63bd4d6
fix : duplicate object description of eliater.frontdoor_backdoor
pnavada Nov 26, 2023
87dd5be
fix - citation not found
pnavada Nov 26, 2023
cdbcc36
Update
cthoyt Dec 4, 2023
6f0b238
Revert rename
cthoyt Dec 4, 2023
779f0d2
Rename
cthoyt Dec 4, 2023
eed8274
Add TODOs and update the docs with correct examples
cthoyt Dec 4, 2023
e5342c0
Update discover_latent_nodes.py
cthoyt Dec 4, 2023
aaab817
Update discover_latent_nodes.py
cthoyt Dec 4, 2023
4342d87
Update MANIFEST.in
cthoyt Dec 4, 2023
e3c0cfe
Update discover_latent_nodes.py
cthoyt Dec 4, 2023
25 changes: 25 additions & 0 deletions docs/source/examples.rst
@@ -0,0 +1,25 @@
Examples
========
Frontdoor-Backdoor Example
--------------------------
.. automodapi:: eliater.frontdoor_backdoor
   :no-heading:
   :include-all-objects:

Escherichia coli K-12 Example
-----------------------------
.. automodapi:: eliater.examples.ecoli
   :no-heading:
   :include-all-objects:

SARS-CoV-2 Example
------------------
.. automodapi:: eliater.examples.sars
   :no-heading:
   :include-all-objects:

T cell signaling pathway Example
--------------------------------
.. automodapi:: eliater.examples.t_cell_signaling_pathway
   :no-heading:
   :include-all-objects:
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -12,6 +12,7 @@ Table of Contents
 
    installation
    usage
+   examples
 
 
 Indices and Tables
6 changes: 3 additions & 3 deletions docs/source/usage.rst
@@ -1,5 +1,5 @@
-Frontdoor-Backdoor Example
-==========================
-.. automodapi:: eliater.frontdoor_backdoor
+Remove Nuisance Variables
+=========================
+.. automodapi:: eliater.discover_latent_nodes
    :no-heading:
    :include-all-objects:
2 changes: 1 addition & 1 deletion setup.cfg
@@ -49,7 +49,7 @@ keywords =
 
 [options]
 install_requires =
-    y0>=0.2.2
+    y0>=0.2.3
     scipy
     numpy
     ananke
6 changes: 6 additions & 0 deletions src/eliater/__init__.py
@@ -1,3 +1,9 @@
 # -*- coding: utf-8 -*-
 
 """A high level, end-to-end causal inference workflow."""
+
+from .discover_latent_nodes import remove_nuisance_variables
+
+__all__ = [
+    "remove_nuisance_variables",
+]
262 changes: 262 additions & 0 deletions src/eliater/discover_latent_nodes.py
@@ -0,0 +1,262 @@
"""This module contains methods to discover nuisance nodes in a network.

Given an acyclic directed mixed graph (ADMG), along with the treatment and the outcome
of interest, certain observable variables can be regarded as nuisances. They are classified
this way because they cannot affect the outcome and therefore should not be involved in
estimating the treatment's effect on the outcome. Specifically, nuisance variables are the
descendants of the variables on causal paths from the treatment to the outcome that are not
themselves ancestors of the outcome. A causal path, in this context, is a directed path that
starts at the treatment and ends at the outcome, with every arrow on the path pointing in the
same direction (from the treatment toward the outcome). This module is designed to identify
these variables.

This process lets us concentrate on the variables that are actually needed to estimate the
treatment's effect on the outcome, which can yield more precise estimates with reduced variance
and bias. In addition, combining this process with the simplification function
:func:`y0.algorithm.simplify_latent.simplify_latent_dag` removes the nuisance variables
from the graph, which leads to a simpler, more interpretable, and visually clearer result.

Here is an example of an ADMG where $X$ is the treatment and $Y$ is the outcome. This ADMG has
only one causal path from $X$ to $Y$ which is $X$ -> $M_1$ -> $Y$. The descendants of these variables
that are not ancestors of the outcome are $R_1$, $R_2$, and $R_3$. The goal of this example is to identify these
nuisance variables and remove them from the graph.

.. figure:: img/discover_latent_nodes_docstring_example.png
   :scale: 150 %

.. code-block:: python

    from eliater.discover_latent_nodes import remove_nuisance_variables
    from y0.algorithm.identify import identify_outcomes
    from y0.dsl import Variable, X, Y
    from y0.graph import NxMixedGraph

    M1, R1, R2, R3 = (Variable(x) for x in ("M1", "R1", "R2", "R3"))

    graph = NxMixedGraph.from_edges(
        directed=[
            (X, M1),
            (M1, Y),
            (M1, R1),
            (R1, R2),
            (R2, R3),
            (Y, R3),
        ],
        undirected=[
            (X, Y),
        ],
    )

    new_graph = remove_nuisance_variables(graph, treatments=X, outcomes=Y)

The nuisance variables are identified as $R_1$, $R_2$, and $R_3$. The new graph does not contain these variables.
It is simpler than the original graph and only contains variables necessary for estimation of the
causal effect of interest.

.. figure:: img/discover_latent_nodes_docstring_example_output.png
   :scale: 130 %

The new graph can be used to check whether the query is identifiable and, if so, to generate an estimand for it.

.. code-block:: python

    estimand = identify_outcomes(new_graph, treatments=X, outcomes=Y)

The following minimal examples illustrate the four simplification rules applied by
:func:`y0.algorithm.simplify_latent.simplify_latent_dag`.

.. code-block:: python

    # Minimal example for Evans' rule 1 (transforming latent nodes)
    import networkx as nx
    from y0.algorithm.simplify_latent import simplify_latent_dag
    from y0.dsl import X, Y, Z1, Z2
    from y0.graph import set_latent

    graph = nx.DiGraph()
    graph.add_edges_from([(X, Z1), (Z1, Y), (Z1, Z2)])
    set_latent(graph, [Z1])
    simplified_graph = simplify_latent_dag(graph).graph

The edges in the resultant graph are [($X$, $Z_2$), ($X$, $Y$), ($Z_1'$, $Z_2$), ($Z_1'$, $Y$)].
The parent of the latent node $Z_1$ (here, $X$) becomes attached to the latter's children ($Z_2$ and $Y$).
The edge between $X$ and $Z_1$ is removed, and $Z_1$ is transformed into $Z_1'$, which remains connected
to its children.

.. code-block:: python

    # Minimal example for Evans' rule 2 (removing widow latents)
    import networkx as nx
    from y0.algorithm.simplify_latent import simplify_latent_dag
    from y0.dsl import X, Y, Z1
    from y0.graph import set_latent

    graph = nx.DiGraph()
    graph.add_edges_from([(X, Z1), (X, Y)])
    set_latent(graph, [Z1])
    simplified_graph = simplify_latent_dag(graph).graph

The edges in the resultant graph are [($X$, $Y$)].
$Z_1$ is removed as it is a latent node with no children.

.. code-block:: python

    # Minimal example for Evans' rule 3 (removing unidirectional latents)
    import networkx as nx
    from y0.algorithm.simplify_latent import simplify_latent_dag
    from y0.dsl import X, Y, Z1
    from y0.graph import set_latent

    graph = nx.DiGraph()
    graph.add_edges_from([(X, Z1), (Z1, Y), (X, Y)])
    set_latent(graph, [Z1])
    simplified_graph = simplify_latent_dag(graph).graph

The edges in the resultant graph are [($X$, $Y$)].
$Z_1$ is removed as it is a latent node with a single child.

.. code-block:: python

    # Minimal example for Evans' rule 4 (removing redundant latents)
    import networkx as nx
    from y0.algorithm.simplify_latent import simplify_latent_dag
    from y0.dsl import X, Y, Z1, Z2, Z3, Z4
    from y0.graph import set_latent

    graph = nx.DiGraph()
    graph.add_edges_from([(X, Y), (Z1, Y), (Z1, Z2), (Z1, Z3), (Z4, Z2), (Z4, Z3)])
    set_latent(graph, [Z1, Z4])
    simplified_graph = simplify_latent_dag(graph).graph

The edges in the resultant graph are [($X$, $Y$), ($Z_1$, $Y$), ($Z_1$, $Z_2$), ($Z_1$, $Z_3$)].
$Z_4$ is removed as its children are a subset of $Z_1$'s children.
"""

import itertools
from typing import Iterable, Optional, Set, Union

import networkx as nx

from y0.algorithm.simplify_latent import simplify_latent_dag
from y0.dsl import Variable
from y0.graph import DEFAULT_TAG, NxMixedGraph

__all__ = [
    "remove_nuisance_variables",
]


def remove_nuisance_variables(
    graph: NxMixedGraph,
    treatments: Union[Variable, Set[Variable]],
    outcomes: Union[Variable, Set[Variable]],
    tag: Optional[str] = None,
) -> NxMixedGraph:
    """Find all nuisance variables and remove them based on Evans' simplification rules.

    :param graph: an NxMixedGraph
    :param treatments: a treatment variable or a set of treatment variables
    :param outcomes: an outcome variable or a set of outcome variables
    :param tag: the node attribute key used to mark latent variables
    :return: the new graph after simplification
    """
    lv_dag = mark_nuisance_variables_as_latent(
        graph=graph, treatments=treatments, outcomes=outcomes, tag=tag
    )
    simplified_latent_dag = simplify_latent_dag(lv_dag, tag=tag)
    return NxMixedGraph.from_latent_variable_dag(simplified_latent_dag.graph, tag=tag)


def mark_nuisance_variables_as_latent(
    graph: NxMixedGraph,
    treatments: Union[Variable, Set[Variable]],
    outcomes: Union[Variable, Set[Variable]],
    tag: Optional[str] = None,
) -> nx.DiGraph:
    """Find all the nuisance variables and mark them as latent.

    Mark nuisance variables as latent by first identifying them, then creating a new graph where these
    nodes are marked as latent. Nuisance variables are the descendants of nodes in all proper causal paths
    that are not ancestors of the outcome variables' nodes. A proper causal path is a directed path from
    the treatments to the outcomes. Nuisance variables should not be included in the estimation of the
    causal effect because they increase the variance.

    :param graph: an NxMixedGraph
    :param treatments: a treatment variable or a set of treatment variables
    :param outcomes: an outcome variable or a set of outcome variables
    :param tag: the node attribute key used to mark latent variables; defaults to ``DEFAULT_TAG``
    :return: a new latent variable DAG in which the nuisance variables are marked as latent
    """
    if tag is None:
        tag = DEFAULT_TAG
    nuisance_variables = find_nuisance_variables(graph, treatments=treatments, outcomes=outcomes)
    lv_dag = NxMixedGraph.to_latent_variable_dag(graph, tag=tag)
    # Set nuisance variables as latent
    for node, data in lv_dag.nodes(data=True):
        if Variable(node) in nuisance_variables:
            data[tag] = True
    return lv_dag


def find_all_nodes_in_causal_paths(
    graph: NxMixedGraph,
    treatments: Union[Variable, Set[Variable]],
    outcomes: Union[Variable, Set[Variable]],
) -> Set[Variable]:
    """Find all the nodes in proper causal paths from treatments to outcomes.

    A proper causal path is a directed path from the treatments to the outcomes.

    :param graph: an NxMixedGraph
    :param treatments: a treatment variable or a set of treatment variables
    :param outcomes: an outcome variable or a set of outcome variables
    :return: the nodes on all causal paths from treatments to outcomes.
    """
    if isinstance(treatments, Variable):
        treatments = {treatments}
    if isinstance(outcomes, Variable):
        outcomes = {outcomes}

    return {
        node
        for treatment, outcome in itertools.product(treatments, outcomes)
        for causal_path in nx.all_simple_paths(graph.directed, treatment, outcome)
        for node in causal_path
    }


def find_nuisance_variables(
    graph: NxMixedGraph,
    treatments: Union[Variable, Set[Variable]],
    outcomes: Union[Variable, Set[Variable]],
) -> Iterable[Variable]:
    """Find the nuisance variables in the graph.

    Nuisance variables are the descendants of nodes in all proper causal paths that are
    not ancestors of the outcome variables' nodes. A proper causal path is a directed path
    from the treatments to the outcomes. Nuisance variables should not be included in the
    estimation of the causal effect because they increase the variance.

    :param graph: an NxMixedGraph
    :param treatments: a treatment variable or a set of treatment variables
    :param outcomes: an outcome variable or a set of outcome variables
    :returns: The set of nuisance variables.
    """
    if isinstance(treatments, Variable):
        treatments = {treatments}
    if isinstance(outcomes, Variable):
        outcomes = {outcomes}

    # Find the nodes on all causal paths
    nodes_on_causal_paths = find_all_nodes_in_causal_paths(
        graph=graph, treatments=treatments, outcomes=outcomes
    )

    # Find the descendants of interest
    descendants_of_nodes_on_causal_paths = graph.descendants_inclusive(nodes_on_causal_paths)

    # Find the ancestors of outcome variables
    ancestors_of_outcomes = graph.ancestors_inclusive(outcomes)

    descendants_not_ancestors = descendants_of_nodes_on_causal_paths.difference(
        ancestors_of_outcomes
    )

    # Exclude the treatments and outcomes themselves
    nuisance_variables = descendants_not_ancestors.difference(treatments.union(outcomes))
    return nuisance_variables
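The nuisance-variable definition implemented by `find_nuisance_variables` can be checked independently of y0. Below is a minimal sketch that assumes a plain `networkx.DiGraph` holding only the directed edges of the ADMG; the bidirected $X$ <-> $Y$ edge from the module docstring example is dropped because the path, descendant, and ancestor computations above only follow directed edges. The function name `sketch_find_nuisance_variables` is hypothetical and is not part of eliater or y0.

```python
import itertools

import networkx as nx


def sketch_find_nuisance_variables(dag, treatments, outcomes):
    """Hypothetical re-implementation of the nuisance-variable rule on a plain DiGraph."""
    treatments, outcomes = set(treatments), set(outcomes)

    # Nodes lying on any directed path from a treatment to an outcome
    on_causal_paths = {
        node
        for treatment, outcome in itertools.product(treatments, outcomes)
        for path in nx.all_simple_paths(dag, treatment, outcome)
        for node in path
    }

    # Descendants of those nodes, including the nodes themselves
    descendants = set(on_causal_paths)
    for node in on_causal_paths:
        descendants |= nx.descendants(dag, node)

    # Ancestors of the outcomes, including the outcomes themselves
    ancestors = set(outcomes)
    for node in outcomes:
        ancestors |= nx.ancestors(dag, node)

    # Nuisance variables: descendants that cannot affect any outcome,
    # excluding the treatments and outcomes themselves
    return descendants - ancestors - treatments - outcomes


# Directed edges of the example ADMG from the module docstring
example = nx.DiGraph(
    [("X", "M1"), ("M1", "Y"), ("M1", "R1"), ("R1", "R2"), ("R2", "R3"), ("Y", "R3")]
)
nuisance = sketch_find_nuisance_variables(example, treatments={"X"}, outcomes={"Y"})
print(sorted(nuisance))  # ['R1', 'R2', 'R3']
```

Nodes are plain strings here purely for illustration; the eliater functions operate on y0 `Variable` objects inside an `NxMixedGraph`.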
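The four latent simplification rules illustrated in the module docstring are applied by y0's `simplify_latent_dag`. To make them concrete, here is a rough sketch that applies the rules once, in order, to a plain `networkx.DiGraph` with an explicit set of latent nodes. It is an assumption-laden simplification and not y0's implementation: `sketch_simplify_latents` is a hypothetical name, node names must be sortable (strings here), and a transformed latent keeps its original name instead of becoming a primed node such as $Z_1'$.

```python
import networkx as nx


def sketch_simplify_latents(edges, latents):
    """Hypothetical one-pass application of the four latent simplification rules."""
    graph = nx.DiGraph(edges)
    latents = set(latents) & set(graph.nodes)

    # Rule 1 (transforming latent nodes): connect each latent's parents directly to its
    # children, then drop the latent's incoming edges. Visiting nodes in topological
    # order keeps the graph acyclic, and a processed latent never gains new parents.
    for node in list(nx.topological_sort(graph)):
        if node not in latents:
            continue
        parents = list(graph.predecessors(node))
        children = list(graph.successors(node))
        graph.add_edges_from((p, c) for p in parents for c in children)
        graph.remove_edges_from((p, node) for p in parents)

    # Rules 2 and 3 (widow and unidirectional latents): a parentless latent with
    # zero or one child cannot induce a dependence between two other nodes.
    for node in list(latents):
        if graph.out_degree(node) <= 1:
            graph.remove_node(node)
            latents.discard(node)

    # Rule 4 (redundant latents): drop a latent whose children are a subset of
    # another remaining latent's children.
    for node in sorted(latents):
        children = set(graph.successors(node))
        others = [other for other in latents if other != node and other in graph]
        if any(children <= set(graph.successors(other)) for other in others):
            graph.remove_node(node)

    return graph


# Rule 4 example from the module docstring: Z4 is redundant given Z1
result = sketch_simplify_latents(
    [("X", "Y"), ("Z1", "Y"), ("Z1", "Z2"), ("Z1", "Z3"), ("Z4", "Z2"), ("Z4", "Z3")],
    {"Z1", "Z4"},
)
print(sorted(result.edges()))  # [('X', 'Y'), ('Z1', 'Y'), ('Z1', 'Z2'), ('Z1', 'Z3')]
```

Applying the rules in this fixed order is a design shortcut: after rule 1 no latent has parents, so removing one latent never changes another latent's children and a single pass suffices for these small examples.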