Merge branch 'main' into summary

pymc-labs · Jun 19, 2024 · 3207e03
2 parents 22f5008 + 4af4af6
commit 3207e03

Showing 19 changed files with 6,683 additions and 29 deletions.
diff --git a/.github/workflows/rtd-link-preview.yml b/.github/workflows/rtd-link-preview.yml
@@ -0,0 +1,16 @@
+name: Read the Docs Pull Request Preview
+on:
+  pull_request_target:
+    types:
+      - opened
+
+permissions:
+  pull-requests: write
+
+jobs:
+  documentation-links:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: readthedocs/actions/preview@v1
+        with:
+          project-slug: "causalpy"
diff --git a/.gitignore b/.gitignore
@@ -7,6 +7,8 @@ _build
 build/
 dist/
 docs/_build/
+docs/build/
+docs/jupyter_execute/
 *.vscode
 .coverage
 *.jupyterlab-workspace
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -22,9 +22,10 @@ repos:
         exclude_types: [svg]
       - id: check-yaml
       - id: check-added-large-files
+        exclude: &exclude_pattern 'iv_weak_instruments.ipynb'
         args: ["--maxkb=1500"]
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.4.8
+    rev: v0.4.9
     hooks:
       # Run the linter
       - id: ruff

diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -7,9 +7,9 @@ version: 2
 
 # Set the version of Python and other tools you might need
 build:
-  os: ubuntu-20.04
+  os: ubuntu-lts-latest
   tools:
-    python: "3.10"
+    python: "3.11"
     # You can also specify other tool versions:
     # nodejs: "16"
     # rust: "1.55"

diff --git a/Makefile b/Makefile
@@ -10,8 +10,6 @@ lint:
 check_lint:
 	ruff check .
 	ruff format --diff --check .
-	nbqa black --check .
-	nbqa ruff .
 	interrogate .
 
 doctest:

diff --git a/README.md b/README.md
@@ -34,7 +34,7 @@ pip install git+https://github.com/pymc-labs/CausalPy.git
 
 ```python
 import causalpy as cp
-
+import matplotlib.pyplot as plt
 
 # Import and process data
 df = (
@@ -57,6 +57,8 @@ fig, ax = result.plot();
 
 # Get a results summary
 result.summary()
+
+plt.show()
 ```
 
 ## Roadmap

diff --git a/causalpy/data/datasets.py b/causalpy/data/datasets.py
@@ -35,6 +35,7 @@
     "geolift1": {"filename": "geolift1.csv"},
     "risk": {"filename": "AJR2001.csv"},
     "nhefs": {"filename": "nhefs.csv"},
+    "schoolReturns": {"filename": "schoolingReturns.csv"},
 }
 
 

diff --git a/causalpy/data/schoolingReturns.csv b/causalpy/data/schoolingReturns.csv
diff --git a/causalpy/pymc_experiments.py b/causalpy/pymc_experiments.py
@@ -1450,7 +1450,7 @@ def __init__(
                 "mus": [self.ols_beta_first_params, self.ols_beta_second_params],
                 "sigmas": [1, 1],
                 "eta": 2,
-                "lkj_sd": 2,
+                "lkj_sd": 1,
             }
         self.priors = priors
         self.model.fit(

diff --git a/causalpy/pymc_models.py b/causalpy/pymc_models.py
@@ -303,8 +303,8 @@ class InstrumentalVariableRegression(ModelBuilder):
     ...                  "mus": [[-2,4], [0.5, 3]],
     ...                  "sigmas": [1, 1],
     ...                  "eta": 2,
-    ...                  "lkj_sd": 2,
-    ...              })
+    ...                  "lkj_sd": 1,
+    ...              }, None)
     Inference data...
     """
 
@@ -340,7 +340,7 @@ def build_model(self, X, Z, y, t, coords, priors):
                 sigma=priors["sigmas"][1],
                 dims="covariates",
             )
-            sd_dist = pm.HalfCauchy.dist(beta=priors["lkj_sd"], shape=2)
+            sd_dist = pm.Exponential.dist(priors["lkj_sd"], shape=2)
             chol, corr, sigmas = pm.LKJCholeskyCov(
                 name="chol_cov",
                 eta=priors["eta"],
@@ -366,24 +366,52 @@ def build_model(self, X, Z, y, t, coords, priors):
                 shape=(X.shape[0], 2),
             )
 
-    def fit(self, X, Z, y, t, coords, priors):
-        """Draw samples from posterior, prior predictive, and posterior predictive
-        distributions.
+    def sample_predictive_distribution(self, ppc_sampler="jax"):
+        """Function to sample the Multivariate Normal posterior predictive
+        Likelihood term in the IV class. This can be slow without
+        using the JAX sampler compilation method. If using the
+        JAX sampler it will sample only the posterior predictive distribution.
+        If using the PYMC sampler it will sample both the prior
+        and posterior predictive distributions."""
+        random_seed = self.sample_kwargs.get("random_seed", None)
+
+        if ppc_sampler == "jax":
+            with self:
+                self.idata.extend(
+                    pm.sample_posterior_predictive(
+                        self.idata,
+                        random_seed=random_seed,
+                        compile_kwargs={"mode": "JAX"},
+                    )
+                )
+        elif ppc_sampler == "pymc":
+            with self:
+                self.idata.extend(pm.sample_prior_predictive(random_seed=random_seed))
+                self.idata.extend(
+                    pm.sample_posterior_predictive(
+                        self.idata,
+                        random_seed=random_seed,
+                    )
+                )
+
+    def fit(self, X, Z, y, t, coords, priors, ppc_sampler=None):
+        """Draw samples from posterior distribution and potentially
+        from the prior and posterior predictive distributions. The
+        fit call can take values for the
+        ppc_sampler = ['jax', 'pymc', None]
+        We default to None, so the user can determine if they wish
+        to spend time sampling the posterior predictive distribution
+        independently.
         """
 
         # Ensure random_seed is used in sample_prior_predictive() and
         # sample_posterior_predictive() if provided in sample_kwargs.
-        random_seed = self.sample_kwargs.get("random_seed", None)
+        # Use JAX for ppc sampling of multivariate likelihood
 
         self.build_model(X, Z, y, t, coords, priors)
         with self:
             self.idata = pm.sample(**self.sample_kwargs)
-            self.idata.extend(pm.sample_prior_predictive(random_seed=random_seed))
-            self.idata.extend(
-                pm.sample_posterior_predictive(
-                    self.idata, progressbar=False, random_seed=random_seed
-                )
-            )
+        self.sample_predictive_distribution(ppc_sampler=ppc_sampler)
         return self.idata
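
Note on the `sd_dist` change in `build_model` above: swapping `pm.HalfCauchy.dist(beta=2)` for `pm.Exponential.dist(1)` replaces a heavy-tailed prior on the standard deviations with a light-tailed one. The sketch below compares the tail mass of the two distributions using closed-form survival functions and the standard library only. It assumes that the first positional argument of `pm.Exponential.dist` is the rate `lam`; the helper names are hypothetical and are not part of CausalPy or PyMC.

```python
import math


def half_cauchy_sf(x, beta):
    """P(X > x) for a half-Cauchy distribution with scale beta (x >= 0)."""
    return 1.0 - (2.0 / math.pi) * math.atan(x / beta)


def exponential_sf(x, lam):
    """P(X > x) for an exponential distribution with rate lam (x >= 0)."""
    return math.exp(-lam * x)


# Old default: HalfCauchy(beta=2). New default: Exponential(lam=1), assumed rate.
for x in (1.0, 5.0, 10.0):
    print(
        f"x={x:>4}: "
        f"HalfCauchy(2) tail={half_cauchy_sf(x, 2.0):.4f}  "
        f"Exponential(1) tail={exponential_sf(x, 1.0):.6f}"
    )
```

Under these assumptions the half-Cauchy keeps more than 10% of its mass above 10, while the exponential keeps almost none there, so the new default concentrates the prior on much smaller standard deviations.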
 
 

diff --git a/causalpy/tests/test_integration_pymc_examples.py b/causalpy/tests/test_integration_pymc_examples.py
@@ -504,6 +504,7 @@ def test_iv_reg():
             sample_kwargs=sample_kwargs
         ),
     )
+    result.model.sample_predictive_distribution(ppc_sampler="pymc")
     assert isinstance(df, pd.DataFrame)
     assert isinstance(data, pd.DataFrame)
     assert isinstance(instruments_data, pd.DataFrame)

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -51,11 +51,18 @@
     "sphinx.ext.autodoc",
     "sphinx.ext.intersphinx",
     "sphinx.ext.mathjax",
+    "sphinx.ext.viewcode",
     "sphinx_autodoc_typehints",
+    "sphinx_copybutton",
 ]
 
 nb_execution_mode = "off"
 
+# configure copy button to avoid copying sphinx or console characters
+copybutton_exclude = ".linenos, .gp"
+copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
+copybutton_prompt_is_regexp = True
+
 source_suffix = {
     ".rst": "restructuredtext",
     ".ipynb": "myst-nb",
@@ -72,8 +79,15 @@
 
 # -- intersphinx config -------------------------------------------------------
 intersphinx_mapping = {
-    "python": ("https://docs.python.org/3", None),
+    "examples": ("https://www.pymc.io/projects/examples/en/latest/", None),
+    "mpl": ("https://matplotlib.org/stable", None),
+    "numpy": ("https://numpy.org/doc/stable/", None),
+    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
     "pymc": ("https://www.pymc.io/projects/docs/en/stable/", None),
+    "python": ("https://docs.python.org/3", None),
+    "scikit-learn": ("https://scikit-learn.org/stable/", None),
+    "scipy": ("https://docs.scipy.org/doc/scipy/", None),
+    "xarray": ("https://docs.xarray.dev/en/stable/", None),
 }
 
 # MyST options for working with markdown files.
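
Note on the `copybutton_prompt_text` pattern added above: a quick way to see which prompts it covers is to apply the same regular expression with Python's `re` module. This is only an approximation of what sphinx-copybutton does in the browser (the real extension also decides which lines to copy), and `strip_prompt` is a hypothetical helper written for this illustration.

```python
import re

# Same pattern as in docs/source/conf.py
copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "

# Anchor the alternation at the start of the line and capture the remainder.
prompt_re = re.compile("^(" + copybutton_prompt_text + ")(.*)")


def strip_prompt(line):
    """Return the line without a leading console prompt, if one matches."""
    match = prompt_re.match(line)
    return match.group(2) if match else line


examples = [
    ">>> import causalpy as cp",     # Python REPL prompt
    "... x = 1",                     # REPL continuation prompt
    "$ pip install CausalPy",        # shell prompt
    "In [3]: result.summary()",      # IPython input prompt
    "   ...: plt.show()",            # IPython continuation prompt
    "plain output line",             # no prompt, left unchanged
]
for line in examples:
    print(repr(strip_prompt(line)))
```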

diff --git a/docs/source/examples.rst b/docs/source/examples.rst
@@ -68,6 +68,7 @@ Instrumental Variables Regression
    :titlesonly:
 
    notebooks/iv_pymc.ipynb
+   notebooks/iv_weak_instruments.ipynb
 
 Inverse Propensity Score Weighting
 =================================

diff --git a/docs/source/glossary.rst b/docs/source/glossary.rst
@@ -46,6 +46,10 @@ Glossary
    Endogenous Variable
       An endogenous variable is a variable in a regression equation such that the variable is correlated with the error term of the equation i.e. correlated with the outcome variable (in the system). This is a problem for OLS regression estimation techniques because endogeneity violates the assumptions of the Gauss-Markov theorem.
 
+   Local Average Treatment effect
+   LATE
+      Also known as the complier average causal effect (CACE), this is the effect of a treatment for subjects who comply with the experimental treatment assigned to their sample group. It is the quantity we're estimating in IV designs.
+
    Non-equivalent group designs
    NEGD
       A quasi-experimental design where units are assigned to conditions non-randomly, and not according to a running variable (see Regression discontinuity design). This can be problematic when assigning causal influence of the treatment - differences in outcomes between groups could be due to the treatment or due to differences in the group attributes themselves.
@@ -62,6 +66,9 @@ Glossary
    Pretest-posttest design
       A quasi-experimental design where the treatment effect is estimated by comparing an outcome measure before and after treatment.
 
+   Propensity scores
+      An estimate of the probability of adopting a treatment status. Used in re-weighting schemes to balance observational data.
+
    Quasi-experiment
       An empirical comparison used to estimate the effects of a treatment where units are not assigned to conditions at random.
 
@@ -101,8 +108,6 @@ Glossary
    2SLS
       An estimation technique for estimating the parameters of an IV regression. It takes its name from the fact that it uses two OLS regressions - a first and second stage.
 
-   Propensity scores
-      An estimate of the probability of adopting a treatment status. Used in re-weighting schemes to balance observational data.
 
 
 References

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -12,12 +12,18 @@ A Python package focussing on causal inference for quasi-experiments. The packag
 Installation
 ------------
 
-To get the latest release:
+To get the latest release you can use pip:
 
 .. code-block:: sh
 
    pip install CausalPy
 
+or conda:
+
+.. code-block:: sh
+
+   conda install causalpy -c conda-forge
+
 Alternatively, if you want the very latest version of the package you can install from GitHub:
 
 .. code-block:: sh
@@ -31,6 +37,7 @@ Quickstart
 .. code-block:: python
 
    import causalpy as cp
+   import matplotlib.pyplot as plt
 
 
    # Import and process data
@@ -55,6 +62,8 @@ Quickstart
    # Get a results summary
    result.summary()
 
+   plt.show()
+
 
 Videos
 ------

diff --git a/docs/source/notebooks/iv_weak_instruments.ipynb b/docs/source/notebooks/iv_weak_instruments.ipynb
diff --git a/docs/source/quasi_dags.ipynb b/docs/source/quasi_dags.ipynb
@@ -349,7 +349,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "One nice feature of this set up is that we can evaluate the claim of __strong ignorability__ because it implies that  $T  \\perp\\!\\!\\!\\perp  X | PS(X)$ and this ensures the covariate profiles are balanced across the treatment branches conditional on the propensity score. This is a testable implication of the postulated design! Balance plots and measures are ways in which to evaluate if the offset achieved by your propensity score has worked. It is crucial that PS serve as a balancing score, if the measure cannot serve as a balancing score the collision effect can add to the confounding bias rather than remove it. "
+    "One nice feature of this set up is that we can evaluate the claim of __strong ignorability__ because it implies that  $Z  \\perp\\!\\!\\!\\perp  X | PS(X)$ and this ensures the covariate profiles are balanced across the treatment branches conditional on the propensity score. This is a testable implication of the postulated design! Balance plots and measures are ways in which to evaluate if the offset achieved by your propensity score has worked. It is crucial that PS serve as a balancing score, if the measure cannot serve as a balancing score the collision effect can add to the confounding bias rather than remove it. "
    ]
   },
   {

diff --git a/docs/source/references.bib b/docs/source/references.bib
@@ -76,6 +76,15 @@ @article{acemoglu2001colonial
   year={2001}
 }
 
+@incollection{card1995returns,
+  author={Card, David},
+  title={Using Geographical Variation in College Proximity to Estimate the Return to Schooling},
+  editor={Christofides, L.N. and Grant, E.K. and Swidinsky, R.},
+  booktitle={Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp},
+  year={1995},
+  publisher={University of Toronto Press}
+}
+
 @incollection{forde2024nonparam,
   author    = {Forde, Nathaniel},
   title     = {Bayesian Non-parametric Causal Inference},

diff --git a/pyproject.toml b/pyproject.toml
@@ -39,7 +39,7 @@ dependencies = [
     "scipy",
     "seaborn>=0.11.2",
     "statsmodels",
-    "xarray>=v2022.11.0",
+    "xarray>=v2022.11.0"
 ]
 
 # List additional groups of dependencies here (e.g. development dependencies). Users
@@ -54,17 +54,17 @@ docs = [
     "ipykernel",
     "daft",
     "linkify-it-py",
-    "myst-nb<=1.0.0",
+    "myst-nb!=1.1.0",
     "pathlib",
     "sphinx",
     "sphinx-autodoc-typehints",
     "sphinx_autodoc_defaultargs",
-    "sphinx-design",
+    "sphinx-copybutton",
     "sphinx-rtd-theme",
     "statsmodels",
     "sphinxcontrib-bibtex",
 ]
-lint = ["interrogate", "nbqa", "pre-commit", "ruff"]
+lint = ["interrogate", "pre-commit", "ruff"]
 test = ["pytest", "pytest-cov"]
 
 [metadata]