Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Bayesian instrumental variable estimation #213

Merged
merged 34 commits into from
Aug 24, 2023

Conversation

NathanielF
Copy link
Contributor

@NathanielF NathanielF commented Jun 30, 2023

Working through the repo to figure out how to best integrate the multivariate approach to Bayesian instrumental regression as seen in @juanitorduz 's blog. This is very early stages. Just want to make sure to test the basic idea (seems to work!) and get to know the Repo.

In addition happy to take feedback and have a conversation about choices. Will be experimenting a bit this weekend.

Relates to Add functionality for Bayesian Instrumental Variables #212

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@juanitorduz
Copy link
Collaborator

Cool! 🚀 !

@NathanielF
Copy link
Contributor Author

NathanielF commented Jul 2, 2023

Docs example seems to build locally:

image

Model fit

image

@NathanielF
Copy link
Contributor Author

Ok @drbenvincent I think this is starting to look like something concrete.

I've added a new PyMC model type and a PyMC experiment type to run the Bayesian version of the IV regression discussed in @juanitorduz 's blog post. The docs build and I've added an example, and one integration test. I've compared in my example the Bayesian estimate with the two-stage least squares approach estimated with Sklearn's Linear Regression class. I've also chosen to allow priors to be specified for the coefficients. Defaulting to the MLE estimates where none are specified.

Before I add more to the documentation and the write up, and add some more data validation tests and the like. I wanted to check that this approach made sense and fits into the broad patterns you already have here.

Comment on lines 158 to 160
sd_dist = pm.HalfCauchy.dist(beta=2, shape=2)
chol, corr, sigmas = pm.LKJCholeskyCov(
name="chol_cov", eta=2, n=2, sd_dist=sd_dist
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to make these prior parameters also customizable through the coords mapping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to clear up what you mean here - should I let the coord names and values be changeable or the prior values for the Lkj var. Or both?



@pytest.mark.integration
def test_iv_reg():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also check the expected effect estimation to test the implementation itself?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be nice to parametrize this test with different priors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely intend to add more tests of this type. Just added one to ensure I could run the testing code.

Copy link
Contributor Author

@NathanielF NathanielF Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I've added a parameter recovery example to the accompanying notebook. I just wasn't sure we wanted any long-running tests. It can take a little time to sample the multivariate distribution...

@@ -0,0 +1,4160 @@
{
Copy link
Collaborator

@juanitorduz juanitorduz Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #3.    # %config InlineBackend.figure_format = 'svg'

shall we user "retina" ?


Reply via ReviewNB

Copy link
Collaborator

@drbenvincent drbenvincent Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good idea. I was experimenting with svg at one point to try to keep file sizes down. But I have a vague memory that svg had some rendering issues on either GitHub or readthedocs.

@@ -0,0 +1,4160 @@
{
Copy link
Collaborator

@juanitorduz juanitorduz Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #6.    simple_ols_reg = sk_lin_reg().fit(X, y)

Not relevant for this PR but I think it would be nicer to test the bayesian estimates with statsmodels so that we get the usual stats like p-values and confidence intervals. WDYT?


Reply via ReviewNB

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an ambition of specifically adding in statsmodels as a CausalPy backend, but I think you meant just to compare with a simple statsmodels output just for this example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't sure of its status as a dependency and realised I could use sklearn to do 2SLS so I didn't have to risk adding any more dependencies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a discussion or issue to discuss this. As I mentioned initially, this is irrelevant to the PR 🙈 !

@juanitorduz
Copy link
Collaborator

@NathanielF It is looking good! I left some minor comments :)

@codecov
Copy link

codecov bot commented Jul 3, 2023

Codecov Report

Merging #213 (cb68c78) into main (31e0039) will increase coverage by 2.60%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #213      +/-   ##
==========================================
+ Coverage   71.26%   73.86%   +2.60%     
==========================================
  Files          19       19              
  Lines        1044     1148     +104     
==========================================
+ Hits          744      848     +104     
  Misses        300      300              
Files Changed Coverage Δ
causalpy/data/datasets.py 92.30% <ø> (ø)
causalpy/pymc_experiments.py 66.66% <100.00%> (+6.41%) ⬆️
causalpy/pymc_models.py 100.00% <100.00%> (ø)
causalpy/tests/test_input_validation.py 100.00% <100.00%> (ø)
causalpy/tests/test_integration_pymc_examples.py 100.00% <100.00%> (ø)

@drbenvincent
Copy link
Collaborator

Nice! Some quick thoughts:

  • I've enabled bibtex references in the docs now, so if you update from main then you can do that in your example notebook. See one of the existing examples... you have to remember to add the reference section info in a markdown cell for the references to show up.
  • We don't have anything specific in the contributing guide on example notebooks. Let me know if you think I should add that in, but at this point things are a bit simpler than pymc-examples.
  • I think Instrumental Variables are worthy of a new section in the TOC, what do you think?

I should have time to take a proper look at the code and example notebook in the next few days.

docs/source/notebooks/iv_pymc.ipynb Outdated Show resolved Hide resolved
@@ -0,0 +1,4154 @@
{
Copy link
Collaborator

@drbenvincent drbenvincent Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth running black on this. It should be done in the regular pre-commit I believe


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this runs at each pre-commit check

docs/source/notebooks/iv_pymc.ipynb Outdated Show resolved Hide resolved
docs/source/notebooks/iv_pymc.ipynb Outdated Show resolved Hide resolved
docs/source/notebooks/iv_pymc.ipynb Outdated Show resolved Hide resolved
docs/source/notebooks/iv_pymc.ipynb Outdated Show resolved Hide resolved
docs/source/notebooks/iv_pymc.ipynb Outdated Show resolved Hide resolved
@drbenvincent
Copy link
Collaborator

This is all looking very promising. So far I've just done a quick review of the example notebook.

Will try to look at the code changes soon.

Need to update from main as there have been quite a few updates :)

Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, in a previous comment I requested updates to the glossary. Bear in mind that the way switched to a 'proper' sphinx glossary, so once you've updated from main, there should be a glossary.rst file.

@NathanielF NathanielF marked this pull request as ready for review July 22, 2023 21:33
Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments.

The example notebook is great. Depending what we do with adding separate non-Bayesian classes for IV's, we could just have this one notebook as it already has a component of comparing Bayesian and non-Bayesian approaches. So we could rename the notebook to just be general, removing reference to PyMC.

causalpy/pymc_experiments.py Outdated Show resolved Hide resolved
causalpy/pymc_experiments.py Show resolved Hide resolved
causalpy/pymc_experiments.py Outdated Show resolved Hide resolved
causalpy/pymc_models.py Show resolved Hide resolved
causalpy/pymc_experiments.py Show resolved Hide resolved
@NathanielF
Copy link
Contributor Author

I think i've addressed the above points, but let me know if there is anything missing?

@drbenvincent drbenvincent self-requested a review August 23, 2023 10:45
Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a axis labels for the final plot in the notebook?

I think after this we'll merge :)

as an outcome variable and in the data object to be used as a covariate.
"""
)
check_binary = len(self.data[treatment.strip()].unique()) > 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check might need to be more robust because we're seeing the exception raised in the example notebook where presumably there is no problem.

It could be worth adding a test to check that we aren't getting false positives or false negatives with this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is actually working fine. It just so happens that the treatment variable risk in the model is a score between 0-10. The treatment effect here should be thought of as a dosage treatment rather than a binary treatment here. I've added a line to the write up mentioning this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Collaborator

@drbenvincent drbenvincent Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. So I misunderstood before - I didn't read that closely and assumed it had to be binary. If that's not the case, then I guess we can remove that exception? Sorry about that.

EDIT: For some reason it wasn't showing changes to me. I see that you changed the wording of the exception. Happy whichever way you want to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it in. I think many of the use-cases will be binary treatments, and it's no harm to be reminded to take care interpreting coefficients. Can revisit when i implement the sklearn/statsmodels IV class if you like.

causalpy/pymc_models.py Outdated Show resolved Hide resolved
@drbenvincent
Copy link
Collaborator

Depending on the previous commend about the exception, this looks ready. Ok for me to merge?

@NathanielF
Copy link
Contributor Author

Happy for you to merge! Thanks!

@drbenvincent drbenvincent changed the title [IV 212] Adding base classes required for bayesian IVs Add Bayesian instrumental variable estimation Aug 24, 2023
@drbenvincent drbenvincent added the enhancement New feature or request label Aug 24, 2023
@drbenvincent drbenvincent linked an issue Aug 24, 2023 that may be closed by this pull request
@drbenvincent drbenvincent merged commit 3070bc9 into pymc-labs:main Aug 24, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add functionality for Bayesian Instrumental Variables
3 participants