Add pytorch deepiv implementation #87

Status: Draft; wants to merge 1 commit into main

10 changes: 5 additions & 5 deletions .github/workflows/ci.yml
@@ -118,7 +118,7 @@ jobs:
kind: [except-customer-scenarios, customer-scenarios]
include:
- kind: "except-customer-scenarios"
extras: "[plt,ray]"
extras: "[nn,plt,ray]"
pattern: "(?!CustomerScenarios)"
install_graphviz: true
version: '3.12'
@@ -223,16 +223,16 @@ jobs:
extras: ""
- kind: other
opts: '-m "cate_api and not ray" -n auto'
extras: "[plt]"
extras: "[nn,plt]"
- kind: dml
opts: '-m "dml and not ray"'
extras: "[plt]"
extras: "[nn,plt]"
- kind: main
opts: '-m "not (notebook or automl or dml or serial or cate_api or treatment_featurization or ray)" -n 2'
extras: "[plt,dowhy]"
extras: "[nn,plt,dowhy]"
- kind: treatment
opts: '-m "treatment_featurization and not ray" -n auto'
extras: "[plt]"
extras: "[nn,plt]"
- kind: ray
opts: '-m "ray"'
extras: "[ray]"
30 changes: 30 additions & 0 deletions README.md
@@ -415,6 +415,36 @@ lb, ub = est.effect_interval(X_test, alpha=0.05) # OLS confidence intervals
```
</details>

<details>
<summary>Deep Instrumental Variables (click to expand)</summary>

```Python
import keras
from econml.iv.nnet import DeepIV

treatment_model = keras.Sequential([keras.layers.Dense(128, activation='relu', input_shape=(2,)),
keras.layers.Dropout(0.17),
keras.layers.Dense(64, activation='relu'),
keras.layers.Dropout(0.17),
keras.layers.Dense(32, activation='relu'),
keras.layers.Dropout(0.17)])
response_model = keras.Sequential([keras.layers.Dense(128, activation='relu', input_shape=(2,)),
keras.layers.Dropout(0.17),
keras.layers.Dense(64, activation='relu'),
keras.layers.Dropout(0.17),
keras.layers.Dense(32, activation='relu'),
keras.layers.Dropout(0.17),
keras.layers.Dense(1)])
est = DeepIV(n_components=10, # Number of Gaussians in the mixture density network
m=lambda z, x: treatment_model(keras.layers.concatenate([z, x])), # Treatment model
h=lambda t, x: response_model(keras.layers.concatenate([t, x])), # Response model
n_samples=1 # Number of samples used to estimate the response
)
est.fit(Y, T, X=X, Z=Z) # Z -> instrumental variables
treatment_effects = est.effect(X_test)
```
</details>

See the <a href="#references">References</a> section for more details.

### Interpretability
4 changes: 4 additions & 0 deletions doc/map.svg
(binary SVG change; diff not rendered)
10 changes: 10 additions & 0 deletions doc/reference.rst
@@ -86,6 +86,16 @@ Doubly Robust (DR) IV
econml.iv.dr.IntentToTreatDRIV
econml.iv.dr.LinearIntentToTreatDRIV

.. _deepiv_api:

DeepIV
^^^^^^

.. autosummary::
:toctree: _autosummary

econml.iv.nnet.DeepIV

.. _tsls_api:

Sieve Methods
2 changes: 2 additions & 0 deletions doc/spec/comparison.rst
@@ -9,6 +9,8 @@ Detailed estimator comparison
+=============================================+==============+==============+==================+=============+=================+============+==============+====================+
| :class:`.SieveTSLS` | Any | Yes | | Yes | Assumed | Yes | Yes | |
+---------------------------------------------+--------------+--------------+------------------+-------------+-----------------+------------+--------------+--------------------+
| :class:`.DeepIV` | Any | Yes | | | | Yes | Yes | |
+---------------------------------------------+--------------+--------------+------------------+-------------+-----------------+------------+--------------+--------------------+
| :class:`.SparseLinearDML` | Any | | Yes | Yes | Assumed | Yes | Yes | Yes |
+---------------------------------------------+--------------+--------------+------------------+-------------+-----------------+------------+--------------+--------------------+
| :class:`.SparseLinearDRLearner` | Categorical | | Yes | | Projected | | Yes | Yes |
86 changes: 86 additions & 0 deletions doc/spec/estimation/deepiv.rst
@@ -0,0 +1,86 @@
Deep Instrumental Variables
===========================

Instrumental variables (IV) methods are an approach for estimating causal effects despite the presence of confounding latent variables.
The assumptions made are weaker than the unconfoundedness assumption needed in DML.
The cost is that when unconfoundedness holds, IV estimators will be less efficient than DML estimators.
What is required is a vector of instruments :math:`Z`, assumed to causally affect the distribution of the treatment :math:`T`,
and to have no direct causal effect on the expected value of the outcome :math:`Y`. The package offers two IV methods for
estimating heterogeneous treatment effects: deep instrumental variables [Hartford2017]_
and the two-stage basis expansion approach of [Newey2003]_.

The setup of the model is as follows:

.. math::

Y = g(T, X, W) + \epsilon

where :math:`\E[\epsilon|X,W,Z] = h(X,W)`, so that the expected value of :math:`Y` depends only on :math:`(T,X,W)`.
This is known as the *exclusion restriction*.
We assume that the conditional distribution :math:`F(T|X,W,Z)` varies with :math:`Z`.
This is known as the *relevance condition*.
We want to learn the heterogeneous treatment effects:

.. math::

\tau(\vec{t}_0, \vec{t}_1, \vec{x}) = \E[g(\vec{t}_1,\vec{x},W) - g(\vec{t}_0,\vec{x},W)]

where the expectation is taken with respect to the conditional distribution of :math:`W|\vec{x}`.
If the function :math:`g` is truly non-parametric, then in the special case where :math:`T`, :math:`Z` and :math:`X` are discrete,
the probability matrix giving the distribution of :math:`T` for each value of :math:`Z` needs to be invertible pointwise at :math:`\vec{x}`
in order for this quantity to be identified for arbitrary :math:`\vec{t}_0` and :math:`\vec{t}_1`.
In practice though we will place some parametric structure on the function :math:`g` which will make learning easier.
In deep IV, this takes the form of assuming :math:`g` is a neural net with a given architecture; in the sieve based approaches,
this amounts to assuming that :math:`g` is a weighted sum of a fixed set of basis functions. [1]_

As explained in [Hartford2017]_, the Deep IV module learns the heterogeneous causal effects by minimizing the "reduced-form" prediction error:

.. math::

\hat{g}(T,X,W) \equiv \argmin_{g \in \mathcal{G}} \sum_i \left(y_i - \int g(t,x_i,w_i)\,dF(t|x_i,w_i,z_i)\right)^2

where the hypothesis class :math:`\mathcal{G}` is a set of neural nets with a given architecture.
The distribution :math:`F(T|x_i,w_i,z_i)` is unknown and so to make the objective feasible it must be replaced by an estimate
:math:`\hat{F}(T|x_i,w_i,z_i)`.
This estimate is obtained by modeling :math:`F` as a mixture of normal distributions, where the parameters of the mixture model are
the output of a "first-stage" neural net whose inputs are :math:`(x_i,w_i,z_i)`.
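
As a concrete illustration, a minimal mixture density network of this kind might look as follows in PyTorch. This is a
hedged sketch only: the class name ``MixtureDensityNet``, the architecture, and the assumption of a scalar treatment are
illustrative and are not taken from this PR's code.

.. code-block:: python

    import torch
    import torch.nn as nn

    class MixtureDensityNet(nn.Module):
        """Map first-stage features (x, w, z) to the parameters of a
        K-component Gaussian mixture over a scalar treatment t."""

        def __init__(self, d_in, n_components, d_hidden=64):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
            self.pi_logits = nn.Linear(d_hidden, n_components)   # mixture weights (logits)
            self.mu = nn.Linear(d_hidden, n_components)          # component means
            self.log_sigma = nn.Linear(d_hidden, n_components)   # component log-scales

        def forward(self, feats):
            h = self.backbone(feats)
            return self.pi_logits(h), self.mu(h), self.log_sigma(h)

    def mdn_nll(pi_logits, mu, log_sigma, t):
        """Negative log-likelihood of observed treatments t under the predicted
        mixture; minimizing this by SGD is the first-stage objective."""
        log_pi = torch.log_softmax(pi_logits, dim=-1)
        comp = torch.distributions.Normal(mu, log_sigma.exp())
        # log p(t) = logsumexp_k [ log pi_k + log N(t | mu_k, sigma_k) ]
        return -torch.logsumexp(log_pi + comp.log_prob(t.unsqueeze(-1)), dim=-1).mean()
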
Optimization of the "first-stage" neural net is done by stochastic gradient descent on the (mixture-of-normals)
negative log-likelihood, while optimization of the "second-stage" model for the treatment effects is done by
stochastic gradient descent, with three different options for the loss (sketched in code after the list):

* Estimating the two integrals that appear in the true gradient by averages over independent sets of samples,
  which yields an unbiased estimate of the gradient.
* Using the modified objective function

.. math::

\sum_i \sum_d \left(y_i - g(t_d,x_i,w_i)\right)^2

where :math:`t_d \sim \hat{F}(t|x_i,w_i,z_i)` are draws from the estimated first-stage neural net. This modified
objective function is not guaranteed to lead to consistent estimates of :math:`g`, but has the advantage of requiring
only a single set of samples from the distribution, and can be interpreted as regularizing the loss with a
variance penalty. [2]_
* Using a single set of samples to compute the gradient of the loss; this will only be an unbiased estimate of the
gradient in the limit as the number of samples goes to infinity.
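
The following hedged sketch shows how the three loss variants could be written in PyTorch; it assumes the illustrative
``MixtureDensityNet`` above, a response network ``g(t, x)``, and a scalar treatment, and is not the code added by this PR:

.. code-block:: python

    import torch

    def sample_treatments(mdn, feats, n_samples):
        """Draw treatment samples from the estimated first stage F_hat(t|x, w, z)."""
        pi_logits, mu, log_sigma = mdn(feats)
        mix = torch.distributions.MixtureSameFamily(
            torch.distributions.Categorical(logits=pi_logits),
            torch.distributions.Normal(mu, log_sigma.exp()))
        return mix.sample((n_samples,))  # shape (n_samples, batch)

    # Option 1: two independent samples t1, t2 give an unbiased estimate
    # -2 (y - g(t1)) * d g(t2)/d theta of the true gradient; detaching the
    # first factor makes the surrogate's gradient equal exactly that estimate.
    def unbiased_surrogate(g, y, x, t1, t2):
        return (-2.0 * (y - g(t1, x)).detach() * g(t2, x)).mean()

    # Option 2: the modified objective sum_d (y - g(t_d, x))^2, i.e. the true
    # loss plus a variance penalty (see footnote 2).
    def variance_penalized_loss(g, y, x, t_draws):
        return torch.stack([(y - g(t, x)) ** 2 for t in t_draws]).mean()

    # Option 3: plug the sample average of g into the squared loss; its
    # gradient is biased for any finite number of samples.
    def plugin_loss(g, y, x, t_draws):
        g_bar = torch.stack([g(t, x) for t in t_draws]).mean(dim=0)
        return ((y - g_bar) ** 2).mean()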

Training proceeds by splitting the data into a training set and a test set; training is stopped when test-set performance
(on the reduced-form prediction error) starts to degrade.

The output is an estimated function :math:`\hat{g}`. To obtain an estimate of :math:`\tau`, we difference the estimated
function at :math:`\vec{t}_1` and :math:`\vec{t}_0`, replacing the expectation with the empirical average over all
observations with the specified :math:`\vec{x}`.
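
In terms of the package API, this differencing is what the estimator's ``effect`` method computes; for example (assuming
a fitted ``est`` as in the README snippet and treatment values ``t0`` and ``t1``):

.. code-block:: python

    # average of g_hat(t1, x, w) - g_hat(t0, x, w) over the observed data
    treatment_effects = est.effect(X_test, T0=t0, T1=t1)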


.. rubric:: Footnotes

.. [1]
   Asymptotic arguments about non-parametric consistency require that the neural net architecture (respectively, the set
   of basis functions) be allowed to grow at some rate so that arbitrary functions can be approximated, but this will not
   be our concern here.
.. [2]
   Writing :math:`\hat{F}` as shorthand for :math:`\hat{F}(t|x_i,w_i,z_i)`:

   .. math::

      & \int \left(y_i - g(t,x_i,w_i)\right)^2 d\hat{F} \\
      =~& y_i^2 - 2 y_i \int g(t,x_i,w_i)\,d\hat{F} + \int g(t,x_i,w_i)^2\,d\hat{F} \\
      =~& y_i^2 - 2 y_i \int g(t,x_i,w_i)\,d\hat{F} + \left(\int g(t,x_i,w_i)\,d\hat{F}\right)^2 + \int g(t,x_i,w_i)^2\,d\hat{F} - \left(\int g(t,x_i,w_i)\,d\hat{F}\right)^2 \\
      =~& \left(y_i - \int g(t,x_i,w_i)\,d\hat{F}\right)^2 + \left(\int g(t,x_i,w_i)^2\,d\hat{F} - \left(\int g(t,x_i,w_i)\,d\hat{F}\right)^2\right) \\
      =~& \left(y_i - \int g(t,x_i,w_i)\,d\hat{F}\right)^2 + \Var_t\, g(t,x_i,w_i)
1 change: 1 addition & 0 deletions doc/spec/estimation_iv.rst
@@ -14,5 +14,6 @@ of [Newey2003]_.
.. toctree::
:maxdepth: 2

estimation/deepiv.rst
estimation/two_sls.rst
estimation/orthoiv.rst
2 changes: 1 addition & 1 deletion econml/iv/__init__.py
@@ -1,4 +1,4 @@
# Copyright (c) PyWhy contributors. All rights reserved.
# Licensed under the MIT License.

__all__ = ["dml", "dr", "sieve"]
__all__ = ["dml", "dr", "nnet", "sieve"]
6 changes: 6 additions & 0 deletions econml/iv/nnet/__init__.py
@@ -0,0 +1,6 @@
# Copyright (c) PyWhy contributors. All rights reserved.
# Licensed under the MIT License.

from ._deepiv import DeepIV, MixtureOfGaussiansModule

__all__ = ["DeepIV, MixtureOfGaussiansModule"]