create standard dgp for metric aggregation #705
@@ -3,6 +3,7 @@
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.utils import check_random_state
from scipy.special import expit

_ihdp_sim_file = os.path.join(os.path.dirname(__file__), "ihdp", "sim.csv")
_ihdp_sim_data = pd.read_csv(_ihdp_sim_file)

@@ -93,3 +94,182 @@ def _process_ihdp_sim_data():
    # Append a column of ones as intercept
    X = np.insert(X, 0, np.ones(X.shape[0]), axis=1)
    return T, X

class StandardDGP():

    def __init__(self,
                 n=1000,
                 d_t=1,
                 d_y=1,
                 d_x=5,
                 d_z=None,
                 discrete_treatment=False,
                 discrete_instrument=False,
                 squeeze_T=False,
                 squeeze_Y=False,
                 nuisance_Y=None,
                 nuisance_T=None,
                 nuisance_TZ=None,
                 theta=None,
                 y_of_t=None,
                 x_eps=1,
                 y_eps=1,
                 t_eps=1
                 ):
        self.n = n
        self.d_t = d_t
        self.d_y = d_y
        self.d_x = d_x
        self.d_z = d_z

        self.discrete_treatment = discrete_treatment
        self.discrete_instrument = discrete_instrument
        self.squeeze_T = squeeze_T
        self.squeeze_Y = squeeze_Y

        if callable(nuisance_Y):
            self.nuisance_Y = nuisance_Y
        else:  # else must be dict
            if nuisance_Y is None:
                nuisance_Y = {'support': self.d_x, 'degree': 1}
            nuisance_Y['k'] = self.d_x

- nuisance_Y['k'] = self.d_x
+ assert isinstance(nuisance_Y, dict), f"nuisance_Y must be a callable or dict, but got {type(nuisance_Y)}"
+ nuisance_Y['k'] = self.d_x
(and likewise for the other similar arguments)
Setting `nuisance_Y['k']` is not strictly necessary since the default behavior of `gen_nuisance` will do this for you, but perhaps it's worth being explicit.
More importantly, making this assignment will mutate the dictionary that was passed in, which is possibly undesirable; probably better to do something like:
- nuisance_Y['k'] = self.d_x
+ nuisance_Y = {**nuisance_Y, 'k': self.d_x}
which will create a new dictionary instead of altering the old one.
Furthermore, what if the dictionary already had a `'k'` entry? Do you really want to ignore and override it?
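One way to address both points at once is sketched below: build a new dictionary, and put the default first so that a caller-supplied `'k'` takes precedence. The function name `with_default_k` is hypothetical, and letting the caller's `'k'` win is an assumption; the PR author may prefer to reject a conflicting value instead.

```python
def with_default_k(spec, d_x):
    """Return a new dict with 'k' defaulted to d_x; the input dict is not modified."""
    if spec is None:
        spec = {'support': d_x, 'degree': 1}
    # Default goes first and the caller's entries second, so an explicit 'k' wins.
    return {'k': d_x, **spec}


user_spec = {'support': 3, 'degree': 2}
merged = with_default_k(user_spec, d_x=5)
# user_spec still has no 'k' key; merged carries the default.
```

Swapping the order to `{**spec, 'k': d_x}` gives the override behavior from the suggestion above while still leaving the caller's dictionary untouched.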
Consider allowing the user to specify a seed so that results are reproducible.
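A rough sketch of what a seed argument could look like. The class `SeededDGPSketch`, the `random_state` parameter, and the `gen_X` method are all assumptions for illustration, not the PR's API. It uses NumPy's `default_rng`; the `check_random_state` helper already imported in this file would play the same role for a legacy `RandomState`.

```python
import numpy as np


class SeededDGPSketch:
    """Illustrative stand-in for the DGP class with a reproducible random source."""

    def __init__(self, n=1000, d_x=5, x_eps=1, random_state=None):
        self.n = n
        self.d_x = d_x
        self.x_eps = x_eps
        # default_rng accepts None (fresh entropy), an int seed, or a Generator.
        self.rng = np.random.default_rng(random_state)

    def gen_X(self):
        # Every draw goes through self.rng, never the global np.random state.
        return self.rng.normal(0, self.x_eps, size=(self.n, self.d_x))


X_a = SeededDGPSketch(n=10, random_state=123).gen_X()
X_b = SeededDGPSketch(n=10, random_state=123).gen_X()
```

With the same seed, `X_a` and `X_b` are identical, which is what makes metric aggregation across runs comparable.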
It looks like y_of_t is really a treatment featurizer, is that right? If so, change the name accordingly.
Consider allowing heavier-tailed distributions than just normal (or incorporate outliers in some other way), here and wherever else you're using randomness.
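As one possible shape for this, the noise draws could go through a single helper that selects the distribution. The function `sample_noise` and its `dist` and `df` parameters are hypothetical names; the choice of Student-t and Laplace as the heavier-tailed options is likewise only an assumption.

```python
import numpy as np


def sample_noise(rng, size, scale=1.0, dist="normal", df=3):
    """Draw zero-centered noise of the given shape from a selectable distribution."""
    if dist == "normal":
        return rng.normal(0.0, scale, size=size)
    if dist == "student_t":
        # Heavy tails for small df; note that scale is a multiplier here,
        # not the standard deviation of the result.
        return scale * rng.standard_t(df, size=size)
    if dist == "laplace":
        return rng.laplace(0.0, scale, size=size)
    raise ValueError(f"unknown noise distribution: {dist!r}")


rng = np.random.default_rng(0)
y_noise = sample_noise(rng, size=(1000, 1), scale=1.0, dist="student_t", df=3)
```

The existing `x_eps`, `y_eps`, and `t_eps` arguments could then be passed as `scale`, with the distribution chosen once per DGP.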
Remove dead code unless it has a purpose (like presenting an alternative we might want to use in some specific scenario), in which case also document what that purpose is.
(very minor)