Add weighting bootstrap inference (weightingboottest) by mdizadi · Pull Request #1275 · py-econometrics/pyfixest

mdizadi · 2026-04-13T19:27:50Z

Summary

The classical pairs bootstrap (sampling with replacement) is equivalent to weighting observations by counts drawn from a Multinomial(N, 1/N, ..., 1/N) distribution. The Bayesian bootstrap (Rubin 1981) is its continuous analogue: weights are drawn from a Dirichlet(α) distribution, which assigns positive mass to every observation and avoids the discreteness of the multinomial. Both are implemented as weightingboottest(), a new observation-reweighting inference method alongside the existing wildboottest() and analytic vcov() options.

This also adds the first pairs/XY bootstrap implementation in pyfixest — method="multinomial" is provably equivalent to resampling rows with replacement.

API

fit = pf.feols("Y ~ X1 + X2 | f1", data)

# Bayesian bootstrap (Rubin 1981) — Dirichlet(α=1), continuous weights
fit.weightingboottest(reps=500, method="bayesian")

# Multinomial bootstrap — equivalent to classical pairs/XY bootstrap
fit.weightingboottest(reps=500, method="multinomial")

# With clustering (panel-aware: unit weights broadcast to observations)
fit.weightingboottest(reps=500, method="bayesian", cluster="unit_id")

# Return full draw matrix for custom inference
res, draws = fit.weightingboottest(reps=500, return_draws=True)

Implementation notes

Re-demeans per bootstrap iteration with the bootstrap weights using pyfixest's native numba demeaning kernel — no one-hot FE encoding, no external package dependency
~12× faster than wildboottest() on large FE panels (500 FEs, N=5,000)
Handles unbalanced panels correctly via np.unique inverse mapping
Filters zero-weight rows before demeaning to prevent division-by-zero in the numba kernel (arises when multinomial draws exclude an entire FE group)
Stores _X_untransformed_df (pre-demeaning DataFrame) to enable correct per-iteration FE re-absorption, mirroring the existing _Y_untransformed
_X_untransformed_df is freed in lean mode via _clear_attributes()

Test plan

Returns correct DataFrame shape and columns (no FE, single FE, two-way FE)
Bootstrap mean close to point estimate
multinomial SE matches manual pairs bootstrap SE within MC noise
Bootstrap SE (α=1) close to analytic iid SE
Invalid method and alpha raise ValueError
Seed reproducibility
cluster= argument works

Co-developed by @mdizadi and Claude Code (Anthropic).

codecov · 2026-04-13T19:33:21Z

Codecov Report

❌ Patch coverage is 97.64706% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
pyfixest/estimation/models/feols_.py	96.66%	2 Missing ⚠️

Flag	Coverage Δ
core-tests	`73.35% <97.64%> (+0.27%)`	⬆️
test-r-core	`50.90% <5.88%> (-0.51%)`	⬇️
test-r-extended	`18.40% <3.52%> (-0.17%)`	⬇️
test-r-fixest	`39.77% <5.88%> (-0.39%)`	⬇️
tests-extended	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
pyfixest/estimation/models/feiv_.py	`89.62% <100.00%> (+2.35%)`	⬆️
pyfixest/utils/dgps.py	`55.80% <ø> (ø)`
pyfixest/estimation/models/feols_.py	`88.00% <96.66%> (-4.43%)`	⬇️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

s3alfisc · 2026-04-14T19:50:27Z

+        self,
+        reps: int,
+        method: Literal["bayesian", "multinomial"] = "bayesian",
+        alpha: float = 1.0,


Maybe we could name the alpha arg differently? Right now I believe we might risk that users confuse alpha from the dirchlet list with the significance alpha level?

Thanks for the comment. It is now changed to concentration.
https://en.wikipedia.org/wiki/Concentration_parameter

mdizadi · 2026-04-21T20:08:51Z

pre-commit.ci please re-run

Implements observation-reweighting bootstrap as a complement to wildboottest() and analytic vcov(). Two methods: - bayesian: Dirichlet(alpha) weights (Rubin 1981), continuous - multinomial: Multinomial(N, 1/N) weights = classical pairs/XY bootstrap Key implementation details: - Re-demeans per iteration with bootstrap weights using native numba demeaning, ~12x faster than wildboottest on large FE panels - Stores _X_untransformed_df (pre-demeaning DataFrame) alongside the existing _Y_untransformed to enable correct per-iteration re-absorption - Filters zero-weight rows before demeaning to avoid division-by-zero in the numba kernel (can arise with multinomial draws) - Handles unbalanced panels via np.unique inverse mapping - _X_untransformed_df freed in lean mode via _clear_attributes() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

for more information, see https://pre-commit.ci

…ormat Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… with significance level The `alpha` parameter in the Dirichlet weighting bootstrap is a concentration parameter, not a significance level. Rename to `concentration` to match PyTorch/TensorFlow Probability conventions and clarify its meaning. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Refactor the bootstrap loop in Feols.weightingboottest() to delegate per-iteration fitting to _bootstrap_one_rep(), which subclasses can override. Feiv overrides it to solve 2SLS (demean Y, X, Z with bootstrap weights) instead of OLS, giving correct bootstrap SEs for IV models. Also stores _Z_untransformed_df (pre-demeaning instrument DataFrame) alongside the existing _X_untransformed_df for use in the Feiv bootstrap. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fixes RUF002 pre-commit failure triggered by master commit c533be2 which added get_ivf_data() with γ_ambition in the docstring. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mdizadi requested a review from s3alfisc April 13, 2026 19:29

s3alfisc reviewed Apr 14, 2026

View reviewed changes

mdizadi force-pushed the bayesian-bootstrap branch 2 times, most recently from 80261a4 to c79d5e0 Compare April 21, 2026 19:18

Mo and others added 8 commits April 21, 2026 22:17

[pre-commit.ci] auto fixes from pre-commit.com hooks

d23673d

for more information, see https://pre-commit.ci

fix pre-commit: unused var, ambiguous Greek char in docstring, ruff f…

f1f8aee

…ormat Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: trigger pre-commit.ci re-run

051981f

fix: replace ambiguous Greek gamma with ASCII y in dgps.py docstring

4018e55

Fixes RUF002 pre-commit failure triggered by master commit c533be2 which added get_ivf_data() with γ_ambition in the docstring. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mdizadi force-pushed the bayesian-bootstrap branch from 8c1314c to 4018e55 Compare April 21, 2026 20:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add weighting bootstrap inference (weightingboottest)#1275

Add weighting bootstrap inference (weightingboottest)#1275
mdizadi wants to merge 8 commits intomasterfrom
bayesian-bootstrap

mdizadi commented Apr 13, 2026

Uh oh!

codecov Bot commented Apr 13, 2026 •

edited

Loading

Uh oh!

s3alfisc Apr 14, 2026

Uh oh!

mdizadi Apr 16, 2026

Uh oh!

mdizadi commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mdizadi commented Apr 13, 2026

Summary

API

Implementation notes

Test plan

Uh oh!

codecov Bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

s3alfisc Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

mdizadi Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

mdizadi commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Apr 13, 2026 •

edited

Loading