ENH: Add plot_overlap_common_support() to DoubleMLIRM by akihiroshimoda · Pull Request #389 · DoubleML/doubleml-for-py

akihiroshimoda · 2026-04-10T07:54:42Z

Description

Adds a new plot_overlap_common_support() method to DoubleMLIRM that visualizes the distribution of estimated propensity scores $\hat{m}_0(X) = \hat{E}[D|X]$ split by treatment and control groups.

This is a diagnostic tool for assessing the positivity (overlap) assumption, which is critical for the validity of IPW-based estimators. When propensity scores cluster near 0 or 1, the inverse probability weights become extreme, leading to inflated variance of the treatment effect estimate.
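To make the variance concern concrete, here is a minimal sketch that computes inverse probability weights for a few hypothetical propensity scores. The helper name `ipw_weights` and the example values are illustrative assumptions and are not part of this PR.

```python
import numpy as np


def ipw_weights(propensity_score, treatment):
    """Inverse probability weights: 1/m(X) for treated, 1/(1 - m(X)) for controls."""
    ps = np.asarray(propensity_score, dtype=float)
    tr = np.asarray(treatment)
    return np.where(tr == 1, 1.0 / ps, 1.0 / (1.0 - ps))


# Hypothetical scores: two well inside (0, 1), two close to the boundaries.
ps = np.array([0.50, 0.40, 0.01, 0.99])
tr = np.array([1, 0, 1, 0])
weights = ipw_weights(ps, tr)
print(weights)
```

The two observations near the boundaries receive weights roughly fifty times larger than the others, so they would dominate any weighted average built from these scores.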

Changes made:
Interactive plotly visualization with KDE curves for treated and control groups (consistent with the existing sensitivity_plot() API).
Positivity danger zones: shaded regions and dashed threshold lines near 0 and 1.
Built-in diagnostics annotation: displays the percentage of observations in violation zones.
Automatic UserWarning when >5% of observations have propensity scores outside the safe range, with actionable guidance.
Added comprehensive unit tests in test_irm_overlap_plot.py.
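The warning logic described in the list above could look roughly like the following sketch. Only the 5% share comes from the description; the function name `check_overlap`, the bounds 0.05 and 0.95, and the wording of the message are assumptions made for illustration.

```python
import warnings

import numpy as np


def check_overlap(propensity_score, lower=0.05, upper=0.95, max_share=0.05):
    """Return the share of scores outside [lower, upper]; warn if it exceeds max_share.

    The default bounds are illustrative assumptions, not values taken from the PR.
    """
    ps = np.asarray(propensity_score, dtype=float)
    share_outside = float(np.mean((ps < lower) | (ps > upper)))
    if share_outside > max_share:
        warnings.warn(
            f"{share_outside:.1%} of propensity scores lie outside "
            f"[{lower}, {upper}]; consider trimming or re-specifying the model.",
            UserWarning,
        )
    return share_outside


# Hypothetical scores: four of the ten fall outside [0.05, 0.95].
scores = np.array([0.01, 0.02, 0.30, 0.50, 0.60, 0.70, 0.80, 0.90, 0.97, 0.99])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    share = check_overlap(scores)
```

Returning the share in addition to warning keeps the check usable in tests and in the plot annotation without having to parse the warning text.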

Reference to Issues or PRs

None

Comments

Here is an example of the generated plot:

PR Checklist

Please fill out this PR checklist (see our contributing guidelines for details).

The title of the pull request summarizes the changes made.
The PR contains a detailed description of all changes and additions.
References to related issues or PRs are added.
The code passes all (unit) tests.
Enhancements or new features are equipped with unit tests.
The changes adhere to the PEP8 standards.

SvenKlaassen · 2026-04-13T08:03:27Z

Thank you very much for this.
I have some general comments:

In general, I think histograms are a more robust visualization than densities.
Since propensity scores are common to many other models, I think a simple utils function that takes propensity_score and treatment as input would be more generally helpful.
I would also add some type of calibration plot (I have added a suggestion below); this can also help to evaluate the propensity score fit.

Details on Proposed changes

Replace the IRM-specific public method with a generic public plotting entry point in doubleml.utils.
Add a new public module doubleml/utils/plots.py.
Export the public plotting functions from doubleml/utils/__init__.py.
Prefer array-based plotting functions (handling only the single-treatment design) with a signature based on:
- propensity_score
- treatment
- optional plotting arguments such as bins and density
Use histogram-based diagnostics instead of KDE:
- more robust on bounded support [0, 1]
- easier to interpret when scores are clipped
Move tests mainly to doubleml/utils/tests:
- input validation
- bin handling
- boundary values at 0 and 1
- empty-bin behavior
- return type and basic plot structure
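As a rough illustration of why histograms behave better on the bounded support, the sketch below measures how much probability mass a Gaussian KDE places outside [0, 1] when scores pile up at a clipping threshold. The clipping value 0.01, the bandwidth 0.05, and the helper name are assumptions chosen for the example.

```python
import math

import numpy as np


def gaussian_kde_mass_outside_unit_interval(samples, bandwidth):
    """Share of a Gaussian KDE's probability mass that falls outside [0, 1]."""
    x = np.asarray(samples, dtype=float)
    # Standard normal CDF evaluated with math.erf from the standard library.
    cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    below_zero = cdf((0.0 - x) / bandwidth)
    above_one = 1.0 - cdf((1.0 - x) / bandwidth)
    return float(np.mean(below_zero + above_one))


# Hypothetical scores piled up at a clipping threshold of 0.01.
clipped_scores = np.full(200, 0.01)
leaked = gaussian_kde_mass_outside_unit_interval(clipped_scores, bandwidth=0.05)

# A histogram with edges on [0, 1] keeps every observation inside the support.
counts, edges = np.histogram(clipped_scores, bins=10, range=(0.0, 1.0))
print(f"KDE mass outside [0, 1]: {leaked:.3f}; histogram total: {counts.sum()}")
```

With these assumed values a large share of the smoothed density sits below zero, where no propensity score can exist, while the histogram shows all observations in the first bin.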

Suggested public API

doubleml.utils.plot_propensity_score_calibration

calibration plot sketch

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_propensity_score_calibration(
    propensity_score,
    treatment,
    bins=10,
    density=False,
    palette="colorblind",
):
    """
    Plot propensity score distributions and binned calibration curves.

    Parameters
    ----------
    propensity_score : array-like
        Predicted propensity scores of shape (n_samples,).
    treatment : array-like
        Binary treatment indicator of shape (n_samples,).
    bins : int or array-like
        Number of bins or explicit bin edges.
    density : bool
        If True, histogram heights are normalized.
    palette : str or sequence
        Seaborn palette name or explicit colors.

    Returns
    -------
    fig, axes
        Matplotlib figure and 2x2 axes array.
    """
    ps = np.asarray(propensity_score, dtype=float)
    tr = np.asarray(treatment)

    # Validate dimensionality before anything else; flattening first would
    # make this check unreachable and silently accept multi-dimensional input.
    if ps.ndim != 1 or tr.ndim != 1:
        raise ValueError("propensity_score and treatment must be one-dimensional.")
    if ps.shape != tr.shape:
        raise ValueError("propensity_score and treatment must have the same shape.")
    if not np.isin(tr, [0, 1]).all():
        raise ValueError("treatment must be binary with values 0 and 1.")
    # Written as a positive condition so that NaN scores are rejected as well.
    if not np.all((ps >= 0) & (ps <= 1)):
        raise ValueError("propensity_score must lie in [0, 1].")

    tr = tr.astype(int)

    if isinstance(bins, int):
        if bins < 2:
            raise ValueError("bins must be at least 2.")
        bins = np.linspace(0.0, 1.0, bins + 1)
    else:
        bins = np.asarray(bins, dtype=float)
        if bins.ndim != 1 or len(bins) < 2:
            raise ValueError("bins must contain at least two edges.")
        if np.any(np.diff(bins) <= 0):
            raise ValueError("bins must be strictly increasing.")

    x_min, x_max = float(bins[0]), float(bins[-1])
    centers = 0.5 * (bins[:-1] + bins[1:])
    widths = np.diff(bins)

    treated_frac = []
    control_frac = []

    for i in range(len(bins) - 1):
        if i < len(bins) - 2:
            mask = (ps >= bins[i]) & (ps < bins[i + 1])
        else:
            mask = (ps >= bins[i]) & (ps <= bins[i + 1])

        if np.sum(mask) == 0:
            treated_frac.append(np.nan)
            control_frac.append(np.nan)
        else:
            p_treated = np.mean(tr[mask] == 1)
            treated_frac.append(p_treated)
            control_frac.append(1.0 - p_treated)

    colors = sns.color_palette(palette, n_colors=2)
    fig, axes = plt.subplots(2, 2, figsize=(12, 10), gridspec_kw={"height_ratios": [2, 1]})

    sns.histplot(
        ps[tr == 1],
        bins=bins,
        stat="density" if density else "count",
        kde=False,
        color=colors[0],
        ax=axes[0, 0],
        label="Treated",
    )
    axes[0, 0].set_title("Treated: Propensity Score Distribution")
    axes[0, 0].set_xlim(x_min, x_max)
    axes[0, 0].set_ylabel("Density" if density else "Count")
    axes[0, 0].legend()

    sns.histplot(
        ps[tr == 0],
        bins=bins,
        stat="density" if density else "count",
        kde=False,
        color=colors[1],
        ax=axes[0, 1],
        label="Control",
    )
    axes[0, 1].set_title("Control: Propensity Score Distribution")
    axes[0, 1].set_xlim(x_min, x_max)
    axes[0, 1].set_ylabel("Density" if density else "Count")
    axes[0, 1].legend()

    axes[1, 0].bar(centers, treated_frac, width=widths, color=colors[0], alpha=0.7)
    axes[1, 0].plot([x_min, x_max], [x_min, x_max], "k--", label="Ideal calibration")
    axes[1, 0].set_title("Treated: Calibration")
    axes[1, 0].set_xlabel("Predicted propensity score")
    axes[1, 0].set_ylabel("Observed treatment fraction")
    axes[1, 0].set_xlim(x_min, x_max)
    axes[1, 0].set_ylim(0, 1)
    axes[1, 0].legend()

    axes[1, 1].bar(centers, control_frac, width=widths, color=colors[1], alpha=0.7)
    axes[1, 1].plot([x_min, x_max], [1 - x_min, 1 - x_max], "k--", label="Ideal calibration")
    axes[1, 1].set_title("Control: Calibration")
    axes[1, 1].set_xlabel("Predicted propensity score")
    axes[1, 1].set_ylabel("Observed control fraction")
    axes[1, 1].set_xlim(x_min, x_max)
    axes[1, 1].set_ylim(0, 1)
    axes[1, 1].legend()

    fig.suptitle("Propensity Score Calibration")
    plt.tight_layout()
    return fig, axes
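The per-bin loop in the sketch above could also be written in vectorized form. The following is one possible alternative, assuming the scores have already been validated to lie within the bin range; `binned_treated_fraction` is a hypothetical helper name and the data are made up.

```python
import numpy as np


def binned_treated_fraction(propensity_score, treatment, edges):
    """Observed treated fraction per bin; NaN for empty bins.

    Bins are half-open [lo, hi) except the last one, which is closed.
    """
    ps = np.asarray(propensity_score, dtype=float)
    tr = np.asarray(treatment, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n_bins = len(edges) - 1

    # Digitizing against the interior edges maps every score to 0..n_bins-1,
    # so a score equal to the last edge lands in the last bin.
    idx = np.digitize(ps, edges[1:-1])
    totals = np.bincount(idx, minlength=n_bins)
    treated = np.bincount(idx, weights=tr, minlength=n_bins)

    frac = np.full(n_bins, np.nan)
    nonempty = totals > 0
    frac[nonempty] = treated[nonempty] / totals[nonempty]
    return frac


ps = np.array([0.05, 0.15, 0.15, 0.55, 0.55, 0.55, 0.55, 1.0])
tr = np.array([0, 0, 1, 1, 1, 0, 1, 1])
frac = binned_treated_fraction(ps, tr, np.linspace(0.0, 1.0, 6))
print(frac)
```

Keeping empty bins as NaN, as in the loop version, lets matplotlib skip those bars instead of drawing them at zero.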

- Replace KDE overlap plot with histogram-based calibration plot
- Generic array-based API: propensity_score, treatment, bins, density, palette
- 2x2 matplotlib figure: histograms + binned calibration curves
- Move tests to doubleml/utils/tests/
- Address review feedback from PR DoubleML#389

akihiroshimoda · 2026-04-18T08:17:38Z

Thank you for the detailed feedback, @SvenKlaassen! I've addressed all your suggestions in the latest commit:

1. Histograms instead of KDE densities

Replaced the plotly KDE-based visualization with matplotlib/seaborn histograms, which are more robust on the bounded [0, 1] support.

2. Generic utility function

Extracted the plotting logic into a standalone public function doubleml.utils.plots.plot_propensity_score_calibration(propensity_score, treatment, bins, density, palette).
This accepts raw arrays directly, making it reusable across any model that produces propensity scores.
Exported from doubleml/utils/__init__.py.

3. Calibration plot

Implemented the 2×2 figure layout you sketched: histograms (top row) + binned calibration curves (bottom row) with ideal calibration reference lines.

4. Tests moved to doubleml/utils/tests/

Added comprehensive tests covering: input validation, bin handling, boundary values (0 and 1), empty-bin behavior, return type, and plot structure.
Removed the old IRM-specific test file.

The DoubleMLIRM.plot_overlap_common_support() method now delegates to this utility function, keeping the model-level convenience of extracting predictions automatically.
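The delegation code itself is not shown in this thread. As a hedged sketch of the extraction step, the example below assumes the fitted model keeps its nuisance predictions in a dict with key `ml_m` holding an array of shape (n_obs, n_rep, n_treat); the helper name and the stand-in data are hypothetical, and the layout should be checked against the DoubleML documentation.

```python
import numpy as np


def first_repetition_scores(predictions, learner="ml_m", i_rep=0, i_treat=0):
    """Slice one repetition of stored nuisance predictions into a 1-D array.

    Assumes predictions[learner] has shape (n_obs, n_rep, n_treat).
    """
    preds = np.asarray(predictions[learner], dtype=float)
    if preds.ndim != 3:
        raise ValueError("expected an array of shape (n_obs, n_rep, n_treat).")
    return preds[:, i_rep, i_treat]


# Stand-in for a fitted model's stored predictions (hypothetical values).
fake_predictions = {"ml_m": np.linspace(0.1, 0.9, 5).reshape(5, 1, 1)}
scores = first_repetition_scores(fake_predictions)
print(scores.shape)
```

The resulting one-dimensional array, together with the treatment vector, is the kind of input the array-based utility function expects.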
Here is an example of the updated plot:

All 19 tests pass. Looking forward to your review!

SvenKlaassen · 2026-04-19T07:15:24Z

Thank you.

Can you first fix the minor issues identified by Codacy?
Most are formatting issues or unused code elements.
Further, I would suggest also renaming the propensity score method in the IRM class accordingly, so that it does not focus entirely on overlap, e.g. plot_propensity_score.

Add plot_overlap_common_support() method to DoubleMLIRM

254fa95

akihiroshimoda wants to merge 2 commits into DoubleML:main from akihiroshimoda:feature/add-overlap-diagnostic-plot
