feat(stats): reviewer-proof correlation helpers + prisma correlate #2

Open
javiercuervo wants to merge 3 commits into main from feat/stats-correlations

@javiercuervo

Contributor

What

Adds a new prisma.stats subpackage and a prisma correlate CLI command: correlation and robustness statistics designed so every estimate ships with what a sceptical reviewer asks for.

  • fisher_ci(r, n) — Fisher z-transform 95% CI for a Pearson r.
  • correlation_table(df, x, y, group=None) — Pearson r + Fisher-z CI and Spearman ρ, with n, overall and per stratum (the Pearson/Spearman gap flags outlier-driven associations).
  • partial_correlation(df, x, y, covar) — first-order partial r(x,y|covar) (controls for confounders such as general ability).
  • bootstrap_ci(df, x, y) — percentile bootstrap CI for small/fragile cells.
  • missingness_compare(df, indicator, by) — selection-bias check (Mann-Whitney) between present vs missing rows.
  • mixed_model_icc(df, outcome, predictor, group) — random-intercept model + ICC for clustered data (students nested in cohorts/courses); optional statsmodels.
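To make the statistics behind the first few helpers concrete, here is a minimal standard-library sketch of a Fisher z-transform interval, a first-order partial correlation computed from three pairwise correlations, and a percentile bootstrap. The function names ending in `_sketch`, their signatures, and defaults such as `n_boot=2000` are assumptions for illustration only; the actual `prisma.stats` functions take a DataFrame and may be implemented differently.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci_sketch(r, n, level=0.95):
    """Fisher z-transform CI for a Pearson r based on n pairs (hypothetical helper)."""
    if n <= 3:
        raise ValueError("Fisher CI needs n > 3")
    z = math.atanh(r)                      # variance-stabilising transform
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def partial_r_sketch(r_xy, r_xz, r_yz):
    """First-order partial correlation r(x, y | z) from pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def bootstrap_ci_sketch(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for Pearson r, resampling (x, y) pairs together."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(pearson_r([xs[i] for i in idx], [ys[i] for i in idx]))
        except ZeroDivisionError:
            continue                       # skip degenerate resamples with zero variance
    if not stats:
        raise ValueError("all bootstrap resamples were degenerate")
    stats.sort()
    m = len(stats)
    lo = stats[int((1 - level) / 2 * m)]
    hi = stats[min(m - 1, int((1 + level) / 2 * m))]
    return lo, hi
```

As a sanity check, `fisher_ci_sketch(0.5, 50)` gives roughly (0.26, 0.68), which shows how wide the interval around a moderate r still is at n = 50. The bootstrap resamples pairs rather than each variable separately, so the x-y dependence is preserved in every resample.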

Why

Extracted while hardening the statistical rigour of the 20-60-20 AI-permitted-assessment study (Cuervo, 2026, Universidade de Aveiro) against an adversarial three-reviewer panel: within-stratum reporting, rank robustness, partial correlation for mechanical/ability overlap, bootstrap for small cells, missingness and clustering. Reusable for any education/SLR dataset.

Notes

  • Adds scipy as a core dependency; statsmodels is an optional [stats] extra (only mixed_model_icc needs it).
  • New CLI: prisma correlate --in data.csv --x A --y B [--group G] [--partial Z] [--bootstrap] [--out table.csv].
  • Tests: tests/test_stats.py (5 cases); full suite passes (16/16). CHANGELOG updated.
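One common way to keep a dependency such as statsmodels optional is to import it lazily and fail with an actionable message. The sketch below illustrates that pattern; `require_optional` is a hypothetical helper, not something the PR is known to contain, and the wording of the error message is an assumption.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency lazily, or raise with an install hint.

    Hypothetical helper: the real package may guard its optional import differently.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install the optional [{extra}] extra."
        ) from exc
```

Under this pattern a function like `mixed_model_icc` would call `require_optional("statsmodels", "stats")` inside its body, so importing the rest of the subpackage never touches statsmodels.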

🤖 Generated with Claude Code

javiercuervo and others added 2 commits June 13, 2026 15:24
Add prisma.stats: fisher_ci, correlation_table (Pearson + Fisher-z CI +
Spearman, overall and per stratum), partial_correlation, bootstrap_ci,
missingness_compare (Mann-Whitney selection check), and mixed_model_icc
(random-intercept clustering + ICC). New 'prisma correlate' CLI command.
scipy becomes a core dependency; statsmodels is an optional [stats] extra.
Extracted while hardening the statistical rigour of the 20-60-20
AI-permitted-assessment study (Cuervo, 2026, Universidade de Aveiro).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@javiercuervo

Contributor Author

CI fixed: the ruff check step was failing in src/prisma/stats/correlations.py with 3 errors that broke the 3.10/3.11/3.12 jobs. Fixed in 3dae65d (one PR = one reason, lint only): (1) F401: removed Iterable, which was imported but unused; (2) UP035: Sequence is now imported from collections.abc instead of typing (Python 3.9+ convention); (3) RUF002: replaced the ambiguous EN DASH with - in the partial_correlation docstring ("x-y association"). ruff check passes cleanly locally. The CI should go green once it is re-run. Note: I did not touch the uncommitted WIP (README.md, RELEASING.md, docs/citing-articles.md); that is left for its own PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
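For readers unfamiliar with the UP035 rule mentioned above, the following sketch shows the import style it asks for. The function `mean_of` is a hypothetical example and is not taken from correlations.py.

```python
# UP035: import abstract container types from collections.abc, not typing.
from collections.abc import Sequence


def mean_of(values: Sequence[float]) -> float:
    """Hypothetical helper that accepts any sequence (list, tuple, ...)."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```

Only names that are actually used are imported here, which is what the F401 fix enforces as well.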