I think there's a bug in the IPCW weights used for Uno's C-index.
In pillar/datasets/nlst_utils.py, get_censoring_dist does:
kmf.fit(times, event_observed)
where event_observed = d["y"] (1 = cancer within follow-up, 0 = no cancer). This fits the KM to the disease process, so the resulting curve is the event survival S(t), not the censoring survival G(t).
Then in pillar/metrics/survival.py, _handle_pairs uses it as:
weight = 1.0 / (censoring_dist[str(int(truth_time))] ** 2)
which gives 1/S(t)^2 instead of the 1/G(t)^2 that Uno's formula calls for.
I believe the fix is just flipping the event indicator in the KM fit to event_observed = 1 - d["y"] so it estimates the censoring process instead.
I noticed this while comparing dev vs test AUCs across years: the gap grows with follow-up time, which pointed at the monitor metric as a source of bias toward longer-horizon risk. That said, in a screening cohort like NLST where cancer is rare, S(t) stays high (~0.95) while G(t) likely drops faster, so the fix would probably increase late-year weights rather than decrease them. The fix is still correct per Uno, but the magnitude shift seems worth flagging. As far as I can tell, this code was adapted directly from Sybil, which has the same behavior.
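To make the S(t)-vs-G(t) point concrete, here is a minimal hand-rolled product-limit sketch (not the repo's lifelines `kmf`; the cohort data is made up) showing that fitting KM on d["y"] tracks event survival S(t), fitting on the flipped indicator 1 - d["y"] tracks censoring survival G(t), and that with rare events 1/G(t)^2 exceeds 1/S(t)^2 as claimed:

```python
import numpy as np

def km_survival(times, events):
    """Minimal Kaplan-Meier product-limit estimator.

    Returns {t: S(t)} at each unique time, where `events` marks which
    subjects experienced the modeled event (1) vs. were censored (0).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv, s = {}, 1.0
    for t in np.unique(times):
        at_risk = np.sum(times >= t)               # still under observation at t
        d = np.sum((times == t) & (events == 1))   # modeled events at t
        s *= 1.0 - d / at_risk
        surv[t] = s
    return surv

# Toy screening-like cohort: 10 subjects, 1 cancer (rare event),
# everyone else censored at staggered follow-up times.
times = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y     = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]    # 1 = cancer observed (d["y"])

S = km_survival(times, y)                   # current code: event survival
G = km_survival(times, [1 - e for e in y])  # proposed fix: censoring survival

t_event = 2                                 # the one observed event time
print(f"S({t_event}) = {S[t_event]:.3f}")   # stays high: 0.875
print(f"G({t_event}) = {G[t_event]:.3f}")   # drops faster: 0.700
print(f"1/S^2 = {1 / S[t_event] ** 2:.3f}, 1/G^2 = {1 / G[t_event] ** 2:.3f}")
```

(Uno's estimator technically uses the left limit G(t-) in the denominator; this sketch ignores that detail since the point is only the S-vs-G swap and the direction of the weight change.)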
Am I reading this right or is the current implementation intentional?