Select regressor beliefs per (event_start, regressor) in forecasting covariate assembly #2155

Open
Copilot wants to merge 6 commits into main from copilot/select-latest-regressor-beliefs
Copilot AI commented May 8, 2026

Forecast covariate preparation could drop valid regressor values when multiple regressors had beliefs for the same event_start at different belief_times. Selection was done per joined row (event_start), not per regressor, causing false missing data downstream.

  • Covariate selection logic

    • Added per-regressor selection in BasePipeline.split_data_all_beliefs via a helper that picks one value per (event_start, regressor).
    • Applied this consistently across:
      • past regressors (latest belief per event),
      • realized future regressors (closest realized belief per event),
      • forecast-window future regressors (latest admissible forecast belief per event).
    • This preserves independently known regressor values while still producing one wide covariate row per event_start.
  • Safety guard

    • Added explicit validation for selection mode (latest / closest) to prevent silent misuse.
  • Regression coverage

    • Added a focused test that reproduces mixed-belief-time regressor rows for the same event and asserts both regressor values are retained in the assembled future covariates.
# Before (row-level collapse):
event_start=10:00, belief_time=09:45, regressor_A=NaN, regressor_B=7.0

# After (per-regressor selection):
event_start=10:00, regressor_A=5.0, regressor_B=7.0
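
The difference between the two behaviours can be reproduced in a few lines of pandas. This is only a minimal sketch, not the code in `BasePipeline.split_data_all_beliefs`: the frame layout, the 09:30 belief for `regressor_A`, and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per (event_start, belief_time).
# regressor_A was known at 09:30, regressor_B only at 09:45 (assumed times).
beliefs = pd.DataFrame(
    {
        "event_start": pd.to_datetime(["2025-01-08 10:00"] * 2),
        "belief_time": pd.to_datetime(["2025-01-08 09:30", "2025-01-08 09:45"]),
        "regressor_A": [5.0, np.nan],
        "regressor_B": [np.nan, 7.0],
    }
)
regressors = ["regressor_A", "regressor_B"]
ordered = beliefs.sort_values(["event_start", "belief_time"])

# Row-level collapse: keep only the latest row per event_start,
# which discards the earlier, still valid, regressor_A value.
row_level = ordered.drop_duplicates("event_start", keep="last").set_index(
    "event_start"
)[regressors]

# Per-regressor selection: GroupBy.last() takes the last non-null entry
# of each column within each group, so both values survive.
per_regressor = ordered.groupby("event_start")[regressors].last()
```

Here `row_level` holds `NaN` for `regressor_A`, while `per_regressor` holds one wide row per `event_start` with both values filled.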

Copilot AI and others added 4 commits May 8, 2026 02:41
Agent-Logs-Url: https://github.com/FlexMeasures/flexmeasures/sessions/c980e7ce-6145-48fd-a71b-cd845af19b4e

Co-authored-by: BelhsanHmida <149331360+BelhsanHmida@users.noreply.github.com>
Agent-Logs-Url: https://github.com/FlexMeasures/flexmeasures/sessions/c980e7ce-6145-48fd-a71b-cd845af19b4e

Co-authored-by: BelhsanHmida <149331360+BelhsanHmida@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Select latest known regressor beliefs per regressor" to "Select regressor beliefs per (event_start, regressor) in forecasting covariate assembly" on May 8, 2026
Copilot AI requested a review from BelhsanHmida May 8, 2026 02:48
Context:
- PR #2155 needed to keep issue #2154's per-regressor covariate fix while preserving main's forecast belief-time filtering from PR #2134.

Change:
- Resolve main merge conflicts in forecasting covariate assembly.
- Select latest known regressor values per event and regressor within each forecast belief-time step.
- Add regression coverage for past/future mixed-belief regressor rows and realized-regressor leakage.
@BelhsanHmida BelhsanHmida marked this pull request as ready for review May 22, 2026 03:55
@BelhsanHmida BelhsanHmida left a comment

Approved. This now addresses #2154 cleanly.

I re-reviewed the covariate selection logic after the follow-up changes. The implementation now selects the latest known non-null value per (event_start, regressor) while preserving the forecast belief_time admissibility rules, so it fixes the original row-level collapse without reintroducing future-belief leakage.

I also checked the previous concerns:

  • realized regressor values are selected inside each forecast step, not precomputed across steps;
  • selection uses latest known per regressor, not closest;
  • existing belief-time boundary behavior from #2134 is preserved.
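
Why selecting inside each forecast step matters can be shown with a small sketch. The helper name `select_known_values` is hypothetical, and the inclusive `<=` boundary is an assumption; the actual admissibility rule from #2134 may differ.

```python
import pandas as pd


def select_known_values(beliefs, regressor_columns, forecast_belief_time):
    """Hypothetical helper: latest non-null value per (event_start, regressor),
    using only beliefs formed at or before forecast_belief_time (assumed rule)."""
    admissible = beliefs[beliefs["belief_time"] <= forecast_belief_time]
    ordered = admissible.sort_values(["event_start", "belief_time"])
    return ordered.groupby("event_start")[regressor_columns].last()


# Illustrative data: the same event gets a later belief (50.0) at 11:00.
beliefs = pd.DataFrame(
    {
        "event_start": pd.to_datetime(["2025-01-08 10:00"] * 2),
        "belief_time": pd.to_datetime(["2025-01-08 08:00", "2025-01-08 11:00"]),
        "regressor_a": [5.0, 50.0],
    }
)

# Each forecast step filters first, then selects, so an early step
# never sees a belief that was formed after its own belief time.
early = select_known_values(beliefs, ["regressor_a"], pd.Timestamp("2025-01-08 09:00"))
late = select_known_values(beliefs, ["regressor_a"], pd.Timestamp("2025-01-08 12:00"))
```

Precomputing one selection across all steps would hand the 11:00 value to the 09:00 step as well, which is the leakage the review refers to.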

@BelhsanHmida BelhsanHmida requested a review from Flix6x May 22, 2026 03:57
@Flix6x Flix6x left a comment

Nice! I don't have problems with the coding. Looks like it solves a real issue. I'd just like to understand the test cases better, and also have a better chance at understanding them years from now ("code is read much more often than it is written," someone wise once wrote).

Comment on lines +285 to +298
Select the latest non-null value per `(event_start, regressor)`.

Parameters
----------
data : pd.DataFrame
    Input frame with `event_start`, `belief_time`, and regressor columns.
regressor_columns : list[str]
    Regressor columns to select values for independently.

Returns
-------
pd.DataFrame
    Wide frame with one row per event_start and one selected value
    per regressor column.

Comment on lines +680 to +685
def capture_frame(self, df, sensors, sensor_names, start, end, **kwargs):
    if sensor_names == self.future_regressors:
        captured_future_frames.append(df.copy())
    return df

monkeypatch.setattr(BasePipeline, "detect_and_fill_missing_values", capture_frame)

Please annotate your tests. This part seems to be a clever trick to insert data into the pipeline without needing to involve the database.
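
For readers unfamiliar with the pattern, the capture trick can be sketched in isolation. This sketch swaps pytest's `monkeypatch` fixture for the standard library's `unittest.mock.patch.object` so it runs without pytest; `TinyPipeline` and `fill_missing` are hypothetical stand-ins, not FlexMeasures classes or methods.

```python
from unittest.mock import patch


class TinyPipeline:
    """Hypothetical stand-in for a pipeline class (not the FlexMeasures API)."""

    future_regressors = ["regressor_a"]

    def fill_missing(self, frame, names):
        # Stands in for a step that is expensive or needs external resources.
        raise RuntimeError("should be replaced in the test")

    def run(self, frame):
        return self.fill_missing(frame, self.future_regressors)


captured = []


def capture_frame(self, frame, names):
    # Record a copy of what reaches this step, then pass it through unchanged.
    if names == self.future_regressors:
        captured.append(list(frame))
    return frame


# Replace the method on the class for the duration of the block only;
# the plain function is bound like a normal method, so `self` is passed.
with patch.object(TinyPipeline, "fill_missing", capture_frame):
    result = TinyPipeline().run([4.0, 5.0])
```

After the block, `captured` holds the intermediate data the pipeline assembled, which the test can inspect directly.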

Comment on lines +689 to +695
assert len(captured_future_frames) == 1
selected = captured_future_frames[0].set_index("event_start")
assert selected.loc[pd.Timestamp("2025-01-08T09:00:00"), regressor_a] == 4.0
assert selected.loc[pd.Timestamp("2025-01-08T10:00:00"), regressor_a] == 5.0
assert selected.loc[pd.Timestamp("2025-01-08T10:00:00"), regressor_b] == 8.0
assert 50.0 not in set(selected[regressor_a].dropna())
assert 80.0 not in set(selected[regressor_b].dropna())

Please frame your expectations, specifically, by appending something like `, "expected this, because of that"` to each assert. Without it, I have to build up my own expectations just from the name of the test function and studying the inserted data. Please offer extra guidance.
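
A sketch of what such annotated asserts could look like, using Python's `assert condition, message` form. The data and the stated reasons are illustrative assumptions, not the fixture or rationale of the test in this PR.

```python
# Hypothetical selected values keyed by (event_start, regressor).
selected = {
    ("2025-01-08T10:00", "regressor_a"): 5.0,
    ("2025-01-08T10:00", "regressor_b"): 8.0,
}

# The message after the comma is shown only when the assert fails.
assert selected[("2025-01-08T10:00", "regressor_a")] == 5.0, (
    "expected 5.0 for regressor_a at 10:00, "
    "because that is its latest belief formed before the forecast belief time"
)
assert 50.0 not in selected.values(), (
    "expected 50.0 to be excluded, "
    "because that belief was formed after the forecast belief time"
)
```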

Development

Successfully merging this pull request may close these issues.

Select latest known regressor beliefs per regressor instead of per event_start row

3 participants