Skip to content

Conversation

codeloop
Copy link
Member

@codeloop codeloop commented Sep 7, 2025

Add post-processing step for forecast clipping

This change introduces a new post-processing step for the forecast operator that allows users to clip the forecast output to a specified minimum and maximum value. This is useful for ensuring that the forecast values remain within a realistic or desirable range.

This change includes:

  • A new PostprocessingSteps configuration to define the min/max clipping values.
  • Updates to the operator schema to expose the new configuration.
  • Implementation of the clipping logic using numpy.clip.
  • A new test case to verify the clipping functionality.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Sep 7, 2025
@codeloop codeloop requested a review from prasankh September 8, 2025 02:42
@codeloop codeloop marked this pull request as ready for review September 8, 2025 02:42
@codeloop codeloop enabled auto-merge September 8, 2025 02:42
@codeloop codeloop changed the title Add post-processing step for forecast clipping [ODSC-75899] : Add post-processing step for forecast clipping Sep 8, 2025
prasankh

This comment was marked as resolved.

@codeloop codeloop requested a review from prasankh September 9, 2025 06:15
Copy link
Member

@ahosler ahosler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics need to be generated using the post-processing numbers. Right now they use the pre-processing numbers, which means the metrics are the same whether they post-process or not. Yet the report uses the post-processed numbers. That seems silly.

Do the clip, then generate all numbers and metrics.
And keep postprocessing in backtesting, that's important!

self.postprocessing.set_max_forecast,
)
if min_threshold is not None or max_threshold is not None:
np.clip(forecast_val, min_threshold, max_threshold, out=forecast_val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use "out=forecast_val" instead of "forecast_val=..."?

horizon=self.spec.horizon,
target_column=self.original_target_column,
dt_column=self.spec.datetime_column.name,
postprocessing=self.spec.postprocessing,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to pass this everywhere if it's already part of "self"?

backtest_spec["output_directory"] = {"url": output_file_path}
backtest_spec["target_category_columns"] = [DataColumns.Series]
backtest_spec["generate_explanations"] = False
backtest_spec.pop('postprocessing', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why we'd want to pop this.
Don't we want to evaluate models based on which is giving the best end result? The post-processing is part of the end result.

I could imagine a case where the true values look like a sin function with a floor of 0. Backtesting may evaluate 2 options: a sin function and a const = sqrt(2).

Backtesting may favour the const function, even though the sin + post-processing would have a 0 MAPE.

I think we should keep postprocessing, but lets chat if you think otherwise.

get_frequency_of_datetime,
)

from ..const import ForecastOutputColumns, SupportedModels, TROUBLESHOOTING_GUIDE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to standardize on 1 linter. It's hard to read these PRs with all this junk

self,
attr,
pd.concat([val_self, val_other], ignore_index=True, axis=0),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming this is just linter

self.postprocessing
if self.postprocessing is not None
else PostprocessingSteps()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is more lines than the simpler

if self.postprocessing is None:
self.postprocessing = PostprocessingSteps()

type: integer
required: false
meta:
description: "This can be used to define the maximum forecast in the output."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about
"Set a minimum value for the forecast" and "Set a maximum value for the forecast"

type: dict
required: false
schema:
set_min_forecast:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about just "min", "max"?
We don't use "set" anywhere ele and "forecast" is redundant

self.postprocessing.set_max_forecast,
)
if min_threshold is not None or max_threshold is not None:
np.clip(forecast_val, min_threshold, max_threshold, out=forecast_val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's break this out into a separate method so it's easier to find in the future. But i love the simplicity of using clip here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants