[ODSC-75899]: Add post-processing step for forecast clipping #1261
base: main
Conversation
The metrics need to be generated using the post-processed numbers. Right now they use the pre-processed numbers, which means the metrics are the same whether post-processing is applied or not. Yet the report uses the post-processed numbers. That seems silly.
Do the clip, then generate all numbers and metrics.
And keep postprocessing in backtesting; that's important!
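A minimal sketch of that ordering, assuming hypothetical clip_forecast and mean_absolute_error helpers (the names are illustrative, not the operator's actual API):

import numpy as np

def clip_forecast(values, min_val=None, max_val=None):
    # Hypothetical helper: apply the post-processing clip first, so
    # metrics and the report are derived from the same numbers.
    if min_val is None and max_val is None:
        return values
    return np.clip(values, min_val, max_val)

def mean_absolute_error(actuals, forecast):
    return float(np.mean(np.abs(actuals - forecast)))

actuals = np.array([0.0, 1.0, 2.0, 3.0])
raw_forecast = np.array([-0.5, 1.1, 1.9, 3.2])

# Clip once, then compute every downstream number from the result.
forecast = clip_forecast(raw_forecast, min_val=0)
print(mean_absolute_error(actuals, forecast))  # 0.1, the post-processed metric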
self.postprocessing.set_max_forecast,
)
if min_threshold is not None or max_threshold is not None:
    np.clip(forecast_val, min_threshold, max_threshold, out=forecast_val)
Why use "out=forecast_val" instead of "forecast_val = np.clip(...)"?
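For context, the two spellings differ in whether the clip happens in place; a small illustration:

import numpy as np

forecast_val = np.array([-1.0, 0.5, 2.0])
alias = forecast_val

# In place: writes the clipped values back into the same buffer, so
# every other reference to the array sees the clip.
np.clip(forecast_val, 0, 1, out=forecast_val)
print(alias)  # [0.  0.5 1. ]

# Assignment: allocates a new array and rebinds the local name only;
# other references would keep pointing at the old values.
forecast_val = np.clip(forecast_val, 0, 1)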
horizon=self.spec.horizon,
target_column=self.original_target_column,
dt_column=self.spec.datetime_column.name,
postprocessing=self.spec.postprocessing,
Do we need to pass this everywhere if it's already part of "self"?
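A sketch of the alternative being suggested, reading the config off self instead of threading it through each call (the class and attribute layout are assumptions for illustration):

class ForecastOutput:  # name assumed for illustration
    def __init__(self, spec):
        self.spec = spec

    def build(self):
        # No postprocessing parameter: the method reads the spec
        # the object already holds.
        postprocessing = self.spec.postprocessing
        return postprocessing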
backtest_spec["output_directory"] = {"url": output_file_path} | ||
backtest_spec["target_category_columns"] = [DataColumns.Series] | ||
backtest_spec["generate_explanations"] = False | ||
backtest_spec.pop('postprocessing', None) |
It's not clear to me why we'd want to pop this.
Don't we want to evaluate models based on which gives the best end result? The post-processing is part of the end result.
I could imagine a case where the true values look like a sin function with a floor of 0. Backtesting may evaluate two options: a sin function and a constant = sqrt(2).
Backtesting may favour the constant function, even though the sin plus post-processing would have a 0 MAPE.
I think we should keep postprocessing, but let's chat if you think otherwise.
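A rough numeric version of that scenario; it uses MAE rather than MAPE to avoid dividing by the zero floor, and the candidate forecasts are illustrative:

import numpy as np

t = np.linspace(0, 2 * np.pi, 1000)
actual = np.maximum(np.sin(t), 0)  # sin with a floor of 0

sin_forecast = np.sin(t)                         # candidate 1
const_forecast = np.full_like(t, actual.mean())  # candidate 2

def mae(a, f):
    return float(np.mean(np.abs(a - f)))

# Scoring the raw outputs, the two candidates look comparable...
print(mae(actual, sin_forecast))    # ~0.32
print(mae(actual, const_forecast))  # ~0.35

# ...but scoring after the post-processing clip, sin wins outright.
print(mae(actual, np.clip(sin_forecast, 0, None)))  # 0.0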
get_frequency_of_datetime,
)

from ..const import ForecastOutputColumns, SupportedModels, TROUBLESHOOTING_GUIDE
We need to standardize on 1 linter. It's hard to read these PRs with all this junk
self,
attr,
pd.concat([val_self, val_other], ignore_index=True, axis=0),
)
Assuming this is just the linter.
self.postprocessing
if self.postprocessing is not None
else PostprocessingSteps()
)
This is more lines than the simpler:
if self.postprocessing is None:
self.postprocessing = PostprocessingSteps()
type: integer
required: false
meta:
  description: "This can be used to define the maximum forecast in the output."
How about "Set a minimum value for the forecast" and "Set a maximum value for the forecast"?
type: dict
required: false
schema:
  set_min_forecast:
How about just "min" and "max"? We don't use "set" anywhere else, and "forecast" is redundant.
self.postprocessing.set_max_forecast,
)
if min_threshold is not None or max_threshold is not None:
    np.clip(forecast_val, min_threshold, max_threshold, out=forecast_val)
Let's break this out into a separate method so it's easier to find in the future. But I love the simplicity of using clip here.
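A sketch of what that extraction might look like (the method and class names are assumptions):

import numpy as np

class ForecastOperatorModel:  # stand-in for the actual model class
    def __init__(self, postprocessing):
        self.postprocessing = postprocessing

    def _clip_forecast(self, forecast_val):
        # Apply the configured min/max post-processing clip in place.
        min_threshold = self.postprocessing.set_min_forecast
        max_threshold = self.postprocessing.set_max_forecast
        if min_threshold is not None or max_threshold is not None:
            np.clip(forecast_val, min_threshold, max_threshold, out=forecast_val)
        return forecast_val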
Add post-processing step for forecast clipping
This change introduces a new post-processing step for the forecast operator that allows users to clip the forecast output to a specified minimum and maximum value. This is useful for ensuring that the forecast values remain within a realistic or desirable range.
This change includes:
- A new PostprocessingSteps configuration to define the min/max clipping values.
- A post-processing step that clips the forecast output using numpy.clip.
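A minimal sketch of the resulting behavior, with illustrative threshold values standing in for the spec's set_min_forecast/set_max_forecast:

import numpy as np

set_min_forecast = 0    # illustrative values, as if read from the spec
set_max_forecast = 100

forecast_val = np.array([-5.0, 20.0, 150.0])
np.clip(forecast_val, set_min_forecast, set_max_forecast, out=forecast_val)
print(forecast_val)  # [  0.  20. 100.]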