
api(skore): report.metrics.brier_score() and report.metrics.summarize(scoring="brier_score") should have the same behavior #2001

@auguste-probabl

Description


Originally posted by @glemaitre in #1473

Let's take an example:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from skore import ComparisonReport, EstimatorReport

X, y = make_classification(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
split_data = dict(X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)

estimator_report_1 = EstimatorReport(LogisticRegression(), **split_data)
estimator_report_2 = EstimatorReport(LinearSVC(), **split_data)

report = ComparisonReport([estimator_report_1, estimator_report_2])

Right now, one can do:

report.metrics.summarize().frame()

and the important line in the report is the following:

Estimator        LogisticRegression    LinearSVC
...
Brier score                0.026684          NaN
...

The Brier score is not defined for LinearSVC because it does not have a predict_proba method. This behaviour is nice because the Brier score is still computed for the estimator for which it is relevant.
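
As a quick check outside skore, the difference comes directly from the scikit-learn API: LinearSVC only exposes decision_function, while LogisticRegression exposes the predict_proba method that the Brier score needs.

from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# LinearSVC provides no probability estimates, so the Brier score cannot be
# computed for it; LogisticRegression does provide them.
print(hasattr(LinearSVC(), "predict_proba"))           # False
print(hasattr(LogisticRegression(), "predict_proba"))  # True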

However, a problem occurs if we pass the metric explicitly:

report.metrics.summarize(scoring=["brier_score"])

It will raise the following error (the traceback was captured from the equivalent report.metrics.brier_score() call, which delegates to summarize(scoring=["brier_score"]) internally and fails the same way):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/Documents/scikit-learn-workspace/src/skore/.pixi/envs/dev/lib/python3.12/site-packages/sklearn/utils/_available_if.py:32, in _AvailableIfDescriptor._check(self, obj, owner)
     31 try:
---> 32     check_result = self.check(obj)
     33 except Exception as e:

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/utils/_accessor.py:9, in _check_all_checks.<locals>.check(accessor)
      8 def check(accessor: Any) -> bool:
----> 9     return all(check(accessor) for check in checks)

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/utils/_accessor.py:9, in <genexpr>(.0)
      8 def check(accessor: Any) -> bool:
----> 9     return all(check(accessor) for check in checks)

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/utils/_accessor.py:43, in _check_estimator_has_method.<locals>.check(accessor)
     41     return True
---> 43 raise AttributeError(
     44     f"Estimator {parent_estimator} is not a supported estimator by "
     45     f"the function called. The estimator should have a `{method_name}` "
     46     "method."
     47 )

AttributeError: Estimator LinearSVC() is not a supported estimator by the function called. The estimator should have a `predict_proba` method.

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 report.metrics.brier_score()

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/sklearn/_comparison/metrics_accessor.py:713, in _MetricsAccessor.brier_score(self, data_source, X, y, aggregate)
    653 @available_if(
    654     _check_supported_ml_task(supported_ml_tasks=["binary-classification"])
    655 )
   (...)    662     aggregate: Aggregate | None = ("mean", "std"),
    663 ) -> pd.DataFrame:
    664     """Compute the Brier score.
    665 
    666     Parameters
   (...)    711     Brier score                   0.025...              0.025...
    712     """
--> 713     return self.summarize(
    714         scoring=["brier_score"],
    715         data_source=data_source,
    716         X=X,
    717         y=y,
    718         aggregate=aggregate,
    719     ).frame()

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/sklearn/_comparison/metrics_accessor.py:162, in _MetricsAccessor.summarize(self, data_source, X, y, scoring, scoring_names, scoring_kwargs, pos_label, indicator_favorability, flat_index, aggregate)
     60 def summarize(
     61     self,
     62     *,
   (...)     72     aggregate: Aggregate | None = ("mean", "std"),
     73 ) -> MetricsSummaryDisplay:
     74     """Report a set of metrics for the estimators.
     75 
     76     Parameters
   (...)    160     Recall                       0.97...               0.97...
    161     """
--> 162     results = self._compute_metric_scores(
    163         report_metric_name="summarize",
    164         data_source=data_source,
    165         X=X,
    166         y=y,
    167         scoring=scoring,
    168         pos_label=pos_label,
    169         scoring_kwargs=scoring_kwargs,
    170         scoring_names=scoring_names,
    171         indicator_favorability=indicator_favorability,
    172         aggregate=aggregate,
    173     )
    174     if flat_index:
    175         if isinstance(results.columns, pd.MultiIndex):

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/utils/_progress_bar.py:85, in progress_decorator.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     83 has_errored = False
     84 try:
---> 85     result = func(*args, **kwargs)
     86     progress.update(
     87         task, completed=progress.tasks[task].total, refresh=True
     88     )
     89     return result

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/sklearn/_comparison/metrics_accessor.py:251, in _MetricsAccessor._compute_metric_scores(self, report_metric_name, data_source, X, y, aggregate, **metric_kwargs)
    246 generator = parallel(
    247     joblib.delayed(getattr(report.metrics, report_metric_name))(**kwargs)
    248     for report in self._parent.reports_
    249 )
    250 individual_results = []
--> 251 for result in generator:
    252     if report_metric_name == "summarize":
    253         # for summarize, the output is a display
    254         individual_results.append(result.frame())

File ~/Documents/scikit-learn-workspace/src/skore/.pixi/envs/dev/lib/python3.12/site-packages/joblib/parallel.py:1913, in Parallel._get_sequential_output(self, iterable)
   1911 self.n_dispatched_batches += 1
   1912 self.n_dispatched_tasks += 1
-> 1913 res = func(*args, **kwargs)
   1914 self.n_completed_tasks += 1
   1915 self.print_progress()

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/sklearn/_estimator/metrics_accessor.py:286, in _MetricsAccessor.summarize(self, data_source, X, y, scoring, scoring_names, scoring_kwargs, pos_label, indicator_favorability, flat_index)
    284 # Handle built-in metrics (without underscore prefix)
    285 elif metric in self._SCORE_OR_LOSS_INFO:
--> 286     metric_fn = getattr(self, f"_{metric}")
    287     metrics_kwargs = {"data_source_hash": data_source_hash}
    288     if metric_name is None:

File ~/Documents/scikit-learn-workspace/src/skore/.pixi/envs/dev/lib/python3.12/site-packages/sklearn/utils/_available_if.py:43, in _AvailableIfDescriptor.__get__(self, obj, owner)
     39 def __get__(self, obj, owner=None):
     40     if obj is not None:
     41         # delegate only on instances, not the classes.
     42         # this is to allow access to the docstrings.
---> 43         self._check(obj, owner=owner)
     44         out = MethodType(self.fn, obj)
     46     else:
     47         # This makes it possible to use the decorated method as an unbound method,
     48         # for instance when monkeypatching.

File ~/Documents/scikit-learn-workspace/src/skore/.pixi/envs/dev/lib/python3.12/site-packages/sklearn/utils/_available_if.py:34, in _AvailableIfDescriptor._check(self, obj, owner)
     32     check_result = self.check(obj)
     33 except Exception as e:
---> 34     raise AttributeError(attr_err_msg) from e
     36 if not check_result:
     37     raise AttributeError(attr_err_msg)

AttributeError: This '_MetricsAccessor' has no attribute '_brier_score'

We should not have this discrepancy between the two calls, and we should only fail when none of the estimators implements the metric.

The reason it currently works in the first case but not in the second is that, with the default scoring, each EstimatorReport silently skips brier_score for estimators that do not support it, whereas passing scoring=["brier_score"] forces the metric to be computed.
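
The same asymmetry can be reproduced on the individual reports (reusing the names from the snippet above; this is a sketch and the exact output may vary between skore versions):

# Default scoring: the LinearSVC report simply omits the Brier score row.
estimator_report_2.metrics.summarize().frame()

# Forcing the metric triggers the predict_proba check and raises the
# AttributeError shown above.
estimator_report_2.metrics.summarize(scoring=["brier_score"])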

Note that Brier score is just an example of a metric that not every estimator supports.

So far, the agreed-upon specification is:

  • report.metrics.brier_score() and report.metrics.summarize(scoring="brier_score") should always agree.
  • If one of the compared reports does not support a metric (e.g. LinearSVC with the Brier score), then the DataFrame returned by report.metrics.summarize should contain np.nan in the corresponding cells. This holds even if none of the compared reports supports the metric (the result would then contain a row or column full of np.nan). In other words, report.metrics.summarize should never raise because a compared report does not support a metric (see the sketch after this list).
  • The behavior of EstimatorReport and CrossValidationReport is unchanged: if a metric is not supported, an AttributeError is raised.
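
One hypothetical way to implement the second point would be for ComparisonReport to compute each requested metric per sub-report and substitute np.nan when the underlying estimator does not support it. The helper below is purely illustrative (metric_frame_or_nan is not part of skore; the actual change would live in _compute_metric_scores):

import numpy as np
import pandas as pd

def metric_frame_or_nan(estimator_report, metric, **kwargs):
    # Hypothetical helper, not skore API: compute one metric for one
    # EstimatorReport, falling back to a NaN placeholder when the underlying
    # estimator does not support it (e.g. brier_score on LinearSVC).
    try:
        return estimator_report.metrics.summarize(scoring=[metric], **kwargs).frame()
    except AttributeError:
        # The real implementation would build a frame with the same
        # index/columns as the supported case; a single NaN cell stands in
        # for that here.
        return pd.DataFrame([[np.nan]], index=[metric], columns=["score"])

ComparisonReport could then concatenate these per-report, per-metric frames instead of letting one unsupported metric abort the whole summary, while EstimatorReport and CrossValidationReport keep raising AttributeError as stated in the last point.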

Labels: API 🧑‍💻 (Improvement of the API facing users), needs API design 🎨 (Requires public/private API design before implementation)
