
api(skore): report.metrics.brier_score() and report.metrics.summarize(scoring="brier_score") should have the same behavior #2001

@auguste-probabl

Description


Originally posted by @glemaitre in #1473

Let's take an example:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from skore import ComparisonReport, EstimatorReport

X, y = make_classification(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
split_data = dict(X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)

estimator_report_1 = EstimatorReport(LogisticRegression(), **split_data)
estimator_report_2 = EstimatorReport(LinearSVC(), **split_data)

report = ComparisonReport([estimator_report_1, estimator_report_2])

Right now, one can do:

report.metrics.summarize().frame()

and the important line in the report is the following:

Estimator        LogisticRegression    LinearSVC
...
Brier score                0.026684          NaN
...

The Brier score is not defined for LinearSVC because it does not have a predict_proba method. This behaviour is nice because the Brier score is still computed for the estimator for which it is relevant.
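
As a quick check outside skore, the difference comes directly from the scikit-learn API: LinearSVC only exposes decision_function, while LogisticRegression exposes the predict_proba method that the Brier score needs.

from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# LinearSVC provides no probability estimates, so the Brier score cannot be
# computed for it; LogisticRegression does provide them.
print(hasattr(LinearSVC(), "predict_proba"))           # False
print(hasattr(LogisticRegression(), "predict_proba"))  # True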

However, a problem occurs if we pass the metric explicitly:

report.metrics.summarize(scoring=["brier_score"])

It will raise the following error (the traceback was captured from the equivalent report.metrics.brier_score() call, which delegates to summarize(scoring=["brier_score"]) internally and fails the same way):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/Documents/scikit-learn-workspace/src/skore/.pixi/envs/dev/lib/python3.12/site-packages/sklearn/utils/_available_if.py:32, in _AvailableIfDescriptor._check(self, obj, owner)
     31 try:
---> 32     check_result = self.check(obj)
     33 except Exception as e:

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/utils/_accessor.py:9, in _check_all_checks.<locals>.check(accessor)
      8 def check(accessor: Any) -> bool:
----> 9     return all(check(accessor) for check in checks)

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/utils/_accessor.py:9, in <genexpr>(.0)
      8 def check(accessor: Any) -> bool:
----> 9     return all(check(accessor) for check in checks)

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/utils/_accessor.py:43, in _check_estimator_has_method.<locals>.check(accessor)
     41     return True
---> 43 raise AttributeError(
     44     f"Estimator {parent_estimator} is not a supported estimator by "
     45     f"the function called. The estimator should have a `{method_name}` "
     46     "method."
     47 )

AttributeError: Estimator LinearSVC() is not a supported estimator by the function called. The estimator should have a `predict_proba` method.

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 report.metrics.brier_score()

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/sklearn/_comparison/metrics_accessor.py:713, in _MetricsAccessor.brier_score(self, data_source, X, y, aggregate)
    653 @available_if(
    654     _check_supported_ml_task(supported_ml_tasks=["binary-classification"])
    655 )
   (...)    662     aggregate: Aggregate | None = ("mean", "std"),
    663 ) -> pd.DataFrame:
    664     """Compute the Brier score.
    665 
    666     Parameters
   (...)    711     Brier score                   0.025...              0.025...
    712     """
--> 713     return self.summarize(
    714         scoring=["brier_score"],
    715         data_source=data_source,
    716         X=X,
    717         y=y,
    718         aggregate=aggregate,
    719     ).frame()

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/sklearn/_comparison/metrics_accessor.py:162, in _MetricsAccessor.summarize(self, data_source, X, y, scoring, scoring_names, scoring_kwargs, pos_label, indicator_favorability, flat_index, aggregate)
     60 def summarize(
     61     self,
     62     *,
   (...)     72     aggregate: Aggregate | None = ("mean", "std"),
     73 ) -> MetricsSummaryDisplay:
     74     """Report a set of metrics for the estimators.
     75 
     76     Parameters
   (...)    160     Recall                       0.97...               0.97...
    161     """
--> 162     results = self._compute_metric_scores(
    163         report_metric_name="summarize",
    164         data_source=data_source,
    165         X=X,
    166         y=y,
    167         scoring=scoring,
    168         pos_label=pos_label,
    169         scoring_kwargs=scoring_kwargs,
    170         scoring_names=scoring_names,
    171         indicator_favorability=indicator_favorability,
    172         aggregate=aggregate,
    173     )
    174     if flat_index:
    175         if isinstance(results.columns, pd.MultiIndex):

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/utils/_progress_bar.py:85, in progress_decorator.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     83 has_errored = False
     84 try:
---> 85     result = func(*args, **kwargs)
     86     progress.update(
     87         task, completed=progress.tasks[task].total, refresh=True
     88     )
     89     return result

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/sklearn/_comparison/metrics_accessor.py:251, in _MetricsAccessor._compute_metric_scores(self, report_metric_name, data_source, X, y, aggregate, **metric_kwargs)
    246 generator = parallel(
    247     joblib.delayed(getattr(report.metrics, report_metric_name))(**kwargs)
    248     for report in self._parent.reports_
    249 )
    250 individual_results = []
--> 251 for result in generator:
    252     if report_metric_name == "summarize":
    253         # for summarize, the output is a display
    254         individual_results.append(result.frame())

File ~/Documents/scikit-learn-workspace/src/skore/.pixi/envs/dev/lib/python3.12/site-packages/joblib/parallel.py:1913, in Parallel._get_sequential_output(self, iterable)
   1911 self.n_dispatched_batches += 1
   1912 self.n_dispatched_tasks += 1
-> 1913 res = func(*args, **kwargs)
   1914 self.n_completed_tasks += 1
   1915 self.print_progress()

File ~/Documents/scikit-learn-workspace/src/skore/skore/skore/src/skore/sklearn/_estimator/metrics_accessor.py:286, in _MetricsAccessor.summarize(self, data_source, X, y, scoring, scoring_names, scoring_kwargs, pos_label, indicator_favorability, flat_index)
    284 # Handle built-in metrics (without underscore prefix)
    285 elif metric in self._SCORE_OR_LOSS_INFO:
--> 286     metric_fn = getattr(self, f"_{metric}")
    287     metrics_kwargs = {"data_source_hash": data_source_hash}
    288     if metric_name is None:

File ~/Documents/scikit-learn-workspace/src/skore/.pixi/envs/dev/lib/python3.12/site-packages/sklearn/utils/_available_if.py:43, in _AvailableIfDescriptor.__get__(self, obj, owner)
     39 def __get__(self, obj, owner=None):
     40     if obj is not None:
     41         # delegate only on instances, not the classes.
     42         # this is to allow access to the docstrings.
---> 43         self._check(obj, owner=owner)
     44         out = MethodType(self.fn, obj)
     46     else:
     47         # This makes it possible to use the decorated method as an unbound method,
     48         # for instance when monkeypatching.

File ~/Documents/scikit-learn-workspace/src/skore/.pixi/envs/dev/lib/python3.12/site-packages/sklearn/utils/_available_if.py:34, in _AvailableIfDescriptor._check(self, obj, owner)
     32     check_result = self.check(obj)
     33 except Exception as e:
---> 34     raise AttributeError(attr_err_msg) from e
     36 if not check_result:
     37     raise AttributeError(attr_err_msg)

AttributeError: This '_MetricsAccessor' has no attribute '_brier_score'

We should not have this discrepancy between the two calls, and we should only fail when none of the estimators implements the metric.

The reason it currently works in the first case but not in the second is that, with the default scoring, each EstimatorReport silently skips brier_score for estimators that do not support it, whereas passing scoring=["brier_score"] forces the metric to be computed.
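
The same asymmetry can be reproduced on the individual reports (reusing the names from the snippet above; this is a sketch and the exact output may vary between skore versions):

# Default scoring: the LinearSVC report simply omits the Brier score row.
estimator_report_2.metrics.summarize().frame()

# Forcing the metric triggers the predict_proba check and raises the
# AttributeError shown above.
estimator_report_2.metrics.summarize(scoring=["brier_score"])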

Note that Brier score is just an example of a metric that not every estimator supports.

So far, the agreed-upon specification is:

  • report.metrics.brier_score() and report.metrics.summarize(scoring="brier_score") should always agree.
  • If one of the compared reports does not support a metric (e.g. LinearSVC with the Brier score), then the DataFrame returned by report.metrics.summarize should contain np.nan in the corresponding cells. This holds even if none of the compared reports supports the metric (the result would then contain a row or column full of np.nan). In other words, report.metrics.summarize should never raise because a compared report does not support a metric (see the sketch after this list).
  • The behavior of EstimatorReport and CrossValidationReport is unchanged: if a metric is not supported, an AttributeError is raised.
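
One hypothetical way to implement the second point would be for ComparisonReport to compute each requested metric per sub-report and substitute np.nan when the underlying estimator does not support it. The helper below is purely illustrative (metric_frame_or_nan is not part of skore; the actual change would live in _compute_metric_scores):

import numpy as np
import pandas as pd

def metric_frame_or_nan(estimator_report, metric, **kwargs):
    # Hypothetical helper, not skore API: compute one metric for one
    # EstimatorReport, falling back to a NaN placeholder when the underlying
    # estimator does not support it (e.g. brier_score on LinearSVC).
    try:
        return estimator_report.metrics.summarize(scoring=[metric], **kwargs).frame()
    except AttributeError:
        # The real implementation would build a frame with the same
        # index/columns as the supported case; a single NaN cell stands in
        # for that here.
        return pd.DataFrame([[np.nan]], index=[metric], columns=["score"])

ComparisonReport could then concatenate these per-report, per-metric frames instead of letting one unsupported metric abort the whole summary, while EstimatorReport and CrossValidationReport keep raising AttributeError as stated in the last point.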

Labels: API 🧑‍💻 (Improvement of the API facing users), needs API design 🎨 (Requires public/private API design before implementation)
