Labels
API 🧑💻 (Improvement of the API facing users) · ready for dev 💻 (Issue specified enough and ready to be implemented)
Description
Is your feature request related to a problem? Please describe.
As a data scientist, I would like to see if my model is overfitting or not, by comparing the score of my model on the train set and on the test set.
As of v0.9.1, to get the summary of metrics on the test set, I can do:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from skore import EstimatorReport
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
ridge_report = EstimatorReport(
    make_pipeline(StandardScaler(), Ridge()),
    X_train=X_train,
    X_test=X_test,
    y_train=y_train,
    y_test=y_test,
)
ridge_report.metrics.summarize().frame()  # defaults to the test set
and for the train set:
ridge_report.metrics.summarize(data_source="train").frame()
But I can't get both the train and test sets in the same dataframe.
Describe the solution you'd like
Have something like:
ridge_report.metrics.summarize(data_source="all").frame()
for both the train and test sets, which would return this kind of dataframe:
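In the meantime, a minimal workaround sketch (assuming summarize().frame() returns pandas DataFrames, as implied above) is to concatenate the per-source frames manually; the resulting train/test column MultiIndex is only one possible layout for what data_source="all" could return:

import pandas as pd

# Hypothetical workaround: compute the summary once per data source and
# concatenate the two frames side by side, labelling each block with its source.
train_frame = ridge_report.metrics.summarize(data_source="train").frame()
test_frame = ridge_report.metrics.summarize(data_source="test").frame()
combined = pd.concat([train_frame, test_frame], axis=1, keys=["train", "test"])
combined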
Describe alternatives you've considered, if relevant
No response
Additional context
No response