Inverted validation curve #51

@aabk-bkaa

After fitting our model, it appears that our validation curve is inverted:

[image: plot of the validation curve]

The validation RMSE is systematically lower than the training RMSE, which does not make intuitive sense to us.

The results were produced with the following code:

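```python
# Imports assumed from the names used below; `mse` is taken to be an alias
# for sklearn's mean_squared_error. X and y are pandas objects defined earlier.
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from tqdm import tqdm

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=1)

lambdas = np.logspace(0, 8, 12)

folds = KFold(n_splits=5)
MSE_list = []

for _lambda in tqdm(lambdas):
    pipe_preproc = make_pipeline(PolynomialFeatures(2), StandardScaler(),
                                 Lasso(alpha=_lambda, max_iter=1000))
    MSE_train = []
    MSE_list_intermediate = []

    for train_index, val_index in tqdm(folds.split(X_train, y_train)):
        X_tr, y_tr = X_train.iloc[train_index], y_train.iloc[train_index]
        X_val, y_val = X_train.iloc[val_index], y_train.iloc[val_index]

        # RMSE on the held-out fold
        MSE_list_intermediate.append(mse(y_val, pipe_preproc.fit(X_tr, y_tr).predict(X_val))**(1/2))

        # RMSE on all of X_train (the pipeline is refitted on X_tr first)
        MSE_train.append(mse(y_train, pipe_preproc.fit(X_tr, y_tr).predict(X_train))**(1/2))

    # Row layout: lambda, per-fold validation RMSE, mean validation RMSE, mean training RMSE
    MSE_list.append([_lambda] + MSE_list_intermediate + [np.mean(MSE_list_intermediate)] + [np.mean(MSE_train)])

MSE = pd.DataFrame(MSE_list)
MSE.columns = ["Lambda", "Fold 1", "Fold 2", "Fold 3", "Fold 4", "Fold 5", "Mean_RMSE", "Mean_RMSE_Evaluation"]

MSE.to_excel("LASSO_output.xlsx")
```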
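As a cross-check on the hand-written loop, here is a minimal sketch that lets scikit-learn's `validation_curve` compute both curves. It is not a diagnosis of the plot above. The data (`X_demo`, `y_demo`), the shorter `lambdas` grid and the larger `max_iter` are assumptions made so the snippet runs on its own; the real `X_train` and `y_train` would go in their place.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption); replace with the real X_train, y_train.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

# Shorter grid than in the issue, chosen only to keep the demo fast (assumption).
lambdas = np.logspace(-2, 2, 5)

pipe = make_pipeline(PolynomialFeatures(2), StandardScaler(), Lasso(max_iter=10000))

# validation_curve fits once per (alpha, fold) and scores the training rows
# and the held-out rows of that fold separately.
train_scores, val_scores = validation_curve(
    pipe, X_demo, y_demo,
    param_name="lasso__alpha",            # make_pipeline names the step "lasso"
    param_range=lambdas,
    cv=KFold(n_splits=5),
    scoring="neg_root_mean_squared_error",
)

# Scores are negated RMSE, so flip the sign and average over the folds.
rmse_train = -train_scores.mean(axis=1)
rmse_val = -val_scores.mean(axis=1)

for lam, tr, va in zip(lambdas, rmse_train, rmse_val):
    print(f"lambda={lam:g}  train RMSE={tr:.2f}  validation RMSE={va:.2f}")
```

Here the training RMSE is measured only on the rows used for fitting in each fold, so the two arrays can be compared directly with the last two columns of the spreadsheet.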

Can anybody help us?

Kind regards, Anton and Søren
