Inverted validation curve #51

After fitting our model, it appears that our validation curve is inverted:

![image](https://user-images.githubusercontent.com/67871962/91153988-33c75700-e6c1-11ea-83c7-c6c844fb879d.png)

The validation RMSE is systematically lower than the training RMSE, which does not make intuitive sense to us.

The curve was produced with the following code:

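```python
import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error as mse  # assumed source of `mse`
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# X (DataFrame) and y (Series) are loaded earlier and are not shown here.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=1)

lambdas = np.logspace(0, 8, 12)

folds = KFold(n_splits=5)
MSE_list = []

for _lambda in tqdm(lambdas):
    pipe_preproc = make_pipeline(PolynomialFeatures(2), StandardScaler(),
                                 Lasso(alpha=_lambda, max_iter=1000))
    MSE_train = []
    MSE_list_intermediate = []

    for train_index, val_index in tqdm(folds.split(X_train, y_train)):
        X_tr, y_tr = X_train.iloc[train_index], y_train.iloc[train_index]
        X_val, y_val = X_train.iloc[val_index], y_train.iloc[val_index]

        # RMSE on the held-out fold
        MSE_list_intermediate.append(mse(y_val, pipe_preproc.fit(X_tr, y_tr).predict(X_val))**(1/2))

        # RMSE on all of X_train (every fold), after refitting on X_tr
        MSE_train.append(mse(y_train, pipe_preproc.fit(X_tr, y_tr).predict(X_train))**(1/2))

    MSE_list.append([_lambda] + MSE_list_intermediate + [np.mean(MSE_list_intermediate)] + [np.mean(MSE_train)])

MSE = pd.DataFrame(MSE_list)
MSE.columns = ["Lambda", "Fold 1", "Fold 2", "Fold 3", "Fold 4", "Fold 5", "Mean_RMSE", "Mean_RMSE_Evaluation"]

MSE.to_excel("LASSO_output.xlsx")
```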

Can anybody help us?

Kind regards, Anton and Søren
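
Two things in the posted loop may be worth checking, although the plotting code is not shown, so this is only a guess. First, the training error is scored on all of `X_train`, which includes the validation fold, rather than on `X_tr` only. Second, the mean over the validation folds is stored as `Mean_RMSE`, while the mean training error is stored as `Mean_RMSE_Evaluation`; if the plot reads the latter as the validation curve, the two curves would be swapped. Below is a minimal sketch of the same loop with one fit per fold, the training error scored on the fold's own training rows, and unambiguous column names. The helper names `rmse` and `lasso_cv_curve`, the synthetic data, and the lambda grid in the demo are all assumptions made for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def rmse(y_true, y_pred):
    """Root mean squared error (hypothetical helper)."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def lasso_cv_curve(X, y, lambdas, n_splits=5):
    """Mean training and validation RMSE per lambda (hypothetical helper)."""
    folds = KFold(n_splits=n_splits)
    rows = []
    for lam in lambdas:
        pipe = make_pipeline(PolynomialFeatures(2), StandardScaler(),
                             Lasso(alpha=lam, max_iter=1000))
        train_scores, val_scores = [], []
        for train_idx, val_idx in folds.split(X):
            X_tr, y_tr = X.iloc[train_idx], y.iloc[train_idx]
            X_val, y_val = X.iloc[val_idx], y.iloc[val_idx]

            pipe.fit(X_tr, y_tr)  # fit once per fold
            # Training error on the rows the model was fitted on
            train_scores.append(rmse(y_tr, pipe.predict(X_tr)))
            # Validation error on the held-out rows
            val_scores.append(rmse(y_val, pipe.predict(X_val)))

        rows.append({"lambda": lam,
                     "train_rmse": np.mean(train_scores),
                     "val_rmse": np.mean(val_scores)})
    return pd.DataFrame(rows)


# Synthetic stand-in for the real data, which is not shown in the post
rng = np.random.default_rng(1)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y_demo = pd.Series(2.0 * X_demo["a"] - X_demo["b"] ** 2
                   + rng.normal(scale=0.5, size=200))

curve = lasso_cv_curve(X_demo, y_demo, np.logspace(-3, 1, 5))
print(curve)
```

Note also that with a grid from 1 to 1e8 the Lasso may shrink every coefficient to zero for most values of lambda, depending on the scale of `y`. In that regime the model predicts the training mean, and the training and validation RMSE can be nearly equal, so small differences in either direction are not unusual.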

