Description
After fitting our model, the validation curve appears to be inverted: the validation RMSE is systematically lower than the training RMSE, which does not make intuitive sense to us.
The model was fitted with the following code:
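```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from tqdm import tqdm

# X (DataFrame) and y (Series) are assumed to be loaded already
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=1)
lambdas = np.logspace(0, 8, 12)  # 12 penalty values from 1 to 1e8
folds = KFold(n_splits=5)
MSE_list = []
for _lambda in tqdm(lambdas):
    pipe_preproc = make_pipeline(PolynomialFeatures(2), StandardScaler(),
                                 Lasso(alpha=_lambda, max_iter=1000))
    MSE_train = []              # training RMSE, one value per fold
    MSE_list_intermediate = []  # validation RMSE, one value per fold
    for train_index, val_index in tqdm(folds.split(X_train, y_train)):
        X_tr, y_tr = X_train.iloc[train_index], y_train.iloc[train_index]
        X_val, y_val = X_train.iloc[val_index], y_train.iloc[val_index]
        model = pipe_preproc.fit(X_tr, y_tr)  # fit once, reuse for both scores
        MSE_list_intermediate.append(np.sqrt(mse(y_val, model.predict(X_val))))
        # Note: evaluated on all of X_train, i.e. including the validation fold
        MSE_train.append(np.sqrt(mse(y_train, model.predict(X_train))))
    MSE_list.append([_lambda] + MSE_list_intermediate
                    + [np.mean(MSE_list_intermediate)] + [np.mean(MSE_train)])
MSE = pd.DataFrame(MSE_list)
# Mean_RMSE = mean validation RMSE; Mean_RMSE_Evaluation = mean training RMSE
MSE.columns = ["Lambda", "Fold 1", "Fold 2", "Fold 3", "Fold 4", "Fold 5",
               "Mean_RMSE", "Mean_RMSE_Evaluation"]
MSE.to_excel("LASSO_output.xlsx")
```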
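As a cross-check, the same training-vs-validation comparison can be sketched with scikit-learn's built-in `validation_curve`. This is a minimal sketch, not our original setup: it assumes scikit-learn 0.22 or later (for the `neg_root_mean_squared_error` scorer) and uses hypothetical synthetic data from `make_regression` in place of our `X_train`/`y_train`. Here each fold's training RMSE is computed only on that fold's training rows.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical synthetic data standing in for X_train / y_train
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=1)

lambdas = np.logspace(0, 8, 12)
pipe = make_pipeline(PolynomialFeatures(2), StandardScaler(), Lasso(max_iter=1000))

# make_pipeline names the Lasso step "lasso", so its penalty is "lasso__alpha"
train_scores, val_scores = validation_curve(
    pipe, X_demo, y_demo,
    param_name="lasso__alpha",
    param_range=lambdas,
    cv=KFold(n_splits=5),
    scoring="neg_root_mean_squared_error",
)

# The scorer returns negated RMSE, so flip the sign; rows = lambdas, columns = folds
train_rmse = -train_scores
val_rmse = -val_scores
for lam, tr, va in zip(lambdas, train_rmse.mean(axis=1), val_rmse.mean(axis=1)):
    print(f"lambda={lam:10.3g}  train RMSE={tr:8.3f}  val RMSE={va:8.3f}")
```

In this sketch the training score for each fold never includes that fold's validation rows, whereas the loop in our code computes the training RMSE on the whole of `X_train`.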
Can anybody help us?

Kind regards,
Anton and Søren
