This project implements Ridge Regression with cross-validation and learning curve analysis for predictive modeling. It provides tools for hyperparameter tuning, model evaluation, and visualization of regression performance metrics.
- Ridge Regression: Implements L2-regularized linear regression for handling multicollinearity and overfitting
- 10-Fold Cross-Validation: Evaluates model performance across different regularization strengths (alpha values from 1e-4 to 1e4)
- Feature Standardization: Compares model performance with and without standardized covariates
- Learning Curves: Visualizes training and validation MSE across different training set sizes
- Optimal Alpha Selection: Automatically identifies the best regularization parameter for model optimization
- Test Set Evaluation: Assesses model generalization on held-out test data
# Run cross-validation for Ridge regression (basic)
python3 Ridge_CV.py
# Run cross-validation with standardization comparison
python3 Ridge_CV_Standardized.py
# Generate learning curves with optimal alpha
python3 Learning_Curves.py
# Evaluate model on test dataset
python3 Test_Evaluation.pySystematically identify optimal regularization parameters through cross-validation, comparing unscaled and standardized feature performance.
Visualize learning curves to assess model bias-variance tradeoff, determine if more data is needed, and evaluate convergence behavior.
Train Ridge regression models on datasets with multiple features (29 covariates) to predict continuous target variables.
Compare MSE metrics across training, validation, and test sets to understand model generalization.
Designed for research purposes in regression analysis and regularization techniques. The repository includes dummy datasets for experimentation and model evaluation. Penn State University (PSU), IST 557 Data Mining. Fall 2025.