This project builds advanced regression models to predict house prices using machine learning techniques. The dataset is based on the popular Kaggle competition House Prices - Advanced Regression Techniques, with the data sourced from Hands-On ML by Aurélien Géron.
The goal is to predict the final sale price of each house based on a rich set of features. We explore various regression models and preprocessing steps to achieve high accuracy.
- Data cleaning and preprocessing (handling missing values, encoding categorical features, feature engineering)
- Feature scaling using `StandardScaler`
- Model training and evaluation using `LinearRegression`
- Batch prediction from CSV files and export of predicted results
- Interactive UI with Streamlit for real-time predictions
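The scaling and training steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `train_model.py`: the synthetic feature matrix, the 80/20 split, and the choice of RMSE and R² as metrics are all assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared housing features (assumption: the real
# columns come from housing.csv after cleaning and encoding).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3)) * [1000.0, 2.0, 15.0] + [2000.0, 3.0, 30.0]
y = 50_000 + 120.0 * X[:, 0] + 8_000.0 * X[:, 1] - 500.0 * X[:, 2]
y = y + rng.normal(scale=5_000.0, size=500)

# Hold out 20% of the rows for evaluation (split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Evaluate on the held-out rows
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test_scaled))))
r2 = model.score(X_test_scaled, y_test)
print(f"RMSE: {rmse:,.0f}  R^2: {r2:.3f}")
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model. In a real run, the fitted `scaler` and `model` would then be persisted (for example with `joblib.dump`) so the prediction scripts can reload them.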
advanced-house-regression/
├── housing.csv                   # Dataset file
├── data_preprocessing.py         # Preprocesses and saves the training data
├── train_model.py                # Trains the model and saves the scaler/model
├── predict.py                    # Script to make a single prediction
├── batch_predict.py              # Predicts house prices from new_data.csv and exports predictions.csv
├── app.py                        # Streamlit app for web-based prediction
├── new_data.csv                  # Sample new data for batch predictions
├── predictions.csv               # Output CSV with predictions
├── scaler.pkl                    # Saved StandardScaler object
├── linear_regression_model.pkl   # Trained model file
├── prepared_data.pkl             # Scaled and split dataset
├── requirements.txt              # Python dependencies
└── README.md                     # Project documentation
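As a rough idea of what the batch step could look like, the sketch below reloads the saved scaler and model, scores every row of an input CSV, and writes the results back out. The function name `batch_predict`, the `predicted_price` column name, and the assumption that the input CSV already contains the training feature columns in the same order are all hypothetical; the actual `batch_predict.py` may differ.

```python
import joblib
import pandas as pd


def batch_predict(input_csv, output_csv,
                  scaler_path="scaler.pkl",
                  model_path="linear_regression_model.pkl"):
    """Score every row of input_csv and write the result to output_csv."""
    # Reload the artifacts saved during training
    scaler = joblib.load(scaler_path)
    model = joblib.load(model_path)

    new_data = pd.read_csv(input_csv)
    # Assumption: the CSV holds exactly the feature columns used in training,
    # in the same order, already cleaned and encoded.
    scaled = scaler.transform(new_data.to_numpy())

    result = new_data.copy()
    result["predicted_price"] = model.predict(scaled)  # hypothetical column name
    result.to_csv(output_csv, index=False)
    return result
```

With the files from the tree above, a call such as `batch_predict("new_data.csv", "predictions.csv")` would read the sample data and produce the output CSV. The key design point is that new rows go through the same fitted scaler that was used at training time, rather than a freshly fitted one.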
pip install -r requirements.txt
python3 data_preprocessing.py
python3 train_model.py
python3 predict.py
python3 batch_predict.py
streamlit run app.py
Dataset used: housing.csv
pandas
scikit-learn
joblib
streamlit
Predicted House Price: $415,721
Predictions saved to predictions.csv
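The single-prediction message shown above can be produced with a plain format string. The helper name `format_price` is hypothetical and only illustrates the whole-dollar, comma-separated formatting; the input value is made up for the example.

```python
def format_price(price: float) -> str:
    """Render a predicted price as whole dollars with thousands separators."""
    return f"Predicted House Price: ${price:,.0f}"


# Example value chosen for illustration only
print(format_price(415721.37))  # prints "Predicted House Price: $415,721"
```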