Deployed on Streamlit: View Live App
- Project Overview
- Dataset
- Data Exploration & Cleaning
- Data Preprocessing
- Model Selection
- Hyperparameter Tuning
- Final Model Performance
- Predictions
- Key Takeaways
- Technologies & Libraries Used
Predict the insurance premium cost for individuals based on their demographic details and medical history.
The goal is to estimate accurate premium prices, helping both insurance providers and customers make informed decisions.
- File: `Medicalpremium.csv`
- Rows: ~1,000
- Columns: 11 (10 features + target `Insurance Premium`)
- Age
- Diabetes (0/1)
- Blood Pressure Problems (0/1)
- Transplants (0/1)
- Chronic Diseases (0/1)
- Height (cm)
- Weight (kg)
- Known Allergies (0/1)
- Cancer History in Family (0/1)
- Number of Major Surgeries
- Insurance Premium (Target)
- Checked for null values → none found
- Verified data ranges:
- Age (18–100)
- Height (100–220 cm)
- Weight (30–200 kg)
- Converted categorical features (`Diabetes`, `Blood Pressure`, etc.) to numeric 0/1
- No duplicate entries found
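
The checks listed above could be sketched with pandas roughly as follows. This is only an illustration: the small stand-in frame, the column names, and the `run_checks` helper are assumptions, not the project's actual code; the ranges are the ones listed above.

```python
import pandas as pd

# Tiny stand-in frame with assumed column names; the real data is Medicalpremium.csv.
df = pd.DataFrame({
    "Age": [25, 47, 63],
    "Height": [160, 175, 182],
    "Weight": [55, 82, 97],
    "Diabetes": [0, 1, 0],
})

# Expected ranges, taken from the list above.
RANGES = {"Age": (18, 100), "Height": (100, 220), "Weight": (30, 200)}


def run_checks(frame: pd.DataFrame) -> dict:
    """Count nulls, duplicate rows, and out-of-range values per checked column."""
    report = {
        "nulls": int(frame.isnull().sum().sum()),
        "duplicates": int(frame.duplicated().sum()),
    }
    for col, (low, high) in RANGES.items():
        # between() is inclusive on both ends by default.
        report[f"{col}_out_of_range"] = int((~frame[col].between(low, high)).sum())
    return report


report = run_checks(df)
print(report)
```

A report with all counts at zero would correspond to the "none found" results stated above.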
- Feature Scaling: Applied `StandardScaler` on numerical inputs (`Age`, `Height`, `Weight`, etc.)
- Saved scaler as `scaler.pkl` for deployment consistency
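
A minimal sketch of the scaling step, assuming the scaler is serialized with the standard `pickle` module (the README only names the file `scaler.pkl`, not the serializer). The sample values and the temporary output path are placeholders.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in numeric inputs (Age, Height, Weight); real values come from the dataset.
X_train = np.array([[25, 160, 55], [47, 175, 82], [63, 182, 97]], dtype=float)

# Fit on training data only, then transform it.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)

# Persist the fitted scaler so the app can apply the identical transform.
# A temp directory is used here only to keep the sketch side-effect free.
scaler_path = Path(tempfile.gettempdir()) / "scaler.pkl"
with open(scaler_path, "wb") as f:
    pickle.dump(scaler, f)

# Reload and confirm the transform is reproduced.
with open(scaler_path, "rb") as f:
    loaded_scaler = pickle.load(f)
```

Reloading the same fitted object at inference time is what keeps app inputs on the scale the model was trained on.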
Models evaluated:
- Linear Regression
- Decision Tree Regressor
- Random Forest Regressor
Metrics: MAE, MSE, RMSE, R² Score
Random Forest performed best among all models
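
The comparison above might look roughly like the following sketch. It runs on synthetic data so that it is self-contained; the data, the split ratio, and the `random_state` values are assumptions, and the numbers it produces say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the premium data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model and collect the four metrics on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "R2": r2_score(y_test, pred),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```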
Used GridSearchCV with 5-fold cross-validation
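
A hedged sketch of such a search is shown below. The parameter grid, the scoring metric, and the synthetic data are assumptions chosen to keep the example small; the grid actually searched in this project is not documented here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=120)

# Hypothetical grid; deliberately small so the search finishes quickly.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, 8],
    "min_samples_split": [2, 5],
    "max_features": ["sqrt"],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,  # 5-fold cross-validation
    scoring="neg_mean_squared_error",  # assumed metric
)
search.fit(X, y)

# With the default refit=True, best_estimator_ is refit on all of X, y.
best_model = search.best_estimator_
print(search.best_params_)
```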
Best Parameters:
{
'criterion': 'squared_error',
'max_depth': 8,
'max_features': 'sqrt',
'min_samples_leaf': 1,
'min_samples_split': 5,
'n_estimators': 100
}

| Metric | Random Forest |
|---|---|
| MAE | ~665 |
| MSE | ~740,324 |
| RMSE | ~860 |
| R² | ~0.0035 |
- User enters health details (age, conditions, surgeries, etc.) in the Streamlit app
- Input is scaled using saved scaler.pkl
- Model (best_model.pkl) predicts the insurance premium amount
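
The prediction flow above can be sketched as a small helper. The function name `predict_premium`, the three-feature input, and the stand-in scaler and model are all hypothetical; in the app these objects would be loaded from `scaler.pkl` and `best_model.pkl`, and this sketch assumes every input column is scaled.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler


def predict_premium(raw_features, scaler, model):
    """Scale one row of raw inputs and return the predicted premium as a float."""
    row = np.asarray(raw_features, dtype=float).reshape(1, -1)
    return float(model.predict(scaler.transform(row))[0])


# Stand-ins for the objects that would be loaded from scaler.pkl and best_model.pkl.
rng = np.random.default_rng(0)
X = rng.uniform([18, 140, 40], [70, 200, 120], size=(100, 3))  # Age, Height, Weight
y = 15000 + 300 * X[:, 0] + rng.normal(scale=500, size=100)

scaler = StandardScaler().fit(X)
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(scaler.transform(X), y)

# One hypothetical user: age 45, height 170 cm, weight 75 kg.
premium = predict_premium([45, 170, 75], scaler, model)
print(round(premium, 2))
```

In the Streamlit app, the raw values would presumably come from input widgets such as `st.number_input` rather than a hard-coded list.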
- Random Forest outperformed Linear Regression and Decision Tree Regressor
- Reusing the saved scaler at inference keeps app inputs on the same scale as the training data
- Hyperparameter tuning improves model stability
- Deployment on Streamlit allows interactive real-time estimation
- Programming Language: Python 3.10
Libraries:
- pandas – data manipulation
- numpy – numerical operations
- scikit-learn – preprocessing, model selection, evaluation
- matplotlib, seaborn – visualization
- streamlit – web app deployment