Weighted Least Squares (WLS) versus Ordinary Least Squares (OLS): covers heteroscedasticity, Feasible WLS, and Huber robust regression, and validates the methods with a Monte Carlo simulation.
This project analyzes Weighted Least Squares (WLS) regression and compares it with Ordinary Least Squares (OLS) and alternative estimators when the errors are heteroscedastic. It combines systematic implementation, diagnostic evaluation, and empirical validation to show how each regression technique behaves on heteroscedastic data.
- Investigate the impact of heteroscedasticity on regression estimation
- Compare WLS and OLS performance across multiple metrics
- Implement practical alternatives including Feasible WLS and robust methods
- Validate findings through Monte Carlo simulation
- Provide reproducible Python implementations with thorough diagnostics
- Implementation with known variance weights
- Comparison of efficiency gains over OLS
- Prediction interval analysis
- Two-stage estimation when variance structure is unknown
- Residual-based weight estimation
- Performance comparison with true WLS
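
The two-stage idea can be sketched in plain NumPy as follows. This is one possible variant, assuming the log of the squared OLS residuals is modeled with a group indicator; the helper `weighted_lstsq`, the variance model, and all numeric values are hypothetical choices for illustration and may differ from the notebook.

```python
import numpy as np

def weighted_lstsq(X, y, w):
    """Minimize sum_i w_i * (y_i - X_i b)^2 by rescaling rows with sqrt(w_i)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Illustrative data: two variance groups with SDs 1 and 3 (assumed values).
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0, 10, n)
group = (np.arange(n) % 2).astype(float)  # 0 = low variance, 1 = high variance
sigma = np.where(group == 1, 3.0, 1.0)
y = 1.0 + 2.0 * x + rng.normal(0, sigma)
X = np.column_stack([np.ones(n), x])

# Stage 1: OLS fit, then regress log squared residuals on the variance model.
beta_ols = weighted_lstsq(X, y, np.ones(n))
resid = y - X @ beta_ols
Z = np.column_stack([np.ones(n), group])
gamma, *_ = np.linalg.lstsq(Z, np.log(resid**2), rcond=None)

# Stage 2: weights are inverse fitted variances (correct up to a constant factor).
w_hat = 1.0 / np.exp(Z @ gamma)
beta_fwls = weighted_lstsq(X, y, w_hat)

print("OLS: ", beta_ols)
print("FWLS:", beta_fwls)
```

Only the relative size of the weights matters for the coefficient estimates, so the constant bias introduced by averaging on the log scale does not affect `beta_fwls`.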
- Iterative reweighting approaches
- Huber's M-estimator for outlier resistance
- Convergence behavior analysis
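
Iterative reweighting with Huber weights can be sketched as below. This is a hand-rolled illustration in NumPy, not the notebook's implementation: the function name `huber_irls`, the tuning constant 1.345, the MAD scale estimate, and the contaminated data are all assumptions made for the example.

```python
import numpy as np

def huber_irls(X, y, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate via iteratively reweighted least squares.

    Returns the coefficients and the number of iterations used.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from OLS
    for it in range(1, max_iter + 1):
        r = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, c / |u| for large ones.
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, it
        beta = beta_new
    return beta, max_iter

# Illustrative data: a clean linear trend with 10 shifted points at large x.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, x.size)
y[-10:] += 30.0  # outliers
X = np.column_stack([np.ones(x.size), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hub, n_iter = huber_irls(X, y)
print("OLS slope:", beta_ols[1], "Huber slope:", beta_hub[1], "iterations:", n_iter)
```

The returned iteration count gives a simple handle on convergence behavior; hitting `max_iter` signals that the reweighting did not settle within tolerance.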
- Residual analysis and heteroscedasticity detection
- Q-Q plots for normality assessment
- Comprehensive model comparison metrics
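
One common formal check for heteroscedasticity is the Breusch-Pagan test. The sketch below implements Koenker's studentized version (LM = n·R² from regressing squared residuals on the suspected variance drivers); whether the notebook uses this test is not stated, and the function name and data are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def breusch_pagan_koenker(resid, Z):
    """Koenker's studentized Breusch-Pagan test.

    resid: OLS residuals; Z: variance regressors including a constant column.
    Returns the LM statistic and its chi-square p-value.
    """
    e2 = resid**2
    coef, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    fitted = Z @ coef
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    lm = len(e2) * r2
    df = Z.shape[1] - 1  # regressors excluding the constant
    return lm, stats.chi2.sf(lm, df)

# Illustrative data: two variance groups with SDs 1 and 3 (assumed values).
rng = np.random.default_rng(4)
n = 400
x = rng.uniform(0, 10, n)
group = (np.arange(n) % 2).astype(float)
y = 1.0 + 2.0 * x + rng.normal(0, np.where(group == 1, 3.0, 1.0))

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

lm_stat, p_value = breusch_pagan_koenker(resid, np.column_stack([np.ones(n), group]))
print("LM statistic:", lm_stat, "p-value:", p_value)
```

A small p-value indicates that the squared residuals vary systematically with the chosen regressors, which complements the visual evidence from residual plots.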
- Monte Carlo simulation (1000 iterations)
- Bias, variance, and MSE comparison
- Small-sample performance evaluation
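
A Monte Carlo comparison of this kind might look like the following NumPy sketch. The 1000 iterations match the figure above; the sample size of 50, the true coefficients, the variance groups, and the seed are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_sims = 50, 1000
true_beta = np.array([1.0, 2.0])

# Fixed design across replications; only the noise is redrawn.
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
sigma = np.where(np.arange(n) % 2 == 0, 1.0, 3.0)
sw = 1.0 / sigma  # square root of the WLS weights 1 / sigma^2

ols_est = np.empty((n_sims, 2))
wls_est = np.empty((n_sims, 2))
for s in range(n_sims):
    y = X @ true_beta + rng.normal(0, sigma)
    ols_est[s] = np.linalg.lstsq(X, y, rcond=None)[0]
    wls_est[s] = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

def summarize(est):
    """Bias, sampling variance, and MSE of each coefficient."""
    bias = est.mean(axis=0) - true_beta
    var = est.var(axis=0, ddof=1)
    mse = ((est - true_beta) ** 2).mean(axis=0)
    return bias, var, mse

ols_bias, ols_var, ols_mse = summarize(ols_est)
wls_bias, wls_var, wls_mse = summarize(wls_est)
print("OLS  bias/var/MSE (slope):", ols_bias[1], ols_var[1], ols_mse[1])
print("WLS  bias/var/MSE (slope):", wls_bias[1], wls_var[1], wls_mse[1])
```

Both estimators should show bias near zero, with the difference appearing in the variance and MSE columns.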
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats
from statsmodels.iolib.table import SimpleTable

- Artificial dataset with controlled heteroscedasticity
- Two variance groups (low/high) with 3:1 standard deviation ratio
- Quadratic true relationship with linear estimation (intentional misspecification)
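
A dataset with these properties could be generated roughly as follows. The function name, the coefficients of the quadratic, the range of `x`, and the group assignment are hypothetical; only the two-group structure and the 3:1 standard deviation ratio come from the description above.

```python
import numpy as np

def make_heteroscedastic_data(n=500, sd_low=1.0, sd_ratio=3.0, seed=0):
    """Quadratic signal plus noise from two variance groups (low/high)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, n)
    high = np.arange(n) >= n // 2               # second half is the high-variance group
    sigma = np.where(high, sd_low * sd_ratio, sd_low)
    # Assumed true relationship; a linear fit to y is then misspecified.
    y = 1.0 + 2.0 * x + 0.1 * x**2 + rng.normal(0, sigma)
    return x, y, sigma

x, y, sigma = make_heteroscedastic_data()
print("group SDs used:", np.unique(sigma))
```

Returning `sigma` alongside the data makes it easy to build the known weights `1 / sigma**2` for the WLS fit.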
- Feasible WLS (FWLS) - Best overall when variance is unknown
- WLS with known weights - Optimal when variance structure is known
- Huber Robust Regression - Excellent outlier resistance
- OLS with HC corrections - Moderate improvement over standard OLS
- Standard OLS - Least efficient under heteroscedasticity
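
For reference, "HC corrections" leave the OLS coefficients unchanged and replace only the standard errors with heteroscedasticity-consistent (sandwich) estimates. The sketch below computes the HC0 and HC3 variants in NumPy; the function name and the data are assumptions, and which HC variant the notebook uses is not stated here.

```python
import numpy as np

def ols_robust_se(X, y):
    """OLS coefficients with HC0 and HC3 sandwich standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage of each observation: h_i = x_i' (X'X)^{-1} x_i
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

    def sandwich(omega):
        meat = X.T @ (X * omega[:, None])  # X' diag(omega) X
        return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

    se_hc0 = sandwich(resid**2)
    se_hc3 = sandwich(resid**2 / (1.0 - h) ** 2)
    return beta, se_hc0, se_hc3

# Illustrative data with noise SD growing in x (assumed values).
rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 + 0.5 * x)
X = np.column_stack([np.ones(n), x])

beta, se_hc0, se_hc3 = ols_robust_se(X, y)
print("beta:", beta)
print("HC0 SE:", se_hc0)
print("HC3 SE:", se_hc3)
```

HC3 inflates each squared residual by its leverage, so its standard errors are never smaller than HC0; this makes it the more conservative choice in smaller samples.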
- FWLS performs comparably to WLS with known weights
- Iterative reweighting requires careful implementation to avoid instability
- Model diagnostics are crucial for identifying remaining issues
- Monte Carlo validation confirms theoretical efficiency advantages
pip install numpy matplotlib statsmodels scipy

- Clone the repository
- Install required dependencies
- Open and run the Jupyter notebook
- Modify parameters to explore different scenarios
- Variance ratio between groups
- Sample size
- Degree of model misspecification
- Heteroscedasticity patterns
- Number of Monte Carlo iterations
- Heteroscedasticity detected in residual plots
- Prior knowledge of variance structure available
- Prediction precision is a primary concern
- Efficient parameter estimation required
- Outliers present → Huber robust regression
- Variance structure unknown → FWLS
- Limited sample size → OLS with HC corrections
- Computational simplicity needed → Standard OLS
- Residual plots - Check for heteroscedasticity patterns
- Q-Q plots - Assess normality assumption
- Standard error comparison - Evaluate efficiency gains
- Prediction intervals - Compare precision
- Model selection criteria - AIC/BIC comparison
- Monte Carlo results - Validate small-sample performance
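WLS minimizes the weighted sum of squared residuals:

$$\sum_{i=1}^{n} w_i \left(y_i - x_i^\top \beta\right)^2, \qquad w_i = \frac{1}{\sigma_i^2}$$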
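Under heteroscedasticity, WLS with weights proportional to the inverse error variances is the Best Linear Unbiased Estimator (the Gauss-Markov result applied to the reweighted model), while OLS remains unbiased but inefficient, and its conventional standard errors are no longer valid.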
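
The weighted least squares criterion has the closed-form minimizer `beta = (X' W X)^{-1} X' W y` with `W = diag(w_i)`, which is the same as running OLS after multiplying each row of `X` and `y` by `sqrt(w_i)`. The short NumPy check below illustrates that equivalence on made-up data; all numeric values are assumptions.

```python
import numpy as np

# Illustrative data with observation-specific error SDs (assumed values).
rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
sigma = rng.uniform(0.5, 3.0, n)
y = X @ np.array([1.0, 2.0]) + rng.normal(0, sigma)
w = 1.0 / sigma**2

# Closed form: beta = (X' W X)^{-1} X' W y
XtW = X.T * w  # scales column i of X' by w_i, i.e. X' W
beta_closed = np.linalg.solve(XtW @ X, XtW @ y)

# Equivalent: OLS on rows rescaled by sqrt(w_i)
sw = np.sqrt(w)
beta_rescaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(beta_closed, beta_rescaled)
```

The rescaled model has constant error variance, which is why the usual OLS optimality argument carries over to it.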
Through this analysis, users will understand:
- The impact of heteroscedasticity on regression estimation
- Practical implementation of WLS and alternatives
- Diagnostic techniques for model validation
- Empirical performance evaluation methods
- Trade-offs between different estimation approaches
Contributions are welcome! Please feel free to:
- Report issues or bugs
- Suggest enhancements or additional methods
- Improve documentation
- Share use cases or applications
This project is licensed under the MIT License - see the LICENSE file for details.
- Statsmodels development team for comprehensive statistical tools
- Academic references listed in the notebook
- Open-source community for invaluable resources and support
This notebook is designed for educational and research purposes. Real-world applications may require additional considerations and validation.