A machine learning pipeline for predicting forex price movements using technical indicators and market data. Built to explore quantitative finance and learn ML techniques.
This project trains multiple ML models to predict next-day returns for major currency pairs using 90+ engineered features from technical analysis, market indices, commodities, and bond yields.
Models: Random Forest, Gradient Boosting, Ridge Regression
Data: 5 years of daily forex, equity indices, commodities, bonds, VIX
Features: Technical indicators (MACD, RSI, Bollinger Bands), time features, cross-market correlations
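
The next-day return target mentioned above can be sketched as follows. This is a minimal illustration, not the code in `model.py`: the function name `make_next_day_target` and the column names are assumptions, and the prices are toy values.

```python
import pandas as pd


def make_next_day_target(close: pd.Series) -> pd.DataFrame:
    """Pair each day's return with the following day's return as the target."""
    out = pd.DataFrame({"close": close})
    out["return"] = out["close"].pct_change()
    # shift(-1) moves tomorrow's return onto today's row, so features
    # computed from today's data are matched with the value to predict.
    out["target_next_return"] = out["return"].shift(-1)
    # The first row has no return and the last row has no target.
    return out.dropna()


# Toy closing prices (illustrative only, not real market data).
prices = pd.Series(
    [1.10, 1.11, 1.09, 1.12],
    index=pd.date_range("2024-01-01", periods=4, freq="B"),
    name="EURUSD",
)
frame = make_next_day_target(prices)
print(frame)
```

Building the target with a negative shift keeps every feature row strictly in the past relative to what it predicts, which is the main guard against look-ahead bias in this kind of setup.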
# Install dependencies
pip install pandas numpy scikit-learn yfinance matplotlib seaborn scipy joblib
# Run pipeline
python fetch_data.py # Collect data
python model.py # Train models
python visualize_results.py # Generate plots

├── fetch_data.py # Data collection from Yahoo Finance
├── features.py # Feature engineering (90+ indicators)
├── model.py # Model training & evaluation
├── visualize_results.py # Performance visualization
├── models/ # Saved models & results
└── visualizations/ # Generated plots

Data sources:
- 7 forex pairs (EUR/USD, GBP/USD, USD/JPY, USD/CHF, AUD/USD, USD/CAD, NZD/USD)
- 5 market indices (S&P 500, FTSE, DAX, Nikkei, ASX)
- 4 commodities (Gold, Oil, Silver, Copper)
- Bond yields (US 10Y, 2Y) and VIX
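
Forex, equity indices, commodities, and bonds do not share a trading calendar, so the series above need to be aligned before they can be combined into one feature table. The sketch below shows one possible approach, assuming each series is a date-indexed `pandas.Series`. The helper name `align_to_calendar` and the toy values are hypothetical, and the project's own `fetch_data.py` may handle this differently.

```python
import pandas as pd


def align_to_calendar(series_by_name: dict[str, pd.Series], base: str) -> pd.DataFrame:
    """Reindex every series onto the base series' dates, forward-filling gaps."""
    calendar = series_by_name[base].index
    aligned = {
        # method="ffill" takes the last observation at or before each date,
        # so a market holiday reuses the previous close and never a future one.
        name: s.sort_index().reindex(calendar, method="ffill")
        for name, s in series_by_name.items()
    }
    return pd.DataFrame(aligned)


# Toy data (illustrative only): the index series has no value on 2024-01-03.
eurusd = pd.Series(
    [1.10, 1.11, 1.09, 1.12],
    index=pd.date_range("2024-01-01", periods=4, freq="B"),
)
sp500 = pd.Series(
    [100.0, 101.0, 103.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
)
aligned = align_to_calendar({"EURUSD": eurusd, "SP500": sp500}, base="EURUSD")
print(aligned)
```

Forward filling is a deliberate choice here: back filling or interpolation would pull information from later dates into earlier rows.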

Features:
- Technical: MA, EMA, MACD, RSI, Bollinger Bands, Stochastic, ATR
- Price: Returns, log returns, momentum, ROC
- Volatility: Multi-timeframe standard deviations
- Time: Cyclical encoding (day/month)
- Cross-Market: Currency pair correlations, yield spreads
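
A few of the features listed above can be sketched in plain pandas. This is an illustration rather than the contents of `features.py`: the window lengths (14, 12/26/9, 20) are the conventional defaults and are assumed here, the RSI uses a simple moving average instead of Wilder's smoothing, and the function and column names are hypothetical.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI from simple rolling means of gains and losses (not Wilder's smoothing)."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    return 100 - 100 / (1 + gain / loss)


def add_features(close: pd.Series) -> pd.DataFrame:
    f = pd.DataFrame(index=close.index)
    f["rsi_14"] = rsi(close)

    # MACD: fast EMA minus slow EMA, plus an EMA of that difference as the signal.
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    f["macd"] = ema_fast - ema_slow
    f["macd_signal"] = f["macd"].ewm(span=9, adjust=False).mean()

    # Bollinger Bands: rolling mean plus/minus two rolling standard deviations.
    mid = close.rolling(20).mean()
    band = 2 * close.rolling(20).std()
    f["bb_upper"] = mid + band
    f["bb_lower"] = mid - band

    # Cyclical encoding: map day of week and month onto a circle so that
    # the last and first values of each cycle end up close together.
    dow = close.index.dayofweek.to_numpy()
    month = close.index.month.to_numpy()
    f["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    f["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    f["month_sin"] = np.sin(2 * np.pi * (month - 1) / 12)
    f["month_cos"] = np.cos(2 * np.pi * (month - 1) / 12)
    return f


# Synthetic random-walk prices (illustrative only, not real market data).
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-02", periods=120, freq="B")
close = pd.Series(1.10 + rng.normal(0, 0.004, len(dates)).cumsum(), index=dates)
features = add_features(close).dropna()
print(features.tail())
```

Every rolling or exponentially weighted value at a given date depends only on prices up to that date, so these columns can be used as inputs for a next-day target without leaking future information.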

Visualizations:
- Model comparison (RMSE, MAE, R²)
- Prediction vs actual plots
- Residual analysis with Q-Q plots
- Feature importance rankings
- Cumulative returns (ML strategy vs buy-and-hold)
- Price forecast charts
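
The cumulative-returns comparison can be illustrated with a simple sign-based rule. The trading rule itself is an assumption, since the source does not state how predictions are turned into positions: go long when the predicted return is positive, short when it is negative, and ignore transaction costs. The function name `cumulative_returns` and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def cumulative_returns(predicted: pd.Series, actual: pd.Series) -> pd.DataFrame:
    """Compound a sign-following strategy and a buy-and-hold baseline."""
    position = np.sign(predicted)  # +1 long, -1 short, 0 flat
    strategy = position * actual   # assumed rule; no costs or slippage
    return pd.DataFrame(
        {
            "ml_strategy": (1 + strategy).cumprod() - 1,
            "buy_and_hold": (1 + actual).cumprod() - 1,
        }
    )


# Toy daily returns (illustrative only, not model output).
idx = pd.date_range("2024-01-01", periods=3, freq="B")
actual = pd.Series([0.01, -0.02, 0.005], index=idx)
predicted = pd.Series([0.002, -0.001, -0.003], index=idx)
curve = cumulative_returns(predicted, actual)
print(curve)
```

Because spreads and slippage are left out, a curve produced this way is an upper bound on what the same signals would earn in practice.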
EUR/USD next-day return prediction on test set:
| Model | RMSE | MAE | R² |
|---|---|---|---|
| Random Forest | 0.0035 | 0.0025 | 0.15 |
| Gradient Boosting | 0.0036 | 0.0026 | 0.14 |
| Ridge Regression | 0.0037 | 0.0027 | 0.12 |
Top Features: Recent returns, MA differences, RSI, cross-pair correlations
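
For reference, the three metrics in the table above have the standard definitions sketched below in NumPy. The helper name `regression_metrics` and the toy arrays are illustrative. scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute the same quantities.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict[str, float]:
    """RMSE, MAE, and R² for a set of return predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # variance around the mean
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "r2": float(1 - ss_res / ss_tot),
    }


# Toy values (illustrative only, unrelated to the table above).
metrics = regression_metrics(
    y_true=[0.01, -0.01, 0.02, 0.0],
    y_pred=[0.005, -0.005, 0.01, 0.005],
)
print(metrics)
```

RMSE and MAE are in the same units as the daily returns, while R² compares the model against always predicting the mean return of the evaluation set.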
Economics & Finance: Technical analysis, macro indicators, market relationships, efficient market hypothesis
Machine Learning: Time series features, train/test splitting for sequential data, ensemble methods, cross-validation, model evaluation
Engineering: Modular Python design, data pipelines, model persistence, reproducible experiments
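
Train/test splitting for sequential data means the test period must come strictly after the training period, and the same holds for each cross-validation fold. The sketch below shows one way to do this with scikit-learn's `TimeSeriesSplit`. The 80/20 holdout, the five folds, the Ridge `alpha`, and the synthetic data are all assumptions made for illustration and are not taken from `model.py`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered data (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = X @ np.array([0.5, -0.2, 0.0, 0.1, 0.3]) + rng.normal(scale=0.1, size=500)

# Chronological holdout: the last 20% of rows is the test set, with no shuffling.
holdout_start = int(len(X) * 0.8)
X_train, X_test = X[:holdout_start], X[holdout_start:]
y_train, y_test = y[:holdout_start], y[holdout_start:]

# Expanding-window cross-validation inside the training period.
cv = TimeSeriesSplit(n_splits=5)
folds = list(cv.split(X_train))
fold_rmse = []
for train_idx, val_idx in folds:
    model = Ridge(alpha=1.0).fit(X_train[train_idx], y_train[train_idx])
    pred = model.predict(X_train[val_idx])
    fold_rmse.append(float(np.sqrt(np.mean((y_train[val_idx] - pred) ** 2))))

print("per-fold validation RMSE:", fold_rmse)
```

A shuffled `KFold` would let the model train on rows dated after the ones it is validated on, which tends to inflate scores on autocorrelated market data.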







