An end-to-end ML pipeline for predicting customer churn in the telecom industry, from raw data to a production-ready REST API.
Features • Architecture • Tech Stack • Setup • API
ChurnX is a production-grade machine learning system that predicts which telecom customers are likely to churn, enabling retention teams to act before it's too late.
It covers the full ML lifecycle:
- Exploratory Data Analysis (EDA)
- Feature engineering pipeline
- Model training with hyperparameter tuning (Optuna)
- Experiment tracking (MLflow)
- Data validation (Great Expectations)
- REST API serving (FastAPI)
- Interactive demo UI (Gradio)
- Docker deployment
| Feature | Description | Tech |
|---|---|---|
| EDA Notebook | Deep dive into churn patterns and feature distributions | Jupyter, Seaborn, Matplotlib |
| Feature Pipeline | Automated feature engineering and preprocessing | Scikit-learn Pipelines |
| Multi-Model Training | XGBoost, LightGBM, and Scikit-learn models compared | XGBoost, LightGBM |
| Hyperparameter Tuning | Automated tuning with Optuna | Optuna |
| Experiment Tracking | All runs logged with metrics and artifacts | MLflow |
| Data Validation | Schema and quality checks on input data | Great Expectations |
| REST API | Production-ready prediction endpoint | FastAPI + Uvicorn |
| Demo UI | Interactive prediction interface | Gradio |
| Containerized | One-command deployment | Docker |
| CI/CD | Automated testing and workflows | GitHub Actions |
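
The "Data Validation" row above can be pictured with a plain-Python check like the one below. This is only a sketch: it is not the Great Expectations API and not this project's actual validation suite. The `validate_record` helper, the allowed `Contract` values, and the non-negativity rule are all assumptions; the field names are borrowed from the example request in the API section of this README.

```python
from typing import Any

# Hypothetical rules for illustration; the project's real checks are
# described as a Great Expectations suite, which is not reproduced here.
NUMERIC_FIELDS = ("tenure", "MonthlyCharges", "TotalCharges")
ALLOWED_CONTRACTS = {"Month-to-month", "One year", "Two year"}  # assumed values


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for field in NUMERIC_FIELDS:
        value = record.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: expected a number, got {value!r}")
        elif value < 0:
            problems.append(f"{field}: must be non-negative, got {value}")
    if record.get("Contract") not in ALLOWED_CONTRACTS:
        problems.append(f"Contract: unexpected value {record.get('Contract')!r}")
    return problems


sample = {
    "tenure": 24,
    "MonthlyCharges": 65.5,
    "TotalCharges": 1572.0,
    "Contract": "Month-to-month",
}
print(validate_record(sample))  # []
```

Returning a list of messages rather than raising on the first failure lets a caller report every issue in a record at once, which is closer in spirit to a validation report than to an exception.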

    Raw Telco Data
          ↓
    Data Validation (Great Expectations)
          ↓
    Feature Engineering Pipeline
          ↓
    Model Training (XGBoost / LightGBM / Scikit-learn)
          ↓
    Hyperparameter Tuning (Optuna)
          ↓
    Experiment Tracking (MLflow)
          ↓
    Best Model Selected
          ↓
    ┌─────────────────────┐
    │  FastAPI REST API   │  ← Production serving
    ├─────────────────────┤
    │  Gradio Demo UI     │  ← Interactive demo
    └─────────────────────┘
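
The "Best Model Selected" step in the flow above can be thought of as picking the candidate with the highest validation score. The sketch below assumes ROC-AUC is the selection metric (this README does not state which metric is used) and introduces a hypothetical `select_best_model` helper. The scores are the approximate ROC-AUC values from the results table in this README, not freshly measured numbers.

```python
def select_best_model(scores: dict[str, float]) -> str:
    """Return the name of the model with the highest score.

    The metric is assumed to be ROC-AUC, where higher is better.
    """
    if not scores:
        raise ValueError("no candidate models to choose from")
    return max(scores, key=scores.get)


# Approximate ROC-AUC values as listed in the results table of this README.
roc_auc = {"XGBoost": 0.85, "LightGBM": 0.84, "Logistic Regression": 0.81}
print(select_best_model(roc_auc))  # XGBoost
```

In a real run the scores would more likely be read back from the experiment tracker than hard-coded.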

    ChurnX/
    │
    ├── notebooks/
    │   └── EDA.ipynb                      # Exploratory data analysis
    │
    ├── scripts/
    │   ├── prepare_processed_data.py      # Data preprocessing
    │   ├── run_pipeline.py                # Full training pipeline
    │   ├── test_fastapi.py                # API tests
    │   ├── test_pipeline_phase1.py        # Pipeline unit tests
    │   └── test_pipeline_phase2.py        # Integration tests
    │
    ├── src/
    │   ├── app/                           # FastAPI application
    │   ├── data/                          # Data loading & validation
    │   ├── features/                      # Feature engineering
    │   ├── models/                        # Model training & evaluation
    │   ├── serving/                       # Prediction serving logic
    │   └── utils/                         # Shared utilities
    │
    ├── .github/workflows/                 # CI/CD pipelines
    ├── dockerfile                         # Docker configuration
    ├── requirements.txt
    └── README.md

| Category | Tools |
|---|---|
| ML Framework | Scikit-learn, XGBoost, LightGBM |
| Experiment Tracking | MLflow |
| Hyperparameter Tuning | Optuna |
| Data Validation | Great Expectations |
| API Serving | FastAPI + Uvicorn + Gunicorn |
| Demo UI | Gradio |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Statistics | Statsmodels, SciPy |
| Containerization | Docker |
| CI/CD | GitHub Actions |
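
To show how the scikit-learn and pandas parts of this stack might fit together, here is a minimal sketch of a preprocessing-plus-model pipeline. It assumes a numeric/categorical column split based on the fields in the example API request, uses logistic regression as a stand-in baseline, and trains on a handful of invented rows purely so the snippet runs. None of it is taken from the project's `src/features/` or `src/models/` code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["tenure", "MonthlyCharges", "TotalCharges"]  # assumed numeric columns
CATEGORICAL = ["Contract", "InternetService"]           # assumed categorical columns


def build_pipeline() -> Pipeline:
    """Scale numeric columns, one-hot encode categoricals, then fit a classifier."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        # handle_unknown="ignore" keeps unseen categories from raising at predict time.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


# Toy rows invented for illustration only; this is not real customer data.
X = pd.DataFrame({
    "tenure": [1, 24, 60, 3, 48, 2],
    "MonthlyCharges": [70.0, 65.5, 20.0, 95.0, 25.0, 80.0],
    "TotalCharges": [70.0, 1572.0, 1200.0, 285.0, 1200.0, 160.0],
    "Contract": ["Month-to-month", "Month-to-month", "Two year",
                 "Month-to-month", "One year", "Month-to-month"],
    "InternetService": ["Fiber optic", "Fiber optic", "DSL",
                        "Fiber optic", "DSL", "Fiber optic"],
})
y = [1, 1, 0, 1, 0, 1]  # 1 = churned, 0 = stayed (made-up labels)

pipeline = build_pipeline().fit(X, y)
churn_probability = pipeline.predict_proba(X)[:, 1]  # probability of the positive class
```

Keeping preprocessing and the estimator in one `Pipeline` object means the same transformations are applied at training and at prediction time, which is the usual reason to prefer it over separate preprocessing scripts.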

    # Clone the repository
    git clone https://github.com/Tan167/ChurnX.git
    cd ChurnX

    # Create and activate a virtual environment
    python -m venv venv
    source venv/bin/activate   # macOS/Linux
    venv\Scripts\activate      # Windows

    # Install dependencies
    pip install -r requirements.txt

    # Run the full training pipeline
    python scripts/run_pipeline.py

    # Start the API
    uvicorn src.app.main:app --reload --port 8000

    # Launch the Gradio demo
    python src/app/gradio_app.py

    # Build image
    docker build -t churnx .
    # Run container
    docker run -p 8000:8000 churnx

`POST /predict`

    {
      "tenure": 24,
      "MonthlyCharges": 65.5,
      "TotalCharges": 1572.0,
      "Contract": "Month-to-month",
      "InternetService": "Fiber optic",
      "PaymentMethod": "Electronic check",
      "TechSupport": "No",
      "OnlineSecurity": "No"
    }

Response:

    {
      "churn_probability": 0.82,
      "churn_prediction": true,
      "risk_level": "High",
      "confidence": 0.91
    }

| Model | Accuracy | ROC-AUC | Precision | Recall |
|---|---|---|---|---|
| XGBoost | ~80% | ~0.85 | ~0.78 | ~0.82 |
| LightGBM | ~79% | ~0.84 | ~0.77 | ~0.81 |
| Logistic Regression | ~76% | ~0.81 | ~0.74 | ~0.78 |
Exact metrics depend on the train/test split and on the hyperparameter tuning run.
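
The `POST /predict` request shown earlier in this README can be built with nothing but the Python standard library. This is a sketch under assumptions: the base URL `http://localhost:8000` matches the port used in the setup commands, the helper name `build_predict_request` is hypothetical, and the actual send is left commented out because it needs a running server.

```python
import json
import urllib.request

# Assumes the API is running locally on the port used in the setup commands.
API_URL = "http://localhost:8000/predict"


def build_predict_request(customer: dict, url: str = API_URL) -> urllib.request.Request:
    """Wrap a customer record in a JSON POST request for the /predict endpoint."""
    body = json.dumps(customer).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same payload as the example request in this README.
customer = {
    "tenure": 24,
    "MonthlyCharges": 65.5,
    "TotalCharges": 1572.0,
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
    "PaymentMethod": "Electronic check",
    "TechSupport": "No",
    "OnlineSecurity": "No",
}
request = build_predict_request(customer)

# With the server running, send the request and decode the JSON response:
# with urllib.request.urlopen(request, timeout=10) as response:
#     result = json.loads(response.read())
#     print(result["churn_probability"], result["risk_level"])
```

Separating request construction from sending makes the payload easy to inspect or unit-test without a live server.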
All training runs are tracked with MLflow. To view the dashboard:

    mlflow ui

Open http://localhost:5000 to compare runs, metrics, and artifacts.