
Tan167/ChurnX


📉 ChurnX: Telco Customer Churn Prediction Platform


An end-to-end ML pipeline for predicting customer churn in the telecom industry, from raw data to a production-ready REST API.



🚀 What is ChurnX?

ChurnX is a production-grade machine learning system that predicts which telecom customers are likely to churn, so retention teams can intervene before those customers leave.

It covers the full ML lifecycle:

  • 📊 Exploratory Data Analysis (EDA)
  • ⚙️ Feature engineering pipeline
  • 🤖 Model training with hyperparameter tuning (Optuna)
  • 📈 Experiment tracking (MLflow)
  • ✅ Data validation (Great Expectations)
  • 🚀 REST API serving (FastAPI)
  • 🖥️ Interactive demo UI (Gradio)
  • 🐳 Docker deployment

✨ Features

| Feature | Description | Tech |
|---|---|---|
| 📊 EDA Notebook | Deep dive into churn patterns and feature distributions | Jupyter, Seaborn, Matplotlib |
| ⚙️ Feature Pipeline | Automated feature engineering and preprocessing | Scikit-learn Pipelines |
| 🤖 Multi-Model Training | XGBoost, LightGBM, and Scikit-learn models compared | XGBoost, LightGBM |
| 🔍 Hyperparameter Tuning | Automated tuning with Optuna | Optuna |
| 📈 Experiment Tracking | All runs logged with metrics and artifacts | MLflow |
| ✅ Data Validation | Schema and quality checks on input data | Great Expectations |
| 🚀 REST API | Production-ready prediction endpoint | FastAPI + Uvicorn |
| 🖥️ Demo UI | Interactive prediction interface | Gradio |
| 🐳 Containerized | One-command deployment | Docker |
| 🔄 CI/CD | Automated testing and workflows | GitHub Actions |
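The README does not show how the feature pipeline transforms a record. As a rough illustration only, the stdlib sketch below one-hot encodes two categorical columns and adds one derived numeric feature. The function `engineer_features`, the derived `AvgChargesPerMonth` column, and the category lists are assumptions and are not ChurnX code. The category values are taken from the public IBM Telco Customer Churn dataset, whose column names match the API payload. The actual project builds this step with scikit-learn Pipelines.

```python
from typing import Dict, List, Union

# Assumed category vocabularies (IBM Telco churn dataset values); not read from ChurnX.
CATEGORIES: Dict[str, List[str]] = {
    "Contract": ["Month-to-month", "One year", "Two year"],
    "InternetService": ["DSL", "Fiber optic", "No"],
}

Record = Dict[str, Union[str, int, float]]


def engineer_features(record: Record) -> Dict[str, float]:
    """Hypothetical preprocessing: map one raw customer record to numeric features."""
    tenure = float(record["tenure"])
    monthly = float(record["MonthlyCharges"])
    total = float(record["TotalCharges"])

    features: Dict[str, float] = {
        "tenure": tenure,
        "MonthlyCharges": monthly,
        "TotalCharges": total,
        # Hypothetical derived feature: average spend per month of tenure.
        # New customers (tenure 0) fall back to their current monthly charge.
        "AvgChargesPerMonth": total / tenure if tenure > 0 else monthly,
    }

    # One-hot encode each categorical column against a fixed vocabulary;
    # an unseen value yields an all-zero block instead of an error.
    for column, values in CATEGORIES.items():
        for value in values:
            features[f"{column}_{value}"] = 1.0 if record.get(column) == value else 0.0
    return features


# Usage with fields from the example payload in the API section.
example = {
    "tenure": 24,
    "MonthlyCharges": 65.5,
    "TotalCharges": 1572.0,
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
}
print(engineer_features(example)["AvgChargesPerMonth"])  # 65.5
```

In scikit-learn the same idea is usually written as a `ColumnTransformer` wrapping a `OneHotEncoder(handle_unknown="ignore")`. That encoder likewise emits all zeros for categories it did not see during fitting.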

🏗️ Architecture

Raw Telco Data
      ↓
Data Validation (Great Expectations)
      ↓
Feature Engineering Pipeline
      ↓
Model Training → XGBoost / LightGBM / Sklearn
      ↓
Hyperparameter Tuning (Optuna)
      ↓
Experiment Tracking (MLflow)
      ↓
Best Model Selected
      ↓
┌─────────────────────┐
│   FastAPI REST API  │  ← Production serving
├─────────────────────┤
│   Gradio Demo UI    │  ← Interactive demo
└─────────────────────┘
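The first stage in the diagram rejects bad rows before they reach training or serving. The sketch below uses only the standard library to mimic two kinds of checks that Great Expectations expresses as `expect_column_values_to_be_between` and `expect_column_values_to_be_in_set`. The rule set and the name `validate_record` are hypothetical, since this README does not show the project's real expectation suite.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, int, float, None]]

# Hypothetical rule set; the real expectation suite is not shown in this README.
NON_NEGATIVE_NUMERIC = ["tenure", "MonthlyCharges", "TotalCharges"]
ALLOWED_VALUES = {
    "Contract": {"Month-to-month", "One year", "Two year"},
    "InternetService": {"DSL", "Fiber optic", "No"},
    "PaymentMethod": {
        "Electronic check",
        "Mailed check",
        "Bank transfer (automatic)",
        "Credit card (automatic)",
    },
}


def validate_record(record: Record) -> List[str]:
    """Return human-readable problems; an empty list means the record passed."""
    errors: List[str] = []
    # Similar to expect_column_values_to_be_between(min_value=0): numeric and >= 0.
    for column in NON_NEGATIVE_NUMERIC:
        value = record.get(column)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{column}: expected a number, got {value!r}")
        elif value < 0:
            errors.append(f"{column}: must be non-negative, got {value}")
    # Similar to expect_column_values_to_be_in_set: categories from a fixed set.
    for column, allowed in ALLOWED_VALUES.items():
        value = record.get(column)
        if value not in allowed:
            errors.append(f"{column}: unexpected value {value!r}")
    return errors


good = {
    "tenure": 24,
    "MonthlyCharges": 65.5,
    "TotalCharges": 1572.0,
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
    "PaymentMethod": "Electronic check",
}
print(validate_record(good))                   # []
print(validate_record(dict(good, tenure=-1)))  # ['tenure: must be non-negative, got -1']
```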

📁 Project Structure

ChurnX/
│
├── notebooks/
│   └── EDA.ipynb                  # Exploratory data analysis
│
├── scripts/
│   ├── prepare_processed_data.py  # Data preprocessing
│   ├── run_pipeline.py            # Full training pipeline
│   ├── test_fastapi.py            # API tests
│   ├── test_pipeline_phase1.py    # Pipeline unit tests
│   └── test_pipeline_phase2.py    # Integration tests
│
├── src/
│   ├── app/                       # FastAPI application
│   ├── data/                      # Data loading & validation
│   ├── features/                  # Feature engineering
│   ├── models/                    # Model training & evaluation
│   ├── serving/                   # Prediction serving logic
│   └── utils/                     # Shared utilities
│
├── .github/workflows/             # CI/CD pipelines
├── dockerfile                     # Docker configuration
├── requirements.txt
└── README.md

🛠️ Tech Stack

ML Framework          →  Scikit-learn, XGBoost, LightGBM
Experiment Tracking   →  MLflow
Hyperparameter Tuning →  Optuna
Data Validation       →  Great Expectations
API Serving           →  FastAPI + Uvicorn + Gunicorn
Demo UI               →  Gradio
Data Processing       →  Pandas, NumPy
Visualization         →  Matplotlib, Seaborn
Statistical Analysis  →  Statsmodels, SciPy
Containerization      →  Docker
CI/CD                 →  GitHub Actions

⚡ Quick Start

1. Clone the repo

git clone https://github.com/Tan167/ChurnX.git
cd ChurnX

2. Create virtual environment

python -m venv venv
source venv/bin/activate        # macOS/Linux
venv\Scripts\activate           # Windows

3. Install dependencies

pip install -r requirements.txt

4. Run the full pipeline

python scripts/run_pipeline.py

5. Start the API server

uvicorn src.app.main:app --reload --port 8000

6. Launch Gradio demo

python src/app/gradio_app.py

🐳 Docker

# Build image
docker build -t churnx .

# Run container
docker run -p 8000:8000 churnx

🔌 API Usage

Predict churn for a customer

POST /predict
{
  "tenure": 24,
  "MonthlyCharges": 65.5,
  "TotalCharges": 1572.0,
  "Contract": "Month-to-month",
  "InternetService": "Fiber optic",
  "PaymentMethod": "Electronic check",
  "TechSupport": "No",
  "OnlineSecurity": "No"
}

Response:

{
  "churn_probability": 0.82,
  "churn_prediction": true,
  "risk_level": "High",
  "confidence": 0.91
}
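Below is a minimal Python client for the endpoint above that uses only the standard library. It assumes the server from the Quick Start is listening on `http://localhost:8000`. The names `build_predict_request`, `predict`, and `EXAMPLE_CUSTOMER` are illustrative and are not part of the ChurnX codebase.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed address of the server started in the Quick Start (uvicorn ... --port 8000).
BASE_URL = "http://localhost:8000"

# The example request payload shown above.
EXAMPLE_CUSTOMER: Dict[str, Any] = {
    "tenure": 24,
    "MonthlyCharges": 65.5,
    "TotalCharges": 1572.0,
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
    "PaymentMethod": "Electronic check",
    "TechSupport": "No",
    "OnlineSecurity": "No",
}


def build_predict_request(payload: Dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /predict."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: Dict[str, Any], base_url: str = BASE_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    e.g. a 422 when the payload fails FastAPI's schema validation.
    """
    with urllib.request.urlopen(build_predict_request(payload, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Requires the API server to be running:
# result = predict(EXAMPLE_CUSTOMER)
# print(result["churn_probability"], result["risk_level"])
```

Building the request separately from sending it lets you inspect the payload in tests without a running server. FastAPI also serves interactive documentation at `/docs` by default, which is a quick way to confirm the exact request schema.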

📊 Model Performance

| Model | Accuracy | ROC-AUC | Precision | Recall |
|---|---|---|---|---|
| XGBoost | ~80% | ~0.85 | ~0.78 | ~0.82 |
| LightGBM | ~79% | ~0.84 | ~0.77 | ~0.81 |
| Logistic Regression | ~76% | ~0.81 | ~0.74 | ~0.78 |

Figures are approximate; exact metrics depend on the train/test split and the hyperparameter tuning run.
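For reference, the sketch below shows how the four columns above can be computed from true labels and predicted churn probabilities. It uses only the standard library and is not the project's evaluation code. The 0.5 decision threshold is an assumption. ROC-AUC is computed as the probability that a randomly chosen churner scores higher than a randomly chosen non-churner, with ties counting half. In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `roc_auc_score` do the same job.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """P(random churner scores above random non-churner); ties count as half a win."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision and recall for the churn class (label 1), plus ROC-AUC."""
    # Assumed decision rule: predict churn when the probability reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "roc_auc": roc_auc(y_true, y_score),
    }


# Toy labels and scores (not project data): 1 = churned.
print(classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'roc_auc': 0.75}
```

The pairwise loop in `roc_auc` is quadratic in the number of samples. That is fine for a sanity check, but a sort-based implementation or `roc_auc_score` scales better to the full test set.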


📈 MLflow Experiment Tracking

All training runs are tracked with MLflow. To view the dashboard:

mlflow ui

Open http://localhost:5000 to compare runs, metrics, and artifacts.

Built with ❤️ using Python, Scikit-learn, XGBoost, FastAPI, and MLflow

About

End-to-end MLOps pipeline for telecom customer churn prediction. XGBoost model with MLflow experiment tracking, Great Expectations data validation, FastAPI REST API + Gradio web UI, and Docker containerization.
