Run a fraud-detection model on your laptop with Docker, MLflow, FastAPI, and Airflow. No cloud account is needed.
- Each row in `data/raw/transactions.csv` is a card-not-present transaction with simple numeric and categorical fields.
- A RandomForest pipeline scores every row as fraud or not fraud.
- Goal: stop risky payments, surface odd cases for manual review, and keep a full record of how the model was trained and checked.
- Flow: raw data → training → MLflow model registry → FastAPI serving → drift monitoring → retraining.
| Path | Description |
|---|---|
| `api/` | FastAPI service that loads the Production model from MLflow. |
| `airflow/dags/` | Training and monitoring DAGs (`fraud_training`, `fraud_monitoring`). |
| `airflow/Dockerfile` | Airflow image that shares the same Python deps as the rest of the repo. |
| `data/raw/` | Main training CSV. |
| `data/production/` | Time-stamped datasets saved during monitoring runs. |
| `mlruns/` | Shared MLflow experiments and artifacts. |
| `src/mlops_lab/train.py` | Training script that reads data, builds the pipeline, logs to MLflow. |
| `src/mlops_lab/monitor_drift.py` | Drift simulation plus monitoring run. |
| `src/mlops_lab/generate_data.py` | Helper to create synthetic data. |
| `docker-compose.yml` | Starts Postgres, MLflow, Airflow, trainer, and API. |
| `Makefile` | Shortcuts for image builds and stack commands. |
| Principle | Why it matters | Where to look |
|---|---|---|
| Repeatable training | Get the same result each time you run the pipeline | Config in src/mlops_lab/config.py, data helpers in src/mlops_lab/data.py, trainer in src/mlops_lab/train.py. |
| Experiment tracking and model registry | Know which run produced which model | MLflow services in docker-compose.yml, logging inside train.py, usage in api/main.py. |
| Same runtime everywhere | Training and serving use the same packages | Dockerfile, airflow/Dockerfile, and the Compose stack. |
| Scheduled workflows | Automate retraining and checks | Airflow DAGs fraud_training_dag.py and fraud_monitoring_dag.py. |
| Fast model rollout | Serve the newest good model | Promotion gate in train.py (AUC check) plus FastAPI loading models:/fraud-classifier/Production. |
| Feedback from production | Catch drift before users do | monitor_drift.py, data drops in data/production/, monitoring experiment fraud-detection-monitoring. |
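The "fast model rollout" row centers on a promotion gate. Here is a minimal sketch of such a gate; only the AUC ≥ 0.80 threshold comes from this project, the function shape and toy data are illustrative assumptions, not the repo's exact code:

```python
from sklearn.metrics import roc_auc_score

PROMOTION_AUC = 0.80  # gate described in the training workflow below


def should_promote(y_true, y_score, threshold=PROMOTION_AUC):
    """Promote only when validation AUC clears the gate."""
    auc = roc_auc_score(y_true, y_score)
    return auc, bool(auc >= threshold)


# Toy validation labels/scores; the real ones come from the held-out split.
auc, promote = should_promote([0, 0, 1, 1], [0.1, 0.2, 0.6, 0.8])
# On a positive decision, train.py would then move the registered model, e.g.:
#   client.transition_model_version_stage("fraud-classifier", version, "Production")
```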
- Prep the dataset

  Screenshot placeholder: dataset preview.

  ```bash
  make data   # or drop your own CSV into data/raw/
  ```

- Build project images

  Screenshot placeholder: terminal build output.

  ```bash
  make build
  ```

- Launch the local platform

  ```bash
  make up
  ```

- Kick off training (manual or via Airflow DAG)

  ```bash
  make trainer
  ```

- Trigger drift monitoring

  Screenshot placeholder: Airflow monitoring DAG run.

  ```bash
  make monitor
  ```

- Query the API

  ```bash
  make api-health
  make api-predict
  make api-predict-batch
  ```

| Service | URL | Notes |
|---|---|---|
| MLflow UI | http://localhost:5001 | See experiments fraud-detection and fraud-detection-monitoring. |
| FastAPI | http://localhost:8000 | Health check at /health. |
| FastAPI docs | http://localhost:8000/docs | Swagger UI for manual tests. |
| Airflow UI | http://localhost:8080 | Login admin / admin. |
| Postgres (MLflow + Airflow meta) | localhost:5432 | User/password set in docker-compose.yml. |
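After `make up`, the services above take a moment to come online. A generic readiness poll (not part of the repo; standard library only) can guard scripts that hit these URLs:

```python
import time
import urllib.error
import urllib.request


def wait_for(url: str, timeout_s: float = 60, interval_s: float = 2) -> bool:
    """Poll a URL until it answers with a 2xx status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass
        time.sleep(interval_s)
    return False


# e.g. wait_for("http://localhost:8000/health") before calling /predict
```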
| Column | Type | Notes |
|---|---|---|
| `amount` | float | Payment amount in base currency (log-normal draw). |
| `customer_age` | int | Age in years. |
| `country` | categorical | Country code such as DE, US, NG. |
| `merchant_risk` | int (0/1) | Flag if the merchant is risky. |
| `is_vpn` | int (0/1) | Flag if the shopper used a VPN or proxy. |
| `is_night` | int (0/1) | Flag if the payment happened at night. |
| `tx_last_24h` | int | Number of recent transactions for this shopper. |
| `chargebacks_6m` | int | Count of chargebacks in the last six months. |
| `is_fraud` | int (0/1) | Label generated from the fraud probability. |
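`generate_data.py` builds data to this schema. A self-contained sketch of one such row follows; the distributions and probabilities are illustrative assumptions, not the script's exact logic:

```python
import random


def synth_transaction(rng: random.Random) -> dict:
    """One synthetic row matching the schema above (illustrative only)."""
    row = {
        "amount": round(rng.lognormvariate(3.5, 1.0), 2),  # log-normal draw
        "customer_age": rng.randint(18, 80),
        "country": rng.choice(["DE", "US", "NG"]),
        "merchant_risk": int(rng.random() < 0.10),
        "is_vpn": int(rng.random() < 0.15),
        "is_night": int(rng.random() < 0.25),
        "tx_last_24h": rng.randint(0, 20),
        "chargebacks_6m": rng.randint(0, 5),
    }
    # Label drawn from a simple fraud probability: risk flags push it up.
    p = 0.02 + 0.2 * row["merchant_risk"] + 0.15 * row["is_vpn"] + 0.1 * row["is_night"]
    row["is_fraud"] = int(rng.random() < p)
    return row


rows = [synth_transaction(random.Random(seed)) for seed in range(5)]
```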
- Training (`src/mlops_lab/train.py`, DAG `fraud_training`)
  - Load `data/raw/transactions.csv`.
  - Split into train/validation with fixed seeds.
  - Build the preprocessing + RandomForest pipeline.
  - Log metrics, confusion matrix, and artifacts to MLflow.
  - Register the run as `fraud-classifier` and promote if AUC ≥ 0.80.
- Promotion and serving
  - An MLflow stage change moves the best run to `Production`.
  - FastAPI watches `models:/fraud-classifier/Production` and reloads automatically.
- Drift monitoring (`src/mlops_lab/monitor_drift.py`, DAG `fraud_monitoring`)
  - Copy baseline data and inject drift (country mix, VPN share, amount shifts).
  - Save the snapshot to `data/production/transactions_<ts>.csv`.
  - Score with the Production model and log metrics to `fraud-detection-monitoring`.
  - Use the saved CSV + metrics to decide when to retrain.
- Why it matters: when monitoring metrics drop, trigger the trainer so the registry and API stay aligned with real data.
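The drift-injection step can be sketched with pandas. Column names come from the dataset schema; the shift sizes and country weights below are assumptions, not the values in `monitor_drift.py`:

```python
import numpy as np
import pandas as pd


def inject_drift(df: pd.DataFrame, rng: np.random.Generator) -> pd.DataFrame:
    """Return a drifted copy of the baseline: shifted country mix,
    higher VPN share, inflated amounts. Shift sizes are illustrative."""
    out = df.copy()
    # Country-mix drift: resample countries with weights tilted toward NG.
    out["country"] = rng.choice(["DE", "US", "NG"], size=len(out), p=[0.2, 0.2, 0.6])
    # VPN-share drift: flip roughly 30% of rows to is_vpn=1.
    out.loc[rng.random(len(out)) < 0.3, "is_vpn"] = 1
    # Amount drift: inflate every amount by 40%.
    out["amount"] = out["amount"] * 1.4
    return out


baseline = pd.DataFrame({
    "country": ["DE"] * 80 + ["US"] * 15 + ["NG"] * 5,
    "is_vpn": [0] * 90 + [1] * 10,
    "amount": [50.0] * 100,
})
drifted = inject_drift(baseline, np.random.default_rng(0))
# The real script would then write the snapshot, e.g.
# drifted.to_csv("data/production/transactions_<ts>.csv", index=False)
```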
- The trainer promotes a model to `Production`.
- FastAPI (`api/main.py`) loads that stage during startup and holds a client connection to MLflow.
- `/predict` validates a JSON row, builds a Pandas frame, and calls the stored sklearn pipeline.
- `/predict_batch` accepts many rows at once for lower-overhead scoring jobs.
- Response JSON includes fraud scores and decisions so other tools can block or allow the payment. Batch scoring also scales offline workloads: run larger CSVs through `/predict_batch` (or a job that calls the model directly) to score thousands of rows without scaling the API horizontally.
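Since the project goal is to block risky payments, surface odd cases for review, and allow the rest, the score-to-decision mapping in the response can be pictured as threshold bands. The thresholds and labels below are illustrative assumptions, not the API's actual values:

```python
def to_decision(fraud_score: float, block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Map a model fraud score to a payment decision."""
    if fraud_score >= block_at:
        return "block"    # stop the payment outright
    if fraud_score >= review_at:
        return "review"   # surface for manual review
    return "allow"


print(to_decision(0.95), to_decision(0.6), to_decision(0.05))
```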
Quick checks:

```bash
# Service health
curl http://localhost:8000/health
```

Single row (`/predict`):

```bash
# High-risk example
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "features": {
      "amount": 975.0,
      "customer_age": 62,
      "country": "NG",
      "merchant_risk": 1,
      "is_vpn": 1,
      "is_night": 1,
      "tx_last_24h": 18,
      "chargebacks_6m": 5
    }
  }'
```
```bash
# Low-risk example
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "features": {
      "amount": 35.0,
      "customer_age": 34,
      "country": "DE",
      "merchant_risk": 0,
      "is_vpn": 0,
      "is_night": 0,
      "tx_last_24h": 1,
      "chargebacks_6m": 0
    }
  }'
```

Batch (`/predict_batch`):
```bash
curl -X POST http://localhost:8000/predict_batch \
  -H "Content-Type: application/json" \
  -d '{
    "rows": [
      {
        "amount": 975.0,
        "customer_age": 62,
        "country": "NG",
        "merchant_risk": 1,
        "is_vpn": 1,
        "is_night": 1,
        "tx_last_24h": 18,
        "chargebacks_6m": 5
      },
      {
        "amount": 45.0,
        "customer_age": 29,
        "country": "DE",
        "merchant_risk": 0,
        "is_vpn": 0,
        "is_night": 0,
        "tx_last_24h": 2,
        "chargebacks_6m": 0
      }
    ]
  }'
```

- Push monitoring alerts to Slack, email, or PagerDuty.
- Try other models (XGBoost, LightGBM) and compare runs in MLflow.
- Add data validation (Great Expectations, Pandera) before training.
- Capture explainability data or sample real production requests during monitoring.
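As a taste of the data-validation idea, here is a dependency-light stand-in using plain pandas; Great Expectations or Pandera would replace it in practice, and the column names come from the dataset schema earlier:

```python
import pandas as pd

REQUIRED_COLUMNS = {"amount", "customer_age", "country", "merchant_risk",
                    "is_vpn", "is_night", "tx_last_24h", "chargebacks_6m", "is_fraud"}


def validate_transactions(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in a training CSV; empty means OK."""
    problems = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems  # further checks need the columns present
    if (df["amount"] <= 0).any():
        problems.append("non-positive amounts")
    if not df["is_fraud"].isin([0, 1]).all():
        problems.append("is_fraud outside {0, 1}")
    return problems
```

Running this before `train.py` would let the training DAG fail fast on a bad CSV instead of producing a silently broken model.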

