anushkundu/demand-forecasting-system


📊 DemandCast AI: Demand Forecasting & Inventory Optimization System

ML-powered demand predictions for retail store operations, reducing inventory waste by $192K annually through a 64.4% lower forecast error (MAPE) than a 7-day moving-average baseline.



🎯 Problem Statement

Multi-location retail operations lose 8-15% of revenue to food waste caused by inaccurate demand forecasting. Store managers currently use simple moving averages and manual judgment for ordering decisions, leading to:

  • Overstocking on slow days → Food expires → Money thrown away
  • Understocking on peak days → Empty shelves → Lost revenue

This system addresses both problems by predicting daily item-level demand across store locations with a 9.42% mean absolute percentage error (MAPE), enabling precise, data-driven inventory ordering.


📊 Results

| Model | MAE | RMSE | MAPE | R² | vs Baseline (MAPE reduction) |
|---|---|---|---|---|---|
| Baseline (7-Day MA) | 721.8 | 1336.5 | 26.47% | 0.8519 | n/a |
| Linear Regression | 402.8 | 710.5 | 21.02% | 0.9581 | 20.6% better |
| Random Forest | 254.7 | 496.8 | 10.76% | 0.9795 | 59.4% better |
| XGBoost | 221.0 | 440.9 | 9.42% | 0.9839 | 64.4% better ★ |
| LightGBM | 229.2 | 446.2 | 9.98% | 0.9835 | 62.3% better |

Best Model: XGBoost, which predicts daily demand with a 9.42% mean absolute percentage error (MAPE).
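For readers who want to check the table, the sketch below gives plain-Python versions of the four metrics and recomputes the "vs Baseline" column from the MAPE values. The function names are illustrative rather than taken from the repository, and skipping zero-sales rows in MAPE is an assumption about how the metric was handled.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Rows with zero actual sales are skipped (an assumption; the
    repository may treat them differently).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def r2(actual, predicted):
    """Coefficient of determination."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape_reduction(baseline_mape, model_mape):
    """Relative MAPE reduction versus the baseline, in percent."""
    return 100.0 * (baseline_mape - model_mape) / baseline_mape


# MAPE values copied from the results table.
BASELINE_MAPE = 26.47
TABLE_MAPE = {
    "Linear Regression": 21.02,
    "Random Forest": 10.76,
    "XGBoost": 9.42,
    "LightGBM": 9.98,
}

for name, value in TABLE_MAPE.items():
    print(f"{name}: {mape_reduction(BASELINE_MAPE, value):.1f}% better")
```

The improvement figures in the table match a relative reduction in MAPE; the same comparison on MAE would give a different percentage.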


💰 Business Impact

| Impact Area | Value | Details |
|---|---|---|
| 📉 Forecast Error Reduction | 64.4% | vs moving average baseline |
| 🗑️ Annual Waste Reduction | $192,000 | across 10 high-revenue stores |
| 📣 Marketing Reallocation | $200,000+ | identified misallocated promotion spend |
| 📅 Weekend Demand Insight | 45% | Saturday sales higher than Tuesday |
| 🏷️ Promotion Effectiveness | 42% vs 3% | GROCERY vs BABY CARE response gap |

๐Ÿ—๏ธ Technical Architecture

BigQuery (3M+ transactions, SQL EDA)
    โ†“
PySpark (60+ engineered features)
    โ†“
XGBoost Model (9.42% MAPE, Rยฒ = 0.9839)
    โ†“
FastAPI (REST API) + Streamlit (Dashboard)
    โ†“
Docker (Containerized Deployment)

📁 Project Structure

demand-forecasting-system/
│
├── app/
│   ├── __init__.py              # Python package init
│   ├── main.py                  # FastAPI application (5 endpoints)
│   └── model.py                 # Model loading & prediction logic
│
├── dashboard/
│   └── app.py                   # Streamlit dashboard (5 tabs)
│
├── models/
│   ├── best_model_xgb.pkl       # Trained model file
│   └── feature_columns.json     # Feature list (60+ features)
│
├── notebooks/
│   ├── 01_EDA_BigQuery.ipynb
│   ├── 02_Feature_Engineering.ipynb
│   └── 03_Model_Training.ipynb
│
├── results/
│   ├── eda/                     # EDA visualizations
│   │   ├── category_sales.png
│   │   ├── weekly_seasonality.png
│   │   ├── monthly_seasonality.png
│   │   ├── sales_trend.png
│   │   ├── promotion_impact.png
│   │   ├── store_analysis.png
│   │   └── zero_sales_analysis.png
│   ├── model_comparison.png
│   ├── feature_importance.png
│   ├── actual_vs_predicted.png
│   ├── model_results.csv
│   └── feature_importance.csv
│
├── Dockerfile                   # Container configuration
├── docker-compose.yml           # Multi-service orchestration
├── requirements.txt             # Python dependencies
└── .gitignore

🚀 Quick Start

Option 1: Run with Docker (Recommended)

# Clone the repository
git clone https://github.com/anushkundu/demand-forecasting-system.git
cd demand-forecasting-system

# Build and run both API + Dashboard
docker-compose up --build

# API:       http://localhost:8000/docs
# Dashboard: http://localhost:8501

Option 2: Run Locally

# Clone the repository
git clone https://github.com/anushkundu/demand-forecasting-system.git
cd demand-forecasting-system

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start API (Terminal 1)
uvicorn app.main:app --reload --port 8000

# Start Dashboard (Terminal 2; activate venv first)
streamlit run dashboard/app.py

Option 3: API Only

pip install fastapi uvicorn lightgbm scikit-learn pandas numpy pydantic
uvicorn app.main:app --reload --port 8000

# Open: http://localhost:8000/docs

🔌 API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Health check: confirms the API is running |
| /health | GET | Detailed health status |
| /quick-predict | POST | Simple prediction (4-5 inputs) |
| /predict | POST | Full prediction (all features) |
| /model-info | GET | Model performance and metadata |
Quick Prediction Example

import requests

response = requests.post(
    "http://localhost:8000/quick-predict",
    json={
        "yesterday_sales": 600,
        "last_week_same_day": 550,
        "weekly_average": 500,
        "day_of_week": 7,
        "is_promotion": True,
        "is_holiday": False
    }
)

print(response.json())
# {
#     "predicted_demand": 650,
#     "confidence": "medium",
#     "recommendation": "High demand - increase stock order",
#     "features_provided": 7,
#     "features_expected": 60
# }
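The sample response reports features_provided: 7 for a six-field request. One possible reading, offered only as an assumption, is that the endpoint translates the simplified inputs into model feature names and derives is_weekend from day_of_week. The sketch below illustrates that idea with hypothetical helpers; it is not the repository's code. The feature names come from the full prediction example, the 1 = Sunday / 7 = Saturday convention follows Spark's dayofweek, and zero-filling the missing columns is a placeholder choice.

```python
def quick_inputs_to_features(payload):
    """Translate a /quick-predict style payload into model feature names.

    Hypothetical mapping for illustration; the real endpoint may differ.
    """
    day = int(payload["day_of_week"])
    return {
        "sales_lag_1": float(payload["yesterday_sales"]),
        "sales_lag_7": float(payload["last_week_same_day"]),
        "rolling_mean_7": float(payload["weekly_average"]),
        "day_of_week": day,
        # Assumes 1 = Sunday and 7 = Saturday (Spark's dayofweek convention).
        "is_weekend": int(day in (1, 7)),
        # Assumes a boolean promotion flag maps to 0/1.
        "onpromotion": int(bool(payload["is_promotion"])),
        "is_holiday": int(bool(payload["is_holiday"])),
    }


def to_model_row(features, feature_columns, default=0.0):
    """Order values as the model expects, filling missing columns with a default."""
    return [features.get(name, default) for name in feature_columns]


payload = {
    "yesterday_sales": 600,
    "last_week_same_day": 550,
    "weekly_average": 500,
    "day_of_week": 7,
    "is_promotion": True,
    "is_holiday": False,
}
features = quick_inputs_to_features(payload)
print(len(features), "features provided")
```

In practice the column order would come from models/feature_columns.json, and a neutral default per column (for example a training-set median) may be a better fill than zero.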

Full Prediction Example

import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={
        "sales_lag_1": 420,
        "sales_lag_7": 450,
        "rolling_mean_7": 410,
        "day_of_week": 7,
        "month": 12,
        "onpromotion": 5,
        "is_weekend": 1,
        "is_holiday": 0,
        "rolling_mean_14": 400,
        "rolling_mean_30": 390,
        "oil_price": 52.3,
        "store_type_encoded": 4,
        "family_encoded": 0,
        "category_avg_all_stores": 380
    }
)
print(response.json())

cURL Example

curl -X POST "http://localhost:8000/quick-predict" \
  -H "Content-Type: application/json" \
  -d '{
    "yesterday_sales": 420,
    "last_week_same_day": 450,
    "weekly_average": 410,
    "day_of_week": 7,
    "is_promotion": true,
    "is_holiday": false
  }'

🔬 Feature Engineering

Engineered 60+ predictive features across 4 tiers using PySpark:

| Tier | Category | Features | Examples |
|---|---|---|---|
| 1 | Calendar | 19 | day_of_week, month, cyclical sin/cos encoding, is_weekend, is_december |
| 2 | Lag & Rolling | 16 | sales_lag_1/7/14/28/365, rolling_mean_7/14/30, rolling_std |
| 3 | Advanced | 15 | promotion saturation, holiday proximity, WoW/MoM/YoY momentum |
| 4 | Expert | 14 | demand regime, cross-store comparison, z-scores, CV, interactions |
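The cyclical sin/cos encoding listed under the calendar tier can be sketched in a few lines of plain Python. This is a minimal illustration of the general technique, not the project's PySpark implementation, and the helper names are hypothetical. The idea is to place each value on a circle so that the last day of the week sits next to the first instead of six units away.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value (day of week, month, ...) to a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Euclidean distance between two encoded values."""
    sin_a, cos_a = cyclical_encode(a, period)
    sin_b, cos_b = cyclical_encode(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Day 7 and day 1 are adjacent on the circle, exactly like day 1 and day 2.
print(encoded_distance(7, 1, 7))
print(encoded_distance(1, 2, 7))
```

Both components are needed: sine alone maps two different days to the same value, while the (sin, cos) pair is unique for each position in the cycle.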

Feature Importance (Top 10)

| Rank | Feature | Importance | Business Meaning |
|---|---|---|---|
| 1 | rolling_mean_7 | 49.87% | 7-day average demand level |
| 2 | sales_lag_7 | 32.25% | Same day last week |
| 3 | category_avg_all_stores | 10.60% | Chain-wide demand signal |
| 4 | sales_lag_14 | 2.19% | 2 weeks ago sales |
| 5 | sales_lag_1 | 1.05% | Yesterday's sales |
| 6 | expanding_std | 0.93% | Long-term volatility |
| 7 | sales_lag_28 | 0.50% | Monthly cycle |
| 8 | cluster | 0.28% | Store cluster grouping |
| 9 | rolling_std_7 | 0.22% | Recent demand volatility |
| 10 | rolling_max_7 | 0.19% | Recent peak demand |

Key Insight: The top 3 features account for 92.7% of total feature importance, and all three derive from recent sales history or cross-store patterns.
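The 92.7% figure can be checked directly from the importance values listed above. The short sketch below sums the top-k shares; the helper name is illustrative.

```python
# Importance values (percent) copied from the top-10 table.
IMPORTANCES = [
    ("rolling_mean_7", 49.87),
    ("sales_lag_7", 32.25),
    ("category_avg_all_stores", 10.60),
    ("sales_lag_14", 2.19),
    ("sales_lag_1", 1.05),
    ("expanding_std", 0.93),
    ("sales_lag_28", 0.50),
    ("cluster", 0.28),
    ("rolling_std_7", 0.22),
    ("rolling_max_7", 0.19),
]


def cumulative_share(items, k):
    """Sum of the k largest importance values, in percent."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    return sum(value for _, value in ranked[:k])


print(f"Top 3: {cumulative_share(IMPORTANCES, 3):.1f}%")
print(f"Top 10: {cumulative_share(IMPORTANCES, 10):.2f}%")
```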

Data Leakage Prevention

  • All lag features use only past data via F.lag()
  • Rolling windows exclude current row: rowsBetween(-N, -1)
  • Expanding windows exclude current row: rowsBetween(unboundedPreceding, -1)
  • Train/test split is strictly temporal (no future data in training)
  • Verified through manual spot-checks on random samples
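To make the window semantics concrete, here is a plain-Python sketch of the same rules: a lag that only looks backwards, a rolling mean over rows strictly before the current one (the equivalent of rowsBetween(-N, -1)), and a strictly temporal split. It is an illustration under assumed helper names and made-up dates, not the project's PySpark pipeline.

```python
from datetime import date


def lag(series, n):
    """Value n steps earlier, or None when there is not enough history."""
    return [series[i - n] if i >= n else None for i in range(len(series))]


def rolling_mean_prev(series, window):
    """Mean of up to `window` rows strictly before each row.

    The current row is never part of its own window, so the feature
    cannot leak the value it is meant to predict.
    """
    out = []
    for i in range(len(series)):
        past = series[max(0, i - window):i]
        out.append(sum(past) / len(past) if past else None)
    return out


def temporal_split(rows, cutoff):
    """Split (date, value) rows so that training data is strictly before `cutoff`."""
    train = [row for row in rows if row[0] < cutoff]
    test = [row for row in rows if row[0] >= cutoff]
    return train, test


sales = [10, 20, 30, 40]
print(lag(sales, 1))
print(rolling_mean_prev(sales, 2))

# Illustrative dates only.
rows = [(date(2017, 8, d), value) for d, value in zip((1, 2, 3, 4), sales)]
train, test = temporal_split(rows, date(2017, 8, 3))
print(len(train), len(test))
```

A quick spot-check in the same spirit as the manual verification: changing the value of a row should never change that row's own lag or rolling features.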

📈 Key EDA Findings

| # | Finding | Business Impact |
|---|---|---|
| 1 | Saturday sales 45% higher than Tuesday | Adjust daily order quantities by day of week → $80K savings |
| 2 | December 45% above annual average | Pre-position inventory by late November → prevent stockouts |
| 3 | Promotions: GROCERY +42% vs BABY CARE +3% | Reallocate marketing budget → $200K+ incremental revenue |
| 4 | Top 10 stores = 55% of total revenue | Prioritize ML deployment to high-value stores first |
| 5 | 12 of 33 categories have >70% zero-sales days | Exclude from ML, keep on manual ordering |
| 6 | Pre-holiday surge +25%, holiday drop -60% | Create holiday proximity features, not just binary flags |
| 7 | Oil price correlation: 0.15 | Weak but measurable; included as external feature |
| 8 | Year-over-year growth: ~8% | Include trend feature to avoid systematic underprediction |
๐Ÿ› ๏ธ Technology Stack

Layer Technology Purpose
Data Storage Google BigQuery Cloud data warehouse, SQL-based EDA
Data Processing PySpark Distributed feature engineering at scale
Machine Learning XGBoost, LightGBM, scikit-learn Model training and comparison
API FastAPI REST API for predictions
Dashboard Streamlit, Plotly Interactive visualization
Containerization Docker, Docker Compose Reproducible deployment
Version Control Git, GitHub Code management

📓 Notebooks

| Notebook | Description | Key Output |
|---|---|---|
| 01_EDA_BigQuery.ipynb | SQL EDA on 3M+ rows, 12 queries, 7 visualizations | Business insights, feature hypotheses |
| 02_Feature_Engineering.ipynb | PySpark feature pipeline, 60+ features, leakage prevention | train_features.parquet, test_features.parquet |
| 03_Model_Training.ipynb | 5 model comparison, evaluation, business impact calculation | best_model_xgb.pkl, results |

📊 Dashboard Features

The Streamlit dashboard includes 5 interactive tabs:

| Tab | Feature |
|---|---|
| 🔮 Predict Demand | Enter sales data → get AI-powered forecast with confidence gauge |
| 📊 EDA Insights | Interactive EDA visualizations with business context |
| 📈 Model Performance | Model comparison charts, radar plot, downloadable results |
| 🔍 Feature Insights | Feature importance with bar, lollipop, and treemap views |
| ℹ️ About System | Architecture diagram, key results, tech stack |

🔮 Future Improvements

| Improvement | Expected Impact |
|---|---|
| Add weather data (Open-Meteo API) | +1-2% MAPE improvement |
| Hierarchical models per store type | Better local predictions |
| Prediction intervals (confidence bands) | Inform safety stock decisions |
| Model monitoring & drift detection | Maintain accuracy over time |
| Automated retraining pipeline | Keep model fresh with new data |

📋 Requirements

fastapi
uvicorn
python-multipart
lightgbm
scikit-learn
pandas
numpy
streamlit
plotly
pydantic
requests

Python 3.10+ recommended. Tested on Python 3.12.


🧪 Testing

# Test API is running
curl http://localhost:8000/health

# Test prediction
curl -X POST http://localhost:8000/quick-predict \
  -H "Content-Type: application/json" \
  -d '{"yesterday_sales": 4200, "last_week_same_day": 4050, "weekly_average": 4100, "day_of_week": 7, "is_promotion": false, "is_holiday": false}'

# Test model info
curl http://localhost:8000/model-info

👤 Author

Anush Kundu

  • 📍 Nagpur, India
  • 🎓 MSc Data Science, Kingston University London
  • 💼 2.5 years in retail analytics (Compass Group UK, Cognizant)
  • 📧 anushkundu55@gmail.com
  • 🔗 LinkedIn
  • 🐙 GitHub

Background

This project was inspired by real-world experience at Compass Group (London), where I managed demand forecasting for 100+ menu items using manual methods. The moving average approach reduced waste by 12% but left significant room for improvement. This system explores how machine learning can push accuracy further, achieving a 64.4% lower forecast error (MAPE) than the moving-average baseline used in operations.


📄 License

This project is open source under the MIT License.


📊 DemandCast AI: Built with BigQuery · PySpark · XGBoost · FastAPI · Streamlit · Docker
