ML-driven batch scheduling optimization for pharmaceutical manufacturing. 92.5% prediction accuracy. Statistical analysis with JASP, simulation with Python.

Anfal-AR/pharma-batch-scheduling

🏭 Pharma Batch Scheduling: ML-Driven Optimization Under Uncertainty

DOI Python JASP License: CC BY 4.0

How do you schedule production across multiple reactors when demand is uncertain?
In a simulated dual-reactor plant, the commonly used FIFO rule produced makespans about 47% longer than the alternative heuristics, and a gradient boosting classifier predicted schedule robustness with 92.5% accuracy.


Overview

The Problem

Pharmaceutical manufacturers must schedule production across multiple reactors while dealing with unpredictable processing times, equipment availability, and demand fluctuations. Poor scheduling decisions lead to:

  • Extended production cycles (lost time)
  • Missed delivery deadlines (customer impact)
  • Underutilized equipment (wasted capacity)

Why It Matters

In pharmaceutical manufacturing, efficient scheduling directly impacts:

  • Drug availability — Getting medications to patients on time
  • Manufacturing costs — Optimizing expensive reactor capacity
  • Operational reliability — Meeting commitments under uncertainty

The Approach

I simulated 1,200 production scenarios across 4 scheduling strategies and 3 uncertainty levels, then applied a comprehensive analytical framework:

| Analysis Type | Purpose | Tool Used |
|---|---|---|
| Two-Way ANOVA | Compare heuristic & uncertainty effects | JASP |
| Machine Learning Classification | Predict schedule robustness | JASP |
| Machine Learning Regression | Predict production duration | JASP |
| Structural Equation Modeling | Identify causal mechanisms | JASP |

Key Findings

🔴 Finding 1: FIFO is Categorically Unsuitable

FIFO (First-In-First-Out) scheduling produced makespans roughly 590 hours longer than the alternatives (590.60 hours versus BALANCED in the post-hoc comparison), a 47% efficiency gap.

Effect Size: d = 1.40 (large effect)
Variance Explained: η² = .503 (50.3% of all variance!)

🟢 Finding 2: Alternatives Perform Equally Well

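SPT (Shortest Processing Time), LPT (Longest Processing Time), and BALANCED heuristics showed no statistically significant differences in makespan (Games-Howell p = .585 and .589 against BALANCED, with mean differences of about 21 hours), which leaves flexibility in implementation.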

🔵 Finding 3: Machine Learning Works

| Task | Performance |
|---|---|
| Robustness Classification | 92.5% accuracy, AUC = 0.970 (Gradient Boosting) |
| Makespan Prediction | R² up to 0.907 (Random Forest); MAPE 5.4% to 6.6% across models |
| Delay Detection | 91.7% accuracy |

🟡 Finding 4: The Mechanism — Workload Imbalance

Mediation analysis revealed why FIFO fails: it creates severe workload imbalance between reactors.

FIFO workload balance: 0.62 (severe imbalance)
Other heuristics:      0.05 (near-perfect balance)

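The indirect effect through workload imbalance is +843 hours, which is larger than FIFO's total effect of +605 hours. Once imbalance is controlled for, the remaining direct effect of FIFO is negative (-238 hours), so imbalance accounts for all of FIFO's makespan penalty.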


Data

Dataset Overview

| Characteristic | Value |
|---|---|
| Total Observations | 1,200 scenarios |
| Design | 4 heuristics × 3 uncertainty levels × 100 replications |
| Data Type | Simulation-based |
| Generation | Python 3.13 |
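
As a minimal sketch of the full-factorial design described above, the 1,200 scenario labels can be enumerated as follows. The `replication` field and the constant names are illustrative assumptions; the actual generator is simulation/data_generation.py and may be organized differently.

```python
from itertools import product

# Factor levels as listed in this README.
HEURISTICS = ("FIFO", "SPT", "LPT", "BALANCED")
UNCERTAINTY_LEVELS = ("Low", "Medium", "High")
REPLICATIONS = 100  # replications per heuristic/uncertainty cell

# One record per simulated scenario; "replication" is a hypothetical column name.
design = [
    {"heuristic": h, "uncertainty_level": u, "replication": r}
    for h, u, r in product(HEURISTICS, UNCERTAINTY_LEVELS, range(1, REPLICATIONS + 1))
]

print(len(design))  # 4 * 3 * 100 scenarios
```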

Why Simulation?

Simulation enables:

  • Systematic comparison across controlled conditions
  • Exploration of scenarios impossible to test in production
  • Statistical power through sufficient sample sizes
  • Reproducibility for validation

Key Variables

Input Features:

  • demand_A, demand_B, demand_C: Product batch demands
  • heuristic: Scheduling strategy (FIFO, SPT, LPT, BALANCED)
  • uncertainty_level: Low, Medium, High
  • r1_availability, r2_availability: Reactor availability (%)
  • cip_conflict_prob: Cleaning system conflict probability

Outcome Variables:

  • makespan: Total production time (hours)
  • tardiness: Delay beyond due dates (hours)
  • workload_balance: |Reactor1_utilization - Reactor2_utilization|; 0 means perfectly balanced, larger values mean more imbalance
  • schedule_robust: Binary, whether the schedule met the performance thresholds
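
The workload_balance definition can be sketched in a few lines. The `utilization` helper (busy hours divided by the scheduling horizon) is an assumed definition for illustration, and the numbers are hypothetical; the data dictionary remains the reference for how the dataset computes it.

```python
def utilization(busy_hours: float, horizon_hours: float) -> float:
    """Fraction of the horizon a reactor spends processing (assumed definition)."""
    if horizon_hours <= 0:
        raise ValueError("horizon_hours must be positive")
    return busy_hours / horizon_hours


def workload_balance(r1_utilization: float, r2_utilization: float) -> float:
    """Absolute utilization gap between the two reactors; 0 means balanced."""
    return abs(r1_utilization - r2_utilization)


# Hypothetical example: Reactor 1 busy 900 h and Reactor 2 busy 300 h over 1,000 h.
gap = workload_balance(utilization(900, 1000), utilization(300, 1000))
print(round(gap, 2))
```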

Data Access

📁 data/scheduling_dataset.csv — Full 1,200-scenario dataset
📁 data/data_dictionary.md — Variable descriptions and coding


Methodology

System Configuration

┌─────────────────────────────────────────────────────────┐
│                  DUAL-REACTOR PLANT                      │
├─────────────────────────────────────────────────────────┤
│                                                          │
│   ┌──────────────┐          ┌──────────────┐            │
│   │  REACTOR 1   │          │  REACTOR 2   │            │
│   │   10,000 L   │          │   5,000 L    │            │
│   │              │          │              │            │
│   │ Products:    │          │ Products:    │            │
│   │  A, B, C     │          │  A, B only   │            │
│   └──────────────┘          └──────────────┘            │
│          │                         │                     │
│          └─────────┬───────────────┘                     │
│                    │                                     │
│            ┌───────▼───────┐                            │
│            │  SHARED CIP   │                            │
│            │    SYSTEM     │                            │
│            └───────────────┘                            │
│                                                          │
└─────────────────────────────────────────────────────────┘

Product C (120h fermentation) → Reactor 1 ONLY → Creates bottleneck

Scheduling Heuristics

| Heuristic | Logic | Rationale |
|---|---|---|
| FIFO | Process in arrival order | Simplest, common default |
| SPT | Shortest jobs first | Minimize queue buildup |
| LPT | Longest jobs first | Front-load complex work |
| BALANCED | Minimize reactor workload difference | Exploit parallel capacity |
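
The four rules can be illustrated with a small dispatcher. This is a sketch under stated assumptions, not the simulation in simulation/data_generation.py: the `Batch` type, the processing times for Products A and B, and the reactor-selection logic are all hypothetical, and it ignores uncertainty, reactor availability, and CIP conflicts. Only the 120-hour Product C time and its Reactor 1 restriction come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Batch:
    product: str   # "A", "B", or "C"
    hours: float   # processing time (A and B values below are illustrative)


# Reactor indices each product may use: Product C runs on Reactor 1 (index 0) only.
ELIGIBLE = {"A": (0, 1), "B": (0, 1), "C": (0,)}


def sequence(batches, rule):
    """Order the queue according to a dispatching rule."""
    if rule in ("FIFO", "BALANCED"):
        return list(batches)  # arrival order
    if rule == "SPT":
        return sorted(batches, key=lambda b: b.hours)
    if rule == "LPT":
        return sorted(batches, key=lambda b: b.hours, reverse=True)
    raise ValueError(f"unknown rule: {rule}")


def dispatch(batches, rule):
    """Assign batches to two reactors; return (loads, assignment)."""
    loads = [0.0, 0.0]
    assignment = []
    for b in sequence(batches, rule):
        options = ELIGIBLE[b.product]
        if rule == "BALANCED":
            # Choose the reactor that leaves the smallest load gap afterwards.
            reactor = min(options, key=lambda r: abs(loads[r] + b.hours - loads[1 - r]))
        else:
            # Choose the eligible reactor with the least accumulated work.
            reactor = min(options, key=lambda r: loads[r])
        loads[reactor] += b.hours
        assignment.append((b.product, reactor))
    return loads, assignment


queue = [Batch("C", 120), Batch("A", 24), Batch("B", 48), Batch("A", 24), Batch("C", 120)]
for rule in ("FIFO", "SPT", "LPT", "BALANCED"):
    loads, _ = dispatch(queue, rule)
    print(rule, "makespan:", max(loads), "load gap:", abs(loads[0] - loads[1]))
```

With no idle time modeled, the makespan here is simply the larger of the two reactor loads, so this toy will not reproduce the makespan differences reported in the results.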

Analytical Framework

All statistical analyses were conducted in JASP 0.18.3, an open-source statistics package that provides transparent, reproducible analysis through a graphical interface.

┌─────────────────────────────────────────────────────────┐
│                  ANALYTICAL PIPELINE                     │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  1. DATA GENERATION (Python)                             │
│     └─→ Simulation of 1,200 scenarios                   │
│                                                          │
│  2. STATISTICAL ANALYSIS (JASP)                          │
│     ├─→ Descriptive statistics                          │
│     ├─→ Two-Way ANOVA                                   │
│     │   └─→ Main effects, interactions, effect sizes    │
│     ├─→ Post-hoc comparisons (Tukey HSD, Games-Howell)  │
│     └─→ Assumption testing (Levene's, Shapiro-Wilk)     │
│                                                          │
│  3. MACHINE LEARNING (JASP)                              │
│     ├─→ Classification (6 algorithms)                   │
│     │   └─→ Gradient Boosting, Random Forest, SVM,      │
│     │       Logistic Regression, Decision Tree          │
│     └─→ Regression (4 algorithms)                       │
│         └─→ Random Forest, Boosting, SVR, Linear        │
│                                                          │
│  4. MEDIATION ANALYSIS (JASP - SEM Module)              │
│     └─→ Path analysis with bootstrapped confidence      │
│         intervals                                        │
│                                                          │
└─────────────────────────────────────────────────────────┘
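
For readers who want to see what step 4 involves outside JASP, here is a hedged sketch of a percentile-bootstrap confidence interval for an indirect effect (a × b). The data are synthetic, generated with coefficients chosen only to loosely resemble the scale of this project; the variable names and the two-regression approach are assumptions for illustration and do not reproduce JASP's SEM module.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Synthetic data: x = 1 for FIFO, 0 otherwise; m = workload imbalance; y = makespan.
x = rng.integers(0, 2, size=n).astype(float)
m = 0.05 + 0.57 * x + rng.normal(0, 0.1, size=n)
y = 1200 + 1500 * m - 240 * x + rng.normal(0, 60, size=n)


def indirect_effect(x, m, y):
    """Product of path a (X -> M) and path b (M -> Y, controlling for X)."""
    a = np.polyfit(x, m, 1)[0]                       # slope of M on X
    design = np.column_stack([np.ones_like(x), x, m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return a * coef[2]                               # coef[2] is the M coefficient


point = indirect_effect(x, m, y)

# Percentile bootstrap: resample rows with replacement and re-estimate a * b.
boot = np.empty(1000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)
    boot[i] = indirect_effect(x[idx], m[idx], y[idx])
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"indirect effect = {point:.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```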

Why JASP?

  • Open Source: Free, transparent, reproducible
  • Point-and-Click Interface: Reduces coding errors
  • Comprehensive Output: Effect sizes, confidence intervals, assumption tests
  • Built-in ML Module: Classification and regression with proper train/test splits
  • SEM Module: Mediation analysis with bootstrapping
  • Reproducibility: Save complete analysis as .jasp file

Results

Heuristic Performance Comparison

Two-Way ANOVA Results:

| Source | SS | df | F | p | η² |
|---|---|---|---|---|---|
| Heuristic | 8.24 × 10⁷ | 3 | 459.80 | < .001 | .503 |
| Uncertainty | 1.04 × 10⁷ | 2 | 87.29 | < .001 | .064 |
| Interaction | 2.93 × 10⁴ | 6 | 0.08 | .998 | < .001 |

Heuristic selection explains 50.3% of all variance in makespan — an exceptionally large effect.
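
The ANOVA table can be sanity-checked with a little arithmetic. The sketch below assumes η² is the classical ratio SS_effect / SS_total (not partial η²) and back-solves the total and residual sums of squares from the reported values, since neither is listed in this README. The rounded inputs reproduce the reported F statistics only approximately.

```python
# Reported values from the Two-Way ANOVA table.
ss_heuristic, df_heuristic = 8.24e7, 3
ss_uncertainty, df_uncertainty = 1.04e7, 2
ss_interaction, df_interaction = 2.93e4, 6
eta2_heuristic = 0.503
n_total, n_cells = 1200, 4 * 3

# Assumption: eta^2 = SS_effect / SS_total, so SS_total can be back-solved.
ss_total = ss_heuristic / eta2_heuristic
ss_error = ss_total - ss_heuristic - ss_uncertainty - ss_interaction
df_error = n_total - n_cells
ms_error = ss_error / df_error

f_heuristic = (ss_heuristic / df_heuristic) / ms_error
f_uncertainty = (ss_uncertainty / df_uncertainty) / ms_error
f_interaction = (ss_interaction / df_interaction) / ms_error
eta2_uncertainty = ss_uncertainty / ss_total

print(f"F(heuristic)   ~ {f_heuristic:.1f}")
print(f"F(uncertainty) ~ {f_uncertainty:.1f}")
print(f"F(interaction) ~ {f_interaction:.2f}")
print(f"eta^2(uncertainty) ~ {eta2_uncertainty:.3f}")
```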

Post-Hoc Comparisons (Games-Howell):

| Comparison | Mean Difference (hrs) | 95% CI | p |
|---|---|---|---|
| BALANCED – FIFO | -590.60 | [-654.8, -526.4] | < .001 |
| BALANCED – LPT | 21.63 | [-22.4, 65.6] | .585 |
| BALANCED – SPT | 21.18 | [-22.1, 64.5] | .589 |
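
An effect size such as the d = 1.40 reported under Finding 1 is a standardized mean difference. A minimal sketch of Cohen's d with a pooled standard deviation is shown below; whether JASP used exactly this pooled form for the reported value is an assumption, and the sample values are hypothetical rather than taken from the dataset.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical makespans in hours, for illustration only.
fifo = [1900.0, 1750.0, 2100.0, 1820.0]
other = [1250.0, 1300.0, 1180.0, 1400.0]
print(round(cohens_d(fifo, other), 2))
```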

Machine Learning Performance

Binary Classification: Schedule Robustness

| Model | Accuracy | AUC | F1 | MCC |
|---|---|---|---|---|
| Gradient Boosting | 92.5% | 0.970 | 0.925 | 0.851 |
| Logistic Regression | 92.1% | 0.919 | 0.921 | 0.841 |
| SVM (Linear) | 91.7% | 0.916 | 0.917 | 0.833 |
| Random Forest | 90.4% | 0.968 | 0.904 | 0.808 |
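
Accuracy, F1, and MCC all derive from a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical, picked only so that accuracy comes out at 92.5%, and are not the confusion matrix from the JASP output.

```python
from math import sqrt


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, F1 and Matthews correlation coefficient from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for a 240-scenario test split.
metrics = classification_metrics(tp=112, fp=9, fn=9, tn=110)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is worth reporting next to accuracy because it stays informative when the two classes are imbalanced.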

Regression: Makespan Prediction

| Model | R² | RMSE (hrs) | MAE (hrs) | MAPE |
|---|---|---|---|---|
| Random Forest | 0.907 | 123.4 | 86.1 | 6.6% |
| Gradient Boosting | 0.905 | 133.6 | 83.6 | 5.9% |
| SVM | 0.897 | 128.9 | 78.2 | 5.4% |
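
The regression metrics used here (R², RMSE, MAE, MAPE) can be computed from predictions as in the following sketch. It uses NumPy and four hypothetical makespan values; it illustrates the formulas only and does not reproduce the JASP results.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """R^2, RMSE, MAE (same units as y) and MAPE (percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_pred - y_true
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        "mape": float(np.mean(np.abs(residuals / y_true)) * 100),
    }


# Hypothetical makespans (hours) and predictions.
scores = regression_metrics([1000, 1200, 1500, 2000], [1050, 1150, 1400, 2100])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```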

Feature Importance

| Rank | Feature | Relative Importance |
|---|---|---|
| 1 | Product C Demand | 35% |
| 2 | Heuristic Strategy | 34% |
| 3 | Uncertainty Level | 13% |
| 4 | Product B Demand | 11% |
| 5 | Product A Demand | 7% |

Product C's 120-hour processing time and Reactor 1 exclusivity make it the critical bottleneck.

Mediation Model

Path Coefficients (Model 2: Heuristic → Workload Balance → Makespan):

| Path | Estimate | SE | z | p |
|---|---|---|---|---|
| FIFO → Workload Balance (a) | 0.567 | 0.009 | 65.55 | < .001 |
| Workload Balance → Makespan (b) | 1487.32 | 39.63 | 37.53 | < .001 |
| FIFO → Makespan (direct, c') | -238.03 | 21.37 | -11.14 | < .001 |

Effect Decomposition:

| Effect | Estimate | 95% CI | p |
|---|---|---|---|
| Direct Effect | -238.03 | [-280.7, -197.3] | < .001 |
| Indirect Effect | 842.90 | [790.3, 901.8] | < .001 |
| Total Effect | 604.87 | [573.3, 638.8] | < .001 |

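Proportion mediated = 139% (842.90 / 604.87). The ratio exceeds 100% because the direct and indirect effects have opposite signs (a suppression pattern): workload imbalance accounts for more than FIFO's entire makespan penalty.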
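
The effect decomposition follows directly from the reported path coefficients, as this short check shows. The variable names are illustrative. The small gap between a × b and the reported indirect effect is probably due to the rounding of a to three decimals.

```python
# Reported path estimates.
a = 0.567          # FIFO -> workload balance
b = 1487.32        # workload balance -> makespan (hours per unit of imbalance)
direct = -238.03   # FIFO -> makespan, controlling for workload balance
indirect_reported = 842.90

indirect_from_paths = a * b                  # product-of-coefficients estimate
total = direct + indirect_reported           # total effect = direct + indirect
proportion_mediated = indirect_reported / total

print(f"a * b               = {indirect_from_paths:.2f} hours")
print(f"total effect        = {total:.2f} hours")
print(f"proportion mediated = {proportion_mediated:.1%}")
```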


Repository Structure

pharma-batch-scheduling/
│
├── README.md                    # This file
├── LICENSE                      # CC BY 4.0
│
├── data/
│   ├── scheduling_dataset.csv   # Complete dataset (1,200 scenarios)
│   └── data_dictionary.md       # Variable descriptions
│
├── analysis/
│   ├── main_analysis.jasp       # Complete JASP analysis file
│   └── outputs/                 # Exported tables and figures
│       ├── anova_results.html
│       ├── ml_classification.html
│       ├── ml_regression.html
│       └── mediation_results.html
│
├── simulation/
│   └── data_generation.py       # Python simulation code
│
├── figures/
│   ├── makespan_by_heuristic.png
│   ├── roc_curves.png
│   ├── feature_importance.png
│   └── mediation_model.png
│
└── paper/
    └── Rababah_2025_MultiReactor.pdf

Tools & Technologies

| Tool | Purpose | Version |
|---|---|---|
| JASP | All statistical analysis (ANOVA, ML, SEM) | 0.18.3 |
| Python | Dataset generation/simulation | 3.13 |
| NumPy/Pandas | Data manipulation | - |

Why This Combination?

  • Python for Simulation: Flexible, reproducible data generation with precise control over parameters
  • JASP for Analysis:
    • Point-and-click reduces errors
    • Built-in effect sizes and confidence intervals
    • Transparent, shareable analysis files
    • Machine learning module with proper validation
    • SEM module for mediation analysis

How to Reproduce

Option 1: Full Reproduction

  1. Generate Data:

    cd simulation/
    python data_generation.py
  2. Open Analysis in JASP:

    • Download JASP (free)
    • Open analysis/main_analysis.jasp
    • All analyses will be pre-configured with results

Option 2: Explore Results Only

  1. Download this repository
  2. View exported results in analysis/outputs/
  3. Examine figures in figures/
  4. Read the full paper in paper/

Option 3: Use the Dataset

import pandas as pd

# Load data
df = pd.read_csv('data/scheduling_dataset.csv')

# Quick exploration
print(df.groupby('heuristic')['makespan'].describe())

Citation

If you use this work, please cite:

@article{rababah2025multireactor,
  author = {Rababah, Anfal},
  title = {Multi-Reactor Batch Scheduling Strategies for Pharmaceutical 
           Production Under Uncertainty: A Comparative Statistical and 
           Machine Learning Analysis},
  journal = {Zenodo},
  year = {2025},
  doi = {10.5281/zenodo.17847157}
}

Author

Anfal Rababah
Independent Researcher
MSc Chemical Engineering | BSc Chemical Engineering & Mathematics


License

This work is licensed under Creative Commons Attribution 4.0 International.


Part of my Data Science Portfolio
