🏭 Pharma Batch Scheduling: ML-Driven Optimization Under Uncertainty

How do you schedule production across multiple reactors when demand is uncertain?
This project shows that the commonly used FIFO approach produces makespans 47% longer than the alternatives, and that machine learning can predict schedule robustness with 92.5% accuracy.

Overview

The Problem

Pharmaceutical manufacturers must schedule production across multiple reactors while dealing with unpredictable processing times, equipment availability, and demand fluctuations. Poor scheduling decisions lead to:

Extended production cycles (lost time)
Missed delivery deadlines (customer impact)
Underutilized equipment (wasted capacity)

Why It Matters

In pharmaceutical manufacturing, efficient scheduling directly impacts:

Drug availability — Getting medications to patients on time
Manufacturing costs — Optimizing expensive reactor capacity
Operational reliability — Meeting commitments under uncertainty

The Approach

I simulated 1,200 production scenarios (4 scheduling strategies × 3 uncertainty levels × 100 replications), then analyzed the results in four ways:

Analysis Type	Purpose	Tool Used
Two-Way ANOVA	Compare heuristic & uncertainty effects	JASP
Machine Learning Classification	Predict schedule robustness	JASP
Machine Learning Regression	Predict production duration	JASP
Structural Equation Modeling	Identify causal mechanisms	JASP

Key Findings

🔴 Finding 1: FIFO is Categorically Unsuitable

FIFO (First-In-First-Out) scheduling produced makespans 590 hours longer than alternatives — a 47% efficiency gap.

Effect Size: Cohen's d = 1.40 (large effect)
Variance Explained: η² = .503 (50.3% of the variance in makespan)

🟢 Finding 2: Alternatives Perform Equally Well

SPT (Shortest Processing Time), LPT (Longest Processing Time), and BALANCED heuristics showed no statistically significant differences in makespan (Games-Howell p ≥ .585 for BALANCED vs. LPT and BALANCED vs. SPT), which leaves flexibility in implementation.

🔵 Finding 3: Machine Learning Works

Task	Performance
Robustness Classification	92.5% accuracy, AUC = 0.970
Makespan Prediction	R² = 0.907 (Random Forest); MAPE = 5.9% (Gradient Boosting)
Delay Detection	91.7% accuracy

🟡 Finding 4: The Mechanism — Workload Imbalance

Mediation analysis revealed why FIFO fails: it creates severe workload imbalance between reactors.

FIFO workload balance: 0.62 (severe imbalance)
Other heuristics:      0.05 (near-perfect balance)

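The indirect effect through workload imbalance is +843 hours. It exceeds the total effect (+605 hours) because the direct effect is negative, so workload imbalance accounts for all of FIFO's makespan penalty.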

Data

Dataset Overview

Characteristic	Value
Total Observations	1,200 scenarios
Design	4 heuristics × 3 uncertainty levels × 100 replications
Data Type	Simulation-based
Generation	Python 3.13
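
As a minimal sketch of how the 4 × 3 × 100 design expands into 1,200 scenarios, the grid can be enumerated with the standard library. The `replication` field name and the `build_design` helper are assumptions for illustration; they are not taken from simulation/data_generation.py.

```python
from itertools import product

HEURISTICS = ["FIFO", "SPT", "LPT", "BALANCED"]
UNCERTAINTY_LEVELS = ["Low", "Medium", "High"]
REPLICATIONS = 100


def build_design():
    """Return one dict per scenario in the full factorial design."""
    return [
        {"heuristic": h, "uncertainty_level": u, "replication": r}
        for h, u, r in product(
            HEURISTICS, UNCERTAINTY_LEVELS, range(1, REPLICATIONS + 1)
        )
    ]


design = build_design()
print(len(design))  # 4 * 3 * 100 scenarios
```

Each heuristic appears in 300 scenarios and each heuristic-by-uncertainty cell holds 100, so the design is balanced.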

Why Simulation?

Simulation enables:

Systematic comparison across controlled conditions
Exploration of scenarios impossible to test in production
Statistical power through sufficient sample sizes
Reproducibility for validation

Key Variables

Input Features:

demand_A, demand_B, demand_C — Product batch demands
heuristic — Scheduling strategy (FIFO, SPT, LPT, BALANCED)
uncertainty_level — Low, Medium, High
r1_availability, r2_availability — Reactor availability (%)
cip_conflict_prob — Cleaning system conflict probability

Outcome Variables:

makespan — Total production time (hours)
tardiness — Delay beyond due dates (hours)
workload_balance — |Reactor1_utilization - Reactor2_utilization|
schedule_robust — Binary: met performance thresholds?
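
The outcome variables above could be derived from a finished schedule roughly as follows. This is a sketch under stated assumptions: utilization is taken as busy hours divided by makespan, tardiness is summed over batches, and the function and argument names are hypothetical. The dataset's exact definitions are in data/data_dictionary.md.

```python
def outcome_metrics(busy_hours, completion_times, due_dates):
    """Compute makespan, total tardiness and workload balance for one schedule.

    busy_hours:       {"R1": hours, "R2": hours} of processing per reactor
    completion_times: {batch_name: hour at which the batch finishes}
    due_dates:        {batch_name: hour by which the batch is due}
    """
    makespan = max(completion_times.values())
    # Assumption: tardiness is summed across batches (it could also be a maximum).
    tardiness = sum(
        max(0.0, completion_times[name] - due_dates[name])
        for name in completion_times
    )
    # Assumption: utilization = busy hours / makespan.
    utilization = {r: hours / makespan for r, hours in busy_hours.items()}
    workload_balance = abs(utilization["R1"] - utilization["R2"])
    return makespan, tardiness, workload_balance


# Hypothetical schedule: Reactor 1 runs two Product C batches, Reactor 2 one A batch.
makespan, tardiness, balance = outcome_metrics(
    busy_hours={"R1": 240.0, "R2": 96.0},
    completion_times={"A1": 96.0, "C1": 120.0, "C2": 240.0},
    due_dates={"A1": 100.0, "C1": 100.0, "C2": 200.0},
)
print(makespan, tardiness, round(balance, 2))
```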

Data Access

📁 data/scheduling_dataset.csv — Full 1,200-scenario dataset
📁 data/data_dictionary.md — Variable descriptions and coding

Methodology

System Configuration

┌─────────────────────────────────────────────────────────┐
│                  DUAL-REACTOR PLANT                      │
├─────────────────────────────────────────────────────────┤
│                                                          │
│   ┌──────────────┐          ┌──────────────┐            │
│   │  REACTOR 1   │          │  REACTOR 2   │            │
│   │   10,000 L   │          │   5,000 L    │            │
│   │              │          │              │            │
│   │ Products:    │          │ Products:    │            │
│   │  A, B, C     │          │  A, B only   │            │
│   └──────────────┘          └──────────────┘            │
│          │                         │                     │
│          └─────────┬───────────────┘                     │
│                    │                                     │
│            ┌───────▼───────┐                            │
│            │  SHARED CIP   │                            │
│            │    SYSTEM     │                            │
│            └───────────────┘                            │
│                                                          │
└─────────────────────────────────────────────────────────┘

Product C (120h fermentation) → Reactor 1 ONLY → Creates bottleneck

Scheduling Heuristics

Heuristic	Logic	Rationale
FIFO	Process in arrival order	Simplest, common default
SPT	Shortest jobs first	Minimize queue buildup
LPT	Longest jobs first	Front-load the longest jobs
BALANCED	Minimize reactor workload difference	Exploit parallel capacity
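
The four rules in the table can be sketched in a few lines. This is an illustration only, not the code in simulation/data_generation.py: the processing times for Products A and B are hypothetical (only Product C's 120 h comes from this README), the `Batch` class and function names are made up, and BALANCED is approximated by a greedy rule that places each batch, longest first, on whichever eligible reactor leaves the smaller load gap. How the simulation maps FIFO, SPT, and LPT sequences onto reactors is not described here, so the sketch stops at the ordering for those three.

```python
from dataclasses import dataclass

# From the plant diagram: Product C can run on Reactor 1 only.
ELIGIBLE = {"A": ("R1", "R2"), "B": ("R1", "R2"), "C": ("R1",)}


@dataclass(frozen=True)
class Batch:
    name: str
    product: str
    hours: float  # hypothetical processing time, except C = 120 h


def dispatch_order(batches, rule):
    """Order batches for the three sequence-based rules."""
    if rule == "FIFO":
        return list(batches)  # arrival order = input order
    if rule == "SPT":
        return sorted(batches, key=lambda b: b.hours)
    if rule == "LPT":
        return sorted(batches, key=lambda b: b.hours, reverse=True)
    raise ValueError(f"unknown rule: {rule}")


def balanced_assignment(batches):
    """Greedy guess at BALANCED: place each batch where |R1 - R2| ends up smallest."""
    loads = {"R1": 0.0, "R2": 0.0}
    plan = {}
    for batch in dispatch_order(batches, "LPT"):  # longest first is an assumption

        def gap_if_placed(reactor):
            trial = dict(loads)
            trial[reactor] += batch.hours
            return abs(trial["R1"] - trial["R2"])

        chosen = min(ELIGIBLE[batch.product], key=gap_if_placed)
        loads[chosen] += batch.hours
        plan[batch.name] = chosen
    return plan, loads


batches = [
    Batch("A1", "A", 24.0),
    Batch("C1", "C", 120.0),
    Batch("B1", "B", 48.0),
    Batch("A2", "A", 24.0),
    Batch("B2", "B", 48.0),
]

for rule in ("FIFO", "SPT", "LPT"):
    print(rule, [b.name for b in dispatch_order(batches, rule)])

plan, loads = balanced_assignment(batches)
print("BALANCED", plan, loads)
```

The load gap is measured in hours here, whereas the dataset's workload_balance is a difference in utilization.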

Analytical Framework

All statistical analyses were run in JASP 0.18.3, an open-source statistics package with a graphical interface that supports transparent, reproducible analysis.

┌─────────────────────────────────────────────────────────┐
│                  ANALYTICAL PIPELINE                     │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  1. DATA GENERATION (Python)                             │
│     └─→ Simulation of 1,200 scenarios                   │
│                                                          │
│  2. STATISTICAL ANALYSIS (JASP)                          │
│     ├─→ Descriptive statistics                          │
│     ├─→ Two-Way ANOVA                                   │
│     │   └─→ Main effects, interactions, effect sizes    │
│     ├─→ Post-hoc comparisons (Tukey HSD, Games-Howell)  │
│     └─→ Assumption testing (Levene's, Shapiro-Wilk)     │
│                                                          │
│  3. MACHINE LEARNING (JASP)                              │
│     ├─→ Classification (6 algorithms)                   │
│     │   └─→ Gradient Boosting, Random Forest, SVM,      │
│     │       Logistic Regression, Decision Tree          │
│     └─→ Regression (4 algorithms)                       │
│         └─→ Random Forest, Boosting, SVR, Linear        │
│                                                          │
│  4. MEDIATION ANALYSIS (JASP - SEM Module)              │
│     └─→ Path analysis with bootstrapped confidence      │
│         intervals                                        │
│                                                          │
└─────────────────────────────────────────────────────────┘
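
Step 4 of the pipeline relies on bootstrapped confidence intervals for the indirect effect. The idea can be sketched with a percentile bootstrap in plain Python. This is one common variant and not necessarily the one JASP applies; the data below are synthetic, and every name and parameter (`indirect_effect`, `bootstrap_ci`, the noise levels) is an assumption made for the example.

```python
import random
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """a * b, where a is the x -> m slope and b the m -> y slope given x."""
    a = slope(x, m)
    # Frisch-Waugh-Lovell: partial x out of both m and y, then regress.
    b = slope(residuals(x, m), residuals(x, y))
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        estimates.append(
            indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        )
    estimates.sort()
    return estimates[int(alpha / 2 * n_boot)], estimates[int((1 - alpha / 2) * n_boot) - 1]


# Synthetic data: x = 1 for FIFO, m = workload balance, y = makespan (hours).
rng = random.Random(42)
x = [1] * 100 + [0] * 100
m = [0.05 + 0.57 * xi + rng.gauss(0, 0.05) for xi in x]
y = [1000 + 1500 * mi - 200 * xi + rng.gauss(0, 20) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.1f}, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```

An interval that excludes zero is the usual evidence for an indirect effect; the bootstrap avoids assuming that the product a × b is normally distributed.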

Why JASP?

Open Source: Free, transparent, reproducible
Point-and-Click Interface: Reduces coding errors
Comprehensive Output: Effect sizes, confidence intervals, assumption tests
Built-in ML Module: Classification and regression with proper train/test splits
SEM Module: Mediation analysis with bootstrapping
Reproducibility: Save complete analysis as .jasp file

Results

Heuristic Performance Comparison

Two-Way ANOVA Results:

Source	SS	df	F	p	η²
Heuristic	8.24 × 10⁷	3	459.80	< .001	.503
Uncertainty	1.04 × 10⁷	2	87.29	< .001	.064
Interaction	2.93 × 10⁴	6	0.08	.998	< .001

Heuristic selection explains 50.3% of all variance in makespan — an exceptionally large effect.
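
The η² column can be cross-checked from the reported sums of squares. The error term is not listed in the table, so the sketch below infers it from the heuristic F ratio, assuming a balanced 4 × 3 design with 1,200 observations (error df = 1,188). Small deviations from the table are expected because the SS values are rounded to three significant figures.

```python
N_OBS = 1200
N_CELLS = 4 * 3
DF_ERROR = N_OBS - N_CELLS  # 1188, assuming a full factorial model

# Values copied from the ANOVA table.
ss = {"heuristic": 8.24e7, "uncertainty": 1.04e7, "interaction": 2.93e4}
df = {"heuristic": 3, "uncertainty": 2, "interaction": 6}
f_heuristic = 459.80

# F = MS_effect / MS_error, so MS_error can be recovered from one reported F.
ms_error = (ss["heuristic"] / df["heuristic"]) / f_heuristic
ss_error = ms_error * DF_ERROR  # inferred, not reported
ss_total = sum(ss.values()) + ss_error

eta_sq = {name: value / ss_total for name, value in ss.items()}
f_check = {name: (ss[name] / df[name]) / ms_error for name in ss}

for name in ss:
    print(f"{name:12s} eta^2 = {eta_sq[name]:.3f}  F = {f_check[name]:.2f}")
```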

Post-Hoc Comparisons (Games-Howell):

Comparison	Mean Difference	95% CI	p
BALANCED – FIFO	-590.60 hrs	[-654.8, -526.4]	< .001
BALANCED – LPT	21.63 hrs	[-22.4, 65.6]	.585
BALANCED – SPT	21.18 hrs	[-22.1, 64.5]	.589

Machine Learning Performance

Binary Classification: Schedule Robustness

Model	Accuracy	AUC	F1	MCC
Gradient Boosting	92.5%	0.970	0.925	0.851
Logistic Regression	92.1%	0.919	0.921	0.841
SVM (Linear)	91.7%	0.916	0.917	0.833
Random Forest	90.4%	0.968	0.904	0.808
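
For readers less familiar with the columns above, accuracy, F1, and MCC all follow from a confusion matrix. The counts below are hypothetical, chosen only so that accuracy lands at 92.5%; they are not the confusion matrix from the JASP output. The F1 here is for the positive class, and how the reported values were averaged is not stated in this README. AUC is omitted because it needs predicted scores rather than counts.

```python
from math import sqrt


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, positive-class F1 and Matthews correlation from raw counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for a 240-case test set.
metrics = classification_metrics(tp=111, fp=9, fn=9, tn=111)
print({name: round(value, 3) for name, value in metrics.items()})
```

MCC is lower than accuracy here because it is a correlation coefficient: it is zero for chance-level predictions, while accuracy on balanced classes is 50% at chance.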

Regression: Makespan Prediction

Model	R²	RMSE (hrs)	MAE (hrs)	MAPE
Random Forest	0.907	123.4	86.1	6.6%
Gradient Boosting	0.905	133.6	83.6	5.9%
SVM	0.897	128.9	78.2	5.4%
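
The four regression metrics are defined as follows. The sketch uses a tiny set of hypothetical makespans; the numbers are illustrative and unrelated to the dataset.

```python
from math import sqrt
from statistics import mean


def regression_metrics(y_true, y_pred):
    """R^2, RMSE, MAE and MAPE for paired actual and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / len(errors)),
        "mae": mean(abs(e) for e in errors),
        # MAPE as a fraction; multiply by 100 for a percentage.
        "mape": mean(abs(e) / abs(t) for e, t in zip(errors, y_true)),
    }


# Hypothetical makespans in hours.
y_true = [1000.0, 1200.0, 1500.0, 2000.0]
y_pred = [1050.0, 1150.0, 1450.0, 2100.0]

scores = regression_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in scores.items()})
```

RMSE weights large errors more heavily than MAE, and MAPE scales each error by the actual value, which is why the models in the table rank differently depending on the metric: Random Forest has the best R² and RMSE, while SVM has the lowest MAE and MAPE.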

Feature Importance

Rank	Feature	Relative Importance
1	Product C Demand	35%
2	Heuristic Strategy	34%
3	Uncertainty Level	13%
4	Product B Demand	11%
5	Product A Demand	7%

Product C's 120-hour processing time and Reactor 1 exclusivity make it the critical bottleneck.

Mediation Model

Path Coefficients (Model 2: Heuristic → Workload Balance → Makespan):

Path	Estimate	SE	z	p
FIFO → Workload Balance (a)	0.567	0.009	65.55	< .001
Workload Balance → Makespan (b)	1487.32	39.63	37.53	< .001
FIFO → Makespan (direct, c')	-238.03	21.37	-11.14	< .001

Effect Decomposition:

Effect	Estimate	95% CI	p
Direct Effect	-238.03	[-280.7, -197.3]	< .001
Indirect Effect	842.90	[790.3, 901.8]	< .001
Total Effect	604.87	[573.3, 638.8]	< .001

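Proportion mediated = 139% (indirect / total = 842.90 / 604.87). A value above 100% indicates a suppression pattern: the direct effect is negative, so workload imbalance accounts for more than FIFO's entire makespan penalty.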
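
The decomposition can be verified with simple arithmetic on the reported estimates. The product of the rounded path coefficients differs slightly from the reported indirect effect, which is expected from rounding a to three decimals.

```python
# Estimates copied from the path-coefficient and effect-decomposition tables.
a = 0.567            # FIFO -> workload balance
b = 1487.32          # workload balance -> makespan
direct = -238.03     # FIFO -> makespan (c')
indirect_reported = 842.90
total_reported = 604.87

indirect_from_paths = a * b              # product-of-coefficients estimate
total = direct + indirect_reported       # total = direct + indirect
proportion_mediated = indirect_reported / total

print(f"a * b               = {indirect_from_paths:.2f}")
print(f"direct + indirect   = {total:.2f}")
print(f"proportion mediated = {proportion_mediated:.1%}")
```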

Repository Structure

pharma-batch-scheduling/
│
├── README.md                    # This file
├── LICENSE                      # CC BY 4.0
│
├── data/
│   ├── scheduling_dataset.csv   # Complete dataset (1,200 scenarios)
│   └── data_dictionary.md       # Variable descriptions
│
├── analysis/
│   ├── main_analysis.jasp       # Complete JASP analysis file
│   └── outputs/                 # Exported tables and figures
│       ├── anova_results.html
│       ├── ml_classification.html
│       ├── ml_regression.html
│       └── mediation_results.html
│
├── simulation/
│   └── data_generation.py       # Python simulation code
│
├── figures/
│   ├── makespan_by_heuristic.png
│   ├── roc_curves.png
│   ├── feature_importance.png
│   └── mediation_model.png
│
└── paper/
    └── Rababah_2025_MultiReactor.pdf

Tools & Technologies

Tool	Purpose	Version
JASP	All statistical analysis (ANOVA, ML, SEM)	0.18.3
Python	Dataset generation/simulation	3.13
NumPy/Pandas	Data manipulation	-

Why This Combination?

Python for Simulation: Flexible, reproducible data generation with precise control over parameters
JASP for Analysis:
- Point-and-click reduces errors
- Built-in effect sizes and confidence intervals
- Transparent, shareable analysis files
- Machine learning module with proper validation
- SEM module for mediation analysis

How to Reproduce

Option 1: Full Reproduction

Generate Data:

cd simulation/
python data_generation.py

Open Analysis in JASP:
- Download JASP (free)
- Open analysis/main_analysis.jasp
- All analyses open pre-configured, with results already computed

Option 2: Explore Results Only

Download this repository
View exported results in analysis/outputs/
Examine figures in figures/
Read the full paper in paper/

Option 3: Use the Dataset

import pandas as pd

# Load data
df = pd.read_csv('data/scheduling_dataset.csv')

# Quick exploration
print(df.groupby('heuristic')['makespan'].describe())

Citation

If you use this work, please cite:

@article{rababah2025multireactor,
  author = {Rababah, Anfal},
  title = {Multi-Reactor Batch Scheduling Strategies for Pharmaceutical 
           Production Under Uncertainty: A Comparative Statistical and 
           Machine Learning Analysis},
  journal = {Zenodo},
  year = {2025},
  doi = {10.5281/zenodo.17847157}
}

Author

Anfal Rababah
Independent Researcher
MSc Chemical Engineering | BSc Chemical Engineering & Mathematics

License

This work is licensed under Creative Commons Attribution 4.0 International.

Part of my Data Science Portfolio
