How do you schedule production across multiple reactors when demand is uncertain?
This project demonstrates that the commonly used FIFO approach produces makespans 47% longer than the alternative heuristics, and that machine learning models can predict schedule robustness with 92.5% accuracy.
- Overview
- Key Findings
- Data
- Methodology
- Results
- Repository Structure
- Tools & Technologies
- How to Reproduce
- Citation
- Author
Pharmaceutical manufacturers must schedule production across multiple reactors while dealing with unpredictable processing times, equipment availability, and demand fluctuations. Poor scheduling decisions lead to:
- Extended production cycles (lost time)
- Missed delivery deadlines (customer impact)
- Underutilized equipment (wasted capacity)
In pharmaceutical manufacturing, efficient scheduling directly impacts:
- Drug availability — Getting medications to patients on time
- Manufacturing costs — Optimizing expensive reactor capacity
- Operational reliability — Meeting commitments under uncertainty
I simulated 1,200 production scenarios across 4 scheduling strategies and 3 uncertainty levels, then applied a comprehensive analytical framework:
| Analysis Type | Purpose | Tool Used |
|---|---|---|
| Two-Way ANOVA | Compare heuristic & uncertainty effects | JASP |
| Machine Learning Classification | Predict schedule robustness | JASP |
| Machine Learning Regression | Predict production duration | JASP |
| Structural Equation Modeling | Identify causal mechanisms | JASP |
FIFO (First-In-First-Out) scheduling produced makespans 590 hours longer than alternatives — a 47% efficiency gap.
Effect Size: d = 1.40 (large effect)
Variance Explained: η² = .503 (50.3% of all variance!)
SPT (Shortest Processing Time), LPT (Longest Processing Time), and BALANCED heuristics achieved statistically equivalent outcomes — providing flexibility in implementation.
| Task | Performance |
|---|---|
| Robustness Classification | 92.5% accuracy, AUC = 0.970 |
| Makespan Prediction | R² = 0.907, MAPE = 6.6% (Random Forest) |
| Delay Detection | 91.7% accuracy |
Mediation analysis revealed why FIFO fails: it creates severe workload imbalance between reactors.
FIFO workload balance: 0.62 (severe imbalance)
Other heuristics: 0.05 (near-perfect balance)
The indirect effect through workload imbalance is +843 hours, which more than accounts for FIFO's total makespan penalty of about 605 hours.
| Characteristic | Value |
|---|---|
| Total Observations | 1,200 scenarios |
| Design | 4 heuristics × 3 uncertainty levels × 100 replications |
| Data Type | Simulation-based |
| Generation | Python 3.13 |
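The full factorial design in the table above can be enumerated in a few lines. This is a minimal sketch, not the code in `simulation/data_generation.py`: the `build_design` function, the `scenario_id` and `replication` fields, and the per-scenario seeding scheme are assumptions made for illustration.

```python
from itertools import product

HEURISTICS = ["FIFO", "SPT", "LPT", "BALANCED"]
UNCERTAINTY_LEVELS = ["Low", "Medium", "High"]
REPLICATIONS = 100


def build_design(base_seed=0):
    """Return one dict per scenario: 4 heuristics x 3 levels x 100 replications."""
    scenarios = []
    cells = product(HEURISTICS, UNCERTAINTY_LEVELS, range(REPLICATIONS))
    for scenario_id, (heuristic, level, rep) in enumerate(cells):
        scenarios.append({
            "scenario_id": scenario_id,
            "heuristic": heuristic,
            "uncertainty_level": level,
            "replication": rep,
            # Hypothetical seeding scheme: one distinct seed per scenario.
            "seed": base_seed + scenario_id,
        })
    return scenarios


design = build_design()
print(len(design))  # 1200
```

Giving each scenario its own seed is one simple way to make individual runs repeatable without re-running the whole grid.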
Simulation enables:
- Systematic comparison across controlled conditions
- Exploration of scenarios impossible to test in production
- Statistical power through sufficient sample sizes
- Reproducibility for validation
Input Features:
- `demand_A`, `demand_B`, `demand_C`: Product batch demands
- `heuristic`: Scheduling strategy (FIFO, SPT, LPT, BALANCED)
- `uncertainty_level`: Low, Medium, High
- `r1_availability`, `r2_availability`: Reactor availability (%)
- `cip_conflict_prob`: Cleaning system conflict probability
Outcome Variables:
- `makespan`: Total production time (hours)
- `tardiness`: Delay beyond due dates (hours)
- `workload_balance`: |Reactor1_utilization - Reactor2_utilization|
- `schedule_robust`: Binary: met performance thresholds?
📁 data/scheduling_dataset.csv — Full 1,200-scenario dataset
📁 data/data_dictionary.md — Variable descriptions and coding
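To make the outcome variables concrete, here is a minimal sketch that computes three of them from a finished schedule. The batch record layout (`reactor`, `start`, `end`, `due`) is hypothetical, tardiness is assumed to be summed over batches, and utilization is assumed to mean busy hours divided by makespan; the data dictionary is the authority on the actual definitions. `schedule_robust` is left out because its thresholds are not stated here.

```python
def makespan(batches):
    """Completion time of the last batch, in hours."""
    return max(b["end"] for b in batches)


def total_tardiness(batches):
    """Hours past the due date, summed over batches (early batches count as 0)."""
    return sum(max(0.0, b["end"] - b["due"]) for b in batches)


def workload_balance(batches):
    """|Reactor1_utilization - Reactor2_utilization|.

    Assumption: utilization = busy hours / makespan.
    """
    horizon = makespan(batches)
    busy = {1: 0.0, 2: 0.0}
    for b in batches:
        busy[b["reactor"]] += b["end"] - b["start"]
    return abs(busy[1] - busy[2]) / horizon


# Hypothetical three-batch schedule, all times in hours.
example = [
    {"reactor": 1, "start": 0.0, "end": 120.0, "due": 100.0},
    {"reactor": 1, "start": 120.0, "end": 160.0, "due": 200.0},
    {"reactor": 2, "start": 0.0, "end": 40.0, "due": 50.0},
]

print(makespan(example))          # 160.0
print(total_tardiness(example))   # 20.0
print(workload_balance(example))  # 0.75
```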
┌─────────────────────────────────────────────────────────┐
│ DUAL-REACTOR PLANT │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ REACTOR 1 │ │ REACTOR 2 │ │
│ │ 10,000 L │ │ 5,000 L │ │
│ │ │ │ │ │
│ │ Products: │ │ Products: │ │
│ │ A, B, C │ │ A, B only │ │
│ └──────────────┘ └──────────────┘ │
│ │ │ │
│ └─────────┬───────────────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ SHARED CIP │ │
│ │ SYSTEM │ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
Product C (120h fermentation) → Reactor 1 ONLY → Creates bottleneck
| Heuristic | Logic | Rationale |
|---|---|---|
| FIFO | Process in arrival order | Simplest, common default |
| SPT | Shortest jobs first | Minimize queue buildup |
| LPT | Longest jobs first | Front-load complex work |
| BALANCED | Minimize reactor workload difference | Exploit parallel capacity |
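The four rules in the table can be sketched as an ordering step followed by a dispatch step. This is a deterministic toy under stated assumptions, not the simulation in `simulation/data_generation.py`: the processing times for A and B are hypothetical (only Product C's 120 h comes from this README), BALANCED is assumed to sort longest-first before balancing, and CIP conflicts, reactor availability, and uncertainty are ignored, so the numbers are not comparable with the reported results.

```python
# Hypothetical processing times (hours); only C's 120 h is taken from the README.
PROCESSING_HOURS = {"A": 40, "B": 60, "C": 120}
# Product C runs on Reactor 1 only; A and B can run on either reactor.
ELIGIBLE_REACTORS = {"A": (1, 2), "B": (1, 2), "C": (1,)}
OTHER_REACTOR = {1: 2, 2: 1}


def order_batches(batches, heuristic):
    """Return the dispatch order for product codes given in arrival order."""
    if heuristic == "FIFO":
        return list(batches)
    if heuristic == "SPT":
        return sorted(batches, key=lambda p: PROCESSING_HOURS[p])
    if heuristic in ("LPT", "BALANCED"):
        # Assumption: BALANCED also places the longest batches first.
        return sorted(batches, key=lambda p: PROCESSING_HOURS[p], reverse=True)
    raise ValueError(f"unknown heuristic: {heuristic}")


def schedule(batches, heuristic):
    """Dispatch batches to reactors; return (makespan, hours loaded per reactor)."""
    load = {1: 0, 2: 0}
    for product in order_batches(batches, heuristic):
        hours = PROCESSING_HOURS[product]
        options = ELIGIBLE_REACTORS[product]
        if heuristic == "BALANCED":
            # Pick the eligible reactor that leaves the smallest load gap.
            reactor = min(
                options,
                key=lambda r: abs(load[r] + hours - load[OTHER_REACTOR[r]]),
            )
        else:
            # Pick the eligible reactor that becomes free first.
            reactor = min(options, key=lambda r: load[r])
        load[reactor] += hours
    return max(load.values()), load


arrivals = ["A", "A", "B", "B", "C", "C"]
print(schedule(arrivals, "FIFO"))      # (340, {1: 340, 2: 100})
print(schedule(arrivals, "BALANCED"))  # (240, {1: 240, 2: 200})
```

In this toy, FIFO fills both reactors with short batches before the two Product C batches arrive, so both C batches queue on Reactor 1 while Reactor 2 sits idle.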
All statistical analyses were conducted in JASP 0.18.3, an open-source statistics package that provides transparent, reproducible analysis through a point-and-click interface.
┌─────────────────────────────────────────────────────────┐
│ ANALYTICAL PIPELINE │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. DATA GENERATION (Python) │
│ └─→ Simulation of 1,200 scenarios │
│ │
│ 2. STATISTICAL ANALYSIS (JASP) │
│ ├─→ Descriptive statistics │
│ ├─→ Two-Way ANOVA │
│ │ └─→ Main effects, interactions, effect sizes │
│ ├─→ Post-hoc comparisons (Tukey HSD, Games-Howell) │
│ └─→ Assumption testing (Levene's, Shapiro-Wilk) │
│ │
│ 3. MACHINE LEARNING (JASP) │
│ ├─→ Classification (6 algorithms) │
│ │ └─→ Gradient Boosting, Random Forest, SVM, │
│ │ Logistic Regression, Decision Tree │
│ └─→ Regression (4 algorithms) │
│ └─→ Random Forest, Boosting, SVR, Linear │
│ │
│ 4. MEDIATION ANALYSIS (JASP - SEM Module) │
│ └─→ Path analysis with bootstrapped confidence │
│ intervals │
│ │
└─────────────────────────────────────────────────────────┘
- Open Source: Free, transparent, reproducible
- Point-and-Click Interface: Reduces coding errors
- Comprehensive Output: Effect sizes, confidence intervals, assumption tests
- Built-in ML Module: Classification and regression with proper train/test splits
- SEM Module: Mediation analysis with bootstrapping
- Reproducibility: Save complete analysis as .jasp file
Two-Way ANOVA Results:
| Source | SS | df | F | p | η² |
|---|---|---|---|---|---|
| Heuristic | 8.24 × 10⁷ | 3 | 459.80 | < .001 | .503 |
| Uncertainty | 1.04 × 10⁷ | 2 | 87.29 | < .001 | .064 |
| Interaction | 2.93 × 10⁴ | 6 | 0.08 | .998 | < .001 |
Heuristic selection explains 50.3% of all variance in makespan — an exceptionally large effect.
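The ANOVA table can be cross-checked from its own entries. The sketch below assumes the reported η² is the classical form (SS_effect / SS_total, not partial η²) and that the residual has 1,200 − 12 = 1,188 degrees of freedom; neither is stated explicitly above. Under those assumptions, the total and residual sums of squares can be backed out and the F ratios recomputed.

```python
N_OBS = 1200
N_CELLS = 4 * 3  # heuristics x uncertainty levels

# Sums of squares and degrees of freedom as reported in the table.
effects = {
    "Heuristic": {"ss": 8.24e7, "df": 3},
    "Uncertainty": {"ss": 1.04e7, "df": 2},
    "Interaction": {"ss": 2.93e4, "df": 6},
}
ETA_SQ_HEURISTIC = 0.503  # reported eta squared for the heuristic effect

# eta^2 = SS_effect / SS_total, so SS_total follows from one reported pair.
ss_total = effects["Heuristic"]["ss"] / ETA_SQ_HEURISTIC
ss_residual = ss_total - sum(e["ss"] for e in effects.values())
df_residual = N_OBS - N_CELLS
ms_residual = ss_residual / df_residual

for name, e in effects.items():
    e["eta_sq"] = e["ss"] / ss_total
    e["F"] = (e["ss"] / e["df"]) / ms_residual
    print(f"{name:12s} eta^2 = {e['eta_sq']:.3f}  F = {e['F']:.2f}")
```

The recomputed values land within rounding error of the table (for example, F ≈ 459.7 against the reported 459.80), which is consistent with the classical η² reading.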
Post-Hoc Comparisons (Games-Howell):
| Comparison | Mean Difference | 95% CI | p |
|---|---|---|---|
| BALANCED – FIFO | -590.60 hrs | [-654.8, -526.4] | < .001 |
| BALANCED – LPT | 21.63 hrs | [-22.4, 65.6] | .585 |
| BALANCED – SPT | 21.18 hrs | [-22.1, 64.5] | .589 |
Binary Classification: Schedule Robustness
| Model | Accuracy | AUC | F1 | MCC |
|---|---|---|---|---|
| Gradient Boosting | 92.5% | 0.970 | 0.925 | 0.851 |
| Logistic Regression | 92.1% | 0.919 | 0.921 | 0.841 |
| SVM (Linear) | 91.7% | 0.916 | 0.917 | 0.833 |
| Random Forest | 90.4% | 0.968 | 0.904 | 0.808 |
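For readers less familiar with the columns above, the following sketch computes accuracy, F1, and MCC from a binary confusion matrix. The counts are hypothetical, chosen only so that accuracy comes out at 92.5%; they are not the actual confusion matrix from the JASP output, and JASP may average F1 across classes differently.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, F1, and Matthews correlation from binary confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts for a 240-scenario test set (not taken from the study).
metrics = classification_metrics(tp=110, fp=8, fn=10, tn=112)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC uses all four cells of the confusion matrix, so it stays informative even when the two classes are unevenly represented.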
Regression: Makespan Prediction
| Model | R² | RMSE (hrs) | MAE (hrs) | MAPE |
|---|---|---|---|---|
| Random Forest | 0.907 | 123.4 | 86.1 | 6.6% |
| Gradient Boosting | 0.905 | 133.6 | 83.6 | 5.9% |
| SVM | 0.897 | 128.9 | 78.2 | 5.4% |
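No single model wins on every column: Random Forest has the best R² and RMSE, while SVM has the lowest MAE and MAPE. The sketch below shows how the four metrics are computed, which helps explain why they can rank models differently (RMSE penalizes large errors more heavily than MAE). The makespan values are hypothetical and serve only to exercise the function.

```python
import math


def regression_metrics(actual, predicted):
    """R^2, RMSE, MAE, and MAPE (in percent) for paired observations."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical makespans in hours (not taken from the dataset).
actual = [1000, 1200, 1500, 2000]
predicted = [1100, 1150, 1400, 2100]
scores = regression_metrics(actual, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```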
| Rank | Feature | Relative Importance |
|---|---|---|
| 1 | Product C Demand | 35% |
| 2 | Heuristic Strategy | 34% |
| 3 | Uncertainty Level | 13% |
| 4 | Product B Demand | 11% |
| 5 | Product A Demand | 7% |
Product C's 120-hour processing time and Reactor 1 exclusivity make it the critical bottleneck.
Path Coefficients (Model 2: Heuristic → Workload Balance → Makespan):
| Path | Estimate | SE | z | p |
|---|---|---|---|---|
| FIFO → Workload Balance (a) | 0.567 | 0.009 | 65.55 | < .001 |
| Workload Balance → Makespan (b) | 1487.32 | 39.63 | 37.53 | < .001 |
| FIFO → Makespan (direct, c') | -238.03 | 21.37 | -11.14 | < .001 |
Effect Decomposition:
| Effect | Estimate | 95% CI | p |
|---|---|---|---|
| Direct Effect | -238.03 | [-280.7, -197.3] | < .001 |
| Indirect Effect | 842.90 | [790.3, 901.8] | < .001 |
| Total Effect | 604.87 | [573.3, 638.8] | < .001 |
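Proportion mediated = 139% (indirect / total = 842.90 / 604.87), a suppression pattern: the direct effect is negative, so the indirect effect through workload imbalance exceeds the total effect and more than accounts for FIFO's poor performance.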
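The decomposition can be reproduced from the reported estimates with a few lines of arithmetic. This is a consistency check on the published numbers, not a re-estimation of the model; the variable names are chosen here for illustration.

```python
# Reported estimates from the path and effect tables.
a = 0.567           # FIFO -> workload balance
b = 1487.32         # workload balance -> makespan (hours per unit of imbalance)
direct = -238.03    # FIFO -> makespan, holding workload balance fixed
indirect_reported = 842.90

# The indirect effect is the product of the two paths.
indirect_from_paths = a * b

# Total effect = direct + indirect.
total = direct + indirect_reported

# Proportion mediated = indirect / total; above 1 when the direct effect is negative.
proportion_mediated = indirect_reported / total

print(f"a * b      = {indirect_from_paths:.2f}")
print(f"total      = {total:.2f}")
print(f"proportion = {proportion_mediated:.1%}")
```

The product a × b comes out near 843.3 rather than exactly 842.90 because the path coefficients are rounded to three decimals in the table.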
pharma-batch-scheduling/
│
├── README.md # This file
├── LICENSE # CC BY 4.0
│
├── data/
│ ├── scheduling_dataset.csv # Complete dataset (1,200 scenarios)
│ └── data_dictionary.md # Variable descriptions
│
├── analysis/
│ ├── main_analysis.jasp # Complete JASP analysis file
│ └── outputs/ # Exported tables and figures
│ ├── anova_results.html
│ ├── ml_classification.html
│ ├── ml_regression.html
│ └── mediation_results.html
│
├── simulation/
│ └── data_generation.py # Python simulation code
│
├── figures/
│ ├── makespan_by_heuristic.png
│ ├── roc_curves.png
│ ├── feature_importance.png
│ └── mediation_model.png
│
└── paper/
└── Rababah_2025_MultiReactor.pdf
| Tool | Purpose | Version |
|---|---|---|
| JASP | All statistical analysis (ANOVA, ML, SEM) | 0.18.3 |
| Python | Dataset generation/simulation | 3.13 |
| NumPy/Pandas | Data manipulation | - |
- Python for Simulation: Flexible, reproducible data generation with precise control over parameters
- JASP for Analysis:
  - Point-and-click reduces errors
  - Built-in effect sizes and confidence intervals
  - Transparent, shareable analysis files
  - Machine learning module with proper validation
  - SEM module for mediation analysis
- Generate Data:

      cd simulation/
      python data_generation.py

- Open Analysis in JASP:
  - Download JASP (free)
  - Open `analysis/main_analysis.jasp`
  - All analyses will be pre-configured with results

To view the results without rerunning anything:

- Download this repository
- View exported results in `analysis/outputs/`
- Examine figures in `figures/`
- Read the full paper in `paper/`

To explore the dataset in Python:

import pandas as pd
# Load data
df = pd.read_csv('data/scheduling_dataset.csv')
# Quick exploration
print(df.groupby('heuristic')['makespan'].describe())

If you use this work, please cite:

@article{rababah2025multireactor,
author = {Rababah, Anfal},
title = {Multi-Reactor Batch Scheduling Strategies for Pharmaceutical
Production Under Uncertainty: A Comparative Statistical and
Machine Learning Analysis},
journal = {Zenodo},
year = {2025},
doi = {10.5281/zenodo.17847157}
}

Anfal Rababah
Independent Researcher
MSc Chemical Engineering | BSc Chemical Engineering & Mathematics
This work is licensed under Creative Commons Attribution 4.0 International.
Part of my Data Science Portfolio