Skip to content

taf-society/Durbyn.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

560 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Durbyn.jl

Stable Dev Build Status Coverage

About

Durbyn is a Julia package for time-series forecasting. It provides a native Julia implementation with its own formula-based grammar for declarative model specification.

Durbyn — Kurdish for “binoculars” (Dur, far + Byn, to see), embodies foresight through science. Like Hari Seldon’s psychohistory in Asimov’s Foundation, we seek to glimpse the shape of tomorrow through the disciplined clarity of mathematics.

This package is part of the TAFS Forecasting Ecosystem, an open-source initiative.

About TAFS

TAFS (Time Series Analysis and Forecasting Society) is a non-profit association registered as a "Verein" in Vienna, Austria. The organization connects a global audience of academics, experts, practitioners, and students to engage, share, learn, and innovate in the fields of data science and artificial intelligence, with a particular focus on time-series analysis, forecasting, and decision science. TAFS

TAFS's mission includes:

  • Connecting: Hosting events and discussion groups to establish connections and build a community of like-minded individuals.
  • Learning: Providing a platform to learn about the latest research, real-world problems, and applications.
  • Sharing: Inviting experts, academics, practitioners, and others to present and discuss problems, research, and solutions.
  • Innovating: Supporting the transfer of research into solutions and helping to drive innovations.

As a registered non-profit association under Austrian law, TAFS ensures that all contributions remain fully open source and cannot be privatized or commercialized. TAFS

License

The Durbyn package is licensed under the Apache License 2.0, allowing for open-source distribution and collaboration.

Installation

using Pkg
Pkg.add("Durbyn")

For the latest development version, you can install directly from GitHub:

Pkg.add(url="https://github.com/taf-society/Durbyn.jl")

Performance Tip: Durbyn supports automatic parallel computing when fitting models to panel data. Start Julia with multiple threads for massive speedups:

julia -t auto  # Use all available CPU cores

Performance scales with cores (8 cores: ~8x faster, 32 cores: ~20x faster, 96+ cores: even greater speedups). See the Quick Start Performance Guide for details.

Usage

Durbyn ships with multiple forecasting engines and a unified formula interface for declarative model specification. The formula interface is the recommended approach for most users, providing a modern, expressive way to specify models with full support for tables, panel data, and model comparison.

The PanelData interface follows a structured forecasting workflow (Hyndman & Athanasopoulos, 2021, Forecasting: Principles and Practice), providing a structured approach:

  1. Data Preparation — Load, reshape, and clean data using TableOps
  2. Visualization — Explore patterns with plot() and glimpse()
  3. Model Specification — Define models using the formula interface (@formula)
  4. Model Training — Fit models with fit(), producing fitted model objects
  5. Performance Evaluation — Assess accuracy with accuracy() and diagnostics
  6. Forecasting — Generate predictions with forecast(), returning tidy forecast tables

For complete documentation, see the Grammar Guide in the docs.

Key Components

  • Formula Interface — Declarative model specification using @formula for ARIMA, BATS, TBATS, and ETS models (ArimaSpec, BatsSpec, TbatsSpec, EtsSpec, SesSpec, HoltSpec, etc.)
  • Durbyn.ModelSpecs — Formula-based model specifications with PanelData support, grouped fitting, and as_table for tidy outputs
  • Durbyn.TableOps — Lightweight, Tables.jl-friendly data wrangling helpers (pivot_longer, arrange, groupby, summarise, mutate, …) plus glimpse utilities, see the Table Operations in the docs,
  • Durbyn.ExponentialSmoothing, Durbyn.Arima, Durbyn.Ararma, Durbyn.IntermittentDemand — Base model engines for array interface
  • Theta models (STM, OTM, DSTM, DOTM) — Dynamic Optimised Theta method (Fiorucci et al., 2016) with seasonal decomposition, multi-step MSE objective (nmse), and automatic variant selection via theta(). See the Theta guide.
  • BATS (Box-Cox, ARMA errors, Trend, Seasonal) — Multi-seasonal state-space models following De Livera, Hyndman & Snyder (2011) for complex seasonal patterns with integer seasonal periods. Suitable for series with multiple integer cycles (e.g., hourly data with daily and weekly patterns). Supports both formula interface (BatsSpec(@formula(...))) and direct API for flexible usage. See the dedicated BATS guide for methodology details and the reference paper: De Livera, Hyndman & Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing." Journal of the American Statistical Association, 106(496), 1513-1527.
  • TBATS (Trigonometric seasonal BATS) — Extension of BATS using Fourier representation for seasonal components, enabling handling of non-integer seasonal periods (e.g., 52.18 weeks/year), very long cycles (e.g., 5-minute data with daily and weekly patterns), and dual calendar effects (e.g., Gregorian + Hijri). Computational efficiency: $O(k)$ instead of $O(m)$ where $k \ll m$. Supports both formula interface (TbatsSpec(@formula(...))) and direct API for flexible usage. See the TBATS documentation for full mathematical details, usage examples, and references.

Formula Interface (Primary Usage)

The formula interface provides a declarative, flexible way to specify forecasting models with support for single series, regressors, model comparison, and panel data.

Example 1: Single ARIMA Model

using Durbyn

# Load data
data = (sales = [120, 135, 148, 152, 141, 158, 170, 165, 180, 195],)

# Specify model with automatic order selection
spec = ArimaSpec(@formula(sales = p() + q() + d() + P() + D() + Q()))
fitted = fit(spec, data, m = 12)
fc = forecast(fitted, h = 10)
plot(fc)

Example 2: ARIMA with Regressors

# Model with exogenous regressors
data = (
    sales = rand(100),
    temperature = rand(100),
    promotion = rand(0:1, 100)
)

spec = ArimaSpec(@formula(sales = p(1,3) + q(1,3) + temperature + promotion))
fitted = fit(spec, data, m = 7)

# Forecast with future regressor values
newdata = (temperature = rand(7), promotion = rand(0:1, 7))
fc = forecast(fitted, h = 7, newdata = newdata)

Example 3: Theta (Dynamic Optimised)

using Durbyn, Durbyn.Grammar

data = (demand = rand(80),)

spec = ThetaSpec(@formula(demand = theta(
    model = :DOTM,
    decomposition = "multiplicative",
    nmse = 5    # optimise on 1–5 step SSE
)))

fitted = fit(spec, data, m = 12)
fc = forecast(fitted, h = 12)

Example 4: Fitting Multiple Models Together

# Fit multiple model specifications at once
models = model(
    ArimaSpec(@formula(value = p() + q() + P() + Q() + d() + D())),
    BatsSpec(@formula(value = bats(seasonal_periods=12))),
    TbatsSpec(@formula(value = tbats(seasonal_periods=12))),
    EtsSpec(@formula(value = e("Z") + t("Z") + s("Z") + drift(:auto))),
    SesSpec(@formula(value = ses())),
    HoltSpec(@formula(value = holt(damped=true))),
    HoltWintersSpec(@formula(value = hw(seasonal="multiplicative")); m = 12),
    CrostonSpec(@formula(value = croston(method="sba"))),  # Bias-corrected Croston
    names = ["arima", "bats", "tbats", "ets_auto", "ses", "holt_damped", "hw_mul", "croston_sba"]
)

fitted = fit(models, panel)       # each spec fitted to every series
fc     = forecast(fitted, h = 12) # ForecastModelCollection

fc_tbl = as_table(fc) # stacked tidy table with model_name column

Example 5: Panel Data (Models and Multiple Time Series)

using Durbyn
using Durbyn.ModelSpecs
using Durbyn.Grammar
using Downloads
using Tables
using CSV

# Download and load data
path = Downloads.download("https://raw.githubusercontent.com/Akai01/example-time-series-datasets/refs/heads/main/Data/retail.csv")
tbl = Tables.columntable(CSV.File(path))

# Reshape to long format
tbl = pivot_longer(tbl; id_cols=:date, names_to=:series, values_to=:value)

glimpse(tbl)

# Split into train and test sets using table operations
# Get unique dates to determine split point
all_dates = unique(tbl.date)
n_dates = length(all_dates)
split_date = all_dates[end-11]  # Hold out last 12 periods for testing

# Create train and test sets
train = query(tbl, row -> row.date <= split_date)
test = query(tbl, row -> row.date > split_date)

println("Training data:")
glimpse(train)
println("\nTest data:")
glimpse(test)

# Create panel data wrapper for training
panel = PanelData(train; groupby=:series, date=:date, m=12);

glimpse(panel)

# Define multiple models for comparison
models = model(
    ArimaSpec(@formula(value = p() + q())),
    BatsSpec(@formula(value = bats(seasonal_periods=12))),
    TbatsSpec(@formula(value = tbats(seasonal_periods=12))),
    EtsSpec(@formula(value = e("Z") + t("Z") + s("Z") + drift(:auto))),
    SesSpec(@formula(value = ses())),
    HoltSpec(@formula(value = holt(damped=true))),
    HoltWintersSpec(@formula(value = hw(seasonal="multiplicative")); m=12),
    CrostonSpec(@formula(value = croston(method="sba"))),  # Syntetos-Boylan Approximation
    names=["arima", "bats", "tbats", "ets_auto", "ses", "holt_damped", "hw_mul", "croston_sba"]
)

# Fit all models to all series
fitted = fit(models, panel)

# Generate forecasts (h=12 to match test set)
fc = forecast(fitted, h=12)

# Convert to tidy table format
fc_tbl = as_table(fc)

glimpse(fc_tbl)

# Optional: Save forecasts to CSV
# CSV.write("forecasts.csv", fc_tbl)

# Calculate accuracy metrics
# Method 1: Using ForecastModelCollection directly
acc_results = accuracy(fc, test)

println("\nAccuracy by Series and Model:")
glimpse(acc_results)

list_series(fc)  # See what's available
plot(fc)  # Quick look at first series
plot(fc, series=:all, facet=true, n_cols=4)  # Overview

# Detailed inspection
plot(fc, series="series_1", actual=test)

# Calculate accuracy
acc = accuracy(fc, test)

# Find and plot interesting cases
best = acc.series[argmin(acc.MAPE)]
worst = acc.series[argmax(acc.MAPE)]

plot(fc, series=[best, worst], facet=true, actual=test)

Example 5: ETS Models with Formula

# Automatic ETS model selection
spec_ets = EtsSpec(@formula(sales = e("Z") + t("Z") + s("Z")))
fitted = fit(spec_ets, data, m = 12)
fc = forecast(fitted, h = 12)

# Specific ETS components
spec_ses = SesSpec(@formula(sales = e("A")))
spec_holt = HoltSpec(@formula(sales = e("A") + t("A")), damped = true)
spec_hw = HoltWintersSpec(@formula(sales = e("A") + t("A") + s("M")))

Base Models (Array Interface)

The array interface provides direct access to forecasting engines for working with numeric vectors. This is useful for quick analyses or when integrating with existing code.

Exponential Smoothing

using Durbyn
using Durbyn.ExponentialSmoothing

ap = air_passengers();
fit_ets = ets(ap, 12, "ZZZ")
fc_ets = forecast(fit_ets, h = 12)
plot(fc_ets)


ses_fit = ses(ap)
ses_fc = forecast(ses_fit, h = 12)
plot(ses_fc)


holt_fit = holt(ap)
holt_fc = forecast(holt_fit, h = 12)
plot(holt_fc)


hw_fit = holt_winters(ap, 12)
hw_fc = forecast(hw_fit, h = 12)
plot(hw_fc)

Intermittent Demand Forecasting

Croston methods are specialized for intermittent demand time series—data with many zero values and sporadic non-zero demands. Common in:

  • Spare parts inventory management
  • Slow-moving retail items
  • Low-volume products with irregular purchasing patterns
  • Any series with >50% zero values

Why Croston? Standard methods (ARIMA, ETS) struggle with intermittent data because they:

  • Assume continuous demand patterns
  • Produce biased forecasts when faced with zeros
  • Cannot properly model the dual nature of intermittent demand (size and timing)

Quick Start: Formula Interface (Recommended)

using Durbyn

# Intermittent demand data (many zeros)
data = (demand = [6, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 2, 0, 0, 0, 3, 0],)

# 🎯 RECOMMENDED: Syntetos-Boylan Approximation (SBA)
# Bias-corrected method with superior accuracy
spec = CrostonSpec(@formula(demand = croston(method="sba")))
fitted = fit(spec, data)
fc = forecast(fitted, h = 12)
plot(fc)

# Compare multiple Croston variants
models = model(
    CrostonSpec(@formula(demand = croston(method="sba"))),
    CrostonSpec(@formula(demand = croston(method="sbj"))),
    CrostonSpec(@formula(demand = croston(method="classic"))),
    names = ["croston_sba", "croston_sbj", "croston_classic"]
)
fitted_models = fit(models, data)
fc_all = forecast(fitted_models, h = 12)

Advanced Configuration

Fine-tune optimization parameters for better performance (Kourentzes 2014 recommendations):

# Custom parameters for IntermittentDemand methods
spec = CrostonSpec(@formula(demand = croston(
    method = "sba",              # Bias-corrected method
    cost_metric = "mar",         # Mean Absolute Rate (recommended over MSE)
    number_of_params = 2,        # Separate smoothing for demand size and intervals
    optimize_init = true,        # Optimize initial values (important for short series)
    init_strategy = "mean"       # Initialize with mean (vs "naive")
)))
fitted = fit(spec, data)
fc = forecast(fitted, h = 12)

Method Selection Guide

Method Description When to Use
"sba" Syntetos-Boylan Approximation Default choice - bias-corrected, best accuracy
"sbj" Shale-Boylan-Johnston correction Alternative if SBA over-forecasts
"classic" Classical Croston (1972) Original method with modern optimization
"hyndman" Shenstone & Hyndman (2005) Standard implementation, fixed alpha

💡 Recommendation: Start with method="sba" - it's the most validated and generally performs best.

Panel Data Example

Croston methods integrate seamlessly with panel data for multi-product forecasting:

using Durbyn, CSV, Downloads, Tables

# Load intermittent demand data with multiple products
# (your data should have columns: product_id, date, demand)
panel = PanelData(tbl; groupby = :product_id, date = :date)

# Fit SBA to all products in parallel
spec = CrostonSpec(@formula(demand = croston(method="sba")))
fitted = fit(spec, panel)  # Automatically parallelized

# Generate forecasts for all products
fc = forecast(fitted, h = 12)

# Analyze results
fc_table = as_table(fc)
glimpse(fc_table)

Array Interface (Direct)

For lower-level control or integration with existing numeric workflows:

using Durbyn.IntermittentDemand

demand = [6, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 2, 0]

# Recommended: Syntetos-Boylan Approximation (bias-corrected)
fit_sba = croston_sba(demand, cost_metric="mar", number_of_params=2)
fc_sba = forecast(fit_sba, h = 12)
plot(fc_sba, show_fitted = true)

# Alternative: Shale-Boylan-Johnston (alternative bias correction)
fit_sbj = croston_sbj(demand, cost_metric="mar", optimize_init=true)
fc_sbj = forecast(fit_sbj, h = 12)

# Extract diagnostics
fitted_values = fitted(fit_sba)
resids = residuals(fit_sba)

# Compare methods
println("SBA Forecast: ", fc_sba.mean[1])
println("SBJ Forecast: ", fc_sbj.mean[1])

Key Parameters Explained

  • cost_metric: Loss function for optimization

    • "mar": Mean Absolute Rate - recommended (Kourentzes 2014)
    • "msr": Mean Squared Rate - also good
    • "mae", "mse": Classical metrics
  • number_of_params: Smoothing parameter count

    • 2: Separate smoothing for demand size and intervals - recommended
    • 1: Single parameter for both (faster, less accurate)
  • optimize_init: Optimize starting values

    • true: Recommended, especially for short series
    • false: Use heuristic initialization (faster)
  • init_strategy: Initial value method

    • "mean": Use mean of non-zero demands - recommended
    • "naive": Use first observation

References

For theoretical background and validation studies:

  • Croston, J. D. (1972). Forecasting and stock control for intermittent demands. Operational Research Quarterly, 23(3), 289-303.
  • Syntetos, A. A., & Boylan, J. E. (2005). The accuracy of intermittent demand estimates. International Journal of Forecasting, 21(2), 303-314.
  • Kourentzes, N. (2014). On intermittent demand model optimisation and selection. International Journal of Production Economics, 156, 180–190.

Classical ARIMA API

using Durbyn
using Durbyn.Arima

ap = air_passengers()
# Fit an arima model
arima_model = arima(ap, 12, order = PDQ(2,1,1), seasonal=PDQ(0,1,0))

## Generate a forecast
fc = forecast(arima_model, h = 12)
# Plot the forecast
plot(fc)

# Fit an auto arima model
auto_arima_model = auto_arima(ap, 12)

## Generate a forecast
fc = forecast(auto_arima_model, h = 12)
# Plot the forecast
plot(fc)

ARAR and ARARMA models

Formula Interface (Recommended)

Both ARAR and ARARMA support Durbyn's declarative grammar for seamless integration with panel data and model comparison:

using Durbyn

series = air_passengers()
data = (value = series,)

# ARAR with formula interface
arar_spec = ArarSpec(@formula(value = arar(max_ar_depth=20, max_lag=30)))
arar_fitted = fit(arar_spec, data)
fc_arar = forecast(arar_fitted; h = 12)
plot(fc_arar)

# ARARMA with formula interface - Fixed orders
ararma_spec = ArarmaSpec(@formula(value = p(1) + q(2)))
ararma_fitted = fit(ararma_spec, data)
fc_ararma = forecast(ararma_fitted; h = 12)
plot(fc_ararma)

# ARARMA with formula interface - Auto selection
ararma_auto_spec = ArarmaSpec(@formula(value = p() + q()))
ararma_auto_fitted = fit(ararma_auto_spec, data)
fc_ararma_auto = forecast(ararma_auto_fitted; h = 12)
plot(fc_ararma_auto)

# ARARMA with custom ARAR parameters
ararma_custom_spec = ArarmaSpec(
    @formula(value = p() + q()),
    max_ar_depth = 20,
    max_lag = 30,
    crit = :bic
)
ararma_custom_fitted = fit(ararma_custom_spec, data)
fc_ararma_custom = forecast(ararma_custom_fitted; h = 12)

Panel Data and Model Comparison

Both specs integrate with grouped data and model collections:

# Create panel data
panel_tbl = (
    value = vcat(series, series .* 1.05),
    region = vcat(fill("north", length(series)), fill("south", length(series)))
)
panel = PanelData(panel_tbl; groupby = :region, m = 12)

# Compare ARAR and ARARMA against other models
models = model(
    ArarSpec(@formula(value = arar())),
    ArarmaSpec(@formula(value = p() + q())),
    BatsSpec(@formula(value = bats(seasonal_periods=12))),
    TbatsSpec(@formula(value = tbats(seasonal_periods=12))),
    ArimaSpec(@formula(value = p() + q() + P() + Q())),
    EtsSpec(@formula(value = e("Z") + t("Z") + s("Z"))),
    names = ["arar", "ararma", "bats", "tbats", "arima", "ets"]
)

# Fit all models to all groups
fitted_models = fit(models, panel)

# Forecast with all models
fc_all = forecast(fitted_models, h = 12)

# Compare results
plot(fc_all)

Array Interface

For direct numeric vector operations:

using Durbyn
using Durbyn.Ararma

ap = air_passengers();

# Basic ARAR model
arar_model_basic = arar(ap, max_ar_depth = 13)
fc = forecast(arar_model_basic, h = 12)
plot(fc)

# ARARMA with fixed orders
ararma_model = ararma(ap, p = 1, q = 2)
fc = forecast(ararma_model, h = 12)
plot(fc)

# Auto ARARMA (order selection)
auto_ararma_model = auto_ararma(ap, max_p=3, max_q=2, crit=:bic)
fc = forecast(auto_ararma_model, h = 12)
plot(fc)

Key Points:

  • ArarSpec and ArarmaSpec slot into model(...) collections alongside ARIMA/ETS specs
  • ARARMA uses p() and q() grammar (same as ARIMA) - distinction is the Spec type
  • Auto selection when any order has a range: p() + q() or p(0,3) + q()
  • Fixed orders for faster fitting: p(1) + q(2) directly calls ararma()