Autonomous fine-tuning of time series foundation models via AI-driven experiment loops.
Inspired by Karpathy's autoresearch: give an AI agent a training script and let it experiment autonomously. The agent proposes hyperparameter tweaks, loss function changes, and architectural modifications, runs each experiment for a fixed time budget, and keeps the best result.
| Model | Provider | Architecture | Covariates | LoRA |
|---|---|---|---|---|
| Chronos-2 | Amazon | T5 encoder-decoder, tokenized | Yes (multivariate) | Yes |
| Chronos-Bolt | Amazon | T5 encoder-decoder, efficient | Yes | Yes |
| TimesFM | Google | Decoder-only, patched | No | Yes |
| Lag-Llama | OSS | LLaMA decoder, lag features | No | Yes |
| Moirai | Salesforce | Patch transformer, any-variate | Yes | Yes |
| MOMENT | CMU | Patch transformer encoder | No | Yes |
| TinyTimeMixer | IBM | MLP-Mixer, patched | Yes (channel-ind.) | No* |
| Timer | Tsinghua | GPT decoder | No | Yes |
* TTM uses MLP-Mixer (no attention), so standard LoRA doesn't apply. Use full fine-tuning or the built-in adapter head.
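The footnote comes down to where LoRA adapters attach. A hypothetical sketch of that logic (the model names and projection names here are illustrative assumptions, not the package's actual LoRA configuration):

```python
# Illustrative attention-projection names per backbone family.
# These are assumptions for the sketch, not ts-autoresearch's real config.
ATTENTION_TARGETS = {
    "chronos-2": ("q", "k", "v", "o"),            # T5-style attention blocks
    "lag-llama": ("q_proj", "k_proj", "v_proj"),  # LLaMA-style attention
    "tinytimemixer": (),                          # MLP-Mixer: no attention at all
}

def lora_applicable(model_name: str) -> bool:
    """LoRA injects low-rank updates into attention projections,
    so it only applies where such projections exist."""
    return bool(ATTENTION_TARGETS.get(model_name, ()))
```

For attention-free backbones like TinyTimeMixer, `lora_applicable` returns `False`, which is why the table routes TTM to full fine-tuning or its adapter head instead.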
```bash
pip install ts-autoresearch
```
```bash
# List available models
ts-autoresearch list-models

# Single fine-tuning run with LoRA
ts-autoresearch train \
  --model chronos-2 \
  --data your_data.csv \
  --target-col value \
  --date-col date \
  --lora-rank 16 \
  --budget-seconds 300
```
```bash
# Zero-shot benchmark across models
ts-autoresearch benchmark \
  --data your_data.csv \
  --target-col value \
  --models chronos-2 moment-large lag-llama
```

The core feature: let an AI agent autonomously improve your fine-tuning setup.
```bash
# Start with a training script
ts-autoresearch run my_train.py \
  --max-experiments 50 \
  --budget-seconds 180 \
  --parallel 2

# Use a different LLM provider
ts-autoresearch run my_train.py \
  --llm-provider openai \
  --llm-model gpt-4o
```

```
┌──────────────────────────────────────────────────┐
│                Autoresearch Loop                 │
│                                                  │
│  1. Run train.py for N minutes                   │
│  2. Parse val_metric from stdout                 │
│  3. Ask LLM: "propose one change to improve"     │
│  4. LLM returns modified train.py                │
│  5. Run modified script                          │
│  6. If improved → keep. If worse → revert.       │
│  7. Repeat until max_experiments                 │
│                                                  │
│  Warm-starts across runs via S3 checkpoints      │
└──────────────────────────────────────────────────┘
```
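The loop above can be sketched in plain Python. This is a simplified stand-in for the real `autoresearch/loop.py` (no parallelism, no S3 warm-starts, and `propose_fn` stands in for the LLM proposal engine):

```python
import re
import subprocess
import sys

def run_script(path: str, budget_seconds: int) -> float:
    """Run a training script and parse val_metric=X.XXXX from its last stdout line."""
    out = subprocess.run(
        [sys.executable, path, "--budget-seconds", str(budget_seconds)],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"val_metric=([\d.]+)", out.strip().splitlines()[-1])
    if match is None:
        raise ValueError("script did not report val_metric on its last line")
    return float(match.group(1))

def autoresearch(script_path, propose_fn, max_experiments=50, budget_seconds=180):
    """Keep-best loop: propose a change, run it, keep it only if the metric improves."""
    with open(script_path) as f:
        best_source = f.read()
    best_metric = run_script(script_path, budget_seconds)
    for _ in range(max_experiments):
        candidate = propose_fn(best_source, best_metric)  # e.g. an LLM call
        with open(script_path, "w") as f:
            f.write(candidate)
        try:
            metric = run_script(script_path, budget_seconds)
        except Exception:
            metric = float("inf")  # a crashed experiment counts as worse
        if metric < best_metric:
            best_source, best_metric = candidate, metric  # keep
        else:
            with open(script_path, "w") as f:  # revert to the best-known script
                f.write(best_source)
    return best_metric
```

The keep/revert step is the whole safety story: a bad proposal can never make the persisted script worse than the best one seen so far.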
Your training script must:

- Accept a `--budget-seconds` CLI arg (wall-clock time limit)
- Print `val_metric=X.XXXX` on its last line (the metric to minimize)
- Be self-contained (all imports, data loading, training, and eval)
The LLM will modify hyperparameters, loss functions, training strategies, and architecture choices. It keeps the overall structure and metric reporting intact.
See `examples/quickstart_train.py` for a minimal template.
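For illustration, a skeleton satisfying this contract, with the model and training step stubbed out (the placeholder metric and `break` are stand-ins for your real training loop, not part of the package):

```python
import argparse
import time

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--budget-seconds", type=int, default=300)
    args = parser.parse_args()

    deadline = time.monotonic() + args.budget_seconds

    # ... load data, model, optimizer here ...
    best_val = float("inf")
    while time.monotonic() < deadline:
        # ... one training step, periodic validation ...
        best_val = min(best_val, 0.1234)  # placeholder validation metric
        break  # placeholder: remove once real training is in place

    # Contract: the metric to minimize, printed on the last line.
    print(f"val_metric={best_val:.4f}")

if __name__ == "__main__":
    main()
```

Because the budget is enforced inside the script via wall-clock time, the loop can interrupt long runs cleanly and still get a parseable `val_metric` line.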
```python
from ts_autoresearch.models import get_model
from ts_autoresearch.training import apply_lora, Trainer
from ts_autoresearch.data import TimeSeriesDataset

# Load data
dataset = TimeSeriesDataset.from_csv(
    "data.csv",
    target_col="value",
    id_col="series_id",
    date_col="date",
    context_length=512,
    prediction_length=24,
)
train_ex, val_ex = dataset.prepare_examples()

# Load model + LoRA
adapter = get_model("chronos-2")
adapter.load("cuda")
model = apply_lora(adapter.get_trainable_module(), rank=16)

# Train
trainer = Trainer(
    model=model,
    forward_fn=adapter.forward_for_training,
    device="cuda",
)
result = trainer.train(train_ex, val_ex, budget_seconds=300)
print(f"val_metric={result['val_metric']:.4f}")
```

Add domain-specific training objectives:
```python
from ts_autoresearch.training.losses import (
    MonotonicityLoss,  # covariate-target direction constraint
    CalibrationLoss,   # match known sensitivity labels
    WeightedMSELoss,   # upweight important examples
    CompositeLoss,     # combine multiple losses
)

# Example: price increase should decrease demand
mono_loss = MonotonicityLoss(direction="negative")
penalty = mono_loss(pred_baseline, pred_counterfactual)
```

Add your own model backends:
```python
from ts_autoresearch.models.registry import ModelRegistry

ModelRegistry.register(
    name="my-model",
    module_path="my_package.models.custom",
    class_name="MyModelAdapter",
    default_model_id="my-org/my-model-v1",
)

# Now use it like any built-in model
adapter = get_model("my-model")
```

```
ts-autoresearch/
├── src/ts_autoresearch/
│   ├── models/                 # Model adapters (one per foundation model)
│   │   ├── base.py             # Abstract interface
│   │   ├── registry.py         # Model registry with lazy imports
│   │   ├── chronos.py          # Amazon Chronos / Chronos-2
│   │   ├── timesfm_adapter.py  # Google TimesFM
│   │   ├── lag_llama.py        # Lag-Llama
│   │   ├── moirai.py           # Salesforce Moirai
│   │   ├── moment.py           # CMU MOMENT
│   │   ├── tiny_time_mixer.py  # IBM TinyTimeMixer
│   │   └── timer.py            # Tsinghua Timer
│   ├── training/               # Training utilities
│   │   ├── lora.py             # LoRA with auto-detection
│   │   ├── losses.py           # Custom loss functions
│   │   └── trainer.py          # Time-budgeted trainer
│   ├── data/                   # Dataset loading and windowing
│   │   └── dataset.py          # Generic time series dataset
│   ├── evaluation/             # Forecast metrics
│   │   └── metrics.py          # wMAPE, MASE, MSE, MAE
│   ├── autoresearch/           # AI-driven experiment loop
│   │   ├── loop.py             # Main autoresearch loop
│   │   ├── proposer.py         # LLM-based proposal engine
│   │   └── checkpoint.py       # State persistence
│   ├── config.py               # Central configuration
│   └── cli.py                  # CLI entry point
├── examples/                   # Ready-to-use training scripts
│   ├── quickstart_train.py     # Minimal LoRA fine-tuning
│   └── custom_loss_train.py    # Counterfactual loss example
└── tests/
```
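The metrics in `evaluation/metrics.py` are standard forecast-accuracy measures. As a reminder of their definitions, a plain-Python sketch (the package's actual implementations may be vectorized and handle edge cases differently):

```python
def wmape(actual, forecast):
    """Weighted MAPE: total absolute error divided by total absolute actuals."""
    num = sum(abs(a - f) for a, f in zip(actual, forecast))
    den = sum(abs(a) for a in actual)
    return num / den

def mase(actual, forecast, train, season=1):
    """MASE: forecast MAE scaled by the in-sample seasonal-naive MAE."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_mae = sum(
        abs(train[i] - train[i - season]) for i in range(season, len(train))
    ) / (len(train) - season)
    return mae / naive_mae
```

A MASE below 1.0 means the model beats the naive forecast that simply repeats the value from one season earlier.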
```bash
git clone https://github.com/kshitijbichave/ts-autoresearch.git
cd ts-autoresearch
pip install -e ".[dev,all]"
pytest
```

MIT