
Add ML Training Recipes skill#31

Open
dailycafi wants to merge 1 commit into Orchestra-Research:main from dailycafi:add-ml-training-recipes

Conversation

@dailycafi

Summary

Adds a comprehensive ML training recipes skill to 10-optimization/, covering battle-tested PyTorch training patterns across all domains.

What's included

  • SKILL.md (319 lines): Training loops, optimizer config, LR scheduling, mixed precision, debugging checklist, experiment management
  • 6 reference files (96KB total):
    • architecture.md — Transformer/LLM architecture patterns, weight init
    • optimizers.md — Muon, AdamW hybrid, per-group LR, compiled steps
    • domain-specific.md — Vision, diffusion, data loading, architecture tables
    • scaling-and-selection.md — Chinchilla scaling, compute budgets, DGX Spark
    • biomedical.md — Drug discovery, protein LMs, medical imaging, genomics, spatial omics, clinical NLP
    • experiment-loop.md — Autonomous experiment loop (autoresearch-style keep/discard/revert)
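As a rough illustration of the LR-scheduling material listed above, a standard linear-warmup-plus-cosine-decay schedule can be sketched as follows (the function name and default values are illustrative, not taken from the skill itself):

```python
import math

def lr_at_step(step, max_steps, base_lr=3e-4, warmup_steps=100, min_lr_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay down to min_lr_ratio * base_lr."""
    if step < warmup_steps:
        # warmup: scale linearly from ~0 up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # cosine decay over the remaining steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)
```

The same shape drops into any training loop by setting `param_group["lr"]` each step.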

Key differentiators from existing skills

This skill fills gaps not covered by existing optimization skills (which focus on quantization/inference):

  • Muon optimizer (Polar Express orthogonalization) — cutting-edge, not in any existing skill
  • Karpathy's debugging checklist — systematic training diagnosis
  • Autonomous experiment loop — fixed time-budget, keep/discard methodology
  • DGX Spark bandwidth-limited optimization — specialized hardware patterns
  • Biomedical ML — molecular GNNs, ESM-2, MONAI, nnU-Net, spatial omics
  • Per-parameter-group optimizer config — different optimizers for embeddings vs matrices
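The per-parameter-group idea in the last bullet can be sketched roughly like this (the grouping heuristic, optimizer names, and hyperparameter values below are illustrative assumptions, not the skill's actual configuration):

```python
def build_param_groups(named_shapes, base_lr=3e-4):
    """Partition parameters by role into optimizer groups:
    embeddings and 1-D tensors (biases, norms) -> AdamW, no weight decay;
    2-D weight matrices -> a matrix optimizer such as Muon, with decay."""
    embed, matrix, other = [], [], []
    for name, shape in named_shapes.items():
        if "embed" in name:
            embed.append(name)
        elif len(shape) == 2:
            matrix.append(name)
        else:
            other.append(name)
    return [
        {"params": embed,  "optimizer": "adamw", "lr": base_lr,       "weight_decay": 0.0},
        {"params": matrix, "optimizer": "muon",  "lr": base_lr * 0.5, "weight_decay": 0.1},
        {"params": other,  "optimizer": "adamw", "lr": base_lr,       "weight_decay": 0.0},
    ]
```

In PyTorch the same grouping would be fed to optimizers via their per-parameter-group options rather than plain dicts of names.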

Sources

Complementary to existing skills

This skill focuses on training methodology while existing 10-optimization skills focus on inference optimization:

  • No overlap with Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, HQQ
  • Complements 08-distributed-training (covers single-GPU → multi-GPU bridge)
  • Complements 03-fine-tuning (provides the training recipe framework)

Quality checklist

  • YAML frontmatter with all required fields
  • SKILL.md: 319 lines (slightly above the 200-300 line target)
  • Progressive disclosure (SKILL.md overview → reference files for depth)
  • Code examples fenced with language tags
  • Debugging checklist with solutions
  • marketplace.json updated and validated

Battle-tested PyTorch training patterns covering all domains:
- LLMs, vision, diffusion, medical imaging, protein/drug discovery
- Muon optimizer, hybrid MuonAdamW, per-group LR scaling
- Autonomous experiment loop (autoresearch-style keep/discard/revert)
- DGX Spark bandwidth optimization
- Comprehensive debugging checklist (Karpathy's recipe)
- 319-line SKILL.md + 6 reference files (96KB total)

Sources: Karpathy autoresearch/nanochat, modern optimizer research,
production training best practices.
@zechenzhangAGI
Collaborator

exciting direction on autoresearch. thanks for the proposal! will take a look


2 participants