
Add ML Training Recipes skill#31

Open
dailycafi wants to merge 1 commit into Orchestra-Research:main from dailycafi:add-ml-training-recipes

Conversation

@dailycafi

Summary

Adds a comprehensive ML training recipes skill to 10-optimization/, covering battle-tested PyTorch training patterns across all domains.

What's included

  • SKILL.md (319 lines): Training loops, optimizer config, LR scheduling, mixed precision, debugging checklist, experiment management
  • 6 reference files (96KB total):
    • architecture.md — Transformer/LLM architecture patterns, weight init
    • optimizers.md — Muon, AdamW hybrid, per-group LR, compiled steps
    • domain-specific.md — Vision, diffusion, data loading, architecture tables
    • scaling-and-selection.md — Chinchilla scaling, compute budgets, DGX Spark
    • biomedical.md — Drug discovery, protein LMs, medical imaging, genomics, spatial omics, clinical NLP
    • experiment-loop.md — Autonomous experiment loop (autoresearch-style keep/discard/revert)
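As a rough illustration of the LR-scheduling material listed above, a standard linear-warmup-plus-cosine-decay schedule can be sketched as follows (the function name and default values are illustrative, not taken from the skill itself):

```python
import math

def lr_at_step(step, max_steps, base_lr=3e-4, warmup_steps=100, min_lr_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay down to min_lr_ratio * base_lr."""
    if step < warmup_steps:
        # warmup: scale linearly from ~0 up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # cosine decay over the remaining steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)
```

The same shape drops into any training loop by setting `param_group["lr"]` each step.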

Key differentiators from existing skills

This skill fills gaps not covered by existing optimization skills (which focus on quantization/inference):

  • Muon optimizer (Polar Express orthogonalization) — cutting-edge, not in any existing skill
  • Karpathy's debugging checklist — systematic training diagnosis
  • Autonomous experiment loop — fixed time-budget, keep/discard methodology
  • DGX Spark bandwidth-limited optimization — specialized hardware patterns
  • Biomedical ML — molecular GNNs, ESM-2, MONAI, nnU-Net, spatial omics
  • Per-parameter-group optimizer config — different optimizers for embeddings vs matrices
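The per-parameter-group idea in the last bullet can be sketched roughly like this (the grouping heuristic, optimizer names, and hyperparameter values below are illustrative assumptions, not the skill's actual configuration):

```python
def build_param_groups(named_shapes, base_lr=3e-4):
    """Partition parameters by role into optimizer groups:
    embeddings and 1-D tensors (biases, norms) -> AdamW, no weight decay;
    2-D weight matrices -> a matrix optimizer such as Muon, with decay."""
    embed, matrix, other = [], [], []
    for name, shape in named_shapes.items():
        if "embed" in name:
            embed.append(name)
        elif len(shape) == 2:
            matrix.append(name)
        else:
            other.append(name)
    return [
        {"params": embed,  "optimizer": "adamw", "lr": base_lr,       "weight_decay": 0.0},
        {"params": matrix, "optimizer": "muon",  "lr": base_lr * 0.5, "weight_decay": 0.1},
        {"params": other,  "optimizer": "adamw", "lr": base_lr,       "weight_decay": 0.0},
    ]
```

In PyTorch the same grouping would be fed to optimizers via their per-parameter-group options rather than plain dicts of names.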

Sources

Complementary to existing skills

This skill focuses on training methodology while existing 10-optimization skills focus on inference optimization:

  • No overlap with Flash Attention, bitsandbytes, GPTQ, AWQ, GGUF, HQQ
  • Complements 08-distributed-training (covers single-GPU → multi-GPU bridge)
  • Complements 03-fine-tuning (provides the training recipe framework)

Quality checklist

  • YAML frontmatter with all required fields
  • SKILL.md: 319 lines (slightly above the 200-300 line target)
  • Progressive disclosure (SKILL.md overview → reference files for depth)
  • Code examples fenced with language tags
  • Debugging checklist with solutions
  • marketplace.json updated and validated

Battle-tested PyTorch training patterns covering all domains:
- LLMs, vision, diffusion, medical imaging, protein/drug discovery
- Muon optimizer, hybrid MuonAdamW, per-group LR scaling
- Autonomous experiment loop (autoresearch-style keep/discard/revert)
- DGX Spark bandwidth optimization
- Comprehensive debugging checklist (Karpathy's recipe)
- 319-line SKILL.md + 6 reference files (96KB total)

Sources: Karpathy autoresearch/nanochat, modern optimizer research,
production training best practices.
@zechenzhangAGI
Collaborator

exciting direction on autoresearch. thanks for the proposal! will take a look


2 participants