fix(pretrain): SPEC §82 P0-H — stamp APR checkpoint architecture from --init model #1709
Merged
Conversation
When `apr pretrain --init <qwen2.apr>` fine-tunes a Qwen2 model, the
trainer was hardcoded to stamp `("llama-370m-pretrain", "LlamaForCausalLM")`
regardless of what the init model actually was. Downstream `apr export
--format gguf` then routed through the llama-family GGUF mapper, which
has no mapping for Qwen2's per-layer biases (q_proj_bias, k_proj_bias,
v_proj_bias × 24 layers = 72 tensors). Those biases fell through to
passthrough names like `model.layers.0.self_attn.q_proj.bias` and were
counted in the GGUF header (291 total), but llama.cpp's llama-arch
loader silently skipped them, failing with `done_getting_tensors: wrong
number of tensors; expected 291, got 219`.
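The arithmetic behind the mismatch, as a quick self-contained check (numbers taken from the failure above):

```rust
fn main() {
    let layers = 24;
    let biases_per_layer = 3; // q_proj_bias, k_proj_bias, v_proj_bias
    let unmapped = layers * biases_per_layer; // 72 bias tensors with no llama-family rule
    let header_total = 291; // tensor count written into the GGUF header
    // llama.cpp materializes only the tensors its arch knows how to load:
    assert_eq!(header_total - unmapped, 219); // "expected 291, got 219"
    println!("{} of {} tensors loaded", header_total - unmapped, header_total);
}
```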
The fix derives `name` and `architecture` from `init_arch` (a minimal sketch follows the list):
- Qwen2 init → ("qwen2-pretrain", "Qwen2ForCausalLM")
- Other init → ("<hf_model_type>-pretrain", "<hf_architecture>")
- No init → ("llama-370m-pretrain", "LlamaForCausalLM") [back-compat]
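A minimal sketch of that selection logic, assuming a hypothetical `InitArch` struct carrying the init model's HF metadata (names here are illustrative, not the actual `pretrain.rs` code):

```rust
/// Hypothetical carrier for the --init model's HF metadata.
struct InitArch {
    hf_model_type: Option<String>,   // e.g. "qwen2"
    hf_architecture: Option<String>, // e.g. "Qwen2ForCausalLM"
}

/// Derive the (name, architecture) pair stamped into the APR checkpoint.
fn checkpoint_name_and_arch(init: Option<&InitArch>) -> (String, String) {
    const DEFAULT: (&str, &str) = ("llama-370m-pretrain", "LlamaForCausalLM");
    match init {
        Some(a) => match (a.hf_model_type.as_deref(), a.hf_architecture.as_deref()) {
            // Qwen2 falls out of the general rule:
            // "qwen2" -> ("qwen2-pretrain", "Qwen2ForCausalLM").
            (Some(mt), Some(arch)) => (format!("{mt}-pretrain"), arch.to_string()),
            // Init model missing HF fields: graceful fallback to the default.
            _ => (DEFAULT.0.into(), DEFAULT.1.into()),
        },
        // No --init at all: historical default (back-compat).
        None => (DEFAULT.0.into(), DEFAULT.1.into()),
    }
}
```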
Once stamped correctly, the qwen2 GGUF family mapper handles biases via
its `q_proj_bias: "attn_q.bias"` rules and the tensor count matches.
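For illustration, a toy version of the kind of per-family rename rule involved, assuming GGUF's `blk.N.*` tensor naming and a plain suffix map (not the actual mapper implementation):

```rust
use std::collections::HashMap;

/// Toy qwen2-family rules: the bias suffixes the llama family lacks.
fn qwen2_bias_rules() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("q_proj_bias", "attn_q.bias"),
        ("k_proj_bias", "attn_k.bias"),
        ("v_proj_bias", "attn_v.bias"),
    ])
}

/// Map a per-layer bias to its GGUF name; None means the selected family
/// has no rule and the tensor falls through as a passthrough name (the
/// P0-H failure mode when the llama family is wrongly selected).
fn gguf_bias_name(layer: usize, key: &str) -> Option<String> {
    qwen2_bias_rules()
        .get(key)
        .map(|suffix| format!("blk.{layer}.{suffix}"))
}
```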
Discharges §82's P0-H item and unblocks AC-SHIP2-010 (llama-cli interop)
in combination with the P0-G vocab pad fix (PR #1706).
Test plan:
- 3 new unit tests in `pretrain::tests`:
  - `checkpoint_name_and_arch_default_when_no_init` (back-compat)
  - `checkpoint_name_and_arch_qwen2_init` (Qwen2 stamping)
  - `checkpoint_name_and_arch_init_without_hf_fields` (graceful fallback)
- All 3 PASS
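As a sketch, two of these tests written against the hypothetical `checkpoint_name_and_arch` helper above (the real tests live in `pretrain::tests`):

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn checkpoint_name_and_arch_qwen2_init() {
        let init = InitArch {
            hf_model_type: Some("qwen2".into()),
            hf_architecture: Some("Qwen2ForCausalLM".into()),
        };
        let (name, arch) = checkpoint_name_and_arch(Some(&init));
        assert_eq!(name, "qwen2-pretrain");
        assert_eq!(arch, "Qwen2ForCausalLM");
    }

    #[test]
    fn checkpoint_name_and_arch_default_when_no_init() {
        let (name, arch) = checkpoint_name_and_arch(None);
        assert_eq!(name, "llama-370m-pretrain");
        assert_eq!(arch, "LlamaForCausalLM");
    }
}
```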
Methodology lesson #29 evidence: P0-G surfaced P0-H within minutes;
5 Class 3 defects (P0-D, P0-E, P0-F, P0-G, P0-H) in 24h confirms the
"waves of 4" pattern.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts:
#	crates/apr-cli/src/commands/pretrain.rs
Summary

When `apr pretrain --init <qwen2.apr>` fine-tunes a Qwen2 model, the trainer hardcoded `("llama-370m-pretrain", "LlamaForCausalLM")` regardless of the init model. Downstream `apr export --format gguf` routed through the llama-family mapper, which has no mapping for Qwen2's 72 per-layer biases — they fell through as passthrough names, getting counted in the GGUF header (291 total) but rejected by llama.cpp's llama-arch loader → `expected 291, got 219`.

The fix derives `name` and `architecture` from `init_arch`:
- Qwen2 init → `("qwen2-pretrain", "Qwen2ForCausalLM")` — the qwen2 family mapper handles biases via its `q_proj_bias: "attn_q.bias"` rules
- Other init → `("<hf_model_type>-pretrain", "<hf_architecture>")`
- No init → `("llama-370m-pretrain", "LlamaForCausalLM")` (back-compat)

Discharges §82's P0-H item. Combined with PR #1706 (P0-G vocab pad) and #1701 (P0-D/E embed tokenizer + arch metadata), this unblocks AC-SHIP2-010 (llama-cli interop) end-to-end for `apr pretrain` outputs from Qwen2 init.

Test plan
- 3 new unit tests in `pretrain::tests`
- `cargo test -p apr-cli --lib checkpoint_name_and_arch` → 3/3 PASS
- `cargo clippy -p apr-cli --lib -- -D warnings` clean
- `cargo build -p apr-cli --bin apr` succeeds

Methodology
Confirmation of methodology lesson #29 (Class 3 packaging defect cascade):
5 Class 3 defects in 24h. The "waves of 4, not 2" lesson is empirically holding (perhaps as "waves of 5+").
🤖 Generated with Claude Code