feat(rosetta): Architecture::Bloom variant + BLOOM model-family contract (closes #1586)#1694

Merged
noahgift merged 14 commits into main from fix/rebloom-1586-clean-v2
May 16, 2026

Conversation

@noahgift
Contributor

Summary

Re-authors PR #1685 (which itself superseded #1671) as a clean cherry-pick
onto current `main`; the branch went dirty again after #1686 (InternLM2)
landed and introduced an interleaving conflict on `tensor_expectation.rs` /
`converter_types.rs`.

Now coexists cleanly with both FalconClassic (#1673) and InternLM2 (#1686)
in the enum + dispatch arms.

Adds:

  • `Architecture::Bloom` enum variant
  • `bloom_map_name` mapping HuggingFace BLOOM tensor names to APR canonical
  • `from_model_type("bloom" | "bloomz")` → `Some(Architecture::Bloom)`
  • `is_llm()` + `display_name() = "BLOOM"`
  • `contracts/model-families/bloom.yaml` (BLOOM-560M + BLOOM-7B1 size variants)
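A minimal sketch of the dispatch arms listed above, assuming simplified free-function signatures (the real `Architecture` enum in `converter_types.rs` carries many more variants, including `FalconClassic` and `InternLM2`):

```rust
// Hypothetical sketch of the new dispatch arms; names mirror the PR
// description, signatures are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Architecture {
    Bloom,
    // ... FalconClassic, InternLM2, Llama, Gpt2, etc.
}

fn from_model_type(model_type: &str) -> Option<Architecture> {
    match model_type {
        "bloom" | "bloomz" => Some(Architecture::Bloom),
        _ => None,
    }
}

fn display_name(arch: Architecture) -> &'static str {
    match arch {
        Architecture::Bloom => "BLOOM",
    }
}

fn is_llm(arch: Architecture) -> bool {
    matches!(arch, Architecture::Bloom)
}
```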

Out of scope (separate tickets):

  • ALiBi runtime inference (`is_inference_verified()` returns false)
  • Fused QKV splitter (BLOOM packs Q/K/V interleaved per head)

Verified

  • `cargo check -p aprender-core` clean
  • `pv validate contracts/model-families/bloom.yaml` → 0 errors
  • Coexists with FalconClassic + InternLM2 in all dispatch points

Closes #1586. Supersedes #1671 / #1685.

🤖 Generated with Claude Code

…act (closes #1586)

BLOOM (BigScience) is a GPT-2-derivative architecture with:
- ALiBi linear position bias (no positional-embedding tensor)
- Fused QKV (`self_attention.query_key_value`, Q/K/V interleaved per head)
- GELU MLP, LayerNorm, biases everywhere, tied embeddings
- HuggingFace `h.N.*` naming (NOT `model.layers.N.*`)

The HF tensor names diverge from every existing mapper, so a new
`Architecture::Bloom` variant is added rather than reusing `Llama` or
`Gpt2` mappers.

Engine changes (single function each):

  converter_types.rs::Architecture          + Bloom variant
  tensor_expectation.rs::map_name           + Bloom → bloom_map_name
  tensor_expectation.rs::is_llm             + Bloom
  tensor_expectation.rs::display_name       + "BLOOM"
  tensor_expectation.rs::from_model_type    + "bloom" | "bloomz" → Bloom
  tensor_expectation.rs::bloom_map_name     NEW function (50 LOC)

bloom_map_name translates:
  word_embeddings.weight               → model.embed_tokens.weight
  word_embeddings_layernorm.{w,b}      → model.embed_norm.{w,b}
  h.N.input_layernorm.{w,b}            → model.layers.N.input_layernorm.{w,b}
  h.N.self_attention.query_key_value   → model.layers.N.self_attn.qkv_proj
  h.N.self_attention.dense             → model.layers.N.self_attn.o_proj
  h.N.post_attention_layernorm.{w,b}   → model.layers.N.post_attention_layernorm.{w,b}
  h.N.mlp.dense_h_to_4h                → model.layers.N.mlp.up_proj
  h.N.mlp.dense_4h_to_h                → model.layers.N.mlp.down_proj
  ln_f.{w,b}                           → model.norm.{w,b}
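A sketch of what that translation could look like, assuming `bloom_map_name` takes the raw HuggingFace name and returns the APR canonical name. The prefix-rewriting approach below is illustrative, not the actual 50-LOC implementation:

```rust
// Illustrative sketch of the BLOOM -> APR canonical name translation
// described in the table above.
fn bloom_map_name(name: &str) -> Option<String> {
    // Top-level (non-layer) tensors.
    match name {
        "word_embeddings.weight" => return Some("model.embed_tokens.weight".into()),
        "word_embeddings_layernorm.weight" => return Some("model.embed_norm.weight".into()),
        "word_embeddings_layernorm.bias" => return Some("model.embed_norm.bias".into()),
        "ln_f.weight" => return Some("model.norm.weight".into()),
        "ln_f.bias" => return Some("model.norm.bias".into()),
        _ => {}
    }
    // Per-layer tensors: "h.N.<suffix>" -> "model.layers.N.<mapped suffix>".
    let rest = name.strip_prefix("h.")?;
    let (layer, suffix) = rest.split_once('.')?;
    let mapped = match suffix {
        s if s.starts_with("self_attention.query_key_value") => {
            s.replacen("self_attention.query_key_value", "self_attn.qkv_proj", 1)
        }
        s if s.starts_with("self_attention.dense") => {
            s.replacen("self_attention.dense", "self_attn.o_proj", 1)
        }
        s if s.starts_with("mlp.dense_h_to_4h") => {
            s.replacen("mlp.dense_h_to_4h", "mlp.up_proj", 1)
        }
        s if s.starts_with("mlp.dense_4h_to_h") => {
            s.replacen("mlp.dense_4h_to_h", "mlp.down_proj", 1)
        }
        // input_layernorm / post_attention_layernorm keep their names.
        s => s.to_string(),
    };
    Some(format!("model.layers.{layer}.{mapped}"))
}
```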

Fused QKV is kept fused at this layer; splitting into separate q/k/v
tensors must happen at the conversion layer (BLOOM interleaves Q/K/V
per head, not concatenated like GPT-NeoX).
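To make the layout concrete, here is a hypothetical splitter for that per-head [q, k, v] row interleaving, using nested `Vec`s for clarity. The actual conversion-layer splitter is explicitly out of scope for this PR; a real converter would operate on raw tensor buffers:

```rust
// Hypothetical sketch: split a BLOOM fused QKV matrix, whose rows are
// grouped per head as [q_rows, k_rows, v_rows], into separate Q/K/V
// row blocks.
fn split_bloom_qkv(
    fused: &[Vec<f32>], // shape: [num_heads * 3 * head_dim][hidden]
    num_heads: usize,
    head_dim: usize,
) -> (Vec<Vec<f32>>, Vec<Vec<f32>>, Vec<Vec<f32>>) {
    let (mut q, mut k, mut v) = (Vec::new(), Vec::new(), Vec::new());
    for h in 0..num_heads {
        // Each head owns a contiguous block of 3 * head_dim rows.
        let base = h * 3 * head_dim;
        q.extend_from_slice(&fused[base..base + head_dim]);
        k.extend_from_slice(&fused[base + head_dim..base + 2 * head_dim]);
        v.extend_from_slice(&fused[base + 2 * head_dim..base + 3 * head_dim]);
    }
    (q, k, v)
}
```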

YAML at contracts/model-families/bloom.yaml covers BLOOM-560M and
BLOOM-7B1 size variants (shared 250880-token vocab).

Out of scope (separate tickets):
- ALiBi runtime inference support — `is_inference_verified()` returns
  false for BLOOM; the engine has no ALiBi position-bias code path
- Fused QKV splitter at conversion layer
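For context only (none of this lands in the PR): the standard ALiBi schedule from the ALiBi paper gives head i (0-based) the attention-score slope 2^(-8(i+1)/n) for a power-of-two head count n; the paper's interpolation for other head counts is omitted here:

```rust
// Background sketch, not part of this PR: per-head ALiBi slopes for a
// power-of-two head count. ALiBi adds slope * (key_pos - query_pos) to
// attention scores instead of using positional embeddings.
fn alibi_slopes(num_heads: usize) -> Vec<f64> {
    let base = 2f64.powf(-8.0 / num_heads as f64);
    (0..num_heads).map(|i| base.powi(i as i32 + 1)).collect()
}
```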

Verified:
- `pv validate contracts/model-families/bloom.yaml` → 0 errors
- All 3 falsifiers pass: FALSIFY-PARITY-002 (every YAML mapped),
  FALSIFY-MF-006 (no duplicate arch classes), FALSIFY-MF-011 (vocab
  consistency)
- All 13764 aprender-core --lib tests pass
@noahgift noahgift enabled auto-merge (squash) May 15, 2026 12:07
@noahgift noahgift merged commit 8cdbd90 into main May 16, 2026
10 checks passed
@noahgift noahgift deleted the fix/rebloom-1586-clean-v2 branch May 16, 2026 00:56

Development

Successfully merging this pull request may close these issues.

feat: add BLOOM (BloomForCausalLM) loader to aprender::rosetta
