
feat(models): BERT encoder + cross-encoder scaffolding (refs #326)#1675

Merged
noahgift merged 30 commits into main from fix/326-bert-cross-encoder
May 16, 2026

Conversation

@noahgift
Contributor

Summary

Refs #326. Ships the sovereign-stack BERT encoder + cross-encoder scoring API — the foundation for trueno-rag's MRR uplift (0.952 → 0.97+) without needing an ONNX Runtime dependency.

What's in

`crates/aprender-core/src/models/bert/`:

  • `config.rs` — `BertConfig` with `bert-base-uncased` defaults + `minilm_l6()` preset
  • `embeddings.rs` — word + position + token_type → LayerNorm
  • `layer.rs` — single post-norm encoder layer (MHA + LN; FFN(GELU) + LN)
  • `encoder.rs` — N × BertLayer stack
  • `cross_encoder.rs` — embeddings + encoder + optional pooler + classifier + sigmoid

Re-exports: `aprender::models::{BertConfig, BertEncoder, CrossEncoder}`.
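Since the cross-encoder API takes pre-tokenized `input_ids` / `token_type_ids`, callers assemble the paired layout themselves. A minimal sketch of that assembly in plain Rust, assuming the bert-base-uncased special-token ids ([CLS]=101, [SEP]=102); the helper name and placeholder token ids are hypothetical:

```rust
// Build the [CLS] q [SEP] p [SEP] layout the cross-encoder expects.
// Segment ids: 0 for the query span (including [CLS] and the first [SEP]),
// 1 for the passage span (including the final [SEP]).
const CLS: u32 = 101; // bert-base-uncased vocab id for [CLS]
const SEP: u32 = 102; // bert-base-uncased vocab id for [SEP]

fn build_pair(query: &[u32], passage: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let mut input_ids = Vec::with_capacity(query.len() + passage.len() + 3);
    let mut token_type_ids = Vec::new();

    input_ids.push(CLS);
    input_ids.extend_from_slice(query);
    input_ids.push(SEP);
    token_type_ids.resize(input_ids.len(), 0); // query segment → type 0

    input_ids.extend_from_slice(passage);
    input_ids.push(SEP);
    token_type_ids.resize(input_ids.len(), 1); // passage segment → type 1

    (input_ids, token_type_ids)
}

fn main() {
    // Single-token query and passage → the 5-token layout from the tests.
    let (ids, types) = build_pair(&[7], &[9]);
    assert_eq!(ids, vec![101, 7, 102, 9, 102]);
    assert_eq!(types, vec![0, 0, 0, 1, 1]);
}
```

With matching lengths guaranteed by construction, this also sidesteps the mismatched input_ids/token_type_ids panic exercised in the tests.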

Why post-norm (not pre-norm)

The existing `nn::transformer::TransformerEncoderLayer` is pre-norm, but BERT uses post-norm, following the original "Attention Is All You Need" paper and the HuggingFace reference implementation, so a distinct layer type is required.
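The ordering difference, sketched on a toy 1-D residual block (helper names are hypothetical; real layers normalize over the hidden dim with learned scale/shift, and the sublayer stands in for MHA or the FFN):

```rust
// Toy illustration of the two residual orderings.
// Post-norm (BERT):  x = LN(x + sublayer(x))
// Pre-norm:          x = x + sublayer(LN(x))

fn layer_norm(x: &[f32]) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let eps = 1e-12_f32; // BERT's LayerNorm epsilon
    x.iter().map(|v| (v - mean) / (var + eps).sqrt()).collect()
}

fn sublayer(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| 0.5 * v).collect() // stand-in for MHA / FFN
}

fn post_norm(x: &[f32]) -> Vec<f32> {
    let s = sublayer(x);
    let sum: Vec<f32> = x.iter().zip(&s).map(|(a, b)| a + b).collect();
    layer_norm(&sum) // normalize AFTER the residual add
}

fn pre_norm(x: &[f32]) -> Vec<f32> {
    let s = sublayer(&layer_norm(x)); // normalize BEFORE the sublayer
    x.iter().zip(&s).map(|(a, b)| a + b).collect()
}

fn main() {
    let x = vec![1.0, 2.0, 3.0, 4.0];
    // The two orderings are not interchangeable: outputs differ.
    assert_ne!(post_norm(&x), pre_norm(&x));
    // Post-norm output is normalized: components sum to ≈ 0.
    assert!(post_norm(&x).iter().sum::<f32>().abs() < 1e-4);
}
```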

Built on existing primitives

  • `nn::transformer::MultiHeadAttention` (Q/K/V/O with bias)
  • `nn::LayerNorm` (eps=1e-12, BERT default)
  • `nn::Linear`
  • `nn::functional::gelu` (FFN activation — BERT uses GELU, not SiLU/SwiGLU)
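For reference, BERT's FFN activation can be sketched with the widely used tanh approximation of GELU (whether `nn::functional::gelu` uses this form or the exact erf form is an assumption here):

```rust
// Tanh approximation of GELU:
//   0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³)))
// The exact form uses erf; the two agree closely over typical ranges.
fn gelu(x: f32) -> f32 {
    let c = (2.0_f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

fn main() {
    assert_eq!(gelu(0.0), 0.0);                // GELU(0) = 0
    assert!((gelu(10.0) - 10.0).abs() < 1e-3); // ≈ identity for large x
    assert!(gelu(-10.0).abs() < 1e-3);         // ≈ 0 for large negative x
    assert!(gelu(1.0) > 0.8 && gelu(1.0) < 0.9); // GELU(1) ≈ 0.841
}
```

Unlike ReLU, GELU is smooth and slightly negative for small negative inputs, which is what BERT's pretraining expects.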

What's NOT in (separate tickets — see issue ACs)

  • HuggingFace numerical parity validation — requires PyTorch reference activations (cosine sim > 0.999 per AC). This PR ships correctness-of-shape; numerical correctness needs side-by-side comparison work.
  • Weight loading — `BertEmbeddings::new()` etc. construct zero-init scaffolding. Integration with `apr import --arch bert` to populate weights is follow-up.
  • WordPiece tokenizer integration — `WordPieceTokenizer` already exists in `text::tokenize` but the cross-encoder API takes pre-tokenized `input_ids` / `token_type_ids` rather than wrapping the tokenizer.
  • Batched scoring — single-pair API; batched reranking optimization is downstream.

Test plan

  • 14 new tests in `models::bert::*::tests` covering:
    • bert-base-uncased + MiniLM-L-6 dimensions
    • paired cross-encoder layout (`[CLS] q [SEP] p [SEP]`)
    • long sequences (128 tokens)
    • 1-label vs 2-label classifier heads
    • with-pooler vs without-pooler paths
    • mismatched input_ids/token_type_ids panic
  • All 13778 aprender-core --lib tests pass (was 13764 — net +14)
  • CI: workspace-test

🤖 Generated with Claude Code

noahgift and others added 3 commits May 14, 2026 16:49
Reflect v0.33.0 release shipped 2026-05-14 across user-facing docs:

- README.md
  - Replace SHIP-007 known-issue warning with "LIVE-DISCHARGED in v0.33.0"
    note + link to release. SHIP-007 GPU dispatch on the canonical 7B Q4K
    teacher now produces correct output by default.
  - Bump provable-contract count 1105 → 1134 (live find result)
  - Bump CLI command count 80 → 82 (apr --help live count)
  - Bump library example `aprender = "0.31"` → `"0.33"`
  - Bump migration table 0.31 → 0.33 (compute, train, serve, orchestrate)
  - All four FALSIFY-README-001..004 gates now PASS

- book/src/introduction.md
  - 70 crates → 80, 58 commands → 82, 405 contracts → 1134

- book/src/examples/cuda-backend.md
  - All three Cargo.toml snippet versions → 0.33

- CLAUDE.md (agent context)
  - 70 → 80 crates, 58 → 82 subcommands, 405 → 1134 contracts
  - Add v0.33.0 SHIPPED note in project overview

Verified: `bash scripts/check_readme_claims.sh` → 4/4 PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add §76 recording the v0.33.0 cascade publish + post-publish QA + GH
Release. Closes the loop on §75's "MODEL-1 100% in code" milestone:
MODEL-1 is now 100% in users' hands via `cargo install aprender`.

Highlights captured in §76:
- 24-crate topological cascade order (contracts-macros → core → gpu →
  compute → serve → train → apr-cli → aprender root)
- Two production blockers closed in flight:
  - PR #1670: `cc 1.2.59 → 1.2.62` lockfile bump for rustc 1.93.0
    compatibility (apple_sdk_name method drift)
  - `make publish` `.cargo/config.toml` backup race (mitigated by
    serialization; Makefile fix deferred to follow-up)
- /dogfood 12-gate audit verdict GO on the installed v0.33.0 binary
  (inference smoke: `apr run` "What is 2+2?" → "4" on 1.5B teacher)
- Methodology lesson #23 NEW: cargo publish re-resolves Cargo.lock
  during verify; use --locked or bump-before-cascade

§76 does NOT move MODEL-2 ship-% (stays at 57%, independent track).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the sovereign-stack BERT encoder + cross-encoder scoring API:

  crates/aprender-core/src/models/bert/
  ├── mod.rs            — public exports
  ├── config.rs         — BertConfig (bert-base + minilm_l6 presets)
  ├── embeddings.rs     — word + position + token_type → LayerNorm
  ├── layer.rs          — single post-norm encoder layer
  ├── encoder.rs        — N × BertLayer stack
  └── cross_encoder.rs  — embeddings + encoder + classifier head + sigmoid

Re-exported via `aprender::models::{BertConfig, BertEncoder, CrossEncoder}`.

Architecture:

  Input: [CLS] query [SEP] passage [SEP]
   ↓
  BertEmbeddings (word + position + token_type → LayerNorm + Dropout)
   ↓
  BertEncoder (N × BertLayer — post-norm: MHA + LN; FFN(GELU) + LN)
   ↓
  CLS pooling (extract [CLS] hidden state)
   ↓
  Optional pooler (Linear → tanh, BERT classification convention)
   ↓
  Linear classifier (hidden_dim → num_labels) → sigmoid
   ↓
  Relevance score ∈ [0, 1]
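The tail of that pipeline (pooler → classifier → sigmoid) in a toy, scalar-valued sketch; the real pooler is a hidden_dim → hidden_dim projection with learned weights, and all names and weight values below are hypothetical:

```rust
// Toy scoring head: pooler (Linear → tanh) then classifier (Linear → sigmoid).
// Fixed toy weights; a real CrossEncoder uses learned parameters.

fn dot(x: &[f32], w: &[f32]) -> f32 {
    x.iter().zip(w).map(|(a, b)| a * b).sum()
}

fn sigmoid(z: f32) -> f32 {
    1.0 / (1.0 + (-z).exp())
}

fn score(cls_hidden: &[f32]) -> f32 {
    // Pooler: Linear → tanh (collapsed to a 1-D toy projection here).
    let pooler_w = vec![0.3; cls_hidden.len()];
    let pooled = (dot(cls_hidden, &pooler_w) + 0.1).tanh();
    // Classifier: Linear (pooled → 1 logit), then sigmoid.
    let logit = 0.8 * pooled - 0.2;
    sigmoid(logit)
}

fn main() {
    let cls = vec![0.5, -0.2, 0.9, 0.1]; // stand-in [CLS] hidden state
    let s = score(&cls);
    assert!(s > 0.0 && s < 1.0); // relevance score lands in (0, 1)
}
```

The sigmoid at the end is why a 1-label head suffices for reranking: the single logit maps directly to a bounded relevance score.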

Uses existing nn primitives:
  - nn::transformer::MultiHeadAttention — Q/K/V/O with bias
  - nn::LayerNorm — eps=1e-12 (BERT default)
  - nn::Linear
  - nn::functional::gelu (post-FFN activation)

Distinct from nn::transformer::TransformerEncoderLayer (pre-norm); BERT
uses post-norm per the original "Attention Is All You Need" paper.

14 new tests cover dimension correctness across:
  - bert-base-uncased dims (12 layers / 12 heads / 768 hidden)
  - MiniLM-L-6 dims (6 layers / 12 heads / 384 hidden)
  - paired cross-encoder input (5-token [CLS] q [SEP] p [SEP])
  - long sequences (128 tokens)
  - 1-label and 2-label classifier heads
  - with-pooler and without-pooler paths

All 13778 aprender-core --lib tests pass (was 13764; +14 BERT tests).

Out of scope (separate tickets):
- HuggingFace-parity numerical validation (requires reference activations
  from PyTorch transformers — see issue ACs)
- Weight loading from .apr / SafeTensors (the scaffolding constructs with
  zero-init weights; integration with apr import is follow-up work)
- Batched scoring optimization (current API is single-pair)
- WordPiece tokenizer training/loading from BERT vocab.txt
- Training / fine-tuning (inference only per issue)
@noahgift noahgift enabled auto-merge (squash) May 14, 2026 16:08
noahgift added a commit that referenced this pull request May 15, 2026
…CI noise floor (#1692)

`brick::brick_tests::tests::rmsnorm_brick_runs` started failing with
`budget exceeded` on contended self-hosted runners — observed across at
least 5 concurrent PRs (#1688, #1689, #1685, #1683, #1675) on
2026-05-15, including the ETXTBSY retry-window fix PR which has
zero relation to brick code.

A 4-element RmsNorm is microseconds of real compute. The previous 1ms
budget (already labelled "more lenient" by a prior bump) sits inside
the noise floor when the runner is concurrently building 10+ cargo
workspaces sharing target dirs + L3 — wall time spikes 50–100×.

100ms preserves the guardrail (catches genuine 100s-of-ms perf
regressions) without crossing the contention noise floor.

Per Toyota Way memory rule: flakes are real defects, not items to mark
`#[ignore]` or rerun-until-green.

Verified: `cargo test -p aprender-serve --lib rmsnorm_brick_runs` → ok.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 63a6562 into main May 16, 2026
10 checks passed
@noahgift noahgift deleted the fix/326-bert-cross-encoder branch May 16, 2026 04:06