feat(models): BERT encoder + cross-encoder scaffolding (refs #326)#1675
Merged
Reflect the v0.33.0 release (shipped 2026-05-14) across user-facing docs:
- README.md
- Replace SHIP-007 known-issue warning with "LIVE-DISCHARGED in v0.33.0"
note + link to release. SHIP-007 GPU dispatch on the canonical 7B Q4K
teacher now produces correct output by default.
- Bump provable-contract count 1105 → 1134 (live find result)
- Bump CLI command count 80 → 82 (apr --help live count)
- Bump library example `aprender = "0.31"` → `"0.33"`
- Bump migration table 0.31 → 0.33 (compute, train, serve, orchestrate)
- All four FALSIFY-README-001..004 gates now PASS
- book/src/introduction.md
- 70 crates → 80, 58 commands → 82, 405 contracts → 1134
- book/src/examples/cuda-backend.md
- All three Cargo.toml snippet versions → 0.33
- CLAUDE.md (agent context)
- 70 → 80 crates, 58 → 82 subcommands, 405 → 1134 contracts
- Add v0.33.0 SHIPPED note in project overview
Verified: `bash scripts/check_readme_claims.sh` → 4/4 PASS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add §76 recording the v0.33.0 cascade publish + post-publish QA + GH Release.
Closes the loop on §75's "MODEL-1 100% in code" milestone: MODEL-1 is now
100% in users' hands via `cargo install aprender`.

Highlights captured in §76:
- 24-crate topological cascade order (contracts-macros → core → gpu →
  compute → serve → train → apr-cli → aprender root)
- Two production blockers closed in flight:
  - PR #1670: `cc 1.2.59 → 1.2.62` lockfile bump for rustc 1.93.0
    compatibility (apple_sdk_name method drift)
  - `make publish` `.cargo/config.toml` backup race (mitigated by
    serialization; Makefile fix deferred to follow-up)
- /dogfood 12-gate audit verdict GO on the installed v0.33.0 binary
  (inference smoke: `apr run` "What is 2+2?" → "4" on 1.5B teacher)
- Methodology lesson #23 NEW: cargo publish re-resolves Cargo.lock during
  verify; use --locked or bump-before-cascade

§76 does NOT move MODEL-2 ship-% (stays at 57%, independent track).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the sovereign-stack BERT encoder + cross-encoder scoring API:
crates/aprender-core/src/models/bert/
├── mod.rs — public exports
├── config.rs — BertConfig (bert-base + minilm_l6 presets)
├── embeddings.rs — word + position + token_type → LayerNorm
├── layer.rs — single post-norm encoder layer
├── encoder.rs — N × BertLayer stack
└── cross_encoder.rs — embeddings + encoder + classifier head + sigmoid
Re-exported via `aprender::models::{BertConfig, BertEncoder, CrossEncoder}`.
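The two presets can be sketched as plain Rust. This is an illustrative stand-in, not the crate's real `config.rs`; the field and constructor names here are assumptions, only the dimensions (12/12/768 for bert-base, 6/12/384 for MiniLM-L-6, eps=1e-12) come from this PR:

```rust
// Illustrative sketch of the two BertConfig presets described above.
// Field and constructor names are assumptions, not the crate's real API.
#[derive(Debug, Clone, PartialEq)]
struct BertConfig {
    num_layers: usize,
    num_heads: usize,
    hidden_dim: usize,
    layer_norm_eps: f64,
}

impl BertConfig {
    // bert-base-uncased dims: 12 layers / 12 heads / 768 hidden
    fn bert_base() -> Self {
        Self { num_layers: 12, num_heads: 12, hidden_dim: 768, layer_norm_eps: 1e-12 }
    }
    // MiniLM-L-6 dims: 6 layers / 12 heads / 384 hidden
    fn minilm_l6() -> Self {
        Self { num_layers: 6, num_heads: 12, hidden_dim: 384, layer_norm_eps: 1e-12 }
    }
    // Per-head dim must divide evenly, as in standard multi-head attention.
    fn head_dim(&self) -> usize {
        assert!(self.hidden_dim % self.num_heads == 0);
        self.hidden_dim / self.num_heads
    }
}

fn main() {
    let base = BertConfig::bert_base();
    let mini = BertConfig::minilm_l6();
    println!("bert-base head_dim = {}", base.head_dim()); // 768 / 12 = 64
    println!("minilm-l6 head_dim = {}", mini.head_dim()); // 384 / 12 = 32
}
```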
Architecture:
Input: [CLS] query [SEP] passage [SEP]
↓
BertEmbeddings (word + position + token_type → LayerNorm + Dropout)
↓
BertEncoder (N × BertLayer — post-norm: MHA + LN; FFN(GELU) + LN)
↓
CLS pooling (extract [CLS] hidden state)
↓
Optional pooler (Linear → tanh, BERT classification convention)
↓
Linear classifier (hidden_dim → num_labels) → sigmoid
↓
Relevance score ∈ [0, 1]
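The tail of the pipeline above (optional pooler → classifier → sigmoid) can be sketched numerically in plain Rust. The toy 2-dim hidden size and function names are illustrative, not the crate's `CrossEncoder` API:

```rust
// Minimal numeric sketch of the scoring head: pooler (Linear -> tanh),
// then classifier (Linear to 1 label), then sigmoid to [0, 1].

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// Linear layer: y = W x + b, with row-major weights.
fn linear(w: &[Vec<f32>], b: &[f32], x: &[f32]) -> Vec<f32> {
    w.iter()
        .zip(b)
        .map(|(row, bi)| row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f32>() + bi)
        .collect()
}

fn score(
    cls_hidden: &[f32],
    pooler_w: &[Vec<f32>], pooler_b: &[f32],
    clf_w: &[Vec<f32>], clf_b: &[f32],
) -> f32 {
    // Optional pooler: Linear -> tanh (BERT classification convention).
    let pooled: Vec<f32> = linear(pooler_w, pooler_b, cls_hidden)
        .iter()
        .map(|v| v.tanh())
        .collect();
    // Classifier head to a single label, then sigmoid.
    sigmoid(linear(clf_w, clf_b, &pooled)[0])
}

fn main() {
    // Toy 2-dim vector standing in for the [CLS] hidden state.
    let cls = vec![0.5, -0.25];
    let pooler_w = vec![vec![1.0, 0.0], vec![0.0, 1.0]]; // identity pooler
    let pooler_b = vec![0.0, 0.0];
    let clf_w = vec![vec![1.0, 1.0]];
    let clf_b = vec![0.0];
    let s = score(&cls, &pooler_w, &pooler_b, &clf_w, &clf_b);
    assert!(s > 0.0 && s < 1.0); // always a valid relevance score
    println!("relevance score = {s:.4}");
}
```

One consequence worth noting: with the zero-init weights this scaffolding ships with, the logit is 0 and every pair scores sigmoid(0) = 0.5, which is why weight loading is listed as follow-up work.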
Uses existing nn primitives:
- nn::transformer::MultiHeadAttention — Q/K/V/O with bias
- nn::LayerNorm — eps=1e-12 (BERT default)
- nn::Linear
- nn::functional::gelu (post-FFN activation)
Distinct from nn::transformer::TransformerEncoderLayer (pre-norm); BERT
uses post-norm per the original "Attention Is All You Need" paper.
14 new tests cover dimension correctness across:
- bert-base-uncased dims (12 layers / 12 heads / 768 hidden)
- MiniLM-L-6 dims (6 layers / 12 heads / 384 hidden)
- paired cross-encoder input (5-token [CLS] q [SEP] p [SEP])
- long sequences (128 tokens)
- 1-label and 2-label classifier heads
- with-pooler and without-pooler paths
All 13778 aprender-core --lib tests pass (was 13764; +14 BERT tests).
Out of scope (separate tickets):
- HuggingFace-parity numerical validation (requires reference activations
from PyTorch transformers — see issue ACs)
- Weight loading from .apr / SafeTensors (the scaffolding constructs with
zero-init weights; integration with apr import is follow-up work)
- Batched scoring optimization (current API is single-pair)
- WordPiece tokenizer training/loading from BERT vocab.txt
- Training / fine-tuning (inference only per issue)
noahgift added a commit that referenced this pull request on May 15, 2026:
…CI noise floor (#1692)

`brick::brick_tests::tests::rmsnorm_brick_runs` started failing with
`budget exceeded` on contended self-hosted runners — observed across at
least 5 concurrent PRs (#1688, #1689, #1685, #1683, #1675) on 2026-05-15,
including the ETXTBSY retry-window fix PR, which has zero relation to brick
code.

A 4-element RmsNorm is microseconds of real compute. The previous 1ms
budget (already labelled "more lenient" by a prior bump) sits inside the
noise floor when the runner is concurrently building 10+ cargo workspaces
sharing target dirs + L3 — wall time spikes 50–100×. 100ms preserves the
guardrail (catches genuine 100s-of-ms perf regressions) without crossing
the contention noise floor.

Per Toyota Way memory rule: flakes are real defects, not items to mark
`#[ignore]` or rerun-until-green.

Verified: `cargo test -p aprender-serve --lib rmsnorm_brick_runs` → ok.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
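The budget guardrail that commit tunes can be sketched as follows. This is a hypothetical pattern, not the actual brick test; the point is that the budget bounds wall time, so it must sit above the runner's contention noise floor:

```rust
use std::time::{Duration, Instant};

// Hypothetical wall-time guardrail in the style described above: run the
// work, then fail only if elapsed time exceeds the budget.
fn run_with_budget(budget: Duration, work: impl FnOnce()) -> Result<Duration, Duration> {
    let start = Instant::now();
    work();
    let elapsed = start.elapsed();
    if elapsed <= budget { Ok(elapsed) } else { Err(elapsed) }
}

fn main() {
    // Microseconds of real compute (a toy 4-element RmsNorm) against a
    // 100ms budget: passes with enormous headroom even on a loaded box,
    // while a genuine 100s-of-ms regression would still trip it.
    let result = run_with_budget(Duration::from_millis(100), || {
        let v = [1.0f32, 2.0, 3.0, 4.0];
        let rms = (v.iter().map(|x| x * x).sum::<f32>() / v.len() as f32).sqrt();
        let _normed: Vec<f32> = v.iter().map(|x| x / rms).collect();
    });
    assert!(result.is_ok());
    println!("within budget: {result:?}");
}
```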
Summary
Refs #326. Ships the sovereign-stack BERT encoder + cross-encoder scoring API — the foundation for trueno-rag's MRR uplift (0.952 → 0.97+) without needing an ONNX Runtime dependency.
What's in
`crates/aprender-core/src/models/bert/`:
Re-exports: `aprender::models::{BertConfig, BertEncoder, CrossEncoder}`.
Why post-norm (not pre-norm)
The existing `nn::transformer::TransformerEncoderLayer` is pre-norm, but BERT uses post-norm per the original "Attention Is All You Need" paper and the HuggingFace reference, so a distinct layer type is required.
Built on existing primitives
What's NOT in (separate tickets — see issue ACs)
Test plan
🤖 Generated with Claude Code