
feat(models): BERT encoder + cross-encoder scaffolding (refs #326)#1675

Merged
noahgift merged 30 commits into main from fix/326-bert-cross-encoder
May 16, 2026

Conversation

@noahgift
Contributor

Summary

Refs #326. Ships the sovereign-stack BERT encoder + cross-encoder scoring API — the foundation for trueno-rag's MRR uplift (0.952 → 0.97+) without needing an ONNX Runtime dependency.

What's in

`crates/aprender-core/src/models/bert/`:

  • `config.rs` — `BertConfig` with `bert-base-uncased` defaults + `minilm_l6()` preset
  • `embeddings.rs` — word + position + token_type → LayerNorm
  • `layer.rs` — single post-norm encoder layer (MHA + LN; FFN(GELU) + LN)
  • `encoder.rs` — N × BertLayer stack
  • `cross_encoder.rs` — embeddings + encoder + optional pooler + classifier + sigmoid

Re-exports: `aprender::models::{BertConfig, BertEncoder, CrossEncoder}`.
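Since the cross-encoder API takes pre-tokenized `input_ids` / `token_type_ids`, callers assemble the paired layout themselves. A minimal sketch of that assembly in plain Rust, assuming the bert-base-uncased special-token ids ([CLS]=101, [SEP]=102); the helper name and placeholder token ids are hypothetical:

```rust
// Build the [CLS] q [SEP] p [SEP] layout the cross-encoder expects.
// Segment ids: 0 for the query span (including [CLS] and the first [SEP]),
// 1 for the passage span (including the final [SEP]).
const CLS: u32 = 101; // bert-base-uncased vocab id for [CLS]
const SEP: u32 = 102; // bert-base-uncased vocab id for [SEP]

fn build_pair(query: &[u32], passage: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let mut input_ids = Vec::with_capacity(query.len() + passage.len() + 3);
    let mut token_type_ids = Vec::new();

    input_ids.push(CLS);
    input_ids.extend_from_slice(query);
    input_ids.push(SEP);
    token_type_ids.resize(input_ids.len(), 0); // query segment → type 0

    input_ids.extend_from_slice(passage);
    input_ids.push(SEP);
    token_type_ids.resize(input_ids.len(), 1); // passage segment → type 1

    (input_ids, token_type_ids)
}

fn main() {
    // Single-token query and passage → the 5-token layout from the tests.
    let (ids, types) = build_pair(&[7], &[9]);
    assert_eq!(ids, vec![101, 7, 102, 9, 102]);
    assert_eq!(types, vec![0, 0, 0, 1, 1]);
}
```

With matching lengths guaranteed by construction, this also sidesteps the mismatched input_ids/token_type_ids panic exercised in the tests.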

Why post-norm (not pre-norm)

The existing `nn::transformer::TransformerEncoderLayer` is pre-norm, but BERT uses post-norm, following the original "Attention Is All You Need" paper and the HuggingFace reference implementation, so a distinct layer type is required.
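The ordering difference, sketched on a toy 1-D residual block (helper names are hypothetical; real layers normalize over the hidden dim with learned scale/shift, and the sublayer stands in for MHA or the FFN):

```rust
// Toy illustration of the two residual orderings.
// Post-norm (BERT):  x = LN(x + sublayer(x))
// Pre-norm:          x = x + sublayer(LN(x))

fn layer_norm(x: &[f32]) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let eps = 1e-12_f32; // BERT's LayerNorm epsilon
    x.iter().map(|v| (v - mean) / (var + eps).sqrt()).collect()
}

fn sublayer(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| 0.5 * v).collect() // stand-in for MHA / FFN
}

fn post_norm(x: &[f32]) -> Vec<f32> {
    let s = sublayer(x);
    let sum: Vec<f32> = x.iter().zip(&s).map(|(a, b)| a + b).collect();
    layer_norm(&sum) // normalize AFTER the residual add
}

fn pre_norm(x: &[f32]) -> Vec<f32> {
    let s = sublayer(&layer_norm(x)); // normalize BEFORE the sublayer
    x.iter().zip(&s).map(|(a, b)| a + b).collect()
}

fn main() {
    let x = vec![1.0, 2.0, 3.0, 4.0];
    // The two orderings are not interchangeable: outputs differ.
    assert_ne!(post_norm(&x), pre_norm(&x));
    // Post-norm output is normalized: components sum to ≈ 0.
    assert!(post_norm(&x).iter().sum::<f32>().abs() < 1e-4);
}
```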

Built on existing primitives

  • `nn::transformer::MultiHeadAttention` (Q/K/V/O with bias)
  • `nn::LayerNorm` (eps=1e-12, BERT default)
  • `nn::Linear`
  • `nn::functional::gelu` (FFN activation — BERT uses GELU, not SiLU/SwiGLU)
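For reference, BERT's FFN activation can be sketched with the widely used tanh approximation of GELU (whether `nn::functional::gelu` uses this form or the exact erf form is an assumption here):

```rust
// Tanh approximation of GELU:
//   0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³)))
// The exact form uses erf; the two agree closely over typical ranges.
fn gelu(x: f32) -> f32 {
    let c = (2.0_f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

fn main() {
    assert_eq!(gelu(0.0), 0.0);                // GELU(0) = 0
    assert!((gelu(10.0) - 10.0).abs() < 1e-3); // ≈ identity for large x
    assert!(gelu(-10.0).abs() < 1e-3);         // ≈ 0 for large negative x
    assert!(gelu(1.0) > 0.8 && gelu(1.0) < 0.9); // GELU(1) ≈ 0.841
}
```

Unlike ReLU, GELU is smooth and slightly negative for small negative inputs, which is what BERT's pretraining expects.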

What's NOT in (separate tickets — see issue ACs)

  • HuggingFace numerical parity validation — requires PyTorch reference activations (cosine sim > 0.999 per AC). This PR ships correctness-of-shape; numerical correctness needs side-by-side comparison work.
  • Weight loading — `BertEmbeddings::new()` etc. construct zero-init scaffolding. Integration with `apr import --arch bert` to populate weights is follow-up.
  • WordPiece tokenizer integration — `WordPieceTokenizer` already exists in `text::tokenize` but the cross-encoder API takes pre-tokenized `input_ids` / `token_type_ids` rather than wrapping the tokenizer.
  • Batched scoring — single-pair API; batched reranking optimization is downstream.

Test plan

  • 14 new tests in `models::bert::*::tests` covering:
    • bert-base-uncased + MiniLM-L-6 dimensions
    • paired cross-encoder layout (`[CLS] q [SEP] p [SEP]`)
    • long sequences (128 tokens)
    • 1-label vs 2-label classifier heads
    • with-pooler vs without-pooler paths
    • mismatched input_ids/token_type_ids panic
  • All 13778 aprender-core --lib tests pass (was 13764 — net +14)
  • CI: workspace-test

🤖 Generated with Claude Code

noahgift and others added 3 commits May 14, 2026 16:49
Reflect v0.33.0 release shipped 2026-05-14 across user-facing docs:

- README.md
  - Replace SHIP-007 known-issue warning with "LIVE-DISCHARGED in v0.33.0"
    note + link to release. SHIP-007 GPU dispatch on the canonical 7B Q4K
    teacher now produces correct output by default.
  - Bump provable-contract count 1105 → 1134 (live find result)
  - Bump CLI command count 80 → 82 (apr --help live count)
  - Bump library example `aprender = "0.31"` → `"0.33"`
  - Bump migration table 0.31 → 0.33 (compute, train, serve, orchestrate)
  - All four FALSIFY-README-001..004 gates now PASS

- book/src/introduction.md
  - 70 crates → 80, 58 commands → 82, 405 contracts → 1134

- book/src/examples/cuda-backend.md
  - All three Cargo.toml snippet versions → 0.33

- CLAUDE.md (agent context)
  - 70 → 80 crates, 58 → 82 subcommands, 405 → 1134 contracts
  - Add v0.33.0 SHIPPED note in project overview

Verified: `bash scripts/check_readme_claims.sh` → 4/4 PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add §76 recording the v0.33.0 cascade publish + post-publish QA + GH
Release. Closes the loop on §75's "MODEL-1 100% in code" milestone:
MODEL-1 is now 100% in users' hands via `cargo install aprender`.

Highlights captured in §76:
- 24-crate topological cascade order (contracts-macros → core → gpu →
  compute → serve → train → apr-cli → aprender root)
- Two production blockers closed in flight:
  - PR #1670: `cc 1.2.59 → 1.2.62` lockfile bump for rustc 1.93.0
    compatibility (apple_sdk_name method drift)
  - `make publish` `.cargo/config.toml` backup race (mitigated by
    serialization; Makefile fix deferred to follow-up)
- /dogfood 12-gate audit verdict GO on the installed v0.33.0 binary
  (inference smoke: `apr run` "What is 2+2?" → "4" on 1.5B teacher)
- Methodology lesson #23 NEW: cargo publish re-resolves Cargo.lock
  during verify; use --locked or bump-before-cascade

§76 does NOT move MODEL-2 ship-% (stays at 57%, independent track).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the sovereign-stack BERT encoder + cross-encoder scoring API:

  crates/aprender-core/src/models/bert/
  ├── mod.rs            — public exports
  ├── config.rs         — BertConfig (bert-base + minilm_l6 presets)
  ├── embeddings.rs     — word + position + token_type → LayerNorm
  ├── layer.rs          — single post-norm encoder layer
  ├── encoder.rs        — N × BertLayer stack
  └── cross_encoder.rs  — embeddings + encoder + classifier head + sigmoid

Re-exported via `aprender::models::{BertConfig, BertEncoder, CrossEncoder}`.

Architecture:

  Input: [CLS] query [SEP] passage [SEP]
   ↓
  BertEmbeddings (word + position + token_type → LayerNorm + Dropout)
   ↓
  BertEncoder (N × BertLayer — post-norm: MHA + LN; FFN(GELU) + LN)
   ↓
  CLS pooling (extract [CLS] hidden state)
   ↓
  Optional pooler (Linear → tanh, BERT classification convention)
   ↓
  Linear classifier (hidden_dim → num_labels) → sigmoid
   ↓
  Relevance score ∈ [0, 1]
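The tail of that pipeline (pooler → classifier → sigmoid) in a toy, scalar-valued sketch; the real pooler is a hidden_dim → hidden_dim projection with learned weights, and all names and weight values below are hypothetical:

```rust
// Toy scoring head: pooler (Linear → tanh) then classifier (Linear → sigmoid).
// Fixed toy weights; a real CrossEncoder uses learned parameters.

fn dot(x: &[f32], w: &[f32]) -> f32 {
    x.iter().zip(w).map(|(a, b)| a * b).sum()
}

fn sigmoid(z: f32) -> f32 {
    1.0 / (1.0 + (-z).exp())
}

fn score(cls_hidden: &[f32]) -> f32 {
    // Pooler: Linear → tanh (collapsed to a 1-D toy projection here).
    let pooler_w = vec![0.3; cls_hidden.len()];
    let pooled = (dot(cls_hidden, &pooler_w) + 0.1).tanh();
    // Classifier: Linear (pooled → 1 logit), then sigmoid.
    let logit = 0.8 * pooled - 0.2;
    sigmoid(logit)
}

fn main() {
    let cls = vec![0.5, -0.2, 0.9, 0.1]; // stand-in [CLS] hidden state
    let s = score(&cls);
    assert!(s > 0.0 && s < 1.0); // relevance score lands in (0, 1)
}
```

The sigmoid at the end is why a 1-label head suffices for reranking: the single logit maps directly to a bounded relevance score.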

Uses existing nn primitives:
  - nn::transformer::MultiHeadAttention — Q/K/V/O with bias
  - nn::LayerNorm — eps=1e-12 (BERT default)
  - nn::Linear
  - nn::functional::gelu (post-FFN activation)

Distinct from nn::transformer::TransformerEncoderLayer (pre-norm); BERT
uses post-norm per the original "Attention Is All You Need" paper.

14 new tests cover dimension correctness across:
  - bert-base-uncased dims (12 layers / 12 heads / 768 hidden)
  - MiniLM-L-6 dims (6 layers / 12 heads / 384 hidden)
  - paired cross-encoder input (5-token [CLS] q [SEP] p [SEP])
  - long sequences (128 tokens)
  - 1-label and 2-label classifier heads
  - with-pooler and without-pooler paths

All 13778 aprender-core --lib tests pass (was 13764; +14 BERT tests).

Out of scope (separate tickets):
- HuggingFace-parity numerical validation (requires reference activations
  from PyTorch transformers — see issue ACs)
- Weight loading from .apr / SafeTensors (the scaffolding constructs with
  zero-init weights; integration with apr import is follow-up work)
- Batched scoring optimization (current API is single-pair)
- WordPiece tokenizer training/loading from BERT vocab.txt
- Training / fine-tuning (inference only per issue)
@noahgift noahgift enabled auto-merge (squash) May 14, 2026 16:08
noahgift added a commit that referenced this pull request May 15, 2026
…CI noise floor (#1692)

`brick::brick_tests::tests::rmsnorm_brick_runs` started failing with
`budget exceeded` on contended self-hosted runners — observed across at
least 5 concurrent PRs (#1688, #1689, #1685, #1683, #1675) on
2026-05-15, including the ETXTBSY retry-window fix PR which has
zero relation to brick code.

A 4-element RmsNorm is microseconds of real compute. The previous 1ms
budget (already labelled "more lenient" by a prior bump) sits inside
the noise floor when the runner is concurrently building 10+ cargo
workspaces sharing target dirs + L3 — wall time spikes 50–100×.

100ms preserves the guardrail (catches genuine 100s-of-ms perf
regressions) without crossing the contention noise floor.

Per Toyota Way memory rule: flakes are real defects, not items to mark
`#[ignore]` or rerun-until-green.

Verified: `cargo test -p aprender-serve --lib rmsnorm_brick_runs` → ok.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 63a6562 into main May 16, 2026
10 checks passed
@noahgift noahgift deleted the fix/326-bert-cross-encoder branch May 16, 2026 04:06