docs(evidence): §61 — 5g.1 re-encode SUCCESS, 5g.2 honest dispatch surfaces H4 (PMAT-CODE-PRETRAIN-INIT-LOAD-003) #1600

Open
noahgift wants to merge 2 commits into main from docs/section-61-5g-1-re-encode-success-2026-05-10

Conversation

@noahgift
Contributor

TL;DR

  • 5g.1 re-encode SUCCESS: 1.24 B Python tokens, 0% unk, 7.42 bits entropy (was 99.99% unk / 0.001 bits in §60's broken corpus)
  • 5g.2 LIVE dispatch ABORTED: val_loss=11.55 at epoch 0 (> 10.0 threshold) → divergence guard fires
  • NEW defect surface H4: val_loss=19.80 at step 1 — worse than log₂(vocab) = 17.21 bits (sub-random predictions). The Qwen init loads, but something in the init pipeline is structurally broken

What worked: §60 data-bug FULLY CLOSED

| Metric | §60 broken corpus | §61 fixed corpus |
|---|---|---|
| Distinct tokens | 2 | 3324 |
| Shannon entropy | 0.001 bits | 7.415 bits |
| Unk ratio | 99.99% | 0.00% |
| Top tokens | `<unk>`, `</s>` | `Ġ`-prefix, `\n`, common Python bigrams |

PR #1598's encoder fix processes the full 3 GB Python corpus correctly: 1.24 B tokens / 405 K docs across 126 shards in ~5 min of wall time (vs. 17 hr in the §60 broken mode).
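For reference, the shard audit behind the table above reduces to a single frequency pass over token IDs. A minimal sketch, not the project's `apr tokenize` code; `audit_shard` and the `unk_id` argument are hypothetical stand-ins:

```rust
use std::collections::HashMap;

// Returns (distinct token count, Shannon entropy in bits, unk ratio)
// for a slice of already-decoded token IDs.
fn audit_shard(token_ids: &[u32], unk_id: u32) -> (usize, f64, f64) {
    let mut counts: HashMap<u32, usize> = HashMap::new();
    for &id in token_ids {
        *counts.entry(id).or_insert(0) += 1;
    }
    let n = token_ids.len() as f64;
    // Shannon entropy in bits over the empirical token distribution.
    let entropy_bits: f64 = counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum();
    let unk_ratio = *counts.get(&unk_id).unwrap_or(&0) as f64 / n;
    (counts.len(), entropy_bits, unk_ratio)
}

fn main() {
    // Toy input; the real audit ran on shard-0's first 32K token IDs.
    let ids = vec![5u32, 7, 7, 9, 5, 11, 7, 9];
    let (distinct, entropy, unk) = audit_shard(&ids, 0);
    println!("distinct={distinct} entropy={entropy:.3} bits unk={:.2}%", unk * 100.0);
}
```

On the §60 broken corpus this kind of pass collapses to 2 distinct tokens and near-zero entropy; on the §61 corpus it reports the 3324 / 7.415-bit / 0%-unk figures above.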

What broke (5g.2): val_loss > log₂(vocab)

Diagnostic 1-step run on Qwen 0.5B init + 5g.1-v2 corpus:

  • val_loss = 19.80 at step 1
  • log₂(vocab = 151643) = 17.21 bits (uniform-over-vocab baseline)
  • Industry baseline Qwen 0.5B zero-shot on Python: ~1.5–3.0

val_loss > log₂(vocab) means the model assigns the held-out tokens less than uniform probability — anti-aligned, worse than a random init.
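A minimal restatement of the two thresholds in play (illustrative only, not GATE-TRAIN-005's implementation). It assumes val_loss is on the same bits scale as the 17.21 baseline quoted above; if the trainer reports nats, the analogous uniform baseline is ln(151643) ≈ 11.93:

```rust
// Loss of a model that assigns 1/vocab probability to every token, in bits.
fn uniform_baseline_bits(vocab_size: usize) -> f64 {
    (vocab_size as f64).log2()
}

fn check_val_loss(val_loss: f64, vocab_size: usize) {
    let guard = 10.0_f64; // GATE-TRAIN-005 divergence threshold quoted above
    let uniform = uniform_baseline_bits(vocab_size);
    if val_loss > uniform {
        println!("{val_loss:.2} > {uniform:.2}: sub-uniform (anti-aligned) predictions");
    }
    if val_loss > guard {
        println!("{val_loss:.2} > {guard:.1}: divergence guard aborts the run");
    }
}

fn main() {
    check_val_loss(19.80, 151_643); // 1-step diagnostic
    check_val_loss(11.55, 151_643); // 500-step dispatch, epoch 0
}
```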

H4 candidate hypotheses

  • H4A (tied weights): Qwen 0.5B has tie_word_embeddings: true. If populate writes embed_tokens but lm_head is left at random init, the embeddings are correct while predictions are random (a minimal falsifier sketch follows this list)
  • H4B (layout): GGUF/APR is row-major (per tensor-layout-v1). If the init APR's lm_head is column-major, the matmul produces wrong logits (a toy stride illustration also follows this list)
  • H4C (norm scale): RMSNorm weights loaded but rms_norm_eps mismatch cascades through forward
  • H4D (residual stream): Some block's residual contributes zero from uninitialized buffer
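Of the four, H4A is the cheapest to falsify mechanically: with tied embeddings there is exactly one weight matrix to agree on. A minimal sketch, assuming a hypothetical in-memory map from tensor name to flat f32 data (not the actual APR loader API; the tensor names follow the usual Qwen2 convention):

```rust
use std::collections::HashMap;

/// Illustrative stand-in for a loaded checkpoint: tensor name -> flat f32 data.
type TensorMap = HashMap<String, Vec<f32>>;

/// With tie_word_embeddings: true, lm_head must either be absent (derived
/// from embed_tokens at forward time) or identical to embed_tokens.
fn check_tied_embeddings(tensors: &TensorMap, tie_word_embeddings: bool) -> Result<(), String> {
    if !tie_word_embeddings {
        return Ok(());
    }
    let embed = tensors
        .get("model.embed_tokens.weight")
        .ok_or("embed_tokens missing")?;
    match tensors.get("lm_head.weight") {
        None => Ok(()), // fine iff the forward pass reuses embed_tokens for logits
        Some(head) if head == embed => Ok(()),
        Some(_) => Err("lm_head present but differs from embed_tokens: H4A candidate".into()),
    }
}

fn main() {
    let mut t: TensorMap = HashMap::new();
    t.insert("model.embed_tokens.weight".into(), vec![0.1, 0.2]);
    t.insert("lm_head.weight".into(), vec![0.9, 0.9]); // deliberately mismatched
    assert!(check_tied_embeddings(&t, true).is_err()); // H4A would be flagged
}
```

And a toy illustration for H4B, independent of the real loader: the same row-major [out, in] buffer read with column-major strides yields different logits for the same input (shapes and values here are illustrative only):

```rust
fn logits_row_major(w: &[f32], x: &[f32], out: usize, inp: usize) -> Vec<f32> {
    (0..out)
        .map(|o| (0..inp).map(|i| w[o * inp + i] * x[i]).sum::<f32>())
        .collect()
}

fn logits_wrong_strides(w: &[f32], x: &[f32], out: usize, inp: usize) -> Vec<f32> {
    // Same buffer, indexed as if it were column-major: w[o + i * out].
    (0..out)
        .map(|o| (0..inp).map(|i| w[o + i * out] * x[i]).sum::<f32>())
        .collect()
}

fn main() {
    let (out, inp) = (3, 2);
    let w = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // row-major 3x2
    let x = vec![1.0, 1.0];
    assert_ne!(logits_row_major(&w, &x, out, inp), logits_wrong_strides(&w, &x, out, inp));
}
```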

Falsifier ledger update

| Falsifier | Pre-§61 | Post-§61 |
|---|---|---|
| FALSIFY-005 (val_loss < 9.38) | NUMERICALLY-PASSED-METHODOLOGY-SUSPECT (data-bug fake pass) | RED-WITH-METHODOLOGICALLY-HONEST (real defect on real corpus) |

The status flip from fake-pass to honest-RED is itself progress — the contract now reports the binding defect.

Five-Whys

  1. Why val_loss=19.80 at step 1? The industry baseline is 1.5–3.0; log₂(vocab) is 17.21 bits. 19.80 > 17.21 means anti-aligned predictions.
  2. Why anti-aligned despite PR #1579's fix (respect config.use_bias in attention constructor, PMAT-CODE-PRETRAIN-INIT-POPULATE-COVERAGE-001)? PR #1579 fixed Q/K/V bias allocation. H4 is a different gap — likely tied weights (H4A) or layout (H4B).
  3. Why four hypotheses in scope? Each is its own falsifier-discharge cascade per feedback_falsifier_first_cascade_pattern.md.
  4. Why ship diagnosis but not H4 fix? Multi-PR scope. The honest verdict + hypothesis decomposition unblocks the next session's bisection work.
  5. Why does this matter for ship %? The §60 data-bug cascade is FULLY CLOSED. The honest RED on real data is the path forward; the defect was previously masked by the fake pass.

SHIP-TWO impact

  • MODEL-1 ship %: unchanged at 91%
  • MODEL-2 ship %: unchanged at 57% — diagnosis correct, H4 cascade is the gate
  • §60 H1C cascade: FULLY CLOSED. Encoder works end-to-end on real Qwen vocab + real Python corpus.
  • 5g.1: SHIPPED (real corpus on disk: /mnt/.../codeparrot-python-permissive-shards-qwen-v2)
  • Closes: PMAT-CODE-PRETRAIN-FINETUNE-LIVE-003 (task #21)
  • Tracks: PMAT-CODE-PRETRAIN-INIT-LOAD-003 (H4 cascade) — next ship-mover

Test plan

  • LIVE 5g.1 re-encode: 1.24 B tokens, 0% unk
  • Entropy audit on shard-0 first 32K: 7.42 bits / 17.21 max
  • LIVE 5g.2 500-step dispatch: GATE-TRAIN-005 abort at val_loss=11.55
  • LIVE 5g.2 1-step diagnostic: val_loss=19.80 > log₂(vocab)
  • Documentation only (no Rust/contract changes in this PR)

Files

  • evidence/section-61-5g-1-re-encode-2026-05-10/README.md (NEW, full audit + H4 hypotheses)
  • evidence/section-61-5g-1-re-encode-2026-05-10/dispatch.txt (NEW, encode log)
  • evidence/section-61-5g-2-honest-2026-05-10/dispatch.txt (NEW, 5g.2 dispatch log)

🤖 Generated with Claude Code

docs(evidence): §61 — 5g.1 re-encode SUCCESS, 5g.2 honest dispatch surfaces H4 (PMAT-CODE-PRETRAIN-INIT-LOAD-003)

Records the full discharge of PMAT-CODE-PRETRAIN-FINETUNE-LIVE-003
(task #21) and the new H4 defect surface that the honest data
exposed.

Two artifacts:

1. **5g.1 re-encode SUCCESS** — `apr tokenize encode-corpus` with
   PR #1598's upfront vocab-format detection produced a real Python
   corpus from the 3.0 GB JSONL source:
     - 1,241.7 M tokens
     - 405,944 documents
     - 126 shards × 10 M tokens each
     - Shard-0 first 32K: entropy 7.42 bits / 17.21 max; 3324 distinct
       tokens; **0% unk** (was 99.99% unk in §60's broken corpus)
   The data-bug from §60 is fully closed.

2. **5g.2 LIVE dispatch surfaces H4** — Re-running fine-tune from
   Qwen 0.5B init on the now-real corpus aborted at GATE-TRAIN-005:
     - 500-step run: val_loss = 11.55 at epoch 0 (> 10.0 threshold)
     - 1-step diagnostic: val_loss = 19.80 (> log₂(vocab) = 17.21 bits)
   val_loss > log₂(vocab) means the model assigns LESS than uniform
   probability to true tokens — *worse than random init*. The Qwen
   init weights load (PR #1579's populate-coverage fix is in main)
   but produce sub-random predictions.

Five-Whys

1. Why was val_loss = 19.80 at step 1? Industry baseline for Qwen
   0.5B zero-shot on Python is ~1.5–3.0; uniform random over vocab
   is log₂(151643) = 17.21 bits. 19.80 > 17.21 means the model is
   *anti-aligned* with held-out tokens.
2. Why anti-aligned despite Qwen init being loaded? Some structural
   component of the init pipeline is broken at a layer that PR #1579
   doesn't cover.
3. Four hypotheses for H4:
     A. Tied weights — `tie_word_embeddings: true` on Qwen 0.5B; if
        populate writes embed_tokens but doesn't propagate to
        lm_head (or writes them separately to random buffers),
        forward predictions are random while embeddings are correct.
     B. Layout mismatch — GGUF/APR are row-major (tensor-layout-v1);
        if init APR's lm_head is column-major, matmul produces
        wrong logits.
     C. Norm scale — RMSNorm weights loaded but rms_norm_eps mismatch
        cascades through forward.
     D. Residual stream — some block's residual contributes zero from
        an uninitialized buffer.
4. Why ship the diagnosis but not the H4 fix? Each hypothesis is its
   own falsifier-discharge cascade per `feedback_falsifier_first_cascade_pattern.md`.
   Multi-PR scope.
5. Why does this matter for ship %? FALSIFY-005 status flips from
   NUMERICALLY-PASSED-METHODOLOGY-SUSPECT (pre-§61, fake pass on
   broken corpus) to RED-WITH-METHODOLOGICALLY-HONEST (post-§61,
   real defect on real corpus). The honest RED is itself progress
   — the contract now reports the binding defect.

SHIP-TWO impact

- MODEL-1 ship %: unchanged at 91% (this is MODEL-2 work)
- MODEL-2 ship %: unchanged at 57% — diagnosis correct, H4 cascade
  is the gate
- §60 H1C (data-bug) cascade: FULLY CLOSED. Encoder works
  end-to-end on real Qwen vocab + real Python corpus.

Closes PMAT-CODE-PRETRAIN-FINETUNE-LIVE-003 (task #21).

Tracking PMAT-CODE-PRETRAIN-INIT-LOAD-003 (H4 cascade) as the next
ship-mover.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift enabled auto-merge (squash) May 10, 2026 07:18