
Unit 9 (customization-trilemma): split criterion 2 (completes MEDIUM batch)#168

Merged
LuminLynx merged 2 commits into main from claude/unit-09-rubric-split
May 21, 2026


@LuminLynx
Owner

Summary

Unit 9 (customization-trilemma), last of the MEDIUM batch. Faithful preserve-by-default split, gate-exempt.

Rubric (3 → 4): c1 unchanged · c2 = name the failure mode · c3 (NEW) = name/explain the mechanism · c4 = regime distinction (was c3).

Decomposition

  • Every old-c2=T pair → c2=T and c3=T.
  • All five old-c2=F pairs (p007, p008, p009, p011, p021) describe correct mappings without naming a mismatch failure mode at all → c2=F, c3=F. No c2=T/c3=F differential pairs, consistent with the audit's note that Unit 9's c2 mechanism is tightly fused to the failure statement.
  • c4 (regime) carries old-c3 unchanged. No realignments. p014 flagged-expected preserved. p007/p008/p011/p021 labels updated.
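The decomposition rule above is mechanical, so it can be sketched as a pure function over each pair's old expected values. This is a minimal illustration under stated assumptions: the `OldLabels`/`NewLabels` tuples, `faithful_split`, and `count_departures` are hypothetical names, not code from this repository. Any pair that departs from the mechanical split, whether a realignment or a c2=T/c3=F differential, would show up as a non-zero count.

```python
from typing import NamedTuple


class OldLabels(NamedTuple):
    """Hypothetical 3-criterion expected values (pre-split)."""
    c1: bool
    c2: bool  # bundled: names the failure mode AND the mechanism
    c3: bool  # regime distinction


class NewLabels(NamedTuple):
    """Hypothetical 4-criterion expected values (post-split)."""
    c1: bool  # unchanged
    c2: bool  # names the failure mode
    c3: bool  # NEW: names/explains the mechanism
    c4: bool  # regime distinction (was c3)


def faithful_split(old: OldLabels) -> NewLabels:
    """Preserve-by-default: old c2 is copied into both new c2 and new c3."""
    return NewLabels(c1=old.c1, c2=old.c2, c3=old.c2, c4=old.c3)


def count_departures(old_set: dict[str, OldLabels],
                     new_set: dict[str, NewLabels]) -> int:
    """Count pairs whose new labels differ from the mechanical split."""
    return sum(1 for pid, old in old_set.items()
               if new_set[pid] != faithful_split(old))
```

With zero departures, the new regression set is exactly the old one re-expressed in the 4-criterion shape, which is what makes the split a candidate for the gate exemption.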

Post-split distribution (21 pairs)

6 × 4-of-4 · 1 × 3-of-4 (p006) · 5 × 2-of-4 (p007/p008/p010/p011/p021) · 1 × 1-of-4 (p009) · 7 all-missed · 1 flagged-expected (p014).
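As a sanity check on the tally above, here is a minimal sketch of how the buckets could be computed from each pair's expected values. `bucket`, `distribution`, and the (expected tuple, flagged) input shape are hypothetical, not the repository's `run_regression_set`. The only data used are the counts reported in this PR, which sum to 21 (6 + 1 + 5 + 1 + 7 + 1).

```python
from collections import Counter
from typing import Iterable


def bucket(expected: tuple[bool, ...], flagged: bool = False) -> str:
    """Place one regression pair into a distribution bucket."""
    if flagged:
        # Assumption: flagged-expected pairs are reported separately,
        # regardless of their criterion values (as p014 is above).
        return "flagged-expected"
    met = sum(expected)  # True counts as 1
    if met == 0:
        return "all-missed"
    return f"{met}-of-{len(expected)}"


def distribution(pairs: Iterable[tuple[tuple[bool, ...], bool]]) -> Counter:
    """Tally (expected, flagged) pairs into buckets."""
    return Counter(bucket(expected, flagged) for expected, flagged in pairs)


# Bucket counts as reported for Unit 9; together they must cover all 21 pairs.
REPORTED = {
    "4-of-4": 6,
    "3-of-4": 1,
    "2-of-4": 5,
    "1-of-4": 1,
    "all-missed": 7,
    "flagged-expected": 1,
}
assert sum(REPORTED.values()) == 21
```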

Local validation

  • lint_unit_markdown / ingest_units --check — clean
  • run_regression_set --check — 21 pairs valid (p014 flagged=true preserved)
  • pytest — 20/20

Milestone

This completes the entire c2-split rollout:

  • HIGH: 10, 11, 12 kept · 13 reverted
  • MEDIUM: 2, 3, 4, 5, 6, 7, 8, 9 — all split
  • Plus: the grader/gold-standard principle, the gate exemption, and the cleanup of early grader-chasing realignments.

Remaining future work: the deferred c1/c3/c4 follow-up audit (the grader-lenient-on-unsplit-AND-clauses pattern, now well-evidenced across units).

Ready to merge (gate-exempt).


Generated by Claude Code

… batch

Per docs/RUBRIC_AUDIT.md (MEDIUM): old c2 bundled 'names a concrete
failure mode' with 'names the mechanism.' Splits into
name-the-failure-mode c2 and a new c3 (mechanism); renumbers regime
distinction to position 4. Rubric grows 3 -> 4.

Preserve-by-default: faithful decomposition of locked Opus values
(old-c2=T -> c2=T,c3=T). All five old-c2=F pairs (p007/p008/p009/
p011/p021) describe correct mappings without naming a MISMATCH
failure mode at all -> c2=F,c3=F. No c2=T/c3=F differential pairs
(Unit 9's c2 mechanism is tightly fused to the failure statement,
per the audit). No realignments. c4 (regime) carries old-c3
unchanged. p014 flagged-expected preserved. Updated p007/p008/p011/
p021 labels for the 4-criterion shape.

Gate-exempt (faithful split, zero realignments per
docs/REGRESSION_GATE.md). Completes the MEDIUM-severity c2-split
batch (Units 2-9). Local lint, schema check, ingest-check, pytest
all pass.

Same Codex-flagged stale-label pattern as Unit 8: p006 'c1+c2 met,
c3 missing' -> 'c1+c2+c3 met, c4 missing' (T,T,T,F); p010 'c2 only'
-> 'c2+c3 met, c1/c4 missing' (F,T,T,F); 'All three met' -> 'All four
met'. Expected values unchanged; labels caught before #168 merges.
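The stale labels arise because the human-readable label is written by hand next to expected values that the split changed. One way to avoid this, sketched under assumptions, is to derive the label from the expected tuple. `describe_expected` is a hypothetical helper, not code from this repository; its wording is inferred from the labels quoted in this commit ("c1+c2+c3 met, c4 missing", "All four met"), and some existing labels (e.g. p010's old 'c2 only') use different phrasing.

```python
# Hypothetical helper: wording inferred from the labels quoted in this PR,
# not taken from the repository's code.
NUMBER_WORDS = {2: "two", 3: "three", 4: "four", 5: "five"}


def describe_expected(expected: tuple[bool, ...]) -> str:
    """Derive a human-readable label from a pair's expected criterion values."""
    met = [f"c{i}" for i, value in enumerate(expected, start=1) if value]
    missing = [f"c{i}" for i, value in enumerate(expected, start=1) if not value]
    if not missing:
        # e.g. "All four met" for a 4-criterion rubric
        n = len(expected)
        return f"All {NUMBER_WORDS.get(n, str(n))} met"
    if not met:
        return "None met"  # hypothetical wording; the PR does not quote one
    return f"{'+'.join(met)} met, {'/'.join(missing)} missing"
```

For example, `describe_expected((True, True, True, False))` yields `'c1+c2+c3 met, c4 missing'`, the corrected p006 label. Because the label is computed from the values, growing the rubric from 3 to 4 criteria cannot leave it stale.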
@LuminLynx LuminLynx merged commit 57a9a65 into main May 21, 2026
2 checks passed
LuminLynx pushed a commit that referenced this pull request May 21, 2026
The first draft claimed units 8 and 9 still carried a bundled c2. That was
read off a stale working tree from before a git pull — PRs #167 and #168
had already split both (rubric + regression sets to 4 criteria) and merged.
Remove the false "incomplete sweep" section, fix the numbering note (only
tokenization and the reverted multimodal remain 3-criterion), and correct
the regime-criterion index for units 8/9 from c3 to c4.

The c1 and regime-criterion analysis for 8/9 is unchanged — the split left
their c1 text alone and only renumbered the regime criterion.

https://claude.ai/code/session_019xEvNkByf5ic4kbMZFdKDR
