Merged
113 changes: 97 additions & 16 deletions content/regression-sets/hallucination-bundle-0.yml
@@ -32,32 +32,49 @@
# (4% audit rate, 500/day, 8000-employee target,
# ~50 errors/day at full launch).
#
# Position-to-criterion mapping (from
# content/units/hallucination-bundle-0.md slot 8 rubric):
# RUBRIC SPLIT (2026-05-21, per docs/RUBRIC_AUDIT.md, MEDIUM
# batch): criterion 2 split into c2 (name the failure mode) and
# a new c3 (explain the mechanism); old c3 (regime) renumbered
# to position 4. Rubric grows 3 -> 4. Applied PRESERVE-BY-DEFAULT
# (docs/REGRESSION_GATE.md): faithful decomposition of the locked
# Opus values — every old-c2=T pair inherits c2=T AND c3=T. Of the
# old-c2=F pairs: p007 NAMES a failure mode (mitigation-only) without
# the volume-compound mechanism, so c2=T / c3=F (the lone differential,
# per its label); p009/p011 name no failure mode, so c2=F / c3=F. No
# realignments. c4 (regime) carries old-c3 unchanged. Gate-exempt
# (faithful split, zero realignments). Sonnet disagreements would be
# documented calibration gaps, not edits to the gold standard.
#
# Position-to-criterion mapping (after 2026-05-21 c2 split):
#
# 1. "Names hallucination as a structural base-rate
# problem (not a bug to be eliminated) AND treats
# reliability as a multi-axis design discipline
# (detection vs mitigation vs containment) anchored
# to the feature's actual cost-of-failure — not as
# an 'improve the model' exercise."
# 2. "Identifies a concrete failure mode of single-
# axis hallucination management AND explains the
# mechanism — e.g., mitigation-only collapses at
# scale because a 'small' rate compounds with
# volume; detection-only without mitigation is just
# measuring the bleed; containment-only without
# detection means failures go uncounted."
# 3. "Distinguishes which approach is load-bearing in
# 2. "Identifies a concrete failure mode of single-axis
# hallucination management — mitigation-only,
# detection-only, containment-only, or refusing to
# ship without 100% accuracy." (name the failure mode only)
# 3. "Explains the mechanism behind the named failure mode —
# why it happens — mitigation-only collapses because a
# small rate compounds with volume; detection-only is just
# measuring the bleed; containment-only means failures go
# uncounted; 100%-accuracy refusal treats the base rate as
# eliminable when it is structural." (NEW — from old c2)
# 4. "Distinguishes which approach is load-bearing in
# which regime — detection load-bearing when
# scaling exposure; mitigation load-bearing when
# the rate itself is the problem; containment
# load-bearing when the cost of an individual
# hallucination is high; and recognizes that
# high-stakes features require all three layered."
# (was position 3)
#
# Authoring distribution (21 pairs, 8/3/3/5/2/0;
# post-Unit-7-gate realignment per docs/UNIT_7_GATE.md):
# Authoring distribution (21 pairs; pre-split three-criterion
# shape, post-Unit-7-gate per docs/UNIT_7_GATE.md — preserved as
# historical record):
# * 8 pairs all-three-met (varied voices)
# p001, p002, p005, p008, p016 (pt),
# p017 (pseudocode), p018 (emoji moderate),
@@ -73,7 +90,18 @@
# p004, p012, p015, p019, p020
# * 2 pairs off-topic, all-missed, gradable (NOT flagged)
# p003, p013
# * 0 pairs flagged-expected (skip per Units 2-4)
#
# Authoring distribution (21 pairs; post-c2-split, 2026-05-21,
# four-criterion shape — current; faithful preserve-by-default):
# * 8 all-four-met: p001, p002, p005, p008, p016, p017, p018, p021
# * 3 partial (3 of 4): p006 (T,T,T,F), p007 (T,T,F,T —
# differential), p014 (T,T,T,F)
# * 1 partial (2 of 4): p010 (F,T,T,F)
# * 2 partial (1 of 4): p009 (T,F,F,F), p011 (F,F,F,T)
# * 5 all-missed on-topic: p004, p012, p015, p019, p020
# * 2 off-topic, all-missed, gradable: p003, p013
# Check: 8 + 3 + 1 + 2 + 5 + 2 = 21.
# p007 is the lone c2=T/c3=F differential.
#
# UNIT_7_GATE realignment (2026-05-13): initial run hit
# 90% per-criterion (95% adjusted excluding p018 ERROR).
@@ -119,6 +147,17 @@ description: |
(p018; 2-unit-confirmed). Zero flagged-expected pairs.
See header for full triage.

2026-05-21 c2 split (per docs/RUBRIC_AUDIT.md, MEDIUM batch):
criterion 2 split into c2 (name the failure mode) and a new c3
(explain the mechanism); old c3 (regime) renumbered to position
4. Preserve-by-default: faithful decomposition of the locked
Opus values (old-c2=T → c2=T,c3=T). p007 is the lone differential
(names mitigation-only failure without the volume-compound
mechanism → c2=T,c3=F); p009/p011 name no failure mode →
c2=F,c3=F. No realignments. Gate-exempt (faithful split, zero
realignments per docs/REGRESSION_GATE.md). Known-bad p018 ERROR
unaffected.

pairs:
- id: p001
label: All three met, balanced detection-mitigation-containment recommendation
@@ -181,6 +220,8 @@ pairs:
met: true
- position: 3
met: true
- position: 4
met: true
flagged: false

- id: p002
@@ -231,6 +272,8 @@ pairs:
met: true
- position: 3
met: true
- position: 4
met: true
flagged: false

- id: p003
@@ -258,6 +301,8 @@ pairs:
met: false
- position: 3
met: false
- position: 4
met: false
flagged: false

- id: p004
@@ -282,6 +327,8 @@ pairs:
met: false
- position: 3
met: false
- position: 4
met: false
flagged: false

- id: p005
@@ -327,6 +374,8 @@ pairs:
met: true
- position: 3
met: true
- position: 4
met: true
flagged: false

- id: p006
@@ -368,11 +417,13 @@ pairs:
- position: 2
met: true
- position: 3
met: true
- position: 4
met: false
flagged: false

- id: p007
label: c1 + c3 met, c2 missing — gestures at failure modes without naming the volume-compound mechanism (deliberate borderline)
label: c1 + c2 + c4 met, c3 missing — names the mitigation-only failure mode but doesn't explain the volume-compound mechanism (the lone c2=T/c3=F differential pair under the 2026-05-21 split)
answer: |
Hallucination is a base-rate problem; we don't
eliminate it, we design around it. Three axes —
@@ -406,8 +457,10 @@ pairs:
- position: 1
met: true
- position: 2
met: false
met: true
- position: 3
met: false
- position: 4
met: true
flagged: false

@@ -450,6 +503,8 @@ pairs:
met: true
- position: 3
met: true
- position: 4
met: true
flagged: false

- id: p009
@@ -475,6 +530,8 @@ pairs:
met: false
- position: 3
met: false
- position: 4
met: false
flagged: false

- id: p010
@@ -502,11 +559,13 @@ pairs:
- position: 2
met: true
- position: 3
met: true
- position: 4
met: false
flagged: false

- id: p011
label: c3 only — regime mapping (detection / mitigation / containment per cost-of-one), no base-rate frame, no mechanism
label: c4 only — regime mapping (detection / mitigation / containment per cost-of-one), no base-rate frame, no failure mode named, no mechanism (was "c3 only" pre-split; old c3 regime → position 4)
answer: |
The right call depends on the cost-of-one-
hallucination. For features where individual
@@ -530,6 +589,8 @@ pairs:
- position: 2
met: false
- position: 3
met: false
- position: 4
met: true
flagged: false

@@ -555,6 +616,8 @@ pairs:
met: false
- position: 3
met: false
- position: 4
met: false
flagged: false

- id: p013
@@ -579,6 +642,8 @@ pairs:
met: false
- position: 3
met: false
- position: 4
met: false
flagged: false

- id: p014
@@ -618,6 +683,8 @@ pairs:
- position: 2
met: true
- position: 3
met: true
- position: 4
met: false
flagged: false

@@ -647,6 +714,8 @@ pairs:
met: false
- position: 3
met: false
- position: 4
met: false
flagged: false

- id: p016
@@ -696,6 +765,8 @@ pairs:
met: true
- position: 3
met: true
- position: 4
met: true
flagged: false

- id: p017
@@ -755,6 +826,8 @@ pairs:
met: true
- position: 3
met: true
- position: 4
met: true
flagged: false

- id: p018
@@ -796,6 +869,8 @@ pairs:
met: true
- position: 3
met: true
- position: 4
met: true
flagged: false

- id: p019
@@ -820,6 +895,8 @@ pairs:
met: false
- position: 3
met: false
- position: 4
met: false
flagged: false

- id: p020
@@ -846,6 +923,8 @@ pairs:
met: false
- position: 3
met: false
- position: 4
met: false
flagged: false

- id: p021
@@ -880,4 +959,6 @@ pairs:
met: true
- position: 3
met: true
- position: 4
met: true
flagged: false
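The preserve-by-default split described in the file header can be sketched in a few lines of Python. This is an illustration only, not the repository's tooling: the names `OLD`, `DIFFERENTIALS`, and `split_c2` are hypothetical, and the old three-criterion labels are transcribed from the diff and the authoring-distribution comments above.

```python
from collections import Counter

# Old gold labels as (c1, old_c2, old_c3), transcribed from the diff above.
ALL_MET = ["p001", "p002", "p005", "p008", "p016", "p017", "p018", "p021"]
ALL_MISSED = ["p003", "p004", "p012", "p013", "p015", "p019", "p020"]

OLD = {pid: (True, True, True) for pid in ALL_MET}
OLD.update({pid: (False, False, False) for pid in ALL_MISSED})
OLD.update({
    "p006": (True, True, False),
    "p007": (True, False, True),
    "p009": (True, False, False),
    "p010": (False, True, False),
    "p011": (False, False, True),
    "p014": (True, True, False),
})

# Hand-labelled exceptions to the mechanical split: pair id -> (new c2, new c3).
# p007 names the failure mode but does not explain the mechanism.
DIFFERENTIALS = {"p007": (True, False)}


def split_c2(pair_id, old):
    """Map (c1, old_c2, old_c3) to (c1, c2, c3, c4).

    By default both new criteria inherit the old c2 value; the old c3
    (regime) moves to position 4 unchanged.
    """
    c1, old_c2, old_c3 = old
    new_c2, new_c3 = DIFFERENTIALS.get(pair_id, (old_c2, old_c2))
    return (c1, new_c2, new_c3, old_c3)


NEW = {pid: split_c2(pid, old) for pid, old in OLD.items()}

# Tally pairs by how many of the four criteria they meet.
met_counts = Counter(sum(labels) for labels in NEW.values())
for n_met in sorted(met_counts, reverse=True):
    print(f"{n_met} of 4 met: {met_counts[n_met]} pairs")
```

The tally can be compared against the post-split authoring distribution in the header; the seven pairs with zero criteria met correspond to the five on-topic and two off-topic all-missed pairs.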
3 changes: 2 additions & 1 deletion content/units/hallucination-bundle-0.md
@@ -41,7 +41,8 @@ sources:
primary_source: true
rubric:
- text: "Names hallucination as a structural base-rate problem (not a bug to be eliminated) AND treats reliability as a multi-axis design discipline (detection vs mitigation vs containment) anchored to the feature's actual cost-of-failure — not as an 'improve the model' exercise."
- text: "Identifies a concrete failure mode of single-axis hallucination management AND explains the mechanism — e.g., mitigation-only collapses at scale because a 'small' rate compounds with volume (mechanism: 0.5% × 10k calls/day = 50 false outputs/day, enough to erode trust); detection-only without mitigation is just measuring the bleed; containment-only without detection means failures go uncounted; refusing to ship without 100% accuracy is treating the base rate as if it were eliminable."
- text: "Identifies a concrete failure mode of single-axis hallucination management — e.g., mitigation-only, detection-only, containment-only, or refusing to ship without 100% accuracy."
- text: "Explains the mechanism behind the named failure mode — why it happens, not just that it happens — e.g., mitigation-only collapses at scale because a 'small' rate compounds with volume (0.5% × 10k calls/day = 50 false outputs/day, enough to erode trust); detection-only without mitigation is just measuring the bleed; containment-only without detection means failures go uncounted; refusing to ship without 100% accuracy treats the base rate as if it were eliminable when it is structural."
- text: "Distinguishes which approach is load-bearing in which regime — detection load-bearing when shipping into a new domain or scaling exposure (you need to know the rate); mitigation load-bearing when the rate itself is the unit-economics or quality problem; containment load-bearing when the cost of an individual hallucination is high (legal liability, financial loss, safety); and recognizes that high-stakes features require all three layered (the PM-default for any feature where a single hallucination can cause non-recoverable damage)."
---
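The volume-compound arithmetic quoted in the new mechanism criterion (0.5% × 10k calls/day = 50 false outputs/day) can be checked with a small sketch. The function name `expected_false_outputs` is hypothetical, and treating the rate as fixed across all calls is a simplifying assumption, not something the rubric states.

```python
def expected_false_outputs(rate: float, calls_per_day: int) -> float:
    """Expected hallucinated outputs per day at a fixed per-call rate."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a probability between 0 and 1")
    return rate * calls_per_day


# The rubric's example: a 0.5% rate at 10k calls/day.
per_day = expected_false_outputs(0.005, 10_000)
print(f"{per_day:.0f} false outputs/day")

# The same "small" rate at different volumes (assumed volumes, for illustration).
for calls in (1_000, 10_000, 100_000):
    print(calls, round(expected_false_outputs(0.005, calls)))
```

The rate never changes in this sketch; only the volume does, which is the point the mechanism criterion asks an answer to make explicit.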