refactor(error): typed WireField + AllocationContext + AllocationFailed unit (Phase 2 prep)#144
Merged
Conversation
…ed unit (Phase 2 prep) Closes #138, #133, #134, partial #131. Bundled per the Phase-2-prep PR plan — these four issues set foundational precedents that Phase 2's new fault sub-enums will build on, so fixing them before Phase 2 starts means new variants follow the clean pattern from day one. **#138 — replace `unreachable!()` in core (CLAUDE.md "no panics in core"):** The match arm at `pak/mod.rs` for unsupported compression methods inside `stream_entry_to`'s post-reject dispatch called `unreachable!(...)`. The arm IS unreachable in practice (the early-reject at the top of the function would have already returned), but a future contributor adding a new `CompressionMethod` variant who only updates the early-reject and forgets the dispatch would land here and panic. Now returns `InvalidIndex { fault: InvariantViolated { reason: "..." } }` with a message that names the bug class. The variant set is `#[non_exhaustive]` so this discipline matters. **#134 — replace `field: &'static str` and `context: &'static str` with typed enums:** Three `IndexParseFault` variants (`BoundsExceeded`, `U64ExceedsPlatformUsize`, `FieldMismatch`) carried `field: &'static str`, and `AllocationFailed` carried `context: &'static str`. Tests already keyed on these strings (`matches!(... { field: "file_count", .. })`). Recreates the same stringly-typed-discriminator anti-pattern that `OverflowSite` was introduced to escape: a typo at a future call site, or a rename in the wire spec, leaves existing `matches!` tests passing against wrong-name calls. Introduced two enums in `error.rs`: - `WireField` (16 variants) — closed set of wire-format field names (`UncompressedSize`, `CompressedSize`, `BlockLength`, etc.). Display emits the canonical wire-stable snake_case name; every variant Displays to the exact `&'static str` the call site previously passed, so operator log greps and downstream tooling that hard-coded the old strings keep working. - `AllocationContext` (11 variants) — closed set of allocation sites (`EntryPayloadBytes`, `InlineCompressionBlocks`, `V10MainIndexBytes`, etc.). Same Display-stability contract as `WireField`. Both follow the `OverflowSite` precedent: `#[non_exhaustive]` for forward-compat, `Display` impl with wire-stable names, doc-comment naming the precedent and the typo-resistance + closed-set rationale. Updated 18 production call sites across 5 files (`pak/mod.rs`, `pak/index/{mod,flat,path_hash,entry_header}.rs`) plus all in-source tests in `error.rs` and `index/mod.rs` plus 2 sites in `pak_integration.rs`. The mismatch closure in `entry_header.rs` got typed too — the `field: &'static str` parameter is now `field: WireField`. **#133 — add `unit: BoundsUnit` to `AllocationFailed`:** Operators / dashboards keying on `requested` to size-budget alerts couldn't tell whether `requested = 65535` was "65 KiB" or "65535 entries × header size each." The sibling type `BoundsExceeded` got `unit: BoundsUnit` for exactly this reason; `AllocationFailed` did not. Now requires `unit: BoundsUnit` at every call site (Bytes for byte buffers like `EntryPayloadBytes` and `V10*Bytes`, Items for collections like `*CompressionBlocks`, `*Entries`, `*Lookup`). Display includes the unit: "could not reserve {requested} {unit} for {context} [for entry `{path}`]: {source}". Pre-#133 the rendered text was ambiguous; now unambiguous in every variant. **#131 (partial) — weaken `is_fully_verified()` docs:** Added a `Caveat` paragraph noting that for v10+ archives, the method confirms the FDI/PHI region BYTES hash to stored values, but does NOT cross-validate the FNV-64 keys inside the PHI table against FDI-derived paths. An attacker who can rewrite both the PHI table and the corresponding hash slot can install arbitrary `(hash, encoded_offset)` pairs and pass this check. The PHI ↔ FDI cross- validation is the Phase-2 hook on top of `path_hash_seed`; until it lands, treat `is_fully_verified() == true` as "stored hashes match stored bytes," not "the hash table is authoritative." Issue #131 stays open for the full FNV cross-validation work in Phase 2. **Verification:** - `cargo build` clean (no warnings) - `cargo build --no-default-features` clean (production build elides test seams + works without dev-deps) - `cargo clippy --workspace --all-targets --all-features -- -D warnings` clean - `cargo test -p paksmith-core` passes 263/263 across all 7 test binaries Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
R1 surfaced two HIGH (unanimous across 4 reviewers): the AllocationFailed
Display rendering produced "bytes for bytes for v10+ index" stutters, and
no per-variant Display pin tests existed for the two new enums even
though the codebase's own `OverflowSite` precedent established the
discipline. Plus three smaller findings.
**HIGH (unanimous): "bytes for bytes" Display stutter — FIXED**
Pre-PR rendering for the 4 `*Bytes` AllocationContext variants:
"could not reserve 65536 bytes for bytes for v10+ index: ..."
^^^^^^^^^^
Pre-PR call-site `&'static str` strings (preserved by #134) were
authored for a template that didn't have a `{unit}` slot. Adding the
`{unit} for ` prefix in #133 collided with the variant strings that
themselves led with the unit word.
Renamed the 4 `*Bytes` variant Display strings to bare noun phrases:
- `EntryPayloadBytes`: "bytes" → "entry payload"
- `V10MainIndexBytes`: "bytes for v10+ index" → "v10+ index"
- `V10EncodedEntriesBytes`: "bytes for v10+ encoded entries" → "v10+ encoded entries"
- `V10FdiBytes`: "bytes for v10+ full directory index" → "v10+ full directory index"
Post-fix renders read naturally:
"could not reserve 65536 bytes for v10+ index: ..."
Also documented the bare-label convention + naming convention (Inline/
Encoded/Flat/V10 prefixes; Bytes suffix for raw byte buffers) in the
`AllocationContext` doc comment so a future contributor extending the
enum knows the rule.
**HIGH (unanimous): missing per-variant Display pin tests — FIXED**
Added `wire_field_display_tokens_are_wire_stable` (16 cases) and
`allocation_context_display_tokens_are_wire_stable` (11 cases),
modelled directly on the existing `overflow_site_display_tokens_are_wire_stable`.
Without these, a typo in a Display arm (`"compression_block_sze"`)
would compile, pass clippy, pass every other test, and silently break
operator dashboard greps. Now caught at CI.
**MEDIUM (pr-test sev 6): AllocationFailed Display untested with Bytes
unit — FIXED**
Added `index_parse_fault_display_allocation_failed_bytes_unit_no_stutter`
exercising all 4 `*Bytes` contexts with `unit: BoundsUnit::Bytes`. The
test pins the exact rendered prefix per context AND has a negative
assertion that `"bytes for bytes"` must NOT appear — would have caught
the R1 stutter regression before merge.
Also tightened the existing `*_with_path` and `*_archive_level` tests
to assert *adjacency* of `count + unit + context` (single substring)
rather than independent containment, so a future template reordering
fails loudly.
**HIGH (silent-failure-hunter H1): IndexParseFault docstring claimed
wire-stability that #133 broke — FIXED**
The type-level docstring still asserted "Display format mirrors the
prior `reason: String` text shapes so operator-visible messages are
stable across the refactor." After #133, AllocationFailed broke that
contract. Updated the docstring to flag AllocationFailed as a
documented exception (with the rationale + specific text-shape change
+ which substring greps still match).
**MEDIUM (silent-failure-hunter M2): is_fully_verified caveat
threat-model imprecision — FIXED**
The pre-R2 caveat described an attack chain that incompletely listed
the gates ("rewrite both the PHI table and the corresponding hash
slot in the main-index header"). Reworded to lead with the missing
PRIMITIVE (no PHI ↔ FDI cross-validation) and parenthetically note
the chained-gate complexity for completeness. The pre-R2 phrasing
under-stated the attack surface in a safe direction (didn't over-
claim safety) but was imprecise; new phrasing is direct.
**MEDIUM (silent-failure-hunter M1): EntryPayloadBytes doc named wrong
call site — FIXED**
Was: `"e.g. read_entry_to's output Vec"` — but `read_entry_to` is the
streaming trait method that doesn't allocate the output buffer.
Updated to: `PakReader::read_entry` (the `ContainerReader` trait
override at `container/pak/mod.rs` that allocates upfront before
streaming).
**LOW (code-reviewer SUG-1, type-design L1): Hash derive asymmetry
between WireField and AllocationContext — FIXED**
Both new enums cite the OverflowSite precedent. OverflowSite doesn't
derive Hash; AllocationContext didn't either; WireField did. Dropped
Hash from WireField for symmetry. No in-tree consumer used it as a
HashMap key. Doc-commented the rationale ("add only when a real
consumer materializes").
**Deferred to follow-up issues (out of scope for the must-fix R2):**
- type-design R1 MEDIUM: WireField layout-version prefixes (FlatEntryCount,
V10NonEncodedCount, FdiFileCount, FdiDirCount). Would parallel
AllocationContext's prefix discipline. Mechanical rename across all
call sites + Display tests.
- type-design R1 MEDIUM: AllocationFailed.unit redundant w/ context.
Add AllocationContext::unit() method, drop the separate field. Would
eliminate the implicit invariant the type system doesn't enforce.
- code-reviewer SUG-2: InvariantViolated reason fragility (issue ref
in the message text becomes misleading if the issue is closed).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #144 R2 reviewers (code-reviewer conf 70 + type-design sev 2) independently noted that the "bare names only for cross-version utility allocations like DedupTracker and ByPathLookup" rule picks examples that aren't `*Bytes`-suffixed, leaving `EntryPayloadBytes` (which IS bare-named — no V10/Inline/Encoded/Flat prefix — but DOES have the Bytes suffix) as a naming-convention orphan. A reader implementing a new variant might infer "bare ⇒ not Bytes-suffixed" from the example list. Reworded the rule to clarify that "bare" refers to the absent layout- version prefix, not the absent suffix, and added EntryPayloadBytes to the example list. Both reviewers explicitly flagged this as below the merge bar; landing as a 5-line doc tweak rather than a separate follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 2 prep bundle. Closes #138, #133, #134, partial #131. Per the recommended approach, these four issues set foundational precedents that Phase 2's new fault sub-enums will build on — fixing them before Phase 2 starts means new variants follow the clean pattern from day one.
unreachable!()instream_entry_towith typedInvariantViolated(CLAUDE.md "no panics in core")field: &'static str(3 variants) with newWireFieldenum (16 variants) andcontext: &'static str(AllocationFailed) with newAllocationContextenum (11 variants). Both follow theOverflowSiteprecedent:#[non_exhaustive], wire-stableDisplay, doc-comment naming the typo-resistance + closed-set rationale. Tests nowmatches!against typed variants instead of brittle string literals.unit: BoundsUnittoAllocationFailed(sibling parallel toBoundsExceeded). Pre-fix the rendered text was ambiguous: "could not reserve 65535 compression blocks" reads as 65 535 BLOCKS, but "could not reserve 65535 bytes" reads as 65 535 BYTES — only human inference disambiguated. Now unambiguous in every variant.is_fully_verified()docs to be honest that for v10+ archives, the FDI/PHI region BYTES hash to stored values but the PHI table's FNV-64 keys are not cross-validated against FDI-derived paths. The full FNV cross-validation is Phase-2 work; Security: PHI table FNV-64 cross-validation against FDI (Phase 2 hook) #131 stays open for it.Wire-stable
Displaystrings preserved verbatim across all 18 production call sites — operator log greps and downstream tooling that hard-coded the previous&'static strvalues keep working.Test plan
cargo buildcleancargo build --no-default-featuresclean (production build elides test seams)cargo clippy --workspace --all-targets --all-features -- -D warningsclean (CI-mirroring invocation)cargo test -p paksmith-corepasses 263/263 across all 7 test binariesDisplayarms for the 4 affected fault variants tested with the new typed payloads, including the newunitfield🤖 Generated with Claude Code