feat(apr-cli): CRUX-B-20 apr diff --quant-roundtrip per-tensor error report #1696

Merged: noahgift merged 12 commits into main from feat/crux-b-20-quant-roundtrip on May 15, 2026
@noahgift
Contributor

Summary

Closes the per-tensor quantization-error parity gap with llama.cpp's
`llama-quantize-stats -v`:

```
apr diff REF.safetensors QUANT.safetensors --quant-roundtrip [--json] \
     [--threshold 0.95] [--no-threshold]
```

For every tensor present in both files, computes RMSE / cosine / max_abs
error pairwise, sorts by RMSE descending, and emits a TSV (default) or
JSON (`--json`) report. Each row carries a green/yellow/red verdict per
the contract bucketing (cosine ≥ 0.999 / ≥ 0.99 / else).
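The three metrics and the verdict bucketing described above can be sketched as follows. This is a minimal illustration, not the code in `diff_quant_roundtrip.rs`: the names `error_metrics_sketch`, `Verdict`, and `verdict` are hypothetical, and the handling of empty or zero-norm inputs (all zeros, cosine 0.0) is an assumption the PR text does not confirm.

```rust
/// Hypothetical stand-in for the PR's `error_metrics`; returns (rmse, cosine, max_abs).
/// Assumption: empty input yields (0, 0, 0) and a zero-norm input yields cosine 0.0.
pub fn error_metrics_sketch(reference: &[f32], quantized: &[f32]) -> (f64, f64, f64) {
    // `zip` below stops at the shorter slice, so count the same number of elements.
    let n = reference.len().min(quantized.len());
    if n == 0 {
        return (0.0, 0.0, 0.0);
    }
    let mut sq_err = 0.0f64; // sum of squared differences
    let mut dot = 0.0f64; // dot product of reference and quantized
    let mut norm_r = 0.0f64; // squared L2 norm of reference
    let mut norm_q = 0.0f64; // squared L2 norm of quantized
    let mut max_abs = 0.0f64; // largest absolute element-wise difference
    for (&r, &q) in reference.iter().zip(quantized) {
        // Widen to f64 before accumulating, as the PR describes.
        let (r, q) = (f64::from(r), f64::from(q));
        let d = r - q;
        sq_err += d * d;
        dot += r * q;
        norm_r += r * r;
        norm_q += q * q;
        max_abs = max_abs.max(d.abs());
    }
    let rmse = (sq_err / n as f64).sqrt();
    let denom = norm_r.sqrt() * norm_q.sqrt();
    let cosine = if denom > 0.0 { dot / denom } else { 0.0 };
    (rmse, cosine, max_abs)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Green,
    Yellow,
    Red,
}

/// Bucketing from the PR text: cosine >= 0.999 is green, >= 0.99 is yellow, else red.
pub fn verdict(cosine: f64) -> Verdict {
    if cosine >= 0.999 {
        Verdict::Green
    } else if cosine >= 0.99 {
        Verdict::Yellow
    } else {
        Verdict::Red
    }
}
```

Accumulating in f64 keeps the sums stable for large f32 tensors, and a fixed iteration order keeps the result deterministic across runs.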

Threshold gate (contract invariant): when ANY tensor's cosine falls
below `--threshold` (default 0.95), the command exits non-zero unless
`--no-threshold` is set. Useful as a CI gate against quant regressions.
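A minimal sketch of the gate logic, assuming the per-tensor cosines have already been computed. The helper `gate_exit_code` and the constant name are hypothetical; only the exit code 5 for `ValidationFailed` and the default threshold come from this PR.

```rust
/// Exit code this PR reports for `CliError::ValidationFailed`.
pub const EXIT_VALIDATION_FAILED: i32 = 5;

/// Hypothetical helper: maps per-tensor cosines to a process exit code.
/// NaN handling is not addressed here; a NaN cosine compares as "not below".
pub fn gate_exit_code(cosines: &[f64], threshold: f64, no_threshold: bool) -> i32 {
    if no_threshold {
        // `--no-threshold` disables the gate entirely.
        return 0;
    }
    // A single tensor below the threshold is enough to fail the run.
    if cosines.iter().any(|&c| c < threshold) {
        EXIT_VALIDATION_FAILED
    } else {
        0
    }
}
```

With the default threshold of 0.95, a tensor at cosine 0.97 is bucketed red in the report but does not trip this gate; only a cosine below 0.95 does.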

Implementation

  • New module `crates/apr-cli/src/commands/diff_quant_roundtrip.rs`:
    • `error_metrics(reference, quantized) → (rmse, cosine, max_abs)`
      pure f64-accumulated math, deterministic
    • `build_report(ref_path, quant_path, threshold) → Report` safetensors
      loader supporting F32 / F16 / BF16
    • `render_tsv` / serde JSON output paths
  • Existing `apr diff` clap variant extended with three flags
    (`--quant-roundtrip`, `--threshold`, `--no-threshold`)
  • Dispatch in `dispatch.rs::dispatch_quant_roundtrip` returns
    `CliError::ValidationFailed` on threshold breach → exit code 5
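To compare tensors stored as F32, F16, or BF16, a loader has to widen every element to a common float type first. The sketch below shows one way to do that with the standard library only. The function names are hypothetical and this is not the PR's loader; it assumes the raw tensor bytes are little-endian, as safetensors stores them, and that the dtype arrives as the safetensors strings `"F32"`, `"F16"`, and `"BF16"`.

```rust
/// Widen an IEEE 754 half-precision bit pattern to f32.
pub fn f16_to_f32(bits: u16) -> f32 {
    let sign = u32::from(bits >> 15) << 31;
    let exp = u32::from((bits >> 10) & 0x1f);
    let frac = u32::from(bits & 0x3ff);
    match (exp, frac) {
        // Signed zero.
        (0, 0) => f32::from_bits(sign),
        // Subnormal: value = frac * 2^-24 (0x3380_0000 is 2^-24 as f32 bits).
        (0, _) => {
            let magnitude = frac as f32 * f32::from_bits(0x3380_0000);
            if sign != 0 { -magnitude } else { magnitude }
        }
        // Infinity.
        (0x1f, 0) => f32::from_bits(sign | 0x7f80_0000),
        // NaN: force a quiet NaN and keep the payload bits.
        (0x1f, _) => f32::from_bits(sign | 0x7fc0_0000 | (frac << 13)),
        // Normal: rebias the exponent from 15 to 127 and widen the fraction.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

/// BF16 is the upper 16 bits of an f32, so widening is a shift.
pub fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits(u32::from(bits) << 16)
}

/// Hypothetical decoder: little-endian tensor bytes to f32 values.
/// Returns None for any dtype other than F32, F16, or BF16.
pub fn decode_to_f32(dtype: &str, bytes: &[u8]) -> Option<Vec<f32>> {
    match dtype {
        "F32" => Some(
            bytes
                .chunks_exact(4)
                .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        ),
        "F16" => Some(
            bytes
                .chunks_exact(2)
                .map(|c| f16_to_f32(u16::from_le_bytes([c[0], c[1]])))
                .collect(),
        ),
        "BF16" => Some(
            bytes
                .chunks_exact(2)
                .map(|c| bf16_to_f32(u16::from_le_bytes([c[0], c[1]])))
                .collect(),
        ),
        _ => None,
    }
}
```

Both widenings are exact, so decoding adds no error of its own to the reported metrics.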

Tests

8 in-tree tests under `cargo test -p apr-cli --lib commands::diff_quant_roundtrip`:

  • Metrics: identity, anti-parallel, orthogonal, empty, small-error
  • Verdict bucket boundaries (green/yellow/red at 0.999/0.99)
  • FALSIFY-CRUX-B-20-001: rows sorted rmse DESC; every row carries
    rmse/cosine/qtype/numel/max_abs/verdict
  • FALSIFY-CRUX-B-20-002: threshold gate flips on cosine < threshold
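The ordering and row-shape property behind FALSIFY-CRUX-B-20-001 can be illustrated with a short sketch. The `Row` struct, its `name` field, the column order, and the number formatting are all assumptions made for illustration; the PR only states which fields each row carries and that rows are sorted by RMSE descending.

```rust
/// Hypothetical report row; the PR lists these fields but not their types or order.
#[derive(Debug, Clone)]
pub struct Row {
    pub name: String, // tensor name (assumed column)
    pub qtype: String,
    pub numel: usize,
    pub rmse: f64,
    pub cosine: f64,
    pub max_abs: f64,
    pub verdict: String,
}

/// Sort rows so the worst tensor (largest RMSE) comes first.
pub fn sort_rmse_desc(rows: &mut [Row]) {
    // `total_cmp` gives a total order over f64, so NaN cannot panic the sort.
    rows.sort_by(|a, b| b.rmse.total_cmp(&a.rmse));
}

/// Render a header line followed by one tab-separated line per row.
pub fn render_tsv_sketch(rows: &[Row]) -> String {
    let mut out = String::from("name\tqtype\tnumel\trmse\tcosine\tmax_abs\tverdict\n");
    for r in rows {
        out.push_str(&format!(
            "{}\t{}\t{}\t{:.6e}\t{:.6}\t{:.6e}\t{}\n",
            r.name, r.qtype, r.numel, r.rmse, r.cosine, r.max_abs, r.verdict
        ));
    }
    out
}
```

Sorting before rendering puts the tensors most damaged by quantization at the top of the report.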

Live e2e verified on built binary:

```
$ apr diff ref.safetensors quant.safetensors --quant-roundtrip --json
# exit 0 when all tensors green; exit 5 (ValidationFailed) when any red
```

Contract promotion

`contracts/crux-B-20-v1.yaml`:
v1.1.0, status: draft, intake_status: partial
→ v1.2.0, status: partial_algorithm_level, intake_status: supported

FALSIFY-CRUX-B-20-001/002 ship as in-tree tests.
FALSIFY-CRUX-B-20-003 (parity with `llama-quantize-stats` on a golden
GGUF model) remains pending under a separate ticket; it needs routing
through `format::gguf::dequant.rs`.

3-surface drift compliance

  • `commands_enum.rs` — 3 new fields on existing Diff variant
  • `dispatch.rs` — new `dispatch_quant_roundtrip` helper + Diff pattern updated
  • `contracts/apr-cli-commands-v1.yaml` — `diff` description mentions B-20
  • 3 test sites updated to construct new clap shape
  • `cli_commands.rs` 8/8 still green

Out of scope

  • GGUF Q4_K / Q6_K reference path (separate ticket)
  • Parity sha256 against `llama-quantize-stats` golden tensors

🤖 Generated with Claude Code

…r report

Closes the per-tensor quantization-error parity gap with llama.cpp's
`llama-quantize-stats -v`:

  apr diff REF.safetensors QUANT.safetensors --quant-roundtrip [--json] \
       [--threshold 0.95] [--no-threshold]

For every tensor present in both files, computes RMSE / cosine / max_abs
error pairwise, sorts by RMSE descending, and emits a TSV (default) or
JSON (`--json`) report. Each row carries a green/yellow/red verdict per
the contract bucketing (cosine ≥ 0.999 / ≥ 0.99 / else).

Threshold gate (contract invariant): when ANY tensor's cosine falls
below `--threshold` (default 0.95), the command exits non-zero unless
`--no-threshold` is set. Useful as a CI gate: a regression in quant
quality breaks the build.

## Implementation

- New module `crates/apr-cli/src/commands/diff_quant_roundtrip.rs`:
  - `error_metrics(reference, quantized) -> (rmse, cosine, max_abs)`
    pure f64-accumulated math, deterministic
  - `build_report(ref_path, quant_path, threshold) -> Report`
    safetensors loader supporting F32 / F16 / BF16 (covers HuggingFace
    fp16 ground truth + apr quantize int8/fp16 outputs)
  - `render_tsv(report) -> String` for the human path
- Wires through existing `apr diff` clap variant via three new flags
  (`--quant-roundtrip`, `--threshold`, `--no-threshold`)
- Dispatch in `dispatch.rs::dispatch_quant_roundtrip` returns
  `CliError::ValidationFailed` on threshold breach → exit code 5

## Tests

8 in-tree tests under `cargo test -p apr-cli --lib commands::diff_quant_roundtrip`:

- Metrics: identity, anti-parallel, orthogonal, empty, small-error
- Verdict bucket boundaries (green/yellow/red at 0.999/0.99)
- **FALSIFY-CRUX-B-20-001**: rows sorted rmse DESC; every row carries
  rmse/cosine/qtype/numel/max_abs/verdict
- **FALSIFY-CRUX-B-20-002**: threshold gate flips on cosine < threshold
  (sign-flip pattern); does not flip on small ε perturbation

Live e2e verified on the built binary:

  $ apr diff ref.safetensors quant.safetensors --quant-roundtrip --json
  # exit 0 when all tensors green; exit 5 (ValidationFailed) when any red

## Contract promotion

`contracts/crux-B-20-v1.yaml`:
  v1.1.0, status: draft, intake_status: partial
  → v1.2.0, status: partial_algorithm_level, intake_status: supported

FALSIFY-CRUX-B-20-001/002 ship as in-tree tests. FALSIFY-CRUX-B-20-003
(parity with `llama-quantize-stats` on a golden GGUF model) remains
pending: it requires routing through `format::gguf::dequant.rs`, which
is a separate ticket.

## 3-surface drift compliance

- `extended_commands.rs` n/a (Diff is a top-level Commands variant; I
  extended the existing variant with 3 fields)
- `dispatch.rs` — new `dispatch_quant_roundtrip` helper + Diff dispatch
  pattern updated for new fields
- `apr-cli-commands-v1.yaml` — `diff` entry description updated to
  mention `--quant-roundtrip` and the CRUX-B-20 ticket
- 3 test sites updated to construct the new clap shape
  (lib_parse_rosetta_02, lib_extract_paths, lib_dispatch_coverage)
- `cli_commands.rs` integration test still green (8/8)

## Out of scope

- GGUF Q4_K / Q6_K reference path (separate ticket — needs
  `format::gguf::dequant.rs` wiring)
- Parity sha256 against `llama-quantize-stats` golden tensors

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 15, 2026 12:30
@noahgift noahgift merged commit de50143 into main May 15, 2026
10 checks passed
@noahgift noahgift deleted the feat/crux-b-20-quant-roundtrip branch May 15, 2026 23:33