feat(apr-cli): CRUX-B-05 safetensors shard/unshard (apr shard + apr unshard)#1683

Merged
noahgift merged 5 commits into
mainfrom
feat/crux-b-05-safetensors-shard
May 16, 2026
@noahgift
Contributor

Summary

CRUX-B-05 safetensors shard/unshard via weight-map — HuggingFace
model.safetensors.index.json parity:

  • apr shard FILE --max-shard-size SZ -o OUT/ — split a single
    safetensors file into N shards named model-NNNNN-of-MMMMM.safetensors,
    plus a deterministic alphabetically-sorted model.safetensors.index.json
    with weight_map + metadata.total_size.
  • apr unshard DIR -o merged.safetensors — reconstruct a single
    safetensors file from the sharded directory, with shard-path validation
    (no absolute paths, no ..) and total_size cross-check.
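As a rough illustration of the shard step described above, here is a minimal sketch that packs tensors into shards and renders the `model-NNNNN-of-MMMMM.safetensors` names plus a sorted weight map. The greedy packing strategy, the 1-based numbering, and the names `ShardPlan` and `plan_shards` are assumptions for illustration, not the code in this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical planning result: shard file names, tensor -> shard map,
/// and the summed tensor byte size (the index's metadata.total_size).
pub struct ShardPlan {
    pub shard_names: Vec<String>,
    pub weight_map: BTreeMap<String, String>,
    pub total_size: u64,
}

/// Greedily pack (tensor name, byte size) pairs into shards holding at most
/// `max_shard_bytes`. A tensor larger than the limit gets a shard of its own.
/// Assumption: the real `apr shard` may use a different packing order.
pub fn plan_shards(tensors: &[(String, u64)], max_shard_bytes: u64) -> ShardPlan {
    // Sort by name so the plan does not depend on input order.
    let mut sorted: Vec<&(String, u64)> = tensors.iter().collect();
    sorted.sort_by(|a, b| a.0.cmp(&b.0));

    // First pass: assign each tensor a zero-based shard index.
    let mut assignment: Vec<(&str, usize)> = Vec::new();
    let mut shard_idx = 0usize;
    let mut used = 0u64;
    let mut total_size = 0u64;
    for t in &sorted {
        let size = t.1;
        // Start a new shard only if the current one already holds something.
        if used > 0 && used + size > max_shard_bytes {
            shard_idx += 1;
            used = 0;
        }
        used += size;
        total_size += size;
        assignment.push((t.0.as_str(), shard_idx));
    }

    // Second pass: the shard count is known now, so names can be rendered.
    let count = if assignment.is_empty() { 0 } else { shard_idx + 1 };
    let shard_names: Vec<String> = (1..=count)
        .map(|i| format!("model-{:05}-of-{:05}.safetensors", i, count))
        .collect();

    // BTreeMap keeps the weight map alphabetically sorted by tensor name.
    let weight_map = assignment
        .into_iter()
        .map(|(name, idx)| (name.to_string(), shard_names[idx].clone()))
        .collect();

    ShardPlan { shard_names, weight_map, total_size }
}
```

Using a `BTreeMap` for the weight map is one simple way to get the deterministic, alphabetically sorted index mentioned above, since iteration order follows key order.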

Round-trip identity, weight-map coverage, and total_size invariant ship
as in-tree falsifiers: 25 tests green under
cargo test -p apr-cli --lib commands::shard (3 FALSIFY-CRUX-B-05-001/002/003
+ 22 supporting).
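
The three invariants can be pictured with a small sketch. It models a safetensors file as an in-memory map from tensor name to raw bytes, which is a simplification, and the function names are hypothetical; the actual FALSIFY-CRUX-B-05 tests in the tree may be structured differently.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a parsed safetensors file: tensor name -> raw bytes.
pub type Tensors = BTreeMap<String, Vec<u8>>;

/// Weight-map coverage: the index names exactly the tensors in the source,
/// no more and no fewer. Both maps iterate in sorted key order.
pub fn covers_all_tensors(source: &Tensors, weight_map: &BTreeMap<String, String>) -> bool {
    source.keys().eq(weight_map.keys())
}

/// total_size invariant: the declared size equals the summed tensor byte sizes.
pub fn total_size_matches(source: &Tensors, declared_total_size: u64) -> bool {
    let actual: u64 = source.values().map(|bytes| bytes.len() as u64).sum();
    actual == declared_total_size
}

/// Round-trip identity: every tensor in the rebuilt file is byte-identical
/// to the original, and no tensor was added or dropped.
pub fn round_trip_identical(original: &Tensors, rebuilt: &Tensors) -> bool {
    original == rebuilt
}
```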

Naming choice: kept verbs orthogonal from existing
apr merge (model parameter averaging: average | weighted | slerp | ties | dare)
by introducing apr unshard instead of overloading the verb. Contract
description and FALSIFY tests updated accordingly.

Contract promotion

contracts/crux-B-05-v1.yaml:

  • v1.0.0-draft → v1.1.0
  • status: draft → partial_algorithm_level
  • intake_status: partial → supported
  • FALSIFY-001/002/003 in-tree; FALSIFY-004 (transformers golden parity)
    remains pending golden-set population.

3-surface drift compliance

  • crates/apr-cli/src/extended_commands.rs: Shard + Unshard clap variants
  • crates/apr-cli/src/dispatch_analysis.rs: dispatch_shard / dispatch_unshard
  • contracts/apr-cli-commands-v1.yaml: shard + unshard entries
  • crates/apr-cli/tests/cli_commands.rs: registered_commands mirrored
  • apr <cmd> --help exit 0 verified (cli_commands.rs 8/8 green)

CRUX M1 epic (#918) progress

Story                   | Status
------------------------|-------------------------------------------------
K-11 modelfile DSL      | shipped (PR #1680)
B-05 shard/unshard      | this PR
C-04, C-11, C-13, I-04  | partial; blocked on live aprender-serve endpoint
K-02                    | draft; blocked on live aprender-serve endpoint

Test plan

  • cargo test -p apr-cli --lib commands::shard — 25/25 green (3 falsifiers + 22 supporting)
  • cargo test -p apr-cli --test cli_commands — 8/8 green
  • cargo test -p apr-cli --lib — 5708/5708 green
  • pv validate contracts/crux-B-05-v1.yaml — 0 errors, 0 warnings
  • Live e2e binary round-trip: apr shard model.safetensors --max-shard-size 8KB -o out/ && apr unshard out/ -o rebuilt.safetensors → tensors byte-identical
  • cargo fmt --all -- --check (apr-cli unchanged)
  • cargo clippy -p apr-cli --lib --no-deps -- -D warnings

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) May 15, 2026 06:11
noahgift added a commit that referenced this pull request May 15, 2026
…CI noise floor (#1692)

`brick::brick_tests::tests::rmsnorm_brick_runs` started failing with
`budget exceeded` on contended self-hosted runners — observed across at
least 5 concurrent PRs (#1688, #1689, #1685, #1683, #1675) on
2026-05-15, including the ETXTBSY retry-window fix PR which has
zero relation to brick code.

A 4-element RmsNorm is microseconds of real compute. The previous 1ms
budget (already labelled "more lenient" by a prior bump) sits inside
the noise floor when the runner is concurrently building 10+ cargo
workspaces sharing target dirs + L3 — wall time spikes 50–100×.

100ms preserves the guardrail (catches genuine 100s-of-ms perf
regressions) without crossing the contention noise floor.

Per Toyota Way memory rule: flakes are real defects, not items to mark
`#[ignore]` or rerun-until-green.

Verified: `cargo test -p aprender-serve --lib rmsnorm_brick_runs` → ok.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Closes the safetensors shard/unshard parity gap with HuggingFace
`model.safetensors.index.json`:

  - `apr shard FILE --max-shard-size SZ -o OUT/`
    Splits a single safetensors file into N shards
    `model-NNNNN-of-MMMMM.safetensors` + emits a deterministic
    `model.safetensors.index.json` (alphabetically-sorted weight_map +
    metadata.total_size).
  - `apr unshard DIR -o merged.safetensors`
    Walks `model.safetensors.index.json`, validates shard filenames
    (no absolute paths, no `..`), and reconstructs a single safetensors
    file whose tensor values are byte-identical to the original.
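
A minimal sketch of the shard-filename check, assuming it only needs to reject absolute paths and `..` components. The function name `is_safe_shard_path` is hypothetical, and the real validation in `apr unshard` may be stricter.

```rust
use std::path::{Component, Path};

/// Return true if `name` (a shard path taken from the index's weight_map)
/// is non-empty, relative, and never steps up with `..`.
/// Note: on Unix a backslash is an ordinary filename character, so
/// Windows-style separators are not split into components here.
pub fn is_safe_shard_path(name: &str) -> bool {
    if name.is_empty() {
        return false;
    }
    Path::new(name).components().all(|component| {
        !matches!(
            component,
            // RootDir/Prefix mark an absolute path; ParentDir is `..`.
            Component::RootDir | Component::Prefix(_) | Component::ParentDir
        )
    })
}
```

Checking parsed path components rather than searching the string for `".."` avoids false positives on legitimate names that merely contain two dots.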

Round-trip identity, weight-map coverage, and total_size invariant ship
as in-tree falsifiers under `cargo test -p apr-cli --lib commands::shard`
(25 tests green: 3 FALSIFY-CRUX-B-05-001/002/003 + 22 supporting).

Contract promotion: contracts/crux-B-05-v1.yaml
  v1.0.0-draft, status: draft, intake_status: partial
  → v1.1.0,        status: partial_algorithm_level, intake_status: supported

3-surface drift:
  - extended_commands.rs: new `Shard` + `Unshard` clap variants
  - dispatch_analysis.rs: `dispatch_shard` / `dispatch_unshard`
  - contracts/apr-cli-commands-v1.yaml: `shard`, `unshard` entries
  - crates/apr-cli/tests/cli_commands.rs: registered_commands updated
  - All `apr <cmd> --help` exit 0 (cli_commands.rs 8/8 green)

Naming: kept verbs orthogonal from the existing `apr merge` model
parameter-averaging command (average | weighted | slerp | ties | dare)
by introducing `apr unshard` instead of overloading `apr merge`. The
contract description was updated to reflect this.

CRUX M1 epic (#918) progress: K-11 (PR #1680, modelfile DSL) +
B-05 (this PR) = 2/6 stories. Remaining: C-04, C-11, C-13, I-04, K-02
all require a live aprender-serve endpoint and stay PARTIAL pending
that upstream.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/crux-b-05-safetensors-shard branch from def7671 to 57e226e Compare May 15, 2026 23:35
@noahgift noahgift merged commit 0c232b8 into main May 16, 2026
10 checks passed
@noahgift noahgift deleted the feat/crux-b-05-safetensors-shard branch May 16, 2026 02:34