Skip to content

P2 3-way-audit follow-ups: audit truncation + numbers/Origin/OAuth + count reconcile#306

Draft
heznpc wants to merge 6 commits into
mainfrom
audit-p2-fixes
Draft

P2 3-way-audit follow-ups: audit truncation + numbers/Origin/OAuth + count reconcile#306
heznpc wants to merge 6 commits into
mainfrom
audit-p2-fixes

Conversation

@heznpc

@heznpc heznpc commented Jun 17, 2026

Copy link
Copy Markdown
Owner

The verifiable half of the P2 backlog from the 3-way completeness audit. Five self-contained fixes, each green on its own (typecheck, full suite of 2030 tests, gen/manifest/intents, mcp:validate, stats:check, npm audit).

  1. audit_summary exposes verified — the HMAC-chain tamper verdict was computed but omitted from outputSchema, so the SDK stripped it from structuredContent. Declared it (+ verifiedFirstBreak / auditDisabled) with a strict-contract test. Also corrected the OAuth resource comment (impl is aud-only).

  2. audit-log tail-truncation detection — the HMAC chain caught edits / inserts / reorders but not removal of the most recent lines (a truncated chain is a valid shorter chain). Each line now carries a monotonic seq + a signed per-flush checkpoint; verify reports truncated / checkpoint_forged. A corrupt or absent checkpoint degrades to chain-only (no false alarm). Plus a one-time warning when the chain is keyed off the host-derived fallback.

  3. Numbers cells write native typesset_cell forced every value through a quoted JXA string, so 42 landed as the text "42". The value schema is now string | number | boolean and emits native literals; swiftTypeFor gained scalar-union support so the AppIntent parameter isn't silently dropped.

  4. no-Origin policyisOriginAllowed has always allowed Origin-less requests (correct: a browser always sends Origin, so a missing one is a token-gated non-browser client), but RFC 0002 documented the opposite plus an unimplemented bypass. Reconciled the RFC, made the rationale explicit, and added an opt-in AIRMCP_DENY_NO_ORIGIN strict mode.

  5. tool-count reconciliation — the public "272 tools" headline undercounted the real surface; count-stats now reads manifest.toolCount (286) as the single source of truth and propagates it across README / docs site / registry manifests / landing page / locales. Historical figures preserved on purpose (the ~111 starter, the v2.7 "262 tools" pitch quote, RFC 0013/0014 point-in-time counts).

Not included — the dual-provider AppShortcutsProvider guard, the FoundationModels fail-closed runtime test + tool-loop pre-check, and app/FM-flag CI. These live behind #if canImport(FoundationModels) and cannot even be compile-checked without the macOS 26 SDK + a real device, so they stay in the separate macOS-verification bucket rather than being written blind and claimed done.

heznpc added 6 commits June 18, 2026 07:10
…ce comment

audit_summary already computed verified / verifiedFirstBreak / auditDisabled but
omitted them from outputSchema, so the MCP SDK strips the tamper-evidence signal
out of structuredContent. Declared all three; added a strict-contract test so a
future drift is caught at the runtime contract, not just registration shape.

oauth-verifier's header comment claimed the RFC 8707 resource claim is accepted
as an audience fallback, but the impl only checks aud via jose. Made the comment
match reality (resource rides untouched in raw, never gates audience) and dropped
the now-unused claimsFromPayload audience parameter.
The HMAC chain catches edits, inserts, reorders, and genesis-reroot, but a
tail-truncated chain is still a valid shorter chain — removing the most recent
lines went undetected. Each sealed line now carries a monotonic seq, and every
flush overwrites a single signed checkpoint anchoring the highest seq + head.
verifyAuditChain reports truncated when the chain falls short of the checkpoint
and checkpoint_forged when the checkpoint MAC fails. A corrupt or absent
checkpoint degrades to chain-only verification (no false alarm); an attacker who
removes both the tail and the checkpoint is a documented limit, not a silent gap.

Also emit a one-time warning on first flush when the chain is keyed off the
host-derived fallback (tamper-evident only, not strong auth) so an operator
knows to set AIRMCP_AUDIT_HMAC_KEY for cross-machine integrity.

Regenerated tool-manifest.json + MCPIntents.swift for the extended audit_summary
outputSchema (verified / verifiedFirstBreak / auditDisabled).
numbers_set_cell forced every value through a single-quoted JXA string literal,
so 42 landed as the text "42" (left-aligned, unsortable, invisible to formulas)
and only Numbers' own coercion rescued some cases. The value schema is now a
string | number | boolean union and setCellScript emits a native JS literal for
numbers/booleans (quoted + escaped only for strings, where a leading '=' is
still interpreted by Numbers as a formula).

A scalar-union param previously returned null from swiftTypeFor and was silently
dropped from the generated AppIntent, which would have removed value from the
Siri/Shortcuts "Set Numbers Cell" intent. swiftTypeFor now projects a scalar
union to its string member (String) so the intent keeps a usable text parameter.

Regenerated tool-manifest.json + MCPIntents.swift.
…opt-in

isOriginAllowed has always allowed requests with no Origin header, but RFC 0002
documented the opposite (default-deny + an AIRMCP_TRUST_NO_ORIGIN bypass that was
never implemented). Allow-by-default is the correct call: a browser always
attaches Origin to a cross-origin request, so a missing Origin is a non-browser
client (curl, native MCP) already gated by the token/OAuth policy — denying it
would break those clients for no security gain.

Made the rationale explicit in code, reconciled RFC 0002 section 5.2 to match,
and added an opt-in strict mode (AIRMCP_DENY_NO_ORIGIN=1) for browser-only
deployments that genuinely want to reject Origin-less requests.
The public "272 tools" headline counted only server.registerTool() call sites and
undercounted the real runtime surface — the generated manifest exposes 286 (it
also includes the dynamically-registered, skill_*, and MCP-app tools). count-stats
now reads manifest.toolCount as the single source of truth and propagates 286
across README, the docs site, registry manifests, the landing page, and locales.

Left as-is on purpose: the "~111 tools" starter-preset figure, the v2.7 "262 tools
across 27 modules" pitch quoted in REGISTRY_SUBMISSIONS, and the point-in-time
counts inside RFC 0013/0014 (historical context, not current claims).

tool-count-drift now asserts README advertises exactly manifest.toolCount, instead
of carrying a stale "superset" comment — guarding the headline against future drift.
…policy

Two issues an adversarial review of this branch surfaced before merge:

- llms.txt (the root AI-discovery manifest) still advertised "272 tools" while
  README / registry manifests moved to 286 — gen-llms-txt derived the headline
  from a registerTool( regex (272) instead of manifest.toolCount (286), and the
  llms:check gate compared against its own stale number so the contradiction
  shipped green. Headline now reads manifest.toolCount; the per-module list stays
  a documented presentation projection.

- The no-Origin allow-by-default comment claimed the request is "already gated by
  the token/OAuth policy" — true for with-token*/with-oauth*, but the DEFAULT
  loopback-only policy has no such gate (it relies on the 127.0.0.1 binding). The
  behavior is safe either way; corrected the comment (code + RFC 0002 section 5.2)
  to attribute the gate to the active policy so it can't mislead a maintainer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant