sandbox-image: complete with dataplane-signed URI #698

Open
sgirones wants to merge 16 commits into main from salvador/dataplane-presign-url

@sgirones sgirones commented May 26, 2026

Summary

Sandbox image builds now get their rootfs upload URLs from the dataplane instead of from platform-api. When /prepare returns snapshotRelPath, the CLI asks the builder sandbox's proxy to call dataplane POST /api/v1/blob/sign; dataplane composes the final object URI from its regional config, signs the multipart upload, and returns that canonical uri with the upload spec.

The CLI treats dataplane's signed uri as the source of truth. It writes that URI into spec.snapshotUri so the in-sandbox builder keeps the existing input contract, keeps the same URI in CLI state, and sends it directly to /complete. Builder metadata still supplies computed fields like snapshot_size_bytes, snapshot_format_version, and rootfs_disk_bytes; if metadata includes snapshot_uri, the CLI checks that it matches the dataplane URI.

The legacy /prepare shape is unchanged. If snapshotRelPath is absent, the prepared spec already contains upload, and completion continues to use builder metadata, falling back to the prepared snapshotUri.

Notes For Review

  • snapshotRelPath is only used to request dataplane signing. Completion is keyed off the saved signed URI, not the rel path.
  • The new upload response must include uri; missing uri fails before the builder runs.
  • The upload block remains mostly opaque JSON in the passthrough spec. The CLI only reads uri.
  • Diff builds still sign parent manifests as full-URI SingleGet requests.
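
The completion rules described above can be sketched as a small pure function. This is only an illustration under assumptions: the function name, the argument shape, and the error strings are hypothetical and are not taken from the PR's code. It assumes the dataplane-signed URI wins when present, that a `snapshot_uri` in builder metadata must match it, and that the legacy path prefers metadata over the prepared `snapshotUri`.

```rust
/// Treat an empty string the same as a missing value, so an empty URI
/// is never sent to /complete.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.filter(|v| !v.is_empty())
}

/// Hypothetical helper: choose the snapshot URI to send to /complete.
///
/// * `dataplane_uri`: `uri` from the sign response (new path only).
/// * `metadata_uri`: `snapshot_uri` from builder metadata, if reported.
/// * `prepared_uri`: `snapshotUri` from the legacy /prepare response.
fn completion_snapshot_uri(
    dataplane_uri: Option<&str>,
    metadata_uri: Option<&str>,
    prepared_uri: Option<&str>,
) -> Result<String, String> {
    if let Some(signed) = non_empty(dataplane_uri) {
        // New path: the dataplane-signed URI is the source of truth.
        // Builder metadata may repeat it, but must not disagree.
        if let Some(reported) = non_empty(metadata_uri) {
            if reported != signed {
                return Err(format!(
                    "builder metadata snapshot_uri {:?} does not match dataplane uri {:?}",
                    reported, signed
                ));
            }
        }
        return Ok(signed.to_string());
    }

    // Legacy path: prefer builder metadata, fall back to the prepared value,
    // and fail loudly if neither source provides a URI.
    non_empty(metadata_uri)
        .or(non_empty(prepared_uri))
        .map(str::to_string)
        .ok_or_else(|| "no snapshot URI available for completion".to_string())
}
```

Keeping this decision in one function with no I/O makes the mismatch and missing-URI cases easy to unit test without a sandbox.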

Test Plan

  • cargo +nightly fmt --check
  • git diff --check
  • cargo test -p tensorlake sandbox_images
  • Manual smoke against legacy platform-api response with embedded upload.
  • Manual smoke against new platform-api + sandbox-proxy sign_blob.

Related PRs

Flow

CLI -> platform-api /prepare
  <- snapshotRelPath

CLI -> sandbox-proxy /api/v1/blob/sign { rel_path, multipart_put }
  -> dataplane validates namespace, composes final URI, signs upload
  <- { uri, uploadId, partUrls, completeUrl, abortUrl }

CLI writes uri into spec.snapshotUri
CLI runs the in-sandbox builder
CLI completes with the saved dataplane-signed uri

For parent snapshot reads, the CLI uses { uri, op: SingleGet } because the parent manifest already has a full stored URI.
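
The two kinds of sign request in this flow can be modelled roughly as follows. The type and function names here are hypothetical, and the sketch deliberately leaves out serialization because the exact wire shape is not shown in this description; it only captures that uploads are addressed by rel path with a multipart op, while parent manifest reads are addressed by full URI with a single GET.

```rust
/// Hypothetical model of what the CLI asks the proxy to sign.
#[derive(Debug, Clone, PartialEq)]
enum BlobTarget {
    /// Relative path from /prepare; the dataplane composes the final URI.
    RelPath(String),
    /// Full stored URI, such as a parent manifest.
    Uri(String),
}

/// Hypothetical model of the operation to presign.
#[derive(Debug, Clone, PartialEq)]
enum BlobOp {
    /// Multipart upload with the requested number of part URLs.
    MultipartPut { part_count: u32 },
    /// Single presigned download.
    SingleGet,
}

#[derive(Debug, Clone, PartialEq)]
struct SignRequest {
    target: BlobTarget,
    op: BlobOp,
}

/// Request for the rootfs snapshot upload (new /prepare shape).
fn upload_request(snapshot_rel_path: &str, part_count: u32) -> SignRequest {
    SignRequest {
        target: BlobTarget::RelPath(snapshot_rel_path.to_string()),
        op: BlobOp::MultipartPut { part_count },
    }
}

/// Request for reading a parent manifest in a diff build.
fn parent_manifest_request(parent_uri: &str) -> SignRequest {
    SignRequest {
        target: BlobTarget::Uri(parent_uri.to_string()),
        op: BlobOp::SingleGet,
    }
}
```

Using an enum for the target makes "rel path or full URI, never both" a property of the type rather than a runtime check.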

Adds a CLI bridge so `build_sandbox_image` works against both the legacy
platform-api response (embedded pre-signed `upload` block) and the new
versioned-response shape (`snapshotRelPath` only). On the new path the
CLI calls the sandbox-proxy `POST /api/v1/blob/sign` endpoint and
splices the returned upload spec into the raw prepared spec before
handing spec.json to the in-sandbox rootfs builder.

The branch key (`snapshot_rel_path`) is the only field added to the
typed `PreparedSandboxTemplateBuild`. Everything else — including the
`upload` block from either path — stays opaque inside the raw
passthrough `Value`, preserving the property that future fields added
to the platform-api ↔ in-sandbox-builder contract don't require an SDK
release.

The new path always uses multipart upload with 100 MB parts; the part
count is clamped to ≥ 1 and saturated at u32::MAX. The size hint reuses
the existing `rootfs_disk_bytes` precedence (explicit --disk_mb → parent's
rootfsDiskBytes for diff builds → default). Bindings (Python, Node)
are unchanged: they only see the final registered-template JSON.

Co-authored-by: Cursor <cursoragent@cursor.com>

Platform-api is moving the snapshot location off `snapshotUri` and onto
`snapshotRelPath` (the rel-path then gets resolved client-side via
`SandboxProxyClient::sign_blob`). Stop requiring `snapshotUri` on the
prepared-spec response so the CLI keeps deserializing once platform-api
drops the field.

The completion path now prefers the in-sandbox builder's metadata.json
for the final URI (it always knows where it landed the upload), falls
back to the prepared value for the legacy path, and errors clearly if
neither source provides one — instead of POSTing an empty string to
platform-api's complete endpoint.

Co-authored-by: Cursor <cursoragent@cursor.com>

`pick_upload_op` always returned `MultipartPut` — it "picked" nothing.
The whole helper, plus `disk_mb_for_upload`, plus the four boundary
tests, were just wrapping a one-line part-count computation around the
sole call site in `build_sandbox_image`. Inline it.

The splice now reuses the `rootfs_disk_bytes` value already computed
just upstream for builder sizing, so we don't recompute the same
precedence (explicit --disk_mb → parent rootfsDiskBytes for diff → default).
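
As a rough illustration of that precedence, the sketch below resolves the disk budget from the three sources in order. The function signature, the default value, and the assumption that one MB of `--disk_mb` means 1024 * 1024 bytes are all placeholders for illustration, not values confirmed by this PR.

```rust
/// Assumption: --disk_mb is converted using binary megabytes.
const BYTES_PER_MB: u64 = 1024 * 1024;

/// Placeholder default; the real default is not stated in this PR.
const DEFAULT_ROOTFS_DISK_BYTES: u64 = 10 * 1024 * BYTES_PER_MB;

/// Resolve the rootfs disk budget in bytes.
///
/// Precedence: explicit --disk_mb, then the parent's rootfsDiskBytes
/// (only present for diff builds), then the default.
fn rootfs_disk_bytes(
    explicit_disk_mb: Option<u64>,
    parent_rootfs_disk_bytes: Option<u64>,
) -> u64 {
    explicit_disk_mb
        // Saturate instead of overflowing on absurdly large inputs.
        .map(|mb| mb.saturating_mul(BYTES_PER_MB))
        .or(parent_rootfs_disk_bytes)
        .unwrap_or(DEFAULT_ROOTFS_DISK_BYTES)
}
```

Computing this once and passing the result to both builder sizing and the upload size hint avoids the two call sites drifting apart.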

`MULTIPART_PART_SIZE_MB` stays as the one tunable, and the clamp /
saturation rationale moves into the comment at the call site.

Net -42 lines.

Co-authored-by: Cursor <cursoragent@cursor.com>
@sgirones sgirones changed the title from "sandbox-image: bridge CLI to sandbox-proxy sign_blob for upload presign" to "sandbox-image: bridge CLI to sandbox-proxy sign_blob" on May 26, 2026
sgirones and others added 2 commits May 26, 2026 11:54
Drop `#[serde(rename_all = "camelCase")]` so `rel_path` goes on the wire
as `rel_path` to match the sandbox-proxy's expected payload shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop the part size from 100 MiB to 64 MiB and cap the requested part
count at S3's 10,000-part limit so absurd disk budgets don't ask the
proxy to mint an invalid multipart op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
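
A minimal sketch of the part-count computation this commit describes, assuming the count is the disk budget divided by the part size and rounded up. The constant names `MULTIPART_PART_SIZE_MB` and `MAX_MULTIPART_PARTS` appear in this PR; the function name and the rounding choice are assumptions.

```rust
/// Part size in MiB, as described in the commit above.
const MULTIPART_PART_SIZE_MB: u64 = 64;

/// S3 allows at most 10,000 parts per multipart upload.
const MAX_MULTIPART_PARTS: u64 = 10_000;

/// Hypothetical helper: number of part URLs to request for an upload
/// sized by `rootfs_disk_bytes`.
fn multipart_part_count(rootfs_disk_bytes: u64) -> u32 {
    let part_size_bytes = MULTIPART_PART_SIZE_MB * 1024 * 1024;

    // Ceiling division without risking overflow on very large inputs.
    let parts = rootfs_disk_bytes / part_size_bytes
        + u64::from(rootfs_disk_bytes % part_size_bytes != 0);

    // At least one part (zero-byte budget), at most the S3 limit.
    // The clamped value is <= 10_000, so the cast to u32 cannot truncate.
    parts.clamp(1, MAX_MULTIPART_PARTS) as u32
}
```

With these constants the cap corresponds to a budget of 10,000 × 64 MiB, which is 625 GiB; larger budgets still request 10,000 parts rather than an invalid multipart op.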
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sgirones sgirones requested a deployment to test-pypi May 26, 2026 10:44 — with GitHub Actions Abandoned
sgirones and others added 3 commits May 26, 2026 13:29
Extend SignBlobRequest to accept either a rel_path or a full uri and
add a SingleGet BlobOp so the proxy can presign downloads. When a
prepared spec includes a parent, fetch a signed download for the
parent manifest URI and inject it into the prepared spec.

Cross-reference MAX_MULTIPART_PARTS in the dataplane's sign_blob
endpoint so a future change to either side flags the other.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sgirones and others added 3 commits June 1, 2026 13:52
Sets pyproject.toml, Cargo.toml, and crates/rust-cloud-sdk-py/pyproject.toml
to 0.5.28 and regenerates Cargo.lock. Skips 0.5.27, which main partially
claimed by hand-bumping only the root pyproject.toml.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pending

`wait_for_sandbox_status` propagated the transient 502 / PROXY_ERROR that the
lifecycle gateway returns while a sandbox is still `pending` (not yet routable),
failing the build before it ever reached the builder. In slower environments the
pending window is a minute or two, which broke `sbx image create` outright.

Retry transient proxy errors the same way `wait_for_proxy_ready` already does,
reusing `is_transient_proxy_error`; the existing deadline still bounds the wait,
and non-transient errors still fail immediately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
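
The retry behavior described in this commit can be sketched as a polling loop that swallows transient proxy errors until a deadline. This is an illustration only: the error enum, the closure-based status fetcher, and the `"running"`-style target status are hypothetical, and the `is_transient_proxy_error` below is a stand-in for the real helper of that name, whose signature is not shown in this PR.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type for the status poll.
#[derive(Debug, PartialEq)]
enum PollError {
    /// For example the 502 / PROXY_ERROR returned while the sandbox is pending.
    TransientProxy(String),
    /// Any other failure; never retried.
    Fatal(String),
    /// The overall deadline passed before the target status was seen.
    DeadlineExceeded,
}

/// Stand-in for the real predicate of the same name.
fn is_transient_proxy_error(err: &PollError) -> bool {
    matches!(err, PollError::TransientProxy(_))
}

/// Poll `fetch_status` until it reports `target`, retrying transient proxy
/// errors, failing fast on anything else, and giving up at the deadline.
fn wait_for_status<F>(
    mut fetch_status: F,
    target: &str,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<(), PollError>
where
    F: FnMut() -> Result<String, PollError>,
{
    let deadline = Instant::now() + timeout;
    loop {
        match fetch_status() {
            Ok(status) if status == target => return Ok(()),
            // Not there yet: keep polling.
            Ok(_) => {}
            // Sandbox not routable yet: treat like "not there yet".
            Err(err) if is_transient_proxy_error(&err) => {}
            // Non-transient errors still fail immediately.
            Err(err) => return Err(err),
        }
        if Instant::now() >= deadline {
            return Err(PollError::DeadlineExceeded);
        }
        sleep(poll_interval);
    }
}
```

Because the deadline check sits outside the match, transient errors and non-target statuses share the same bound, so retrying cannot extend the wait beyond the existing timeout.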
@sgirones sgirones changed the title from "sandbox-image: bridge CLI to sandbox-proxy sign_blob" to "sandbox-image: complete with dataplane-signed URI" on Jun 1, 2026