Add storage proxy support for FilesExt uploads #1319
Merged
parthban-db merged 4 commits into main on Mar 13, 2026
github-merge-queue bot pushed a commit that referenced this pull request on Mar 11, 2026:

…1295)

## 🥞 Stacked PR

Use this [link](https://github.com/databricks/databricks-sdk-py/pull/1295/files) to review incremental changes.

- [**stack/refactor-files-client-3**](#1295) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1295/files)]
- [stack/refactor-files-client-4](#1319) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1319/files/9e504b191ad065bf8adbeb003fa493ba51bcc165..8218fc4ae84ff4b9ec2f838b26faa0c3a957f2e5)]
- [stack/refactor-files-client-5](#1324) [[Files changed](https://github.com/databricks/databricks-sdk-py/pull/1324/files/8218fc4ae84ff4b9ec2f838b26faa0c3a957f2e5..2c4f3e63287e86b0984a4dfb323547cdaacfa058)]

---------

## Summary

Extracts the presigned URL coordination logic (`create-upload-part-urls` and `create-resumable-upload-url` API calls and response parsing) into a dedicated `_PresignedUrlRequestBuilder` class, replacing inline code in three upload methods.

## Why

The `create-upload-part-urls` API call, response validation, and header parsing were duplicated across `_do_upload_one_part`, `_perform_multipart_upload`, and `_perform_resumable_upload` (with a similar pattern for `create-resumable-upload-url`). Each copy built the request body, called `_api.do()`, validated the response structure, and converted the headers list to a dict, all inline. This made the upload methods longer than necessary and meant any change to the coordination logic required updating multiple places.

`_PresignedUrlRequestBuilder` consolidates this into a single class. It also prepares for storage-proxy routing (#1278), where a different builder implementation can construct URLs directly instead of calling the presigned URL APIs.

## What changed

### Interface changes

None. All changes are to private methods.

### Behavioral changes

- Malformed presigned URL responses (`ValueError`/`KeyError` from response parsing) now trigger fallback to single-shot upload on the first part, instead of propagating as hard errors. Previously the `_api.do()` call was wrapped in `try/except` but the response parsing was outside it. Now both are encapsulated in the builder, so parsing errors are also caught. This is more resilient: a broken coordination response should not abort the upload when a simpler path exists.

### Internal changes

- **`_PresignedUrl`**: new dataclass holding a resolved presigned URL and its associated headers.
- **`_PresignedUrlRequestBuilder`**: new class with `build_upload_part_urls()` and `build_resumable_upload_url()`. Encapsulates the coordination API calls and response parsing.
- **`_do_upload_one_part`**: replaced inline `create-upload-part-urls` call and response parsing with `builder.build_upload_part_urls(..., count=1)[0]`.
- **`_perform_multipart_upload`**: replaced inline `create-upload-part-urls` batch call and response parsing with `builder.build_upload_part_urls(..., count=batch_size)`.
- **`_perform_resumable_upload`**: replaced inline `create-resumable-upload-url` call and response parsing with `builder.build_resumable_upload_url()`.

## How is this tested?

Unit tests.

NO_CHANGELOG=true
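
As a rough illustration of the consolidation and fallback behavior described in the referenced commit, the sketch below parses a `create-upload-part-urls`-style response into `_PresignedUrl` objects and treats parsing errors as a signal to fall back. The response shape (`upload_part_urls`, `url`, and `headers` as a list of `name`/`value` pairs) and the helpers `parse_upload_part_urls` and `choose_upload_path` are assumptions made for illustration; they are not taken from the SDK source.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class _PresignedUrl:
    """A resolved presigned URL plus the headers that must accompany it."""

    url: str
    headers: Dict[str, str] = field(default_factory=dict)


def parse_upload_part_urls(response: Dict[str, Any], count: int) -> List[_PresignedUrl]:
    """Validate an (assumed) coordination response and convert it to _PresignedUrl objects."""
    parts = response.get("upload_part_urls")
    if not isinstance(parts, list) or len(parts) < count:
        raise ValueError(f"expected {count} upload part URLs, got {parts!r}")
    result = []
    for part in parts[:count]:
        # Convert the headers list to a dict; a missing key raises KeyError.
        headers = {h["name"]: h["value"] for h in part.get("headers", [])}
        result.append(_PresignedUrl(url=part["url"], headers=headers))
    return result


def choose_upload_path(response: Dict[str, Any]) -> str:
    """Hypothetical caller: decide which path a first-part upload would take."""
    try:
        parse_upload_part_urls(response, count=1)
    except (ValueError, KeyError):
        # Malformed coordination response on the first part: use the simpler path.
        return "single-shot"
    return "multipart"
```

Keeping the request and the parsing behind one call means a single `try/except` around the builder covers both failure modes, which is the behavioral change the commit message describes.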
hectorcast-db approved these changes on Mar 13, 2026
🥞 Stacked PR
Use this link to review incremental changes.
Summary
Adds optional storage-proxy routing for file uploads, gated behind the `experimental_files_ext_enable_storage_proxy` config flag. When enabled and the proxy is reachable, uploads bypass the presigned URL coordination APIs and go directly to the storage proxy.
Why
When running inside a Databricks cluster or notebook, a storage proxy is available on the data plane. It can handle file upload operations directly.
The feature is disabled by default and marked experimental, so there is no impact on existing users.
What changed
Interface changes
One new config field:
- `experimental_files_ext_enable_storage_proxy: bool = False`: enables the storage proxy path.
Behavioral changes
- … `session.auth` callback) is used instead of the unauthenticated session used for presigned URL uploads.
- `build_abort_url` was added to `_PresignedUrlRequestBuilder`, extracting inline abort URL logic from `_abort_multipart_upload`.
Internal changes
- `_StorageProxyRequestBuilder`: new class with the same method signatures as `_PresignedUrlRequestBuilder` (`build_upload_part_urls`, `build_resumable_upload_url`, `build_abort_url`). Constructs URLs directly to the proxy endpoint instead of calling coordination APIs.
- `_create_request_builder()`: factory method that returns the proxy builder when the proxy is available, otherwise the presigned URL builder.
- `_probe_storage_proxy()`: GET to the proxy ping endpoint with SDK auth. Result is cached.
- `_create_storage_proxy_session()`: cached `requests.Session` with `session.auth` callback for SDK credentials. Protected by a lock for thread safety.
- `_cloud_provider_session()`: updated to return the authenticated proxy session when the proxy is active. Session creation protected by a lock for thread safety.
- `_get_hostname()`: updated to return the proxy hostname when the flag is enabled and the probe succeeds.
- `_abort_multipart_upload`: now uses the builder instead of inline `create-abort-upload-url` API call.
- `UploadTestCase` gains `use_storage_proxy` parameter. Storage proxy test cases added to existing `MultipartUploadTestCase` and `ResumableUploadTestCase` parametrized lists. Presigned URL coordination handlers assert they are never called when storage proxy is active.
How is this tested?
- … `_StorageProxyRequestBuilder` used, no `create-upload-part-urls`/`create-resumable-upload-url` calls, `session.auth` callback active, uploaded and downloaded hashes match.

NO_CHANGELOG=true
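
The cached, lock-protected session with a `session.auth` callback listed under Internal changes could look roughly like the sketch below. The `ProxySessionCache` class, the injected `session_factory` and `get_auth_headers` callables, and the `make_fake_session` stand-in are hypothetical names used only for illustration. With `requests` installed, `session_factory` could be `requests.Session`, whose `auth` attribute accepts a callable that receives the prepared request and returns it.

```python
import threading
from types import SimpleNamespace
from typing import Callable, Dict


def make_fake_session() -> SimpleNamespace:
    """Stand-in for requests.Session so the sketch runs without third-party packages."""
    return SimpleNamespace(auth=None)


class ProxySessionCache:
    """Hypothetical helper: creates one authenticated session and reuses it across threads."""

    def __init__(
        self,
        session_factory: Callable[[], object],
        get_auth_headers: Callable[[], Dict[str, str]],
    ):
        self._session_factory = session_factory
        self._get_auth_headers = get_auth_headers  # assumed source of SDK credentials
        self._lock = threading.Lock()
        self._session = None

    def _authenticate(self, request):
        # Invoked for each outgoing request; adds the current auth headers.
        request.headers.update(self._get_auth_headers())
        return request

    def get(self):
        # The lock ensures concurrent uploaders cannot create two sessions.
        with self._lock:
            if self._session is None:
                session = self._session_factory()
                session.auth = self._authenticate
                self._session = session
            return self._session
```

Taking the lock on every call keeps the sketch simple; since the critical section is a single attribute check after the first call, contention should be negligible for typical upload concurrency.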