Skip to content

Refactor FilesExt: extract presigned URL coordination into builder#1295

Merged
parthban-db merged 4 commits intomainfrom
parthban-db/stack/refactor-files-client-3
Mar 11, 2026
Merged

Refactor FilesExt: extract presigned URL coordination into builder#1295
parthban-db merged 4 commits intomainfrom
parthban-db/stack/refactor-files-client-3

Conversation

@parthban-db
Copy link
Contributor

@parthban-db parthban-db commented Feb 24, 2026

🥞 Stacked PR

Use this link to review incremental changes.


Summary

Extracts the presigned URL coordination logic (create-upload-part-urls and create-resumable-upload-url API calls and response parsing) into a dedicated _PresignedUrlRequestBuilder class, replacing inline code in three upload methods.

Why

The create-upload-part-urls API call, response validation, and header parsing was duplicated across _do_upload_one_part, _perform_multipart_upload, and _perform_resumable_upload (with a similar pattern for create-resumable-upload-url). Each copy built the request body, called _api.do(), validated the response structure, and converted the headers list to a dict — all inline. This made the upload methods longer than necessary and meant any change to the coordination logic required updating multiple places.

_PresignedUrlRequestBuilder consolidates this into a single class. It also prepares for storage-proxy routing (#1278), where a different builder implementation can construct URLs directly instead of calling the presigned URL APIs.

What changed

Interface changes

None. All changes are to private methods.

Behavioral changes

  • Malformed presigned URL responses (ValueError/KeyError from response parsing) now trigger fallback to single-shot upload on the first part, instead of propagating as hard errors. Previously the _api.do() call was wrapped in try/except but the response parsing was outside it. Now both are encapsulated in the builder, so parsing errors are also caught. This is more resilient — a broken coordination response should not abort the upload when a simpler path exists.

Internal changes

  • _PresignedUrl — new dataclass holding a resolved presigned URL and its associated headers.
  • _PresignedUrlRequestBuilder — new class with build_upload_part_urls() and build_resumable_upload_url(). Encapsulates the coordination API calls and response parsing.
  • _do_upload_one_part — replaced inline create-upload-part-urls call and response parsing with builder.build_upload_part_urls(..., count=1)[0].
  • _perform_multipart_upload — replaced inline create-upload-part-urls batch call and response parsing with builder.build_upload_part_urls(..., count=batch_size).
  • _perform_resumable_upload — replaced inline create-resumable-upload-url call and response parsing with builder.build_resumable_upload_url().

How is this tested?

Unit tests.

NO_CHANGELOG=true

@parthban-db parthban-db marked this pull request as ready for review February 24, 2026 23:17
github-merge-queue bot pushed a commit that referenced this pull request Mar 10, 2026
…anagement (#1294)

## 🥞 Stacked PR
Use this
[link](https://github.com/databricks/databricks-sdk-py/pull/1294/files)
to review incremental changes.
-
[**stack/refactor-files-client-2**](#1294)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1294/files)]
-
[stack/refactor-files-client-3](#1295)
[[Files
changed](https://github.com/databricks/databricks-sdk-py/pull/1295/files/a2305e5743d3d08230f26e08932969f81dff4185..f9e1b4f9bb0829e87e450753544d236e928c4a05)]

---------
<!--
  This template provides a recommended structure for PR descriptions.
  Adapt it freely — the goal is clarity, not rigid compliance.
The three-section format (Summary / Why / What Changed) helps reviewers
understand the change quickly and makes the PR easier to revisit later.
-->

## Why

Today the FilesExt class builds API URLs using hardcoded relative paths
(e.g. "/api/2.0/fs/create-upload-part-urls") and creates a new
unauthenticated requests.Session on every upload/download operation.
This works, but has two limitations:
- No single point of control for request routing. When we add
storage-proxy support, uploads need to be routed to a different
hostname. With paths scattered across a dozen methods, everyone needs to
change. A central _get_hostname() makes the future change a one-liner.
- Unnecessary session churn. A cloud_provider_session was created by a
high-level method and threaded through 5+ layers of calls as a
parameter. Some methods received it only to pass it further down. This
made signatures noisy, and the data flow hard to follow.

## What changed

### Interface changes
 
None. All changes are to private methods.

### Behavioral changes

<!--
User-visible behavior changes: different defaults, changed error
messages,
new side effects, performance characteristics. Write "None." if this is
a
  pure refactor — this explicitly reassures reviewers.
-->

None. This is a pure refactor — the cloud provider session is now cached
and reused instead of re-created per operation, but requests.Session is
safe to share across calls.

### Internal changes

<!--
  Refactoring, file moves, implementation details, test infrastructure.
  Things that don't affect the public API or user-visible behavior.
-->
- _get_hostname() — new method returning the workspace hostname. All
self._api.do() calls now use url=f"{hostname}/api/..." instead of
relative paths. This is the future extension point for storage-proxy
routing.
- _cloud_provider_session() — renamed from
_create_cloud_provider_session(). Caches the session on first call.
Callers invoke self._cloud_provider_session() directly instead of
storing a local variable.

## How is this tested?

<!--
Describe any tests you have done, especially tests that are not part of
the unit tests (e.g. local tests, integration tests, manual
verification).

ALWAYS ANSWER THIS QUESTION: answer with "N/A" if tests are not
applicable
to your PR (e.g. if the PR only modifies comments). Do not be afraid of
answering "Not tested" if the PR has not been tested. Being clear about
what
has been done and not done provides important context to the reviewers.
-->

- All existing unit tests. 
- One new test case was added ("Create resumable URL: node exists but
URL is empty") to cover the build_resumable_upload_url validation path,
where the response node exists but contains no URL.


NO_CHANGELOG=true
@parthban-db parthban-db force-pushed the parthban-db/stack/refactor-files-client-3 branch from f9e1b4f to 245d575 Compare March 10, 2026 17:19
@parthban-db parthban-db force-pushed the parthban-db/stack/refactor-files-client-3 branch from 245d575 to 9e504b1 Compare March 10, 2026 17:20
@parthban-db parthban-db changed the title Refactor FilesExt: centralize hostname routing and simplify session management Refactor FilesExt: extract presigned URL coordination into builder Mar 10, 2026
@parthban-db parthban-db requested a review from Divyansh-db March 10, 2026 17:31
Copy link
Contributor

@Divyansh-db Divyansh-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some comments


url = upload_part_url["url"]
required_headers = upload_part_url.get("headers", [])
assert current_part_number == upload_part_url["part_number"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep current_part_number as part of _PresignedUrl to validate this assertion?

"POST", url=f"{hostname}/api/2.0/fs/create-upload-part-urls", headers=headers, body=body
)
presigned = builder.build_upload_part_urls(
ctx.target_path, session_token, part_index, 1, self._get_upload_url_expire_time()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ctx.target_path, session_token, part_index, 1, self._get_upload_url_expire_time()
ctx.target_path, session_token, part_index, 1, self._get_upload_url_expire_time(),

nit, The parameters of a function should be linted in a uniform format (as https://github.com/databricks/databricks-sdk-py/pull/1295/changes#diff-469b96ba8e49ef7cbe8d3b4577d2a0ddf3a3552638b76d87bfccb936fda65d18R1849-R1854)

url = url_node.get("url")
if not url:
raise ValueError(f"Unexpected server response: {response}")
headers: dict[str, str] = {"Content-Type": "application/octet-stream"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not set in the existing code

@parthban-db
Copy link
Contributor Author

Range-diff: main (27164f7 -> 18f57c6)
databricks/sdk/mixins/files.py
@@ -10,6 +10,7 @@
 +
 +    url: str
 +    headers: dict[str, str]
++    part_number: int | None = None
 +
 +
 +class _PresignedUrlRequestBuilder:
@@ -46,11 +47,11 @@
 +        results = []
 +        for part_info in upload_part_urls:
 +            url = part_info["url"]
-+            _ = part_info["part_number"]
++            part_number = part_info["part_number"]
 +            headers: dict[str, str] = {"Content-Type": "application/octet-stream"}
 +            for h in part_info.get("headers", []):
 +                headers[h["name"]] = h["value"]
-+            results.append(_PresignedUrl(url=url, headers=headers))
++            results.append(_PresignedUrl(url=url, headers=headers, part_number=part_number))
 +        return results
 +
 +    def build_resumable_upload_url(self, path: str, session_token: str) -> _PresignedUrl:
@@ -69,7 +70,7 @@
 +        url = url_node.get("url")
 +        if not url:
 +            raise ValueError(f"Unexpected server response: {response}")
-+        headers: dict[str, str] = {"Content-Type": "application/octet-stream"}
++        headers: dict[str, str] = {}
 +        for h in url_node.get("headers", []):
 +            headers[h["name"]] = h["value"]
 +        return _PresignedUrl(url=url, headers=headers)
@@ -104,7 +105,11 @@
 -                    "POST", url=f"{hostname}/api/2.0/fs/create-upload-part-urls", headers=headers, body=body
 -                )
 +                presigned = builder.build_upload_part_urls(
-+                    ctx.target_path, session_token, part_index, 1, self._get_upload_url_expire_time()
++                    ctx.target_path,
++                    session_token,
++                    part_index,
++                    1,
++                    self._get_upload_url_expire_time(),
 +                )[0]
              except Exception as e:
                  if is_first_part:
@@ -195,6 +200,7 @@
 -                for h in required_headers:
 -                    headers[h["name"]] = h["value"]
 -
++                assert current_part_number == presigned.part_number
                  actual_chunk_length = min(actual_buffer_length, ctx.part_size)
                  _LOG.debug(
                      f"Uploading part {current_part_number}: [{chunk_offset}, {chunk_offset + actual_chunk_length - 1}]"

Reproduce locally: git range-diff 127b960..27164f7 127b960..18f57c6 | Disable: git config gitstack.push-range-diff false

@github-actions
Copy link

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1295
  • Commit SHA: 18f57c6c1f3589180a131f70aaead1eea39950eb

Checks will be approved automatically on success.

@parthban-db parthban-db enabled auto-merge March 11, 2026 18:04
@parthban-db parthban-db added this pull request to the merge queue Mar 11, 2026
Merged via the queue into main with commit ff61559 Mar 11, 2026
17 checks passed
@parthban-db parthban-db deleted the parthban-db/stack/refactor-files-client-3 branch March 11, 2026 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants