Skip to content

PEP 694: Abstract file upload mechanisms #4431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Jul 28, 2025

Conversation

ewdurbin
Copy link
Member

@ewdurbin ewdurbin commented May 21, 2025

This attempts to defer the implementation details of getting the bits of a given artifact from the client to the server.

The primary motivation here is to decouple this PEP from resumable/multi-part uploads, provide flexibility to implementers of PEP 694, and allow for new upload mechanisms without a PEP cycle.


📚 Documentation preview 📚: https://pep-previews--4431.org.readthedocs.build/

@ewdurbin
Copy link
Member Author

I spent some time to sketch this out as a Finite State Machine in pypi/warehouse#18174, and will be using what I learned to refine this a bit!

@ewdurbin ewdurbin marked this pull request as ready for review May 28, 2025 14:13
@ewdurbin ewdurbin requested review from dstufft and warsaw as code owners May 28, 2025 14:13
@ewdurbin
Copy link
Member Author

@dstufft @warsaw I think this is ready for a proper review.

@warsaw
Copy link
Member

warsaw commented May 28, 2025

I'm planning on taking a closer look later today.

Copy link
Member

@dstufft dstufft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments, but overall the changes look good to me.

I just wanted to note a few things I came across when reading the entire PEP (with these changes incorporated-- and probably a number of these came from my original PEP so that's on me :D ).

  • The PEP specifies that the endpoint for PyPI will be https://upload.pypi.org/2.0. I would probably remove that from the PEP and let PyPI decide what it's endpoint will be (in particular the /2.0 part is somewhat confusing given the PEP also uses conneg).
  • The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an uploading. I think that's fine, but we should update the wording.
  • The content type handling is kind of wonky I think. Much like PEP 691 the client can use the Accept header to request a particular content type from the server, and the server includes the full version number in the meta.api-version key in the response. However, the requests appear to only be using the meta.api-version key. I think ideally we want to have requests using a correct Content-Type for their request (and the latest wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that the meta.api-version matches the Content-Type for major version.

@ewdurbin
Copy link
Member Author

ewdurbin commented May 30, 2025

  • The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an uploading. I think that's fine, but we should update the wording.

While this PEP would no longer directly implement resumable or parallel uploads, it does solve the problem of how to address them. Individual file-upload sessions may occur in parallel if a server chooses to implement a mechanism that can support it, and similarly resumable uploads can be implemented as a mechanism. I'll clarify it in the "The new upload API...", 05d7fc2

@ewdurbin
Copy link
Member Author

ewdurbin commented May 30, 2025

Realization while specifying http-post-application-octet-stream: we need to support PEP 740 style attestations... tagging @woodruffw for thoughts :) See: 2ef077c

@ewdurbin
Copy link
Member Author

  • The content type handling is kind of wonky I think. Much like PEP 691 the client can use the Accept header to request a particular content type from the server, and the server includes the full version number in the meta.api-version key in the response. However, the requests appear to only be using the meta.api-version key. I think ideally we want to have requests using a correct Content-Type for their request (and the latest wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that the meta.api-version matches the Content-Type for major version.

See: 924a27d

@ewdurbin
Copy link
Member Author

  • The PEP specifies that the endpoint for PyPI will be https://upload.pypi.org/2.0. I would probably remove that from the PEP and let PyPI decide what its endpoint will be (in particular the /2.0 part is somewhat confusing given the PEP also uses conneg).

See: f8469cf

Co-authored-by: Donald Stufft <[email protected]>
@ewdurbin
Copy link
Member Author

To me, a lack of auth is a major hurdle to writing a client with Upload API 2.0: Without a way to pass any sorts of credentials, it's impossible to write a client that works universally the way twine and uv publish can currently talk to any registry; You can't write a CLI interface, as we don't know that the CLI arguments would be.

Again, this is very understandable, but it is likely to mean that this PEP would bake in HTTP Basic Auth with __token__ username, and a generic "token" in as the password. Maybe this is fine, but I certainly don't prefer it as a constraint on PyPI's implementation long term.

Copy link
Member

@warsaw warsaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ewdurbin - apologies for the long delay, but I've finally completed my review of the updated PR text.

TL;DR: LGTM, thank you!

I do have a number of small suggestions for wording and such, and a question or two, but overall, this looks great and I approve. Next, I'll go through and resolve any outstanding conversations, and I'll update the branch via the GH web UI. Then I think it's just a matter of you going through the suggestions and question, and then we can merge. We can talk off-line about posting to DPO after the PR lands.

@warsaw
Copy link
Member

warsaw commented Jul 11, 2025

I'm thinking e.g. about declaring all value enums and dicts as non-exhaustive, so we can add new info both on the server and client side without it being a breaking change. This allows let's say adding variant information (from the WheelNext project) to the POST upload field, or having additionally statuses in the future. (This is something I'd expect implementers to do anyway, but spelling it out helps).

Interesting use case 😄. Forward compatibility is different than allowing index-specific variations. I'd be more concerned about the latter introducing variations that aren't standardized, making evolution of the standard more difficult. Forward compatible evolution of the standard without a major version bump should be fine, although a minor version bump probably still makes sense, so clients know what they're interacting with.

Regarding the admonition against adding new upload mechanisms without a major version bump, with the above caveat in place, it makes sense to only require a major version bump if a standard mechanism is removed, although frankly I don't expect that to happen in practice.

For the session status, do I read it correctly that if the server says an upload is pending, the client has to poll until it is completed, there is no mechanism intended for subscribing to server updates?

Like a webhook? That could make sense, e.g. add a URL to the (upload & file-upload) session request payload when they are created. I think we'd have to evaluate a) security implications (e.g. we wouldn't want a package uploader to trigger push notifications to some unrelated service); b) complexity for implementations such as PyPI which don't AFAIK currently have infra to do that. Maybe for 2.0 we can just leave it at polling and think about push notifications for a future rev?

@warsaw
Copy link
Member

warsaw commented Jul 11, 2025

nit: Instead of valid-for, what about a valid-until with a timestamp?

I thought the same thing when I read this section, but didn't comment, so I'm glad you did! If this change is made, I'd recommend ISO 8601 UTC date time for consistency.

Co-authored-by: Ee Durbin <[email protected]>
@warsaw
Copy link
Member

warsaw commented Jul 11, 2025

Um, I committed the suggestion about not allowing server specific mechanisms to bump the major or minor versions. I hope I didn't mess up the rest of the text (GH always surprises me here). Looks like it did resolve the thread though.

@ewdurbin
Copy link
Member Author

To me, a lack of auth is a major hurdle to writing a client with Upload API 2.0: Without a way to pass any sorts of credentials, it's impossible to write a client that works universally the way twine and uv publish can currently talk to any registry; You can't write a CLI interface, as we don't know that the CLI arguments would be.

Again, this is very understandable, but it is likely to mean that this PEP would bake in HTTP Basic Auth with __token__ username, and a generic "token" in as the password. Maybe this is fine, but I certainly don't prefer it as a constraint on PyPI's implementation long term.

I've taken some time to consider this a little bit more before finalizing my proposed changes...

@konstin would specifying it as requiring the endpoint to implement RFC 7235 WWW-Authenticate header negotiation (and thus, the client must authenticate via an appropriate Authorization header) be acceptable? Or are you still advocating for something more concrete like a specific challenge/scheme (i.e. Bearer)

@ewdurbin
Copy link
Member Author

ewdurbin commented Jul 25, 2025

OK, I have completed all of the changes I am confident in and they are ready for review, @warsaw, @dstufft .


I have also prepared two PRs, that I am less confident in merging here:

I would appreciate input from @konstin on if these two change sets address the concerns previously raised, and input from @warsaw and @dstufft on if these changes would be acceptable to the pep.

@warsaw
Copy link
Member

warsaw commented Jul 25, 2025

OK, I have completed all of the changes I am confident in and they are ready for review, @warsaw, @dstufft .

I have also prepared two PRs, that I am less confident in merging here:

I would appreciate input from @konstin on if these two change sets address the concerns previously raised, and input from @warsaw and @dstufft on if these changes would be acceptable to the pep.

I really like both of these @ewdurbin. I've added some questions and suggestions.

@ewdurbin
Copy link
Member Author

On a little further consideration, I am on the fence about generic capabilities. I worry about the ability for them to really fracture upload 2.0 implementations across the ecosystem, particularly if they are truly generic rather than allowing for certain specific extensions and implementation.

The only real reason I would like to see them as part of this pep so that the pep does not have to own specifications on webhook style notifications.

@dstufft
Copy link
Member

dstufft commented Jul 25, 2025

I'm kind of the fence about the callback stuff? Like I see a use case for it, but I also feel like it's probably a lot of added complexity and it feels like the majority of use cases aren't going to be able to take advantage of it.

I suspect the vast bulk of publishing is done with something like twine, whether that's running in a CI system like GA or on a local developer's computer, both of which it's unlikely that they're going to be able to just spin up a web server attached to a specific port and assume that traffic is going to be able to reach it. So it would likely only be useful for dedicated platforms that handle publishing.

Personally I'd leave it out (at least for now) as a lot of extra complexity for little gain, and we can always add it at a later time if it turns out there's really a big desire for it. However, the proposed mechanism looks fine to me, so if folks really want it now I don't have any real problem with it as it stands.

@dstufft
Copy link
Member

dstufft commented Jul 25, 2025

Other than the callback comment above, everything in here looks good to me.

@ewdurbin
Copy link
Member Author

I had a quick call with Barry. My plan is to drop anything about callbacks and extensions via capabilities to an open question to be addressed in a future backwards compatible spec.

ewdurbin added 2 commits July 28, 2025 14:54
Add documentation about webhook notifications and capabilities extensions
that were discussed but deferred to avoid complexity and ecosystem
fragmentation during initial Upload 2.0 rollout.
@ewdurbin ewdurbin merged commit 8d87d02 into python:main Jul 28, 2025
5 checks passed
@konstin
Copy link
Contributor

konstin commented Jul 30, 2025

I would appreciate input from @konstin on if these two change sets address the concerns previously raised

Thanks for addressing both of those! I agree that the complexity for callbacks is too high here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants