diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst
index db2ef5fdb24..1bf63c693cc 100644
--- a/peps/pep-0694.rst
+++ b/peps/pep-0694.rst
@@ -1,6 +1,6 @@
 PEP: 694
 Title: Upload 2.0 API for Python Package Indexes
-Author: Barry Warsaw , Donald Stufft
+Author: Barry Warsaw , Donald Stufft , Ee Durbin
 Discussions-To: https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879
 Status: Draft
 Type: Standards Track
@@ -14,22 +14,24 @@ Post-History: `27-Jun-2022 `__;
+* "staging" a release, which can be used to test uploads before publicly publishing them,
+  without the need for `test.pypi.org `__;
 * artifacts which can be overwritten and replaced, until a session is published;
-* asynchronous and "chunked", resumable file uploads, for more efficient use of network bandwidth;
-
 * detailed status on the state of artifact uploads;
 * new project creation without requiring the uploading of an artifact.
+* a protocol to extend the supported upload mechanisms in the future without requiring a full PEP;
+  these can be standardized and recommended for all indexes, or be index-specific.
+
 Once this new upload API is adopted, the existing legacy API can be deprecated, however this PEP
 does not propose a deprecation schedule for the legacy API.
@@ -49,10 +51,10 @@ In addition, there are a number of major issues with the legacy API:
 * It is fully synchronous, which forces requests to be held open both for the upload itself, and
   while the index processes the uploaded file to determine success or failure.
-* It does not support any mechanism for resuming an upload. With the largest default file size on
-  PyPI being around 1GB in size, requiring the entire upload to complete successfully means
-  bandwidth is wasted when such uploads experience a network interruption while the request is in
-  progress.
+* It does not support any mechanism for parallelizing or resuming an upload. With the largest
+  default file size on PyPI being around 1GB in size, requiring the entire upload to complete
+  successfully means bandwidth is wasted when such uploads experience a network interruption while
+  the request is in progress.
 * The atomic unit of operation is a single file. This is problematic when a release logically
   includes an sdist and multiple binary wheels, leading to race conditions where consumers get
@@ -77,10 +79,13 @@ In addition, there are a number of major issues with the legacy API:
 * Creation of new projects requires the uploading of at least one file, leading to "stub" uploads
   to claim a project namespace.
-The new upload API proposed in this PEP solves all of these problems, providing for a much more
-flexible, bandwidth friendly approach, with better error reporting, a better release testing
-experience, and atomic and simultaneous publishing of all release artifacts.
-
+The new upload API proposed in this PEP provides ways to solve all of these problems,
+either directly or through an extensible approach,
+allowing servers to implement features such as resumable and parallel uploads.
+The upload API proposed in this PEP also provides
+better error reporting,
+a more robust release testing experience,
+and atomic and simultaneous publishing of all release artifacts.
 Legacy API
 ==========
@@ -96,8 +101,9 @@ The existing upload API lives at a base URL. For PyPI, that URL is currently
 ``https://upload.pypi.org/legacy/``. Clients performing uploads specify the API they want to call
 by adding an ``:action`` URL parameter with a value of ``file_upload``. [#fn-action]_
-The legacy API also has a ``protocol_version`` parameter, in theory allowing new versions of the API
-to be defined. In practice this has never happened, and the value is always ``1``.
+The legacy API also has a ``protocol_version`` parameter,
+in theory allowing new versions of the API to be defined.
+In practice this has never happened, and the value is always ``1``.
 Thus, the effective upload API on PyPI is:
 ``https://upload.pypi.org/legacy/?:action=file_upload&protocol_version=1``.
@@ -108,8 +114,8 @@ Encoding
 The data to be submitted is submitted as a ``POST`` request with the content type of
 ``multipart/form-data``. This reflects the legacy API's historical nature, which was originally
-designed not as an API, but rather as a web form on the initial PyPI implementation, with client code
-written to programmatically submit that form.
+designed not as an API, but rather as a web form on the initial PyPI implementation,
+with client code written to programmatically submit that form.
 Content
@@ -118,8 +124,8 @@ Content
 Roughly speaking, the metadata contained within the package is submitted as parts where the content
 disposition is ``form-data``, and the metadata key is the name of the field. The names of these
 various pieces of metadata are not documented, and they sometimes, but not always match the names
-used in the ``METADATA`` files for package artifacts. The case rarely matches, and the ``form-data``
-to ``METADATA`` conversion is inconsistent.
+used in the ``METADATA`` files for package artifacts.
+The case rarely matches, and the ``form-data`` to ``METADATA`` conversion is inconsistent.
 The upload artifact file itself is sent as a ``application/octet-stream`` part with the name of
 ``content``, and if there is a PGP signature attached, then it will be included as a
@@ -129,21 +135,20 @@ The upload artifact file itself is sent as a ``application/octet-stream`` part w
 Authentication
 --------------
-Upload authentication is also not standardized. On PyPI, authentication is through `API tokens
-`__ or `Trusted Publisher (OpenID Connect)
-`__. Other indexes may support different authentication
-methods.
+Upload authentication is also not standardized.
+
+PyPI uses HTTP Basic Authentication
+with `API tokens `__ as the password
+and the username ``__token__``.
+`Trusted Publishers `__
+authenticate via OpenID Connect and receive short-lived API tokens
+that are used in the same way.
 .. _spec:
 Upload 2.0 API Specification
 ============================
-This PEP draws inspiration from the `Resumable Uploads for HTTP `_ internet draft,
-however there are significant differences. This is largely due to the unique nature of Python
-package releases (i.e. metadata, multiple related artifacts, etc.), and the support for an upload
-session and release stages. Where it makes sense to adopt details of the draft, this PEP does so.
-
 This PEP traces the root cause of most of the issues with the existing API to be roughly two things:
 - The metadata is submitted alongside the file, rather than being parsed from the
@@ -155,20 +160,86 @@ This PEP traces the root cause of most of the issues with the existing API to be
 To address these issues, this PEP proposes a multi-request workflow, which at a high level involves
 these steps:
-#. Initiate an upload session, creating a release stage.
-#. Upload the file(s) to that stage as part of the upload session.
-#. Complete the upload session, publishing or discarding the stage.
-#. Optionally check the status of an upload session.
+#. Initiate a :ref:`Publishing Session `, creating a release stage.
+#. Initiate :ref:`File Upload Session(s) ` to that stage
+   as part of the Publishing Session.
+#. Negotiate the specific :ref:`File Upload Mechanism ` to use
+   between client and server.
+#. Execute the File Upload Mechanism for the File Upload Session(s) using the negotiated mechanism(s).
+#. Complete the File Upload Session(s), marking them as completed or canceled.
+#. Complete the Publishing Session, publishing or discarding the stage.
+#. Optionally check the status of a Publishing Session.
+.. _versioning:
 Versioning
 ----------
-This PEP uses the same ``MAJOR.MINOR`` versioning system as used in :pep:`691`, but it is otherwise
-independently versioned. The legacy API is considered by this PEP to be version ``1.0``, but this
-PEP does not modify the legacy API in any way.
+This PEP uses the same ``MAJOR.MINOR`` versioning system as used in :pep:`691`,
+but it is otherwise independently versioned.
+The legacy API is considered by this PEP to be version ``1.0``,
+but this PEP does not modify the legacy API in any way.
+
+The API proposed in this PEP therefore has the version number ``2.0``.
+
+Both major and minor version numbers of the Upload API
+**MUST** only be changed through the PEP process.
+Index operators and implementers **MUST NOT** advertise or implement
+new API versions without an approved PEP.
+This ensures consistency across all implementations
+and prevents fragmentation of the ecosystem.
+
+Content Types
+-------------
+
+Like :pep:`691`, this PEP proposes that all requests and responses from this upload API will have a
+standard content type that describes what the content is, what version of the API it represents,
+and what serialization format has been used.
+
+This standard request content type applies to all requests *except* for requests to execute
+a File Upload Mechanism, which will be specified by the documentation for that mechanism.
+
+The structure of the ``Content-Type`` header for all other requests is:
+
+.. code-block:: text
+
+    application/vnd.pypi.upload.$version+$format
+
+Since minor API version differences should never be disruptive, only the major version is included
+in the content type; the version number is prefixed with a ``v``.
+
+The major API version specified in the ``.meta.api-version`` JSON key of client requests
+**MUST** match the major version in the ``Content-Type`` header.
+
+Unlike :pep:`691`, this PEP does not change the existing *legacy* ``1.0`` upload API in any way,
+so servers are required to host the new API described in this PEP at a different endpoint than the
+existing upload API.
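Reviewer sketch: the content type rule above can be captured in two small Python helpers, one that derives ``application/vnd.pypi.upload.$version+$format`` from a ``MAJOR.MINOR`` API version, and one that checks a request's ``.meta.api-version`` against its ``Content-Type``. The names ``content_type_for`` and ``major_version_matches`` are hypothetical, and ignoring ``;``-separated header parameters is an assumption of this sketch rather than something the PEP specifies.

```python
import re

# Hypothetical helpers (not defined by the PEP) for the content type
#   application/vnd.pypi.upload.$version+$format
# where $version is "v" followed by the MAJOR API version only.
_CONTENT_TYPE_RE = re.compile(r"application/vnd\.pypi\.upload\.v(\d+)\+([a-z0-9]+)")


def content_type_for(api_version: str, fmt: str = "json") -> str:
    """Build the Content-Type for a MAJOR.MINOR version such as "2.0"."""
    major, _minor = api_version.split(".")
    return f"application/vnd.pypi.upload.v{int(major)}+{fmt}"


def major_version_matches(content_type: str, body: dict) -> bool:
    """Check that .meta.api-version has the same MAJOR as the Content-Type."""
    # Assumption: any ";"-separated parameters (e.g. charset) are ignored.
    media_type = content_type.split(";", 1)[0].strip()
    match = _CONTENT_TYPE_RE.fullmatch(media_type)
    if match is None:
        return False
    header_major = int(match.group(1))
    body_major = int(body["meta"]["api-version"].split(".")[0])
    return header_major == body_major


if __name__ == "__main__":
    request = {"meta": {"api-version": "2.0"}}
    header = content_type_for(request["meta"]["api-version"])
    print(header)  # application/vnd.pypi.upload.v2+json
    print(major_version_matches(header, request))  # True
```

Because only the major version appears in the header, a ``2.1`` request body still matches ``application/vnd.pypi.upload.v2+json`` under this check.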
+
+Since JSON is the only request format defined in this PEP, all non-file-upload requests
+defined in this PEP **MUST** include a ``Content-Type`` header value of:
+
+- ``application/vnd.pypi.upload.v2+json``.
-The API proposed in this PEP therefor has the version number ``2.0``.
+Similar to :pep:`691`, this PEP also standardizes on using server-driven content negotiation to
+allow clients to request different versions or serialization formats,
+which includes the ``format`` part of the content type.
+However, since this PEP expects the existing legacy ``1.0`` upload API
+to exist at a different endpoint,
+and this PEP currently only provides for JSON serialization,
+this mechanism is not particularly useful.
+Clients only have a single version and serialization they can request.
+However, clients **SHOULD** be prepared to handle content negotiation gracefully
+in the case that additional formats or versions are added in the future.
+
+Servers **MUST NOT** advertise support for API versions beyond those defined in approved PEPs.
+Any new versions or formats require standardization through a new PEP.
+
+Unless otherwise specified, all HTTP requests and responses in this document are assumed to include
+the HTTP header:
+
+.. code-block:: text
+
+    Content-Type: application/vnd.pypi.upload.v2+json
 Root Endpoint
@@ -178,18 +249,74 @@ All URLs described here are relative to the "root endpoint", which may be locate
 the url structure of a domain. For example, the root endpoint could be
 ``https://upload.example.com/``, or ``https://example.com/upload/``.
-Specifically for PyPI, this PEP proposes to implement the root endpoint at
-``https://upload.pypi.org/2.0``. This root URL will be considered provisional while the feature is
-being tested, and will be blessed as permanent after sufficient testing with live projects.
+The choice of the root endpoint is left up to the index operator.
-.. _session-create:
+Authentication for Upload 2.0 API
+----------------------------------
-Create an Upload Session
-~~~~~~~~~~~~~~~~~~~~~~~~
+All endpoints in this specification **MUST** use standard HTTP authentication
+mechanisms as defined in :rfc:`7235`.
-A release starts by creating a new upload session. To create the session, a client submits a ``POST`` request
-to the root URL, with a payload that looks like:
+Authentication follows the standard HTTP pattern:
+
+- Servers use the ``WWW-Authenticate`` response header when authentication is required
+- Clients provide credentials via the ``Authorization`` request header
+- ``401 Unauthorized`` indicates missing or invalid authentication
+- ``403 Forbidden`` indicates insufficient permissions
+
+The specific authentication schemes (e.g., Bearer, Basic, Digest)
+are determined by the index operator.
+
+
+.. _session-errors:
+
+Errors
+------
+
+All error responses that contain content look like:
+
+.. code-block:: json
+
+    {
+        "meta": {
+            "api-version": "2.0"
+        },
+        "message": "...",
+        "errors": [
+            {
+                "source": "...",
+                "message": "..."
+            }
+        ]
+    }
+
+Besides the standard ``meta`` key, this has the following top level keys:
+
+``message``
+    A singular message that encapsulates all errors that may have happened on this
+    request.
+
+``errors``
+    An array of specific errors, each of which contains a ``source`` key, which is a string that
+    indicates what the source of the error is, and a ``message`` key for that specific error.
+
+The ``message`` and ``source`` strings do not have any specific meaning, and are intended for human
+interpretation to aid in diagnosing the underlying issue.
+
+
+.. _publishing-session:
+
+Publishing Session
+------------------
+
+.. _publishing-session-create:
+
+Create a Publishing Session
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A release starts by creating a new Publishing Session. To create the session, a client submits a
+``POST`` request to the root URL like:
 .. code-block:: json
@@ -216,31 +343,34 @@ The request includes the following top-level keys:
     The version of the project that this session is attempting to add files to.
 ``nonce`` (**optional**)
-    An additional client-side string input to the :ref:`"session token" `
-    algorithm. Details are provided below, but if this key is omitted, it is equivalent
-    to passing the empty string.
+    An additional client-side string input to the
+    :ref:`"Publishing Session Token" ` algorithm.
+    Details are provided below, but if this key is omitted,
+    it is equivalent to passing the empty string.
 Upon successful session creation, the server returns a ``201 Created`` response. If an error
 occurs, the appropriate ``4xx`` code will be returned, as described in the :ref:`session-errors`
 section.
-If a session is created for a project which has no previous release, then the index **MAY** reserve
-the project name before the session is published, however it **MUST NOT** be possible to navigate to
-that project using the "regular" (i.e. :ref:`unstaged `) access protocols, *until*
-the stage is published. If this first-release stage gets canceled, then the index **SHOULD** delete
-the project record, as if it were never uploaded.
+If a session is created for a project which has no previous release,
+then the index **MAY** reserve the project name before the session is published,
+however it **MUST NOT** be possible to navigate to that project using
+the "regular" (i.e. :ref:`unstaged `) access protocols,
+*until* the stage is published.
+If this first-release stage gets canceled,
+then the index **SHOULD** delete the project record, as if it were never uploaded.
-The session is owned by the user that created it, and all subsequent requests **MUST** be performed
-with the same credentials, otherwise a ``403 Forbidden`` will be returned on those subsequent
-requests.
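The ``nonce`` key described above feeds the session token algorithm that this PEP specifies later (SHA-256 over ``name``, ``version``, then ``nonce``). The Python sketch below reuses that construction to show why omitting ``nonce`` is the same as sending an empty string, and how a client wanting an unguessable token might generate one. The ``name`` request key is taken from the token section; the helpers ``create_session_body`` and ``expected_token``, the use of ``secrets.token_urlsafe(32)``, and UTF-8 encoding of the inputs are all assumptions of this sketch.

```python
import secrets
from hashlib import sha256


def gentoken(name: bytes, version: bytes, nonce: bytes = b"") -> str:
    # Same construction as the PEP's example: SHA-256 over name, version, nonce.
    h = sha256()
    h.update(name)
    h.update(version)
    h.update(nonce)
    return h.hexdigest()


def create_session_body(name: str, version: str, private: bool) -> dict:
    """Hypothetical helper: build a Publishing Session creation request."""
    body = {"meta": {"api-version": "2.0"}, "name": name, "version": version}
    if private:
        # An unpredictable nonce keeps the session token from being guessable.
        body["nonce"] = secrets.token_urlsafe(32)
    return body


def expected_token(body: dict) -> str:
    """Token derived from a creation request; UTF-8 encoding is an assumption."""
    return gentoken(
        body["name"].encode("utf-8"),
        body["version"].encode("utf-8"),
        # An omitted nonce is treated exactly like the empty string.
        body.get("nonce", "").encode("utf-8"),
    )
```

With ``private=False``, the token for project ``foo`` version ``1.0`` is simply ``sha256(b"foo1.0").hexdigest()``, which anyone can compute; with ``private=True`` it also depends on the random nonce.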
+The session is owned by the user that created it,
+and all subsequent requests **MUST** be performed with the same credentials,
+otherwise a ``403 Forbidden`` will be returned on those subsequent requests.
-.. _session-response:
+.. _publishing-session-response:
 Response Body
 +++++++++++++
-The successful response includes the following JSON content:
+The successful response includes the following content:
 .. code-block:: json
@@ -253,8 +383,9 @@ The successful response includes the following JSON content:
             "upload": "...",
             "session": "...",
         },
+        "mechanisms": ["http-post-bytes"],
         "session-token": "",
-        "valid-for": 604800,
+        "expires-at": "2025-08-01T12:00:00Z",
         "status": "pending",
         "files": {},
         "notices": [
@@ -267,24 +398,27 @@ Besides the ``meta`` key, which has the same format as the request JSON, the suc
 the following keys:
 ``links``
-    A dictionary mapping :ref:`keys to URLs ` related to this session, the details of
-    which are provided below.
+    A dictionary mapping :ref:`keys to URLs ` related to this session,
+    the details of which are provided below.
+
+``mechanisms``
+    A list of file-upload mechanisms supported by the server, sorted in server-preferred order.
+    At least one value is required.
 ``session-token``
-    If the index supports :ref:`previewing staged releases `, this key will contain
-    the unique :ref:`"session token" ` that can be provided to installers in order to
-    preview the staged release before it's published. If the index does *not* support stage
-    previewing, this key **MUST** be omitted.
+    If the index supports :ref:`previewing staged releases `,
+    this key will contain the unique :ref:`"session token" `
+    that can be provided to installers in order to preview the staged release before it's published.
+    If the index does *not* support stage previewing, this key **MUST** be omitted.
-``valid-for``
-    An integer representing how long, in seconds, until the server itself will expire this session,
+``expires-at``
+    An ISO 8601 formatted timestamp string representing when the server will expire this session,
     and thus all of its content, including any uploaded files and the URL links related to the
-    session. This value is roughly relative to the time at which the session was created or
-    :ref:`extended `. The session **SHOULD** live at least this much longer
+    session. The session **SHOULD** remain active until at least this time
     unless the client itself has canceled or published the session. Servers **MAY** choose to
-    *increase* this time, but should never *decrease* it, except naturally through the passage of
-    time. Clients can query the :ref:`session status ` to get time remaining in the
-    session.
+    extend this expiration time, but should never move it earlier.
+    Clients can query the :ref:`session status `
+    to get the current expiration time of the session.
 ``status``
     A string that contains one of ``pending``, ``published``, ``error``, or ``canceled``,
@@ -292,7 +426,7 @@ the following keys:
 ``files``
     A mapping containing the filenames that have been uploaded to this session, to a mapping
-    containing details about each :ref:`file referenced in this session `.
+    containing details about each :ref:`file referenced in this session `.
 ``notices``
     An optional key that points to an array of human-readable informational notices that the server
@@ -300,16 +434,16 @@ the following keys:
     to any particular file in the session.
-.. _session-links:
+.. _publishing-session-links:
-Session Links
-+++++++++++++
+Publishing Session Links
++++++++++++++++++++++++++
 For the ``links`` key in the success JSON, the following sub-keys are valid:
 ``upload``
-    The endpoint session clients will use to initiate :ref:`uploads ` for each file to
-    be included in this session.
+    The endpoint session clients will use to initiate a :ref:`File Upload Session `
+    for each file to be included in this session.
 ``stage``
     The endpoint where this staged release can be :ref:`previewed ` prior to
@@ -317,36 +451,39 @@ For the ``links`` key in the success JSON, the following sub-keys are valid:
     the index does not support previewing staged releases, this key **MUST** be omitted.
 ``session``
-    The endpoint where actions for this session can be performed, including :ref:`publishing this
-    session `, :ref:`canceling and discarding the session `,
-    :ref:`querying the current session status `, and :ref:`requesting an extension
-    of the session lifetime ` (*if* the server supports it).
+    The endpoint where actions for this session can be performed,
+    including :ref:`publishing this session `,
+    :ref:`canceling and discarding the session `,
+    :ref:`querying the current session status `,
+    and :ref:`requesting an extension of the session lifetime `
+    (*if* the server supports it).
-.. _session-files:
+.. _publishing-session-files:
-Session Files
-+++++++++++++
+Publishing Session Files
++++++++++++++++++++++++++
 The ``files`` key contains a mapping from the names of the files uploaded in this session to a
 sub-mapping with the following keys:
 ``status``
-    A string with valid values ``partial``, ``pending``, ``complete``, and ``error``. If a file
-    upload has not seen an ``Upload-Complete: ?1`` header, then ``partial`` will be returned. If
-    ``Upload-Complete: ?1`` resulted in a ``202 Accepted``, then ``pending`` will be returned until
-    asynchronous processing of the last chunk and the full file has been completed. If a ``201
-    Created`` was returned, or the last chunk processing is finished, ``complete`` will be returned.
-    If there was an error during upload, then clients should not assume the file is in any usable
-    state, ``error`` will be returned and it's best to :ref:`cancel or delete `
-    the file and start over. This action would remove the file name from the ``files`` key of the
-    :ref:`session status response body `.
+    A string with valid values
+    ``pending``, ``processing``, ``complete``, ``error``, and ``canceled``.
+    If there was an error during upload, ``error`` will be returned;
+    clients should not assume the file is in any usable state,
+    and it's best to
+    :ref:`cancel or delete ` the file and start over.
+    This action would remove the file name from the ``files`` key of the
+    :ref:`session status response body `.
 ``link``
-    The *absolute* URL that the client should use to reference this specific file. This URL is used
-    to retrieve, replace, or delete the :ref:`referenced file `. If a ``nonce`` was
-    provided, this URL **MUST** be obfuscated with a non-guessable token as described in the
-    :ref:`session token ` section.
+    The *absolute* URL that the client should use to reference this specific file.
+    This URL is used to retrieve, replace, or delete
+    the :ref:`referenced file `.
+    If a ``nonce`` was provided, this URL **MUST** be obfuscated
+    with a non-guessable token as described in the
+    :ref:`Publishing Session Token ` section.
 ``notices``
     An optional key with similar format and semantics as the ``notices`` session key, except that
@@ -354,21 +491,19 @@ sub-mapping with the following keys:
 If a second session is created for the same name-version pair while a session for that pair is in
 the ``pending`` state, then the server **MUST** return the JSON status response for the already
-existing session, along with the ``200 Ok`` status code rather than creating a new, empty session.
+existing session, along with the ``200 OK`` status code rather than creating a new, empty session.
-.. _file-uploads:
+.. _publishing-session-completion:
-File Upload
-~~~~~~~~~~~
+Complete a Publishing Session
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-After creating the session, the ``upload`` endpoint from the response's :ref:`session links
-` mapping is used to begin the upload of new files into that session. Clients
-**MUST** use the provided ``upload`` URL and **MUST NOT** assume there is any pattern or commonality
-to those URLs from one session to the next.
+To complete a session and publish the files that have been included in it, a client issues a
+``POST`` request to the ``session`` :ref:`link `
+given in the :ref:`session creation response body `.
-To initiate a file upload, a client first sends a ``POST`` request to the ``upload`` URL. The
-request body has the following JSON format:
+The request looks like:
 .. code-block:: json
@@ -376,275 +511,349 @@ request body has the following JSON format:
         "meta": {
             "api-version": "2.0"
         },
-        "filename": "foo-1.0.tar.gz",
-        "size": 1000,
-        "hashes": {"sha256": "...", "blake2b": "..."},
-        "metadata": "..."
+        "action": "publish"
     }
-Besides the standard ``meta`` key, the request JSON has the following additional keys:
+If the server is able to immediately complete the Publishing Session, it may do so and return a
+``201 Created`` response. If it is unable to immediately complete the Publishing Session
+(for instance, if it needs to do validation that may take longer than reasonable in a single HTTP
+request), then it may return a ``202 Accepted`` response.
-``filename`` (**required**)
-    The name of the file being uploaded.
+In either case, the server should include a ``Location`` header pointing back to
+the Publishing Session status URL,
+and if the server returned a ``202 Accepted``,
+the client may poll that URL to watch for the status to change.
-``size`` (**required**)
-    The size in bytes of the file being uploaded.
+If an error occurs, the appropriate ``4xx`` code should be returned, as described in the
+:ref:`session-errors` section.
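To make the ``201 Created`` versus ``202 Accepted`` distinction concrete, here is a minimal Python sketch of the client side of publishing: serializing the publish request body and mapping the response status to a next step. The ``next_step`` function and its ``"done"``/``"poll"``/``"error"`` labels are hypothetical; treating every other status as unexpected is a simplification of this sketch, and the actual HTTP request and polling loop are left out.

```python
import json
from typing import Mapping, Optional, Tuple


def publish_body() -> str:
    """Serialize the publish request sent to the session link."""
    return json.dumps({"meta": {"api-version": "2.0"}, "action": "publish"})


def next_step(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Map a publish response to a hypothetical client action.

    201 -> ("done", Location), 202 -> ("poll", Location), 4xx -> ("error", None).
    Location is looked up case-sensitively here; real HTTP clients usually
    expose a case-insensitive header mapping.
    """
    location = headers.get("Location")
    if status == 201:
        return "done", location
    if status == 202:
        # Processing continues asynchronously; poll the session status URL.
        return "poll", location
    if 400 <= status < 500:
        # The body should follow the error format described in this PEP.
        return "error", None
    raise ValueError(f"unexpected status code: {status}")
```

Since the server only *should* send ``Location``, the sketch returns ``None`` when it is absent; a client could then fall back to the ``session`` link it already holds.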
-``hashes`` (**required**)
-    A mapping of hash names to hex-encoded digests. Each of these digests are the checksums of the
-    file being uploaded when hashed by the algorithm identified in the name.
+.. _publishing-session-cancellation:
-    By default, any hash algorithm available in `hashlib
-    `_ can be used as a key for the hashes
-    dictionary [#fn-hash]_. At least one secure algorithm from ``hashlib.algorithms_guaranteed``
-    **MUST** always be included. This PEP specifically recommends ``sha256``.
+Cancellation
+~~~~~~~~~~~~
-    Multiple hashes may be passed at a time, but all hashes provided **MUST** be valid for the file.
+To cancel a Publishing Session, a client issues a ``DELETE`` request to
+the ``session`` :ref:`link `
+given in the :ref:`session creation response body `.
+The server then marks the session as canceled, and **SHOULD** purge any data that was uploaded
+as part of that session.
+Future attempts to access that session URL or any of the Publishing Session URLs
+**MUST** return a ``404 Not Found``.
-``metadata`` (**optional**)
-    If given, this is a string value containing the file's `core metadata
-    `_.
+To prevent dangling sessions, servers may also choose to cancel timed-out sessions of their own
+accord. It is recommended that servers expunge their sessions after no less than a week, but each
+server may choose its own schedule. Servers **MAY** support client-directed :ref:`session
+extensions `.
-Servers **MAY** use the data provided in this request to do some sanity checking prior to allowing
-the file to be uploaded. These checks may include, but are not limited to:
-- checking if the ``filename`` already exists in a published release;
+.. _publishing-session-token:
-- checking if the ``size`` would exceed any project or file quota;
+Publishing Session Token
+~~~~~~~~~~~~~~~~~~~~~~~~
-- checking if the contents of the ``metadata``, if provided, are valid.
+When creating a Publishing Session, clients can provide a ``nonce`` in the
+:ref:`initial session creation request `.
+This nonce is a string with arbitrary content. The ``nonce`` is
+optional, and if omitted, is equivalent to providing an empty string.
-If the server determines that upload should proceed, it will return a ``201 Created`` response, with
-an empty body, and a ``Location`` header pointing to the URL that the file content should be
-uploaded to. The :ref:`status ` of the session will also include the filename in
-the ``files`` mapping, with the above ``Location`` URL included in under the ``link`` sub-key.
-If the server determines the upload cannot proceed, it **MUST** return a ``409 Conflict``. The
-server **MAY** allow parallel uploads of files, but is not required to.
+In order to support previewing of staged uploads, the package ``name`` and ``version``, along with
+this ``nonce``, are used as input into a hashing algorithm to produce a unique "session token".
+This session token is valid for the life of the session
+(i.e., until it is completed, either by cancellation or publishing),
+and can be provided to supporting installers to gain access to the staged release.
+
+The use of the ``nonce`` allows clients to decide whether they want to
+obscure the visibility of their staged releases or not,
+and there can be good reasons for either choice.
+For example, if a CI system wants to upload some wheels for a new release,
+and wants to allow independent validation of a stage before it's published,
+the client may opt for not including a nonce.
+On the other hand, if a client would like to pre-seed a release which it publishes atomically
+at the time of a public announcement,
+that client will likely opt for providing a nonce.
+The `SHA256 algorithm `_ is used to
+turn these inputs into a unique token, in the order ``name``, ``version``, ``nonce``, using the
+following Python code as an example:
-.. IMPORTANT::
+.. code-block:: python
-    The `IETF draft `_ calls this the URL of the `upload resource
-    `_, and this PEP uses that nomenclature as well.
+    from hashlib import sha256
-.. _ietf-upload-resource: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html#name-upload-creation-2
+    def gentoken(name: bytes, version: bytes, nonce: bytes = b''):
+        h = sha256()
+        h.update(name)
+        h.update(version)
+        h.update(nonce)
+        return h.hexdigest()
+It should be evident that if no ``nonce`` is provided in the
+:ref:`session creation request `,
+then the session token is easily guessable from the package name and version number alone.
+Clients can elect to omit the ``nonce`` (or set it to the empty string themselves)
+if they want to allow previewing from anybody without access to the session token.
+By providing a non-empty ``nonce``,
+clients can elect for security-through-obscurity,
+but this does not protect staged files behind any kind of authentication.
-.. _upload-contents:
-Upload File Contents
-++++++++++++++++++++
+File Upload Session
+-------------------
-The actual file contents are uploaded by issuing a ``POST`` request to the upload resource URL
-[#fn-location]_. The client may either upload the entire file in a single request, or it may opt
-for "chunked" upload where the file contents are split into multiple requests, as described below.
+.. _file-upload-session:
-.. IMPORTANT::
+Create a File Upload Session
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-    The protocol defined in this PEP differs from the `IETF draft `_ in a few ways:
+After creating a Publishing Session, the ``upload`` endpoint from the response's
+:ref:`session links ` mapping
+is used to begin the upload of new files into that session.
+Clients **MUST** use the provided ``upload`` URL and
+**MUST NOT** assume there is any pattern or commonality to those URLs from one session to the next.
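A client following the rule above only ever takes the ``upload`` URL from the Publishing Session response. The Python sketch below pulls the ``links``, ``mechanisms`` and ``expires-at`` values out of an already-decoded response body; the ``PublishingSession`` dataclass and the helper names are hypothetical, and rejecting a timestamp without a UTC offset is a choice of this sketch rather than a PEP requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class PublishingSession:
    upload_url: str
    session_url: str
    mechanisms: List[str]
    expires_at: datetime


def parse_expires_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2025-08-01T12:00:00Z" into UTC."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption of this sketch: naive timestamps are rejected.
        raise ValueError("expires-at must carry a UTC offset")
    return parsed.astimezone(timezone.utc)


def parse_session_response(body: dict) -> PublishingSession:
    """Extract what a client needs from a Publishing Session response body."""
    links = body["links"]
    return PublishingSession(
        # Use server-provided URLs verbatim; never derive them from a pattern.
        upload_url=links["upload"],
        session_url=links["session"],
        mechanisms=list(body["mechanisms"]),
        expires_at=parse_expires_at(body["expires-at"]),
    )
```

Keeping the parsed ``upload_url`` per session, rather than rebuilding it from the root endpoint, is what the **MUST NOT** rule above asks for.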
-    * For chunked uploads, the `second and subsequent chunks `_ are uploaded
-      using a ``POST`` request instead of ``PATCH`` requests. Similarly, this PEP uses
-      ``application/octet-stream`` for the ``Content-Type`` headers for all chunks.
+To initiate a file upload, a client first sends a ``POST`` request to the ``upload`` URL.
+The request looks like:
-    * No ``Upload-Draft-Interop-Version`` header is required.
+.. code-block:: json
-    * Some of the server responses are different.
+    {
+        "meta": {
+            "api-version": "2.0"
+        },
+        "filename": "foo-1.0.tar.gz",
+        "size": 1000,
+        "hashes": {"sha256": "...", "blake2b": "..."},
+        "metadata": "...",
+        "mechanism": "http-post-bytes"
+    }
-.. _ietf-upload-append: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html#name-upload-append-2
+Besides the standard ``meta`` key, the request JSON has the following additional keys:
-When uploading the entire file in a single request, the request **MUST** include the following
-headers (e.g. for a 100,000 byte file):
+``filename`` (**required**)
+    The name of the file being uploaded.
-.. code-block:: email
+``size`` (**required**)
+    The size in bytes of the file being uploaded.
-    Content-Length: 100000
-    Content-Type: application/octet-stream
-    Upload-Length: 100000
-    Upload-Complete: ?1
+``hashes`` (**required**)
+    A mapping of hash names to hex-encoded digests. Each of these digests is the checksum of the
+    file being uploaded when hashed by the algorithm identified in the name.
-The body of this request contains all 100,000 bytes of the unencoded raw binary data.
+    By default, any hash algorithm available in `hashlib
+    `_ can be used as a key for the hashes
+    dictionary [#fn-hash]_. At least one secure algorithm from ``hashlib.algorithms_guaranteed``
+    **MUST** always be included. This PEP specifically recommends ``sha256``.
-``Content-Length``
-    The number of file bytes contained in the body of *this* request.
+ Multiple hashes may be passed at a time, but all hashes provided **MUST** be valid for the file. -``Content-Type`` - **MUST** be ``application/octet-stream``. +``mechanism`` (**required**) + The file upload mechanism the client intends to use for this file. + This mechanism **SHOULD** be chosen from the list of mechanisms advertised in the + :ref:`Publishing Session response body `. + A client **MAY** send a mechanism that is not advertised in cases where server operators have + documented a new or upcoming mechanism that is available for use on a "pre-release" basis. -``Upload-Length`` - Indicates the total number of bytes that will be uploaded for this file. For single-request - uploads this will always be equal to ``Content-Length``, but these values will likely differ for - chunked uploads. This value **MUST** equal the number of bytes given in the ``size`` field of - the file upload initiation request. +``metadata`` (**optional**) + If given, this is a string value containing the file's `core metadata + `_. -``Upload-Complete`` - A flag indicating whether more chunks are coming for this file. For single-request uploads, the - value of this header **MUST** be ``?1``. +Servers **MAY** use the data provided in this request to do some sanity checking prior to allowing +the file to be uploaded. These checks may include, but are not limited to: -If the upload completes successfully, the server **MUST** respond with a ``201 Created`` status. -The response body has no content. +- checking if the ``filename`` already exists in a published release; -If this single-request upload fails, the entire file must be resent in another single HTTP request. -This is the recommended, preferred format for file uploads since fewer requests are required.
+- checking if the ``size`` would exceed any project or file quota; -As an example, if the client was to upload a 100,000 byte file, the headers would look like: +- checking if the contents of the ``metadata``, if provided, are valid. -.. code-block:: email +If the server determines that the upload should proceed, it will return a ``202 Accepted`` response, +with the response body below. +The :ref:`status ` of the session will also include +the filename in the ``files`` mapping. +If the server cannot proceed with an upload because +the ``mechanism`` supplied by the client is not supported, +it **MUST** return a ``422 Unprocessable Entity``. +If the server determines the upload cannot proceed, +it **MUST** return a ``409 Conflict``. +The server **MAY** allow parallel uploads of files, but is not required to. - Content-Length: 100000 - Content-Type: application/octet-stream - Upload-Length: 100000 - Upload-Complete: ?1 +.. _file-upload-session-response: -Clients can opt to upload the file in multiple chunks. Because the upload resource URL provided in -the metadata response will be unique per file, clients **MUST** use the given upload resource URL -for all chunks. Clients upload file chunks by sending multiple ``POST`` requests to this URL, with -one request per chunk. +Response Body ++++++++++++++ -For chunked uploads, the ``Content-Length`` is equal to the size in bytes of the chunk that is -currently being sent. The client **MUST** include a ``Upload-Offset`` header which indicates the -byte offset that the content included in this chunk's request starts at, and an ``Upload-Complete`` -header with the value ``?0``. For the first chunk, the ``Upload-Offset`` header **MUST** be set to -``0``. As with single-request uploads, the ``Content-Type`` header is ``application/octet-stream`` -and the body is the raw, unencoded bytes of the chunk.
+The successful response includes the following: -For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's request headers -would be: +.. code-block:: json -.. code-block:: email + { + "meta": { + "api-version": "2.0" + }, + "links": { + "publishing-session": "...", + "file-upload-session": "..." + }, + "status": "pending", + "expires-at": "2025-08-01T13:00:00Z", + "mechanism": { + "identifier": "http-post-bytes", + "file_url": "...", + "attestations_url": "..." + } + } - Content-Length: 1000 - Content-Type: application/octet-stream - Upload-Offset: 0 - Upload-Length: 100000 - Upload-Complete: ?0 +A ``Retry-After`` response header **MUST** be present +to indicate to clients when they should next poll for an updated status. -For the second chunk representing bytes 1000 through 1999, include the following headers: +Besides the ``meta`` key, which has the same format as the request JSON, the success response has +the following keys: -.. code-block:: email +``links`` + A dictionary mapping :ref:`keys to URLs ` related to this session, + the details of which are provided below. - Content-Length: 1000 - Content-Type: application/octet-stream - Upload-Offset: 1000 - Upload-Length: 100000 - Upload-Complete: ?0 +``status`` + A string with valid values ``pending``, ``processing``, ``complete``, ``error``, and ``canceled`` + indicating the current state of the File Upload Session. -These requests would continue sequentially until the last chunk is ready to be uploaded. +``expires-at`` + An ISO8601 formatted timestamp string representing when the server will expire this File Upload Session. + The session **SHOULD** remain active until at least this time + unless the client cancels or completes it. Servers **MAY** choose to + extend this expiration time, but should never move it earlier. 
-For each successful chunk, the server **MUST** respond with a ``202 Accepted`` header, except for -the final chunk, which **MUST** be either: +``mechanism`` + A mapping containing the necessary details for the supported mechanism + as negotiated by the client and server. + This mapping **MUST** contain a key ``identifier`` which maps to + the identifier string for the chosen File Upload Mechanism. -* ``201 Created`` if the server accepts and processes the last chunk synchronously, completing the - file upload. -* ``202 Accepted`` if the server accepts the last chunk, but must process it asynchronously. In - this case, the client should query the :ref:`session status ` periodically until - the uploaded :ref:`file status ` transitions to ``complete``. +.. _file-upload-session-links: -The final chunk of data **MUST** include the ``Upload-Complete: ?1`` header, since at that point the -entire file has been uploaded. +File Upload Session Links ++++++++++++++++++++++++++ -With both chunked and non-chunked uploads, once completed successfully, the file **MUST NOT** be -publicly visible in the repository, but merely staged until the upload session is :ref:`completed -`. If the server supports :ref:`previews `, the file **MUST** be -visible at the ``stage`` :ref:`URL `. Partially uploaded chunked files **SHOULD -NOT** be visible at the ``stage`` URL. +For the ``links`` key in the success JSON, the following sub-keys are valid: -The following constraints are placed on uploads regardless of whether they are single chunk or -multiple chunks: +``publishing-session`` + The endpoint where actions for the parent Publishing Session can be performed. -- A client **MUST NOT** perform multiple ``POST`` requests in parallel for the same file to avoid - race conditions and data loss or corruption. +``file-upload-session`` + The endpoint where actions for this File Upload Session can be performed,
+ including :ref:`canceling and discarding the File Upload Session `, + :ref:`querying the current File Upload Session status `, + and :ref:`requesting an extension of the File Upload Session lifetime ` + (*if* the server supports it). -- If the offset provided in ``Upload-Offset`` is not ``0`` and does not correctly specify the byte - offset of the next chunk in an incomplete upload, then the server **MUST** respond with a ``409 - Conflict``. This means that a client **MUST NOT** upload chunks out of order. +.. _file-upload-session-completion: -- Once a file upload has completed successfully, you may initiate another upload for that file, - which **once completed**, will replace that file. This is possible until the entire session is - completed, at which point no further file uploads (either creating or replacing a session file) - are accepted. I.e. once a session is published, the files included in that release are immutable - [#fn-immutable]_. +Complete a File Upload Session +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +To complete a File Upload Session, which indicates that the file upload mechanism has been executed +and did not produce an error, a client issues a ``POST`` to the ``file-upload-session`` link in the +File Upload Session creation response body. -Resume an Upload -++++++++++++++++ +The request looks like: -To resume an upload, you first have to know how much of the file's contents the server has already -received. If this is not already known, a client can make a ``HEAD`` request to the upload resource -URL. +.. code-block:: json -The server **MUST** respond with a ``204 No Content`` response, with an ``Upload-Offset`` header -that indicates what offset the client should continue uploading from. If the server has not received -any data, then this would be ``0``, if it has received 1007 bytes then it would be ``1007``. For -this example, the full response headers would look like: + { + "meta": { + "api-version": "2.0" + }, + "action": "complete" + } -..
code-block:: email +If the server is able to immediately complete the File Upload Session, it may do so and return a +``201 Created`` response and set the status of the File Upload Session to ``complete``. +If it is unable to immediately complete the File Upload Session +(for instance, if it needs to do validation that may take longer than reasonable in a single HTTP +request), then it may return a ``202 Accepted`` response +and set the status of the File Upload Session to ``processing``. - Upload-Offset: 1007 - Upload-Complete: ?0 - Cache-Control: no-store +In either case, the server should include a ``Location`` header pointing back to the File Upload +Session status URL. +Servers **MUST** allow clients to poll the File Upload Session status URL +to watch for the status to change. +If the server responds with a ``202 Accepted``, +clients may poll the File Upload Session status URL to watch for the status to change. +Clients **SHOULD** respect the ``Retry-After`` header value +of the File Upload Session status response. -Once the client has retrieved the offset that they need to start from, they can upload the rest of -the file as described above, either in a single request containing all of the remaining bytes, or in -multiple chunks as per the above protocol. +If an error occurs, the appropriate ``4xx`` code should be returned, as described in the +:ref:`session-errors` section. -.. _cancel-an-upload: +.. _file-upload-session-cancelation: -Canceling and Deleting File Uploads -+++++++++++++++++++++++++++++++++++ +Cancellation and Deletion +~~~~~~~~~~~~~~~~~~~~~~~~~ -A client can cancel an in-progress upload for a file, or delete a file that has been completely -uploaded. In both cases, the client performs this by issuing a ``DELETE`` request to the upload -resource URL of the file they want to delete. +A client can cancel an in-progress File Upload Session, or delete a file that has been +completely uploaded. 
In both cases, the client performs this by issuing a ``DELETE`` request to +the File Upload Session URL of the file they want to delete. -A successful deletion request **MUST** response with a ``204 No Content``. +A successful deletion request **MUST** respond with a ``204 No Content``. -Once canceled or deleted, a client **MUST NOT** assume that the previous upload resource URL can be reused. +Once canceled or deleted, a client **MUST NOT** assume that +the previous File Upload Session resource +or associated file upload mechanisms +can be reused. Replacing a Partially or Fully Uploaded File -++++++++++++++++++++++++++++++++++++++++++++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To replace a session file, the file upload **MUST** have been previously completed, canceled, or deleted. It is not possible to replace a file if the upload for that file is in-progress. -To replace a session file, clients should :ref:`cancel and delete the in-progress upload -` by issuing a ``DELETE`` to the upload resource URL for the file they want to -replace. After this, the new file upload can be initiated by beginning the entire :ref:`file upload -` sequence over again. This means providing the metadata request again to retrieve a -new upload resource URL. Client **MUST NOT** assume that the previous upload resource URL can be -reused after deletion. +To replace a session file, clients should +:ref:`cancel and delete the in-progress upload ` by +issuing a ``DELETE`` to the File Upload Session URL for the file they want to replace. +After this, the new file upload can be initiated by beginning +the entire :ref:`file upload ` sequence over again. +This means sending the File Upload Session creation request again to obtain a new File Upload Session URL. +Clients **MUST NOT** assume that the previous File Upload Session URL can be reused after deletion. ..
_session-status: Session Status -~~~~~~~~~~~~~~ +-------------- -At any time, a client can query the status of the session by issuing a ``GET`` request to the -``session`` :ref:`link ` given in the :ref:`session creation response body -`. +At any time, a client can query the status of a session by issuing a ``GET`` request to the +``publishing-session`` :ref:`link ` +or ``file-upload-session`` :ref:`link ` +given in the :ref:`session creation response body ` +or :ref:`File Upload Session creation response body `, +respectively. -The server will respond to this ``GET`` request with the same :ref:`response ` -that they got when they initially created the upload session, except with any changes to ``status``, -``valid-for``, or ``files`` reflected. +The server will respond to this ``GET`` request with the same +:ref:`Publishing Session creation response body ` +or :ref:`File Upload Session creation response body ` +that the client received when it initially created the Publishing Session or File Upload Session, +except with any changes to ``status``, ``expires-at``, or ``files`` reflected. .. _session-extension: Session Extension -~~~~~~~~~~~~~~~~~ +----------------- Servers **MAY** allow clients to extend sessions, but the overall lifetime and number of extensions allowed is left to the server. To extend a session, a client issues a ``POST`` request to the -``session`` :ref:`link ` given in the :ref:`session creation response body -`. +``publishing-session`` :ref:`link ` +or ``file-upload-session`` :ref:`link ` +given in the :ref:`Publishing Session creation response body ` +or :ref:`File Upload Session creation response body `, +respectively. -The JSON body of this request looks like: +The request looks like: ..
code-block:: json @@ -652,227 +861,145 @@ The JSON body of this request looks like: "meta": { "api-version": "2.0" }, - ":action": "extend", + "action": "extend", "extend-for": 3600 } The number of seconds specified is just a suggestion to the server for the number of additional seconds to extend the current session. For example, if the client wants to extend the current session for another hour, ``extend-for`` would be ``3600``. Upon successful extension, the server -will respond with the same :ref:`response ` that they got when they initially -created the upload session, except with any changes to ``status``, ``valid-for``, or ``files`` -reflected. +will respond with the same +:ref:`Publishing Session creation response body ` +or :ref:`File Upload Session creation response body ` +that the client received when it initially created the Publishing Session or File Upload Session, +except with any changes to ``status``, ``expires-at``, or ``files`` reflected. If the server refuses to extend the session for the requested number of seconds, it still returns a -success response, and the ``valid-for`` key will simply include the number of seconds remaining in -the current session. - - -.. _session-cancellation: - -Session Cancellation -~~~~~~~~~~~~~~~~~~~~ - -To cancel an entire session, a client issues a ``DELETE`` request to the ``session`` :ref:`link -` given in the :ref:`session creation response body `. The server -then marks the session as canceled, and **SHOULD** purge any data that was uploaded as part of that -session. Future attempts to access that session URL or any of the upload session URLs **MUST** -return a ``404 Not Found``. - -To prevent dangling sessions, servers may also choose to cancel timed-out sessions on their own -accord. It is recommended that servers expunge their sessions after no less than a week, but each -server may choose their own schedule. Servers **MAY** support client-directed :ref:`session -extensions `. - - -..
_publish-session: - -Session Completion -~~~~~~~~~~~~~~~~~~ - -To complete a session and publish the files that have been included in it, a client issues a -``POST`` request to the ``session`` :ref:`link ` given in the :ref:`session creation -response body `. - -The JSON body of this request looks like: - -.. code-block:: json - - { - "meta": { - "api-version": "2.0" - }, - ":action": "publish", - } - - -If the server is able to immediately complete the session, it may do so and return a ``201 Created`` -response. If it is unable to immediately complete the session (for instance, if it needs to do -processing that may take longer than reasonable in a single HTTP request), then it may return a -``202 Accepted`` response. - -In either case, the server should include a ``Location`` header pointing back to the session status -URL, and if the server returned a ``202 Accepted``, the client may poll that URL to watch for the -status to change. - -If a session is published that has no staged files, the operation is effectively a no-op, except -where a new project name is being reserved. In this case, the new project is created, reserved, and -owned by the user that created the session. - -If an error occurs, the appropriate ``4xx`` code should be returned, as described in the -:ref:`session-errors` section. - - -.. _session-token: - -Session Token -~~~~~~~~~~~~~ - -When creating a session, clients can provide a ``nonce`` in the :ref:`initial session creation -request ` . This nonce is a string with arbitrary content. The ``nonce`` is -optional, and if omitted, is equivalent to providing an empty string. - -In order to support previewing of staged uploads, the package ``name`` and ``version``, along with -this ``nonce`` are used as input into a hashing algorithm to produce a unique "session token". 
This -session token is valid for the life of the session (i.e., until it is completed, either by -cancellation or publishing), and can be provided to supporting installers to gain access to the -staged release. - -The use of the ``nonce`` allows clients to decide whether they want to obscure the visibility of -their staged releases or not, and there can be good reasons for either choice. For example, if a CI -system wants to upload some wheels for a new release, and wants to allow independent validation of a -stage before it's published, the client may opt for not including a nonce. On the other hand, if a -client would like to pre-seed a release which it publishes atomically at the time of a public -announcement, that client will likely opt for providing a nonce. - -The `SHA256 algorithm `_ is used to -turn these inputs into a unique token, in the order ``name``, ``version``, ``nonce``, using the -following Python code as an example: - -.. code-block:: python - - from hashlib import sha256 - - def gentoken(name: bytes, version: bytes, nonce: bytes = b''): - h = sha256() - h.update(name) - h.update(version) - h.update(nonce) - return h.hexdigest() - -It should be evident that if no ``nonce`` is provided in the :ref:`session creation request -`, then the preview token is easily guessable from the package name and version -number alone. Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) if -they want to allow previewing from anybody without access to the preview token. By providing a -non-empty ``nonce``, clients can elect for security-through-obscurity, but this does not protect -staged files behind any kind of authentication. - +success response, and the ``expires-at`` key will simply reflect the current expiration time of +the session. .. 
_staged-preview: Stage Previews -~~~~~~~~~~~~~~ +-------------- The ability to preview staged releases before they are published is an important feature of this PEP, enabling an additional level of last-mile testing before the release is available to the public. Indexes **MAY** provide this functionality through the URL provided in the ``stage`` -sub-key of the :ref:`links key ` returned when the session is created. The ``stage`` -URL can be passed to installers such as ``pip`` by setting the `--extra-index-url +sub-key of the :ref:`links key ` returned when +the Publishing Session is created. +The ``stage`` URL can be passed to installers such as ``pip`` by setting the `--extra-index-url `__ flag to this value. Multiple stages can even be previewed by repeating this flag with multiple values. -In the future, it may be valuable to include something like a ``Stage-Token`` header to the `Simple -Repository API `_ -requests or the :pep:`691` JSON-based Simple API, with the value from the ``session-token`` sub-key -of the JSON response to the session creation request. Multiple ``Stage-Token`` headers could be -allowed, and installers could support enabling stage previews by adding a ``--staged `` or -similarly named option to set the ``Stage-Token`` header at the command line. This feature is not -currently support, nor proposed by this PEP, though it could be proposed by a separate PEP in the -future. - -In either case, the index will return views that expose the staged releases to the installer tool, +If supported, the index will return views that expose the staged releases to the installer tool, making them available to download and install into virtual environments built for that last-mile -testing. The former option allows for existing installers to preview staged releases with no -changes, although perhaps in a less user-friendly way. The latter option can be a better user -experience, but the details of this are left to installer tool maintainers. +testing. 
This option allows existing installers to preview staged releases with no +changes to the installer tool required. +The details of this user experience are left to installer tool maintainers. -.. _session-errors: +.. _file-upload-mechanisms: -Errors ------- +File Upload Mechanisms +---------------------- -All error responses that contain content will have a body that looks like: +Servers **MUST** implement :ref:`required file upload mechanisms `. +Such mechanisms serve as a fallback if no server specific implementations exist. -.. code-block:: json +Each major version of the Upload API **MUST** specify at least one required File Upload Mechanism. - { - "meta": { - "api-version": "2.0" - }, - "message": "...", - "errors": [ - { - "source": "...", - "message": "..." - } - ] - } +New required mechanisms **MUST NOT** be added +and existing required mechanisms **MUST NOT** be removed +without an update to the :ref:`major version `. +Any server-specific or experimental mechanisms added or removed +**MUST NOT** change the major or minor version number of this specification. -Besides the standard ``meta`` key, this has the following top level keys: +.. _required-file-upload-mechanisms: -``message`` - A singular message that encapsulates all errors that may have happened on this - request. +Required File Upload Mechanisms +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -``errors`` - An array of specific errors, each of which contains a ``source`` key, which is a string that - indicates what the source of the error is, and a ``message`` key for that specific error. +``http-post-bytes`` ++++++++++++++++++++ -The ``message`` and ``source`` strings do not have any specific meaning, and are intended for human -interpretation to aid in diagnosing underlying issue. +Upload API version 2.0 compliant servers **MUST** support the ``http-post-bytes`` mechanism. +This mechanism **MUST** use the same authentication scheme as +the rest of the Upload 2.0 protocol endpoints. 
-Content Types -------------- +A client executes this mechanism by submitting a ``POST`` request to the ``file_url`` +returned in the ``mechanism`` map (whose ``identifier`` is ``http-post-bytes``) of the +:ref:`File Upload Session creation response body ` like: -Like :pep:`691`, this PEP proposes that all requests and responses from this upload API will have a -standard content type that describes what the content is, what version of the API it represents, and -what serialization format has been used. +.. code-block:: text -This standard request content type applies to all requests *except* for :ref:`file upload requests -` which, since they contain only binary data, is always ``application/octet-stream``. + Content-Type: application/octet-stream -The structure of the ``Content-Type`` header for all other requests is: + + +Servers **MAY** support uploading of digital attestations for files (see :pep:`740`). +This support will be indicated by inclusion of an ``attestations_url`` key in the +``mechanism`` map of the +:ref:`File Upload Session creation response body `. +Attestations **MUST** be uploaded to the ``attestations_url`` before +:ref:`File Upload Session completion `. + +To upload an attestation, a client submits a ``POST`` request to the ``attestations_url`` +containing a JSON array of :pep:`attestation objects <740#attestation-objects>` like: .. code-block:: text - application/vnd.pypi.upload.$version+$format + Content-Type: application/json -Since minor API version differences should never be disruptive, only the major version is included -in the content type; the version number is prefixed with a ``v``. + [{"version": 1, "verification_material": {...}, "envelope": {...}},...] -Unlike :pep:`691`, this PEP does not change the existing *legacy* ``1.0`` upload API in any way, so -servers are required to host the new API described in this PEP at a different endpoint than the -existing upload API.
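The ``http-post-bytes`` steps above can be sketched with the standard library's ``urllib.request``. This sketch only builds the ``POST`` requests; a caller would pass each one to ``urllib.request.urlopen``. The function name and the ``Authorization`` header are hypothetical: this PEP only requires that the mechanism use the same authentication scheme as the other Upload 2.0 endpoints, without naming that scheme. The ``mechanism`` mapping is read as laid out in the File Upload Session response body example, with ``file_url`` and ``attestations_url`` directly under it.

```python
import json
import urllib.request
from typing import List, Optional


def build_http_post_bytes_requests(
    file_upload_session: dict,
    payload: bytes,
    auth_header: str,
    attestations: Optional[list] = None,
) -> List[urllib.request.Request]:
    """Build (but do not send) the requests for the ``http-post-bytes`` mechanism.

    ``auth_header`` is a hypothetical stand-in for whatever authentication
    the index uses on its other Upload 2.0 endpoints.
    """
    mechanism = file_upload_session["mechanism"]
    if mechanism["identifier"] != "http-post-bytes":
        raise ValueError(f"unexpected mechanism {mechanism['identifier']!r}")

    # The raw file bytes are POSTed to ``file_url`` as an octet stream.
    requests = [
        urllib.request.Request(
            mechanism["file_url"],
            data=payload,
            headers={
                "Content-Type": "application/octet-stream",
                "Authorization": auth_header,
            },
            method="POST",
        )
    ]

    if attestations is not None:
        # Attestations are only possible when the server advertised an
        # ``attestations_url``; they must be sent before the File Upload
        # Session is completed.
        attestations_url = mechanism.get("attestations_url")
        if attestations_url is None:
            raise ValueError("server did not advertise attestation support")
        requests.append(
            urllib.request.Request(
                attestations_url,
                data=json.dumps(attestations).encode("utf-8"),
                headers={
                    "Content-Type": "application/json",
                    "Authorization": auth_header,
                },
                method="POST",
            )
        )
    return requests
```

Raising on a missing ``attestations_url`` rather than silently skipping the attestations keeps a client from completing a File Upload Session under the false belief that its attestations were recorded.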
-Since JSON is the only defined request format defined in this PEP, all non-file-upload requests -defined in this PEP **MUST** include a ``Content-Type`` header value of: +.. _server-specific-file-upload-mechanisms: -- ``application/vnd.pypi.upload.v2+json``. +Server Specific File Upload Mechanisms +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -As with :pep:`691`, a special "meta" version is supported named ``latest``, the purpose of which is -to allow clients to request the latest version implemented by the server, without having to know -ahead of time what that version is. It is recommended however, that clients be explicit about what -versions they support. +A given server **MAY** implement an arbitrary number of server specific mechanisms +and is responsible for documenting their usage. -Similar to :pep:`691`, this PEP also standardizes on using server-driven content negotiation to -allow clients to request different versions or serialization formats, which includes the ``format`` -part of the content type. However, since this PEP expects the existing legacy ``1.0`` upload API to -exist at a different endpoint, and this PEP currently only provides for JSON serialization, this -mechanism is not particularly useful. Clients only have a single version and serialization they can -request. However clients **SHOULD** be prepared to handle content negotiation gracefully in the case -that additional formats or versions are added in the future. +A server-specific file upload mechanism identifier has three parts: + +.. code-block:: text + + -- + +Server specific implementations **MUST** use ``vnd`` as their ``prefix``. +The ``operator identifier`` **SHOULD** clearly identify the server operator, +be distinct from the identifiers of other well-known indexes, +and contain only lowercase alphanumeric characters ``[a-z0-9]``. +The ``implementation identifier`` **SHOULD** concisely describe the underlying implementation +and contain only lowercase alphanumeric characters ``[a-z0-9]`` and ``-``.
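The naming rules above lend themselves to a mechanical check. The following sketch uses a regular expression written for this example (not taken from any index implementation) to split a server-specific identifier into its operator and implementation parts. It returns ``None`` for anything that does not follow the recommended ``vnd`` form, including required mechanisms such as ``http-post-bytes``. Because the character-set rules are only **SHOULD** recommendations, a client would more plausibly use such a check to warn than to reject.

```python
import re
from typing import Optional, Tuple

# Hypothetical strict reading of the naming rules: ``vnd``, then an operator
# identifier of [a-z0-9] only, then an implementation identifier of [a-z0-9]
# and ``-``. Since the operator part cannot contain ``-``, the first hyphen
# after ``vnd-`` unambiguously ends it.
_SERVER_SPECIFIC_MECHANISM = re.compile(
    r"vnd-(?P<operator>[a-z0-9]+)-(?P<implementation>[a-z0-9][a-z0-9-]*)"
)


def parse_server_specific_mechanism(identifier: str) -> Optional[Tuple[str, str]]:
    """Return ``(operator, implementation)``, or ``None`` if ``identifier``
    does not follow the recommended server-specific form."""
    match = _SERVER_SPECIFIC_MECHANISM.fullmatch(identifier)
    if match is None:
        return None
    return match.group("operator"), match.group("implementation")
```

For instance, ``vnd-pypi-s3multipart-presigned-v2`` parses as ``("pypi", "s3multipart-presigned-v2")``, while ``http-post-bytes`` yields ``None``.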
+ +When server operators need to make breaking changes to their upload mechanisms, +they **SHOULD** create a new mechanism identifier rather than modifying the existing one. +The recommended pattern is to append a version suffix like ``-v1``, ``-v2``, etc. +to the implementation identifier. +This allows clients to explicitly opt into new versions while maintaining +backward compatibility with existing clients. + +For example: + +====================================== ================ ========================================================================= +File Upload Mechanism string Server Operator Mechanism description +====================================== ================ ========================================================================= +``vnd-pypi-s3multipart-presigned`` PyPI S3 multipart upload via pre-signed URL +``vnd-pypi-s3multipart-presigned-v2`` PyPI S3 multipart upload via pre-signed URL version 2 +``vnd-pypi-http-fetch`` PyPI File delivered by instructing server to fetch from a URL via HTTP request +``vnd-acmecorp-http-fetch`` Acme Corp File delivered by instructing server to fetch from a URL via HTTP request +``vnd-acmecorp-postal`` Acme Corp File delivered via postal mail +``vnd-widgetinc-stream-v1`` Widget Inc. Streaming upload protocol version 1 +``vnd-widgetinc-stream-v2`` Widget Inc. Streaming upload protocol version 2 +``vnd-madscience-quantumentanglement`` Mad Science Labs Upload via quantum entanglement +====================================== ================ ========================================================================= + +If a server intends to precisely match the behavior of another server's implementation, it **MAY** respond +with that implementation's file upload mechanism name. FAQ @@ -884,38 +1011,8 @@ Does this mean PyPI is planning to drop support for the existing upload API? At this time PyPI does not have any specific plans to drop support for the existing upload API. 
Unlike with :pep:`691` there are significant benefits to doing so, so it is likely that support for -the legacy upload API to be (responsibly) deprecated and removed at some point in the future. Such -future deprecation planning is explicitly out of scope for *this* PEP. - - -Is this Resumable Upload protocol based on anything? ----------------------------------------------------- - -Yes! - -It's actually based on the protocol specified in an `active internet draft `_, where the -authors took what they learned implementing `tus `_ to provide the idea of -resumable uploads in a wholly generic, standards based way. - -.. _ietf-draft: https://www.ietf.org/archive/id/draft-ietf-httpbis-resumable-upload-05.html - -This PEP deviates from that spec in several ways, as described in the body of the proposal. This -decision was made for a few reasons: - -- The ``104 Upload Resumption Supported`` is the only part of that draft which does not rely - entirely on things that are already supported in the existing standards, since it was adding a new - informational status. - -- Many clients and web frameworks don't support ``1xx`` informational responses in a very good way, - if at all, adding it would complicate implementation for very little benefit. - -- The purpose of the ``104 Upload Resumption Supported`` support is to allow clients to determine - that an arbitrary endpoint that they're interacting with supports resumable uploads. Since this - PEP is mandating support for that in servers, clients can just assume that the server they are - interacting with supports it, which makes using it unneeded. - -- In theory, if the support for ``1xx`` responses got resolved and the draft gets accepted with it - in, we can add that in at a later date without changing the overall flow of the API. +the legacy upload API will be (responsibly) deprecated and removed at some point in the future. +Such future deprecation planning is explicitly out of scope for *this* PEP.
Can I use the upload 2.0 API to reserve a project name? @@ -924,9 +1021,11 @@ Can I use the upload 2.0 API to reserve a project name? Yes! If you're not ready to upload files to make a release, you can still reserve a project name (assuming of course that the name doesn't already exist). -To do this, :ref:`create a new session `, then :ref:`publish the session -` without uploading any files. While the ``version`` key is required in the JSON -body of the create session request, you can simply use the placeholder version number ``"0.0.0"``. +To do this, +:ref:`create a new Publishing Session `, +then :ref:`publish the session ` without uploading any files. +While the ``version`` key is required in the JSON body of the create session request, +you can simply use the placeholder version number ``"0.0.0"``. The user that created the session will become the owner of the new project. @@ -934,100 +1033,22 @@ The user that created the session will become the owner of the new project. Open Questions ============== -Defer Stage Previews --------------------- - -:ref:`Stage previews ` are an important and useful feature for testing new version -wheel uploads before they are published. They'd allow us to effectively decommission -``test.pypi.org``, which has well-known deficiencies. - -However, the ability to preview stages before they're published does complicate the protocol and -this proposal. We could defer this feature for later, although if we do, we should still keep the -optional ``nonce`` for token generation, in order to be easily future proof. - +Extensions to the Upload 2.0 Protocol +------------------------------------- -Multipart Uploads vs tus ------------------------- +Features such as asynchronous webhook notifications for completion of upload processing +were discussed during review of this PEP. 
+The concept of a capabilities extension for the upload protocol was discussed, +which would allow implementers to advertise support for optional features +such as asynchronous notifications or webhooks. -This PEP currently bases the actual uploading of files on an `internet draft `_ -(originally designed by `tus.io `__) that supports resumable file uploads. +This idea was left open due to the complexity that would arise in designing +such an extension protocol and ensuring that it did not cause excessive +fracturing of the ecosystem as Upload 2.0 is rolled out. -That protocol requires a few things: +Future revisions to the upload protocol should explore such extensions +as experience is gained operating Upload 2.0. -- That if clients don't upload the entire file in one shot, that they have to submit the chunks - serially, and in the correct order, with all but the final chunk having a ``Upload-Complete: ?0`` - header. - -- Resumption of an upload is essentially just querying the server to see how much data they've - gotten, then sending the remaining bytes (either as a single request, or in chunks). - -- The upload implicitly is completed when the server successfully gets all of the data from the - client. - -This has the benefit that if a client doesn't care about resuming their download, it can essentially -ignore the protocol. Clients can just ``POST`` the file to the file upload URL, and if it doesn't -succeed, they can just ``POST`` the whole file again. - -The other benefit is that even if clients do want to support resumption, unless they *need* to -resume the download, they can still just ``POST`` the file. - -Another, possibly theoretical benefit is that for hashing the uploaded files, the serial chunks -requirement means that the server can maintain hashing state between requests, update it for each -request, then write that file back to storage. Unfortunately this isn't actually possible to do with -Python's `hashlib `__ standard library module. 
-There are some libraries third party libraries, such as `Rehash -`__ that do implement the necessary APIs, but they don't -support every hash that ``hashlib`` does (e.g. ``blake2`` or ``sha3`` at the time of writing). - -We might also need to reconstitute the download for processing anyways to do things like extract -metadata, etc from it, which would make it a moot point. - -The downside is that there is no ability to parallelize the upload of a single file because each -chunk has to be submitted serially. - -AWS S3 has a similar API, and most blob stores have copied it either wholesale or something like it -which they call multipart uploading. - -The basic flow for a multipart upload is: - -#. Initiate a multipart upload to get an upload ID. -#. Break your file up into chunks, and upload each one of them individually. -#. Once all chunks have been uploaded, finalize the upload. This is the step where any errors would - occur. - -Such multipart uploads do not directly support resuming an upload, but it allows clients to control -the "blast radius" of failure by adjusting the size of each part they upload, and if any of the -parts fail, they only have to resend those specific parts. The trade-off is that it allows for more -parallelism when uploading a single file, allowing clients to maximize their bandwidth using -multiple threads to send the file data. - -We wouldn't need an explicit step (1), because our session would implicitly initiate a multipart -upload for each file. - -There are downsides to this though: - -- Clients have to do more work on every request to have something resembling resumable uploads. They - would *have* to break the file up into multiple parts rather than just making a single POST - request, and only needing to deal with the complexity if something fails. - -- Clients that don't care about resumption at all still have to deal with the third explicit step, - though they could just upload the file all as a single part. 
(S3 works around this by having - another API for one shot uploads, but the PEP authors place a high value on having a single API - for uploading any individual file.) - -- Verifying hashes gets somewhat more complicated. AWS implements hashing multipart uploads by - hashing each part, then the overall hash is just a hash of those hashes, not of the content - itself. Since PyPI needs to know the actual hash of the file itself anyway, we would have to - reconstitute the file, read its content, and hash it once it's been fully uploaded, though it - could still use the hash of hashes trick for checksumming the upload itself. - -The PEP authors lean towards ``tus`` style resumable uploads, due to them being simpler to use, -easier to imp;lement, and more consistent, with the main downside being that multi-threaded -performance is theoretically left on the table. - -One other possible benefit of the S3 style multipart uploads is that you don't have to try and do -any sort of protection against parallel uploads, since they're just supported. That alone might -erase most of the server side implementation simplification. .. rubric:: Footnotes @@ -1046,10 +1067,6 @@ erase most of the server side implementation simplification. .. [#fn-immutable] Published files may still be yanked (i.e. :pep:`592`) or `deleted `__ as normal. -.. [#fn-location] Or the URL given in the ``Location`` header in the response to the file upload - initiation request, i.e. the metadata upload request; both of these links **MUST** - be the same. - Copyright =========