UCS Redux - Lynx Boreal (GSI-1685) #148
TheByronHimes wants to merge 8 commits into main from feature/lynx_boreal
Commits:
- e7e8e92 Commit something
- 56840e0 Add tests and StrEnums
- bb33732 Add UploadContext state diagram
- 5204b50 Fix termination
- a77de7e Specify limits of UCS in UploadContext state management
- c1c2c0c Expand upon UploadContext
- 875eef9 Include file upload init request body
- d1ef70e Fix typos and improve wording for clarity
# Upload Service Redux (Lynx Boreal)
**Epic Type:** Implementation Epic

Epic planning and implementation follow the
[Epic Planning and Marathon SOP](https://docs.ghga-dev.de/main/sops/sop001_epic_planning.html).

## Scope
### Outline:
The goal of this epic is to overhaul the Upload Controller Service (UCS) as part of the
new [File Upload concept](https://ghga.pages.hzdr.de/internal.ghga.de/feature_archconcept-file-upload/developer/architecture_concepts/ac007_file_upload/).



#### Domain Objects
The UCS owns two domain objects, which it broadcasts as outbox events via Kafka. The
first domain object is the `UploadContext`, which broadly serves to delineate
in-progress and finalized file submissions for a given study. The second domain
object is the `FileUpload`. As its name suggests, the `FileUpload` object reflects
the upload status of a single file within an `UploadContext`. Thus, there is a
hierarchical, one-to-many relationship between `UploadContext` and `FileUpload`.

We will define the Pydantic models for these two classes in `ghga-event-schemas`,
along with one stateful config class for each.
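
The exact shape of those config classes is not pinned down in this epic; as a rough
sketch (class and field names here are assumptions, not the final `ghga-event-schemas`
definitions), they could look like:

```python
# Rough sketch only: class and field names are assumptions, not the final
# ghga-event-schemas definitions.
from pydantic import Field
from pydantic_settings import BaseSettings


class UploadContextEventsConfig(BaseSettings):
    """Config for publishing UploadContext outbox events."""

    upload_context_topic: str = Field(
        default="upload-contexts",
        description="Name of the topic for UploadContext outbox events.",
    )


class FileUploadEventsConfig(BaseSettings):
    """Config for publishing FileUpload outbox events."""

    file_upload_topic: str = Field(
        default="file-uploads",
        description="Name of the topic for FileUpload outbox events.",
    )
```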

#### Inputs
The UCS only receives user input in the form of HTTP requests. It doesn't subscribe to
any Kafka events. However, we will define a slim CLI interface for the service that
exposes commands to `run-rest` and `publish-events`. These commands are standard
across our services.
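
As a sketch of that CLI surface (assuming `typer`; the actual entrypoints and wiring
are not specified in this epic):

```python
# Illustrative CLI sketch; assumes typer and placeholder run functions.
import asyncio

import typer

cli = typer.Typer(no_args_is_help=True)


async def _run_rest() -> None:
    """Placeholder: configure and serve the HTTP API."""


async def _publish_events() -> None:
    """Placeholder: publish pending outbox events, then exit."""


@cli.command(name="run-rest")
def run_rest() -> None:
    """Start the HTTP API server."""
    asyncio.run(_run_rest())


@cli.command(name="publish-events")
def publish_events() -> None:
    """Publish all pending outbox events."""
    asyncio.run(_publish_events())


if __name__ == "__main__":
    cli()
```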

#### Outputs
There are three categories of output in the UCS: HTTP responses, published events, and
data stored in the database. HTTP responses are described below in the API Definitions
section. The published events and database storage are driven simultaneously by
Hexkit's MongoKafkaDaoPublisher, which the UCS uses to store `UploadContext` and
`FileUpload` instances. Whenever an `UploadContext` or `FileUpload` is created,
modified, or deleted, the UCS publishes a Kafka event containing the latest state.
This is done according to the Outbox Pattern (not described in further detail here).
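
The coupling of storage and publishing can be illustrated with the following simplified
sketch (names are hypothetical; in the service itself this is provided by Hexkit's
MongoKafkaDaoPublisher rather than hand-written code):

```python
# Hypothetical, simplified illustration of the outbox coupling; the real
# service relies on Hexkit's MongoKafkaDaoPublisher instead.
from dataclasses import dataclass, field


@dataclass
class InMemoryOutboxDao:
    """Keeps stored documents and the events describing their latest state together."""

    documents: dict[str, dict] = field(default_factory=dict)
    pending_events: list[dict] = field(default_factory=list)

    def upsert(self, id_: str, dto: dict) -> None:
        """Persist the new state and stage a change event for the same write."""
        self.documents[id_] = dto
        self.pending_events.append({"key": id_, "payload": dto, "change": "upserted"})

    def delete(self, id_: str) -> None:
        """Remove the document and stage a deletion (tombstone) event."""
        self.documents.pop(id_, None)
        self.pending_events.append({"key": id_, "payload": None, "change": "deleted"})
```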

#### Auth
Users will not access the UCS's HTTP API directly, but rather through the
`ghga-connector` or Data Portal. Access to the HTTP endpoints requires the encrypted
access token that the user obtained from the Data Portal when creating the Upload
Context. The HTTP request responsible for creating the Upload Context does not come
directly from the user, but rather from the Study Repository Service. For more
information on the HTTP API, see the endpoint definitions below.

### Included/Required:
- Remove existing core logic
- Create new core class w/ outbox publisher
- Write unit and integration tests

### Not included:
Archive test bed integration, Study Repository Service development, or front-end work.

## User Journeys

### UploadContext Creation
Using the Data Portal, the user initiates a file upload for a study. The request flows
from the Data Portal to the Study Repository Service (where it passes through
validation and other checks) and ultimately to the UCS's HTTP endpoint
`POST /contexts`. The UCS creates a new `UploadContext` with the state set to `OPEN`
and returns the `UploadContext` to the Study Repository Service. The Study Repository
Service returns authentication info to the user via the Data Portal.
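
As a sketch of the receiving endpoint (assuming FastAPI; the request body fields and
response shape beyond what is stated above are assumptions):

```python
# Sketch only: assumes FastAPI; request body fields are assumptions.
from uuid import uuid4

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class UploadContextCreationRequest(BaseModel):
    """Hypothetical body sent by the Study Repository Service."""

    study_id: str  # assumed field linking the new context to a study


@app.post("/contexts", status_code=201)
async def create_context(request: UploadContextCreationRequest) -> dict:
    """Create a new UploadContext in the OPEN state and return it."""
    return {
        "upload_context_id": str(uuid4()),
        "state": "open",
        "file_uploads": [],
    }
```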

### UploadContext Update
The user makes a request to the `PATCH /contexts` endpoint via the Study Repository
Service. If a valid encrypted access token is supplied with the request, the UCS
updates the state of the `UploadContext` to `LOCKED`, `CLOSED`, or `OPEN`, as
specified by the request. If the `UploadContext` is already in the given state, nothing
happens and the UCS returns a successful response.

The initial state of the `UploadContext` is `OPEN`. When the user is finished uploading
files, they can use the Data Portal to set the Context to a semi-finalized state,
`LOCKED`. It is possible that the user decides they need to make changes, such as
uploading or removing a file, and in that case they can revert the Context to `OPEN`.
If no changes are needed, however, the user can fully finalize the Context by setting
it to `CLOSED`, after which point no changes can be made without opening a new
`UploadContext`.

If the user tries to change the status of an `UploadContext` that's already set to
`CLOSED`, they receive an error. Once the update operation is complete, the UCS
publishes a Kafka event reflecting the latest state of the `UploadContext` and returns
an HTTP response indicating the update was successful.

![UploadContext state diagram](images/ucs_state_diagram.png)

An `UploadContext` may only be moved from `LOCKED` to `CLOSED` if all its linked
`FileUpload`s are set to `COMPLETED`. External logic in the Study Repository Service
is responsible for further validation, like ensuring interrogation was successful.
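
A minimal sketch of these transition rules (function and error names are illustrative;
only the rules stated in this epic are encoded):

```python
# Illustrative transition check; names are not taken from the actual code base.
from enum import StrEnum


class UploadContextState(StrEnum):
    OPEN = "open"
    LOCKED = "locked"
    CLOSED = "closed"


class InvalidTransitionError(RuntimeError):
    """Raised when a requested state change is not allowed."""


def check_transition(
    current: UploadContextState,
    requested: UploadContextState,
    all_file_uploads_completed: bool,
) -> None:
    """Validate a requested UploadContext state change."""
    if current == requested:
        return  # no-op: the UCS simply returns a successful response
    if current == UploadContextState.CLOSED:
        raise InvalidTransitionError("A CLOSED UploadContext cannot be changed.")
    if (
        current == UploadContextState.LOCKED
        and requested == UploadContextState.CLOSED
        and not all_file_uploads_completed
    ):
        raise InvalidTransitionError(
            "All linked FileUploads must be COMPLETED before closing."
        )
```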

### File Upload Init
The user initiates the upload process for a single file by making a request to the
`POST /uploads` endpoint. The request body includes the unencrypted checksum, the
access token, and the alias (or whichever naming element is used to match the file
with the metadata content).
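
That request body could be modelled roughly as follows (field names are assumptions
based on the description above):

```python
# Hypothetical request body for POST /uploads; field names are assumptions.
from pydantic import BaseModel


class FileUploadInitRequest(BaseModel):
    """Body of the request that initiates a single file upload."""

    alias: str  # naming element used to match the file with the metadata content
    checksum: str  # unencrypted checksum of the file
    access_token: str  # encrypted access token obtained from the Data Portal
```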

If a valid encrypted access token is supplied with the request, the UCS ensures it
doesn't already have a completed `FileUpload` for the same file, then adds the
`FileUpload` to the associated `UploadContext`.

The UCS publishes upsertion events to Kafka for both the `FileUpload` and
`UploadContext` objects, and finally returns an HTTP response to the user indicating
that the file upload was successfully initiated.

The `ghga-connector` uploads a given file in chunks, and for each chunk it requests
a pre-signed upload URL. If the request includes a valid access token, the UCS
returns an HTTP response containing the pre-signed upload URL.
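
Generating such a part URL could be sketched as follows (assuming `boto3` talks to the
S3 instance; bucket, key, and upload-ID handling are placeholders):

```python
# Sketch only: assumes boto3; bucket/key/upload-id handling is simplified.
import boto3

s3 = boto3.client("s3")


def presigned_part_url(bucket: str, key: str, upload_id: str, part_number: int) -> str:
    """Return a pre-signed URL for uploading one part of a multipart upload."""
    return s3.generate_presigned_url(
        ClientMethod="upload_part",
        Params={
            "Bucket": bucket,
            "Key": key,
            "UploadId": upload_id,
            "PartNumber": part_number,
        },
        ExpiresIn=3600,  # illustrative URL lifetime in seconds
    )
```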

### File Upload Termination (Upload Completion)
The user initiates a file upload using the `ghga-connector`. When the upload is
complete, the connector automatically makes a request to `PATCH /uploads`. This call
instructs the UCS to communicate with the S3 instance and terminate (complete) the
multipart upload. The UCS will update the `FileUpload` instance to `COMPLETED` and
publish a Kafka event reflecting the new state. Finally, the UCS will return an HTTP
response indicating the operation was successful and that the file is completely
uploaded.
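
The S3 side of the termination step could be sketched like this (assuming `boto3`; in
practice the UCS would go through its S3 adapter):

```python
# Sketch only: assumes boto3; error handling is omitted.
import boto3

s3 = boto3.client("s3")


def complete_upload(bucket: str, key: str, upload_id: str, parts: list[dict]) -> None:
    """Complete (terminate) a multipart upload.

    `parts` is a list of {"ETag": ..., "PartNumber": ...} entries for the parts
    that were uploaded.
    """
    s3.complete_multipart_upload(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )
```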

### File Upload Deletion
The user makes a request to the `DELETE /uploads` endpoint, indicating they wish to
delete a file from the associated Upload Context. If a valid encrypted access token
is supplied with the request, the UCS cancels the ongoing upload if it exists and
deletes the `FileUpload` object from the database. It removes the reference
from the `file_uploads` field in the `UploadContext` and publishes Kafka events
reflecting the deletion of the upload and the new state of the Upload Context.
Finally, the UCS returns an HTTP response to the user indicating the deletion was
successful.
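
Cancelling the ongoing upload on the S3 side could be sketched like this (assuming
`boto3`; the database update and event publishing are omitted):

```python
# Sketch only: assumes boto3; database and event handling are omitted.
import boto3

s3 = boto3.client("s3")


def cancel_ongoing_upload(bucket: str, key: str, upload_id: str) -> None:
    """Abort the multipart upload so no orphaned parts remain in S3."""
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
```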

## API Definitions:

### RESTful/Synchronous:

- `POST /contexts`: Create a new UploadContext
- `PATCH /contexts`: Update an UploadContext to change the status
- `POST /uploads`: Initiate a multipart file upload
- `PATCH /uploads`: Signal that a multipart file upload has been completed
- `DELETE /uploads`: Remove a file upload from the UploadContext

### Payload Schemas for Events:

```python
from enum import StrEnum

from pydantic import UUID4, BaseModel, Field


class UploadContextState(StrEnum):
    """The allowed states for an UploadContext instance"""

    OPEN = "open"
    LOCKED = "locked"
    CLOSED = "closed"


class UploadContext(BaseModel):
    """A class representing an Upload Context"""

    upload_context_id: UUID4  # unique identifier for the instance
    state: UploadContextState  # one of OPEN, LOCKED, CLOSED
    file_uploads: list["FileUpload"] = Field(
        default_factory=list  # use the list function for default_factory
    )


class FileUploadState(StrEnum):
    """The allowed states for a FileUpload instance"""

    INIT = "init"
    COMPLETED = "completed"


class FileUpload(BaseModel):
    """A File Upload"""

    upload_id: UUID4
    state: FileUploadState  # one of INIT, COMPLETED
    original_path: str
    checksum: str
```
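
For illustration, constructing the payload objects defined above might look like this
(IDs and values are made up):

```python
# Usage example for the models above; values are made up.
from uuid import uuid4

upload = FileUpload(
    upload_id=uuid4(),
    state=FileUploadState.INIT,
    original_path="study1/sample1.fastq.gz",
    checksum="0f343b09",  # illustrative unencrypted checksum
)
context = UploadContext(
    upload_context_id=uuid4(),
    state=UploadContextState.OPEN,
    file_uploads=[upload],
)
```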

## Additional Implementation Details:

### Testing
Tests need to cover at least the following items (not exhaustive); a sketch of one such
test follows the list:
- Standard endpoint authentication battery
- Happy path for each endpoint
- Core error translation for HTTP API for each endpoint
- Disallow changing status of a CLOSED UploadContext
- Disallow removing a file from a CLOSED UploadContext
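
As a sketch of one such test (the helper and error names are illustrative stand-ins for
the real core logic, which the actual tests would exercise through the HTTP API):

```python
# Illustrative pytest sketch; change_state stands in for the real core logic.
import pytest


class ClosedContextError(RuntimeError):
    """Raised when a CLOSED UploadContext is modified (illustrative stand-in)."""


def change_state(current: str, requested: str) -> str:
    """Stand-in for the core update logic described in the user journeys."""
    if current == "closed":
        raise ClosedContextError("A CLOSED UploadContext cannot be changed.")
    return requested


def test_cannot_change_closed_context():
    """Changing the status of a CLOSED UploadContext must be rejected."""
    with pytest.raises(ClosedContextError):
        change_state(current="closed", requested="open")


def test_reverting_locked_context_is_allowed():
    """A LOCKED UploadContext may be reverted to OPEN."""
    assert change_state(current="locked", requested="open") == "open"
```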

## Human Resource/Time Estimation:

Number of sprints required: 1

Number of developers required: 1