-
Notifications
You must be signed in to change notification settings - Fork 6
feat: support minio as file storage #120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR adds support for MinIO-based file storage by introducing a storage abstraction layer, integrating a MinIO client, and refactoring existing endpoints and registration logic to delegate file operations to the new storage manager; it also updates deployment configurations and dependencies to include MinIO services. Sequence diagram for storing a file using MinIO strategysequenceDiagram
participant API as TRS Filer API
participant FSM as FileStorageManager
participant MS as MinIOStorage
participant MC as MinIOClient
API->>FSM: store_file(file_content, file_path, file_type)
FSM->>MS: store_file(file_content, file_path, file_type)
MS->>MC: upload_file(object_name, file_content, content_type)
MC-->>MS: object_name
MS-->>FSM: {minio_path: object_name}
FSM-->>API: {minio_path: object_name}
Sequence diagram for retrieving a file's download URL using MinIOsequenceDiagram
participant API as TRS Filer API
participant FSM as FileStorageManager
participant MS as MinIOStorage
participant MC as MinIOClient
API->>FSM: get_file_url(file_wrapper)
FSM->>MS: get_file_url(file_wrapper)
MS->>MC: get_presigned_url(object_name)
MC-->>MS: presigned_url
MS-->>FSM: presigned_url
FSM-->>API: presigned_url
Class diagram for file storage abstraction and MinIO clientclassDiagram
class FileStorageManager {
+store_file(file_content, file_path, file_type)
+retrieve_file_content(file_wrapper)
+get_file_url(file_wrapper)
+delete_file(file_wrapper)
-strategy: FileStorageStrategy
}
class FileStorageStrategy {
<<abstract>>
+store_file(file_content, file_path, file_type)
+retrieve_file_content(file_wrapper)
+get_file_url(file_wrapper)
+delete_file(file_wrapper)
}
class MongoDBStorage {
+store_file(file_content, file_path, file_type)
+retrieve_file_content(file_wrapper)
+get_file_url(file_wrapper)
+delete_file(file_wrapper)
}
class MinIOStorage {
+store_file(file_content, file_path, file_type)
+retrieve_file_content(file_wrapper)
+get_file_url(file_wrapper)
+delete_file(file_wrapper)
-minio_client: MinIOClient
}
class MinIOClient {
+upload_file(object_name, file_content, content_type)
+get_presigned_url(object_name)
+download_file(object_name)
+delete_file(object_name)
+file_exists(object_name)
}
FileStorageManager --> FileStorageStrategy
FileStorageStrategy <|-- MongoDBStorage
FileStorageStrategy <|-- MinIOStorage
MinIOStorage --> MinIOClient
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `trs_filer/ga4gh/trs/server.py:528` </location>
<code_context>
- 'content' not in _file['file_wrapper']
- ):
- _w = _file['file_wrapper']
- try:
- _w['content'] = requests.get(_w['url']).text
- except (
</code_context>
<issue_to_address>
Catching all exceptions when retrieving file content may mask unexpected errors.
Catching 'Exception' may hide programming or system errors. Please catch only specific exceptions relevant to file retrieval.
</issue_to_address>
### Comment 2
<location> `trs_filer/minio_client.py:43` </location>
<code_context>
+ self.bucket = os.environ.get('MINIO_BUCKET', "trs-files")
+ self._ensure_bucket()
+
+ def _parse_endpoint(self, endpoint_url: str) -> tuple[str, bool]:
+ """Parse endpoint URL to extract host:port and determine if secure."""
+ if endpoint_url.startswith(('http://', 'https://')):
+ parsed = urlparse(endpoint_url)
+ secure = parsed.scheme == 'https'
+ # Extract host:port and path
+ path = parsed.path.rstrip('/')
+ endpoint = parsed.netloc + path
+ return endpoint, secure
+ else:
</code_context>
<issue_to_address>
Concatenating netloc and path may result in malformed endpoints.
For MinIO, the endpoint should be host:port only; including the path can break client connections. Use parsed.netloc instead of concatenating with the path.
</issue_to_address>
### Comment 3
<location> `trs_filer/file_storage.py:44` </location>
<code_context>
+ def store_file(self, file_content: str | bytes, file_path: str,
+ file_type: str) -> Dict[str, Any]:
+ """Store file content directly in the file wrapper."""
+ if isinstance(file_content, bytes):
+ file_content = file_content.decode('utf-8')
+ return {"content": file_content}
+
+ def retrieve_file_content(self, file_wrapper: Dict[str, Any]) -> bytes:
</code_context>
<issue_to_address>
Decoding bytes as UTF-8 may fail for non-text files.
Decoding will fail for non-UTF-8 or binary files. Consider adding error handling or supporting binary formats, such as base64 encoding.
</issue_to_address>
### Comment 4
<location> `trs_filer/file_storage.py:116` </location>
<code_context>
+
+ def __init__(self):
+ """Initialize storage manager."""
+ strategy = os.environ.get('FILE_STORAGE_STRATEGY', 'minio')
+ if strategy == "minio":
+ self.strategy = MinIOStorage()
+ elif strategy == "mongodb":
+ self.strategy = MongoDBStorage()
+ else:
+ raise ValueError(f"Unsupported file storage strategy: {strategy}")
+
+ def store_file(self, file_content: str | bytes, file_path: str,
</code_context>
<issue_to_address>
Defaulting to 'minio' may cause unexpected failures if MinIO is not configured.
Defaulting to 'minio' may lead to runtime errors if MinIO is not available. Recommend specifying the strategy in deployment settings or choosing a safer default.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
def __init__(self):
"""Initialize storage manager."""
strategy = os.environ.get('FILE_STORAGE_STRATEGY', 'minio')
if strategy == "minio":
self.strategy = MinIOStorage()
elif strategy == "mongodb":
self.strategy = MongoDBStorage()
else:
raise ValueError(f"Unsupported file storage strategy: {strategy}")
=======
def __init__(self):
"""Initialize storage manager."""
strategy = os.environ.get('FILE_STORAGE_STRATEGY')
if not strategy:
raise ValueError(
"FILE_STORAGE_STRATEGY environment variable must be set to 'minio' or 'mongodb'."
)
if strategy == "minio":
self.strategy = MinIOStorage()
elif strategy == "mongodb":
self.strategy = MongoDBStorage()
else:
raise ValueError(f"Unsupported file storage strategy: {strategy}. Please set FILE_STORAGE_STRATEGY to 'minio' or 'mongodb'.")
>>>>>>> REPLACE
</suggested_fix>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
ret = _d['file_wrapper'] | ||
ret = _prepare_file_wrapper(_d['file_wrapper']) | ||
except (IndexError, KeyError, TypeError): | ||
raise NotFound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Catching all exceptions when retrieving file content may mask unexpected errors.
Catching 'Exception' may hide programming or system errors. Please catch only specific exceptions relevant to file retrieval.
if isinstance(file_content, bytes): | ||
file_content = file_content.decode('utf-8') | ||
return {"content": file_content} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: Decoding bytes as UTF-8 may fail for non-text files.
Decoding will fail for non-UTF-8 or binary files. Consider adding error handling or supporting binary formats, such as base64 encoding.
object_name = file_wrapper.get("minio_path") | ||
if not object_name: | ||
raise ValueError("No MinIO path specified in file wrapper") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Use named expression to simplify assignment and conditional (
use-named-expression
) - Lift code into else after jump in control flow (
reintroduce-else
) - Swap if/else branches (
swap-if-else-branches
)
except (IndexError, KeyError, TypeError): | ||
raise NotFound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error
)
except (IndexError, KeyError, TypeError): | |
raise NotFound | |
except (IndexError, KeyError, TypeError) as e: | |
raise NotFound from e |
except (IndexError, KeyError, TypeError): | ||
raise NotFound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error
)
except (IndexError, KeyError, TypeError): | |
raise NotFound | |
except (IndexError, KeyError, TypeError) as e: | |
raise NotFound from e |
raise NotFound | ||
del_obj_tools = db_coll_tools.delete_one({'id': id}) | ||
|
||
if del_obj_tools.deleted_count: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Swap if/else branches (
swap-if-else-branches
) - Remove unnecessary else after guard condition (
remove-unnecessary-else
)
if endpoint_url.startswith(('http://', 'https://')): | ||
parsed = urlparse(endpoint_url) | ||
secure = parsed.scheme == 'https' | ||
# Extract host:port and path | ||
path = parsed.path.rstrip('/') | ||
endpoint = parsed.netloc + path | ||
return endpoint, secure | ||
else: | ||
# Assume it's already in host:port format | ||
return endpoint_url, False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Swap if/else branches (
swap-if-else-branches
) - Remove unnecessary else after guard condition (
remove-unnecessary-else
)
presigned_url = public_client.presigned_get_object( | ||
bucket_name=self.bucket, | ||
object_name=object_name, | ||
expires=timedelta(hours=1) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice solution! Don't have much to comment. The Sourcery comments seem worthwhile to consider though, so please have a look and respond to them. Apart from that, once tests are added, I think it's good to go.
# # validate image file types | ||
# elif _file['tool_file']['file_type'] == "CONTAINERFILE": | ||
# if _file['type'] not in self.image_types: | ||
# logger.error("Missing or invalid image file type.") | ||
# raise BadRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove if not needed
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
Checklist:
Summary by Sourcery
Enable external file storage support by introducing a FileStorageManager with MinIO and MongoDB strategies, refactor all file-related endpoints to store, retrieve, and delete files via MinIO, add a bulk import endpoint, and update build and deployment configurations to include a MinIO service.
New Features:
Enhancements:
Build:
Deployment: