anuragxxd
Member

@anuragxxd anuragxxd commented Aug 20, 2025

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not reduced the existing code coverage
  • I have added docstrings following the Python style guidelines of this project to all new modules, classes, methods and functions; I have updated any previously existing docstrings, if applicable
  • I have updated any sections of the app's documentation that are affected by the proposed changes, if applicable

Summary by Sourcery

Enable external file storage support by introducing a FileStorageManager with MinIO and MongoDB strategies, refactor all file-related endpoints to store, retrieve, and delete files via MinIO, add a bulk import endpoint, and update build and deployment configurations to include a MinIO service.

New Features:

  • Introduce a file storage abstraction with pluggable MinIO and MongoDB strategies
  • Integrate MinIO client for uploading, downloading, generating presigned URLs, and deleting files
  • Add a bulk tools import endpoint (postToolsBulk)

Enhancements:

  • Refactor TRS server endpoints to use FileStorageManager for all file operations and return presigned URLs instead of inline content
  • Add regex-based filtering for source and toolname parameters and reverse the default sort order
  • Automatically delete stored files from MinIO when deleting tools or tool versions
  • Allow optional override of tool and version URLs only when not already provided

Build:

  • Add minio library dependency and update requirements.txt
  • Update docker-compose.yaml to include MinIO, MongoDB, and mongo-express services

Deployment:

  • Extend Helm charts with MinIO deployment, service, and PVC templates
  • Configure storage strategy and MinIO credentials via environment variables in Kubernetes deployment and values.yaml


sourcery-ai bot commented Aug 20, 2025

Reviewer's Guide

This PR adds support for MinIO-based file storage by introducing a storage abstraction layer, integrating a MinIO client, and refactoring existing endpoints and registration logic to delegate file operations to the new storage manager; it also updates deployment configurations and dependencies to include MinIO services.

Sequence diagram for storing a file using MinIO strategy

sequenceDiagram
    participant API as TRS Filer API
    participant FSM as FileStorageManager
    participant MS as MinIOStorage
    participant MC as MinIOClient
    API->>FSM: store_file(file_content, file_path, file_type)
    FSM->>MS: store_file(file_content, file_path, file_type)
    MS->>MC: upload_file(object_name, file_content, content_type)
    MC-->>MS: object_name
    MS-->>FSM: {minio_path: object_name}
    FSM-->>API: {minio_path: object_name}

Sequence diagram for retrieving a file's download URL using MinIO

sequenceDiagram
    participant API as TRS Filer API
    participant FSM as FileStorageManager
    participant MS as MinIOStorage
    participant MC as MinIOClient
    API->>FSM: get_file_url(file_wrapper)
    FSM->>MS: get_file_url(file_wrapper)
    MS->>MC: get_presigned_url(object_name)
    MC-->>MS: presigned_url
    MS-->>FSM: presigned_url
    FSM-->>API: presigned_url

Class diagram for file storage abstraction and MinIO client

classDiagram
    class FileStorageManager {
        +store_file(file_content, file_path, file_type)
        +retrieve_file_content(file_wrapper)
        +get_file_url(file_wrapper)
        +delete_file(file_wrapper)
        -strategy: FileStorageStrategy
    }
    class FileStorageStrategy {
        <<abstract>>
        +store_file(file_content, file_path, file_type)
        +retrieve_file_content(file_wrapper)
        +get_file_url(file_wrapper)
        +delete_file(file_wrapper)
    }
    class MongoDBStorage {
        +store_file(file_content, file_path, file_type)
        +retrieve_file_content(file_wrapper)
        +get_file_url(file_wrapper)
        +delete_file(file_wrapper)
    }
    class MinIOStorage {
        +store_file(file_content, file_path, file_type)
        +retrieve_file_content(file_wrapper)
        +get_file_url(file_wrapper)
        +delete_file(file_wrapper)
        -minio_client: MinIOClient
    }
    class MinIOClient {
        +upload_file(object_name, file_content, content_type)
        +get_presigned_url(object_name)
        +download_file(object_name)
        +delete_file(object_name)
        +file_exists(object_name)
    }
    FileStorageManager --> FileStorageStrategy
    FileStorageStrategy <|-- MongoDBStorage
    FileStorageStrategy <|-- MinIOStorage
    MinIOStorage --> MinIOClient
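
The diagrams above describe a strategy pattern: the manager forwards each call to whichever backend is configured. A minimal Python sketch of that shape follows. Method names come from the class diagram; `InMemoryStorage`, its `memory_path` key, and the `memory://` URL scheme are hypothetical stand-ins for `MinIOStorage` so the example runs without a MinIO server. Passing the strategy to the constructor is also an assumption made for the sketch (the PR selects it from `FILE_STORAGE_STRATEGY`).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Union


class FileStorageStrategy(ABC):
    """Interface shared by all storage backends (names from the class diagram)."""

    @abstractmethod
    def store_file(self, file_content: Union[str, bytes], file_path: str,
                   file_type: str) -> Dict[str, Any]: ...

    @abstractmethod
    def retrieve_file_content(self, file_wrapper: Dict[str, Any]) -> bytes: ...

    @abstractmethod
    def get_file_url(self, file_wrapper: Dict[str, Any]) -> str: ...

    @abstractmethod
    def delete_file(self, file_wrapper: Dict[str, Any]) -> None: ...


class InMemoryStorage(FileStorageStrategy):
    """Hypothetical backend standing in for MinIOStorage."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def store_file(self, file_content, file_path, file_type):
        if isinstance(file_content, str):
            file_content = file_content.encode("utf-8")
        self._objects[file_path] = file_content
        # Return only a reference, mirroring {minio_path: object_name}.
        return {"memory_path": file_path}

    def retrieve_file_content(self, file_wrapper):
        return self._objects[file_wrapper["memory_path"]]

    def get_file_url(self, file_wrapper):
        return f"memory://{file_wrapper['memory_path']}"

    def delete_file(self, file_wrapper):
        self._objects.pop(file_wrapper["memory_path"], None)


class FileStorageManager:
    """Delegates every call to the configured strategy."""

    def __init__(self, strategy: FileStorageStrategy) -> None:
        self.strategy = strategy

    def store_file(self, file_content, file_path, file_type):
        return self.strategy.store_file(file_content, file_path, file_type)

    def retrieve_file_content(self, file_wrapper):
        return self.strategy.retrieve_file_content(file_wrapper)

    def get_file_url(self, file_wrapper):
        return self.strategy.get_file_url(file_wrapper)

    def delete_file(self, file_wrapper):
        self.strategy.delete_file(file_wrapper)


manager = FileStorageManager(InMemoryStorage())
wrapper = manager.store_file("cwlVersion: v1.0", "tool/1/main.cwl",
                             "PRIMARY_DESCRIPTOR")
url = manager.get_file_url(wrapper)
content = manager.retrieve_file_content(wrapper)
```

Because callers only see the manager, swapping the backend should not require changes in the endpoints that store or fetch files.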

File-Level Changes

Change: Introduce file storage abstraction with MinIO and MongoDB strategies
Details:
  • add FileStorageStrategy abstract class and FileStorageManager
  • implement MongoDBStorage and MinIOStorage classes
  • create MinIOClient wrapper for S3 operations
Files:
  • trs_filer/file_storage.py
  • trs_filer/minio_client.py

Change: Refactor server endpoints to use storage manager for file operations
Details:
  • add _get_file_storage and _prepare_file_wrapper helpers
  • replace inline content/url handling with storage.get_file_url and retrieve_file_content
  • invoke storage.delete_file on tool and version deletions
Files:
  • trs_filer/ga4gh/trs/server.py

Change: Integrate file storage into registration workflows
Details:
  • inject FileStorageManager in RegisterServiceInfo and RegisterToolVersion
  • store file content via storage.store_file in process_files
  • remove previous URL-based content fetch logic
  • clean up stored files when replacing existing versions
Files:
  • trs_filer/ga4gh/trs/endpoints/register_objects.py

Change: Add bulk tool import endpoint
Details:
  • implement postToolsBulk to register multiple tools in one call
Files:
  • trs_filer/ga4gh/trs/server.py

Change: Update deployment configurations to include MinIO
Details:
  • add MinIO service and mongo-express to docker-compose
  • inject FILE_STORAGE_STRATEGY and MinIO env vars in Helm chart
  • add MinIO deployment, service, PVC templates
Files:
  • docker-compose.yaml
  • deployment/values.yaml
  • deployment/templates/trs-filer-deploy.yaml
  • deployment/templates/minio-deploy.yaml
  • deployment/templates/minio-service.yaml
  • deployment/templates/minio-pvc.yaml

Change: Bump dependencies to support MinIO client
Details:
  • add minio>=7.1.0 to requirements
Files:
  • requirements.txt


@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `trs_filer/ga4gh/trs/server.py:528` </location>
<code_context>
-                'content' not in _file['file_wrapper']
-            ):
-                _w = _file['file_wrapper']
-                try:
-                    _w['content'] = requests.get(_w['url']).text
-                except (
</code_context>

<issue_to_address>
Catching all exceptions when retrieving file content may mask unexpected errors.

Catching 'Exception' may hide programming or system errors. Please catch only specific exceptions relevant to file retrieval.
</issue_to_address>
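
One way to act on Comment 1 is sketched below, using the standard library instead of `requests` so it runs without extra dependencies. `fetch_text` and its `timeout` default are hypothetical names for this illustration, not code from the PR. Only errors that plausibly come from fetching or decoding are caught; anything else (for example an `AttributeError` from a programming mistake) propagates.

```python
import logging
import urllib.error
import urllib.request
from typing import Optional

logger = logging.getLogger(__name__)


def fetch_text(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a URL and return its body as text, or None if retrieval fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except (urllib.error.URLError, TimeoutError, ValueError) as exc:
        # URLError: network or HTTP failure; TimeoutError: read timed out;
        # ValueError: malformed URL or undecodable body
        # (UnicodeDecodeError is a subclass of ValueError).
        logger.warning("Could not retrieve %s: %s", url, exc)
        return None


result = fetch_text("not-a-url")  # malformed URL, no network access needed
```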

### Comment 2
<location> `trs_filer/minio_client.py:43` </location>
<code_context>
+        self.bucket = os.environ.get('MINIO_BUCKET', "trs-files")
+        self._ensure_bucket()
+
+    def _parse_endpoint(self, endpoint_url: str) -> tuple[str, bool]:
+        """Parse endpoint URL to extract host:port and determine if secure."""
+        if endpoint_url.startswith(('http://', 'https://')):
+            parsed = urlparse(endpoint_url)
+            secure = parsed.scheme == 'https'
+            # Extract host:port and path
+            path = parsed.path.rstrip('/')
+            endpoint = parsed.netloc + path
+            return endpoint, secure
+        else:
</code_context>

<issue_to_address>
Concatenating netloc and path may result in malformed endpoints.

For MinIO, the endpoint should be host:port only; including the path can break client connections. Use parsed.netloc instead of concatenating with the path.
</issue_to_address>
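
A sketch of the change suggested in Comment 2, assuming the helper keeps its current contract of returning `(endpoint, secure)`. It is written as a free function named `parse_endpoint` for illustration; the example hostnames are made up.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_endpoint(endpoint_url: str) -> Tuple[str, bool]:
    """Return (host:port, secure) for a MinIO endpoint setting."""
    if endpoint_url.startswith(("http://", "https://")):
        parsed = urlparse(endpoint_url)
        # netloc is "host[:port]"; any path component is dropped on purpose.
        return parsed.netloc, parsed.scheme == "https"
    # Assume the value is already in host:port form, as the original code does.
    return endpoint_url, False


with_path = parse_endpoint("https://minio.example.org:9000/storage/")
plain = parse_endpoint("minio:9000")
```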

### Comment 3
<location> `trs_filer/file_storage.py:44` </location>
<code_context>
+    def store_file(self, file_content: str | bytes, file_path: str,
+                  file_type: str) -> Dict[str, Any]:
+        """Store file content directly in the file wrapper."""
+        if isinstance(file_content, bytes):
+            file_content = file_content.decode('utf-8')
+        return {"content": file_content}
+
+    def retrieve_file_content(self, file_wrapper: Dict[str, Any]) -> bytes:
</code_context>

<issue_to_address>
Decoding bytes as UTF-8 may fail for non-text files.

Decoding will fail for non-UTF-8 or binary files. Consider adding error handling or supporting binary formats, such as base64 encoding.
</issue_to_address>
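
A possible shape for the base64 fallback mentioned in Comment 3, as a sketch only. The `"encoding"` marker key and the function names `store_inline` and `load_inline` are hypothetical; the PR does not define them, and whether an extra key fits the stored file wrapper schema would need checking.

```python
import base64
from typing import Any, Dict, Union


def store_inline(file_content: Union[str, bytes]) -> Dict[str, Any]:
    """Build a file wrapper, falling back to base64 for non-UTF-8 bytes."""
    if isinstance(file_content, str):
        return {"content": file_content}
    try:
        return {"content": file_content.decode("utf-8")}
    except UnicodeDecodeError:
        # Hypothetical marker so readers know the content must be decoded.
        encoded = base64.b64encode(file_content).decode("ascii")
        return {"content": encoded, "encoding": "base64"}


def load_inline(file_wrapper: Dict[str, Any]) -> bytes:
    """Reverse store_inline and return the original bytes."""
    if file_wrapper.get("encoding") == "base64":
        return base64.b64decode(file_wrapper["content"])
    return file_wrapper["content"].encode("utf-8")


binary = b"\xff\xfe\x00"  # not valid UTF-8
binary_wrapper = store_inline(binary)
text_wrapper = store_inline(b"hello")
```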

### Comment 4
<location> `trs_filer/file_storage.py:116` </location>
<code_context>
+
+    def __init__(self):
+        """Initialize storage manager."""
+        strategy = os.environ.get('FILE_STORAGE_STRATEGY', 'minio')
+        if strategy == "minio":
+            self.strategy = MinIOStorage()
+        elif strategy == "mongodb":
+            self.strategy = MongoDBStorage()
+        else:
+            raise ValueError(f"Unsupported file storage strategy: {strategy}")
+
+    def store_file(self, file_content: str | bytes, file_path: str,
</code_context>

<issue_to_address>
Defaulting to 'minio' may cause unexpected failures if MinIO is not configured.

Defaulting to 'minio' may lead to runtime errors if MinIO is not available. Recommend specifying the strategy in deployment settings or choosing a safer default.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    def __init__(self):
        """Initialize storage manager."""
        strategy = os.environ.get('FILE_STORAGE_STRATEGY', 'minio')
        if strategy == "minio":
            self.strategy = MinIOStorage()
        elif strategy == "mongodb":
            self.strategy = MongoDBStorage()
        else:
            raise ValueError(f"Unsupported file storage strategy: {strategy}")
=======
    def __init__(self):
        """Initialize storage manager."""
        strategy = os.environ.get('FILE_STORAGE_STRATEGY')
        if not strategy:
            raise ValueError(
                "FILE_STORAGE_STRATEGY environment variable must be set to 'minio' or 'mongodb'."
            )
        if strategy == "minio":
            self.strategy = MinIOStorage()
        elif strategy == "mongodb":
            self.strategy = MongoDBStorage()
        else:
            raise ValueError(f"Unsupported file storage strategy: {strategy}. Please set FILE_STORAGE_STRATEGY to 'minio' or 'mongodb'.")
>>>>>>> REPLACE

</suggested_fix>


-    ret = _d['file_wrapper']
+    ret = _prepare_file_wrapper(_d['file_wrapper'])
 except (IndexError, KeyError, TypeError):
     raise NotFound
suggestion (bug_risk): Catching all exceptions when retrieving file content may mask unexpected errors.

Catching 'Exception' may hide programming or system errors. Please catch only specific exceptions relevant to file retrieval.

Comment on lines +44 to +46
if isinstance(file_content, bytes):
    file_content = file_content.decode('utf-8')
return {"content": file_content}
issue: Decoding bytes as UTF-8 may fail for non-text files.

Decoding will fail for non-UTF-8 or binary files. Consider adding error handling or supporting binary formats, such as base64 encoding.

Comment on lines +81 to +83
object_name = file_wrapper.get("minio_path")
if not object_name:
    raise ValueError("No MinIO path specified in file wrapper")
issue (code-quality): We've found these issues:

Comment on lines 413 to 414
except (IndexError, KeyError, TypeError):
    raise NotFound
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error)

Suggested change
-except (IndexError, KeyError, TypeError):
-    raise NotFound
+except (IndexError, KeyError, TypeError) as e:
+    raise NotFound from e
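
To illustrate what the suggested change buys: with `raise ... from e`, the original lookup error stays attached to the new exception as `__cause__` and appears in the traceback. In this sketch, `NotFound` is a local stand-in class and `get_file_wrapper` is a hypothetical helper, not the function in server.py.

```python
class NotFound(Exception):
    """Local stand-in for the NotFound exception raised in server.py."""


def get_file_wrapper(files, index):
    """Return the file wrapper at the given index or raise NotFound."""
    try:
        return files[index]["file_wrapper"]
    except (IndexError, KeyError, TypeError) as e:
        # Chain the lookup error so its details are not lost.
        raise NotFound from e


try:
    get_file_wrapper([], 0)
except NotFound as err:
    cause = err.__cause__  # the IndexError from the failed lookup
```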

Comment on lines 474 to 475
except (IndexError, KeyError, TypeError):
    raise NotFound
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error)

Suggested change
-except (IndexError, KeyError, TypeError):
-    raise NotFound
+except (IndexError, KeyError, TypeError) as e:
+    raise NotFound from e

raise NotFound
del_obj_tools = db_coll_tools.delete_one({'id': id})

if del_obj_tools.deleted_count:
issue (code-quality): We've found these issues:

Comment on lines +45 to +54
if endpoint_url.startswith(('http://', 'https://')):
    parsed = urlparse(endpoint_url)
    secure = parsed.scheme == 'https'
    # Extract host:port and path
    path = parsed.path.rstrip('/')
    endpoint = parsed.netloc + path
    return endpoint, secure
else:
    # Assume it's already in host:port format
    return endpoint_url, False
issue (code-quality): We've found these issues:

Comment on lines +102 to +106
presigned_url = public_client.presigned_get_object(
    bucket_name=self.bucket,
    object_name=object_name,
    expires=timedelta(hours=1)
)
issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)

Member

@uniqueg uniqueg left a comment

Really nice solution! Don't have much to comment. The Sourcery comments seem worthwhile to consider though, so please have a look and respond to them. Apart from that, once tests are added, I think it's good to go.

Comment on lines +387 to +391
# # validate image file types
# elif _file['tool_file']['file_type'] == "CONTAINERFILE":
#     if _file['type'] not in self.image_types:
#         logger.error("Missing or invalid image file type.")
#         raise BadRequest
Remove if not needed

Base automatically changed from support/zip to dev August 22, 2025 03:59

2 participants