Skip to content

Conversation

@vtnphan
Copy link
Collaborator

@vtnphan vtnphan commented Jan 15, 2026

Pull Request

Summary

SBP-235 Job page - get max score from results

Changes

  • File Listing list_s3_files
    Lists files in S3 bucket with filtering capabilities:
    • Parameters:
      • prefix: Filter by S3 folder path (e.g., "results/ziad-test/ranker/")
      • file_extension: Filter by file type (e.g., ".csv")
    • Returns: List of file objects with key, size, last_modified timestamp, and bucket name
  • CSV Reading read_csv_from_s3: Reads CSV files from S3 with column selection:
    • Parameters:
      • file_key S3 object path (e.g., "results/run-123/ranker/s1_final_design_stats.csv")
      • columns: Optional list of column names to return (returns all if None)
    • Returns: List of dictionaries, each representing a CSV row
  • Maximum Value Calculation calculate_csv_column_max: Calculates the maximum value of a numeric column in a CSV file:
    • Parameters:
      • file_key: S3 object path
      • column_name: Column to analyze (e.g., "Average_i_pTM")
    • Returns: Maximum numeric value from the column

How to Test

Please contact me to get keys for testing

AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_S3_BUCKET=sbp-storage
AWS_REGION=ap-southeast-2

Test with custom parameters

curl "http://localhost:3000/api/s3/run/ziad-test/max-score?folder_prefix=results&subfolder=ranker&filename=s1_final_design_stats.csv"

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have added or updated documentation where necessary
  • I have run linting and unit tests locally
  • The code follows the project's style guidelines

@vtnphan vtnphan marked this pull request as ready for review January 15, 2026 03:55
Copy link
Collaborator

@marius-mather marius-mather left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current API for access files from S3 is potentially insecure - can you have a think about which files from S3 the user should be able to access, and try to ensure the API can only access those objects?

I will review the rest of the code once that's been addressed

"""
try:
# Construct file path: results/{run_id}/ranker/s1_final_design_stats.csv
file_key = f"{folder_prefix}/{run_id}/{subfolder}/{filename}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful about putting together a key like this - I feel like it could be vulnerable to bad inputs, similar to a path traversal attack.

I know these are S3 object keys rather than actual file paths, but if you don't validate the different parts of the key (prefix, run_id, subfolder, filename), you could potentially expose data from elsewhere in S3.

It will probably help if you exclude/ from any of the query parameters, but you should also think about how to put together the object key in a more structured way to make sure you only access the expected S3 objects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants