Conversation
| return prompt | ||
| def build_batch_validation_system_prompt() -> str: |
Can make this a variable instead of a function, since it takes no arguments.
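A minimal sketch of what this could look like, assuming the prompt text is static. The constant name `BATCH_VALIDATION_SYSTEM_PROMPT` and the prompt wording are placeholders, not the PR's actual content:

```python
# Hypothetical prompt text; the real prompt in the PR is not reproduced here.
BATCH_VALIDATION_SYSTEM_PROMPT: str = (
    "You are a validator. "
    "Check each question in the batch and report any problems."
)


def build_batch_validation_system_prompt() -> str:
    """Thin wrapper kept only so existing callers continue to work."""
    return BATCH_VALIDATION_SYSTEM_PROMPT
```

Callers could then import the constant directly, and the wrapper could be dropped once nothing uses it.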
sonic-o1/04_vqa_generation/main.py
Outdated
| class Config: | ||
| """Configuration wrapper""" | ||
| """Configuration wrapper with nested attribute access.""" | ||
Can move Config and load_config to utils/config_utils. I see them being used in 4 files.
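A rough sketch of a shared module, assuming the config is a nested mapping. The file format is an assumption: JSON is used here only to keep the example standard-library-only, and the PR's loader may read something else:

```python
"""Hypothetical utils/config_utils.py shared by the scripts that need it."""
import json
from pathlib import Path
from typing import Any, Dict, Union


class Config:
    """Configuration wrapper with nested attribute access."""

    def __init__(self, data: Dict[str, Any]) -> None:
        for key, value in data.items():
            # Wrap nested dicts so cfg.model.name works, not just cfg.model["name"].
            setattr(self, key, Config(value) if isinstance(value, dict) else value)


def load_config(path: Union[str, Path]) -> Config:
    """Load a config file into a Config (JSON is an assumption of this sketch)."""
    with open(path, "r", encoding="utf-8") as handle:
        return Config(json.load(handle))
```

Each of the 4 files would then use `from utils.config_utils import Config, load_config` instead of defining its own copy.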
sonic-o1/04_vqa_generation/main.py
Outdated
| summarizer = SummarizationModel(config) | ||
| # Check for existing Task 1 output | ||
| task1_output_file = output_dir / 'task1_summarization' / f"{topic_id:02d}_{topic_name.replace(' ', '_')}.json" | ||
| task1_output_file = ( |
Can modularize this:
def get_task_files(task_filter):
    ...
    return model, existing_tasks
| config: Config, | ||
| output_dir: Path, | ||
| task_filter: str = None) -> tuple: | ||
| def process_topic( |
The logic for iterating over tasks is a bit convoluted. Can simplify this so that process_topic handles a single task, and call it from main once per task.
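One possible shape, as a sketch only. `process_topic_task`, `run_pipeline` (standing in for main), and the `runners` mapping are hypothetical names, and each task is assumed to be callable with just a topic id:

```python
from typing import Callable, Dict, List, Optional

TaskRunner = Callable[[int], str]


def process_topic_task(topic_id: int, task_name: str,
                       runners: Dict[str, TaskRunner]) -> str:
    """Process exactly one task for one topic."""
    return runners[task_name](topic_id)


def run_pipeline(topic_ids: List[int], runners: Dict[str, TaskRunner],
                 task_filter: Optional[str] = None) -> List[str]:
    """Loop over topics and tasks here, so the per-task function stays simple."""
    tasks = [task_filter] if task_filter else list(runners)
    results = []
    for topic_id in topic_ids:
        for task_name in tasks:
            results.append(process_topic_task(topic_id, task_name, runners))
    return results
```

With this split, the task filter is resolved once at the top level rather than inside the per-topic logic.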
sonic-o1/04_vqa_generation/main.py
Outdated
| # Check summary_short for failures (more specific patterns) | ||
| summary_short = entry.get('summary_short', []) | ||
| summary_short = entry.get("summary_short", []) |
Can modularize the code into the following functions per task:
get_summary_detailed,
get_summary_failed,
skip_task() (returns a bool),
get_confidence,
and then call task-specific functions to process videos.
This seems more modular.
| def _format_option_with_letter(self, option: str, letter: str) -> str: | ||
| """Format option with letter prefix, removing any existing prefix.""" | ||
| cleaned = option.strip() | ||
| # Remove any existing letter prefix | ||
| for existing_letter in self.option_letters: | ||
| if cleaned.startswith(f"({existing_letter})"): | ||
| cleaned = cleaned[3:].strip() | ||
| break | ||
| return f"({letter}) {cleaned}" | ||
| def _format_options_with_letters(self, options: List[str]) -> List[str]: | ||
| """Format all options with correct letter prefixes.""" | ||
| formatted = [] | ||
| for i, option in enumerate(options): | ||
| letter = self.option_letters[i] | ||
| formatted.append(self._format_option_with_letter(option, letter)) | ||
| return formatted | ||
This seems like unnecessary processing with nested for loops. Can just relabel the returned options with A, B, ... if needed (if I understand it correctly).
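A sketch of a single-pass alternative, assuming options may arrive with a stale "(X)" prefix. `relabel_options` and `OPTION_LETTERS` are hypothetical names; the PR keeps the letters in `self.option_letters`, and the regex below strips any single uppercase letter in parentheses, which is slightly broader than the original check:

```python
import re
from typing import List

OPTION_LETTERS = "ABCDE"  # assumed letter set; the PR uses self.option_letters
_PREFIX = re.compile(r"^\s*\([A-Z]\)\s*")


def relabel_options(options: List[str]) -> List[str]:
    """Strip any existing "(X)" prefix and assign letters by position."""
    return [
        f"({letter}) {_PREFIX.sub('', option).strip()}"
        for letter, option in zip(OPTION_LETTERS, options)
    ]
```

Note that `zip` silently drops options beyond the available letters, whereas indexing `self.option_letters[i]` would raise, so a length check may still be wanted.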
| "glossary": merged_summary.get("glossary", []), | ||
| "demographics": demographics.get("demographics", []), | ||
| "confidence": merged_summary.get("confidence", 0.0), | ||
| } |
This can be another modular function like _create_entry().
| return valid_questions, stats | ||
| def _validate_single_question( |
Can split this function into several smaller ones: check_absolute, convert_abs_to_relative, etc.
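A possible shape for the two helpers, as a guess at the intent. It assumes "absolute" means seconds from the start of the full video and "relative" means seconds from the start of the current segment; the parameter names are hypothetical:

```python
from typing import Tuple


def check_absolute(start: float, end: float,
                   segment_start: float, segment_end: float) -> bool:
    """Return True if [start, end] lies inside the segment's absolute time range."""
    return segment_start <= start <= end <= segment_end


def convert_abs_to_relative(start: float, end: float,
                            segment_start: float) -> Tuple[float, float]:
    """Shift absolute timestamps so the segment start becomes 0."""
    return start - segment_start, end - segment_start
```

_validate_single_question would then mostly call these in sequence, which should make each check easier to test on its own.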
| "correction_reason", "timestamps adjusted" | ||
| ) | ||
| question["rationale_model"] += f" [Judge corrected: {reason}]" | ||
Can initialize a message variable within the if conditions: message = 'GPT-4v corrected timestamp' or message = 'GPT-4v validated'. Then return this dict.
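Roughly like this, as a sketch. The function name and the shape of the returned dict are hypothetical, since the actual return value is not visible in this hunk:

```python
from typing import Any, Dict


def build_validation_result(question: Dict[str, Any],
                            corrected: bool) -> Dict[str, Any]:
    """Pick the message in the branch, then build the result dict once."""
    if corrected:
        message = "GPT-4v corrected timestamp"
    else:
        message = "GPT-4v validated"
    # Single return point: only `message` differs between the two cases.
    return {"question": question, "corrected": corrected, "message": message}
```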
@AhmedRadwan02 Reviewed the 4th folder. I'll need to come back to the 5th folder as the PR is quite large and the review is taking considerable time. For future PRs, it would be helpful to keep them smaller so we can ensure a thorough and efficient review. Thanks!
Overall, the PR looks good. I like the use of classes, private functions, and prompt templates. The code appears to have improved from folder 1 -> 5 :) Logging is also fairly comprehensive and useful for debugging later PRs. Areas of improvement for future PRs:
PR Summary – Folder 04
Modularization
Shared Utilities
Dry-Run Mode
Bug Fixes
Lint / Mypy / Docs
Reviewers
SONIC PR Review Template
Summary
This PR addresses previous review comments from the initial SONIC-o1 release (PRs 01-03) by implementing comprehensive code cleanup, adding detailed docstrings, and improving code quality across the pipeline. The focus of this PR is on the VQA generation (04_vqa_generation) and Evaluation/Inference (05_evaluation_inference) modules, with particular attention to code organization, documentation, and maintainability.
Scope
What reviewers should focus on
Changes Implemented
Previous PR Comments Addressed (01-03)
Primary Focus: 04_vqa_generation
Primary Focus: 05_evaluation_inference
1) Leakage / Secrets / Privacy Checklist (Reviewer must verify)
2) Large Files / Dataset Hygiene Checklist
Risks / Notes
Reviewer Assignments
Requested reviewers:
Checklist (Author)