Pull request overview
This PR introduces the initial SONIC-o1 release, establishing the complete end-to-end pipeline for multimodal video understanding evaluation. The codebase includes data curation, caption generation, demographics annotation, VQA generation, and comprehensive evaluation/inference workflows across 13 diverse topics. The implementation supports multiple state-of-the-art models (Gemini, Qwen3, MiniCPM, Uni-MoE, VITA, VideoLLaMA2, Phi-4, GPT-4o) with proper environment isolation and reproducible configurations.
Changes:
- Added complete evaluation/inference pipeline with model wrappers, retry logic, and metrics computation (T1-T3 tasks)
- Implemented VQA generation workflow with video segmentation, prompt engineering, and quality validation
- Established configuration management system with environment-specific requirements and model-specific settings
Reviewed changes
Copilot reviewed 59 out of 81 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sonic-o1/05_evaluation_inference/models_requirements/*.txt | Python dependency specifications for model-specific virtual environments |
| sonic-o1/05_evaluation_inference/models/*.py | Model wrapper implementations following BaseModel pattern |
| sonic-o1/05_evaluation_inference/metrics/*.py | Metrics computation for T1-T3 tasks with LLM judge integration |
| sonic-o1/05_evaluation_inference/inference/run_inference.py | Main inference orchestration with resume capability and error handling |
| sonic-o1/05_evaluation_inference/configs/models_config.yaml | Centralized model configuration with retry strategies |
| sonic-o1/04_vqa_generation/utils/video_segmenter.py | Video segmentation with FFmpeg and duration validation |
| sonic-o1/03_demographics_annotation/config.yaml | Demographics annotation configuration |
| sonic-o1/02_caption_generation/config_whisper.yaml | WhisperX caption generation settings |
```python
# Check for OOM
if "out of memory" in error_msg.lower() or "CUDA out of memory" in error_msg:
    logger.error(f"OOM error: {error_msg[:200]}...")
    self._clear_memory
```
Missing parentheses on the `_clear_memory` method call: this line references the method but does not call it. It should be `self._clear_memory()`.
Suggested change:

```diff
-self._clear_memory
+self._clear_memory()
```
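A minimal sketch of the OOM branch with the fix applied, assuming `_clear_memory` is meant to run garbage collection and, when PyTorch and a GPU are available, empty the CUDA cache (the real method body is not part of this excerpt). `ModelWrapperSketch`, `handle_error` and `is_oom_error` are hypothetical names. The sketch also folds the two substring checks into one, because `error_msg.lower()` already contains `"out of memory"` whenever the message contains `"CUDA out of memory"`.

```python
import gc
import logging

try:
    import torch  # optional: only used to release the CUDA cache
except ImportError:
    torch = None

logger = logging.getLogger(__name__)


def is_oom_error(error_msg: str) -> bool:
    """Detect out-of-memory errors from their message text."""
    # Lowercasing also matches "CUDA out of memory", so one check is enough.
    return "out of memory" in error_msg.lower()


class ModelWrapperSketch:
    """Hypothetical stand-in for a BaseModel subclass."""

    def __init__(self) -> None:
        self.clear_calls = 0  # counts cleanups, for illustration only

    def _clear_memory(self) -> None:
        gc.collect()  # drop unreferenced Python objects first
        if torch is not None and torch.cuda.is_available():
            # Release unoccupied memory held by PyTorch's caching allocator.
            torch.cuda.empty_cache()
        self.clear_calls += 1

    def handle_error(self, error: Exception) -> bool:
        """Return True if the error was an OOM and memory was cleared."""
        error_msg = str(error)
        if is_oom_error(error_msg):
            logger.error("OOM error: %s...", error_msg[:200])
            self._clear_memory()  # parentheses: actually call the method
            return True
        return False
```

Returning a boolean leaves the retry decision to the caller; how this would connect to the retry strategies in `models_config.yaml` is an assumption, not something shown in the PR.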
@AhmedRadwan02 Repo code needs to follow PEP standards. Must-haves:

In your local repo:

```bash
uv tool install mypy
uv tool install pre-commit
uv run pre-commit run --all-files
```

No need to commit these changes; just fix the errors that pre-commit reports.
```python
import subprocess
import tempfile
import shutil
```
```python
def _build_prompt(self, metadata: Dict[str, Any], transcript_text: str) -> str:
    """Build the analysis prompt with transcript embedded"""
    from prompts import MAIN_PROMPT_TEMPLATE
```
Moved to the top of the file.
```diff
@@ -0,0 +1,365 @@
+# Demographics Annotation with Gemini
+
+This directory handles automatic demographics annotation for videos using Google's Gemini multimodal model. It analyzes videos, audio, and captions to extract demographic information (race, gender, age, language) of people appearing in the videos.
```
Include an Overview section here. It is optional, but it might help to restate the list of topics and link to the previous README.md.
```python
content = f.read()

# Simple SRT parser
import re
```
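The snippet above imports `re` partway through the file, while PEP 8 places imports at the top of the module. A minimal sketch of a simple SRT parser written that way; `parse_srt`, the `start`/`end`/`text` keys and the seconds-as-float representation are assumptions, since the original parser's output format is not shown here. It accepts either `,` or `.` before the milliseconds.

```python
import re
from typing import Dict, List

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing lines.
TIMING_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> List[Dict[str, object]]:
    """Parse SRT text into a list of {start, end, text} cues."""
    cues: List[Dict[str, object]] = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = TIMING_RE.search(line)
            if match:
                g = match.groups()
                cues.append({
                    "start": _to_seconds(*g[:4]),
                    "end": _to_seconds(*g[4:]),
                    # Everything after the timing line is subtitle text.
                    "text": " ".join(lines[i + 1:]),
                })
                break
    return cues
```

With the file handle from the snippet, usage would be `cues = parse_srt(f.read())`.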
```diff
@@ -0,0 +1,155 @@
+# VQA Generation Configuration
```
Move this to the `04_vqa_generation` folder? There is only one file in the `config` folder.
````markdown
### Check Annotation Quality

```bash
# Count videos with demographics
jq '[.[] | select(.demographics_detailed != null)] | length' \
  dataset/videos/01_Patient-Doctor_Consultations/metadata_enhanced.json

# View specific annotation
jq '.[] | select(.video_number == "001")' \
  dataset/videos/01_Patient-Doctor_Consultations/metadata_enhanced.json
```
````
Add details on how quality is checked with these commands.
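One way to spell out what the two `jq` commands check is to mirror them in Python. This sketch assumes, as the `jq` filters imply, that `metadata_enhanced.json` is a JSON array of objects; the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_metadata(path: Path) -> List[Dict[str, Any]]:
    """Load a metadata_enhanced.json file (a JSON array of video entries)."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def count_annotated(entries: List[Dict[str, Any]]) -> int:
    # Same as: jq '[.[] | select(.demographics_detailed != null)] | length'
    # A missing key is null in jq, and .get() returns None here as well.
    return sum(
        1 for e in entries if e.get("demographics_detailed") is not None
    )


def find_videos(
    entries: List[Dict[str, Any]], video_number: str
) -> List[Dict[str, Any]]:
    # Same as: jq '.[] | select(.video_number == "001")' (every match)
    return [e for e in entries if e.get("video_number") == video_number]
```

Comparing `count_annotated(entries)` with `len(entries)` gives the annotation coverage for a topic, which is one concrete quality signal the README could describe alongside a manual look at individual entries.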
```markdown
- The script processes videos in order by video number
- Already processed videos are skipped automatically
```
Add these to the script features in the Overview.
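For the Overview's feature list, the two behaviours above amount to the loop sketched here: sort entries by `video_number` and skip any that already carry demographics. This is a sketch, not the script's code; it assumes `video_number` is a numeric string such as `"001"` and that a non-null `demographics_detailed` marks an entry as done. `annotate` is a hypothetical callback standing in for the Gemini call.

```python
from typing import Any, Callable, Dict, List


def process_pending(
    entries: List[Dict[str, Any]],
    annotate: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> List[str]:
    """Annotate entries in video-number order, skipping finished ones.

    Returns the video numbers that were processed in this run.
    """
    processed: List[str] = []
    # int() keeps numeric order even if some numbers are not zero-padded.
    for entry in sorted(entries, key=lambda e: int(e["video_number"])):
        if entry.get("demographics_detailed") is not None:
            continue  # already annotated in an earlier run
        entry["demographics_detailed"] = annotate(entry)
        processed.append(entry["video_number"])
    return processed
```

Running it a second time on the same entries processes nothing, which is the behaviour the README bullet describes.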
````markdown
└── metadata_enhanced_checkpoint.json  # Checkpoint for resume
```

## Checkpoint and Resume
````
Review comment about including a checkpoint flag in the comment.
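A minimal checkpoint/resume sketch, assuming the checkpoint file (`metadata_enhanced_checkpoint.json` in the layout above) holds the same list of entries as the final output. Writing to a temporary file in the same directory and then calling `os.replace` means an interrupted run does not leave a half-written checkpoint (the rename is atomic on POSIX). The helper names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_checkpoint(entries: List[Dict[str, Any]], path: Path) -> None:
    """Write entries to path via a temp file plus rename."""
    # Same directory as the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(entries, f, indent=2, ensure_ascii=False)
        os.replace(tmp_name, path)
    except BaseException:
        # Also covers Ctrl+C: remove the partial temp file, keep the old one.
        os.unlink(tmp_name)
        raise


def load_checkpoint(path: Path) -> List[Dict[str, Any]]:
    """Return saved entries, or an empty list when starting fresh."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```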
```python
return True

# Check if it has the expected fields with actual values
required_fields = ['race', 'gender', 'age', 'language']
```
Can these be taken from YAML instead of hardcoding?
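A sketch of the reviewer's suggestion: read the field list from the parsed annotation config (for example `03_demographics_annotation/config.yaml` loaded with `yaml.safe_load` from `pyyaml`) and fall back to the current hardcoded list. The key path `demographics.required_fields` is hypothetical, and treating `None`, `""`, `[]` and `{}` as "no actual value" is an assumption about what the original check intends.

```python
from typing import Any, Dict, List, Sequence

# Current hardcoded list, kept only as a fallback.
DEFAULT_REQUIRED_FIELDS = ["race", "gender", "age", "language"]


def required_fields_from_config(config: Dict[str, Any]) -> List[str]:
    """Read required fields from a parsed YAML config (hypothetical keys)."""
    # "or {}" also handles a YAML key that is present but empty (None).
    fields = (config.get("demographics") or {}).get("required_fields")
    return list(fields) if fields else list(DEFAULT_REQUIRED_FIELDS)


def has_required_values(
    demographics: Dict[str, Any], fields: Sequence[str]
) -> bool:
    """True if every field is present and not None, '', [] or {}."""
    return all(demographics.get(f) not in (None, "", [], {}) for f in fields)
```

A matching config entry would be `required_fields: [race, gender, age, language]` under a `demographics` key; passing `yaml.safe_load(f)` as `config` keeps the YAML dependency out of the validation logic.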
```markdown
- `google-generativeai`
- `openai`
- `python-dotenv`
- `pyyaml`
- `tqdm`
```
SONIC PR Review Template (Leakage + Paths)
Summary
This PR introduces the initial SONIC-o1 release, adding the full end-to-end codebase and supporting documentation required to run the pipeline. It establishes the project structure and core modules for data curation, generation, and evaluation workflows, with a focus on reproducibility and avoiding leakage (no secrets, no sensitive paths, and large artifacts excluded).
Scope
- `sonic-o1/01_data_curation`
- `sonic-o1/02_caption_generation`
- `sonic-o1/03_demographics_annotation`
- `sonic-o1/04_vqa_generation`
- `sonic-o1/05_evaluation_inference`

What reviewers should focus on
1) Leakage / Secrets / Privacy Checklist (Reviewer must verify)
- `.env` file committed (only `.env.example` if needed)
- `/projects/...`, `/home/...`) in code/docs unless clearly marked as examples

Sensitive patterns to search for (optional):
- `AKIA`, `AIza`, `sk-`, `hf_`, `ssh-rsa`, `BEGIN PRIVATE KEY`
- `token=`, `api_key`, `secret`, `.pem`, `.key`, `.p12`

2) Large Files / Dataset Hygiene Checklist
- `dataset/`, `vqa/`, raw media)
- `.gitignore` excludes generated artifacts + external repos as intended

Risks / Notes
Reviewer Assignments
Requested reviewers:
Checklist (Author)
- `.env` and dataset outputs are ignored
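The optional pattern search in the leakage checklist can be scripted before requesting review. A minimal sketch, assuming a plain substring match is acceptable (it over-reports: any line containing `secret` or `task-` will be flagged). `.pem`, `.key` and `.p12` are treated as file extensions rather than substrings, and a committed `.env` file is flagged while `.env.example` is not. The function names and the skipped directories are hypothetical choices.

```python
from pathlib import Path
from typing import Iterator, List, Tuple

# Substring patterns from the leakage checklist.
SECRET_PATTERNS = [
    "AKIA", "AIza", "sk-", "hf_", "ssh-rsa", "BEGIN PRIVATE KEY",
    "token=", "api_key", "secret",
]
# Key/certificate files should not be committed at all.
SECRET_SUFFIXES = {".pem", ".key", ".p12"}
SKIP_DIRS = {".git", ".venv", "node_modules"}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern) pairs for each match in text."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in SECRET_PATTERNS:
            if pattern in line:
                hits.append((lineno, pattern))
    return hits


def scan_repo(root: Path) -> Iterator[str]:
    """Yield one human-readable finding per suspicious file or line."""
    for path in sorted(root.rglob("*")):
        if any(part in SKIP_DIRS for part in path.relative_to(root).parts):
            continue
        if not path.is_file():
            continue
        if path.name == ".env":
            yield f"{path}: .env file should not be committed"
            continue
        if path.suffix in SECRET_SUFFIXES:
            yield f"{path}: key/certificate file"
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file; skip it
        for lineno, pattern in scan_text(text):
            yield f"{path}:{lineno}: matches {pattern!r}"
```

Running `for finding in scan_repo(Path("sonic-o1")): print(finding)` from the repository root would list candidates for manual review; it does not replace the reviewer checklist.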