SONIC-O1 Release by AhmedRadwan02 · Pull Request #2 · VectorInstitute/sonic-o1

AhmedRadwan02 · 2026-01-15T02:28:38Z

SONIC PR Review Template (Leakage + Paths)

Summary

This PR introduces the initial SONIC-o1 release, adding the full end-to-end codebase and supporting documentation required to run the pipeline. It establishes the project structure and core modules for data curation, generation, and evaluation workflows, with a focus on reproducibility and avoiding leakage (no secrets, no sensitive paths, and large artifacts excluded).

Scope

Data curation (sonic-o1/01_data_curation)
Caption generation (sonic-o1/02_caption_generation)
Demographics annotation (sonic-o1/03_demographics_annotation)
VQA generation (sonic-o1/04_vqa_generation)
Evaluation / inference (sonic-o1/05_evaluation_inference)

What reviewers should focus on

Leakage / Secrets / Privacy
Paths / Reproducibility
Large files / datasets not committed

1) Leakage / Secrets / Privacy Checklist (Reviewer must verify)

No API keys, tokens, credentials, or private URLs committed
No .env file committed (only .env.example if needed)
No absolute user paths (e.g., /projects/..., /home/...) in code/docs unless clearly marked as examples
No personal identifiers or private dataset content accidentally committed
Logs / outputs don’t print secrets (e.g., env vars)

Sensitive patterns to search for (optional):

AKIA, AIza, sk-, hf_, ssh-rsa, BEGIN PRIVATE KEY
token=, api_key, secret, .pem, .key, .p12
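
The optional pattern search above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the names `scan_text` and `scan_tree` are hypothetical, and plain substring matching produces false positives (for example `sk-` inside "task-1"), so every hit still needs manual review.

```python
from pathlib import Path

# Substrings taken from the checklist above; matching is case-sensitive.
SENSITIVE_PATTERNS = [
    "AKIA", "AIza", "sk-", "hf_", "ssh-rsa", "BEGIN PRIVATE KEY",
    "token=", "api_key", "secret",
]
# File suffixes from the checklist that usually indicate key material.
SENSITIVE_SUFFIXES = {".pem", ".key", ".p12"}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern) pairs for every pattern found in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for pattern in SENSITIVE_PATTERNS:
            if pattern in line:
                hits.append((line_number, pattern))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every readable text file under root, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        if path.suffix in SENSITIVE_SUFFIXES:
            # Line number 0 marks a hit on the file name rather than its content.
            results[str(path)] = [(0, f"suspicious suffix {path.suffix}")]
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file; skip it
        hits = scan_text(text)
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree(".")` from the repository root returns a mapping from file path to hits; an empty mapping means none of the listed patterns were found in the working tree (it says nothing about git history).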

2) Large Files / Dataset Hygiene Checklist

No large datasets committed (e.g., dataset/, vqa/, raw media)
.gitignore excludes generated artifacts + external repos as intended
If sample data is included, it is tiny + clearly marked as sample
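
One way to back the large-file check is to list files above a size threshold before committing. The sketch below assumes a 5 MB cutoff (an arbitrary illustrative value, not one stated in this PR) and uses hypothetical helper names; it walks the working tree rather than asking git which files are tracked.

```python
from pathlib import Path

DEFAULT_LIMIT_BYTES = 5 * 1024 * 1024  # assumed threshold, not from the PR


def collect_sizes(root: str) -> dict[str, int]:
    """Map each file under root (excluding .git) to its size in bytes."""
    root_path = Path(root)
    sizes = {}
    for path in root_path.rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        sizes[path.relative_to(root_path).as_posix()] = path.stat().st_size
    return sizes


def filter_large(
    sizes: dict[str, int], limit_bytes: int = DEFAULT_LIMIT_BYTES
) -> list[tuple[str, int]]:
    """Return (path, size) pairs above limit_bytes, largest first."""
    large = [(name, size) for name, size in sizes.items() if size > limit_bytes]
    return sorted(large, key=lambda item: item[1], reverse=True)
```

For example, `filter_large(collect_sizes("."))` lists candidate files that may belong in `.gitignore` instead of the commit.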

Risks / Notes

None
Yes: ______________________________________

Reviewer Assignments

Requested reviewers:

Checklist (Author)

I confirmed no secrets in git history for this branch
I confirmed .env and dataset outputs are ignored
I updated READMEs with correct working directory + paths
I verified docs/paths are relative (no machine-specific absolute paths)
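
The relative-path item can be spot-checked with a small regular expression. This is a sketch under assumptions: the `/home` and `/projects` prefixes come from the reviewer checklist, `/Users` is added here as a guess for macOS, and `find_absolute_paths` is a hypothetical name.

```python
import re

# A leading "/" counts only when it is not preceded by a word character or ".",
# so relative paths such as docs/projects/readme.md are not flagged.
ABSOLUTE_PATH_RE = re.compile(r"(?<![\w.])/(?:home|projects|Users)/[\w./-]+")


def find_absolute_paths(text: str) -> list[tuple[int, str]]:
    """Return (line_number, path) pairs for machine-specific absolute paths."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ABSOLUTE_PATH_RE.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits
```

Running it over each README and config file gives the lines that still need to be rewritten as relative paths or marked as examples.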

Copilot

Pull request overview

This PR introduces the initial SONIC-o1 release, establishing the complete end-to-end pipeline for multimodal video understanding evaluation. The codebase includes data curation, caption generation, demographics annotation, VQA generation, and comprehensive evaluation/inference workflows across 13 diverse topics. The implementation supports multiple state-of-the-art models (Gemini, Qwen3, MiniCPM, Uni-MoE, VITA, VideoLLaMA2, Phi-4, GPT-4o) with proper environment isolation and reproducible configurations.

Changes:

Added complete evaluation/inference pipeline with model wrappers, retry logic, and metrics computation (T1-T3 tasks)
Implemented VQA generation workflow with video segmentation, prompt engineering, and quality validation
Established configuration management system with environment-specific requirements and model-specific settings
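
The retry logic itself is not quoted in this thread. As a generic illustration only, retry with exponential backoff is often written along the following lines; `call_with_retry` and its parameters are hypothetical and do not describe the PR's implementation or the settings in models_config.yaml.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the last error
            # Delay doubles after each failure: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; production code would normally also restrict which exception types are retried.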

Reviewed changes

Copilot reviewed 59 out of 81 changed files in this pull request and generated 1 comment.


File	Description
sonic-o1/05_evaluation_inference/models_requirements/*.txt	Python dependency specifications for model-specific virtual environments
sonic-o1/05_evaluation_inference/models/*.py	Model wrapper implementations following BaseModel pattern
sonic-o1/05_evaluation_inference/metrics/*.py	Metrics computation for T1-T3 tasks with LLM judge integration
sonic-o1/05_evaluation_inference/inference/run_inference.py	Main inference orchestration with resume capability and error handling
sonic-o1/05_evaluation_inference/configs/models_config.yaml	Centralized model configuration with retry strategies
sonic-o1/04_vqa_generation/utils/video_segmenter.py	Video segmentation with FFmpeg and duration validation
sonic-o1/03_demographics_annotation/config.yaml	Demographics annotation configuration
sonic-o1/02_caption_generation/config_whisper.yaml	WhisperX caption generation settings


Copilot · 2026-01-16T12:53:38Z

sonic-o1/05_evaluation_inference/models/phi4.py

+            # Check for OOM
+            if "out of memory" in error_msg.lower() or "CUDA out of memory" in error_msg:
+                logger.error(f"OOM error: {error_msg[:200]}...")
+                self._clear_memory


Missing parentheses for method call _clear_memory. This line references the method but does not execute it. Should be self._clear_memory().

Suggested change

- self._clear_memory
+ self._clear_memory()
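
To see why the missing parentheses matter: in Python, `self._clear_memory` on its own line evaluates to a bound method object and discards it, so nothing runs. The class below is a hypothetical stand-in for the Phi-4 wrapper, with a counter in place of real memory cleanup.

```python
class ModelWrapper:
    """Hypothetical stand-in; only the call pattern matters here."""

    def __init__(self) -> None:
        self.cleared = 0

    def _clear_memory(self) -> None:
        # The real wrapper would release GPU memory; here we just count calls.
        self.cleared += 1

    def handle_error_buggy(self, error_msg: str) -> None:
        if "out of memory" in error_msg.lower():
            self._clear_memory  # no call: the method object is discarded

    def handle_error_fixed(self, error_msg: str) -> None:
        if "out of memory" in error_msg.lower():
            self._clear_memory()  # actually invokes the cleanup


wrapper = ModelWrapper()
wrapper.handle_error_buggy("CUDA out of memory")
buggy_count = wrapper.cleared
wrapper.handle_error_fixed("CUDA out of memory")
fixed_count = wrapper.cleared
```

The sketch also uses a single lowercase comparison: once the message is lowercased, `"out of memory"` already matches "CUDA out of memory", so the second half of the original `or` condition is redundant.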

AnanyaRaval · 2026-01-20T22:49:38Z

@AhmedRadwan02 Repo code needs to follow the PEP 8 standard. Must-haves:

Correct import format
All functions in the repo should use the same docstring format.
Wrap lines (max is 79 characters)
Indent function definitions and calls properly: https://peps.python.org/pep-0008/#indentation
Same for class docstrings.

[!NOTE] Creating a repo from aieng-uv-template adds GitHub Actions to check for formatting errors. Please make sure to create a repository from the template next time. cc: @shainarazavi

sonic-o1/01_data_curation/README.md

sonic-o1/02_caption_generation/README.md

AnanyaRaval · 2026-01-22T14:58:15Z

In your local repo:

Copy the .github folder from aieng-uv-template to sonic-o1.
Copy the following config to pyproject.toml:

[tool.mypy]
follow_imports = "normal"
ignore_missing_imports = false
install_types = true
pretty = true
non_interactive = true
allow_untyped_defs = false
no_implicit_optional = true
check_untyped_defs = true
namespace_packages = true
explicit_package_bases = true
warn_unused_configs = true
allow_subclassing_any = false
allow_untyped_calls = false
allow_incomplete_defs = false
allow_untyped_decorators = false
warn_redundant_casts = true
warn_unused_ignores = true
implicit_reexport = false
strict_equality = true
extra_checks = true
mypy_path = "src"

[tool.ruff]
include = ["*.py", "pyproject.toml", "*.ipynb"]
exclude = []
line-length = 88

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
docstring-code-format = true

[tool.ruff.lint]
select = [
    "A", # flake8-builtins
    "B", # flake8-bugbear
    "COM", # flake8-commas
    "C4", # flake8-comprehensions
    "RET", # flake8-return
    "SIM", # flake8-simplify
    "ICN", # flake8-import-conventions
    "Q", # flake8-quotes
    "RSE", # flake8-raise
    "D", # pydocstyle
    "E", # pycodestyle
    "F", # pyflakes
    "I", # isort
    "W", # pycodestyle
    "N", # pep8-naming
    "ERA", # eradicate
    "PL", # pylint
]

fixable = ["A", "B", "COM", "C4", "RET", "SIM", "ICN", "Q", "RSE", "D", "E", "F", "I", "W", "N", "ERA", "PL"]
ignore = [
    "B905", # `zip()` without an explicit `strict=` parameter
    "E501", # line too long
    "D203", # 1 blank line required before class docstring
    "D213", # Multi-line docstring summary should start at the second line
    "PLR2004", # Replace magic number with named constant
    "PLR0913", # Too many arguments
    "COM812", # Missing trailing comma
    "ERA001", # Found commented-out code (too many false positives with math comments)
    "A001", # Ignore variable `input` is shadowing a Python builtin (common for torch)
    "A002", # Ignore variable `input` is shadowing a Python builtin in function (common for torch)
    "D301", # r-strings for docstrings with backslashes
]

# Ignore import violations in all `__init__.py` files.
[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["E402", "F401", "F403", "F811"]

[tool.ruff.lint.pep8-naming]
ignore-names = ["X*", "setUp"]

[tool.ruff.lint.isort]
lines-after-imports = 2

[tool.ruff.lint.pydocstyle]
convention = "numpy"

[tool.ruff.lint.pycodestyle]
max-doc-length = 88

Install mypy and pre-commit

uv tool install mypy
uv tool install pre-commit

See formatting errors and fix as many as you can.

uv run pre-commit run --all-files

No need to commit these changes. Just fix the errors you see from pre-commit.
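
For orientation, a function written to the settings above (`convention = "numpy"` for docstrings, `allow_untyped_defs = false` for mypy, 88-character lines) might look like the sketch below. The module and the function `clamp_segment` are hypothetical and not taken from the repository; the example is intended to satisfy the listed rules but has not been run through the template's pre-commit hooks.

```python
"""Utilities for clamping video segment boundaries (illustrative module)."""


def clamp_segment(start: float, end: float, duration: float) -> tuple[float, float]:
    """Clamp a segment to the bounds of a video.

    Parameters
    ----------
    start : float
        Requested start time in seconds.
    end : float
        Requested end time in seconds.
    duration : float
        Total video duration in seconds.

    Returns
    -------
    tuple[float, float]
        The start and end times, limited to the range ``[0, duration]``.

    Raises
    ------
    ValueError
        If ``duration`` is not positive.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    clamped_start = min(max(start, 0.0), duration)
    clamped_end = min(max(end, clamped_start), duration)
    return clamped_start, clamped_end
```

Every parameter and the return value carry type annotations, and the docstring uses the NumPy section headers (`Parameters`, `Returns`, `Raises`) that pydocstyle checks under this convention.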

AnanyaRaval · 2026-01-21T17:59:58Z

sonic-o1/03_demographics_annotation/model.py

+            import subprocess
+            import tempfile
+            import shutil


Move to top of file

AnanyaRaval · 2026-01-21T18:02:15Z

sonic-o1/03_demographics_annotation/model.py

+
+    def _build_prompt(self, metadata: Dict[str, Any], transcript_text: str) -> str:
+        """Build the analysis prompt with transcript embedded"""
+        from prompts import MAIN_PROMPT_TEMPLATE 


Move to top of file.

AnanyaRaval · 2026-01-21T18:22:05Z

sonic-o1/03_demographics_annotation/README.md

@@ -0,0 +1,365 @@
+# Demographics Annotation with Gemini
+
+This directory handles automatic demographics annotation for videos using Google's Gemini multimodal model. It analyzes videos, audio, and captions to extract demographic information (race, gender, age, language) of people appearing in the videos.


Include an overview section here. Optional, but might be helpful to reiterate the list of topics here and link the previous README.md.

AnanyaRaval · 2026-01-21T18:32:16Z

sonic-o1/03_demographics_annotation/model.py

+                content = f.read()
+
+            # Simple SRT parser
+            import re


Move to top of file.
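
As an illustration of the requested layout, a module with its imports at the top and a small SRT parser could look like this. It is a sketch only: `parse_srt` and `TIMING_RE` are hypothetical names, and the parsing rules are an assumption about what the "simple SRT parser" in model.py does.

```python
import re

# Matches timing lines such as "00:00:01,000 --> 00:00:02,500".
TIMING_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
)


def parse_srt(content: str) -> list[dict[str, str]]:
    """Parse SRT text into a list of entries with start, end, and text."""
    entries = []
    # Cue blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING_RE.search(line)
            if match:
                # Everything after the timing line is the caption text.
                text = " ".join(part.strip() for part in lines[index + 1:])
                entries.append(
                    {"start": match.group(1), "end": match.group(2), "text": text}
                )
                break
    return entries
```

With `re` imported once at module level, the function body contains no import statements, which is what ruff's import checks and PEP 8 expect.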

AnanyaRaval · 2026-01-21T18:36:24Z

sonic-o1/04_vqa_generation/config/vqa_config.yaml

@@ -0,0 +1,155 @@
+# VQA Generation Configuration


Move this file up to the 04_vqa_generation folder? The config folder contains only this one file.

AnanyaRaval · 2026-01-21T21:40:30Z

sonic-o1/03_demographics_annotation/README.md

+### Check Annotation Quality
+
+```bash
+# Count videos with demographics
+jq '[.[] | select(.demographics_detailed != null)] | length' \
+  dataset/videos/01_Patient-Doctor_Consultations/metadata_enhanced.json
+
+# View specific annotation
+jq '.[] | select(.video_number == "001")' \
+  dataset/videos/01_Patient-Doctor_Consultations/metadata_enhanced.json
+```


Add details on how quality is checked with these commands.
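
For context, the two jq commands only count entries whose `demographics_detailed` field is non-null and print the entry for one video number. A rough Python equivalent is sketched below, assuming (as the `.[]` filters imply) that metadata_enhanced.json is a JSON array of objects; the function names are hypothetical.

```python
import json


def load_entries(path: str) -> list[dict]:
    """Load the metadata file, assumed to be a JSON array of objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_annotated(entries: list[dict]) -> int:
    """Count entries with a non-null demographics_detailed field."""
    # A missing key counts as null, matching jq's `!= null` test.
    return sum(
        1 for entry in entries if entry.get("demographics_detailed") is not None
    )


def find_video(entries: list[dict], video_number: str) -> list[dict]:
    """Return all entries whose video_number equals the given string."""
    return [entry for entry in entries if entry.get("video_number") == video_number]
```

Comparing `count_annotated(entries)` with `len(entries)` gives annotation coverage; neither command checks whether the annotated values are correct, which is the detail the README would need to spell out.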

AnanyaRaval · 2026-01-21T21:41:13Z

sonic-o1/03_demographics_annotation/README.md

+- The script processes videos in order by video number
+- Already processed videos are skipped automatically


Add to script features in Overview.
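
The two behaviours quoted above (ordered processing and skipping completed videos) can be sketched as follows. This is an assumption-laden illustration, not the code in run_annotation.py: the helper names and the checkpoint layout (a JSON object keyed by video number) are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def pending_videos(entries: list[dict], processed: set[str]) -> list[dict]:
    """Return unprocessed entries, sorted by numeric video number."""
    todo = [entry for entry in entries if entry["video_number"] not in processed]
    return sorted(todo, key=lambda entry: int(entry["video_number"]))


def load_checkpoint(path: Path) -> dict[str, dict]:
    """Load previous results, or start empty if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def annotate_all(
    entries: list[dict],
    annotate: Callable[[dict], dict],
    checkpoint_path: Path,
) -> dict[str, dict]:
    """Annotate pending videos, saving the checkpoint after each one."""
    done = load_checkpoint(checkpoint_path)
    for entry in pending_videos(entries, set(done)):
        done[entry["video_number"]] = annotate(entry)
        # Writing after every video lets an interrupted run resume here.
        checkpoint_path.write_text(json.dumps(done), encoding="utf-8")
    return done
```

Re-running `annotate_all` with the same checkpoint path only calls `annotate` for videos that are not yet in the checkpoint.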

AnanyaRaval · 2026-01-21T21:42:43Z

sonic-o1/03_demographics_annotation/README.md

+    └── metadata_enhanced_checkpoint.json       # Checkpoint for resume
+```
+
+## Checkpoint and Resume


Review comment about including a checkpoint flag in the comment.

AnanyaRaval · 2026-01-21T22:00:22Z

sonic-o1/03_demographics_annotation/run_annotation.py

+            return True
+
+        # Check if it has the expected fields with actual values
+        required_fields = ['race', 'gender', 'age', 'language']


Can these be taken from YAML instead of hardcoding?
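
One way to do this is to read the list from the parsed config and fall back to the current values. The sketch below works on an already-parsed mapping (for example the result of `yaml.safe_load` on config.yaml); the `required_fields` key is a hypothetical name, not something the existing config is known to contain.

```python
from collections.abc import Iterable

# Fallback mirrors the list currently hardcoded in run_annotation.py.
DEFAULT_REQUIRED_FIELDS = ("race", "gender", "age", "language")


def get_required_fields(config: dict) -> list[str]:
    """Read required fields from the config, falling back to the defaults."""
    fields = config.get("required_fields")  # hypothetical config key
    if not fields:
        return list(DEFAULT_REQUIRED_FIELDS)
    return [str(field) for field in fields]


def has_required_fields(demographics: dict, required_fields: Iterable[str]) -> bool:
    """Check that every required field is present with a non-empty value."""
    empty_values = (None, "", [], {})
    return all(
        demographics.get(field) not in empty_values for field in required_fields
    )
```

Keeping the defaults in one constant means existing configs without the new key continue to behave as before.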

AnanyaRaval · 2026-01-21T22:05:16Z

sonic-o1/04_vqa_generation/README.md

+- `google-generativeai`
+- `openai`
+- `python-dotenv`
+- `pyyaml`
+- `tqdm`


Unnecessary. Delete.

AhmedRadwan02 added 14 commits January 13, 2026 15:12

Initial commit (clean repo)

33f3e08

Ignore optionalFiles and uv.lock

a5fb627

Remove optionalFiles and uv.lock from branch

d0fb179

Refactor: Reorganize project structure with numbered directories

6a12e45

Add vqa to gitignore

011311f

Remove backup files and demographics metrics from tracking

6e2901c

Remove visualitzation and all backup txt files from tracking

0428194

Remove backup_vllm_qwen3.txt from tracking

60df5a3

updated path

2932b57

last Readmes

9d32d16

Bridge: connect sonic-o1 history to main

9df7e89

-

fb0123d

-

7ca166b

Stop tracking evaluation scores output

4fb11d1

AhmedRadwan02 assigned AnanyaRaval, chuiyangmeng96, AhmedRadwan02 and shainarazavi Jan 15, 2026

AhmedRadwan02 and others added 4 commits January 14, 2026 21:36

fixing relative

a11e0c7

fixing default value

641315d

Delete sonic-o1/04_vqa_generation/check_empty_demographics.py

e8716a0

Delete sonic-o1/04_vqa_generation/check_failed_summary.py

dd31b9d

shainarazavi requested a review from Copilot January 16, 2026 12:52

Copilot AI reviewed Jan 16, 2026


AhmedRadwan02 added 2 commits January 19, 2026 21:50

"small fixes"

8dc487f

Merge remote deletions

42f2167

AnanyaRaval suggested changes Jan 20, 2026


AnanyaRaval suggested changes Jan 22, 2026

4 participants
