
SONIC-O1 Release#2

Open
AhmedRadwan02 wants to merge 20 commits into main from sonic-o1

Conversation

@AhmedRadwan02
Collaborator

SONIC PR Review Template (Leakage + Paths)

Summary

This PR introduces the initial SONIC-o1 release, adding the full end-to-end codebase and supporting documentation required to run the pipeline. It establishes the project structure and core modules for data curation, generation, and evaluation workflows, with a focus on reproducibility and avoiding leakage (no secrets, no sensitive paths, and large artifacts excluded).

Scope

  • Data curation (sonic-o1/01_data_curation)
  • Caption generation (sonic-o1/02_caption_generation)
  • Demographics annotation (sonic-o1/03_demographics_annotation)
  • VQA generation (sonic-o1/04_vqa_generation)
  • Evaluation / inference (sonic-o1/05_evaluation_inference)

What reviewers should focus on

  1. Leakage / Secrets / Privacy
  2. Paths / Reproducibility
  3. Large files / datasets not committed

1) Leakage / Secrets / Privacy Checklist (Reviewer must verify)

  • No API keys, tokens, credentials, or private URLs committed
  • No .env file committed (only .env.example if needed)
  • No absolute user paths (e.g., /projects/..., /home/...) in code/docs unless clearly marked as examples
  • No personal identifiers or private dataset content accidentally committed
  • Logs / outputs don’t print secrets (e.g., env vars)

Sensitive patterns to search for (optional):

  • AKIA, AIza, sk-, hf_, ssh-rsa, BEGIN PRIVATE KEY
  • token=, api_key, secret, .pem, .key, .p12
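A rough local search for these patterns can be scripted. The sketch below is an illustrative assumption, not part of the PR: the regexes only approximate the listed prefixes (real key formats vary), the `keyword` pattern will also match harmless uses of words like `secret`, and the scanner will flag its own source file. It reports the pattern name rather than the matched text, so running it does not itself print a secret.

```python
import re
from pathlib import Path

# Rough approximations of the patterns in the checklist above.
# They are intentionally loose, so expect false positives.
SENSITIVE_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{20,}"),
    "sk_prefixed_key": re.compile(r"\bsk-[0-9A-Za-z_\-]{16,}"),
    "hf_token": re.compile(r"\bhf_[0-9A-Za-z]{16,}"),
    "ssh_rsa_key": re.compile(r"ssh-rsa\s"),
    "private_key_block": re.compile(r"BEGIN [A-Z ]*PRIVATE KEY"),
    "keyword": re.compile(r"token=|api_key|secret", re.IGNORECASE),
}
# Extensions that usually hold keys or certificates.
SENSITIVE_SUFFIXES = {".pem", ".key", ".p12"}
SKIP_DIRS = {".git", ".venv", "__pycache__", "node_modules"}


def scan_for_secrets(root: str) -> list[tuple[str, int, str]]:
    """Return (relative_path, line_number, pattern_name) for each hit.

    Only the pattern name is reported, never the matched text, so the
    scan itself does not echo secrets into logs.
    """
    root_path = Path(root)
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if not path.is_file():
            continue
        if path.suffix in SENSITIVE_SUFFIXES:
            hits.append((rel.as_posix(), 0, "sensitive_file_type"))
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in SENSITIVE_PATTERNS.items():
                if pattern.search(line):
                    hits.append((rel.as_posix(), lineno, name))
    return hits


if __name__ == "__main__":
    for file_path, lineno, name in scan_for_secrets("."):
        print(f"{file_path}:{lineno}: {name}")
```

Run it from the repository root. It scans the working tree only, so it does not replace checking the branch's git history.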

2) Large Files / Dataset Hygiene Checklist

  • No large datasets committed (e.g., dataset/, vqa/, raw media)
  • .gitignore excludes generated artifacts + external repos as intended
  • If sample data is included, it is tiny + clearly marked as sample
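A quick way to spot oversized or media files before committing is sketched below. The 5 MiB threshold, the extension list, and the function name are hypothetical choices, not values taken from the PR; adjust them to the repository's own policy.

```python
from pathlib import Path

# Hypothetical threshold; tune to the repository's policy.
DEFAULT_MAX_BYTES = 5 * 1024 * 1024  # 5 MiB
# Media extensions that usually should not be committed at all.
MEDIA_SUFFIXES = {".mp4", ".mkv", ".webm", ".wav", ".mp3", ".m4a"}
SKIP_DIRS = {".git", ".venv", "__pycache__"}


def find_large_or_media_files(
    root: str, max_bytes: int = DEFAULT_MAX_BYTES
) -> list[tuple[str, int]]:
    """Return sorted (relative_path, size_in_bytes) for suspect files."""
    root_path = Path(root)
    flagged: list[tuple[str, int]] = []
    for path in root_path.rglob("*"):
        rel = path.relative_to(root_path)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if not path.is_file():
            continue
        size = path.stat().st_size
        # Flag anything over the size limit, plus raw media of any size.
        if size > max_bytes or path.suffix.lower() in MEDIA_SUFFIXES:
            flagged.append((rel.as_posix(), size))
    return sorted(flagged)
```

This walks the working tree, so it also reports files that `.gitignore` already excludes; to check only what git tracks, the candidate list could instead come from `git ls-files`.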

Risks / Notes

  • None
  • Yes: ______________________________________

Reviewer Assignments

Requested reviewers:

Checklist (Author)

  • I confirmed no secrets in git history for this branch
  • I confirmed .env and dataset outputs are ignored
  • I updated READMEs with correct working directory + paths
  • I verified docs/paths are relative (no machine-specific absolute paths)
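The last item can be partly automated. A minimal sketch, assuming the machine-specific prefixes of concern are the `/home/...` and `/projects/...` examples named in the leakage checklist; the regex, the extension list, and `find_absolute_paths` are illustrative, and hits inside clearly marked examples still need a human judgment call.

```python
import re
from pathlib import Path

# Machine-specific prefixes named in the checklist; extend as needed.
# The lookbehind skips relative paths such as "data/home/x" or "./home".
ABSOLUTE_PATH_RE = re.compile(r"(?<![\w.])/(?:home|projects)/[^\s'\"`)]+")
TEXT_SUFFIXES = {".md", ".py", ".yaml", ".yml", ".sh", ".txt", ".toml"}


def find_absolute_paths(root: str) -> list[tuple[str, int, str]]:
    """Return (relative_file, line_number, path) for suspicious paths."""
    root_path = Path(root)
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        if ".git" in rel.parts or not path.is_file():
            continue
        if path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in ABSOLUTE_PATH_RE.finditer(line):
                hits.append((rel.as_posix(), lineno, match.group(0)))
    return hits
```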

@shainarazavi requested a review from Copilot on January 16, 2026 at 12:52
Copilot AI left a comment

Pull request overview

This PR introduces the initial SONIC-o1 release, establishing the complete end-to-end pipeline for multimodal video understanding evaluation. The codebase includes data curation, caption generation, demographics annotation, VQA generation, and comprehensive evaluation/inference workflows across 13 diverse topics. The implementation supports multiple state-of-the-art models (Gemini, Qwen3, MiniCPM, Uni-MoE, VITA, VideoLLaMA2, Phi-4, GPT-4o) with proper environment isolation and reproducible configurations.

Changes:

  • Added complete evaluation/inference pipeline with model wrappers, retry logic, and metrics computation (T1-T3 tasks)
  • Implemented VQA generation workflow with video segmentation, prompt engineering, and quality validation
  • Established configuration management system with environment-specific requirements and model-specific settings

Reviewed changes

Copilot reviewed 59 out of 81 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
|---|---|
| `sonic-o1/05_evaluation_inference/models_requirements/*.txt` | Python dependency specifications for model-specific virtual environments |
| `sonic-o1/05_evaluation_inference/models/*.py` | Model wrapper implementations following the BaseModel pattern |
| `sonic-o1/05_evaluation_inference/metrics/*.py` | Metrics computation for T1-T3 tasks with LLM judge integration |
| `sonic-o1/05_evaluation_inference/inference/run_inference.py` | Main inference orchestration with resume capability and error handling |
| `sonic-o1/05_evaluation_inference/configs/models_config.yaml` | Centralized model configuration with retry strategies |
| `sonic-o1/04_vqa_generation/utils/video_segmenter.py` | Video segmentation with FFmpeg and duration validation |
| `sonic-o1/03_demographics_annotation/config.yaml` | Demographics annotation configuration |
| `sonic-o1/02_caption_generation/config_whisper.yaml` | WhisperX caption generation settings |


```python
# Check for OOM
if "out of memory" in error_msg.lower() or "CUDA out of memory" in error_msg:
    logger.error(f"OOM error: {error_msg[:200]}...")
    self._clear_memory
```
Copilot AI Jan 16, 2026

Missing parentheses for the method call `_clear_memory`. This line references the method but does not execute it; it should be `self._clear_memory()`.

Suggested change:

```diff
- self._clear_memory
+ self._clear_memory()
```
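Why the missing parentheses matter: in Python, `self._clear_memory` on its own only evaluates to a bound method object and discards it, so nothing is freed. The toy class below illustrates the difference. `ModelWrapper`, its counter, and the body of `_clear_memory` (`gc.collect()` plus `torch.cuda.empty_cache()` when PyTorch and CUDA are available) are assumptions for illustration, not the PR's implementation.

```python
import gc
import logging

try:
    import torch  # optional: only used to release cached CUDA memory
except ImportError:
    torch = None  # type: ignore[assignment]

logger = logging.getLogger(__name__)


class ModelWrapper:
    """Toy stand-in for a model wrapper (hypothetical, not the PR's class)."""

    def __init__(self) -> None:
        self.clear_count = 0  # how many times memory was actually cleared

    def _clear_memory(self) -> None:
        """Free Python garbage and, if available, cached CUDA memory."""
        gc.collect()
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
        self.clear_count += 1

    def handle_error(self, error_msg: str) -> bool:
        """Return True if ``error_msg`` is an OOM error (and clear memory)."""
        # "CUDA out of memory".lower() already contains "out of memory",
        # so one case-insensitive check covers both spellings.
        if "out of memory" in error_msg.lower():
            logger.error("OOM error: %s...", error_msg[:200])
            self._clear_memory()  # parentheses: the method is called
            return True
        return False


if __name__ == "__main__":
    wrapper = ModelWrapper()
    wrapper._clear_memory  # bare reference: evaluated, never executed
    print(wrapper.clear_count)  # 0
    wrapper.handle_error("CUDA out of memory. Tried to allocate 2.00 GiB")
    print(wrapper.clear_count)  # 1
```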

@AnanyaRaval
Collaborator

AnanyaRaval commented Jan 22, 2026

@AhmedRadwan02 The repo code needs to follow the PEP 8 standard. Must-haves:

  • Correct import format
  • The same docstring format for all functions in the repo
  • Wrapped lines (max is 79 characters)
  • Properly indented function definitions and calls: https://peps.python.org/pep-0008/#indentation
  • The same docstring format for classes

[!NOTE] Creating a repo from aieng-uv-template adds GitHub Actions that check for formatting errors. Please make sure to create the repository from the template next time. cc: @shainarazavi

In your local repo:

  1. Copy the `.github` folder from `aieng-uv-template` to `sonic-o1`.
  2. Copy the following config into `pyproject.toml`:

```toml
[tool.mypy]
follow_imports = "normal"
ignore_missing_imports = false
install_types = true
pretty = true
non_interactive = true
allow_untyped_defs = false
no_implicit_optional = true
check_untyped_defs = true
namespace_packages = true
explicit_package_bases = true
warn_unused_configs = true
allow_subclassing_any = false
allow_untyped_calls = false
allow_incomplete_defs = false
allow_untyped_decorators = false
warn_redundant_casts = true
warn_unused_ignores = true
implicit_reexport = false
strict_equality = true
extra_checks = true
mypy_path = "src"

[tool.ruff]
include = ["*.py", "pyproject.toml", "*.ipynb"]
exclude = []
line-length = 88

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
docstring-code-format = true

[tool.ruff.lint]
select = [
    "A", # flake8-builtins
    "B", # flake8-bugbear
    "COM", # flake8-commas
    "C4", # flake8-comprehensions
    "RET", # flake8-return
    "SIM", # flake8-simplify
    "ICN", # flake8-import-conventions
    "Q", # flake8-quotes
    "RSE", # flake8-raise
    "D", # pydocstyle
    "E", # pycodestyle
    "F", # pyflakes
    "I", # isort
    "W", # pycodestyle
    "N", # pep8-naming
    "ERA", # eradicate
    "PL", # pylint
]

fixable = ["A", "B", "COM", "C4", "RET", "SIM", "ICN", "Q", "RSE", "D", "E", "F", "I", "W", "N", "ERA", "PL"]
ignore = [
    "B905", # `zip()` without an explicit `strict=` parameter
    "E501", # line too long
    "D203", # 1 blank line required before class docstring
    "D213", # Multi-line docstring summary should start at the second line
    "PLR2004", # Replace magic number with named constant
    "PLR0913", # Too many arguments
    "COM812", # Missing trailing comma
    "ERA001", # Found commented-out code (too many false positives with math comments)
    "A001", # Ignore variable `input` is shadowing a Python builtin (common for torch)
    "A002", # Ignore variable `input` is shadowing a Python builtin in function (common for torch)
    "D301", # r-strings for docstrings with backslashes
]

# Ignore import violations in all `__init__.py` files.
[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["E402", "F401", "F403", "F811"]

[tool.ruff.lint.pep8-naming]
ignore-names = ["X*", "setUp"]

[tool.ruff.lint.isort]
lines-after-imports = 2

[tool.ruff.lint.pydocstyle]
convention = "numpy"

[tool.ruff.lint.pycodestyle]
max-doc-length = 88
```
  3. Install mypy and pre-commit:

```bash
uv tool install mypy
uv tool install pre-commit
```

  4. Run pre-commit to see the formatting errors, and fix as many as you can:

```bash
uv run pre-commit run --all-files
```

No need to commit these changes. Just fix the errors you see from pre-commit.
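For reference, a function written to the conventions requested above might look like the following: type hints on every definition (as `allow_untyped_defs = false` requires), short lines, and a NumPy-style docstring matching `convention = "numpy"`. `format_video_number` is a hypothetical helper chosen for illustration, not a function from the PR.

```python
"""Utilities for formatting video identifiers."""


def format_video_number(index: int, width: int = 3) -> str:
    """Format a 1-based video index as a zero-padded string.

    Parameters
    ----------
    index : int
        One-based position of the video within its topic folder.
    width : int, optional
        Minimum number of digits in the result, by default 3.

    Returns
    -------
    str
        The zero-padded video number, e.g. ``"001"``.

    Raises
    ------
    ValueError
        If ``index`` is smaller than 1.

    Examples
    --------
    >>> format_video_number(7)
    '007'
    """
    if index < 1:
        raise ValueError(f"index must be >= 1, got {index}")
    return str(index).zfill(width)
```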

Comment on lines +135 to +137
```python
import subprocess
import tempfile
import shutil
```
Collaborator

Move to top of file


```python
def _build_prompt(self, metadata: Dict[str, Any], transcript_text: str) -> str:
    """Build the analysis prompt with transcript embedded"""
    from prompts import MAIN_PROMPT_TEMPLATE
```
Collaborator

Moved to top of file.

@@ -0,0 +1,365 @@
# Demographics Annotation with Gemini

This directory handles automatic demographics annotation for videos using Google's Gemini multimodal model. It analyzes videos, audio, and captions to extract demographic information (race, gender, age, language) of people appearing in the videos.
Collaborator

Include an overview section here. It is optional, but it might help to reiterate the list of topics here and link to the previous README.md.

```python
content = f.read()

# Simple SRT parser
import re
```
Collaborator

Move to top of file.
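With the import hoisted, a small SRT parser might look like this. It is a sketch, not the PR's parser: it assumes well-formed cues (an index line, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then text lines ended by a blank line), and the `Cue` dataclass and `parse_srt` name are hypothetical.

```python
"""Minimal SRT parser with its ``re`` import at module level."""

import re
from dataclasses import dataclass


# One cue: an index line, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" line, then
# text lines until a blank line or the end of the file.
_CUE_RE = re.compile(
    r"(\d+)[ \t]*\n"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> "
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})[ \t]*\n"
    r"(.*?)(?:\n[ \t]*\n|\Z)",
    re.DOTALL,
)


@dataclass
class Cue:
    """A single subtitle cue with times in seconds."""

    index: int
    start: float
    end: float
    text: str


def _to_seconds(hours: str, minutes: str, secs: str, millis: str) -> float:
    total = int(hours) * 3600 + int(minutes) * 60 + int(secs)
    return total + int(millis) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text (e.g. the result of ``f.read()``) into cues."""
    # Normalize Windows line endings and a leading UTF-8 BOM.
    content = content.replace("\r\n", "\n").lstrip("\ufeff").strip()
    cues = []
    for match in _CUE_RE.finditer(content):
        groups = match.groups()
        cues.append(
            Cue(
                index=int(groups[0]),
                start=_to_seconds(*groups[1:5]),
                end=_to_seconds(*groups[5:9]),
                text=groups[9].strip(),
            )
        )
    return cues
```

The string returned by `f.read()` can be passed straight to `parse_srt`, so no import is needed inside the reading function.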

@@ -0,0 +1,155 @@
# VQA Generation Configuration
Collaborator

Move this to the 04_vqa_generation folder? There is only one file in the config folder.

Comment on lines +337 to +347
### Check Annotation Quality

```bash
# Count videos with demographics
jq '[.[] | select(.demographics_detailed != null)] | length' \
dataset/videos/01_Patient-Doctor_Consultations/metadata_enhanced.json

# View specific annotation
jq '.[] | select(.video_number == "001")' \
dataset/videos/01_Patient-Doctor_Consultations/metadata_enhanced.json
```
Collaborator

Add details on how quality is checked with these commands.
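One way to explain these checks, sketched in Python: the first `jq` command counts records whose `demographics_detailed` is not null (compared with the total, that gives coverage and a list of unannotated videos), and the second pulls a single record for manual spot-checking. The sketch assumes, as the `jq` filters imply, that `metadata_enhanced.json` is a JSON array of objects with `video_number` and `demographics_detailed` keys; the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def load_records(metadata_path: str) -> List[Dict[str, Any]]:
    """Load a metadata_enhanced.json file (assumed to be a JSON array)."""
    return json.loads(Path(metadata_path).read_text(encoding="utf-8"))


def annotation_coverage(metadata_path: str) -> Dict[str, Any]:
    """Summarize how many videos have a non-null demographics_detailed."""
    records = load_records(metadata_path)
    # Same filter as the first jq command: a missing key counts as null.
    missing = [
        r.get("video_number", "?")
        for r in records
        if r.get("demographics_detailed") is None
    ]
    total = len(records)
    annotated = total - len(missing)
    return {
        "total": total,
        "annotated": annotated,
        "coverage": annotated / total if total else 0.0,
        "missing_video_numbers": sorted(missing),
    }


def get_video(
    metadata_path: str, video_number: str
) -> Optional[Dict[str, Any]]:
    """Return the first record with a matching video_number, if any."""
    for record in load_records(metadata_path):
        if record.get("video_number") == video_number:
            return record
    return None
```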

Comment on lines +361 to +362
- The script processes videos in order by video number
- Already processed videos are skipped automatically
Collaborator

Add these to the script features listed in the Overview.

└── metadata_enhanced_checkpoint.json # Checkpoint for resume
```

## Checkpoint and Resume
Collaborator

Refer to the review comment about including a checkpoint flag.
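A checkpoint flag and the resume behavior described in the notes (videos processed in order of video number, already processed ones skipped) could be sketched as follows. The `--checkpoint` and `--no-resume` flags, the checkpoint layout (results keyed by video number), and all function names are assumptions, not the script's actual interface; sorting as strings also assumes zero-padded numbers such as `"001"`.

```python
import argparse
import json
import os
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the (hypothetical) checkpoint-related CLI flags."""
    parser = argparse.ArgumentParser(description="Annotation run (sketch)")
    parser.add_argument(
        "--checkpoint",
        default="metadata_enhanced_checkpoint.json",
        help="Checkpoint file used to resume an interrupted run.",
    )
    parser.add_argument(
        "--no-resume",
        action="store_true",
        help="Ignore any existing checkpoint and start from scratch.",
    )
    return parser.parse_args(argv)


def load_checkpoint(path: Path) -> Dict[str, Any]:
    """Return saved results keyed by video number, or {} if none exist."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_checkpoint(path: Path, results: Dict[str, Any]) -> None:
    """Write to a temp file, then rename it over the checkpoint."""
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    os.replace(tmp_path, path)


def process_videos(
    video_numbers: Iterable[str],
    annotate: Callable[[str], Any],
    checkpoint_path: Path,
    resume: bool = True,
) -> Dict[str, Any]:
    """Annotate videos in order, skipping ones already in the checkpoint."""
    results = load_checkpoint(checkpoint_path) if resume else {}
    for number in sorted(video_numbers):
        if number in results:
            continue  # processed in an earlier run
        results[number] = annotate(number)
        save_checkpoint(checkpoint_path, results)  # persist after each video
    return results
```

A caller would then run `process_videos(numbers, annotate_fn, Path(args.checkpoint), resume=not args.no_resume)`. Writing to a temporary file and then calling `os.replace` means an interrupted write leaves the previous checkpoint intact.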

```python
return True

# Check if it has the expected fields with actual values
required_fields = ['race', 'gender', 'age', 'language']
```
Collaborator

Can these be taken from YAML instead of hardcoding?
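A minimal sketch of reading the list from the YAML config instead. It assumes the config has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and uses a hypothetical `demographics.required_fields` key; the fallback tuple only mirrors the hardcoded list above.

```python
from typing import Any, Mapping, Sequence

# Fallback used only when the config does not define the list
# (hypothetical default mirroring the currently hardcoded fields).
DEFAULT_REQUIRED_FIELDS = ("race", "gender", "age", "language")


def get_required_fields(config: Mapping[str, Any]) -> tuple[str, ...]:
    """Read required demographic fields from a parsed YAML config."""
    # "or {}" also handles an empty "demographics:" entry parsed as None.
    section = config.get("demographics") or {}
    fields = section.get("required_fields")
    return tuple(fields) if fields else DEFAULT_REQUIRED_FIELDS


def has_required_fields(
    entry: Mapping[str, Any], required: Sequence[str]
) -> bool:
    """Return True if every required field has a non-empty value."""
    return all(entry.get(field) not in (None, "", []) for field in required)
```

Under the hypothetical key, the YAML entry would be something like `demographics: {required_fields: [race, gender, age, language]}`.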

Comment on lines +43 to +47
- `google-generativeai`
- `openai`
- `python-dotenv`
- `pyyaml`
- `tqdm`
Collaborator

Unnecessary. Delete.


Labels

None yet

Projects

None yet

4 participants