Skip to content

Sonic o1 legacy#5

Open
AhmedRadwan02 wants to merge 31 commits intomainfrom
sonic-o1-legacy
Open

Sonic o1 legacy#5
AhmedRadwan02 wants to merge 31 commits intomainfrom
sonic-o1-legacy

Conversation

@AhmedRadwan02
Copy link
Copy Markdown
Collaborator

SONIC PR Review Template

Summary

This PR addresses previous review comments from the initial SONIC-o1 release (PRs 01-03) by implementing comprehensive code cleanup, adding detailed docstrings, and improving code quality across the pipeline. The focus of this PR is on the VQA generation (04_vqa_generation) and Evaluation/Inference (05_evaluation_inference) modules, with particular attention to code organization, documentation, and maintainability.

Scope

  • ✅ Data curation (sonic-o1/01_data_curation) - previous PR comments addressed
  • ✅ Caption generation (sonic-o1/02_caption_generation) - previous PR comments addressed
  • ✅ Demographics annotation (sonic-o1/03_demographics_annotation) - previous PR comments addressed
  • 🔍 VQA generation (sonic-o1/04_vqa_generation) - primary focus of this PR
  • 🔍 Evaluation / inference (sonic-o1/05_evaluation_inference) - primary focus of this PR

What reviewers should focus on

  1. Code quality improvements in 04_ and 05_ modules
  2. Docstring completeness and clarity
  3. Code organization and refactoring
  4. Leakage / Secrets / Privacy (standard checks)
  5. Paths / Reproducibility (standard checks)

Changes Implemented

Previous PR Comments Addressed (01-03)

  • Cleaned up code structure and removed redundancies
  • Added comprehensive docstrings to all functions and classes
  • Improved error handling and logging
  • Standardized coding conventions across modules

Primary Focus: 04_vqa_generation

  • Added detailed docstrings for all VQA generation functions
  • Improved code readability and organization
  • Enhanced error handling
  • Standardized parameter naming and documentation

Primary Focus: 05_evaluation_inference

  • Comprehensive docstring coverage for evaluation pipeline
  • Code refactoring for better maintainability
  • Improved logging and debugging capabilities
  • Clearer separation of concerns

1) Leakage / Secrets / Privacy Checklist (Reviewer must verify)

  • ✅ No API keys, tokens, credentials, or private URLs committed
  • ✅ No .env file committed (only .env.example if needed)
  • ✅ No absolute user paths (e.g., /projects/..., /home/...) in code/docs unless clearly marked as examples
  • ✅ No personal identifiers or private dataset content accidentally committed
  • ✅ Logs / outputs don't print secrets (e.g., env vars)

2) Large Files / Dataset Hygiene Checklist

  • ✅ No large datasets committed (e.g., dataset/, vqa/, raw media)
  • ✅ .gitignore excludes generated artifacts + external repos as intended
  • ✅ If sample data is included, it is tiny + clearly marked as sample

Risks / Notes

  • ☐ None
  • ☐ Yes: ______________________________________

Reviewer Assignments

Requested reviewers:

Checklist (Author)

  • ✅ I confirmed no secrets in git history for this branch
  • ✅ I confirmed .env and dataset outputs are ignored
  • ✅ I updated READMEs with correct working directory + paths
  • ✅ I verified docs/paths are relative (no machine-specific absolute paths)
  • ✅ I added comprehensive docstrings to all functions in 04_ and 05_ modules
  • ✅ I performed code cleanup and refactoring for improved maintainability
  • ✅ I addressed all comments from previous PRs (01-03)

return prompt


def build_batch_validation_system_prompt() -> str:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can make this a variable instead of function.

class Config:
"""Configuration wrapper"""
"""Configuration wrapper with nested attribute access."""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move Config and load_config to utils/config_utils. I see them being used in 4 files.

summarizer = SummarizationModel(config)
# Check for existing Task 1 output
task1_output_file = output_dir / 'task1_summarization' / f"{topic_id:02d}_{topic_name.replace(' ', '_')}.json"
task1_output_file = (
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can modularize this:

def get_task_files(task_filter):
    : 
     return model, existing_tasks

config: Config,
output_dir: Path,
task_filter: str = None) -> tuple:
def process_topic(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic to go through tasks is a bit convoluted. Can simplify this to process topic for 1 task and call it from main.


# Check summary_short for failures (more specific patterns)
summary_short = entry.get('summary_short', [])
summary_short = entry.get("summary_short", [])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can modularize code into following functions per task: get_summary_detailed ,
get_summary_failed,
skip_task() function returning a bool
get_confidence

and then call task specific functions to process videos.

This way seems more modular.

Comment on lines +386 to +403
def _format_option_with_letter(self, option: str, letter: str) -> str:
"""Format option with letter prefix, removing any existing prefix."""
cleaned = option.strip()
# Remove any existing letter prefix
for existing_letter in self.option_letters:
if cleaned.startswith(f"({existing_letter})"):
cleaned = cleaned[3:].strip()
break
return f"({letter}) {cleaned}"

def _format_options_with_letters(self, options: List[str]) -> List[str]:
"""Format all options with correct letter prefixes."""
formatted = []
for i, option in enumerate(options):
letter = self.option_letters[i]
formatted.append(self._format_option_with_letter(option, letter))
return formatted

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like unnecessary processing with nested for loops. Can just replace the returned options with A, B if needed (If I understand it correctly).

"glossary": merged_summary.get("glossary", []),
"demographics": demographics.get("demographics", []),
"confidence": merged_summary.get("confidence", 0.0),
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be another modular function like _create_entry().

return valid_questions, stats

def _validate_single_question(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can divide this function into multiple ones - check_absolute, convert_abs_to_relative etc.

"correction_reason", "timestamps adjusted"
)
question["rationale_model"] += f" [Judge corrected: {reason}]"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can initialize message variable within if conditions - message = 'GPT-4v corrected timetsamp or message = GPT-4v validated. Then return this dict.

@AnanyaRaval
Copy link
Copy Markdown
Collaborator

@AhmedRadwan02 Reviewed the 4th folder. I'll need to come back to the 5th folder as the PR is quite large and the review is taking considerable time. For future PRs, it would be helpful to keep them smaller so we can ensure a thorough and efficient review. Thanks!

@AnanyaRaval
Copy link
Copy Markdown
Collaborator

AnanyaRaval commented Mar 9, 2026

@AhmedRadwan02 Reviewed the 4th folder. I'll need to come back to the 5th folder as the PR is quite large and the review is taking considerable time. For future PRs, it would be helpful to keep them smaller so we can ensure a thorough and efficient review. Thanks!

Overall, the PR looks good. I like the use of classes, private functions, and prompt templates. The code appears to have improved from folder 1 -> 5 :) Logging is also fairly comprehensive and useful for debugging later PRs. Areas of improvement for future PRs:

  1. Smaller classes and functions make code more readable. More modularization can also help with reuse of code.
  2. Creating classes for each modality (video, audio, transcripts etc.) for CRUD operations can again help with reuse.
  3. Creating common utilities for files etc. can again help with reuse.
  4. Explore the use of decorators such as retry. It helps reduce code for for loops.
  5. Explore the use of Pydantic library to reduce JSON parsing code.

@AhmedRadwan02
Copy link
Copy Markdown
Collaborator Author

AhmedRadwan02 commented Mar 17, 2026

PR Summary – Folder 04

Modularization

  • main.py — extracted helpers, skip-logic functions, and TASKS registry to replace if/else blocks
  • standardize_demographics.py — split standardize_value into per-field functions with a dispatcher, extracted shared helpers
  • fill_empty_demographics.py — combined loops, extracted segment helpers, used copy.deepcopy() to prevent mutation
  • models/ — extracted helpers across base_gemini, temporal_question_judge, mcq_model, summarization_model
  • prompts/ — converted function to module-level constant

Shared Utilities

  • utils/config_utils.py and utils/file_utils.py consolidated from duplicated code

Dry-Run Mode

  • --dry-run flag added across main.py, fill_empty_demographics.py, and model classes

Bug Fixes

  • Fixed early return in temporal_localization_model.py (only first segment was processed)
  • Fixed non-ASCII causing SyntaxError, missing Optional import, dead code removal

Lint / Mypy / Docs

  • Fixed ruff errors, added __init__.py files, updated README.md

Reviewers

@AnanyaRaval

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants