Add bo767 subset of digital_corpora_10k annotations #1730

Open
KyleZheng1284 wants to merge 1 commit into NVIDIA:main from KyleZheng1284:kyzheng/add-bo767-annotations

@KyleZheng1284
Member

Filter digital_corpora_10k_annotations.csv to include only the 767 PDFs in the bo767 evaluation dataset. The result is 1,005 annotation rows across the 426 PDFs that have matching entries in the original annotations.
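For reviewers, the filtering step described above could look roughly like the sketch below. It is not the script used for this PR: the column name `pdf`, the helper `filter_annotations`, and the sample rows are all hypothetical, and the real CSV header may differ. It also illustrates why the matched PDF count can be lower than the size of the bo767 list, since a PDF with no annotation rows contributes nothing to the output.

```python
import csv
import io
from pathlib import Path


def filter_annotations(rows, keep_pdfs, pdf_column="pdf"):
    """Keep only rows whose PDF identifier is in keep_pdfs.

    `pdf_column` is a hypothetical column name; the real CSV header may differ.
    Names are compared by file stem so "x.pdf" and "x" are treated as equal.
    """
    keep_stems = {Path(name).stem for name in keep_pdfs}
    return [row for row in rows if Path(row[pdf_column]).stem in keep_stems]


# Tiny in-memory stand-in for digital_corpora_10k_annotations.csv (made-up rows).
sample_csv = io.StringIO(
    "pdf,page,query\n"
    "1000001.pdf,1,q1\n"
    "1000001.pdf,2,q2\n"
    "1000002.pdf,1,q3\n"
    "1000003.pdf,4,q4\n"
)
# Stand-in for the bo767 file list; 1000009.pdf has no annotation rows.
bo767_pdfs = ["1000001.pdf", "1000003.pdf", "1000009.pdf"]

rows = list(csv.DictReader(sample_csv))
filtered = filter_annotations(rows, bo767_pdfs)
matched_pdfs = {row["pdf"] for row in filtered}

print(len(filtered), "rows across", len(matched_pdfs), "PDFs")
```

In this toy input, three PDFs are requested but only two have annotation rows, which mirrors the gap between 767 requested and 426 matched PDFs reported in the description.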

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • [ ] If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

Filter digital_corpora_10k_annotations.csv to only include the 767 PDFs
in the bo767 evaluation dataset. Results in 1,005 annotation rows across
426 PDFs that have matching entries in the original annotations.

Made-with: Cursor
@KyleZheng1284 requested review from a team as code owners on March 25, 2026, 21:21
@copy-pr-bot

copy-pr-bot bot commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
