
(retriever) Skip page-elements model download when only pdfium text e…#1726

Open
edknv wants to merge 2 commits into NVIDIA:main from edknv:edwardk/retriever-skip-pe-download

@edknv (Collaborator) commented Mar 25, 2026

…xtraction is needed

Description

Skip downloading the page-elements model (nvidia/nemotron-page-elements-v3) from Hugging Face when it isn't needed. Page-elements detection is required only for table, chart, and infographic extraction, or for text extraction when the method uses OCR (pdfium_hybrid or ocr). This avoids an unnecessary model download and GPU allocation in the common case of pdfium-only text extraction.
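
For reviewers, the condition described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `ExtractionOptions`, its field names, and `needs_page_elements` are hypothetical names chosen for the example. Only the method strings `pdfium_hybrid` and `ocr` come from the description.

```python
from dataclasses import dataclass

# Text-extraction methods that the description says rely on OCR.
OCR_TEXT_METHODS = frozenset({"pdfium_hybrid", "ocr"})


@dataclass(frozen=True)
class ExtractionOptions:
    """Hypothetical options bundle; the field names are assumptions."""
    extract_text: bool = True
    extract_tables: bool = False
    extract_charts: bool = False
    extract_infographics: bool = False
    text_method: str = "pdfium"


def needs_page_elements(opts: ExtractionOptions) -> bool:
    """Return True when page-elements detection is actually required."""
    # Structured content always needs page-element detection.
    if opts.extract_tables or opts.extract_charts or opts.extract_infographics:
        return True
    # Text extraction needs it only when the method involves OCR.
    return opts.extract_text and opts.text_method in OCR_TEXT_METHODS
```

With a predicate like this, the caller would load the model only when it returns True, so a pdfium-only text run never triggers the download or the GPU allocation.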

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

@edknv edknv requested review from a team as code owners March 25, 2026 17:24
@edknv edknv requested a review from nkmcalli March 25, 2026 17:24