(retriever) Skip page-elements model download when only pdfium text extraction is needed by edknv · Pull Request #1726 · NVIDIA/NeMo-Retriever

edknv · 2026-03-25T17:24:11Z

Description

Skip downloading the page-elements model (nvidia/nemotron-page-elements-v3) from HuggingFace when it isn't needed. Page-elements detection is required only for table, chart, and infographic extraction, or for text extraction when the text-extraction method uses OCR (pdfium_hybrid or ocr). Skipping it avoids an unnecessary model download and GPU allocation in the common case of pdfium-only text extraction.
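The gating rule described above reduces to a small predicate that is checked before any model is fetched. This is a minimal sketch, not the code in this PR: ExtractOptions, needs_page_elements, maybe_load_page_elements, and the injected load_model callable are hypothetical names. Only the model ID and the method names pdfium_hybrid and ocr come from this description.

```python
from dataclasses import dataclass

# From the PR description: text-extraction methods that rely on OCR.
OCR_TEXT_METHODS = frozenset({"pdfium_hybrid", "ocr"})
PAGE_ELEMENTS_MODEL = "nvidia/nemotron-page-elements-v3"


@dataclass(frozen=True)
class ExtractOptions:
    """Hypothetical per-job extraction flags (not the project's real API)."""
    extract_text: bool = True
    extract_tables: bool = False
    extract_charts: bool = False
    extract_infographics: bool = False
    text_method: str = "pdfium"


def needs_page_elements(opts: ExtractOptions) -> bool:
    """Return True only when the page-elements detector is actually required."""
    # Tables, charts, and infographics always need page-elements detection.
    if opts.extract_tables or opts.extract_charts or opts.extract_infographics:
        return True
    # Text extraction needs it only when the chosen method uses OCR.
    return opts.extract_text and opts.text_method in OCR_TEXT_METHODS


def maybe_load_page_elements(opts: ExtractOptions, load_model):
    """Call the hypothetical loader only when needed; otherwise skip the download."""
    if not needs_page_elements(opts):
        return None  # pdfium-only text: no download, no GPU allocation
    return load_model(PAGE_ELEMENTS_MODEL)
```

Passing the loader in as a callable keeps the decision logic testable without network access; in practice load_model would wrap whatever download and GPU-placement code the pipeline already uses.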

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

(retriever) Skip page-elements model download when only pdfium text extraction is needed

d86f686

edknv requested review from a team as code owners March 25, 2026 17:24

edknv requested a review from nkmcalli March 25, 2026 17:24

Merge branch 'main' into edwardk/retriever-skip-pe-download

43af658

edknv wants to merge 2 commits into NVIDIA:main from edknv:edwardk/retriever-skip-pe-download

edknv commented Mar 25, 2026

2 participants