Implement code changes to enhance functionality and improve performance #161
+991
−48
This pull request fixes langchain-ai/langchain-community#187. It updates the PDF parsing logic and dependency management and makes minor configuration changes. The most important changes improve error handling in the `extract_images_from_page` method, add new PDF-related dependencies, and simplify configuration settings in `pyproject.toml`.

PDF Parsing Improvements:
`libs/community/langchain_community/document_loaders/parsers/pdf.py`: Enhanced the `extract_images_from_page` method to handle pages where `/Resources` is missing from the page object, and added error handling that logs a warning when image extraction fails.

Dependency Management:
`libs/community/pyproject.toml`: Added new dependencies (`pdfminer-six`, `pdfplumber`, `pymupdf`, `pypdf`, and `unstructured`) under a new `[pdf]` section to support PDF-related functionality.

Configuration Simplification:
`libs/community/pyproject.toml`: Updated `mypy` configuration values to use lowercase `true` for better consistency and readability.
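
For reviewers who want a feel for the `/Resources` guard described under PDF Parsing Improvements, here is a minimal sketch of the idea. It is not the code from this PR: the function name `extract_images_from_page_sketch`, the `decode` callback, and the use of a plain mapping in place of a real PDF page object are all assumptions made so the example runs on its own.

```python
import logging
from typing import Any, Callable, List, Mapping

logger = logging.getLogger(__name__)


def extract_images_from_page_sketch(
    page: Mapping[str, Any],
    decode: Callable[[Mapping[str, Any]], str],
) -> List[str]:
    """Decode the image XObjects of a page-like mapping (hypothetical helper).

    `page` stands in for a PDF page object and `decode` for whatever turns
    an image object into text (for example OCR); both are assumptions.
    """
    # A page without /Resources, or without /XObject, has nothing to extract.
    resources = page.get("/Resources")
    if not resources or "/XObject" not in resources:
        return []

    results: List[str] = []
    for name, obj in resources["/XObject"].items():
        # Skip XObjects that are not images (for example, form XObjects).
        if obj.get("/Subtype") != "/Image":
            continue
        try:
            results.append(decode(obj))
        except Exception as exc:
            # Broad on purpose: one bad image should not abort the whole page.
            logger.warning("Image extraction failed for %s: %s", name, exc)
    return results
```

With this shape, a page that lacks `/Resources` yields an empty list instead of raising, and a single image that fails to decode is logged as a warning and skipped while the remaining images are still processed.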