feat: add --decrypt mode for password-protected PDFs#54
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new --decrypt mode to the passifypdf CLI so the tool can both encrypt plain PDFs (default) and decrypt password-protected PDFs, with shared input validation and expanded test coverage for success/failure scenarios.
Changes:
- Add
decrypt_pdf(...)and a shared_load_pdf_reader(...)validation helper in core. - Extend CLI parsing/help text with
--decryptand route to encrypt vs decrypt inmain(). - Update docs and add unit/integration tests (including CLI subprocess coverage) for decrypt flows.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
passifypdf/encryptpdf.py |
Adds decrypt functionality, shared PDF validation, and dual-mode main() execution. |
passifypdf/cli.py |
Introduces --decrypt flag and updates help text for encrypt/decrypt usage. |
tests/unittests/test_encryptpdf.py |
Adds unit tests for decrypt success and explicit decrypt failure cases; adds PDF validation tests. |
tests/integrationtests/test_encryptpdf_integration.py |
Adds integration + CLI subprocess tests for decrypt success and failure scenarios. |
docs/CLI_OPTIONS.md |
Documents --decrypt option and adds decrypt example. |
README.md |
Adds decrypt sample run to README usage section. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| invalid_pdf_error = ValueError(f"Input file '{input_pdf}' is not a valid PDF file.") | ||
| with input_path.open("rb") as source_file: | ||
| if source_file.read(5) != b"%PDF-": |
There was a problem hiding this comment.
The PDF “magic bytes” validation is too strict: checking only the first 5 bytes for "%PDF-" can reject otherwise valid PDFs where the header appears later in the first 1024 bytes (a case many PDF parsers tolerate). Consider relying on PdfReader(...) for validation, or scanning the first chunk (e.g., 1KB) for the header rather than requiring it at offset 0.
| if source_file.read(5) != b"%PDF-": | |
| header_chunk = source_file.read(1024) | |
| if b"%PDF-" not in header_chunk: |
| with input_path.open("rb") as source_file: | ||
| if source_file.read(5) != b"%PDF-": | ||
| raise invalid_pdf_error | ||
|
|
||
| try: | ||
| reader = PdfReader(input_path) | ||
| except PdfReadError as exc: | ||
| raise invalid_pdf_error from exc |
There was a problem hiding this comment.
_load_pdf_reader opens the input file to read the header, then PdfReader opens/parses it again. This adds redundant I/O for every call. If you keep the header check, consider doing it in a way that avoids double-opening (or just let PdfReader be the single source of truth for validity).
| def _load_pdf_reader(input_pdf: Union[str, Path]) -> Tuple[Path, PdfReader]: | ||
| """Validate that input is an existing PDF file and return its reader.""" | ||
| input_path = Path(input_pdf) |
There was a problem hiding this comment.
_load_pdf_reader is typed/structured to return (Path, PdfReader), but current callers use _, reader = ... and ignore the Path. Consider returning only the PdfReader (or using the path) to keep the helper’s contract minimal.
Summary
--decryptCLI flag to support decrypting password-protected PDFsChanges
--decryptand update help text for dual-mode operationdecrypt_pdf(...)and shared PDF validation for both encrypt/decrypt pathsValidation
uv run pytest tests/unittests/ -v --cov=passifypdf --cov-report=xml --cov-report=termuv run pytest tests/integrationtests/ -vpassifypdf/Sample_PDF.pdf:Closes #28