Welcome to pdf-site-extractor! This tool helps you crawl websites and extract PDF files effortlessly. You can manage your sessions interactively, making your PDF extraction smooth and user-friendly.
To get started, download the software from the project's Releases page.
Before downloading, ensure your computer meets these basic requirements:
- Operating System: Windows, macOS, or Linux
- Python: Version 3.6 or higher installed on your machine
- Internet connection for crawling websites
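To confirm the Python requirement before installing, you can compare the running interpreter against 3.6. This is a minimal sketch and not part of pdf-site-extractor itself; the function name `python_is_supported` is hypothetical. It deliberately avoids f-strings so that an interpreter older than 3.6 can still parse it and report the problem instead of failing with a syntax error.

```python
import sys

# Minimum version taken from the requirements list above.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter meets MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); micro releases don't matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print("Python " + current + " meets the requirement.")
    else:
        print("Python " + current + " is too old; 3.6 or higher is required.")
        sys.exit(1)
```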
Key features:
- Interactive Command Line Interface (CLI): Navigate through options easily.
- Session Management: Organize multiple sessions and resume them whenever you need to.
- Dependency Management: The tool uses uv to manage its Python dependencies, which keeps installation simple.
To install the application:
- Visit the Releases Page: Go to the Releases page to find the latest version.
- Download the Application: Locate the most recent release and select the file for your operating system.
- Run the Installer: Once the download is complete, locate the file on your computer and double-click it to run the installer.
- Follow Installation Prompts: Complete the installation by following the on-screen instructions.
After installation, launch the application from your programs menu or desktop shortcut. Here's a quick guide to using the software:
- Open Command Line: Launch the tool from your command line interface (CLI).
- Start a New Session: Type `start session` to create a new extraction session.
- Enter the Website URL: Input the URL of the website you want to crawl for PDFs.
- Choose Options: Select your extraction preferences from the interactive options, such as what to save and which directory to save it in.
- Start Crawling: Type `extract` to begin crawling the website and extracting PDFs.
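The crawl-and-extract step can be pictured as a breadth-first walk over a site's links. The sketch below illustrates that idea under stated assumptions and is not pdf-site-extractor's actual implementation: it uses only the standard library, stays on the start URL's host, treats any link whose path ends in `.pdf` as a PDF, and stops after a hypothetical `max_pages` limit. The names `find_pdfs`, `LinkCollector`, and `fetch_html` are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page and decode it as UTF-8 (simplified: ignores charset headers)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def find_pdfs(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl of start_url's host; return the PDF URLs found, in discovery order."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pdfs = []
    pages_visited = 0
    while queue and pages_visited < max_pages:
        page = queue.popleft()
        pages_visited += 1
        try:
            html = fetch(page)
        except (OSError, ValueError):
            continue  # skip pages that fail to download
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            url, _fragment = urldefrag(urljoin(page, href))
            if url in seen or urlparse(url).netloc != host:
                continue  # already handled, or on another site
            seen.add(url)
            if urlparse(url).path.lower().endswith(".pdf"):
                pdfs.append(url)  # record PDFs; don't crawl into them
            else:
                queue.append(url)  # an ordinary page: crawl it later
    return pdfs
```

Passing a custom `fetch` function makes the crawler easy to test offline. A production crawler would also honor robots.txt, rate-limit its requests, and check the `Content-Type` header instead of relying on file extensions.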
You can manage your sessions effectively with these commands:
- List Sessions: To view all active sessions, use the command `list sessions`.
- Resume a Session: To return to an existing session, type `resume [session name]`.
- End Session: Use `end session` when you're done with your work.
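One way a tool like this could persist sessions so that they can be listed, resumed, and ended is to store each session's state as a JSON file. The `SessionStore` class below is a hypothetical sketch: the file layout, the field names, and the choice to mark ended sessions inactive rather than delete them are assumptions, not a description of how pdf-site-extractor stores its sessions.

```python
import json
from pathlib import Path


class SessionStore:
    """Hypothetical store that keeps one JSON file per named crawl session."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, name):
        # Assumes session names are safe to use as file names.
        return self.directory / (name + ".json")

    def start(self, name, start_url):
        """Create a new active session and save it to disk."""
        state = {"name": name, "start_url": start_url, "active": True, "visited": [], "pdfs": []}
        self.save(state)
        return state

    def save(self, state):
        self._path(state["name"]).write_text(json.dumps(state, indent=2), encoding="utf-8")

    def resume(self, name):
        """Load a session's saved state so a crawl can pick up where it left off."""
        return json.loads(self._path(name).read_text(encoding="utf-8"))

    def list_sessions(self):
        """Return the names of sessions that have not been ended, sorted alphabetically."""
        names = []
        for path in self.directory.glob("*.json"):
            state = json.loads(path.read_text(encoding="utf-8"))
            if state.get("active", False):
                names.append(state["name"])
        return sorted(names)

    def end(self, name):
        """Mark a session as ended; its data stays on disk."""
        state = self.resume(name)
        state["active"] = False
        self.save(state)
```

Because each session is a plain JSON file, a resumed crawl can reload its `visited` list and skip pages it has already processed.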
Tips:
- Always check the website's terms of service before crawling.
- Organize your extracted PDFs into separate folders to avoid confusion.
- Regularly update the tool to enjoy the latest features and improvements.
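To follow the folder tip automatically, downloads can be grouped by website. The hypothetical helper below maps each PDF URL to `<output_root>/<host>/<file name>`; the `downloads` default and the fallback name `document.pdf` are assumptions made for illustration, not settings of pdf-site-extractor.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def destination_for(pdf_url, output_root="downloads"):
    """Map a PDF URL to <output_root>/<host>/<file name> (hypothetical helper)."""
    parsed = urlparse(pdf_url)
    # One folder per website; ':' (from a port number) is replaced because
    # Windows does not allow it in folder names.
    host = parsed.netloc.replace(":", "_") or "unknown-host"
    # Use the last path segment, decoding escapes such as %20.
    file_name = PurePosixPath(unquote(parsed.path)).name
    if file_name in ("", ".", ".."):
        file_name = "document.pdf"  # fallback for URLs without a usable name
    return Path(output_root) / host / file_name
```

Two PDFs with the same file name on one site would map to the same path, so a fuller version would add a numeric suffix or mirror the URL's directory structure.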
To keep your experience smooth, regularly check for updates on the Releases page. Updated versions may have new features or important fixes.
If you encounter issues or have questions, feel free to reach out through the GitHub repository's Issues section. The community and maintainers are here to help.
This project is open-source and available to use under the MIT License. For more details, please check the license file in the repository.
Feel free to explore and enjoy the power of automated PDF extraction with pdf-site-extractor!