GitHub - Biswajit1693/PythonPDFOperation: Python Pdf Library Operation

Python PDF library Operation

Reading & Extracting Text

Python supports many libraries for extracting information from PDF files.One of the important library is PyPDF2. In order to read a pdf document in python we import the package PyPDF2.

       Syntax:
   
         import PyPDF2
         a=PyPDF2.PdfReader('sample.pdf')  # Reads the pdf into the python environment(Here pdf name is sample)
         print(a.getNumPages())            # Prints the number of pages in pdf
         page=a.pages[0]                   # selects and assigns the page into the variable 'page'
         print(page.extract_text())        # Prints the extracted text from the specified page in line 15

Important Note-

Whenever importing pdf its might show error-"file or directory not found".In that case make sure that the pdf is defined in your python environment. Its directory should be in your python file else it throws error most of the time.
If copying the file path into the python code make sure to use "r" before the file path.It will convert the normal string into raw string

Example- x='hello\nPython' Print(x)

hello Python #here the string will create a new line with backlash

x=r'hello\nPython' print(x)

hello\nPython # here it treats the backlash as normal character so its prints the raw string as it is.
While importing file into python code make sure to check the directory path and if there is any issues like these you can use 'r'.

We can import pdf file from any directory but make sure to open the pdf first then use pdfreader.

    Example-
       import PyPDF2
       pdf_object=open(r"path","rb")  #need to mention "rb" for reader
       z=PyPDF2.PdfReader(pdf_object)

PDF FILE TO AUDIO

Extracting text to speech in a pdf file

    Syntax:
    import PyPDF2                        # pdf importer
    import pyttsx3                       # text-speech library

    reader=PyPDF2.PdfReader("sample.pdf") # reads pdf file
    speak=pyttsx3.init()                  # to get a reference to a pyttsx3 for drivers support 2 voice m/f with help of “sapi5”      
    page=reader.pages[0]
    text=page.extract_text()              # extract text from page
    speak.say(text)                       # passes input taxt to be spoken
    speak.runAndWait()                    # it process the voice command
    speak.stop()                          # it stops the speech at the end of text

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README.md		README.md
pdfaudio.py		pdfaudio.py
pdfopn.py		pdfopn.py
pypopen.py		pypopen.py
reversetexttospeechpdf.py		reversetexttospeechpdf.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

Biswajit1693/PythonPDFOperation

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages