This repository was archived on 16/12/2021 and has not been maintained since May 2020.
A programme written by Sailesh Patel (160034811) to scrape information from course programme specification PDFs, as part of the final-year project (FYP) A Chatbot for Assisting University Admission Process, supervised by Dr Sylvia Wong at Aston University.
- Clone the repository
- Install the required technologies listed above (the links are to their respective installation instructions)
Note: pip is not required, but it makes installing Tabula-Py, BeautifulSoup, and Requests easier.
- Please ensure that all the software requirements have been met before executing the program
- To execute the program, run the command `python3 programme-scraper.py`
- To run the PDF scraper
  - Type `P` and press Enter
  - Type the PDF file in without the `.pdf` extension and press Enter
    - `BScComputerScience` shows the PDF scraper working
    - `BScDigitalDegreeApprenticeship` shows the PDF scraper not working
- To run the web scraper
  - Type `W` and press Enter
  - Type `EAS` for the school and press Enter
  - Type the website you would like to scrape
    - Type `https://www2.aston.ac.uk/study/courses/computer-science-bsc` to show the web scraper working
    - Type `https://www2.aston.ac.uk/study/courses/chemistry-bsc` to show the web scraper fail to format the text inside the Entry Requirements & Fees for 2020
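
Before working through the steps above, it can help to confirm that the software requirements are in place. The following is a minimal sketch and is not part of this repository. It assumes the usual import names for the three packages (`tabula` for Tabula-Py, `bs4` for BeautifulSoup, `requests` for Requests) and that a Java runtime is needed because Tabula-Py wraps tabula-java. The function name `missing_requirements` is hypothetical.

```python
import importlib.util
import shutil

# Assumed mapping of import name -> pip package name (not taken from this repo).
REQUIRED_MODULES = {
    "tabula": "tabula-py",
    "bs4": "beautifulsoup4",
    "requests": "requests",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [
        package
        for module, package in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Try: python3 -m pip install " + " ".join(missing))
    else:
        print("All Python packages found.")
    # Tabula-Py calls tabula-java, so Java should be available on PATH.
    if shutil.which("java") is None:
        print("Java was not found on PATH; the PDF scraper may not run.")
```

If the script reports nothing missing, `python3 programme-scraper.py` should at least be able to import its dependencies. The script does not check package versions.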
All Rights Reserved