Welcome to the technical documentation for the University Course & Fee Scraper application. This documentation is structured to help you understand the architecture, maintain the codebase, and extend its functionality.
Product handover: The completed handover confirmation document is available here: Product_Handover_Confirmation.pdf.
The documentation is split into three main layers:
- Frontend Documentation
  - Learn about the React application, component structure, and API integration.
- Database Documentation
  - Understand the Prisma schema, data models, and relationships.
- Backend Documentation
  - Application & API: Details on the Express.js server, routing, and authentication.
  - Scraper Engine: How UCAS import, Prisma rows, manager.ts, config.ts, and custom adapters work together.
- Node.js 20+
- PostgreSQL
- npm or yarn
- Clone the repository
- Install dependencies for both backend and frontend:

      cd backend && npm install
      cd ../frontend && npm install
- Set Up Environment Variables
  - Create a .env file in the backend directory based on .env.example.
  - At minimum, it should contain:

        DATABASE_URL=postgresql://user:password@localhost:5432/courses_dev
        PORT=5001
- Initialize Database

      cd backend
      npx prisma generate
      npx prisma migrate deploy

  For a clean local reset, use:

      cd backend
      npx prisma migrate reset --force

  This deletes local database data and reapplies all migrations.
- Import UCAS Data

      cd backend
      npx ts-node src/ucas_job.ts

  This imports universities, courses, course URLs, and course options from UCAS. The scraper works from these database rows and fills missing fee values.
- Run the App

  To start backend and frontend together:

      cd backend
      npm run dev

  Or run them separately:

      # Terminal 1
      cd backend
      npm run dev:server

      # Terminal 2
      cd frontend
      npm run dev
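As a rough illustration of the environment variables step above, the sketch below shows one way a backend could check DATABASE_URL and PORT at startup. The `loadConfig` function, the `AppConfig` type, the URL scheme check, and the fallback to port 5001 when PORT is unset are all assumptions made for this example; none of them is taken from the repository.

```typescript
// Hypothetical helper: names and behavior are illustrative, not from the codebase.
interface AppConfig {
  databaseUrl: string;
  port: number;
}

// Accepts any key/value map so it can be exercised without a real environment.
// In a Node.js process the caller would typically pass `process.env`.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const databaseUrl = env["DATABASE_URL"];
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is missing; copy .env.example to .env and set it");
  }
  // Assumption: only PostgreSQL-style connection URLs are expected here.
  if (!/^postgres(ql)?:\/\//.test(databaseUrl)) {
    throw new Error("DATABASE_URL must start with postgresql:// or postgres://");
  }

  // Assumption: fall back to 5001 (the value in the example above) when PORT is unset.
  const rawPort = env["PORT"] ?? "5001";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${rawPort}"`);
  }

  return { databaseUrl, port };
}
```

Failing fast like this surfaces a missing or malformed .env before Prisma or the server tries to connect, which tends to give a clearer error than a connection timeout.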
Scrape missing fees for one university:

    cd backend
    npx ts-node src/scrapers/manager.ts --universityIds="UNIVERSITY_ID"

Scrape one course for one university:

    cd backend
    npx ts-node src/scrapers/manager.ts --universityIds="UNIVERSITY_ID" --q="Course Name"

View the database and find university IDs:

    cd backend
    npx prisma studio

Scraper logs are written to backend/logs/scrape-*.log.
For the full scraper workflow, see Scraper Engine.
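The scrape commands above pass --universityIds and --q to manager.ts. How manager.ts reads them is not shown in this document, so the sketch below is only one plausible way to parse flags of that shape. The `parseScraperArgs` function, the `ScraperArgs` type, and the idea that --universityIds may hold a comma-separated list are assumptions for illustration.

```typescript
// Hypothetical parser for flags shaped like --universityIds=... and --q=...
interface ScraperArgs {
  universityIds: string[];
  query?: string;
}

function parseScraperArgs(argv: string[]): ScraperArgs {
  const args: ScraperArgs = { universityIds: [] };
  for (const arg of argv) {
    // Split "--name=value" at the first "="; later "=" characters stay in the value.
    const match = /^--([^=]+)=(.*)$/.exec(arg);
    if (!match) continue; // skip anything that is not a --name=value flag
    const flagName = match[1] ?? "";
    const value = match[2] ?? "";
    if (flagName === "universityIds") {
      // Assumption: several IDs may be given as a comma-separated list.
      args.universityIds = value
        .split(",")
        .map((id) => id.trim())
        .filter((id) => id.length > 0);
    } else if (flagName === "q") {
      args.query = value;
    }
  }
  return args;
}
```

In a Node.js script the input would typically be `process.argv.slice(2)`. The shell removes the surrounding double quotes before the program sees the value, so `--q="Course Name"` arrives as the single argument `--q=Course Name`.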
This application is designed to scrape course information and tuition fees from various UK university websites. It provides a dashboard to view the scraped data, manage scraping tasks, and export data.
- Automated Scraping: Configurable scrapers for different university website structures.
- Data Standardization: Normalizes diverse fee structures into a common format.
- Dashboard: A user-friendly interface to trigger scrapes and view results.
- Excel Export: Download scraped data for analysis.
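To make the data standardization feature more concrete, here is a minimal sketch of turning differently worded fee strings into one shape. The `NormalizedFee` type, the `normalizeFee` function, and the rules it applies (pound amounts only, a small set of period phrases) are assumptions for illustration; the application's actual common format is not described in this section.

```typescript
// Hypothetical target shape for a scraped fee; not the application's real model.
type FeePeriod = "year" | "course" | "unknown";

interface NormalizedFee {
  amountGbp: number;
  period: FeePeriod;
}

function normalizeFee(raw: string): NormalizedFee | null {
  // Find the first pound amount, e.g. "£9,535" or "£ 9250.00".
  const match = /£\s*(\d[\d,]*(?:\.\d+)?)/.exec(raw);
  if (!match) return null; // no recognizable amount, e.g. "Fees to be confirmed"

  // Drop thousands separators before converting to a number.
  const amountGbp = Number((match[1] ?? "").replace(/,/g, ""));
  if (!Number.isFinite(amountGbp)) return null;

  // Classify the billing period from a few common phrasings (assumed, not exhaustive).
  const text = raw.toLowerCase();
  let period: FeePeriod = "unknown";
  if (/per\s+(academic\s+)?year|per\s+annum|\ba\s+year|annual/.test(text)) {
    period = "year";
  } else if (/full\s+course|whole\s+course|per\s+course|in\s+total/.test(text)) {
    period = "course";
  }

  return { amountGbp, period };
}
```

Returning `null` for unparseable text, instead of guessing, keeps a missing fee distinguishable from a fee of zero, which matters if later runs are meant to fill only the missing values.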