This repository contains materials for "Web Scraping", a workshop taught by the Digital Scholarship Group at Harvard University.
The workshop teaches participants how to automate the extraction of data from websites and other online repositories into a well-formatted, locally stored dataset for later analysis. Web scraping tools make collecting large amounts of online information more efficient by automating an otherwise tedious, time-consuming, and error-prone process.
The workshop includes an introduction to web structures and provides hands-on experience with a series of scraping techniques that range from simple to complex: tools for batch file downloading, a full workflow using only browser extensions, and advanced HTML and DOM parsing techniques using Python.
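
To give a rough sense of what HTML parsing in Python can look like, here is a minimal sketch that uses only the standard library. It is an illustration, not the workshop's own code: the `LinkExtractor` class, the `links_to_csv` helper, and the sample HTML are all hypothetical names and data invented for this example. It pulls every link out of an HTML fragment and writes the results as CSV text, which is one simple form of a "well-formatted dataset".

```python
import csv
import io
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for each <a href="..."> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) rows
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; links without href are skipped
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html):
    """Parse an HTML string and return its links as CSV text with a header row."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["text", "href"])
    writer.writerows(parser.links)
    return out.getvalue()


# Made-up sample page used in place of a real website.
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/2019.pdf">Annual report 2019</a></li>
  <li><a href="/reports/2020.pdf">Annual report 2020</a></li>
</ul>
"""

print(links_to_csv(SAMPLE_HTML))
```

In practice the HTML would come from a downloaded page rather than a string literal, and a dedicated parsing library may be more convenient for complex pages; the standard-library parser is used here only to keep the sketch self-contained.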