harvard-digital-history/Web-Scraping

Web-Scraping

This repository contains materials for "Web Scraping", a workshop taught by the Digital Scholarship Group at Harvard University.

The workshop teaches participants how to automate the extraction of data from websites and other online repositories into a well-formatted, locally stored dataset for later analysis. Web scraping tools make collecting large amounts of online information more efficient and help automate an otherwise tedious, time-consuming, and error-prone process.

The workshop includes an introduction to web structures and provides hands-on experience with scraping techniques that range from simple to complex: tools for batch file downloading, a full workflow that uses only browser extensions, and advanced HTML and DOM parsing in Python.
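As a rough illustration of the HTML parsing end of that range, the sketch below pulls the cells out of an HTML table and serializes them as CSV, using only the Python standard library. It is not taken from the workshop materials: the sample HTML, the `TableParser` class, and the `scrape_table` and `rows_to_csv` helpers are all hypothetical names chosen for this example, and a real workflow would fetch the page from a website rather than use an inline string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page (an assumption for this sketch); a real workflow
# would download this HTML from a website instead.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Moby-Dick</td><td>1851</td></tr>
  <tr><td>Walden</td><td>1854</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being built, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table in `html` as a list of rows of cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, ready to be written to a local file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(scrape_table(SAMPLE_HTML)))
```

Writing the returned CSV text to a file gives the kind of locally stored dataset described above. Third-party parsers are often more convenient for messy real-world pages; the standard library is used here only to keep the sketch self-contained.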
