A custom crawler for old "Handschriften-Faksimile" websites generating json data by reading out all pages

domsteinbach/hssfaks-data-extractor

HSSFAKS DATA EXTRACTOR

Extracts all data from the Handschriften-Faksimile pages as JSON.

It iterates over all pages by calling the base website's nextPage() function and reads out the data displayed on each page. When it finishes, the browser downloads a JSON file containing all extracted data.
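The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in pageParser.js: `crawlPages`, `readCurrentPage`, `goToNextPage`, and the three-page stand-in site are hypothetical names, and the way the real nextPage() signals the last page is an assumption.

```javascript
// Hypothetical sketch of the crawl loop; NOT the actual pageParser.js code.
// readCurrentPage(): returns the data shown on the current page (assumed shape).
// goToNextPage(): advances one page and returns false when there is none left (assumed).
function crawlPages(readCurrentPage, goToNextPage, maxPages = 10000) {
  const records = [];
  // maxPages guards against an endless loop if the last page is never detected.
  for (let i = 0; i < maxPages; i++) {
    records.push(readCurrentPage());
    if (!goToNextPage()) break;
  }
  return records;
}

// Stand-in for a facsimile site with three pages (made-up data).
function makeFakeSite() {
  const pages = [
    { page: "1r", text: "first page" },
    { page: "1v", text: "second page" },
    { page: "2r", text: "third page" },
  ];
  let index = 0;
  return {
    readCurrentPage: () => pages[index],
    goToNextPage: () => {
      if (index >= pages.length - 1) return false;
      index += 1;
      return true;
    },
  };
}

const site = makeFakeSite();
const allRecords = crawlPages(site.readCurrentPage, site.goToNextPage);

// Serialize everything that was collected into one JSON document.
const json = JSON.stringify(allRecords, null, 2);
console.log(json);
```

Passing the page reader and the page turner in as callbacks keeps the loop independent of any particular site layout, which is why the sketch can run against a stand-in instead of a real facsimile page.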

How to run:

  • Import pageParser into the template: <script src="pageParser.js" type="text/javascript"></script>
  • Call readOutManuscript(), e.g. by adding a button labeled "Get data into normal form!" that calls the function.
  • Open the HTML of the facsimile in a browser, start at the very first page of the very first manuscript, and click the button. The browser downloads a JSON file containing all the data needed to build a proper website or database.
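One possible way to wire up the button from the second step is sketched below. `addExtractButton` is a hypothetical helper that is not part of pageParser.js; the sketch only assumes that readOutManuscript() is available as a global function once the script has loaded.

```javascript
// Hypothetical helper (not part of pageParser.js): adds a button to the page
// that starts the extraction when clicked.
// `doc` is the page's document; `onClick` would be readOutManuscript.
function addExtractButton(doc, onClick, label = "Get data into normal form!") {
  const button = doc.createElement("button");
  button.type = "button";
  button.textContent = label;
  button.addEventListener("click", onClick);
  doc.body.appendChild(button);
  return button;
}

// In the browser, once pageParser.js has been loaded (assumed to expose
// readOutManuscript globally). Outside a browser this block is skipped.
if (typeof document !== "undefined" && typeof readOutManuscript === "function") {
  addExtractButton(document, readOutManuscript);
}
```

Placing this in a script tag after the pageParser.js import would add the button at the end of the page body; clicking it on the first page of the first manuscript then starts the run.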
