A basic repo for all of the automated HODP scraping scripts.
Please refer to CONTRIBUTING.md for instructions on how to add your own scraper.
- If you have access, ssh into the instance, and run `sudo su` to log in again as the root user.
- Navigate to the project and run

      source hodp/bin/activate
      ./resolve_reqs
      ./init_crontab
      crontab scrape.tab

- Add logging system
- Add unit tests
- 05/25/19 #2 (kevalii)
  - Reverted the routing referenced in #1.
  - Added `init_crontab.sh`, `write_cron.py`, and `resolve_reqs.py`.
    - `write_cron.py` writes cron jobs to a crontab (`scrape.tab`) using an API provided by the python-crontab package. You can still write cron jobs directly into `scrape.tab`; this just provides a perhaps more organized way of writing them.
      - NOTE: `scrape.tab` isn't provided in the repo. Either `touch` it locally or execute `init_crontab.sh`.
      - NOTE: `init_crontab.sh` executes `write_cron.py`, but it does not set the crontab. Make any changes to `scrape.tab` and then execute `crontab scrape.tab`.
    - `resolve_reqs.py` goes through the `requirements.txt` in each subdirectory of `scrapers/` and installs the dependencies, updating the root directory's `requirements.txt` as well.
  - `scrapers/crime/scrape_crime.py` no longer features the `scrape` function referenced in #1.
- 05/25/19 #1 (kevalii)
  - Set up routing, enabling us to add more scrapers (and schedule them) in a sustainable manner.
  - Added `gocrimson` scraper at `/scrape/gocrimson` and a corresponding cron job.
    - While the scraper has also been added to this repo, it is actually executed by a GCloud function that uses a local copy of the scraper's source code. In the future, we'll have to adjust this so that the function is sourced from this repo instead.
  - Modified `crime` scraper to route to `/scrape/crime`.
    - Wrapped relevant code in a `scrape` function in the renamed src file `scrape_crime.py` so that the scraper is executed by a call to `scrape` instead of just running at the top level.
  - Moved each scraper to its own folder in `scrapers/`, which also contains that scraper's dependencies in a `requirements.txt`.
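
Since each scraper folder carries its own `requirements.txt`, the merging step that `resolve_reqs.py` is described as doing can be sketched roughly as below. This is only an illustration, not the actual script: the function names `collect_requirements` and `update_root_requirements` are hypothetical, and the sketch assumes one `requirements.txt` per immediate subdirectory of `scrapers/`. It only merges the lists and leaves installation out.

```python
from pathlib import Path


def collect_requirements(scrapers_dir):
    """Return the sorted, de-duplicated requirement lines found in the
    requirements.txt of each immediate subdirectory of scrapers_dir.

    Hypothetical helper, not taken from resolve_reqs.py.
    """
    found = set()
    for req_file in sorted(Path(scrapers_dir).glob("*/requirements.txt")):
        for line in req_file.read_text().splitlines():
            line = line.strip()
            # Skip blank lines and comments.
            if line and not line.startswith("#"):
                found.add(line)
    return sorted(found)


def update_root_requirements(scrapers_dir, root_file):
    """Merge the scrapers' requirements into the root requirements file
    and return the merged list.

    Hypothetical helper, not taken from resolve_reqs.py.
    """
    root_path = Path(root_file)
    existing = set()
    if root_path.exists():
        existing = {
            line.strip()
            for line in root_path.read_text().splitlines()
            if line.strip()
        }
    merged = sorted(existing | set(collect_requirements(scrapers_dir)))
    root_path.write_text("\n".join(merged) + "\n")
    return merged
```

Calling `update_root_requirements("scrapers", "requirements.txt")` from the project root would rewrite the root file with the union of all entries; the dependencies could then be installed with `python -m pip install -r requirements.txt`.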