
minn scraper sometimes stuck in captcha challenge #1461

Open
@grossir

Description

See an example of the saved HTML in S3. Should we ask the court to whitelist our IP? If they don't answer, and this is an automated challenge, we should have a subset of scrapers that makes requests every 24 hours instead of every hour.
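A rough sketch of how the "subset of scrapers on a 24-hour schedule" idea could look. Everything here is hypothetical: the `SLOW_SCRAPERS` mapping, the `DEFAULT_INTERVAL` value, and the `is_due` helper are names made up for illustration, and how the last run time is stored is left open. It is not how the scheduler currently works.

```python
from datetime import datetime, timedelta

# Assumption: most scrapers keep the hourly schedule.
DEFAULT_INTERVAL = timedelta(hours=1)

# Hypothetical opt-in list of scrapers that should be polled less often.
SLOW_SCRAPERS = {
    "juriscraper.opinions.united_states.state.minn": timedelta(hours=24),
}


def is_due(module_path, last_run, now):
    """Return True if the scraper should run again at `now`.

    `last_run` is the datetime of the previous run, or None if the
    scraper has never run.
    """
    if last_run is None:
        return True
    interval = SLOW_SCRAPERS.get(module_path, DEFAULT_INTERVAL)
    return now - last_run >= interval
```

The hourly job would call `is_due(...)` for each scraper before requesting anything and skip the ones that are not due, so only the listed courts change behavior.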

Image

This is making our coverage spotty. For example, of the 5 most recent opinions, we only have 2.

Image

I was trying to run `./manage.py cl_back_scrape_citations --courts juriscraper.opinions.united_states.state.minn --backscrape-start=2024/10/01 --backscrape-end=2024/10/20 --verbosity 3` for #858 (comment) and noticed it returned a suspicious 0 results.
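One way to make a suspicious 0 results easier to spot is to tell a challenge page apart from a genuinely empty listing. This is a minimal sketch under stated assumptions: the marker strings are placeholders (the real ones would have to be taken from the HTML saved in S3), and `looks_like_captcha` and `classify_scrape` are hypothetical helpers, not part of juriscraper.

```python
# Assumption: placeholder markers; replace with strings that actually
# appear in the saved challenge page.
CAPTCHA_MARKERS = ("captcha", "challenge")


def looks_like_captcha(html, markers=CAPTCHA_MARKERS):
    """Return True if any marker appears in the HTML (case-insensitive)."""
    lowered = html.lower()
    return any(marker in lowered for marker in markers)


def classify_scrape(html, case_count):
    """Classify a scrape as 'ok', 'empty', or 'blocked'."""
    if case_count > 0:
        return "ok"
    if looks_like_captcha(html):
        return "blocked"
    return "empty"
```

A "blocked" result could then be logged as an error instead of passing silently as an empty scrape.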

Metadata

Assignees

Labels

scraper blocked: Suspicion the court website may be blocking our scraper

Type

No type

Projects

Status

Mid July

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
