okla, oklacivapp, oklacrimapp blocking document download #1460

Open
@grossir

This has been happening since May 29. If we fix it ASAP, we won't need a backscraper to fill any gaps.

The source page loads, but the individual opinion pages don't. Juriscraper works fine when run standalone and locally, but requests from the server are blocked.
python sample_caller.py -c juriscraper.opinions.united_states.state.okla --verbosity 3 -b

A check from a server shell confirms that the server IP specifically is blocked:

In [1]: import requests

In [2]: r = requests.get("https://www.oscn.net/applications/oscn/deliverdocument.asp?citeid=548589")
Out[2]: <Response [403]>

In [3]: r.text
# below...

[Image: screenshot of the 403 response body]
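The manual check above can be repeated for several document URLs at once. The following is only a minimal sketch: `fetch_status`, `probe`, and `blocked_urls` are hypothetical helper names, not part of Juriscraper or CourtListener, and the `get` parameter exists only so the helpers can be exercised without network access.

```python
def fetch_status(url, get=None, timeout=30):
    """Return the HTTP status code for a single GET request."""
    if get is None:
        # Imported lazily so the helpers can be tested with a fake `get`.
        import requests

        get = requests.get
    return get(url, timeout=timeout).status_code


def probe(urls, get=None):
    """Map each URL to the status code it returned."""
    return {url: fetch_status(url, get=get) for url in urls}


def blocked_urls(statuses):
    """Return the URLs that answered 403 Forbidden, in probe order."""
    return [url for url, code in statuses.items() if code == 403]
```

Running `blocked_urls(probe([...]))` with the two `deliverdocument.asp` URLs from this issue, once from the server shell and once from a local machine, would show whether the 403 depends on where the request comes from.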

Sentry Issue: COURTLISTENER-9YJ

HTTPError: 403 Client Error: Forbidden for url: https://www.oscn.net/applications/oscn/deliverdocument.asp?citeid=548272
(1 additional frame(s) were not displayed)
...
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 400, in handle
    self.parse_and_scrape_site(mod, options)
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 364, in parse_and_scrape_site
    self.scrape_court(site, options["full_crawl"])
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 261, in scrape_court
    self.ingest_a_case(
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 298, in ingest_a_case
    content = get_binary_content(item["download_urls"], site)
  File "cl/scrapers/utils.py", line 304, in get_binary_content
    r.raise_for_status()

Metadata

Labels

scraper blocked: Suspicion the court website may be blocking our scraper
scraper down

Projects

Status

Mid July
