feat(scrapers): allow cloning of existing dockets #5867

Open
wants to merge 1 commit into
base: main

florean
Contributor

@florean florean commented Jul 3, 2025

Currently, `clone_from_cl` crashes if you try to clone a docket that already exists in the database. So if you want to pull in new docket entries and documents for a docket, you first have to manually delete all of the docket's existing entries and documents. This is the minimal set of changes required (read: quick and dirty) to allow a docket's entries to be updated. I've used it a number of times and it's been very useful.

While this works great, I am going to do a larger rewrite of clone_from_cl to simplify the logic and make updates much more efficient. So I have a few questions (and feel free to move this to an Issue or Discussion):

  1. Do dockets, docket entries, or RECAPDocuments (and associated metadata) ever change? My sense is that dockets probably don't, docket entries rarely do, and RECAPDocuments often do, as PACER documents get added to the system.
  2. What other objects should be updated? Seems like Person is an obvious one.

I think my approach would be to only add new objects and add a --force option that would update everything.

Changes DocketEntry and RECAPDocument cloning to use `update_or_create`,
enabling the cloning of Dockets that already exist in the database.  This is a
quick but inefficient method to enable dockets to be updated with new entries
and documents.
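
The difference between create-only cloning and the `update_or_create` approach can be sketched in plain Python. This is a minimal stand-in, not the actual `clone_from_cl` code and not the Django ORM: the `FakeTable` class and the field names are hypothetical, and only the general semantics of `update_or_create` are mirrored (look up by key, apply `defaults` if the row exists, otherwise create it).

```python
class FakeTable:
    """Hypothetical in-memory stand-in for a model's table."""

    def __init__(self):
        self.rows = {}

    @staticmethod
    def _key(lookup):
        # Rows are identified by their lookup fields, e.g. docket + entry number.
        return tuple(sorted(lookup.items()))

    def create(self, defaults, **lookup):
        # Create-only behavior: fails when the row already exists.
        key = self._key(lookup)
        if key in self.rows:
            raise ValueError(f"duplicate row for {lookup}")
        self.rows[key] = {**lookup, **defaults}
        return self.rows[key]

    def update_or_create(self, defaults, **lookup):
        # Update the existing row in place, or create it if it is missing.
        key = self._key(lookup)
        created = key not in self.rows
        if created:
            self.rows[key] = {**lookup, **defaults}
        else:
            self.rows[key].update(defaults)
        return self.rows[key], created


entries = FakeTable()
_, created_first = entries.update_or_create(
    defaults={"description": "Complaint"}, docket_id=1, entry_number=1
)
# Cloning the same entry again updates it instead of raising.
row, created_second = entries.update_or_create(
    defaults={"description": "Complaint (amended)"}, docket_id=1, entry_number=1
)
```

The second call reuses the existing row, which is why re-cloning no longer crashes; it is also why the approach is inefficient, since every entry is looked up and rewritten even when nothing changed.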
@mlissner
Member

mlissner commented Jul 3, 2025

This looks fine to me, but @quevon24 is the owner of this bit of code, so I'll leave it to him to merge.

As for changes, dockets do change, yes. In some cases we get them bit by bit, so today's metadata may be less complete than tomorrow's. I think I'd invert your logic and assume that people using this script want the content overwritten.

Personally, I'd be annoyed at having to look up the --force flag every time. Maybe just consider that the default and have a warning:

Beginning clone process. This will overwrite any local data you may have (use --no-clobber to prevent this).

Press any key to proceed...

This uses the clobbering approach, terminology, and flag from the mv command, which might be familiar to folks?
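
A rough sketch of what that default could look like, using `argparse` directly. Nothing here is implemented in this PR: the `--no-clobber` flag and the warning text come from the suggestion above, while `build_parser`, `warn_before_clobbering`, and the injectable `prompt` parameter are hypothetical names chosen for illustration. Note that Python's built-in `input` waits for Enter rather than any key.

```python
import argparse

WARNING = (
    "Beginning clone process. This will overwrite any local data you may "
    "have (use --no-clobber to prevent this).\n\n"
    "Press any key to proceed..."
)


def build_parser():
    parser = argparse.ArgumentParser(prog="clone_from_cl")
    # Overwriting is the default; the flag opts out of it.
    parser.add_argument(
        "--no-clobber",
        action="store_true",
        help="Do not overwrite objects that already exist locally.",
    )
    return parser


def warn_before_clobbering(args, prompt=input):
    """Show the warning unless --no-clobber was passed.

    Returns True if the warning was shown. `prompt` is injectable so the
    function can be exercised without blocking on stdin.
    """
    if args.no_clobber:
        return False
    prompt(WARNING)
    return True
```

With this shape, running with no flags warns and then overwrites, and passing `--no-clobber` skips the prompt and leaves existing local objects alone.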

@mlissner mlissner requested a review from quevon24 July 3, 2025 13:31
@mlissner mlissner moved this to To Do in Case Law Sprint Jul 3, 2025
@flooie flooie moved this from To Do to Late July in Case Law Sprint Jul 11, 2025
@flooie flooie moved this from Late July to Mid July in Case Law Sprint Jul 11, 2025