Update pacer.py to fix price/cost of transcripts, which are not capped #5990
What sort of validation have you done here?

The actual transcript event has docket text like "Transcript of Evidentiary Hearing held in Hampden Courtroom as to Nia Dinzey held on April 24, 2025, before Judge Mark G. Mastroianni. Court Reporter Name and Contact Information: Leigh Gershowitz at [email protected] The Transcript may be purchased through the Court Reporter, viewed at the public terminal, or viewed through PACER after it is released. Redaction Request due 8/6/2025. Redacted Transcript Deadline set for 8/18/2025. Release of Transcript Restriction set for 10/14/2025. (DRK)"

But my neighbor to the north in D. New Hampshire skips the two-event process and just dockets a Transcript event with text like "TRANSCRIPT of Proceedings for Motion Hearing held on May 21, 2025. Court Reporter: Susan Bateman, Telephone # 603-225-1453. Transcript is available for public inspection, but may not be copied or otherwise reproduced, at the Clerk's Office for a period of 90 days. Additionally, only attorneys of record and pro se parties with an ECF login and password who purchase a transcript from the court reporter will have access to the transcript through PACER during this 90-day period. If you would like to order a copy, please contact the court reporter at the above listed phone number. NOTICE: Any party who requests an original transcript has 21 days from service of this notice to determine whether it is necessary to redact any personal identifiers and, if so, to electronically file a Redaction Request. Redaction Request Follow Up 8/1/2025. Redacted Transcript Follow Up 8/11/2025."

And, say, D.D.C. is like New Hampshire: "TRANSCRIPT OF REMEDIES HEARING PROCEEDINGS - DAY 1 MORNING SESSION before Judge Amit P. Mehta held on April 21, 2025; Page Numbers: 1-130. Court Reporter/ Transcriber: William Zaremba; Email: [email protected]. Transcripts may be ordered by submitting the Transcript Order Form. For the first 90 days after this filing date, the transcript may be viewed at the courthouse at a public terminal or purchased from the court reporter referenced above. After 90 days, the transcript may be accessed via PACER. Other transcript formats, (multi-page, condensed, PDF or ASCII) may be purchased from the court reporter. NOTICE RE REDACTION OF TRANSCRIPTS: The parties have twenty-one days to file with the court and the court reporter any request to redact personal identifiers from this transcript. If no such requests are filed, the transcript will be made available to the public via PACER without redaction after 90 days. The policy, which includes the five personal identifiers specifically covered, is located on our website at www.dcd.uscourts.gov. Redaction Request due 7/30/2025. Redacted Transcript Deadline set for 8/9/2025."

But searching for those strings is probably not sufficient. I don't remember if I've ever seen one, but if somebody files "OBJECTION to [23] TRANSCRIPT OF REMEDIES HEARING PROCEEDINGS - DAY 1 MORNING SESSION," then that would presumably not be something that should be captured.

Ninety-four districts, ninety-four bankruptcy courts, 13 Courts of Appeals (with BAPs), some jay pee em ell and international trade, 204 ways of doing things. Oh, and don't forget those torts.
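To make the false-positive concern concrete, here is a minimal sketch comparing a naive substring search of the docket text with a match anchored at the start of the text. Both helper names are hypothetical and not part of pacer.py, and the anchored version is still only a heuristic: it handles the examples quoted above, but nothing here suggests it would hold across every court.

```python
import re

# Hypothetical helpers for illustration only; not part of pacer.py.
# Matches docket text whose first word is "transcript" (any case).
TRANSCRIPT_START = re.compile(r"^\s*transcript\b", re.IGNORECASE)


def naive_match(docket_text: str) -> bool:
    """True if the word 'transcript' appears anywhere in the text."""
    return "transcript" in docket_text.lower()


def anchored_match(docket_text: str) -> bool:
    """True only if the docket text begins with 'transcript'."""
    return TRANSCRIPT_START.search(docket_text) is not None


objection = (
    "OBJECTION to [23] TRANSCRIPT OF REMEDIES HEARING PROCEEDINGS "
    "- DAY 1 MORNING SESSION"
)
print(naive_match(objection))     # True: a false positive
print(anchored_match(objection))  # False
```

Anchoring avoids this one false positive, but it remains a guess about how each court words its entries.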
@johnhawkinson I see what you mean. I was initially looking at the documents on the prayers leaderboard and noticed a bunch were called "transcript", but yeah, I see some have no RECAPDocument.description, and "transcript" isn't universal either. This docket has transcripts with RECAPDocument.description set to "Transcript (CR)", for example doc number 76, which is listed as $3, but because it's a transcript, people will be billed for 38 pages, or $3.80. And I guess parsing the "Transaction Receipt" page from users is perhaps more trouble than it's worth?
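The arithmetic behind the $3 versus $3.80 figures can be sketched as follows. The constant and function names are hypothetical, and the values ($0.10 per page, a $3.00 cap that does not apply to transcripts) are assumptions taken from the numbers in this thread rather than from the code in pacer.py.

```python
from decimal import Decimal

# Assumed values based on the figures in this thread; names are hypothetical.
PRICE_PER_PAGE = Decimal("0.10")
DOCUMENT_CAP = Decimal("3.00")


def estimate_cost(page_count: int, is_transcript: bool) -> Decimal:
    """Estimate a document's cost; transcripts are not capped."""
    uncapped = PRICE_PER_PAGE * page_count
    if is_transcript:
        return uncapped
    return min(uncapped, DOCUMENT_CAP)


print(estimate_cost(38, is_transcript=False))  # 3.00
print(estimate_cost(38, is_transcript=True))   # 3.80
```

Using `Decimal` rather than floats keeps the cent values exact.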
(I assume it is obvious that "Transcript (CR)" means "Transcript (Criminal).")
I don't see why it would be more trouble than it is worth? We already do some parsing of that, and it seems like it is guaranteed to be more reliable than any method that attempts to guess, particularly given what I outlined above. Given the choice between trying to predict what someone else will do, and just trying the thing, we should always try the thing unless it is somehow meaningfully more expensive. (I think a problem is that we have a substantial corpus of wrongly parsed pricing for transcripts, but why should we let that stop us going forward?)
I support fixing this issue, but I'm not sure this is the best solution. I think one of the better things we could do is collect the prices from the transaction receipt pages, at least for new documents going forward.
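One way to read this suggestion is as a precedence rule: use the price observed on a receipt when one exists, and fall back to an estimate otherwise. The sketch below assumes the receipt cost has already been parsed elsewhere; `resolve_price` and its parameters are hypothetical names, not existing functions in the project.

```python
from decimal import Decimal
from typing import Optional, Tuple


def resolve_price(
    receipt_cost: Optional[Decimal], estimated_cost: Decimal
) -> Tuple[Decimal, str]:
    """Prefer the cost observed on a receipt; otherwise use the estimate.

    Returns the chosen price and a label recording where it came from,
    so estimated prices can be replaced later if a receipt shows up.
    """
    if receipt_cost is not None:
        return receipt_cost, "receipt"
    return estimated_cost, "estimate"


print(resolve_price(Decimal("3.80"), Decimal("3.00")))
print(resolve_price(None, Decimal("3.00")))
```

Recording the source alongside the price is a design choice for this sketch; it would also make it possible to identify older, estimated prices that may be wrong.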
Because transcripts are single-document items (to the best of my knowledge, that is always true), there is no attachment page, so this only comes from the RSS feed. So it is absent for those districts that do not include transcripts in their RSS feeds (esp. those with no RSS feeds at all). Why would we choose a heuristic we know is unreliable when there is one with perfect fidelity? (As noted previously, there are several variants, including "Transcript (CR)". Handling those is obviously a simple substring search in the short description, but it is not quite as simple as the match you describe.)
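The difference between an exact match and a substring search on the short description can be sketched like this. Both function names are hypothetical; an empty string stands in for a document whose short description is absent, which neither approach can classify.

```python
def is_exact_transcript(description: str) -> bool:
    """Case-insensitive equality with the single word 'transcript'."""
    return description.strip().lower() == "transcript"


def contains_transcript(description: str) -> bool:
    """Case-insensitive substring search; also catches variants."""
    return "transcript" in description.lower()


for desc in ["Transcript", "Transcript (CR)", ""]:
    print(repr(desc), is_exact_transcript(desc), contains_transcript(desc))
```

The substring version picks up "Transcript (CR)", but both return `False` when the description is missing, which is the case for courts that leave transcripts out of their RSS feeds.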
I like it because it takes one fewer scrape of PACER and we're already doing a lot of scraping these days.
My attempt at fixing Issue #5429, where the cost/price of transcripts isn't accurately calculated because transcripts are not capped at $3.
I just check whether the RECAPDocument.description field is "transcript" (case-insensitive). Would we also have to check whether its docketentry.description contains something like "NOTICE OF FILING OF OFFICIAL TRANSCRIPT"?
I also added some constants, which perhaps could be imported from elsewhere.