Description
In freelawproject/courtlistener#5244 we found that some PDFs contain two headers, each with a different document number.
This happens because the document is being uploaded to both the district and appellate cases, which can lead to retrieving the wrong document number.
We should tweak the get_document_number_from_pdf method to return the right document number.
Mike proposed:
One way to deal with this is to have different regexes for parsing each header, and to only use the appellate regex on appellate stuff, and vice versa. Would that work?
In this case, as part of the microservice request, should we pass the court type? Then, depending on the docket number format, we can parse one header or the other.
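To make the per-court-type idea concrete, here is a rough sketch of how the parser could select a header regex from a court type passed in the request. The function name, the `court_type` values, and both header patterns are assumptions for illustration only; real headers vary by court and the patterns would need to be checked against actual PDFs.

```python
import re
from typing import Optional

# Hypothetical header shapes, assumed for this sketch:
#   district:  "Case 1:20-cv-01234-ABC Document 45 Filed 01/02/21 Page 1 of 10"
#   appellate: "Case: 21-1234 Document: 00118765432 Page: 1 Date Filed: 03/04/2022"
DISTRICT_HEADER = re.compile(
    r"Case\s+\d+:\d{2}-[a-z]{2}-\d+\S*\s+Document\s+(?P<doc>\d+(?:-\d+)?)",
    re.IGNORECASE,
)
APPELLATE_HEADER = re.compile(
    r"Case:\s*\d{2}-\d+\s+Document:\s*(?P<doc>\d+)",
    re.IGNORECASE,
)

# Maps the (assumed) court type sent by the client to the regex to apply.
HEADER_BY_COURT_TYPE = {
    "district": DISTRICT_HEADER,
    "appellate": APPELLATE_HEADER,
}


def document_number_for_court(text: str, court_type: str) -> Optional[str]:
    """Return the document number from the header matching court_type only."""
    pattern = HEADER_BY_COURT_TYPE.get(court_type)
    if pattern is None:
        # Unknown court type: refuse to guess rather than return a wrong number.
        return None
    match = pattern.search(text)
    return match.group("doc") if match else None
```

With this shape, a PDF carrying both headers yields the district number for a district request and the appellate number for an appellate request. It does not settle the bankruptcy question, since no bankruptcy pattern is defined here.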
One question remains: can a bankruptcy case end up in an appellate court as well? If that's the case, and a bankruptcy document is uploaded to an appellate case, relying solely on the docket number format might not be reliable, since appellate and bankruptcy docket formats can be the same.
An alternative approach would be to return not just a single document_number, but a list of document_numbers and the docket numbers found in the document. Then, the client can decide which one to use by comparing it to the docket number of the case being processed.
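A minimal sketch of this alternative, assuming the same two illustrative header shapes (district: `Case 1:20-cv-01234-ABC Document 45 ...`, appellate: `Case: 21-1234 Document: 00118765432 ...`). All names below are hypothetical and the single combined regex is only a stand-in for whatever patterns the real parser ends up using.

```python
import re
from typing import List, NamedTuple, Optional


class HeaderCandidate(NamedTuple):
    docket_number: str
    document_number: str


# Assumed pattern covering both illustrative header shapes; not verified
# against the full range of real PDF headers.
HEADER = re.compile(
    r"Case:?\s+(?P<docket>\d+:\d{2}-[a-z]{2}-\d+|\d{2}-\d+)\S*"
    r"\s+Document:?\s+(?P<doc>\d+)",
    re.IGNORECASE,
)


def extract_header_candidates(text: str) -> List[HeaderCandidate]:
    """Service side: return every (docket number, document number) pair found."""
    return [
        HeaderCandidate(m.group("docket"), m.group("doc"))
        for m in HEADER.finditer(text)
    ]


def pick_document_number(
    candidates: List[HeaderCandidate], case_docket_number: str
) -> Optional[str]:
    """Client side: pick the candidate whose docket number matches the case."""
    wanted = case_docket_number.strip().lower()
    for candidate in candidates:
        if candidate.docket_number.strip().lower() == wanted:
            return candidate.document_number
    # No header matches the case being processed; let the caller decide.
    return None
```

The comparison here is a plain case-insensitive string match. Real docket numbers would likely need normalization first (judge initials, zero padding, office prefixes), which this sketch leaves out.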