
fix(extraction): Handle sequential citations better #286

Open
wants to merge 2 commits into
base: main
from


branliu0
Contributor

@branliu0 branliu0 commented Jul 11, 2025

When two citations are right next to each other (i.e., within `BACKWARD_SEEK` of one another), citation extraction can produce wrong results.

I was testing with the string `West v. Atkins, 487 U.S. 42, 54-58 (1988), Polk Cty. v. Dodson, 454 U.S. 312, 325-26 (1981), and Monell v. Department of Soc. Servs., 436 U.S. 658, 694 (1978)`.

Before this fix, the second and third citations were missing their plaintiff, defendant, and pincite. With this small fix, those three fields are populated correctly for both citations.

I'm not super happy with this fix, but it doesn't break any tests and it'll be helpful for me. I'm sharing this PR in case it's of interest, or in case you have better ideas for how to fix this.

The main thing I'm not happy with is `word_str.endswith("),")`, which feels too tightly coupled to the specific example I have. I'm going to move on for now and come back to this if and when I find more real-life examples.
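For anyone skimming, here is a minimal, self-contained sketch of the idea behind the `endswith("),")` check. It is not the code in this PR or the library's actual implementation: the function `case_name_words`, the regex `CITE_RE`, the value of `BACKWARD_SEEK`, and the assumption that the window is counted in words are all made up for illustration. It only shows why a backward scan from one citation can spill into the previous citation, and how treating a word ending in `),` as a boundary stops that.

```python
import re

# Hypothetical window size, assumed here to be counted in words.
BACKWARD_SEEK = 28

# Simplified pattern for this sketch: matches only U.S. Reports citations
# such as "487 U.S. 42".
CITE_RE = re.compile(r"\b\d+ U\.S\. \d+\b")

TEXT = (
    "West v. Atkins, 487 U.S. 42, 54-58 (1988), Polk Cty. v. Dodson, "
    "454 U.S. 312, 325-26 (1981), and Monell v. Department of Soc. Servs., "
    "436 U.S. 658, 694 (1978)"
)


def case_name_words(text: str, cite_start: int, stop_at_boundary: bool) -> list[str]:
    """Walk backward from a citation and collect the words that precede it."""
    words = text[:cite_start].split()
    collected = []
    for word in reversed(words[-BACKWARD_SEEK:]):
        # A word like "(1988)," closes the previous citation, so stop there.
        if stop_at_boundary and word.endswith("),"):
            break
        collected.append(word)
    return list(reversed(collected))


second = list(CITE_RE.finditer(TEXT))[1]  # "454 U.S. 312"

# Without the boundary check, the scan swallows the whole first citation.
print(" ".join(case_name_words(TEXT, second.start(), stop_at_boundary=False)))
# → West v. Atkins, 487 U.S. 42, 54-58 (1988), Polk Cty. v. Dodson,

# With the boundary check, only the second case name is left.
print(" ".join(case_name_words(TEXT, second.start(), stop_at_boundary=True)))
# → Polk Cty. v. Dodson,
```

The sketch also shows the coupling concern: the boundary test only fires when the previous citation ends in a parenthetical followed by a comma, so a citation ending in `);` or with no year parenthetical would slip through.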

@mlissner mlissner requested a review from flooie July 11, 2025 18:10
@mlissner mlissner moved this to To Do in Case Law Sprint Jul 11, 2025
@flooie flooie moved this from To Do to Late July in Case Law Sprint Jul 11, 2025
Labels
None yet
Projects
Status: Late July
2 participants