fix(extraction): Handle sequential citations better #286
When two citations are right next to each other (i.e., within BACKWARD_SEEK), the citation extraction can be wrong.
I was testing with the string:
West v. Atkins, 487 U.S. 42, 54-58 (1988), Polk Cty. v. Dodson, 454 U.S. 312, 325-26 (1981), and Monell v. Department of Soc. Servs., 436 U.S. 658, 694 (1978)
Before this fix, the second and third citations were missing their plaintiff, defendant, and pincite. With this small fix, all three fields are populated correctly for every citation in the string.
I'm not super happy with this fix, but it doesn't break any tests and it'll be helpful for me. Sharing this PR here in case it's of interest or in case you have better ideas for how to fix it.
The main thing I'm not happy with is `word_str.endswith("),")`, which feels too tightly coupled to the specific example I have. But I'm going to move on for now and I'll come back to this when/if I find more real-life examples.
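For anyone reading along, here is a rough, self-contained sketch of the idea behind that check. It is not the code in this PR or the library's actual extraction logic: `find_case_name`, the whitespace tokenization, the handling of a leading "and", and the `BACKWARD_SEEK` value are all assumptions made for illustration. It only shows how treating a word ending in `),` as the end of the previous citation keeps a backward scan from running into the neighboring citation.

```python
# Hypothetical illustration only; names and values here are assumptions.
BACKWARD_SEEK = 28  # assumed window size, counted in words


def find_case_name(words, volume_index):
    """Scan backward from the volume token looking for a 'v.' separator.

    Returns (plaintiff, defendant), or (None, None) when no 'v.' is found
    inside the window or before the previous citation's boundary.
    """
    start = max(volume_index - BACKWARD_SEEK, 0)
    boundary = start  # earliest index that may belong to this citation
    v_index = None
    for i in range(volume_index - 1, start - 1, -1):
        word_str = words[i]
        # A word like "(1988)," closes the previous citation's parenthetical,
        # so nothing at or before it belongs to the current case name.
        if word_str.endswith("),"):
            boundary = i + 1
            break
        if word_str == "v." and v_index is None:
            v_index = i
    if v_index is None:
        return None, None
    plaintiff_words = words[boundary:v_index]
    # Drop a connective such as "and" that joins citations in running text.
    if plaintiff_words and plaintiff_words[0] == "and":
        plaintiff_words = plaintiff_words[1:]
    plaintiff = " ".join(plaintiff_words)
    # Remove the comma that separates the defendant from the volume number.
    defendant = " ".join(words[v_index + 1:volume_index]).rstrip(",")
    return plaintiff, defendant


text = (
    "West v. Atkins, 487 U.S. 42, 54-58 (1988), "
    "Polk Cty. v. Dodson, 454 U.S. 312, 325-26 (1981), "
    "and Monell v. Department of Soc. Servs., 436 U.S. 658, 694 (1978)"
)
words = text.split()
# Volume tokens are located by hand here to keep the sketch short.
names = [find_case_name(words, words.index(v)) for v in ("487", "454", "436")]
```

In this sketch the `),` check is the only thing separating one citation from the next, which is the same coupling concern raised above: a citation whose parenthetical is followed by a semicolon or a period instead of a comma would not be caught.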