fix(tokenizer): Don't drop nominative reporter due to overlapping CitationTokens #284

Open
wants to merge 2 commits into base: main

branliu0 (Contributor) commented Jul 7, 2025

I found that eyecite was not parsing "Calderon v. Thompson, 523 U.S. 538" correctly because Thompson is also a nominative reporter.

>>> text = "Calderon v. Thompson, 523 U.S. 538"
>>> cite = eyecite.get_citations(text)[0]
>>> cite
FullCaseCitation('523 U.S. 538', groups={'volume': '523', 'reporter': 'U.S.', 'page': '538'}, metadata=FullCaseCitation.Metadata(parenthetical=None, pin_cite=None, pin_cite_span_start=None, pin_cite_span_end=None, year=None, month=None, day=None, court='scotus', plaintiff='', defendant='Calderon', extra=None, antecedent_guess=None, resolved_case_name_short=None, resolved_case_name=None))

>>> text[cite.full_span()[0]:cite.full_span()[1]]
'. Thompson, 523 U.S. 538'

(I used AI to help me fix this bug.) The issue appears to be that when two tokens overlap and the one containing the nominative reporter is dropped, the nominative reporter is no longer accounted for by any token.
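To make that failure mode concrete, here is a toy model. It is a sketch under stated assumptions, not eyecite's tokenizer: the `Token` class, the `wins` tie-break rule, and the overlapping nominative match "Thompson, 523" are all hypothetical, chosen only for illustration. When the losing token of an overlap is simply dropped, the text that only it covered disappears from the token stream; re-emitting that text as word tokens keeps it available to later steps such as case-name extraction.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token: its text, [start, end) offsets, and a kind label."""
    data: str
    start: int
    end: int
    kind: str  # "word", "nominative" or "citation" in this toy model


def word_tokens(text: str, start: int, end: int) -> list[Token]:
    """Re-tokenize text[start:end] into whitespace-delimited word tokens."""
    return [
        Token(m.group(), start + m.start(), start + m.end(), "word")
        for m in re.finditer(r"\S+", text[start:end])
    ]


def wins(new: Token, old: Token) -> bool:
    """Assumed rule for this sketch: citations beat other kinds, else longer wins."""
    if (new.kind == "citation") != (old.kind == "citation"):
        return new.kind == "citation"
    return new.end - new.start > old.end - old.start


def resolve_overlaps(text: str, tokens: list[Token], salvage: bool = True) -> list[Token]:
    """Drop the loser of each overlap; if salvage is True, re-emit the text
    that only the loser covered as word tokens so no text goes missing."""
    kept: list[Token] = []
    # Sort by start offset; at equal starts, longer tokens come first.
    for new in sorted(tokens, key=lambda t: (t.start, t.start - t.end)):
        if not kept or new.start >= kept[-1].end:
            kept.append(new)
            continue
        old = kept.pop()
        if wins(new, old):
            # Text before and after the winner that only the loser covered.
            before = word_tokens(text, old.start, new.start) if salvage else []
            after = word_tokens(text, new.end, old.end) if salvage else []
            kept.extend(before + [new] + after)
        else:
            # The old token wins; keep any tail of the new one past it.
            after = word_tokens(text, old.end, new.end) if salvage else []
            kept.extend([old] + after)
    return kept


text = "Cal v. Thompson, 523 U.S. 538"
tokens = word_tokens(text, 0, 7) + [
    Token("Thompson, 523", 7, 20, "nominative"),  # hypothetical overlapping match
    Token("523 U.S. 538", 17, 29, "citation"),
]
print([t.data for t in resolve_overlaps(text, tokens, salvage=False)])
# ['Cal', 'v.', '523 U.S. 538']
print([t.data for t in resolve_overlaps(text, tokens)])
# ['Cal', 'v.', 'Thompson,', '523 U.S. 538']
```

With `salvage=False` the model reproduces the symptom: "Thompson," is covered by no token, so anything that walks backward from the citation never sees it.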

branliu0 (Contributor, Author) commented Jul 7, 2025

This example makes it clearer that "Thompson, " is simply being ignored when calculating full_span:

>>> text = "Cal v. Thompson, 523 U.S. 538"
>>> cite = eyecite.get_citations(text)[0]
>>> text[cite.full_span()[0]:cite.full_span()[1]]
'mpson, 523 U.S. 538'
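A simple invariant would catch this class of bug in a test: after overlap resolution, every non-whitespace character should belong to some token. Below is a minimal sketch of that check. It uses plain `(start, end)` tuples as an assumed stand-in for token offsets, does not call eyecite, and `uncovered_text` is a hypothetical helper.

```python
def uncovered_text(text: str, spans: list[tuple[int, int]]) -> list[str]:
    """Return runs of non-whitespace text not covered by any [start, end) span."""
    covered = [False] * len(text)
    for start, end in spans:
        for i in range(start, end):
            covered[i] = True
    gaps: list[str] = []
    current: list[str] = []
    for ch, is_covered in zip(text, covered):
        if is_covered or ch.isspace():
            # A covered character or whitespace ends the current gap, if any.
            if current:
                gaps.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        gaps.append("".join(current))
    return gaps


text = "Cal v. Thompson, 523 U.S. 538"
# Offsets of "Cal", "v." and "523 U.S. 538" only: "Thompson," has no token.
dropped = [(0, 3), (4, 6), (17, 29)]
# The same tokens plus a word token for "Thompson,".
salvaged = [(0, 3), (4, 6), (7, 16), (17, 29)]

print(uncovered_text(text, dropped))   # ['Thompson,']
print(uncovered_text(text, salvaged))  # []
# With full coverage, slicing from the first token's start to the citation's
# end recovers the case name plus citation.
print(text[salvaged[0][0]:salvaged[-1][1]])  # Cal v. Thompson, 523 U.S. 538
```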

mlissner requested a review from flooie Jul 8, 2025 14:07
mlissner moved this to To Do in Case Law Sprint Jul 8, 2025
mlissner added the slow-review (Volunteer-led work that is scheduled for review soon, but not immediately) label Jul 8, 2025
flooie moved this from To Do to Late July in Case Law Sprint Jul 11, 2025
Labels
slow-review Volunteer-led work that is scheduled for review soon, but not immediately
Projects
Status: Late July
3 participants