fix(citation): Ensure full_span is aligned for parallel citations, fix full_span_end #288

Status: Open. Wants to merge 5 commits into main.

Conversation

branliu0
Contributor

For a parallel citation, full_span_end can sometimes be wrong because of how POST_FULL_CITATION_REGEX is defined: under certain conditions it matches through to the next citation instead of stopping at the end of the current one. However, we can trust that the post-citation matching worked correctly for the first of the parallel citations.

The example I came across is "Kaiser Steel Corp. v. W.S. Ranch Co., 391 U.S. 593, 598, 88 S. Ct. 1753, 20 L.Ed.2d 835 (1968). We have previously held that the automatic stay provisions of the Bankruptcy Code may toll the statute of limitations under the Warsaw Convention, which is the precursor to the Montreal Convention. See Zicherman v. Korean Air Lines Co., Ltd., 516 F.3d 1237, 1254 (11th Cir. 2008)"

The third of the three parallel citations has a full_span_end that extends all the way to the end of the text. I think this is because POST_FULL_CITATION_REGEX ends up matching the citation at the end of the text.
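The alignment idea above (trusting the first parallel citation's post-citation match) could be sketched roughly as follows. This is not the PR's actual diff: `CitationSpan` and `align_parallel_full_span_end` are hypothetical names made up for illustration, the offsets are invented, and the sketch assumes the parallel citations have already been grouped in document order.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class CitationSpan:
    """Hypothetical stand-in for a citation object; not eyecite's model."""
    cite: str
    full_span_start: int
    full_span_end: int


def align_parallel_full_span_end(group: List[CitationSpan]) -> List[CitationSpan]:
    """Copy the first citation's full_span_end onto every citation in the group.

    Assumes `group` holds citations already identified as parallel, in
    document order, and that the first one's end offset is trustworthy.
    """
    if not group:
        return []
    trusted_end = group[0].full_span_end
    return [replace(c, full_span_end=trusted_end) for c in group]


# Illustrative offsets only: the third citation's end has run past the others.
group = [
    CitationSpan("391 U.S. 593", 0, 90),
    CitationSpan("88 S. Ct. 1753", 0, 90),
    CitationSpan("20 L.Ed.2d 835", 0, 380),
]
aligned = align_parallel_full_span_end(group)
print([c.full_span_end for c in aligned])
```

Aligning the spans this way only hides the symptom for the later citations; it relies on the first citation's match being correct.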

@branliu0
Contributor Author

Okay, I found the root cause. In POST_FULL_CITATION_REGEX, the "court" group is matching far too much text, perhaps because it tries to match a court followed by a month/year before falling back to matching just a date. I fixed this by ensuring that the "court" group can't include a closing parenthesis. Another option might be to cap the length of the court group at something like 20 or 30 characters.

Here's the part of the regex in question:

        (?:
            (?:
                (?P<court>[^)]*?) # treat anything before date as court
                (?= # lookahead to stop when we see a month or year
                    \s+{MONTH_REGEX} |
                    \s+{YEAR_REGEX}
                )
            )?
            \ ?
            (?P<month>{MONTH_REGEX})?\ ?   # optional month
            (?P<day>\d{{1,2}})?\,?\ ?      # optional day and comma
            (?P<year>{YEAR_REGEX})         # year is required
        )
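To make the over-matching concrete, here is a reduced, standalone sketch of the fragment above, run against the example text from this PR. It is not eyecite's real regex: `MONTH_REGEX` and `YEAR_REGEX` are simplified stand-ins, `build_date_paren_regex` is a hypothetical helper, the surrounding `\(` and `\)` are assumed, and the "unrestricted" `.*?` variant is only a guess at what the pre-fix court group looked like.

```python
import re

# Simplified stand-ins, assumed for this sketch (not eyecite's definitions).
MONTH_REGEX = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
YEAR_REGEX = r"(?:1[6-9]|20)\d{2}"


def build_date_paren_regex(court_pattern: str) -> re.Pattern:
    """Hypothetical helper: reduced parenthetical regex with a pluggable court group."""
    return re.compile(
        rf"""
        \s*\(
        (?:
            (?P<court>{court_pattern})  # text before the date is treated as court
            (?=                         # stop when a month or year follows
                \s+{MONTH_REGEX} |
                \s+{YEAR_REGEX}
            )
        )?
        \ ?
        (?P<month>{MONTH_REGEX})?\ ?    # optional month
        (?P<day>\d{{1,2}})?,?\ ?        # optional day and comma
        (?P<year>{YEAR_REGEX})          # year is required
        \)
        """,
        re.VERBOSE,
    )


text = (
    "Kaiser Steel Corp. v. W.S. Ranch Co., 391 U.S. 593, 598, "
    "88 S. Ct. 1753, 20 L.Ed.2d 835 (1968). We have previously held that "
    "the automatic stay provisions of the Bankruptcy Code may toll the "
    "statute of limitations under the Warsaw Convention, which is the "
    "precursor to the Montreal Convention. See Zicherman v. Korean Air "
    "Lines Co., Ltd., 516 F.3d 1237, 1254 (11th Cir. 2008)"
)
start = text.index("835") + len("835")  # just after the third parallel cite
expected_end = text.index("(1968)") + len("(1968)")

variants = {
    "unrestricted": r".*?",        # assumed pre-fix behavior
    "no closing paren": r"[^)]*?",  # the fix described in this PR
    "length cap": r".{0,30}?",     # the alternative mentioned above
}
results = {
    label: build_date_paren_regex(pattern).match(text, start)
    for label, pattern in variants.items()
}
for label, m in results.items():
    print(label, repr(m.group("year")), m.end() == expected_end)
```

In this sketch the optional court group is attempted before it is skipped, so the lazy `.*?` keeps expanding until its lookahead finds " 2008" in the later citation. Both alternatives stop at "(1968)" here, but a length cap could still over-match whenever the next citation's date falls inside the window, which is one reason excluding `)` seems like the tighter fix.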

@branliu0 changed the title from "fix(citation): Ensure full_span is aligned for parallel citations" to "fix(citation): Ensure full_span is aligned for parallel citations, fix full_span_end" on Jul 16, 2025
@mattdahl
Contributor

I recently edited this regex (#243) and I think I introduced this error. Excluding closing parentheses from the court group makes sense -- indeed, it looks like something like that was there previously and I foolishly removed it, probably because I didn't understand what it was doing (https://github.com/freelawproject/eyecite/pull/243/files#diff-cfcb2df6a1c6cb15160f5093212c9443c150c91c6c4062c0a4ca162553b2e2d4L298).

So I would just suggest adding a brief comment to the court group line noting that closing parentheses are being intentionally excluded. Thanks!
