Fix set-zip misalignment in PerClassScorer.__call__ #571

Open
Chessing234 wants to merge 1 commit into allenai:main from Chessing234:fix/per-class-scorer-set-zip-misalignment

Conversation

@Chessing234

Bug

In PerClassScorer.__call__(), untyped_predicted_spans is built with a set comprehension (line 21) and then zipped with predicted_spans (a list) on line 23. Python sets have no guaranteed iteration order, so the i-th element yielded by iterating the set need not correspond to the i-th element of the list. As a result, untyped_span and span in a given loop iteration can refer to unrelated predictions, corrupting both the typed and untyped precision/recall/F1 metrics.

A secondary issue: if multiple predicted spans share the same (start, end) but differ in label, the set deduplicates them, making it shorter than the list. zip silently stops at the shorter iterable, so some predictions are never evaluated.
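
The truncation case can be reproduced in isolation. The snippet below is a minimal sketch, not the repository's code: it assumes spans are (start, end, label) tuples and reuses the variable names from the description above for readability.

```python
# Assumed span shape: (start, end, label). Two predictions share (0, 2).
predicted_spans = [(0, 2, "A"), (0, 2, "B"), (5, 7, "C")]

# Same construction as described for line 21: a set of (start, end) pairs.
untyped_predicted_spans = {(span[0], span[1]) for span in predicted_spans}

# The set collapses the two (0, 2) entries, so it is shorter than the list.
print(len(predicted_spans), len(untyped_predicted_spans))

# zip stops at the shorter iterable, so the last prediction is never visited.
pairs = list(zip(untyped_predicted_spans, predicted_spans))
for untyped_span, span in pairs:
    # Whether untyped_span matches span[:2] depends on set iteration order.
    print(untyped_span, span, untyped_span == span[:2])
```

Here the loop visits only two of the three predictions, and (5, 7, "C") is skipped regardless of how the set happens to be ordered.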

Root cause

Line 23 assumes set iteration order matches list order, which is not guaranteed.

Fix

Derive untyped_span directly from span inside the loop body instead of zipping with the set. This guarantees the untyped version always corresponds to the correct typed prediction.
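
The shape of the fix can be sketched as follows. This is an illustration under stated assumptions rather than the actual diff: score_predictions is a hypothetical standalone function, spans are assumed to be (start, end, label) tuples, and the "untyped" counter key and the true/false positive bookkeeping are assumptions made for the example.

```python
from collections import defaultdict


def score_predictions(predicted_spans, gold_spans):
    """Hypothetical helper: count typed and untyped true/false positives."""
    gold = list(gold_spans)  # copy so the caller's list is not mutated
    untyped_gold = {(start, end) for start, end, _ in gold}
    true_positives = defaultdict(int)
    false_positives = defaultdict(int)

    for span in predicted_spans:
        # Derive the untyped span from the typed one, so the two always match.
        untyped_span = (span[0], span[1])

        if span in gold:
            true_positives[span[2]] += 1
            gold.remove(span)
        else:
            false_positives[span[2]] += 1

        if untyped_span in untyped_gold:
            true_positives["untyped"] += 1
            untyped_gold.remove(untyped_span)
        else:
            false_positives["untyped"] += 1

    return dict(true_positives), dict(false_positives)


tp, fp = score_predictions(
    [(0, 2, "A"), (0, 2, "B"), (5, 7, "C")],
    [(0, 2, "A"), (5, 7, "C")],
)
print(tp, fp)
```

Because the loop iterates over the list alone, every prediction is evaluated exactly once, including predictions that share a (start, end) pair.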

untyped_predicted_spans is a set (unordered), so zipping it with
predicted_spans (a list) pairs unrelated spans together. Derive
untyped_span directly from span inside the loop instead.
