Fix PerClassScorer overall totals deduplicating equal per-tag counts #578

Open
Chessing234 wants to merge 1 commit into allenai:main from Chessing234:fix/per-class-scorer-set-sum

@Chessing234

Bug

PerClassScorer.get_metric computes the overall precision/recall/F1 by summing per-tag TP/FP/FN counts, but builds each sum from a set comprehension. Equal counts across tags collapse to a single element, so the overall totals (and the resulting precision/recall/F1) are understated whenever two or more classes share a count.

https://github.com/allenai/scispacy/blob/eacccd4/scispacy/per_class_scorer.py#L70-L79

sum_true_positives = sum(
    {v for k, v in self._true_positives.items() if k != "untyped"}
)
sum_false_positives = sum(
    {v for k, v in self._false_positives.items() if k != "untyped"}
)
sum_false_negatives = sum(
    {v for k, v in self._false_negatives.items() if k != "untyped"}
)

Example: with _true_positives = {'A': 5, 'B': 5, 'C': 3} the current code computes sum({5, 3}) = 8 instead of 5 + 5 + 3 = 13.

Root cause

The {} braces make each argument a set comprehension, which deduplicates equal values before sum() sees them.
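
A minimal sketch of the difference, reusing the example counts from the bug description. The dictionary is a plain stand-in for `self._true_positives`, and the `untyped` entry and its value are assumptions added only to exercise the filter:

```python
# Stand-in for self._true_positives; the values are illustrative only.
true_positives = {"A": 5, "B": 5, "C": 3, "untyped": 13}

# Set comprehension: the two 5s collapse into one element before summing.
buggy_total = sum({v for k, v in true_positives.items() if k != "untyped"})

# Generator expression: every per-tag count reaches sum().
fixed_total = sum(v for k, v in true_positives.items() if k != "untyped")

print(buggy_total)  # 8
print(fixed_total)  # 13
```

The only textual difference between the two expressions is the pair of braces, which is why the bug is easy to miss in review.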

Fix

Drop the braces so each expression is a generator expression passed directly to sum(); every per-tag count is now summed as intended. Behaviour is unchanged when all per-tag counts are distinct.
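
As a rough sketch of the corrected summation, the snippet below wraps the three sums in a standalone function so it can run outside the class. The name `overall_counts` and its dictionary parameters are hypothetical and exist only for illustration; in the scorer the expressions stay inline in `get_metric` and read from `self._true_positives`, `self._false_positives`, and `self._false_negatives`.

```python
from typing import Dict, Tuple


def overall_counts(
    true_positives: Dict[str, int],
    false_positives: Dict[str, int],
    false_negatives: Dict[str, int],
) -> Tuple[int, int, int]:
    """Sum per-tag counts, skipping the "untyped" key (hypothetical helper)."""

    def total(counts: Dict[str, int]) -> int:
        # Generator expression: duplicates are preserved, unlike a set.
        return sum(v for k, v in counts.items() if k != "untyped")

    return total(true_positives), total(false_positives), total(false_negatives)


# Equal counts across tags are now all included in the totals.
print(overall_counts({"A": 5, "B": 5, "C": 3}, {"A": 1, "B": 1}, {"A": 2, "untyped": 9}))
# (13, 2, 2)
```

When every per-tag count is distinct, a set and a generator yield the same elements, so the totals match the previous behaviour.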

The overall-span totals used set comprehensions (sum({v for k, v in ...}))
so two tags with equal TP/FP/FN counts collapsed to a single value before
summing, understating the overall totals whenever multiple classes share
a count.

Example: _true_positives = {'A': 5, 'B': 5, 'C': 3} gave sum({5, 3}) = 8
instead of 5 + 5 + 3 = 13.

Switch to generator expressions so every per-tag count is summed.