Fix HyponymDetector mutating module-level BASE_PATTERNS#577

Open
Chessing234 wants to merge 1 commit into allenai:main from Chessing234:fix/hyponym-detector-mutates-base-patterns

@Chessing234

Bug

HyponymDetector.__init__ with extended=True mutates the module-level BASE_PATTERNS list in place. As a result, any HyponymDetector created afterwards picks up the extended patterns even when extended=False, and each additional extended=True construction appends another copy of EXTENDED_PATTERNS to BASE_PATTERNS.

https://github.com/allenai/scispacy/blob/eacccd4/scispacy/hyponym_detector.py#L41-L43

self.patterns = BASE_PATTERNS
if extended:
    self.patterns.extend(EXTENDED_PATTERNS)

Root cause

self.patterns = BASE_PATTERNS binds the attribute to the same list object imported from scispacy.hearst_patterns. Calling self.patterns.extend(...) then mutates the module-level list rather than a private copy.
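The aliasing can be reproduced without scispacy. The following is a minimal sketch, not the library's code: BuggyDetector is a hypothetical stand-in that mirrors only the three lines quoted above, and the two pattern lists are made-up placeholders for the ones in scispacy.hearst_patterns.

```python
# Placeholder data (assumption): the real lists live in scispacy.hearst_patterns.
BASE_PATTERNS = [{"label": "such_as"}, {"label": "including"}]
EXTENDED_PATTERNS = [{"label": "like"}]


class BuggyDetector:
    """Hypothetical stand-in that mirrors the pre-fix assignment logic."""

    def __init__(self, extended=False):
        self.patterns = BASE_PATTERNS  # binds to the shared list, no copy
        if extended:
            self.patterns.extend(EXTENDED_PATTERNS)  # mutates BASE_PATTERNS


first = BuggyDetector(extended=True)
size_after_first = len(BASE_PATTERNS)      # grew from 2 to 3

plain = BuggyDetector(extended=False)
leaked = plain.patterns is BASE_PATTERNS   # the "plain" detector sees the extras

BuggyDetector(extended=True)
size_after_second = len(BASE_PATTERNS)     # extended patterns appended again

print(size_after_first, leaked, size_after_second)
```

Because every instance holds a reference to the same list object, the state of BASE_PATTERNS depends on how many extended detectors were built earlier in the process.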

Fix

Copy the list: self.patterns = list(BASE_PATTERNS). BASE_PATTERNS is left untouched, each detector holds its own patterns list, and behavior is unchanged when only a single extended detector is created.
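A minimal sketch of the effect of the copy, under the same assumptions as the report: FixedDetector is a hypothetical stand-in rather than the real class, and the pattern lists are placeholders. Only the list(BASE_PATTERNS) line corresponds to the actual change.

```python
# Placeholder data (assumption): the real lists live in scispacy.hearst_patterns.
BASE_PATTERNS = [{"label": "such_as"}, {"label": "including"}]
EXTENDED_PATTERNS = [{"label": "like"}]


class FixedDetector:
    """Hypothetical stand-in that applies the proposed one-line fix."""

    def __init__(self, extended=False):
        self.patterns = list(BASE_PATTERNS)  # new list per instance
        if extended:
            self.patterns.extend(EXTENDED_PATTERNS)  # touches only the copy


extended_a = FixedDetector(extended=True)
extended_b = FixedDetector(extended=True)
plain = FixedDetector(extended=False)

print(len(BASE_PATTERNS))        # module-level list keeps its original length
print(len(extended_a.patterns))  # base patterns plus one set of extended ones
print(len(plain.patterns))       # base patterns only
```

list() makes a shallow copy, so the individual pattern entries are still shared between instances. That is sufficient here because the constructor only appends to the list and does not modify the entries themselves.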

self.patterns = BASE_PATTERNS binds self.patterns to the module-level
list. When extended=True, self.patterns.extend(EXTENDED_PATTERNS) mutates
BASE_PATTERNS itself, so:

- every subsequent HyponymDetector (including extended=False) inherits
  the extended patterns, and
- creating multiple extended detectors duplicates EXTENDED_PATTERNS in
  BASE_PATTERNS each time.

Copy the list so self.patterns is independent of the module-level data.