Fix with_more_than_one_type_count counting concepts with exactly one type #575

Open
Chessing234 wants to merge 1 commit into allenai:main from Chessing234:fix/export-umls-json-type-count-off-by-one

Bug

`scripts/export_umls_json.py` prints summary statistics per concept. The aliases block uses:

```python
with_one_alias_count += 1 if len(concept['aliases']) == 1 else 0
with_more_than_one_alias_count += 1 if len(concept['aliases']) > 1 else 0
```

but the types block uses `>= 1` for the "more than one" counter:

```python
with_one_type_count += 1 if len(concept['types']) == 1 else 0
with_more_than_one_type_count += 1 if len(concept['types']) >= 1 else 0
```

Root cause

`>= 1` is true for every concept with at least one type, so each single-type concept is counted under both `with_one_type_count` and `with_more_than_one_type_count`. The two buckets should be disjoint, matching both the aliases pattern and the variable's own name.
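A minimal reproduction of the overlap, assuming a small hand-made list of concepts. The sample data and type identifiers are illustrative only; just the `types` key and the two counter names come from the script.

```python
# Hypothetical sample data: two single-type concepts and one two-type concept.
concepts = [
    {"types": ["T047"]},
    {"types": ["T047", "T191"]},
    {"types": ["T121"]},
]

with_one_type_count = 0
with_more_than_one_type_count = 0
for concept in concepts:
    with_one_type_count += 1 if len(concept["types"]) == 1 else 0
    # Comparison as currently written in the types block: >= 1 also
    # matches the single-type concepts already counted above.
    with_more_than_one_type_count += 1 if len(concept["types"]) >= 1 else 0

# The two buckets sum to more than the number of concepts, so they overlap.
buckets_overlap = with_one_type_count + with_more_than_one_type_count > len(concepts)
print(with_one_type_count, with_more_than_one_type_count, buckets_overlap)
```

With this data only one concept actually has more than one type, yet the second counter reaches the full length of the list.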

Fix

Change `>= 1` to `> 1` so "more than one type" means strictly greater than one, mirroring the aliases block.
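For reviewers, a sketch of the corrected counting logic pulled into a standalone helper. `count_type_buckets` and the sample data are hypothetical and not part of `scripts/export_umls_json.py`; the script itself keeps the inline counters.

```python
from typing import Dict, Iterable, List, Tuple


def count_type_buckets(concepts: Iterable[Dict[str, List[str]]]) -> Tuple[int, int]:
    """Return (concepts with exactly one type, concepts with more than one type)."""
    with_one_type_count = 0
    with_more_than_one_type_count = 0
    for concept in concepts:
        n_types = len(concept["types"])
        with_one_type_count += 1 if n_types == 1 else 0
        # Strictly greater than one, mirroring the aliases block.
        with_more_than_one_type_count += 1 if n_types > 1 else 0
    return with_one_type_count, with_more_than_one_type_count


# Hypothetical sample: one single-type, one two-type, one with no types.
sample = [{"types": ["T047"]}, {"types": ["T047", "T191"]}, {"types": []}]
one, more = count_type_buckets(sample)

# With disjoint buckets, the two counts can never exceed the number of concepts.
assert one + more <= len(sample)
```

The final assertion is the invariant the `>= 1` version violates whenever at least one concept has exactly one type.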

…type

export_umls_json.py prints per-concept summary statistics. The aliases
block pairs 'one alias' (== 1) with 'more than one alias' (> 1). The
types block pairs 'one type' (== 1) with 'more than one type' (>= 1),
so every single-type concept is counted under both with_one_type_count
and with_more_than_one_type_count, inflating the 'more than one type'
statistic by the count of single-type concepts.

Change >= 1 to > 1 to match the aliases pattern and the variable's name.