Problem
Gene identifier mapping is essential for integrating datasets that use different ID schemes. hvantk currently lacks support for HGNC (HUGO Gene Nomenclature Committee) identifiers, which are the standard stable identifiers for human genes.
Key challenges:
- ClinGen uses hgnc_id (e.g., HGNC:1100) as its primary gene identifier
- Ensembl uses gene_id (e.g., ENSG00000012048)
- Many datasets use gene_symbol (e.g., BRCA1), which can change over time or have aliases
Without a unified mapping utility, users must convert IDs outside hvantk, which risks mismatched identifiers and dropped records.
Proposed Solution
Add HGNC as a new data source with:
- HGNC Downloader - Fetch the official HGNC dataset
- HGNC Table Builder - Create a Hail Table with gene ID mappings
- Gene ID Mapper Utility - Provide bidirectional mapping between ID types
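To make the bidirectional mapping concrete, here is a minimal sketch of what the mapper could look like. It uses plain Python dictionaries rather than a Hail Table so that it stays self-contained. The class name `GeneIdMapper`, the `convert` method, and the record layout are hypothetical and not part of hvantk; the field names follow the ID types listed above.

```python
from typing import Dict, Iterable, Optional

# ID types taken from the issue text; the mapper itself is a hypothetical sketch.
ID_TYPES = ("hgnc_id", "gene_id", "gene_symbol")


class GeneIdMapper:
    """Hypothetical in-memory mapper between hgnc_id, gene_id and gene_symbol."""

    def __init__(self, records: Iterable[Dict[str, str]]) -> None:
        # Build one lookup index per ID type so any type can be the source.
        self._index: Dict[str, Dict[str, Dict[str, str]]] = {t: {} for t in ID_TYPES}
        for record in records:
            for id_type in ID_TYPES:
                value = record.get(id_type)
                if value:
                    self._index[id_type][value] = record

    def convert(self, value: str, source: str, target: str) -> Optional[str]:
        """Return the target ID for `value`, or None when no mapping is known."""
        if source not in ID_TYPES or target not in ID_TYPES:
            raise ValueError(f"ID types must be one of {ID_TYPES}")
        record = self._index[source].get(value)
        return record.get(target) if record else None


# Single record built from the example identifiers in this issue.
records = [
    {"hgnc_id": "HGNC:1100", "gene_id": "ENSG00000012048", "gene_symbol": "BRCA1"},
]
mapper = GeneIdMapper(records)
ensembl_id = mapper.convert("HGNC:1100", source="hgnc_id", target="gene_id")
hgnc_id = mapper.convert("BRCA1", source="gene_symbol", target="hgnc_id")
missing = mapper.convert("NOT_A_GENE", source="gene_symbol", target="hgnc_id")
```

Returning `None` for unknown IDs, instead of raising, is one possible design choice: it lets callers count unmapped genes explicitly rather than lose them silently.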
HGNC Data Source
URL: https://www.genenames.org/download/statistics-and-files/
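As a rough sketch of the table-building step, the snippet below parses a tab-separated HGNC export into records keyed by the ID types used in this issue. It assumes the export has `hgnc_id`, `symbol`, `ensembl_gene_id`, `alias_symbol` and `prev_symbol` columns, with multi-valued fields separated by `|`. That layout should be checked against the actual file. The function name `parse_hgnc_tsv` is hypothetical, and the sample row is hand-written for illustration rather than copied from the HGNC file. The exact download URL is left out because only the landing page is given above.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hand-written sample in the assumed HGNC TSV layout (illustrative only).
SAMPLE_TSV = (
    "hgnc_id\tsymbol\tensembl_gene_id\talias_symbol\tprev_symbol\n"
    "HGNC:1100\tBRCA1\tENSG00000012048\tRNF53|BRCC1\t\n"
)


def split_multi(value: str) -> List[str]:
    """Split a pipe-delimited field, dropping empty entries."""
    return [item for item in value.split("|") if item]


def parse_hgnc_tsv(handle: TextIO) -> List[Dict[str, object]]:
    """Parse an HGNC-style TSV into records using this issue's field names."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows: List[Dict[str, object]] = []
    for row in reader:
        rows.append(
            {
                "hgnc_id": row["hgnc_id"],
                "gene_symbol": row["symbol"],
                # Not every HGNC entry is guaranteed to have an Ensembl ID.
                "gene_id": row["ensembl_gene_id"] or None,
                "aliases": split_multi(row["alias_symbol"]),
                "previous_symbols": split_multi(row["prev_symbol"]),
            }
        )
    return rows


rows = parse_hgnc_tsv(io.StringIO(SAMPLE_TSV))
```

Keeping aliases and previous symbols as lists would let a later step resolve outdated gene symbols to their current hgnc_id.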