Develop a Graph Neural Network (GNN) that integrates proteomic and genomic features for cancer subtype classification (e.g. using CPTAC-2 and CPTAC-3 breast, colorectal, and ovarian cancer data).
- Install GitHub Desktop or git.
- Clone the repository
git clone https://github.com/collaborativebioinformatics/Proteomic_Genomic_Cancer_KG.git
cd Proteomic_Genomic_Cancer_KG
- Install Anaconda or Miniconda
- Create and activate the conda environment
conda create -n cptac-kgnn python=3.12
conda activate cptac-kgnn
- Install Python dependencies
pip install -e .
- CPTAC COAD proteomic dataset (tumor + normal), somatic mutations, and clinical subtype files with Consensus Molecular Subtypes (CMS) annotations.
- Cleaning, filtering, and merging the data, and dropping samples with insufficient data.
- Median imputation with a missingness mask channel.
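
The filtering and imputation steps above can be sketched as follows. This is a minimal illustration with NumPy, not the repository's actual preprocessing code: the function name `filter_and_impute`, the `max_missing_frac` threshold, and the toy matrix are all assumptions made for the example.

```python
import numpy as np

def filter_and_impute(X, max_missing_frac=0.5):
    """Drop sparse samples, median-impute, and attach a missingness mask.

    X is a (samples x features) array with NaN marking missing values.
    The 0.5 threshold is a hypothetical default, not a project setting.
    """
    X = np.asarray(X, dtype=float)
    # Keep only samples whose fraction of missing features is acceptable.
    keep = np.isnan(X).mean(axis=1) <= max_missing_frac
    X = X[keep]
    # Record which entries were missing before imputation.
    mask = np.isnan(X)
    # Per-feature median over the observed values only.
    medians = np.nanmedian(X, axis=0)
    # A feature with no observed values has a NaN median; fall back to 0.
    medians = np.where(np.isnan(medians), 0.0, medians)
    X_imputed = np.where(mask, medians, X)
    # Stack values and mask as two channels: (samples, features, 2).
    features = np.stack([X_imputed, mask.astype(float)], axis=-1)
    return features, keep

# Toy example: 4 samples, 3 features; the third sample is entirely missing.
X_toy = np.array([
    [1.0, np.nan, 3.0],
    [2.0, 4.0, np.nan],
    [np.nan, np.nan, np.nan],
    [3.0, 6.0, 5.0],
])
features, keep = filter_and_impute(X_toy)
```

The mask channel lets a downstream model distinguish measured values from imputed ones. In a real train/test split, the medians would normally be computed on the training samples only and reused for the test samples.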

- GNNMutation - heterogeneous graph-based framework for cancer detection (training soon!)
- MVGNN - for predicting cancer differentiation and subtype classification
- MoGCN - a multi-omics integration model based on a graph convolutional network (GCN), developed for cancer subtype classification and analysis
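
For readers new to GCNs, the sketch below shows a single graph convolution step using the standard symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W). It is a generic NumPy illustration and not code from any of the models listed above; the function name `gcn_layer`, the three-node toy graph, and the random weights are assumptions for the example.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) symmetric adjacency matrix
    H: (n, f_in) node feature matrix
    W: (f_in, f_out) weight matrix
    """
    # Add self-loops so each node keeps its own features.
    A_hat = A + np.eye(A.shape[0])
    # Degree of each node in the self-loop-augmented graph (always >= 1).
    deg = A_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale entry (i, j) by 1 / sqrt(d_i * d_j).
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy graph: three nodes connected in a path 0 - 1 - 2.
A_toy = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
rng = np.random.default_rng(0)
H_toy = rng.normal(size=(3, 4))   # 4 input features per node
W_toy = rng.normal(size=(4, 2))   # project to 2 hidden features
H_next = gcn_layer(A_toy, H_toy, W_toy)
```

In a subtype classification setting, nodes could be samples and edges a sample similarity measure, but that graph construction is an assumption here and differs between the models above. In practice a library such as PyTorch Geometric would provide trainable layers of this kind.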

- A knowledge graph to interpret clinical proteomics data
- KG-Hub: building and exchanging biological knowledge graphs
- Building a knowledge graph to enable precision medicine
- Democratizing knowledge representation with BioCypher
- Clinical Knowledge Graph (CKG), a large resource for the analysis of clinical proteomics data
- DisGeNET, relationships between genes and human diseases
- DrugBank
- DrugCentral
- Entrez Gene
- MONDO disease ontology
- Reactome pathway database
- Side effects knowledgebase
- ...
- You get the idea! Many more data sources can be added to aid classification. LLMs can help bridge the gap between terms (PrimeKG style), and even images (segmentation / object detection) can be integrated!