Develop a graph neural network to integrate proteomic and genomic features for cancer subtype classification (e.g. CPTAC-2 and CPTAC-3 breast, colorectal, and ovarian cancer data)

License

collaborativebioinformatics/ClassiGraph

ClassiGraph, a Colon Adenocarcinoma GNN-Based Classifier

Development Environment

  1. Install GitHub Desktop or git.
  2. Clone the repository:
git clone https://github.com/collaborativebioinformatics/Proteomic_Genomic_Cancer_KG.git
cd Proteomic_Genomic_Cancer_KG
  3. Install Anaconda or Miniconda.
  4. Create and activate the conda environment:
conda create -n cptac-kgnn python=3.12
conda activate cptac-kgnn
  5. Install the package and its Python dependencies in editable mode:
pip install -e .

Methods

Dataset Aggregation

  • CPTAC COAD proteomic data (tumor and normal samples), somatic mutations, and clinical subtype files with Consensus Molecular Subtype (CMS) annotations.
  • Data cleaning, filtering, and merging, dropping samples with insufficient data.
  • Median imputation of missing values, plus a missingness-mask channel marking which values were imputed.
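
The filtering and imputation steps above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's pipeline: it assumes the merged proteomic data is a samples-by-proteins array with NaN for missing values, and the function name `filter_and_impute` and the 50% missingness cutoff (`max_missing_frac`) are hypothetical. Each protein ends up with two channels: the (imputed) abundance and a 0/1 flag marking values that were originally missing.

```python
import warnings

import numpy as np


def filter_and_impute(X, max_missing_frac=0.5):
    """Drop sparse samples, then median-impute and add a missingness mask.

    X: (n_samples, n_proteins) array, NaN where a protein was not quantified.
    max_missing_frac: hypothetical cutoff; samples missing a larger fraction
        of proteins are dropped.
    Returns (features, keep): features has shape (n_kept, n_proteins, 2) with
    channel 0 = imputed abundance and channel 1 = 1.0 where the value was missing;
    keep is the boolean mask of retained samples.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    keep = missing.mean(axis=1) <= max_missing_frac
    X, missing = X[keep], missing[keep]
    with warnings.catch_warnings():
        # nanmedian warns on proteins missing in every retained sample.
        warnings.simplefilter("ignore", RuntimeWarning)
        medians = np.nanmedian(X, axis=0)  # per-protein median, ignoring NaN
    medians = np.where(np.isnan(medians), 0.0, medians)  # all-missing -> 0 (assumption)
    imputed = np.where(missing, medians, X)
    features = np.stack([imputed, missing.astype(float)], axis=-1)
    return features, keep


# Toy matrix: 4 samples x 3 proteins; sample 2 is entirely missing.
X = np.array([[1.0, np.nan, 3.0],
              [2.0, 4.0, np.nan],
              [np.nan, np.nan, np.nan],
              [3.0, 6.0, 5.0]])
features, keep = filter_and_impute(X)
print(keep)               # boolean mask of retained samples
print(features[..., 0])   # imputed abundances
print(features[..., 1])   # missingness mask
```

The medians are computed only over the retained samples, so a dropped, mostly empty sample cannot influence the imputed values.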

Knowledge Graph Construction

Protein-Protein Interaction (PPI)

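One way to turn a PPI edge list into a graph over the measured proteins is sketched below. It assumes the interactions arrive as (protein, protein) pairs that use the same identifiers as the proteomic feature columns; the edge source, the identifier scheme, and the helper name `build_ppi_adjacency` are all assumptions, and the gene symbols form a toy edge list, not repository data.

```python
import numpy as np


def build_ppi_adjacency(proteins, edges):
    """Build a symmetric 0/1 adjacency matrix over the measured proteins.

    proteins: identifiers in feature-column order (node i = proteins[i]).
    edges: iterable of (protein_a, protein_b) interaction pairs; the source
        database and ID scheme are assumptions of this sketch.
    Edges touching unmeasured proteins, and self-interactions, are skipped.
    """
    index = {p: i for i, p in enumerate(proteins)}
    A = np.zeros((len(proteins), len(proteins)), dtype=float)
    for a, b in edges:
        i, j = index.get(a), index.get(b)
        if i is None or j is None or i == j:
            continue
        A[i, j] = A[j, i] = 1.0  # treat interactions as undirected
    return A


# Toy example: four measured proteins; EGFR was not measured, so its edge is dropped,
# and the duplicate KRAS/BRAF pair collapses into one undirected edge.
proteins = ["TP53", "MDM2", "KRAS", "BRAF"]
edges = [("TP53", "MDM2"), ("KRAS", "BRAF"), ("BRAF", "KRAS"), ("TP53", "EGFR")]
A = build_ppi_adjacency(proteins, edges)
print(A.sum(axis=1))  # degree of each protein
```

Restricting the graph to measured proteins keeps node indices aligned with the feature matrix, so row i of the features always describes node i of the graph.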

GNNs

  • GNNMutation - a heterogeneous graph-based framework for cancer detection (training coming soon!)
  • MVGNN - for predicting cancer differentiation and classifying subtypes
  • MoGCN - a multi-omics integration model based on a graph convolutional network (GCN), developed for cancer subtype classification and analysis
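
For reference, the standard GCN layer (Kipf and Welling) that GCN-based models such as MoGCN build on computes H' = ReLU(A_norm H W), where A_norm = D^(-1/2) (A + I) D^(-1/2) and D is the diagonal degree matrix of A + I. The NumPy sketch below implements only that generic rule on a toy graph; it is not MoGCN's actual architecture, and the function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_norm, H, W):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)


# Toy 3-node path graph 0-1-2 with 2 input features per node.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
W = np.eye(2)  # identity weights so the aggregation is easy to inspect
A_norm = normalize_adjacency(A)
H_next = gcn_layer(A_norm, H, W)
print(H_next)
```

The self-loops keep each node's own features in the aggregate, and the symmetric normalization stops high-degree nodes (hub proteins in a PPI graph) from dominating purely through their number of neighbors.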

Training a Basic GNN

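The training setup is not described in detail here, so the following is only a hedged sketch of one simple baseline, not the repository's model. It follows the spirit of Simplified Graph Convolution (SGC): it smooths each patient's protein features over a shared PPI graph for k steps, standardizes them, and fits a softmax classifier with full-batch gradient descent. The data is synthetic, and every name (`propagate`, `train_softmax`) and hyperparameter is hypothetical.

```python
import numpy as np


def propagate(A, X, k=2):
    """SGC-style smoothing: apply S = D^-1/2 (A + I) D^-1/2 to each patient k times.

    A: (n_proteins, n_proteins) 0/1 PPI adjacency shared by all patients.
    X: (n_patients, n_proteins, n_channels) per-patient node features.
    Returns flattened features of shape (n_patients, n_proteins * n_channels).
    """
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    S = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):
        X = np.einsum("ij,pjc->pic", S, X)  # S @ X_p for every patient p
    return X.reshape(X.shape[0], -1)


def train_softmax(Z, y, n_classes, lr=0.5, epochs=300, seed=0):
    """Softmax (multinomial logistic) regression fit by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(Z.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels
    losses = []
    for _ in range(epochs):
        logits = Z @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        losses.append(float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))))
        G = (P - Y) / len(y)  # gradient of the mean cross-entropy w.r.t. logits
        W -= lr * (Z.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b, losses


# Synthetic demo: 40 "patients", a 5-protein path graph, one channel per protein.
rng = np.random.default_rng(42)
A = np.diag(np.ones(4), 1)
A = A + A.T                                  # path graph 0-1-2-3-4
y = np.repeat([0, 1], 20)                    # two hypothetical subtypes
X = rng.normal(size=(40, 5, 1))
X[y == 1, :2, 0] += 4.0                      # subtype 1: proteins 0 and 1 elevated
Z = propagate(A, X)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)     # standardize each feature
W, b, losses = train_softmax(Z, y, n_classes=2)
train_acc = np.mean((Z @ W + b).argmax(axis=1) == y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy {train_acc:.2f}")
```

Because the model is linear after the fixed propagation, its gradient has a closed form and no autodiff library is needed. Replacing the fixed smoothing with trainable, nonlinear GCN layers would normally be done in a framework such as PyTorch. Training accuracy on synthetic data says nothing about performance on CPTAC samples; a real evaluation needs held-out patients.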

Literature Review

Knowledge Graphs

Oncology

Next Steps?

  • Clinical Knowledge Graph (CKG), a very large resource for the analysis of clinical proteomics data.
  • DisGeNET, relationships between genes (and variants) and human diseases
  • DrugBank
  • DrugCentral
  • Entrez Gene
  • MONDO disease ontology
  • Reactome pathway database
  • Side effects knowledgebase
  • ...
  • You get the idea! Many more data sources could be added to aid classification. LLMs can help bridge the gap between terms used by different sources (PrimeKG style), and even images (segmentation / object detection) could be integrated!
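
Adding sources like these makes the graph heterogeneous: different node types (protein, gene, disease, drug) joined by different relations. A minimal sketch of that bookkeeping is shown below. It assumes each source has already been converted to (head, relation, tail) triples with typed node IDs. The function name and the placeholder IDs (P1, G1, D1, DR1) are hypothetical. Grouping edges by (source type, relation, target type) mirrors how heterogeneous-graph libraries such as PyTorch Geometric's `HeteroData` key their edge types.

```python
from collections import defaultdict


def build_typed_graph(triples):
    """Group (head, relation, tail) triples into per-relation edge lists.

    Nodes are (type, id) pairs, so identical IDs from different namespaces
    (e.g. a gene and a protein sharing a symbol) never collide.
    Returns (node_index, edges):
      node_index: node type -> {node id -> integer index within that type}
      edges: (head type, relation, tail type) -> list of (head idx, tail idx)
    """
    node_index = defaultdict(dict)
    edges = defaultdict(list)

    def idx(node):
        ntype, nid = node
        table = node_index[ntype]
        if nid not in table:
            table[nid] = len(table)  # assign the next free index for this type
        return table[nid]

    for head, rel, tail in triples:
        edges[(head[0], rel, tail[0])].append((idx(head), idx(tail)))
    return node_index, edges


# Placeholder triples standing in for PPI, gene-disease, and drug-target sources.
triples = [
    (("protein", "P1"), "interacts_with", ("protein", "P2")),
    (("gene", "G1"), "associated_with", ("disease", "D1")),
    (("drug", "DR1"), "targets", ("protein", "P2")),
    (("drug", "DR1"), "targets", ("protein", "P3")),
]
node_index, edges = build_typed_graph(triples)
for etype, pairs in edges.items():
    print(etype, pairs)
```

Keeping a separate index per node type lets each type carry its own feature matrix (for example, proteomic abundances for proteins and embeddings for drugs) while the typed edge lists say which feature tables each relation connects.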

Contributors 7