
michelleAlexan/NDDs


Welcome to the Multi-omics Interactive Dashboard for Neurodegenerative Diseases (MIND NDDs)

Run the Dashboard App

Execute src/app/run.py



This app was developed under the supervision of Dr. Nicolas Ruffini and can be considered a side project of his multi-omics meta-analysis on neurodegenerative diseases (Ruffini et al., 2020; Ruffini et al., 2022).
The objective is to collect transcriptomics, proteomics, and genomics data on neurodegenerative diseases from various studies and databases and to standardize their structure across all datasets, ensuring uniformity and thereby facilitating further data analysis. The MIND NDDs app serves as an interactive tool for visualizing, searching, filtering, and downloading these data.

References

Ruffini N, Klingenberg S, Heese R, Schweiger S, Gerber S. The Big Picture of Neurodegeneration: A Meta Study to Extract the Essential Evidence on Neurodegenerative Diseases in a Network-Based Approach. Front Aging Neurosci. 2022;14:866886. doi: 10.3389/fnagi.2022.866886. PMID: 35832065; PMCID: PMC9271745.

Ruffini N, Klingenberg S, Schweiger S, Gerber S. Common Factors in Neurodegeneration: A Meta-Study revealing Shared Patterns on a Multi-Omics Scale. Cells. 2020;9(12):2642. doi: 10.3390/cells9122642.

The following information is also shown on the "Information" page of the app:

Data Handling

Important Notice

Please note that this code has not undergone a formal review. Errors may have occurred during data collection and processing.

Data Retrieval

All data can be retrieved directly from the NDDs.db file in this repository.
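A minimal sketch for inspecting the database from Python, assuming NDDs.db is a SQLite file (the .db extension and the use of SQLAlchemy suggest this, but it is not stated explicitly). The function names are hypothetical, and table names such as "genes" are taken from the descriptions below and may differ in the actual schema.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file."""
    # closing() makes sure the connection is closed after the query.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_table(db_path: str, table: str) -> pd.DataFrame:
    """Load one complete table into a pandas DataFrame."""
    # Only allow known table names, since the name is inserted into the SQL text.
    if table not in list_tables(db_path):
        raise ValueError(f"unknown table: {table!r}")
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)


# Example usage (assumes NDDs.db is in the working directory):
# print(list_tables("NDDs.db"))
# genes = read_table("NDDs.db", "genes")
```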

ER Diagram of the NDDs Database Model (created with Miro)

Data Handling - General

All of the following data handling steps (except data collection) were performed with Python 3.12, mainly using the pandas, SQLAlchemy, and Dash packages. Details can be found in the repository.

Data Handling - Collection

Online literature research was performed to obtain multi-omics data on neurodegenerative diseases. Only human samples were considered. Keywords such as “Alzheimer”, “Parkinson”, and “Huntington” were used to search for appropriate studies. The data sources of the meta-analysis by Ruffini et al. (2020) served as a starting point and were included if the data were publicly available. The following search tools / databases were used:

Transcriptomics Data

Proteomics Data

Genomics Data

QTL Data

Data Handling - Processing

The processing workflow depends on the omics level.

Transcriptomics Data

  • Extract the metadata of the dataset / study
    • Manual process: the metadata must be derived from the dataset / study information provided along with the data
  • Read the data
    • Necessary attributes: gene symbol, UniProt accession ID, p-value, fold change
  • Drop data points if
    • the gene symbol, p-value, or fold change is NaN
    • the p-value is ≥ 0.05 (not significant)
    • the gene symbol cannot be found in the HGNC database
      • If the gene symbol is an “alias” or a “previous” symbol, it is replaced by the approved symbol. The HGNC data used for this were downloaded manually and are not updated automatically.
  • Look up the UBERON ID of the sample tissue(s) on
    • the OLS website
    • This could be automated; however, for efficiency reasons, the manual lookup was preferred.
  • Look up the CL ID of the cell type(s) on
    • the OLS website
    • Only needed for single-cell / single-nucleus sequencing experiments
  • Only HGNC-approved genes end up in the transcriptomics table. Of these, only the genes for which the mygene package finds an entry end up in the genes table.
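The filtering steps above can be sketched with pandas as follows. This is a minimal illustration, not the repository's code: the column names (gene_symbol, p_value, fold_change), the HGNC export layout (columns symbol, alias_symbol, and prev_symbol, with "|"-separated synonyms), and the function names are assumptions. The sketch also assumes that the p-value filter keeps significant results (p < 0.05).

```python
import pandas as pd

# Hypothetical column names; real datasets would be renamed to these first.
REQUIRED = ["gene_symbol", "p_value", "fold_change"]


def build_symbol_map(hgnc: pd.DataFrame) -> dict[str, str]:
    """Map previous, alias, and approved symbols to the approved symbol.

    Assumes an HGNC export with columns 'symbol', 'alias_symbol', and
    'prev_symbol', where the last two hold '|'-separated lists.
    """
    mapping: dict[str, str] = {}
    for row in hgnc.itertuples(index=False):
        for col in ("prev_symbol", "alias_symbol"):
            value = getattr(row, col)
            if isinstance(value, str):  # skips None / NaN entries
                for synonym in value.split("|"):
                    # The first gene claiming a synonym keeps it.
                    mapping.setdefault(synonym.strip(), row.symbol)
    # Approved symbols always map to themselves, overriding synonym clashes.
    for symbol in hgnc["symbol"]:
        mapping[symbol] = symbol
    return mapping


def filter_transcriptomics(
    df: pd.DataFrame, symbol_map: dict[str, str], alpha: float = 0.05
) -> pd.DataFrame:
    """Apply the drop rules to one transcriptomics dataset."""
    # Drop rows with a missing gene symbol, p-value, or fold change.
    out = df.dropna(subset=REQUIRED)
    # Keep only significant results (assumed intent: p < alpha).
    out = out[out["p_value"] < alpha].copy()
    # Replace alias / previous symbols by the approved one; unknown -> NaN.
    out["gene_symbol"] = out["gene_symbol"].map(symbol_map)
    # Drop genes that could not be found in HGNC.
    return out.dropna(subset=["gene_symbol"]).reset_index(drop=True)
```

Writing the approved symbols last guarantees they map to themselves; if one synonym is shared by several genes, the first gene encountered wins, which a stricter pipeline might prefer to flag instead.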

Proteomics Data

  • Extract the metadata of the dataset / study
    • Manual process: the metadata must be derived from the dataset / study information provided along with the data
  • Read the data
    • Necessary attributes: gene symbol, UniProt accession ID, p-value, fold change
  • Wrangle the data so that
    • each data point refers to exactly one protein
    • Some proteomics datasets list multiple genes / proteins for a single data point. However, the data model of this database expects exactly one foreign key per row in the proteomics table, referencing a single gene / protein. Such a data point is therefore duplicated for each gene-protein pair.
  • Drop data points if
    • the gene symbol, p-value, or fold change is NaN
    • the p-value is ≥ 0.05 (not significant)
    • the gene symbol cannot be found in the HGNC database
      • If the gene symbol is an “alias” or a “previous” symbol, it is replaced by the approved symbol. The HGNC data used for this were downloaded manually and are not updated automatically.
  • If the log fold change is not already in base 2, convert it (e.g. from log10 FC to log2 FC)
  • Only HGNC-approved genes end up in the proteomics table. Of these, only the genes for which the mygene package finds an entry end up in the genes table.
  • All proteins within the dataset end up in the proteomics table. The status of each protein (“reviewed”, “unreviewed”, “not found”, or “inactive”) and whether it is an isoform are stored in the proteins table and can be used for custom filtering.
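The wrangling and conversion steps above might look like this in pandas. The sketch assumes that multiple entries in a cell are separated by semicolons and that the gene symbols and UniProt accessions in a row are listed in matching order; the column and function names are hypothetical. The conversion uses the change-of-base identity log2(x) = log10(x) / log10(2).

```python
import math

import numpy as np
import pandas as pd


def split_multi_protein_rows(df: pd.DataFrame, sep: str = ";") -> pd.DataFrame:
    """Duplicate a data point once per listed gene-protein pair."""
    out = df.copy()
    # Turn "APP;APLP2" into ["APP", "APLP2"] (same for the accessions).
    out["gene_symbol"] = out["gene_symbol"].str.split(sep)
    out["uniprot_id"] = out["uniprot_id"].str.split(sep)
    # Multi-column explode pairs the lists element-wise and raises a
    # ValueError if a row's lists have different lengths.
    out = out.explode(["gene_symbol", "uniprot_id"], ignore_index=True)
    out["gene_symbol"] = out["gene_symbol"].str.strip()
    out["uniprot_id"] = out["uniprot_id"].str.strip()
    return out


def log10_to_log2(log10_fc) -> np.ndarray:
    """Convert log10 fold changes to log2 via log2(x) = log10(x) / log10(2)."""
    return np.asarray(log10_fc, dtype=float) / math.log10(2)
```

All other columns (p-value, fold change, metadata) are copied unchanged to each duplicated row, so every resulting row can reference a single gene / protein.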

Genomics Data

  • The GWAS data for various neurodegenerative diseases are downloaded from the website.
    • This is a manual process; the data are not updated automatically.
  • The same is done for the Open Targets Platform.
    • This is a manual process; the data are not updated automatically.
  • The data are merged so that association scores can be used to filter the genomics data.
  • Drop data points if
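A sketch of the merge step with pandas, using hypothetical column names (variant_id, gene_symbol, disease, association_score) and a hypothetical join on gene symbol and disease. A left merge keeps every GWAS row, and the score filter is applied separately afterwards, so the threshold can be chosen at query time (e.g. in the dashboard).

```python
import pandas as pd


def merge_gwas_with_scores(gwas: pd.DataFrame, ot_scores: pd.DataFrame) -> pd.DataFrame:
    """Attach Open Targets association scores to GWAS rows."""
    # how="left" keeps GWAS rows without a score (their score becomes NaN);
    # validate raises a MergeError if a (gene, disease) pair has several scores.
    return gwas.merge(
        ot_scores,
        on=["gene_symbol", "disease"],
        how="left",
        validate="many_to_one",
    )


def filter_by_score(merged: pd.DataFrame, min_score: float) -> pd.DataFrame:
    """Keep rows whose association score is at least min_score."""
    # NaN compares as False, so rows without a score are dropped here.
    keep = merged["association_score"] >= min_score
    return merged[keep].reset_index(drop=True)
```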

QTL Data

  • Single-Tissue eQTL
    • All variants collected from the genomics data are used to create GTEx request JSON files. The requests then return the corresponding eQTL values for each tissue.
  • Single-Tissue sQTL
    • All variants collected from the genomics data are used to create GTEx request JSON files. The requests then return the corresponding sQTL values for each tissue.
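A sketch of how the collected variants could be split into JSON request files. The payload keys ("variants", "tissues"), the batch size, and the file naming are hypothetical placeholders, not the GTEx API schema; they would have to be adapted to whatever the GTEx endpoint actually expects.

```python
import json
from pathlib import Path


def write_request_files(
    variant_ids, tissues, out_dir, batch_size: int = 50
) -> list[Path]:
    """Write one JSON request file per batch of unique variant IDs."""
    # Deduplicate and sort so that reruns produce identical files.
    unique = sorted(set(variant_ids))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(unique), batch_size):
        # Hypothetical payload layout; adapt to the GTEx endpoint's schema.
        payload = {
            "variants": unique[start:start + batch_size],
            "tissues": list(tissues),
        }
        path = out / f"request_{start // batch_size:04d}.json"
        path.write_text(json.dumps(payload, indent=2))
        paths.append(path)
    return paths
```

Batching keeps individual requests small; the same files can be reused for the eQTL and the sQTL queries.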

Sample Site Ontology

  • For both tissues and cell types, ontology IDs (UBERON for tissues, CL for cell types) are requested, and the corresponding data is saved in the tissue and cell type tables.
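For the ontology requests, a term URL for the OLS REST API can be built from an ontology ID. This sketch assumes the OLS4 base URL https://www.ebi.ac.uk/ols4/api, the standard OBO PURL form of term IRIs, and the OLS convention of double URL-encoding the IRI in the path; the function names are hypothetical.

```python
from urllib.parse import quote

# Assumed OLS4 REST base URL.
OLS_BASE = "https://www.ebi.ac.uk/ols4/api"


def obo_id_to_iri(obo_id: str) -> str:
    """Convert e.g. 'UBERON:0000955' to its OBO PURL IRI."""
    prefix, local_id = obo_id.split(":", 1)
    return f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"


def ols_term_url(obo_id: str) -> str:
    """Build the OLS term URL for a UBERON or CL ID."""
    ontology = obo_id.split(":", 1)[0].lower()  # 'uberon' or 'cl'
    # OLS expects the IRI in the path to be URL-encoded twice.
    encoded = quote(quote(obo_id_to_iri(obo_id), safe=""), safe="")
    return f"{OLS_BASE}/ontologies/{ontology}/terms/{encoded}"
```

The resulting URL can be fetched with any HTTP client (e.g. urllib.request.urlopen) and the JSON response parsed with json.loads before saving the relevant fields in the tissue or cell type table.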
