Welcome to the Multi-omics Interactive Dashboard for Neurodegenerative Diseases (MIND NDDs)

Run the Dashboard App

Execute src/app/run.py

This app was developed under the supervision of Dr. Nicolas Ruffini and can be considered a side project of his multi-omics meta-analysis on neurodegenerative diseases (Ruffini et al., 2020, and Ruffini et al., 2022).
The objective is to collect transcriptomics, proteomics, and genomics data on neurodegenerative diseases from various studies and databases and to standardize their structure across all datasets. This ensures uniformity and thereby facilitates further data analysis. The MIND NDDs app serves as an interactive tool for visualizing, searching, filtering, and downloading these data.

References

Ruffini N, Klingenberg S, Heese R, Schweiger S, Gerber S. The Big Picture of Neurodegeneration: A Meta Study to Extract the Essential Evidence on Neurodegenerative Diseases in a Network-Based Approach. Front Aging Neurosci. 2022;14:866886. doi: 10.3389/fnagi.2022.866886. PMID: 35832065; PMCID: PMC9271745.

Ruffini N, Klingenberg S, Schweiger S, Gerber S. Common Factors in Neurodegeneration: A Meta-Study revealing Shared Patterns on a Multi-Omics Scale. Cells. 2020;9(12):2642. doi: 10.3390/cells9122642.

The following information can also be found on the "Information" page of the app:

Data Handling

Important Notice

Please note that this code has not undergone a formal review. Errors may have occurred during data collection and processing.

Data Retrieval

All data can be retrieved directly from the NDDs.db file in this repository.
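For programmatic access, the following is a minimal sketch that assumes NDDs.db is an SQLite file (the sqlalchemy-based pipeline suggests this, but it is an assumption). It opens the file read-only with Python's built-in sqlite3 module, lists its tables, and loads one of them into a pandas DataFrame. The helper names list_tables and load_table are hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd


def _connect_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only (fails instead of creating it)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the database, sorted alphabetically."""
    with closing(_connect_read_only(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def load_table(db_path: str, table: str) -> pd.DataFrame:
    """Load a whole table into a DataFrame; only existing table names are accepted."""
    if table not in list_tables(db_path):
        raise ValueError(f"unknown table: {table!r}")
    with closing(_connect_read_only(db_path)) as conn:
        # The name was validated against sqlite_master above, so quoting it is safe.
        return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
```

Calling list_tables with the path to NDDs.db first shows which tables exist; load_table then returns one of them as a DataFrame for further analysis.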

Data Handling - General

All subsequent data handling steps (except data collection) were performed with Python 3.12, mainly using the pandas, sqlalchemy, and dash packages. Details can be found in the repository.

Data Handling - Collection

An online literature search was performed to obtain multi-omics data on neurodegenerative diseases. Only human samples were considered. Keywords such as “Alzheimer”, “Parkinson”, and “Huntington” were used to search for suitable studies. The data sources of the meta-analysis by Ruffini et al. (2020) served as a starting point and were included if the data were publicly available. The following search tools / databases were used:

Transcriptomics Data

Proteomics Data

Genomics Data

QTL Data

GTEx

Data Handling - Processing

The processing workflow depends on the omics level.

Transcriptomics Data

Extract the metadata of the dataset / study
- Manual process: the metadata must be derived from the information provided along with the data
Read the data
- Necessary attributes: gene symbol, UniProt accession ID, p-value, fold change
Drop data points if
- the gene symbol, p-value, or fold change is NaN
- the p-value is ≥ 0.05 (i.e., not significant)
- the gene symbol cannot be found in the HGNC database
  - If the gene symbol is an “alias” or a “previous” symbol, it is replaced by the approved symbol. The HGNC database used for this was downloaded manually and is not updated automatically.
Look up the UBERON ID of the sample tissue(s) on the OLS website
- This could be automated; however, for efficiency reasons, the manual lookup was preferred.
Look up the CL ID of the cell type(s) on the OLS website
- Only needed for single-cell / single-nucleus sequencing experiments
Only HGNC-approved genes end up in the transcriptomics table. Of those, only genes for which the mygene package finds an entry end up in the genes table.
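The drop rules and the HGNC symbol replacement can be sketched in pandas as follows. This is an illustration under assumptions, not the repository's code: the HGNC download is assumed to have the columns symbol, alias_symbol, and prev_symbol (the latter two "|"-separated), the dataset columns gene_symbol, p_value, and fold_change are hypothetical, and non-significant rows (p ≥ 0.05) are assumed to be the ones discarded.

```python
import pandas as pd


def build_symbol_map(hgnc: pd.DataFrame) -> dict[str, str]:
    """Map approved, alias, and previous symbols to the approved HGNC symbol."""
    mapping: dict[str, str] = {}
    # Alias and previous symbols are added first; the first match is kept.
    for col in ("alias_symbol", "prev_symbol"):
        for approved, others in zip(hgnc["symbol"], hgnc[col]):
            if isinstance(others, str):
                for other in others.split("|"):
                    mapping.setdefault(other.strip(), approved)
    # Approved symbols overwrite any clash, so they always map to themselves.
    for approved in hgnc["symbol"]:
        mapping[approved] = approved
    return mapping


def filter_transcriptomics(
    df: pd.DataFrame, symbol_map: dict[str, str], alpha: float = 0.05
) -> pd.DataFrame:
    """Apply the drop rules to one dataset (column names are hypothetical)."""
    # Drop rows with a missing gene symbol, p-value, or fold change.
    out = df.dropna(subset=["gene_symbol", "p_value", "fold_change"])
    # Keep only significant rows.
    out = out[out["p_value"] < alpha].copy()
    # Replace alias/previous symbols by the approved symbol;
    # symbols unknown to HGNC become NaN and are dropped.
    out["gene_symbol"] = out["gene_symbol"].map(symbol_map)
    return out.dropna(subset=["gene_symbol"]).reset_index(drop=True)
```

Mapping through a dictionary turns every symbol unknown to HGNC into NaN, so a single dropna removes both missing and unrecognized symbols.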

Proteomics Data

Extract the metadata of the dataset / study
- Manual process: the metadata must be derived from the information provided along with the data
Read the data
- Necessary attributes: gene symbol, UniProt accession ID, p-value, fold change
Wrangle the data so that
- each data point refers to exactly one protein
- Some proteomics datasets list multiple genes / proteins for one data point. However, the data model of this database expects exactly one foreign key per row of the proteomics table, referencing a single gene / protein. Such a data point is therefore duplicated for each gene-protein pair.
Drop data points if
- the gene symbol, p-value, or fold change is NaN
- the p-value is ≥ 0.05 (i.e., not significant)
- the gene symbol cannot be found in the HGNC database
  - If the gene symbol is an “alias” or a “previous” symbol, it is replaced by the approved symbol. The HGNC database used for this was downloaded manually and is not updated automatically.
If the log fold change is not already in base 2, convert it (e.g., from log10 FC to log2 FC)
Only HGNC-approved genes end up in the proteomics table. Of those, only genes for which the mygene package finds an entry end up in the genes table.
All proteins within the dataset end up in the proteomics table. The status of each protein (“reviewed”, “unreviewed”, “not found”, or “inactive”) and whether it is an isoform are stored in the proteins table and can be used for custom filtering.
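As an example of such custom filtering, the sketch below keeps only proteomics rows whose protein is reviewed and not an isoform. The column names uniprot_id, status, and is_isoform are hypothetical and would need to match the actual proteins table.

```python
import pandas as pd


def reviewed_non_isoforms(proteomics: pd.DataFrame, proteins: pd.DataFrame) -> pd.DataFrame:
    """Keep proteomics rows whose protein is 'reviewed' and not an isoform."""
    mask = (proteins["status"] == "reviewed") & ~proteins["is_isoform"].astype(bool)
    keep = proteins.loc[mask, ["uniprot_id"]].drop_duplicates()
    # Inner join drops proteomics rows whose protein fails the filter.
    return proteomics.merge(keep, on="uniprot_id", how="inner")
```

Other combinations, such as also admitting “unreviewed” proteins, only require changing the mask.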

Genomics Data

The GWAS data for various neurodegenerative diseases are downloaded from the website.
- This is a manual process and not updated automatically.
The same is done for the Open Targets Platform.
- This is a manual process and not updated automatically.
The two sources are merged so that association scores can be used to filter the genomics data.
Drop data points if
- the gene symbol cannot be found in the HGNC database
  - If the gene symbol is an “alias” or a “previous” symbol, it is replaced by the approved symbol. The HGNC database used for this was downloaded manually and is not updated automatically.
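A possible shape of the merge-and-filter step, sketched in pandas under stated assumptions: both tables are assumed to carry a gene_symbol and a harmonized disease column, the Open Targets scores are assumed to be in a column named association_score, and symbol_map is a hypothetical dictionary mapping approved, alias, and previous symbols to the approved HGNC symbol (built from the manually downloaded HGNC file).

```python
import pandas as pd


def merge_genomics(
    gwas: pd.DataFrame, open_targets: pd.DataFrame, symbol_map: dict[str, str]
) -> pd.DataFrame:
    """Normalize symbols to HGNC, drop unknown ones, and attach association scores."""

    def normalize(df: pd.DataFrame) -> pd.DataFrame:
        # Symbols missing from symbol_map become NaN and are dropped.
        return df.assign(gene_symbol=df["gene_symbol"].map(symbol_map)).dropna(
            subset=["gene_symbol"]
        )

    scores = normalize(open_targets)[["gene_symbol", "disease", "association_score"]]
    # Left join keeps every GWAS hit; hits without a score get NaN.
    return normalize(gwas).merge(scores, on=["gene_symbol", "disease"], how="left")


def filter_by_score(merged: pd.DataFrame, min_score: float) -> pd.DataFrame:
    """Keep rows whose association score is at least min_score (NaN is dropped)."""
    return merged[merged["association_score"] >= min_score].reset_index(drop=True)
```

Keeping the join as a left join leaves the choice of a score threshold to the filtering step rather than baking it into the merged table.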

QTL Data

Single-tissue eQTL
- All gene variants collected from the genomics data are used to create GTEx request JSON files. The corresponding eQTL values for each tissue are then returned.
Single-tissue sQTL
- All gene variants collected from the genomics data are used to create GTEx request JSON files. The corresponding sQTL values for each tissue are then returned.
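Preparing the request files can be sketched as batching the unique variants into JSON files. The payload key variantIds, the file names, and the batch size are hypothetical; the request format actually expected by GTEx is not described here and would have to be checked against the GTEx API documentation.

```python
import json
from pathlib import Path


def write_request_files(
    variant_ids: list[str], out_dir: str, batch_size: int = 100
) -> list[Path]:
    """Split unique variant IDs into JSON request files of at most batch_size IDs.

    The payload layout {"variantIds": [...]} is a hypothetical placeholder.
    """
    unique_ids = list(dict.fromkeys(variant_ids))  # de-duplicate, keep order
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for start in range(0, len(unique_ids), batch_size):
        path = target / f"gtex_request_{start // batch_size:04d}.json"
        batch = unique_ids[start:start + batch_size]
        path.write_text(json.dumps({"variantIds": batch}, indent=2))
        paths.append(path)
    return paths
```

Batching keeps each request small and lets failed batches be resent individually; the same files can serve both the eQTL and the sQTL requests.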

Sample Site Ontology

For both tissues and cell types, ontology IDs (UBERON for tissues, CL for cell types) are requested, and the corresponding data is saved in the tissue and cell type tables.
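Storing the retrieved terms can be sketched with pandas and sqlalchemy, the packages named above. The table names tissue and cell_type, the column names, and the prefix check are assumptions for illustration; an in-memory SQLite engine stands in for the real database.

```python
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


def save_terms(terms: dict[str, str], table: str, engine: Engine, prefix: str) -> None:
    """Write ontology ID/label pairs to a table after checking the ID prefix."""
    wrong = [term_id for term_id in terms if not term_id.startswith(f"{prefix}:")]
    if wrong:
        raise ValueError(f"IDs without prefix {prefix!r}: {wrong}")
    df = pd.DataFrame(sorted(terms.items()), columns=["ontology_id", "label"])
    # Replace the table on each run; a real pipeline might append or upsert instead.
    df.to_sql(table, engine, if_exists="replace", index=False)


if __name__ == "__main__":
    engine = create_engine("sqlite://")  # in-memory database for illustration
    save_terms({"UBERON:0000955": "brain"}, "tissue", engine, "UBERON")
    save_terms({"CL:0000540": "neuron"}, "cell_type", engine, "CL")
    print(pd.read_sql_query("SELECT * FROM tissue", engine))
```

The prefix check guards against mixing the two ontologies, e.g. storing a CL cell-type ID in the tissue table.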
