Synthetic EEG Data Generation for Stress-Testing Classifiers

Authors: Lisa-Marie Vortmann & Andrei Medesan

This repository contains the work of Lisa-Marie Vortmann and Andrei Medesan on generating synthetic EEG data for stress-testing models. This approach enables researchers to test their models and find relevant limitations in their modelling design before deploying them on real data.

Synthetic data is valuable for research for several reasons, chief among them time efficiency: researchers can pinpoint issues in their models before computing evaluation metrics on real data. This ensures that the models are tested rigorously and reduces the time spent debugging. Preprocessing EEG data in particular can be complicated and tedious, as one has to understand the data structure and analyse it to extract relevant features that the models can use for their predictions. Only after the model is tested on the preprocessed EEG data do certain issues surface, such as poor performance (overfitting or underfitting), incorrect assumptions about the data distribution, or architectural choices that prove unsuitable for the specific characteristics of EEG signals.

Stress-testing the model before deploying it on real data offers multiple advantages. When researchers develop models for EEG analysis, they often face a fundamental uncertainty: is poor performance a problem with the model architecture, or is it a problem with the data itself? Real EEG data is messy: it contains artifacts, noise, individual variability, and often limited sample sizes. These confounding factors make it difficult to isolate whether the model's shortcomings arise from its design or from the inherent challenges of the data. Synthetic data cuts through this uncertainty by providing EEG-like data with known properties and ground truth labels. With this approach, researchers can:

Validate the model architecture in a controlled environment.
Identify the boundaries of the model's capabilities.
Establish baseline performance expectations.
Debug the current state of the study with confidence.
Separate concerns in the research pipeline.
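
To make the idea of "known properties and ground truth labels" concrete, here is a minimal sketch of a two-class synthetic signal generator. It is not the generator implemented in this repository: the function name make_synthetic_eeg, its parameters, and the choice of 10 Hz and 20 Hz oscillations as class markers are all assumptions made for illustration.

```python
import numpy as np


def make_synthetic_eeg(n_trials=100, n_channels=8, n_samples=256,
                       sfreq=128.0, signal_amplitude=1.0, seed=0):
    """Illustrative generator (hypothetical, not the repository's code).

    Returns X with shape (n_trials, n_channels, n_samples) and a label
    vector y of 0/1 values, where the label fixes the oscillation frequency.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / sfreq  # time axis in seconds

    # Ground truth: class 0 oscillates at 10 Hz, class 1 at 20 Hz.
    y = rng.integers(0, 2, size=n_trials)
    freqs = np.where(y == 1, 20.0, 10.0)

    # A random phase per trial and channel keeps trials from being identical.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, n_channels, 1))
    signal = np.sin(2.0 * np.pi * freqs[:, None, None] * t[None, None, :] + phases)

    # Additive Gaussian noise; lowering signal_amplitude makes the task harder.
    noise = rng.standard_normal((n_trials, n_channels, n_samples))
    return signal_amplitude * signal + noise, y


X, y = make_synthetic_eeg(n_trials=20, n_channels=4, signal_amplitude=2.0)
print(X.shape, y[:5])
```

Because the class-defining frequency is known by construction, a classifier that fails on this data points to a problem in the model or pipeline rather than in the data. Sweeping signal_amplitude towards zero is one simple way to probe where a model's performance starts to degrade.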

Getting Started

Follow the steps below to clone the repository, set up the environment, and run the simulation.

1. Clone the Repository

git clone https://github.com/mede8/simulation-framework.git
cd simulation-framework

Then create and activate the conda environment:

conda env create -f environment.yml
conda activate simulator

2. Run the Simulation Process

To generate synthetic EEG data from the YAML configuration file (scripts/base_simulation.yaml), use the following command:

PYTHONPATH=$(pwd) python scripts/simulate_data.py --config scripts/base_simulation.yaml

Setting PYTHONPATH to the repository root adds that directory to Python's module search path, so the scripts can import the contents of the src/ directory as a package.
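
For readers unfamiliar with PYTHONPATH, the sketch below shows roughly what the variable does from inside Python: it places a directory on sys.path before imports are resolved. The helper name add_repo_root_to_path is hypothetical and is not part of this repository. That the scripts import modules through the src package is also an assumption here.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(repo_root="."):
    """Hypothetical helper: put repo_root on the module search path.

    Roughly what PYTHONPATH=$(pwd) achieves when run from the repository root.
    """
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        # Insert at the front so this checkout takes precedence over
        # any installed package with the same name.
        sys.path.insert(0, root)
    return root


root = add_repo_root_to_path(".")
print(root in sys.path)
```

Setting the variable on the command line, as shown above, keeps the scripts free of path manipulation, which is usually the cleaner choice.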
