
mede8/simulation-framework

Synthetic EEG Data Generation for Stress-Testing Classifiers

Authors: Lisa-Marie Vortmann & Andrei Medesan

This repository contains the work of Lisa-Marie Vortmann and Andrei Medesan on generating synthetic EEG data for stress-testing models. The approach lets researchers test their models and uncover limitations in their modelling design before deploying them on real data.

Synthetic data is valuable for research for several reasons. One is time efficiency: researchers can pinpoint issues in their models before obtaining the evaluation metrics, which makes testing more rigorous and shortens debugging. Preprocessing EEG data in particular can be complicated and tedious, because one has to understand the data structure and analyse it to extract relevant features that the models can use in their predictions. Once the model is tested on the preprocessed EEG data, issues can surface such as poor performance (overfitting or underfitting), incorrect assumptions about the data distribution, or architectural choices that prove unsuitable for the specific characteristics of EEG signals.

Stress-testing a model before deploying it on real data offers multiple advantages. When researchers develop models for EEG analysis, they often face a fundamental uncertainty: is poor performance a problem with the model architecture, or is it a problem with the data itself? Real EEG data is messy: it contains artifacts, noise, and individual variability, and sample sizes are often limited. These confounding factors make it difficult to isolate whether the model's shortcomings arise from its design or from the inherent challenges of the data. Synthetic data cuts through this uncertainty by providing EEG-like data with known properties and ground truth labels. With this approach, researchers can:

  • Validate the model architecture in a controlled environment.
  • Identify the boundaries of the model's capabilities.
  • Establish baseline performance expectations.
  • Debug with confidence at the current stage of the study.
  • Separate concerns in the research pipeline.
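
To make the idea of "known properties and ground truth labels" concrete, here is a minimal sketch in plain Python. It is not this repository's implementation: the function names (simulate_trial, simulate_dataset, band_power), the 10 Hz oscillation, the amplitudes, and the sampling rate are all illustrative assumptions. Each trial is a single-channel sine wave plus Gaussian noise, and the class label alone determines the sine amplitude, so any pipeline that cannot separate the two classes has a problem of its own.

```python
import math
import random


def simulate_trial(label, sfreq=250, duration=2.0, noise_sd=1.0, rng=None):
    """Return one single-channel trial as a list of floats.

    Hypothetical generator: a 10 Hz sine whose amplitude depends on the
    class label, plus white Gaussian noise.
    """
    if rng is None:
        rng = random.Random()
    amplitude = 2.0 if label == 1 else 0.5  # the known class difference
    n_samples = int(sfreq * duration)
    phase = rng.uniform(0.0, 2.0 * math.pi)  # random phase per trial
    return [
        amplitude * math.sin(2.0 * math.pi * 10.0 * i / sfreq + phase)
        + rng.gauss(0.0, noise_sd)
        for i in range(n_samples)
    ]


def simulate_dataset(n_trials_per_class=20, seed=0, **trial_kwargs):
    """Return (trials, labels) with alternating ground truth labels 0 and 1."""
    rng = random.Random(seed)  # fixed seed makes the dataset reproducible
    labels = [0, 1] * n_trials_per_class
    trials = [simulate_trial(label, rng=rng, **trial_kwargs) for label in labels]
    return trials, labels


def band_power(trial, sfreq=250, freq=10.0):
    """Power at a single frequency, from projections onto cosine and sine."""
    n = len(trial)
    re = sum(x * math.cos(2.0 * math.pi * freq * i / sfreq) for i, x in enumerate(trial))
    im = sum(x * math.sin(2.0 * math.pi * freq * i / sfreq) for i, x in enumerate(trial))
    return (re * re + im * im) / (n * n)


if __name__ == "__main__":
    trials, labels = simulate_dataset(n_trials_per_class=5, seed=0)
    for trial, label in zip(trials, labels):
        print(label, round(band_power(trial), 3))
```

Because the class difference is built in, the 10 Hz power should be clearly higher for label 1 than for label 0. A classifier that fails on data this simple points to a fault in the model or pipeline rather than in the data.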

Getting Started

Follow these steps to set up and run the project.

1. Clone the Repository and Set Up the Environment

git clone https://github.com/mede8/simulation-framework.git
cd simulation-framework

Then create the conda environment and activate it:

conda env create -f environment.yml
conda activate simulator

2. Run the Simulation Process

To generate synthetic EEG data from the YAML configuration file (scripts/base_simulation.yaml), run the following command from the repository root:

PYTHONPATH=$(pwd) python scripts/simulate_data.py --config scripts/base_simulation.yaml

Setting PYTHONPATH to the repository root adds that directory to Python's module search path, so the script can import the contents of the src/ directory as a package.
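
The effect of PYTHONPATH can be checked with a short, self-contained sketch. It is not part of the repository, and the helper name root_on_search_path is hypothetical. It starts a fresh interpreter with and without the variable set, and reports whether the given directory appears in that interpreter's sys.path. A temporary directory stands in for the repository root.

```python
import os
import subprocess
import sys
import tempfile


def root_on_search_path(repo_root, set_pythonpath):
    """Start a fresh interpreter and report whether repo_root is in its sys.path."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from a clean state
    if set_pythonpath:
        env["PYTHONPATH"] = repo_root
    # The child receives repo_root as sys.argv[1] and prints True or False.
    check = "import sys; print(sys.argv[1] in sys.path)"
    result = subprocess.run(
        [sys.executable, "-c", check, repo_root],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fake_root:
        print("without PYTHONPATH:", root_on_search_path(fake_root, False))
        print("with PYTHONPATH:   ", root_on_search_path(fake_root, True))
```

Once the repository root is on the search path, a top-level directory such as src/ can be imported by name from any script, wherever that script sits in the tree.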
