Classification of Human Brain Signals

EEG-Based Motor Imagery Classification Using Machine Learning

A complete pipeline for classifying motor imagery EEG signals from the BCI Competition IV Dataset 2a (Graz Data Set A). This project implements and compares 7 machine learning models across 3 evaluation protocols, with an additional Weighted Majority Vote Ensemble (WMVE) method.

Developed as a Machine Learning mini-project at the Faculty of Sciences and Techniques of Mohammedia (FSTM), Hassan II University of Casablanca, under the supervision of Pr. Nabil Azouagh.


Table of Contents

  1. Project Overview
  2. Dataset
  3. Pipeline Architecture
  4. Evaluation Protocols
  5. Models Implemented
  6. Results Summary
  7. Project Structure
  8. Requirements
  9. Installation and Setup
  10. Usage
  11. Technical Notes
  12. Authors
  13. References
  14. License

Project Overview

Brain-Computer Interfaces (BCI) enable direct communication between the human brain and external devices by interpreting neural signals. This project explores the classification of motor imagery intentions from EEG signals, addressing the fundamental question: can we reliably distinguish what a human brain imagines, and do all brains behave the same way?

The classification targets 4 motor imagery classes (chance level = 25%):

  • Left hand (class 1)
  • Right hand (class 2)
  • Feet (class 3)
  • Tongue (class 4)

Key findings:

  • Linear SVM achieved the best mean accuracy of 51.35% on Protocol 1 (per-subject), more than double the chance level.
  • Classical ML models outperformed deep learning (CNN) given the limited data (288 trials per subject).
  • Inter-subject variability is the dominant challenge: subject A08 consistently reached 60-67%, while A02 and A05 remained near chance level.
  • The progressive degradation from Protocol 1 (51.35%) to Protocol 1.5 (38.81%) to Protocol 2 (31.05%) quantifies the cost of cross-subject generalization.

Dataset

BCI Competition IV Dataset 2a (Graz Data Set A)

  • 9 subjects (A01 to A09)
  • 2 sessions per subject: Training (T) and Evaluation (E), recorded on different days
  • 288 trials per session (72 per class), 6 runs of 48 trials
  • 22 EEG channels (10-20 international system), sampled at 250 Hz
  • 3 EOG channels (excluded during preprocessing)
  • File format: GDF (.gdf) with evaluation labels in MATLAB (.mat) files
  • Cue-based paradigm: fixation cross (t=0s), beep (t=2s), visual cue (t=3s-6s), rest

The dataset is not included in this repository due to its size. It is available on our shared Google Drive (see Resources) or from the official source: http://www.bbci.de/competition/iv/


Pipeline Architecture

The processing pipeline is shared across all models and protocols:

Raw EEG (.gdf)
    |
    v
[1] Loading and Channel Selection (22 EEG channels via MNE-Python)
    |
    v
[2] Bandpass Filtering (Butterworth IIR, 8-30 Hz, order 4)
    |
    v
[3] Epoching (4s windows from cue onset, 1000 samples at 250 Hz)
    |
    v
[4] Artifact Rejection (event 1023)
    |
    v
[5] Feature Extraction
    |-- Bandpower: 4 bands x 22 channels = 88 features (Welch PSD)
    |-- CSP: Common Spatial Patterns (6 or 10 components)
    |-- Concatenation: 94 features (P1) or 98 features (P1.5, P2)
    |
    v
[6] Normalization (StandardScaler, fitted on training data only)
    |
    v
[7] Dimensionality Reduction (PCA at 95% cumulative variance)
    |
    v
[8] Classification (7 models with GridSearchCV, 5-fold Stratified CV)

Frequency bands used for Bandpower extraction:

  • Delta: 0.5-4 Hz
  • Theta: 4-8 Hz
  • Mu/Alpha: 8-13 Hz (sensorimotor rhythm)
  • Beta: 13-30 Hz (motor activity)
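
The bandpower features average the Welch power spectral density inside each of the four bands above, per channel. A minimal sketch, assuming the epoched array X from the previous sketch; the exact bandpower definition (mean versus integrated PSD) may differ slightly in the notebooks.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "mu_alpha": (8, 13), "beta": (13, 30)}

def bandpower_features(X, fs=250):
    """Mean Welch PSD per (band, channel): 4 bands x 22 channels = 88 features per trial."""
    freqs, psd = welch(X, fs=fs, nperseg=fs, axis=-1)        # psd: (n_trials, 22, n_freqs)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[..., mask].mean(axis=-1))           # average power in the band, per channel
    return np.concatenate(feats, axis=1)                     # (n_trials, 88)
```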

CSP implementation:

  • Protocol 1: Manual binary CSP (6 components, left hand vs right hand)
  • Protocols 1.5 and 2: MNE multiclass CSP with Ledoit-Wolf regularization (10 components, one-vs-rest)
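
For Protocols 1.5 and 2, the CSP features can be produced with MNE's CSP estimator and concatenated with the bandpower features. A minimal sketch, assuming epoched training/evaluation arrays X_train / X_eval, labels y_train, and the bandpower_features helper from the previous sketch (all names illustrative):

```python
import numpy as np
from mne.decoding import CSP

csp = CSP(n_components=10, reg="ledoit_wolf", log=True)      # Ledoit-Wolf regularized covariances
csp_train = csp.fit_transform(X_train, y_train)              # fitted on training trials only
csp_eval = csp.transform(X_eval)                             # (n_trials, 10) log-variance features

# Concatenation: 88 bandpower features + 10 CSP components = 98 features (P1.5, P2)
features_train = np.hstack([bandpower_features(X_train), csp_train])
features_eval = np.hstack([bandpower_features(X_eval), csp_eval])
```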

Evaluation Protocols

Three evaluation protocols of increasing difficulty were implemented:

Protocol 1: Per-Subject Classification

  • Train on session T (288 trials), evaluate on session E (288 trials), per subject
  • Produces 9 individual accuracies
  • Preprocessing, feature extraction, normalization, and PCA fitted per subject
  • PCA components: 14 to 21 depending on the subject

Protocol 1.5: Cross-Session Global Classification

  • All 9 training sessions T pooled (2592 trials) as training set
  • All 9 evaluation sessions E pooled (2592 trials) as test set
  • Single global model, PCA retains 13 components

Protocol 2: Global Classification with 80/20 Split

  • All 18 sessions (9T + 9E) merged (5184 trials)
  • Stratified random split: 80% training (4147), 20% test (1037)
  • PCA retains 12 components
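
The Protocol 2 split can be reproduced with scikit-learn's stratified split. A minimal sketch, with illustrative variable names and an assumed random seed:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    features, labels,
    test_size=0.20,        # 5184 trials -> roughly 4147 train / 1037 test
    stratify=labels,       # preserve the 4-class balance in both splits
    random_state=42,       # assumed seed, not taken from the notebooks
)
```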

WMVE: Weighted Majority Vote Ensemble

  • Protocol 1: Top 5 models per subject, weighted by CV accuracy
  • Protocol 1.5: Top 3 models globally (RF + SVM RBF + CNN)
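
The ensemble itself is a weighted vote: each selected model casts a vote for its predicted class, weighted by its cross-validation accuracy, and the class with the largest total weight wins. A minimal sketch, with illustrative names:

```python
import numpy as np

def wmve_predict(models, weights, X, n_classes=4):
    """Weighted majority vote over fitted classifiers; weights are CV accuracies."""
    scores = np.zeros((len(X), n_classes))
    for model, w in zip(models, weights):
        preds = model.predict(X)                  # class labels 1..4
        for i, p in enumerate(preds):
            scores[i, int(p) - 1] += w            # add the model's weight to its vote
    return scores.argmax(axis=1) + 1              # class with the highest total weight
```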

Models Implemented

Each model represents a distinct algorithmic family:

| Model | Family | Hyperparameter Grid |
|-------|--------|---------------------|
| Linear SVM | Kernel methods (linear) | C: {0.01, 0.1, 1, 10}; class_weight: {None, balanced} |
| RBF SVM | Kernel methods (non-linear) | C: {1, 10, 100}; gamma: {0.001, 0.01, 0.1, scale, auto}; class_weight |
| Random Forest | Tree-based ensemble | n_estimators: {100, 200, 300}; max_depth: {10, 20, None}; criterion: {gini, entropy} |
| AdaBoost | Boosting ensemble | n_estimators: {50, 100, 200, 300}; learning_rate: {0.01, 0.1, 0.5, 1.0}; max_depth: {1, 2, 3} |
| Naive Bayes | Probabilistic | var_smoothing: {1e-9, 1e-8, 1e-7, 1e-6, 1e-5} |
| KNN | Instance-based | n_neighbors: {3, 5, 7, 9, 11, 15, 17, 19, 21}; weights; metric |
| CNN (1D) | Deep learning | filters: {8, 16}; kernel_size: {3, 5}; dense_units: {16, 32}; batch_size: {16, 32} |

CNN architecture: Input -> Reshape -> Conv1D -> ReLU -> MaxPooling1D -> Flatten -> Dense -> ReLU -> Dense(4) -> Softmax (Adam optimizer, 50 epochs, EarlyStopping patience=10)
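
A minimal Keras sketch of this architecture, using one illustrative point from the hyperparameter grid (filters=16, kernel_size=3, dense_units=32); the exact notebook implementation may differ:

```python
import tensorflow as tf

def build_cnn(n_features=98, filters=16, kernel_size=3, dense_units=32):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Reshape((n_features, 1)),              # treat the feature vector as a 1D sequence
        tf.keras.layers.Conv1D(filters, kernel_size, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(dense_units, activation="relu"),
        tf.keras.layers.Dense(4, activation="softmax"),        # 4 motor imagery classes
    ])
    # sparse_categorical_crossentropy assumes labels encoded as 0-3
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

early_stop = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# model.fit(X_train, y_train - 1, epochs=50, batch_size=32, validation_split=0.2, callbacks=[early_stop])
```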

All models are optimized using GridSearchCV with 5-fold StratifiedKFold on training data only. Models are saved as .pkl (joblib) or .keras (CNN) files.
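
For example, the Linear SVM search could look like the following sketch (the grid values mirror the table above; variable names and the CV shuffle/seed are assumptions):

```python
import joblib
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

param_grid = {"C": [0.01, 0.1, 1, 10], "class_weight": [None, "balanced"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=cv, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)                  # training data only; the evaluation set stays untouched

print(search.best_params_, search.best_score_)
joblib.dump(search.best_estimator_, "svm_linear_best.pkl")    # illustrative file name
```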


Results Summary

Protocol 1: Per-Subject (Mean EVAL Accuracy)

| Rank | Model | CV (%) | EVAL (%) |
|------|-------|--------|----------|
| 1 | Linear SVM | 60.48 | 51.35 |
| 2 | SVM RBF | 58.21 | 48.42 |
| 3 | Random Forest | 56.52 | 48.34 |
| 4 | Naive Bayes | 52.93 | 47.53 |
| 5 | AdaBoost | 54.21 | 46.60 |
| 6 | KNN | 50.51 | 44.45 |
| 7 | CNN | 55.44 | 44.68 |
| -- | WMVE Top 5 | -- | 51.54 |

Protocol 1.5: Global T vs E

| Model | CV (%) | TEST (%) |
|-------|--------|----------|
| Random Forest | 42.63 | 38.81 |
| SVM RBF | 39.39 | 38.54 |
| CNN | 40.05 | 37.96 |
| WMVE Top 3 | -- | 40.16 |

Protocol 2: Global 80/20

| Model | CV (%) | TEST (%) |
|-------|--------|----------|
| CNN | 39.69 | 40.31 |
| Linear SVM | 32.53 | 31.05 |

Inter-Subject Variability (Protocol 1, Linear SVM)

  • Best: A08 = 65.62%
  • Worst: A02 = 34.38%, A05 = 32.29%
  • Chance level: 25.00%

Project Structure

Code_Project/
|
|-- DataSet/                                    # Raw EEG data (Google Drive only)
|   |-- All_Data/
|       |-- Train_Data/                         # A01T.gdf ... A09T.gdf
|       |-- Evaluation_Data/                    # A01E.gdf/mat ... A09E.gdf/mat
|
|-- Explore_DataSet/                            # Data exploration notebooks
|   |-- Explore.ipynb
|   |-- Read_Data_DataSet.ipynb
|
|-- Le pipeline commun - Pre-traitement/        # Preprocessing pipeline
|   |-- Traitement_Donnees_Preparation/
|   |   |-- Traitement_Donnees_Preparation.ipynb          # Protocol 1
|   |   |-- Traitement_Donnees_Preparation_Protocole_1_5.ipynb  # Protocol 1.5
|   |   |-- Traitement_Donnees_Preparation_Protocole_2.ipynb    # Protocol 2
|   |-- Data_Processed/                         # Protocol 1 (per subject A01-A09)
|   |-- Data_Processed_Protocole_1_5/           # Protocol 1.5 (global T vs E)
|   |-- Data_Processed_Protocole_2/             # Protocol 2 (global 80/20)
|
|-- Models/                                     # Training notebooks and saved models
|   |-- SVM - Lineaire - 2/                     # Linear SVM, Protocol 1
|   |-- SVM - RBF - 2/                          # RBF SVM, Protocol 1
|   |-- Random Forest/                          # Random Forest, Protocol 1
|   |-- Naive Bayes/                            # Naive Bayes, Protocol 1
|   |-- AdaBoost/                               # AdaBoost, Protocol 1
|   |-- KNN/                                    # KNN, Protocol 1
|   |-- CNN/                                    # CNN, Protocol 1
|   |-- SVM - Lineaire - 2 - Protocole 1_5/     # Linear SVM, Protocol 1.5
|   |-- SVM - RBF - 2 - Protocole 1_5/          # RBF SVM, Protocol 1.5
|   |-- Random Forest - Protocole 1_5/          # Random Forest, Protocol 1.5
|   |-- Naive Bayes - Protocole 1_5/            # Naive Bayes, Protocol 1.5
|   |-- AdaBoost - Protocole 1_5/               # AdaBoost, Protocol 1.5
|   |-- KNN - Protocole 1_5/                    # KNN, Protocol 1.5
|   |-- CNN - Protocole 1_5/                    # CNN, Protocol 1.5
|   |-- SVM - Lineaire - 2 - Protocole 2/       # Linear SVM, Protocol 2
|   |-- CNN - Protocole 2/                      # CNN, Protocol 2
|   |-- WMVE - Protocole 1/                     # WMVE Top 5, per subject
|   |-- WMVE - Protocole 1_5/                   # WMVE Top 3, global
|
|-- Test Models/                                # Prediction and testing notebooks
|   |-- SVM - Lineaire - 2/                     # SVM test on A08
|   |-- WMVE - Protocole 1/                     # WMVE test on A02, A08, A09
|   |-- WMVE - Protocole 1_5/                   # WMVE Top 3 test
|
|-- .gitignore
|-- README.md

Requirements

Python Environment

  • Python 3.9+
  • Jupyter Notebook (Anaconda recommended)

Core Dependencies

numpy>=1.21.0
scipy>=1.7.0
scikit-learn>=1.0.0
mne>=1.0.0
tensorflow>=2.10.0
scikeras>=0.9.0
matplotlib>=3.5.0
seaborn>=0.11.0
joblib>=1.1.0
pandas>=1.3.0

Installation and Setup

  1. Clone the repository:

git clone https://github.com/YouIsm1/Classification-of-Human-Brain-Signals.git
cd Classification-of-Human-Brain-Signals

  2. Create a conda environment (recommended) and install the dependencies (see Requirements):

conda create -n bci python=3.9
conda activate bci
pip install numpy scipy scikit-learn mne tensorflow scikeras matplotlib seaborn joblib pandas

  3. Download the dataset from Google Drive and place it in the DataSet/All_Data/ directory:

    • Training data: DataSet/All_Data/Train_Data/A01T.gdf ... A09T.gdf
    • Evaluation data: DataSet/All_Data/Evaluation_Data/A01E.gdf ... A09E.gdf with corresponding .mat label files

  4. Open Jupyter Notebook and navigate to the project directory.


Usage

Step 1: Preprocessing

Run the appropriate preprocessing notebook depending on the protocol:

  • Protocol 1: Le pipeline commun - Pre-traitement/Traitement_Donnees_Preparation/Traitement_Donnees_Preparation.ipynb
  • Protocol 1.5: Traitement_Donnees_Preparation_Protocole_1_5.ipynb
  • Protocol 2: Traitement_Donnees_Preparation_Protocole_2.ipynb

This produces the preprocessed .npy files in the corresponding Data_Processed/ directories.

Step 2: Training

Navigate to the desired model folder under Models/ and run the training notebook. Each notebook follows a consistent structure:

  1. Imports and path configuration
  2. Data loading from Data_Processed/
  3. Grid Search with 5-fold Stratified CV
  4. Evaluation on test data
  5. Results export (CSV, accuracy plots, confusion matrices)
  6. Model saving (.pkl or .keras)

Step 3: Ensemble (WMVE)

After training all individual models, run the WMVE notebooks:

  • Models/WMVE - Protocole 1/WMVE_Protocole_1_Top_5.ipynb
  • Models/WMVE - Protocole 1_5/WMVE_Protocole_1_5 - Ver 2 - Top_3.ipynb

Step 4: Testing

Use the notebooks in Test Models/ to run predictions on specific subjects.


Technical Notes

  • Data leakage prevention: StandardScaler, CSP, and PCA are fitted exclusively on training data and applied (transform only) to evaluation/test data; see the sketch after this list.
  • Memory management for CNN: tf.keras.backend.clear_session(), gc.collect(), and del best_model are called between subjects to free the memory accumulated by TensorFlow computation graphs.
  • Intermediate saving: Results are saved after each subject to enable recovery from crashes during long training sessions.
  • Absolute paths: Jupyter Notebook launches from C:\Users\user\, so all notebooks use absolute paths to the project directory.
  • Google Drive sync: The .gitignore file excludes DataSet/, *.npy, *.pkl, *.mat, *.gdf, *.keras, Data_Processed/, .ipynb_checkpoints/, and desktop.ini to avoid corrupting Git when the project folder is synced with Google Drive.
  • Training hardware: All training was performed on personal laptop CPUs without dedicated GPU access. CNN training required approximately 10 hours for Protocol 1 (9 subjects) and 5.6 hours for Protocol 2.
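
A minimal sketch combining the first two notes above (data leakage prevention and CNN memory management); the feature arrays and the build_cnn helper from the CNN sketch in the Models section are illustrative names:

```python
import gc
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Fit the scaler and PCA on training features only, then transform the evaluation set
scaler = StandardScaler().fit(features_train)
pca = PCA(n_components=0.95).fit(scaler.transform(features_train))   # keep 95% cumulative variance
X_train = pca.transform(scaler.transform(features_train))
X_eval = pca.transform(scaler.transform(features_eval))              # transform only, never refit

# Between per-subject CNN runs, release the model and TensorFlow's accumulated graph state
best_model = build_cnn(n_features=X_train.shape[1])
# ... model fitting and evaluation for the current subject ...
del best_model                               # drop the reference to the fitted model
tf.keras.backend.clear_session()             # clear TensorFlow's graph state
gc.collect()                                 # force Python garbage collection
```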

Authors

  • ELWAFI Youssef - GitHub
  • BAKAR Oussama
  • LAAFAR Abdellah
  • OUARRAK Aymen

Supervised by Pr. Nabil AZOUAGH

FSTM - Faculty of Sciences and Techniques of Mohammedia, Hassan II University of Casablanca
BSc in Data Science and Decision-Making Informatics (SDID), S6
Academic Year: 2025-2026


Resources


Project resources

[R1] Project source code (GitHub): ELWAFI Youssef (YouIsm1). Classification-of-Human-Brain-Signals. Public repository containing all the Jupyter notebooks (preprocessing, training of the 7 models under the 3 protocols, WMVE, tests), the configuration files, and the project documentation. Available at: https://github.com/YouIsm1/Classification-of-Human-Brain-Signals

[R2] Project data and models (Google Drive): Shared folder containing the large files excluded from GitHub: the raw BCI Competition IV 2a dataset (.gdf and .mat files), the preprocessed data (.npy files), the saved models (.pkl and .keras files), and PDF versions of all notebooks. Available at: https://drive.google.com/drive/folders/14GamxUlmzX3gRGbCMqZbqpGqCpEEkbUy

Dataset

[D1] Data sets 2a: "4-class motor imagery". Provided by the Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology (Clemens Brunner, Robert Leeb, Gernot Müller-Putz, Alois Schlögl, Gert Pfurtscheller). EEG, cued motor imagery (left hand, right hand, feet, tongue). 22 EEG channels (0.5-100 Hz; notch filtered), 3 EOG channels, 250 Hz sampling rate, 4 classes, 9 subjects. Available at: http://www.bbci.de/competition/iv/

[D2] Description of dataset 2a. Available at: https://www.bbci.de/competition/iv/desc_2a.pdf

[D3] True labels of the competition evaluation sets (dataset 2a). Available at: http://www.bbci.de/competition/iv/results/

References

[1] Brunner, C., Leeb, R., Müller-Putz, G., Schlögl, A., & Pfurtscheller, G. (2008). BCI Competition 2008 - Graz data set A. Institute for Knowledge Discovery, Graz University of Technology. Available at: https://www.bbci.de/competition/iv/desc_2a.pdf

[2] Pfurtscheller, G., & Neuper, C. (2001). Motor imagery and direct brain-computer communication. Proceedings of the IEEE, 89(7), 1123-1134. DOI: https://doi.org/10.1109/5.939829

[3] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. Available at: https://jmlr.org/papers/v12/pedregosa11a.html

[4] Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., ... & Hämäläinen, M. S. (2013). MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 7, 267. DOI: https://doi.org/10.3389/fnins.2013.00267

[5] Lotte, F., Bougrain, L., Cichocki, A., Clerc, M., Congedo, M., Rakotomamonjy, A., & Yger, F. (2018). A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update. Journal of Neural Engineering, 15(3), 031005. DOI: https://doi.org/10.1088/1741-2552/aab2f2

[6] Ramoser, H., Müller-Gerking, J., & Pfurtscheller, G. (2000). Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Transactions on Rehabilitation Engineering, 8(4), 441-446. DOI: https://doi.org/10.1109/86.895946

[7] Chollet, F. et al. (2015). Keras: Deep Learning for humans. Available at: https://keras.io/. TensorFlow documentation: https://www.tensorflow.org/api_docs

[8] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Zheng, X. (2016). TensorFlow: A System for Large-Scale Machine Learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 265-283. Available at: https://www.tensorflow.org/

[9] Garcia Badaracco, A. (2023). SciKeras: Scikit-Learn Wrapper for Keras. Available at: https://github.com/adriangb/scikeras

[10] Virtanen, P., Gommers, R., Oliphant, T. E., et al. (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17, 261-272. DOI: https://doi.org/10.1038/s41592-019-0686-2

[11] Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020). Array programming with NumPy. Nature, 585, 357-362. DOI: https://doi.org/10.1038/s41586-020-2649-2

[12] Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3), 90-95. DOI: https://doi.org/10.1109/MCSE.2007.55

[13] Waskom, M. (2021). Seaborn: Statistical Data Visualization. Available at: https://seaborn.pydata.org/

[14] Joblib: running Python functions as pipeline jobs. Available at: https://joblib.readthedocs.io/


License

This project is developed for academic purposes as part of the Machine Learning course at FSTM. The BCI Competition IV Dataset 2a is provided by Graz University of Technology for research and educational use.
