EEG-Based Motor Imagery Classification Using Machine Learning
A complete pipeline for classifying motor imagery EEG signals from the BCI Competition IV Dataset 2a (Graz Data Set A). This project implements and compares 7 machine learning models across 3 evaluation protocols, with an additional Weighted Majority Vote Ensemble (WMVE) method.
Developed as a Machine Learning mini-project at the Faculty of Sciences and Techniques of Mohammedia (FSTM), Hassan II University of Casablanca, under the supervision of Pr. Nabil Azouagh.
- Project Overview
- Dataset
- Pipeline Architecture
- Evaluation Protocols
- Models Implemented
- Results Summary
- Project Structure
- Requirements
- Installation and Setup
- Usage
- Technical Notes
- Authors
- References
- License
Brain-Computer Interfaces (BCI) enable direct communication between the human brain and external devices by interpreting neural signals. This project explores the classification of motor imagery intentions from EEG signals, addressing the fundamental question: can we reliably distinguish what a human brain imagines, and do all brains behave the same way?
The classification targets 4 motor imagery classes (chance level = 25%):
- Left hand (class 1)
- Right hand (class 2)
- Feet (class 3)
- Tongue (class 4)
Key findings:
- Linear SVM achieved the best mean accuracy of 51.35% on Protocol 1 (per-subject), more than double the chance level.
- Classical ML models outperformed deep learning (CNN) given the limited data (288 trials per subject).
- Inter-subject variability is the dominant challenge: subject A08 consistently reached 60-67%, while A02 and A05 remained near chance level.
- The progressive degradation from Protocol 1 (51.35%) to Protocol 1.5 (38.81%) to Protocol 2 (31.05%) quantifies the cost of cross-subject generalization.
BCI Competition IV Dataset 2a (Graz Data Set A)
- 9 subjects (A01 to A09)
- 2 sessions per subject: Training (T) and Evaluation (E), recorded on different days
- 288 trials per session (72 per class), 6 runs of 48 trials
- 22 EEG channels (10-20 international system), sampled at 250 Hz
- 3 EOG channels (excluded during preprocessing)
- File format: GDF (.gdf) with evaluation labels in MATLAB (.mat) files
- Cue-based paradigm: fixation cross (t=0s), beep (t=2s), visual cue (t=3s-6s), rest
The dataset is not included in this repository due to its size. It is available on our shared Google Drive (see Resources) or from the official source (see References).
The processing pipeline is shared across all models and protocols:
Raw EEG (.gdf)
|
v
[1] Loading and Channel Selection (22 EEG channels via MNE-Python)
|
v
[2] Bandpass Filtering (Butterworth IIR, 8-30 Hz, order 4)
|
v
[3] Epoching (4s windows from cue onset, 1000 samples at 250 Hz)
|
v
[4] Artifact Rejection (event 1023)
|
v
[5] Feature Extraction
|-- Bandpower: 4 bands x 22 channels = 88 features (Welch PSD)
|-- CSP: Common Spatial Patterns (6 or 10 components)
|-- Concatenation: 94 features (P1) or 98 features (P1.5, P2)
|
v
[6] Normalization (StandardScaler, fitted on training data only)
|
v
[7] Dimensionality Reduction (PCA at 95% cumulative variance)
|
v
[8] Classification (7 models with GridSearchCV, 5-fold Stratified CV)
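Steps [2] and [3] can be sketched with SciPy on synthetic data. The channel count, sampling rate, filter design, and window length follow the pipeline above; the signal and the cue positions are simulated for illustration (in the project, cue onsets come from the GDF event table via MNE):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250          # sampling rate (Hz)
N_CHANNELS = 22   # EEG channels after dropping the 3 EOG channels

# [2] 4th-order Butterworth bandpass, 8-30 Hz (mu + beta range)
sos = butter(4, [8, 30], btype="bandpass", fs=FS, output="sos")

# Simulated continuous recording: 60 s of 22-channel noise
rng = np.random.default_rng(0)
raw = rng.standard_normal((N_CHANNELS, 60 * FS))
filtered = sosfiltfilt(sos, raw, axis=1)   # zero-phase filtering

# [3] Epoching: one 4 s window (1000 samples) per cue onset
cue_onsets = [2 * FS, 10 * FS, 18 * FS]    # hypothetical cue sample indices
epochs = np.stack([filtered[:, t:t + 4 * FS] for t in cue_onsets])
print(epochs.shape)  # (3, 22, 1000): trials x channels x samples
```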
Frequency bands used for Bandpower extraction:
- Delta: 0.5-4 Hz
- Theta: 4-8 Hz
- Mu/Alpha: 8-13 Hz (sensorimotor rhythm)
- Beta: 13-30 Hz (motor activity)
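A minimal version of the Welch bandpower extraction over these four bands. Summing the PSD bins per band (here with 1 Hz resolution) is one common convention; the project notebooks may integrate differently:

```python
import numpy as np
from scipy.signal import welch

FS = 250
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "mu": (8, 13), "beta": (13, 30)}

def bandpower_features(epoch, fs=FS):
    """epoch: (n_channels, n_samples) -> (n_channels * n_bands,) feature vector."""
    freqs, psd = welch(epoch, fs=fs, nperseg=fs, axis=-1)  # 1 Hz bins
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        # sum PSD bins falling in the band (proportional to bandpower)
        feats.append(psd[:, mask].sum(axis=-1))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
epoch = rng.standard_normal((22, 1000))   # one trial: 22 channels x 4 s
features = bandpower_features(epoch)
print(features.shape)  # (88,) = 4 bands x 22 channels
```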
CSP implementation:
- Protocol 1: Manual binary CSP (6 components, left hand vs right hand)
- Protocols 1.5 and 2: MNE multiclass CSP with Ledoit-Wolf regularization (10 components, one-vs-rest)
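For Protocol 1, the manual binary CSP amounts to solving a generalized eigenvalue problem on the two class-averaged spatial covariance matrices. A compact NumPy/SciPy sketch on simulated trials (an illustration of the technique, not the project's exact implementation):

```python
import numpy as np
from scipy.linalg import eigh

def binary_csp(X_a, X_b, n_components=6):
    """X_a, X_b: (n_trials, n_channels, n_samples) for the two classes.
    Returns spatial filters W of shape (n_components, n_channels)."""
    def mean_cov(X):
        # average trace-normalized spatial covariance over trials
        covs = [x @ x.T / np.trace(x @ x.T) for x in X]
        return np.mean(covs, axis=0)

    C_a, C_b = mean_cov(X_a), mean_cov(X_b)
    # Generalized eigenvalue problem: C_a w = lambda (C_a + C_b) w
    vals, vecs = eigh(C_a, C_a + C_b)
    # Keep filters from both ends of the spectrum (most discriminative)
    order = np.argsort(vals)
    pick = np.r_[order[:n_components // 2], order[-n_components // 2:]]
    return vecs[:, pick].T

rng = np.random.default_rng(0)
X_left = rng.standard_normal((40, 22, 1000))   # simulated left-hand trials
X_right = rng.standard_normal((40, 22, 1000))  # simulated right-hand trials
W = binary_csp(X_left, X_right)
print(W.shape)  # (6, 22)
# Log-variance of the CSP-filtered signals is the usual feature
feats = np.log(np.var(W @ X_left[0], axis=1))
print(feats.shape)  # (6,)
```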
Three evaluation protocols of increasing difficulty were implemented:
Protocol 1 (within-subject):
- Train on session T (288 trials), evaluate on session E (288 trials), per subject
- Produces 9 individual accuracies
- Preprocessing, feature extraction, normalization, and PCA fitted per subject
- PCA components: 14 to 21 depending on the subject
Protocol 1.5 (cross-subject, pooled sessions):
- All 9 training sessions (T) pooled as the training set (2592 trials)
- All 9 evaluation sessions E pooled (2592 trials) as test set
- Single global model, PCA retains 13 components
Protocol 2 (global 80/20 split):
- All 18 sessions (9 T + 9 E) merged (5184 trials)
- Stratified random split: 80% training (4147), 20% test (1037)
- PCA retains 12 components
WMVE (Weighted Majority Vote Ensemble):
- Protocol 1: Top 5 models per subject, weighted by CV accuracy
- Protocol 1.5: Top 3 models globally (RF + SVM RBF + CNN)
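The weighted vote itself is simple: each model casts its predicted class with a weight equal to its CV accuracy, and the class with the largest total weight wins. A sketch with hypothetical predictions and weights:

```python
import numpy as np

def wmve(predictions, weights, n_classes=4):
    """predictions: (n_models, n_trials) hard labels in {0..n_classes-1}.
    weights: (n_models,), e.g. each model's CV accuracy."""
    n_models, n_trials = predictions.shape
    scores = np.zeros((n_trials, n_classes))
    for m in range(n_models):
        # each model adds its weight to the class it predicts, per trial
        scores[np.arange(n_trials), predictions[m]] += weights[m]
    return scores.argmax(axis=1)

# Three hypothetical models voting on four trials
preds = np.array([[0, 1, 2, 3],
                  [0, 1, 1, 3],
                  [1, 1, 2, 2]])
weights = np.array([0.60, 0.58, 0.55])   # illustrative CV accuracies
print(wmve(preds, weights))  # [0 1 2 3]
```

Note that ties and disagreements are resolved by total weight, not by count: two weak models can be outvoted by one strong one if their combined weight is lower.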
Each model represents a distinct algorithmic family:
| Model | Family | Hyperparameter Grid |
|---|---|---|
| Linear SVM | Kernel methods (linear) | C: {0.01, 0.1, 1, 10}, class_weight: {None, balanced} |
| RBF SVM | Kernel methods (non-linear) | C: {1, 10, 100}, gamma: {0.001, 0.01, 0.1, scale, auto}, class_weight |
| Random Forest | Tree-based ensemble | n_estimators: {100, 200, 300}, max_depth: {10, 20, None}, criterion: {gini, entropy} |
| AdaBoost | Boosting ensemble | n_estimators: {50, 100, 200, 300}, learning_rate: {0.01, 0.1, 0.5, 1.0}, max_depth: {1, 2, 3} |
| Naive Bayes | Probabilistic | var_smoothing: {1e-9, 1e-8, 1e-7, 1e-6, 1e-5} |
| KNN | Instance-based | n_neighbors: {3, 5, 7, 9, 11, 15, 17, 19, 21}, weights, metric |
| CNN (1D) | Deep learning | filters: {8, 16}, kernel_size: {3, 5}, dense_units: {16, 32}, batch_size: {16, 32} |
CNN architecture: Input -> Reshape -> Conv1D -> ReLU -> MaxPooling1D -> Flatten -> Dense -> ReLU -> Dense(4) -> Softmax (Adam optimizer, 50 epochs, EarlyStopping patience=10)
All models are optimized using GridSearchCV with 5-fold StratifiedKFold on training data only. Models are saved as .pkl (joblib) or .keras (CNN) files.
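The tuning loop is standard scikit-learn. A minimal sketch for the Linear SVM grid, with synthetic features standing in for one subject's preprocessed data (the filename in the last line is hypothetical):

```python
import joblib
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.standard_normal((288, 14))   # one subject's training features
y_train = rng.integers(1, 5, size=288)     # classes 1-4

param_grid = {"C": [0.01, 0.1, 1, 10],
              "class_weight": [None, "balanced"]}
grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid,
                    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
                    scoring="accuracy", n_jobs=-1)
grid.fit(X_train, y_train)   # fits on training data only
print(grid.best_params_, round(grid.best_score_, 3))

# Persist the refit best estimator, as the notebooks do with .pkl files
joblib.dump(grid.best_estimator_, "svm_linear_A01.pkl")
```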
Protocol 1 (mean over the 9 subjects):

| Rank | Model | CV (%) | EVAL (%) |
|---|---|---|---|
| 1 | Linear SVM | 60.48 | 51.35 |
| 2 | SVM RBF | 58.21 | 48.42 |
| 3 | Random Forest | 56.52 | 48.34 |
| 4 | Naive Bayes | 52.93 | 47.53 |
| 5 | AdaBoost | 54.21 | 46.60 |
| 6 | KNN | 50.51 | 44.45 |
| 7 | CNN | 55.44 | 44.68 |
| -- | WMVE Top 5 | -- | 51.54 |
Protocol 1.5:

| Model | CV (%) | TEST (%) |
|---|---|---|
| Random Forest | 42.63 | 38.81 |
| SVM RBF | 39.39 | 38.54 |
| CNN | 40.05 | 37.96 |
| WMVE Top 3 | -- | 40.16 |
Protocol 2:

| Model | CV (%) | TEST (%) |
|---|---|---|
| CNN | 39.69 | 40.31 |
| Linear SVM | 32.53 | 31.05 |
Inter-subject variability (Protocol 1):
- Best: A08 = 65.62%
- Worst: A02 = 34.38%, A05 = 32.29%
- Chance level: 25.00%
Code_Project/
|
|-- DataSet/ # Raw EEG data (Google Drive only)
| |-- All_Data/
| |-- Train_Data/ # A01T.gdf ... A09T.gdf
| |-- Evaluation_Data/ # A01E.gdf/mat ... A09E.gdf/mat
|
|-- Explore_DataSet/ # Data exploration notebooks
| |-- Explore.ipynb
| |-- Read_Data_DataSet.ipynb
|
|-- Le pipeline commun - Pre-traitement/ # Preprocessing pipeline
| |-- Traitement_Donnees_Preparation/
| | |-- Traitement_Donnees_Preparation.ipynb # Protocol 1
| | |-- Traitement_Donnees_Preparation_Protocole_1_5.ipynb # Protocol 1.5
| | |-- Traitement_Donnees_Preparation_Protocole_2.ipynb # Protocol 2
| |-- Data_Processed/ # Protocol 1 (per subject A01-A09)
| |-- Data_Processed_Protocole_1_5/ # Protocol 1.5 (global T vs E)
| |-- Data_Processed_Protocole_2/ # Protocol 2 (global 80/20)
|
|-- Models/ # Training notebooks and saved models
| |-- SVM - Lineaire - 2/ # Linear SVM, Protocol 1
| |-- SVM - RBF - 2/ # RBF SVM, Protocol 1
| |-- Random Forest/ # Random Forest, Protocol 1
| |-- Naive Bayes/ # Naive Bayes, Protocol 1
| |-- AdaBoost/ # AdaBoost, Protocol 1
| |-- KNN/ # KNN, Protocol 1
| |-- CNN/ # CNN, Protocol 1
| |-- SVM - Lineaire - 2 - Protocole 1_5/ # Linear SVM, Protocol 1.5
| |-- SVM - RBF - 2 - Protocole 1_5/ # RBF SVM, Protocol 1.5
| |-- Random Forest - Protocole 1_5/ # Random Forest, Protocol 1.5
| |-- Naive Bayes - Protocole 1_5/ # Naive Bayes, Protocol 1.5
| |-- AdaBoost - Protocole 1_5/ # AdaBoost, Protocol 1.5
| |-- KNN - Protocole 1_5/ # KNN, Protocol 1.5
| |-- CNN - Protocole 1_5/ # CNN, Protocol 1.5
| |-- SVM - Lineaire - 2 - Protocole 2/ # Linear SVM, Protocol 2
| |-- CNN - Protocole 2/ # CNN, Protocol 2
| |-- WMVE - Protocole 1/ # WMVE Top 5, per subject
| |-- WMVE - Protocole 1_5/ # WMVE Top 3, global
|
|-- Test Models/ # Prediction and testing notebooks
| |-- SVM - Lineaire - 2/ # SVM test on A08
| |-- WMVE - Protocole 1/ # WMVE test on A02, A08, A09
| |-- WMVE - Protocole 1_5/ # WMVE Top 3 test
|
|-- .gitignore
|-- README.md
- Python 3.9+
- Jupyter Notebook (Anaconda recommended)
numpy>=1.21.0
scipy>=1.7.0
scikit-learn>=1.0.0
mne>=1.0.0
tensorflow>=2.10.0
scikeras>=0.9.0
matplotlib>=3.5.0
seaborn>=0.11.0
joblib>=1.1.0
pandas>=1.3.0
```bash
# Clone the repository
git clone https://github.com/YouIsm1/Classification-of-Human-Brain-Signals.git
cd Classification-of-Human-Brain-Signals

# Create a conda environment (recommended)
conda create -n bci python=3.9
conda activate bci

# Install dependencies
pip install numpy scipy scikit-learn mne tensorflow scikeras matplotlib seaborn joblib pandas
```

Setup steps:

1. Clone this repository (commands above).
2. Download the dataset from Google Drive and place it in the `DataSet/All_Data/` directory:
   - Training data: `DataSet/All_Data/Train_Data/A01T.gdf` ... `A09T.gdf`
   - Evaluation data: `DataSet/All_Data/Evaluation_Data/A01E.gdf` ... `A09E.gdf`, with the corresponding `.mat` label files
3. Install the required Python packages (see Requirements).
4. Open Jupyter Notebook and navigate to the project directory.
Run the appropriate preprocessing notebook depending on the protocol:
- Protocol 1: `Le pipeline commun - Pre-traitement/Traitement_Donnees_Preparation/Traitement_Donnees_Preparation.ipynb`
- Protocol 1.5: `Traitement_Donnees_Preparation_Protocole_1_5.ipynb`
- Protocol 2: `Traitement_Donnees_Preparation_Protocole_2.ipynb`
This produces the preprocessed .npy files in the corresponding Data_Processed/ directories.
Navigate to the desired model folder under Models/ and run the training notebook. Each notebook follows a consistent structure:
- Imports and path configuration
- Data loading from `Data_Processed/`
- Grid search with 5-fold stratified CV
- Evaluation on test data
- Results export (CSV, accuracy plots, confusion matrices)
- Model saving (.pkl or .keras)
After training all individual models, run the WMVE notebooks:
- `Models/WMVE - Protocole 1/WMVE_Protocole_1_Top_5.ipynb`
- `Models/WMVE - Protocole 1_5/WMVE_Protocole_1_5 - Ver 2 - Top_3.ipynb`
Use the notebooks in Test Models/ to run predictions on specific subjects.
- Data leakage prevention: StandardScaler, CSP, and PCA are fitted exclusively on training data and applied (transform only) to evaluation/test data.
- Memory management for CNN: `tf.keras.backend.clear_session()`, `gc.collect()`, and `del best_model` are called between subjects to free the GPU/RAM accumulated by TensorFlow computation graphs.
- Intermediate saving: results are saved after each subject, so long training runs can recover from crashes.
- Absolute paths: Jupyter Notebook launches from `C:\Users\user\`, so all notebooks use absolute paths to the project directory.
- Google Drive sync: the `.gitignore` file excludes `DataSet/`, `*.npy`, `*.pkl`, `*.mat`, `*.gdf`, `*.keras`, `Data_Processed/`, `.ipynb_checkpoints/`, and `desktop.ini` to avoid corrupting Git when the project folder is synced with Google Drive.
- Training hardware: all training was performed on personal laptop CPUs without dedicated GPU access. CNN training took approximately 10 hours for Protocol 1 (9 subjects) and 5.6 hours for Protocol 2.
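The leakage rule in the first note can be sketched with scikit-learn: fit StandardScaler and PCA on the training session only, then apply the frozen transforms to the evaluation session (synthetic arrays stand in for the real feature matrices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.standard_normal((288, 94))   # session T features (Protocol 1)
X_eval = rng.standard_normal((288, 94))    # session E features

scaler = StandardScaler().fit(X_train)     # statistics from training data only
pca = PCA(n_components=0.95).fit(scaler.transform(X_train))  # 95% variance

# Evaluation data is only ever transformed, never fitted on
X_eval_ready = pca.transform(scaler.transform(X_eval))
print(X_eval_ready.shape)
```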
- ELWAFI Youssef - GitHub
- BAKAR Oussama
- LAAFAR Abdellah
- OUARRAK Aymen
Supervised by Pr. Nabil AZOUAGH
FSTM - Faculty of Sciences and Techniques of Mohammedia, Hassan II University of Casablanca
BSc in Data Science and Decision-Making Informatics (SDID), S6
Academic year: 2025-2026
- Source Code (GitHub)
- Dataset and Models (Google Drive)
[R1] Project source code (GitHub): ELWAFI Youssef (YouIsm1). Classification-of-Human-Brain-Signals. Public repository containing all the Jupyter notebooks (preprocessing, training of the 7 models under the 3 protocols, WMVE, tests), the configuration files, and the project documentation. Available at: https://github.com/YouIsm1/Classification-of-Human-Brain-Signals
[R2] Project data and models (Google Drive): Shared folder containing the large files excluded from GitHub: the raw BCI Competition IV 2a dataset (.gdf and .mat files), the preprocessed data (.npy files), the saved models (.pkl and .keras files), and PDF versions of all notebooks. Available at: https://drive.google.com/drive/folders/14GamxUlmzX3gRGbCMqZbqpGqCpEEkbUy
[D1] Data sets 2a: "4-class motor imagery". Provided by the Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology (Clemens Brunner, Robert Leeb, Gernot Muller-Putz, Alois Schlogl, Gert Pfurtscheller). EEG, cued motor imagery (left hand, right hand, feet, tongue). 22 EEG channels (0.5-100 Hz; notch filtered), 3 EOG channels, 250 Hz sampling rate, 4 classes, 9 subjects. Available at: http://www.bbci.de/competition/iv/
[D2] Description of dataset 2a. Available at: https://www.bbci.de/competition/iv/desc_2a.pdf
[D3] True labels of the competition evaluation sets - Data set 2a. Available at: http://www.bbci.de/competition/iv/results/
[1] Brunner, C., Leeb, R., Muller-Putz, G., Schlogl, A., & Pfurtscheller, G. (2008). BCI Competition 2008 - Graz data set A. Institute for Knowledge Discovery, Graz University of Technology. Available at: https://www.bbci.de/competition/iv/desc_2a.pdf
[2] Pfurtscheller, G., & Neuper, C. (2001). Motor imagery and direct brain-computer communication. Proceedings of the IEEE, 89(7), 1123-1134. DOI: https://doi.org/10.1109/5.939829
[3] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. Available at: https://jmlr.org/papers/v12/pedregosa11a.html
[4] Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., ... & Hamalainen, M. S. (2013). MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 7, 267. DOI: https://doi.org/10.3389/fnins.2013.00267
[5] Lotte, F., Bougrain, L., Cichocki, A., Clerc, M., Congedo, M., Rakotomamonjy, A., & Yger, F. (2018). A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update. Journal of Neural Engineering, 15(3), 031005. DOI: https://doi.org/10.1088/1741-2552/aab2f2
[6] Ramoser, H., Muller-Gerking, J., & Pfurtscheller, G. (2000). Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Transactions on Rehabilitation Engineering, 8(4), 441-446. DOI: https://doi.org/10.1109/86.895946
[7] Chollet, F. et al. (2015). Keras: Deep Learning for humans. Available at: https://keras.io/ (TensorFlow documentation: https://www.tensorflow.org/api_docs)
[8] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Zheng, X. (2016). TensorFlow: A System for Large-Scale Machine Learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 265-283. Available at: https://www.tensorflow.org/
[9] Garcia Badaracco, A. (2023). SciKeras: Scikit-Learn API wrapper for Keras. Available at: https://github.com/adriangb/scikeras
[10] Virtanen, P., Gommers, R., Oliphant, T. E., et al. (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17, 261-272. DOI: https://doi.org/10.1038/s41592-019-0686-2
[11] Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020). Array programming with NumPy. Nature, 585, 357-362. DOI: https://doi.org/10.1038/s41586-020-2649-2
[12] Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3), 90-95. DOI: https://doi.org/10.1109/MCSE.2007.55
[13] Waskom, M. (2021). Seaborn: Statistical Data Visualization. Available at: https://seaborn.pydata.org/
[14] Joblib: running Python functions as pipeline jobs. Available at: https://joblib.readthedocs.io/
This project is developed for academic purposes as part of the Machine Learning course at FSTM. The BCI Competition IV Dataset 2a is provided by Graz University of Technology for research and educational use.