This repository implements contactless Human Activity Recognition (HAR) from WiFi Channel State Information (CSI) using deep learning. Raw CSI signals are transformed into pseudocolor images, on which both 2D Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are evaluated.
- Objective: Recognize human activities without wearable sensors, using WiFi CSI data.
- Dataset: Custom multi-layout CSI dataset (5 and 15 activity classes).
- Input Representation: CSI amplitude and phase preprocessed into RGB pseudocolor images.
- Models:
  - Baseline: 2D CNN
  - Transformers: ViT-Tiny, ViT-Small, ViT-Base (from the `timm` library, pretrained on ImageNet; see the loading sketch below)
- Evaluation: Accuracy, Precision, Recall, F1-Score, with augmentation experiments.
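For orientation, here is a minimal sketch of how the `timm` backbones and the reported metrics could be wired up. `NUM_CLASSES` and the `evaluate` helper are illustrative assumptions, not the repository's exact training code:

```python
# Minimal sketch: creating the ImageNet-pretrained ViT variants via timm and
# computing accuracy/precision/recall/F1 with scikit-learn. Illustrative only.
import timm
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

NUM_CLASSES = 15  # or 5, depending on the dataset variant

# The three ViT backbones compared against the 2D CNN baseline.
backbones = {
    name: timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)
    for name in (
        "vit_tiny_patch16_224",
        "vit_small_patch16_224",
        "vit_base_patch16_224",
    )
}

@torch.no_grad()
def evaluate(model, loader):
    """Accuracy/precision/recall/F1 over a DataLoader of (image, label) batches."""
    model.eval()
    y_true, y_pred = [], []
    for images, labels in loader:
        y_pred.extend(model(images).argmax(dim=1).tolist())
        y_true.extend(labels.tolist())
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return acc, prec, rec, f1
```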
Preprocessing pipeline:

- Phase unwrapping to remove discontinuities.
- Gaussian smoothing to reduce noise.
- Outlier removal via ±3σ z-score thresholding, with linear interpolation over removed samples.
- Resampling to 256 temporal frames.
- Feature construction:
  - Log-scaled amplitude
  - Smoothed phase
  - Temporal gradient of amplitude
- RGB stacking and normalization → 256 × 64 × 3 pseudocolor image.
- Resizing to 224 × 224 × 3 for ViTs (a condensed sketch of the full pipeline follows).
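A condensed sketch of this pipeline, assuming a `(time, 64)` amplitude/phase layout, a helper named `csi_to_image`, and an OpenCV resize; all of these are illustrative assumptions rather than the repository's exact code:

```python
# Sketch of the CSI-to-pseudocolor-image pipeline described above.
# Shapes, sigma, and helper names are assumptions for illustration.
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import resample
import cv2  # only used for the final 224x224 resize

def csi_to_image(amplitude, phase, n_frames=256, sigma=2.0):
    """amplitude, phase: (time, 64) arrays of raw CSI per subcarrier."""
    # 1. Phase unwrapping along time to remove 2*pi discontinuities.
    phase = np.unwrap(phase, axis=0)

    # 2. Gaussian smoothing over time to reduce noise.
    amplitude = gaussian_filter1d(amplitude, sigma=sigma, axis=0)
    phase = gaussian_filter1d(phase, sigma=sigma, axis=0)

    # 3. Outlier removal: mask samples beyond +/-3 sigma per subcarrier,
    #    then linearly interpolate over the removed samples.
    z = (amplitude - amplitude.mean(axis=0)) / (amplitude.std(axis=0) + 1e-8)
    t = np.arange(len(amplitude))
    for c in range(amplitude.shape[1]):
        bad = np.abs(z[:, c]) > 3
        if bad.any():
            amplitude[bad, c] = np.interp(t[bad], t[~bad], amplitude[~bad, c])

    # 4. Resample both streams to a fixed 256 temporal frames.
    amplitude = resample(amplitude, n_frames, axis=0)
    phase = resample(phase, n_frames, axis=0)

    # 5. Feature channels: log amplitude, smoothed phase, temporal gradient.
    log_amp = np.log1p(np.abs(amplitude))
    grad_amp = np.gradient(log_amp, axis=0)

    # 6. Per-channel min-max normalization, stacked into a 256x64x3 image.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-8)
    img = np.stack([norm(log_amp), norm(phase), norm(grad_amp)], axis=-1)

    # 7. Resize to 224x224x3 for the ViT input resolution.
    return cv2.resize(img.astype(np.float32), (224, 224))
```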
Results:

- CNN Baseline: Provided solid performance on both public and custom datasets.
- Vision Transformers: Outperformed the CNN baseline under augmentation, with ViT-Base achieving the best overall accuracy.
- Cross-Layout Testing: Accuracy dropped (~23% for ViT, ~21–22% for CNN), highlighting the challenge of layout-invariant recognition; a hypothetical sketch of such a split follows.
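One common way to realize cross-layout testing is a leave-one-layout-out split: train on all but one room layout and test on the held-out one. The `layout` field and sample format below are assumptions, not the repository's actual data schema:

```python
# Hypothetical leave-one-layout-out split for cross-layout evaluation.
from collections import defaultdict

def cross_layout_splits(samples):
    """samples: list of dicts like {"image": ..., "label": ..., "layout": "room_a"}."""
    by_layout = defaultdict(list)
    for s in samples:
        by_layout[s["layout"]].append(s)
    # Yield one (held-out layout, train set, test set) triple per layout.
    for held_out, test_set in by_layout.items():
        train_set = [s for l, grp in by_layout.items() if l != held_out for s in grp]
        yield held_out, train_set, test_set
```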