A CNN-Transformer hybrid model for recognizing American Sign Language (ASL) finger-spelling gestures from surface electromyography (sEMG) signals. This project provides an alternative digital communication method for transradial amputees by translating forearm muscle signals into keyboard inputs.
This system captures electrical signals from five forearm muscles using a custom-built sEMG sensor array and classifies ASL gestures with >85% accuracy. The architecture employs five independent CNN modules (one per channel) to extract localized signal features, which are then processed by a shared Transformer to capture temporal dependencies across channels.
- Custom sEMG Hardware: High-gain instrumentation amplifier array (INA129) with ~10,000x gain for superior signal quality
- Hybrid CNN-Transformer Architecture: Combines spatial feature extraction with temporal sequence modeling
- Real-time Classification: Processes 0.5-second gesture windows at 1 kHz sampling rate
- Low Data Requirements: Efficient architecture trained on just 10,400 samples from 2 subjects
```
┌─────────────────────────────────────────────────┐
│              5-Channel sEMG Input               │
│           (500 samples × 5 channels)            │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────┬─────────┬─────────┬─────────┬─────────┐
│  CNN 1  │  CNN 2  │  CNN 3  │  CNN 4  │  CNN 5  │  ← Independent feature extraction
└────┬────┴────┬────┴────┬────┴────┬────┴────┬────┘
     │         │         │         │         │
     └─────────┴─────────┼─────────┴─────────┘
                         │
                         ▼
                ┌─────────────────┐
                │    Embedding    │
                └────────┬────────┘
                         │
                         ▼
                ┌─────────────────┐
                │   Transformer   │  ← Temporal dependency modeling
                │     Encoder     │
                └────────┬────────┘
                         │
                         ▼
                ┌─────────────────┐
                │  Mean Pooling   │
                └────────┬────────┘
                         │
                         ▼
                ┌─────────────────┐
                │ Linear + Softmax│  ← 26-class output (A-Z)
                └─────────────────┘
```
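The diagram above can be sketched in PyTorch. This is an illustrative skeleton of the per-channel-CNN + shared-Transformer design, not the project's actual `cnn_transformer.py`; the layer sizes (8 conv filters, `d_model=64`, 4 heads, 2 encoder layers) are assumptions for demonstration only.

```python
import torch
import torch.nn as nn

class CNNTransformerSketch(nn.Module):
    """Sketch of the architecture: one small CNN per sEMG channel,
    a shared Transformer encoder over the 5 channel tokens, mean
    pooling, and a 26-way linear classifier. Hyperparameters are
    illustrative assumptions, not the trained model's values."""

    def __init__(self, num_channels=5, d_model=64, num_classes=26):
        super().__init__()
        # Independent feature extraction: one 2-D CNN per spectrogram channel
        self.cnns = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            ) for _ in range(num_channels)
        ])
        self.embed = nn.Linear(8 * 4 * 4, d_model)           # Embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)          # Linear (+ softmax in loss)

    def forward(self, x):
        # x: (batch, 5, freq_bins, time_steps)
        tokens = [self.embed(cnn(x[:, i:i + 1])) for i, cnn in enumerate(self.cnns)]
        seq = self.encoder(torch.stack(tokens, dim=1))       # (batch, 5, d_model)
        return self.head(seq.mean(dim=1))                    # mean pooling -> logits

x = torch.randn(2, 5, 33, 16)          # dummy batch of 2 spectrogram stacks
logits = CNNTransformerSketch()(x)
print(logits.shape)                    # torch.Size([2, 26])
```

Keeping the CNNs independent lets each one specialize to its muscle site before the Transformer mixes information across channels.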
- DC Offset Removal – Center signals around zero
- Bandpass Filtering – Retain 20–500 Hz (EMG frequency range)
- Notch Filtering – Remove 60 Hz power line interference
- STFT Spectrogram – Convert time-domain signals to time-frequency representation
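The four steps above can be sketched with SciPy for a single 0.5 s, 1 kHz window. The filter orders, notch Q, and STFT window length are illustrative assumptions; note that at a 1 kHz sampling rate the nominal 500 Hz band edge sits exactly at Nyquist, so this sketch stops the passband slightly below it.

```python
import numpy as np
from scipy import signal

FS = 1000  # 1 kHz sampling rate, per the spec above

def preprocess(raw):
    """raw: (500,) array, one 0.5 s window from one sEMG channel.
    Returns a magnitude spectrogram (freq_bins x time_steps).
    Filter/STFT parameters are assumed, not the project's exact values."""
    x = raw - raw.mean()                                   # DC offset removal
    # Bandpass 20-450 Hz (450 instead of 500 to stay below Nyquist at fs=1 kHz)
    sos = signal.butter(4, [20, 450], btype="bandpass", fs=FS, output="sos")
    x = signal.sosfiltfilt(sos, x)
    # 60 Hz notch for power-line interference
    b, a = signal.iirnotch(60, Q=30, fs=FS)
    x = signal.filtfilt(b, a, x)
    # STFT -> time-frequency representation
    _, _, Z = signal.stft(x, fs=FS, nperseg=64, noverlap=48)
    return np.abs(Z)

spec = preprocess(np.random.randn(500))
print(spec.shape)  # (33, time_steps) with a 64-point window
```

Zero-phase filtering (`sosfiltfilt`/`filtfilt`) avoids introducing phase distortion into the short gesture windows.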
```
sr25/
├── src/
│   ├── cnn_transformer.py                 # Neural network architecture
│   ├── emg_dataset.py                     # PyTorch dataset class
│   ├── emg_sample.py                      # Signal processing utilities
│   ├── load_data.py                       # Data loading functions
│   └── main.py                            # Training script
├── paper.typ                              # Research paper (Typst format)
├── dataset-complex.npz                    # Processed training data
└── cnn_transformer_emg_model_full.pth     # Trained model weights
```
- 5× INA129 instrumentation amplifiers (gain ≈ 9,881)
- Teensy 4.1 microcontroller
- Surface electrodes for forearm placement
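The INA129's gain is set by a single external resistor via the datasheet relation G = 1 + 49.4 kΩ / R_G, so the stated gain of ≈9,881 corresponds to R_G ≈ 5 Ω (the resistor value here is inferred from the gain, not taken from the schematic):

```python
# INA129 gain equation (datasheet): G = 1 + 49.4 kOhm / R_G
R_G = 5.0                      # ohms; assumed value consistent with the stated gain
gain = 1 + 49.4e3 / R_G
print(round(gain))             # 9881
```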
| Compartment | Muscles | Function |
|---|---|---|
| Anterior | Flexor digitorum superficialis/profundus | Finger flexion |
| Anterior | Flexor carpi radialis/ulnaris | Wrist flexion, abduction, adduction |
| Anterior | Pronator teres/quadratus | Forearm pronation |
| Posterior | Extensor digitorum | Finger extension |
To train the model:

```shell
cd src
python main.py
```

To run inference with the trained weights:

```python
import torch
from src.cnn_transformer import CNNTransformerEMG

model = CNNTransformerEMG(num_classes=26)
model.load_state_dict(torch.load('cnn_transformer_emg_model_full.pth'))
model.eval()

# input_spectrogram: tensor of shape (batch, 5, freq_bins, time_steps)
with torch.no_grad():
    prediction = model(input_spectrogram)
gesture_idx = prediction.argmax(dim=1)
```

- Test Accuracy: >85% on an independent test set
- Primary Error Source: confusable gestures (e.g., 'T' vs. 'A', which differ only in thumb position)
- Dataset: 10,400 samples derived from 2,600 recordings (2 subjects)
If you use this work, please cite:
Ishaan Sen, "Myoelectric Human–Computer Interfaces for Below–Elbow Amputees"
Regeneron International Science and Engineering Fair 2025
This project is for research and educational purposes.