wav2vec-RF: Applying ASR to Raw Radio Signals Intercepted From Low Earth Orbit Satellites (Official Repo)
Full release coming soon.
The LibriIQ-Dwingeloo dataset contains RF observations of low Earth orbit (LEO) satellite transmissions in the ultra-high-frequency (UHF) band sourced from https://charon.camras.nl/public/satnogs/. Each observation comprises an RF IQ (in-phase, quadrature) signal sampled at 48 kHz. LibriIQ-Dwingeloo spans 44 distinct satellites, 7 modulation types, and 100 total observations.
For seamless integration into ASR-based architectures, LibriIQ-Dwingeloo is designed to mimic the LibriSpeech ASR corpus. Taking into account the 48 kHz sample rate, each RF IQ observation is split into a series of sequences, each containing approximately the same number of samples as the LibriSpeech sequences used in wav2vec2.
Download LibriIQ-Dwingeloo from https://www.kaggle.com/datasets/matthewphelps/libriiq-dwingeloo. Downloading requires a (free) Kaggle account. The download is a 5 GB archive.zip file.
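To make the chunking concrete: at 48 kHz, a sequence of shape (2, 240000) spans 240000 / 48000 = 5 s. The following sketch splits a (2, N) IQ observation into non-overlapping (2, 240000) chunks with NumPy. It is an illustration under assumptions, not the logic of create_dataset.py: the function name `split_observation` is hypothetical, and dropping the trailing partial chunk is a guess at how remainders are handled.

```python
import numpy as np

SAMPLE_RATE = 48_000  # Hz, per the dataset description
SEQ_LEN = 240_000     # samples per sequence (5 s at 48 kHz), matching the (2, 240000) arrays


def split_observation(iq: np.ndarray, seq_len: int = SEQ_LEN) -> np.ndarray:
    """Split a (2, N) I/Q observation into non-overlapping (2, seq_len) chunks.

    Hypothetical helper. Assumption: the trailing partial chunk is dropped;
    the actual create_dataset.py may handle remainders differently.
    Returns an array of shape (n_chunks, 2, seq_len).
    """
    if iq.ndim != 2 or iq.shape[0] != 2:
        raise ValueError("expected an array of shape (2, N)")
    n_chunks = iq.shape[1] // seq_len
    trimmed = iq[:, : n_chunks * seq_len]
    # (2, n_chunks * seq_len) -> (2, n_chunks, seq_len) -> (n_chunks, 2, seq_len)
    return np.ascontiguousarray(trimmed.reshape(2, n_chunks, seq_len).transpose(1, 0, 2))
```

For example, a 12.5 s observation (600000 samples per channel) yields two full 5 s chunks under this assumption.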
Clone the GitHub repo.
git clone https://github.com/phelps-matthew/wav2vec-rf.git
cd ./wav2vec-rf
Extract the zipped contents and partition the dataset, resulting in 28 GB of RF IQ sample sequences.
# requires python, numpy and tqdm
# alternatively, you can install the wav2vec-rf library as described in the wav2vec-rf Installation section
pip install numpy tqdm
# from repository directory
python ./libriiq_dwingeloo/create_dataset.py
Acronyms: SOI = signal of interest, AMC = automatic modulation classification, SEI = signal emitter identification.
LibriIQ-Dwingeloo contains 15240 RF IQ sequences, each 5 seconds long (240000 samples at 48 kHz). Of these, 6262 sequences contain the target SOI. For the SOI detection task, the soi_*.json files specify a 90/10 train/test split, followed by an 80/20 train/val split of the training portion. Due to dataset imbalance, four-way random stratified sub-sampling can be performed using the four provided seeds. Similarly, the cls_*.json files specify the train/val/test splits for performing AMC and SEI on the subset of sequences that contain the SOI.
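For readers unfamiliar with seeded stratified splitting, here is a generic sketch of the idea: each class is shuffled independently with a seeded NumPy generator and split 80/20, so class proportions are preserved in both partitions. It is not taken from the repo; `stratified_split` is a hypothetical helper, and the shipped soi_*.json and cls_*.json files should be used to reproduce the published splits. The seeds 123, 1337, 271 and 42 are the ones that appear in the split file names.

```python
import numpy as np


def stratified_split(labels, train_frac=0.8, seed=42):
    """Return sorted (train_idx, val_idx) arrays, splitting each class ~train_frac / (1 - train_frac).

    Hypothetical illustration of a seeded stratified split; not the repo's implementation.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # indices belonging to this class
        rng.shuffle(idx)                     # seeded, so the split is reproducible
        n_train = int(round(train_frac * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:])
    return np.sort(np.array(train, dtype=np.int64)), np.sort(np.array(val, dtype=np.int64))
```

Because each class is split separately, a minority class (such as sequences containing the SOI) keeps the same share in train and val.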
libriiq_dwingeloo/dwingeloo
├── samples # directory of numpy float32 RF IQ sequences of shape (2, 240000)
│   ├── iq_1452111_0000.npy
│   ├── ...
│   └── iq_6291503_0142.npy
├── annot.json # global annotation JSON containing all metadata for each sequence
├── cls_80_train_20_val_seed_123.json # SOI sequence paths of 80/20 train/val split, seed 123, used for SEI and AMC
├── cls_80_train_20_val_seed_1337.json
├── cls_80_train_20_val_seed_271.json
├── cls_80_train_20_val_seed_42.json
├── cls_map.json # mapping from class (e.g. satellite ID) to integer ID
├── cls_test_10.json # SOI sequence paths for held-out test set, used for SEI and AMC
├── cls_weights.json # class balance weights for sequences containing SOI
├── mode_map.json # mapping from modulation type to integer ID
├── soi_80_train_20_val_seed_123.json # sequence paths of 80/20 train/val split, seed 123, used for SOI task
├── soi_80_train_20_val_seed_1337.json
├── soi_80_train_20_val_seed_271.json
├── soi_80_train_20_val_seed_42.json
├── soi_cls_map.json # mapping from class (e.g. satellite ID) to integer ID, *including* null class
├── soi_map.json # mapping from signal / no signal to integer 0 or 1
├── soi_paths.json # sequence paths containing SOI
└── soi_test_10.json # sequence paths for held-out test set, used for SOI task
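A minimal sketch for loading one sequence from samples/ and viewing it as complex baseband samples. It assumes row 0 of each (2, 240000) array holds the in-phase (I) component and row 1 the quadrature (Q) component; that row order is an assumption to verify against the repo code. `load_iq` is a hypothetical helper, not part of the wav2vec-rf library.

```python
import numpy as np


def load_iq(path) -> np.ndarray:
    """Load a (2, 240000) float32 IQ sequence and return it as 240000 complex64 samples.

    Hypothetical helper. Assumption: row 0 is I (in-phase), row 1 is Q (quadrature).
    """
    arr = np.load(path)
    if arr.shape != (2, 240_000):
        raise ValueError(f"unexpected shape {arr.shape}")
    # Combine the two real-valued rows into one complex signal I + jQ.
    return (arr[0] + 1j * arr[1]).astype(np.complex64)
```

Usage, after running create_dataset.py: `iq = load_iq("libriiq_dwingeloo/dwingeloo/samples/iq_1452111_0000.npy")`.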
- Create a conda environment
conda create -n w2v-rf python=3.9 pip
conda activate w2v-rf
- Install PyTorch and dependencies. The repo uses mlflow for logging artifacts and metrics, and pyrallis for config management
pip install -U pip
# cuda version >= 11.0
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
# cuda version < 11.0
pip install torch torchvision
pip install mlflow pyrallis pandas tqdm pillow
- Install the repo (editable mode)
git clone https://github.com/phelps-matthew/wav2vec-rf.git
cd wav2vec-rf
pip install -e .
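After installation, a quick sanity check that the dependencies listed above are importable in the active environment can use the standard library's `importlib.util.find_spec`. This is a hypothetical helper, not part of the repo; note that pillow is imported under the module name `PIL`.

```python
import importlib.util

# Import names for the packages installed above (pillow is imported as PIL).
REQUIRED = ["torch", "torchvision", "mlflow", "pyrallis", "pandas", "tqdm", "PIL"]


def missing_packages(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("all dependencies found" if not missing else f"missing: {', '.join(missing)}")
```

To confirm GPU support, `torch.cuda.is_available()` should return True when a CUDA-enabled PyTorch build and a compatible driver are installed.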