wav2vec-RF: Applying ASR to Raw Radio Signals Intercepted From Low Earth Orbit Satellites (Official Repo)

📡 wav2vec-rf

Full release coming soon.

LibriIQ-Dwingeloo Dataset

The LibriIQ-Dwingeloo dataset contains RF observations of low Earth orbit (LEO) satellite transmissions in the ultra-high-frequency (UHF) band sourced from https://charon.camras.nl/public/satnogs/. Each observation comprises an RF IQ (in-phase, quadrature) signal sampled at 48 kHz. LibriIQ-Dwingeloo spans 44 distinct satellites, 7 modulation types, and 100 total observations.

For seamless integration into ASR-based architectures, LibriIQ-Dwingeloo is designed to mimic the LibriSpeech ASR corpus. Given the 48 kHz sample rate, each RF IQ observation is split into a series of sequences, each with approximately the same number of samples as the LibriSpeech sequences used in wav2vec2.
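The splitting step described above can be sketched as follows. This is a minimal illustration, not the repository's `create_dataset.py`: the function name `split_observation` is hypothetical, the sequence length of 240000 samples comes from the 5-second duration and `(2, 240000)` shape stated in the Format section, and dropping the trailing partial sequence is an assumption.

```python
import numpy as np

SAMPLE_RATE_HZ = 48_000   # sample rate stated in the dataset description
SEQUENCE_SECONDS = 5      # sequence duration stated in the Format section
SEQUENCE_LEN = SAMPLE_RATE_HZ * SEQUENCE_SECONDS  # 240000 samples per sequence


def split_observation(iq: np.ndarray, seq_len: int = SEQUENCE_LEN) -> np.ndarray:
    """Split a (2, N) IQ array into an array of shape (num_sequences, 2, seq_len).

    Assumption: trailing samples that do not fill a whole sequence are dropped.
    """
    if iq.ndim != 2 or iq.shape[0] != 2:
        raise ValueError("expected an array of shape (2, N)")
    num_sequences = iq.shape[1] // seq_len
    trimmed = iq[:, : num_sequences * seq_len]
    # (2, num_sequences, seq_len) -> (num_sequences, 2, seq_len)
    return trimmed.reshape(2, num_sequences, seq_len).transpose(1, 0, 2)
```

For example, a 60-second observation at 48 kHz holds 2,880,000 samples per channel and would yield 12 sequences of 240000 samples each.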

Download

Download LibriIQ-Dwingeloo from https://www.kaggle.com/datasets/matthewphelps/libriiq-dwingeloo. This requires a (free) Kaggle account. The download is a 5 GB archive.zip file.

Create

Clone the GitHub repo.

git clone https://github.com/phelps-matthew/wav2vec-rf.git
cd ./wav2vec-rf

Extract the zipped contents and partition the dataset. This produces 28 GB of RF IQ sample sequences.

# requires python, numpy and tqdm
# alternatively, install the wav2vec-rf library as described under "Install wav2vec-rf"
pip install numpy tqdm

# from repository directory
python ./libriiq_dwingeloo/create_dataset.py

Format

Acronyms: SOI = signal of interest, AMC = automatic modulation classification, SEI = signal emitter identification.

LibriIQ-Dwingeloo contains 15240 RF IQ sequences, each with a duration of 5 seconds. Among these, 6262 sequences contain the target SOI. The soi_*.json files specify a 90/10 train/test split followed by an 80/20 train/val split for the task of SOI detection. Due to dataset imbalance, four-way random stratified sub-sampling can be performed using the provided seeds. Similarly, the cls_*.json files specify the train/val/test splits for performing AMC and SEI on the subset of sequences that contain the SOI.
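To make the split scheme concrete, here is a hedged sketch of a stratified 90/10 train/test split followed by a seeded 80/20 train/val split. It is not the procedure used to generate the provided JSON files, which remain the reference splits; the function names and the `test_seed` parameter are hypothetical, and holding the test split fixed while the train/val seed varies is an assumption based on the single `*_test_10.json` file per task.

```python
import numpy as np


def stratified_split(labels, holdout_frac, seed):
    """Return (keep_idx, holdout_idx), holding out holdout_frac of each class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    keep, holdout = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the holdout share.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_holdout = int(round(holdout_frac * len(idx)))
        holdout.extend(idx[:n_holdout])
        keep.extend(idx[n_holdout:])
    return np.sort(np.array(keep, dtype=int)), np.sort(np.array(holdout, dtype=int))


def train_val_test_split(labels, seed, test_seed=0):
    """90/10 train/test split, then an 80/20 train/val split of the remaining 90%."""
    labels = np.asarray(labels)
    trainval_idx, test_idx = stratified_split(labels, 0.10, test_seed)
    train_rel, val_rel = stratified_split(labels[trainval_idx], 0.20, seed)
    # Map positions within the train/val subset back to original indices.
    return trainval_idx[train_rel], trainval_idx[val_rel], test_idx
```

Calling `train_val_test_split(labels, seed=123)` and `train_val_test_split(labels, seed=42)` gives the same test indices but different train/val partitions, which mirrors the way the seeded files are named.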

libriiq_dwingeloo/dwingeloo
├── samples                             # directory of numpy float32 RF IQ sequences of shape (2, 240000)
│   ├── iq_1452111_0000.npy
│   ├── ...
│   └── iq_6291503_0142.npy
├── annot.json                          # global annotation json containing all metadata for each sequence
├── cls_80_train_20_val_seed_123.json   # SOI sequence paths of 80/20 train/val split, seed 123, used for SEI and AMC
├── cls_80_train_20_val_seed_1337.json
├── cls_80_train_20_val_seed_271.json
├── cls_80_train_20_val_seed_42.json
├── cls_map.json                        # mapping from class (e.g. satellite ID) to integer ID
├── cls_test_10.json                    # SOI sequence paths for held-out test set, used for SEI and AMC
├── cls_weights.json                    # class balance weights for sequences containing SOI
├── mode_map.json                       # mapping from modulation type to integer ID
├── soi_80_train_20_val_seed_123.json   # sequence paths of 80/20 train/val split, seed 123, used for SOI task
├── soi_80_train_20_val_seed_1337.json
├── soi_80_train_20_val_seed_271.json
├── soi_80_train_20_val_seed_42.json
├── soi_cls_map.json                    # mapping from class (e.g. Satellite ID) to integer ID, *including* null class
├── soi_map.json                        # mapping from signal, no signal to integer 0 or 1
├── soi_paths.json                      # sequence paths containing SOI
└── soi_test_10.json                    # sequence paths for held-out test set, used for SOI task
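As a rough illustration of working with this layout, the sketch below loads one sequence into a complex vector and inspects a JSON file without assuming its schema. The helper names `load_iq` and `describe_json` are hypothetical, and the ordering of the two rows (row 0 as in-phase, row 1 as quadrature) is an assumption that should be checked against `create_dataset.py`.

```python
import json

import numpy as np


def load_iq(path):
    """Load one (2, N) float32 sequence and return it as a complex64 vector."""
    iq = np.load(path)
    if iq.ndim != 2 or iq.shape[0] != 2:
        raise ValueError(f"expected shape (2, N), got {iq.shape}")
    # Assumption: row 0 holds in-phase (I) samples, row 1 holds quadrature (Q).
    return (iq[0] + 1j * iq[1]).astype(np.complex64)


def describe_json(path):
    """Return the top-level type name and size of a JSON file (schema not assumed)."""
    with open(path) as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (dict, list)) else None
    return type(data).__name__, size
```

After running `create_dataset.py`, something like `load_iq("libriiq_dwingeloo/dwingeloo/samples/iq_1452111_0000.npy")` should return 240000 complex samples, and `describe_json` can be pointed at `annot.json` or any split file to see how it is organized before writing a data loader.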

Install wav2vec-rf

  • Create conda environment
conda create -n w2v-rf python=3.9 pip
conda activate w2v-rf
  • Install torch and dependencies. The project uses mlflow for logging artifacts and metrics, and pyrallis for config management.
pip install -U pip

# cuda version >= 11.0
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
# cuda version < 11.0
pip install torch torchvision

pip install mlflow pyrallis pandas tqdm pillow
  • Install repo
git clone https://github.com/phelps-matthew/wav2vec-rf.git
cd wav2vec-rf
pip install -e .
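After the steps above, a quick check that the listed dependencies are importable can save a failed run later. This is only a convenience sketch using the standard library; the helper name `missing_packages` is hypothetical, and the list contains the import names of the packages installed above (pillow is imported as `PIL`).

```python
from importlib.util import find_spec

# Import names for the packages installed above; pillow is imported as PIL.
REQUIRED = ["torch", "torchvision", "mlflow", "pyrallis", "pandas", "tqdm", "PIL"]


def missing_packages(names=REQUIRED):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("all dependencies found" if not missing else f"missing: {missing}")
```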
