Awesome Python Audio and Music 🎵

A curated list of Python tools, libraries, and resources for audio and music processing, analysis, synthesis, and playback.

Audio Processing & I/O

audioread: Cross-library audio decoding (GStreamer + Core Audio + MAD + FFmpeg)
audiomentations: Audio data augmentation library for machine learning
babycat: Audio manipulation library for Rust, Python, WebAssembly, and C
matchering: Open source audio matching and mastering
Matchering-cli: Command line application for Matchering 2.0
noisereduce: Noise reduction using spectral gating
numpy & scipy.io.wavfile: Read/write and manipulate WAV files
pedalboard: Spotify's library for audio effects and processing
PyDub: Manipulate audio with a simple and easy high level interface
pyAudioProcessing: Audio feature extraction, classification, and segmentation
SoundDevice: Play and record audio with Python
soundfile: Read and write sound files using libsndfile
torch-audiomentations: Fast GPU audio data augmentation for PyTorch
torchaudio: Audio data manipulation and transformation powered by PyTorch
wave: Read and write WAV files (Python standard library)
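As a quick orientation for the I/O entries above, here is a minimal sketch that uses only the standard-library wave module to write a short sine tone and read its header back. The helper names (write_sine_wav, read_wav_info), the 440 Hz tone, and the 8000 Hz sample rate are illustrative assumptions, not part of any listed project.

```python
import math
import os
import struct
import tempfile
import wave


def write_sine_wav(path, freq_hz=440.0, seconds=0.5, sample_rate=8000):
    """Write a mono 16-bit PCM sine tone to `path`; return the frame count."""
    n_frames = int(seconds * sample_rate)
    # Half of full scale to leave headroom; values fit in a signed 16-bit int.
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_frames)
    )
    payload = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(payload)
    return n_frames


def read_wav_info(path):
    """Return (channels, sample width in bytes, sample rate, frame count)."""
    with wave.open(path, "rb") as wf:
        return (wf.getnchannels(), wf.getsampwidth(),
                wf.getframerate(), wf.getnframes())


# Hypothetical demo file in a fresh temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
frames_written = write_sine_wav(demo_path)
info = read_wav_info(demo_path)
```

The wave module handles uncompressed PCM; for other formats or array-based workflows, soundfile or scipy.io.wavfile from the list above are the more likely fit.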

Analysis & Feature Extraction

aubio: Library for audio and music analysis including pitch and beat detection
audioFlux: Library for audio and music analysis and feature extraction
Essentia: C++ library with Python bindings for audio analysis and MIR
librosa: Python package for music and audio analysis
Madmom: Audio signal processing library focused on MIR tasks
mir_eval: Evaluation functions for MIR and audio signal processing algorithms
mirdata: Python library for working with MIR datasets
nnAudio: GPU audio processing using PyTorch neural networks
pyAudioAnalysis: Audio feature extraction, classification, segmentation, and visualization
Pyo: Python DSP module with synthesis and analysis capabilities
scipy.signal: Signal processing routines for SciPy
timeside: Framework for audio analysis, imaging, transcoding, and streaming

Audio Embeddings & Representations

CLAP (LAION): Contrastive Language-Audio Pretraining for zero-shot audio classification
CLAP (Microsoft): Learning audio concepts from natural language supervision
OpenL3: Open-source deep audio and image embeddings
panns-inference: Pretrained audio neural networks for audio tagging and sound event detection
wav2vec2: Self-supervised speech representations from Facebook AI

Speech Processing

Speech-to-Text

Whisper: OpenAI's robust multilingual speech recognition model
faster-whisper: CTranslate2 reimplementation of Whisper, up to 4x faster
WhisperX: Whisper with word-level timestamps and speaker diarization
SpeechRecognition: Library for performing speech recognition with multiple backends
Vosk: Offline speech recognition API supporting 20+ languages
SpeechBrain: PyTorch toolkit for speech processing and conversational AI
pyannote-audio: Neural speaker diarization and voice activity detection

Text-to-Speech

Coqui TTS: Deep learning toolkit for Text-to-Speech
Bark: Transformer-based text-to-audio model with emotions and non-speech sounds
pyttsx3: Offline text-to-speech conversion library

Source Separation

Demucs: State-of-the-art music source separation from Meta
audio-separator: Easy stem separation using MDX-Net, VR Arch, and Demucs models
Asteroid: PyTorch-based audio source separation toolkit for researchers
pydsm: Google's toolkit for sound separation using deep learning
Spleeter: Deezer source separation library (note: Demucs now preferred)

Music Transcription & Pitch

basic-pitch: Spotify's lightweight neural network for polyphonic pitch detection
CREPE: Monophonic pitch tracker using deep convolutional neural network
torchcrepe: PyTorch implementation of CREPE pitch tracker
MT3: Multi-instrument automatic music transcription from Google Magenta
piano_transcription_inference: High-resolution piano transcription with pedal detection

Music Generation & AI

AudioCraft: Meta's library for the MusicGen, AudioGen, EnCodec, and MAGNeT models
Stable Audio Tools: Generative models for conditional audio generation from Stability AI
Riffusion: Real-time music generation using Stable Diffusion on spectrograms
Magenta: Google's machine learning for music and art generation
musicautobot: Music generation with transformers using fastai
NSynth: Neural audio synthesis model from Magenta

Synthesis & Sound Design

ctcsound: Python bindings for Csound using ctypes
Mido: MIDI objects for Python
Pippi: Computer music composition library
pyfluidsynth: Python bindings for FluidSynth software synthesizer
Python-audio: Jupyter notebooks about audio signal processing with Python
Renardo: Maintained fork of FoxDot for Python live coding music
sc3nb: SuperCollider integration for Python and Jupyter notebooks

Music Theory & Composition

Abjad: Python API for building LilyPond music notation files
AthenaCL: Algorithmic composition tool (Python 3 fork)
maelzel: Framework for computer music in Python
MIDIUtil: Pure Python library for creating multi-track MIDI files
mingus: Advanced music theory and notation package
music21: Toolkit for computer-aided musical analysis
MusPy: Toolkit for symbolic music generation
pretty-midi: MIDI data handling and manipulation library
pychord: Handle and transform musical chords
scamp: Suite for Computer-Assisted Music in Python

Playback & Services

audiostream: Audio API for streaming raw data to speakers
beets: Music library manager and MusicBrainz tagger
discord.py: Python wrapper for Discord API with music streaming
freesound-python: Freesound API wrapper for audio retrieval and analysis
miniaudio: Python bindings for miniaudio audio playback library
Mopidy: Extensible music server written in Python
Mopidy-YouTube: Mopidy extension for playing music from YouTube
mpv: Python interface to MPV media player
MusicBot: Discord music bot written in Python
pyAV: Pythonic bindings for FFmpeg libraries
pygame.mixer: Pygame module for sound loading and playback
pyglet: Cross-platform windowing and multimedia library
pyradio: Command line internet radio player
Spotipy: Python client for the Spotify Web API

Datasets

Audio

AudioSet: Large-scale dataset of manually annotated audio events
Birdsong: Dataset of annotated bird songs and calls
Common Voice: Mozilla's open source multilingual speech dataset
ESC-50: Environmental sound classification dataset
Free Spoken Digit Dataset: Dataset of spoken digits in English
Freesound Dataset: Collaborative dataset of audio samples from Freesound
LibriSpeech: Large corpus of read English speech for ASR research
RAVDESS: Audio-visual dataset of emotional speech and song
Speech Commands: Dataset for speech command recognition
TIDIGITS: Spoken digit dataset for speech recognition
UrbanSound8K: 8,732 labeled urban sound excerpts in 10 classes
VCTK: Multispeaker speech dataset for voice technologies
VoxCeleb: Large-scale speaker identification dataset

Music

Beatport EDM Key: Electronic dance music tracks with musical key labels
DALI: Dataset of lyrics and audio with time alignments
DEAM: MediaEval dataset for music emotion recognition
FMA: Free Music Archive dataset for music analysis
GiantMIDI-Piano: Large-scale MIDI dataset of classical piano music
hsmusic: Huge symbolic music dataset
IRMAS: Instrument recognition in musical audio signals
Jamendo Audio Tagging: Multi-label audio tagging dataset
LAION-Audio-630K: Large collection of audio-text pairs for CLAP training
MAESTRO: MIDI and audio dataset for music transcription and generation
MagnaTagATune: Dataset for music annotation and audio tagging
MedleyDB: Dataset for multi-track mixing research
Musdb18: Dataset for music source separation
MusicCaps: Dataset of music clips with rich text descriptions
MusicNet: Dataset of classical music with instrument labels
NSynth: Large-scale dataset of annotated musical notes
Open MIC: Open Music Instrument Classification dataset
RWC Music Database: Musical instrument sound, genre, and rhythm databases
symbolic-music-datasets: Collection of symbolic music datasets
The Million Song Dataset: Massive collection of audio features and metadata

Tutorials

librosa tutorial - Introduction: Advanced librosa tutorial covering spectrograms and remixing
librosa tutorial - Visualization: Visualizing sounds using librosa and matplotlib
PyDub tutorial: Working with WAV files using PyDub
Whisper tutorial: Using OpenAI Whisper for speech-to-text
AudioCraft tutorial: Getting started with MusicGen and AudioGen
Hugging Face Audio Course: Comprehensive course on audio ML with transformers
