Marble is a modular, configuration-driven suite for training, evaluating, and performing inference on state-of-the-art pretrained music models. It leverages LightningCLI to provide easy extensibility and reproducibility.
- Join us on the MIREX Discord!
- 2025-06-04: MARBLE v2 is now published on the `main` branch! You can find the old version in the `main-v1-archived` branch.
- Modularity: Each component (encoders, tasks, transforms, decoders) is isolated behind a common interface. You can mix and match components without touching core logic.
- Configurability: All experiments are driven by YAML configs. No code changes are needed to switch datasets, encoders, or training settings.
- Reusability: Common routines (data loading, training loop, metrics) are implemented once in `BaseTask`, `LightningDataModule`, and shared modules.
- Extensibility: Adding new encoders or tasks requires implementing a small subclass and registering it via a config.
```
┌───────────────────┐
│    DataModule     │  yields (waveform, label, path), optional audio transforms
└─────────┬─────────┘
          │ waveform
          ▼
┌───────────────────┐
│      Encoder      │  → hidden_states [B, L, T, H]
└─────────┬─────────┘
          │ embedding transforms (optional)
          │ (LayerSelector, TimeAvgPool, …)
          ▼
┌───────────────────┐
│    Decoder(s)     │
└─────────┬─────────┘
          │ logits
          ▼
 Loss · Metrics · Callbacks
```
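The shape bookkeeping in this pipeline can be sketched in plain Python. This is a toy illustration only: nested lists stand in for tensors, and the real `LayerSelector`/`TimeAvgPool` modules operate on torch tensors inside Marble.

```python
# Toy illustration of the embedding flow: hidden_states [B, L, T, H]
# -> select one layer -> [B, T, H] -> average over time -> [B, H].
# Nested lists stand in for tensors here.

def layer_select(hidden_states, layer):
    """Pick a single encoder layer: [B][L][T][H] -> [B][T][H]."""
    return [sample[layer] for sample in hidden_states]

def time_avg_pool(features):
    """Average over the time axis: [B][T][H] -> [B][H]."""
    pooled = []
    for sample in features:
        n_frames, hidden = len(sample), len(sample[0])
        pooled.append([sum(f[h] for f in sample) / n_frames for h in range(hidden)])
    return pooled

# B=1 sample, L=2 layers, T=2 frames, H=2 dims
hidden_states = [[[[1.0, 2.0], [3.0, 4.0]],   # layer 0
                  [[5.0, 6.0], [7.0, 8.0]]]]  # layer 1
emb = time_avg_pool(layer_select(hidden_states, layer=1))
print(emb)  # [[6.0, 7.0]]
```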
- Install dependencies:

  ```bash
  # 1. create a new conda env
  conda create -n marble python=3.10 -y
  conda activate marble

  # 2. install ffmpeg
  conda install -c conda-forge ffmpeg -y

  # 3. now install other dependencies
  pip install -e .

  # 4. [Optional] downgrade pip to 24.0 if you are using fairseq modules
  # pip install pip==24.0
  # pip install fairseq

  # some encoders (e.g. Xcodec) may require additional dependencies,
  # see marble/encoders/*/requirements.txt
  ```

- Prepare data:

  ```bash
  python download.py all
  ```

- Configure: Copy an existing YAML from `configs/` and edit paths, encoder settings, transforms, and task parameters.

- Run:

  ```bash
  python cli.py fit  --config configs/probe.MERT-v1-95M.GTZANGenre.yaml
  python cli.py test --config configs/probe.MERT-v1-95M.GTZANGenre.yaml
  ```

- Results: Checkpoints and logs are saved under `output/` and logged to Weights & Biases.

- Inference: We provide scripts for inference with pretrained models. See the Inference SOTA SSL MIR models section below.
We are collaborating with MIREX to introduce state-of-the-art SSL-based models for Music Information Retrieval (MIR). We believe the future of MIR lies in self-supervised learning (SSL): acquiring labeled data for MIR is costly, which makes fully supervised paradigms expensive, while computational cost keeps decreasing and will eventually become more affordable than manual labeling.
The `sota/predict_key.py` script performs key prediction on audio files using a pretrained model. It automatically downloads the model from Hugging Face if necessary, processes audio clips in batches, and saves the predictions (key and confidence) to a JSONL file. To run it:

```bash
python sota/predict_key.py \
    --filelist_path <filelist> \
    --output_path <output> \
    --batch_size 16 \
    --download_dir <dir>

# You may reproduce the training/testing (if you have access to the
# corresponding data) by running:
# bash sota/reproduce_key_sota_20250618.sh
```
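The JSONL output can then be consumed with a few lines of standard-library Python. This is a sketch: the field names below (`audio_path`, `key`, `confidence`) are assumptions, so inspect one line of your actual output before relying on them.

```python
import json

def load_predictions(jsonl_path):
    """Read one prediction dict per line from a JSONL file."""
    with open(jsonl_path) as f:
        return [json.loads(line) for line in f if line.strip()]

# A record might look like this (field names are a guess -- check the
# real output file produced by sota/predict_key.py):
sample_line = '{"audio_path": "song.wav", "key": "C major", "confidence": 0.93}'
record = json.loads(sample_line)
print(record["key"], record["confidence"])  # C major 0.93
```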
```
├── marble/               # Core code package
│   ├── core/             # Base classes (BaseTask, BaseEncoder, BaseTransform)
│   ├── encoders/         # Wrapper classes for various SSL encoders
│   ├── modules/          # Shared transforms, callbacks, losses, decoders
│   ├── tasks/            # Downstream tasks (probe, few-shot, datamodules)
│   └── utils/            # IO utilities, instantiation helpers
├── cli.py                # Entry point for launching experiments
├── sota/                 # Scripts for state-of-the-art models and inference
├── configs/              # Experiment configs (YAML)
├── data/                 # Datasets and metadata files
├── scripts/              # Run scripts & utilities
├── tests/                # Unit tests for transforms & datasets
├── pyproject.toml        # Python project metadata
└── README.md             # This file
```
See `marble/encoders/` for available encoders, and `marble/tasks/` for available tasks.
Marble supports two flexible extension modes for encoders:
- Implement your encoder under `marble/encoders/`:

  ```python
  # marble/encoders/my_encoder.py
  from marble.core.base_encoder import BaseAudioEncoder

  class MyEncoder(BaseAudioEncoder):
      def __init__(self, arg1, arg2):
          super().__init__()
          # initialize your model

      def forward(self, waveforms):
          # return List[Tensor] of shape (batch, layer, seq_len, hidden_size),
          # or return a dict of representations
          ...
  ```
- Reference it in your YAML:

  ```yaml
  model:
    encoder:
      class_path: marble.encoders.my_encoder.MyEncoder
      init_args:
        arg1: 123
        arg2: 456
  ```
- Place `my_encoder.py` anywhere in your project (e.g. `./my_project/my_encoder.py`).
- Use the full import path in your YAML:

  ```yaml
  model:
    encoder:
      class_path: my_project.my_encoder.MyEncoder
      init_args:
        arg1: 123
  ```
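Under the hood, LightningCLI resolves each `class_path`/`init_args` pair by importing the dotted module path and instantiating the named class with the given arguments. A minimal sketch of that mechanism (illustrative only, not LightningCLI's actual implementation):

```python
import importlib

def instantiate(class_path, init_args=None):
    """Import `package.module.ClassName` from a dotted path and call it."""
    module_name, class_name = class_path.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**(init_args or {}))

# Works for any importable class, e.g. one from the standard library:
delta = instantiate("datetime.timedelta", {"hours": 1})
print(delta.total_seconds())  # 3600.0
```

This is why placing your encoder anywhere on the Python path and referencing it by its full import path is enough; no registration code is needed.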
Optional:

- If your encoder needs embedding-level transforms, implement a `BaseEmbTransform` subclass and register it under `emb_transforms`.
- If you need custom audio preprocessing, subclass `BaseAudioTransform` and register it under `audio_transforms`.
```yaml
emb_transforms:
  - class_path: marble.modules.transforms.MyEmbTransform
    init_args:
      param: value

audio_transforms:
  train:
    - class_path: marble.modules.transforms.MyAudioTransform
      init_args:
        param: value
```
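For illustration, a custom audio transform might look like the sketch below. Note the hedge: the real `BaseAudioTransform` lives in `marble/core/` and its exact interface may differ, so a stand-in base class is used here, and `GainTransform` is a hypothetical example.

```python
class BaseAudioTransform:
    """Stand-in for marble.core's BaseAudioTransform -- check the real base class."""
    def __call__(self, waveform):
        raise NotImplementedError

class GainTransform(BaseAudioTransform):
    """Hypothetical transform that scales every sample by a constant gain."""
    def __init__(self, gain=0.5):
        self.gain = gain

    def __call__(self, waveform):
        # waveform: a sequence of float samples; real code would use tensors
        return [s * self.gain for s in waveform]

print(GainTransform(gain=2.0)([0.1, -0.2]))  # [0.2, -0.4]
```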
Marble supports two extension modes for tasks as well:
- Create a new task package under `marble/tasks/YourTask/`:

  ```
  marble/tasks/YourTask/
  ├── __init__.py
  ├── datamodule.py   # Your LightningDataModule subclass
  └── probe.py        # Your BaseTask subclass, e.g. probe, finetune, fewshot
  ```
- Implement your classes:

  ```python
  # datamodule.py
  import pytorch_lightning as pl

  class YourDataModule(pl.LightningDataModule):
      def setup(self, stage=None):
          ...

      def train_dataloader(self):
          ...

      # val_dataloader, test_dataloader, etc.

  # probe.py
  from marble.core.base_task import BaseTask

  class YourTask(BaseTask):
      def __init__(self, encoder, emb_transforms, decoders, losses,
                   metrics, sample_rate, use_ema):
          super().__init__(...)
          # custom behavior here
  ```
- Point your YAML at these classes:

  ```yaml
  task:
    class_path: marble.tasks.YourTask.probe.YourTask
    init_args:
      sample_rate: 22050
      use_ema: false

  data:
    class_path: marble.tasks.YourTask.datamodule.YourDataModule
  ```
- Place your task code anywhere in your project (e.g. `./my_project/probe.py`, `./my_project/datamodule.py`).

- Reference it via the full import path:

  ```yaml
  model:
    class_path: my_project.probe.CustomTask
  data:
    class_path: my_project.datamodule.CustomDataModule
  ```
```bibtex
@article{yuan2023marble,
  title={Marble: Music audio representation benchmark for universal evaluation},
  author={Yuan, Ruibin and Ma, Yinghao and Li, Yizhi and Zhang, Ge and Chen, Xingran and Yin, Hanzhi and Liu, Yiqi and Huang, Jiawen and Tian, Zeyue and Deng, Binyue and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={39626--39647},
  year={2023}
}
```