MARBLE
State-of-the-art pretrained music models for training, evaluation, inference

Marble is a modular, configuration-driven suite for training, evaluating, and performing inference on state-of-the-art pretrained music models. It leverages LightningCLI to provide easy extensibility and reproducibility.

News and Updates

  • 📌 Join us on the MIREX Discord!
  • 2025-06-04: MARBLE v2 is now published on the main branch! You can find the old version in the main-v1-archived branch.

Key Features

  1. Modularity: Each component (encoders, tasks, transforms, decoders) is isolated behind a common interface. You can mix and match them without touching core logic.
  2. Configurability: All experiments are driven by YAML configs. No code changes are needed to switch datasets, encoders, or training settings.
  3. Reusability: Common routines (data loading, training loop, metrics) are implemented once in BaseTask, LightningDataModule, and shared modules.
  4. Extensibility: Adding new encoders or tasks requires implementing a small subclass and registering it via a config.
┌──────────────────┐
│ DataModule       │  yields (waveform, label, path), optional audio transforms
└─▲────────────────┘
  │
  │ waveform                     Encoded →   hidden_states[B, L, T, H]
  ▼
┌─┴───────────────┐   embedding transforms (optional)
│ Encoder         │ ────────────────────────────────────────────┐
└─────────────────┘                                             │
                                                                ▼
                                                 (LayerSelector, TimeAvgPool…)
                                                                │
                                                                ▼
                              ┌─────────────────────────────────┴──┐
                              │ Decoder(s)                         │
                              └────────────────────────────────────┘
                                                  │ logits
                                                  ▼
                                   Loss ↔ Metrics ↔ Callbacks
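The flow above can be sketched in plain Python, with nested lists standing in for tensors. The names layer_select and time_avg_pool below are hypothetical stand-ins for transforms like LayerSelector and TimeAvgPool; shapes follow the hidden_states[B, L, T, H] convention from the diagram (batch, layers, time steps, hidden size).

```python
# Toy stand-ins for the embedding transforms in the diagram above.
# Nested lists replace tensors; the function names are illustrative only.

def layer_select(hidden_states, layer):
    # Keep a single encoder layer: [B, L, T, H] -> [B, T, H]
    return [example[layer] for example in hidden_states]

def time_avg_pool(features):
    # Average over the time axis: [B, T, H] -> [B, H]
    return [
        [sum(frame[h] for frame in example) / len(example)
         for h in range(len(example[0]))]
        for example in features
    ]

# A fake "encoder output": batch of 2, 3 layers, 4 time steps, hidden size 2.
B, L, T, H = 2, 3, 4, 2
hidden_states = [
    [[[float(b + l + t) for _ in range(H)] for t in range(T)] for l in range(L)]
    for b in range(B)
]

pooled = time_avg_pool(layer_select(hidden_states, layer=-1))
print(len(pooled), len(pooled[0]))  # B examples, each of hidden size H
```

The resulting [B, H] embeddings are what a probing decoder would consume.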

Getting Started

  1. Install dependencies:

    # 1. create a new conda env
    conda create -n marble python=3.10 -y
    conda activate marble
    
    # 2. install ffmpeg
    conda install -c conda-forge ffmpeg -y
    
    # 3. now install other dependencies
    pip install -e .
    
    # 4. [Optional] downgrade pip to 24.0 if you are using fairseq modules
    # pip install pip==24.0
    # pip install fairseq
    # some encoders (e.g. Xcodec) may require additional dependencies, see marble/encoders/*/requirements.txt
  2. Prepare data: python download.py all.

  3. Configure: Copy an existing YAML from configs/ and edit paths, encoder settings, transforms, and task parameters.

  4. Run:

    python cli.py fit --config configs/probe.MERT-v1-95M.GTZANGenre.yaml
    python cli.py test --config configs/probe.MERT-v1-95M.GTZANGenre.yaml
  5. Results: Checkpoints and logs will be saved under output/ and logged to Weights & Biases.

  6. Inference: We provide scripts for inference with pretrained models. See the Inference with SOTA SSL MIR Models section below.

Inference with SOTA SSL MIR Models

We are collaborating with MIREX to introduce state-of-the-art SSL-based models for Music Information Retrieval (MIR). We believe the future of MIR lies in self-supervised learning (SSL): acquiring labeled data for MIR is costly, which makes fully supervised paradigms expensive, while computational cost keeps decreasing and will eventually become cheaper than manual labeling.

Key Prediction

The sota/predict_key.py script performs key prediction on audio files using a pretrained model. It automatically downloads the model from Hugging Face if necessary, processes audio clips in batches, and saves the predictions (key and confidence) to a JSONL file. To run, use the following command:

python sota/predict_key.py --filelist_path <filelist> --output_path <output> --batch_size 16 --download_dir <dir>

# You may reproduce the training/testing (if you have access to corresponding data) by running 
# bash sota/reproduce_key_sota_20250618.sh
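The JSONL output can be consumed with a few lines of stdlib Python. Note the field names (audio_path, key, confidence) below are assumptions based on the description above ("key and confidence"), not a documented schema; check the actual output of sota/predict_key.py.

```python
import io
import json

# Hypothetical predict_key.py output: one JSON object per line (JSONL).
# Field names are assumed from the description above, not a fixed schema.
sample_output = io.StringIO(
    '{"audio_path": "a.wav", "key": "C major", "confidence": 0.91}\n'
    '{"audio_path": "b.wav", "key": "A minor", "confidence": 0.73}\n'
)

predictions = [json.loads(line) for line in sample_output if line.strip()]
confident = [p for p in predictions if p["confidence"] >= 0.8]
print(confident[0]["key"])  # C major
```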

Project Structure

.
├── marble/                   # Core code package
│   ├── core/                 # Base classes (BaseTask, BaseEncoder, BaseTransform)
│   ├── encoders/             # Wrapper classes for various SSL encoders
│   ├── modules/              # Shared transforms, callbacks, losses, decoders
│   ├── tasks/                # Downstream tasks (probe, few-shot, datamodules)
│   └── utils/                # IO utilities, instantiation helpers
├── cli.py                    # Entry-point for launching experiments
├── sota/                     # Scripts for state-of-the-art models and inference
├── configs/                  # Experiment configs (YAML)
├── data/                     # Datasets and metadata files
├── scripts/                  # Run scripts & utilities
├── tests/                    # Unit tests for transforms & datasets
├── pyproject.toml            # Python project metadata
└── README.md                 # This file

See marble/encoders/ for available encoders. See marble/tasks/ for available tasks.

🚀 Adding a New Encoder

Marble supports two flexible extension modes for encoders:

Mode 1: Internal Extension

  1. Implement your encoder under marble/encoders/:

    # marble/encoders/my_encoder.py
    from marble.core.base_encoder import BaseAudioEncoder

    class MyEncoder(BaseAudioEncoder):
        def __init__(self, arg1, arg2):
            super().__init__()
            # initialize your model here

        def forward(self, waveforms):
            # return List[Tensor] of shape (batch, layer, seq_len, hidden_size)
            # or a dict of representations
            ...
  2. Reference it in your YAML:

    model:
      encoder:
        class_path: marble.encoders.my_encoder.MyEncoder
        init_args:
          arg1: 123
          arg2: 456
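Under the hood, LightningCLI resolves such class_path/init_args pairs by importing the module and constructing the class, roughly as in the simplified sketch below (an illustration of the mechanism, not MARBLE's actual loader). The demo uses a stdlib class, since marble may not be installed in every environment:

```python
import importlib

def instantiate(class_path, init_args=None):
    # Split "pkg.module.ClassName" into module path and class name,
    # import the module, then construct the class with init_args.
    module_path, class_name = class_path.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**(init_args or {}))

# Demonstrated with a stdlib class instead of a MARBLE encoder:
delta = instantiate("datetime.timedelta", {"days": 2, "hours": 12})
print(delta.total_seconds())  # 216000.0
```

This is why both internal (marble.encoders.*) and external (my_project.*) class paths work: anything importable can be referenced from YAML.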

Mode 2: External Extension

  1. Place my_encoder.py anywhere in your project (e.g. ./my_project/my_encoder.py).

  2. Use the full import path in your YAML:

    model:
      encoder:
        class_path: my_project.my_encoder.MyEncoder
        init_args:
          arg1: 123

Optional:

  • If your encoder needs embedding-level transforms, implement a BaseEmbTransform subclass and register under emb_transforms.
  • If you need custom audio preprocessing, subclass BaseAudioTransform and register under audio_transforms.
emb_transforms:
  - class_path: marble.modules.transforms.MyEmbTransform
    init_args:
      param: value

audio_transforms:
  train:
    - class_path: marble.modules.transforms.MyAudioTransform
      init_args:
        param: value
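A minimal sketch of what such a transform might look like, assuming embedding transforms are callables applied to the encoder output (the real interface is defined by BaseEmbTransform in marble/modules/; the class below is illustrative, uses nested lists instead of tensors, and does not subclass it so it can run standalone):

```python
class TimeCrop:
    # Illustrative embedding-level transform: keep only the first
    # max_frames time steps of each (layer, time, hidden) representation.
    def __init__(self, max_frames):
        self.max_frames = max_frames

    def __call__(self, hidden_states):
        # hidden_states: [B][L][T][H] as nested lists in this toy sketch
        return [
            [layer[: self.max_frames] for layer in example]
            for example in hidden_states
        ]

crop = TimeCrop(max_frames=2)
batch = [[[[0.0], [1.0], [2.0]]]]  # B=1, L=1, T=3, H=1
print(crop(batch))  # [[[[0.0], [1.0]]]]
```

Once registered under emb_transforms in the YAML, such a transform would be instantiated via its class_path and applied between the encoder and the decoders.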

🚀 Adding a New Task

Marble supports two extension modes for tasks as well:

Mode 1: Internal Extension

  1. Create a new task package under marble/tasks/YourTask/:

    marble/tasks/YourTask/
    ├── __init__.py
    ├── datamodule.py     # Your LightningDataModule subclass
    └── probe.py          # Your BaseTask subclass, e.g. probe, finetune, fewshot
  2. Implement your classes:

    # datamodule.py
    import pytorch_lightning as pl
    
    class YourDataModule(pl.LightningDataModule):
        def setup(self, stage=None):
            ...
        def train_dataloader(self):
            ...
        # val_dataloader, test_dataloader, etc.
    
    # probe.py
    from marble.core.base_task import BaseTask
    
    class YourTask(BaseTask):
        def __init__(self, encoder, emb_transforms, decoders, losses, metrics, sample_rate, use_ema):
            super().__init__(...)
            # custom behavior here
  3. Point your YAML to these classes:

    task:
      class_path: marble.tasks.YourTask.probe.YourTask
      init_args:
        sample_rate: 22050
        use_ema: false
    
    data:
      class_path: marble.tasks.YourTask.datamodule.YourDataModule

Mode 2: External Extension

  1. Place your task code anywhere in your project (e.g. ./my_project/probe.py, ./my_project/datamodule.py).

  2. Reference via full import path:

    model:
      class_path: my_project.probe.CustomTask
    
    data:
      class_path: my_project.datamodule.CustomDataModule

Citation

@article{yuan2023marble,
  title={Marble: Music audio representation benchmark for universal evaluation},
  author={Yuan, Ruibin and Ma, Yinghao and Li, Yizhi and Zhang, Ge and Chen, Xingran and Yin, Hanzhi and Liu, Yiqi and Huang, Jiawen and Tian, Zeyue and Deng, Binyue and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={39626--39647},
  year={2023}
}
