MARBLE
State-of-the-art pretrained music models for training, evaluation, inference

Marble is a modular, configuration-driven suite for training, evaluating, and performing inference on state-of-the-art pretrained music models. It leverages LightningCLI to provide easy extensibility and reproducibility.

News and Updates

  • 📌 Join us on the MIREX Discord!
  • 2025-06-04: MARBLE v2 is now published on the main branch! You can find the old version in the main-v1-archived branch.

Key Features

  1. Modularity: Each component (encoders, tasks, transforms, decoders) is isolated behind a common interface. You can mix and match them without touching core logic.
  2. Configurability: All experiments are driven by YAML configs. No code changes are needed to switch datasets, encoders, or training settings.
  3. Reusability: Common routines (data loading, training loop, metrics) are implemented once in BaseTask, LightningDataModule, and shared modules.
  4. Extensibility: Adding new encoders or tasks requires implementing a small subclass and registering it via a config.
┌──────────────────┐
│ DataModule       │  yields (waveform, label, path), optional audio transforms
└─▲────────────────┘
  │
  │ waveform                     Encoded →   hidden_states[B, L, T, H]
  ▼
┌─┴───────────────┐   embedding transforms (optional)
│ Encoder         │ ────────────────────────────────────────────┐
└─────────────────┘                                             │
                                                                ▼
                                                 (LayerSelector, TimeAvgPool…)
                                                                │
                                                                ▼
                              ┌─────────────────────────────────┴──┐
                              │ Decoder(s)                         │
                              └────────────────────────────────────┘
                                                  │ logits
                                                  ▼
                                   Loss ↔ Metrics ↔ Callbacks
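The flow above can be sketched in plain Python, with nested lists standing in for tensors. The names layer_select and time_avg_pool below are hypothetical stand-ins for transforms like LayerSelector and TimeAvgPool; shapes follow the hidden_states[B, L, T, H] convention from the diagram (batch, layers, time steps, hidden size).

```python
# Toy stand-ins for the embedding transforms in the diagram above.
# Nested lists replace tensors; the function names are illustrative only.

def layer_select(hidden_states, layer):
    # Keep a single encoder layer: [B, L, T, H] -> [B, T, H]
    return [example[layer] for example in hidden_states]

def time_avg_pool(features):
    # Average over the time axis: [B, T, H] -> [B, H]
    return [
        [sum(frame[h] for frame in example) / len(example)
         for h in range(len(example[0]))]
        for example in features
    ]

# A fake "encoder output": batch of 2, 3 layers, 4 time steps, hidden size 2.
B, L, T, H = 2, 3, 4, 2
hidden_states = [
    [[[float(b + l + t) for _ in range(H)] for t in range(T)] for l in range(L)]
    for b in range(B)
]

pooled = time_avg_pool(layer_select(hidden_states, layer=-1))
print(len(pooled), len(pooled[0]))  # B examples, each of hidden size H
```

The resulting [B, H] embeddings are what a probing decoder would consume.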

Getting Started

  1. Install dependencies:

    # 1. create a new conda env
    conda create -n marble python=3.10 -y
    conda activate marble
    
    # 2. install ffmpeg
    conda install -c conda-forge ffmpeg -y
    
    # 3. now install other dependencies
    pip install -e .
    
    # 4. [Optional] downgrade pip to 24.0 if you are using fairseq modules
    # pip install pip==24.0
    # pip install fairseq
    # some encoders (e.g. Xcodec) may require additional dependencies, see marble/encoders/*/requirements.txt
  2. Prepare data: python download.py all.

  3. Configure: Copy an existing YAML from configs/ and edit paths, encoder settings, transforms, and task parameters.

  4. Run:

    python cli.py fit --config configs/probe.MERT-v1-95M.GTZANGenre.yaml
    python cli.py test --config configs/probe.MERT-v1-95M.GTZANGenre.yaml
  5. Results: Checkpoints and logs will be saved under output/ and logged to Weights & Biases.

  6. Inference: We provide scripts for inference with pretrained models. See the Inference with SOTA SSL MIR Models section below.

Inference with SOTA SSL MIR Models

We are collaborating with MIREX to introduce state-of-the-art SSL-based models for Music Information Retrieval (MIR). We believe the future of MIR lies in self-supervised learning (SSL): acquiring labeled data for MIR is costly, which makes fully supervised paradigms expensive, while computational cost keeps decreasing and will eventually become cheaper than manual labeling.

Key Prediction

The sota/predict_key.py script performs key prediction on audio files using a pretrained model. It automatically downloads the model from Hugging Face if necessary, processes audio clips in batches, and saves the predictions (key and confidence) to a JSONL file. To run, use the following command:

python sota/predict_key.py --filelist_path <filelist> --output_path <output> --batch_size 16 --download_dir <dir>

# You may reproduce the training/testing (if you have access to corresponding data) by running 
# bash sota/reproduce_key_sota_20250618.sh
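The JSONL output can be consumed with a few lines of stdlib Python. Note the field names (audio_path, key, confidence) below are assumptions based on the description above ("key and confidence"), not a documented schema; check the actual output of sota/predict_key.py.

```python
import io
import json

# Hypothetical predict_key.py output: one JSON object per line (JSONL).
# Field names are assumed from the description above, not a fixed schema.
sample_output = io.StringIO(
    '{"audio_path": "a.wav", "key": "C major", "confidence": 0.91}\n'
    '{"audio_path": "b.wav", "key": "A minor", "confidence": 0.73}\n'
)

predictions = [json.loads(line) for line in sample_output if line.strip()]
confident = [p for p in predictions if p["confidence"] >= 0.8]
print(confident[0]["key"])  # C major
```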

Project Structure

.
├── marble/                   # Core code package
│   ├── core/                 # Base classes (BaseTask, BaseEncoder, BaseTransform)
│   ├── encoders/             # Wrapper classes for various SSL encoders
│   ├── modules/              # Shared transforms, callbacks, losses, decoders
│   ├── tasks/                # Downstream tasks (probe, few-shot, datamodules)
│   └── utils/                # IO utilities, instantiation helpers
├── cli.py                    # Entry-point for launching experiments
├── sota/                     # Scripts for state-of-the-art models and inference
├── configs/                  # Experiment configs (YAML)
├── data/                     # Datasets and metadata files
├── scripts/                  # Run scripts & utilities
├── tests/                    # Unit tests for transforms & datasets
├── pyproject.toml            # Python project metadata
└── README.md                 # This file

See marble/encoders/ for available encoders. See marble/tasks/ for available tasks.

🚀 Adding a New Encoder

Marble supports two flexible extension modes for encoders:

Mode 1: Internal Extension

  1. Implement your encoder under marble/encoders/:

    # marble/encoders/my_encoder.py
    from marble.core.base_encoder import BaseAudioEncoder

    class MyEncoder(BaseAudioEncoder):
        def __init__(self, arg1, arg2):
            super().__init__()
            # initialize your model here

        def forward(self, waveforms):
            # return List[Tensor] of shape (batch, layer, seq_len, hidden_size)
            # or a dict of representations
            ...
  2. Reference it in your YAML:

    model:
      encoder:
        class_path: marble.encoders.my_encoder.MyEncoder
        init_args:
          arg1: 123
          arg2: 456
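Under the hood, LightningCLI resolves such class_path/init_args pairs by importing the module and constructing the class, roughly as in the simplified sketch below (an illustration of the mechanism, not MARBLE's actual loader). The demo uses a stdlib class, since marble may not be installed in every environment:

```python
import importlib

def instantiate(class_path, init_args=None):
    # Split "pkg.module.ClassName" into module path and class name,
    # import the module, then construct the class with init_args.
    module_path, class_name = class_path.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**(init_args or {}))

# Demonstrated with a stdlib class instead of a MARBLE encoder:
delta = instantiate("datetime.timedelta", {"days": 2, "hours": 12})
print(delta.total_seconds())  # 216000.0
```

This is why both internal (marble.encoders.*) and external (my_project.*) class paths work: anything importable can be referenced from YAML.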

Mode 2: External Extension

  1. Place my_encoder.py anywhere in your project (e.g. ./my_project/my_encoder.py).

  2. Use the full import path in your YAML:

    model:
      encoder:
        class_path: my_project.my_encoder.MyEncoder
        init_args:
          arg1: 123

Optional:

  • If your encoder needs embedding-level transforms, implement a BaseEmbTransform subclass and register under emb_transforms.
  • If you need custom audio preprocessing, subclass BaseAudioTransform and register under audio_transforms.
emb_transforms:
  - class_path: marble.modules.transforms.MyEmbTransform
    init_args:
      param: value

audio_transforms:
  train:
    - class_path: marble.modules.transforms.MyAudioTransform
      init_args:
        param: value
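A minimal sketch of what such a transform might look like, assuming embedding transforms are callables applied to the encoder output (the real interface is defined by BaseEmbTransform in marble/modules/; the class below is illustrative, uses nested lists instead of tensors, and does not subclass it so it can run standalone):

```python
class TimeCrop:
    # Illustrative embedding-level transform: keep only the first
    # max_frames time steps of each (layer, time, hidden) representation.
    def __init__(self, max_frames):
        self.max_frames = max_frames

    def __call__(self, hidden_states):
        # hidden_states: [B][L][T][H] as nested lists in this toy sketch
        return [
            [layer[: self.max_frames] for layer in example]
            for example in hidden_states
        ]

crop = TimeCrop(max_frames=2)
batch = [[[[0.0], [1.0], [2.0]]]]  # B=1, L=1, T=3, H=1
print(crop(batch))  # [[[[0.0], [1.0]]]]
```

Once registered under emb_transforms in the YAML, such a transform would be instantiated via its class_path and applied between the encoder and the decoders.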

🚀 Adding a New Task

Marble supports two extension modes for tasks as well:

Mode 1: Internal Extension

  1. Create a new task package under marble/tasks/YourTask/:

    marble/tasks/YourTask/
    ├── __init__.py
    ├── datamodule.py     # Your LightningDataModule subclass
    └── probe.py          # Your BaseTask subclass, e.g. probe, finetune, fewshot
  2. Implement your classes:

    # datamodule.py
    import pytorch_lightning as pl
    
    class YourDataModule(pl.LightningDataModule):
        def setup(self, stage=None):
            ...
        def train_dataloader(self):
            ...
        # val_dataloader, test_dataloader, etc.
    
    # probe.py
    from marble.core.base_task import BaseTask
    
    class YourTask(BaseTask):
        def __init__(self, encoder, emb_transforms, decoders, losses, metrics, sample_rate, use_ema):
            super().__init__(...)
            # custom behavior here
  3. Point your YAML to these classes:

    task:
      class_path: marble.tasks.YourTask.probe.YourTask
      init_args:
        sample_rate: 22050
        use_ema: false
    
    data:
      class_path: marble.tasks.YourTask.datamodule.YourDataModule

Mode 2: External Extension

  1. Place your task code anywhere in your project (e.g. ./my_project/probe.py, ./my_project/datamodule.py).

  2. Reference via full import path:

    model:
      class_path: my_project.probe.CustomTask
    
    data:
      class_path: my_project.datamodule.CustomDataModule

Citation

@article{yuan2023marble,
  title={Marble: Music audio representation benchmark for universal evaluation},
  author={Yuan, Ruibin and Ma, Yinghao and Li, Yizhi and Zhang, Ge and Chen, Xingran and Yin, Hanzhi and Liu, Yiqi and Huang, Jiawen and Tian, Zeyue and Deng, Binyue and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={39626--39647},
  year={2023}
}
