HeartMuLa reimplementation #2442

bghira · 2026-01-18T18:41:00Z

This pull request adds support for the HeartCodec audio codec model, including its configuration, flow-matching inference, and integration into the codebase. It also introduces improvements for handling audio token fields and autoregressive models throughout the metadata and model helper modules. The most important changes are grouped below:

HeartCodec Model Integration

Added the new HeartCodec model, including its configuration (HeartCodecConfig), flow-matching logic (FlowMatching), and scalar codec (ScalarModel), along with all necessary methods for detokenizing audio from codebook tokens.
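For readers unfamiliar with flow-matching inference, here is a minimal sketch of the general technique, not the FlowMatching class from this PR. It integrates a velocity field from noise (t = 0) to a sample (t = 1) with fixed Euler steps. The names `euler_flow_sample` and `toy_velocity` are hypothetical, and the toy velocity field stands in for the learned network, which in the real codec would be conditioned on the codebook tokens.

```python
import random
from typing import Callable, List

Vector = List[float]


def euler_flow_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int = 10,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for a learned model.

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that reaches the target is (target - x_t) / (1 - t).
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]

    return velocity


rng = random.Random(0)
target = [0.5, -1.0, 2.0]
noise = [rng.gauss(0.0, 1.0) for _ in target]
latent = euler_flow_sample(toy_velocity(target), noise, num_steps=8)
```

With this toy field the final Euler step lands on `target` up to floating-point error. A learned velocity network would only approximate that, so the step count becomes a quality and speed trade-off.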

Support for Audio Token Fields

Updated metadata backends (huggingface.py, parquet.py) to extract and properly handle audio_tokens and audio_tokens_path fields, converting them to lists if needed.
Enhanced the audio sample processing in the discovery backend to merge existing image metadata into audio metadata, ensuring completeness.
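A rough sketch of what the two changes above could look like, assuming dataset rows arrive as plain dicts. The helper names (`to_list`, `extract_audio_fields`, `merge_metadata`) are hypothetical and are not taken from huggingface.py or parquet.py. The choice to let audio keys win on conflict is also an assumption.

```python
from typing import Any, Dict


def to_list(value: Any) -> Any:
    """Recursively convert array-likes and tuples into plain Python lists."""
    if hasattr(value, "tolist"):  # e.g. NumPy arrays or tensors
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        return [to_list(item) for item in value]
    return value


def extract_audio_fields(row: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the audio token fields out of a dataset row when present."""
    fields: Dict[str, Any] = {}
    tokens = row.get("audio_tokens")
    if tokens is not None:
        fields["audio_tokens"] = to_list(tokens)
    path = row.get("audio_tokens_path")
    if path is not None:
        fields["audio_tokens_path"] = str(path)
    return fields


def merge_metadata(existing: Dict[str, Any], audio: Dict[str, Any]) -> Dict[str, Any]:
    """Keep existing keys and let audio keys override on conflict (assumed)."""
    merged = dict(existing)
    merged.update(audio)
    return merged


row = {"caption": "demo", "audio_tokens": ((1, 2), (3, 4)), "audio_tokens_path": "a.pt"}
audio_fields = extract_audio_fields(row)
sample_metadata = merge_metadata({"caption": "demo", "duration": 3.5}, audio_fields)
```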

Autoregressive Model and Token Handling

Added a new prediction type, AUTOREGRESSIVE_NEXT_TOKEN, to the PredictionTypes enum and updated the string conversion logic to recognize it.
Introduced model hooks for autoregressive audio models, such as uses_audio_tokens and collate_audio_tokens, and updated the uses_noise_schedule logic to support models that do not use diffusion noise schedules.
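The enum change and the hooks could fit together roughly as below. This is an illustrative sketch, not the code in this PR: the enum string values, the `EPSILON` placeholder member, the `from_str` name, the `Sketch*` classes, the padding scheme, and the `collate` wrapper are all assumptions. Only the names AUTOREGRESSIVE_NEXT_TOKEN, uses_audio_tokens, collate_audio_tokens, and uses_noise_schedule come from the description above.

```python
from enum import Enum
from typing import Any, Dict, List


class PredictionTypes(Enum):
    # EPSILON stands in for the pre-existing members; string values are assumed.
    EPSILON = "epsilon"
    AUTOREGRESSIVE_NEXT_TOKEN = "autoregressive_next_token"

    @staticmethod
    def from_str(label: str) -> "PredictionTypes":
        normalized = label.strip().lower()
        for member in PredictionTypes:
            if member.value == normalized:
                return member
        raise ValueError(f"Unknown prediction type: {label!r}")


class SketchModel:
    """Hypothetical base class: diffusion-style defaults."""

    PREDICTION_TYPE = PredictionTypes.EPSILON

    def uses_audio_tokens(self) -> bool:
        return False

    def uses_noise_schedule(self) -> bool:
        # Next-token models have no diffusion noise schedule.
        return self.PREDICTION_TYPE is not PredictionTypes.AUTOREGRESSIVE_NEXT_TOKEN

    def collate_audio_tokens(self, examples: List[Dict[str, Any]]) -> Dict[str, Any]:
        raise NotImplementedError("Only models that use audio tokens implement this.")


class SketchAutoregressiveAudioModel(SketchModel):
    PREDICTION_TYPE = PredictionTypes.AUTOREGRESSIVE_NEXT_TOKEN
    PAD_TOKEN_ID = 0  # assumed padding id

    def uses_audio_tokens(self) -> bool:
        return True

    def collate_audio_tokens(self, examples: List[Dict[str, Any]]) -> Dict[str, Any]:
        # Right-pad every sequence to the longest one and record a mask.
        sequences = [list(example["audio_tokens"]) for example in examples]
        max_len = max((len(seq) for seq in sequences), default=0)
        tokens = [seq + [self.PAD_TOKEN_ID] * (max_len - len(seq)) for seq in sequences]
        mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
        return {"tokens": tokens, "attention_mask": mask}


def collate(model: SketchModel, examples: List[Dict[str, Any]]) -> Dict[str, Any]:
    """The audio token branch is opt-in: it only runs when the model asks for it."""
    batch: Dict[str, Any] = {"examples": examples}
    if model.uses_audio_tokens():
        batch.update(model.collate_audio_tokens(examples))
    return batch


ar_batch = collate(
    SketchAutoregressiveAudioModel(),
    [{"audio_tokens": [1, 2, 3]}, {"audio_tokens": [4]}],
)
```

Gating the branch on `uses_audio_tokens()` keeps existing diffusion models on their current collate path, which matches the "opt-in during collate" commit in this PR.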

Dependency Updates

Added vector-quantize-pytorch as a required dependency in setup.py for the new codec implementation.

kabachuha · 2026-01-19T12:08:08Z

The model, including the code, has a viral non-commercial CC BY-NC license.

I don't think it makes sense to include it here. It is unviable for production and will be forgotten in a couple of weeks, unlike Ace-Step.

bghira · 2026-01-19T12:14:42Z

This is a reimplementation without the viral license, but yeah, the model is non-commercial. It's for research purposes, but it sounds really good in Chinese.

HeartMuLa reimplementation

da503a3

bghira requested a review from Copilot January 18, 2026 18:41

Copilot started reviewing on behalf of bghira January 18, 2026 18:41

This comment was marked as resolved.

bghira added 4 commits January 18, 2026 12:58

bos/eos handling fix and warn instead of silently delete args. better…

9b24aee

… logging

make audio token branch opt-in during collate

30d76ad

add HeartMuLa quickstart

1c4d9ac

add HeartMuLa to README

790f90c

bghira marked this pull request as ready for review January 18, 2026 21:46

add HeartMuLa example config

b6c6a72

bghira force-pushed the feature/heartmula branch from 887b97b to b6c6a72 Compare January 18, 2026 22:07

bghira added 2 commits January 18, 2026 16:24

fix crash in tokeniser load

744811b

fix crash in text embed creation attempts

7389485

bghira merged commit 1865764 into main Jan 19, 2026
4 checks passed

bghira deleted the feature/heartmula branch January 19, 2026 16:23
