
@bghira
Owner

@bghira bghira commented Jan 18, 2026

This pull request adds support for the HeartCodec audio codec model, including its configuration, flow-matching inference, and integration into the codebase. It also introduces improvements for handling audio token fields and autoregressive models throughout the metadata and model helper modules. The most important changes are grouped below:

HeartCodec Model Integration

  • Added the new HeartCodec model, including its configuration (HeartCodecConfig), flow-matching logic (FlowMatching), and scalar codec (ScalarModel), along with all necessary methods for detokenizing audio from codebook tokens. [1] [2] [3] [4]
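To illustrate the flow-matching detokenization idea, here is a minimal sketch that is not the PR's FlowMatching or ScalarModel code. It assumes tokens are looked up in a codebook to form a conditioning vector, and that a latent is then obtained by Euler-integrating a velocity field from Gaussian noise at t=0 to t=1. The names `embed_tokens`, `euler_flow_sample` and `oracle_velocity` are hypothetical. As a further simplification, the codebook embeddings themselves are treated as the target latent, and the "network" is an exact straight-path (rectified-flow) velocity rather than a trained model.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical signature: velocity(x, t, cond) -> dx/dt
VelocityFn = Callable[[Vector, float, Vector], Vector]


def embed_tokens(tokens: Sequence[int], codebook: Sequence[Vector]) -> Vector:
    """Look up each token in a toy codebook and concatenate the embeddings."""
    out: Vector = []
    for tok in tokens:
        out.extend(codebook[tok])
    return out


def euler_flow_sample(velocity: VelocityFn, cond: Vector,
                      num_steps: int = 10, seed: int = 0) -> Vector:
    """Integrate dx/dt = velocity(x, t, cond) from t=0 (noise) to t=1 (latent)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from standard Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t never reaches 1.0, so oracle_velocity never divides by zero
        v = velocity(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(x: Vector, t: float, cond: Vector) -> Vector:
    """Exact velocity along the straight path x_t = (1 - t) * noise + t * cond.

    A trained network would only approximate this; it is exact here purely
    for illustration (assumption: the target latent equals cond).
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]
```

With this idealized velocity, each Euler step stays on the straight path, so the sampled latent lands on the conditioning vector (up to floating-point error). A real decoder would instead map tokens to a learned latent space and rely on a trained velocity estimator.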

Support for Audio Token Fields

  • Updated metadata backends (huggingface.py, parquet.py) to extract and properly handle audio_tokens and audio_tokens_path fields, converting them to lists if needed. [1] [2]
  • Enhanced the audio sample processing in the discovery backend to merge existing image metadata into audio metadata, ensuring completeness.
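The two metadata changes above could look roughly like the following sketch. All function names are hypothetical. It assumes that a stored `audio_tokens` value may arrive as a list, tuple, array-like object exposing `tolist()` (such as a NumPy array), or a JSON-encoded string, and that `audio_tokens_path` is passed through as a string. For the merge, it assumes existing image metadata only fills keys that are missing from the audio metadata, with audio values winning on conflicts.

```python
import json
from typing import Any, Dict, Optional


def to_token_list(value: Any) -> Optional[list]:
    """Coerce a stored audio_tokens value into a plain Python list (hypothetical helper)."""
    if value is None:
        return None
    if hasattr(value, "tolist"):  # array-like objects, e.g. NumPy arrays
        return value.tolist()
    if isinstance(value, str):  # assumption: a JSON-encoded list such as "[1, 2, 3]"
        return list(json.loads(value))
    if isinstance(value, (list, tuple)):
        return list(value)
    raise TypeError(f"unsupported audio_tokens type: {type(value).__name__}")


def extract_audio_token_fields(row: Dict[str, Any]) -> Dict[str, Any]:
    """Pull only the audio token fields out of a metadata row."""
    out: Dict[str, Any] = {}
    if row.get("audio_tokens") is not None:
        out["audio_tokens"] = to_token_list(row["audio_tokens"])
    if row.get("audio_tokens_path") is not None:
        out["audio_tokens_path"] = str(row["audio_tokens_path"])
    return out


def merge_image_into_audio_metadata(audio_meta: Dict[str, Any],
                                    image_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Fill gaps in audio metadata from existing image metadata.

    Assumption: audio values take precedence when both define the same key.
    """
    merged = dict(image_meta)
    merged.update(audio_meta)
    return merged
```

Copying the image metadata first and then applying the audio entries keeps the merge non-destructive: neither input dict is mutated.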

Autoregressive Model and Token Handling

  • Added a new prediction type, AUTOREGRESSIVE_NEXT_TOKEN, to the PredictionTypes enum and updated the string conversion logic to recognize it. [1] [2]
  • Introduced model hooks for autoregressive audio models, such as uses_audio_tokens and collate_audio_tokens, and updated the uses_noise_schedule logic to support models that do not use diffusion noise schedules. [1] [2] [3]
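A rough sketch of how the new prediction type and hooks might fit together follows. Only `AUTOREGRESSIVE_NEXT_TOKEN`, `PredictionTypes`, `uses_audio_tokens`, `collate_audio_tokens` and `uses_noise_schedule` come from the description. The other enum members, the string value `"autoregressive_next_token"`, the `from_str` helper, the `ModelHooks` and `AutoregressiveAudioModel` classes, `PAD_ID`, and the padding-plus-attention-mask collation are all assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List, Sequence


class PredictionTypes(Enum):
    # Illustrative members; only AUTOREGRESSIVE_NEXT_TOKEN is named in the PR.
    EPSILON = "epsilon"
    V_PREDICTION = "v_prediction"
    FLOW_MATCHING = "flow_matching"
    AUTOREGRESSIVE_NEXT_TOKEN = "autoregressive_next_token"  # assumed string value

    @classmethod
    def from_str(cls, label: str) -> "PredictionTypes":
        """Parse a config string, tolerating case and '-' vs '_' differences."""
        normalized = label.strip().lower().replace("-", "_")
        for member in cls:
            if member.value == normalized:
                return member
        raise ValueError(f"unknown prediction type: {label!r}")


class ModelHooks:
    """Hypothetical base class with defaults suited to diffusion models."""
    PREDICTION_TYPE = PredictionTypes.EPSILON

    def uses_audio_tokens(self) -> bool:
        return False

    def uses_noise_schedule(self) -> bool:
        # Next-token models train on cross-entropy, not on a noise schedule.
        return self.PREDICTION_TYPE is not PredictionTypes.AUTOREGRESSIVE_NEXT_TOKEN


class AutoregressiveAudioModel(ModelHooks):
    """Hypothetical autoregressive audio model overriding the defaults."""
    PREDICTION_TYPE = PredictionTypes.AUTOREGRESSIVE_NEXT_TOKEN
    PAD_ID = 0  # assumed padding token id

    def uses_audio_tokens(self) -> bool:
        return True

    def collate_audio_tokens(self, batch: Sequence[Sequence[int]]) -> Dict[str, List[List[int]]]:
        """Right-pad token sequences to a common length and build an attention mask."""
        max_len = max(len(seq) for seq in batch)
        input_ids, attention_mask = [], []
        for seq in batch:
            pad = max_len - len(seq)
            input_ids.append(list(seq) + [self.PAD_ID] * pad)
            attention_mask.append([1] * len(seq) + [0] * pad)
        return {"input_ids": input_ids, "attention_mask": attention_mask}
```

Deriving `uses_noise_schedule` from the prediction type, rather than overriding it per model, keeps a single source of truth: any model declaring `AUTOREGRESSIVE_NEXT_TOKEN` automatically opts out of diffusion noise scheduling.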

Dependency Updates

  • Added vector-quantize-pytorch as a required dependency in setup.py for the new codec implementation.

This comment was marked as resolved.

@bghira bghira marked this pull request as ready for review January 18, 2026 21:46
@bghira bghira force-pushed the feature/heartmula branch from 887b97b to b6c6a72
@kabachuha
Contributor

The model, including the code, has a viral, non-commercial CC_BY_NC license.

I don't think it makes sense to include it here. It's unviable for production and will be forgotten in a couple of weeks, unlike Ace-Step.

@bghira
Owner Author

bghira commented Jan 19, 2026

this is a reimplementation without the viral license, but yeah, the model is non-commercial. it's for research purposes, but it sounds really good in Chinese.

@bghira bghira merged commit 1865764 into main Jan 19, 2026
4 checks passed
@bghira bghira deleted the feature/heartmula branch January 19, 2026 16:23

3 participants