Fix Bark failing tests #39478

Open · ebezzam wants to merge 8 commits into main
Conversation

@ebezzam (Contributor, author) commented Jul 17, 2025

Goal of this PR

There are three failing tests for Bark; see here for the modeling ones.


1) tests/models/bark/test_modeling_bark.py::BarkModelIntegrationTests::test_generate_end_to_end_with_args

```
FAILED tests/models/bark/test_modeling_bark.py::BarkModelIntegrationTests::test_generate_end_to_end_with_args - RuntimeError: shape '[1, 518400]' is invalid for input of size 40192
```

Fails because the vocab size is wrongly configured. By default it takes this value, but that is Bark's input vocab size (rather than its output vocab size), which causes the failure when reshaping here.

So a new condition should be added to extract the correct value, as sketched below.
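For illustration, something along these lines (Bark's sub-model configs expose both `input_vocab_size` and `output_vocab_size`; the helper name is hypothetical):

```python
def _get_generation_vocab_size(config):
    # Bark-style configs distinguish the vocabulary the model consumes
    # (input) from the one it predicts over (output); prefer the output
    # side when it is available, otherwise fall back to `vocab_size`.
    if getattr(config, "output_vocab_size", None) is not None:
        return config.output_vocab_size
    return config.vocab_size
```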

@gante I see a TODO about standardizing the special cases. Is the proposed solution OK for now?

2) RESOLVED tests/models/bark/test_modeling_bark.py::BarkModelIntegrationTests::test_generate_batching

```
FAILED tests/models/bark/test_modeling_bark.py::BarkModelIntegrationTests::test_generate_batching - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

Related to #34634 and fixed by #38985

3) tests/models/bark/test_processor_bark.py::BarkProcessorTest::test_save_load_pretrained_additional_features

This one is very subtle. It fails on the first run, when no models are cached:

```
src/transformers/models/bark/processing_bark.py:208: ValueError
---------------------------------- Captured stderr call -------------------------------
HTTP Error 429 thrown while requesting HEAD https://huggingface.co/ylacombe/bark-small/resolve/main/speaker_embeddings/pl_speaker_3_semantic_prompt.npy
Retrying in 1s [Retry 1/5].
HTTP Error 429 thrown while requesting HEAD https://huggingface.co/ylacombe/bark-small/resolve/main/speaker_embeddings/pl_speaker_3_semantic_prompt.npy
Retrying in 2s [Retry 2/5].
HTTP Error 429 thrown while requesting HEAD https://huggingface.co/ylacombe/bark-small/resolve/main/speaker_embeddings/pl_speaker_3_semantic_prompt.npy
Retrying in 4s [Retry 3/5].
HTTP Error 429 thrown while requesting HEAD https://huggingface.co/ylacombe/bark-small/resolve/main/speaker_embeddings/pl_speaker_3_semantic_prompt.npy
Retrying in 8s [Retry 4/5].
HTTP Error 429 thrown while requesting HEAD https://huggingface.co/ylacombe/bark-small/resolve/main/speaker_embeddings/pl_speaker_3_semantic_prompt.npy
Retrying in 8s [Retry 5/5].
HTTP Error 429 thrown while requesting HEAD https://huggingface.co/ylacombe/bark-small/resolve/main/speaker_embeddings/pl_speaker_3_semantic_prompt.npy
=============== short test summary info ===============
FAILED tests/models/bark/test_processor_bark.py::BarkProcessorTest::test_save_load_pretrained_additional_features - ValueError: `ylacombe/bark-small/speaker_embeddings/pl_speaker_3_semantic_prompt.npy` does not exists
```

But it may pass after multiple tries (presumably because the missing files eventually get downloaded and cached).

This could explain issues like the one reported in #34634, where a voice preset may not be available on the first attempt.

This error also made me realize that the official checkpoints still point to ylacombe's checkpoints. I've opened PRs on the Hub to fix this.


@ebezzam ebezzam marked this pull request as draft July 18, 2025 07:57
Comment on lines +117 to +119
```python
if speaker_embeddings is not None:
    if "repo_or_path" in speaker_embeddings:
        # Fetch speaker embeddings from the repo the processor was actually
        # loaded from, not the repo hard-coded in the config.
        speaker_embeddings["repo_or_path"] = pretrained_processor_name_or_path
```
@ebezzam (Contributor, author) commented Jul 18, 2025
This is because the Suno models are badly configured: they fetch speaker embeddings from Yoach's checkpoints (see `repo_or_path`). So when loaded with `from_pretrained`, the models pull their speaker embeddings from Yoach's checkpoint.
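For reference, the processor's speaker embeddings config is shaped roughly like this (illustrative entries; the exact preset names and paths are assumptions):

```python
speaker_embeddings = {
    # Hard-coded to Yoach's repo instead of the official suno/bark-small.
    "repo_or_path": "ylacombe/bark-small",
    "v2/en_speaker_0": {
        "semantic_prompt": "speaker_embeddings/v2/en_speaker_0_semantic_prompt.npy",
        "coarse_prompt": "speaker_embeddings/v2/en_speaker_0_coarse_prompt.npy",
        "fine_prompt": "speaker_embeddings/v2/en_speaker_0_fine_prompt.npy",
    },
    # ... roughly 700 more presets
}
```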

The best fix is probably to open PRs on the Hub to fix the `repo_or_path` entry and then remove these lines?

@ebezzam (Contributor, author)

I've asked the Suno team to merge these two PRs, but I'm still waiting.

@ebezzam (Contributor, author)

But if Suno doesn't merge them, do we keep pulling from Yoach's repo and remove these lines?

cc @eustlb

Comment on lines +55 to +57
"semantic_prompt": 1, # 1D array of shape (X,)
"coarse_prompt": 2, # 2D array of shape (2,X)
"fine_prompt": 2, # 2D array of shape (8,X)
@ebezzam (Contributor, author)

Added comments for clarity.

Comment on lines -156 to -158
```python
for prompt_key in self.speaker_embeddings:
    if prompt_key != "repo_or_path":
```
@ebezzam (Contributor, author)

The main change is that I added a property to easily get the available voice presets (see the sketch below). The rest is just de-indenting, since we no longer need the `prompt_key != "repo_or_path"` check.
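Roughly, the new property could look like this (a sketch; the exact name in the PR may differ):

```python
@property
def available_voice_presets(self):
    # Every key except the repo pointer names a voice preset.
    return [key for key in self.speaker_embeddings if key != "repo_or_path"]
```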


```python
So for testing purposes, we will remove the unavailable speaker embeddings before saving.
"""
processor._verify_speaker_embeddings(remove_unavailable=True)
```
@ebezzam (Contributor, author)

TL;DR: we need to remove speaker embeddings that couldn't be downloaded properly, possibly because there are so many of them (~700). A rough sketch of what the helper could do is below.
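A minimal sketch of such a verification helper, as a method on the processor (the download logic is an assumption about the implementation; only the method name and `remove_unavailable` argument come from the call above):

```python
from huggingface_hub import hf_hub_download

def _verify_speaker_embeddings(self, remove_unavailable=False):
    # `self.speaker_embeddings` maps preset names to prompt-file paths,
    # plus a special "repo_or_path" key pointing at the source repo.
    repo_id = self.speaker_embeddings["repo_or_path"]
    unavailable = []
    for preset, prompts in self.speaker_embeddings.items():
        if preset == "repo_or_path":
            continue
        for file_path in prompts.values():
            try:
                hf_hub_download(repo_id=repo_id, filename=file_path)
            except Exception:  # e.g. HTTP 429 or a missing file
                unavailable.append(preset)
                break
    if remove_unavailable:
        for preset in unavailable:
            del self.speaker_embeddings[preset]
    return unavailable
```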

@gante (Member)

Format nit: let's use single-line comments (#...), and add an HF team member (@eustlb ?) in the TODO

@ebezzam (Contributor, author)

Sure! And I'm a new HF team member working with @eustlb 😉

@ebezzam ebezzam marked this pull request as ready for review July 18, 2025 11:50
@Rocketknight1 (Member) commented:

cc @eustlb @ydshieh

@gante (Member) left a comment:

LGTM, thank you for the PR 🤗

I'd like approval from someone on the audio side as well before merging (perhaps @eustlb?)



@ebezzam (Contributor, author) commented Jul 21, 2025

UPDATE: after merging with main, running the slow tests leads to two new errors:

```
# RUN_SLOW=1 pytest tests/models/bark
======================= short test summary info =============================================
FAILED tests/models/bark/test_modeling_bark.py::BarkSemanticModelTest::test_eager_padding_matches_padding_free_with_position_ids - KeyError: 'eager'
FAILED tests/models/bark/test_modeling_bark.py::BarkCoarseModelTest::test_eager_padding_matches_padding_free_with_position_ids - KeyError: 'eager'
======================= 2 failed, 217 passed, 259 skipped, 3 warnings in 1101.26s (0:18:21) =============
```

Probably from #39447? @zucchini-nlp, let me know if there is something I can try!

@zucchini-nlp (Member) commented Jul 21, 2025

> Probably from #39447? @zucchini-nlp, let me know if there is something I can try!

Ah I see, I marked them as slow on purpose and will be fixing them tomorrow. They are failing in any case for some special models, including Bark, so feel free to ignore. I don't think the current PR affected this test :)

Edit: submitted a PR in #39582

@ebezzam ebezzam requested a review from eustlb July 22, 2025 13:49
@ebezzam ebezzam added the Audio label Jul 22, 2025
@eustlb (Contributor) commented Jul 24, 2025

run-slow: bark


This comment contains run-slow, running the specified jobs:

```
models: ['models/bark']
quantizations: [] ...
```


[For maintainers] Suggested jobs to run (before merge)

run-slow: bark
