Whisper return_language with pipeline no longer working #39404

@Metric-Void

System Info

Platform: Initially discovered on an NVIDIA GPU. Also reproducible on CPU and in Google Colab (see the gist linked under Reproduction).

  • transformers version: 4.53.2
  • Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 0.33.4
  • Safetensors version: 0.5.3
  • Accelerate version: 1.8.1
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes and No.
  • GPU type: NVIDIA GeForce RTX 3090

Who can help?

@eustlb @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Sometime between transformers==4.46.3 and transformers==4.53.2 (the latest release as of now), at #34135, the return_language argument of the pipeline stopped working: every chunk now reports 'language': None. The end timestamp of the last word is also missing (it is returned as None).

Example (exported from Google Colab): https://gist.github.com/Metric-Void/ce2b9fe2faed0cdf6e5fd328599fd4c7

Code for testing:

import torch
from transformers import pipeline
from transformers.configuration_utils import PretrainedConfig

# Named `pipe` so the imported `pipeline` factory is not shadowed.
pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-tiny",
    torch_dtype=torch.float16,
    config=PretrainedConfig(
        attn_implementation="flash_attention_2"
    ),
)
# Request the detected language and word-level timestamps for each chunk.
result = pipe(
    "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac",
    return_language=True,
    return_timestamps="word",
)

result["chunks"]

Before (transformers==4.46.3):

[{'text': ' I', 'timestamp': (1.04, 1.36), 'language': 'english'},
 {'text': ' have', 'timestamp': (1.36, 1.68), 'language': 'english'},
 {'text': ' a', 'timestamp': (1.68, 1.94), 'language': 'english'},
 {'text': ' dream.', 'timestamp': (1.94, 3.82), 'language': 'english'},
 {'text': ' Good', 'timestamp': (3.82, 3.98), 'language': 'english'},
 {'text': ' one', 'timestamp': (3.98, 4.16), 'language': 'english'},
 {'text': ' day.', 'timestamp': (4.16, 6.4), 'language': 'english'},
 {'text': ' This', 'timestamp': (6.4, 6.58), 'language': 'english'},
 {'text': ' nation', 'timestamp': (6.58, 7.24), 'language': 'english'},
 {'text': ' will', 'timestamp': (7.24, 7.82), 'language': 'english'},
 {'text': ' rise', 'timestamp': (7.82, 8.3), 'language': 'english'},
 {'text': ' up.', 'timestamp': (8.3, 10.3), 'language': 'english'},
 {'text': ' Live', 'timestamp': (10.3, 10.56), 'language': 'english'},
 {'text': ' out', 'timestamp': (10.56, 10.98), 'language': 'english'},
 {'text': ' the', 'timestamp': (10.98, 11.02), 'language': 'english'},
 {'text': ' true', 'timestamp': (11.02, 11.3), 'language': 'english'},
 {'text': ' meaning', 'timestamp': (11.3, 11.6), 'language': 'english'},
 {'text': ' of', 'timestamp': (11.6, 11.86), 'language': 'english'},
 {'text': ' its', 'timestamp': (11.86, 12.08), 'language': 'english'},
 {'text': ' dream.', 'timestamp': (12.08, 12.98), 'language': 'english'}]

After (transformers==4.53.2):

[{'text': ' I', 'timestamp': (1.04, 1.36), 'language': None},
 {'text': ' have', 'timestamp': (1.36, 1.68), 'language': None},
 {'text': ' a', 'timestamp': (1.68, 1.94), 'language': None},
 {'text': ' dream.', 'timestamp': (1.94, 3.82), 'language': None},
 {'text': ' But', 'timestamp': (3.82, 3.96), 'language': None},
 {'text': ' one', 'timestamp': (3.96, 4.18), 'language': None},
 {'text': ' day,', 'timestamp': (4.18, 6.22), 'language': None},
 {'text': ' this', 'timestamp': (6.22, 6.58), 'language': None},
 {'text': ' nation', 'timestamp': (6.58, 7.22), 'language': None},
 {'text': ' will', 'timestamp': (7.22, 7.82), 'language': None},
 {'text': ' rise', 'timestamp': (7.82, 8.3), 'language': None},
 {'text': ' up,', 'timestamp': (8.3, 10.2), 'language': None},
 {'text': ' live', 'timestamp': (10.2, 10.56), 'language': None},
 {'text': ' out', 'timestamp': (10.56, 10.98), 'language': None},
 {'text': ' the', 'timestamp': (10.98, 11.02), 'language': None},
 {'text': ' true', 'timestamp': (11.02, 11.3), 'language': None},
 {'text': ' meaning', 'timestamp': (11.3, 11.6), 'language': None},
 {'text': ' of', 'timestamp': (11.6, 11.86), 'language': None},
 {'text': ' its', 'timestamp': (11.86, 12.08), 'language': None},
 {'text': ' dream.', 'timestamp': (12.08, None), 'language': None}]

Expected behavior

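The old behaviour (transformers==4.46.3) was correct: every chunk carries the detected language, and the last word has an end timestamp.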
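For anyone bisecting this, a small check along the following lines may help flag both symptoms in a `result["chunks"]` list. This is only a sketch: `find_regressions` is a hypothetical helper name (not part of transformers), and the sample data is just the last two chunks copied from the two outputs above.

```python
from typing import Any


def find_regressions(chunks: list[dict[str, Any]]) -> dict[str, list[int]]:
    """Return the indices of chunks showing either symptom from this report.

    Hypothetical helper for illustration; not a transformers API.
    """
    # Symptom 1: return_language=True was requested but 'language' is None.
    missing_language = [
        i for i, chunk in enumerate(chunks) if chunk.get("language") is None
    ]
    # Symptom 2: the end of the (start, end) timestamp tuple is None.
    missing_end = [
        i
        for i, chunk in enumerate(chunks)
        if chunk.get("timestamp", (None, None))[1] is None
    ]
    return {
        "missing_language": missing_language,
        "missing_end_timestamp": missing_end,
    }


# Last two chunks from the 4.46.3 output above.
old_tail = [
    {"text": " its", "timestamp": (11.86, 12.08), "language": "english"},
    {"text": " dream.", "timestamp": (12.08, 12.98), "language": "english"},
]
# Last two chunks from the 4.53.2 output above.
new_tail = [
    {"text": " its", "timestamp": (11.86, 12.08), "language": None},
    {"text": " dream.", "timestamp": (12.08, None), "language": None},
]

print(find_regressions(old_tail))
print(find_regressions(new_tail))
```

On real pipeline output, the same function can be called directly as `find_regressions(result["chunks"])`; two empty lists would correspond to the old, expected behaviour.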

Maybe related: #21311, #21427, #25138, #27604, #29520, #31572
