Open
Description
System Info
Platform: Initially discovered on Nvidia. Can be reproduced on CPU and in Google Colab (see attached gist).
- transformers version: 4.53.2
- Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.33.4
- Safetensors version: 0.5.3
- Accelerate version: 1.8.1
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes and No.
- GPU type: NVIDIA GeForce RTX 3090
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Sometime between transformers==4.46.3 and transformers==4.53.2 (the latest as of now), at #34135, the return_language argument for pipeline stopped working: every chunk now reports 'language': None. The ending timestamp for the last word is also missing.
Example (exported from Google Colab): https://gist.github.com/Metric-Void/ce2b9fe2faed0cdf6e5fd328599fd4c7
Code for testing:
import torch
from transformers import pipeline
from transformers.configuration_utils import PretrainedConfig

# Build the ASR pipeline; it is named `asr` so it does not shadow the imported `pipeline` function.
asr = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-tiny",
    torch_dtype=torch.float16,
    config=PretrainedConfig(
        attn_implementation="flash_attention_2"
    )
)

# Request word-level timestamps together with the detected language.
result = asr("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac", return_language=True, return_timestamps='word')
result["chunks"]
Before (transformers==4.46.3):
[{'text': ' I', 'timestamp': (1.04, 1.36), 'language': 'english'},
{'text': ' have', 'timestamp': (1.36, 1.68), 'language': 'english'},
{'text': ' a', 'timestamp': (1.68, 1.94), 'language': 'english'},
{'text': ' dream.', 'timestamp': (1.94, 3.82), 'language': 'english'},
{'text': ' Good', 'timestamp': (3.82, 3.98), 'language': 'english'},
{'text': ' one', 'timestamp': (3.98, 4.16), 'language': 'english'},
{'text': ' day.', 'timestamp': (4.16, 6.4), 'language': 'english'},
{'text': ' This', 'timestamp': (6.4, 6.58), 'language': 'english'},
{'text': ' nation', 'timestamp': (6.58, 7.24), 'language': 'english'},
{'text': ' will', 'timestamp': (7.24, 7.82), 'language': 'english'},
{'text': ' rise', 'timestamp': (7.82, 8.3), 'language': 'english'},
{'text': ' up.', 'timestamp': (8.3, 10.3), 'language': 'english'},
{'text': ' Live', 'timestamp': (10.3, 10.56), 'language': 'english'},
{'text': ' out', 'timestamp': (10.56, 10.98), 'language': 'english'},
{'text': ' the', 'timestamp': (10.98, 11.02), 'language': 'english'},
{'text': ' true', 'timestamp': (11.02, 11.3), 'language': 'english'},
{'text': ' meaning', 'timestamp': (11.3, 11.6), 'language': 'english'},
{'text': ' of', 'timestamp': (11.6, 11.86), 'language': 'english'},
{'text': ' its', 'timestamp': (11.86, 12.08), 'language': 'english'},
{'text': ' dream.', 'timestamp': (12.08, 12.98), 'language': 'english'}]
After (transformers==4.53.2):
[{'text': ' I', 'timestamp': (1.04, 1.36), 'language': None},
{'text': ' have', 'timestamp': (1.36, 1.68), 'language': None},
{'text': ' a', 'timestamp': (1.68, 1.94), 'language': None},
{'text': ' dream.', 'timestamp': (1.94, 3.82), 'language': None},
{'text': ' But', 'timestamp': (3.82, 3.96), 'language': None},
{'text': ' one', 'timestamp': (3.96, 4.18), 'language': None},
{'text': ' day,', 'timestamp': (4.18, 6.22), 'language': None},
{'text': ' this', 'timestamp': (6.22, 6.58), 'language': None},
{'text': ' nation', 'timestamp': (6.58, 7.22), 'language': None},
{'text': ' will', 'timestamp': (7.22, 7.82), 'language': None},
{'text': ' rise', 'timestamp': (7.82, 8.3), 'language': None},
{'text': ' up,', 'timestamp': (8.3, 10.2), 'language': None},
{'text': ' live', 'timestamp': (10.2, 10.56), 'language': None},
{'text': ' out', 'timestamp': (10.56, 10.98), 'language': None},
{'text': ' the', 'timestamp': (10.98, 11.02), 'language': None},
{'text': ' true', 'timestamp': (11.02, 11.3), 'language': None},
{'text': ' meaning', 'timestamp': (11.3, 11.6), 'language': None},
{'text': ' of', 'timestamp': (11.6, 11.86), 'language': None},
{'text': ' its', 'timestamp': (11.86, 12.08), 'language': None},
{'text': ' dream.', 'timestamp': (12.08, None), 'language': None}]
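To make the differences between the two outputs easier to spot, here is a minimal sketch that compares two chunk lists field by field. The helper name `diff_chunks` is hypothetical (it is not part of transformers), and the sample data is abbreviated to the first and last chunks of the outputs above.

```python
# Abbreviated copies of the two outputs above (first and last chunks only).
before = [
    {"text": " I", "timestamp": (1.04, 1.36), "language": "english"},
    {"text": " dream.", "timestamp": (12.08, 12.98), "language": "english"},
]
after = [
    {"text": " I", "timestamp": (1.04, 1.36), "language": None},
    {"text": " dream.", "timestamp": (12.08, None), "language": None},
]


def diff_chunks(old, new):
    """Return (index, field, old_value, new_value) for every field that differs.

    Hypothetical helper for this report; chunks are compared pairwise by position.
    """
    diffs = []
    for i, (a, b) in enumerate(zip(old, new)):
        for field in ("text", "timestamp", "language"):
            if a.get(field) != b.get(field):
                diffs.append((i, field, a.get(field), b.get(field)))
    return diffs


for entry in diff_chunks(before, after):
    print(entry)
```

Run against the full lists, the same comparison should also surface the chunks whose text or timestamps shifted slightly between versions (for example ' Good' versus ' But'), which may or may not be related to this regression.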
Expected behavior
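The old behaviour (transformers==4.46.3) was correct: with return_language=True every chunk should carry the detected language, and the last word should have an ending timestamp.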
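The expected behaviour can be written down as a small check, which might be useful as a regression test. This is only a sketch: `find_chunk_problems` is a hypothetical helper, and it assumes the chunk layout shown in the outputs above ('text', 'timestamp' and 'language' keys).

```python
def find_chunk_problems(chunks):
    """List chunks that lack a language or an ending timestamp.

    Hypothetical helper; assumes each chunk is a dict with 'text',
    'timestamp' (a (start, end) tuple) and 'language' keys.
    """
    problems = []
    for i, chunk in enumerate(chunks):
        _start, end = chunk["timestamp"]
        if chunk.get("language") is None:
            problems.append(f"chunk {i} ({chunk['text']!r}): language is None")
        if end is None:
            problems.append(f"chunk {i} ({chunk['text']!r}): missing end timestamp")
    return problems


# Last chunk of each output above.
good = [{"text": " dream.", "timestamp": (12.08, 12.98), "language": "english"}]
bad = [{"text": " dream.", "timestamp": (12.08, None), "language": None}]

print(find_chunk_problems(good))
print(find_chunk_problems(bad))
```

In a real test, the list passed in would be result["chunks"] from the reproduction above, and the check would pass when the returned list is empty.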
Maybe related: #21311, #21427, #25138, #27604, #29520, #31572