Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model

### System Info

Hello,

Description:

I'm experiencing issues with the Whisper-Large-v3-turbo model when using it for transcription tasks with the Transformers library (version 4.38.3).

Problems:

Incorrect word timestamps: The word-level timestamps generated by the model are often inaccurate.

<img width="140" alt="Image" src="https://github.com/user-attachments/assets/45042a47-d314-4046-887a-e5ee0d67e8fd" />


Word repetitions: The model also repeats words in the transcription output. Setting `repetition_penalty` to 1.2 reduced the repetitions but did not eliminate them.
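To make the repetition problem easier to measure, here is a minimal sketch of a helper that counts runs of consecutive identical words in a transcript. The function name `find_repeated_runs` and the word-splitting rule are my own assumptions for illustration; they are not part of Transformers.

```python
import re
from itertools import groupby


def find_repeated_runs(text, min_run=2):
    """Return (word, run_length) for each run of consecutive identical words.

    Hypothetical helper for this report, not a Transformers API.
    """
    # Lowercase and drop punctuation so "Hello, hello" counts as a repeat.
    words = re.findall(r"[\w']+", text.lower())
    runs = []
    for word, group in groupby(words):
        run_length = sum(1 for _ in group)
        if run_length >= min_run:
            runs.append((word, run_length))
    return runs
```

Running it on the transcription text with and without `repetition_penalty` would show how much the setting actually helps. Note that it also flags legitimate repeats such as "that that", so the counts are only a rough indicator.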

Best regards



### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)

### Reproduction

1. Load the Whisper-Large-v3-turbo model using the Transformers library (version 4.38.3).
2. Use the model to transcribe an audio file.
3. Observe the word timestamps and transcription output.
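The steps above can be sketched roughly as follows. This is not the exact script behind the report: the checkpoint ID `openai/whisper-large-v3-turbo`, the file name `sample.wav`, and the helper names are assumptions, and the overlap check in `find_timestamp_anomalies` is only one possible definition of an "incorrect" timestamp.

```python
MODEL_ID = "openai/whisper-large-v3-turbo"  # assumed Hub checkpoint; not stated in the report


def transcribe_with_word_timestamps(audio_path, repetition_penalty=1.2):
    """Transcribe a file and request word-level timestamps."""
    # Imported lazily so the checker below works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(
        audio_path,
        return_timestamps="word",
        generate_kwargs={"repetition_penalty": repetition_penalty},
    )


def find_timestamp_anomalies(chunks):
    """Flag words whose timestamps are missing, reversed, or overlap the previous word.

    Hypothetical helper; expects the pipeline's "chunks" list, where each item
    has a "text" string and a "timestamp" (start, end) pair in seconds.
    """
    problems = []
    previous_end = 0.0
    for index, chunk in enumerate(chunks):
        start, end = chunk["timestamp"]
        if start is None or end is None:
            problems.append((index, chunk["text"], "missing timestamp"))
            continue
        if end < start:
            problems.append((index, chunk["text"], "end before start"))
        if start < previous_end:
            problems.append((index, chunk["text"], "starts before previous word ends"))
        previous_end = max(previous_end, end)
    return problems


def main(audio_path="sample.wav"):  # hypothetical audio file
    result = transcribe_with_word_timestamps(audio_path)
    print(result["text"])
    for problem in find_timestamp_anomalies(result["chunks"]):
        print(problem)
```

Calling `main()` downloads the model and prints the transcript followed by any flagged words. The checker only catches structurally impossible timestamps; words that are merely shifted relative to the audio still need to be compared against a reference alignment by hand.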

### Expected behavior

- Accurate word timestamps.
- No word repetitions in the transcription output.

Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model #37248
