Hey there, congrats on releasing this great effort!
Here are two quick suggestions that might help:
- For inference: the current W2V-BERT implementation doesn't use Flash Attention 2 or SDPA. Integrating it into transformers should be relatively easy, building on the existing Wav2Vec2 implementation (see the first sketch below).
- For training: if you get rid of this part, you can get easy training improvements by adding an adapter (`add_adapter=True`); see the second sketch below.
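
To flesh out the first point, here's a minimal sketch of what the attention-backend switch looks like for models that already support it in transformers (Wav2Vec2 among them); the checkpoint id is a placeholder for the eventual W2V-BERT weights:

```python
import torch
from transformers import AutoModel

# Placeholder checkpoint id for the sketch; substitute the real W2V-BERT weights.
model_id = "facebook/w2v-bert-2.0"

# "sdpa" uses PyTorch's built-in scaled_dot_product_attention kernel;
# "flash_attention_2" additionally needs the flash-attn package and fp16/bf16 weights.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # or "flash_attention_2"
)
```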
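
And for the second point, a minimal sketch of the adapter flag as it exists today on the Wav2Vec2 classes (assuming a W2V-BERT port would expose the same option); the checkpoint id is again just a stand-in. The flag stacks a small convolutional adapter on top of the encoder output:

```python
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Enable the convolutional adapter on top of the encoder; these extra layers
# are not in the pretrained checkpoint, so they are randomly initialized and
# trained during fine-tuning.
config = Wav2Vec2Config.from_pretrained(
    "facebook/wav2vec2-base",  # stand-in checkpoint for the sketch
    add_adapter=True,
    num_adapter_layers=2,      # adapter depth; each layer also downsamples the time axis
)
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base", config=config)
```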
Hope that helps!
Congrats again on the release!