Hey there, congrats on releasing this great effort!
Here are two quick suggestions that might help:
- For inference: the current W2V-BERT implementation doesn't use Flash Attention 2 or SDPA. Integrating it into transformers should be relatively easy, building on the existing Wav2Vec2 implementation (see the first sketch below).
- For training: if you get rid of this part, you can get easy training improvements by adding an adapter (`add_adapter=True`); see the second sketch below.
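
To flesh out the first point, here's a minimal sketch of what the attention-backend switch looks like for models that already support it in transformers (Wav2Vec2 among them); the checkpoint id is a placeholder for the eventual W2V-BERT weights:

```python
import torch
from transformers import AutoModel

# Placeholder checkpoint id for the sketch; substitute the real W2V-BERT weights.
model_id = "facebook/w2v-bert-2.0"

# "sdpa" uses PyTorch's built-in scaled_dot_product_attention kernel;
# "flash_attention_2" additionally needs the flash-attn package and fp16/bf16 weights.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # or "flash_attention_2"
)
```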
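
And for the second point, a minimal sketch of the adapter flag as it exists today on the Wav2Vec2 classes (assuming a W2V-BERT port would expose the same option); the checkpoint id is again just a stand-in. The flag stacks a small convolutional adapter on top of the encoder output:

```python
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Enable the convolutional adapter on top of the encoder; these extra layers
# are not in the pretrained checkpoint, so they are randomly initialized and
# trained during fine-tuning.
config = Wav2Vec2Config.from_pretrained(
    "facebook/wav2vec2-base",  # stand-in checkpoint for the sketch
    add_adapter=True,
    num_adapter_layers=2,      # adapter depth; each layer also downsamples the time axis
)
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base", config=config)
```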
Hope that helps!
Congrats again on the release!