Hi Mr. Khanh, I tried to fine-tune the wav2vec2 model with the code you provided. I set up the same dataset structure as yours and only changed line 18 of the dataset code to:
labels_batch = self.processor.tokenizer(transcripts, padding="longest", return_tensors="pt")
Then I proceeded to fine-tune. My data is about 2 hours of Vietnamese audio. However, after training for quite a few epochs, the loss is still very high and the WER on the validation set does not change (1.00). What should I check or adjust to get good results on this task? Thank you very much.
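One thing worth checking in that collator line is whether the padded label positions are masked out, since CTC-style losses are usually told to ignore padding by replacing those ids with -100; if they are not, the loss can stay high and the WER stuck at 1.00. A minimal sketch of that masking step, using plain lists and an assumed pad token id instead of real tensors:

```python
# Hypothetical sketch: replace padded label ids with -100 so the loss
# ignores them. PAD_ID and the example ids/masks are made-up values.

PAD_ID = 0  # assumed tokenizer pad token id

def mask_padding(input_ids, attention_mask):
    """Replace label ids at padded positions (mask == 0) with -100."""
    return [
        [tok if keep == 1 else -100 for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]

# Two transcripts padded to the same length:
ids = [[5, 9, 2, PAD_ID], [7, 3, PAD_ID, PAD_ID]]
mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
print(mask_padding(ids, mask))  # [[5, 9, 2, -100], [7, 3, -100, -100]]
```

In the real collator the same replacement is typically done on the `labels_batch` tensors using the attention mask returned by the tokenizer.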
