Hi All,
I am working on integrating pyctcdecode with NeMo ASR models. It works without errors for pre-trained NeMo models such as "stt_en_conformer_ctc_small", as in the code snippet below:
import nemo.collections.asr as nemo_asr
from pyctcdecode import build_ctcdecoder
myFile = ['sample-in-Speaker_1-11.wav']
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name='stt_en_conformer_ctc_small')
# Per-frame log-probabilities for the first (and only) file, shape (time, classes)
logits = asr_model.transcribe(myFile, logprobs=True)[0]
print((logits.shape, len(asr_model.decoder.vocabulary)))
decoder = build_ctcdecoder(asr_model.decoder.vocabulary)
decoder.decode(logits)
The same snippet fails if I use a fine-tuned NeMo model in place of the pre-trained one. The error is: "ValueError: Input logits shape is (36, 513), but vocabulary is size 512. Need logits of shape: (time, vocabulary)"
The fine-tuned model is loaded as follows:
asr_model = nemo_asr.models.EncDecCTCModelBPE.restore_from(restore_path="<path to fine-tuned model>")
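A minimal diagnostic sketch that may help narrow down the shape error above. It assumes the problem lies in how the vocabulary list lines up with the logits width, which is not confirmed. The helper name `inspect_vocab` is hypothetical (it is not part of NeMo or pyctcdecode), and the toy vocabulary is made-up data standing in for `asr_model.decoder.vocabulary`.

```python
from collections import Counter


def inspect_vocab(num_logit_columns, vocabulary):
    """Hypothetical helper: summarize how a vocabulary lines up with the logits width."""
    counts = Counter(vocabulary)
    return {
        "vocab_size": len(vocabulary),
        "logit_columns": num_logit_columns,
        # Assumption: a CTC model emits one extra column for the blank token,
        # so a difference of 1 is the expected case.
        "extra_columns": num_logit_columns - len(vocabulary),
        # Positions of empty-string tokens, which could be confused with a blank.
        "empty_tokens": [i for i, tok in enumerate(vocabulary) if tok == ""],
        # Tokens that appear more than once in the list.
        "duplicate_tokens": sorted(tok for tok, n in counts.items() if n > 1),
    }


# Toy data only; not taken from any real model.
toy_vocab = ["<unk>", "▁a", "▁b", "", "▁a"]
report = inspect_vocab(6, toy_vocab)
print(report)
```

With the real models this could be called as `inspect_vocab(logits.shape[1], list(asr_model.decoder.vocabulary))` for both the pre-trained and the fine-tuned checkpoint, to see where the two reports differ.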
Please advise, @gkucsko @lopez86. Thanks.