I'm using a BPE ASR model and chunking up the audio so that each chunk is a manageable duration. However, I noticed that a multi-subword word can be split across two audio chunks, and when that happens its subwords get treated as separate words.
pyctcdecode does the merging of subwords automagically, so I'm wondering if there's a way to handle this edge case?
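
To make the edge case concrete, here is a minimal toy sketch of what I mean. It is not pyctcdecode's actual code: the helper `merge_subwords`, the example tokens, and the SentencePiece-style convention that a leading `▁` marks the start of a word are all assumptions for illustration.

```python
# Toy illustration only: this is NOT pyctcdecode's implementation.
# Assumption: SentencePiece-style tokens where a leading "▁" marks a word start.

def merge_subwords(tokens):
    """Join subword tokens into words; a token starting with "▁" begins a new word."""
    words = []
    for tok in tokens:
        if tok.startswith("▁") or not words:
            # Word-start marker, or nothing to attach to yet: start a new word.
            words.append(tok.lstrip("▁"))
        else:
            # Continuation subword: glue it onto the previous word.
            words[-1] += tok
    return words

# Hypothetical token stream for "the transcription", cut mid-word by a chunk boundary.
chunk_a = ["▁the", "▁trans"]
chunk_b = ["cription"]

# Merging each chunk on its own: the trailing subword becomes a separate word.
per_chunk = merge_subwords(chunk_a) + merge_subwords(chunk_b)

# Merging only after joining the chunks keeps the word intact.
joined = merge_subwords(chunk_a + chunk_b)

print(per_chunk)  # ['the', 'trans', 'cription']
print(joined)     # ['the', 'transcription']
```

In other words, the split only shows up because the merge happens per chunk; if the subwords from neighbouring chunks could be merged together, the word would come out whole.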