🎯 Goal
- Build a real-time Vietnamese-English translation demo
- Current demo state: v0.2
Backlog
- ASR quality and speed: hack the existing checkpoint as @new5558 did, or fine-tune a new model on better data?
- A small voice benchmark for ASR in Vietnamese and Thai (low-resource languages), as suggested by @Yip-Jia-Qi, recorded in real-life environments (coffee shop, home) with laptop, webcam, and phone mics
- VAD (@thinhlpg) or streaming (@new5558)? I'm following the VAD path; the problem is that VAD is sometimes unreliable and misses speech at the start of an utterance.
- Serving the model for optimal latency: it currently takes 6 s from speech start to TTS (the demo isn't optimized at all, so there is a lot of room for improvement). For context, in the Google speech translation demo, all 4 samples took exactly 2 s when I inspected them manually (https://youtu.be/hyXqcsWOONo?feature=shared).

- Training experiments for translation: better data, hyperparameters?
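One common mitigation for VAD clipping the start of an utterance is a pre-roll buffer: keep the last few frames before the VAD fires and prepend them to the segment, plus a "hangover" so short pauses do not end a segment early. The sketch below is a minimal, dependency-free illustration of that idea, not the demo's current pipeline. `energy_vad`, `segment_with_preroll`, the frame format, and the `preroll`/`hangover` values are all hypothetical; in practice `is_speech` would wrap a real VAD model.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List

Frame = List[float]  # one fixed-length chunk of PCM samples in [-1, 1]


def energy_vad(frame: Frame, threshold: float = 0.01) -> bool:
    """Hypothetical stand-in VAD: mean energy above a fixed threshold.
    Replace with a real VAD model in practice."""
    return sum(s * s for s in frame) / len(frame) > threshold


def segment_with_preroll(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool] = energy_vad,
    preroll: int = 10,   # frames kept from before the VAD trigger
    hangover: int = 10,  # silent frames tolerated before closing a segment
) -> Iterator[List[Frame]]:
    """Yield speech segments, each prefixed with up to `preroll` frames
    that arrived before the VAD fired, so a late trigger does not clip
    the first syllable."""
    ring: Deque[Frame] = deque(maxlen=preroll)
    segment: List[Frame] = []
    silent_run = 0
    for frame in frames:
        if segment:
            # Inside a segment: keep everything, count trailing silence.
            segment.append(frame)
            if is_speech(frame):
                silent_run = 0
            else:
                silent_run += 1
                if silent_run > hangover:
                    yield segment
                    segment, silent_run = [], 0
        elif is_speech(frame):
            # Onset: prepend the buffered pre-roll frames.
            segment = list(ring) + [frame]
            ring.clear()
        else:
            # Idle: remember only the most recent `preroll` frames.
            ring.append(frame)
    if segment:
        yield segment
```

The pre-roll trades a little extra audio (and latency) per segment for not losing the first word; `preroll` and `hangover` would need tuning against the frame size actually used.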
How good is the commercial baseline for ASR?
Know your enemy: we want to beat the commercial systems, not just the open-source ones.
- Test Zalo Kiki (the ASR part).
- Test other popular options
=> What are their response time and output quality?
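To compare response time and output quality across backends on the same recordings, a small harness could time each call and score the transcript with word error rate (WER, word-level edit distance divided by reference length). This is a minimal sketch under assumptions: `transcribe` is a hypothetical callable wrapping whichever backend is being tested (Zalo Kiki, an open-source model, etc.), and no text normalization (case, punctuation) is applied.

```python
import time
from typing import Callable, List, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 ref words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def evaluate(
    transcribe: Callable[[bytes], str],
    samples: Sequence[Tuple[bytes, str]],
) -> Tuple[float, float]:
    """Run a hypothetical `transcribe(audio) -> text` backend over
    (audio, reference) pairs; return (mean WER, mean latency in seconds)."""
    wers: List[float] = []
    latencies: List[float] = []
    for audio, reference in samples:
        start = time.perf_counter()
        hypothesis = transcribe(audio)
        latencies.append(time.perf_counter() - start)
        wers.append(word_error_rate(reference, hypothesis))
    return sum(wers) / len(wers), sum(latencies) / len(latencies)
```

Because Vietnamese writes spaces between syllables, whitespace splitting here effectively gives a syllable error rate; Thai is written without spaces between words, so it would need a word segmenter or a character error rate instead. This also only measures end-to-end latency of a single batch call; streaming backends would additionally need time-to-first-output.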
