Vietnamese Translation #226

@thinhlpg

🎯 Goal

  • Build a real-time Vietnamese-English translation demo
  • Current demo state: v0.2

Backlog

  • ASR quality and speed (hack on an existing checkpoint like @new5558 did, or fine-tune a new model with better data?)
  • A small ASR benchmark set for Vietnamese and Thai (low-resource languages), as suggested by @Yip-Jia-Qi, recorded in real-life environments (coffee shop, home) with everyday devices (laptop mic, webcam mic, phone mic, ...)
  • VAD (@thinhlpg) or streaming (@new5558). I'm following the VAD path; the problem is that VAD is sometimes unreliable and misses speech at the start of an utterance
  • Model serving for optimal latency: it currently takes 6 s from speech start to TTS (the demo isn't optimized at all, so there is a lot of room for improvement). For context, I manually inspected 4 samples from the Google speech translation demo and all took exactly 2 s: https://youtu.be/hyXqcsWOONo?feature=shared
  • Training experiments for translation: better data, hyperparameters?
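
For the VAD item, one common mitigation for clipped speech onsets is a pre-roll buffer: keep the last few frames before the VAD fires and prepend them to the segment sent to ASR. The sketch below is only an illustration, not the demo's actual code. `segments_with_preroll`, `is_speech`, `preroll_frames` and `hangover_frames` are hypothetical names, and `is_speech` stands in for whatever VAD is actually used.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def segments_with_preroll(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],  # hypothetical per-frame VAD decision
    preroll_frames: int = 10,            # assumed: frames kept from before the trigger
    hangover_frames: int = 15,           # assumed: silent frames that end a segment
) -> Iterator[List[bytes]]:
    """Yield speech segments, each prefixed with the frames just before the VAD fired."""
    preroll = deque(maxlen=preroll_frames)  # ring buffer of recent non-speech frames
    segment: List[bytes] = []
    silence_run = 0
    in_speech = False

    for frame in frames:
        if in_speech:
            segment.append(frame)
            silence_run = 0 if is_speech(frame) else silence_run + 1
            if silence_run >= hangover_frames:
                # Enough trailing silence: close the segment.
                yield segment
                segment, in_speech, silence_run = [], False, 0
                preroll.clear()
        elif is_speech(frame):
            # VAD just fired: recover the audio it was too slow to catch.
            segment = list(preroll) + [frame]
            in_speech = True
            silence_run = 0
            preroll.clear()
        else:
            preroll.append(frame)

    if segment:
        # Stream ended mid-utterance: flush what we have.
        yield segment
```

With 20-30 ms frames, the defaults above would keep roughly 0.2-0.3 s of audio before the trigger; the right values depend on the VAD and would need tuning against real recordings.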

How good is the commercial baseline for ASR?

Know your enemy: we want to beat the commercial options, not just the open-source ones

  • Test Zalo Kiki (the ASR part).
  • Test other popular options
    => What are their response times and output quality?
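
To compare baselines on both questions, a minimal sketch could pair a word error rate (WER) function with a wall-clock timer around each system's transcription call. Everything here is an assumption for illustration: `word_error_rate` and `timed_transcribe` are hypothetical helpers, and `transcribe` stands in for whichever client (Zalo Kiki or another service) is being tested. Splitting on whitespace works for Vietnamese, where it gives syllable-level units; Thai is written without spaces, so it would need a tokenizer or a character-level metric instead.

```python
import time
from typing import Callable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Standard Levenshtein dynamic programme, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def timed_transcribe(
    transcribe: Callable[[str], str],  # hypothetical wrapper around an ASR service
    audio_path: str,
) -> Tuple[str, float]:
    """Return the transcript and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    text = transcribe(audio_path)
    return text, time.perf_counter() - start
```

Running every system over the same recordings and reporting mean WER alongside median response time would give one comparable row per baseline.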
