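This is a Cantonese subtitle generator: it takes an audio file (.mp3, .wav, .webm, .flac, etc.) as input and outputs an .srt subtitle file.
Cantonese transcription uses FunAudioLLM/SenseVoiceSmall, and timestamp segmentation uses fsmn-vad. If the BERT corrector is enabled, it uses hon9kon9ize/bert-large-cantonese.
After cloning this repo locally, run the commands below to install the dependencies and then download the required models: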
apt install ffmpeg
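pip install -r requirements.txt
# If you only use OpenCC, you can skip the download
$ python download_models.py [--with-bert]
Next, prepare the audio file you want to transcribe. To download the audio of a YouTube video, install yt-dlp with pip install yt-dlp and run the command below:
# This command downloads audio only, without video; to download the video as well, remove -f ba
yt-dlp -f ba https://youtu.be/rIBD6A4lnLQ
Run the commands below to transcribe your audio files into subtitles. The default corrector is opencc. To use the bert corrector, add --corrector=bert; this requires that you exported the bert model in the first step, and processing takes longer.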
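A single file can be transcribed directly:
python cli.py audio.mp3 --output_dir output
If you pass a directory path instead of a specific file, every audio file under that path is transcribed automatically:
# Automatically transcribe all audio files in audio/
python cli.py ./audio/ --output_dir output
This service API uses the SenseVoice, VAD and BERT models to generate a Cantonese subtitle transcript for an audio file.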
This version only supports YouTube video URLs.
- Download the audio file from the YouTube video URL
- Use the VAD model to split the audio file into small audio clips
- Use the SenseVoice model to generate a Cantonese subtitle transcript and timestamps for each audio clip
- Since the SenseVoice model outputs Simplified Chinese, use OpenCC to convert it to Traditional Chinese, then use BERT to correct the converted text
- Generate an SRT file for the Cantonese subtitle transcript
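The last step of the pipeline above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the helper names `ms_to_srt_time` and `build_srt` are hypothetical, and it assumes each clip arrives as a `(start_ms, end_ms, text)` tuple with timestamps in milliseconds relative to the start of the original audio.

```python
from typing import List, Tuple


def ms_to_srt_time(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(segments: List[Tuple[int, int, str]]) -> str:
    """Render (start_ms, end_ms, text) segments as SRT text.

    Segments whose text is empty after stripping are skipped, so the
    cue numbers stay consecutive.
    """
    blocks = []
    index = 1
    for start_ms, end_ms, text in segments:
        text = text.strip()
        if not text:
            continue  # clip with no recognized speech
        blocks.append(
            f"{index}\n"
            f"{ms_to_srt_time(start_ms)} --> {ms_to_srt_time(end_ms)}\n"
            f"{text}\n"
        )
        index += 1
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Hypothetical clips: the middle one contains only whitespace.
example = [(0, 1500, "你好"), (1500, 1800, "  "), (2000, 3725, "今日天氣好好")]
srt_text = build_srt(example)
print(srt_text)
```

Keeping the timestamps as integer milliseconds until the final formatting step avoids floating-point rounding when clip offsets are added to the positions reported by the VAD model.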
All models are exported to ONNX format.
- SenseVoice: iic/SenseVoiceSmall (on ModelScope)
- VAD: iic/speech_fsmn_vad_zh-cn-16k-common-pytorch (on ModelScope)
- Bert: hon9kon9ize/bert-large-cantonese
sudo apt install ffmpeg
pip install -r requirements.txt
Export the models to ONNX format. This downloads the model weights and exports them to ONNX format in the models folder. Add --with-bert to also export the BERT model.
$ python download_models.py [--with-bert]
You can run the following command to download the audio of a YouTube video. Make sure yt-dlp is installed first with pip install yt-dlp.
# Download only the audio from a YouTube video URL; to download the video as well, remove -f ba
yt-dlp -f ba https://youtu.be/rIBD6
Run the CLI. The default corrector is opencc. To use bert as the corrector, add --corrector=bert; this requires that you exported the BERT model in the first step, and processing takes longer.
A single file can be transcribed directly:
$ python cli.py your_audio.mp3 --output-dir output [--corrector=opencc|bert]
Or transcribe in batch:
# Automatically transcribe all audio files under the audio/ directory
python cli.py ./audio/ --output_dir output
Or run the web API service:
$ python app.py
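For the batch mode described earlier, where a directory is passed instead of a single file, the input has to be expanded into a list of audio files. The sketch below shows one way this could work. It is an assumption, not the behavior of cli.py: `collect_audio_files` and `AUDIO_EXTENSIONS` are hypothetical names, the extension set only covers the formats this README names explicitly, and the search is limited to the top level of the directory.

```python
from pathlib import Path
from typing import List

# Hypothetical extension set, taken from the formats this README mentions.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".webm", ".flac"}


def collect_audio_files(target: str) -> List[Path]:
    """Return the audio files selected by a file path or a directory path.

    A file path is returned as a single-item list. For a directory, only
    its direct children with a known audio extension are returned, sorted
    so that the processing order is deterministic.
    """
    path = Path(target)
    if path.is_file():
        return [path]
    return sorted(
        p
        for p in path.glob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


if __name__ == "__main__":
    # Example: list what would be transcribed from ./audio/
    for audio_path in collect_audio_files("./audio/"):
        print(audio_path)
```

Comparing the lowercased suffix means files such as `talk.MP3` are picked up as well, and filtering by extension keeps unrelated files in the same folder out of the batch.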