Implement FastSpeech2 by Yugo0000999921 · Pull Request #1744 · ailia-ai/ailia-models

Yugo0000999921 · 2025-11-27T05:04:07Z

#1735

Yugo0000999921 · 2025-12-15T08:53:23Z

I committed the required files, including text.
So far, inference succeeds only for a single speaker.

Yugo0000999921 · 2025-12-18T08:08:10Z

Inference now succeeds for multiple speakers as well.

kyakuno · 2025-12-25T04:21:33Z

I uploaded the model.
https://storage.googleapis.com/ailia-models/fastspeech2/aishell3.onnx.prototxt

kyakuno · 2025-12-25T07:21:06Z

I think the code is better kept single-purpose and simple, so after #1764 is merged, could you remove Batch Synthesis from README.md and delete the corresponding code?

Fix fastspeech2 to run on the ailia SDK

Yugo0000999921

approve changes

kyakuno · 2025-12-29T11:24:40Z

@Yugo0000999921 When I synthesized speech with the command below, there was noise at the end of the audio.
output.wav

$ python3 fastspeech2.py \
  --text "Hello, I am speaking from a multi-speaker model." \
  --preprocess_config config/LibriTTS/preprocess.yaml \
  --onnx_fs2 libritts.onnx \
  --speaker_id 0

The Torch path processes only actual_mel_len frames, but the ONNX model was exported with a fixed shape, so I believe the padding is causing the noise.

    # The HiFi-GAN ONNX model expects a fixed length (3000 frames), so padding is required.
    # Simple padding is used to stay close to the original repository's processing.
    HIFI_FIXED_LENGTH = 3000
    hop_length = preprocess_config["preprocessing"]["stft"]["hop_length"]
    actual_mel_len = mel_input.shape[2]

    if actual_mel_len < HIFI_FIXED_LENGTH:
        # Close to the original repository: pad by repeating the last frame
        pad_length = HIFI_FIXED_LENGTH - actual_mel_len
        last_frame = mel_input[:, :, -1:]
        padding = np.repeat(last_frame, pad_length, axis=2)
        mel_input = np.concatenate([mel_input, padding], axis=2)
        logger.info(f"Padded mel_input from {actual_mel_len} to {HIFI_FIXED_LENGTH} frames")
    elif actual_mel_len > HIFI_FIXED_LENGTH:
        # Truncate inputs longer than 3000 frames
        mel_input = mel_input[:, :, :HIFI_FIXED_LENGTH]
        # Log before overwriting actual_mel_len so the message reports the original length
        logger.info(f"Truncated mel_input from {actual_mel_len} to {HIFI_FIXED_LENGTH} frames")
        actual_mel_len = HIFI_FIXED_LENGTH
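
As a rough illustration of what processing by actual_mel_len could look like while the model still has a fixed shape, the sketch below cuts the vocoder output back to the samples that correspond to real frames. It assumes the vocoder emits hop_length samples per mel frame (typical for HiFi-GAN, but not confirmed for this model); trim_padded_audio and the numbers used are hypothetical and not part of the PR. Trimming may not remove every artifact, since padded frames can still influence samples near the boundary.

```python
import numpy as np


def trim_padded_audio(wav, actual_mel_len, hop_length):
    """Keep only the samples generated from real (non-padded) mel frames.

    Assumes one mel frame maps to hop_length output samples.
    """
    valid_samples = actual_mel_len * hop_length
    return wav[..., :valid_samples]


# Hypothetical values: 3000 padded frames, 120 real frames, hop of 256 samples.
hop_length = 256
padded_wav = np.zeros((1, 1, 3000 * hop_length), dtype=np.float32)
trimmed = trim_padded_audio(padded_wav, actual_mel_len=120, hop_length=hop_length)
print(trimmed.shape)  # (1, 1, 30720)
```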

Would it be possible to export the ONNX model with dynamic shapes to suppress this noise?
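
A minimal sketch of such an export, assuming the vocoder is available as a PyTorch module that takes a (batch, n_mels, frames) tensor and returns (batch, 1, samples). The tensor names "mel" and "wav", the n_mels default of 80, and the helper export_dynamic are assumptions for illustration and are not taken from this PR. torch.onnx.export accepts a dynamic_axes mapping that marks which axes stay variable in the exported graph.

```python
# Axis 2 (time) of both the mel input and the waveform output is left dynamic.
# The names "mel" and "wav" are hypothetical labels for this sketch.
DYNAMIC_AXES = {
    "mel": {2: "frames"},
    "wav": {2: "samples"},
}


def export_dynamic(vocoder, onnx_path, n_mels=80, dummy_frames=100):
    """Export a vocoder without baking the frame count into the graph."""
    import torch  # imported here so this module loads without PyTorch installed

    vocoder.eval()
    # The dummy input only fixes batch and n_mels; the frame axis stays dynamic.
    dummy_mel = torch.randn(1, n_mels, dummy_frames)
    torch.onnx.export(
        vocoder,
        (dummy_mel,),
        onnx_path,
        input_names=["mel"],
        output_names=["wav"],
        dynamic_axes=DYNAMIC_AXES,
    )
```

With a graph exported this way, the mel spectrogram could be passed at its actual length and the padding step would no longer be needed.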

Also, please listen to the audio file produced by ONNX inference and the one produced by Torch inference, and confirm that they roughly match.
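
Alongside listening, a numeric check can help. The sketch below compares two waveforms over their overlapping samples and reports the largest absolute difference and the correlation. compare_waveforms is a hypothetical helper, and the two signals are synthetic stand-ins for the Torch and ONNX outputs, not real results from this PR.

```python
import numpy as np


def compare_waveforms(wav_a, wav_b):
    """Return (max_abs_diff, correlation) over the overlapping samples."""
    n = min(len(wav_a), len(wav_b))
    a = np.asarray(wav_a[:n], dtype=np.float64)
    b = np.asarray(wav_b[:n], dtype=np.float64)
    max_abs_diff = float(np.max(np.abs(a - b)))
    correlation = float(np.corrcoef(a, b)[0, 1])
    return max_abs_diff, correlation


# Synthetic stand-ins: a 440 Hz tone and the same tone with a small perturbation.
t = np.linspace(0.0, 1.0, 22050, endpoint=False)
torch_wav = np.sin(2 * np.pi * 440.0 * t)
onnx_wav = torch_wav + 1e-4 * np.cos(2 * np.pi * 50.0 * t)

diff, corr = compare_waveforms(torch_wav, onnx_wav)
print(f"max abs diff={diff:.1e}, correlation={corr:.6f}")
```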

- Remove fixed-length padding, use actual sequence length
- Use ailia.Net(None, onnx_path) without prototxt
- Use onnx.load() to get input/output names
- Add preprocess_text with auto language detection
- Remove HiFi-GAN fixed-length padding
- Add mel spectrogram plot saving

Co-authored-by: Cursor <cursoragent@cursor.com>
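
The commit message does not show how preprocess_text detects the language. One simple heuristic, given here purely as an assumption, is to look for CJK ideographs and otherwise fall back to English, which would separate Mandarin input (AISHELL3) from English input (LibriTTS). detect_language and its "zh"/"en" return values are hypothetical.

```python
def detect_language(text):
    """Return "zh" if the text contains a CJK ideograph, otherwise "en"."""
    for ch in text:
        # U+4E00..U+9FFF is the CJK Unified Ideographs block.
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"


print(detect_language("Hello, I am speaking from a multi-speaker model."))  # en
print(detect_language("大家好"))  # zh
```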

feat(fastspeech2): Add FastSpeech2 + HiFi-GAN TTS sample

929f03d

Yugo0000999921 requested a review from kyakuno November 27, 2025 05:04

commit new file

4a8029d

fix

c64c3ed

kyakuno added 2 commits December 25, 2025 13:14

Merge branch 'master' into feature/add-fastspeech2

706476d

Set remote path, Fix input file no found error

10befa1

kyakuno added 4 commits December 25, 2025 15:33

Use ailia API

4650f8a

Fix speaker id

8ad89ee

Add model to list

9776469

Update usage

d8c87d5

Merge pull request #1764 from axinc-ai/feature/ailia-fastspeech2

60899fb

Fix fastspeech2 to run on the ailia SDK

Yugo0000999921 commented Dec 26, 2025

Yugo0000999921 and others added 3 commits December 26, 2025 13:45

Do refactoring. Remove batch synthesize

b1d7e98

Merge branch 'master' into feature/add-fastspeech2

d4edaa0

Update usage

26976b1

kyakuno changed the title ~~feat(fastspeech2): Add FastSpeech2 + HiFi-GAN TTS sample~~ Implement FastSpeech2 Dec 29, 2025

kyakuno added the waiting_enhancement label Dec 30, 2025

Yugo0000999921 and others added 7 commits February 23, 2026 17:03

commit gitignore

5d88ecc

fix

94b4202

fix

c16cca9

fix and add visualization tools

ab86297

fix

23c211e

fix inference files and README.

b32e08b

Yugo0000999921 wants to merge 20 commits into master from feature/add-fastspeech2