TACOTRON2_WAVERNN_PHONE_LJSPEECH

torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

Phoneme-based TTS pipeline with Tacotron2 trained on LJSpeech [Ito and Johnson, 2017] for 1,500 epochs, and a WaveRNN vocoder trained on 8-bit-depth waveforms of LJSpeech [Ito and Johnson, 2017] for 10,000 epochs.
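
"8-bit depth" means each waveform sample is represented as one of 2**8 = 256 discrete classes, which lets the vocoder treat sample prediction as classification. The sketch below is only an illustration of that idea using plain linear quantization; the actual WaveRNN training recipe may apply companding (for example mu-law) before quantizing, and the helper names `quantize` and `dequantize` are hypothetical.

```python
def quantize(samples: list[float], bits: int = 8) -> list[int]:
    """Map floats in [-1.0, 1.0] to integer classes 0 .. 2**bits - 1 (linear, no companding)."""
    levels = 2 ** bits
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))                        # clip to the valid range
        out.append(int((x + 1.0) / 2.0 * (levels - 1) + 0.5))  # round to the nearest level
    return out


def dequantize(classes: list[int], bits: int = 8) -> list[float]:
    """Inverse mapping from integer classes back to floats in [-1.0, 1.0]."""
    levels = 2 ** bits
    return [2.0 * q / (levels - 1) - 1.0 for q in classes]


if __name__ == "__main__":
    wave = [-1.0, -0.5, 0.0, 0.5, 1.0]
    classes = quantize(wave)
    print(classes)  # [0, 64, 128, 191, 255]
    print(dequantize(classes))
```

With 256 levels, the round-trip error for any in-range sample is at most 1/255.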

The text processor encodes the input text as phonemes. It uses DeepPhonemizer to convert graphemes to phonemes. The model (en_us_cmudict_forward) was trained on CMUDict.
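
To show what "encoding text as phonemes" means, here is a toy sketch that swaps DeepPhonemizer's learned grapheme-to-phoneme model for a plain dictionary lookup with two CMUDict-style ARPAbet entries, then maps each phoneme to an integer ID. The lexicon, the vocabulary layout, and the function names are all hypothetical; the real text processor uses its own token table and handles out-of-vocabulary words and punctuation.

```python
# Hypothetical toy lexicon with CMUDict-style ARPAbet pronunciations.
TOY_LEXICON = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}

# Hypothetical vocabulary: padding "_", word separator " ", then the sorted phoneme set.
VOCAB = ["_", " "] + sorted({p for prons in TOY_LEXICON.values() for p in prons})
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def to_phonemes(text: str) -> list[str]:
    """Lowercase, strip basic punctuation, and look each word up in the toy lexicon."""
    phonemes: list[str] = []
    for i, word in enumerate(text.lower().split()):
        if i > 0:
            phonemes.append(" ")  # keep the word boundary as its own token
        # Raises KeyError for unknown words; a G2P model would predict them instead.
        phonemes.extend(TOY_LEXICON[word.strip("!?.,")])
    return phonemes


def encode(text: str) -> list[int]:
    """Convert text to a list of integer phoneme IDs."""
    return [TOKEN_TO_ID[p] for p in to_phonemes(text)]


if __name__ == "__main__":
    print(to_phonemes("Hello world!"))
    print(encode("Hello world!"))  # [5, 2, 6, 7, 1, 8, 4, 6, 3]
```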

You can find the training script for Tacotron2 here. The following parameters were used: win_length=1100, hop_length=275, n_fft=2048, mel_fmin=40, and mel_fmax=11025.
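
The arithmetic below shows what these spectrogram parameters imply at LJSpeech's 22,050 Hz sample rate: mel_fmax=11025 is exactly the Nyquist frequency, win_length is four hops (75% overlap) and is zero-padded up to n_fft, and each spectrogram frame advances about 12.5 ms. The frame-count formula assumes a centered (padded) STFT, which is an assumption about the training setup; `frame_stats` is a hypothetical helper.

```python
SAMPLE_RATE = 22050  # LJSpeech sample rate (Hz)
WIN_LENGTH = 1100
HOP_LENGTH = 275
N_FFT = 2048
MEL_FMIN = 40.0
MEL_FMAX = 11025.0


def frame_stats(num_samples: int) -> dict:
    """Derived STFT quantities for a clip of `num_samples` samples."""
    return {
        "hop_ms": 1000.0 * HOP_LENGTH / SAMPLE_RATE,   # time between frames
        "win_ms": 1000.0 * WIN_LENGTH / SAMPLE_RATE,   # analysis window duration
        "nyquist_hz": SAMPLE_RATE / 2,                 # highest representable frequency
        "freq_bins": N_FFT // 2 + 1,                   # one-sided FFT bins
        "bin_hz": SAMPLE_RATE / N_FFT,                 # spacing between FFT bins
        "frames": 1 + num_samples // HOP_LENGTH,       # assumes a centered STFT
    }


if __name__ == "__main__":
    stats = frame_stats(SAMPLE_RATE)  # one second of audio
    for key, value in stats.items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```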

You can find the training script for WaveRNN here.

Please refer to torchaudio.pipelines.Tacotron2TTSBundle() for usage.
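
A minimal end-to-end sketch, assuming a torchaudio version that still ships this pipeline and that the deep-phonemizer package is installed (the phoneme-based text processor depends on it). The bundle's `get_text_processor()`, `get_tacotron2()`, and `get_vocoder()` methods download pretrained weights on first use; the wrapper function `synthesize` and the output path are hypothetical.

```python
def synthesize(text: str, out_path: str = "output.wav") -> None:
    """Run text -> phoneme tokens -> mel spectrogram -> waveform and save a WAV file."""
    # Heavy dependencies are imported lazily so importing this module stays cheap.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
    processor = bundle.get_text_processor()  # phoneme-based, backed by DeepPhonemizer
    tacotron2 = bundle.get_tacotron2()
    vocoder = bundle.get_vocoder()

    with torch.inference_mode():
        tokens, lengths = processor(text)                           # text -> token IDs
        spec, spec_lengths, _ = tacotron2.infer(tokens, lengths)    # tokens -> mel spectrogram
        waveforms, _ = vocoder(spec, spec_lengths)                  # mel -> waveform (batch, time)

    # Keep the first item as a (channels=1, time) tensor for saving.
    torchaudio.save(out_path, waveforms[0:1].cpu(), sample_rate=vocoder.sample_rate)


if __name__ == "__main__":
    synthesize("Hello world! T T S stands for Text to Speech!")
```

WaveRNN generates audio sample by sample, so synthesis is noticeably slower than with a parallel vocoder such as Griffin-Lim.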

Example - “Hello world! T T S stands for Text to Speech!”

Example - “The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired,”
