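Tacotron2TTSBundle
- class torchaudio.pipelines.Tacotron2TTSBundle
Data class that bundles the information needed to use a pretrained Tacotron2 model together with a vocoder.
This class provides interfaces for instantiating the pretrained models, along with the information needed to retrieve the pretrained weights and any additional data the models require.
Torchaudio instantiates objects of this class, each of which represents a different pretrained model. Client code should access pretrained models through these instances.
See below for usage and the available values.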
- Example - Character-based TTS pipeline with Tacotron2 and WaveRNN
>>> import torchaudio
>>>
>>> text = "Hello, T T S !"
>>> bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
>>>
>>> # Build processor, Tacotron2 and WaveRNN model
>>> processor = bundle.get_text_processor()
>>> tacotron2 = bundle.get_tacotron2()
Downloading: 100%|███████████████████████████████| 107M/107M [00:01<00:00, 87.9MB/s]
>>> vocoder = bundle.get_vocoder()
Downloading: 100%|███████████████████████████████| 16.7M/16.7M [00:00<00:00, 78.1MB/s]
>>>
>>> # Encode text
>>> input, lengths = processor(text)
>>>
>>> # Generate (mel-scale) spectrogram
>>> specgram, lengths, _ = tacotron2.infer(input, lengths)
>>>
>>> # Convert spectrogram to waveform
>>> waveforms, lengths = vocoder(specgram, lengths)
>>>
>>> torchaudio.save('hello-tts.wav', waveforms, vocoder.sample_rate)
- Example - Phoneme-based TTS pipeline with Tacotron2 and WaveRNN
>>> # Note:
>>> # This bundle uses pre-trained DeepPhonemizer as
>>> # the text pre-processor.
>>> # Please install deep-phonemizer.
>>> # See https://github.com/as-ideas/DeepPhonemizer
>>> # The pretrained weight is automatically downloaded.
>>>
>>> import torchaudio
>>>
>>> text = "Hello, TTS!"
>>> bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
>>>
>>> # Build processor, Tacotron2 and WaveRNN model
>>> processor = bundle.get_text_processor()
Downloading: 100%|███████████████████████████████| 63.6M/63.6M [00:04<00:00, 15.3MB/s]
>>> tacotron2 = bundle.get_tacotron2()
Downloading: 100%|███████████████████████████████| 107M/107M [00:01<00:00, 87.9MB/s]
>>> vocoder = bundle.get_vocoder()
Downloading: 100%|███████████████████████████████| 16.7M/16.7M [00:00<00:00, 78.1MB/s]
>>>
>>> # Encode text
>>> input, lengths = processor(text)
>>>
>>> # Generate (mel-scale) spectrogram
>>> specgram, lengths, _ = tacotron2.infer(input, lengths)
>>>
>>> # Convert spectrogram to waveform
>>> waveforms, lengths = vocoder(specgram, lengths)
>>>
>>> torchaudio.save('hello-tts.wav', waveforms, vocoder.sample_rate)
- Tutorials using Tacotron2TTSBundle
Methods
get_tacotron2
- abstract Tacotron2TTSBundle.get_tacotron2(*, dl_kwargs=None) -> Tacotron2
Create a Tacotron2 model with pretrained weights.
- Parameters:
dl_kwargs (dictionary of keyword arguments) – Passed to torch.hub.load_state_dict_from_url().
- Returns:
The resulting model.
- Return type:
Tacotron2
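As a minimal sketch of how dl_kwargs might be used, the snippet below forwards model_dir and progress, two keyword arguments accepted by torch.hub.load_state_dict_from_url(), to get_tacotron2. The cache directory and the helper name load_tacotron2 are hypothetical and not part of torchaudio. torchaudio is imported inside the helper so that nothing is downloaded until the helper is called.

```python
# Hypothetical cache location; any writable directory would do.
CACHE_DIR = "./tts_checkpoints"

# Keyword arguments forwarded to torch.hub.load_state_dict_from_url().
DL_KWARGS = {
    "model_dir": CACHE_DIR,  # directory in which the checkpoint is stored
    "progress": False,       # suppress the download progress bar
}


def load_tacotron2(dl_kwargs=None):
    """Build the character-based Tacotron2 model (hypothetical helper)."""
    # Imported lazily, so defining this helper triggers no download.
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
    return bundle.get_tacotron2(dl_kwargs=dl_kwargs)
```

Calling load_tacotron2(DL_KWARGS) should then fetch the checkpoint into CACHE_DIR on first use and reuse the cached file afterwards, assuming the installed torchaudio version still passes dl_kwargs through unchanged.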
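get_text_processor
- abstract Tacotron2TTSBundle.get_text_processor(*, dl_kwargs=None) -> TextProcessor
Create a text processor.
For character-based pipelines, this processor splits the input text into characters. For phoneme-based pipelines, it converts the input text (graphemes) into phonemes.
If a pretrained weight file is required, it is downloaded with torch.hub.download_url_to_file().
- Parameters:
dl_kwargs (dictionary of keyword arguments) – Passed to torch.hub.download_url_to_file().
- Returns:
A callable that takes a string or a list of strings as input and returns a Tensor of encoded text and a Tensor of valid lengths. The object also has a tokens property, which allows the tokenized form to be recovered.
- Return type:
TextProcessor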
- Example - Character-based
>>> import torchaudio
>>>
>>> text = [
...     "Hello World!",
...     "Text-to-speech!",
... ]
>>> bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
>>> processor = bundle.get_text_processor()
>>> input, lengths = processor(text)
>>>
>>> print(input)
tensor([[19, 16, 23, 23, 26, 11, 34, 26, 29, 23, 15,  2,  0,  0,  0],
        [31, 16, 35, 31,  1, 31, 26,  1, 30, 27, 16, 16, 14, 19,  2]],
       dtype=torch.int32)
>>>
>>> print(lengths)
tensor([12, 15], dtype=torch.int32)
>>>
>>> # Slice each row to its valid length to skip the padding
>>> print([processor.tokens[i] for i in input[0, :lengths[0]]])
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!']
>>> print([processor.tokens[i] for i in input[1, :lengths[1]]])
['t', 'e', 'x', 't', '-', 't', 'o', '-', 's', 'p', 'e', 'e', 'c', 'h', '!']
- Example - Phoneme-based
>>> import torchaudio
>>>
>>> text = [
...     "Hello World!",
...     "Text-to-speech!",
... ]
>>> bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
>>> processor = bundle.get_text_processor()
Downloading: 100%|███████████████████████████████| 63.6M/63.6M [00:04<00:00, 15.3MB/s]
>>> input, lengths = processor(text)
>>>
>>> print(input)
tensor([[54, 20, 65, 69, 11, 92, 44, 65, 38,  2,  0,  0,  0,  0],
        [81, 40, 64, 79, 81,  1, 81, 20,  1, 79, 77, 59, 37,  2]],
       dtype=torch.int32)
>>>
>>> print(lengths)
tensor([10, 14], dtype=torch.int32)
>>>
>>> # The full rows are printed here, so the padding token '_' is visible
>>> print([processor.tokens[i] for i in input[0]])
['HH', 'AH', 'L', 'OW', ' ', 'W', 'ER', 'L', 'D', '!', '_', '_', '_', '_']
>>> print([processor.tokens[i] for i in input[1]])
['T', 'EH', 'K', 'S', 'T', '-', 'T', 'AH', '-', 'S', 'P', 'IY', 'CH', '!']
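The character-based encoding can be approximated in plain Python to show how the encoded rows, the zero padding, and the valid lengths relate. This is a sketch, not the processor's actual implementation. The symbol table is an assumption, chosen so that the indices agree with the character-based output printed above. Lower-casing the input and dropping unknown characters are likewise assumptions of the sketch, and the names encode_characters and decode_characters are hypothetical.

```python
# Assumed symbol table: chosen so that the resulting indices agree with the
# character-based example output in this section. The real table may differ.
SYMBOLS = "_-!'(),.:;? abcdefghijklmnopqrstuvwxyz"
PAD_INDEX = SYMBOLS.index("_")  # index 0 doubles as the padding value
_LOOKUP = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def encode_characters(texts):
    """Encode strings as zero-padded index rows plus their valid lengths."""
    if isinstance(texts, str):
        texts = [texts]
    # Lower-case the text and drop characters missing from the table
    # (both are assumptions of this sketch).
    rows = [[_LOOKUP[c] for c in t.lower() if c in _LOOKUP] for t in texts]
    lengths = [len(row) for row in rows]
    width = max(lengths)
    padded = [row + [PAD_INDEX] * (width - len(row)) for row in rows]
    return padded, lengths


def decode_characters(row, length):
    """Recover the tokens of one row, ignoring the padding."""
    return [SYMBOLS[i] for i in row[:length]]


encoded, lengths = encode_characters(["Hello World!", "Text-to-speech!"])
print(encoded[0])
print(lengths)
print(decode_characters(encoded[0], lengths[0]))
```

The shorter sentence is padded with zeros up to the length of the longer one, which is why the lengths are returned alongside the encoded rows.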
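get_vocoder
- abstract Tacotron2TTSBundle.get_vocoder(*, dl_kwargs=None) -> Vocoder
Create a vocoder module based on either WaveRNN or GriffinLim.
If a pretrained weight file is required, it is downloaded with torch.hub.load_state_dict_from_url().
- Parameters:
dl_kwargs (dictionary of keyword arguments) – Passed to torch.hub.load_state_dict_from_url().
- Returns:
A vocoder module that takes a spectrogram Tensor and an optional length Tensor, and returns the resulting waveform Tensor and an optional length Tensor.
- Return type:
Vocoder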
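Since the vocoder returns a batch of waveforms together with an optional length Tensor, it can be useful to cut each waveform back to its valid length before saving it. The following is a minimal sketch, assuming the batch is laid out as (batch, time) with padding at the end. The helper name trim_to_lengths is hypothetical, and plain Python lists stand in for Tensors so that the snippet runs without torch.

```python
def trim_to_lengths(waveforms, lengths=None):
    """Return one sequence per batch item, cut to its valid length.

    `waveforms` is assumed to be indexable as (batch, time). When `lengths`
    is None, every item is returned unchanged.
    """
    if lengths is None:
        return [row for row in waveforms]
    return [row[: int(n)] for row, n in zip(waveforms, lengths)]


# Two padded "waveforms": the first has 3 valid samples, the second has 5.
batch = [
    [0.1, 0.2, 0.3, 0.0, 0.0],
    [0.5, 0.4, 0.3, 0.2, 0.1],
]
for waveform in trim_to_lengths(batch, [3, 5]):
    print(len(waveform), waveform)
```

The same row-by-row slicing should carry over to a two-dimensional waveform Tensor and its length Tensor, although that has not been verified here against a specific torchaudio release.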