HiFiGANVocoder

class torchaudio.prototype.models.HiFiGANVocoder(in_channels: int, upsample_rates: Tuple[int, ...], upsample_initial_channel: int, upsample_kernel_sizes: Tuple[int, ...], resblock_kernel_sizes: Tuple[int, ...], resblock_dilation_sizes: Tuple[Tuple[int, ...], ...], resblock_type: int, lrelu_slope: float)

Generator part of HiFi GAN [Kong et al., 2020]. Source: https://github.com/jik876/hifi-gan/blob/4769534d45265d52a904b850da5a622601885777/models.py#L75

Note

To build the model, use one of the factory functions: hifigan_vocoder(), hifigan_vocoder_v1(), hifigan_vocoder_v2(), hifigan_vocoder_v3().

Parameters:

in_channels (int) – Number of channels in the input features.
upsample_rates (tuple of int) – Factors by which each upsampling layer increases the time dimension.
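upsample_initial_channel (int) – Number of channels fed to the first upsampling layer, i.e. the output channels of the initial convolution applied to the input features.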
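upsample_kernel_sizes (tuple of int) – Kernel size of each upsampling layer.
resblock_kernel_sizes (tuple of int) – Kernel size of each residual block.
resblock_dilation_sizes (tuple of tuples of int) – Dilation sizes of each 1D convolutional layer in each residual block. For resblock type 1 the inner tuples should have length 3, because each block applies 3 dilated convolutions. For resblock type 2 they should have length 2.
resblock_type (int, 1 or 2) – Determines whether ResBlock1 or ResBlock2 is used.
lrelu_slope (float) – Slope of the leaky ReLUs used as activations.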
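
The length constraints described above can be checked before constructing the model. The following is a minimal sketch in plain Python, not part of torchaudio: `check_hifigan_config` is a hypothetical helper, and the example values are illustrative (they resemble the V1 configuration of the upstream HiFi GAN repository, which is an assumption here). The checks that `upsample_rates` pairs up with `upsample_kernel_sizes`, and `resblock_kernel_sizes` with `resblock_dilation_sizes`, are likewise assumptions inferred from the per-layer and per-block wording of the parameter descriptions.

```python
from typing import Tuple


def check_hifigan_config(
    upsample_rates: Tuple[int, ...],
    upsample_kernel_sizes: Tuple[int, ...],
    resblock_kernel_sizes: Tuple[int, ...],
    resblock_dilation_sizes: Tuple[Tuple[int, ...], ...],
    resblock_type: int,
) -> None:
    """Hypothetical helper: raise ValueError if the arguments are inconsistent."""
    if resblock_type not in (1, 2):
        raise ValueError("resblock_type must be 1 or 2")
    # Assumption: one kernel size is given per upsampling layer.
    if len(upsample_rates) != len(upsample_kernel_sizes):
        raise ValueError("upsample_rates and upsample_kernel_sizes differ in length")
    # Assumption: one tuple of dilations is given per residual block kernel size.
    if len(resblock_kernel_sizes) != len(resblock_dilation_sizes):
        raise ValueError("resblock_kernel_sizes and resblock_dilation_sizes differ in length")
    # Documented rule: inner tuples have length 3 for type 1 and length 2 for type 2.
    expected = 3 if resblock_type == 1 else 2
    for dilations in resblock_dilation_sizes:
        if len(dilations) != expected:
            raise ValueError(
                f"resblock type {resblock_type} expects {expected} dilations, got {len(dilations)}"
            )


# Illustrative values only; passes silently because the configuration is consistent.
check_hifigan_config(
    upsample_rates=(8, 8, 2, 2),
    upsample_kernel_sizes=(16, 16, 4, 4),
    resblock_kernel_sizes=(3, 7, 11),
    resblock_dilation_sizes=((1, 3, 5), (1, 3, 5), (1, 3, 5)),
    resblock_type=1,
)
```

Passing the same dilation tuples with `resblock_type=2` would raise `ValueError` in this sketch, since type 2 blocks expect inner tuples of length 2.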

Methods

forward

HiFiGANVocoder.forward(x: Tensor) → Tensor

Parameters: x (Tensor) – Feature input tensor of shape (batch_size, num_channels, time_length).
Returns: Tensor of shape (batch_size, 1, time_length * upsample_rate), where upsample_rate is the product of the upsample rates of all layers.
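
The output shape can be worked out from the input shape and `upsample_rates` alone. The sketch below is plain Python arithmetic and does not call torchaudio; `expected_output_shape` is a hypothetical helper, and the input shape and rates are illustrative values.

```python
import math
from typing import Tuple


def expected_output_shape(
    input_shape: Tuple[int, int, int],
    upsample_rates: Tuple[int, ...],
) -> Tuple[int, int, int]:
    """Hypothetical helper: shape returned by forward() for a given input shape."""
    batch_size, _num_channels, time_length = input_shape
    # upsample_rate is the product of the upsample rates of all layers.
    upsample_rate = math.prod(upsample_rates)
    # The output has a single channel and an upsampled time axis.
    return (batch_size, 1, time_length * upsample_rate)


# 4 feature sequences of 100 frames with 80 channels each, upsampled by 8 * 8 * 2 * 2 = 256.
shape = expected_output_shape((4, 80, 100), (8, 8, 2, 2))
print(shape)  # (4, 1, 25600)
```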

Factory Functions

`hifigan_vocoder`	Builds a HiFi GAN Vocoder [Kong et al., 2020].
`hifigan_vocoder_v1`	Builds a HiFiGAN Vocoder with the V1 architecture [Kong et al., 2020].
`hifigan_vocoder_v2`	Builds a HiFiGAN Vocoder with the V2 architecture [Kong et al., 2020].
`hifigan_vocoder_v3`	Builds a HiFiGAN Vocoder with the V3 architecture [Kong et al., 2020].
