HDemucs

class torchaudio.models.HDemucs(sources: List[str], audio_channels: int = 2, channels: int = 48, growth: int = 2, nfft: int = 4096, depth: int = 6, freq_emb: float = 0.2, emb_scale: int = 10, emb_smooth: bool = True, kernel_size: int = 8, time_stride: int = 2, stride: int = 4, context: int = 1, context_enc: int = 0, norm_starts: int = 4, norm_groups: int = 4, dconv_depth: int = 2, dconv_comp: int = 4, dconv_attn: int = 4, dconv_lstm: int = 4, dconv_init: float = 0.0001)

Hybrid Demucs model from Hybrid Spectrogram and Waveform Source Separation [Défossez, 2021].

See also

torchaudio.pipelines.SourceSeparationBundle: Source separation pipeline with pre-trained models.

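Parameters:

sources (List[str]) – List of source names. The list can contain the following source options: ["bass", "drums", "other", "mixture", "vocals"].
audio_channels (int, optional) – Number of input/output audio channels. (Default: 2)
channels (int, optional) – Initial number of hidden channels. (Default: 48)
growth (int, optional) – Factor by which the number of hidden channels increases at each layer. (Default: 2)
nfft (int, optional) – Number of FFT bins. Note that changing this value requires careful computation of various shape parameters and will not work out of the box for hybrid models. (Default: 4096)
depth (int, optional) – Number of layers in the encoder and the decoder. (Default: 6)
freq_emb (float, optional) – If > 0, adds a frequency embedding after the first frequency layer; the actual value controls the weight of the embedding. (Default: 0.2)
emb_scale (int, optional) – Equivalent to scaling the embedding learning rate. (Default: 10)
emb_smooth (bool, optional) – Initializes the embedding with a smooth one (with respect to frequencies). (Default: True)
kernel_size (int, optional) – Kernel size for the encoder and decoder layers. (Default: 8)
time_stride (int, optional) – Stride for the final time layer, after the merge. (Default: 2)
stride (int, optional) – Stride for the encoder and decoder layers. (Default: 4)
context (int, optional) – Context for the 1x1 convolution in the decoder. (Default: 1)
context_enc (int, optional) – Context for the 1x1 convolution in the encoder. (Default: 0)
norm_starts (int, optional) – Layer at which group norm starts being used. Decoder layers are numbered in reverse order. (Default: 4)
norm_groups (int, optional) – Number of groups for group norm. (Default: 4)
dconv_depth (int, optional) – Depth of the residual DConv branch. (Default: 2)
dconv_comp (int, optional) – Compression of the DConv branch. (Default: 4)
dconv_attn (int, optional) – Adds attention layers in the DConv branch starting at this layer. (Default: 4)
dconv_lstm (int, optional) – Adds an LSTM layer in the DConv branch starting at this layer. (Default: 4)
dconv_init (float, optional) – Initial scale for the DConv branch LayerScale. (Default: 1e-4)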

Tutorials using HDemucs: Music Source Separation with Hybrid Demucs

Methods

forward

HDemucs.forward(input: Tensor)

HDemucs forward call.

Parameters:

input (torch.Tensor) – Input mixture tensor of shape (batch_size, channel, num_frames)

Returns:

Tensor: Output tensor split into sources, of shape (batch_size, num_sources, channel, num_frames)

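Factory Functions

`hdemucs_low`	Builds the low nfft (1024) version of `HDemucs`, suitable for sample rates around 8 kHz.
`hdemucs_medium`	Builds the medium nfft (2048) version of `HDemucs`, suitable for sample rates of 16-32 kHz.
`hdemucs_high`	Builds the high nfft (4096) version of `HDemucs`, suitable for sample rates of 44.1-48 kHz.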
