torchaudio.sox_effects.apply_effects_file
- torchaudio.sox_effects.apply_effects_file(path: str, effects: List[List[str]], normalize: bool = True, channels_first: bool = True, format: Optional[str] = None) -> Tuple[Tensor, int]
Apply sox effects to an audio file and load the resulting data as a Tensor.
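Note
This function works much like the sox command, but there are some differences. For example, the sox command adds certain effects automatically (such as a rate effect after speed, pitch, and similar effects), whereas this function applies only the effects it is given. Therefore, to actually apply the speed effect, you also need to supply a rate effect with the desired sampling rate, because internally the speed effect only changes the sampling rate and leaves the samples untouched.
- Parameters: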
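path (path-like object) – Source of audio data.
effects (List[List[str]]) – List of effects.
normalize (bool, optional) – When True, this function converts the native sample type to float32. Default: True. If the input file is an integer WAV, giving False changes the resulting Tensor type to an integer type. This argument has no effect on formats other than integer WAV.
channels_first (bool, optional) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor has dimension [time, channel].
format (str or None, optional) – Overrides format detection with the given format. Providing this argument can help when libsox cannot infer the format from the header or the file extension.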
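- Returns:
The resulting Tensor and the sample rate. If normalize=True, the resulting Tensor is always of type float32. If normalize=False and the input audio file is an integer WAV file, the resulting Tensor has the corresponding integer type. (Note that 24-bit integer types are not supported.) If channels_first=True, the resulting Tensor has dimension [channel, time]; otherwise it has dimension [time, channel].
- Return type:
(Tensor, int)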
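The note above says that speed must be followed by an explicit rate effect. A minimal sketch of that pairing follows; the helper names speed_effects and load_with_speed are hypothetical and not part of torchaudio, and the actual file call assumes a torchaudio build with sox support.

```python
from typing import List


def speed_effects(factor: float, sample_rate: int) -> List[List[str]]:
    """Hypothetical helper: pair `speed` with an explicit `rate` effect."""
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    return [
        ["speed", f"{factor:.5f}"],  # on its own, only changes the sample rate
        ["rate", str(sample_rate)],  # resamples to the desired sample rate
    ]


def load_with_speed(path: str, factor: float, sample_rate: int):
    """Apply the effects to a file (assumes torchaudio with sox support)."""
    import torchaudio  # imported lazily so speed_effects works without it

    return torchaudio.sox_effects.apply_effects_file(
        path, speed_effects(factor, sample_rate)
    )


print(speed_effects(1.1, 16000))
```

Keeping the list construction separate from the file call makes it easy to inspect the effect chain before any audio is read.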
- Example - Basic usage
>>> from torchaudio.sox_effects import apply_effects_file
>>>
>>> # Define the effects to apply
>>> effects = [
...     ['gain', '-n'],  # normalises to 0dB
...     ['pitch', '5'],  # 5 cent pitch shift
...     ['rate', '8000'],  # resample to 8000 Hz
... ]
>>>
>>> # Apply effects and load data with channels_first=True
>>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True)
>>>
>>> # Check the result
>>> waveform.shape
torch.Size([2, 8000])
>>> waveform
tensor([[ 5.1151e-03,  1.8073e-02,  2.2188e-02,  ...,  1.0431e-07,
         -1.4761e-07,  1.8114e-07],
        [-2.6924e-03,  2.1860e-03,  1.0650e-02,  ...,  6.4122e-07,
         -5.6159e-07,  4.8103e-07]])
>>> sample_rate
8000
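Because effects is typed as List[List[str]], numeric arguments such as the target sample rate have to be passed as strings. The following is a small sketch of a conversion step; to_effects is a hypothetical helper, not a torchaudio function.

```python
from typing import Iterable, List, Sequence, Union

Arg = Union[str, int, float]


def to_effects(spec: Iterable[Sequence[Arg]]) -> List[List[str]]:
    """Hypothetical helper: stringify every argument of every effect."""
    effects = []
    for effect in spec:
        if not effect:
            raise ValueError("each effect needs at least a name")
        effects.append([str(arg) for arg in effect])
    return effects


effects = to_effects([
    ("gain", "-n"),
    ("pitch", 5),    # the integer 5 becomes the string '5'
    ("rate", 8000),  # the integer 8000 becomes the string '8000'
])
print(effects)
```

The resulting list has the same content as the effects list in the basic example and could be passed to apply_effects_file unchanged.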
- Example - Applying random speed perturbation to a dataset
>>> import random
>>> from typing import List
>>>
>>> import torch
>>> import torchaudio
>>>
>>> # Load data from file, apply random speed perturbation
>>> class RandomPerturbationFile(torch.utils.data.Dataset):
...     """Given flist, apply random speed perturbation
...
...     Suppose all the input files are at least one second long.
...     """
...     def __init__(self, flist: List[str], sample_rate: int):
...         super().__init__()
...         self.flist = flist
...         self.sample_rate = sample_rate
...
...     def __getitem__(self, index):
...         # random.random() is uniform in [0, 1), so speed lies in [0.5, 2.0)
...         speed = 0.5 + 1.5 * random.random()
...         effects = [
...             ['gain', '-n', '-10'],  # apply 10 dB attenuation
...             ['remix', '-'],  # merge all the channels
...             ['speed', f'{speed:.5f}'],  # duration is now 0.5 ~ 2.0 seconds.
...             ['rate', f'{self.sample_rate}'],
...             ['pad', '0', '1.5'],  # add 1.5 seconds silence at the end
...             ['trim', '0', '2'],  # get the first 2 seconds
...         ]
...         waveform, _ = torchaudio.sox_effects.apply_effects_file(
...             self.flist[index], effects)
...         return waveform
...
...     def __len__(self):
...         return len(self.flist)
...
>>> dataset = RandomPerturbationFile(file_list, sample_rate=8000)
>>> loader = torch.utils.data.DataLoader(dataset, batch_size=32)
>>> for batch in loader:
...     pass
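The comment in the example above implies a speed factor between 0.5 and 2.0, which turns a one-second input into a clip of 0.5 to 2.0 seconds once rate has resampled it. A sketch of that arithmetic using only the standard library follows; sample_speed and duration_after_speed are hypothetical names introduced for illustration.

```python
import random


def sample_speed(rng: random.Random, low: float = 0.5, high: float = 2.0) -> float:
    """Hypothetical helper: draw a speed factor uniformly from [low, high]."""
    return rng.uniform(low, high)


def duration_after_speed(duration: float, speed: float) -> float:
    """Duration in seconds after `speed` followed by `rate` at the target rate."""
    return duration / speed


rng = random.Random(0)  # seeded only to make the sketch reproducible
speed = sample_speed(rng)
effect = ["speed", f"{speed:.5f}"]  # effect arguments must be strings
print(effect, duration_after_speed(1.0, speed))
```

With durations in this range, padding 1.5 seconds of silence and trimming to 2 seconds always yields a clip of exactly 2 seconds, so the samples can be batched.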
- Tutorials using apply_effects_file