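AudioEffector Usages¶
Author: Moto Hira
This tutorial shows how to use torchaudio.io.AudioEffector to apply various effects and codecs to waveform tensors.
Note
This tutorial requires FFmpeg libraries. See FFmpeg dependency for details.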
Overview¶
AudioEffector combines the in-memory encoding, decoding, and filtering provided by StreamWriter and StreamReader.
The following figure illustrates the process.
Figure: https://download.pytorch.org/torchaudio/tutorial-assets/AudioEffector.png
import torch
import torchaudio
print(torch.__version__)
print(torchaudio.__version__)
2.6.0
2.6.0
from torchaudio.io import AudioEffector, CodecConfig
import matplotlib.pyplot as plt
from IPython.display import Audio
libavcodec (60, 3, 100)
libavdevice (60, 1, 100)
libavfilter (9, 3, 100)
libavformat (60, 3, 100)
libavutil (58, 2, 100)
Usage¶
To use AudioEffector, instantiate it with effect and format, then pass the waveform to either the apply() or the stream() method.
effector = AudioEffector(effect=..., format=...,)
# Apply at once
applied = effector.apply(waveform, sample_rate)
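The apply method applies the effect and codec to the whole waveform at once. If the input waveform is long and memory consumption is a concern, use the stream method to process it chunk by chunk.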
# Apply chunk by chunk
for applied_chunk in effector.stream(waveform, sample_rate, frames_per_chunk=...):
    ...
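To get a rough sense of how a long input is split up, the chunking can be mimicked with plain Python. This is only a sketch: `chunk_bounds` is a hypothetical helper, not part of torchaudio, and it assumes the effect leaves the frame count unchanged, so the number and size of the chunks actually yielded by stream may differ.

```python
from typing import Iterator, Tuple


def chunk_bounds(num_frames: int, frames_per_chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame indices; the last chunk may be shorter."""
    if frames_per_chunk <= 0:
        raise ValueError("frames_per_chunk must be positive")
    for start in range(0, num_frames, frames_per_chunk):
        yield start, min(start + frames_per_chunk, num_frames)


# Assumed input: 3 seconds at 16 kHz, processed in half-second chunks.
bounds = list(chunk_bounds(num_frames=16000 * 3, frames_per_chunk=16000 // 2))
print(len(bounds), "chunks; last chunk covers frames", bounds[-1])
```

Smaller chunks lower peak memory use at the cost of more loop iterations.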
Example¶
Gallery¶
# NOTE: `waveform` (shape: time x channel) and its sample rate `sr` are assumed
# to be loaded before this point; this page does not show that step.
def show(effect, *, stereo=False):
    wf = torch.cat([waveform] * 2, dim=1) if stereo else waveform
    figsize = (6.4, 2.1 if stereo else 1.2)
    effector = AudioEffector(effect=effect, pad_end=False)
    result = effector.apply(wf, int(sr))
    num_channels = result.size(1)
    f, ax = plt.subplots(num_channels, 1, squeeze=False, figsize=figsize, sharex=True)
    for i in range(num_channels):
        ax[i][0].specgram(result[:, i], Fs=sr)
    f.set_tight_layout(True)
    return Audio(result.numpy().T, rate=sr)
Original¶
show(effect=None)
Effects¶
tempo¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#atempo
show("atempo=0.7")
show("atempo=1.8")
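atempo changes the playback speed, so the output length scales by about 1/tempo. The arithmetic below is a sketch: `expected_duration` is an illustrative helper, and the 3-second input length is an assumption, not the length of the clip used on this page.

```python
def expected_duration(duration_sec: float, tempo: float) -> float:
    """Approximate output duration after a tempo change (tempo > 1 is faster)."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    return duration_sec / tempo


# The two tempo values used in the calls above, applied to an assumed 3 s clip.
for tempo in (0.7, 1.8):
    print(f"atempo={tempo}: about {expected_duration(3.0, tempo):.2f} s")
```

A tempo below 1 therefore lengthens the clip, and a tempo above 1 shortens it.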
highpass¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#highpass
show("highpass=frequency=1500")
lowpass¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#lowpass
show("lowpass=frequency=1000")
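The effect strings in this gallery follow FFmpeg's filter syntax: a filter name, optionally followed by `=` and `key=value` options separated by `:`. Filters can be chained with commas. The helpers below are hypothetical conveniences for assembling such strings. They are not part of torchaudio and do no escaping of special characters, and the lowpass cutoff of 3000 is an illustrative value.

```python
def build_filter(name: str, **options) -> str:
    """Format one filter as 'name=key=value:key=value' (no escaping is done)."""
    if not options:
        return name
    args = ":".join(f"{key}={value}" for key, value in options.items())
    return f"{name}={args}"


def chain(*filters: str) -> str:
    """Join filters into a linear chain, applied left to right."""
    return ",".join(filters)


highpass = build_filter("highpass", frequency=1500)
lowpass = build_filter("lowpass", frequency=3000)
print(highpass)
print(chain(highpass, lowpass))
```

The resulting string could then be passed as the effect argument, on the assumption that a chained description is accepted in the same way as a single filter.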
allpass¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#allpass
show("allpass")
bandpass¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#bandpass
show("bandpass=frequency=3000")
bandreject¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#bandreject
show("bandreject=frequency=3000")
echo¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#aecho
show("aecho=in_gain=0.8:out_gain=0.88:delays=6:decays=0.4")
show("aecho=in_gain=0.8:out_gain=0.88:delays=60:decays=0.4")
show("aecho=in_gain=0.8:out_gain=0.9:delays=1000:decays=0.3")
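As a mental model for the delays and decays options, an echo can be pictured as the input plus a delayed, attenuated copy of itself. The pure-Python sketch below handles a single echo and is a simplification, not FFmpeg's implementation. `simple_echo` is a hypothetical name, and the 1000 Hz sample rate is chosen only to keep the numbers small. The delay is given in milliseconds, the unit documented for aecho.

```python
from typing import List


def simple_echo(
    samples: List[float],
    sample_rate: int,
    in_gain: float = 0.8,
    out_gain: float = 0.88,
    delay_ms: float = 6.0,
    decay: float = 0.4,
) -> List[float]:
    """Mix each sample with one delayed, attenuated copy of the input."""
    delay = round(delay_ms * sample_rate / 1000)  # delay in samples
    out = []
    for n, x in enumerate(samples):
        delayed = samples[n - delay] if n >= delay else 0.0
        out.append(out_gain * (in_gain * x + decay * delayed))
    return out


# A single impulse makes the echo easy to see: the output holds the scaled
# impulse at index 0 and a quieter copy `delay` samples later.
impulse = [1.0] + [0.0] * 9
echoed = simple_echo(impulse, sample_rate=1000)
print([round(v, 3) for v in echoed])
```

Longer delays move the copy further from the original, which is why the three calls above sound progressively more like a distinct repeat.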
chorus¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#chorus
show("chorus=0.5:0.9:50|60|40:0.4|0.32|0.3:0.25|0.4|0.3:2|2.3|1.3")
fft filter¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#afftfilt
# fmt: off
show(
    "afftfilt="
    "real='re * (1-clip(b * (b/nb), 0, 1))':"
    "imag='im * (1-clip(b * (b/nb), 0, 1))'"
)
show(
    "afftfilt="
    "real='hypot(re,im) * sin(0)':"
    "imag='hypot(re,im) * cos(0)':"
    "win_size=512:"
    "overlap=0.75"
)
show(
    "afftfilt="
    "real='hypot(re,im) * cos(2 * 3.14 * (random(0) * 2-1))':"
    "imag='hypot(re,im) * sin(2 * 3.14 * (random(1) * 2-1))':"
    "win_size=128:"
    "overlap=0.8"
)
# fmt: on
vibrato¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#vibrato
show("vibrato=f=10:d=0.8")
/pytorch/audio/ci_env/lib/python3.10/site-packages/IPython/lib/display.py:188: RuntimeWarning: invalid value encountered in cast
return scaled.astype("<h").tobytes(), nchan
tremolo¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#tremolo
show("tremolo=f=8:d=0.8")
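Tremolo is a periodic change in volume: f sets how many times per second the level dips, and d sets how deep the dips are. The sketch below models this as a gain that swings between 1 - d and 1. It is a simplified illustration rather than FFmpeg's code, and the function names and the 16 kHz sample rate are assumptions.

```python
import math
from typing import List


def tremolo_gain(t: float, freq: float = 8.0, depth: float = 0.8) -> float:
    """Gain at time t (seconds), oscillating between 1 - depth and 1."""
    return (1.0 - depth / 2.0) + (depth / 2.0) * math.cos(2.0 * math.pi * freq * t)


def apply_tremolo(samples: List[float], sample_rate: int,
                  freq: float = 8.0, depth: float = 0.8) -> List[float]:
    """Scale each sample by the tremolo gain at its time position."""
    return [x * tremolo_gain(n / sample_rate, freq, depth)
            for n, x in enumerate(samples)]


# A constant signal exposes the gain envelope directly.
envelope = apply_tremolo([1.0] * 16000, sample_rate=16000)
print(f"min gain {min(envelope):.2f}, max gain {max(envelope):.2f}")
```

With d=0.8 the quietest point keeps only about a fifth of the original amplitude, which is consistent with the strong pulsing heard in the example above.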
crystalizer¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#crystalizer
show("crystalizer")
flanger¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#flanger
show("flanger")
phaser¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#aphaser
show("aphaser")
pulsator¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#apulsator
show("apulsator", stereo=True)
haas¶
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#haas
show("haas")
Codecs¶
def show_multi(configs):
    results = []
    for config in configs:
        effector = AudioEffector(**config)
        results.append(effector.apply(waveform, int(sr)))
    num_configs = len(configs)
    figsize = (6.4, 0.3 + num_configs * 0.9)
    f, axes = plt.subplots(num_configs, 1, figsize=figsize, sharex=True)
    for result, ax in zip(results, axes):
        ax.specgram(result[:, 0], Fs=sr)
    f.set_tight_layout(True)
    return [Audio(r.numpy().T, rate=sr) for r in results]
ogg¶
results = show_multi(
    [
        {"format": "ogg"},
        {"format": "ogg", "encoder": "vorbis"},
        {"format": "ogg", "encoder": "opus"},
    ]
)
ogg - default encoder (flac)¶
results[0]
ogg - vorbis¶
results[1]
ogg - opus¶
results[2]
mp3¶
https://trac.ffmpeg.org/wiki/Encode/MP3
results = show_multi(
    [
        {"format": "mp3"},
        {"format": "mp3", "codec_config": CodecConfig(compression_level=1)},
        {"format": "mp3", "codec_config": CodecConfig(compression_level=9)},
        {"format": "mp3", "codec_config": CodecConfig(bit_rate=192_000)},
        {"format": "mp3", "codec_config": CodecConfig(bit_rate=8_000)},
        {"format": "mp3", "codec_config": CodecConfig(qscale=9)},
        {"format": "mp3", "codec_config": CodecConfig(qscale=1)},
    ]
)
default¶
results[0]
compression_level=1¶
results[1]
compression_level=9¶
results[2]
bit_rate=192k¶
results[3]
bit_rate=8k¶
results[4]
qscale=9¶
results[5]
qscale=1¶
results[6]
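The bit_rate values in the mp3 configurations are in bits per second, so for a roughly constant-bit-rate encode the payload size can be estimated as bit_rate × duration / 8 bytes. The sketch below applies that to the two bit_rate settings used above. `estimated_bytes` is a hypothetical helper, the 3-second duration is an assumption, and the estimate ignores container overhead and any limits the encoder itself imposes.

```python
def estimated_bytes(bit_rate: int, duration_sec: float) -> float:
    """Rough encoded payload size in bytes for a constant bit rate."""
    if bit_rate <= 0 or duration_sec < 0:
        raise ValueError("bit_rate must be positive and duration non-negative")
    return bit_rate * duration_sec / 8


# The two bit_rate settings from the mp3 configurations, on an assumed 3 s clip.
for bit_rate in (192_000, 8_000):
    size = estimated_bytes(bit_rate, 3.0)
    print(f"bit_rate={bit_rate}: about {size:.0f} bytes")
```

The 24-fold gap between the two estimates mirrors the gap in audible quality between the bit_rate=192k and bit_rate=8k results.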
Tag: torchaudio.io
Total running time of the script: ( 0 minutes 3.057 seconds)