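Additive Synthesis¶
Author: Moto Hira
This tutorial is a continuation of Oscillator and ADSR Envelope.
It shows how to perform additive synthesis and subtractive synthesis using TorchAudio's DSP functions.
Additive synthesis creates timbre by combining multiple waveforms. Subtractive synthesis creates timbre by applying filters.
Warning
This tutorial requires prototype DSP features, which are available in nightly builds.
Please refer to https://pytorch.dev.org.tw/get-started/locally for instructions on installing a nightly build.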
import torch
import torchaudio
print(torch.__version__)
print(torchaudio.__version__)
2.6.0
2.6.0
Overview¶
try:
    from torchaudio.prototype.functional import adsr_envelope, extend_pitch, oscillator_bank
except ModuleNotFoundError:
    print(
        "Failed to import prototype DSP features. "
        "Please install torchaudio nightly builds. "
        "Please refer to https://pytorch.dev.org.tw/get-started/locally "
        "for instructions to install a nightly build."
    )
    raise
import matplotlib.pyplot as plt
from IPython.display import Audio
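Creating multiple frequency pitches¶
The core of additive synthesis is the oscillator. We create a timbre by summing the multiple waveforms that oscillators generate.
In the oscillator tutorial, we used oscillator_bank() and adsr_envelope() to generate various waveforms.
In this tutorial, we use extend_pitch() to create a timbre from the fundamental frequency.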
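To make the idea of summing oscillator outputs concrete, here is a minimal standard-library sketch. The function name `additive_synth` and its `(frequency, amplitude)` pair format are illustrative assumptions and not part of TorchAudio; in the tutorial itself, oscillator_bank() plays this role on tensors.

```python
import math


def additive_synth(partials, num_frames, sample_rate):
    """Sum sine partials given as (frequency_hz, amplitude) pairs.

    Hypothetical helper for illustration only; not a TorchAudio API.
    """
    samples = []
    for n in range(num_frames):
        t = n / sample_rate  # time of sample n in seconds
        samples.append(sum(a * math.sin(2 * math.pi * f * t) for f, a in partials))
    return samples


# A 344 Hz fundamental plus two progressively weaker harmonics.
wave = additive_synth(
    [(344.0, 1.0), (688.0, 0.5), (1032.0, 0.25)],
    num_frames=160,
    sample_rate=16_000,
)
print(len(wave), max(abs(s) for s in wave))
```

Changing the list of partials changes the timbre while the pitch stays tied to the lowest frequency, which is the effect the rest of the tutorial explores.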
First, we define some constants and a helper function that are used throughout the tutorial.
PI = torch.pi
PI2 = 2 * torch.pi
F0 = 344.0 # fundamental frequency
DURATION = 1.1 # [seconds]
SAMPLE_RATE = 16_000 # [Hz]
NUM_FRAMES = int(DURATION * SAMPLE_RATE)
def plot(freq, amp, waveform, sample_rate, zoom=None, vol=0.1):
    t = (torch.arange(waveform.size(0)) / sample_rate).numpy()
    fig, axes = plt.subplots(4, 1, sharex=True)
    axes[0].plot(t, freq.numpy())
    axes[0].set(title=f"Oscillator bank (bank size: {amp.size(-1)})", ylabel="Frequency [Hz]", ylim=[-0.03, None])
    axes[1].plot(t, amp.numpy())
    axes[1].set(ylabel="Amplitude", ylim=[-0.03 if torch.all(amp >= 0.0) else None, None])
    axes[2].plot(t, waveform)
    axes[2].set(ylabel="Waveform")
    axes[3].specgram(waveform, Fs=sample_rate)
    axes[3].set(ylabel="Spectrogram", xlabel="Time [s]", xlim=[-0.01, t[-1] + 0.01])

    for i in range(4):
        axes[i].grid(True)
    pos = axes[2].get_position()
    fig.tight_layout()

    if zoom is not None:
        ax = fig.add_axes([pos.x0 + 0.02, pos.y0 + 0.03, pos.width / 2.5, pos.height / 2.0])
        ax.plot(t, waveform)
        ax.set(xlim=zoom, xticks=[], yticks=[])

    waveform /= waveform.abs().max()
    return Audio(vol * waveform, rate=sample_rate, normalize=False)
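Harmonic Overtones¶
Harmonic overtones are frequency components that are integer multiples of the fundamental frequency.
We look at how to generate the waveforms commonly used in synthesizers, namely:
Sawtooth wave
Square wave
Triangle wave
Sawtooth wave¶
A sawtooth wave can be expressed as a sum of sine waves at every integer harmonic of the fundamental, which is why it is also commonly used in subtractive synthesis.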
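The formula itself is missing from this copy of the page. As an assumption to be checked against the original tutorial, the standard Fourier-series form of a sawtooth built from K harmonics of the fundamental f_0 can be written as follows; the symbols y_t, A_k, f_k and K are notation chosen here.

```latex
\begin{align*}
y_t &= \sum_{k=1}^{K} A_k \sin(2 \pi f_k t) \\
f_k &= k f_0 \\
A_k &= -\frac{(-1)^k}{k \pi}
\end{align*}
```

Every integer multiple of f_0 is present, and the amplitudes alternate in sign while falling off as 1/k.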
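The sawtooth_wave function takes the fundamental frequencies and amplitudes, and adds extended pitches in accordance with the sawtooth formula.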
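The definition of sawtooth_wave does not appear in this copy of the page, although it is called in the code that follows. The sketch below is a reconstruction under stated assumptions, not the author's code: it uses plain PyTorch in place of extend_pitch() and oscillator_bank(), assumes the amplitude -(-1)^k / (k * pi) for harmonic k, and assumes inputs shaped (time, 1) as elsewhere in this tutorial.

```python
import math

import torch


def sawtooth_wave(freq0, amp0, num_pitches, sample_rate):
    """Assumed reconstruction: sum `num_pitches` harmonics of a sawtooth."""
    # Harmonic k sits at k * f0; result has shape (time, num_pitches).
    k = torch.arange(1, num_pitches + 1, dtype=freq0.dtype)
    freq = freq0 * k

    # Sawtooth Fourier coefficients (assumption, see the note above).
    mults = [-((-1) ** i) / (math.pi * i) for i in range(1, 1 + num_pitches)]
    amp = amp0 * torch.tensor(mults, dtype=amp0.dtype)

    # Silence partials above the Nyquist frequency to avoid aliasing.
    audible = torch.where(freq.abs() > sample_rate / 2, torch.zeros_like(amp), amp)

    # Integrate the instantaneous frequency over time to get each partial's phase.
    phase = torch.cumsum(2 * math.pi * freq / sample_rate, dim=-2, dtype=torch.float64)
    waveform = (audible * torch.sin(phase)).sum(dim=-1).to(freq0.dtype)
    return freq, amp, waveform
```

The return value keeps the (freq, amp, waveform) order used by square_wave and triangle_wave, so it can be passed to plot() in the same way.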
Now we synthesize a waveform.
freq0 = torch.full((NUM_FRAMES, 1), F0)
amp0 = torch.ones((NUM_FRAMES, 1))
freq, amp, waveform = sawtooth_wave(freq0, amp0, int(SAMPLE_RATE / F0), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
[Figure: Oscillator bank (bank size: 46)]
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
warnings.warn(
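The warning is expected here. The bank size int(SAMPLE_RATE / F0) requests harmonics up to 46 * 344 = 15824 Hz, while a 16 kHz signal can only represent frequencies up to the 8 kHz Nyquist limit. The short check below, using only the constants defined earlier, counts how many of the requested harmonics actually stay audible.

```python
F0 = 344.0            # fundamental frequency [Hz], as in the tutorial
SAMPLE_RATE = 16_000  # [Hz]

nyquist = SAMPLE_RATE / 2
num_pitches = int(SAMPLE_RATE / F0)  # bank size requested in the call above

harmonics = [k * F0 for k in range(1, num_pitches + 1)]
audible = [f for f in harmonics if f <= nyquist]

print(num_pitches)   # 46
print(len(audible))  # 23
print(audible[-1])   # 7912.0
```

Half of the bank is therefore muted, which matches the spectrogram showing no energy above 8 kHz.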
We can oscillate the fundamental frequency to create a time-varying tone based on the sawtooth wave.
fm = 10 # rate at which the frequency oscillates [Hz]
f_dev = 0.1 * F0 # the degree of frequency oscillation [Hz]
phase = torch.linspace(0, fm * PI2 * DURATION, NUM_FRAMES)
freq0 = F0 + f_dev * torch.sin(phase).unsqueeze(-1)
freq, amp, waveform = sawtooth_wave(freq0, amp0, int(SAMPLE_RATE / F0), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
[Figure: Oscillator bank (bank size: 46)]
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
warnings.warn(
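To see what the modulation parameters mean in numbers, the following standard-library sketch rebuilds the same frequency trajectory without tensors. It assumes the phase ramp is equivalent to torch.linspace(0, fm * PI2 * DURATION, NUM_FRAMES), that is, evenly spaced with both endpoints included.

```python
import math

F0 = 344.0
DURATION = 1.1
SAMPLE_RATE = 16_000
NUM_FRAMES = int(DURATION * SAMPLE_RATE)

fm = 10           # modulation rate [Hz]
f_dev = 0.1 * F0  # peak frequency deviation [Hz]

# Evenly spaced phase values from 0 to fm * 2*pi * DURATION, endpoints included.
step = fm * 2 * math.pi * DURATION / (NUM_FRAMES - 1)
freq0 = [F0 + f_dev * math.sin(i * step) for i in range(NUM_FRAMES)]

print(NUM_FRAMES)
print(round(min(freq0), 1), round(max(freq0), 1))
print(round(fm * DURATION))  # number of vibrato cycles in the clip
```

The fundamental swings by 10% around 344 Hz, roughly between 309.6 Hz and 378.4 Hz, and completes 11 cycles over the 1.1 second clip. Every harmonic is scaled by the same factor, so the whole spectrum moves together.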
Square wave¶
A square wave contains only odd-integer harmonics.
def square_wave(freq0, amp0, num_pitches, sample_rate):
    mults = [2.0 * i + 1.0 for i in range(num_pitches)]
    freq = extend_pitch(freq0, mults)

    mults = [4 / (PI * (2.0 * i + 1.0)) for i in range(num_pitches)]
    amp = extend_pitch(amp0, mults)

    waveform = oscillator_bank(freq, amp, sample_rate=sample_rate)
    return freq, amp, waveform
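The multipliers in square_wave are the Fourier coefficients of a square wave with plateaus at +1 and -1. As a quick sanity check, the sketch below sums the same coefficients with the standard library only; the helper name `square_partial_sum` is introduced here for illustration.

```python
import math


def square_partial_sum(phase, num_pitches):
    """Partial Fourier sum using the square_wave multipliers; `phase` in radians."""
    return sum(
        4 / (math.pi * (2 * i + 1)) * math.sin((2 * i + 1) * phase)
        for i in range(num_pitches)
    )


# At a quarter period the sum approaches the plateau value +1 as partials are added.
for n in (1, 23, 1000):
    print(n, round(square_partial_sum(math.pi / 2, n), 4))

# At three quarters of a period it approaches -1.
print(round(square_partial_sum(3 * math.pi / 2, 1000), 4))
```

With a single partial the value is 4 / pi, about 1.27; adding more odd harmonics pulls it toward 1, which is why a truncated bank still shows ripple on the plateaus.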
freq0 = torch.full((NUM_FRAMES, 1), F0)
amp0 = torch.ones((NUM_FRAMES, 1))
freq, amp, waveform = square_wave(freq0, amp0, int(SAMPLE_RATE / F0 / 2), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
[Figure: Oscillator bank (bank size: 23)]
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
warnings.warn(
Triangle wave¶
A triangle wave also contains only odd-integer harmonics.
def triangle_wave(freq0, amp0, num_pitches, sample_rate):
    mults = [2.0 * i + 1.0 for i in range(num_pitches)]
    freq = extend_pitch(freq0, mults)

    c = 8 / (PI**2)
    mults = [c * ((-1) ** i) / ((2.0 * i + 1.0) ** 2) for i in range(num_pitches)]
    amp = extend_pitch(amp0, mults)

    waveform = oscillator_bank(freq, amp, sample_rate=sample_rate)
    return freq, amp, waveform
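Read off from the multipliers in triangle_wave, and taking amp0 = 1, the series being summed can be written as below. The symbols y_t, A_k, f_k and K (standing for num_pitches) are notation chosen here for illustration. The last line is a worked check that the peak at a quarter period tends to 1 as K grows.

```latex
\begin{align*}
y_t &= \sum_{k=0}^{K-1} A_k \sin(2 \pi f_k t) \\
f_k &= (2k + 1) f_0 \\
A_k &= \frac{8}{\pi^2} \cdot \frac{(-1)^k}{(2k + 1)^2} \\
y_{1/(4 f_0)} &= \frac{8}{\pi^2} \sum_{k=0}^{K-1} \frac{(-1)^k (-1)^k}{(2k + 1)^2}
  \;\xrightarrow{K \to \infty}\; \frac{8}{\pi^2} \cdot \frac{\pi^2}{8} = 1
\end{align*}
```

Compared with the square wave, the amplitudes fall off as 1/(2k+1)^2 instead of 1/(2k+1), so the upper harmonics contribute far less and the tone sounds softer.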
freq, amp, waveform = triangle_wave(freq0, amp0, int(SAMPLE_RATE / F0 / 2), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
[Figure: Oscillator bank (bank size: 23)]
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
warnings.warn(
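Inharmonic Partials¶
Inharmonic partials are frequencies that are not integer multiples of the fundamental frequency.
They are essential for re-creating realistic sounds or for making the synthesis result more interesting.
Bell sound¶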
https://computermusicresource.com/Simple.bell.tutorial.html
num_tones = 9
duration = 2.0
num_frames = int(SAMPLE_RATE * duration)
freq0 = torch.full((num_frames, 1), F0)
mults = [0.56, 0.92, 1.19, 1.71, 2, 2.74, 3.0, 3.76, 4.07]
freq = extend_pitch(freq0, mults)
amp = adsr_envelope(
    num_frames=num_frames,
    attack=0.002,
    decay=0.998,
    sustain=0.0,
    release=0.0,
    n_decay=2,
)
amp = torch.stack([amp * (0.5**i) for i in range(num_tones)], dim=-1)
waveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, vol=0.4)
[Figure: Oscillator bank (bank size: 9)]
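The envelope settings are what make this sound percussive: the attack covers 0.2% of the frames, the decay covers the remaining 99.8%, and the sustain level is zero. The sketch below illustrates that shape with the standard library. It is an assumption-laden stand-in rather than the TorchAudio implementation: `adsr_sketch` is a hypothetical name, the attack is taken to be a linear ramp, the decay is taken to follow (1 - x) ** n_decay, and the hold and release stages are ignored.

```python
def adsr_sketch(num_frames, attack, decay, sustain, n_decay):
    """Illustrative envelope: linear attack, polynomial decay, then sustain.

    Not the TorchAudio implementation; hold and release are ignored.
    """
    num_a = round(num_frames * attack)
    num_d = round(num_frames * decay)
    env = []
    for n in range(num_frames):
        if n < num_a:
            env.append(n / num_a)  # ramp from 0 toward 1
        elif n < num_a + num_d:
            x = (n - num_a) / num_d  # goes from 0 to 1 across the decay
            env.append(sustain + (1 - sustain) * (1 - x) ** n_decay)
        else:
            env.append(sustain)
    return env


env = adsr_sketch(num_frames=32_000, attack=0.002, decay=0.998, sustain=0.0, n_decay=2)
peak = env.index(max(env))
print(peak, peak / 16_000)    # frame index and time [s] of the peak
print(round(env[16_000], 3))  # level halfway through the note
```

Under these assumptions the envelope peaks after 64 frames, or 4 ms at 16 kHz, and then fades continuously. In the bell example each partial additionally scales this envelope by 0.5 ** i, so higher partials are quieter throughout.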
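For comparison, the following is the harmonic version of the above. Only the frequency values differ; the number of overtones and their amplitudes are the same.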
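The code for the harmonic version is not included in this copy of the page. Assuming it swaps the bell multipliers for the integer multiples 1 through 9, which is what "only the frequency values differ" suggests, the two sets of partials compare as follows. The list name `harmonic_mults` is introduced here for illustration.

```python
F0 = 344.0
num_tones = 9

# Inharmonic multipliers taken from the bell example.
bell_mults = [0.56, 0.92, 1.19, 1.71, 2, 2.74, 3.0, 3.76, 4.07]
# Assumed harmonic counterpart: integer multiples 1..9.
harmonic_mults = list(range(1, num_tones + 1))

# Relative amplitude of each partial, shared by both versions.
weights = [0.5**i for i in range(num_tones)]

print(" bell [Hz]  harmonic [Hz]  weight")
for b, h, w in zip(bell_mults, harmonic_mults, weights):
    print(f"{b * F0:10.2f} {h * F0:14.2f} {w:7.4f}")
```

The bell partials cluster between about 193 Hz and 1400 Hz at irregular spacing, whereas the harmonic version spreads evenly from 344 Hz to 3096 Hz.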
[Figure: Oscillator bank (bank size: 9)]