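StreamReader Advanced Usages
Author: Moto Hira
This tutorial is the continuation of StreamReader Basic Usages.
This tutorial shows how to use StreamReader for:
- Device inputs, such as microphone, webcam and screen recording
- Generating synthetic audio / video
- Applying preprocessing with custom filter expressions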
import torch
import torchaudio
print(torch.__version__)
print(torchaudio.__version__)
import IPython
import matplotlib.pyplot as plt
from torchaudio.io import StreamReader
base_url = "https://download.pytorch.org/torchaudio/tutorial-assets"
AUDIO_URL = f"{base_url}/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
VIDEO_URL = f"{base_url}/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
2.6.0
2.6.0
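Audio / Video device input
Given that the system has proper media devices and libavdevice is configured to use them, the streaming API can pull media streams from these devices.
To do this, we pass the additional parameters format and option to the constructor. format specifies the device component, and the option dictionary is specific to the specified component.
The exact arguments to be passed depend on the system configuration. Please refer to https://ffmpeg.dev.org.tw/ffmpeg-devices.html for details.
The following example illustrates how one can do this on a MacBook Pro.
First, we need to check the available devices.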
$ ffmpeg -f avfoundation -list_devices true -i ""
[AVFoundation indev @ 0x143f04e50] AVFoundation video devices:
[AVFoundation indev @ 0x143f04e50] [0] FaceTime HD Camera
[AVFoundation indev @ 0x143f04e50] [1] Capture screen 0
[AVFoundation indev @ 0x143f04e50] AVFoundation audio devices:
[AVFoundation indev @ 0x143f04e50] [0] MacBook Pro Microphone
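We use FaceTime HD Camera as the video device (index 0) and MacBook Pro Microphone as the audio device (index 0).
If we do not pass any option, the device uses its default configuration, which the decoder might not support.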
>>> StreamReader(
...     src="0:0",  # The first 0 means `FaceTime HD Camera`, and
...                 # the second 0 indicates `MacBook Pro Microphone`.
...     format="avfoundation",
... )
[avfoundation @ 0x125d4fe00] Selected framerate (29.970030) is not supported by the device.
[avfoundation @ 0x125d4fe00] Supported modes:
[avfoundation @ 0x125d4fe00] 1280x720@[1.000000 30.000000]fps
[avfoundation @ 0x125d4fe00] 640x480@[1.000000 30.000000]fps
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    ...
RuntimeError: Failed to open the input: 0:0
By providing option, we can change the format of the device stream to one that the decoder supports.
>>> streamer = StreamReader(
...     src="0:0",
...     format="avfoundation",
...     option={"framerate": "30", "pixel_format": "bgr0"},
... )
>>> for i in range(streamer.num_src_streams):
...     print(streamer.get_src_stream_info(i))
SourceVideoStream(media_type='video', codec='rawvideo', codec_long_name='raw video', format='bgr0', bit_rate=0, width=640, height=480, frame_rate=30.0)
SourceAudioStream(media_type='audio', codec='pcm_f32le', codec_long_name='PCM 32-bit floating point little-endian', format='flt', bit_rate=3072000, sample_rate=48000.0, num_channels=2)
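The constructor arguments used above can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named avfoundation_kwargs (it is not part of torchaudio). It only builds the src, format and option values shown in this section; which device indices and option values are valid still depends on the system.

```python
def avfoundation_kwargs(video_index, audio_index, framerate=None, pixel_format=None):
    """Build StreamReader keyword arguments for an AVFoundation device pair.

    Hypothetical helper for illustration only; not part of torchaudio.
    """
    option = {}
    if framerate is not None:
        # Device options are passed as strings, as in the example above.
        option["framerate"] = str(framerate)
    if pixel_format is not None:
        option["pixel_format"] = pixel_format

    # "<video index>:<audio index>", e.g. "0:0"
    kwargs = {"src": f"{video_index}:{audio_index}", "format": "avfoundation"}
    if option:
        kwargs["option"] = option
    return kwargs


kwargs = avfoundation_kwargs(0, 0, framerate=30, pixel_format="bgr0")
print(kwargs)
```

On a machine that exposes the devices listed earlier, the result could be passed on as StreamReader(**kwargs).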
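Synthetic source streams
As part of device integration, ffmpeg provides a "virtual device" interface. This interface generates synthetic audio / video data using libavfilter.
To use this, we set format=lavfi and provide a filter description to src.
Details of the filter description can be found at https://ffmpeg.dev.org.tw/ffmpeg-filters.html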
Audio Examples
Sine wave
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#sine
StreamReader(src="sine=sample_rate=8000:frequency=360", format="lavfi")
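A filter description for src follows the pattern name=key=value:key=value. As a rough sketch, a small helper can assemble such strings from keyword arguments. The name lavfi_src is hypothetical (not a torchaudio or ffmpeg API), and the helper does no escaping, so values containing characters such as ":" would need extra care.

```python
def lavfi_src(filter_name, **params):
    """Join a libavfilter source name and its options into one description string.

    Hypothetical helper for illustration only; performs no escaping.
    """
    if not params:
        return filter_name
    args = ":".join(f"{key}={value}" for key, value in params.items())
    return f"{filter_name}={args}"


src = lavfi_src("sine", sample_rate=8000, frequency=360)
print(src)
```

The resulting string can be used as the src argument together with format="lavfi".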
Signal with arbitrary expression
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#aevalsrc
# 5 Hz binaural beats on a 360 Hz carrier
StreamReader(
    src=(
        'aevalsrc='
        'sample_rate=8000:'
        'exprs=0.1*sin(2*PI*(360-5/2)*t)|0.1*sin(2*PI*(360+5/2)*t)'
    ),
    format='lavfi',
)
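The two expressions in exprs, separated by |, define one channel each. The sketch below evaluates the same expressions in plain Python to make the arithmetic explicit: the channels are sine waves at 360 - 5/2 and 360 + 5/2 Hz, so they differ by 5 Hz. The function name binaural_beat and its parameters are illustrative assumptions, and the code only mirrors the math, not ffmpeg's implementation.

```python
import math


def binaural_beat(t, carrier=360.0, beat=5.0, amplitude=0.1):
    """Evaluate both channel expressions at time t (in seconds)."""
    left = amplitude * math.sin(2 * math.pi * (carrier - beat / 2) * t)
    right = amplitude * math.sin(2 * math.pi * (carrier + beat / 2) * t)
    return left, right


sample_rate = 8000
# One second of (left, right) sample pairs
samples = [binaural_beat(n / sample_rate) for n in range(sample_rate)]

left_freq = 360 - 5 / 2
right_freq = 360 + 5 / 2
print(left_freq, right_freq, right_freq - left_freq)
```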
Noise
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#anoisesrc
StreamReader(src="anoisesrc=color=pink:sample_rate=8000:amplitude=0.5", format="lavfi")
Video Examples
Cellular automaton
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#cellauto
StreamReader(src="cellauto", format="lavfi")
Mandelbrot
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#mandelbrot
StreamReader(src="mandelbrot", format="lavfi")
MPlayer Test Pattern
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#mptestsrc
StreamReader(src="mptestsrc", format="lavfi")
John Conway's life game
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#life
StreamReader(src="life", format="lavfi")
Sierpinski carpet/triangle fractal
https://ffmpeg.dev.org.tw/ffmpeg-filters.html#sierpinski
StreamReader(src="sierpinski", format="lavfi")
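Custom filters
When defining an output stream, you can use the add_audio_stream() and add_video_stream() methods.
These methods take a filter_desc argument, which is a string formatted according to ffmpeg's filter expression syntax.
The difference between add_basic_(audio|video)_stream and add_(audio|video)_stream is that add_basic_(audio|video)_stream constructs the filter expression and passes it to the same underlying implementation. Everything add_basic_(audio|video)_stream does can be achieved with add_(audio|video)_stream.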
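Note
When applying custom filters, the client code must convert the audio / video stream to one of the formats that torchaudio can convert to tensor format. This can be achieved, for example, by applying format=pix_fmts=rgb24 to the video stream and aformat=sample_fmts=fltp to the audio stream.
Each output stream has a separate filter graph. Therefore, a filter expression cannot use different input / output streams. However, it is possible to split one input stream into multiple streams and merge them later.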
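One way to make sure a custom filter always ends with a tensor-compatible format is to append the conversion in one place. This is a minimal sketch under that assumption. The helper name with_tensor_format is hypothetical, and rgb24 / fltp are simply the formats named in the note, not the only ones that may work.

```python
def with_tensor_format(filter_desc, media_type):
    """Append the format conversion named in the note to a custom filter.

    Hypothetical helper for illustration only; not part of torchaudio.
    """
    if media_type == "video":
        conversion = "format=pix_fmts=rgb24"
    elif media_type == "audio":
        conversion = "aformat=sample_fmts=fltp"
    else:
        raise ValueError(f"unsupported media type: {media_type!r}")
    # The conversion becomes the last filter in the chain.
    return f"{filter_desc},{conversion}"


video_desc = with_tensor_format("fps=10,edgedetect=mode=canny", "video")
audio_desc = with_tensor_format("aresample=8000,highpass=f=200,lowpass=f=1000", "audio")
print(video_desc)
print(audio_desc)
```

The returned strings are meant to be passed as filter_desc to add_video_stream() and add_audio_stream() respectively.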
Audio Examples
# fmt: off
descs = [
    # No filtering
    "anull",
    # Apply a highpass filter then a lowpass filter
    "highpass=f=200,lowpass=f=1000",
    # Manipulate spectrogram: robot effect (every FFT bin gets the same phase)
    (
        "afftfilt="
        "real='hypot(re,im)*sin(0)':"
        "imag='hypot(re,im)*cos(0)':"
        "win_size=512:"
        "overlap=0.75"
    ),
    # Manipulate spectrogram: whisper effect (randomized phases)
    (
        "afftfilt="
        "real='hypot(re,im)*cos((random(0)*2-1)*2*3.14)':"
        "imag='hypot(re,im)*sin((random(1)*2-1)*2*3.14)':"
        "win_size=128:"
        "overlap=0.8"
    ),
]
# fmt: on
sample_rate = 8000
streamer = StreamReader(AUDIO_URL)
for desc in descs:
    streamer.add_audio_stream(
        frames_per_chunk=40000,
        filter_desc=f"aresample={sample_rate},{desc},aformat=sample_fmts=fltp",
    )
chunks = next(streamer.stream())
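Because every output stream is resampled to 8000 Hz and uses frames_per_chunk=40000, the first chunk covers up to five seconds of audio. The arithmetic can be sketched as follows. The helper chunk_duration is a hypothetical name, and the result is an upper bound, since a shorter source yields a shorter chunk.

```python
def chunk_duration(frames_per_chunk, rate):
    """Length in seconds of a full chunk of frames_per_chunk frames at the given rate."""
    return frames_per_chunk / rate


audio_seconds = chunk_duration(frames_per_chunk=40000, rate=8000)
print(audio_seconds)
```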
def _display(i):
    print("filter_desc:", streamer.get_out_stream_info(i).filter_description)
    fig, axs = plt.subplots(2, 1)
    waveform = chunks[i][:, 0]
    axs[0].plot(waveform)
    axs[0].grid(True)
    axs[0].set_ylim([-1, 1])
    plt.setp(axs[0].get_xticklabels(), visible=False)
    axs[1].specgram(waveform, Fs=sample_rate)
    fig.tight_layout()
    return IPython.display.Audio(chunks[i].T, rate=sample_rate)
Original
_display(0)
filter_desc: aresample=8000,anull,aformat=sample_fmts=fltp
Highpass / lowpass filter
_display(1)
filter_desc: aresample=8000,highpass=f=200,lowpass=f=1000,aformat=sample_fmts=fltp
FFT filter - Robot 🤖
_display(2)
filter_desc: aresample=8000,afftfilt=real='hypot(re,im)*sin(0)':imag='hypot(re,im)*cos(0)':win_size=512:overlap=0.75,aformat=sample_fmts=fltp
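To see why these expressions produce a robotic sound, it helps to apply them to a few complex FFT bins by hand. In afftfilt, re and im are the real and imaginary parts of the current bin. Since sin(0) = 0 and cos(0) = 1, each bin keeps its magnitude but ends up with the same phase (pi/2). The sketch below is plain Python that mirrors the expressions; the function name robot_bin and the sample bins are illustrative assumptions, not ffmpeg code.

```python
import cmath
import math


def robot_bin(value):
    """Apply the robot expressions to one complex FFT bin."""
    magnitude = math.hypot(value.real, value.imag)
    # real='hypot(re,im)*sin(0)', imag='hypot(re,im)*cos(0)'
    return complex(magnitude * math.sin(0), magnitude * math.cos(0))


bins = [complex(3, 4), complex(-1, 0), complex(0, -2)]
results = [robot_bin(b) for b in bins]
for original, filtered in zip(bins, results):
    # magnitude before, magnitude after, phase after
    print(abs(original), abs(filtered), cmath.phase(filtered))
```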
FFT filter - Whisper
_display(3)
filter_desc: aresample=8000,afftfilt=real='hypot(re,im)*cos((random(0)*2-1)*2*3.14)':imag='hypot(re,im)*sin((random(1)*2-1)*2*3.14)':win_size=128:overlap=0.8,aformat=sample_fmts=fltp
Video Examples
# fmt: off
descs = [
    # No effect
    "null",
    # Split the input stream, flip the left half horizontally, and overlay it on the right half.
    (
        "split [main][tmp];"
        "[tmp] crop=iw/2:ih:0:0, hflip [flip];"
        "[main][flip] overlay=W/2:0"
    ),
    # Edge detection
    "edgedetect=mode=canny",
    # Rotate the image by a random angle and fill the background with brown
    "rotate=angle=-random(1)*PI:fillcolor=brown",
    # Manipulate pixel values based on the coordinate
    "geq=r='X/W*r(X,Y)':g='(1-X/W)*g(X,Y)':b='(H-Y)/H*b(X,Y)'"
]
# fmt: on
streamer = StreamReader(VIDEO_URL)
for desc in descs:
    streamer.add_video_stream(
        frames_per_chunk=30,
        filter_desc=f"fps=10,{desc},format=pix_fmts=rgb24",
    )
streamer.seek(12)
chunks = next(streamer.stream())
def _display(i):
    print("filter_desc:", streamer.get_out_stream_info(i).filter_description)
    _, axs = plt.subplots(1, 3, figsize=(8, 1.9))
    chunk = chunks[i]
    for j in range(3):
        axs[j].imshow(chunk[10 * j + 1].permute(1, 2, 0))
        axs[j].set_axis_off()
    plt.tight_layout()
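With fps=10 and frames_per_chunk=30, a full chunk spans three seconds, and _display shows frames 1, 11 and 21 of it. The sketch below lists those frame indices with their offsets from the first frame of the chunk, assuming frames are evenly spaced at 1/fps. The helper name preview_frames is hypothetical.

```python
def preview_frames(num_panels=3, stride=10, offset=1, fps=10):
    """Return (frame index, seconds from the start of the chunk) for each panel."""
    indices = [stride * j + offset for j in range(num_panels)]
    return [(index, index / fps) for index in indices]


panels = preview_frames()
print(panels)
```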
Original
_display(0)
filter_desc: fps=10,null,format=pix_fmts=rgb24
Mirror
_display(1)
filter_desc: fps=10,split [main][tmp];[tmp] crop=iw/2:ih:0:0, hflip [flip];[main][flip] overlay=W/2:0,format=pix_fmts=rgb24
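The mirror graph splits the input into two branches, crops the left half of one branch, flips it horizontally, and overlays it on the right half of the other. As a rough illustration, the same steps can be applied to a single row of pixel values in plain Python. The function mirror_row is a hypothetical stand-in for the filter graph, not how ffmpeg implements it.

```python
def mirror_row(row):
    """Mimic the split / crop / hflip / overlay graph on one row of pixels."""
    width = len(row)
    # split [main][tmp]: both branches receive the full row
    main = list(row)
    tmp = list(row)
    # crop=iw/2:ih:0:0 keeps the left half; hflip reverses it
    flip = tmp[: width // 2][::-1]
    # overlay=W/2:0 pastes the flipped half starting at x = W/2
    start = width // 2
    main[start:start + len(flip)] = flip
    return main


mirrored = mirror_row([1, 2, 3, 4, 5, 6])
print(mirrored)
```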
Edge detection
_display(2)
filter_desc: fps=10,edgedetect=mode=canny,format=pix_fmts=rgb24
Random rotation
_display(3)
filter_desc: fps=10,rotate=angle=-random(1)*PI:fillcolor=brown,format=pix_fmts=rgb24
Pixel manipulation
_display(4)
filter_desc: fps=10,geq=r='X/W*r(X,Y)':g='(1-X/W)*g(X,Y)':b='(H-Y)/H*b(X,Y)',format=pix_fmts=rgb24
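In the geq expressions, X and Y are the pixel coordinates, W and H are the frame width and height, and r(X,Y), g(X,Y), b(X,Y) read the original color components. Red therefore fades in from left to right, green fades out from left to right, and blue fades out from top to bottom. The sketch below applies the same formulas to a single pixel. The function geq_pixel and the 640x480 frame size are illustrative assumptions, and the values are left as floats, whereas the real filter writes 8-bit pixel values.

```python
def geq_pixel(rgb, x, y, width, height):
    """Apply the three geq expressions to one (r, g, b) pixel at column x, row y."""
    r, g, b = rgb
    return (
        x / width * r,               # r='X/W*r(X,Y)'
        (1 - x / width) * g,         # g='(1-X/W)*g(X,Y)'
        (height - y) / height * b,   # b='(H-Y)/H*b(X,Y)'
    )


top_left = geq_pixel((200, 200, 200), x=0, y=0, width=640, height=480)
center = geq_pixel((200, 200, 200), x=320, y=240, width=640, height=480)
print(top_left)
print(center)
```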
Total running time of the script: ( 0 minutes 18.695 seconds)