注意

點擊此處下載完整的範例程式碼

StreamReader 基本用法¶

作者: Moto Hira

本教學示範如何使用 torchaudio.io.StreamReader 來獲取和解碼音訊/視訊資料，並應用 libavfilter 提供的預處理。

注意

本教學需要 FFmpeg 函式庫。請參考 FFmpeg 相依性以取得詳細資訊。

概述¶

串流 API 利用了 ffmpeg 強大的 I/O 功能。

它可以

載入各種格式的音訊/視訊
從本地/遠端來源載入音訊/視訊
從檔案類物件載入音訊/視訊
從麥克風、相機和螢幕載入音訊/視訊
產生合成音訊/視訊訊號。
逐塊載入音訊/視訊
即時變更取樣率/影格率、影像大小
應用濾波器和預處理

串流 API 分三個步驟運作。

開啟媒體來源（檔案、裝置、合成模式產生器）
設定輸出串流
串流媒體

目前，ffmpeg 整合提供的功能僅限於以下形式：

<某個媒體來源> -> <可選的處理> -> <tensor>

如果您有其他對您的使用案例有用的形式（例如與 torch.Tensor 類型整合），請提交功能請求。

準備¶

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

import matplotlib.pyplot as plt
from torchaudio.io import StreamReader

base_url = "https://download.pytorch.org/torchaudio/tutorial-assets"
AUDIO_URL = f"{base_url}/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
VIDEO_URL = f"{base_url}/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"

2.6.0
2.6.0

開啟來源¶

串流 API 主要可以處理三種不同的來源。無論使用哪種來源，剩餘的過程（設定輸出、應用預處理）都是相同的。

常見的媒體格式（字串類型或檔案類物件的資源指示器）
音訊/視訊裝置
合成音訊/視訊來源

以下章節介紹如何開啟常見的媒體格式。對於其他串流，請參考 StreamReader 進階用法。

注意

支援的媒體（例如容器、編解碼器和協定）的涵蓋範圍取決於系統中找到的 FFmpeg 函式庫。

如果 StreamReader 在開啟來源時引發錯誤，請檢查 ffmpeg 命令是否可以處理它。

本地檔案¶

若要開啟媒體檔案，您可以簡單地將檔案的路徑傳遞給 StreamReader 的建構函式。

StreamReader(src="audio.wav")

StreamReader(src="audio.mp3")

這適用於影像檔案、視訊檔案和視訊串流。

# Still image
StreamReader(src="image.jpeg")

# Video file
StreamReader(src="video.mpeg")

網路協定¶

您也可以直接傳遞 URL。

# Video on remote server
StreamReader(src="https://example.com/video.mp4")

# Playlist format
StreamReader(src="https://example.com/playlist.m3u")

# RTMP
StreamReader(src="rtmp://example.com:1935/live/app")

類檔案物件¶

您也可以傳遞類檔案物件。類檔案物件必須實作符合 io.RawIOBase.read 的 read 方法。

如果給定的類檔案物件具有 seek 方法，StreamReader 也會使用它。在此情況下，seek 方法預期符合 io.IOBase.seek。

# Open as fileobj with seek support
with open("input.mp4", "rb") as src:
    StreamReader(src=src)

如果第三方函式庫實作的 seek 會引發錯誤，您可以編寫一個 wrapper 類別來遮蔽 seek 方法。

class UnseekableWrapper:
    def __init__(self, obj):
        self.obj = obj

    def read(self, n):
        return self.obj.read(n)

import requests

response = requests.get("https://example.com/video.mp4", stream=True)
s = StreamReader(UnseekableWrapper(response.raw))

import boto3

response = boto3.client("s3").get_object(Bucket="my_bucket", Key="key")
s = StreamReader(UnseekableWrapper(response["Body"]))

注意

當使用不可搜尋 (unseekable) 的類檔案物件時，來源媒體必須是可串流的。例如，一個有效的 MP4 格式物件可以將其 metadata 放在媒體資料的開頭或結尾。 metadata 位於開頭的物件可以在沒有 seek 方法的情況下開啟，但 metadata 位於結尾的物件則必須使用 seek 才能開啟。

無標頭媒體¶

如果嘗試載入無標頭的原始資料，您可以使用 format 和 option 來指定資料的格式。

假設您使用 sox 指令將音訊檔案轉換為 faw 格式，如下所示：

# Headerless, 16-bit signed integer PCM, resampled at 16k Hz.
$ sox original.wav -r 16000 raw.s2

這樣的音訊可以像下面這樣開啟。

StreamReader(src="raw.s2", format="s16le", option={"sample_rate": "16000"})

檢查來源串流¶

一旦媒體被開啟，我們就可以檢查串流並配置輸出串流。

您可以使用 num_src_streams 檢查來源串流的數量。

注意

串流的數量 *不是* 通道的數量。每個音訊串流可以包含任意數量的通道。

要檢查來源串流的 metadata，您可以使用 get_src_stream_info() 方法並提供來源串流的索引。

此方法會返回 SourceStream。如果來源串流是音訊類型，則返回類型為 SourceAudioStream，它是 SourceStream 的子類別，具有額外的音訊特定屬性。同樣地，如果來源串流是視訊類型，則返回類型為 SourceVideoStream。

對於常規音訊格式和靜態影像格式，例如 WAV 和 JPEG，來源串流的數量為 1。

streamer = StreamReader(AUDIO_URL)
print("The number of source streams:", streamer.num_src_streams)
print(streamer.get_src_stream_info(0))

The number of source streams: 1
SourceAudioStream(media_type='audio', codec='pcm_s16le', codec_long_name='PCM signed 16-bit little-endian', format='s16', bit_rate=256000, num_frames=0, bits_per_sample=0, metadata={}, sample_rate=16000.0, num_channels=1)

容器格式和播放清單格式可能包含多個不同媒體類型的串流。

src = "https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8"
streamer = StreamReader(src)
print("The number of source streams:", streamer.num_src_streams)
for i in range(streamer.num_src_streams):
    print(streamer.get_src_stream_info(i))

The number of source streams: 27
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=335457, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:39.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '2177116'}, width=960, height=540, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1351204, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:42.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '8001098'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1019347, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:48.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '6312875'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=750899, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:54.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '4943747'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=513057, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:59.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '3216424'}, width=1280, height=720, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=185749, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:03.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '1268994'}, width=768, height=432, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=111939, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:06.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '902298'}, width=640, height=360, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=59938, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:07.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '541052'}, width=480, height=270, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=335457, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:39.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '2399619'}, width=960, height=540, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1351204, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:42.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '8223601'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1019347, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:48.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '6535378'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=750899, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:54.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '5166250'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=513057, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:59.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '3438927'}, width=1280, height=720, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=185749, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:03.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '1491497'}, width=768, height=432, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=111939, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:06.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '1124801'}, width=640, height=360, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=59938, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:07.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '763555'}, width=480, height=270, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=335457, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:39.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '2207619'}, width=960, height=540, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1351204, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:42.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '8031601'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=1019347, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:48.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '6343378'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=750899, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:54.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '4974250'}, width=1920, height=1080, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=513057, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:37:59.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '3246927'}, width=1280, height=720, frame_rate=60.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=185749, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:03.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '1299497'}, width=768, height=432, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=111939, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:06.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '932801'}, width=640, height=360, frame_rate=30.0)
SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=59938, num_frames=0, bits_per_sample=8, metadata={'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T21:38:07.000000Z', 'major_brand': 'mp42', 'minor_version': '1', 'variant_bitrate': '571555'}, width=480, height=270, frame_rate=30.0)
SourceAudioStream(media_type='audio', codec='aac', codec_long_name='AAC (Advanced Audio Coding)', format='fltp', bit_rate=60076, num_frames=0, bits_per_sample=0, metadata={'comment': 'English', 'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T22:14:57.000000Z', 'language': 'en', 'major_brand': 'mp42', 'minor_version': '1'}, sample_rate=48000.0, num_channels=2)
SourceAudioStream(media_type='audio', codec='ac3', codec_long_name='ATSC A/52A (AC-3)', format='fltp', bit_rate=384000, num_frames=0, bits_per_sample=0, metadata={'comment': 'English', 'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T22:15:46.000000Z', 'language': 'en', 'major_brand': 'mp42', 'minor_version': '1'}, sample_rate=48000.0, num_channels=6)
SourceAudioStream(media_type='audio', codec='eac3', codec_long_name='ATSC A/52B (AC-3, E-AC-3)', format='fltp', bit_rate=192000, num_frames=0, bits_per_sample=0, metadata={'comment': 'English', 'compatible_brands': 'mp41mp42isomhlsf', 'creation_time': '2016-12-21T22:16:17.000000Z', 'language': 'en', 'major_brand': 'mp42', 'minor_version': '1'}, sample_rate=48000.0, num_channels=6)

配置輸出串流¶

串流 API 允許您從輸入串流的任意組合串流資料。如果您的應用程式不需要音訊或視訊，您可以省略它們。或者，如果您想對同一個來源串流應用不同的預處理，您可以複製來源串流。

預設串流¶

當來源中有多个串流時，無法立即清楚應該使用哪個串流。

FFmpeg 實作了一些 heuristic 來決定預設串流。產生的串流索引通過以下方式公開

default_audio_stream 和 default_video_stream。

配置輸出串流¶

一旦您知道要使用哪個來源串流，您可以使用 add_basic_audio_stream() 和 add_basic_video_stream() 來配置輸出串流。

這些方法提供了一種簡單的方式來更改媒體的基本屬性，以符合應用程式的需求。

這兩個方法共有的參數是：

frames_per_chunk：每次迭代最多應返回多少 frames。對於音訊，產生的 tensor 的形狀將為 (frames_per_chunk, num_channels)。對於視訊，它將是 (frames_per_chunk, num_channels, height, width)。
buffer_chunk_size：內部緩衝的最大 chunk 數量。當 StreamReader 緩衝了這個數量的 chunks 並被要求提取更多 frames 時，StreamReader 會丟棄舊的 frames/chunks。
stream_index：來源串流的索引。
decoder：如果提供，則覆蓋 decoder。如果無法檢測到 codec，則很有用。
decoder_option：decoder 的選項。

對於音訊輸出串流，您可以提供以下額外的參數來更改音訊屬性。

format：預設情況下，StreamReader 返回 float32 dtype 的 tensor，sample 值範圍為 [-1, 1]。通過提供 format 參數，可以更改產生的 dtype 和值範圍。
sample_rate：如果提供，StreamReader 會即時重新取樣音訊。

對於視訊輸出串流，可以使用以下參數。

format：影像 frame 格式。預設情況下，StreamReader 以 8-bit 3 通道，RGB 順序返回 frames。
frame_rate：通過丟棄或複製 frames 來更改 frame rate。不執行插值。
width, height：更改影像大小。

streamer = StreamReader(...)

# Stream audio from default audio source stream
# 256 frames at a time, keeping the original sampling rate.
streamer.add_basic_audio_stream(
    frames_per_chunk=256,
)

# Stream audio from source stream `i`.
# Resample audio to 8k Hz, stream 256 frames at each
streamer.add_basic_audio_stream(
    frames_per_chunk=256,
    stream_index=i,
    sample_rate=8000,
)

# Stream video from default video source stream.
# 10 frames at a time, at 30 FPS
# RGB color channels.
streamer.add_basic_video_stream(
    frames_per_chunk=10,
    frame_rate=30,
    format="rgb24"
)

# Stream video from source stream `j`,
# 10 frames at a time, at 30 FPS
# BGR color channels with rescaling to 128x128
streamer.add_basic_video_stream(
    frames_per_chunk=10,
    stream_index=j,
    frame_rate=30,
    width=128,
    height=128,
    format="bgr24"
)

您可以以與檢查來源串流類似的方式檢查產生的輸出串流。 num_out_streams 報告配置的輸出串流的數量，並且 get_out_stream_info() 獲取有關輸出串流的資訊。

for i in range(streamer.num_out_streams):
    print(streamer.get_out_stream_info(i))

如果要刪除輸出串流，可以使用 remove_stream() 方法。

# Removes the first output stream.
streamer.remove_stream(0)

串流¶

要串流媒體資料，streamer 會交替執行以下過程：提取和解碼來源資料，並將產生的音訊/視訊資料傳遞給客戶端程式碼。

有一些執行這些操作的低階方法。 is_buffer_ready()、 process_packet() 和 pop_chunks()。

在本教程中，我們將使用高階 API，即 iterator 協議。它就像一個 for 循環一樣簡單。

streamer = StreamReader(...)
streamer.add_basic_audio_stream(...)
streamer.add_basic_video_stream(...)

for chunks in streamer.stream():
    audio_chunk, video_chunk = chunks
    ...

範例¶

讓我們以一個範例視訊來配置輸出串流。我們將使用以下視訊。

來源: https://svs.gsfc.nasa.gov/13013 (此視訊屬於公共領域)

Credit: NASA’s Goddard Space Flight Center.

NASA’s Media Usage Guidelines: https://www.nasa.gov/multimedia/guidelines/index.html

開啟來源媒體¶

首先，讓我們列出可用的串流及其屬性。

streamer = StreamReader(VIDEO_URL)
for i in range(streamer.num_src_streams):
    print(streamer.get_src_stream_info(i))

SourceVideoStream(media_type='video', codec='h264', codec_long_name='H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', format='yuv420p', bit_rate=9958354, num_frames=6175, bits_per_sample=8, metadata={'creation_time': '2018-07-24T17:40:48.000000Z', 'encoder': 'AVC Coding', 'handler_name': '\x1fMainconcept Video Media Handler', 'language': 'eng', 'vendor_id': '[0][0][0][0]'}, width=1920, height=1080, frame_rate=29.97002997002997)
SourceAudioStream(media_type='audio', codec='aac', codec_long_name='AAC (Advanced Audio Coding)', format='fltp', bit_rate=317375, num_frames=9658, bits_per_sample=0, metadata={'creation_time': '2018-07-24T17:40:48.000000Z', 'handler_name': '#Mainconcept MP4 Sound Media Handler', 'language': 'eng', 'vendor_id': '[0][0][0][0]'}, sample_rate=48000.0, num_channels=2)

現在我們設定輸出串流。

設定輸出串流¶

# fmt: off
# Audio stream with 8k Hz
streamer.add_basic_audio_stream(
    frames_per_chunk=8000,
    sample_rate=8000,
)

# Audio stream with 16k Hz
streamer.add_basic_audio_stream(
    frames_per_chunk=16000,
    sample_rate=16000,
)

# Video stream with 960x540 at 1 FPS.
streamer.add_basic_video_stream(
    frames_per_chunk=1,
    frame_rate=1,
    width=960,
    height=540,
    format="rgb24",
)

# Video stream with 320x320 (stretched) at 3 FPS, grayscale
streamer.add_basic_video_stream(
    frames_per_chunk=3,
    frame_rate=3,
    width=320,
    height=320,
    format="gray",
)
# fmt: on

注意

在設定多個輸出串流時，為了保持所有串流同步，請設定參數使 frames_per_chunk 和 sample_rate 或 frame_rate 之間的比例在所有輸出串流中保持一致。

檢查輸出串流。

for i in range(streamer.num_out_streams):
    print(streamer.get_out_stream_info(i))

OutputAudioStream(source_index=1, filter_description='aresample=8000,aformat=sample_fmts=fltp', media_type='audio', format='fltp', sample_rate=8000.0, num_channels=2)
OutputAudioStream(source_index=1, filter_description='aresample=16000,aformat=sample_fmts=fltp', media_type='audio', format='fltp', sample_rate=16000.0, num_channels=2)
OutputVideoStream(source_index=0, filter_description='fps=1,scale=width=960:height=540,format=pix_fmts=rgb24', media_type='video', format='rgb24', width=960, height=540, frame_rate=1.0)
OutputVideoStream(source_index=0, filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray', media_type='video', format='gray', width=320, height=320, frame_rate=3.0)

移除第二個音訊串流。

streamer.remove_stream(1)
for i in range(streamer.num_out_streams):
    print(streamer.get_out_stream_info(i))

OutputAudioStream(source_index=1, filter_description='aresample=8000,aformat=sample_fmts=fltp', media_type='audio', format='fltp', sample_rate=8000.0, num_channels=2)
OutputVideoStream(source_index=0, filter_description='fps=1,scale=width=960:height=540,format=pix_fmts=rgb24', media_type='video', format='rgb24', width=960, height=540, frame_rate=1.0)
OutputVideoStream(source_index=0, filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray', media_type='video', format='gray', width=320, height=320, frame_rate=3.0)

串流¶

跳到第 10 秒。

streamer.seek(10.0)

現在，讓我們最終遍歷輸出串流。

n_ite = 3
waveforms, vids1, vids2 = [], [], []
for i, (waveform, vid1, vid2) in enumerate(streamer.stream()):
    waveforms.append(waveform)
    vids1.append(vid1)
    vids2.append(vid2)
    if i + 1 == n_ite:
        break

對於音訊串流，chunk Tensor 的形狀將是 (frames_per_chunk, num_channels)，對於視訊串流，則是 (frames_per_chunk, num_color_channels, height, width)。

print(waveforms[0].shape)
print(vids1[0].shape)
print(vids2[0].shape)

torch.Size([8000, 2])
torch.Size([1, 3, 540, 960])
torch.Size([3, 1, 320, 320])

讓我們視覺化我們收到的內容。

k = 3
fig = plt.figure()
gs = fig.add_gridspec(3, k * n_ite)
for i, waveform in enumerate(waveforms):
    ax = fig.add_subplot(gs[0, k * i : k * (i + 1)])
    ax.specgram(waveform[:, 0], Fs=8000)
    ax.set_yticks([])
    ax.set_xticks([])
    ax.set_title(f"Iteration {i}")
    if i == 0:
        ax.set_ylabel("Stream 0")
for i, vid in enumerate(vids1):
    ax = fig.add_subplot(gs[1, k * i : k * (i + 1)])
    ax.imshow(vid[0].permute(1, 2, 0))  # NCHW->HWC
    ax.set_yticks([])
    ax.set_xticks([])
    if i == 0:
        ax.set_ylabel("Stream 1")
for i, vid in enumerate(vids2):
    for j in range(3):
        ax = fig.add_subplot(gs[2, k * i + j : k * i + j + 1])
        ax.imshow(vid[j].permute(1, 2, 0), cmap="gray")
        ax.set_yticks([])
        ax.set_xticks([])
        if i == 0 and j == 0:
            ax.set_ylabel("Stream 2")
plt.tight_layout()

標籤： torchaudio.io

腳本總執行時間： ( 0 分鐘 14.284 秒)

由 Sphinx-Gallery 產生圖庫

StreamReader 基本用法¶

概述¶

準備¶

開啟來源¶

本地檔案¶

網路協定¶

類檔案物件¶

無標頭媒體¶

檢查來源串流¶

配置輸出串流¶

預設串流¶

配置輸出串流¶

串流¶

範例¶

開啟來源媒體¶

設定輸出串流¶

串流¶

文件

教學

資源