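Warning

TorchAudio's C++ API is a prototype feature. API/ABI backward compatibility is not guaranteed.

Note

The top-level namespace has been changed from torchaudio to torio. StreamReader has been renamed to StreamingMediaDecoder.

torio::io::StreamingMediaDecoder

StreamingMediaDecoder is the implementation used by its Python equivalent and provides a similar interface. When working with custom I/O, such as in-memory data, the StreamingMediaDecoderCustomIO class can be used.

Both classes define the same methods, so their usage is the same.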

Constructors

StreamingMediaDecoder

class StreamingMediaDecoder

Fetch and decode audio/video streams chunk by chunk.

Subclassed by torio::io::StreamingMediaDecoderCustomIO

torio::io::StreamingMediaDecoder::StreamingMediaDecoder(const std::string &src, const std::optional<std::string> &format = c10::nullopt, const std::optional<OptionDict> &option = c10::nullopt)

StreamingMediaDecoderCustomIO

class StreamingMediaDecoderCustomIO : private detail::CustomInput, public torio::io::StreamingMediaDecoder

A subclass of StreamingMediaDecoder that works with a custom read function. It can be used to decode media from memory or from a custom object.

torio::io::StreamingMediaDecoderCustomIO::StreamingMediaDecoderCustomIO(void *opaque, const std::optional<std::string> &format, int buffer_size, int (*read_packet)(void *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr, const std::optional<OptionDict> &option = c10::nullopt)

Construct a StreamingMediaDecoder with custom read and seek functions.

Parameters:

opaque – Custom data used by the read_packet and seek functions.
format – Specifies the input format.
buffer_size – The size of the intermediate buffer that FFmpeg uses to pass data to the read_packet function.
read_packet – Custom read function, called by FFmpeg to read data from the underlying source.
seek – Optional seek function, used to seek within the underlying source.
option – Custom options passed when initializing the format context.
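As a rough illustration of the read_packet and seek parameters, the sketch below serves bytes from an in-memory buffer. MemorySource, read_memory, seek_memory, kErrorEof, and kSeekSize are hypothetical names introduced here. The return conventions (bytes read or an EOF error code, and a size query through whence) are assumed to follow FFmpeg's custom AVIO callbacks, and the two constants are local stand-ins for FFmpeg's AVERROR_EOF and AVSEEK_SIZE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Local stand-ins so the sketch compiles without FFmpeg headers. Real code
// should use AVERROR_EOF and AVSEEK_SIZE from the FFmpeg headers instead.
constexpr int kErrorEof = -541478725;  // assumed value of AVERROR_EOF
constexpr int kSeekSize = 0x10000;     // assumed value of AVSEEK_SIZE

// Hypothetical state object handed to the callbacks through `opaque`.
struct MemorySource {
  std::vector<uint8_t> data;
  int64_t pos = 0;
};

// Same shape as read_packet: copy up to buf_size bytes into buf.
int read_memory(void* opaque, uint8_t* buf, int buf_size) {
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t remaining = static_cast<int64_t>(src->data.size()) - src->pos;
  if (remaining <= 0) {
    return kErrorEof;
  }
  const int n = static_cast<int>(std::min<int64_t>(remaining, buf_size));
  std::memcpy(buf, src->data.data() + src->pos, static_cast<size_t>(n));
  src->pos += n;
  return n;
}

// Same shape as seek: whence follows fseek, plus the size query.
int64_t seek_memory(void* opaque, int64_t offset, int whence) {
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t size = static_cast<int64_t>(src->data.size());
  int64_t target = 0;
  switch (whence) {
    case kSeekSize: return size;  // report the total size, do not move
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = src->pos + offset; break;
    case SEEK_END: target = size + offset; break;
    default: return -1;
  }
  if (target < 0 || target > size) {
    return -1;  // refuse to move outside the buffer
  }
  src->pos = target;
  return target;
}
```

Under these assumptions, a MemorySource instance would be passed as opaque, with read_memory and seek_memory as the read_packet and seek arguments of the constructor above.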

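Query Methods

find_best_audio_stream

int64_t torio::io::StreamingMediaDecoder::find_best_audio_stream() const

Find a suitable audio stream using FFmpeg's heuristics.

On success, returns the index of the best stream (>=0). Otherwise, returns a negative value.

find_best_video_stream

int64_t torio::io::StreamingMediaDecoder::find_best_video_stream() const

Find a suitable video stream using FFmpeg's heuristics.

On success, returns the index of the best stream (>=0). Otherwise, returns a negative value.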

get_metadata

OptionDict torio::io::StreamingMediaDecoder::get_metadata() const

Fetch the metadata of the source media.

num_src_streams

int64_t torio::io::StreamingMediaDecoder::num_src_streams() const

Fetch the number of source streams found in the input media.

The source streams include not only audio/video streams but also subtitle and other streams.

get_src_stream_info

SrcStreamInfo torio::io::StreamingMediaDecoder::get_src_stream_info(int i) const

Fetch information about the specified source stream.

The valid value range is [0, num_src_streams()).

num_out_streams

int64_t torio::io::StreamingMediaDecoder::num_out_streams() const

Fetch the number of output streams defined by client code.

get_out_stream_info

OutputStreamInfo torio::io::StreamingMediaDecoder::get_out_stream_info(int i) const

Fetch information about the specified output stream.

The valid value range is [0, num_out_streams()).

is_buffer_ready

bool torio::io::StreamingMediaDecoder::is_buffer_ready() const

Check whether the buffers of all output streams hold enough decoded frames.

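Configure Methods

add_audio_stream

void torio::io::StreamingMediaDecoder::add_audio_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const std::optional<std::string> &filter_desc = c10::nullopt, const std::optional<std::string> &decoder = c10::nullopt, const std::optional<OptionDict> &decoder_option = c10::nullopt)

Define an output audio stream.

Parameters:

i – The index of the source stream.
frames_per_chunk – Number of frames returned as one chunk.
If the source stream is exhausted before frames_per_chunk frames are buffered, the chunk is returned as-is, so it may contain fewer than frames_per_chunk frames.

Providing -1 disables chunking, in which case pop_chunks() returns all buffered frames as one chunk.
num_chunks – Internal buffer size, in chunks.
When the number of buffered chunks exceeds this number, old chunks are dropped. For example, if frames_per_chunk is 5 and num_chunks is 3, frames older than 15 frames are dropped.

Providing -1 disables this behavior and keeps all chunks.
filter_desc – Description of the filter graph applied to the source stream.
decoder – The name of the decoder to use. When provided, the specified decoder is used instead of the default one.
decoder_option – Options passed to the decoder.
To list the options of a decoder, use the ffmpeg -h decoder=<DECODER> command.

In addition to decoder-specific options, you can also pass options related to multithreading. They take effect only if the decoder supports them. If neither is provided, StreamingMediaDecoder defaults to single-threaded decoding.
- "threads": The number of threads, or the value "0" to let FFmpeg decide based on its heuristics.
- "thread_type": The multithreading method to use. Valid values are "frame" and "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
  - "frame": Decode more than one frame at once. Each thread handles one frame. This increases the decoding delay by one frame per thread.
  - "slice": Decode more than one part of a single frame at once.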
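To make the interplay of frames_per_chunk and num_chunks concrete, here is a minimal sketch of a bounded chunk buffer. ChunkBuffer is a hypothetical class written for this illustration, with plain integers standing in for frames. It is not the library's internal buffer, and it only models positive frames_per_chunk values.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-stream buffer; each frame is an int.
class ChunkBuffer {
 public:
  ChunkBuffer(int64_t frames_per_chunk, int64_t num_chunks)
      : frames_per_chunk_(frames_per_chunk), num_chunks_(num_chunks) {
    assert(frames_per_chunk > 0 && "this sketch does not model -1");
  }

  void push_frame(int frame) {
    pending_.push_back(frame);
    if (static_cast<int64_t>(pending_.size()) == frames_per_chunk_) {
      chunks_.push_back(std::move(pending_));
      pending_.clear();
      // A negative num_chunks keeps everything; otherwise drop the oldest.
      if (num_chunks_ >= 0 &&
          static_cast<int64_t>(chunks_.size()) > num_chunks_) {
        chunks_.pop_front();
      }
    }
  }

  // Oldest frame still held in a complete chunk, or -1 if there is none.
  int oldest_frame() const {
    return chunks_.empty() ? -1 : chunks_.front().front();
  }

  int64_t num_buffered_chunks() const {
    return static_cast<int64_t>(chunks_.size());
  }

 private:
  int64_t frames_per_chunk_;
  int64_t num_chunks_;
  std::vector<int> pending_;             // frames of the chunk being filled
  std::deque<std::vector<int>> chunks_;  // completed chunks, oldest first
};
```

With frames_per_chunk of 5 and num_chunks of 3, pushing 20 frames leaves three chunks in this sketch, and the first five frames are gone, which matches the 15-frame window described above.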

add_video_stream

void torio::io::StreamingMediaDecoder::add_video_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const std::optional<std::string> &filter_desc = c10::nullopt, const std::optional<std::string> &decoder = c10::nullopt, const std::optional<OptionDict> &decoder_option = c10::nullopt, const std::optional<std::string> &hw_accel = c10::nullopt)

Define an output video stream.

Parameters:

i, frames_per_chunk, num_chunks, filter_desc, decoder, decoder_option – See add_audio_stream().
hw_accel – Enable hardware acceleration.
When video is decoded on CUDA hardware (for example, by specifying the "h264_cuvid" decoder), passing a CUDA device indicator to hw_accel (for example, hw_accel="cuda:0") makes StreamingMediaDecoder place the resulting frames directly on the specified CUDA device as a CUDA tensor.

If not provided (c10::nullopt), the chunk is moved to CPU memory.

remove_stream

void torio::io::StreamingMediaDecoder::remove_stream(int64_t i)

Remove an output stream.

Parameters:

i – The index of the output stream to remove. The valid value range is [0, num_out_streams()).

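Stream Methods

seek

void torio::io::StreamingMediaDecoder::seek(double timestamp, int64_t mode)

Seek to the given timestamp.

Parameters:

timestamp – Target timestamp, in seconds.
mode – Seek mode.
- 0: Keyframe mode. Seek to the nearest keyframe before the given timestamp.
- 1: Any mode. Seek to any frame (including non-keyframes) before the given timestamp.
- 2: Precise mode. First seek to the nearest keyframe before the given timestamp, then decode frames until the frame closest to the given timestamp is reached.

process_packet

int torio::io::StreamingMediaDecoder::process_packet()

Demultiplex and process one packet.

Returns:

0: A packet was processed successfully and packets remain in the stream, so client code can call this method again.
1: A packet was processed successfully and EOF was reached. Client code should not call this method again.
<0: An error occurred.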
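The return-code contract above suggests a simple driving loop. The sketch below is written against any object exposing int process_packet(), so it can be exercised without real media. process_until_eof and FakeDecoder are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Calls process_packet() until it reports EOF (1) and returns the number of
// packets processed. A negative return code is turned into an exception.
template <typename Decoder>
int process_until_eof(Decoder& decoder) {
  int processed = 0;
  while (true) {
    const int code = decoder.process_packet();
    if (code < 0) {
      throw std::runtime_error("process_packet() reported an error");
    }
    ++processed;
    if (code == 1) {
      return processed;  // EOF: do not call process_packet() again
    }
  }
}

// Hypothetical fake that follows the same return-code contract.
struct FakeDecoder {
  int packets_left;
  int process_packet() {
    assert(packets_left > 0 && "called again after EOF");
    --packets_left;
    return packets_left == 0 ? 1 : 0;
  }
};
```

When nothing needs to happen between packets, process_all_packets() covers this case directly. A manual loop is mainly useful when client code wants to interleave other work, such as calling pop_chunks(), between packets.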

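process_packet_block

int torio::io::StreamingMediaDecoder::process_packet_block(const double timeout, const double backoff)

Similar to process_packet(), but it retries automatically when it fails because a resource is temporarily unavailable.

This behavior is helpful with device input, such as a microphone, where the buffer may be busy while samples are being acquired.

Parameters:

timeout – Timeout, in milliseconds.
- >=0: Keep retrying until the given time passes.
- <0: Keep retrying forever.
backoff – Time to wait before retrying, in milliseconds.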
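The timeout and backoff semantics can be pictured as a generic retry loop. The sketch below is an illustration only and not the library's implementation: retry_with_backoff and kTryAgain are hypothetical, and how the real method signals a timeout is not covered here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical code meaning "resource temporarily unavailable".
constexpr int kTryAgain = -11;

// Calls `step` until it returns something other than kTryAgain, or until
// timeout_ms has elapsed. A negative timeout_ms retries forever.
int retry_with_backoff(const std::function<int()>& step,
                       double timeout_ms,
                       double backoff_ms) {
  using clock = std::chrono::steady_clock;
  using ms = std::chrono::duration<double, std::milli>;
  const auto start = clock::now();
  while (true) {
    const int code = step();
    if (code != kTryAgain) {
      return code;
    }
    if (timeout_ms >= 0 && ms(clock::now() - start).count() >= timeout_ms) {
      return kTryAgain;  // give up in this sketch
    }
    std::this_thread::sleep_for(ms(backoff_ms));
  }
}
```

A longer backoff reduces busy polling at the cost of latency, which is the trade-off the two parameters expose.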

process_all_packets

void torio::io::StreamingMediaDecoder::process_all_packets()

Process packets until EOF.

fill_buffer

int torio::io::StreamingMediaDecoder::fill_buffer(const std::optional<double> &timeout = c10::nullopt, const double backoff = 10.)

Process packets until every chunk buffer holds at least one chunk.

Parameters:

timeout – See process_packet_block().
backoff – See process_packet_block().

Retrieval Methods

pop_chunks

std::vector<std::optional<Chunk>> torio::io::StreamingMediaDecoder::pop_chunks()

Pop one chunk from each output stream, if available.

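Support Structures

Chunk

struct Chunk

Stores decoded frames and metadata.

Public Members

torch::Tensor frames

Audio/video frames.

For audio, the shape is [time, num_channels], and the dtype depends on the output stream configuration.

For video, the shape is [time, channel, height, width], and the dtype is torch.uint8.

double pts

Presentation timestamp of the first frame, in seconds.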
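Because pts only describes the first frame, the timestamps of later frames in a chunk have to be derived. A minimal sketch, assuming frames are evenly spaced at a known rate (the sample rate for audio, the frame rate for video); frame_timestamp is a hypothetical helper, not part of the API:

```cpp
#include <cassert>
#include <cstdint>

// Presentation time, in seconds, of frame `index` within a chunk whose first
// frame is at `chunk_pts`, assuming `rate` evenly spaced frames per second.
double frame_timestamp(double chunk_pts, int64_t index, double rate) {
  assert(rate > 0 && index >= 0);
  return chunk_pts + static_cast<double>(index) / rate;
}
```

For example, in a 16 kHz audio chunk with pts 1.5, the sample at index 8000 would fall at 2.0 seconds under this assumption.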

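SrcStreamInfo

struct SrcStreamInfo

Information about a source stream found in the input media.

Common Members

AVMediaType media_type

The stream media type.

See the FFmpeg documentation for the available values.

Todo: Introduce own enum and get rid of the FFmpeg dependency.

const char *codec_name = "N/A"

The name of the codec.

const char *codec_long_name = "N/A"

The full name of the codec, in a human-friendly form.

const char *fmt_name = "N/A"

For audio, this is the sample format.

Common values include:

"u8", "u8p": Unsigned 8-bit integer.
"s16", "s16p": Signed 16-bit integer.
"s32", "s32p": Signed 32-bit integer.
"s64", "s64p": Signed 64-bit integer.
"flt", "fltp": 32-bit floating point.
"dbl", "dblp": 64-bit floating point.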

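For video, this is the color channel format.

Common values include:

"gray8": Grayscale
"rgb24": RGB
"bgr24": BGR
"yuv420p": YUV420p

int64_t bit_rate = 0

Bit rate.

int64_t num_frames = 0

Number of frames.

Note

In some formats, the value is unreliable or unavailable.

int bits_per_sample = 0

Bits per sample.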
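The audio fmt_name values listed earlier follow FFmpeg's sample-format naming, where a trailing "p" marks a planar layout (one plane per channel) and its absence marks an interleaved layout. The sketch below decodes only the names listed on this page; SampleFormat and describe_sample_format are hypothetical helpers, not part of the API.

```cpp
#include <cassert>
#include <string>

struct SampleFormat {
  int bytes_per_sample;  // 0 when the name is not recognized
  bool planar;           // true for the "...p" variants
};

// Maps an audio fmt_name such as "s16" or "fltp" to its sample size and
// layout. Only the names listed on this page are handled.
SampleFormat describe_sample_format(std::string name) {
  bool planar = false;
  if (name.size() > 1 && name.back() == 'p') {
    planar = true;
    name.pop_back();
  }
  int bytes = 0;
  if (name == "u8") {
    bytes = 1;
  } else if (name == "s16") {
    bytes = 2;
  } else if (name == "s32" || name == "flt") {
    bytes = 4;
  } else if (name == "s64" || name == "dbl") {
    bytes = 8;
  }
  return {bytes, bytes != 0 && planar};
}
```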

OptionDict metadata = {}

Metadata.

For example, this member can hold the ID3 tags read from an MP3 file.

Example

{
  "title": "foo",
  "artist": "bar",
  "date": "2017"
}
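Reading a value from such a dictionary could look like the sketch below. It assumes OptionDict behaves like std::map<std::string, std::string>; the alias is redefined locally so the example is self-contained, and get_or is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumption: OptionDict is a string-to-string map. Redefined locally here.
using OptionDict = std::map<std::string, std::string>;

// Returns the value stored under `key`, or `fallback` when the key is absent.
std::string get_or(const OptionDict& dict,
                   const std::string& key,
                   const std::string& fallback) {
  const auto it = dict.find(key);
  return it == dict.end() ? fallback : it->second;
}
```

Using a lookup with a fallback avoids inserting empty entries, which operator[] on a map would do for a missing tag.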

Audio-Specific Members

double sample_rate = 0

Sample rate.

int num_channels = 0

The number of channels.

Video-Specific Members

int width = 0

Width.

int height = 0

Height.

double frame_rate = 0

Frame rate.

OutputStreamInfo

struct OutputStreamInfo

Information about an output stream configured by user code.

Audio-Specific Members

double sample_rate = -1

Sample rate.

int num_channels = -1

The number of channels.

Video-Specific Members

int width = -1

Width.

int height = -1

Height.

AVRational frame_rate = {0, 1}

Frame rate.
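Unlike SrcStreamInfo, the output frame rate is a rational number. A minimal conversion sketch follows; Rational is a local stand-in that mirrors the num/den layout of FFmpeg's AVRational, and to_fps is a hypothetical helper. Code that already links FFmpeg can use av_q2d instead.

```cpp
#include <cassert>

// Local stand-in mirroring the numerator/denominator layout of AVRational.
struct Rational {
  int num;
  int den;
};

// Frames per second as a double. Returns 0 for a zero denominator so that
// the sketch never divides by zero.
double to_fps(Rational r) {
  return r.den == 0 ? 0.0 : static_cast<double>(r.num) / r.den;
}
```

The default value {0, 1} converts to 0, which here simply means the frame rate has not been set.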

Public Members

int source_index

The index of the input source stream.

AVMediaType media_type = AVMEDIA_TYPE_UNKNOWN

The stream media type.

See the FFmpeg documentation for the available values.

Todo: Introduce own enum and get rid of the FFmpeg dependency.

int format = -1

Media format. AVSampleFormat for audio or AVPixelFormat for video.

std::string filter_description = {}

Filter graph definition, such as "aresample=16000,aformat=sample_fmts=fltp".
