CTCDecoder

class torchaudio.models.decoder.CTCDecoder

CTC beam search decoder from Flashlight [Kahn et al., 2022].

Note

To build the decoder, use the factory function ctc_decoder().

Tutorial using CTCDecoder: ASR Inference with CTC Decoder

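Methods

__call__

CTCDecoder.__call__(emissions: FloatTensor, lengths: Optional[Tensor] = None) → List[List[CTCHypothesis]]

Performs batched offline decoding.

Note

This method decodes each utterance offline in a single pass. For incremental decoding, see decode_step().

Parameters:

emissions (torch.FloatTensor) – CPU tensor of shape (batch, frame, num_tokens) storing sequences of probability distributions over labels; the output of the acoustic model.
lengths (Tensor or None, optional) – CPU tensor of shape (batch, ) storing the valid length along the time axis of each output in the batch.

Returns:

Sorted list of the best hypotheses for each audio sequence in the batch.

Return type:

List[List[CTCHypothesis]]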
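
To make the emissions layout and the role of lengths concrete, here is a minimal sketch in plain Python. It is not the Flashlight beam search: it swaps in greedy (best-path) CTC decoding, uses nested lists in place of tensors, and assumes the blank token has index 0. The name greedy_ctc_batch is hypothetical and not part of torchaudio.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank token


def greedy_ctc_batch(
    emissions: Sequence[Sequence[Sequence[float]]],
    lengths: Optional[Sequence[int]] = None,
) -> List[List[int]]:
    """Best-path CTC decoding over a (batch, frame, num_tokens) nested list."""
    results = []
    for b, frames in enumerate(emissions):
        # Only the first lengths[b] frames are valid; the rest is padding.
        valid = len(frames) if lengths is None else lengths[b]
        tokens, prev = [], BLANK
        for frame in frames[:valid]:
            best = max(range(len(frame)), key=frame.__getitem__)
            # CTC collapse: drop blanks and merge consecutive repeats.
            if best != BLANK and best != prev:
                tokens.append(best)
            prev = best
        results.append(tokens)
    return results


# Two utterances padded to 4 frames over 3 tokens (blank, 1, 2).
batch = [
    [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.1, 0.1, 0.8]],
    [[0.1, 0.1, 0.8], [0.2, 0.7, 0.1], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
decoded = greedy_ctc_batch(batch, lengths=[4, 2])
```

With lengths=[4, 2], the last two frames of the second utterance are treated as padding and ignored; omitting lengths would decode them as if they were real frames.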

decode_begin

CTCDecoder.decode_begin()

Initialize the internal state of the decoder.

See decode_step() for usage.

Note

This method is required only for online decoding. It is not needed for batch decoding with __call__().

decode_end

CTCDecoder.decode_end()

Finalize the internal state of the decoder.

See decode_step() for usage.

Note

This method is required only for online decoding. It is not needed for batch decoding with __call__().

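decode_step

CTCDecoder.decode_step(emissions: FloatTensor)

Perform incremental decoding on top of the current internal state.

Note

This method is required only for online decoding. It is not needed for batch decoding with __call__().

Parameters: emissions (torch.FloatTensor) – CPU tensor of shape (frame, num_tokens) storing sequences of probability distributions over labels; the output of the acoustic model.

Example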

>>> decoder = torchaudio.models.decoder.ctc_decoder(...)
>>> decoder.decode_begin()
>>> decoder.decode_step(emission1)
>>> decoder.decode_step(emission2)
>>> decoder.decode_end()
>>> result = decoder.get_final_hypothesis()
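
The call order above can be mimicked with a toy decoder in plain Python. This is only a sketch under stated assumptions: ToyStreamingDecoder is a hypothetical stand-in that uses greedy CTC collapse instead of beam search, nested lists instead of tensors, and blank index 0. It shows why state has to persist between decode_step() calls: a token repeated across a chunk boundary must still be merged.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank token


class ToyStreamingDecoder:
    """Hypothetical greedy stand-in that mirrors the online-decoding call order."""

    def __init__(self) -> None:
        self.decode_begin()

    def decode_begin(self) -> None:
        # Reset state so the same object can decode a new utterance.
        self._tokens: List[int] = []
        self._prev = BLANK

    def decode_step(self, emissions: Sequence[Sequence[float]]) -> None:
        # emissions is one chunk of shape (frame, num_tokens).
        for frame in emissions:
            best = max(range(len(frame)), key=frame.__getitem__)
            if best != BLANK and best != self._prev:
                self._tokens.append(best)
            self._prev = best  # carried across chunk boundaries

    def decode_end(self) -> None:
        # The toy keeps no pending work, so there is nothing to flush here.
        pass

    def get_final_hypothesis(self) -> List[int]:
        return list(self._tokens)


frames = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]

decoder = ToyStreamingDecoder()
decoder.decode_begin()
decoder.decode_step(frames[:1])  # first chunk
decoder.decode_step(frames[1:])  # second chunk starts with a repeated token
decoder.decode_end()
streamed = decoder.get_final_hypothesis()
```

Feeding the same frames in one decode_step() call gives the same token sequence in this toy, because the previous token is remembered between chunks.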

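get_final_hypothesis

CTCDecoder.get_final_hypothesis() → List[CTCHypothesis]

Get the final hypothesis.

Returns: Sorted list of the best hypotheses.
Return type: List[CTCHypothesis]

Note

This method is required only for online decoding. It is not needed for batch decoding with __call__().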

idxs_to_tokens

CTCDecoder.idxs_to_tokens(idxs: LongTensor) → List

Map raw token IDs to their corresponding tokens.

Parameters: idxs (LongTensor) – raw token IDs generated by the decoder
Returns: tokens corresponding to the input IDs
Return type: List
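
The mapping can be pictured as a lookup into the token list. The sketch below is a plain-Python illustration, not the torchaudio implementation: the token inventory, the use of "|" as a word-boundary marker, and the helper names ids_to_tokens and tokens_to_text are all assumptions made for the example.

```python
from typing import List, Sequence

# Hypothetical token inventory; "|" marks a word boundary by assumption.
TOKENS = ["-", "|", "a", "c", "t"]


def ids_to_tokens(idxs: Sequence[int], tokens: Sequence[str] = TOKENS) -> List[str]:
    """Look up each raw token ID in the token list."""
    return [tokens[i] for i in idxs]


def tokens_to_text(tokens: Sequence[str], boundary: str = "|") -> str:
    """Join tokens and turn word-boundary markers into spaces."""
    return "".join(tokens).replace(boundary, " ").strip()


pieces = ids_to_tokens([1, 3, 2, 4, 1])
text = tokens_to_text(pieces)
```

A step like tokens_to_text is mainly useful when decoding without a lexicon, where the hypothesis carries token IDs but no word list.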

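Support Structures

CTCHypothesis

class torchaudio.models.decoder.CTCHypothesis(tokens: torch.LongTensor, words: List[str], score: float, timesteps: torch.IntTensor)

Represents a hypothesis generated by the CTC beam search decoder CTCDecoder.

Tutorial using CTCHypothesis: ASR Inference with CTC Decoder

tokens: LongTensor: Predicted sequence of token IDs. Shape (L, ), where L is the length of the output sequence.

words: List[str]: List of predicted words.

Note

This attribute is only populated when a lexicon is provided to the decoder. When decoding without a lexicon, it is empty; use tokens and idxs_to_tokens() instead.

score: float: Score corresponding to the hypothesis.

timesteps: IntTensor: Timesteps corresponding to the tokens. Shape (L, ), where L is the length of the output sequence.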
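
As an illustration of how these fields fit together, the sketch below mirrors them in a plain dataclass and converts timesteps into seconds. It is hedged throughout: ToyHypothesis and timesteps_to_seconds are hypothetical names, lists stand in for tensors, and the 20 ms frame stride is an assumption that depends on the acoustic model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyHypothesis:
    """Plain-Python mirror of the four CTCHypothesis fields."""

    tokens: List[int]
    words: List[str]
    score: float
    timesteps: List[int]


def timesteps_to_seconds(hyp: ToyHypothesis, frame_stride_s: float) -> List[float]:
    """Convert emission-frame indices into seconds."""
    return [t * frame_stride_s for t in hyp.timesteps]


# Hypotheses arrive sorted best-first, so index 0 is the top candidate.
hyps = [
    ToyHypothesis(tokens=[3, 2, 4], words=["cat"], score=-4.5, timesteps=[2, 5, 9]),
    ToyHypothesis(tokens=[3, 2, 4, 4], words=[], score=-7.25, timesteps=[2, 5, 9, 10]),
]
best = hyps[0]
seconds = timesteps_to_seconds(best, frame_stride_s=0.02)  # assumed 20 ms stride
```

Each entry of seconds lines up with the token at the same position in best.tokens, which is why both fields share the shape (L, ).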

CTCDecoderLM

class torchaudio.models.decoder.CTCDecoderLM

Language model base class for creating custom language models to use with the decoder.

Tutorial using CTCDecoderLM: ASR Inference with CTC Decoder

abstract start(start_with_nothing: bool) → CTCDecoderLMState

Initialize or reset the language model.

Parameters: start_with_nothing (bool) – whether to start the sentence with the sil token.
Returns: starting state
Return type: CTCDecoderLMState

abstract score(state: CTCDecoderLMState, usr_token_idx: int) → Tuple[CTCDecoderLMState, float]

Evaluate the language model based on the current LM state and a new word.

Parameters:

state (CTCDecoderLMState) – current LM state
usr_token_idx (int) – index of the word

Returns:

(CTCDecoderLMState, float)

CTCDecoderLMState: new LM state
float: score

abstract finish(state: CTCDecoderLMState) → Tuple[CTCDecoderLMState, float]

Evaluate the end of the sequence for the language model based on the current LM state.

Parameters:

state (CTCDecoderLMState) – current LM state

Returns:

(CTCDecoderLMState, float)

CTCDecoderLMState: new LM state
float: score
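
The start / score / finish contract can be sketched with a toy unigram model in plain Python. This is an illustration under assumptions, not a working torchaudio language model: ToyState and ToyUnigramLM are hypothetical stand-ins that do not subclass the real base classes, and the reserved index -1 for silence / end of sentence is an assumption of the example.

```python
import math
from typing import Dict, Tuple


class ToyState:
    """Stand-in for CTCDecoderLMState: a node in a trie of token indices."""

    def __init__(self) -> None:
        self.children: Dict[int, "ToyState"] = {}

    def child(self, usr_index: int) -> "ToyState":
        # Return the existing child, or create one on first visit.
        if usr_index not in self.children:
            self.children[usr_index] = ToyState()
        return self.children[usr_index]


class ToyUnigramLM:
    """Stand-in exposing the same three methods as a CTCDecoderLM subclass."""

    SIL = -1  # assumption: a reserved index marks silence / sentence end

    def __init__(self, log_probs: Dict[int, float], unk: float = -10.0) -> None:
        self.log_probs = log_probs
        self.unk = unk  # fallback score for unseen indices

    def start(self, start_with_nothing: bool) -> ToyState:
        # A unigram model has no context, so a fresh root state is enough.
        return ToyState()

    def score(self, state: ToyState, usr_token_idx: int) -> Tuple[ToyState, float]:
        # Advance to the child state and return the log-probability of the index.
        return state.child(usr_token_idx), self.log_probs.get(usr_token_idx, self.unk)

    def finish(self, state: ToyState) -> Tuple[ToyState, float]:
        # Ending a sentence is scored like emitting the reserved SIL index.
        return self.score(state, self.SIL)


lm = ToyUnigramLM({0: math.log(0.5), 1: math.log(0.25), -1: math.log(0.25)})
state = lm.start(start_with_nothing=False)
total = 0.0
for idx in (0, 1):
    state, s = lm.score(state, idx)
    total += s
state, s = lm.finish(state)
total += s
```

Each call returns a new state together with a score, so a decoder can keep one state per partial hypothesis and accumulate scores as the hypothesis grows.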

CTCDecoderLMState

class torchaudio.models.decoder.CTCDecoderLMState

Language model state.

Tutorial using CTCDecoderLMState: ASR Inference with CTC Decoder

property children: Dict[int, CTCDecoderLMState]: Map of indices to LM states.

child(usr_index: int) → CTCDecoderLMState

Returns the child state corresponding to usr_index, or creates and returns a new state if the index is not found.

Parameters: usr_index (int) – index corresponding to the child state
Returns: child state corresponding to usr_index
Return type: CTCDecoderLMState

compare(state: CTCDecoderLMState) → CTCDecoderLMState

Compare two language model states.

Parameters: state (CTCDecoderLMState) – LM state to compare against
Returns: 0 if the states are the same, -1 if self is less, +1 if self is greater.
Return type: int
