WAV2VEC2_XLSR53

torchaudio.pipelines.WAV2VEC2_XLSR53

Wav2vec 2.0 model ("large" architecture), pre-trained on 56,000 hours of unlabeled audio from multiple datasets (Multilingual LibriSpeech [Pratap et al., 2020], CommonVoice [Ardila et al., 2020], and BABEL [Gales et al., 2014]), not fine-tuned.

Originally published by the authors of Unsupervised Cross-lingual Representation Learning for Speech Recognition [Conneau et al., 2020] under the MIT License and redistributed with the same license. [License, Source]
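Because this bundle is not fine-tuned, it is typically used as a feature extractor rather than as a ready-made recognizer. The following is a minimal sketch of that use, assuming the standard `Wav2Vec2Bundle` interface (`sample_rate`, `get_model()`) and `Wav2Vec2Model.extract_features`. The helper name `extract_xlsr53_features` and the random input are illustrative and not part of torchaudio.

```python
def extract_xlsr53_features(waveform, sample_rate):
    """Return a list of per-layer feature tensors for a (batch, time) waveform.

    `extract_xlsr53_features` is a hypothetical helper written for this example.
    """
    # Imported here so that defining the helper does not itself load torch/torchaudio.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.WAV2VEC2_XLSR53

    # Resample if the input does not match the rate the bundle expects.
    if sample_rate != bundle.sample_rate:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, bundle.sample_rate
        )

    # get_model() fetches the pre-trained weights on first use (large download).
    model = bundle.get_model()
    model.eval()

    with torch.inference_mode():
        # extract_features returns (list of per-layer tensors, optional lengths).
        features, _ = model.extract_features(waveform)
    return features


if __name__ == "__main__":
    import torch

    # One second of random noise at 16 kHz stands in for real audio (placeholder).
    dummy = torch.randn(1, 16000)
    layers = extract_xlsr53_features(dummy, 16000)
    print(len(layers), layers[-1].shape)
```

Each element of the returned list corresponds to one transformer layer's output, so a downstream task can pick whichever layer works best for it. See `torchaudio.pipelines.Wav2Vec2Bundle` for the full interface.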