TBE CPU Autovectorization

FP8/16/32 Autovectorization Implementation Methods

template<typename InType, typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDM_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const InType *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const bool no_bag, const bool is_bf16_out, const bool is_bf16_in)

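Autovectorized version of the method EmbeddingSpMDM_ref for the FP32 weight type.

Template Parameters:

InType – input data type (uint8_t is used)
IndexType – index data type (int64_t is used)
OffsetType – offset data type (int32_t is used)
OutType – output data type (float is used)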

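Parameters:

block_size – number of elements in a block (int64_t)
output_size – number of elements in the output (int64_t)
index_size – number of elements in the index (int64_t)
data_size – number of elements in the data (int64_t)
input – address of the input (InType*)
indices – address of the index (IndexType*)
offsets_or_lengths – address of the offset (OffsetType*)
weights – weights of the sum; optional, can be null for a non-weighted sum (float*)
normalize_by_lengths – whether to normalize by lengths (bool)
out – address of the output (OutType*)
is_weight_positional – if true, the weights are positional; set to false for the FP32 autovec implementation (bool)
use_offsets – if true, offsets are used instead of lengths; set to true for the FP32 autovec implementation (bool)
output_stride – if -1, output_stride is the same as block_size; set to -1 for the FP32 autovec implementation (int64_t)
input_stride – if -1, input_stride is the same as block_size; set to -1 for the FP32 autovec implementation (int64_t)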
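scale_bias_last – if true, the scale and bias appear at the end of each row; set to true for the FP32 autovec implementation (bool). This parameter does not appear in the signature shown above.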
no_bag – if true, no embedding bag is used; set to false for the FP32 autovec implementation (bool)
is_bf16_out – if true, the output is of BFLOAT16 type; set to false for the FP32 autovec implementation (bool)
is_bf16_in – if true, the input is of BFLOAT16 type; set to false for the FP32 autovec implementation (bool)
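
To make the roles of the parameters above concrete, the following is a minimal scalar sketch of a pooled embedding sum. It is not the FBGEMM implementation. The function name pooled_sum_sketch is hypothetical, and the sketch assumes that data_size counts table rows, that offsets holds output_size + 1 entries, and that the input is already float. Weights are indexed by position in indices, and the stride defaults follow the -1 convention described above.

```cpp
#include <cstdint>

// Hypothetical scalar sketch, NOT the FBGEMM implementation.
// Assumptions: data_size is the number of table rows, offsets has
// output_size + 1 entries, and the table already holds float values.
bool pooled_sum_sketch(
    int64_t block_size,
    int64_t output_size,
    int64_t index_size,
    int64_t data_size,
    const float* input,
    const int64_t* indices,
    const int32_t* offsets,
    const float* weights, // optional, may be nullptr
    bool normalize_by_lengths,
    float* out,
    int64_t output_stride = -1,
    int64_t input_stride = -1) {
  // A stride of -1 means "same as block_size".
  if (output_stride == -1) output_stride = block_size;
  if (input_stride == -1) input_stride = block_size;

  for (int64_t bag = 0; bag < output_size; ++bag) {
    float* dst = out + bag * output_stride;
    for (int64_t j = 0; j < block_size; ++j) dst[j] = 0.0f;

    const int64_t begin = offsets[bag];
    const int64_t end = offsets[bag + 1];
    if (begin < 0 || end < begin || end > index_size) return false;

    for (int64_t pos = begin; pos < end; ++pos) {
      const int64_t row = indices[pos];
      if (row < 0 || row >= data_size) return false; // out-of-range index
      const float w = weights ? weights[pos] : 1.0f;
      const float* src = input + row * input_stride;
      // Plain contiguous loop: the kind of loop a compiler can autovectorize.
      for (int64_t j = 0; j < block_size; ++j) dst[j] += w * src[j];
    }

    const int64_t len = end - begin;
    if (normalize_by_lengths && len > 0) {
      const float scale = 1.0f / static_cast<float>(len);
      for (int64_t j = 0; j < block_size; ++j) dst[j] *= scale;
    }
  }
  return true;
}
```

With a 3-row, 2-column table {1,2, 3,4, 5,6}, indices {0, 2, 1}, and offsets {0, 2, 3}, the first bag pools rows 0 and 2 and the second bag contains row 1 only.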

template<typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDMFP8_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const uint8_t *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const int exponent_bits, const int exponent_bias, const bool is_bf16_out)

Autovectorized version of the method EmbeddingSpMDM_ref for the FP8 weight type.

Template Parameters:

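InType – input data type (uint8_t is used). The signature above declares no InType template parameter and takes the input as const uint8_t*.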
IndexType – index data type (int64_t is used)
OffsetType – offset data type (int32_t is used)
OutType – output data type (float is used)

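Parameters:

block_size – number of elements in a block (int64_t)
output_size – number of elements in the output (int64_t)
index_size – number of elements in the index (int64_t)
data_size – number of elements in the data (int64_t)
input – address of the input (InType*)
indices – address of the index (IndexType*)
offsets_or_lengths – address of the offset (OffsetType*)
weights – weights of the sum; optional, can be null for a non-weighted sum (float*)
normalize_by_lengths – whether to normalize by lengths (bool)
out – address of the output (OutType*)
is_weight_positional – if true, the weights are positional; set to false for the FP8 autovec implementation (bool)
use_offsets – if true, offsets are used instead of lengths; set to true for the FP8 autovec implementation (bool)
output_stride – if -1, output_stride is the same as block_size; set to -1 for the FP8 autovec implementation (int64_t)
exponent_bits – number of bits used in the exponent (int)
exponent_bias – bias used in the exponent (int)
is_bf16_out – if true, the output is of BFLOAT16 type; set to false for the FP8 autovec implementation (bool)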
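
The exponent_bits and exponent_bias parameters describe how each FP8 byte is laid out. The sketch below shows one plausible way to decode such a byte to float. It assumes a layout of 1 sign bit, exponent_bits exponent bits, and the remaining bits as mantissa. The helper name fp8_to_float_sketch is hypothetical, and the source does not confirm that the library performs the conversion this way.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not an FBGEMM API.
// Assumed layout: [sign:1][exponent:exponent_bits][mantissa:7 - exponent_bits].
// Assumes 1 <= exponent_bits <= 7 and 0 <= exponent_bias <= 253.
float fp8_to_float_sketch(uint8_t byte, int exponent_bits, int exponent_bias) {
  const uint32_t sign = static_cast<uint32_t>(byte & 0x80u) << 24;

  // Shift exponent and mantissa so the FP8 mantissa occupies the most
  // significant FP32 mantissa bits and the FP8 exponent occupies the low
  // bits of the FP32 exponent field.
  uint32_t bits = static_cast<uint32_t>(byte & 0x7Fu) << (16 + exponent_bits);
  float value;
  std::memcpy(&value, &bits, sizeof(value));

  // The value is now biased by 127 (FP32) instead of exponent_bias,
  // so rescale by 2^(127 - exponent_bias).
  const uint32_t scale_bits = static_cast<uint32_t>(254 - exponent_bias) << 23;
  float scale;
  std::memcpy(&scale, &scale_bits, sizeof(scale));
  value *= scale;

  // Reattach the sign bit.
  std::memcpy(&bits, &value, sizeof(bits));
  bits |= sign;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Under these assumptions, with exponent_bits = 4 and exponent_bias = 7, the byte 0x38 has a stored exponent equal to the bias and a zero mantissa, so it decodes to 1.0. Using std::memcpy for the bit reinterpretation avoids the undefined behavior of type punning through pointers or unions.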
