torchrec.inference

torchrec.inference.model_packager

class torchrec.inference.model_packager.PredictFactoryPackager

Bases: object

classmethod save_predict_factory(predict_factory: Type[PredictFactory], configs: Dict[str, Any], output: Union[str, Path, BinaryIO], extra_files: Dict[str, Union[str, bytes]], loader_code: str = '\nimport %PACKAGE%\n\nMODULE_FACTORY=%PACKAGE%.%CLASS%\n', package_importer: Union[Importer, List[Importer]] = <torch.package.importer._SysImporter object>) → None
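The default loader_code is a small Python source template with %PACKAGE% and %CLASS% placeholders. The sketch below fills such a template with plain string replacement. It assumes the package is the factory's module path and the class is the factory's class name. The helper render_loader_code and the names my_project.factories and MyPredictFactory are hypothetical and are not part of torchrec.

```python
# Default template, copied from the save_predict_factory signature above.
LOADER_CODE = "\nimport %PACKAGE%\n\nMODULE_FACTORY=%PACKAGE%.%CLASS%\n"


def render_loader_code(template: str, package: str, class_name: str) -> str:
    """Hypothetical helper: substitute both placeholders in a loader template."""
    return template.replace("%PACKAGE%", package).replace("%CLASS%", class_name)


# Hypothetical factory location, for illustration only.
code = render_loader_code(LOADER_CODE, "my_project.factories", "MyPredictFactory")
print(code)
```

The rendered source imports the factory's module and binds the factory class to the fixed name MODULE_FACTORY. Presumably this lets a loader find the factory without knowing its module path in advance.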

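abstract classmethod set_extern_modules()

A decorator indicating abstract classmethods.

Deprecated, use 'classmethod' with 'abstractmethod' instead.

abstract classmethod set_mocked_modules()

A decorator indicating abstract classmethods.

Deprecated, use 'classmethod' with 'abstractmethod' instead.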
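The two notes above match the docstring of Python's own abc.abstractclassmethod decorator. This suggests the methods are declared with that deprecated decorator. The replacement the note recommends is to stack @classmethod on top of @abstractmethod, shown below in a stdlib-only sketch. PackagerBase, MyPackager, and the exporter argument are all hypothetical. The exporter is a plain list standing in for a real exporter object. In a real packager these hooks would presumably configure which modules torch.package treats as extern or mocked.

```python
from abc import ABC, abstractmethod
from typing import List


class PackagerBase(ABC):
    """Hypothetical base class using the recommended classmethod + abstractmethod stack."""

    @classmethod
    @abstractmethod
    def set_extern_modules(cls, exporter: List[str]) -> None:
        """Record module patterns to treat as extern (exporter is a stand-in list)."""

    @classmethod
    @abstractmethod
    def set_mocked_modules(cls, exporter: List[str]) -> None:
        """Record module patterns to mock out (exporter is a stand-in list)."""


class MyPackager(PackagerBase):
    @classmethod
    def set_extern_modules(cls, exporter: List[str]) -> None:
        exporter.append("extern:numpy")

    @classmethod
    def set_mocked_modules(cls, exporter: List[str]) -> None:
        exporter.append("mock:matplotlib")


calls: List[str] = []
MyPackager.set_extern_modules(calls)
MyPackager.set_mocked_modules(calls)
print(calls)  # ['extern:numpy', 'mock:matplotlib']

# ABCMeta still records both classmethods as abstract on the base class.
print(sorted(PackagerBase.__abstractmethods__))  # ['set_extern_modules', 'set_mocked_modules']
```

Abstractness is only enforced when a class is instantiated. Classmethods can still be called on the abstract base, so a packager used purely through its class relies on subclasses overriding both hooks.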

torchrec.inference.model_packager.load_config_text(name: str) → str

torchrec.inference.model_packager.load_pickle_config(name: str, clazz: Type[T]) → T

torchrec.inference.modules

class torchrec.inference.modules.BatchingMetadata(type: str, device: str, pinned: List[str])

Bases: object

Metadata class for batching. This should be kept in sync with the C++ definition.

device: str

pinned: List[str]

type: str

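class torchrec.inference.modules.PredictFactory

Bases: ABC

Creates a model (with already learned weights) to be used at inference time.

abstract batching_metadata() → Dict[str, BatchingMetadata]

Returns a dict from input name to BatchingMetadata. This information is used for batching input requests.

batching_metadata_json() → str

Serializes the batching metadata to JSON for easy parsing in torch::deploy environments.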
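batching_metadata_json() turns the dict returned by batching_metadata() into a JSON string. The stdlib sketch below makes two assumptions: that BatchingMetadata behaves like a plain dataclass with the three fields listed above, and that each entry is serialized with dataclasses.asdict. BatchingMetadataSketch, the input names, and the type and device values are illustrative. None of them are torchrec definitions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class BatchingMetadataSketch:
    """Stand-in mirroring the fields of torchrec.inference.modules.BatchingMetadata."""

    type: str
    device: str
    pinned: List[str]


def batching_metadata_json(metadata: Dict[str, BatchingMetadataSketch]) -> str:
    # Convert each metadata object to a plain dict, keyed by input name.
    return json.dumps({name: asdict(meta) for name, meta in metadata.items()})


metadata = {
    # Hypothetical input names and batching types.
    "float_features": BatchingMetadataSketch(type="dense", device="cuda", pinned=[]),
    "id_list_features": BatchingMetadataSketch(type="sparse", device="cuda", pinned=[]),
}
print(batching_metadata_json(metadata))
```

qualname_metadata_json() can presumably be read the same way. In that case the dict is keyed by method name, and QualNameMetadata's single need_preproc field takes the place of the three batching fields.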

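abstract create_predict_module() → Module

Returns the already sharded model with allocated weights. Its state_dict() must match TransformModule.transform_state_dict(). This method assumes that torch.distributed.init_process_group has already been called, and it shards the model according to torch.distributed.get_world_size().

model_inputs_data() → Dict[str, Any]

Returns a dict of various data used for benchmark input generation.

qualname_metadata() → Dict[str, QualNameMetadata]

Returns a dict from qualname (method name) to QualNameMetadata. This is additional information for executing specific methods of the model.

qualname_metadata_json() → str

Serializes the qualname metadata to JSON for easy parsing in torch::deploy environments.

abstract result_metadata() → str

Returns a string that represents the result type. This information is used for result splitting.

abstract run_weights_dependent_transformations(predict_module: Module) → Module

Runs transformations that depend on the weights of the predict module, for example lowering to a backend.

abstract run_weights_independent_tranformations(predict_module: Module) → Module

Runs transformations that do not depend on the weights of the predict module, for example fx tracing, model splitting, etc.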

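class torchrec.inference.modules.PredictModule(module: Module)

Bases: Module

Interface for modules to work in a torch.deploy-based backend. Users should override predict_forward to convert the batch input format to the module input format.

Call Args: batch: a dict of input tensors

Returns:

output: a dict of output tensors

Parameters:

module – the actual predict module
device – the primary device for this module, used in forward calls.

Example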

# MyPredictModule: hypothetical subclass of PredictModule that implements predict_forward.
module = MyPredictModule(my_model)  # my_model: any torch.nn.Module (hypothetical)
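PredictModule works as an adapter. forward receives the batched input dict. predict_forward, which subclasses must implement, maps that dict onto whatever the wrapped module expects. The torch-free sketch below shows that shape, with a plain callable standing in for the nn.Module. PredictModuleSketch, ScalePredictModule, double, and the "values" and "scores" keys are hypothetical. The real class is a torch.nn.Module and may do extra work in forward that this sketch omits.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class PredictModuleSketch(ABC):
    """Torch-free stand-in for the PredictModule interface."""

    def __init__(self, module: Callable[..., Any]) -> None:
        self._module = module

    @property
    def predict_module(self) -> Callable[..., Any]:
        # The wrapped model, exposed like PredictModule.predict_module.
        return self._module

    def forward(self, batch: Dict[str, Any]) -> Any:
        # Entry point for the serving backend; delegates to predict_forward.
        return self.predict_forward(batch)

    @abstractmethod
    def predict_forward(self, batch: Dict[str, Any]) -> Any:
        """Convert the batch dict into the wrapped module's inputs and run it."""


class ScalePredictModule(PredictModuleSketch):
    def predict_forward(self, batch: Dict[str, Any]) -> Dict[str, List[float]]:
        # Unpack the hypothetical "values" entry, call the model, wrap the result.
        return {"scores": self.predict_module(batch["values"])}


def double(xs: List[float]) -> List[float]:
    """Toy model standing in for an nn.Module."""
    return [2.0 * x for x in xs]


wrapped = ScalePredictModule(double)
print(wrapped.forward({"values": [1.0, 2.5]}))  # {'scores': [2.0, 5.0]}
```

Keeping the batch-format conversion in predict_forward means the wrapped model does not need to know how the serving backend batches its inputs.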

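forward(batch: Dict[str, Tensor]) → Any

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.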

abstract predict_forward(batch: Dict[str, Tensor]) → Any

property predict_module: Module

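state_dict(destination: Optional[Dict[str, Any]] = None, prefix: str = '', keep_vars: bool = False) → Dict[str, Any]

Returns a dictionary containing references to the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names. Parameters and buffers set to None are not included.

Note

The returned object is a shallow copy. It contains references to the module's parameters and buffers.

Warning

Currently, state_dict() also accepts positional arguments for destination, prefix, and keep_vars, in that order. However, this is being deprecated, and keyword arguments will be enforced in future releases.

Warning

Please avoid using the argument destination, as it is not designed for end users.

Parameters:

destination (dict, optional) – If provided, the state of the module is updated into this dict and the same object is returned. Otherwise, an OrderedDict is created and returned. Default: None.
prefix (str, optional) – A prefix added to parameter and buffer names to compose the keys in the state_dict. Default: ''.
keep_vars (bool, optional) – By default, the Tensors returned in the state dict are detached from autograd. If set to True, detaching is not performed. Default: False.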
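The prefix argument is prepended to every key. Nested submodules extend the prefix with their attribute name and a dot, which is how keys such as linear.weight arise. The stdlib sketch below shows that key-building rule using a hypothetical ToyModule tree, with strings in place of tensors. It does not model destination handling, buffers, or keep_vars.

```python
from typing import Dict, Optional


class ToyModule:
    """Hypothetical module: direct parameters plus named child modules."""

    def __init__(
        self, params: Dict[str, str], children: Optional[Dict[str, "ToyModule"]] = None
    ) -> None:
        self.params = params
        self.children = children or {}

    def state_dict(self, prefix: str = "") -> Dict[str, str]:
        out: Dict[str, str] = {}
        # Own parameters get the current prefix prepended.
        for name, value in self.params.items():
            out[prefix + name] = value
        # Children recurse with "<prefix><child_name>." as their prefix.
        for child_name, child in self.children.items():
            out.update(child.state_dict(prefix=prefix + child_name + "."))
        return out


linear = ToyModule({"weight": "W", "bias": "b"})
model = ToyModule({}, {"linear": linear})
print(list(model.state_dict().keys()))  # ['linear.weight', 'linear.bias']
print(list(model.state_dict(prefix="model.").keys()))  # ['model.linear.weight', 'model.linear.bias']
```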

Returns:

a dictionary containing the whole state of the module

Return type:

dict

Example

>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']

training: bool

class torchrec.inference.modules.QualNameMetadata(need_preproc: bool)

Bases: object

need_preproc: bool

torchrec.inference.modules.quantize_dense(predict_module: PredictModule, dtype: dtype, additional_embedding_module_type: List[Type[Module]] = []) → Module

torchrec.inference.modules.quantize_embeddings(module: Module, dtype: dtype, inplace: bool, additional_qconfig_spec_keys: Optional[List[Type[Module]]] = None, additional_mapping: Optional[Dict[Type[Module], Type[Module]]] = None, output_dtype: dtype = torch.float32, per_table_weight_dtype: Optional[Dict[str, dtype]] = None) → Module

torchrec.inference.modules.quantize_feature(module: Module, inputs: Tuple[Tensor, ...]) → Tuple[Tensor, ...]

torchrec.inference.modules.quantize_inference_model(model: Module, quantization_mapping: Optional[Dict[str, Type[Module]]] = None, per_table_weight_dtype: Optional[Dict[str, dtype]] = None, fp_weight_dtype: dtype = torch.int8) → Module

Quantize the model.

torchrec.inference.modules.shard_quant_model(model: Module, world_size: int = 1, compute_device: str = 'cuda', sharders: Optional[List[ModuleSharder[Module]]] = None, fused_params: Optional[Dict[str, Any]] = None, device_memory_size: Optional[int] = None, constraints: Optional[Dict[str, ParameterConstraints]] = None) → Tuple[Module, ShardingPlan]

Shard the model.

torchrec.inference.modules.trim_torch_package_prefix_from_typename(typename: str) → str
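When a class is loaded from a torch.package archive, its module name carries a mangled <torch_package_N> prefix. The sketch below strips that first component. It assumes the prefix always starts with "<torch_package_" and is followed by a dot. The function name trim_prefix_sketch and the sample names are hypothetical, so consult torchrec's own implementation for its exact behavior.

```python
def trim_prefix_sketch(typename: str) -> str:
    """Drop a leading '<torch_package_N>.' component, if present."""
    if typename.startswith("<torch_package_"):
        # Keep everything after the first dot-separated component.
        return ".".join(typename.split(".")[1:])
    return typename


print(trim_prefix_sketch("<torch_package_0>.my_project.factories"))  # my_project.factories
print(trim_prefix_sketch("my_project.factories"))  # my_project.factories
```

The result is an ordinary dotted module path. This is presumably the form needed by an import statement such as the one in save_predict_factory's default loader_code.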


Module contents
