Quick Start
Option 1: torch.compile
You can use Torch-TensorRT anywhere you use torch.compile:
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
x = torch.randn((1, 3, 224, 224)).cuda() # define what the inputs to the model will look like
optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x) # compiled on first run
optimized_model(x) # this will be fast!
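The comments above note that compilation happens on the first call and that later calls are fast. One way to see this for yourself is to time each call separately. The following is a minimal sketch, not part of Torch-TensorRT: `time_calls` and `FakeCompiledModel` are hypothetical names introduced for illustration, and the fake model only simulates a slow first call so the snippet runs without a GPU.

```python
import time
from typing import Any, Callable, List, Optional


def time_calls(
    fn: Callable[..., Any],
    *args: Any,
    n_calls: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Call fn(*args) n_calls times; return each call's wall-clock time in seconds.

    If sync is given, it is called after each invocation and before the clock
    stops. For CUDA work this would typically be torch.cuda.synchronize,
    because kernel launches return before the GPU has finished.
    """
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return timings


class FakeCompiledModel:
    """Hypothetical stand-in for a compiled model: slow first call, fast afterwards."""

    def __init__(self) -> None:
        self.compiled = False

    def __call__(self, x: float) -> float:
        if not self.compiled:
            time.sleep(0.05)  # simulate one-time compilation cost
            self.compiled = True
        return x * 2


timings = time_calls(FakeCompiledModel(), 1.0, n_calls=3)
first, steady = timings[0], min(timings[1:])
print(f"first call: {first:.4f}s, steady state: {steady:.6f}s")
```

With the real model you would call something like `time_calls(optimized_model, x, sync=torch.cuda.synchronize)` and expect the first entry to be much larger than the rest.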
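Option 2: Export
If you want to optimize your model ahead of time and/or deploy in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).
Step 1: Optimize + serialize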
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # define a list of representative inputs here
trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
# An ExportedProgram can only be run from the Python runtime. For C++ deployment, save a TorchScript file instead.
torch_tensorrt.save(trt_gm, "trt.ep", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", inputs=inputs)
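The two save calls above produce one artifact per deployment target. The rule can be written down as a small lookup, sketched below. The target names `"python"` and `"cpp"`, the `ARTIFACTS` table, and `artifact_for` are hypothetical helpers for illustration. The `"exported_program"` value is assumed to be the default `output_format` of `torch_tensorrt.save`; check it against your installed version.

```python
from typing import Dict, Tuple

# Deployment target -> (file name, output_format passed to torch_tensorrt.save).
# The target names are illustrative and not part of the library.
ARTIFACTS: Dict[str, Tuple[str, str]] = {
    "python": ("trt.ep", "exported_program"),  # loadable only from Python
    "cpp": ("trt.ts", "torchscript"),          # loadable with torch::jit::load
}


def artifact_for(target: str) -> Tuple[str, str]:
    """Return (file name, output_format) for a deployment target."""
    try:
        return ARTIFACTS[target]
    except KeyError:
        known = ", ".join(sorted(ARTIFACTS))
        raise ValueError(f"unknown target {target!r}; expected one of: {known}") from None


path, output_format = artifact_for("cpp")
print(path, output_format)
```

A caller could then write `torch_tensorrt.save(trt_gm, path, output_format=output_format, inputs=inputs)` once instead of repeating the call for each format.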
Step 2: Deploy
Deploy in Python:
import torch
import torch_tensorrt
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # your inputs go here
# You can run this in a new python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module() # this also works
model(*inputs)
Deploy in C++:
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"
auto trt_mod = torch::jit::load("trt.ts");
auto input_tensor = [...]; // fill this with your inputs
auto results = trt_mod.forward({input_tensor});