

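torchtrtc is a CLI application for using the Torch-TensorRT compiler. It provides a simple way to compile a TorchScript module with Torch-TensorRT from the command line, either to quickly check support or as part of a deployment pipeline. All basic features of the compiler are supported, including post-training quantization (PTQ), though you must already have a calibration cache file to use PTQ. The compiler can output two formats: a TorchScript program with the TensorRT engine embedded, or the TensorRT engine itself as a PLAN file.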

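All that is required to run the program after compilation is linking against libtorchtrt.so in C++ or importing the torch_tensorrt package in Python. Every other aspect of using a compiled module is identical to standard TorchScript: load it with torch.jit.load() and run it like any other module.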
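The Python side of that workflow might look like the sketch below. It is an illustration under stated assumptions, not code from the Torch-TensorRT documentation: the helper name `run_compiled_module` is hypothetical, the module is assumed to take a single tensor input, and a CUDA-capable GPU with both `torch` and `torch_tensorrt` installed is assumed.

```python
def run_compiled_module(path, input_shape, half=True):
    """Load a torchtrtc-compiled TorchScript file and run one forward pass.

    Hypothetical helper: assumes the module takes one tensor input and that a
    CUDA device is available.
    """
    # Imports are deferred so that defining this helper does not require the GPU stack.
    import torch
    import torch_tensorrt  # noqa: F401  (imported so the Torch-TensorRT runtime is available)

    # Loading is the same call used for any other TorchScript module.
    module = torch.jit.load(path)

    # Build a random example input on the GPU in the requested precision.
    dtype = torch.half if half else torch.float
    example = torch.randn(input_shape, dtype=dtype, device="cuda")

    # Inference only, so gradient tracking is switched off.
    with torch.no_grad():
        return module(example)
```

A call such as `run_compiled_module("ssd_trt.ts", (1, 3, 300, 300))` would mirror the file name and smallest shape used in the example commands in this document; both values are placeholders for your own model.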

torchtrtc [input_file_path] [output_file_path]
  [input_specs...] {OPTIONS}

  torchtrtc is a compiler for TorchScript, it will compile and optimize
  TorchScript programs to run on NVIDIA GPUs using TensorRT


    -h, --help                        Display this help menu
    Verbosity of the compiler
      -v, --verbose                     Dumps debugging information about the
                                        compilation process onto the console
      -w, --warnings                    Disables warnings generated during
                                        compilation onto the console (warnings
                                        are on by default)
      --i, --info                       Dumps info messages generated during
                                        compilation onto the console
    --build-debuggable-engine         Creates a debuggable engine
    --allow-gpu-fallback              (Only used when targeting DLA
                                      (device-type)) Lets engine run layers on
                                      GPU if they are not supported on DLA
    --require-full-compilation        Require that the model should be fully
                                      compiled to TensorRT or throw an error
    --check-method-support=[method_name]
                                      Check the support for end to end
                                      compilation of a specified method in the
                                      TorchScript module
    --disable-tf32                    Prevent Float32 layers from using the
                                      TF32 data format
    --sparse-weights                  Enable sparsity for weights of conv and
                                      FC layers
    -p[precision...],
    --enable-precision=[precision...] (Repeatable) Enabling an operating
                                      precision for kernels to use when
                                      building the engine (Int8 requires a
                                      calibration-cache argument) [ float |
                                      float32 | f32 | fp32 | half | float16 |
                                      f16 | fp16 | int8 | i8 | char ]
                                      (default: float)
    -d[type], --device-type=[type]    The type of device the engine should be
                                      built for [ gpu | dla ] (default: gpu)
    --gpu-id=[gpu_id]                 GPU id if running on multi-GPU platform
                                      (defaults to 0)
    --dla-core=[dla_core]             DLACore id if running on available DLA
                                      (defaults to 0)
    --engine-capability=[capability]  The type of device the engine should be
                                      built for [ standard | safety |
                                      dla_standalone ]
    --calibration-cache-file=[file_path]
                                      Path to calibration cache file to use
                                      for post training quantization
    --torch-executed-op=[op_name...]  (Repeatable) Operator in the graph that
                                      should always be run in PyTorch for
                                      execution (partial compilation must be
                                      enabled)
    --torch-executed-mod=[module_name...]
                                      (Repeatable) Module that should always
                                      be run in PyTorch for execution (partial
                                      compilation must be enabled)
    --min-block-size=[num_ops]        Minimum number of contiguous TensorRT
                                      supported ops to compile a subgraph to
                                      TensorRT
    --embed-engine                    Whether to treat input file as a
                                      serialized TensorRT engine and embed it
                                      into a TorchScript module (device spec
                                      must be provided)
    --num-avg-timing-iters=[num_iters]
                                      Number of averaging timing iterations
                                      used to select kernels
    --workspace-size=[workspace_size] Maximum size of workspace given to
                                      TensorRT
    --dla-sram-size=[dla_sram_size]   Fast software managed RAM used by DLA
                                      to communicate within a layer.
    --dla-local-dram-size=[dla_local_dram_size]  Host RAM used by DLA to share
                                      intermediate tensor data across operations.
    --dla-global-dram-size=[dla_global_dram_size] Host RAM used by DLA to store
                                      weights and metadata for execution
    --atol=[atol]                     Absolute tolerance threshold for acceptable
                                      numerical deviation from standard torchscript
                                      output (default 1e-8)
    --rtol=[rtol]                     Relative tolerance threshold for acceptable
                                      numerical deviation from standard torchscript
                                      output  (default 1e-5)
    --no-threshold-check              Skip checking threshold compliance
    --truncate, --truncate-64bit      Truncate weights that are provided in
                                      64bit to 32bit (Long, Double to Int,
                                      Float)
    --save-engine                     Instead of compiling a full
                                      TorchScript program, save the created
                                      engine to the path specified as the
                                      output path
    --custom-torch-ops                (repeatable) Shared object/DLL containing custom torch operators
    --custom-converters               (repeatable) Shared object/DLL containing custom converters
    input_file_path                   Path to input TorchScript file
    output_file_path                  Path for compiled TorchScript (or
                                      TensorRT engine) file
    input_specs...                    Specs for inputs to engine, can either
                                      be a single size or a range defined by
                                      Min, Optimal, Max sizes, e.g.
                                      "(N,..,C,H,W)"
                                      "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
                                      Data Type and format can be specified by
                                      adding an "@" followed by dtype and "%"
                                      followed by format to the end of the
                                      shape spec. e.g. "(3, 3, 32,
                                      32)@f16%NHWC"
    "--" can be used to terminate flag options and force all following
    arguments to be treated as positional options
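The input spec syntax is easy to mistype, so a small helper that assembles the string can be useful. The sketch below is an illustration only: `format_shape` and `format_input_spec` are hypothetical names, and the bracket and semicolon layout is inferred from the example commands in this document together with the "@" dtype and "%" format suffixes described in the help text.

```python
def format_shape(shape):
    """Render a shape tuple such as (1, 3, 300, 300) as "(1,3,300,300)"."""
    return "(" + ",".join(str(dim) for dim in shape) + ")"


def format_input_spec(shape=None, *, min_shape=None, opt_shape=None,
                      max_shape=None, dtype=None, fmt=None):
    """Build one input spec string for the torchtrtc command line.

    Pass either a single static `shape`, or all three of `min_shape`,
    `opt_shape` and `max_shape` for a dynamic range.
    """
    if shape is not None:
        spec = format_shape(shape)
    else:
        shapes = (min_shape, opt_shape, max_shape)
        if any(s is None for s in shapes):
            raise ValueError("a range needs min_shape, opt_shape and max_shape")
        # A range is written as three shapes inside square brackets.
        spec = "[" + "; ".join(format_shape(s) for s in shapes) + "]"
    if dtype:
        spec += "@" + dtype  # e.g. "f16"
    if fmt:
        spec += "%" + fmt    # e.g. "contiguous"
    return spec


spec = format_input_spec(
    min_shape=(1, 3, 300, 300),
    opt_shape=(1, 3, 512, 512),
    max_shape=(1, 3, 1024, 1024),
    dtype="f16",
    fmt="contiguous",
)
print(spec)
```

Because the result contains parentheses, brackets and semicolons, it needs to be quoted when pasted into a shell, as the example commands do.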


torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@f16%contiguous" -p f16
  • To include a set of custom operators

torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts --custom-torch-ops=<path to custom library .so file> "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@fp16%contiguous" -p f16
  • To include a set of custom converters

torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts --custom-converters=<path to custom library .so file> "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@fp16%contiguous" -p f16
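When torchtrtc runs as part of a deployment pipeline, the commands above can be assembled programmatically rather than typed by hand. The following is a minimal sketch, not an official wrapper: `build_torchtrtc_command` is a hypothetical helper, it uses only flags listed in the help text, and it follows the argument order of the example commands.

```python
import shlex


def build_torchtrtc_command(input_path, output_path, input_specs,
                            precisions=(), custom_torch_ops=(),
                            custom_converters=()):
    """Return the torchtrtc invocation as an argument list (nothing is executed)."""
    cmd = ["torchtrtc", input_path, output_path]
    # Repeatable flags: one occurrence per shared library.
    for lib in custom_torch_ops:
        cmd.append(f"--custom-torch-ops={lib}")
    for lib in custom_converters:
        cmd.append(f"--custom-converters={lib}")
    # Positional input specs, one per model input.
    cmd.extend(input_specs)
    # Repeatable flag: one occurrence per enabled precision.
    for precision in precisions:
        cmd.append(f"--enable-precision={precision}")
    return cmd


cmd = build_torchtrtc_command(
    "tests/modules/ssd_traced.jit.pt",
    "ssd_trt.ts",
    ["[(1,3,300,300); (1,3,512,512); (1,3,1024,1024)]@f16%contiguous"],
    precisions=["f16"],
)
# shlex.join quotes the spec so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming torchtrtc is on the PATH. Using a list instead of a single string sidesteps shell quoting of the brackets and semicolons in the input spec.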








