Overloading Torch-TensorRT Converters with Custom Converters

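If you want to change how a specific PyTorch operation is converted to TensorRT, you can write a custom converter and use it to overload the one that ships with Torch-TensorRT. Typical reasons are wanting to use a custom kernel instead of TensorRT's kernels, or wanting a different TensorRT implementation of a layer than the one Torch-TensorRT would normally pick.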

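In this tutorial, we demonstrate how to overload Torch-TensorRT's conversion of the torch.nn.functional.gelu operation to TensorRT with a custom converter that uses a different implementation of the GeLU layer.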

import logging
import sys

import torch
import torch_tensorrt

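GeLU has two modes in PyTorch: one uses the erf function and the other uses a tanh approximation. TensorRT natively supports both implementations as an activation layer, but suppose we want to use a custom GeLU implementation in TensorRT for the tanh mode only.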
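
To make the difference between the two modes concrete, here is a minimal scalar sketch that uses only the Python standard library. The function names gelu_erf and gelu_tanh are illustrative and are not part of PyTorch or Torch-TensorRT; the formulas are the ones PyTorch documents for approximate="none" and approximate="tanh".

```python
import math


def gelu_erf(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GeLU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Compare the two modes on a handful of sample points.
samples = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]
max_gap = max(abs(gelu_erf(v) - gelu_tanh(v)) for v in samples)
print(f"largest gap on the sample points: {max_gap:.2e}")
```

The two modes agree closely but not exactly, which is why a converter that targets only one of them needs a way to tell them apart.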

class GeLU(torch.nn.Module):
    def __init__(self, mode="tanh"):
        super().__init__()
        self.mode = mode

    def forward(self, x):
        return torch.nn.functional.gelu(x, approximate=self.mode)


my_mod = GeLU(mode="tanh")
ex_input = torch.randn(2, 5).to("cuda")

As a baseline, we can compile our module with the standard Torch-TensorRT GeLU converter (in tanh approximation mode).

my_standard_gelu = torch_tensorrt.compile(
    my_mod, arg_inputs=(ex_input,), min_block_size=1
)
print(my_standard_gelu.graph)
print(my_standard_gelu(ex_input))

Writing a Custom Converter

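Converters are functions that take an instance of a specific PyTorch operation in a PyTorch graph and convert it into an equivalent set of TensorRT operations in a TensorRT graph that is under construction. They are registered with Torch-TensorRT using the @torch_tensorrt.dynamo.conversion.dynamo_tensorrt_converter decorator. At the code level, a converter receives the current conversion state (ConversionContext), the next operator in the graph to convert, and the arguments to that node. It returns the placeholder outputs for that operation and, as a side effect, inserts the necessary TensorRT layers into the TensorRT network.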

from typing import Dict, Sequence, Tuple, Union

from torch.fx.node import Argument, Node, Target
from torch_tensorrt.dynamo import CompilationSettings
from torch_tensorrt.dynamo.conversion import ConversionContext

import tensorrt as trt

Converter Metadata

@torch_tensorrt.dynamo.conversion.dynamo_tensorrt_converter(
    # The PyTorch operation to convert; this converter is called whenever the operation is encountered
    torch.ops.aten.gelu.default,
    # Validators are functions that decide, for a specific node, whether this converter can convert it
    capability_validator=lambda node, settings: (
        "approximate" in node.kwargs and node.kwargs["approximate"] == "tanh"
    ),
    # Whether this converter can be used when the input shapes are dynamic
    supports_dynamic_shapes=True,
    # Set the priority of the converter to supersede the default one
    priority=torch_tensorrt.dynamo.conversion.ConverterPriority.HIGH,
)

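The decorator that defines a converter has one required argument and several optional ones. Every converter needs a target operator to run against: whenever an instance of torch.ops.aten.gelu.default appears in the graph, this converter will be called.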

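After the target operator, you can provide additional metadata that defines the capabilities of the converter and its priority relative to other possible converters for the same target.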

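The primary tool for defining a converter's capabilities is the capability_validator argument, a function (here a lambda) that takes a specific node in the graph together with the user's compilation settings and returns a boolean indicating whether the converter can be used for that node. The validator runs before the graph partitioning phase, once for every instance of the converter's target operation. Nodes for which no converter passes its validator during this phase are executed in PyTorch at runtime. This is useful when you only want to use the custom converter in certain cases; in our case, only when approximate == "tanh".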
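
The same check can also be written as a named function, which is easier to test in isolation than a lambda. The sketch below is only an illustration: is_tanh_gelu is a hypothetical name, and the SimpleNamespace objects stand in for torch.fx nodes by modelling nothing but their kwargs attribute.

```python
from types import SimpleNamespace


def is_tanh_gelu(node, settings=None) -> bool:
    """Return True only when the node requests the tanh approximation."""
    # dict.get avoids a KeyError when "approximate" is absent from the kwargs
    return node.kwargs.get("approximate") == "tanh"


# Stand-ins for torch.fx nodes (assumption: only .kwargs is modelled here).
tanh_node = SimpleNamespace(kwargs={"approximate": "tanh"})
erf_node = SimpleNamespace(kwargs={"approximate": "none"})
default_node = SimpleNamespace(kwargs={})

for label, node in [("tanh", tanh_node), ("none", erf_node), ("missing", default_node)]:
    print(label, is_tanh_gelu(node))
```

A function like this could be passed as capability_validator in place of the lambda, since it accepts the same two arguments (node and settings).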

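Distinct from the validator is the supports_dynamic_shapes argument, a boolean indicating whether the converter can be used when the input shapes are dynamic. If it is set to False and the user-provided inputs are dynamic, this converter is disabled. If no alternative converter supports dynamic shapes, the operation runs in PyTorch.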

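Finally, there is the priority argument, a value of the torch_tensorrt.dynamo.conversion.ConverterPriority enum that defines the converter's priority. The two options are HIGH and STANDARD. Converters registered with STANDARD are appended to the list of converters for a given operation, while converters registered with HIGH are inserted at the front of the list. Candidate converters are evaluated for suitability in this order, and the first one that passes its validator is used.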
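
The selection rule described above can be sketched with a toy registry. This is not Torch-TensorRT's actual implementation: Priority, registry, register, and select are hypothetical names used only to illustrate that HIGH entries go to the front, STANDARD entries go to the back, and the first converter whose validator passes is chosen.

```python
from enum import Enum, auto
from types import SimpleNamespace


class Priority(Enum):  # hypothetical stand-in for ConverterPriority
    STANDARD = auto()
    HIGH = auto()


registry = {}  # target name -> ordered list of (converter name, validator)


def register(target, name, validator, priority=Priority.STANDARD):
    candidates = registry.setdefault(target, [])
    entry = (name, validator)
    if priority is Priority.HIGH:
        candidates.insert(0, entry)  # HIGH goes to the front of the list
    else:
        candidates.append(entry)  # STANDARD goes to the back of the list


def select(target, node):
    """Return the first registered converter whose validator accepts the node."""
    for name, validator in registry.get(target, []):
        if validator(node):
            return name
    return None  # no converter applies, so the op would stay in PyTorch


register("gelu", "standard_gelu", lambda node: True)
register(
    "gelu",
    "custom_tanh_gelu",
    lambda node: node.kwargs.get("approximate") == "tanh",
    priority=Priority.HIGH,
)

# Stand-ins for graph nodes (assumption: only .kwargs is modelled here).
tanh_node = SimpleNamespace(kwargs={"approximate": "tanh"})
erf_node = SimpleNamespace(kwargs={"approximate": "none"})
print(select("gelu", tanh_node))
print(select("gelu", erf_node))
```

In this toy model the custom converter wins for the tanh node even though it was registered second, and the erf node falls through to the standard converter.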

Converter Implementation

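The converter function itself takes the following arguments: the current conversion context, the target operator, the arguments to the target operator, the keyword arguments to the target operator, and the name of the target operator. Arguments can be Python primitives, torch.Tensor, NumPy arrays (np.ndarray), or ITensor objects. The converter function should return the outputs of the target operator, primarily as TensorRT ITensors. These inputs and outputs should correspond to the schema of the target PyTorch operator, which can be found at https://pytorch.dev.org.tw/docs/main/torch.compiler_ir.html.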

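Because Torch-TensorRT covers the core ATen opset, it has already abstracted many common low-level operations into helper functions that can be used to build up a TensorRT network. This lets developers avoid the boilerplate of creating TensorRT layers directly and focus on the high-level logic of the conversion. The helper functions live in the torch_tensorrt.dynamo.conversion.impl module and are designed to be composable and interoperable with raw TensorRT implementations. In this case, we use the Torch-TensorRT mul, add, and tanh functions from impl to implement our alternative GeLU layer.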

def aten_ops_gelu(
    ctx: ConversionContext,
    target: Target,
    args: Tuple[Argument, ...],
    kwargs: Dict[str, Argument],
    name: str,
) -> Union[trt.ITensor, Sequence[trt.ITensor]]:
    # The schema for torch.ops.aten.gelu.default is gelu(Tensor self, *, str approximate='none') -> Tensor

    from torch_tensorrt.dynamo import SourceIR
    from torch_tensorrt.dynamo.conversion import impl

    # Cheap way to keep layer names unique
    op_count = 0

    def get_op_count():
        nonlocal op_count
        op_count += 1
        return op_count

    mul = lambda x, y: impl.elementwise.mul(
        ctx,
        target,
        name=f"mul_{get_op_count()}",
        source_ir=SourceIR.ATEN,
        lhs_val=x,
        rhs_val=y,
    )
    add = lambda x, y: impl.elementwise.add(
        ctx,
        target,
        name=f"add_{get_op_count()}",
        source_ir=SourceIR.ATEN,
        lhs_val=x,
        rhs_val=y,
    )
    tanh = lambda x: impl.activation.tanh(
        ctx, target, name=f"tanh_{get_op_count()}", source_ir=SourceIR.ATEN, input_val=x
    )

    # So we know that our custom converter is being run instead of the standard one
    print("\n\n---------------------------")
    print("Using custom GeLU converter")
    print("---------------------------\n\n")

    x_7 = mul(args[0], 0.5)
    x_8 = mul(args[0], 0.79788456080000003)
    x_9 = mul(args[0], 0.044714999999999998)
    x_10 = mul(x_9, args[0])
    x_11 = add(x_10, 1.0)
    x_12 = mul(x_8, x_11)
    x_13 = tanh(x_12)
    x_14 = add(x_13, 1.0)
    x_15 = mul(x_7, x_14)

    return x_15
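
As a sanity check on the arithmetic, the sequence of mul, add, and tanh steps in the converter can be replayed on plain Python floats and compared against the closed-form tanh approximation. This is only a scalar sketch: gelu_tanh_reference and gelu_tanh_stepwise are illustrative names, and no TensorRT layers are involved.

```python
import math


def gelu_tanh_reference(x: float) -> float:
    """Closed form: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_tanh_stepwise(x: float) -> float:
    """Same order of operations as the converter, applied to a Python float."""
    x_7 = x * 0.5
    x_8 = x * 0.7978845608  # sqrt(2 / pi), rounded
    x_9 = x * 0.044715
    x_10 = x_9 * x  # 0.044715 * x^2
    x_11 = x_10 + 1.0  # 1 + 0.044715 * x^2
    x_12 = x_8 * x_11  # sqrt(2/pi) * (x + 0.044715 * x^3)
    x_13 = math.tanh(x_12)
    x_14 = x_13 + 1.0
    return x_7 * x_14


for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(v, gelu_tanh_stepwise(v), gelu_tanh_reference(v))
```

Factoring x out of the cubic term is what lets the converter build the polynomial from multiplications and additions alone, without a power operation.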

Using Our Custom Converter

We can now recompile and see that our custom converter is called to convert GeLU to TensorRT.

my_custom_gelu = torch_tensorrt.compile(
    my_mod, arg_inputs=(ex_input,), min_block_size=1
)

print(my_custom_gelu.graph)
print(my_custom_gelu(ex_input))

We can verify that our implementation matches TensorRT's implementation of the tanh approximation.

print(
    f"tanh approximations are close: {torch.allclose(my_standard_gelu(ex_input), my_custom_gelu(ex_input))}"
)

Finally, we want to verify that our custom converter is not used when the approximate argument is not set to tanh.

my_mod_erf = GeLU(mode="none")
my_gelu_erf = torch_tensorrt.compile(
    my_mod_erf, arg_inputs=(ex_input,), min_block_size=1
)

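Notice that we do not see the print statements from our custom converter, which indicates that it was not used. However, looking at the graph, we can still see that a TensorRT engine was created to run the GeLU operation. In this case, the validator for our custom converter returned False, so the conversion system moved on to the next converter in the list, the standard GeLU converter, and used that one to convert the operation.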

print(my_gelu_erf.graph)
print(my_gelu_erf(ex_input))
