AOTInductor Minifier

If you encounter an error while using the AOT Inductor APIs, such as torch._inductor.aoti_compile_and_package or torch._inductor.aoti_load_package, or while running the model loaded by aoti_load_package on some inputs, you can use the AOTInductor Minifier to create a minimal nn.Module that reproduces the error. Enable it by setting from torch._inductor import config; config.aot_inductor.dump_aoti_minifier = True.

At a high level, using the minifier takes two steps:

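  • Set from torch._inductor import config; config.aot_inductor.dump_aoti_minifier = True, or set the environment variable DUMP_AOTI_MINIFIER=1. Running the script that errors then produces a minifier_launcher.py script. The output directory is configurable by setting torch._dynamo.config.debug_dir_root to a valid directory name.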

  • Run the minifier_launcher.py script. If the minifier runs successfully, it generates runnable Python code in repro.py that reproduces the exact same error.
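The first step can also be driven from outside the failing script by using the environment variable. The following is a minimal sketch of that approach using only the standard library. The helper names minifier_env and run_with_minifier, and the script name failing_script.py, are hypothetical and not part of PyTorch.

```python
import os
import subprocess
import sys


def minifier_env(base_env=None):
    """Return a copy of the environment with the AOTI minifier dump enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # DUMP_AOTI_MINIFIER=1 is the environment variable named in step one.
    env["DUMP_AOTI_MINIFIER"] = "1"
    return env


def run_with_minifier(script_path):
    """Run a script in a child process with DUMP_AOTI_MINIFIER=1 set.

    The script is expected to fail, so the return code is handed back to
    the caller instead of being raised as an exception.
    """
    completed = subprocess.run([sys.executable, script_path], env=minifier_env())
    return completed.returncode


# Hypothetical usage, where failing_script.py is the script that errors:
#     return_code = run_with_minifier("failing_script.py")
```

Setting the variable in a child process leaves the original script untouched, which can be convenient when the failing code lives in a repository you would rather not edit.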

Example Code

The example code below produces an error because we inject a bug into relu by setting torch._inductor.config.triton.inject_relu_bug_TESTING_ONLY = "compile_error".

import torch
from torch._inductor import config as inductor_config

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 16)
        self.relu = torch.nn.ReLU()
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.sigmoid(x)
        return x


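# Ask AOTInductor to write minifier_launcher.py when compilation fails.
inductor_config.aot_inductor.dump_aoti_minifier = True
# Testing-only switch that makes the generated relu code fail to compile.
inductor_config.triton.inject_relu_bug_TESTING_ONLY = "compile_error"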

with torch.no_grad():
    model = Model().to("cuda")
    example_inputs = (torch.randn(8, 10).to("cuda"),)
    ep = torch.export.export(model, example_inputs)
    package_path = torch._inductor.aoti_compile_and_package(ep)
    compiled_model = torch._inductor.aoti_load_package(package_path)
    result = compiled_model(*example_inputs)

The code above produces the following error:

RuntimeError: Failed to import /tmp/torchinductor_shangdiy/fr/cfrlf4smkwe4lub4i4cahkrb3qiczhf7hliqqwpewbw3aplj5g3s.py
SyntaxError: invalid syntax (cfrlf4smkwe4lub4i4cahkrb3qiczhf7hliqqwpewbw3aplj5g3s.py, line 29)

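This happens because we injected a bug into relu, so the generated Triton kernel looks like the one below. Note the compile error! where the relu operation should be, which is what triggers the SyntaxError.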

@triton.jit
def triton_poi_fused_addmm_relu_sigmoid_0(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 128
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x2 = xindex
    x0 = xindex % 16
    tmp0 = tl.load(in_out_ptr0 + (x2), xmask)
    tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tmp0 + tmp1
    tmp3 = compile error!
    tmp4 = tl.sigmoid(tmp3)
    tl.store(in_out_ptr0 + (x2), tmp4, xmask)
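To see why importing the generated file fails, it can help to check the offending line with Python's built-in compile(), which parses source without running it. This is only an illustrative sketch: the helper name syntax_error_in is hypothetical, the two-line snippets are modeled on the kernel body above, and max(0, tmp2) is a stand-in for a valid relu expression rather than what Inductor actually emits.

```python
def syntax_error_in(source):
    """Return the SyntaxError raised when compiling `source`, or None if it parses."""
    try:
        # compile() only parses the text; undefined names such as tmp0 are fine.
        compile(source, "<generated_kernel>", "exec")
    except SyntaxError as err:
        return err
    return None


# Two lines modeled on the generated kernel body, with the injected bug.
broken = "tmp2 = tmp0 + tmp1\ntmp3 = compile error!\n"
# The same lines with a stand-in expression that parses correctly.
fixed = "tmp2 = tmp0 + tmp1\ntmp3 = max(0, tmp2)\n"

print(type(syntax_error_in(broken)).__name__)
print(syntax_error_in(fixed))
```

The injected text is not valid Python, so the failure surfaces when the generated module is imported, before any kernel is launched.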

Because torch._inductor.config.aot_inductor.dump_aoti_minifier = True is set, an additional message also shows where minifier_launcher.py was written. The output directory is configurable by setting torch._dynamo.config.debug_dir_root to a valid directory name.

W1031 16:21:08.612000 2861654 pytorch/torch/_dynamo/debug_utils.py:279] Writing minified repro to:
W1031 16:21:08.612000 2861654 pytorch/torch/_dynamo/debug_utils.py:279] /data/users/shangdiy/pytorch/torch_compile_debug/run_2024_10_31_16_21_08_602433-pid_2861654/minifier/minifier_launcher.py
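If the log message has scrolled away, the launcher can usually be located again from the directory layout visible in the path above. The sketch below assumes the layout <debug root>/run_<timestamp>-pid_<pid>/minifier/minifier_launcher.py, where the debug root is the directory that holds the run_* folders (torch_compile_debug in the log above). The helper name find_latest_launcher is hypothetical.

```python
from pathlib import Path


def find_latest_launcher(debug_root):
    """Return the most recently modified minifier_launcher.py under debug_root.

    Returns None when no launcher is found. The glob pattern mirrors the
    path printed in the log and is an assumption about the layout.
    """
    candidates = Path(debug_root).glob("run_*/minifier/minifier_launcher.py")
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)


# Hypothetical usage:
#     launcher = find_latest_launcher("torch_compile_debug")
```

Picking the newest file by modification time avoids parsing the timestamp embedded in the directory name.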

Minifier Launcher

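The minifier_launcher.py file contains the following code. exported_program holds the input to torch._inductor.aoti_compile_and_package. The command='minify' argument means the script runs the minifier to create a minimal graph module that reproduces the error. Alternatively, you can set command='run' to only compile, load, and run the loaded model, without running the minifier.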

import torch
import torch._inductor.inductor_prims

import torch._dynamo.config
import torch._inductor.config
import torch._functorch.config
import torch.fx.experimental._config

torch._inductor.config.triton.inject_relu_bug_TESTING_ONLY = 'compile_error'
torch._inductor.config.aot_inductor.dump_aoti_minifier = True




isolate_fails_code_str = None



# torch version: 2.6.0a0+gitcd9c6e9
# torch cuda version: 12.0
# torch git version: cd9c6e9408dd79175712223895eed36dbdc84f84


# CUDA Info:
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2023 NVIDIA Corporation
# Built on Fri_Jan__6_16:45:21_PST_2023
# Cuda compilation tools, release 12.0, V12.0.140
# Build cuda_12.0.r12.0/compiler.32267302_0

# GPU Hardware Info:
# NVIDIA PG509-210 : 8

exported_program = torch.export.load('/data/users/shangdiy/pytorch/torch_compile_debug/run_2024_11_06_13_52_35_711642-pid_3567062/minifier/checkpoints/exported_program.pt2')
# print(exported_program.graph)
config_patches={}
if __name__ == '__main__':
    from torch._dynamo.repro.aoti import run_repro
    with torch.no_grad():
        run_repro(exported_program, config_patches=config_patches, accuracy=False, command='minify', save_dir='/data/users/shangdiy/pytorch/torch_compile_debug/run_2024_11_06_13_52_35_711642-pid_3567062/minifier/checkpoints', check_str=None)

Suppose we keep command='minify' and run the script. We get the following output:

...
W1031 16:48:08.938000 3598491 torch/_dynamo/repro/aoti.py:89] Writing checkpoint with 3 nodes to /data/users/shangdiy/pytorch/torch_compile_debug/run_2024_10_31_16_48_02_720863-pid_3598491/minifier/checkpoints/3.py
W1031 16:48:08.975000 3598491 torch/_dynamo/repro/aoti.py:101] Copying repro file for convenience to /data/users/shangdiy/pytorch/repro.py
Wrote minimal repro out to repro.py
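When the minifier output is long, the checkpoint lines show how far the graph has been reduced. The sketch below extracts the node count and checkpoint path from lines shaped like the one above. The regular expression is inferred from that single sample line, so treat the format as an assumption. The helper name parse_checkpoints is hypothetical.

```python
import re

# Pattern inferred from the sample "Writing checkpoint with N nodes to PATH" line.
CHECKPOINT_RE = re.compile(r"Writing checkpoint with (\d+) nodes to (\S+)")


def parse_checkpoints(log_text):
    """Return a list of (node_count, checkpoint_path) tuples found in log_text."""
    return [(int(count), path) for count, path in CHECKPOINT_RE.findall(log_text)]


sample = (
    "W1031 16:48:08.938000 3598491 torch/_dynamo/repro/aoti.py:89] "
    "Writing checkpoint with 3 nodes to /tmp/checkpoints/3.py\n"  # shortened, illustrative path
    "Wrote minimal repro out to repro.py\n"
)
print(parse_checkpoints(sample))
```

The last tuple in the returned list corresponds to the smallest graph written so far.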

If you get an AOTIMinifierError when running minifier_launcher.py, please report a bug on the PyTorch GitHub issue tracker.

Minified Result

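repro.py looks like this. Note that the exported program is printed at the top of the file as a comment, and it contains only the relu node. The minifier successfully reduced the graph to the op that raises the error.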

# from torch.nn import *
# class Repro(torch.nn.Module):
#     def __init__(self) -> None:
#         super().__init__()



#     def forward(self, linear):
#         relu = torch.ops.aten.relu.default(linear);  linear = None
#         return (relu,)

import torch
from torch import tensor, device
import torch.fx as fx
from torch._dynamo.testing import rand_strided
from math import inf
import torch._inductor.inductor_prims

import torch._dynamo.config
import torch._inductor.config
import torch._functorch.config
import torch.fx.experimental._config

torch._inductor.config.generate_intermediate_hooks = True
torch._inductor.config.triton.inject_relu_bug_TESTING_ONLY = 'compile_error'
torch._inductor.config.aot_inductor.dump_aoti_minifier = True




isolate_fails_code_str = None



# torch version: 2.6.0a0+gitcd9c6e9
# torch cuda version: 12.0
# torch git version: cd9c6e9408dd79175712223895eed36dbdc84f84


# CUDA Info:
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2023 NVIDIA Corporation
# Built on Fri_Jan__6_16:45:21_PST_2023
# Cuda compilation tools, release 12.0, V12.0.140
# Build cuda_12.0.r12.0/compiler.32267302_0

# GPU Hardware Info:
# NVIDIA PG509-210 : 8


exported_program = torch.export.load('/data/users/shangdiy/pytorch/torch_compile_debug/run_2024_11_25_13_59_33_102283-pid_3658904/minifier/checkpoints/exported_program.pt2')
# print(exported_program.graph)
config_patches={'aot_inductor.package': True}
if __name__ == '__main__':
    from torch._dynamo.repro.aoti import run_repro
    with torch.no_grad():
        run_repro(exported_program, config_patches=config_patches, accuracy=False, command='run', save_dir='/data/users/shangdiy/pytorch/torch_compile_debug/run_2024_11_25_13_59_33_102283-pid_3658904/minifier/checkpoints', check_str=None)
