How to use TorchInductor on Windows CPU

Created On: Oct 01, 2024 | Last Updated: Oct 22, 2024 | Last Verified: Oct 01, 2024

Author: Zhaoqiong Zheng, Xu, Han

TorchInductor is a compiler backend that transforms the FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels. This tutorial walks you through using TorchInductor on a Windows CPU.

What you will learn

How to compile and run a Python function with PyTorch, optimized for Windows CPU
The basics of how TorchInductor optimizes code with C++/Triton kernels

Prerequisites

PyTorch v2.5 or later
Microsoft Visual C++ (MSVC)
Miniforge for Windows

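Install the required software

First, let's install the required software. TorchInductor optimization requires a C++ compiler. In this example, we use Microsoft Visual C++ (MSVC).

Download and install MSVC.
During installation, select "Desktop development with C++" in the "Desktop & Mobile" section of the Workloads table, then install the software.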

Note

We also recommend the Clang and Intel C++ compilers. For details, see the section on using an alternative compiler for better performance.

Download and install Miniforge3-Windows-x86_64.exe.

Set up the environment

Open a command-line environment with cmd.exe.

Activate MSVC with the following command:

"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"

Activate conda with the following command:

"C:/ProgramData/miniforge3/Scripts/activate.bat"

Create and activate a custom conda environment:

conda create -n inductor_cpu_windows python=3.10 -y
conda activate inductor_cpu_windows

Install PyTorch 2.5 or later.
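The setup steps above can be checked from Python before moving on. The following is a minimal sketch, not part of the original tutorial: `parse_major_minor`, `meets_minimum`, and `check_environment` are hypothetical helpers, and the script assumes that when `CXX` is unset, Inductor on Windows uses `cl`, the MSVC compiler driver that vcvars64.bat places on PATH.

```python
import importlib.util
import os
import re
import shutil

# Minimum PyTorch version required by this tutorial.
MIN_TORCH = (2, 5)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as '2.5.0+cpu' or '2.6.0.dev20241101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple[int, int] = MIN_TORCH) -> bool:
    """True if `version` is at least `minimum`, compared as (major, minor)."""
    return parse_major_minor(version) >= minimum


def check_environment() -> list[str]:
    """Collect human-readable problems; an empty list means every check passed."""
    problems = []
    # Assumption: with CXX unset, the MSVC driver `cl` is the compiler to look for.
    compiler = os.environ.get("CXX", "cl")
    if shutil.which(compiler) is None:
        problems.append(f"C++ compiler {compiler!r} not found on PATH (was vcvars64.bat run?)")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed in this environment")
    else:
        import torch

        if not meets_minimum(torch.__version__):
            problems.append(f"PyTorch {torch.__version__} is older than 2.5")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks ready for TorchInductor.")
```

Run it from the activated `inductor_cpu_windows` environment, in the same cmd.exe window where vcvars64.bat was run, so that the compiler check sees the PATH that MSVC set up.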

Use TorchInductor on Windows CPU

Here is a simple example that demonstrates how to use TorchInductor:

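import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

# torch.compile uses TorchInductor as its default backend.
opt_foo1 = torch.compile(foo)
# The first call compiles foo; later calls with matching input shapes and dtypes reuse the compiled code.
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))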

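Here is a sample of the output this code might return. Because the inputs come from torch.randn, your values will differ: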

tensor([[-3.9074e-02,  1.3994e+00,  1.3894e+00,  3.2630e-01,  8.3060e-01,
        1.1833e+00,  1.4016e+00,  7.1905e-01,  9.0637e-01, -1.3648e+00],
        [ 1.3728e+00,  7.2863e-01,  8.6888e-01, -6.5442e-01,  5.6790e-01,
        5.2025e-01, -1.2647e+00,  1.2684e+00, -1.2483e+00, -7.2845e-01],
        [-6.7747e-01,  1.2028e+00,  1.1431e+00,  2.7196e-02,  5.5304e-01,
        6.1945e-01,  4.6654e-01, -3.7376e-01,  9.3644e-01,  1.3600e+00],
        [-1.0157e-01,  7.7200e-02,  1.0146e+00,  8.8175e-02, -1.4057e+00,
        8.8119e-01,  6.2853e-01,  3.2773e-01,  8.5082e-01,  8.4615e-01],
        [ 1.4140e+00,  1.2130e+00, -2.0762e-01,  3.3914e-01,  4.1122e-01,
        8.6895e-01,  5.8852e-01,  9.3310e-01,  1.4101e+00,  9.8318e-01],
        [ 1.2355e+00,  7.9290e-02,  1.3707e+00,  1.3754e+00,  1.3768e+00,
        9.8970e-01,  1.1171e+00, -5.9944e-01,  1.2553e+00,  1.3394e+00],
        [-1.3428e+00,  1.8400e-01,  1.1756e+00, -3.0654e-01,  9.7973e-01,
        1.4019e+00,  1.1886e+00, -1.9194e-01,  1.3632e+00,  1.1811e+00],
        [-7.1615e-01,  4.6622e-01,  1.2089e+00,  9.2011e-01,  1.0659e+00,
        9.0892e-01,  1.1932e+00,  1.3888e+00,  1.3898e+00,  1.3218e+00],
        [ 1.4139e+00, -1.4000e-01,  9.1192e-01,  3.0175e-01, -9.6432e-01,
        -1.0498e+00,  1.4115e+00, -9.3212e-01, -9.0964e-01,  1.0127e+00],
        [ 5.7244e-04,  1.2799e+00,  1.3595e+00,  1.0907e+00,  3.7191e-01,
        1.4062e+00,  1.3672e+00,  6.8502e-02,  8.5216e-01,  8.6046e-01]])

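Using an alternative compiler for better performance

To improve Inductor performance on Windows, you can use the Intel Compiler or the LLVM Compiler. Both, however, depend on the Microsoft Visual C++ (MSVC) runtime libraries, so your first step should be to install MSVC.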

Intel Compiler

Download and install the Windows version of the Intel Compiler.
Select it as the Windows Inductor compiler through the CXX environment variable: set CXX=icx-cl.
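`set CXX=icx-cl` changes the variable for the rest of the cmd.exe session. If you want to compare compilers without changing the shell, one option (not from the original tutorial) is to launch the workload as a child process with `CXX` set only in that child's environment. This sketch uses only the standard library; `run_with_cxx` is a hypothetical helper, and it assumes Inductor reads `CXX` from the environment of the process that runs `torch.compile`, as the tutorial's `set CXX=...` instruction implies.

```python
import os
import subprocess
import sys


def run_with_cxx(compiler: str, args: list[str]) -> subprocess.CompletedProcess:
    """Run `python <args>` with CXX set only for that child process."""
    env = dict(os.environ, CXX=compiler)  # a copy; the parent's environment is unchanged
    return subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # Demonstrate the isolation: the child sees CXX, the parent does not change.
    demo = run_with_cxx("icx-cl", ["-c", "import os; print(os.environ['CXX'])"])
    print("child CXX:", demo.stdout.strip())
    print("parent CXX:", os.environ.get("CXX"))
```

For example, `run_with_cxx("icx-cl", ["my_script.py"])` would run a hypothetical `my_script.py` that calls `torch.compile` under the Intel Compiler, and passing `"clang-cl"` instead would use LLVM, without touching the parent shell.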

Intel also provides a comprehensive step-by-step guide with performance data. See Intel® oneAPI DPC++/C++ Compiler Boosts PyTorch* Inductor Performance on Windows* for CPU Devices.

LLVM Compiler

Download and install the LLVM Compiler, choosing the win64 version.
Select it as the Windows Inductor compiler through the CXX environment variable: set CXX=clang-cl.
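If different machines have different compilers installed, a script can pick one at run time. The following is a minimal sketch, not part of the original tutorial: the preference order (Intel, then LLVM, then MSVC) is an assumption for illustration, and `pick_cxx` is a hypothetical helper that looks for each compiler driver on PATH and sets `CXX` for the current process only.

```python
import os
import shutil
from typing import Callable, Optional, Sequence

# Assumed preference order for illustration: Intel, then LLVM, then MSVC's `cl`
# (the alternatives rely on the MSVC runtime libraries, so MSVC is the fallback).
PREFERRED = ("icx-cl", "clang-cl", "cl")


def pick_cxx(
    candidates: Sequence[str] = PREFERRED,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate compiler found on PATH, or None if none is found."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    chosen = pick_cxx()
    if chosen is None:
        print("No supported C++ compiler found on PATH; was vcvars64.bat run?")
    else:
        # Same effect as `set CXX=...` in cmd.exe, but only for this process.
        # Do this before torch.compile builds any kernels.
        os.environ["CXX"] = chosen
        print(f"Using CXX={chosen}")
```

The `which` parameter lets the selection logic be exercised without any compiler installed, for example by passing the `get` method of a dictionary that maps compiler names to fake paths.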

Conclusion

In this tutorial, we learned how to use Inductor on Windows CPU with PyTorch. We also discussed how to further improve performance with the Intel Compiler and the LLVM Compiler.