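ExecuTorch Vulkan Delegate

The ExecuTorch Vulkan delegate is a native GPU delegate for ExecuTorch, built on top of the cross-platform Vulkan GPU API standard. It is primarily designed to use the GPU to accelerate model inference on Android devices, but it can be used on any platform that supports an implementation of Vulkan: laptops, servers, and edge devices.

Note

The Vulkan delegate is currently under active development, and its components are subject to change.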

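What is Vulkan?

Vulkan is a low-level GPU API specification developed as a successor to OpenGL. It is designed to give developers more explicit control over the GPU than previous specifications did, in order to reduce overhead and maximize the performance of modern graphics hardware.

Vulkan has been widely adopted by GPU vendors, and most modern GPUs on the market (both desktop and mobile) support it. Android has also included Vulkan since Android 7.0.

Note that Vulkan is a GPU API, not a GPU math library. That is, it provides a way to execute compute and graphics operations on a GPU, but it does not come with a built-in library of performant compute kernels.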

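Vulkan Compute Library

The ExecuTorch Vulkan Delegate is a wrapper around a standalone runtime known as the Vulkan Compute Library. The aim of the Vulkan Compute Library is to provide GPU implementations of PyTorch operators via GLSL compute shaders.

The Vulkan Compute Library is a fork/iteration of the PyTorch Vulkan Backend. Components of the PyTorch Vulkan backend were forked into ExecuTorch and adapted for an AOT (Ahead-of-Time) graph-mode style of model inference (as opposed to PyTorch, which adopted an eager execution style of model inference).

The components of the Vulkan Compute Library are contained in the executorch/backends/vulkan/runtime/ directory. The core components are listed and described below: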

runtime/
├── api/ .................... Wrapper API around Vulkan to manage Vulkan objects
└── graph/ .................. ComputeGraph class which implements graph mode inference
    └── ops/ ................ Base directory for operator implementations
        ├── glsl/ ........... GLSL compute shaders
        │   ├── *.glsl
        │   └── conv2d.glsl
        └── impl/ ........... C++ code to dispatch GPU compute shaders
            ├── *.cpp
            └── Conv2d.cpp

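Features

The Vulkan delegate currently supports the following features:

Memory Planning
- Intermediate tensors whose lifetimes do not overlap share memory allocations. This reduces the peak memory usage of model inference.
Capability Based Partitioning
- A graph can be partially lowered to the Vulkan delegate via a partitioner, which identifies the nodes (i.e. operators) that the Vulkan delegate supports and lowers only the supported subgraphs.
Support for upper-bound dynamic shapes
- Tensors can change shape between inferences as long as their current shape stays within the bounds specified during lowering.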
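
The memory planning idea can be illustrated with a small, self-contained sketch. This is not the delegate's actual planner: the function name `plan_shared_buffers`, the greedy strategy, and the tensor names and sizes are all assumptions made for illustration. It only shows how tensors with non-overlapping lifetimes can reuse one allocation and so lower peak memory.

```python
from typing import Dict, List, Tuple


def plan_shared_buffers(
    lifetimes: Dict[str, Tuple[int, int, int]],
) -> Tuple[Dict[str, int], List[int]]:
    """Greedily assign tensors to shared buffers (illustrative only).

    `lifetimes` maps a tensor name to (first_use, last_use, size_in_bytes),
    where first_use and last_use are inclusive node indices.
    Returns (tensor name -> buffer index, list of buffer sizes).
    """
    assignment: Dict[str, int] = {}
    buffer_sizes: List[int] = []
    buffer_busy_until: List[int] = []  # last_use of each buffer's current occupant

    # Visit tensors in order of first use.
    for name, (first, last, size) in sorted(
        lifetimes.items(), key=lambda item: item[1][0]
    ):
        for idx, busy_until in enumerate(buffer_busy_until):
            if busy_until < first:  # lifetimes do not overlap, so reuse
                assignment[name] = idx
                buffer_busy_until[idx] = last
                buffer_sizes[idx] = max(buffer_sizes[idx], size)
                break
        else:
            # No free buffer: allocate a new one.
            assignment[name] = len(buffer_sizes)
            buffer_sizes.append(size)
            buffer_busy_until.append(last)
    return assignment, buffer_sizes


# Hypothetical intermediate tensors in a four-step chain.
lifetimes = {
    "t0": (0, 1, 4096),
    "t1": (1, 2, 4096),
    "t2": (2, 3, 2048),
    "t3": (3, 4, 4096),
}
assignment, buffer_sizes = plan_shared_buffers(lifetimes)
naive_peak = sum(size for _, _, size in lifetimes.values())
shared_peak = sum(buffer_sizes)
print(assignment, naive_peak, shared_peak)
```

In this toy chain, four tensors fit into two buffers, because each tensor is dead before the tensor two steps later is first written.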

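In addition to increasing operator coverage, the following features are currently in development:

Quantization Support
- We are currently working on support for 8-bit dynamic quantization, with plans to extend to other quantization schemes in the future.
Memory Layout Management
- Memory layout is an important factor in optimizing performance. We plan to add graph passes that insert memory layout transitions throughout a graph, in order to optimize operators that are sensitive to memory layout, such as Convolution and Matrix Multiplication.
Selective Build
- We plan to let users control build size by selecting which operators/shaders to build with.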

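End to End Example

To better understand the features of the Vulkan Delegate and how to use it, consider the following end-to-end example with a simple single-operator model.

Compile and lower a model to the Vulkan Delegate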

Assuming ExecuTorch has been set up and installed, the following script can be used to produce a simple model and lower it to the Vulkan delegate. The lowered program is saved as vk_add.pte.

# Note: this script is the same as the script from the "Setting up ExecuTorch"
# page, with one minor addition to lower to the Vulkan backend.
import torch
from torch.export import export
from executorch.exir import to_edge

from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner

# Start with a PyTorch model that adds two input tensors (matrices)
class Add(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        return x + y

# 1. torch.export: Defines the program with the ATen operator set.
aten_dialect = export(Add(), (torch.ones(1), torch.ones(1)))

# 2. to_edge: Make optimizations for Edge devices
edge_program = to_edge(aten_dialect)
# 2.1 Lower to the Vulkan backend
edge_program = edge_program.to_backend(VulkanPartitioner())

# 3. to_executorch: Convert the graph to an ExecuTorch program
executorch_program = edge_program.to_executorch()

# 4. Save the compiled .pte program
with open("vk_add.pte", "wb") as file:
    file.write(executorch_program.buffer)

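As with other ExecuTorch delegates, a model can be lowered to the Vulkan Delegate using the to_backend() API. The Vulkan Delegate implements the VulkanPartitioner class, which identifies the nodes (i.e. operators) in the graph that the Vulkan delegate supports and separates the compatible sections of the model so that they can be executed on the GPU.

This means that a model can be lowered to the Vulkan delegate even if it contains some unsupported operators. In that case, only parts of the graph will be executed on the GPU.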
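
To make partial lowering concrete, here is a toy sketch in plain Python. It is not the VulkanPartitioner implementation: the real partitioner works on an exported graph, whereas this sketch treats the model as a linear list of operator names. The `SUPPORTED_OPS` set and the `partition` function are hypothetical and exist only for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical supported-operator set, for illustration only. The real list
# lives in the Vulkan partitioner code and will differ.
SUPPORTED_OPS: Set[str] = {"add", "sub", "mul", "div"}


def partition(ops: List[str], supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Group a linear sequence of ops into contiguous runs by backend."""
    partitions: List[Tuple[str, List[str]]] = []
    for op in ops:
        backend = "vulkan" if op in supported else "portable"
        if partitions and partitions[-1][0] == backend:
            # Same backend as the previous op: extend the current run.
            partitions[-1][1].append(op)
        else:
            # Backend changed: start a new run.
            partitions.append((backend, [op]))
    return partitions


model_ops = ["add", "mul", "softmax", "sub"]
result = partition(model_ops, SUPPORTED_OPS)
for backend, run in result:
    print(backend, run)
```

Here the unsupported `softmax` splits the model into two delegated runs with one fallback run between them, which is the sense in which only parts of the graph execute on the GPU.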

Note

The supported operator list in the Vulkan partitioner code can be inspected to see which operators the Vulkan delegate currently implements.

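Build Vulkan Delegate libraries

The easiest way to build and test the Vulkan Delegate is to build for Android and test on a local Android device. Android devices have built-in support for Vulkan, and the Android NDK ships with a GLSL compiler, which is needed to compile the Vulkan Compute Library's GLSL compute shaders.

The Vulkan Delegate libraries can be built by setting -DEXECUTORCH_BUILD_VULKAN=ON when building with CMake.

First, make sure that you have the Android NDK installed; any NDK version newer than NDK r19c should work. Note that the examples in this document have been validated with NDK r27b. The Android SDK should also be installed so that you have access to adb.

The instructions on this page assume that the following environment variables are set.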

export ANDROID_NDK=<path_to_ndk>
# Select the appropriate Android ABI for your device
export ANDROID_ABI=arm64-v8a
# All subsequent commands should be performed from ExecuTorch repo root
cd <path_to_executorch_root>
# Make sure adb works
adb --version
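
Before starting a long build, it can help to confirm that these prerequisites are in place. The following is a minimal sketch using only the Python standard library. The helper name `check_android_env` is hypothetical, and the toolchain path it checks is the one passed to CMake in the build command on this page.

```python
import os
import shutil
from typing import List, Mapping


def check_android_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems: List[str] = []

    ndk = env.get("ANDROID_NDK", "")
    if not ndk:
        problems.append("ANDROID_NDK is not set")
    else:
        # Path of the CMake toolchain file inside the NDK.
        toolchain = os.path.join(ndk, "build", "cmake", "android.toolchain.cmake")
        if not os.path.isfile(toolchain):
            problems.append(f"toolchain file not found: {toolchain}")

    if not env.get("ANDROID_ABI"):
        problems.append("ANDROID_ABI is not set")

    # adb comes with the Android SDK and must be reachable via PATH.
    if shutil.which("adb", path=env.get("PATH", "")) is None:
        problems.append("adb not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_android_env(os.environ):
        print("warning:", problem)
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the check easy to exercise with made-up values.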

To build and install the ExecuTorch libraries (for Android) with the Vulkan Delegate:

# From executorch root directory
(rm -rf cmake-android-out && \
  cmake . -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=$ANDROID_ABI \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-android-out && \
  cmake --build cmake-android-out -j16 --target install)

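Run the Vulkan model on device

Note

Because operator support is currently limited, only binary arithmetic operators will run on the GPU. Expect inference to be slow, since the majority of operators are executed via Portable operators.

The partially delegated model can now be executed (partially) on your device's GPU!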

# Build a model runner binary linked with the Vulkan delegate libs
cmake --build cmake-android-out --target vulkan_executor_runner -j32

# Push model to device
adb push vk_add.pte /data/local/tmp/vk_add.pte
# Push binary to device
adb push cmake-android-out/backends/vulkan/vulkan_executor_runner /data/local/tmp/runner_bin

# Run the model
adb shell /data/local/tmp/runner_bin --model_path /data/local/tmp/vk_add.pte
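
When iterating on a model, the push-and-run steps above can be scripted. The sketch below builds the same three adb invocations as argument lists and runs them with subprocess. The helper names `device_run_commands` and `run_on_device` are hypothetical; the paths and the `--model_path` flag are taken from the commands above.

```python
import os
import subprocess
from typing import List

DEVICE_DIR = "/data/local/tmp"


def device_run_commands(model: str, runner: str) -> List[List[str]]:
    """Return the adb commands that push a model and a runner, then run it."""
    device_model = f"{DEVICE_DIR}/{os.path.basename(model)}"
    device_runner = f"{DEVICE_DIR}/runner_bin"
    return [
        ["adb", "push", model, device_model],
        ["adb", "push", runner, device_runner],
        ["adb", "shell", device_runner, "--model_path", device_model],
    ]


def run_on_device(model: str, runner: str) -> None:
    """Execute each adb step in order, stopping at the first failure."""
    for cmd in device_run_commands(model, runner):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Build (but do not execute) the commands for the example model.
commands = device_run_commands(
    "vk_add.pte",
    "cmake-android-out/backends/vulkan/vulkan_executor_runner",
)
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_on_device` with the same two arguments would execute the steps against a connected device; it is left uncalled here so that the sketch runs without one.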