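Codegen Migration Guide

As PyTorch/XLA migrates to LTC (Lazy Tensor Core), we need to clean up the existing stub code (spanning more than 6 files) that is used to perform op lowering. The complete process and file structure of the old op lowering are described in the op lowering guide. Replacing a supported op with codegen should NOT introduce any new behavior; it is purely a cleanup.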

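Before You Start

Follow the project's build instructions to install the required dependencies and build PyTorch and PyTorch/XLA from source. You do not need access to a TPU to implement a lowering. It is recommended to experiment on a workstation configured to use XLA:CPU. You can configure PyTorch/XLA to use XLA:CPU by running: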

export PJRT_DEVICE=CPU

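It is also recommended that you become familiar with our op lowering process before working on codegen.

PyTorch/XLA uses https://github.com/pytorch/xla/issues/3560 to track the status of the codegen migration. When you work on a codegen, please post your GitHub alias and the PR link on the issue to avoid duplicate work.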

File Structure

All files mentioned below live under the xla/torch_xla/csrc folder, with the exception of xla_native_functions.yaml.

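PyTorch Codegen Files

torch/csrc/lazy/core/shape_inference.h
- Defines a shape inference function for each op. These functions take torch::lazy::Shape objects as input and return the output torch::lazy::Shape. Only ops that are not structured need a manually written shape inference function.
torchgen/gen_lazy_tensor.py
- Builds on the existing data models and helpers used by all ATen backends, and adds functionality specific to lazy tensor backends. run_gen_lazy_tensor is defined in this file.
torchgen/dest/lazy_ir.py
- Contains the data class GenLazyIR, which a backend can override to define the generated IR class.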

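PyTorch/XLA Codegen Files

xla/codegen/xla_native_functions.yaml
- Contains all the ops that XLA supports today. Most ops are under the supported category; the goal of this document is to move most of them to the full_codegen category.
xla/scripts/gen_lazy_tensor.py
- Provides the necessary XLA versions of the codegen classes and calls the upstream codegen API.
xla/torch_xla/csrc/XLANativeFunctions.cpp
- Generated from the full_codegen section of xla/codegen/xla_native_functions.yaml. The op functions defined here implement the ops declared in XLANativeFunctions.h. Each op takes at::Tensor arguments and returns another at::Tensor that wraps an XLATensor.
xla/torch_xla/csrc/LazyIr.h
- Generated from the full_codegen section of xla/codegen/xla_native_functions.yaml. Defines the IR classes used to construct the full_codegen ops.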

PyTorch/XLA Old Op Lowering Files

xla/torch_xla/csrc/generated/aten_xla_type.cpp
- Manually implements the ops defined in xla/codegen/xla_native_functions.yaml. Will be replaced by XLANativeFunctions.cpp.
xla/torch_xla/csrc/generated/tensor.h
- Defines the XLATensor class and declares the XLATensor methods. These declarations usually map one-to-one to the at::Tensor ops declared in XLANativeFunctions.h. The XLATensor methods will be removed for full_codegen ops.
xla/torch_xla/csrc/generated/tensor_method.cpp
- Implements the tensor methods defined in tensor.h. These implementations will be removed for full_codegen ops.
xla/torch_xla/csrc/generated/ops/…
- Defines the IR classes for "most" ops. Multiple ops may share the same IR.

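Codegen Step by Step

1. Identify the Op

When you work on your first few codegens, we generally recommend starting with the simpler ops. This guide uses one unary op and one binary op as examples. Avoid ops with any of the following characteristics:

1. The op contains custom fallback code. For example, _adaptive_avg_pool3d has a conditional fallback: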

if (!IsSupportedAdaptivePool(XlaHelpers::I64List(self.sizes()),
                             output_size_list, /*pool_dim=*/3)) {
  return at::native::call_fallback_fn<&xla_fallback, ATEN_OP(_adaptive_avg_pool3d)>::call(self, output_size);
}

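2. The op results in a dynamic shape. These ops are still a work in progress and may evolve over time. At some future point, we may bring them into codegen.

3. The op does not invoke a tensor_method directly. For example: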

if (!self_tensor) {
  static bool sync_update =
      torch_xla::runtime::sys_util::GetEnvBool("XLA_TENSOR_UPDATE_SYNC", true);
  XLA_CHECK(dst_tensor);
  dst_tensor->UpdateFromTensor(self, /*sync=*/sync_update);
}

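4. The op has a complicated tensor_method. Ideally, the tensor_method is a direct mapping from the op to an IR node.

A good example of a "simple" op is something like abs: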

at::Tensor XLANativeFunctions::abs(const at::Tensor& self) {
  TORCH_LAZY_FN_COUNTER("xla::");
  return bridge::AtenFromXlaTensor(XLATensor::abs(bridge::GetXlaTensor(self)));
}

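2. Codegen the Op and Inspect the Generated Files

Find the op in xla/codegen/xla_native_functions.yaml, move it to the full_codegen section, and run python setup.py install again under the xla directory. The build will fail (the reason is explained later in this guide), but you can still inspect the generated files. The code snippets below use abs as the example.

XLANativeFunctions.cpp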

at::Tensor XLANativeFunctions::abs(const at::Tensor & self) {
  TORCH_LAZY_FN_COUNTER("xla::");
  auto common_device = torch_xla::bridge::GetXlaDevice(self);
  TORCH_INTERNAL_ASSERT(common_device);

  torch_xla::XLATensorPtr lazy_self = torch_xla::bridge::GetXlaTensorOrCreateForWrappedNumber(self, *common_device);

  torch::lazy::NodePtr node = torch::lazy::ReuseNode<Abs>(lazy_self->GetIrValue());
  if (!node) {
    node = torch_xla::MakeNode<Abs>(lazy_self->GetIrValue());
    CacheNode(node);
  }

  auto result = torch_xla::bridge::AtenFromXlaTensor(
        torch_xla::XLATensor::Create(std::move(node), *common_device));
  return result;
};

Walking through the generated code step by step:

- Get the device from the input tensor and verify it.

auto common_device = torch_xla::bridge::GetXlaDevice(self);
TORCH_INTERNAL_ASSERT(common_device);

- Check whether a previously created node can be reused. If not, create the corresponding IR node and cache it.

torch::lazy::NodePtr node = torch::lazy::ReuseNode<Abs>(lazy_self->GetIrValue());
if (!node) {
  node = torch_xla::MakeNode<Abs>(lazy_self->GetIrValue());
  CacheNode(node);
}

- Wrap the newly created IR node in an XLATensor, wrap that XLATensor in an at::Tensor, and return it as the result. Note that this part used to be done manually in tensor_method.cpp.

auto result = torch_xla::bridge::AtenFromXlaTensor(
      torch_xla::XLATensor::Create(std::move(node), *common_device));
return result;
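The reuse-or-create step above can be sketched in isolation. The following is a minimal, self-contained illustration of that control flow only. ToyNode, ToyNodeCache, and GetOrCreate are hypothetical names invented for this sketch, and the string-keyed hash map is an assumption made for simplicity; it does not describe how torch::lazy::ReuseNode or CacheNode are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an IR node: an op name plus one operand id.
struct ToyNode {
  std::string op;
  int operand_id;
};
using ToyNodePtr = std::shared_ptr<ToyNode>;

// Hypothetical cache keyed by (op, operand). Assumption for illustration
// only; this is not the torch::lazy node cache.
class ToyNodeCache {
 public:
  // Returns the cached node, or nullptr when there is none
  // (plays the role of ReuseNode in the generated code).
  ToyNodePtr Reuse(const std::string& op, int operand_id) const {
    auto it = cache_.find(Key(op, operand_id));
    return it == cache_.end() ? nullptr : it->second;
  }

  // Stores a freshly built node for later reuse
  // (plays the role of CacheNode in the generated code).
  void Cache(const ToyNodePtr& node) {
    assert(node != nullptr);
    cache_[Key(node->op, node->operand_id)] = node;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  static std::string Key(const std::string& op, int operand_id) {
    return op + "#" + std::to_string(operand_id);
  }
  std::unordered_map<std::string, ToyNodePtr> cache_;
};

// Same control flow as the generated code: try to reuse a node,
// otherwise build a new one and cache it.
ToyNodePtr GetOrCreate(ToyNodeCache& cache, const std::string& op,
                       int operand_id) {
  ToyNodePtr node = cache.Reuse(op, operand_id);
  if (!node) {
    node = std::make_shared<ToyNode>(ToyNode{op, operand_id});
    cache.Cache(node);
  }
  return node;
}
```

Calling GetOrCreate twice with the same op and operand returns the same pointer, while a different operand produces a new node.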

LazyIr.h

class Abs : public XlaNode {
 public:
  Abs(const torch_xla::XlaValue& self)
      : XlaNode(torch::lazy::OpKind(at::aten::abs), {self},
                [&]() { return AbsOutputShape(self); },
                /* num_outputs */ 1, torch::lazy::MHash())
  {}

  std::string ToString() const override {
    std::stringstream ss;
    ss << XlaNode::ToString();
    return ss.str();
  }
  torch_xla::XlaOpVector Lower(LoweringContext* loctx) const override;
};

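A few things to keep in mind:

- Codegen does not generate the Clone method, which is expected. The Clone method is not used in PyTorch/XLA even today, and we will remove it as part of the migration.
- For every op, the generated IR class calls a {OP}OutputShape function. We need to declare and implement this function manually in separate files.
- For every op, codegen generates a Lower declaration. We need to implement this lowering function manually in a separate file.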
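The split between generated declarations and hand-written definitions can be sketched with toy types. In the sketch below, ToyShape, ToyNode, ToyAbs, and ToyAbsOutputShape are hypothetical stand-ins for xla::Shape, XlaNode, Abs, and AbsOutputShape; the string returned by Lower is a placeholder and not a real lowering. It is meant to suggest why the build presumably fails until the two missing pieces are written: the generated class refers to a shape function and a Lower method that have no definition yet.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for xla::Shape: just a list of dimensions.
using ToyShape = std::vector<std::int64_t>;

// Hypothetical base class playing the role of XlaNode. The shape is
// computed by invoking the callback passed by the derived class.
class ToyNode {
 public:
  ToyNode(std::string op, const std::function<ToyShape()>& shape_fn)
      : op_(std::move(op)), shape_(shape_fn()) {}
  virtual ~ToyNode() = default;

  const std::string& op() const { return op_; }
  const ToyShape& shape() const { return shape_; }

  // Declared on the node, implemented per op elsewhere.
  virtual std::string Lower() const = 0;

 private:
  std::string op_;
  ToyShape shape_;
};

// ---- Roughly what a generator would emit: declarations only ----
ToyShape ToyAbsOutputShape(const ToyShape& input);  // no definition yet

class ToyAbs : public ToyNode {
 public:
  explicit ToyAbs(const ToyShape& input)
      : ToyNode("abs", [&]() { return ToyAbsOutputShape(input); }) {}
  std::string Lower() const override;  // no definition yet
};

// ---- What is written by hand, in separate files in the real project ----
ToyShape ToyAbsOutputShape(const ToyShape& input) { return input; }

std::string ToyAbs::Lower() const { return "BuildAbs(operand0)"; }
```

Removing the last two definitions leaves the declarations intact, so the code still compiles but fails at link time, which mirrors the situation right after moving an op to full_codegen.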

3. Implement the Missing IR Functions

torch_xla/csrc/ops/ops_xla_shape_fn.h

Declare {OP}OutputShape:

xla::Shape AbsOutputShape(const XlaValue& input);

torch_xla/csrc/ops/ops_xla_shape_fn.cpp

Implement {OP}OutputShape:

xla::Shape AbsOutputShape(const XlaValue& input) { return input.xla_shape(); }

Abs is an overly simplified example. In the normal case, you need to call the BuildXXXOp function again to get the output shape. A slightly better example is:

xla::Shape MaximumOutputShape(const XlaValue& input, const XlaValue& other) {
  auto lower_for_shape_fn =
      [&](absl::Span<const xla::XlaOp> operands) -> xla::XlaOp {
    auto promoted = XlaHelpers::Promote(operands[0], operands[1]);
    return xla::Max(promoted.first, promoted.second);
  };
  return InferOutputShape({input.xla_shape(), other.xla_shape()},
                          lower_for_shape_fn);
}

Note that you should not start from scratch. Find the xla::Shape computation logic in the existing op and move it to these two files.
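To build intuition for what a shape function such as MaximumOutputShape has to return for a binary op, here is a toy computation of the output dimensions. It assumes NumPy-style broadcasting of the two operands and ignores element type promotion. Dims, BroadcastDims, and ToyMaximumOutputShape are hypothetical names for this sketch; the real function delegates to InferOutputShape as shown above rather than computing dimensions by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the dimensions of a shape.
using Dims = std::vector<std::int64_t>;

// NumPy-style broadcasting (an assumption of this sketch): align the
// trailing dimensions; a dimension of size 1 stretches to match the other.
Dims BroadcastDims(const Dims& a, const Dims& b) {
  const std::size_t rank = std::max(a.size(), b.size());
  Dims out(rank, 1);
  for (std::size_t i = 0; i < rank; ++i) {
    // Read dimension i counted from the right; missing dims behave like 1.
    const std::int64_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
    const std::int64_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
    if (da != db && da != 1 && db != 1) {
      throw std::invalid_argument("shapes are not broadcastable");
    }
    out[rank - 1 - i] = (da == 1) ? db : da;
  }
  return out;
}

// Toy counterpart of a binary-op shape function: the output of an
// elementwise maximum has the broadcast dimensions of its inputs.
Dims ToyMaximumOutputShape(const Dims& input, const Dims& other) {
  return BroadcastDims(input, other);
}
```

For instance, inputs with dimensions {4, 1} and {1, 5} give {4, 5}, and {2, 3} with {4} is rejected as not broadcastable.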

4. Implement the Lowering Function

torch_xla/csrc/ops/ops_lower_fn.cpp

torch_xla::XlaOpVector Abs::Lower(LoweringContext* loctx) const {
  xla::XlaOp xla_input = loctx->GetOutputOp(operand(0));
  return ReturnOp(BuildAbs(xla_input), loctx);
}

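Note that this function should be moved directly from the existing lowering. Some ops originally implemented in torch_xla/csrc/ops/ops.cpp use GenericOp. For those, you need to slightly modify the lowering implementation to fit the form shown above.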

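5. Cleanup

Delete the existing op from aten_xla_type.cpp, tensor_methods.h, tensor_methods.cpp, and ops/…. Note that sometimes you have to keep the tensor_method because it is used elsewhere, such as in tensor_ops. So before removing the op, cross-reference it with tensor_ops.cpp. For example: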

XLATensor s1 = XLATensor::sub(XLATensor::mul(u2, v3), XLATensor::mul(u3, v2), one);

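Sometimes other IR nodes use the IR node you migrated. In that case, you need to update the lowering logic of those IR nodes as well. In the long term, we need to get rid of these composite IRs on our end and provide a lowering function for each op. For example, change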

torch::lazy::NodePtr exp = Pow(Abs(input), norm_exp);

to

torch::lazy::NodePtr exp =
    Pow(torch_xla::MakeNode<Abs>(input, std::vector<torch::lazy::Shape>()),
        norm_exp);

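Run the Tests and Verify the Results

Run the C++ op test or a simple test that only involves the generated ops. To run the C++ test:

1. Build xla through python setup.py install (note: do not use the BUILD_CPP_TESTS=0 flag, since it skips building the C++ tests).
2. Go into the test/cpp/build directory in your pytorch/xla checkout.
3. Run the command for the desired C++ test (for example, to run the Abs C++ test):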

./test_ptxla --gtest_filter=AtenXlaTensorTest.TestAbs

As usual, the two things to verify are correctness and that the XLA counter is incremented correctly.
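The counter check can be pictured with a toy registry. The sketch below assumes that a codegen'd op bumps a counter named with the "xla::" prefix (as TORCH_LAZY_FN_COUNTER("xla::") suggests) and that an op falling back to the CPU implementation would show up under an "aten::" prefix. ToyCounters and LoweredWithoutFallback are hypothetical names for illustration and are not part of the PyTorch/XLA test utilities.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical counter registry; the real counters are maintained by the
// lazy tensor runtime, not by a class like this.
class ToyCounters {
 public:
  void Increment(const std::string& name) { ++values_[name]; }

  // Returns 0 for counters that were never incremented.
  long Value(const std::string& name) const {
    auto it = values_.find(name);
    return it == values_.end() ? 0 : it->second;
  }

  // True when any counter whose name starts with `prefix` has fired.
  bool AnyWithPrefix(const std::string& prefix) const {
    for (const auto& entry : values_) {
      if (entry.second > 0 &&
          entry.first.compare(0, prefix.size(), prefix) == 0) {
        return true;
      }
    }
    return false;
  }

 private:
  std::map<std::string, long> values_;
};

// The check a test would make after running the op: the op's own counter
// moved, and no fallback counter (assumed "aten::" prefix) moved.
bool LoweredWithoutFallback(const ToyCounters& counters,
                            const std::string& op) {
  return counters.Value("xla::" + op) > 0 &&
         !counters.AnyWithPrefix("aten::");
}
```

A run that increments only "xla::abs" passes the check, while a run that also increments an "aten::" counter fails it.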

Sample PRs

- Unary/binary op -> Codegen erf, erfc, erfinv, and exp (https://github.com/pytorch/xla/pull/3659)
- Op with optional -> Codegen binary_cross_entropy/backward (https://github.com/pytorch/xla/pull/3809)
- Op with at::Scalar -> Codegen addcdiv and addcmul (https://github.com/pytorch/xla/pull/3768)
- Op with a vector that supports negative indices -> Codegen amin amax (https://github.com/pytorch/xla/pull/3771)
- Op with special fallback logic -> partially codegen adaptive_avgpool3d and backward (https://github.com/pytorch/xla/pull/3790)

To see more examples, check the tracking issue (https://github.com/pytorch/xla/issues/3560).