(Prototype) MaskedTensor Overview

Created On: Oct 28, 2022 | Last Updated: Oct 28, 2022 | Last Verified: Not Verified

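This tutorial is designed to serve as a starting point for using MaskedTensors and to discuss their masking semantics.

MaskedTensor serves as an extension to torch.Tensor that provides the user with the ability to:

use any masked semantics (for example, variable-length tensors, nan* operators, etc.)
differentiate between 0 and NaN gradients
support various sparse applications (see the sparsity tutorial mentioned under Further Reading)

For a more detailed introduction to MaskedTensors, please see the torch.masked documentation.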

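Using MaskedTensor

In this section we discuss how to use MaskedTensor, including how to construct one, how to access its data and mask, and how to index and slice it.

Preparation

We'll begin by doing the necessary setup for the tutorial: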

import torch
from torch.masked import masked_tensor, as_masked_tensor
import warnings

# Disable prototype warnings and such
warnings.filterwarnings(action='ignore', category=UserWarning)

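Construction

There are a few different ways to construct a MaskedTensor:

The first way is to directly invoke the MaskedTensor class
The second (and our recommended way) is to use the masked.masked_tensor() and masked.as_masked_tensor() factory functions, which are analogous to torch.tensor() and torch.as_tensor()

Throughout this tutorial, we will be assuming the import line: from torch.masked import masked_tensor.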
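
As a minimal sketch of the two factory functions, the snippet below builds a MaskedTensor from the same data and mask both ways. The sample values and the variable names (data, mask, mt, mt_as) are illustrative assumptions and do not come from the original tutorial.

```python
import warnings

import torch
from torch.masked import masked_tensor, as_masked_tensor

# MaskedTensor is a prototype feature and emits UserWarnings on construction.
warnings.filterwarnings(action="ignore", category=UserWarning)

# Illustrative data; True in the mask marks an entry as specified/valid.
data = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mask = torch.tensor([[True, False, True], [False, True, True]])

mt = masked_tensor(data, mask)        # analogous to torch.tensor()
mt_as = as_masked_tensor(data, mask)  # analogous to torch.as_tensor()

print(type(mt).__name__, tuple(mt.shape))
print(mt)
```

The data and mask must have the same shape, and the mask is expected to be a boolean tensor.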

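Accessing the data and mask

The underlying fields in a MaskedTensor can be accessed through:

the MaskedTensor.get_data() function
the MaskedTensor.get_mask() function. In the mask, True indicates "specified" or "valid" while False indicates "unspecified" or "invalid".

In general, the underlying data that is returned may not be valid in the unspecified entries, so when users require a Tensor without any masked entries, we recommend that they use MaskedTensor.to_tensor() to return a Tensor with filled values.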
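
The accessors above can be sketched as follows. The input values and the fill value of 0.0 are arbitrary choices made for illustration; the names raw, valid and filled are likewise assumptions.

```python
import warnings

import torch
from torch.masked import masked_tensor

warnings.filterwarnings(action="ignore", category=UserWarning)

# Illustrative input: entries 1 and 3 are marked as unspecified.
data = torch.tensor([1.0, 2.0, 3.0, 4.0])
mask = torch.tensor([True, False, True, False])
mt = masked_tensor(data, mask)

raw = mt.get_data()         # underlying values; masked entries may be meaningless
valid = mt.get_mask()       # boolean mask, True means specified/valid
filled = mt.to_tensor(0.0)  # regular Tensor with masked entries replaced by 0.0

print("mask:", valid)
print("filled:", filled)
```

Prefer to_tensor() with an explicit fill value over get_data() whenever the result is passed to code that does not know about the mask.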

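Indexing and slicing

MaskedTensor is a Tensor subclass, which means that it inherits the same semantics for indexing and slicing as torch.Tensor. Below are some examples of common indexing and slicing patterns: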

data = torch.arange(24).reshape(2, 3, 4)
mask = data % 2 == 0

print("data:\n", data)
print("mask:\n", mask)

data:
 tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
mask:
 tensor([[[ True, False,  True, False],
         [ True, False,  True, False],
         [ True, False,  True, False]],

        [[ True, False,  True, False],
         [ True, False,  True, False],
         [ True, False,  True, False]]])

# float is used for cleaner visualization when being printed
mt = masked_tensor(data.float(), mask)

print("mt[0]:\n", mt[0])
print("mt[:, :, 2:4]:\n", mt[:, :, 2:4])

mt[0]:
 MaskedTensor(
  [
    [  0.0000,       --,   2.0000,       --],
    [  4.0000,       --,   6.0000,       --],
    [  8.0000,       --,  10.0000,       --]
  ]
)
mt[:, :, 2:4]:
 MaskedTensor(
  [
    [
      [  2.0000,       --],
      [  6.0000,       --],
      [ 10.0000,       --]
    ],
    [
      [ 14.0000,       --],
      [ 18.0000,       --],
      [ 22.0000,       --]
    ]
  ]
)

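Why is MaskedTensor useful?

Because MaskedTensor treats specified and unspecified values as first-class citizens rather than as an afterthought (with filled values, nans, etc.), it can address several shortcomings that regular Tensors cannot; indeed, MaskedTensor was born in large part due to these recurring issues.

Below, we discuss some of the most common issues that are still unresolved in PyTorch today and illustrate how MaskedTensor can solve them.

Distinguishing between 0 and NaN gradient

One issue that torch.Tensor runs into is the inability to distinguish between gradients that are undefined (NaN) and gradients that are actually 0. Because PyTorch does not have a way of marking a value as specified/valid vs. unspecified/invalid, it is forced to rely on NaN or 0 (depending on the use case), leading to unreliable semantics since many operations aren't meant to handle NaN values properly. What is even more confusing is that the gradient can vary depending on the order of operations (for example, depending on how early in the chain of operations a NaN value manifests).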

MaskedTensor addresses this by tracking explicitly, in the mask, which values are specified.

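torch.where

In Issue 10729, we notice a case where the order of operations can matter when using torch.where() because we have trouble differentiating between a real 0 and a 0 that comes from an undefined gradient. Therefore, we remain consistent and mask out the results. In the plain-tensor result below, the last two gradient entries are nan because exp(90) and exp(100) overflow to inf in float32, and the backward pass of exp multiplies that inf by the zero gradient that where assigns to the unselected branch.

Current result: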

x = torch.tensor([-10., -5, 0, 5, 10, 50, 60, 70, 80, 90, 100], requires_grad=True, dtype=torch.float)
y = torch.where(x < 0, torch.exp(x), torch.ones_like(x))
y.sum().backward()
x.grad

tensor([4.5400e-05, 6.7379e-03, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00,        nan,        nan])

MaskedTensor result:

x = torch.tensor([-10., -5, 0, 5, 10, 50, 60, 70, 80, 90, 100])
mask = x < 0
mx = masked_tensor(x, mask, requires_grad=True)
my = masked_tensor(torch.ones_like(x), ~mask, requires_grad=True)
y = torch.where(mask, torch.exp(mx), my)
y.sum().backward()
mx.grad

MaskedTensor(
  [  0.0000,   0.0067,       --,       --,       --,       --,       --,       --,       --,       --,       --]
)

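Notice that gradients are only provided to the selected subset. Effectively, this changes the gradient of where to mask out elements instead of setting them to zero.

Another torch.where

Issue 52248 is another example.

Current result: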

a = torch.randn((), requires_grad=True)
b = torch.tensor(False)
c = torch.ones(())
print("torch.where(b, a/0, c):\n", torch.where(b, a/0, c))
print("torch.autograd.grad(torch.where(b, a/0, c), a):\n", torch.autograd.grad(torch.where(b, a/0, c), a))

torch.where(b, a/0, c):
 tensor(1., grad_fn=<WhereBackward0>)
torch.autograd.grad(torch.where(b, a/0, c), a):
 (tensor(nan),)

MaskedTensor result:

a = masked_tensor(torch.randn(()), torch.tensor(True), requires_grad=True)
b = torch.tensor(False)
c = torch.ones(())
print("torch.where(b, a/0, c):\n", torch.where(b, a/0, c))
print("torch.autograd.grad(torch.where(b, a/0, c), a):\n", torch.autograd.grad(torch.where(b, a/0, c), a))

torch.where(b, a/0, c):
 MaskedTensor(  1.0000, True)
torch.autograd.grad(torch.where(b, a/0, c), a):
 (MaskedTensor(--, False),)

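This issue is similar (and even links to the next issue below) in that it expresses frustration with unexpected behavior caused by the inability to differentiate "no gradient" from "zero gradient", which in turn makes other ops difficult to reason about.

When using mask, x/0 yields NaN grad

In Issue 4132, the user proposes that x.grad should be [0, 1] instead of [nan, 1], whereas MaskedTensor makes this very clear by masking out the gradient altogether.

Current result: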

x = torch.tensor([1., 1.], requires_grad=True)
div = torch.tensor([0., 1.])
y = x/div # => y is [inf, 1]
mask = (div != 0)  # => mask is [0, 1]
y[mask].backward()
x.grad

tensor([nan, 1.])

MaskedTensor result:

x = torch.tensor([1., 1.], requires_grad=True)
div = torch.tensor([0., 1.])
y = x/div # => y is [inf, 1]
mask = (div != 0) # => mask is [0, 1]
loss = as_masked_tensor(y, mask)
loss.sum().backward()
x.grad

MaskedTensor(
  [      --,   1.0000]
)

torch.nansum() and torch.nanmean()

In Issue 67180, the gradient isn't calculated properly (a longstanding issue), whereas MaskedTensor handles it correctly.

Current result:

a = torch.tensor([1., 2., float('nan')])
b = torch.tensor(1.0, requires_grad=True)
c = a * b
c1 = torch.nansum(c)
bgrad1, = torch.autograd.grad(c1, b, retain_graph=True)
bgrad1

tensor(nan)

MaskedTensor result:

a = torch.tensor([1., 2., float('nan')])
b = torch.tensor(1.0, requires_grad=True)
mt = masked_tensor(a, ~torch.isnan(a))
c = mt * b
c1 = torch.sum(c)
bgrad1, = torch.autograd.grad(c1, b, retain_graph=True)
bgrad1

MaskedTensor(  3.0000, True)

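Safe Softmax

Safe softmax is another good example of an issue that arises frequently. In a nutshell, if an entire batch is "masked out" or consists entirely of padding (which, in the softmax case, translates to being set to -inf), then this will result in NaNs, which can lead to training divergence.

Luckily, MaskedTensor has solved this issue. Consider this setup: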

data = torch.randn(3, 3)
mask = torch.tensor([[True, False, False], [True, False, True], [False, False, False]])
x = data.masked_fill(~mask, float('-inf'))
mt = masked_tensor(data, mask)
print("x:\n", x)
print("mt:\n", mt)

x:
 tensor([[ 0.2345,    -inf,    -inf],
        [-0.1863,    -inf, -0.6380],
        [   -inf,    -inf,    -inf]])
mt:
 MaskedTensor(
  [
    [  0.2345,       --,       --],
    [ -0.1863,       --,  -0.6380],
    [      --,       --,       --]
  ]
)

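Suppose we want to calculate the softmax along dim=0. Note that the second column is "unsafe" (i.e. entirely masked out), so when the softmax is calculated, the result will yield 0/0 = nan since exp(-inf) = 0. However, what we would really like is for the gradients to be masked out, since they are unspecified and would be invalid for training.

PyTorch result: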

x.softmax(0)

tensor([[0.6037,    nan, 0.0000],
        [0.3963,    nan, 1.0000],
        [0.0000,    nan, 0.0000]])

MaskedTensor result:

mt.softmax(0)

MaskedTensor(
  [
    [  0.6037,       --,       --],
    [  0.3963,       --,   1.0000],
    [      --,       --,       --]
  ]
)
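
To make the contrast checkable in code, the following sketch repeats the setup and tests both results for NaN. The fixed seed, the fill value of 0.0 and the variable names plain, masked and filled are assumptions made for illustration.

```python
import warnings

import torch
from torch.masked import masked_tensor

warnings.filterwarnings(action="ignore", category=UserWarning)

torch.manual_seed(0)  # illustrative seed, any values work
data = torch.randn(3, 3)
mask = torch.tensor([[True, False, False],
                     [True, False, True],
                     [False, False, False]])

# Plain softmax over -inf padding: the fully masked column becomes NaN.
plain = data.masked_fill(~mask, float("-inf")).softmax(0)

# Masked softmax: unspecified entries stay masked instead of turning into NaN.
masked = masked_tensor(data, mask).softmax(0)
filled = masked.to_tensor(0.0)  # regular Tensor, masked entries set to 0.0

print("NaN in plain column 1:", torch.isnan(plain[:, 1]).all().item())
print("NaN anywhere in filled:", torch.isnan(filled).any().item())
```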

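Implementing missing torch.nan* operators

In Issue 61474, there is a request to add additional operators to cover the various torch.nan* applications, such as torch.nanmax, torch.nanmin, etc.

In general, these problems lend themselves more naturally to masked semantics, so instead of introducing additional operators, we propose using MaskedTensors. Since nanmean has already landed, we can use it as a comparison point: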

x = torch.arange(16).float()
y = x * x.fmod(4)
z = y.masked_fill(y == 0, float('nan'))  # we want to get the mean of y when ignoring the zeros

print("y:\n", y)
# z is just y with the zeros replaced with nan's
print("z:\n", z)

y:
 tensor([ 0.,  1.,  4.,  9.,  0.,  5., 12., 21.,  0.,  9., 20., 33.,  0., 13.,
        28., 45.])
z:
 tensor([nan,  1.,  4.,  9., nan,  5., 12., 21., nan,  9., 20., 33., nan, 13.,
        28., 45.])

print("y.mean():\n", y.mean())
print("z.nanmean():\n", z.nanmean())
# MaskedTensor successfully ignores the 0's
print("torch.mean(masked_tensor(y, y != 0)):\n", torch.mean(masked_tensor(y, y != 0)))

y.mean():
 tensor(12.5000)
z.nanmean():
 tensor(16.6667)
torch.mean(masked_tensor(y, y != 0)):
 MaskedTensor( 16.6667, True)

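In the above example, we've constructed a y and would like to calculate the mean of the series while ignoring the zeros. torch.nanmean can be used to do this, but we don't have implementations for the rest of the torch.nan* operations. MaskedTensor solves this issue by being able to use the base operation, and we already have support for the other operations listed in the issue. For example: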

torch.argmin(masked_tensor(y, y != 0))

MaskedTensor(  1.0000, True)

Indeed, when the 0's are ignored, the minimum value is the 1 at index 1.
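
The same pattern should extend to the nanmax and nanmin requests from the issue. The sketch below assumes that torch.amax and torch.amin are among the reductions supported on a MaskedTensor (the torch.masked documentation lists amin and amax as supported reductions); the names masked_min and masked_max are illustrative.

```python
import warnings

import torch
from torch.masked import masked_tensor

warnings.filterwarnings(action="ignore", category=UserWarning)

x = torch.arange(16).float()
y = x * x.fmod(4)
mt = masked_tensor(y, y != 0)  # treat the zeros as unspecified

# Masked counterparts of the requested nanmin / nanmax operators.
masked_min = torch.amin(mt)
masked_max = torch.amax(mt)

print("min ignoring zeros:", masked_min)
print("max ignoring zeros:", masked_max)
```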

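MaskedTensor can also support reductions when the data is fully masked out, which is equivalent to the case where the data Tensor is entirely nan. nanmean would return nan (an ambiguous return value), while MaskedTensor would more accurately indicate a masked out result.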

x = torch.empty(16).fill_(float('nan'))
print("x:\n", x)
print("torch.nanmean(x):\n", torch.nanmean(x))
print("torch.nanmean via maskedtensor:\n", torch.mean(masked_tensor(x, ~torch.isnan(x))))

x:
 tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
torch.nanmean(x):
 tensor(nan)
torch.nanmean via maskedtensor:
 MaskedTensor(--, False)

This is a similar problem to safe softmax, where 0/0 = nan when what we really want is an undefined value.
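
Because the result carries its own mask, downstream code can test whether a reduction is defined instead of checking for nan. The helper name masked_mean_is_defined below is hypothetical and introduced only for this sketch.

```python
import warnings

import torch
from torch.masked import masked_tensor

warnings.filterwarnings(action="ignore", category=UserWarning)


def masked_mean_is_defined(values: torch.Tensor) -> bool:
    """Hypothetical helper: True if the NaN-ignoring mean has any valid input."""
    result = torch.mean(masked_tensor(values, ~torch.isnan(values)))
    # The mask of the reduced result is True only if some entry was specified.
    return bool(result.get_mask())


all_nan = torch.full((16,), float("nan"))
some_valid = torch.tensor([1.0, float("nan"), 3.0])

print(masked_mean_is_defined(all_nan))
print(masked_mean_is_defined(some_valid))
```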

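Conclusion

In this tutorial, we've introduced what MaskedTensors are, demonstrated how to use them, and motivated their value through a series of examples and issues that they've helped resolve.

Further Reading

To continue learning more, see our MaskedTensor sparsity tutorial, which shows how MaskedTensor enables sparsity and describes the different storage formats we currently support.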
