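Forward-mode Automatic Differentiation (Beta)
Created on: Dec 07, 2021 | Last updated: Apr 18, 2023 | Last verified: Nov 05, 2024
This tutorial demonstrates how to use forward-mode AD to compute directional derivatives (or, equivalently, Jacobian-vector products).
The tutorial below uses some APIs that are only available in versions >= 1.11 (or nightly builds).
Also note that forward-mode AD is currently in beta. The API is subject to change, and operator coverage is still incomplete.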
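Basic Usage
Unlike reverse-mode AD, forward-mode AD computes gradients eagerly, alongside the forward pass. We can use forward-mode AD to compute a directional derivative by performing the forward pass as before, except that we first associate the input with another tensor representing the direction of the directional derivative (or, equivalently, the v in a Jacobian-vector product). When an input, which we call the "primal", is associated with a "direction" tensor, which we call the "tangent", the resulting new tensor object is called a "dual tensor" because of its connection to dual numbers [0].
As the forward pass is performed, if any input tensors are dual tensors, extra computation is performed to propagate this "sensitivity" of the function.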
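To make the connection to dual numbers concrete, here is a minimal pure-Python sketch of scalar dual-number arithmetic. The `Dual` class and the function `f` are hypothetical helpers written for this illustration only; they are not part of PyTorch and only cover addition and multiplication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number ``primal + tangent * eps`` with ``eps ** 2 == 0`` (illustrative helper)."""
    primal: float
    tangent: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + b*eps)(c + d*eps) = a*c + (a*d + b*c)*eps
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__


def f(x, y):
    return x * x + y * y


# Differentiate f at (3, 4) along the direction (1, 0):
# x carries tangent 1, y carries tangent 0.
out = f(Dual(3.0, 1.0), Dual(4.0, 0.0))
print(out.primal)   # 25.0
print(out.tangent)  # 6.0
```

The tangent component is carried through every operation together with the value, which is the same bookkeeping a dual tensor performs elementwise during the forward pass.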
import torch
import torch.autograd.forward_ad as fwAD
primal = torch.randn(10, 10)
tangent = torch.randn(10, 10)
def fn(x, y):
    return x ** 2 + y ** 2
# All forward AD computation must be performed in the context of
# a ``dual_level`` context. All dual tensors created in such a context
# will have their tangents destroyed upon exit. This is to ensure that
# if the output or intermediate results of this computation are reused
# in a future forward AD computation, their tangents (which are associated
# with this computation) won't be confused with tangents from the later
# computation.
with fwAD.dual_level():
    # To create a dual tensor we associate a tensor, which we call the
    # primal with another tensor of the same size, which we call the tangent.
    # If the layout of the tangent is different from that of the primal,
    # the values of the tangent are copied into a new tensor with the same
    # metadata as the primal. Otherwise, the tangent itself is used as-is.
    #
    # It is also important to note that the dual tensor created by
    # ``make_dual`` is a view of the primal.
    dual_input = fwAD.make_dual(primal, tangent)
    assert fwAD.unpack_dual(dual_input).tangent is tangent
    # To demonstrate the case where the copy of the tangent happens,
    # we pass in a tangent with a layout different from that of the primal
    dual_input_alt = fwAD.make_dual(primal, tangent.T)
    assert fwAD.unpack_dual(dual_input_alt).tangent is not tangent
    # Tensors that do not have an associated tangent are automatically
    # considered to have a zero-filled tangent of the same shape.
    plain_tensor = torch.randn(10, 10)
    dual_output = fn(dual_input, plain_tensor)
    # Unpacking the dual returns a ``namedtuple`` with ``primal`` and ``tangent``
    # as attributes
    jvp = fwAD.unpack_dual(dual_output).tangent
# Outside the ``dual_level`` context the tangent has been destroyed.
assert fwAD.unpack_dual(dual_output).tangent is None
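As a sanity check on the example above, the JVP of `x ** 2 + y ** 2` along a direction `v` for `x` (with `y` left as a plain tensor) should equal `2 * x * v`. The following sketch compares the forward-mode result against that closed form; the variable names and the smaller shapes are arbitrary choices for this illustration.

```python
import torch
import torch.autograd.forward_ad as fwAD

x = torch.randn(4, 3)
v = torch.randn(4, 3)   # direction associated with x
y = torch.randn(4, 3)   # plain tensor, treated as having a zero tangent

with fwAD.dual_level():
    out = fwAD.make_dual(x, v) ** 2 + y ** 2
    # The returned namedtuple can also be unpacked like a regular tuple.
    primal_out, jvp_x = fwAD.unpack_dual(out)

# d/dx (x**2 + y**2) applied to v is 2 * x * v; y contributes nothing.
matches = torch.allclose(jvp_x, 2 * x * v)
# After leaving the context, the output no longer carries a tangent.
tangent_after_exit = fwAD.unpack_dual(out).tangent
print(matches, tangent_after_exit)
```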
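Usage with Modules
To use nn.Module with forward AD, replace the parameters of your model with dual tensors before performing the forward pass. At the time of writing, it is not possible to create dual tensor `nn.Parameter`s. As a workaround, one must register the dual tensor as a non-parameter attribute of the module.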
import torch.nn as nn
model = nn.Linear(5, 5)
input = torch.randn(16, 5)
params = {name: p for name, p in model.named_parameters()}
tangents = {name: torch.rand_like(p) for name, p in params.items()}
with fwAD.dual_level():
    for name, p in params.items():
        delattr(model, name)
        setattr(model, name, fwAD.make_dual(p, tangents[name]))
    out = model(input)
    jvp = fwAD.unpack_dual(out).tangent
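One side effect of this workaround is worth seeing explicitly: once the parameters have been swapped for plain tensor attributes, the module should no longer report any registered parameters. The sketch below counts the registered parameters before, during, and after the swap, and then re-registers the originals. The names `layer`, `saved`, and the `n_*` counters are made up for this illustration.

```python
import torch
import torch.nn as nn
import torch.autograd.forward_ad as fwAD

layer = nn.Linear(5, 5)
saved = dict(layer.named_parameters())
n_before = len(list(layer.parameters()))

with fwAD.dual_level():
    for name, p in saved.items():
        # Drop the registered parameter and attach a dual tensor in its place.
        delattr(layer, name)
        setattr(layer, name, fwAD.make_dual(p, torch.ones_like(p)))
    n_during = len(list(layer.parameters()))

# Put the original ``nn.Parameter`` objects back so the module is usable as before.
for name, p in saved.items():
    delattr(layer, name)
    setattr(layer, name, p)
n_after = len(list(layer.parameters()))

print(n_before, n_during, n_after)
```

If the module is needed again afterwards (for example by an optimizer), restoring the parameters as in the last loop is one way to do it; creating a fresh module is another.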
Using the functional Module API (beta)
Another way to use nn.Module with forward AD is to use the functional Module API (also known as the stateless Module API).
from torch.func import functional_call
# We need a fresh module because the functional call requires
# the model to have parameters registered.
model = nn.Linear(5, 5)
dual_params = {}
with fwAD.dual_level():
    for name, p in params.items():
        # Using the same ``tangents`` from the above section
        dual_params[name] = fwAD.make_dual(p, tangents[name])
    out = functional_call(model, dual_params, input)
    jvp2 = fwAD.unpack_dual(out).tangent
# Check our results
assert torch.allclose(jvp, jvp2)
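For a linear layer the result can also be checked by hand: with `out = x @ W.T + b`, the directional derivative along `(dW, db)` is `x @ dW.T + db`. The following self-contained sketch, which assumes a PyTorch version that ships `torch.func`, compares the forward-mode JVP obtained through `functional_call` with that formula. The names `layer`, `jvp_fwd`, and `jvp_ref` are illustrative.

```python
import torch
import torch.nn as nn
import torch.autograd.forward_ad as fwAD
from torch.func import functional_call

torch.manual_seed(0)
layer = nn.Linear(5, 3)
x = torch.randn(16, 5)
params = dict(layer.named_parameters())
tangents = {name: torch.randn_like(p) for name, p in params.items()}

with fwAD.dual_level():
    dual_params = {name: fwAD.make_dual(p, tangents[name])
                   for name, p in params.items()}
    out = functional_call(layer, dual_params, x)
    jvp_fwd = fwAD.unpack_dual(out).tangent

# Closed-form JVP of ``x @ W.T + b`` with respect to (W, b).
jvp_ref = x @ tangents["weight"].T + tangents["bias"]
agrees = torch.allclose(jvp_fwd, jvp_ref, atol=1e-5)
# The module itself is left untouched by ``functional_call``.
n_registered = len(list(layer.parameters()))
print(agrees, n_registered)
```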
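Custom autograd Function
Custom Functions also support forward-mode AD. To create a custom Function that supports forward-mode AD, register the jvp() static method. Custom Functions may support both forward and backward AD, but this is not mandatory. See the documentation for more information.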
class Fn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, foo):
        result = torch.exp(foo)
        # Tensors stored in ``ctx`` can be used in the subsequent forward grad
        # computation.
        ctx.result = result
        return result
    @staticmethod
    def jvp(ctx, gI):
        gO = gI * ctx.result
        # If the tensor stored in ``ctx`` will not also be used in the backward pass,
        # one can manually free it using ``del``
        del ctx.result
        return gO
fn = Fn.apply
primal = torch.randn(10, 10, dtype=torch.double, requires_grad=True)
tangent = torch.randn(10, 10)
with fwAD.dual_level():
    dual_input = fwAD.make_dual(primal, tangent)
    dual_output = fn(dual_input)
    jvp = fwAD.unpack_dual(dual_output).tangent
# It is important to use ``autograd.gradcheck`` to verify that your
# custom autograd Function computes the gradients correctly. By default,
# ``gradcheck`` only checks the backward-mode (reverse-mode) AD gradients. Specify
# ``check_forward_ad=True`` to also check forward grads. If you did not
# implement the backward formula for your function, you can also tell ``gradcheck``
# to skip the tests that require backward-mode AD by specifying
# ``check_backward_ad=False``, ``check_undefined_grad=False``, and
# ``check_batched_grad=False``.
torch.autograd.gradcheck(Fn.apply, (primal,), check_forward_ad=True,
                         check_backward_ad=False, check_undefined_grad=False,
                         check_batched_grad=False)
True
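Besides `gradcheck`, a quick way to gain confidence in a hand-written `jvp()` is to compare it with the built-in operator it mirrors. The sketch below redefines a small exponential Function under the hypothetical name `Exp` and checks its forward-mode result against `torch.exp` and against the closed form `v * exp(x)`. Both the primal and the tangent are created as double tensors here, which is a choice made for this illustration.

```python
import torch
import torch.autograd.forward_ad as fwAD


class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        result = torch.exp(x)
        # Keep the output around for the forward-mode formula.
        ctx.result = result
        return result

    @staticmethod
    def jvp(ctx, x_tangent):
        # d/dx exp(x) = exp(x), so the output tangent is x_tangent * exp(x).
        return x_tangent * ctx.result


x = torch.randn(4, 4, dtype=torch.double, requires_grad=True)
v = torch.randn(4, 4, dtype=torch.double)

with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, v)
    custom_jvp = fwAD.unpack_dual(Exp.apply(dual_x)).tangent
    builtin_jvp = fwAD.unpack_dual(torch.exp(dual_x)).tangent

agrees_with_builtin = torch.allclose(custom_jvp, builtin_jvp)
agrees_with_formula = torch.allclose(custom_jvp, v * torch.exp(x.detach()))
print(agrees_with_builtin, agrees_with_formula)
```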
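Functional API (beta)
We also provide a higher-level functional API in functorch for computing Jacobian-vector products, which you may find simpler to use depending on your use case.
The benefit of the functional API is that you do not need to understand or use the lower-level dual tensor API, and that you can compose it with other functorch transforms (like vmap); the downside is that it offers you less control.
Note that the remainder of the tutorial requires functorch (https://github.com/pytorch/functorch) to run. Please find installation instructions at the specified link.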
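To illustrate the composition with vmap mentioned above, the following sketch builds a full Jacobian out of individual JVPs by mapping over the standard basis directions. It assumes PyTorch >= 2.0, where these transforms are available under `torch.func`; the function `f` and the helper `jvp_along` are made up for this example.

```python
import torch
from torch.func import jacfwd, jvp, vmap


def f(x):
    return x ** 3


x = torch.tensor([1.0, 2.0, 3.0])


def jvp_along(direction):
    # One forward pass yields one Jacobian-vector product: J @ direction.
    return jvp(f, (x,), (direction,))[1]


# Map over the standard basis vectors. Each result is one column of J,
# stacked along dim 0, so a transpose recovers J itself.
jacobian = vmap(jvp_along)(torch.eye(3)).T
print(jacobian)
```

For this elementwise function the Jacobian is diagonal with entries `3 * x ** 2`. The result should also match `jacfwd(f)(x)`, which computes the same forward-mode Jacobian in a single call.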
import functorch as ft
primal0 = torch.randn(10, 10)
tangent0 = torch.randn(10, 10)
primal1 = torch.randn(10, 10)
tangent1 = torch.randn(10, 10)
def fn(x, y):
    return x ** 2 + y ** 2
# Here is a basic example to compute the JVP of the above function.
# ``jvp(func, primals, tangents)`` returns ``func(*primals)`` as well as the
# computed Jacobian-vector product (JVP). Each primal must be associated with a tangent of the same shape.
primal_out, tangent_out = ft.jvp(fn, (primal0, primal1), (tangent0, tangent1))
# ``functorch.jvp`` requires every primal to be associated with a tangent.
# If we only want to associate certain inputs to `fn` with tangents,
# then we'll need to create a new function that captures inputs without tangents:
primal = torch.randn(10, 10)
tangent = torch.randn(10, 10)
y = torch.randn(10, 10)
import functools
new_fn = functools.partial(fn, y=y)
primal_out, tangent_out = ft.jvp(new_fn, (primal,), (tangent,))
/var/lib/workspace/intermediate_source/forward_ad_usage.py:203: FutureWarning:
We've integrated functorch into PyTorch. As the final step of the integration, `functorch.jvp` is deprecated as of PyTorch 2.0 and will be deleted in a future version of PyTorch >= 2.3. Please use `torch.func.jvp` instead; see the PyTorch 2.0 release notes and/or the `torch.func` migration guide for more details https://pytorch.dev.org.tw/docs/main/func.migrating.html
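As the deprecation warning above suggests, the same computation can be written with `torch.func.jvp`. This is a minimal sketch assuming PyTorch >= 2.0; the variable names are arbitrary, and the closed-form check at the end is added for illustration.

```python
import functools

import torch
from torch.func import jvp


def fn(x, y):
    return x ** 2 + y ** 2


x, y = torch.randn(10, 10), torch.randn(10, 10)
tx, ty = torch.randn(10, 10), torch.randn(10, 10)

# Every primal is paired with a tangent of the same shape.
out, tangent_out = jvp(fn, (x, y), (tx, ty))

# To differentiate with respect to ``x`` only, capture ``y`` in a new function.
out_x, tangent_out_x = jvp(functools.partial(fn, y=y), (x,), (tx,))

print(torch.allclose(tangent_out, 2 * x * tx + 2 * y * ty))
print(torch.allclose(tangent_out_x, 2 * x * tx))
```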
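Using the functional API with Modules
To use nn.Module with functorch.jvp to compute Jacobian-vector products with respect to the model parameters, we need to reformulate the nn.Module as a function that accepts both the model parameters and the input to the module.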
model = nn.Linear(5, 5)
input = torch.randn(16, 5)
tangents = tuple([torch.rand_like(p) for p in model.parameters()])
# Given a ``torch.nn.Module``, ``ft.make_functional_with_buffers`` extracts the state
# (``params`` and buffers) and returns a functional version of the model that
# can be invoked like a function.
# That is, the returned ``func`` can be invoked like
# ``func(params, buffers, input)``.
# ``ft.make_functional_with_buffers`` is analogous to the ``nn.Modules`` stateless API
# that you saw previously and we're working on consolidating the two.
func, params, buffers = ft.make_functional_with_buffers(model)
# Because ``jvp`` requires every input to be associated with a tangent, we need to
# create a new function that, when given the parameters, produces the output
def func_params_only(params):
    return func(params, buffers, input)
model_output, jvp_out = ft.jvp(func_params_only, (params,), (tangents,))
/var/lib/workspace/intermediate_source/forward_ad_usage.py:235: FutureWarning:
We've integrated functorch into PyTorch. As the final step of the integration, `functorch.make_functional_with_buffers` is deprecated as of PyTorch 2.0 and will be deleted in a future version of PyTorch >= 2.3. Please use `torch.func.functional_call` instead; see the PyTorch 2.0 release notes and/or the `torch.func` migration guide for more details https://pytorch.dev.org.tw/docs/main/func.migrating.html
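Following the warnings above, a version of this example that avoids the deprecated functorch entry points can be sketched with `torch.func.functional_call` and `torch.func.jvp`. It assumes PyTorch >= 2.0 and passes the parameters as a dictionary rather than the tuple returned by `make_functional_with_buffers`; the names `layer` and `x` are illustrative.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jvp

layer = nn.Linear(5, 5)
x = torch.randn(16, 5)

# Parameters and tangents are dictionaries with matching keys and shapes.
params = {name: p.detach() for name, p in layer.named_parameters()}
tangents = {name: torch.rand_like(p) for name, p in params.items()}


def func_params_only(p):
    # Only the parameters are differentiated; the input is captured.
    return functional_call(layer, p, (x,))


model_output, jvp_out = jvp(func_params_only, (params,), (tangents,))

# Closed-form check for a linear layer: x @ dW.T + db.
expected = x @ tangents["weight"].T + tangents["bias"]
print(torch.allclose(jvp_out, expected, atol=1e-5))
```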
[0] https://en.wikipedia.org/wiki/Dual_number
Total running time of the script: (0 minutes 0.166 seconds)