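Tensor CUDA Stream API
A CUDA stream is a linear sequence of execution that belongs to a specific CUDA device. The PyTorch C++ API supports CUDA streams with the CUDAStream class and helper functions that make streaming operations easy. You can find them in CUDAStream.h. This note provides more details on how to use the PyTorch C++ CUDA Stream API.
Acquiring CUDA stream
PyTorch's C++ API provides the following ways to acquire a CUDA stream:
Acquire a new stream from the CUDA stream pool. Streams are preallocated in the pool and returned in a round-robin fashion.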
CUDAStream getStreamFromPool(const bool isHighPriority = false, DeviceIndex device = -1);
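Tip
You can request a stream from the high-priority pool by setting isHighPriority to true, or a stream for a specific device by setting the device index (which defaults to the current CUDA stream's device index).
Acquire the default CUDA stream for the passed CUDA device, or for the current device if no device index is passed.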
CUDAStream getDefaultCUDAStream(DeviceIndex device_index = -1);
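Tip
The default stream is where most computation occurs when you aren't explicitly using streams.
Acquire the current CUDA stream for the CUDA device with index device_index, or for the current device if no device index is passed.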
CUDAStream getCurrentCUDAStream(DeviceIndex device_index = -1);
Tip
The current CUDA stream is usually the default CUDA stream for the device, but it may be different if someone called setCurrentCUDAStream or used StreamGuard or CUDAStreamGuard.
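The relationship between pooled, default, and current streams can be sketched with a small toy model in standard C++. This is not the PyTorch implementation: `ToyStream`, `ToyStreamRegistry`, the device count, and the pool size of 4 are all hypothetical names and values chosen for illustration. The sketch only mirrors the behaviour described above: pooled streams are handed out round-robin, each device has one default stream, and the current stream starts out as the default.

```cpp
// Toy model only: NOT the PyTorch implementation. All names are hypothetical.
#include <array>
#include <cassert>

constexpr int kNumDevices = 2;  // assumed device count for the sketch
constexpr int kPoolSize = 4;    // assumed pool size, kept small for readability

// A stream is identified by its device and a small integer id.
// id 0 stands for the device's default stream; ids 1..kPoolSize are pooled.
struct ToyStream {
  int device;
  int id;
  bool operator==(const ToyStream& other) const {
    return device == other.device && id == other.id;
  }
};

class ToyStreamRegistry {
 public:
  ToyStreamRegistry() {
    for (int d = 0; d < kNumDevices; ++d) {
      current_[d] = ToyStream{d, 0};  // current stream starts as the default
      next_[d] = 0;
    }
  }

  // Pooled streams are handed out in round-robin order per device.
  ToyStream getStreamFromPool(int device) {
    int slot = next_[device];
    next_[device] = (slot + 1) % kPoolSize;
    return ToyStream{device, slot + 1};
  }

  ToyStream getDefaultStream(int device) const { return ToyStream{device, 0}; }

  ToyStream getCurrentStream(int device) const { return current_[device]; }

  // Only the entry for the stream's own device changes.
  void setCurrentStream(ToyStream s) { current_[s.device] = s; }

 private:
  std::array<ToyStream, kNumDevices> current_;
  std::array<int, kNumDevices> next_;
};
```

One consequence visible in this model is that after `kPoolSize` requests on a device, the next request returns a stream that was already handed out, so two callers can end up sharing a pooled stream.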
Setting CUDA stream
PyTorch's C++ API provides the following ways to set a CUDA stream:
Set the current stream on the device of the passed-in stream to be the passed-in stream.
void setCurrentCUDAStream(CUDAStream stream);
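Note
This function may have nothing to do with the current device. It only changes the current stream on the stream's device. We recommend using CUDAStreamGuard instead, since it switches to the stream's device and makes the stream current on that device. CUDAStreamGuard also restores the current device and stream on destruction.
Use CUDAStreamGuard to switch to a CUDA stream within a scope; it is defined in CUDAStreamGuard.h.
Tip
Use CUDAMultiStreamGuard if you need to set streams on multiple CUDA devices.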
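The difference between setting the current stream directly and using a stream guard can be illustrated with a toy model in standard C++. It is a sketch of the documented semantics only, not the PyTorch implementation; `ToyState`, `toySetCurrentStream`, and `ToyStreamGuard` are hypothetical names, and the two-device setup is an assumption.

```cpp
// Toy model only: NOT the PyTorch implementation. All names are hypothetical.
#include <array>
#include <cassert>

constexpr int kNumDevices = 2;  // assumed device count for the sketch

struct ToyStream {
  int device;
  int id;  // 0 stands for the device's default stream
};

// State of the model: one current device and one current stream id per device.
struct ToyState {
  int current_device = 0;
  std::array<int, kNumDevices> current_stream_id{};  // all default (0)
};

// Mirrors the documented behaviour of setCurrentCUDAStream: only the entry
// for the stream's own device changes; the current device is untouched.
inline void toySetCurrentStream(ToyState& st, ToyStream s) {
  st.current_stream_id[s.device] = s.id;
}

// Mirrors the documented behaviour of CUDAStreamGuard: switch to the stream's
// device, make the stream current there, and undo both on destruction.
class ToyStreamGuard {
 public:
  ToyStreamGuard(ToyState& st, ToyStream s)
      : st_(st),
        saved_device_(st.current_device),
        guarded_device_(s.device),
        saved_stream_id_(st.current_stream_id[s.device]) {
    st_.current_device = s.device;
    st_.current_stream_id[s.device] = s.id;
  }

  ~ToyStreamGuard() {
    st_.current_stream_id[guarded_device_] = saved_stream_id_;
    st_.current_device = saved_device_;
  }

  ToyStreamGuard(const ToyStreamGuard&) = delete;
  ToyStreamGuard& operator=(const ToyStreamGuard&) = delete;

 private:
  ToyState& st_;
  int saved_device_;
  int guarded_device_;
  int saved_stream_id_;
};
```

In this model, calling `toySetCurrentStream` with a stream on device 1 while the current device is 0 leaves the current device at 0, whereas constructing a `ToyStreamGuard` with the same stream moves the current device to 1 for the lifetime of the guard and then puts everything back.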
CUDA Stream Usage Examples
Acquiring and setting a CUDA stream on the same device
// This example shows how to acquire and set CUDA stream on the same device.
// `at::cuda::setCurrentCUDAStream` is used to set current CUDA stream
// create a tensor on device 0
torch::Tensor tensor0 = torch::ones({2, 2}, torch::device(torch::kCUDA));
// get a new CUDA stream from CUDA stream pool on device 0
at::cuda::CUDAStream myStream = at::cuda::getStreamFromPool();
// set current CUDA stream from default stream to `myStream` on device 0
at::cuda::setCurrentCUDAStream(myStream);
// sum() on tensor0 uses `myStream` as current CUDA stream
tensor0.sum();
// get the default CUDA stream on device 0
at::cuda::CUDAStream defaultStream = at::cuda::getDefaultCUDAStream();
// set current CUDA stream back to default CUDA stream on device 0
at::cuda::setCurrentCUDAStream(defaultStream);
// sum() on tensor0 uses `defaultStream` as current CUDA stream
tensor0.sum();
// This example is the same as the previous one, but it explicitly specifies the
// device index and uses a CUDA stream guard to set the current CUDA stream
// create a tensor on device 0
torch::Tensor tensor0 = torch::ones({2, 2}, torch::device(torch::kCUDA));
// get a new stream from CUDA stream pool on device 0
at::cuda::CUDAStream myStream = at::cuda::getStreamFromPool(false, 0);
// set the current CUDA stream to `myStream` within the scope using CUDA stream guard
{
at::cuda::CUDAStreamGuard guard(myStream);
// current CUDA stream is `myStream` from here till the end of bracket.
// sum() on tensor0 uses `myStream` as current CUDA stream
tensor0.sum();
}
// current CUDA stream is reset to default CUDA stream after CUDA stream guard is destroyed
// sum() on tensor0 uses default CUDA stream on device 0 as current CUDA stream
tensor0.sum();
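Note
The code above runs on a single CUDA device, so setCurrentCUDAStream appears to set the current CUDA stream on the current device. Note, however, that setCurrentCUDAStream actually sets the current stream on the device of the stream passed in, which is not necessarily the current device.
Acquiring and setting CUDA streams on multiple devices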
// This example shows how to acquire and set CUDA stream on two devices.
// acquire new CUDA streams from CUDA stream pool on device 0 and device 1
at::cuda::CUDAStream myStream0 = at::cuda::getStreamFromPool(false, 0);
at::cuda::CUDAStream myStream1 = at::cuda::getStreamFromPool(false, 1);
// set current CUDA stream to `myStream0` on device 0
at::cuda::setCurrentCUDAStream(myStream0);
// set current CUDA stream to `myStream1` on device 1
at::cuda::setCurrentCUDAStream(myStream1);
// create a tensor on device 0, no need to specify device index since
// current device index is 0
torch::Tensor tensor0 = torch::ones({2, 2}, torch::device(at::kCUDA));
// sum() on tensor0 uses `myStream0` as current CUDA stream on device 0
tensor0.sum();
// change the current device index to 1 by using CUDA device guard within a bracket scope
{
at::cuda::CUDAGuard device_guard{1};
// create a tensor on device 1
torch::Tensor tensor1 = torch::ones({2, 2}, torch::device(at::kCUDA));
// sum() on tensor1 uses `myStream1` as current CUDA stream on device 1
tensor1.sum();
}
// current device is reset to device 0 after device_guard is destroyed
// acquire a new CUDA stream on device 1
at::cuda::CUDAStream myStream1_1 = at::cuda::getStreamFromPool(false, 1);
// create a new tensor on device 1
torch::Tensor tensor1 = torch::ones({2, 2}, torch::device({torch::kCUDA, 1}));
// change the current device index to 1 and current CUDA stream on device 1
// to `myStream1_1` using CUDA stream guard within a scope
{
at::cuda::CUDAStreamGuard stream_guard(myStream1_1);
// sum() on tensor1 uses `myStream1_1` as current CUDA stream on device 1
tensor1.sum();
}
// current device is reset to device 0 and current CUDA stream on device 1 is
// reset to `myStream1`
// sum() on tensor1 uses `myStream1` as current CUDA stream on device 1
tensor1.sum();
Working with CUDA multistream guard
// This example shows how to use CUDA multistream guard to set
// two streams on two devices at the same time.
// create two tensors, one on device 0, one on device 1
torch::Tensor tensor0 = torch::ones({2, 2}, torch::device({torch::kCUDA, 0}));
torch::Tensor tensor1 = torch::ones({2, 2}, torch::device({torch::kCUDA, 1}));
// acquire new CUDA streams from CUDA stream pool on device 0 and device 1
at::cuda::CUDAStream myStream0 = at::cuda::getStreamFromPool(false, 0);
at::cuda::CUDAStream myStream1 = at::cuda::getStreamFromPool(false, 1);
// set current CUDA stream on device 0 to `myStream0` and
// set current CUDA stream on device 1 to `myStream1` using CUDA multistream guard
{
at::cuda::CUDAMultiStreamGuard multi_guard({myStream0, myStream1});
// sum() on tensor0 uses `myStream0` as current CUDA stream on device 0
tensor0.sum();
// sum() on tensor1 uses `myStream1` as current CUDA stream on device 1
tensor1.sum();
}
// current CUDA stream on device 0 is reset to default CUDA stream on device 0
// current CUDA stream on device 1 is reset to default CUDA stream on device 1
// sum() on tensor0 uses default CUDA stream as current CUDA stream on device 0
tensor0.sum();
// sum() on tensor1 uses default CUDA stream as current CUDA stream on device 1
tensor1.sum();
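Note
CUDAMultiStreamGuard does not change the current device index; it only changes the current stream on the device of each passed-in stream. Other than scope control, this guard is equivalent to calling setCurrentCUDAStream on each passed-in stream.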
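The note above can be made concrete with a toy model in standard C++. It is a sketch under stated assumptions, not the PyTorch implementation: `ToyState` and `ToyMultiStreamGuard` are hypothetical names, and the model assumes two devices. The guard sets the current stream on each stream's own device, never touches the current device, and restores the saved streams when it goes out of scope.

```cpp
// Toy model only: NOT the PyTorch implementation. All names are hypothetical.
#include <array>
#include <cassert>
#include <utility>
#include <vector>

constexpr int kNumDevices = 2;  // assumed device count for the sketch

struct ToyStream {
  int device;
  int id;  // 0 stands for the device's default stream
};

struct ToyState {
  int current_device = 0;
  std::array<int, kNumDevices> current_stream_id{};  // all default (0)
};

class ToyMultiStreamGuard {
 public:
  ToyMultiStreamGuard(ToyState& st, const std::vector<ToyStream>& streams)
      : st_(st) {
    for (const ToyStream& s : streams) {
      // Remember what was current on this stream's device, then replace it.
      saved_.emplace_back(s.device, st_.current_stream_id[s.device]);
      st_.current_stream_id[s.device] = s.id;
    }
  }

  ~ToyMultiStreamGuard() {
    // Restore in reverse order so the earliest saved value is written last.
    for (auto it = saved_.rbegin(); it != saved_.rend(); ++it) {
      st_.current_stream_id[it->first] = it->second;
    }
  }

  ToyMultiStreamGuard(const ToyMultiStreamGuard&) = delete;
  ToyMultiStreamGuard& operator=(const ToyMultiStreamGuard&) = delete;

 private:
  ToyState& st_;
  std::vector<std::pair<int, int>> saved_;  // (device, previous stream id)
};
```

Unlike a single-stream guard, nothing in this model writes to `current_device`, which is the property the note calls out.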
A skeleton example for handling CUDA streams on multiple devices
// This is a skeleton example that shows how to handle CUDA streams on multiple devices
// Suppose you want to do work on the non-default stream on two devices simultaneously, and we
// already have streams on both devices in two vectors. The following code shows three ways
// of acquiring and setting the streams.
// Usage 0: acquire CUDA stream and set current CUDA stream with `setCurrentCUDAStream`
// Create a CUDA stream vector `streams0` on device 0
std::vector<at::cuda::CUDAStream> streams0 =
{at::cuda::getDefaultCUDAStream(), at::cuda::getStreamFromPool()};
// set current stream as `streams0[0]` on device 0
at::cuda::setCurrentCUDAStream(streams0[0]);
// create a CUDA stream vector `streams1` on device 1 using CUDA device guard
std::vector<at::cuda::CUDAStream> streams1;
{
// device index is set to 1 within this scope
at::cuda::CUDAGuard device_guard(1);
streams1.push_back(at::cuda::getDefaultCUDAStream());
streams1.push_back(at::cuda::getStreamFromPool());
}
// device index is reset to 0 after device_guard is destroyed
// set current stream as `streams1[0]` on device 1
at::cuda::setCurrentCUDAStream(streams1[0]);
// Usage 1: use CUDA device guard to change the current device index only
{
at::cuda::CUDAGuard device_guard(1);
// current device index is changed to 1 within scope
// current CUDA stream is still `streams1[0]` on device 1, no change
}
// current device index is reset to 0 after `device_guard` is destroyed
// Usage 2: use CUDA stream guard to change both current device index and current CUDA stream.
{
at::cuda::CUDAStreamGuard stream_guard(streams1[1]);
// current device index and current CUDA stream are set to 1 and `streams1[1]` within scope
}
// current device index and current CUDA stream are reset to 0 and `streams0[0]` after
// stream_guard is destroyed
// Usage 3: use CUDA multi-stream guard to change multiple streams on multiple devices
{
// This is the same as calling `at::cuda::setCurrentCUDAStream` on both streams
at::cuda::CUDAMultiStreamGuard multi_guard({streams0[1], streams1[1]});
// current device index is not changed, still 0
// current CUDA stream on device 0 and device 1 are set to `streams0[1]` and `streams1[1]`
}
// current CUDA stream on device 0 and device 1 are reset to `streams0[0]` and `streams1[0]`
// after `multi_guard` is destroyed.