
PyTorch TensorBoard Support

Created On: Nov 30, 2021 | Last Updated: May 29, 2024 | Last Verified: Nov 05, 2024

Follow along with the video below or on YouTube.

Before You Start

To run this tutorial, you'll need to install PyTorch, TorchVision, Matplotlib, and TensorBoard.

With conda:

conda install pytorch torchvision -c pytorch
conda install matplotlib tensorboard

With pip:

pip install torch torchvision matplotlib tensorboard

Once the dependencies are installed, restart this notebook in the Python environment where you installed them.
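If you are unsure whether the notebook kernel is using the environment you installed into, a quick check is to ask the interpreter whether each package can be found. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper, and the import names listed are assumed to match the packages installed above.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level import names in `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names for the tutorial's dependencies (assumed to match the installed packages)
required = ["torch", "torchvision", "matplotlib", "tensorboard"]
missing = missing_packages(required)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies found")
```

If anything is reported missing, the kernel is most likely running a different Python environment from the one you installed into.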

Introduction

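In this notebook, we'll be training a variant of LeNet-5 against the Fashion-MNIST dataset. Fashion-MNIST is a set of image tiles depicting various garments, with ten class labels indicating the type of garment depicted.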

# PyTorch model and training necessities
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Image datasets and image manipulation
import torchvision
import torchvision.transforms as transforms

# Image display
import matplotlib.pyplot as plt
import numpy as np

# PyTorch TensorBoard support
from torch.utils.tensorboard import SummaryWriter

# In case you are using an environment that has TensorFlow installed,
# such as Google Colab, uncomment the following code to avoid
# a bug with saving embeddings to your TensorBoard directory

# import tensorflow as tf
# import tensorboard as tb
# tf.io.gfile = tb.compat.tensorflow_stub.io.gfile

Showing Images in TensorBoard

Let's start by adding sample images from our dataset to TensorBoard:

# Gather datasets and prepare them for consumption
transform = transforms.Compose(
    [transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))])

# Store separate training and validation splits in ./data
training_set = torchvision.datasets.FashionMNIST('./data',
    download=True,
    train=True,
    transform=transform)
validation_set = torchvision.datasets.FashionMNIST('./data',
    download=True,
    train=False,
    transform=transform)

training_loader = torch.utils.data.DataLoader(training_set,
                                              batch_size=4,
                                              shuffle=True,
                                              num_workers=2)


validation_loader = torch.utils.data.DataLoader(validation_set,
                                                batch_size=4,
                                                shuffle=False,
                                                num_workers=2)

# Class labels
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
        'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')

# Helper function for inline image display
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))

# Extract a batch of 4 images
dataiter = iter(training_loader)
images, labels = next(dataiter)

# Create a grid from the images and show them
img_grid = torchvision.utils.make_grid(images)
matplotlib_imshow(img_grid, one_channel=True)
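Two details of the code above come down to simple arithmetic. `transforms.Normalize((0.5,), (0.5,))` computes `(x - 0.5) / 0.5`, mapping the `[0, 1]` output of `ToTensor()` onto `[-1, 1]`, and `matplotlib_imshow()` undoes it with `img / 2 + 0.5`. Also, `make_grid()` copies a single-channel batch into three channels, which is why `one_channel=True` averages over dim 0 before plotting. The plain-Python sketch below reproduces both; `grid_shape` is a hypothetical helper that assumes `make_grid`'s defaults of `nrow=8` and `padding=2`.

```python
import math

def normalize(x, mean=0.5, std=0.5):
    # Per-pixel effect of transforms.Normalize((0.5,), (0.5,)): (x - mean) / std
    return (x - mean) / std

def unnormalize(y):
    # The inverse applied in matplotlib_imshow(): img / 2 + 0.5
    return y / 2 + 0.5

def grid_shape(n_images, height, width, nrow=8, padding=2):
    """(C, H, W) of make_grid output for a batch of n single-channel images.

    Assumes make_grid's layout: up to `nrow` images per row, `padding` pixels
    around every tile, and one channel expanded to three.
    """
    xmaps = min(nrow, n_images)            # images per row
    ymaps = math.ceil(n_images / xmaps)    # number of rows
    return (3, ymaps * (height + padding) + padding, xmaps * (width + padding) + padding)

print(normalize(0.0), normalize(1.0))   # -1.0 1.0
print(unnormalize(normalize(0.25)))     # 0.25
print(grid_shape(4, 28, 28))            # (3, 32, 122)
```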


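Above, we used TorchVision and Matplotlib to create a visual grid of a minibatch of our input data. Below, we use the add_image() call on SummaryWriter to log the image for consumption by TensorBoard, and we also call flush() to make sure it's written to disk right away.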

# The default log_dir is runs/CURRENT_DATETIME_HOSTNAME, but it's good to be specific
# torch.utils.tensorboard.SummaryWriter is imported above
writer = SummaryWriter('runs/fashion_mnist_experiment_1')

# Write image data to TensorBoard log dir
writer.add_image('Four Fashion-MNIST Images', img_grid)
writer.flush()

# To view, start TensorBoard on the command line with:
#   tensorboard --logdir=runs
# ...and open a browser tab to http://localhost:6006/

If you start TensorBoard at the command line and open it in a new browser tab (usually at localhost:6006), you should see the image grid under the IMAGES tab.
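TensorBoard treats each subdirectory of `--logdir` that contains event files as a separate run, so giving every experiment its own directory under `runs` keeps their charts apart. A minimal sketch, assuming you want names similar in spirit to SummaryWriter's default of `runs/CURRENT_DATETIME_HOSTNAME`; `make_run_dir` is a hypothetical helper, not part of PyTorch.

```python
import os
import socket
from datetime import datetime

def make_run_dir(root="runs", comment=""):
    """Build a per-run log directory: <root>/<timestamp>_<hostname><comment>."""
    stamp = datetime.now().strftime("%b%d_%H-%M-%S")
    return os.path.join(root, f"{stamp}_{socket.gethostname()}{comment}")

log_dir = make_run_dir(comment="_fashion_mnist")
print(log_dir)
# The result could then be passed on, e.g. SummaryWriter(log_dir)
```

Because `tensorboard --logdir=runs` scans all subdirectories, every run created this way shows up as its own entry.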

Graphing Scalars to Visualize Training

TensorBoard is useful for tracking the progress and efficacy of your training. Below, we'll run a training loop, track some metrics, and save the data for TensorBoard to use.

Let's define a model to categorize our image tiles, and an optimizer and loss function for training:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
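The `16 * 4 * 4` input size of `fc1` follows from the layers before it: a 5x5 convolution without padding shrinks each side by 4, and a 2x2 max pool with stride 2 halves it (rounding down). The sketch below traces a 28 x 28 Fashion-MNIST tile through those layers; `conv_out` is a hypothetical helper using the standard output-size formula for `Conv2d`/`MaxPool2d` with dilation 1.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output length along one spatial dimension (dilation = 1, floor rounding)
    return (size + 2 * padding - kernel) // stride + 1

size = 28                      # Fashion-MNIST tiles are 28 x 28
size = conv_out(size, 5)       # conv1, 5x5 kernel      -> 24
size = conv_out(size, 2, 2)    # pool, 2x2, stride 2    -> 12
size = conv_out(size, 5)       # conv2, 5x5 kernel      -> 8
size = conv_out(size, 2, 2)    # pool, 2x2, stride 2    -> 4
flat = 16 * size * size        # conv2 has 16 output channels
print(size, flat)              # 4 256
```

So `x.view(-1, 16 * 4 * 4)` flattens each sample into 256 values before the fully connected layers.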

Now let's train for a single epoch, evaluating the training and validation losses every 1000 batches:

print(len(validation_loader))
for epoch in range(1):  # loop over the dataset multiple times
    running_loss = 0.0

    for i, data in enumerate(training_loader, 0):
        # basic training loop
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 1000 == 999:    # Every 1000 mini-batches...
            print('Batch {}'.format(i + 1))
            # Check against the validation set
            running_vloss = 0.0

            # Evaluation mode turns off training-only behavior such as dropout
            # (this model has no such layers, but switching is good practice)
            net.train(False)  # equivalent to net.eval()
            with torch.no_grad():  # gradients are not needed for validation
                for j, vdata in enumerate(validation_loader, 0):
                    vinputs, vlabels = vdata
                    voutputs = net(vinputs)
                    vloss = criterion(voutputs, vlabels)
                    running_vloss += vloss.item()
            net.train(True)  # switch back to training mode

            avg_loss = running_loss / 1000
            avg_vloss = running_vloss / len(validation_loader)

            # Log the running loss averaged per batch
            writer.add_scalars('Training vs. Validation Loss',
                            { 'Training' : avg_loss, 'Validation' : avg_vloss },
                            epoch * len(training_loader) + i)

            running_loss = 0.0
print('Finished Training')

writer.flush()

2500
Batch 1000
Batch 2000
Batch 3000
Batch 4000
Batch 5000
Batch 6000
Batch 7000
Batch 8000
Batch 9000
Batch 10000
Batch 11000
Batch 12000
Batch 13000
Batch 14000
Batch 15000
Finished Training
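The printed output follows from the dataset and batch sizes. With `batch_size=4` and the default `drop_last=False`, a loader yields `ceil(N / 4)` batches: 2500 for the 10,000-image Fashion-MNIST test split used here for validation (the first line printed), and 15,000 for the 60,000-image training split. Logging on every 1000th batch therefore gives the fifteen "Batch" lines. The sketch below reproduces that schedule, including the global step passed to `add_scalars()`; `num_batches` and `log_points` are hypothetical helpers.

```python
import math

def num_batches(num_samples, batch_size):
    # Batches per epoch when the final partial batch is kept (drop_last=False)
    return math.ceil(num_samples / batch_size)

def log_points(batches_per_epoch, log_every=1000, epochs=1):
    """(global_step, batch_number) pairs at which the training loop above logs."""
    points = []
    for epoch in range(epochs):
        for i in range(batches_per_epoch):
            if i % log_every == log_every - 1:
                points.append((epoch * batches_per_epoch + i, i + 1))
    return points

train_batches = num_batches(60000, 4)
val_batches = num_batches(10000, 4)
points = log_points(train_batches)
print(train_batches, val_batches)          # 15000 2500
print(len(points), points[0], points[-1])  # 15 (999, 1000) (14999, 15000)
```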

Switch to your open TensorBoard and have a look at the SCALARS tab.

Visualizing Your Model

TensorBoard can also be used to examine the data flow within your model. To do this, call the add_graph() method with a model and a sample input:

# Again, grab a single mini-batch of images
dataiter = iter(training_loader)
images, labels = next(dataiter)

# add_graph() will trace the sample input through your model,
# and render it as a graph.
writer.add_graph(net, images)
writer.flush()

When you switch over to TensorBoard, you should see a GRAPHS tab. Double-click the "NET" node to see the layers and data flow within your model.
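add_graph() needs a sample input because the graph is recorded by running that input through the model and noting which operations execute and which outputs feed which inputs. The toy tracer below illustrates that idea on plain Python functions. It is a conceptual sketch only: the `Tracer` class and its `op` method are hypothetical, and PyTorch's real tracer works on tensor operations and is far more involved. One consequence holds for tracing in general: only the operations actually executed for the sample input end up in the graph.

```python
class Tracer:
    """Toy tracer (hypothetical): records each op run on a sample input."""

    def __init__(self):
        self.nodes = []  # op names, in execution order
        self.edges = []  # (producer index, consumer index)

    def op(self, name, fn):
        def wrapped(traced):
            value, source = traced       # value plus index of the op that produced it
            self.nodes.append(name)
            index = len(self.nodes) - 1
            if source is not None:
                self.edges.append((source, index))
            return fn(value), index
        return wrapped

tracer = Tracer()
# A tiny "model": a chain of ops applied to a number
model = [
    tracer.op("scale", lambda x: 2 * x),
    tracer.op("shift", lambda x: x - 3),
    tracer.op("relu", lambda x: max(x, 0)),
]

traced = (5, None)  # sample input; no op has produced it yet
for layer in model:
    traced = layer(traced)

print(traced[0])      # 7
print(tracer.nodes)   # ['scale', 'shift', 'relu']
print(tracer.edges)   # [(0, 1), (1, 2)]
```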

Visualizing Your Dataset with Embeddings

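The 28-by-28 image tiles we're using can be modeled as 784-dimensional vectors (28 * 28 = 784). It can be instructive to project these onto a lower-dimensional representation. The add_embedding() method logs a set of data so that TensorBoard can automatically project it onto the three dimensions with the highest variance and display the result as an interactive 3D chart.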
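"The three dimensions with the highest variance" are the top three principal components. TensorBoard computes the projection itself; the NumPy sketch below only illustrates what that projection means, using PCA via SVD. `project_top3` is a hypothetical helper, and the input is random stand-in data, not Fashion-MNIST.

```python
import numpy as np

def project_top3(features: np.ndarray) -> np.ndarray:
    """Project the rows of `features` onto their 3 highest-variance directions (PCA)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T

rng = np.random.default_rng(0)
# Stand-in for 100 flattened 28 x 28 images (random values, not real data)
features = rng.normal(size=(100, 28 * 28))
projected = project_top3(features)
print(projected.shape)  # (100, 3)
```

The variance along the first output column is at least as large as along any single input pixel, which is what makes these three directions the most informative linear view.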

Below, we'll take a sample of our data and generate such an embedding:

# Select a random subset of data and corresponding labels
def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)

    perm = torch.randperm(len(data))
    # Index both tensors with the same n positions so images and labels stay paired
    return data[perm[:n]], labels[perm[:n]]

# Extract a random subset of data
images, labels = select_n_random(training_set.data, training_set.targets)

# get the class labels for each image
class_labels = [classes[label] for label in labels]

# log embeddings
features = images.view(-1, 28 * 28)
writer.add_embedding(features,
                    metadata=class_labels,
                    label_img=images.unsqueeze(1))
writer.flush()
writer.close()
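The important property of select_n_random() is that images and labels are indexed with the same random positions, so each sampled image keeps its own label in the embedding metadata. A standard-library sketch of that idea for plain lists; `select_n_random_list` and its `seed` parameter are hypothetical additions for illustration.

```python
import random

def select_n_random_list(data, labels, n=100, seed=None):
    """Pick n random (item, label) pairs via one shared list of distinct indices."""
    assert len(data) == len(labels)
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), n)   # n distinct positions
    return [data[i] for i in idx], [labels[i] for i in idx]

data = [f"img{i}" for i in range(10)]
labels = [i % 3 for i in range(10)]
sub_data, sub_labels = select_n_random_list(data, labels, n=4, seed=0)
print(list(zip(sub_data, sub_labels)))
```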

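Now if you switch to TensorBoard and select the PROJECTOR tab, you should see a 3D representation of the projection. You can rotate and zoom the model. Examine it at large and small scales, and see whether you can spot patterns in the projected data and clusters of labels.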

For better visibility, it's recommended to:

Select "label" from the "Color by" drop-down on the left.
Toggle the Night Mode icon along the top to place the light-colored images on a dark background.

Other Resources

For more information, have a look at:

PyTorch documentation on torch.utils.tensorboard.SummaryWriter
TensorBoard tutorial content in the PyTorch.org Tutorials
For more information about TensorBoard, see the TensorBoard documentation

Total running time of the script: (2 minutes 41.280 seconds)
