
Hyperparameter Tuning with Ray Tune

Created On: Aug 31, 2020 | Last Updated: Oct 31, 2024 | Last Verified: Nov 05, 2024

Hyperparameter tuning can make the difference between an average model and a highly accurate one. Often simple decisions, such as choosing a different learning rate or changing a network layer size, can have a dramatic impact on your model's performance.

Fortunately, there are tools that help with finding the best combination of parameters. Ray Tune is an industry-standard tool for distributed hyperparameter tuning. Ray Tune includes the latest hyperparameter search algorithms, integrates with various analysis libraries, and natively supports distributed training through Ray's distributed machine learning engine.

In this tutorial, we will show you how to integrate Ray Tune into your PyTorch training workflow. We will extend this tutorial from the PyTorch documentation for training a CIFAR10 image classifier.

As you will see, we only need to make a few small modifications. Specifically, we need to

  1. wrap data loading and training in functions,

  2. make some network parameters configurable,

  3. add checkpointing (optional),

  4. and define the search space for the model tuning.


To run this tutorial, please make sure the following packages are installed:

  • ray[tune]: distributed hyperparameter tuning library

  • torchvision: for data transforms

Setup / Imports

Let's start with the imports:

from functools import partial
import os
import tempfile
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray import train
from ray.train import Checkpoint, get_checkpoint
from ray.tune.schedulers import ASHAScheduler
import ray.cloudpickle as pickle

Most of the imports are needed for building the PyTorch model. Only the last few imports are for Ray Tune.

Data loaders

We wrap the data loaders in their own function and pass a global data directory. This way we can share a data directory between different trials.

def load_data(data_dir="./data"):
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform
    )

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform
    )

    return trainset, testset

Configurable neural network

We can only tune those parameters that are configurable. In this example, we can specify the layer sizes of the fully connected layers:

class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

The train function

Now it gets interesting, because we introduce some changes to the example from the PyTorch documentation.

We wrap the training script in a function train_cifar(config, data_dir=None). The config parameter will receive the hyperparameters we would like to train with. The data_dir specifies the directory where we load and store the data, so that multiple runs can share the same data source. We also load the model and optimizer state at the start of the run, if a checkpoint is provided. Further down in this tutorial you will find information on how to save the checkpoint and what it is used for.

net = Net(config["l1"], config["l2"])

checkpoint = get_checkpoint()
if checkpoint:
    with checkpoint.as_directory() as checkpoint_dir:
        data_path = Path(checkpoint_dir) / "data.pkl"
        with open(data_path, "rb") as fp:
            checkpoint_state = pickle.load(fp)
        start_epoch = checkpoint_state["epoch"]
        net.load_state_dict(checkpoint_state["net_state_dict"])
        optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
else:
    start_epoch = 0

The learning rate of the optimizer is made configurable, too:

optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

We also split the training data into a training and validation subset. We thus train on 80% of the data and calculate the validation loss on the remaining 20%. The batch sizes with which we iterate through the training and test sets are configurable as well.
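The split itself can be sketched with torch.utils.data.random_split. The snippet below is an illustration only: it uses a small synthetic TensorDataset as a stand-in for the CIFAR10 training set, and a hard-coded config dict in place of the one Ray Tune passes in.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for the CIFAR10 trainset (100 fake 3x32x32 images).
dataset = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))

# 80% of the samples go to training, the rest to validation.
test_abs = int(len(dataset) * 0.8)
train_subset, val_subset = random_split(dataset, [test_abs, len(dataset) - test_abs])

# In the tutorial, config comes from Ray Tune's search space.
config = {"batch_size": 8}
trainloader = DataLoader(train_subset, batch_size=int(config["batch_size"]), shuffle=True)
valloader = DataLoader(val_subset, batch_size=int(config["batch_size"]), shuffle=True)
```

With 100 samples this yields an 80/20 split, and each training batch holds `config["batch_size"]` samples.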

Adding (multi) GPU support with DataParallel

Image classification benefits largely from GPUs. Luckily, we can continue to use PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in nn.DataParallel to support data parallel training on multiple GPUs:

device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
net.to(device)

By using a device variable, we make sure that training also works when no GPU is available. PyTorch requires us to send our data to the GPU memory explicitly, like this:

for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)

The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray also supports fractional GPUs, so we can share GPUs among trials, as long as the model still fits in GPU memory. We'll come back to that later.

Communicating with Ray Tune

The most interesting part is the communication with Ray Tune:

checkpoint_data = {
    "epoch": epoch,
    "net_state_dict": net.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
with tempfile.TemporaryDirectory() as checkpoint_dir:
    data_path = Path(checkpoint_dir) / "data.pkl"
    with open(data_path, "wb") as fp:
        pickle.dump(checkpoint_data, fp)

    checkpoint = Checkpoint.from_directory(checkpoint_dir)
    train.report(
        {"loss": val_loss / val_steps, "accuracy": correct / total},
        checkpoint=checkpoint,
    )

Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which hyperparameter configuration leads to the best results. These metrics can also be used to stop badly performing trials early in order to avoid wasting resources on those trials.

The checkpoint saving is optional. However, it is necessary if we want to use advanced schedulers like Population Based Training. Also, by saving the checkpoint we can later load the trained models and validate them on a test set. Lastly, saving checkpoints is useful for fault tolerance: it allows us to interrupt training and continue it later.
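The save-and-reload round trip above can be exercised without Ray at all. Here is a minimal sketch using the standard library's pickle in place of ray.cloudpickle; the dict contents are placeholders, not real state_dicts.

```python
import pickle
import tempfile
from pathlib import Path

# Placeholder state; in the tutorial these are the real model/optimizer state_dicts.
checkpoint_data = {"epoch": 3, "net_state_dict": {}, "optimizer_state_dict": {}}

with tempfile.TemporaryDirectory() as checkpoint_dir:
    data_path = Path(checkpoint_dir) / "data.pkl"
    # Save: serialize the training state to data.pkl inside the checkpoint dir.
    with open(data_path, "wb") as fp:
        pickle.dump(checkpoint_data, fp)

    # Load: at the start of a resumed run, restore the state from the same file.
    with open(data_path, "rb") as fp:
        restored = pickle.load(fp)

start_epoch = restored["epoch"]
```

This is the same pattern the train function uses: everything needed to resume lives in one pickled dict inside the checkpoint directory.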

Full training function

The full code example looks like this:

def train_cifar(config, data_dir=None):
    net = Net(config["l1"], config["l2"])

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    checkpoint = get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            data_path = Path(checkpoint_dir) / "data.pkl"
            with open(data_path, "rb") as fp:
                checkpoint_state = pickle.load(fp)
            start_epoch = checkpoint_state["epoch"]
            net.load_state_dict(checkpoint_state["net_state_dict"])
            optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
    else:
        start_epoch = 0

    trainset, testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs]
    )

    trainloader = torch.utils.data.DataLoader(
        train_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )
    valloader = torch.utils.data.DataLoader(
        val_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )

    for epoch in range(start_epoch, 10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(
                    "[%d, %5d] loss: %.3f"
                    % (epoch + 1, i + 1, running_loss / epoch_steps)
                )
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        checkpoint_data = {
            "epoch": epoch,
            "net_state_dict": net.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        }
        with tempfile.TemporaryDirectory() as checkpoint_dir:
            data_path = Path(checkpoint_dir) / "data.pkl"
            with open(data_path, "wb") as fp:
                pickle.dump(checkpoint_data, fp)

            checkpoint = Checkpoint.from_directory(checkpoint_dir)
            train.report(
                {"loss": val_loss / val_steps, "accuracy": correct / total},
                checkpoint=checkpoint,
            )

    print("Finished Training")

As you can see, most of the code is adapted directly from the original example.

Test set accuracy

Commonly, the performance of a machine learning model is tested on a hold-out test set with data that has not been used for training the model. We also wrap this in a function:

def test_accuracy(net, device="cpu"):
    trainset, testset = load_data()

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2
    )

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

The function also expects a device parameter, so we can do the test set validation on a GPU.

Configuring the search space

Lastly, we need to define Ray Tune's search space. Here is an example:

config = {
    "l1": tune.choice([2 ** i for i in range(9)]),
    "l2": tune.choice([2 ** i for i in range(9)]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}

The tune.choice() accepts a list of values that are uniformly sampled from. In this example, the l1 and l2 parameters should be powers of 2 between 1 and 256, so either 1, 2, 4, 8, 16, 32, 64, 128, or 256. The lr (learning rate) should be log-uniformly sampled between 0.0001 and 0.1. Lastly, the batch size is a choice between 2, 4, 8, and 16.
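What "uniformly" and "log-uniformly" mean here can be illustrated with the standard library alone. This is not Ray Tune's own sampler, just the same two distributions sketched by hand:

```python
import math
import random

# Like tune.choice([...]): every element of the list is equally likely.
l1 = random.choice([2 ** i for i in range(9)])

# Like tune.loguniform(1e-4, 1e-1): sample uniformly in log space, then
# exponentiate, so the decade 1e-4..1e-3 is as likely as 1e-2..1e-1.
lr = math.exp(random.uniform(math.log(1e-4), math.log(1e-1)))

assert l1 in [2 ** i for i in range(9)]
assert 9.9e-5 < lr < 1.01e-1  # bounds relaxed slightly for float rounding
```

Log-uniform sampling is the usual choice for learning rates, since good values tend to be spread over several orders of magnitude.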

At each trial, Ray Tune will now randomly sample a combination of parameters from these search spaces. It will then train a number of models in parallel and find the best performing one among them. We also use the ASHAScheduler, which will terminate badly performing trials early.

We wrap the train_cifar function with functools.partial to set the constant data_dir parameter. We can also tell Ray Tune which resources should be available for each trial:

gpus_per_trial = 2
# ...
result = tune.run(
    partial(train_cifar, data_dir=data_dir),
    resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
    config=config,
    num_samples=num_samples,
    scheduler=scheduler,
    checkpoint_at_end=True)
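Why the partial wrapper is needed: the trainable is called with a single config argument, so any extra constant arguments have to be bound up front. A minimal illustration with a stand-in function (the function body here is hypothetical, just echoing its arguments):

```python
from functools import partial

def train_cifar(config, data_dir=None):
    # Stand-in for the real training function; it just echoes its arguments.
    return config["lr"], data_dir

# partial fixes data_dir, leaving a callable that takes only `config`,
# which is the single-argument signature expected for a function trainable.
trainable = partial(train_cifar, data_dir="./data")
result = trainable({"lr": 0.01})
```

Calling `trainable({"lr": 0.01})` returns `(0.01, "./data")`: the bound data_dir is supplied automatically on every call.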

You can specify the number of CPUs, which are then available e.g. to increase the num_workers of the PyTorch DataLoader instances. The selected number of GPUs is made visible to PyTorch in each trial. Trials do not have access to GPUs that haven't been requested for them, so you don't have to care about two trials using the same set of resources.

Here we can also specify fractional GPUs, so something like gpus_per_trial=0.5 is completely valid. The trials will then share GPUs among each other. You just have to make sure that the models still fit in the GPU memory.

After training the models, we will find the best performing one and load the trained network from the checkpoint file. We then obtain the test set accuracy and report everything by printing.

The full main function looks like this:

def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)
    config = {
        "l1": tune.choice([2**i for i in range(9)]),
        "l2": tune.choice([2**i for i in range(9)]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16]),
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2,
    )
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
    )

    best_trial = result.get_best_trial("loss", "min", "last")
    print(f"Best trial config: {best_trial.config}")
    print(f"Best trial final validation loss: {best_trial.last_result['loss']}")
    print(f"Best trial final validation accuracy: {best_trial.last_result['accuracy']}")

    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)

    best_checkpoint = result.get_best_checkpoint(trial=best_trial, metric="accuracy", mode="max")
    with best_checkpoint.as_directory() as checkpoint_dir:
        data_path = Path(checkpoint_dir) / "data.pkl"
        with open(data_path, "rb") as fp:
            best_checkpoint_data = pickle.load(fp)

        best_trained_model.load_state_dict(best_checkpoint_data["net_state_dict"])
        test_acc = test_accuracy(best_trained_model, device)
        print("Best trial test set accuracy: {}".format(test_acc))


if __name__ == "__main__":
    # You can change the number of GPUs per trial here:
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)
100% 170M/170M [00:02<00:00, 75.7MB/s]
2025-02-03 16:48:28,972 WARNING services.py:1889 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 2147479552 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2025-02-03 16:48:29,111 INFO worker.py:1642 -- Started a local Ray instance.
2025-02-03 16:48:30,453 INFO tune.py:228 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run(...)`.
2025-02-03 16:48:30,455 INFO tune.py:654 -- [output] This will use the new output engine with verbosity 2. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
+--------------------------------------------------------------------+
| Configuration for experiment     train_cifar_2025-02-03_16-48-30   |
+--------------------------------------------------------------------+
| Search algorithm                 BasicVariantGenerator             |
| Scheduler                        AsyncHyperBandScheduler           |
| Number of trials                 10                                |
+--------------------------------------------------------------------+

View detailed results here: /var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30
To visualize your results with TensorBoard, run: `tensorboard --logdir /var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30`

Trial status: 10 PENDING
Current time: 2025-02-03 16:48:30. Total running time: 0s
Logical resource usage: 0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+-------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size |
+-------------------------------------------------------------------------------+
| train_cifar_b2ac8_00000   PENDING      16      1   0.00213327               2 |
| train_cifar_b2ac8_00001   PENDING       1      2   0.013416                 4 |
| train_cifar_b2ac8_00002   PENDING     256     64   0.0113784                2 |
| train_cifar_b2ac8_00003   PENDING      64    256   0.0274071                8 |
| train_cifar_b2ac8_00004   PENDING      16      2   0.056666                 4 |
| train_cifar_b2ac8_00005   PENDING       8     64   0.000353097              4 |
| train_cifar_b2ac8_00006   PENDING      16      4   0.000147684              8 |
| train_cifar_b2ac8_00007   PENDING     256    256   0.00477469               8 |
| train_cifar_b2ac8_00008   PENDING     128    256   0.0306227                8 |
| train_cifar_b2ac8_00009   PENDING       2     16   0.0286986                2 |
+-------------------------------------------------------------------------------+

Trial train_cifar_b2ac8_00003 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00003 config             |
+--------------------------------------------------+
| batch_size                                     8 |
| l1                                            64 |
| l2                                           256 |
| lr                                       0.02741 |
+--------------------------------------------------+

Trial train_cifar_b2ac8_00005 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00005 config             |
+--------------------------------------------------+
| batch_size                                     4 |
| l1                                             8 |
| l2                                            64 |
| lr                                       0.00035 |
+--------------------------------------------------+

Trial train_cifar_b2ac8_00007 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00007 config             |
+--------------------------------------------------+
| batch_size                                     8 |
| l1                                           256 |
| l2                                           256 |
| lr                                       0.00477 |
+--------------------------------------------------+

Trial train_cifar_b2ac8_00004 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00004 config             |
+--------------------------------------------------+
| batch_size                                     4 |
| l1                                            16 |
| l2                                             2 |
| lr                                       0.05667 |
+--------------------------------------------------+

Trial train_cifar_b2ac8_00000 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00000 config             |
+--------------------------------------------------+
| batch_size                                     2 |
| l1                                            16 |
| l2                                             1 |
| lr                                       0.00213 |
+--------------------------------------------------+

Trial train_cifar_b2ac8_00006 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00006 config             |
+--------------------------------------------------+
| batch_size                                     8 |
| l1                                            16 |
| l2                                             4 |
| lr                                       0.00015 |
+--------------------------------------------------+

Trial train_cifar_b2ac8_00002 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00002 config             |
+--------------------------------------------------+
| batch_size                                     2 |
| l1                                           256 |
| l2                                            64 |
| lr                                       0.01138 |
+--------------------------------------------------+

Trial train_cifar_b2ac8_00001 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00001 config             |
+--------------------------------------------------+
| batch_size                                     4 |
| l1                                             1 |
| l2                                             2 |
| lr                                       0.01342 |
+--------------------------------------------------+
(func pid=4838) [1,  2000] loss: 2.317

Trial status: 8 RUNNING | 2 PENDING
Current time: 2025-02-03 16:49:00. Total running time: 30s
Logical resource usage: 16.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+-------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size |
+-------------------------------------------------------------------------------+
| train_cifar_b2ac8_00000   RUNNING      16      1   0.00213327               2 |
| train_cifar_b2ac8_00001   RUNNING       1      2   0.013416                 4 |
| train_cifar_b2ac8_00002   RUNNING     256     64   0.0113784                2 |
| train_cifar_b2ac8_00003   RUNNING      64    256   0.0274071                8 |
| train_cifar_b2ac8_00004   RUNNING      16      2   0.056666                 4 |
| train_cifar_b2ac8_00005   RUNNING       8     64   0.000353097              4 |
| train_cifar_b2ac8_00006   RUNNING      16      4   0.000147684              8 |
| train_cifar_b2ac8_00007   RUNNING     256    256   0.00477469               8 |
| train_cifar_b2ac8_00008   PENDING     128    256   0.0306227                8 |
| train_cifar_b2ac8_00009   PENDING       2     16   0.0286986                2 |
+-------------------------------------------------------------------------------+
(func pid=4838) [1,  4000] loss: 1.153 [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(func pid=4838) [1,  6000] loss: 0.768 [repeated 8x across cluster]
Trial status: 8 RUNNING | 2 PENDING
Current time: 2025-02-03 16:49:30. Total running time: 1min 0s
Logical resource usage: 16.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+-------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size |
+-------------------------------------------------------------------------------+
| train_cifar_b2ac8_00000   RUNNING      16      1   0.00213327               2 |
| train_cifar_b2ac8_00001   RUNNING       1      2   0.013416                 4 |
| train_cifar_b2ac8_00002   RUNNING     256     64   0.0113784                2 |
| train_cifar_b2ac8_00003   RUNNING      64    256   0.0274071                8 |
| train_cifar_b2ac8_00004   RUNNING      16      2   0.056666                 4 |
| train_cifar_b2ac8_00005   RUNNING       8     64   0.000353097              4 |
| train_cifar_b2ac8_00006   RUNNING      16      4   0.000147684              8 |
| train_cifar_b2ac8_00007   RUNNING     256    256   0.00477469               8 |
| train_cifar_b2ac8_00008   PENDING     128    256   0.0306227                8 |
| train_cifar_b2ac8_00009   PENDING       2     16   0.0286986                2 |
+-------------------------------------------------------------------------------+
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000000)

Trial train_cifar_b2ac8_00006 finished iteration 1 at 2025-02-03 16:49:33. Total running time: 1min 2s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  56.50494 |
| time_total_s                                      56.50494 |
| training_iteration                                       1 |
| accuracy                                            0.0999 |
| loss                                               2.30117 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000000

Trial train_cifar_b2ac8_00003 finished iteration 1 at 2025-02-03 16:49:33. Total running time: 1min 3s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00003 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  57.19499 |
| time_total_s                                      57.19499 |
| training_iteration                                       1 |
| accuracy                                             0.175 |
| loss                                               2.13929 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00003 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00003_3_batch_size=8,l1=64,l2=256,lr=0.0274_2025-02-03_16-48-30/checkpoint_000000

Trial train_cifar_b2ac8_00007 finished iteration 1 at 2025-02-03 16:49:34. Total running time: 1min 3s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  57.39949 |
| time_total_s                                      57.39949 |
| training_iteration                                       1 |
| accuracy                                            0.4497 |
| loss                                                1.5528 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000000
(func pid=4838) [1,  8000] loss: 0.576 [repeated 5x across cluster]
(func pid=4840) [1,  8000] loss: 0.575 [repeated 4x across cluster]
(func pid=4838) [1, 10000] loss: 0.461 [repeated 4x across cluster]

Trial status: 8 RUNNING | 2 PENDING
Current time: 2025-02-03 16:50:00. Total running time: 1min 30s
Logical resource usage: 16.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+----------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+----------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00000   RUNNING      16      1   0.00213327               2                                                    |
| train_cifar_b2ac8_00001   RUNNING       1      2   0.013416                 4                                                    |
| train_cifar_b2ac8_00002   RUNNING     256     64   0.0113784                2                                                    |
| train_cifar_b2ac8_00003   RUNNING      64    256   0.0274071                8        1            57.195    2.13929       0.175  |
| train_cifar_b2ac8_00004   RUNNING      16      2   0.056666                 4                                                    |
| train_cifar_b2ac8_00005   RUNNING       8     64   0.000353097              4                                                    |
| train_cifar_b2ac8_00006   RUNNING      16      4   0.000147684              8        1            56.5049   2.30117       0.0999 |
| train_cifar_b2ac8_00007   RUNNING     256    256   0.00477469               8        1            57.3995   1.5528        0.4497 |
| train_cifar_b2ac8_00008   PENDING     128    256   0.0306227                8                                                    |
| train_cifar_b2ac8_00009   PENDING       2     16   0.0286986                2                                                    |
+----------------------------------------------------------------------------------------------------------------------------------+
(func pid=4846) [1, 10000] loss: 0.466 [repeated 3x across cluster]
(func pid=4848) [2,  4000] loss: 1.141 [repeated 2x across cluster]

Trial train_cifar_b2ac8_00001 finished iteration 1 at 2025-02-03 16:50:16. Total running time: 1min 45s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00001 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  98.42022 |
| time_total_s                                      98.42022 |
| training_iteration                                       1 |
| accuracy                                            0.0979 |
| loss                                               2.31387 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00001 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00001_1_batch_size=4,l1=1,l2=2,lr=0.0134_2025-02-03_16-48-30/checkpoint_000000

Trial train_cifar_b2ac8_00001 completed after 1 iterations at 2025-02-03 16:50:16. Total running time: 1min 45s

Trial train_cifar_b2ac8_00008 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_b2ac8_00008 config             |
+--------------------------------------------------+
| batch_size                                     8 |
| l1                                           128 |
| l2                                           256 |
| lr                                       0.03062 |
+--------------------------------------------------+
(func pid=4839) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00001_1_batch_size=4,l1=1,l2=2,lr=0.0134_2025-02-03_16-48-30/checkpoint_000000) [repeated 3x across cluster]

Trial train_cifar_b2ac8_00005 finished iteration 1 at 2025-02-03 16:50:16. Total running time: 1min 45s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  99.87228 |
| time_total_s                                      99.87228 |
| training_iteration                                       1 |
| accuracy                                            0.3944 |
| loss                                               1.65765 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000000

Trial train_cifar_b2ac8_00004 finished iteration 1 at 2025-02-03 16:50:17. Total running time: 1min 47s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00004 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                 101.19969 |
| time_total_s                                     101.19969 |
| training_iteration                                       1 |
| accuracy                                            0.1023 |
| loss                                               2.40351 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00004 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00004_4_batch_size=4,l1=16,l2=2,lr=0.0567_2025-02-03_16-48-30/checkpoint_000000

Trial train_cifar_b2ac8_00004 completed after 1 iterations at 2025-02-03 16:50:17. Total running time: 1min 47s

Trial train_cifar_b2ac8_00009 started with configuration:
+-------------------------------------------------+
| Trial train_cifar_b2ac8_00009 config            |
+-------------------------------------------------+
| batch_size                                    2 |
| l1                                            2 |
| l2                                           16 |
| lr                                       0.0287 |
+-------------------------------------------------+
(func pid=4840) [1, 12000] loss: 0.386 [repeated 4x across cluster]

Trial train_cifar_b2ac8_00006 finished iteration 2 at 2025-02-03 16:50:26. Total running time: 1min 56s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  53.57944 |
| time_total_s                                     110.08438 |
| training_iteration                                       2 |
| accuracy                                            0.1612 |
| loss                                                2.2435 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000001
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000001) [repeated 3x across cluster]

Trial train_cifar_b2ac8_00007 finished iteration 2 at 2025-02-03 16:50:29. Total running time: 1min 58s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  55.22056 |
| time_total_s                                     112.62005 |
| training_iteration                                       2 |
| accuracy                                             0.495 |
| loss                                               1.39853 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000001

Trial train_cifar_b2ac8_00003 finished iteration 2 at 2025-02-03 16:50:29. Total running time: 1min 59s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00003 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  55.79607 |
| time_total_s                                     112.99106 |
| training_iteration                                       2 |
| accuracy                                            0.2452 |
| loss                                               2.04902 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00003 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00003_3_batch_size=8,l1=64,l2=256,lr=0.0274_2025-02-03_16-48-30/checkpoint_000001

Trial train_cifar_b2ac8_00003 completed after 2 iterations at 2025-02-03 16:50:29. Total running time: 1min 59s

Trial status: 7 RUNNING | 3 TERMINATED
Current time: 2025-02-03 16:50:30. Total running time: 2min 0s
Logical resource usage: 14.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00000   RUNNING        16      1   0.00213327               2                                                    |
| train_cifar_b2ac8_00002   RUNNING       256     64   0.0113784                2                                                    |
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        1            99.8723   1.65765       0.3944 |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        2           110.084    2.2435        0.1612 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        2           112.62     1.39853       0.495  |
| train_cifar_b2ac8_00008   RUNNING       128    256   0.0306227                8                                                    |
| train_cifar_b2ac8_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [2,  2000] loss: 1.659 [repeated 2x across cluster]
(func pid=4838) [1, 16000] loss: 0.288 [repeated 3x across cluster]
(func pid=4846) [1,  4000] loss: 1.166 [repeated 5x across cluster]
(func pid=4838) [1, 18000] loss: 0.256 [repeated 2x across cluster]
(func pid=4847) [2,  6000] loss: 0.519 [repeated 3x across cluster]
Trial status: 7 RUNNING | 3 TERMINATED
Current time: 2025-02-03 16:51:01. Total running time: 2min 30s
Logical resource usage: 14.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00000   RUNNING        16      1   0.00213327               2                                                    |
| train_cifar_b2ac8_00002   RUNNING       256     64   0.0113784                2                                                    |
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        1            99.8723   1.65765       0.3944 |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        2           110.084    2.2435        0.1612 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        2           112.62     1.39853       0.495  |
| train_cifar_b2ac8_00008   RUNNING       128    256   0.0306227                8                                                    |
| train_cifar_b2ac8_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_b2ac8_00008 finished iteration 1 at 2025-02-03 16:51:07. Total running time: 2min 37s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00008 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  51.78252 |
| time_total_s                                      51.78252 |
| training_iteration                                       1 |
| accuracy                                            0.2116 |
| loss                                               2.11696 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00008 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00008_8_batch_size=8,l1=128,l2=256,lr=0.0306_2025-02-03_16-48-30/checkpoint_000000
(func pid=4839) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00008_8_batch_size=8,l1=128,l2=256,lr=0.0306_2025-02-03_16-48-30/checkpoint_000000) [repeated 3x across cluster]
(func pid=4838) [1, 20000] loss: 0.230 [repeated 3x across cluster]

Trial train_cifar_b2ac8_00006 finished iteration 3 at 2025-02-03 16:51:13. Total running time: 2min 42s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000002 |
| time_this_iter_s                                  46.37521 |
| time_total_s                                     156.45959 |
| training_iteration                                       3 |
| accuracy                                            0.1927 |
| loss                                               2.01544 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000002
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000002)
(func pid=4847) [2,  8000] loss: 0.377
(func pid=4840) [1, 18000] loss: 0.257

Trial train_cifar_b2ac8_00007 finished iteration 3 at 2025-02-03 16:51:16. Total running time: 2min 46s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000002 |
| time_this_iter_s                                  47.30755 |
| time_total_s                                      159.9276 |
| training_iteration                                       3 |
| accuracy                                            0.5507 |
| loss                                               1.26755 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000002
(func pid=4853) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000002)
(func pid=4839) [2,  2000] loss: 2.112 [repeated 2x across cluster]

Trial train_cifar_b2ac8_00000 finished iteration 1 at 2025-02-03 16:51:30. Total running time: 3min 0s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00000 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                 174.17129 |
| time_total_s                                     174.17129 |
| training_iteration                                       1 |
| accuracy                                            0.1024 |
| loss                                               2.30605 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00000 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00000_0_batch_size=2,l1=16,l2=1,lr=0.0021_2025-02-03_16-48-30/checkpoint_000000

Trial train_cifar_b2ac8_00000 completed after 1 iterations at 2025-02-03 16:51:30. Total running time: 3min 0s
(func pid=4838) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00000_0_batch_size=2,l1=16,l2=1,lr=0.0021_2025-02-03_16-48-30/checkpoint_000000)

Trial status: 4 TERMINATED | 6 RUNNING
Current time: 2025-02-03 16:51:31. Total running time: 3min 0s
Logical resource usage: 12.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00002   RUNNING       256     64   0.0113784                2                                                    |
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        1            99.8723   1.65765       0.3944 |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        3           156.46     2.01544       0.1927 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        3           159.928    1.26755       0.5507 |
| train_cifar_b2ac8_00008   RUNNING       128    256   0.0306227                8        1            51.7825   2.11696       0.2116 |
| train_cifar_b2ac8_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4840) [1, 20000] loss: 0.231 [repeated 4x across cluster]

Trial train_cifar_b2ac8_00005 finished iteration 2 at 2025-02-03 16:51:39. Total running time: 3min 8s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  82.89381 |
| time_total_s                                     182.76609 |
| training_iteration                                       2 |
| accuracy                                            0.4844 |
| loss                                               1.43074 |
+------------------------------------------------------------+
(func pid=4847) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000001)
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000001
(func pid=4846) [1, 12000] loss: 0.389 [repeated 2x across cluster]
(func pid=4853) [4,  4000] loss: 0.589 [repeated 3x across cluster]

Trial train_cifar_b2ac8_00002 finished iteration 1 at 2025-02-03 16:51:50. Total running time: 3min 20s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00002 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                 194.08201 |
| time_total_s                                     194.08201 |
| training_iteration                                       1 |
| accuracy                                            0.1008 |
| loss                                               2.32728 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00002 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00002_2_batch_size=2,l1=256,l2=64,lr=0.0114_2025-02-03_16-48-30/checkpoint_000000

Trial train_cifar_b2ac8_00002 completed after 1 iterations at 2025-02-03 16:51:50. Total running time: 3min 20s
(func pid=4840) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00002_2_batch_size=2,l1=256,l2=64,lr=0.0114_2025-02-03_16-48-30/checkpoint_000000)
(func pid=4847) [3,  2000] loss: 1.455
(func pid=4846) [1, 14000] loss: 0.334

Trial train_cifar_b2ac8_00008 finished iteration 2 at 2025-02-03 16:51:54. Total running time: 3min 23s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00008 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                   46.4419 |
| time_total_s                                      98.22443 |
| training_iteration                                       2 |
| accuracy                                            0.2085 |
| loss                                               2.07197 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00008 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00008_8_batch_size=8,l1=128,l2=256,lr=0.0306_2025-02-03_16-48-30/checkpoint_000001

Trial train_cifar_b2ac8_00008 completed after 2 iterations at 2025-02-03 16:51:54. Total running time: 3min 23s
(func pid=4839) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00008_8_batch_size=8,l1=128,l2=256,lr=0.0306_2025-02-03_16-48-30/checkpoint_000001)

Trial train_cifar_b2ac8_00006 finished iteration 4 at 2025-02-03 16:51:55. Total running time: 3min 24s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000003 |
| time_this_iter_s                                  41.89828 |
| time_total_s                                     198.35787 |
| training_iteration                                       4 |
| accuracy                                            0.2663 |
| loss                                               1.88928 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000003

Trial train_cifar_b2ac8_00007 finished iteration 4 at 2025-02-03 16:51:59. Total running time: 3min 28s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000003 |
| time_this_iter_s                                   42.6159 |
| time_total_s                                      202.5435 |
| training_iteration                                       4 |
| accuracy                                            0.5658 |
| loss                                               1.22908 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000003

Trial status: 6 TERMINATED | 4 RUNNING
Current time: 2025-02-03 16:52:01. Total running time: 3min 30s
Logical resource usage: 8.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        2           182.766    1.43074       0.4844 |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        4           198.358    1.88928       0.2663 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        4           202.544    1.22908       0.5658 |
| train_cifar_b2ac8_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4846) [1, 16000] loss: 0.292
(func pid=4847) [3,  4000] loss: 0.711
(func pid=4853) [5,  2000] loss: 1.082 [repeated 2x across cluster]
(func pid=4848) [5,  4000] loss: 0.923 [repeated 3x across cluster]
(func pid=4847) [3,  8000] loss: 0.347 [repeated 2x across cluster]

Trial train_cifar_b2ac8_00006 finished iteration 5 at 2025-02-03 16:52:30. Total running time: 3min 59s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000004 |
| time_this_iter_s                                  34.98383 |
| time_total_s                                      233.3417 |
| training_iteration                                       5 |
| accuracy                                            0.3108 |
| loss                                               1.80861 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000004
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000004) [repeated 3x across cluster]

Trial status: 6 TERMINATED | 4 RUNNING
Current time: 2025-02-03 16:52:31. Total running time: 4min 0s
Logical resource usage: 8.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        2           182.766    1.43074       0.4844 |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        5           233.342    1.80861       0.3108 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        4           202.544    1.22908       0.5658 |
| train_cifar_b2ac8_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [3, 10000] loss: 0.273 [repeated 2x across cluster]

Trial train_cifar_b2ac8_00007 finished iteration 5 at 2025-02-03 16:52:36. Total running time: 4min 5s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000004 |
| time_this_iter_s                                  37.10509 |
| time_total_s                                     239.64859 |
| training_iteration                                       5 |
| accuracy                                            0.5593 |
| loss                                               1.26493 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000004
(func pid=4853) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000004)

Trial train_cifar_b2ac8_00009 finished iteration 1 at 2025-02-03 16:52:39. Total running time: 4min 9s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00009 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                 141.67313 |
| time_total_s                                     141.67313 |
| training_iteration                                       1 |
| accuracy                                            0.0982 |
| loss                                               2.33713 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00009 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00009_9_batch_size=2,l1=2,l2=16,lr=0.0287_2025-02-03_16-48-30/checkpoint_000000

Trial train_cifar_b2ac8_00009 completed after 1 iterations at 2025-02-03 16:52:39. Total running time: 4min 9s
(func pid=4846) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00009_9_batch_size=2,l1=2,l2=16,lr=0.0287_2025-02-03_16-48-30/checkpoint_000000)
(func pid=4848) [6,  2000] loss: 1.792

Trial train_cifar_b2ac8_00005 finished iteration 3 at 2025-02-03 16:52:43. Total running time: 4min 13s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000002 |
| time_this_iter_s                                  64.53621 |
| time_total_s                                      247.3023 |
| training_iteration                                       3 |
| accuracy                                             0.516 |
| loss                                                1.3426 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000002
(func pid=4853) [6,  2000] loss: 1.030
(func pid=4847) [4,  2000] loss: 1.348 [repeated 2x across cluster]
(func pid=4853) [6,  4000] loss: 0.534

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2025-02-03 16:53:01. Total running time: 4min 30s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        3           247.302    1.3426        0.516  |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        5           233.342    1.80861       0.3108 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        5           239.649    1.26493       0.5593 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_b2ac8_00006 finished iteration 6 at 2025-02-03 16:53:02. Total running time: 4min 32s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000005 |
| time_this_iter_s                                  32.68137 |
| time_total_s                                     266.02307 |
| training_iteration                                       6 |
| accuracy                                             0.351 |
| loss                                               1.73352 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000005
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000005) [repeated 2x across cluster]
(func pid=4847) [4,  4000] loss: 0.659

Trial train_cifar_b2ac8_00007 finished iteration 6 at 2025-02-03 16:53:10. Total running time: 4min 39s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000005 |
| time_this_iter_s                                  33.92317 |
| time_total_s                                     273.57176 |
| training_iteration                                       6 |
| accuracy                                            0.5622 |
| loss                                               1.24711 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000005
(func pid=4853) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000005)
(func pid=4847) [4,  6000] loss: 0.436
(func pid=4848) [7,  2000] loss: 1.726
(func pid=4853) [7,  2000] loss: 0.998
(func pid=4847) [4,  8000] loss: 0.325

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2025-02-03 16:53:31. Total running time: 5min 0s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        3           247.302    1.3426        0.516  |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        6           266.023    1.73352       0.351  |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        6           273.572    1.24711       0.5622 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [4, 10000] loss: 0.258 [repeated 2x across cluster]

Trial train_cifar_b2ac8_00006 finished iteration 7 at 2025-02-03 16:53:34. Total running time: 5min 4s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000006 |
| time_this_iter_s                                   31.7731 |
| time_total_s                                     297.79617 |
| training_iteration                                       7 |
| accuracy                                            0.3658 |
| loss                                               1.67171 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000006
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000006)

Trial train_cifar_b2ac8_00005 finished iteration 4 at 2025-02-03 16:53:40. Total running time: 5min 10s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000003 |
| time_this_iter_s                                  56.96775 |
| time_total_s                                     304.27004 |
| training_iteration                                       4 |
| accuracy                                            0.5408 |
| loss                                               1.26593 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000003
(func pid=4847) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000003)

Trial train_cifar_b2ac8_00007 finished iteration 7 at 2025-02-03 16:53:43. Total running time: 5min 13s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000006 |
| time_this_iter_s                                  33.55006 |
| time_total_s                                     307.12182 |
| training_iteration                                       7 |
| accuracy                                            0.5773 |
| loss                                               1.25191 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000006
(func pid=4853) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000006)
(func pid=4848) [8,  2000] loss: 1.664 [repeated 2x across cluster]
(func pid=4847) [5,  2000] loss: 1.247
(func pid=4853) [8,  2000] loss: 0.945

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2025-02-03 16:54:01. Total running time: 5min 30s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        4           304.27     1.26593       0.5408 |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        7           297.796    1.67171       0.3658 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        7           307.122    1.25191       0.5773 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_b2ac8_00006 finished iteration 8 at 2025-02-03 16:54:06. Total running time: 5min 36s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000007 |
| time_this_iter_s                                  32.13646 |
| time_total_s                                     329.93263 |
| training_iteration                                       8 |
| accuracy                                            0.3815 |
| loss                                               1.62493 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000007
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000007)
(func pid=4853) [8,  4000] loss: 0.515 [repeated 3x across cluster]

Trial train_cifar_b2ac8_00007 finished iteration 8 at 2025-02-03 16:54:17. Total running time: 5min 46s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000007 |
| time_this_iter_s                                  33.51932 |
| time_total_s                                     340.64114 |
| training_iteration                                       8 |
| accuracy                                            0.5631 |
| loss                                               1.34588 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000007
(func pid=4853) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000007)
(func pid=4848) [9,  2000] loss: 1.606 [repeated 2x across cluster]
(func pid=4848) [9,  4000] loss: 0.799 [repeated 2x across cluster]

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2025-02-03 16:54:31. Total running time: 6min 0s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        4           304.27     1.26593       0.5408 |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        8           329.933    1.62493       0.3815 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        8           340.641    1.34588       0.5631 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_b2ac8_00005 finished iteration 5 at 2025-02-03 16:54:38. Total running time: 6min 7s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000004 |
| time_this_iter_s                                  57.39263 |
| time_total_s                                     361.66267 |
| training_iteration                                       5 |
| accuracy                                            0.5533 |
| loss                                                1.2361 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000004
(func pid=4847) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000004)

Trial train_cifar_b2ac8_00006 finished iteration 9 at 2025-02-03 16:54:38. Total running time: 6min 8s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000008 |
| time_this_iter_s                                  32.05868 |
| time_total_s                                     361.99131 |
| training_iteration                                       9 |
| accuracy                                            0.3796 |
| loss                                               1.63133 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000008
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000008)
(func pid=4853) [9,  4000] loss: 0.507 [repeated 3x across cluster]
(func pid=4847) [6,  2000] loss: 1.204
(func pid=4848) [10,  2000] loss: 1.571

Trial train_cifar_b2ac8_00007 finished iteration 9 at 2025-02-03 16:54:51. Total running time: 6min 20s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000008 |
| time_this_iter_s                                  33.55593 |
| time_total_s                                     374.19706 |
| training_iteration                                       9 |
| accuracy                                             0.568 |
| loss                                                1.3415 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000008
(func pid=4853) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000008)
(func pid=4847) [6,  4000] loss: 0.603
(func pid=4848) [10,  4000] loss: 0.780

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2025-02-03 16:55:01. Total running time: 6min 30s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        5           361.663    1.2361        0.5533 |
| train_cifar_b2ac8_00006   RUNNING        16      4   0.000147684              8        9           361.991    1.63133       0.3796 |
| train_cifar_b2ac8_00007   RUNNING       256    256   0.00477469               8        9           374.197    1.3415        0.568  |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [6,  6000] loss: 0.402 [repeated 2x across cluster]

Trial train_cifar_b2ac8_00006 finished iteration 10 at 2025-02-03 16:55:10. Total running time: 6min 40s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000009 |
| time_this_iter_s                                  31.95598 |
| time_total_s                                     393.94729 |
| training_iteration                                      10 |
| accuracy                                            0.4027 |
| loss                                               1.56735 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00006 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000009

Trial train_cifar_b2ac8_00006 completed after 10 iterations at 2025-02-03 16:55:10. Total running time: 6min 40s
(func pid=4848) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2025-02-03_16-48-30/checkpoint_000009)
(func pid=4853) [10,  4000] loss: 0.495
(func pid=4847) [6,  8000] loss: 0.300

Trial train_cifar_b2ac8_00007 finished iteration 10 at 2025-02-03 16:55:22. Total running time: 6min 51s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000009 |
| time_this_iter_s                                   31.2748 |
| time_total_s                                     405.47186 |
| training_iteration                                      10 |
| accuracy                                            0.5454 |
| loss                                               1.39514 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00007 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000009

Trial train_cifar_b2ac8_00007 completed after 10 iterations at 2025-02-03 16:55:22. Total running time: 6min 51s
(func pid=4853) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2025-02-03_16-48-30/checkpoint_000009)
(func pid=4847) [6, 10000] loss: 0.240

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-02-03 16:55:31. Total running time: 7min 0s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        5           361.663    1.2361        0.5533 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00006   TERMINATED     16      4   0.000147684              8       10           393.947    1.56735       0.4027 |
| train_cifar_b2ac8_00007   TERMINATED    256    256   0.00477469               8       10           405.472    1.39514       0.5454 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_b2ac8_00005 finished iteration 6 at 2025-02-03 16:55:32. Total running time: 7min 1s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000005 |
| time_this_iter_s                                  53.81876 |
| time_total_s                                     415.48143 |
| training_iteration                                       6 |
| accuracy                                             0.572 |
| loss                                               1.19976 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000005
(func pid=4847) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000005)
(func pid=4847) [7,  2000] loss: 1.165
(func pid=4847) [7,  4000] loss: 0.582
(func pid=4847) [7,  6000] loss: 0.386

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-02-03 16:56:01. Total running time: 7min 30s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        6           415.481    1.19976       0.572  |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00006   TERMINATED     16      4   0.000147684              8       10           393.947    1.56735       0.4027 |
| train_cifar_b2ac8_00007   TERMINATED    256    256   0.00477469               8       10           405.472    1.39514       0.5454 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [7,  8000] loss: 0.293
(func pid=4847) [7, 10000] loss: 0.235

Trial train_cifar_b2ac8_00005 finished iteration 7 at 2025-02-03 16:56:19. Total running time: 7min 48s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000006 |
| time_this_iter_s                                   47.0664 |
| time_total_s                                     462.54783 |
| training_iteration                                       7 |
| accuracy                                            0.5654 |
| loss                                               1.19836 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000006
(func pid=4847) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000006)
(func pid=4847) [8,  2000] loss: 1.129

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-02-03 16:56:31. Total running time: 8min 0s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        7           462.548    1.19836       0.5654 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00006   TERMINATED     16      4   0.000147684              8       10           393.947    1.56735       0.4027 |
| train_cifar_b2ac8_00007   TERMINATED    256    256   0.00477469               8       10           405.472    1.39514       0.5454 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [8,  4000] loss: 0.576
(func pid=4847) [8,  6000] loss: 0.383
(func pid=4847) [8,  8000] loss: 0.277
(func pid=4847) [8, 10000] loss: 0.228
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-02-03 16:57:01. Total running time: 8min 30s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        7           462.548    1.19836       0.5654 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00006   TERMINATED     16      4   0.000147684              8       10           393.947    1.56735       0.4027 |
| train_cifar_b2ac8_00007   TERMINATED    256    256   0.00477469               8       10           405.472    1.39514       0.5454 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_b2ac8_00005 finished iteration 8 at 2025-02-03 16:57:05. Total running time: 8min 35s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000007 |
| time_this_iter_s                                  46.57185 |
| time_total_s                                     509.11968 |
| training_iteration                                       8 |
| accuracy                                            0.5888 |
| loss                                               1.15632 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000007
(func pid=4847) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000007)
(func pid=4847) [9,  2000] loss: 1.126
(func pid=4847) [9,  4000] loss: 0.558
(func pid=4847) [9,  6000] loss: 0.372

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-02-03 16:57:31. Total running time: 9min 1s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        8           509.12     1.15632       0.5888 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00006   TERMINATED     16      4   0.000147684              8       10           393.947    1.56735       0.4027 |
| train_cifar_b2ac8_00007   TERMINATED    256    256   0.00477469               8       10           405.472    1.39514       0.5454 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [9,  8000] loss: 0.278
(func pid=4847) [9, 10000] loss: 0.221

Trial train_cifar_b2ac8_00005 finished iteration 9 at 2025-02-03 16:57:52. Total running time: 9min 21s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000008 |
| time_this_iter_s                                  46.59829 |
| time_total_s                                     555.71797 |
| training_iteration                                       9 |
| accuracy                                            0.5921 |
| loss                                               1.13066 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000008
(func pid=4847) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000008)
(func pid=4847) [10,  2000] loss: 1.098

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-02-03 16:58:01. Total running time: 9min 31s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        9           555.718    1.13066       0.5921 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00006   TERMINATED     16      4   0.000147684              8       10           393.947    1.56735       0.4027 |
| train_cifar_b2ac8_00007   TERMINATED    256    256   0.00477469               8       10           405.472    1.39514       0.5454 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [10,  4000] loss: 0.548
(func pid=4847) [10,  6000] loss: 0.367
(func pid=4847) [10,  8000] loss: 0.272
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2025-02-03 16:58:31. Total running time: 10min 1s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00005   RUNNING         8     64   0.000353097              4        9           555.718    1.13066       0.5921 |
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00006   TERMINATED     16      4   0.000147684              8       10           393.947    1.56735       0.4027 |
| train_cifar_b2ac8_00007   TERMINATED    256    256   0.00477469               8       10           405.472    1.39514       0.5454 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) [10, 10000] loss: 0.220

Trial train_cifar_b2ac8_00005 finished iteration 10 at 2025-02-03 16:58:39. Total running time: 10min 9s
+------------------------------------------------------------+
| Trial train_cifar_b2ac8_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000009 |
| time_this_iter_s                                  47.23504 |
| time_total_s                                     602.95301 |
| training_iteration                                      10 |
| accuracy                                            0.5845 |
| loss                                               1.18742 |
+------------------------------------------------------------+
Trial train_cifar_b2ac8_00005 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000009

Trial train_cifar_b2ac8_00005 completed after 10 iterations at 2025-02-03 16:58:39. Total running time: 10min 9s

Trial status: 10 TERMINATED
Current time: 2025-02-03 16:58:39. Total running time: 10min 9s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_b2ac8_00000   TERMINATED     16      1   0.00213327               2        1           174.171    2.30605       0.1024 |
| train_cifar_b2ac8_00001   TERMINATED      1      2   0.013416                 4        1            98.4202   2.31387       0.0979 |
| train_cifar_b2ac8_00002   TERMINATED    256     64   0.0113784                2        1           194.082    2.32728       0.1008 |
| train_cifar_b2ac8_00003   TERMINATED     64    256   0.0274071                8        2           112.991    2.04902       0.2452 |
| train_cifar_b2ac8_00004   TERMINATED     16      2   0.056666                 4        1           101.2      2.40351       0.1023 |
| train_cifar_b2ac8_00005   TERMINATED      8     64   0.000353097              4       10           602.953    1.18742       0.5845 |
| train_cifar_b2ac8_00006   TERMINATED     16      4   0.000147684              8       10           393.947    1.56735       0.4027 |
| train_cifar_b2ac8_00007   TERMINATED    256    256   0.00477469               8       10           405.472    1.39514       0.5454 |
| train_cifar_b2ac8_00008   TERMINATED    128    256   0.0306227                8        2            98.2244   2.07197       0.2085 |
| train_cifar_b2ac8_00009   TERMINATED      2     16   0.0286986                2        1           141.673    2.33713       0.0982 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=4847) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2025-02-03_16-48-30/train_cifar_b2ac8_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2025-02-03_16-48-30/checkpoint_000009)

Best trial config: {'l1': 8, 'l2': 64, 'lr': 0.0003530972286268149, 'batch_size': 4}
Best trial final validation loss: 1.1874244678899646
Best trial final validation accuracy: 0.5845
Best trial test set accuracy: 0.5867

If you run the code, an example output could look like this:

Number of trials: 10/10 (10 TERMINATED)
+-----+--------------+------+------+-------------+--------+---------+------------+
| ... |   batch_size |   l1 |   l2 |          lr |   iter |    loss |   accuracy |
|-----+--------------+------+------+-------------+--------+---------+------------|
| ... |            2 |    1 |  256 | 0.000668163 |      1 | 2.31479 |     0.0977 |
| ... |            4 |   64 |    8 | 0.0331514   |      1 | 2.31605 |     0.0983 |
| ... |            4 |    2 |    1 | 0.000150295 |      1 | 2.30755 |     0.1023 |
| ... |           16 |   32 |   32 | 0.0128248   |     10 | 1.66912 |     0.4391 |
| ... |            4 |    8 |  128 | 0.00464561  |      2 | 1.7316  |     0.3463 |
| ... |            8 |  256 |    8 | 0.00031556  |      1 | 2.19409 |     0.1736 |
| ... |            4 |   16 |  256 | 0.00574329  |      2 | 1.85679 |     0.3368 |
| ... |            8 |    2 |    2 | 0.00325652  |      1 | 2.30272 |     0.0984 |
| ... |            2 |    2 |    2 | 0.000342987 |      2 | 1.76044 |     0.292  |
| ... |            4 |   64 |   32 | 0.003734    |      8 | 1.53101 |     0.4761 |
+-----+--------------+------+------+-------------+--------+---------+------------+

Best trial config: {'l1': 64, 'l2': 32, 'lr': 0.0037339984519545164, 'batch_size': 4}
Best trial final validation loss: 1.5310075663924216
Best trial final validation accuracy: 0.4761
Best trial test set accuracy: 0.4737

Most trials were stopped early in order to avoid wasting resources. The best-performing trial reached a validation accuracy of about 47%, which was confirmed on the test set.
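In the tutorial this summary is produced via Ray's result analysis, but the underlying selection is simple: among all finished trials, pick the one with the lowest final validation loss. The plain-Python sketch below (not Ray's API) illustrates that step using a few rows copied from the results table above, with each trial represented as a dict of its config and final metrics:

```python
# Plain-Python illustration of how the "Best trial" summary is derived.
# Each dict mirrors one row of the trial status table above; the values
# are taken from the actual run output.
trials = [
    {"config": {"l1": 8, "l2": 64, "lr": 0.000353097, "batch_size": 4},
     "loss": 1.18742, "accuracy": 0.5845},
    {"config": {"l1": 16, "l2": 4, "lr": 0.000147684, "batch_size": 8},
     "loss": 1.56735, "accuracy": 0.4027},
    {"config": {"l1": 256, "l2": 256, "lr": 0.00477469, "batch_size": 8},
     "loss": 1.39514, "accuracy": 0.5454},
]

# Select the trial whose final validation loss is smallest.
best = min(trials, key=lambda t: t["loss"])
print("Best trial config:", best["config"])
print("Best trial final validation loss:", best["loss"])
print("Best trial final validation accuracy:", best["accuracy"])
```

In the real workflow, `tune.run` returns an analysis object from which the equivalent trial is obtained (and its checkpoint loaded to compute test-set accuracy); the logic shown here is only the metric comparison.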

So that's it! You can now tune the parameters of your PyTorch models.

Total running time of the script: (10 minutes 27.413 seconds)

Gallery generated by Sphinx-Gallery
