JumanjiWrapper
- torchrl.envs.JumanjiWrapper(*args, **kwargs)
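Jumanji environment wrapper.
Jumanji offers a Jax-based vectorized simulation framework. TorchRL's wrapper incurs some overhead for the jax-to-torch conversion, but computational graphs can still be built on top of the simulated trajectories, which allows backpropagation through the rollout.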
GitHub: https://github.com/instadeepai/jumanji
Doc: https://instadeepai.github.io/jumanji/
Paper: https://arxiv.org/abs/2306.09884
- Parameters:
env (jumanji.env.Environment) – the env to wrap.
categorical_action_encoding (bool, optional) – if True, categorical specs will be converted to the TorchRL equivalent (torchrl.data.Categorical), otherwise a one-hot encoding will be used (torchrl.data.OneHot). Defaults to False.
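- Keyword Arguments:
from_pixels (bool, optional) – whether the environment should render its output. This will drastically impact the environment throughput. Only the first environment will be rendered. See render() for more information. Defaults to False.
frame_skip (int, optional) – if provided, indicates for how many steps the same action is to be repeated. The returned observation will be the last observation of the sequence, and the reward will be the sum of the rewards across those steps.
device (torch.device, optional) – if provided, the device onto which the data is to be cast. Defaults to torch.device("cpu").
batch_size (torch.Size, optional) – the batch size of the environment. With jumanji, this indicates the number of vectorized environments. Defaults to torch.Size([]).
allow_done_after_reset (bool, optional) – if True, the environment is allowed to be done immediately after reset() is called. Defaults to False.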
- Variables:
available_envs – environments available to build
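The `categorical_action_encoding` and `frame_skip` options can be illustrated with a minimal, stdlib-only sketch. It mimics only the semantics described above and is not TorchRL's implementation. The names `to_one_hot`, `from_one_hot`, `ToyEnv` and `step_with_frame_skip` are hypothetical. Stopping the repetition early once an episode ends is an assumption that the documentation does not state.

```python
# Hypothetical sketch of two documented options; NOT TorchRL's actual code.
from dataclasses import dataclass
from typing import List, Optional, Tuple


def to_one_hot(index: int, n: int) -> List[int]:
    """Categorical action index -> one-hot vector of length n (cf. OneHot specs)."""
    if not 0 <= index < n:
        raise ValueError(f"index {index} out of range for {n} actions")
    return [1 if i == index else 0 for i in range(n)]


def from_one_hot(vec: List[int]) -> int:
    """One-hot vector -> categorical action index (cf. Categorical specs)."""
    if sum(vec) != 1 or any(v not in (0, 1) for v in vec):
        raise ValueError("not a valid one-hot vector")
    return vec.index(1)


@dataclass
class ToyEnv:
    """Hypothetical env: observation is the step counter, reward equals the action."""
    t: int = 0

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        done = self.t >= 5  # the episode ends after 5 steps
        return self.t, float(action), done


def step_with_frame_skip(
    env: ToyEnv, action: int, frame_skip: int
) -> Tuple[Optional[int], float, bool]:
    """Repeat `action` up to frame_skip times: keep the last observation, sum the rewards."""
    if frame_skip < 1:
        raise ValueError("frame_skip must be >= 1")
    obs: Optional[int] = None
    total_reward, done = 0.0, False
    for _ in range(frame_skip):
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:  # assumption: stop repeating once the episode has ended
            break
    return obs, total_reward, done
```

With `frame_skip=3` and action `2`, a fresh `ToyEnv` returns `(3, 6.0, False)`: the third observation and three summed rewards. A second call stops after two steps because the episode ends, and returns `(5, 4.0, True)`.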
Examples:
>>> import jumanji
>>> from torchrl.envs import JumanjiWrapper
>>> base_env = jumanji.make("Snake-v1")
>>> env = JumanjiWrapper(base_env)
>>> env.set_seed(0)
>>> td = env.reset()
>>> td["action"] = env.action_spec.rand()
>>> td = env.step(td)
>>> print(td)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
        action_mask: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
        done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
        grid: Tensor(shape=torch.Size([12, 12, 5]), device=cpu, dtype=torch.float32, is_shared=False),
        next: TensorDict(
            fields={
                action_mask: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
                done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                grid: Tensor(shape=torch.Size([12, 12, 5]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, is_shared=False),
                state: TensorDict(
                    fields={
                        action_mask: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
                        body: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.bool, is_shared=False),
                        body_state: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.int32, is_shared=False),
                        fruit_position: TensorDict(
                            fields={
                                col: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                                row: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False)},
                            batch_size=torch.Size([]),
                            device=cpu,
                            is_shared=False),
                        head_position: TensorDict(
                            fields={
                                col: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                                row: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False)},
                            batch_size=torch.Size([]),
                            device=cpu,
                            is_shared=False),
                        key: Tensor(shape=torch.Size([2]), device=cpu, dtype=torch.int32, is_shared=False),
                        length: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                        step_count: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                        tail: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.bool, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=cpu,
                    is_shared=False),
                step_count: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([]),
            device=cpu,
            is_shared=False),
        state: TensorDict(
            fields={
                action_mask: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
                body: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.bool, is_shared=False),
                body_state: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.int32, is_shared=False),
                fruit_position: TensorDict(
                    fields={
                        col: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                        row: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=cpu,
                    is_shared=False),
                head_position: TensorDict(
                    fields={
                        col: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                        row: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=cpu,
                    is_shared=False),
                key: Tensor(shape=torch.Size([2]), device=cpu, dtype=torch.int32, is_shared=False),
                length: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                step_count: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                tail: Tensor(shape=torch.Size([12, 12]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([]),
            device=cpu,
            is_shared=False),
        step_count: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
        terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([]),
    device=cpu,
    is_shared=False)
>>> print(env.available_envs)
['Game2048-v1', 'Maze-v0', 'Cleaner-v0', 'CVRP-v1', 'MultiCVRP-v0', 'Minesweeper-v0', 'RubiksCube-v0', 'Knapsack-v1', 'Sudoku-v0', 'Snake-v1', 'TSP-v1', 'Connector-v2', 'MMST-v0', 'GraphColoring-v0', 'RubiksCube-partly-scrambled-v0', 'RobotWarehouse-v0', 'Tetris-v0', 'BinPack-v2', 'Sudoku-very-easy-v0', 'JobShop-v0']
To take advantage of Jumanji, one will usually run multiple environments at the same time.
>>> import jumanji
>>> from torchrl.envs import JumanjiWrapper
>>> base_env = jumanji.make("Snake-v1")
>>> env = JumanjiWrapper(base_env, batch_size=[10])
>>> env.set_seed(0)
>>> td = env.reset()
>>> td["action"] = env.action_spec.rand()
>>> td = env.step(td)
In the following example, we iteratively test different batch sizes and report the execution time of a short rollout:
Examples:
>>> from torch.utils.benchmark import Timer
>>> for batch_size in [4, 16, 128]:
...     timer = Timer(
...     '''
... env.rollout(100)
... ''',
...     setup=f'''
... from torchrl.envs import JumanjiWrapper
... import jumanji
... env = JumanjiWrapper(jumanji.make('Snake-v1'), batch_size=[{batch_size}])
... env.set_seed(0)
... env.rollout(2)
... ''')
...     print(batch_size, timer.timeit(number=10))
4
env.rollout(100)
setup: [...]
  Median: 122.40 ms
  2 measurements, 1 runs per measurement, 1 thread
16
env.rollout(100)
setup: [...]
  Median: 134.39 ms
  2 measurements, 1 runs per measurement, 1 thread
128
env.rollout(100)
setup: [...]
  Median: 172.31 ms
  2 measurements, 1 runs per measurement, 1 thread
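The reported medians can be turned into an amortized per-environment cost with simple arithmetic. This shows why batching pays off. The sketch below reuses only the three medians printed above. `amortized_costs` is a hypothetical helper name. The calculation assumes that each `env.rollout(100)` call executed a comparable number of steps at every batch size. That assumption is uncertain, because a rollout may stop early when an environment reports done.

```python
# Median wall-clock time (ms) of one env.rollout(100) call, copied from the output above.
MEDIANS_MS = {4: 122.40, 16: 134.39, 128: 172.31}


def amortized_costs(medians_ms, baseline):
    """Return {batch_size: (ms per environment, time relative to baseline batch size)}.

    Hypothetical helper. It assumes comparable step counts across batch sizes.
    """
    base = medians_ms[baseline]
    return {b: (m / b, m / base) for b, m in medians_ms.items()}


for batch_size, (per_env, relative) in amortized_costs(MEDIANS_MS, baseline=4).items():
    print(f"{batch_size:>4} envs: {per_env:6.2f} ms/env, {relative:.2f}x the time of 4 envs")
```

Under that assumption, the amortized cost falls from about 30.60 ms per environment at a batch size of 4 to about 1.35 ms at 128. Meanwhile, the total wall-clock time grows only about 1.41x for 32x more environments.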