OpenSpielEnv
- torchrl.envs.OpenSpielEnv(*args, **kwargs)
Google DeepMind OpenSpiel environment wrapper, built from a game string.
GitHub: https://github.com/google-deepmind/open_spiel
Documentation: https://openspiel.readthedocs.io/en/latest/index.html
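- Parameters:
game_string (str) – the name of the game to wrap. Must be part of available_envs.
- Keyword Arguments:
device (torch.device, optional) – if provided, the device onto which the data is cast. Defaults to None.
batch_size (torch.Size, optional) – the batch size of the environment. Defaults to torch.Size([]).
allow_done_after_reset (bool, optional) – if True, the environment is allowed to be in a done state right after reset() is called. Defaults to False.
group_map (MarlGroupMapType or Dict[str, List[str]], optional) – how to group agents in tensordicts for input/output. See MarlGroupMapType for more information. Defaults to ALL_IN_ONE_GROUP.
categorical_actions (bool, optional) – if True, categorical specs are converted to the TorchRL equivalent (torchrl.data.Categorical); otherwise a one-hot encoding is used (torchrl.data.OneHot). Defaults to False.
return_state (bool, optional) – if True, "state" is included in the output of reset() and step(). The state can be passed to reset() to reset to that state rather than to the initial state. Defaults to False.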
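The `group_map` argument accepts either a `MarlGroupMapType` member or a plain dictionary mapping group names to lists of agent names. As a rough, library-free sketch of what two common layouts could look like for a two-player game (the agent names `player_0` and `player_1` and both helper functions are assumptions made for illustration, not names taken from this page):

```python
from typing import Dict, List


def all_in_one_group(agent_names: List[str]) -> Dict[str, List[str]]:
    """Put every agent under a single "agents" key (intended to mirror ALL_IN_ONE_GROUP)."""
    return {"agents": list(agent_names)}


def one_group_per_agent(agent_names: List[str]) -> Dict[str, List[str]]:
    """Give each agent its own group, keyed by the agent's name."""
    return {name: [name] for name in agent_names}


# Hypothetical agent names for a two-player game such as chess.
agent_names = ["player_0", "player_1"]

shared = all_in_one_group(agent_names)
separate = one_group_per_agent(agent_names)
print(shared)
print(separate)
```

A dictionary in either shape matches the `Dict[str, List[str]]` form of `group_map`. With the single-group layout, per-agent entries are stacked under one `agents` key.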
- Variables:
available_envs – the environments available to build
Examples
>>> from torchrl.envs import OpenSpielEnv
>>> from tensordict import TensorDict
>>> env = OpenSpielEnv("chess", return_state=True)
>>> td = env.reset()
>>> td = env.step(env.full_action_spec.rand())
>>> print(td)
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([2, 4672]), device=cpu, dtype=torch.int64, is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        next: TensorDict(
            fields={
                agents: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([2, 20, 8, 8]), device=cpu, dtype=torch.float32, is_shared=False),
                        reward: Tensor(shape=torch.Size([2, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([2]),
                    device=None,
                    is_shared=False),
                current_player: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                state: NonTensorData(data=FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 674 , batch_size=torch.Size([]), device=None),
                terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)
>>> print(env.available_envs)
['2048', 'add_noise', 'amazons', 'backgammon', ...]
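The `action` entry in the output above has a trailing dimension of 4672 because `categorical_actions` defaults to `False`, so each agent's action is one-hot encoded. A minimal pure-Python sketch of how the two encodings relate (the helper names are hypothetical, and this is not TorchRL's implementation):

```python
from typing import List

NUM_ACTIONS = 4672  # size of the last action dimension in the chess example


def to_one_hot(index: int, num_actions: int = NUM_ACTIONS) -> List[int]:
    """Encode an action index as a one-hot vector of length num_actions."""
    if not 0 <= index < num_actions:
        raise ValueError(f"action index {index} out of range")
    vec = [0] * num_actions
    vec[index] = 1
    return vec


def to_index(one_hot: List[int]) -> int:
    """Recover the action index from a one-hot vector."""
    if sum(one_hot) != 1:
        raise ValueError("expected exactly one non-zero entry")
    return one_hot.index(1)


encoded = to_one_hot(42)
print(len(encoded), to_index(encoded))
```

The one-hot form carries one slot per possible action, while the categorical form stores only the index. Both describe the same choice.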
reset() can restore a specific state, rather than the initial state, as long as return_state=True.
>>> from torchrl.envs import OpenSpielEnv
>>> from tensordict import TensorDict
>>> env = OpenSpielEnv("chess", return_state=True)
>>> td = env.reset()
>>> td = env.step(env.full_action_spec.rand())
>>> td_restore = td["next"]
>>> td = env.step(env.full_action_spec.rand())
>>> # Current state is not equal to `td_restore`
>>> (td["next"] == td_restore).all()
False
>>> td = env.reset(td_restore)
>>> # After resetting, the current state is equal to `td_restore`
>>> (td == td_restore).all()
True
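The snapshot-and-restore contract shown above can also be illustrated without any dependencies. The following toy sketch uses a hypothetical `ToyGame` class (not an OpenSpiel or TorchRL class) whose state is simply the list of moves played. It only mirrors the sequence of calls in the example: step, keep a snapshot, step again, then reset to the snapshot.

```python
import copy
from typing import List, Optional


class ToyGame:
    """Hypothetical stand-in for a stateful game; not an OpenSpiel class."""

    def __init__(self) -> None:
        self.moves: List[int] = []

    def reset(self, state: Optional[List[int]] = None) -> List[int]:
        # With no argument, return to the initial (empty) position;
        # otherwise restore the supplied snapshot.
        self.moves = [] if state is None else copy.copy(state)
        return copy.copy(self.moves)

    def step(self, move: int) -> List[int]:
        self.moves.append(move)
        # Return a copy so callers hold an independent snapshot.
        return copy.copy(self.moves)


game = ToyGame()
game.reset()
snapshot = game.step(1)   # state after the first move
current = game.step(2)    # state after the second move
differs = current != snapshot
restored = game.reset(snapshot)
print(differs, restored == snapshot)
```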