LazyMemmapStorage
- class torchrl.data.replay_buffers.LazyMemmapStorage(max_size: int, *, scratch_dir=None, device: device = 'cpu', ndim: int = 1, existsok: bool = False)[source]
A memory-mapped storage for tensors and tensordicts.
- Parameters:
max_size (int) – size of the storage, i.e. the maximum number of elements stored in the buffer.
- Keyword Arguments:
scratch_dir (str or path) – directory where the memmap-tensors will be written.
device (torch.device, optional) – device where the sampled tensors will be stored and sent. Default is torch.device("cpu"). If None is provided, the device is automatically gathered from the first batch of data passed to it. This is not enabled by default to avoid data being placed on the GPU by mistake, causing OOM issues.
ndim (int, optional) – the number of dimensions to account for when measuring the storage size. For instance, a storage of shape [3, 4] has capacity 3 if ndim=1 and 12 if ndim=2. Defaults to 1.
existsok (bool, optional) – whether an error should be raised if any of the tensors already exists on disk. Defaults to True. If False, the tensor will be opened as-is, not overwritten.
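The ndim rule above can be illustrated with a small arithmetic sketch. The `capacity` helper below is hypothetical, not part of torchrl; it only mirrors how the storage size is counted over the leading dimensions:

```python
from math import prod

def capacity(shape, ndim=1):
    """Hypothetical helper: count the elements spanned by the first
    `ndim` dimensions, mirroring how the storage size is measured."""
    return prod(shape[:ndim])

# A storage of shape [3, 4]:
print(capacity([3, 4], ndim=1))  # 3 slots along the first dimension
print(capacity([3, 4], ndim=2))  # 12 slots across the first two dimensions
```

With ndim=1 only the first dimension is indexable, so max_size bounds the number of rows; with ndim=2 every element of the leading 3x4 grid counts toward max_size.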
Note

When checkpointing a LazyMemmapStorage, one can provide a path identical to where the storage is already stored to avoid executing long copies of data that is already stored on disk. This will only work if the default TensorStorageCheckpointer checkpointer is used.

Examples:

>>> from tensordict import TensorDict
>>> from torchrl.data import TensorStorage, LazyMemmapStorage, ReplayBuffer
>>> import tempfile
>>> from pathlib import Path
>>> import time
>>> td = TensorDict(a=0, b=1).expand(1000).clone()
>>> # We pass a path that is <main_ckpt_dir>/storage to LazyMemmapStorage
>>> rb_memmap = ReplayBuffer(storage=LazyMemmapStorage(10_000_000, scratch_dir="dump/storage"))
>>> rb_memmap.extend(td);
>>> # Checkpointing in `dump` is a zero-copy, as the data is already in `dump/storage`
>>> rb_memmap.dumps(Path("./dump"))
Examples:

>>> data = TensorDict({
...     "some data": torch.randn(10, 11),
...     ("some", "nested", "data"): torch.randn(10, 11, 12),
... }, batch_size=[10, 11])
>>> storage = LazyMemmapStorage(100)
>>> storage.set(range(10), data)
>>> len(storage)  # only the first dimension is considered as indexable
10
>>> storage.get(0)
TensorDict(
    fields={
        some data: MemoryMappedTensor(shape=torch.Size([11]), device=cpu, dtype=torch.float32, is_shared=False),
        some: TensorDict(
            fields={
                nested: TensorDict(
                    fields={
                        data: MemoryMappedTensor(shape=torch.Size([11, 12]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([11]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([11]),
            device=cpu,
            is_shared=False)},
    batch_size=torch.Size([11]),
    device=cpu,
    is_shared=False)
This class also supports tensorclass data.

Examples:

>>> from tensordict import tensorclass
>>> @tensorclass
... class MyClass:
...     foo: torch.Tensor
...     bar: torch.Tensor
>>> data = MyClass(foo=torch.randn(10, 11), bar=torch.randn(10, 11, 12), batch_size=[10, 11])
>>> storage = LazyMemmapStorage(10)
>>> storage.set(range(10), data)
>>> storage.get(0)
MyClass(
    bar=MemoryMappedTensor(shape=torch.Size([11, 12]), device=cpu, dtype=torch.float32, is_shared=False),
    foo=MemoryMappedTensor(shape=torch.Size([11]), device=cpu, dtype=torch.float32, is_shared=False),
    batch_size=torch.Size([11]),
    device=cpu,
    is_shared=False)
- attach(buffer: Any) → None
This function attaches a sampler to this storage.
Buffers that read from this storage must be included as attached entities by calling this method. This guarantees that when the data in the storage changes, the components are made aware of the change, even if the storage is shared with other buffers (e.g., priority samplers).
- Parameters:
buffer – the object that reads from this storage.
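The attach contract described above can be pictured as a minimal observer pattern: the storage keeps a list of attached readers and notifies each of them whenever its contents change. The sketch below is an illustration of the idea only, not torchrl's implementation; the class and method names (`ToyStorage`, `ToyBuffer`, `mark_update`) are invented for this example:

```python
from typing import Any, List

class ToyStorage:
    """Invented stand-in for a storage that notifies attached buffers."""
    def __init__(self):
        self._data = {}
        self._attached: List[Any] = []

    def attach(self, buffer: Any) -> None:
        # Register a buffer so it is informed of future data changes.
        if buffer not in self._attached:
            self._attached.append(buffer)

    def set(self, index: int, value: Any) -> None:
        self._data[index] = value
        # Notify every attached buffer that this slot changed, so shared
        # components (e.g. priority samplers) can refresh their state.
        for buffer in self._attached:
            buffer.mark_update(index)

class ToyBuffer:
    """Invented buffer that tracks which storage indices went stale."""
    def __init__(self):
        self.dirty = set()

    def mark_update(self, index: int) -> None:
        self.dirty.add(index)

storage = ToyStorage()
buf = ToyBuffer()
storage.attach(buf)
storage.set(0, "x")
print(buf.dirty)  # {0}
```

Without the attach call, the buffer would never learn that index 0 changed, which is why torchrl requires readers of a shared storage to be registered this way.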