MultiScaleRoIAlign

class torchvision.ops.MultiScaleRoIAlign(featmap_names: List[str], output_size: Union[int, Tuple[int], List[int]], sampling_ratio: int, *, canonical_scale: int = 224, canonical_level: int = 4)

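Multi-scale RoIAlign pooling, useful for object detection with or without an FPN.

It infers the pooling scale using the heuristic specified in eq. 1 of the Feature Pyramid Network paper. The keyword-only parameters canonical_scale and canonical_level correspond to 224 and k0=4 in eq. 1, respectively, and have the following meaning: canonical_level is the target pyramid level from which to pool a region of interest with w x h = canonical_scale x canonical_scale.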
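The level-selection heuristic from eq. 1, k = floor(k0 + log2(sqrt(w * h) / 224)), can be sketched in plain Python as follows. This is an illustrative sketch, not the library's code: the function name `map_box_to_level`, the clamping to a `[k_min, k_max]` range, and the small `eps` guard against floating-point rounding are assumptions made for this example.

```python
import math


def map_box_to_level(box, k_min, k_max, canonical_scale=224, canonical_level=4, eps=1e-6):
    """Pick a pyramid level for one box given as (x1, y1, x2, y2).

    Hypothetical helper: k_min and k_max are the lowest and highest
    pyramid levels available (an assumption of this sketch).
    The box is assumed to have positive area (x1 < x2 and y1 < y2).
    """
    x1, y1, x2, y2 = box
    # Side length of a square with the same area as the box: sqrt(w * h).
    side = math.sqrt((x2 - x1) * (y2 - y1))
    # Eq. 1 of the FPN paper, with k0 = canonical_level and 224 = canonical_scale.
    level = math.floor(canonical_level + math.log2(side / canonical_scale) + eps)
    # Clamp to the levels that actually exist in the pyramid.
    return min(max(level, k_min), k_max)


# A 224 x 224 box lands on the canonical level; halving the side drops one level.
level_canonical = map_box_to_level((0, 0, 224, 224), k_min=2, k_max=5)
level_half = map_box_to_level((0, 0, 112, 112), k_min=2, k_max=5)
```

With the defaults, larger boxes are pooled from coarser (higher) levels and smaller boxes from finer (lower) ones, which is the intent of the heuristic.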

Parameters:

featmap_names (List[str]) – the names of the feature maps that will be used for the pooling.
output_size (int, Tuple[int] or List[int]) – output size for the pooled region
sampling_ratio (int) – sampling ratio for ROIAlign
canonical_scale (int, optional) – canonical_scale for LevelMapper
canonical_level (int, optional) – canonical_level for LevelMapper

Examples

>>> import torch
>>> import torchvision
>>> from collections import OrderedDict
>>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
>>> i = OrderedDict()
>>> i['feat1'] = torch.rand(1, 5, 64, 64)
>>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
>>> i['feat3'] = torch.rand(1, 5, 16, 16)
>>> # create some random bounding boxes in (x1, y1, x2, y2) format
>>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
>>> # original image size, before computing the feature maps
>>> image_sizes = [(512, 512)]
>>> output = m(i, [boxes], image_sizes)
>>> print(output.shape)
torch.Size([6, 5, 3, 3])

forward(x: Dict[str, Tensor], boxes: List[Tensor], image_shapes: List[Tuple[int, int]]) → Tensor

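Parameters:

x (OrderedDict[Tensor]) – feature maps for each level. They are assumed to all have the same number of channels, but they can have different sizes.
boxes (List[Tensor[N, 4]]) – boxes to be used to perform the pooling operation, in (x1, y1, x2, y2) format and expressed in the image reference size, not the feature map reference. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.
image_shapes (List[Tuple[height, width]]) – the size of each image before it was fed to the CNN to obtain the feature maps. This allows the scale factor to be inferred for each level to be pooled.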
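To illustrate how image_shapes lets a per-level scale be inferred, the sketch below compares a feature map's spatial size with the original image size and rounds the ratio to the nearest power of two. The helper name `infer_scale` and the rounding rule are assumptions made for this example, not a description of the library's internals.

```python
import math


def infer_scale(feature_hw, image_hw):
    """Estimate the scale between an image and one of its feature maps.

    Hypothetical helper: feature_hw and image_hw are (height, width) tuples.
    The ratio along each dimension is snapped to the nearest power of two,
    on the assumption that the backbone downsamples by powers of two.
    """
    scales = []
    for feat_size, img_size in zip(feature_hw, image_hw):
        approx = feat_size / img_size
        scales.append(2.0 ** round(math.log2(approx)))
    if scales[0] != scales[1]:
        raise ValueError("height and width imply different scales")
    return scales[0]


# Sizes taken from the example above: a 512 x 512 image with
# 64 x 64 and 16 x 16 feature maps.
scale_feat1 = infer_scale((64, 64), (512, 512))
scale_feat3 = infer_scale((16, 16), (512, 512))
```

Under this assumption, a box expressed in image coordinates is multiplied by the inferred scale to locate it on the corresponding feature map, which is why boxes are given in the image reference size.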

Returns:

result (Tensor)
