Understanding RoI Pool of Fast R-CNN

Hello all,

I am having difficulty understanding the RoI pooling layer implemented in torchvision as torchvision.ops.roi_pool. My sample code is below:

import torch
import torchvision

import numpy as np
feature_map = np.array([
    [0.70, 0.41, 0.38, 1.23, 0.24],
    [0.14, 0.45, 0.31, 0.73, 3.22],
    [0.11, 0.41, 0.79, 0.69, 0.44],
    [1.47, 0.25, 0.09, 0.32, 2.98],
    [0.48, 0.87, 0.77, 0.26, 0.11],
])


feature_map = torch.tensor(feature_map, requires_grad=False).float()

# (batch, channel, h, w) -> (1, 1, 5, 5)
feature_map = feature_map.unsqueeze(0).unsqueeze(0)

# boxes -> (K, 5) = (1, 5); each row is (batch_index, x1, y1, x2, y2)
boxes = np.array([
    [0, 0, 0, 4, 4],
])
boxes = torch.tensor(boxes, requires_grad=False).float()

# roi pooling layer of 3x3
pool = torchvision.ops.roi_pool(input=feature_map, boxes=boxes, output_size=3)
print(pool)

The output of the above code is:

tensor([[[[0.7000, 1.2300, 3.2200],
          [1.4700, 0.7900, 3.2200],
          [1.4700, 0.8700, 2.9800]]]])

Here the 5x5 map is divided so that the output is 3x3. A 5x5 region can be split into a 3x3 grid in many ways; I drew only two of them:

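[image: first example partition of the 5x5 map into 3x3 sub-regions]

[image: second example partition of the 5x5 map into 3x3 sub-regions]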

  • How are the sub-regions decided in the PyTorch code? (I checked the paper and found no detail on this.) Is there a specific algorithm that PyTorch follows?
  • My understanding is that sub-regions should not overlap, but that does not explain the output above: 3.22 appears at both positions (0, 2) and (1, 2) of the output.
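For reference, here is a minimal pure-Python sketch of one binning rule that would explain the overlap: each bin starts at floor(p * roi_len / out_size) and ends at ceil((p + 1) * roi_len / out_size), with the box end treated as inclusive. This appears to be what torchvision's roi_pool kernel does, but treat that as an assumption rather than a confirmed reading of the source. The helper names bin_edges and roi_max_pool are hypothetical, and the sketch assumes integer box coordinates and spatial_scale = 1.

```python
# Sketch only: floor/ceil binning for RoI max pooling on a single 2D map.
# bin_edges and roi_max_pool are hypothetical helpers, not torchvision API.

def bin_edges(roi_start, roi_end, out_size, limit):
    """Return half-open [lo, hi) index ranges, one per output bin, along one axis."""
    roi_len = max(roi_end - roi_start + 1, 1)  # box end is treated as inclusive
    edges = []
    for p in range(out_size):
        lo = (p * roi_len) // out_size             # floor(p * roi_len / out_size)
        hi = -((-(p + 1) * roi_len) // out_size)   # ceil((p + 1) * roi_len / out_size)
        # Shift by the RoI offset and clip to the feature map bounds.
        lo = min(max(lo + roi_start, 0), limit)
        hi = min(max(hi + roi_start, 0), limit)
        edges.append((lo, hi))
    return edges


def roi_max_pool(fmap, box, out_size):
    """Max-pool one (x1, y1, x2, y2) box of a 2D list into out_size x out_size."""
    x1, y1, x2, y2 = box
    rows = bin_edges(y1, y2, out_size, len(fmap))
    cols = bin_edges(x1, x2, out_size, len(fmap[0]))
    out = []
    for r_lo, r_hi in rows:
        out_row = []
        for c_lo, c_hi in cols:
            cells = [fmap[r][c] for r in range(r_lo, r_hi) for c in range(c_lo, c_hi)]
            out_row.append(max(cells) if cells else 0.0)  # empty bins give 0
        out.append(out_row)
    return out


feature_map = [
    [0.70, 0.41, 0.38, 1.23, 0.24],
    [0.14, 0.45, 0.31, 0.73, 3.22],
    [0.11, 0.41, 0.79, 0.69, 0.44],
    [1.47, 0.25, 0.09, 0.32, 2.98],
    [0.48, 0.87, 0.77, 0.26, 0.11],
]

edges = bin_edges(0, 4, 3, 5)
pooled = roi_max_pool(feature_map, (0, 0, 4, 4), 3)
print(edges)   # [(0, 2), (1, 4), (3, 5)]
print(pooled)  # [[0.7, 1.23, 3.22], [1.47, 0.79, 3.22], [1.47, 0.87, 2.98]]
```

Under this rule the bins along each axis are rows/columns 0-1, 1-3, and 3-4, so adjacent bins share index 1 and index 3. The value 3.22 at row 1, column 4 then falls in both the first and second row bins, which matches its appearance at (0, 2) and (1, 2).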

The versions I am using are:

torch = 1.10.2+cpu
torchvision = 0.11.3+cpu

Thanks.
