Hello all,
I am having difficulty understanding the RoI pooling layer implemented in torchvision (`torchvision.ops.roi_pool`). I have attached sample code below:
```python
import torch
import torchvision
import numpy as np

feature_map = np.array([
    [0.70, 0.41, 0.38, 1.23, 0.24],
    [0.14, 0.45, 0.31, 0.73, 3.22],
    [0.11, 0.41, 0.79, 0.69, 0.44],
    [1.47, 0.25, 0.09, 0.32, 2.98],
    [0.48, 0.87, 0.77, 0.26, 0.11],
])
feature_map = torch.tensor(feature_map, requires_grad=False).float()
# (batch, channel, h, w) -> (1, 1, 5, 5)
feature_map = feature_map.unsqueeze(0).unsqueeze(0)

# boxes -> (1, 5); each row is (batch_index, x1, y1, x2, y2)
boxes = np.array([
    [0, 0, 0, 4, 4],
])
boxes = torch.tensor(boxes, requires_grad=False).float()

# RoI pooling layer with a 3x3 output
pool = torchvision.ops.roi_pool(input=feature_map, boxes=boxes, output_size=3)
print(pool)
```
The output of the above code is:

```
tensor([[[[0.7000, 1.2300, 3.2200],
          [1.4700, 0.7900, 3.2200],
          [1.4700, 0.8700, 2.9800]]]])
```
Here the 5x5 map is divided into sub-regions so that the output is 3x3. A 5x5 region can be split into a 3x3 grid of sub-regions in several ways (there are many, but I only drew two).
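For reference, one rule that produces overlapping sub-regions is the floor/ceil scheme from the original Caffe Fast R-CNN RoI pooling layer: the ROI is treated as inclusive (`x2 - x1 + 1` pixels wide), each bin starts at `floor(p * bin_size)` and ends (exclusive) at `ceil((p + 1) * bin_size)`. I am assuming torchvision's kernel follows the same rule with `spatial_scale=1.0`; `bin_edges` is a hypothetical helper for illustration only, not part of torchvision.

```python
import math


def bin_edges(roi_start, roi_end, pooled):
    """Hypothetical helper: split the inclusive pixel range [roi_start, roi_end]
    into `pooled` bins using an assumed Caffe-style floor/ceil rule.
    Returns (start, end) pairs with `end` exclusive."""
    length = max(roi_end - roi_start + 1, 1)  # inclusive end: 0..4 -> 5 pixels
    bin_size = length / pooled                # 5 / 3 ~= 1.667 pixels per bin
    edges = []
    for p in range(pooled):
        start = math.floor(p * bin_size) + roi_start      # round start down
        end = math.ceil((p + 1) * bin_size) + roi_start   # round end up
        edges.append((start, end))
    return edges


print(bin_edges(0, 4, 3))  # [(0, 2), (1, 4), (3, 5)]
```

Under this assumption the 5-pixel side is split into {0, 1}, {1, 2, 3} and {3, 4}, so neighbouring bins share one pixel. A real kernel would also clip the edges to the feature-map bounds, which this sketch omits.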
- How are the sub-regions decided in the PyTorch code? (I checked the paper and it gives no detail on this.) Is there an algorithm PyTorch is following?
- My understanding is that sub-regions should not overlap, but that doesn't explain the output of the code above: 3.22 appears at both positions (0, 2) and (1, 2) of the output.
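A small NumPy sketch that reproduces the output above, assuming the floor/ceil bin rule of Caffe-style RoI pooling (which I believe torchvision's kernel follows), `spatial_scale=1.0`, a single channel and a single ROI. `roi_max_pool` is a hypothetical helper written for illustration, not a torchvision function.

```python
import math

import numpy as np


def roi_max_pool(fmap, box, output_size):
    """Single-channel sketch of Caffe-style RoI max pooling (assumed rule).

    fmap: 2-D array (H, W); box: (x1, y1, x2, y2) in feature-map pixels,
    assuming spatial_scale = 1.0. Bin starts are floored and bin ends are
    ceiled, so neighbouring bins can share a row or column.
    """
    h, w = fmap.shape
    # round half up (same as C's round() for non-negative coordinates)
    x1, y1, x2, y2 = (int(math.floor(c + 0.5)) for c in box)
    roi_h = max(y2 - y1 + 1, 1)  # inclusive box: 0..4 -> 5 pixels
    roi_w = max(x2 - x1 + 1, 1)
    bin_h = roi_h / output_size
    bin_w = roi_w / output_size
    out = np.zeros((output_size, output_size), dtype=fmap.dtype)
    for ph in range(output_size):
        # row range [hs, he), shifted by the ROI origin and clipped to the map
        hs = min(max(math.floor(ph * bin_h) + y1, 0), h)
        he = min(max(math.ceil((ph + 1) * bin_h) + y1, 0), h)
        for pw in range(output_size):
            ws = min(max(math.floor(pw * bin_w) + x1, 0), w)
            we = min(max(math.ceil((pw + 1) * bin_w) + x1, 0), w)
            region = fmap[hs:he, ws:we]
            out[ph, pw] = region.max() if region.size else 0  # empty bin -> 0
    return out


feature_map = np.array([
    [0.70, 0.41, 0.38, 1.23, 0.24],
    [0.14, 0.45, 0.31, 0.73, 3.22],
    [0.11, 0.41, 0.79, 0.69, 0.44],
    [1.47, 0.25, 0.09, 0.32, 2.98],
    [0.48, 0.87, 0.77, 0.26, 0.11],
], dtype=np.float32)

print(roi_max_pool(feature_map, (0, 0, 4, 4), 3))
```

With this rule the row bins for the 5-pixel ROI are {0, 1}, {1, 2, 3} and {3, 4} (columns are the same), so output cells (0, 2) and (1, 2) both cover row 1, column 4, where 3.22 sits; 1.47 is repeated for the same reason. In this sketch, bins overlap whenever the ROI size is not a multiple of the output size.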
The versions I am using are:
torch = 1.10.2+cpu
torchvision = 0.11.3+cpu
Thanks.