I am trying to implement RoI pooling in PyTorch. Here is a minimal demo:
import torch
import torch.nn as nn
import torch.nn.functional as F


def roi_pooling(feature_map, rois, size=(7, 7)):
    """
    :param feature_map: (1, C, H, W)
    :param rois: (1, N, 4) N refers to bbox num, 4 represent (ltx, lty, w, h)
    :param size: output size
    :return: (N, C, size[0], size[1])
    """
    output = []
    rois_num = rois.size(1)
    for i in range(rois_num):
        # Unpack the RoI as plain Python ints: top-left x, top-left y, width, height
        x, y, w, h = rois[0][i].tolist()
        # Crop the RoI from the feature map and pool it down to the target size
        output.append(F.adaptive_max_pool2d(feature_map[:, :, y:y+h, x:x+w], size))
    # Concatenate along the first dimension: one pooled map per RoI
    return torch.cat(output)


if __name__ == '__main__':
    test_tensor = torch.tensor([
        [0.88, 0.44, 0.14, 0.16, 0.37, 0.77, 0.96, 0.27],
        [0.19, 0.45, 0.57, 0.16, 0.63, 0.29, 0.71, 0.70],
        [0.66, 0.26, 0.82, 0.64, 0.54, 0.73, 0.59, 0.26],
        [0.85, 0.34, 0.76, 0.84, 0.29, 0.75, 0.62, 0.25],
        [0.32, 0.74, 0.21, 0.39, 0.34, 0.03, 0.33, 0.48],
        [0.20, 0.14, 0.16, 0.13, 0.73, 0.65, 0.96, 0.32],
        [0.19, 0.69, 0.09, 0.86, 0.88, 0.07, 0.01, 0.48],
        [0.83, 0.24, 0.97, 0.04, 0.24, 0.35, 0.50, 0.91]
    ])
    test_tensor = test_tensor.view(1, 1, 8, 8)
    # One RoI: top-left corner (x=0, y=3), width 7, height 5
    rois = torch.tensor([[0, 3, 7, 5]])
    rois = rois.view(1, -1, 4)
    output = roi_pooling(test_tensor, rois, (2, 2))
    print(output)
I implemented this by following this article: https://deepsense.ai/region-of-interest-pooling-explained/. The test_tensor and the RoI both come from the article's example. However, according to the article, the output should be
[ [0.85, 0.84], [0.97, 0.96] ]
instead of my demo’s output:
[ [0.85, 0.96], [0.97, 0.96] ]
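The two results differ only in the top-right cell, so it may help to compare which rows and columns each bin covers. The sketch below rests on two assumptions: that adaptive pooling places bin i at [floor(i * L / n), ceil((i + 1) * L / n)), and that the article splits the region at floor(i * L / n) without overlap. The helper names `adaptive_bins` and `floor_split_bins` are hypothetical and are not part of PyTorch.

```python
import math


def adaptive_bins(length, n_bins):
    """Half-open [start, end) ranges, assuming the floor/ceil rule for adaptive pooling."""
    return [
        (math.floor(i * length / n_bins), math.ceil((i + 1) * length / n_bins))
        for i in range(n_bins)
    ]


def floor_split_bins(length, n_bins):
    """Non-overlapping [start, end) ranges with edges at floor(i * length / n_bins)."""
    edges = [(i * length) // n_bins for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


# The RoI is 5 rows tall and 7 columns wide; the output is 2 x 2.
print(adaptive_bins(5, 2))     # rows:    [(0, 3), (2, 5)]
print(adaptive_bins(7, 2))     # columns: [(0, 4), (3, 7)]
print(floor_split_bins(5, 2))  # rows:    [(0, 2), (2, 5)]
print(floor_split_bins(7, 2))  # columns: [(0, 3), (3, 7)]
```

Under these assumptions the adaptive bins overlap: the top-right bin spans RoI rows 0 to 2 (feature-map rows 3 to 5) and columns 3 to 6, so it includes the 0.96 at feature-map row 5, column 6. With the non-overlapping split, that value belongs only to the bottom-right bin, and the largest value left in the top-right bin is 0.84.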
So what exactly is wrong with my code? Is the way the RoI is split into bins wrong?
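One way to test the bin-splitting hypothesis is to pool each bin by hand instead of calling F.adaptive_max_pool2d. The following is only a sketch: it assumes the article uses non-overlapping bins with edges at floor(i * length / n), and that every RoI is at least as large as the output size so that no bin is empty. The function name `roi_pooling_floor_split` is hypothetical.

```python
import torch


def roi_pooling_floor_split(feature_map, rois, size=(2, 2)):
    """Max-pool each RoI over non-overlapping bins (assumed floor-based edges).

    feature_map: (1, C, H, W); rois: (1, N, 4) integer tensor as (ltx, lty, w, h).
    Returns a tensor of shape (N, C, size[0], size[1]).
    """
    out_h, out_w = size
    pooled_rois = []
    for x, y, w, h in rois[0].tolist():
        region = feature_map[:, :, y:y + h, x:x + w]
        # Bin edges along each axis, e.g. h=5, out_h=2 gives [0, 2, 5]
        row_edges = [(i * h) // out_h for i in range(out_h + 1)]
        col_edges = [(j * w) // out_w for j in range(out_w + 1)]
        rows = []
        for i in range(out_h):
            cols = []
            for j in range(out_w):
                cell = region[:, :, row_edges[i]:row_edges[i + 1],
                              col_edges[j]:col_edges[j + 1]]
                cols.append(cell.amax(dim=(2, 3)))   # max over the bin: (1, C)
            rows.append(torch.stack(cols, dim=-1))   # (1, C, out_w)
        pooled_rois.append(torch.stack(rows, dim=-2))  # (1, C, out_h, out_w)
    return torch.cat(pooled_rois)


feature_map = torch.tensor([
    [0.88, 0.44, 0.14, 0.16, 0.37, 0.77, 0.96, 0.27],
    [0.19, 0.45, 0.57, 0.16, 0.63, 0.29, 0.71, 0.70],
    [0.66, 0.26, 0.82, 0.64, 0.54, 0.73, 0.59, 0.26],
    [0.85, 0.34, 0.76, 0.84, 0.29, 0.75, 0.62, 0.25],
    [0.32, 0.74, 0.21, 0.39, 0.34, 0.03, 0.33, 0.48],
    [0.20, 0.14, 0.16, 0.13, 0.73, 0.65, 0.96, 0.32],
    [0.19, 0.69, 0.09, 0.86, 0.88, 0.07, 0.01, 0.48],
    [0.83, 0.24, 0.97, 0.04, 0.24, 0.35, 0.50, 0.91],
]).view(1, 1, 8, 8)
rois = torch.tensor([[0, 3, 7, 5]]).view(1, -1, 4)

pooled = roi_pooling_floor_split(feature_map, rois, (2, 2))
print(pooled)  # values: [[0.85, 0.84], [0.97, 0.96]]
```

On the demo input this reproduces the article's values, which is consistent with the difference coming from where the bin edges fall rather than from how the RoI is cropped. The nested Python loops are written for clarity, not speed.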