PyTorch RoI pooling implementation deviation

I am trying to implement RoI pooling in PyTorch. Here’s a minimal demo.

import torch
import torch.nn as nn
import torch.nn.functional as F


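def roi_pooling(feature_map, rois, size=(7, 7)):
    """
    :param feature_map: (1, C, H, W)
    :param rois: (1, N, 4) N is the number of boxes, 4 stands for (ltx, lty, w, h)
    :param size: output size
    :return: (N, C, size[0], size[1])
    """
    output = []
    rois_num = rois.size(1)

    for i in range(rois_num):
        # Convert the box to plain Python ints so it can be used for slicing.
        x, y, w, h = rois[0][i].tolist()
        # Crop the RoI from the feature map and max-pool it to the target size.
        output.append(F.adaptive_max_pool2d(feature_map[:, :, y:y+h, x:x+w], size))

    # One (1, C, size[0], size[1]) result per RoI, concatenated along dim 0.
    return torch.cat(output)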


if __name__ == '__main__':
    test_tensor = torch.tensor([
        [0.88, 0.44, 0.14, 0.16, 0.37, 0.77, 0.96, 0.27],
        [0.19, 0.45, 0.57, 0.16, 0.63, 0.29, 0.71, 0.70],
        [0.66, 0.26, 0.82, 0.64, 0.54, 0.73, 0.59, 0.26],
        [0.85, 0.34, 0.76, 0.84, 0.29, 0.75, 0.62, 0.25],
        [0.32, 0.74, 0.21, 0.39, 0.34, 0.03, 0.33, 0.48],
        [0.20, 0.14, 0.16, 0.13, 0.73, 0.65, 0.96, 0.32],
        [0.19, 0.69, 0.09, 0.86, 0.88, 0.07, 0.01, 0.48],
        [0.83, 0.24, 0.97, 0.04, 0.24, 0.35, 0.50, 0.91]
    ])
    test_tensor = test_tensor.view(1, 1, 8, 8)
    rois = torch.tensor([[0, 3, 7, 5]])
    rois = rois.view(1, -1, 4)
    output = roi_pooling(test_tensor, rois, (2, 2))
    print(output)

I implemented this following this website: https://deepsense.ai/region-of-interest-pooling-explained/. The test_tensor and the RoI also come from the website’s example. However, according to the website, the output should be
[ [0.85, 0.84], [0.97, 0.96] ]
instead of my demo’s output:
[ [0.85, 0.96], [0.97, 0.96] ]
So what exactly is wrong with my code? Is the way I split the coordinates wrong?

Your code looks fine.
Apparently the kernels of F.adaptive_max_pool2d overlap for odd input sizes:

a = torch.zeros(1, 1, 5, 5)
a[0, 0, 2, 4] = 1.0
F.adaptive_max_pool2d(a, (2, 2))
> tensor([[[[ 0.,  1.],
          [ 0.,  1.]]]])

I’m not sure this is the intended behavior.
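
For what it’s worth, the overlap can be reproduced without PyTorch. The sketch below assumes adaptive pooling picks window i as [floor(i * in / out), ceil((i + 1) * in / out)). That rule is an assumption on my part: it matches the 5 -> 2 demo above, but it is not stated anywhere in this thread. `adaptive_windows` is a made-up helper name, not a PyTorch function.

```python
def adaptive_windows(in_size, out_size):
    """Return (start, end) index pairs, end exclusive, for each output cell.

    Assumed rule: start = floor(i * in / out), end = ceil((i + 1) * in / out).
    Integer arithmetic is used to avoid floating-point rounding.
    """
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = -((-(i + 1) * in_size) // out_size)  # ceiling division
        windows.append((start, end))
    return windows


print(adaptive_windows(5, 2))  # [(0, 3), (2, 5)] -> index 2 is in both windows
print(adaptive_windows(7, 2))  # [(0, 4), (3, 7)] -> index 3 is in both windows
print(adaptive_windows(4, 2))  # [(0, 2), (2, 4)] -> no overlap
```

Under this rule the windows overlap whenever the output size does not divide the input size, not only for odd inputs: 6 -> 4 gives (0, 2), (1, 3), (3, 5), (4, 6).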

What does “overlap for odd input sizes” mean? Is the example on that website wrong? How should I implement it properly?
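
One way to get the website’s numbers is to split the RoI into non-overlapping bins, where every bin gets floor(length / bins) cells and the last bin also takes the remainder. That this is how the website splits the region is my assumption; it does reproduce the expected output for this example. The sketch works on plain nested lists to keep it framework-free, and `split_edges` and `roi_max_pool` are hypothetical names.

```python
def split_edges(length, n_bins):
    """Non-overlapping bin edges; the last bin absorbs the remainder."""
    if length < n_bins:
        raise ValueError("RoI side must be at least as long as the output side")
    step = length // n_bins
    return [i * step for i in range(n_bins)] + [length]


def roi_max_pool(feature_map, roi, size=(2, 2)):
    """feature_map: 2-D list (H x W); roi: (x, y, w, h); size: (out_h, out_w)."""
    x, y, w, h = roi
    rows = split_edges(h, size[0])
    cols = split_edges(w, size[1])
    return [
        [
            max(
                feature_map[r][c]
                for r in range(y + rows[i], y + rows[i + 1])
                for c in range(x + cols[j], x + cols[j + 1])
            )
            for j in range(size[1])
        ]
        for i in range(size[0])
    ]


feature_map = [
    [0.88, 0.44, 0.14, 0.16, 0.37, 0.77, 0.96, 0.27],
    [0.19, 0.45, 0.57, 0.16, 0.63, 0.29, 0.71, 0.70],
    [0.66, 0.26, 0.82, 0.64, 0.54, 0.73, 0.59, 0.26],
    [0.85, 0.34, 0.76, 0.84, 0.29, 0.75, 0.62, 0.25],
    [0.32, 0.74, 0.21, 0.39, 0.34, 0.03, 0.33, 0.48],
    [0.20, 0.14, 0.16, 0.13, 0.73, 0.65, 0.96, 0.32],
    [0.19, 0.69, 0.09, 0.86, 0.88, 0.07, 0.01, 0.48],
    [0.83, 0.24, 0.97, 0.04, 0.24, 0.35, 0.50, 0.91],
]
result = roi_max_pool(feature_map, (0, 3, 7, 5), (2, 2))
print(result)  # [[0.85, 0.84], [0.97, 0.96]]
```

Here the 5 rows are split as 2 + 3 and the 7 columns as 3 + 4, so the 0.96 at row 5, column 6 only lands in the bottom-right bin.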

@ptrblck highlighted your question.
Say you pool width / height X to a new width / height Y < X. Then if Y does not divide X, you’re in a bit of a pickle. Adaptive pooling seems to implicitly “copy” some data in the middle. A less surprising alternative might be to pad the input to a multiple of Y (“same” or “reflection” padding might be more intuitive than zeros), as the padded values will then be part of the same pooling region. Or you could crop the edges off.
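
A minimal 1-D sketch of the two alternatives, in plain Python so the bins are easy to inspect. The helper names are made up for illustration, repeating the last value stands in for “same”-style padding, and the sample row is row 5 of the RoI from the first post. It assumes the input is at least as long as the output size.

```python
def pad_to_multiple(values, out_size):
    """Repeat the last value until len(values) is a multiple of out_size."""
    extra = (out_size - len(values) % out_size) % out_size
    return list(values) + [values[-1]] * extra


def crop_to_multiple(values, out_size):
    """Drop trailing values until len(values) is a multiple of out_size."""
    keep = len(values) - len(values) % out_size
    return list(values[:keep])


def max_pool_1d(values, out_size):
    """Non-overlapping max pooling over equally sized bins."""
    if len(values) % out_size:
        raise ValueError("length must be a multiple of out_size")
    step = len(values) // out_size
    return [max(values[i * step:(i + 1) * step]) for i in range(out_size)]


row = [0.20, 0.14, 0.16, 0.13, 0.73, 0.65, 0.96]  # 7 values, pooled to 2

padded = max_pool_1d(pad_to_multiple(row, 2), 2)    # bins of 4 after padding to 8
cropped = max_pool_1d(crop_to_multiple(row, 2), 2)  # bins of 3 after cropping to 6
print(padded)   # [0.2, 0.96]
print(cropped)  # [0.2, 0.73]
```

The two options give different results as soon as the maximum sits on the edge that gets cropped, so which one is preferable depends on whether the border values matter.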

Best regards

Thomas

Which strategy is the correct one for the Fast R-CNN method, the former or the latter?
Also, I’ve looked at the source code of ChainerCV’s implementation of RoI pooling. Which strategy does it use? Here’s the CPU version of the code:

    def forward_cpu(self, inputs):
        self.retain_inputs((1,))
        self._bottom_data_shape = inputs[0].shape

        bottom_data, bottom_rois = inputs
        channels, height, width = bottom_data.shape[1:]
        n_rois = bottom_rois.shape[0]
        # `numpy.zeros` needs to be used because the arrays can be
        # returned without having some of its values updated.
        top_data = numpy.zeros((n_rois, channels, self.outh, self.outw),
                               dtype=numpy.float32)
        self.argmax_data = numpy.zeros(top_data.shape, numpy.int32)

        for i_roi in six.moves.range(n_rois):
            idx, xmin, ymin, xmax, ymax = bottom_rois[i_roi]
            xmin = int(round(xmin * self.spatial_scale))
            xmax = int(round(xmax * self.spatial_scale))
            ymin = int(round(ymin * self.spatial_scale))
            ymax = int(round(ymax * self.spatial_scale))
            roi_width = max(xmax - xmin + 1, 1)
            roi_height = max(ymax - ymin + 1, 1)
            strideh = 1. * roi_height / self.outh
            stridew = 1. * roi_width / self.outw

            for outh in six.moves.range(self.outh):
                sliceh, lenh = _roi_pooling_slice(
                    outh, strideh, height, ymin)
                if sliceh.stop <= sliceh.start:
                    continue
                for outw in six.moves.range(self.outw):
                    slicew, lenw = _roi_pooling_slice(
                        outw, stridew, width, xmin)
                    if slicew.stop <= slicew.start:
                        continue
                    roi_data = bottom_data[int(idx), :, sliceh, slicew]\
                        .reshape(channels, -1)
                    top_data[i_roi, :, outh, outw] =\
                        numpy.max(roi_data, axis=1)

                    # get the max idx respect to feature_maps coordinates
                    max_idx_slice = numpy.unravel_index(
                        numpy.argmax(roi_data, axis=1), (lenh, lenw))
                    max_idx_slice_h = max_idx_slice[0] + sliceh.start
                    max_idx_slice_w = max_idx_slice[1] + slicew.start
                    max_idx_slice = max_idx_slice_h * width + max_idx_slice_w
                    self.argmax_data[i_roi, :, outh, outw] = max_idx_slice
        return top_data,
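
The snippet above does not include `_roi_pooling_slice`, so the strategy cannot be read off it directly. As a rough sketch, suppose the helper floors the start and ceils the end of each bin and then clamps to the feature map. That is an assumption, and `roi_pooling_slice_sketch` below is a hypothetical stand-in, not the library’s code. The stride and size arithmetic is copied from the quoted loop, and the RoI is the one from the first post written as corners (xmin=0, ymin=3, xmax=6, ymax=7).

```python
import math


def roi_pooling_slice_sketch(out_index, stride, max_size, roi_offset):
    """HYPOTHETICAL stand-in for the helper that is not shown above."""
    start = int(math.floor(out_index * stride))
    end = int(math.ceil((out_index + 1) * stride))
    # Shift into feature-map coordinates and clamp to the valid range.
    start = min(max(start + roi_offset, 0), max_size)
    end = min(max(end + roi_offset, 0), max_size)
    return slice(start, end), end - start


height, width = 8, 8
xmin, ymin, xmax, ymax = 0, 3, 6, 7
outh, outw = 2, 2

# Same arithmetic as in the quoted forward_cpu loop.
roi_width = max(xmax - xmin + 1, 1)
roi_height = max(ymax - ymin + 1, 1)
strideh = 1. * roi_height / outh
stridew = 1. * roi_width / outw

row_slices = [roi_pooling_slice_sketch(i, strideh, height, ymin)[0]
              for i in range(outh)]
col_slices = [roi_pooling_slice_sketch(j, stridew, width, xmin)[0]
              for j in range(outw)]
print(row_slices)  # [slice(3, 6, None), slice(5, 8, None)]
print(col_slices)  # [slice(0, 4, None), slice(3, 7, None)]
```

If the real helper behaves like this, row 5 and column 3 fall into two bins each, so the bins overlap instead of being padded or cropped. With different rounding in the helper the conclusion would change.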