Is it possible to implement a Spatial Pyramid Pooling (SPP) layer in PyTorch only, without using C/CUDA code?
An SPP layer essentially needs to pool a variably-sized feature map into a fixed-size feature map. For instance, an SPP layer with a single output size of 2×2 would pool over a 6×6 feature map with 3×3 windows, and over an 8×8 map with 4×4 windows.
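To make the window-size arithmetic concrete, here is a minimal sketch (the function name and shapes are only illustrative, not part of any existing API) of how one pooling level can be derived from the input size and the desired output size:

import math
import torch
import torch.nn as nn

def pool_to_fixed_size(x, out_size):
    # x: [B, C, H, W]; choose the window so the output is out_size x out_size
    h, w = x.shape[2], x.shape[3]
    kernel = (math.ceil(h / out_size), math.ceil(w / out_size))
    stride = kernel  # non-overlapping windows
    return nn.functional.max_pool2d(x, kernel_size=kernel, stride=stride, ceil_mode=True)

# 6x6 input -> 3x3 windows -> 2x2 output; 8x8 input -> 4x4 windows -> 2x2 output
print(pool_to_fixed_size(torch.randn(1, 1, 6, 6), 2).shape)  # torch.Size([1, 1, 2, 2])
print(pool_to_fixed_size(torch.randn(1, 1, 8, 8), 2).shape)  # torch.Size([1, 1, 2, 2])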
Hi, in (Lua) Torch I can use spatial pyramid pooling directly by calling inn.SpatialPyramidPooling({8,8},{4,4},{2,2},{1,1}). Can I do something similar in PyTorch?
This may not have been available at the time of the original discussion; however, PyTorch 1.4 has nn.AdaptiveMaxPool2d, which is designed to handle exactly this use case of converting a variable-size feature map to a fixed-size one. You can see my implementation of the entire SPP layer on GitHub.
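For reference, here is a minimal sketch of an SPP layer built on nn.AdaptiveMaxPool2d (the output sizes 4, 2, 1 and the class name SPPLayer are just illustrative, not taken from the linked repository):

import torch
import torch.nn as nn

class SPPLayer(nn.Module):
    """Pools a variable-sized feature map into fixed-size bins and concatenates them."""
    def __init__(self, out_pool_sizes=(4, 2, 1)):
        super().__init__()
        # one adaptive pooling level per pyramid size
        self.pools = nn.ModuleList(nn.AdaptiveMaxPool2d(s) for s in out_pool_sizes)

    def forward(self, x):
        # x: [B, C, H, W] with arbitrary H and W
        b = x.size(0)
        feats = [pool(x).view(b, -1) for pool in self.pools]
        # output length is fixed: C * sum(s * s for s in out_pool_sizes)
        return torch.cat(feats, dim=1)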
I am implementing spatial pyramid pooling in my network, but I am still confused about how to implement it. I think there is an implementation problem in my class, because my class only contains one Conv layer:
import math
import torch
import torch.nn as nn

class DWConv(nn.Module):
    def __init__(self, dim=768):
        super(DWConv, self).__init__()
        # depthwise 3x3 convolution (groups=dim keeps one filter per channel)
        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, bias=True, groups=dim)

    def spatial_pyramid_pool(self, previous_conv, previous_conv_size, out_pool_size=[4, 2, 1]):
        '''
        previous_conv: a 4-D tensor [batch, channels, height, width] from the previous convolution layer
        previous_conv_size: an int vector [height, width] giving the spatial size of that feature map
        out_pool_size: an int vector of expected output sizes of the max pooling levels
        returns: a tensor of shape [batch x n], the concatenation of the multi-level pooling outputs
        '''
        num_sample = previous_conv.shape[0]
        for i in range(len(out_pool_size)):
            # choose the window so the pooled output has out_pool_size[i] bins per dimension
            h_wid = int(math.ceil(previous_conv_size[0] / out_pool_size[i]))
            w_wid = int(math.ceil(previous_conv_size[1] / out_pool_size[i]))
            h_pad = (h_wid * out_pool_size[i] - previous_conv_size[0] + 1) // 2
            w_pad = (w_wid * out_pool_size[i] - previous_conv_size[1] + 1) // 2
            maxpool = nn.MaxPool2d((h_wid, w_wid), stride=(h_wid, w_wid), padding=(h_pad, w_pad))
            x = maxpool(previous_conv)
            if i == 0:
                spp = x.view(num_sample, -1)
            else:
                spp = torch.cat((spp, x.view(num_sample, -1)), 1)
        return spp

    def forward(self, x, H, W):
        B, N, C = x.shape
        # reshape the token sequence [B, N, C] back to a feature map [B, C, H, W]
        x = x.transpose(1, 2).view(B, C, H, W)
        x = self.dwconv(x)
        # pool while x is still 4-D; pass both the tensor and its spatial size
        x = self.spatial_pyramid_pool(x, previous_conv_size=[H, W])
        return x
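For example, the module above could be called like this (the tensor shapes here are only an assumption to illustrate the call signature):

# tokens of a 14x14 feature map with 768 channels, batch size 2 (assumed shapes)
x = torch.randn(2, 14 * 14, 768)
model = DWConv(dim=768)
out = model(x, H=14, W=14)
# output length per sample: 768 * (4*4 + 2*2 + 1*1) = 16128
print(out.shape)  # torch.Size([2, 16128])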