Elegant implementation of Spatial Pyramid Pooling layer?

Is it possible to implement Spatial Pyramid Pooling (SPP) layer in PyTorch only, without using C/CUDA code?

An SPP layer essentially needs to pool a variably-sized feature map into a fixed-size feature map. For instance, an SPP layer with a single output size of 2×2 would pool over a 6×6 feature map with 3×3 windows, and over an 8×8 map with 4×4 windows.

Yes, you could use the functional version of the pooling operation with a dynamically computed kernel size that depends on the input size:

import torch.nn.functional as F

def spatial_pyramid_pooling(input, output_size):
    # expects a 4D (N, C, H, W) tensor with square spatial dimensions
    assert input.dim() == 4 and input.size(2) == input.size(3)
    return F.max_pool2d(input, kernel_size=input.size(2) // output_size)
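
As a quick sanity check (a sketch that assumes the spatial size divides evenly by the output size, as in the 6×6 and 8×8 examples above), both inputs end up pooled to the same 2×2 map:

import torch

x6 = torch.randn(1, 3, 6, 6)
x8 = torch.randn(1, 3, 8, 8)
print(spatial_pyramid_pooling(x6, 2).shape)  # torch.Size([1, 3, 2, 2])
print(spatial_pyramid_pooling(x8, 2).shape)  # torch.Size([1, 3, 2, 2])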

Hi, in Torch I can directly call inn.SpatialPyramidPooling({8,8},{4,4},{2,2},{1,1}). Can I do the same thing in some way in PyTorch?

What is the easiest way to do so?

Thanks

This may not have been available at the time of the original discussion, but PyTorch 1.4 has nn.AdaptiveMaxPool2d, which is designed to handle exactly this use case of converting variable-size feature maps to a fixed size. You can see my implementation of the entire SPP layer on GitHub.
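
For anyone reading this later, here is a minimal sketch of such a layer (an illustration only, not the linked GitHub code; the pyramid levels 4, 2 and 1 are just example values):

import torch
import torch.nn as nn

class SPPLayer(nn.Module):
    # pools the input at several fixed resolutions and concatenates the flattened results
    def __init__(self, levels=(4, 2, 1)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(n) for n in levels])

    def forward(self, x):  # x: (N, C, H, W) with arbitrary H and W
        n = x.size(0)
        feats = [pool(x).view(n, -1) for pool in self.pools]
        return torch.cat(feats, dim=1)  # (N, C * sum(l * l for l in levels))

The output length is C * (4² + 2² + 1²) regardless of H and W, so it can be fed directly into a fixed-size fully connected layer.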


I am implementing spatial pyramid pooling in my network, but I am still confused about how to implement it. I think there is an implementation problem in my class, because my class only contains one conv layer:

class DWConv(nn.Module):

    def spatial_pyramid_pool(self, previous_conv, previous_conv_size, out_pool_size=[4, 2, 1]):
        '''
          previous_conv: a tensor holding the output of the previous convolution layer
          previous_conv_size: an int vector [height, width] giving the spatial size of the previous convolution layer's feature map
          out_pool_size: an int vector of the expected output sizes of the max pooling levels

          returns: a tensor of shape [num_sample x n], the concatenation of the multi-level pooling outputs
        '''
        num_sample = previous_conv.shape[0]
        for i in range(len(out_pool_size)):
            h_wid = int(math.ceil(previous_conv_size[0] / out_pool_size[i]))
            w_wid = int(math.ceil(previous_conv_size[1] / out_pool_size[i]))
            h_pad = (h_wid * out_pool_size[i] - previous_conv_size[0] + 1) / 2
            w_pad = (w_wid * out_pool_size[i] - previous_conv_size[1] + 1) / 2
            maxpool = torch.nn.MaxPool2d((h_wid, w_wid), stride=(h_wid, w_wid), padding=(int(h_pad), int(w_pad)))
            x = maxpool(previous_conv)
            if (i == 0):
                spp = x.view(num_sample, -1)
            else:
                spp = torch.cat((spp, x.view(num_sample, -1)), 1)
        return spp
    
    def __init__(self, dim=768):
        super(DWConv, self).__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, bias=True, groups=dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        x = x.transpose(1, 2).view(B, C, H, W)
        x = self.dwconv(x)
        x = x.flatten(2).transpose(1, 2)
        x = self.spatial_pyramid_pool(previous_conv_size=x.shape)
        return x

Traceback

 x = self.spatial_pyramid_pool(previous_conv_size=x.shape)
TypeError: spatial_pyramid_pool() missing 1 required positional argument: 'previous_conv'
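
The TypeError is raised because forward passes only previous_conv_size, while previous_conv is a required positional argument. Note also that by the time spatial_pyramid_pool is called, x has already been flattened back to (B, N, C), whereas the method expects a 4D map. A minimal sketch of a corrected call (assuming the pooling is meant to run on the 4D activation right after the depthwise conv) could look like:

    def forward(self, x, H, W):
        B, N, C = x.shape
        x = x.transpose(1, 2).view(B, C, H, W)
        x = self.dwconv(x)
        # run SPP on the 4D map, passing both the tensor and its spatial size
        x = self.spatial_pyramid_pool(x, previous_conv_size=[H, W])
        return x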