Adaptive_avg_pool2d vs avg_pool2d

What is the difference between adaptive_avg_pool2d and avg_pool2d under torch.nn.functional? What does adaptive mean?


In avg_pool2d, we define the kernel size, stride, and padding for the pooling operation, and the function simply applies that window at every valid input position. For example, avg_pool2d with kernel=3, stride=2 and padding=1 reduces a 5x5 tensor (HxW) to 3x3, and a 7x7 tensor to 4x4.
In adaptive_avg_pool2d, we define the output size we require at the end of the pooling operation, and PyTorch infers what pooling parameters to use to achieve it. For example, adaptive_avg_pool2d with output_size=(3,3) reduces both a 5x5 and a 7x7 tensor to 3x3.
This is especially useful when your input size varies and you have fully connected layers at the top of your CNN, since those layers require a fixed-size input.
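A quick way to see the difference (a minimal sketch, assuming PyTorch is installed; the kernel/stride/padding values are just illustrative):

```python
import torch
import torch.nn.functional as F

x5 = torch.randn(1, 1, 5, 5)
x7 = torch.randn(1, 1, 7, 7)

# avg_pool2d: the output size depends on the input size
print(F.avg_pool2d(x5, kernel_size=3, stride=2, padding=1).shape)  # torch.Size([1, 1, 3, 3])
print(F.avg_pool2d(x7, kernel_size=3, stride=2, padding=1).shape)  # torch.Size([1, 1, 4, 4])

# adaptive_avg_pool2d: the output size is fixed, regardless of the input size
print(F.adaptive_avg_pool2d(x5, output_size=(3, 3)).shape)  # torch.Size([1, 1, 3, 3])
print(F.adaptive_avg_pool2d(x7, output_size=(3, 3)).shape)  # torch.Size([1, 1, 3, 3])
```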


Thanks Mazhar! Based on what you said, it seems to me ‘adaptive’ is in the sense of adapting the kernel size and stride and maybe padding to the output size, not in the sense of varying the weights while taking the average. In other words, the average is the plain average (sum divided by the number of elements in the kernel), not a weighted average of elements that fall within the kernel. Is my understanding correct?


That’s correct, LMA, to the best of my knowledge.

I am not sure how nn.AdaptiveAvgPool2d works…
I defined a tensor of shape (1,1,3,3)

inp = torch.tensor([[[[1, 2., 3], [4, 5, 6], [7, 8, 9]]]], dtype=torch.float)

print(inp.shape)
torch.Size([1, 1, 3, 3])
print(inp)
tensor([[[[1., 2., 3.],
          [4., 5., 6.],
          [7., 8., 9.]]]])

Then I applied AdaptiveAvgPool2d to it, and the result was not what I had expected.

out = nn.AdaptiveAvgPool2d((2, 2))(inp)
print(out)
tensor([[[[3., 4.],
          [6., 7.]]]])

I thought the result would look like

tensor([[[[5., 6.],
          [8., 9.]]]])

Please correct my understanding of how adaptive pooling works. Thanks in advance! :slight_smile:


Hi n0obcoder,
For an output_size of (2,2) and an input size of (3,3), the inferred kernel size is (2,2) with stride (1,1). Accordingly,
the output will be

tensor([[[[(1+2+4+5)/4., (2+3+5+6)/4.],     = tensor([[[[3., 4.],
          [(4+5+7+8)/4., (5+6+8+9)/4.]]]])              [6., 7.]]]])

If we were using max pooling, the output would be

tensor([[[[max(1,2,4,5), max(2,3,5,6)],     = tensor([[[[5., 6.],
          [max(4,5,7,8), max(5,6,8,9)]]]])              [8., 9.]]]])

Hope this helps!
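The arithmetic above can be checked directly. For this input, adaptive average pooling gives the same result as a plain (unweighted) 2x2 average pool with stride 1 — which also confirms that no weighting is involved:

```python
import torch
import torch.nn.functional as F

inp = torch.tensor([[[[1., 2., 3.],
                      [4., 5., 6.],
                      [7., 8., 9.]]]])

out = F.adaptive_avg_pool2d(inp, (2, 2))
print(out)
# tensor([[[[3., 4.],
#           [6., 7.]]]])

# Same result as a plain 2x2 average pool with stride 1
same = F.avg_pool2d(inp, kernel_size=2, stride=1)
print(torch.equal(out, same))  # True
```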


My bad. I was using AdaptiveAvgPool2d and expecting it to work like AdaptiveMaxPool2d :stuck_out_tongue:
Thanks for the reply anyway :D


By the way, do you have any idea how to design an autoencoder architecture for varying-size input images?
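One common approach is to use AdaptiveAvgPool2d in the encoder to get a fixed-size bottleneck, and F.interpolate in the decoder to restore the original spatial size. Here is a rough sketch (the channel counts, bottleneck size, and class name are all illustrative assumptions, not a tested design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VarSizeAutoencoder(nn.Module):
    # Hypothetical layer sizes; adjust for your data.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # fixed-size bottleneck for any input
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)                      # (N, 8, 4, 4) regardless of input size
        h = F.interpolate(h, size=x.shape[-2:])  # upsample back to the input size
        return self.decoder(h)

ae = VarSizeAutoencoder()
print(ae(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 1, 28, 28])
print(ae(torch.randn(1, 1, 37, 41)).shape)  # torch.Size([1, 1, 37, 41])
```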

Is there a way to replace AdaptiveAvgPool2d with AvgPool2d, i.e. how can one calculate the kernel size, stride, and padding?
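When the adaptive windows line up evenly (for instance when the input size is an integer multiple of the output size, or in the 3→2 case above), the equivalent parameters can be computed as stride = in // out, kernel = in - (out - 1) * stride, padding = 0. This is only a sketch of that special case: in general, adaptive pooling uses windows of varying size that no single AvgPool2d configuration can reproduce.

```python
import torch
import torch.nn.functional as F

def avg_pool_params(in_size, out_size):
    # Valid only when the adaptive windows are evenly spaced,
    # e.g. when in_size is a multiple of out_size; not every pair works.
    stride = in_size // out_size
    kernel = in_size - (out_size - 1) * stride
    return kernel, stride

x = torch.randn(1, 1, 8, 8)
k, s = avg_pool_params(8, 4)  # kernel=2, stride=2
adaptive = F.adaptive_avg_pool2d(x, (4, 4))
fixed = F.avg_pool2d(x, kernel_size=k, stride=s, padding=0)
print(torch.allclose(adaptive, fixed))  # True
```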

Have a look at this.