What is AdaptiveAvgPool2d?

xu_wang · October 10, 2018, 5:58am

The AdaptiveAvgPool2d layers confuse me a lot.
Is there any math formula explaning it?

tom · October 10, 2018, 6:34am

Well, the specified output size is the output size, as in the documentation.

In more detail:
What happens is that the pooling stencil size (aka kernel size) is determined to be (input_size+target_size-1) // target_size, i.e. rounded up. With this Then the positions of where to apply the stencil are computed as rounded equidistant points between 0 and input_size - stencil_size.
Let’s have a 1d example:
Say you have an input size of 14 and a target size of 4. Then the stencil size is 4.
The four equidistant points would be 0, 3.3333, 6.6666, 10 and get rounded to 0, 3, 7, 10. And so the four items would be the mean of the slices 0:4, 3:7, 7:11, 10:14 (in Python manner, so including lower bound, excluding upper bound). You see that the first two and last two slices overlap by one. Something like - occasional overlaps of 1 - this will generally be the case when the input size is not divisible by the target size.
For experimentation, you could use arange and backward to see what happens. In the above toy example:

a = torch.arange(0,14., requires_grad=True)
b = torch.nn.functional.adaptive_avg_pool1d(a[None, None], 4)
b.backward(torch.arange(1., 1+b.size(-1))[None,None])
print (b, a.grad)

Then b is 1.5, 4.5, 8.5, 11.5 just as you would get from slicing as above and taking the mean.
The gradient a.grad shows the “receptive field of each output”:
0.2500, 0.2500, 0.2500, 0.7500, 0.5000, 0.5000, 0.5000, 0.7500, 0.7500, 0.7500, 1.7500, 1.0000, 1.0000, 1.0000
again, you see the overlap at item 3 and 10.

Best regards

Thomas

xu_wang · October 10, 2018, 9:47am

Thank you very much.It’s very detailed explanation that give me a lot of help.

By the way,if I meet similar situation in which I want to know the math detail about some layers,where should I find it?Before questioning this AdaptiveAvgPool2d problem here,I search a lot on line and official documents too.But none of them could help me.

tom · October 13, 2018, 11:39am

You can always check the code or do experiments, but I’d venture that asking on the forums is also a fairly reliable way to get good answers. That is because of two things:

You really get the best experts to answer here (well, unless it’s me writing the answer). My main reason to skim the forums these days is that I’ll learn a ton from the answers.
The forum also is part of the eyes and ears of the developers (there are quite a few core people here in general the connection is very close).
So if it turns out there is a frequent question, people will see to amending the docs. If you happen to hit a bug, it’ll get picked up, filed, and fixed. If your use case would benefit from additional features and it’s common enough to warrant addition, it’ll also be noticed as a feature request.

Best regards

Thomas

xu_wang · October 17, 2018, 7:58am

Think you very much!
It’s my honor to have the chance to get your help.

Haimin_Hunter_Zhang · October 26, 2018, 10:42am

Hi Thomas, thanks a lot for your help.
My confusion has been cleared.

Kind regards,
Haimin

mlopezantequera · December 14, 2018, 2:33pm

Rounded down, right?

tom · December 14, 2018, 11:05pm

No this expression will round up input_size / target_size due to the “+target_size-1” in the numerator.

Best regards

Thomas

kangkang_li · March 22, 2019, 8:19am

I am still confused about how to compute 0, 3.3333, 6.6666, 10? hope your reply,thanks!

sky950506 · April 18, 2019, 7:31am

I am also confused about it,have you understood it?

crcrpar · April 24, 2019, 3:39am

To get 4 (= target_size) sections with same length (4 = kernel_size) from the array of length 14 (= input_size) allowing some overlaps, we first separate the right most section: index of 10, 11, 12, and 13. Then we divide index 0 to 10 into 3 (4 - 1).

Here’s the numpy snippet for Thomas’s example.

In [1]: import numpy as np

In [2]: input_size = 14

In [3]: output_size = 4

In [4]: kernel_size = (input_size + output_size - 1) // output_size

In [5]: kernel_size
Out[5]: 4

In [6]: np.linspace(0, input_size - kernel_size, output_size)
Out[6]: array([ 0.        ,  3.33333333,  6.66666667, 10.        ])

Susmit_Agrawal · December 26, 2020, 6:12pm

To understand this better, I tried implementing my own version of it:

def torch_pool(inputs, target_size):
    kernel_size = (inputs.shape[-1] + target_size - 1) // target_size
    points_float = torch.linspace(0, inputs.shape[-1] - kernel_size, target_size)
    points = torch.cat([torch.squeeze(torch.round(points_float)).int(), torch.tensor([inputs.shape[-1]], dtype=torch.int32)], 0)
    pooled = []
    for idx in range(points.shape[0] - 1):
        pooled.append(torch.mean(inputs[:, :, points[idx]:points[idx + 1]], dim=-1, keepdim=False))
    pooled = torch.cat(pooled, -1)
    return pooled


inps = np.array([0, 1, 2, 3, 4, 5, 6], dtype=np.float32)[None, :, None]
inps_torch = np.transpose(inps, (0, 2, 1))
x = torch_pool(torch.tensor(inps_torch), 4)
print(x)

This gives the following output:

tensor([[0.5000, 2.0000, 3.5000, 5.5000]])

However, the actual function gives:

tensor([[[0.5000, 2.0000, 4.0000, 5.5000]]])

Can anyone help me out? Where did I go wrong?

tom · December 26, 2020, 7:49pm

First off, kudos for doing your own experiment and reporting the discrepancy here!

The explanation above is imprecise relative to (todays?) implementation in PyTorch. PyTorch actually uses floor and ceiling to compute start and end points instead of mathematical rounding, so the formulas should be

def torch_pool(inputs, target_size):
    start_points = (torch.arange(target_size, dtype=torch.float32) * (inputs.size(-1) / target_size)).long()
    end_points = ((torch.arange(target_size, dtype=torch.float32)+1) * (inputs.size(-1) / target_size)).ceil().long()
    pooled = []
    for idx in range(target_size):
        pooled.append(torch.mean(inputs[:, :, start_points[idx]:end_points[idx]], dim=-1, keepdim=False))
    pooled = torch.cat(pooled, -1)
    return pooled


inps = np.array([0, 1, 2, 3, 4, 5, 6], dtype=np.float32)[None, :, None]
inps_torch = np.transpose(inps, (0, 2, 1))
x = torch_pool(torch.tensor(inps_torch), 4)
print(x)

Best regards

Thomas

Susmit_Agrawal · December 26, 2020, 7:54pm

Thank you for the clarification!

Mohamad_Hadi · August 20, 2021, 10:26pm

Hello sir. Good time
excuse me.
Does using nn.AdaptiveAvgPool2d((1,1)) make sense?
In this case, is the filter size equal to the input size?
Why is this used?
What happens in this case?

ptrblck · August 20, 2021, 10:35pm

Double post from here with follow-up.

Jie-KUN · December 7, 2021, 12:37pm

Sorry to disturb you, if the results of the np.linspace is [0, 3, 4.5, 6], the 4.5 should be the 4 or the 5?

tom · May 11, 2022, 9:06am

Anything that works for you. I’m always happy when you take bits I post here to do your own experiments and improvements.

Muhammad_Hamza_Tariq · August 19, 2024, 8:55am

@tom can I get the reference paper of the adaptive average pooling which include the equation for calculating the kernel size.