Well, the specified output size is exactly the output size you get, just as the documentation says.
In more detail:
What happens is that the pooling stencil size (aka kernel size) is determined to be (input_size + target_size - 1) // target_size, i.e. input_size / target_size rounded up. Then the positions where the stencil is applied are computed as rounded equidistant points between 0 and input_size - stencil_size.
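If it helps, here is a small sketch of that rule in plain PyTorch (adaptive_slices is just a name I made up for illustration, not an actual PyTorch function, and this mirrors the description above rather than the actual source):

import torch

def adaptive_slices(input_size, target_size):
    # kernel size, rounded up
    stencil_size = (input_size + target_size - 1) // target_size
    # rounded equidistant start points between 0 and input_size - stencil_size
    starts = torch.linspace(0, input_size - stencil_size, target_size).round().long()
    return [(int(s), int(s) + stencil_size) for s in starts]

print(adaptive_slices(14, 4))  # [(0, 4), (3, 7), (7, 11), (10, 14)]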
Let’s look at a 1d example:
Say you have an input size of 14 and a target size of 4. Then the stencil size is (14 + 4 - 1) // 4 = 4.
The four equidistant points between 0 and 14 - 4 = 10 would be 0, 3.3333, 6.6667, 10 and get rounded to 0, 3, 7, 10. So the four outputs are the means of the slices 0:4, 3:7, 7:11, 10:14 (in the Python manner, including the lower bound, excluding the upper bound). You can see that the first two and the last two slices overlap by one element. Such occasional overlaps of one element will generally occur when the input size is not divisible by the target size.
For experimentation, you could use arange and backward to see what happens. In the above toy example:
import torch

a = torch.arange(0., 14., requires_grad=True)
b = torch.nn.functional.adaptive_avg_pool1d(a[None, None], 4)  # add batch and channel dims
b.backward(torch.arange(1., 1 + b.size(-1))[None, None])  # upstream grads 1, 2, 3, 4
print(b, a.grad)
Then b is 1.5, 4.5, 8.5, 11.5
just as you would get from slicing as above and taking the mean.
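You can check that slice/mean correspondence directly (a quick sanity check, not how PyTorch computes it internally):

a2 = torch.arange(0., 14.)
manual = torch.stack([a2[0:4].mean(), a2[3:7].mean(), a2[7:11].mean(), a2[10:14].mean()])
pooled = torch.nn.functional.adaptive_avg_pool1d(a2[None, None], 4)[0, 0]
print(torch.equal(manual, pooled))  # True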
The gradient a.grad shows the “receptive field of each output”:
0.2500, 0.2500, 0.2500, 0.7500, 0.5000, 0.5000, 0.5000, 0.7500, 0.7500, 0.7500, 1.7500, 1.0000, 1.0000, 1.0000
Again, you can see the overlaps at items 3 and 10.
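If you want to see where the 0.75 and 1.75 come from, you can rebuild the gradient by hand: each output distributes its upstream gradient (1, 2, 3, 4 here), divided by the stencil size of 4, over its input slice. A sketch continuing from the snippet above (not the autograd internals):

g = torch.zeros(14)
for up, (lo, hi) in zip([1., 2., 3., 4.], [(0, 4), (3, 7), (7, 11), (10, 14)]):
    g[lo:hi] += up / 4  # each output spreads grad / kernel_size over its slice
print(torch.equal(g, a.grad))  # True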
Best regards
Thomas