Do the `y` and `z` computations below dispatch to the same kernel? Are they equivalent in terms of efficiency? (Also asked in "Difference between mean() method and AvgPooling".)
import torch
import torch.nn.functional as F

x = torch.rand(16, 512, 64, 64)
y = F.adaptive_avg_pool2d(x, (1, 1))  # should I refactor my code to do this instead of the following line? (it's maybe less flexible)
z = x.movedim(-3, -1).mean(dim=[-3, -2], keepdim=True).movedim(-1, -3)  # some convoluted reshapes and keepdim mean
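
For reference, here is a minimal sketch that checks whether the two expressions at least agree numerically. It uses a smaller tensor than the one above to keep the check quick, and it adds a third variant `w` (my own addition, not part of the original code) that takes the spatial mean directly without the `movedim` round trip. It only compares results and says nothing about which kernel is dispatched or how fast each path is.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Assumption: a smaller input than in the question, chosen only to keep the check fast.
x = torch.rand(4, 8, 16, 16)

# Global average pooling down to a 1x1 spatial map.
y = F.adaptive_avg_pool2d(x, (1, 1))

# The movedim / mean / movedim version from the question.
z = x.movedim(-3, -1).mean(dim=[-3, -2], keepdim=True).movedim(-1, -3)

# Hypothetical simpler variant: mean over the last two (spatial) dims directly.
w = x.mean(dim=(-2, -1), keepdim=True)

# All three should have shape (N, C, 1, 1) and match up to float32 rounding.
same_shape = y.shape == z.shape == w.shape
close_yz = torch.allclose(y, z, atol=1e-6)
close_yw = torch.allclose(y, w, atol=1e-6)
print(same_shape, close_yz, close_yw)
```

Numerical agreement does not settle the dispatch question. To see which kernels actually run, wrapping each line in `torch.profiler.profile` and comparing the recorded ops would probably be the next step.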