AdaptiveAvgPool2d returns a flattened vector of batch_size*num_channels

Hi everyone,
A newbie PyTorch convert (from Keras) here. I’m confused by the behavior of the code below. It seems that AdaptiveAvgPool2d should have returned a [B x C]-shaped tensor, but instead it returns [B*C, 1].

I’m trying to take a pretrained model and attach a completely different head to it.
The second piece of code gives me the error below, where 32768 = batch size * number of channels before AdaptiveAvgPool2d:

RuntimeError: size mismatch, m1: [32768 x 1], m2: [2048 x 5005] at c:\a\w\1\s\tmp_conda_3.5_091434\conda\conda-bld\pytorch_1544087939577\work\aten\src\th\generic/THTensorMath.cpp:940

Can you please help me understand the logic of what’s going on?

import torch
import torch.nn as nn
import torchvision.models as models
from torch.autograd import Variable


#This part works
resnet50 = models.resnet50(pretrained=True)
modules=list(resnet50.children())[:-2]
resnet50=nn.Sequential(*modules)
for p in resnet50.parameters():
    p.requires_grad = True

img = torch.Tensor(16, 3, 256, 512).normal_() # random image
img_var = Variable(img) # assign it to a variable
features_var = resnet50(img_var) # get the output from the last hidden layer of the pretrained resnet
features = features_var.data # get the tensor out of the variable
print(features.shape)

#This part doesn't
resnet50 = models.resnet50(pretrained=True)
modules=list(resnet50.children())[:-2]
resnet50=nn.Sequential(*modules)
resnet50=nn.Sequential(resnet50,
                    nn.AdaptiveAvgPool2d(1),
                    nn.Linear(2048, 5005))

for p in resnet50.parameters():
    p.requires_grad = True
print(img_var.shape)
features_var = resnet50(img_var) # get the output from the last hidden layer of the pretrained resnet
features = features_var.data # get the tensor out of the variable
print(features.shape)

Or should I approach this task differently?
PS. I know about this link
https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

Thanks!

It seems like AdaptiveAvgPool2d will return a tensor with whatever output height and width you ask for, which in your case is 1×1, so you get [B, C, 1, 1]. It is not clever enough to infer that you also want the singleton dimensions discarded…

So if that is your wish, you should write a custom module that loads the resnet50 layers, removes the undesired ones, and adds your own head. Then, in the forward() method of your custom module, call x = torch.squeeze(x) after the AdaptiveAvgPool2d layer (which you would have defined in the __init__() method).
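Something along these lines, as a rough sketch (the class name and num_classes argument are just placeholders I made up, not from any tutorial):

import torch
import torch.nn as nn
import torchvision.models as models

class ResNet50NewHead(nn.Module):
    def __init__(self, num_classes=5005):
        super().__init__()
        resnet50 = models.resnet50(pretrained=True)
        # keep everything up to (but not including) the original avgpool and fc
        self.body = nn.Sequential(*list(resnet50.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)       # -> [B, 2048, 1, 1]
        self.head = nn.Linear(2048, num_classes)  # your new classifier

    def forward(self, x):
        x = self.body(x)      # [B, 2048, H/32, W/32]
        x = self.pool(x)      # [B, 2048, 1, 1]
        x = torch.squeeze(x)  # [B, 2048] (note: this also squeezes the batch dim if B == 1)
        return self.head(x)   # [B, num_classes]

model = ResNet50NewHead()
out = model(torch.randn(16, 3, 256, 512))
print(out.shape)  # torch.Size([16, 5005])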

Using the Sequential API makes such cases harder to debug, so I’d recommend switching to a custom module that inherits from nn.Module and debugging from there, so that you can print the shape of the tensor right before the AdaptiveAvgPool2d layer!

After trying an AdaptiveAvgPool2d layer on a [3, 16, 7, 7]-shaped tensor on my machine, I can confirm that it works correctly and outputs a [3, 16, 1, 1]-shaped tensor.
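For reference, that check is just:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d(1)
x = torch.randn(3, 16, 7, 7)
print(pool(x).shape)  # torch.Size([3, 16, 1, 1])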


Thanks, Alex. I will redo everything with the Module API. It seemed like a simple task for Sequential, though.
