ResNet conv2d output has 2 dimensions (batch size and tensor size)?

Hi, I would like to use just the beginning part of ResNet, so I did the following:

import torch
import torch.nn as nn

feature_map_extractor = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
# replace everything after the stem with empty (identity) modules
feature_map_extractor.layer1 = nn.Sequential()
feature_map_extractor.layer2 = nn.Sequential()
feature_map_extractor.layer3 = nn.Sequential()
feature_map_extractor.layer4 = nn.Sequential()
feature_map_extractor.fc = nn.Sequential()
feature_map_extractor.avgpool = nn.Sequential()
feature_map_extractor.maxpool = nn.Sequential()

and printing the model shows what I expected:
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): Sequential()
(layer1): Sequential()
(layer2): Sequential()
(layer3): Sequential()
(layer4): Sequential()
(avgpool): Sequential()
(fc): Sequential()
)

I would like the output to keep a square spatial shape, since my image is square and ResNet applies conv2d. But I got the shape below:

image = torch.randn((1,3,256,256))
a = feature_map_extractor(image)
a.shape

torch.Size([1, 1048576])

Why is that?

The final layer is an FC layer, which outputs [M, N], where M is the batch size (1 in your case) and N is the total number of neurons in the final FC layer.

ResNet does not JUST apply Conv2d. It also applies several layers afterwards, including the final FC layer. When you do a = feature_map_extractor(image), it is the final FC layer which determines the final output shape.
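
For example, just to illustrate: with an unmodified resnet50 (none of the layers replaced), the 2-D output shape comes from that last fc layer, which is Linear(2048, 1000). A minimal check:

import torch

# Unmodified resnet50: the final fc layer is Linear(2048, 1000),
# so the output is [batch_size, 1000] regardless of the size of
# the intermediate conv feature maps.
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
out = model(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 1000])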

@Karthik_Ganesan But I defined it as nn.Sequential(), so it must be bypassed. And as we can see in the printed model architecture, it is just an empty Sequential() layer, so it should do nothing.

I did

feature_map_extractor = nn.Sequential(feature_map_extractor.conv1, feature_map_extractor.bn1, feature_map_extractor.relu)

and it works fine. I think the resnet50 implementation calls torch.flatten (or something like that) inside its forward method, so I could not remove it by replacing the modules.
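
Putting it together, a minimal sketch of this working version (variable names are just illustrative), with the expected output shape:

import torch
import torch.nn as nn

resnet = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)

# Keep only the stem: conv1 (64 output channels, stride 2), bn1, relu.
# A plain nn.Sequential has no flatten step, so the spatial dimensions survive.
feature_map_extractor = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)

image = torch.randn(1, 3, 256, 256)
a = feature_map_extractor(image)
print(a.shape)  # torch.Size([1, 64, 128, 128])

Note that 64 * 128 * 128 = 1048576, which is exactly the second dimension of the flattened [1, 1048576] output seen earlier.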

Yes, models typically have a flatten or view function before the first FC layer. To take a look at all the layers and the shapes of their inputs and outputs, you can use something like Torch Summary.
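
For example, with the torchsummary package (pip install torchsummary; the exact signature can vary slightly between versions, so treat this as a sketch):

import torch
import torch.nn as nn
from torchsummary import summary

resnet = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
feature_map_extractor = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)

# Prints each layer with its output shape and parameter count.
# device="cpu" keeps everything on the CPU; drop it if you run on GPU.
summary(feature_map_extractor, input_size=(3, 256, 256), device="cpu")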