ResNet conv2d output has 2 dimensions (batch size and the tensor size)?

Hi, I would like to use just the beginning part of ResNet, so I did the following:

import torch
import torch.nn as nn

feature_map_extractor = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
feature_map_extractor.layer1 = nn.Sequential()
feature_map_extractor.layer2 = nn.Sequential()
feature_map_extractor.layer3 = nn.Sequential()
feature_map_extractor.layer4 = nn.Sequential()
feature_map_extractor.fc = nn.Sequential()
feature_map_extractor.avgpool = nn.Sequential()
feature_map_extractor.maxpool = nn.Sequential()

and the model looks like this, as I expected:
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): Sequential()
(layer1): Sequential()
(layer2): Sequential()
(layer3): Sequential()
(layer4): Sequential()
(avgpool): Sequential()
(fc): Sequential()

I expected to get a square feature map, because my image is square and ResNet applies conv2d. But I got the shape below:

image = torch.randn((1,3,256,256))
a = feature_map_extractor(image)
print(a.shape)

torch.Size([1, 1048576])

Why is that?

The final layer is an FC layer, which outputs [M, N], where M is the batch size (1 in your case) and N is the total number of neurons in the final FC layer.

ResNet does not just apply Conv2d. It also applies several layers afterwards, including the final FC layer. When you call a = feature_map_extractor(image), it is the final FC layer that determines the final output shape.

@Karthik_Ganesan But I replaced it with nn.Sequential(), so it should be bypassed. As the printed model architecture shows, it is just an empty Sequential() layer, so it should do nothing.
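For anyone who wants to check that claim, here is a minimal sketch showing that an empty nn.Sequential() passes its input through unchanged. The tensor shape is only an illustrative assumption (it is what conv1 would produce for a 256x256 image), not something taken from the model itself.

```python
import torch
import torch.nn as nn

# An empty Sequential contains no submodules, so its forward pass
# simply returns whatever it was given.
empty = nn.Sequential()

# Illustrative input: the shape conv1 would produce for a 256x256 image.
x = torch.randn(1, 64, 128, 128)
y = empty(x)

print(y.shape)            # same shape as x
print(torch.equal(x, y))  # True: values are untouched
```

So the replaced layers really are no-ops, which suggests the 2-D output comes from somewhere other than the submodules.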

I did

feature_map_extractor = nn.Sequential(feature_map_extractor.conv1,feature_map_extractor.bn1,feature_map_extractor.relu)

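and it works fine. I think the resnet50 forward() calls torch.flatten (or something like that) before fc. That call is part of forward() rather than a submodule, so I could not remove it by replacing layers. The numbers match too: 64 channels x 128 x 128 = 1048576.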
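A minimal sketch of the same idea that runs without downloading weights: the stem below is a stand-in built from the hyperparameters in the printed architecture, with random weights (an assumption made only to keep the example self-contained). It shows the 4-D feature map, and then applies torch.flatten(x, 1) by hand to reproduce the shape from the question.

```python
import torch
import torch.nn as nn

# Stand-in for the ResNet-50 stem (conv1, bn1, relu), using the
# hyperparameters from the printed model. Weights are random here,
# not the pretrained ones.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
stem.eval()

image = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    feature_map = stem(image)

# Stride 2 halves the spatial size: 256 -> 128.
print(feature_map.shape)  # torch.Size([1, 64, 128, 128])

# Flattening everything after the batch dimension gives the 2-D shape
# seen in the question: 64 * 128 * 128 = 1048576.
flattened = torch.flatten(feature_map, 1)
print(flattened.shape)  # torch.Size([1, 1048576])
```

With the real model, the same nn.Sequential(conv1, bn1, relu) wrapper keeps the pretrained weights and skips the flatten, because only the listed modules are executed.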

Yes, models typically have a flatten or view call before the first FC layer. To look at all the layers and the shapes of their inputs and outputs, you can use something like Torch Summary.
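If you would rather not install an extra package, a rough alternative is to record output shapes with forward hooks. This is only a sketch: record_output_shapes is a hypothetical helper name, and the stem is again a random-weight stand-in for conv1/bn1/relu rather than the pretrained model.

```python
import torch
import torch.nn as nn

def record_output_shapes(model, example_input):
    """Run one forward pass and return [(child_name, output_shape), ...]."""
    shapes = []
    handles = []
    for name, module in model.named_children():
        # Bind name as a default argument so each hook keeps its own label.
        def hook(mod, inputs, output, name=name):
            shapes.append((name, tuple(output.shape)))
        handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        # Always remove hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return shapes

# Random-weight stand-in for the ResNet-50 stem (assumption for illustration).
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
stem.eval()

shapes = record_output_shapes(stem, torch.randn(1, 3, 256, 256))
for name, shape in shapes:
    print(name, shape)
```

Note that hooks only fire for modules. A functional call inside forward(), such as torch.flatten, will not appear in this listing, so it is still worth reading the model's forward() source when a shape looks unexpected.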