ResNet as backbone/feature extractor - undesired output dimensions

Hi, I have been attempting to use a pre-trained ResNet model as a feature extractor. I have removed the final fc and pooling stages of the network, and the output shape is (1, 2048, 7, 7).
These are not the features I want. What I want is one feature vector per pixel, i.e. an output of shape (1, n_features, H, W), where H and W are the height and width of the input image.
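For reference, this is roughly how I am extracting the features (a minimal sketch, assuming torchvision's resnet50):

import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True)
# drop the final avgpool and fc layers, keep everything up to layer4
backbone = torch.nn.Sequential(*list(model.children())[:-2])

x = torch.randn(1, 3, 224, 224)
features = backbone(x)
print(features.shape)
> torch.Size([1, 2048, 7, 7])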

Is there any intermediate layer which outputs this kind of shape in ResNet? Is there any other pre-trained model in pytorch which offers this?

I don’t think you could use ResNets for your use case, as the first conv layer would already reduce the spatial size (layer definition here):

import torch
import torch.nn as nn

# first conv layer of torchvision's ResNet: stride=2 already halves the spatial size
conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
x = torch.randn(1, 3, 224, 224)
out = conv1(x)
print(out.shape)
> torch.Size([1, 64, 112, 112])

I would guess that the majority of CNNs downsample the spatial size in their feature extractor stage, and I don't know of a specific model that keeps the spatial size throughout.
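To illustrate, here is a quick sketch of how the spatial size shrinks after each stage of torchvision's resnet50 for a 224x224 input (shapes shown as comments):

import torch
import torchvision

# weights don't matter here, we only check the intermediate shapes
model = torchvision.models.resnet50(pretrained=False).eval()
x = torch.randn(1, 3, 224, 224)

out = model.conv1(x)      # torch.Size([1, 64, 112, 112])
out = model.bn1(out)
out = model.relu(out)
out = model.maxpool(out)  # torch.Size([1, 64, 56, 56])
out = model.layer1(out)   # torch.Size([1, 256, 56, 56])
out = model.layer2(out)   # torch.Size([1, 512, 28, 28])
out = model.layer3(out)   # torch.Size([1, 1024, 14, 14])
out = model.layer4(out)   # torch.Size([1, 2048, 7, 7])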

Thanks for your response! That is sad to hear. Do you perhaps know of any networks that have been used to upsample ResNet (or other pre-trained network) features?

I think you might find interesting model architectures when searching for segmentation models, as they often return outputs with the original input shape (e.g. U-Nets).
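As a rough sketch of two directions you could try (just an illustration; bilinear interpolation and torchvision's fcn_resnet50 are my picks here, not something specific to your setup):

import torch
import torch.nn.functional as F
import torchvision

# Option 1: bilinearly upsample the extracted backbone features back to the input resolution
features = torch.randn(1, 2048, 7, 7)  # stand-in for the backbone output from above
upsampled = F.interpolate(features, size=(224, 224), mode='bilinear', align_corners=False)
print(upsampled.shape)
> torch.Size([1, 2048, 224, 224])

# Option 2: use a segmentation model with a ResNet backbone, which already
# returns one prediction per input pixel
seg_model = torchvision.models.segmentation.fcn_resnet50(pretrained=True).eval()
x = torch.randn(1, 3, 224, 224)
out = seg_model(x)['out']
print(out.shape)
> torch.Size([1, 21, 224, 224])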