Change input shape dimensions for ResNet model

Hi there,

I want to feed my (3, 320, 320) images into an existing ResNet model. The model currently expects input of size (3, 32, 32). Because I am afraid of losing information, I don't want to simply resize my images. What is the best way to preprocess them so that they can run through the ResNet34? Should I add additional layers in the forward method of ResNet? If so, what would be a suitable combination in my case?

import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_fitmodule import FitModule
from torch.autograd import Variable
import numpy as np

def conv3x3(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)

class BasicBlock(FitModule):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(in_planes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

class ResNet(FitModule):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = conv3x3(3, 64)
        self.bn1 = nn.BatchNorm2d(64)                                           
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)      
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)     
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)     
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)     
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):    # add additional layers here?                                       
        x = x.float()                                              
        out = F.relu(self.bn1(self.conv1(x).float()).float())      
        out = self.layer1(out)                                      
        out = self.layer2(out)                                     
        out = self.layer3(out)                                      
        out = self.layer4(out)                                      
        out = F.avg_pool2d(out, 4)                                 
        out = out.view(out.size(0), -1)                            
        out = self.linear(out)
        return out

def ResNet34():
    return ResNet(BasicBlock, [3, 4, 6, 3])

Thanks plenty,


Use nn.AdaptiveAvgPool2d in place of the fixed-size average pooling. With it, your model will accept any image size. If you want better performance, you can also concatenate the outputs of nn.AdaptiveAvgPool2d and nn.AdaptiveMaxPool2d. After this, you can retrain the head of the model (the linear layers) and it should all just work.
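A minimal sketch of why this helps, assuming the ResNet posted above: with a 32x32 input the output of layer4 is 4x4, while with a 320x320 input it is 40x40. The tensors below are random stand-ins for those feature maps, not real activations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pool = nn.AdaptiveAvgPool2d(1)  # always reduces the spatial size to 1x1

small = torch.randn(2, 512, 4, 4)    # stand-in for layer4 output on 32x32 input
large = torch.randn(2, 512, 40, 40)  # stand-in for layer4 output on 320x320 input

# Fixed pooling: the flattened size depends on the input resolution.
fixed_small = F.avg_pool2d(small, 4).flatten(1)  # shape (2, 512)
fixed_large = F.avg_pool2d(large, 4).flatten(1)  # shape (2, 51200), too big for nn.Linear(512, ...)

# Adaptive pooling: the flattened size is the same for both resolutions.
adaptive_small = pool(small).flatten(1)  # shape (2, 512)
adaptive_large = pool(large).flatten(1)  # shape (2, 512)
```

In the posted forward method this would amount to swapping F.avg_pool2d(out, 4) for an adaptive pooling layer, leaving nn.Linear(512 * block.expansion, num_classes) unchanged.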

Thanks for the feedback. I am afraid that I will lose valuable information when nn.AdaptiveAvgPool2d pools the 40x40 pixel feature map. I think it might be more useful to add additional conv layers. As I am quite new to ML, I am looking for the right combination of layers for my specific input.

Almost every model nowadays uses adaptive pooling at the end of the network. You can also try training your model on images of different input sizes, which acts as a form of regularization.

You have 320x320 images. Start your training on images resized to 80x80, then train on images resized to 160x160, and finally train on the full 320x320 images. This way each step gives the model 4x more pixels to train on. This technique requires adaptive pooling. Use a concatenation of AdaptiveAvgPool and AdaptiveMaxPool for the best results.
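The schedule above could be sketched roughly as follows. The small model, the random batch, the optimizer settings, and the two steps per stage are all placeholders made up for illustration; in practice you would use your ResNet, your DataLoader, and several epochs per stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in model; any network ending in adaptive pooling works.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Random stand-in for one batch of 320x320 images and their labels.
images = torch.randn(4, 3, 320, 320)
labels = torch.randint(0, 10, (4,))

seen_sizes = []
for size in (80, 160, 320):  # each stage has 4x the pixels of the previous one
    resized = F.interpolate(images, size=(size, size),
                            mode="bilinear", align_corners=False)
    for _ in range(2):  # placeholder for several epochs per stage
        optimizer.zero_grad()
        loss = F.cross_entropy(model(resized), labels)
        loss.backward()
        optimizer.step()
    seen_sizes.append(tuple(resized.shape[-2:]))
```

The same weights are reused across all three stages; only the input resolution changes, which is why the pooling before the linear layer has to be adaptive.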

Perfect, thanks for the explanation. Could you describe what you mean by “concatenation of AdaptiveAvgPool”? Sorry, I am quite new to ML.

Add this module

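import torch
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    "Concats `AdaptiveAvgPool2d` and `AdaptiveMaxPool2d`."
    def __init__(self):
        super().__init__()
        self.ap = nn.AdaptiveAvgPool2d(1)  # average over each feature map
        self.mp = nn.AdaptiveMaxPool2d(1)  # maximum over each feature map

    def forward(self, x):
        # Concatenate along the channel dimension (doubles the channel count).
        return torch.cat([self.mp(x), self.ap(x)], 1)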
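One thing to watch for: concatenating the two pooled tensors along the channel dimension doubles the number of features, so with the ResNet above the head would need 2 * 512 * block.expansion inputs instead of 512 * block.expansion. A small shape check, using a random tensor as a stand-in for the layer4 output:

```python
import torch
import torch.nn as nn

features = torch.randn(2, 512, 40, 40)  # stand-in for layer4 output on 320x320 input

avg = nn.AdaptiveAvgPool2d(1)(features)  # shape (2, 512, 1, 1)
mx = nn.AdaptiveMaxPool2d(1)(features)   # shape (2, 512, 1, 1)
pooled = torch.cat([mx, avg], dim=1).flatten(1)  # shape (2, 1024)

# The head must accept twice as many features as with a single pooling layer.
head = nn.Linear(2 * 512, 10)  # 10 classes, matching num_classes=10 above
logits = head(pooled)
```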

Perfect, thanks plenty. After implementing the concatenation of AdaptiveAvgPool and AdaptiveMaxPool, the loss varies hugely from epoch to epoch and does not decrease steadily. Do you know why this happens?
Another thing is that the loss is still quite high when using the 320x320 pixel images as inputs. Do you have an idea how to improve this?

Probably training has not finished yet, or the model does not have enough capacity. Take a single batch and try to overfit the model on that batch. If it can, the model at least has the capacity to learn from the data you provided.
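A rough sketch of that single-batch check. The tiny model, the random batch, and the optimizer settings are made up for illustration; you would substitute your ResNet34 and one real batch from your DataLoader.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical small model standing in for the real network.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# One fixed batch (random stand-in data), reused at every step.
x = torch.randn(8, 3, 8, 8)
y = torch.randint(0, 10, (8,))

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

If the loss on this one batch does not drop close to zero, the problem is more likely in the model, the loss, or the data pipeline than in the amount of training.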