Handling images with different heights and widths: why do off-the-shelf models like ResNet work out of the box with them?

I am trying to use the ResNet from torchvision.models, but I am feeding it images of size (85, 120). For some reason the network still seems to work after customizing only the first and last layers. Can anyone help me understand how it works given the unequal input dimensions, and how the data flows through each layer and gets transformed?

Convolutions are spatial operations and do not require a fixed input size. Ideally you should be able to use any size: if the dataset is big enough, the network gets used to the object sizes that appear in the images. Since convolutions are spatial operations, you are generating local features, so the image size doesn't really matter.
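One way to see this: a conv layer's learnable parameters depend only on the channel counts and kernel size, never on the input's height or width, so the same layer applies to any spatial size. A quick sanity check of that formula in plain Python (the standard parameter-count formula, not a PyTorch call):

```python
# A Conv2d layer's parameter count: one (in_ch x k x k) weight block plus
# one bias per output filter. H and W appear nowhere in it.
def conv2d_param_count(in_ch, out_ch, k):
    return out_ch * (in_ch * k * k + 1)

print(conv2d_param_count(1, 6, 5))   # 156  (like conv1 in the code below)
print(conv2d_param_count(6, 12, 5))  # 1812 (like conv2)
```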

BTW, the last layer is just a mapping (a fully connected layer, most likely), and that is the most restrictive part.
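For the record, torchvision's ResNet sidesteps that restriction with an nn.AdaptiveAvgPool2d((1, 1)) right before the fc layer, which averages each channel map down to a single value whatever its spatial size. A minimal plain-Python sketch of what that pooling step does (an illustration, not the torchvision code itself):

```python
# Sketch of adaptive average pooling to 1x1: each channel's H x W grid
# collapses to one averaged number, so the fc layer always receives
# exactly `len(feature_maps)` features regardless of H and W.
def adaptive_avg_pool_1x1(feature_maps):
    """feature_maps: list of 2-D lists (one H x W grid per channel)."""
    pooled = []
    for fmap in feature_maps:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        pooled.append(total / count)
    return pooled

# An 18x26 map and an 18x27 map both collapse to one value per channel:
a = adaptive_avg_pool_1x1([[[1.0] * 26 for _ in range(18)]])
b = adaptive_avg_pool_1x1([[[1.0] * 27 for _ in range(18)]])
print(len(a), len(b))  # both have length 1
```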

But doesn’t the ResNet model expect square inputs only? I was assuming it might throw an error about the unequal input dimensions. I tried a custom-built network as follows, but it gave me an error:

import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12 * 18 * 18, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=27)
        
    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        
        t = t.reshape(-1, 12 * 18 * 18)
        t = F.relu(self.fc1(t))
        
        t = F.relu(self.fc2(t))
        
        t = self.out(t)
        
        return t

What worked for me was:

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12 * 18 * 26, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=27)
        
    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=3, stride=2)
        
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        
        t = t.reshape(-1, 12 * 18 * 26)
        t = F.relu(self.fc1(t))
        
        t = F.relu(self.fc2(t))
        
        t = self.out(t)
        
        return t
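To see where the 12 * 18 * 26 comes from, here is the (85, 120) input traced through each layer using the valid-convolution size formula, output length = floor((n - k) / s) + 1, applied per axis:

```python
def out_len(n, kernel, stride=1):
    # Length after a valid (no-padding) conv or max pool along one axis.
    return (n - kernel) // stride + 1

h, w = 85, 120                             # input image (H, W)
h, w = out_len(h, 5), out_len(w, 5)        # conv1, 5x5     -> 81 x 116
h, w = out_len(h, 3, 2), out_len(w, 3, 2)  # pool, k=3 s=2  -> 40 x 57
h, w = out_len(h, 5), out_len(w, 5)        # conv2, 5x5     -> 36 x 53
h, w = out_len(h, 2, 2), out_len(w, 2, 2)  # pool, k=2 s=2  -> 18 x 26
print(h, w)  # 18 26, matching in_features=12 * 18 * 26
```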

And yeah, I had to change the last layer so that the output matches the number of classes.

Well, not really. Convolutions are not size-restricted. You may find assertions in some implementations, but that comes down to whoever coded them, not to the convolutions themselves.
What you hit is an error in the reshape, which is sized to fit the fully connected layer and nothing else.
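Concretely: with the (85, 120) input and the first version's two k=2, s=2 pools, the tensor reaching the reshape holds 12 * 18 * 27 = 5832 values per sample, while reshape(-1, 12 * 18 * 18) asks for rows of 3888, hence the shape error. The per-axis arithmetic (output length = floor((n - k) / s) + 1):

```python
def out_len(n, kernel, stride=1):
    # Length after a valid (no-padding) conv or max pool along one axis.
    return (n - kernel) // stride + 1

h, w = 85, 120
h, w = out_len(h, 5), out_len(w, 5)        # conv1, 5x5     -> 81 x 116
h, w = out_len(h, 2, 2), out_len(w, 2, 2)  # pool, k=2 s=2  -> 40 x 58
h, w = out_len(h, 5), out_len(w, 5)        # conv2, 5x5     -> 36 x 54
h, w = out_len(h, 2, 2), out_len(w, 2, 2)  # pool, k=2 s=2  -> 18 x 27
print(12 * h * w)  # 5832 per sample, but the reshape expects 12 * 18 * 18 = 3888
```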
