Modifying ResNet18 architecture


I am working with grayscale images and want to use the ResNet18 architecture. I don’t want to use the pre-trained model, as I am planning to train it from scratch. However, my input image size is (512, 1536) and I cannot resize or downsample it, so I need to feed it to resnet18 as is. Once the feature map reaches a spatial size of (44, 120), I would like to append another channel to it,
i.e. if the output at layer “x” is (44, 120, 256), I would like to concatenate another image there so that the output of layer “x” becomes (44, 120, 257).

To find the layer at which the shape (44, 120) is reached, I did some hand calculations for the layers, but I can’t seem to figure out the answer.

I would really appreciate some help.
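The hand calculation mentioned above can be sketched in plain Python. This is only a rough sketch: it assumes the stock torchvision ResNet18 hyperparameters (7x7 stride-2 stem, 3x3 stride-2 max pool, stride-2 first convolution in layer2 to layer4), and the helper names `conv_out` and `trace_shapes` are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed layout of the stock ResNet18; only the layer that changes the
# spatial size in each stage is listed: (name, kernel, stride, padding, channels)
STAGES = [
    ("conv1",   7, 2, 3, 64),
    ("maxpool", 3, 2, 1, 64),
    ("layer1",  3, 1, 1, 64),
    ("layer2",  3, 2, 1, 128),
    ("layer3",  3, 2, 1, 256),
    ("layer4",  3, 2, 1, 512),
]

def trace_shapes(height, width):
    """Return {stage name: (channels, height, width)} for the given input size."""
    shapes = {}
    for name, kernel, stride, padding, channels in STAGES:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        shapes[name] = (channels, height, width)
    return shapes

shapes = trace_shapes(512, 1536)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

Under these assumptions the 256-channel stage (layer3) comes out at 32 x 96 rather than 44 x 120, so reaching (44, 120) would require changing strides or padding somewhere in the network.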

As an alternative to the manual shape calculation, you could add print statements to the forward method of your model and check the shapes during the forward pass, which could be easier.
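One way to check shapes during the forward pass without editing the model code is to register forward hooks, which is a variation on the print-statement approach rather than the exact method suggested here. The sketch below assumes a small hypothetical `stem` that mimics only the first ResNet18 layers with a single grayscale input channel; it is not the full model.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the first ResNet18 layers, with 1 input channel.
stem = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
stem.eval()

seen = {}  # layer name -> output shape, filled in by the hooks

def make_hook(name):
    def hook(module, inputs, output):
        seen[name] = tuple(output.shape)
        print(f"{name}: {tuple(output.shape)}")
    return hook

# Attach one hook per child module; each fires right after that module runs.
for name, module in stem.named_children():
    module.register_forward_hook(make_hook(f"{name}:{module.__class__.__name__}"))

with torch.no_grad():
    stem(torch.randn(1, 1, 512, 1536))
```

The same loop should work on a complete model by iterating over `named_modules()` instead of `named_children()`.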

@ptrblck Thanks for the reply. Just to follow up on that, I built a small ConvNet (shown below) and have a few questions regarding padding and out_channels.

Basic ConvNet is:

import torch
import torch.nn as nn


class OurConvNet(nn.Module):
    def __init__(self):
        super().__init__()  # must be called before registering submodules
        self.projection = None
        # sig dims expected: HxWxC = 36x36x1
        # dep dims expected: HxWxC = 36x36x1

        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=31, kernel_size=1, stride=1, padding=0)

        # check forward function where the conv2 output (31 channels) and dep (1 channel) are
        # concatenated to get 32 channels and hence the in_channels of self.conv3 is 32. But, is it right?

        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=0)
        self.conv4 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=0)

    def forward(self, sig, dep):
        # Apply convolution 1
        x = self.conv1(sig)

        print(f'conv1: {x.size()}')

        # Apply tanh activation
        x = torch.tanh(x)

        # Apply convolution 2 (1x1, reduces to 31 channels)
        x = self.conv2(x)

        # concat along the channel dimension => is it correct?
        concat = torch.cat((x, dep), dim=1)
        print(f'concat: {concat.shape}')

        # Apply convolution 3
        x = self.conv3(concat)

        print(f'conv3: {x.size()}')

        # Apply convolution 4
        x = self.conv4(x)
        print(f'conv4: {x.size()}')

        # nn.ReLU(x) would construct a module instead of applying the activation
        x = torch.relu(x)

        return x

sig = torch.randn(1, 1, 36, 36)
dep = torch.randn(1, 1, 36, 36)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = OurConvNet()
y = net(sig, dep)

Printing the net gives:

  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 31, kernel_size=(1, 1), stride=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (conv4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))

As you can see, self.conv2 has 31 output channels; another tensor is then concatenated in the forward method to make 32 channels, which become the input of self.conv3.

My questions however are:

  1. Will using an odd number (31) of out_channels be a problem? I have only ever come across output channel counts that are a power of 2, i.e. 32, 64, 128, 256, etc.
  2. Similarly, can I use non-square padding, i.e. padding=(3, 0)? As I mentioned in my question, I did some hand calculations to modify the resnet18 architecture and had to use non-square padding to get to the (44, 120) shape.
  3. I can use summary from torchsummary when there is only one input. But how do I get the summary of the above model, which takes two inputs?

Thanks for bearing with me.

  1. No, that won’t be a problem: you can use any valid shape and it shouldn’t break anything. Powers of two are often friendly for memory access patterns etc., so you might see performance plateaus or cliffs with other sizes.

  2. Yes, non-square padding works as well; same as 1.

  3. I’m not deeply familiar with the package, but it seems that multiple inputs are supported.
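As a quick check of points 1 and 2, the following minimal sketch builds a convolution with an odd number of output channels and non-square padding. The layer sizes and the 36x36 input are chosen only for illustration and are not taken from the ResNet18 modification discussed above.

```python
import torch
import torch.nn as nn

# padding=(3, 0) pads 3 rows at the top and bottom, and nothing left or right.
conv = nn.Conv2d(in_channels=32, out_channels=31, kernel_size=3, stride=1, padding=(3, 0))

x = torch.randn(1, 32, 36, 36)
with torch.no_grad():
    y = conv(x)

# Height: 36 + 2*3 - 3 + 1 = 40, width: 36 + 0 - 3 + 1 = 34
print(y.shape)  # torch.Size([1, 31, 40, 34])
```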

Thanks for clarifying all the doubts.
I really appreciate it.

Thank You.