Altering ResNet18 for single channel images

Hi PyTorch users!

Is there a way to alter ResNet18 so that training will not cause size mismatch errors when using single-channel images as opposed to 3-channel images?

So far I have resized my input images to 224x224, altered the number of input channels, and, as this is a regression problem, changed the output to a single node, but the convolutions are having trouble:

ResNet(
  (conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (fc): Linear(in_features=512, out_features=1, bias=True)
)

The error:

RuntimeError: size mismatch, m1: [64 x 802816], m2: [65536 x 256] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

Did you change more than the last layer?
I assume your input is now in the shape [batch_size, 1, 224, 224]?
The model looks alright and I can’t see where this size mismatch occurs.
Could you post the whole stack trace or the code where you’ve manipulated the model?
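
If you only see the last error line (e.g. in a notebook), one way to get the full trace is to wrap the failing call; the names here are just placeholders for whatever raises the error:

import traceback

try:
    output = model(batch)       # whatever call raises the size mismatch
except RuntimeError:
    traceback.print_exc()       # prints the complete stack trace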

I also think it would be helpful to show how you have resized the images, how you created the three channels, and your dataloader. This is likely not a problem with your model but rather with your input.

My input is now [batch_size, 1, 244, 244] as my images are only single channel.
I’m not sure how to output the stack trace but I can give it a try!

My model manipulation is as follows:

import torch
from torchvision import models

resnet18 = models.resnet18()
resnet18.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
resnet18.fc = torch.nn.Linear(512, 1)

As suggested by David above, I’ll print the input sizes just to be sure though.

Thanks for the reply :slight_smile: I resize the images using:

transform = transforms.Compose([transforms.ToPILImage(),
                                transforms.Resize((224, 224)),
                                transforms.ToTensor(),
                                transforms.Normalize([0.5], [0.5])])

The images are originally 64x64 numpy arrays before the transformation, and they are only single-channel images. I’ll take a look at the input dimensions as they should be [batch_size, 1, 224, 224] after the dataloader, which looks as follows:

loader = DataLoader(FITSCubeDataset(data_path, cube_length, transforms, img_size), 
                    batch_size=batch_size, shuffle = False, sampler=train_sampler)
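
As a quick sanity check of the transform on a dummy array of the original shape (random values, just to confirm the output shape it produces):

import numpy as np

dummy = np.random.rand(64, 64, 1).astype(np.float32)   # stand-in for one single-channel 64x64 image
out = transform(dummy)
print(out.shape)   # expected: torch.Size([1, 224, 224])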

Your model manipulation works fine with this code:

x = torch.randn(1, 1, 224, 224)
output = resnet18(x)

I think you might have a typo in your shapes.
Are you using 224 or 244 as the spatial size?
Your transformation code looks fine. However, here you say your input is [batch_size, 1, 244, 244].
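
A quick way to check is to print the shape of the first batch coming out of your loader (assuming it yields (image, target) pairs):

for images, targets in loader:
    print(images.shape)   # should be torch.Size([batch_size, 1, 224, 224])
    break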

I had a look and printed batch.size():

OUT:

torch.Size([64, 1, 224, 224])

Yes, the transform resizes the images to 224x224 as they are originally 64x64.

Not really sure why it’s producing an error now :thinking:

That’s strange.
Could you just run your code with my dummy input:

resnet18 = models.resnet18()
resnet18.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
resnet18.fc = torch.nn.Linear(512, 1)
x = torch.randn(64, 1, 224, 224)
output = resnet18(x)

That works fine… I think it could be due to the fact that I haven’t altered the output shape, given that the error includes the following:

output = self.classifier(features.view(int(x.size()[0]),-1))

Are you sure you are using the ResNet?
models.resnet18 doesn’t have a self.classifier attribute, but a self.fc layer.
Maybe you are unintentionally using another model like models.vgg16?
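
A quick way to confirm which model your training class is actually holding (trainer and self.model are just guesses at your object and attribute names):

print(type(trainer.model))                   # e.g. <class 'torchvision.models.resnet.ResNet'>
print(hasattr(trainer.model, 'fc'))          # True for ResNet
print(hasattr(trainer.model, 'classifier'))  # True for VGG-style models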

Ah yep you’re completely right! I’ve wrapped the training into a class and forgot to switch the model out for ResNet :roll_eyes: sorry about that!

No worries, I’ve made the same mistake quite a few times. :slight_smile:
Good to hear it’s working now!

Thanks! I’ve got another error now but I’ll try to work it out myself before I post on here :slight_smile:

Quick update:

So I’ve got the pretrained model running with some final layers frozen, but I’m only managing to make it work with the following:

resnet = models.resnet50(pretrained=True)
resnet.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
resnet.fc = torch.nn.Linear(2048, 1)

Even though in my mind the fc layer should be (512, 1), if I don’t use (2048, 1) I get a size mismatch error.

resnet18 uses 512 input features for the fc layer, while resnet50 uses 2048.
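
One way to avoid hard-coding that number is to read it off the existing fc layer before replacing it:

resnet = models.resnet50(pretrained=True)
resnet.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
resnet.fc = torch.nn.Linear(resnet.fc.in_features, 1)   # 512 for resnet18/34, 2048 for resnet50/101/152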

Ah yep you’re quite right… It’s been a long week :sweat_smile:

Yeah, there is an expansion factor in the blocks. If you look at the ResNet source, each block class defines an expansion attribute (1 for BasicBlock, 4 for Bottleneck).

You can use the following code:

resnet = models.resnet50(pretrained=True)
resnet.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
resnet.fc = torch.nn.Linear(512 * resnet.layer1[0].expansion, 1)

That will work if you use any of the resnets…
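
For reference, the expansion attribute comes from the block type used by each variant:

print(models.resnet18().layer1[0].expansion)   # 1 (BasicBlock)
print(models.resnet50().layer1[0].expansion)   # 4 (Bottleneck)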

Also, if you want it to work with any input size, you can use torch.nn.AdaptiveAvgPool2d(1) for the average pooling layer, as long as the input is large enough to make it through all the convolutions.
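
For example, something along these lines should accept other resolutions as well (a rough sketch; the 300x300 input is arbitrary, and the spatial size still needs to survive all the stride-2 stages):

resnet = models.resnet50(pretrained=True)
resnet.conv1 = torch.nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
resnet.avgpool = torch.nn.AdaptiveAvgPool2d(1)                    # always pools down to 1x1
resnet.fc = torch.nn.Linear(512 * resnet.layer1[0].expansion, 1)

x = torch.randn(2, 1, 300, 300)                                   # not 224x224
output = resnet(x)                                                # -> torch.Size([2, 1])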