Modify ResNet or VGG for single-channel grayscale

Hi,

I’m working on infrared data which I convert to 64x64 grayscale (I can use other sizes, but my GPU usually runs out of memory). I used VGG11, but I had to recreate the architecture manually in order to use it, which is a cumbersome and error-prone task.
I was wondering whether there is a simpler way to modify the VGG19 or ResNet architectures to accept my 64x64 single-channel input, and if so, whether that makes sense given that those models are fine-tuned for 3-channel RGB.
I’ve read this discussion and it seems that all I have to do is create a class which inherits from VGG19 or ResNet-152 and modify the first Conv layer?

Many thanks,
Alex

5 Likes

Modifying ResNet is very easy and more powerful (than VGG).
This is a copy of the official PyTorch implementation:

import torch.nn as nn

# Copied (abridged) from torchvision/models/resnet.py
class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        # ... (weight init, _make_layer, and forward omitted)

You just have to change self.conv1 to

        self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)

And that’s all.
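For a quick sanity check, something like this should confirm that the modified first layer accepts a single-channel 64x64 batch:

import torch
import torch.nn as nn

# The 1-channel stem: same hyperparameters as before, only in_channels changes.
conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
x = torch.randn(8, 1, 64, 64)   # (batch, channels, height, width)
print(conv1(x).shape)           # torch.Size([8, 64, 32, 32])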

I am afraid there is no fine-tuning here, though… you would be training from scratch.

ResNet’s input is 224x224 by default. The code will of course run with 64x64, but the pretraining would not be very useful.

You should also consider what you are using this net for. The spatial output with a 224x224 input is 7x7 (ignoring the average pooling and fully connected layers); a 64x64 input will generate a much smaller 2x2 feature map.
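If you want to see the difference yourself, a rough check like this should show it (resnet18 is used just as an example here; the downsampling is the same for all the torchvision ResNets):

import torch
import torch.nn as nn
import torchvision.models as models

# Feature map size right before the average pooling, for both input sizes.
backbone = nn.Sequential(*list(models.resnet18(pretrained=False).children())[:-2])
for size in (224, 64):
    x = torch.randn(1, 3, size, size)
    print(size, backbone(x).shape)
# 224 -> torch.Size([1, 512, 7, 7])
# 64  -> torch.Size([1, 512, 2, 2])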

Considering those facts, make the best choice :slight_smile:

8 Likes

If you want to make use of a pretrained network, consider feeding your grayscale image to the network as an RGB image by copying your grayscale channel into all three channels. There might be some clever variants of this technique; Jeremy Howard from fast.ai talked about this a bit in his lectures, but unfortunately I don’t remember which lecture / timestamp exactly.
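In its simplest form, something along these lines should do it (just repeating the single channel three times; transforms.Grayscale(num_output_channels=3) would achieve the same on PIL images):

import torch

x = torch.randn(8, 1, 64, 64)   # grayscale batch
x_rgb = x.repeat(1, 3, 1, 1)    # shape: (8, 3, 64, 64), ready for an RGB net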

3 Likes

@JuanFMontesinos Thank you a ton! I didn’t expect to be spoon-fed the answer. Yes, I agree that 64x64 is much smaller; the only reason I use it is that the entire dataset fits in my GPU’s RAM (about 10 GB). Using 224x224 makes it considerably slower because every mini-batch must be uploaded to the GPU separately.

@ptab Thanks for the suggestion. I don’t doubt there might be benefits from doing that, but it seems like overkill to me; what could it potentially offer? I could use RGB, but I’d rather stick to grayscale for now. Thanks again!

Yeah, I know, but even if you have to run more iterations, it’s probably better than using that size, I guess.

About what @ptab said: using grayscale you lose the pretraining, and the ImageNet weights are very good weights trained on millions of images. It’s standard practice when using a pretrained visual architecture for general purposes.

My task is face valence/expression classification, so I doubt the ImageNet weights will be of much use.
I’ll try to see if I can somehow speed up mini-batch allocation during training, since I’ve got a Titan Xp which seems to idle when the dataset doesn’t fit in GPU memory. Thanks for the advice and help, I’ll give it a try and report back!
My current overall accuracy is around 69-70% using a grayscale derivative of VGG11 with batch normalisation.

Not sure right now; you could check dlib, which is a library for face recognition. I don’t know which architecture it uses.

Pre-trained ImageNet models did well even with OCR data, so I would not be surprised if they do well with face data too.

I can’t seem to get it to work.
I’ve tried a slightly different approach:

import torch
import torch.nn as nn
import torchvision.models as models

class resnet152_mech(models.resnet152(pretrained=False)):

    def __init__(self, block, layers, num_classes=4):
        self.inplanes = 64
        super(resnet152_mech, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)

However, when I create such an object, I get an error:

Traceback (most recent call last):
  File "resnet_test.py", line 1, in <module>
    import resnet_mechion as mechion
  File "/home/zuperath/code/mechion_core/python/resnet_mechion.py", line 7, in <module>
    class resnet152_mech(models.resnet152(pretrained=False)):
  File "/home/zuperath/anaconda2/lib/python2.7/site-packages/torchvision/models/resnet.py", line 106, in __init__
    self.layer1 = self._make_layer(block, 64, layers[0])
  File "/home/zuperath/anaconda2/lib/python2.7/site-packages/torchvision/models/resnet.py", line 123, in _make_layer
    if stride != 1 or self.inplanes != planes * block.expansion:
AttributeError: 'str' object has no attribute 'expansion'

I’m not sure what this is; it seems to me that some logic test about planes and blocks fails?

Hi, I think you are making a coding error.
You are trying to inherit from an object rather than a class. I reviewed the source code, and when you call

models.resnet152(pretrained=False)

you are defining a model, that is, you are creating an instance of the class ResNet. When you inherit from another class, you have to reference the class itself, not an instance.
Therefore, your fixed code is:

import torch
import torch.nn as nn
import torchvision.models as models

class resnet152_mech(models.resnet.ResNet):

    def __init__(self, block, layers, num_classes=4):
        self.inplanes = 64
        super(resnet152_mech, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)

Later on, to define your model, you instantiate it in a similar way:

 model = resnet152_mech(models.resnet.Bottleneck, [3, 8, 36, 3], **kwargs)

Oh, don’t forget about modifying **kwargs (e.g. num_classes) xd
Something like that

5 Likes

Apologies, I came from a C++ background; of course what you said makes perfect sense.
I’ll try it again.

Hi, I have tried

class resnet2ch(torchvision.models.resnet.ResNet):
    def __init__(self, block, layers, num_classes=4):
        self.inplanes = 64
        super(resnet2ch, self).__init__()
        self.conv1 = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)

and use

model = resnet2ch(torchvision.models.resnet.Bottleneck, [2, 2, 2, 2])

to build resnet18 with two channels. But I got the error:

super(resnet2ch, self).__init__()
TypeError: __init__() missing 2 required positional arguments: 'block' and 'layers'

Do you have any idea? Thank you.

The parent class expects these arguments, so try to pass them to the super() call:

def __init__(self, block, layers, num_classes=4):
    super(resnet2ch, self).__init__(block, layers)
    ...
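Putting the pieces from this thread together, a complete version might look something like this (note that torchvision’s resnet18 actually uses BasicBlock with layers=[2, 2, 2, 2]; the 2-channel input and 4 classes are just the values used above):

import torch
import torch.nn as nn
import torchvision.models as models

class ResNet2Ch(models.resnet.ResNet):
    def __init__(self, block, layers, num_classes=4):
        super(ResNet2Ch, self).__init__(block, layers, num_classes=num_classes)
        # Replace the stem so it accepts 2 input channels instead of 3.
        self.conv1 = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)

model = ResNet2Ch(models.resnet.BasicBlock, [2, 2, 2, 2])
out = model(torch.randn(4, 2, 224, 224))
print(out.shape)  # torch.Size([4, 4])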
2 Likes

Thanks a lot.
I have tried something like:

model = resnet18()
model.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)

from this post Transfer learning with different inputs - #5 by austin
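Note that if you start from a pretrained model (pretrained=True), replacing conv1 like this keeps the ImageNet weights everywhere else; only the new first conv is randomly re-initialized.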

1 Like
model = resnet2ch(torchvision.models.resnet.Bottleneck, [2, 2, 2, 2])

What does the [2, 2, 2, 2] specify here?

The list specifies the layers argument used to create the different ResNet architectures, as seen here.
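The entries count the residual blocks in layer1 through layer4. For reference, the standard torchvision configurations are:

import torchvision.models as models

# Block type and layer counts used by torchvision (models/resnet.py):
#   resnet18:  BasicBlock, [2, 2, 2, 2]
#   resnet34:  BasicBlock, [3, 4, 6, 3]
#   resnet50:  Bottleneck, [3, 4, 6, 3]
#   resnet101: Bottleneck, [3, 4, 23, 3]
#   resnet152: Bottleneck, [3, 8, 36, 3]
# So this builds the same architecture as models.resnet18():
model = models.resnet.ResNet(models.resnet.BasicBlock, [2, 2, 2, 2])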

1 Like

I don’t see why this has to be true. For example, fastai automatically sums the 3-channel weights to produce 1-channel weights for the input layer when you provide a 1-channel input instead of the usual 3-channel input.
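In plain PyTorch you can do the same thing by hand, roughly like this:

import torch.nn as nn
from torchvision.models import resnet18

# Reuse the pretrained 3-channel conv1 weights for a 1-channel input by
# summing them across the channel dimension (this mirrors what fastai does
# automatically).
model = resnet18(pretrained=True)
pretrained_w = model.conv1.weight.data                        # (64, 3, 7, 7)
new_conv = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
new_conv.weight.data = pretrained_w.sum(dim=1, keepdim=True)  # (64, 1, 7, 7)
model.conv1 = new_conv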

1 Like