InceptionV3 for different image sizes

Hello, PyTorch forum!
I am looking for an example of modifying and fine-tuning the pretrained InceptionV3 for different image sizes. Any hints?

Fine-tuning makes sense. Look at the ImageNet example or the Transfer learning tutorial.

I suspect it’ll be easier to scale and/or crop your images than to try to adapt InceptionV3 to a different image size. What size images do you have?

For smaller images, you’ll have to zero-pad or scale and crop them.

For larger images, you can scale and crop them, or apply the network in a “fully convolutional” manner. Scaling and cropping will be more efficient.

To apply the network in a fully convolutional manner, replace the nn.Linear layers with 1x1 nn.Conv2d convolutions. You’ll then get multiple predictions per image. This can be significantly more computationally expensive, but might give you a bit better predictions. (Image detection and segmentation networks tend to use this method.)
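For illustration, here is a minimal sketch of that conversion, assuming the torchvision inception_v3 model (reshaping the fc weights into a 1x1 kernel is the whole trick):

import torch
import torch.nn as nn
import torchvision

# Sketch: turn the final nn.Linear of InceptionV3 into an equivalent 1x1
# nn.Conv2d so the head can run on feature maps of any spatial size.
model = torchvision.models.inception_v3(pretrained=True)
fc = model.fc                                    # nn.Linear(2048, 1000)

conv1x1 = nn.Conv2d(fc.in_features, fc.out_features, kernel_size=1)
with torch.no_grad():
    conv1x1.weight.copy_(fc.weight.view(fc.out_features, fc.in_features, 1, 1))
    conv1x1.bias.copy_(fc.bias)

# A 2048-channel feature map of any size now yields one prediction per
# spatial location instead of a single vector.
features = torch.randn(1, 2048, 5, 5)
print(conv1x1(features).shape)                   # torch.Size([1, 1000, 5, 5])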

You could also try adding layers to the Inception definition, but I wouldn’t recommend it. I don’t think initializing from the pre-trained weights would help much in that case.

Here’s the inception model definition for reference.

Again, I think your best bet is to scale and crop your images. But if that’s not acceptable for some reason, you can try one of the methods above.
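For what it’s worth, the scale-and-crop route with torchvision transforms might look roughly like this (a sketch; 299 is the input size the pretrained InceptionV3 expects, and 342 is just a common choice for the pre-crop resize):

from torchvision import transforms

# Sketch: bring images of arbitrary size to the 299x299 input of the
# pretrained InceptionV3, then normalize with the ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize(342),         # scale the shorter side
    transforms.CenterCrop(299),     # crop to the expected input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])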

I had to use a specific image size, and resizing/scaling was not an option, so I used code like this:

# imports needed to run the snippet
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import models


class CustomInceptionV3(models.Inception3):
    def __init__(self, model_orig, num_classes):
        super(CustomInceptionV3, self).__init__()
        num_feats = model_orig.fc.in_features
        self.fc = nn.Linear(num_feats, num_classes)
        self.aux_logits = model_orig.aux_logits

        self.Conv2d_1a_3x3 = model_orig.Conv2d_1a_3x3
        self.Conv2d_2a_3x3 = model_orig.Conv2d_2a_3x3
        self.Conv2d_2b_3x3 = model_orig.Conv2d_2b_3x3

        self.Conv2d_3b_1x1 = model_orig.Conv2d_3b_1x1
        self.Conv2d_4a_3x3 = model_orig.Conv2d_4a_3x3

        self.Mixed_5b = model_orig.Mixed_5b
        self.Mixed_5c = model_orig.Mixed_5c
        self.Mixed_5d = model_orig.Mixed_5d
        self.Mixed_6a = model_orig.Mixed_6a
        self.Mixed_6b = model_orig.Mixed_6b
        self.Mixed_6c = model_orig.Mixed_6c
        self.Mixed_6d = model_orig.Mixed_6d
        self.Mixed_6e = model_orig.Mixed_6e
        self.Mixed_7a = model_orig.Mixed_7a
        self.Mixed_7b = model_orig.Mixed_7b
        self.Mixed_7c = model_orig.Mixed_7c

    def forward(self, x):
        if self.transform_input:
            x = x.clone()
            x[:, 0] = x[:, 0] * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
            x[:, 1] = x[:, 1] * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
            x[:, 2] = x[:, 2] * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
        x = self.Conv2d_1a_3x3(x)
        x = self.Conv2d_2a_3x3(x)
        x = self.Conv2d_2b_3x3(x)
        x = F.max_pool2d(x, kernel_size=3, stride=2)
        x = self.Conv2d_3b_1x1(x)
        x = self.Conv2d_4a_3x3(x)
        x = F.max_pool2d(x, kernel_size=3, stride=2)
        x = self.Mixed_5b(x)
        x = self.Mixed_5c(x)
        x = self.Mixed_5d(x)
        x = self.Mixed_6a(x)
        x = self.Mixed_6b(x)
        x = self.Mixed_6c(x)
        x = self.Mixed_6d(x)
        x = self.Mixed_6e(x)
        if self.training and self.aux_logits:
            aux = self.AuxLogits(x)
        x = self.Mixed_7a(x)
        x = self.Mixed_7b(x)
        x = self.Mixed_7c(x)
        # adaptive pooling keeps the classifier independent of the input spatial size
        x = F.adaptive_avg_pool2d(x, 1)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        if self.training and self.aux_logits:
            return x, aux
        return x

class CustomInceptionAux(nn.Module):

    def __init__(self, in_channels, num_classes):
        super(CustomInceptionAux, self).__init__()
        self.conv0 = BasicConv2d(in_channels, 128, kernel_size=1)
        self.conv1 = BasicConv2d(128, 768, kernel_size=5)
        self.conv1.stddev = 0.01
        num_feats = 768
        self.fc = nn.Linear(num_feats, num_classes)

    def forward(self, x):
        x = F.adaptive_avg_pool2d(x, 5)
        x = self.conv0(x)
        x = self.conv1(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x


class BasicConv2d(nn.Module):

    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)

model_orig = torchvision.models.inception_v3(pretrained=True)
model = CustomInceptionV3(model_orig, num_classes=4)
model.AuxLogits = CustomInceptionAux(768, 4)
# copy the pretrained auxiliary-head weights into the new aux classifier
model.AuxLogits.conv0.conv.weight.data = model_orig.AuxLogits.conv0.conv.weight.data
model.AuxLogits.conv0.bn.weight.data = model_orig.AuxLogits.conv0.bn.weight.data
model.AuxLogits.conv1.conv.weight.data = model_orig.AuxLogits.conv1.conv.weight.data
model.AuxLogits.conv1.bn.weight.data = model_orig.AuxLogits.conv1.bn.weight.data
model.eval()
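With the model built as above, a quick sanity check on a non-default input size might look like this (a sketch; the 512×512 size is just an example):

import torch

# Sketch: confirm the adapted model runs on a larger-than-default input.
with torch.no_grad():
    dummy = torch.randn(1, 3, 512, 512)
    out = model(dummy)
print(out.shape)  # torch.Size([1, 4]) in eval mode (no aux output)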

After changing the code to allow variable image sizes, have you tested the performance? Is it the same as the original?

I tested the performance of the model on new 512×512 images, but did not do a comparison. However, as @colesbury suggested, the best option for images of a different size is to resize them and fine-tune the pretrained network for your application.

@bmarami, is it possible to extend this up to 1024×1024 input? I am currently using DetectNet and have a very large dataset already adjusted to this size. I want to try a different network, but the size is a problem. I cannot resize the dataset since the images contain both big and very small objects.

Because of F.adaptive_avg_pool2d(), this would work (I mean, run without errors) for any image size; however, the accuracy of the pretrained network would be low and you would have to retrain it.
Again, I think you would get better results if you downsample the images at the beginning, rather than using a larger pooling kernel at the end.
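As a small illustration of why the adaptive pooling makes the head size-independent (a sketch, not code from this thread):

import torch
import torch.nn.functional as F

# F.adaptive_avg_pool2d always produces the requested output size,
# regardless of the spatial size of the incoming feature map.
for size in [(8, 8), (17, 17), (30, 30)]:
    feat = torch.randn(1, 2048, *size)
    print(F.adaptive_avg_pool2d(feat, 1).shape)  # torch.Size([1, 2048, 1, 1]) every time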

Thank you for your comments. The problem is that my small objects are around 35-60 pixels, and downscaling would probably destroy useful information. Training from scratch should not be a big deal since the dataset is large enough; I have started from zero with DetectNet many times, and it usually converges within the first 30-50 epochs. I really wonder about the accuracy with InceptionV3; typically I was able to get over 65 mAP for a single class.
About other object detection models: are there any other models in PyTorch that support larger inputs (512 and beyond)?
Regards,

I tried to run this code, but it gives this error:

File "/home/Drive2/shrey/shrey/code/Data/BACH/Data/Training/model_1.py", line 22, in __init__
    self.fc = nn.Linear(num_feats, num_classes)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 424, in __setattr__
    "cannot assign module before Module.__init__() call")
AttributeError: cannot assign module before Module.__init__() call

Can you please help me with this?