Imagenet example with inception v3

@achaiah Have you figured out how to use the inception-v3 model? How can I compute the aux loss and do backpropagation?

Not completely. Adjusting the image size is straightforward (just change the transform to reflect the new size), but I’m waiting to see if @smth has any pointers on how to compute (and backpropagate) the loss.

size adjustment:

                transforms.Normalize(mean, std)

OK, thanks anyway. I’ve already figured out how to use inception v3.

@wangg12 would you mind sharing your findings?

I just add the main loss and the aux loss (call the sum total_loss) and, in the training phase, backprop total_loss.

OK, so you just compute loss1 + loss2, as in “How to extract features of an image from a trained model?”, and then backprop the sum?


If so, do you use the same criterion with the same targets for both losses?

Yes, I use the same criterion with the same targets for both losses, and it seems to work well for my task.
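A minimal sketch of the recipe described above (same criterion, same targets, losses summed, one backward pass). ToyInception is a hypothetical stand-in that mimics Inception v3’s behaviour of returning (logits, aux_logits) in train() mode and only logits in eval() mode, so the example runs without downloading weights. The aux_weight parameter is an addition: 1.0 reproduces the plain sum discussed here, while some tutorials use a smaller weight such as 0.4.

```python
# Sketch: one training step for a model that returns (logits, aux_logits) in train mode.
# ToyInception is a hypothetical stand-in, not torchvision's Inception3.
import torch
import torch.nn as nn


class ToyInception(nn.Module):
    """Hypothetical tiny model mimicking Inception v3's train/eval output behaviour."""

    def __init__(self, num_features: int = 8, num_classes: int = 3):
        super().__init__()
        self.body = nn.Linear(num_features, 16)
        self.fc = nn.Linear(16, num_classes)       # main classifier
        self.aux_fc = nn.Linear(16, num_classes)   # auxiliary classifier

    def forward(self, x):
        h = torch.relu(self.body(x))
        if self.training:
            return self.fc(h), self.aux_fc(h)      # (logits, aux_logits)
        return self.fc(h)                          # eval: main logits only


def compute_loss(outputs, labels, criterion, aux_weight: float = 1.0):
    """Sum the main and auxiliary losses, using the same criterion and targets."""
    if isinstance(outputs, tuple):
        logits, aux_logits = outputs
        return criterion(logits, labels) + aux_weight * criterion(aux_logits, labels)
    return criterion(outputs, labels)


def train_step(model, inputs, labels, criterion, optimizer, aux_weight: float = 1.0):
    model.train()
    optimizer.zero_grad()
    loss = compute_loss(model(inputs), labels, criterion, aux_weight)
    loss.backward()    # gradients flow through both heads into the shared body
    optimizer.step()
    return loss.item()
```

With torchvision’s Inception v3, the train-mode output in recent versions is a namedtuple (InceptionOutputs), which is a tuple subclass, so the isinstance check still applies.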


Thanks, I appreciate the feedback. I was just about to try the same thing.

Following the helpful guidelines from @achaiah and @wangg12, I was able to fine-tune the Inception v3 model. However, I can’t save the model correctly and reuse it later. Would you please help me?
I have tested both of the methods described in “Recommended approach for saving a model”, but neither works correctly for the Inception v3 model.

In fact, when I used the …, ‘./model.pth’)

approach and then loaded the model to reuse it (i.e., forwarded some new test images through the loaded model), the following error occurred:

result = i.index(self.index)
IndexError: index 1 is out of range for dimension 0 (of size 1)

And when I used the following approach for saving the model: …, ‘./model.pth’)

the loaded model gave me wrong answers (i.e., it couldn’t predict the correct label for most of the samples).


There are two issues here to look out for.

  1. Make sure you replace the FC layer (weights and bias) exactly as you did when fine-tuning the model, since the original model still has the default number of classes.
  2. Make sure you somehow preserve and restore the class_to_idx mapping. Remember that the model doesn’t know anything about the names of the actual classes, so you have to store the mapping yourself. A simple mistake (one that I’ve made) is to have fewer classes in the test set than in the train set. You wouldn’t catch the problem at first, but if you’re using an ImageFolder, it will create a different class_to_idx mapping for your test set than for your train set when the two have a different number of classes.
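A sketch of both points, under assumptions: build_model is a hypothetical factory standing in for “create Inception v3 and replace its fc layer(s)”, and a tiny nn.Sequential replaces the real backbone so the example runs quickly. The idea is to save class_to_idx alongside the state_dict and to rebuild the same architecture, with the same number of classes, before calling load_state_dict.

```python
# Sketch: save fine-tuned weights together with class_to_idx, then rebuild the same
# architecture (replaced head included) before loading. build_model is hypothetical.
import io

import torch
import torch.nn as nn


def build_model(num_classes: int) -> nn.Module:
    """Hypothetical model factory: a toy backbone plus a head sized for num_classes."""
    model = nn.Sequential()
    model.add_module("backbone", nn.Linear(8, 16))
    model.add_module("fc", nn.Linear(16, num_classes))   # replaced classifier head
    return model


def save_checkpoint(model: nn.Module, class_to_idx: dict, f) -> None:
    torch.save({"state_dict": model.state_dict(), "class_to_idx": class_to_idx}, f)


def load_checkpoint(f):
    ckpt = torch.load(f, map_location="cpu")
    class_to_idx = ckpt["class_to_idx"]
    model = build_model(num_classes=len(class_to_idx))   # same head as in training
    model.load_state_dict(ckpt["state_dict"])
    idx_to_class = {i: c for c, i in class_to_idx.items()}
    return model, class_to_idx, idx_to_class


if __name__ == "__main__":
    buf = io.BytesIO()                                   # in-memory stand-in for a file
    save_checkpoint(build_model(3), {"cat": 0, "dog": 1, "fox": 2}, buf)
    buf.seek(0)
    model, class_to_idx, idx_to_class = load_checkpoint(buf)
    print(idx_to_class)
```

At test time, interpret predictions with the saved idx_to_class rather than the class_to_idx of the test ImageFolder, which is built from whatever class subfolders happen to exist there.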

I’m facing the same issue:

IndexError                                Traceback (most recent call last)
<ipython-input-14-3fe681399c1a> in <module>()
  1 t1 = time.time()
  2 print("%Y-%m-%d %H:%M:%S"))
----> 3 model_conv = train_model(model_conv, criterion, optimizer_conv,exp_lr_scheduler, num_epochs=10)
  4 t2 = time.time()
  5 print("%Y-%m-%d %H:%M:%S"))

<ipython-input-8-c5340d5d5434> in train_model(model, criterion, optimizer, lr_scheduler, num_epochs)
 49                 loss = None
 50                 # for nets that have multiple outputs such as inception
---> 51                 outputs = model(inputs)
 52                 if isinstance(outputs, tuple):
 53                     loss = sum((criterion(o,labels) for o in outputs))

/home/anaconda3/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
205     def __call__(self, *input, **kwargs):
--> 206         result = self.forward(*input, **kwargs)
207         for hook in self._forward_hooks.values():
208             hook_result = hook(self, input, result)

/home/anaconda3/lib/python3.6/site-packages/torchvision/models/ in forward(self, x)
 72             x = x.clone()
 73             x[0] = x[0] * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
---> 74             x[1] = x[1] * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
 75             x[2] = x[2] * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
 76         # 299 x 299 x 3

/home/anaconda3/lib/python3.6/site-packages/torch/autograd/ in __getitem__(self, key)
 67                 type( == 'ByteTensor'):
 68             return MaskedSelect()(self, key)
---> 69         return Index(key)(self)
 71     def __setitem__(self, key, value):

/home/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/ in forward(self, i)
 14     def forward(self, i):
 15         self.input_size = i.size()
---> 16         result = i.index(self.index)
 17         self.mark_shared_storage((i, result))
 18         return result

IndexError: index 1 is out of range for dimension 0 (of size 1)

What can I do?

I’m hitting the same problem while fine-tuning InceptionV3.
I resize the images to 299×299.
PyTorch version: 0.2.0+de24bb4

Traceback (most recent call last):
  File "", line 263, in
    train(train_loader, model, criterion, optimizer, epoch)
  File "", line 126, in train
    outputs = model(input_var)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/", line 252, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/", line 60, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/", line 70, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/", line 67, in parallel_apply
    raise output
RuntimeError: size mismatch at /home/scw4750/AIwalker/pytorch/source/pytorch/torch/lib/THC/generic/

If I’m not mistaken, the Inception v3 model was trained with different image preprocessing. Unrolling the equation for channel 0:

new_x = original_x * (0.229 / 0.5) + (0.485 - 0.5) / 0.5  -->
new_x = (0.229 * original_x + 0.485) / 0.5 - 1

and solving for original_x gives:

original_x = ((new_x + 1) * 0.5 - 0.485) / 0.229

which is ordinary mean subtraction and standard-deviation normalization (mean 0.485, std 0.229) applied to p = (new_x + 1) * 0.5. Since p is a pixel value in [0, 1], new_x = 2p - 1 ranges between -1 and 1. The other two channels work the same way with their own mean and std.
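The algebra can be checked numerically. This sketch (imagenet_to_inception_range is a hypothetical name) applies the same per-channel formula, but along dimension 1 of a batched (N, 3, H, W) tensor, and confirms that ImageNet-normalized input ends up as 2p - 1. Note that the torchvision code quoted in the traceback indexes x[0], x[1], x[2], i.e. dimension 0; for a batched input that is the batch dimension, which appears consistent with the “index 1 is out of range for dimension 0 (of size 1)” error when a batch holds a single image. Going by this math, transform_input also seems to assume the input was already ImageNet-normalized.

```python
# Sketch: the transform_input-style remapping applied per channel along dim 1.
# Assumes x was already normalized with the standard ImageNet mean/std.
import torch

MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def imagenet_to_inception_range(x: torch.Tensor) -> torch.Tensor:
    """Map ImageNet-normalized (N, 3, H, W) input to the [-1, 1] range, per channel."""
    channels = [
        x[:, c : c + 1] * (STD[c] / 0.5) + (MEAN[c] - 0.5) / 0.5   # slice keeps dim 1
        for c in range(3)
    ]
    return torch.cat(channels, dim=1)


if __name__ == "__main__":
    p = torch.rand(1, 3, 4, 4)                        # batch of ONE image, pixels in [0, 1]
    mean = torch.tensor(MEAN).view(1, 3, 1, 1)
    std = torch.tensor(STD).view(1, 3, 1, 1)
    y = imagenet_to_inception_range((p - mean) / std)
    print(torch.allclose(y, 2 * p - 1, atol=1e-5))    # True: y = 2p - 1
```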

Inception v3 has two outputs, so how do I get the final prediction or accuracy from the two outputs?


The ‘aux’ layer is used only for training. At inference time, you only have the output of the final layer.
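A minimal sketch of getting predictions and accuracy, assuming a model that behaves like torchvision’s Inception v3 (main logits only in eval() mode). The tuple check is a defensive fallback that keeps the first (main) output; predict and accuracy are hypothetical helper names.

```python
# Sketch: predictions and accuracy for an Inception-style model at inference time.
import torch
import torch.nn as nn


@torch.no_grad()
def predict(model: nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    model.eval()                        # eval mode: expect only the main logits
    outputs = model(inputs)
    if isinstance(outputs, tuple):      # defensive: e.g. (logits, aux_logits)
        outputs = outputs[0]
    return outputs.argmax(dim=1)        # predicted class index per sample


def accuracy(preds: torch.Tensor, labels: torch.Tensor) -> float:
    return (preds == labels).float().mean().item()
```

Note that predict switches the model to eval mode as a side effect, so call model.train() again before resuming training.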

Please correct me if I’m wrong: there’s no need to do the mandatory normalization (“The images have to be loaded in to a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]”), since it’s already included in the model as long as transform_input is set to True.

For those who are still stuck on this issue, compute the loss like this:

# Inception v3 returns (logits, aux_logits) in training mode
if isinstance(outputs, tuple):
    loss = sum((criterion(o, labels) for o in outputs))
else:
    loss = criterion(outputs, labels)

If this is true, then the master documentation needs to be changed. It states: “All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]”.


Your answer saved my life.