When I run the pytorch_examples/imagenet example with the inception_v3 model I get:
Traceback (most recent call last):
File "/home/nadavb/pytorch_examples/examples/imagenet/main.py", line 287, in <module>
main()
File "/home/nadavb/pytorch_examples/examples/imagenet/main.py", line 130, in main
train(train_loader, model, criterion, optimizer, epoch)
File "/home/nadavb/pytorch_examples/examples/imagenet/main.py", line 166, in train
output = model(input_var)
File "/home/nadavb/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/nadavb/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/nadavb/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/home/nadavb/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
raise output
RuntimeError: CHECK_ARG(input->nDimension == output->nDimension) failed at torch/csrc/cudnn/Conv.cpp:275
It looks like I’m missing something obvious related to the input size, but I’m not sure what exactly…
Is there a different pre-processing done for inception models?
Inception requires the input size to be 299x299, while all other networks require it to be of size 224x224. Also, if you are using the standard torchvision preprocessing (mean / std normalization), then you should look into passing the transform_input argument.
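For reference, a minimal sketch of what the training transform could look like for Inception v3 (assuming the standard torchvision ImageNet normalization constants; newer torchvision versions call the crop transform RandomResizedCrop, older ones RandomSizedCrop):

import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(299),  # Inception v3 expects 299x299 inputs, not 224x224
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])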
@fmassa There seems to be another problem with customizing Inception. Even if the image size is adjusted, it looks like the Inception output is a tuple, so the code that computes the loss fails.
Code for loss computation:
output = model(input_var)
loss = criterion(output, target_var)
RuntimeError: expected a Variable argument, but got tuple
Looking at the inception implementation, it seems like the following lines are being executed:
if self.training and self.aux_logits:
return x, aux
Do you know what the aux_logits parameter is referring to? Should it simply be ignored for loss computation?
I see. Do you have an idea of what specifically needs to be adapted to make it work? I’m not familiar with the exact loss computation in the inception network, so I’m not sure how to change the PyTorch code.
Not completely. Adjusting the image size is straightforward (just change the transform to reflect the new size), but I’m waiting to see if @smth has any pointers on how to compute (and propagate) the loss.
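In the meantime, here is a hedged sketch of how the auxiliary output is commonly handled (this is not from the example code; the auxiliary classifier comes from the Inception architecture, and the 0.4 weight is a conventional choice rather than anything mandated by the model):

# inception_v3 with aux_logits=True returns (main_output, aux_output) in training mode
output, aux_output = model(input_var)
loss = criterion(output, target_var) + 0.4 * criterion(aux_output, target_var)  # weighted auxiliary loss

Alternatively, you can construct the model with aux_logits=False, or simply discard the second element of the tuple and keep the original single-loss code.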
According to the useful guidelines of @achaiah & @wangg12, I can fine-tune the inception v3 model. However, I can’t save this model correctly and then reuse it. Would you please help me?
I have tested both of the methods described in Recommended approach for saving a model, but they don’t work correctly for the inception v3 model.
In fact, when I used the
torch.save(myInceptionModel, './model.pth')
approach and then loaded the model and reused it (i.e., forwarded some new test images through the loaded model), the following error occurred:
result = i.index(self.index)
IndexError: index 1 is out of range for dimension 0 (of size 1)
And when I used the following approach for saving the model
Make sure you replace the FC and Bias layers just like you did when fine-tuning the model, since the original model still has the default number of classes.
Make sure you somehow preserve and restore the class_to_idx mapping. Remember that the model doesn’t know anything about the names of the actual classes, so you have to store it yourself. A simple mistake (that I’ve made) is to have fewer classes in the test set than in the train set. You wouldn’t catch the problem at first, but if you’re using an ImageFolder it will create a different class_to_idx mapping from your test set than from your train set if they have a different number of classes.
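To make both points concrete, here is a minimal sketch of saving and restoring a fine-tuned inception_v3 together with its class_to_idx mapping (names such as myInceptionModel, train_dataset and the checkpoint file name are placeholders; the AuxLogits line only applies if you also replaced the auxiliary classifier’s fc during fine-tuning):

import torch
import torchvision.models as models

# saving: keep the weights and the class mapping together
checkpoint = {
    'state_dict': myInceptionModel.state_dict(),
    'class_to_idx': train_dataset.class_to_idx,  # mapping from the training ImageFolder
}
torch.save(checkpoint, 'inception_finetuned.pth')

# loading: rebuild the architecture with the fine-tuned number of classes first
checkpoint = torch.load('inception_finetuned.pth')
num_classes = len(checkpoint['class_to_idx'])
model = models.inception_v3(aux_logits=True)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
model.AuxLogits.fc = torch.nn.Linear(model.AuxLogits.fc.in_features, num_classes)
model.load_state_dict(checkpoint['state_dict'])
class_to_idx = checkpoint['class_to_idx']  # restore the class-name mapping for inference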
I face the same issue while fine-tuning InceptionV3.
I transform the images to 299x299.
PyTorch version = 0.2.0+de24bb4
Traceback (most recent call last):
File "inceptionv3.py", line 263, in <module>
train(train_loader, model, criterion, optimizer, epoch)
File "inceptionv3.py", line 126, in train
outputs = model(input_var)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 252, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 70, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
raise output
RuntimeError: size mismatch at /home/scw4750/AIwalker/pytorch/source/pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:243