When I run the pytorch_examples/imagenet example with the inception_v3 model I get:
Traceback (most recent call last):
File "/home/nadavb/pytorch_examples/examples/imagenet/main.py", line 287, in <module>
main()
File "/home/nadavb/pytorch_examples/examples/imagenet/main.py", line 130, in main
train(train_loader, model, criterion, optimizer, epoch)
File "/home/nadavb/pytorch_examples/examples/imagenet/main.py", line 166, in train
output = model(input_var)
File "/home/nadavb/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/nadavb/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/nadavb/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/home/nadavb/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
raise output
RuntimeError: CHECK_ARG(input->nDimension == output->nDimension) failed at torch/csrc/cudnn/Conv.cpp:275
It looks like I’m missing something obvious related to the input size, but I’m not sure what exactly…
Is there a different pre-processing done for inception models?
Inception requires the input size to be 299x299, while all other networks require it to be of size 224x224. Also, if you are using the standard torchvision preprocessing (mean / std normalization), then you should look into passing the transform_input argument.
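For reference, a minimal sketch of what the training transform could look like for Inception v3 (assuming the standard torchvision ImageNet normalization constants; newer torchvision versions call the crop transform RandomResizedCrop, older ones RandomSizedCrop):

import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(299),  # Inception v3 expects 299x299 inputs, not 224x224
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])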
@fmassa There seems to be another problem with customizing Inception. Even if the image size is adjusted, it looks like the Inception output is a tuple, so the code that computes the loss fails.
Code for loss computation:
output = model(input_var)
loss = criterion(output, target_var)
RuntimeError: expected a Variable argument, but got tuple
Looking at the inception implementation, it seems like the following lines are being executed:
if self.training and self.aux_logits:
return x, aux
Do you know what the aux_logits parameter is referring to? Should it simply be ignored for loss computation?
I see. Do you have an idea of what specifically needs to be adapted to make it work? I’m not familiar with the exact loss computation in the inception network, so I’m not sure how to change the PyTorch code.
Not completely. Adjusting the image size is straightforward (just change the transform to reflect the new size), but I’m waiting to see if @smth has any pointers on how to compute (and propagate) the loss.
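In the meantime, here is a hedged sketch of how the auxiliary output is commonly handled (this is not from the example code; the auxiliary classifier comes from the Inception architecture, and the 0.4 weight is a conventional choice rather than anything mandated by the model):

# inception_v3 with aux_logits=True returns (main_output, aux_output) in training mode
output, aux_output = model(input_var)
loss = criterion(output, target_var) + 0.4 * criterion(aux_output, target_var)  # weighted auxiliary loss

Alternatively, you can construct the model with aux_logits=False, or simply discard the second element of the tuple and keep the original single-loss code.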
According to the useful guidelines of @achaiah & @wangg12, I can fine-tune the inception v3 model. However, I can’t save this model correctly and then reuse it. Would you please help me?
I have tested both of the methods described in Recommended approach for saving a model, but they don’t work correctly for the inception v3 model.
In fact, when I used the
torch.save(myInceptionModel, './model.pth')
approach and then loaded the model and reused it (i.e., forwarded some new test images through the loaded model), the following error occurred:
result = i.index(self.index)
IndexError: index 1 is out of range for dimension 0 (of size 1)
And when I used the following approach for saving the model
Make sure you replace the FC and Bias layers just like you did when fine-tuning the model, since the original model still has the default number of classes.
Make sure you somehow preserve and restore the class_to_idx mapping. Remember that the model doesn’t know anything about the names of the actual classes, so you have to store it yourself. A simple mistake (that I’ve made) is to have fewer classes in the test set than in the train set. You wouldn’t catch the problem at first, but if you’re using an ImageFolder it will create a different class_to_idx mapping from your test set than from your train set if they have a different number of classes.
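To make both points concrete, here is a minimal sketch of saving and restoring a fine-tuned inception_v3 together with its class_to_idx mapping (names such as myInceptionModel, train_dataset and the checkpoint file name are placeholders; the AuxLogits line only applies if you also replaced the auxiliary classifier’s fc during fine-tuning):

import torch
import torchvision.models as models

# saving: keep the weights and the class mapping together
checkpoint = {
    'state_dict': myInceptionModel.state_dict(),
    'class_to_idx': train_dataset.class_to_idx,  # mapping from the training ImageFolder
}
torch.save(checkpoint, 'inception_finetuned.pth')

# loading: rebuild the architecture with the fine-tuned number of classes first
checkpoint = torch.load('inception_finetuned.pth')
num_classes = len(checkpoint['class_to_idx'])
model = models.inception_v3(aux_logits=True)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
model.AuxLogits.fc = torch.nn.Linear(model.AuxLogits.fc.in_features, num_classes)
model.load_state_dict(checkpoint['state_dict'])
class_to_idx = checkpoint['class_to_idx']  # restore the class-name mapping for inference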
I face the same issue while fine-tuning InceptionV3.
I transform the images to 299x299.
PyTorch version = 0.2.0+de24bb4
Traceback (most recent call last):
File "inceptionv3.py", line 263, in <module>
train(train_loader, model, criterion, optimizer, epoch)
File "inceptionv3.py", line 126, in train
outputs = model(input_var)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 252, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 70, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
raise output
RuntimeError: size mismatch at /home/scw4750/AIwalker/pytorch/source/pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:243