RuntimeError when fine-tuning with Inception V3

Hi,

When fine-tuning the Inception V3 model shipped with pytorch-examples, I encountered the following RuntimeError:

RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'; but 4 does not equal 2 (while checking arguments for cudnn_convolution)

I inserted the following code to replace both classifier heads:

if args.arch == 'inception_v3':
    # Replace the auxiliary classifier head with one sized for our dataset
    num_aux_in = model.module.AuxLogits.fc.in_features
    print(num_aux_in)
    model.module.AuxLogits.fc = nn.Linear(num_aux_in, NUM_CLASSES)
    # Replace the final classifier head as well
    num_final_in = model.module.fc.in_features
    print(num_final_in)
    model.module.fc = nn.Linear(num_final_in, NUM_CLASSES)

I’ve searched around and found some related issues on GitHub:

They attributed the problem to the tensor being processed becoming smaller than the kernel size at some point. However, I’ve done a similar task using VGG16, with the same routine and the same dataset… (A minimal check of the input-size sensitivity is sketched below.)
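For reference, the failure can be reproduced in isolation. This is a minimal sketch assuming torchvision's inception_v3; the dummy tensors and batch size are illustrative:

import torch
from torchvision import models

# Weights don't matter for this shape check
model = models.inception_v3(pretrained=False)
model.train()  # the AuxLogits branch only runs in training mode

# 299x299 input: the forward pass succeeds and returns two outputs
# (a tuple or namedtuple, depending on the torchvision version)
out, aux = model(torch.randn(2, 3, 299, 299))

# 224x224 input: by the time the aux branch is reached, the feature map
# is only 3x3, smaller than the branch's 5x5 convolution, so this raises
model(torch.randn(2, 3, 224, 224))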

Any suggestions?

Thanks in advance!

P.S.: The full traceback follows:

Traceback (most recent call last):
  File "main_ft.py", line 387, in <module>
    main()
  File "main_ft.py", line 220, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main_ft.py", line 258, in train
    output = model(input)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torchvision/models/inception.py", line 109, in forward
    aux = self.AuxLogits(x)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torchvision/models/inception.py", line 308, in forward
    x = self.conv1(x)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torchvision/models/inception.py", line 325, in forward
    x = self.conv(x)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/x/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'; but 4 does not equal 2 (while checking arguments for cudnn_convolution)

The input size should be 299 instead of 224, and during training the network produces two outputs: the final fc logits and the intermediate aux_logits. One should account for these differences and adjust both the crop_size in the data loader and the computation of the metrics. Cf. here
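A minimal sketch of those adjustments, assuming the usual torchvision transforms pipeline; the 0.4 weight on the auxiliary loss is a common illustrative choice, not prescribed by the original script:

from torchvision import transforms

# Data loading: crop to 299x299 instead of the usual 224x224
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(299),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Training step: in train mode the model returns two outputs
output, aux_output = model(input)
loss = criterion(output, target) + 0.4 * criterion(aux_output, target)

# Metrics should be computed on the main output only, e.g.:
# prec1, prec5 = accuracy(output, target, topk=(1, 5))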