Confusing `cudnn_convolution` error

Hey Everyone,

I am running into a strange error message when training a VGG-like network:

RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'; but 4 does not equal 2 (while checking arguments for cudnn_convolution)

Has anyone come across this before? I am running hyperparameter optimization over my network’s convolution filter sizes, and I wonder if some combination of filter sizes is causing this error. I have tried manually configuring the kernel sizes but have been unable to reproduce the error outside of my optimization code.

Here is the full error:

Traceback (most recent call last):
  File "scikit_opt.py", line 113, in <module>
    main()
  File "scikit_opt.py", line 110, in main
    res_gp = gp_minimize(objective, hparams, n_calls=10, verbose=True)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/scikit_optimize-0.5.1-py3.6.egg/skopt/optimizer/gp.py", line 228, in gp_minimize
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/scikit_optimize-0.5.1-py3.6.egg/skopt/optimizer/base.py", line 253, in base_minimize
  File "scikit_opt.py", line 69, in objective
    outputs = net(inputs)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 68, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 78, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 68, in parallel_apply
    raise output
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 44, in _worker
    output = module(*input, **kwargs)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ygx/paper_hyperspace/vgg_cifar/adaptive_model.py", line 106, in forward
    x = self.block3(x)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 75, in forward
    input = module(input)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ygx/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'; but 4 does not equal 2 (while checking arguments for cudnn_convolution)

Here is the network at the point when the above error occurred:

VGG(
  (block1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(10, 10), stride=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU()
    (3): Conv2d(64, 128, kernel_size=(2, 2), stride=(1, 1))
    (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (5): ReLU()
    (6): AdaptiveMaxPool2d(output_size=16)
  )
  (block2): Sequential(
    (0): Conv2d(128, 128, kernel_size=(6, 6), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU()
    (3): Conv2d(128, 256, kernel_size=(5, 5), stride=(1, 1))
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (5): ReLU()
    (6): AdaptiveMaxPool2d(output_size=7)
    (7): Dropout(p=0.5)
  )
  (block3): Sequential(
    (0): Conv2d(256, 256, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU()
    (3): Conv2d(256, 256, kernel_size=(9, 9), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (5): ReLU()
    (6): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1))
    (7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (8): ReLU()
    (9): AdaptiveMaxPool2d(output_size=2)
    (10): Dropout(p=0.5)
  )
  (block4): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU()
    (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (5): ReLU()
    (6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1), padding=(1, 1))
    (7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (8): ReLU()
    (9): AdaptiveMaxPool2d(output_size=1)
    (10): Dropout(p=0.5)
    (11): AdaptiveAvgPool2d(output_size=1)
  )
  (linear_layers): Sequential(
    (0): Linear(in_features=512, out_features=10, bias=True)
  )
)

Edit: I came across this issue after a quick search, but it doesn’t give much insight into the error yet:

I would appreciate a fresh pair of eyes!


Any updates on this problem? I’m getting this error when doing a very simple transfer learning task with inception_v3.


@RayJ See my comment in the issue tracker on github: https://github.com/pytorch/pytorch/issues/4884

Most likely, at some point in your network the feature maps become smaller than the kernel size you are trying to apply to them. Providing a larger input image or adapting the architecture should fix the issue.
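You can check this by hand with the standard Conv2d output-size formula, out = floor((in - k + 2p) / s) + 1, without even running the model. A minimal sketch (the helper name `conv2d_out` is mine, not PyTorch API), using sizes from the network posted above:

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d along one dimension."""
    return (size - kernel + 2 * padding) // stride + 1

# A 7x7 feature map can still take a 5x5 kernel with padding 1 ...
print(conv2d_out(7, 5, padding=1))   # -> 5
# ... but applying a 9x9 kernel (padding 1) to the resulting 5x5 map
# would require a negative output size, i.e. the kernel no longer fits:
print(conv2d_out(5, 9, padding=1))   # -> -1
```

Whenever this formula gives a result below 1 for some layer, the convolution is invalid for that feature map size.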


@RayJ,

My issue was exactly what @stes suggested: a kernel size in the later layers of my network was too big for the intermediate feature map size.

I found the offending layer by printing the size of the tensor after each layer in the forward() function. As a quick sanity check, you might try setting all of your kernel sizes to 1. If the error goes away, you can then dig into which kernel is too large for your feature representation.
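The same check can be done on paper by tracing spatial sizes through block3 of the network posted above (block2 ends in AdaptiveMaxPool2d(output_size=7), so block3 always receives 7x7 feature maps). A sketch, assuming the usual Conv2d output-size formula and a hypothetical helper `conv2d_out`:

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d along one dimension."""
    return (size - kernel + 2 * padding) // stride + 1

# (kernel, padding) for the three Conv2d layers in block3 of the posted net
block3_convs = [(5, 1), (9, 1), (4, 0)]

size = 7  # block2 ends with AdaptiveMaxPool2d(output_size=7)
for i, (k, p) in enumerate(block3_convs):
    size = conv2d_out(size, k, padding=p)
    print(f"after block3 conv {i}: {size}x{size}")
    if size < 1:
        print(f"conv {i} (kernel size {k}) no longer fits its input")
        break
```

Tracing it through: 7x7 -> 5x5 after the first conv, then the 9x9 kernel would need a -1x-1 output, which matches the traceback failing inside self.block3(x).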
