Invalid size error while performing backward in vanilla AlexNet architecture with FCN

anil_batra · March 4, 2018, 5:28pm

Hi,
I am using an AlexNet architecture for semantic segmentation with 6 classes. The architecture is:

AlexNetFCN(
  (conv1): Sequential(
    (0): Conv2d(3, 96, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
  )
  (pool1): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (conv2): Sequential(
    (0): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU(inplace)
  )
  (pool2): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (conv3): Sequential(
    (0): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
  )
  (conv4): Sequential(
    (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
  )
  (conv5): Sequential(
    (0): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
  )
  (pool3): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (fc6): Sequential(
    (0): Conv2d(256, 4096, kernel_size=(1, 1), stride=(1, 1))
    (1): BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU(inplace)
    (3): Dropout2d(p=0.5)
  )
  (fc7): Sequential(
    (0): Conv2d(4096, 4096, kernel_size=(1, 1), stride=(1, 1))
    (1): BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU(inplace)
    (3): Dropout2d(p=0.5)
  )
  (score_fc7): Conv2d(4096, 6, kernel_size=(1, 1), stride=(1, 1))
  (rescale): UpsamplingBilinear2d(size=(227, 227), mode=bilinear)
)

However, I get an error when calling loss.backward(), and I cannot work out the cause from the error trace. Here are the log and the error trace:

Training Epoch: 0
Conv1 size => torch.Size([16, 96, 56, 56])
Pool1 size => torch.Size([16, 96, 27, 27])
Conv2 size => torch.Size([16, 256, 27, 27])
Pool2 size => torch.Size([16, 256, 13, 13])
Conv3 size => torch.Size([16, 384, 13, 13])
Conv4 size => torch.Size([16, 384, 13, 13])
Conv5 size => torch.Size([16, 256, 13, 13])
Pool3 size => torch.Size([16, 256, 6, 6])
fc6 size => torch.Size([16, 4096, 6, 6])
fc7 size => torch.Size([16, 4096, 6, 6])
score_fc7 size => torch.Size([16, 6, 6, 6])
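
As a sanity check on the sizes logged above, here is a minimal sketch that recomputes the spatial size after each layer with the standard formula floor((n + 2*padding - kernel) / stride) + 1. The kernel, stride, and padding values are copied from the printed model. The 227x227 input size is an assumption (inferred from the rescale layer, not stated in the post), and the helper names out_size and trace_sizes are made up for illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (dilation 1, ceil_mode=False)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding), copied from the printed AlexNetFCN model.
# The pooling layers and 1x1 convolutions print no padding, so 0 is used.
LAYERS = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
    ("fc6", 1, 1, 0),
    ("fc7", 1, 1, 0),
    ("score_fc7", 1, 1, 0),
]


def trace_sizes(input_size):
    """Return {layer name: spatial size} for a square input of input_size."""
    sizes = {}
    n = input_size
    for name, kernel, stride, padding in LAYERS:
        n = out_size(n, kernel, stride, padding)
        sizes[name] = n
    return sizes


# Assumption: the input images are 227x227.
sizes = trace_sizes(227)
for name, n in sizes.items():
    print(name, "=>", n)
```

Under that assumption the computed sizes agree with the log (56, 27, 27, 13, 13, 13, 13, 6, 6, 6, 6), so the forward shapes look consistent with the layer definitions. That fits the traceback, where the failure is raised during backward rather than in the forward pass.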

Traceback (most recent call last):
  File "semantic_alex_ce.py", line 381, in <module>
    train(epoch)
  File "semantic_alex_ce.py", line 269, in train
    loss.backward()
  File "/home/anil.k/miniconda2/envs/torch/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/anil.k/miniconda2/envs/torch/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
  File "/home/anil.k/miniconda2/envs/torch/lib/python2.7/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/home/anil.k/miniconda2/envs/torch/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 481, in backward
    grad_tensor = grad_tensor.masked_scatter(mask, grad_output)
  File "/home/anil.k/miniconda2/envs/torch/lib/python2.7/site-packages/torch/autograd/variable.py", line 427, in masked_scatter
    return self.clone().masked_scatter_(mask, variable)
RuntimeError: invalid argument 1: the number of sizes provided must be greater or equal to the number of dimensions in the tensor at /opt/conda/conda-bld/pytorch_1518238409320/work/torch/lib/THC/generic/THCTensor.c:326

Please help me identify where the size issue is.
Thanks in advance!
Anil

anil_batra · March 4, 2018, 6:28pm

The following link helped me find the exact error, and I am now able to fix it.

gdb python
catch throw
run <script name>
backtrace

Thanks.