Bilinear layer gradient size mismatch

I get this error upon calling .step() of an optimizer on my model after backprop:

Traceback (most recent call last):
  File "gru_model_biliear.py", line 259, in <module>
    optimizer.step()
  File "/home/nilabhra/miniconda2/envs/pytorch/lib/python2.7/site-packages/torch/optim/adam.py", line 74, in step
    p.data.addcdiv_(-step_size, exp_avg, denom)
RuntimeError: sizes do not match at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:566

The final layer of the model is an nn.Bilinear layer. The error goes away if I replace it with an nn.Linear layer.
Possible bug in the backward function? A minimal sketch of the setup is below.
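
A minimal sketch of the setup, with hypothetical sizes since the actual model isn't shown (on the affected version, optimizer.step() raised the error above; on releases with the fix it runs cleanly):

    import torch
    import torch.nn as nn

    # stand-in for the real model's final layer
    bilinear = nn.Bilinear(5, 5, 5)  # in1_features, in2_features, out_features
    optimizer = torch.optim.Adam(bilinear.parameters())

    x1, x2 = torch.randn(4, 5), torch.randn(4, 5)
    loss = bilinear(x1, x2).sum()
    loss.backward()
    optimizer.step()  # the size-mismatch error was raised here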

Can you run under PDB and report what the runtime sizes of p.data and exp_avg are?
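
For example, something along these lines (a sketch; the breakpoint location comes from the traceback above, and the exact path to adam.py may need adjusting):

    python -m pdb gru_model_biliear.py
    (Pdb) b torch/optim/adam.py:74
    (Pdb) c
    (Pdb) p p.data.size(), exp_avg.size()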

Might be a while before I can get the tensor sizes via PDB. I did check the grad sizes after backprop: the Bilinear layer's weight.data was of shape 5x5x5, while weight.grad was of shape 5x5. The shapes of the bias and its gradient were consistent.
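
The check was essentially this (a sketch using the named_parameters API; model stands in for the network above):

    for name, p in model.named_parameters():
        print(name, p.data.size(), None if p.grad is None else p.grad.size())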

Alright, learning enough PDB to get the tensor sizes was a piece of cake.
They are indeed of different shape: p.data is 5x5x5 while exp_avg is 5x5, matching the weight.data/weight.grad mismatch above.

That certainly sounds like the issue. Here's the implementation: https://github.com/pytorch/pytorch/blob/5bb13485b8484a37f9afad67582512cf53ed13cb/torch/nn/_functions/linear.py#L35. I can see a few tricky lines, but there isn't (at least to me) an obvious bug. Maybe you can add some print statements to it locally and try to narrow things down? Or feel free to just open a GitHub issue.
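
If editing the installed source is awkward, a lighter-weight sketch is to hook the weight and print the size of its incoming gradient (register_hook is a real tensor API; bilinear stands in for the model's final layer):

    bilinear.weight.register_hook(lambda g: print('weight grad size:', g.size()))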

I think I should open a GitHub issue. I have seen the implementation. I'm not sure the loop body is correct; I'll try to understand it a bit more and see if I can come up with a fix myself.

To my understanding, given that nn.Bilinear computes y[b, k] = x1[b] . W[k] . x2[b] + bias[k], the weight gradient should be, for each output unit k,

    grad_W[k] = sum_b grad_output[b, k] * outer(x1[b], x2[b])

i.e. a tensor with the same out x in1 x in2 shape as the weight itself, not a 2-D one.

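A quick numerical check of that formula against autograd (a sketch with arbitrary sizes):

    import torch

    B, I, J, K = 4, 5, 5, 5  # batch, in1, in2, out (arbitrary)
    x1 = torch.randn(B, I)
    x2 = torch.randn(B, J)
    W = torch.randn(K, I, J, requires_grad=True)

    # bilinear forward: y[b, k] = sum_ij x1[b, i] * W[k, i, j] * x2[b, j]
    y = torch.einsum('bi,kij,bj->bk', x1, W, x2)
    g = torch.randn_like(y)  # stand-in for dL/dy
    y.backward(g)

    # expected: dL/dW[k, i, j] = sum_b g[b, k] * x1[b, i] * x2[b, j]
    expected = torch.einsum('bk,bi,bj->kij', g, x1, x2)
    print(torch.allclose(W.grad, expected))  # True
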
This should shortly be fixed in master. (Thanks @griffinliang for the proposed fix, it was correct.)
