Backward propagation bug?

The forward pass didn’t raise any error, but loss.backward() failed with:

/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/nn/_functions/ UserWarning: other is not broadcastable to self, but they have the same number of elements.  Falling back to deprecated pointwise behavior.
Traceback (most recent call last):
  File "", line 400, in <module>
    graphqa(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
  File "", line 360, in graphqa
  File "/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/autograd/", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/autograd/", line 99, in backward
    variables, grad_variables, retain_graph)
  File "/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/autograd/", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/nn/_functions/", line 181, in backward
    grad_input1.masked_fill_(mask, 1)
RuntimeError: The expanded size of the tensor (1) must match the existing size (32) at non-singleton dimension 1. at /pytorch/torch/lib/TH/generic/THTensor.c:309

I got this error after replacing the cosine similarity with an MLP:

score = self.W3(F.tanh(torch.matmul(out1, self.W1) + torch.matmul(out2, self.W2)))

Originally, out1 is (32, 100) and out2 is (32, 8, 100).

The score dimension should be (32), not (32, 1).
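For what it's worth, the shapes above don't broadcast cleanly: (32, 100) + (32, 8, 100) only "worked" in the forward pass because of the deprecated pointwise fallback mentioned in the warning, and the backward pass then hit the size mismatch. A minimal sketch of the shape problem, using NumPy as a stand-in since it follows the same broadcasting rules as PyTorch (the names and sizes here just mirror the post):

```python
import numpy as np

# Hypothetical sizes matching the post: batch 32, 8 candidates, hidden 100.
batch, cands, hid = 32, 8, 100
out1 = np.zeros((batch, hid))         # (32, 100)
out2 = np.zeros((batch, cands, hid))  # (32, 8, 100)
W1 = np.zeros((hid, hid))
W2 = np.zeros((hid, hid))

a = out1 @ W1   # (32, 100)
b = out2 @ W2   # (32, 8, 100), batched matmul over the candidate axis

# a + b does not broadcast: aligning trailing dims gives (32, 100) vs
# (8, 100), and 32 != 8. Inserting an explicit singleton candidate axis
# on `a` makes the shapes compatible:
s = a[:, None, :] + b  # (32, 1, 100) + (32, 8, 100) -> (32, 8, 100)
print(s.shape)
```

In PyTorch the equivalent would be an `out1_proj.unsqueeze(1)` before the addition, which avoids relying on the deprecated same-numel fallback that the warning flags.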