Backward propagation bug?

The forward pass didn’t raise any error, but loss.backward() failed with:

/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/nn/_functions/ UserWarning: other is not broadcastable to self, but they have the same number of elements.  Falling back to deprecated pointwise behavior.
Traceback (most recent call last):
  File "", line 400, in <module>
    graphqa(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
  File "", line 360, in graphqa
  File "/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/autograd/", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/autograd/", line 99, in backward
    variables, grad_variables, retain_graph)
  File "/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/autograd/", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/home/heeyoung-gpu/anaconda2/envs/Python3/lib/python3.6/site-packages/torch/nn/_functions/", line 181, in backward
    grad_input1.masked_fill_(mask, 1)
RuntimeError: The expanded size of the tensor (1) must match the existing size (32) at non-singleton dimension 1. at /pytorch/torch/lib/TH/generic/THTensor.c:309

I got this error after replacing the cosine similarity with an MLP:

score = self.W3(F.tanh(torch.matmul(out1, self.W1) + torch.matmul(out2, self.W2)))

Originally, out1 is (32, 100) and out2 is (32, 8, 100).

The score dimension should be (32), not (32, 1).
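For what it's worth, the shapes above don't broadcast cleanly: (32, 100) + (32, 8, 100) only "worked" in the forward pass because of the deprecated pointwise fallback mentioned in the warning, and the backward pass then hit the size mismatch. A minimal sketch of the shape problem, using NumPy as a stand-in since it follows the same broadcasting rules as PyTorch (the names and sizes here just mirror the post):

```python
import numpy as np

# Hypothetical sizes matching the post: batch 32, 8 candidates, hidden 100.
batch, cands, hid = 32, 8, 100
out1 = np.zeros((batch, hid))         # (32, 100)
out2 = np.zeros((batch, cands, hid))  # (32, 8, 100)
W1 = np.zeros((hid, hid))
W2 = np.zeros((hid, hid))

a = out1 @ W1   # (32, 100)
b = out2 @ W2   # (32, 8, 100), batched matmul over the candidate axis

# a + b does not broadcast: aligning trailing dims gives (32, 100) vs
# (8, 100), and 32 != 8. Inserting an explicit singleton candidate axis
# on `a` makes the shapes compatible:
s = a[:, None, :] + b  # (32, 1, 100) + (32, 8, 100) -> (32, 8, 100)
print(s.shape)
```

In PyTorch the equivalent would be an `out1_proj.unsqueeze(1)` before the addition, which avoids relying on the deprecated same-numel fallback that the warning flags.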