Hi, I’m trying to compute the MSE loss between two tensors, each of size torch.Size([64]), as follows:

```
criterion = nn.MSELoss(reduction='sum')
for epoch in range(5):
    total_loss = 0
    n_batches = 0
    for batch_idx, data in enumerate(train_dataloader):
        s1, s2, labels, s1_length, s2_length = data
        optimizer.zero_grad()
        outputs, att1, att2 = model(s1, s2, s1_length, s2_length)
        print(outputs.shape)  # torch.Size([64])
        print(labels.shape)   # torch.Size([64])
        loss = criterion(outputs, labels) + pen_s1 + pen_s2
        print(loss)
        loss.backward()
```

I keep getting this error: *grad can be implicitly created only for scalar outputs*, despite passing the `reduction = 'sum'` argument to nn.MSELoss. Could someone tell me where I am going wrong?
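
In case it helps narrow this down, here is a minimal standalone sketch of one way this error can arise. It assumes, hypothetically, that a penalty term such as `pen_s1` is a per-sample tensor of shape `[64]` rather than a scalar. My snippet above does not show how `pen_s1` and `pen_s2` are computed, so this is only a guess, and the random tensors and the `pen` expression are stand-ins, not my real model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.MSELoss(reduction='sum')

# Stand-ins for the real model outputs and labels (both of shape [64]).
outputs = torch.randn(64, requires_grad=True)
labels = torch.randn(64)

# With reduction='sum', the MSE term on its own is a 0-dim (scalar) tensor.
mse = criterion(outputs, labels)
print(mse.shape)   # torch.Size([])

# Hypothetical per-sample penalty (an assumption, not taken from the real code).
pen = outputs.abs()
print(pen.shape)   # torch.Size([64])

# Scalar + [64] broadcasts to [64], so the total is no longer a scalar.
loss = mse + pen
print(loss.shape)  # torch.Size([64])

try:
    loss.backward()
    error_message = None
except RuntimeError as e:
    error_message = str(e)
print(error_message)

# Reducing the penalty to a scalar first keeps the total loss a scalar,
# so backward() can be called without an explicit gradient argument.
loss_fixed = criterion(outputs, labels) + outputs.abs().sum()
loss_fixed.backward()
print(loss_fixed.shape)  # torch.Size([])
```

If this is the cause, printing `pen_s1.shape` and `pen_s2.shape` next to `loss.shape` inside the training loop should show a non-empty shape for at least one of them.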