Thanks for the executable code, that was really helpful.
You are accidentally broadcasting the loss, since there is a shape mismatch between the output and target tensors: while your output has the shape [batch_size, 1], the target has the shape [batch_size].
This leads to broadcasting, as seen here:
# your code with the broadcasting
import torch
import torch.nn as nn

output = torch.randn(4, 1)
target = torch.randn(4)
criterion = nn.L1Loss(reduction="none")
loss = criterion(output, target)
print(loss)  # you only want the diagonal
> tensor([[1.0231, 2.3743, 2.4857, 2.3248],
[1.5896, 0.2385, 0.1270, 0.2879],
[1.7572, 0.4061, 0.2946, 0.4555],
[1.2650, 0.0862, 0.1976, 0.0368]])
# fixed
target = target.unsqueeze(1)
loss = criterion(output, target)
print(loss)
> tensor([[1.0231],
[0.2385],
[0.2946],
[0.0368]])
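To see why only the diagonal of the broadcast result is meaningful, you can compare it element-wise against the shape-aligned version (a small sketch using fresh random tensors; the subtraction mirrors what L1Loss computes internally):

```python
import torch

torch.manual_seed(0)
output = torch.randn(4, 1)   # [batch_size, 1]
target = torch.randn(4)      # [batch_size]

# Broadcasting expands the difference to [4, 4]:
# entry [i, j] compares sample i's output with sample j's target
broadcast_loss = (output - target).abs()

# Aligning the shapes gives the intended per-sample loss [4, 1]
aligned_loss = (output - target.unsqueeze(1)).abs()

assert broadcast_loss.shape == (4, 4)
assert torch.equal(broadcast_loss.diagonal().unsqueeze(1), aligned_loss)
```

Only the diagonal entries pair each sample's output with its own target; the off-diagonal values mix samples and silently corrupt the reduced loss.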
You should also get a warning such as:
UserWarning: Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
Use this line of code to calculate the loss and it should work:
loss = criterion(outputs, labels.float().unsqueeze(1))
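Equivalently, you could squeeze the model output instead of unsqueezing the target; both approaches align the shapes and yield the same loss (a sketch with assumed names outputs/labels standing in for your variables):

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss()
outputs = torch.randn(4, 1)          # model output: [batch_size, 1]
labels = torch.randint(0, 2, (4,))   # integer targets: [batch_size]

# Option 1: unsqueeze the target to [batch_size, 1]
loss_a = criterion(outputs, labels.float().unsqueeze(1))
# Option 2: squeeze the output to [batch_size]
loss_b = criterion(outputs.squeeze(1), labels.float())

assert torch.allclose(loss_a, loss_b)
```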
We've all been in this situation, so please don't be discouraged.