MSELoss gives the same result for every reduction value if my tensors have requires_grad set

jackstraw · December 7, 2018, 9:03pm

import torch

torch.manual_seed(5)
x = torch.rand(4, 3).requires_grad_()
y = torch.rand(4, 3).requires_grad_()

(
torch.nn.functional.mse_loss(x, y),
torch.nn.functional.mse_loss(x, y, reduction='elementwise_mean'),
torch.nn.functional.mse_loss(x, y, reduction='sum'),
torch.nn.functional.mse_loss(x, y, reduction='none'),
torch.nn.functional.mse_loss(x, y, size_average=True, reduce=True),
torch.nn.functional.mse_loss(x, y, size_average=True, reduce=False),
torch.nn.functional.mse_loss(x, y, size_average=False, reduce=True),
torch.nn.functional.mse_loss(x, y, size_average=False, reduce=False),
)

yields

(tensor(2.5974, grad_fn=<SumBackward0>),
 tensor(2.5974, grad_fn=<SumBackward0>),
 tensor(2.5974, grad_fn=<SumBackward0>),
 tensor(2.5974, grad_fn=<SumBackward0>),
 tensor(2.5974, grad_fn=<SumBackward0>),
 tensor(2.5974, grad_fn=<SumBackward0>),
 tensor(2.5974, grad_fn=<SumBackward0>),
 tensor(2.5974, grad_fn=<SumBackward0>))

whereas

torch.manual_seed(5)
x = torch.rand(4, 3)
y = torch.rand(4, 3)

(
torch.nn.functional.mse_loss(x, y),
torch.nn.functional.mse_loss(x, y, reduction='elementwise_mean'),
torch.nn.functional.mse_loss(x, y, reduction='sum'),
torch.nn.functional.mse_loss(x, y, reduction='none'),
torch.nn.functional.mse_loss(x, y, size_average=True, reduce=True),
torch.nn.functional.mse_loss(x, y, size_average=True, reduce=False),
torch.nn.functional.mse_loss(x, y, size_average=False, reduce=True),
torch.nn.functional.mse_loss(x, y, size_average=False, reduce=False),
)

yields

(tensor(0.2164),
 tensor(0.2164),
 tensor(2.5974),
 tensor([[0.5743, 0.2843, 0.0370],
         [0.0579, 0.0037, 0.2846],
         [0.0116, 0.0332, 0.4051],
         [0.5369, 0.3200, 0.0486]]),
 tensor(0.2164),
 tensor([[0.5743, 0.2843, 0.0370],
         [0.0579, 0.0037, 0.2846],
         [0.0116, 0.0332, 0.4051],
         [0.5369, 0.3200, 0.0486]]),
 tensor(2.5974),
 tensor([[0.5743, 0.2843, 0.0370],
         [0.0579, 0.0037, 0.2846],
         [0.0116, 0.0332, 0.4051],
         [0.5369, 0.3200, 0.0486]]))

Why do I get different results when my tensors have requires_grad=True?
And why are the results the same for every value of reduction in that case?
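A quick arithmetic check on the printed values suggests what is happening: the 2.5974 returned in the requires_grad case looks like the sum of the element-wise losses, which would mean every call behaves like reduction='sum' (the grad_fn=<SumBackward0> in the output points the same way). The sketch below uses only the rounded numbers printed above, so it agrees to about three decimals rather than exactly:

```python
# Element-wise losses copied from the reduction='none' output above
# (already rounded to 4 decimals by the tensor printout).
elementwise = [
    [0.5743, 0.2843, 0.0370],
    [0.0579, 0.0037, 0.2846],
    [0.0116, 0.0332, 0.4051],
    [0.5369, 0.3200, 0.0486],
]

values = [v for row in elementwise for v in row]

total = sum(values)         # what reduction='sum' should return
mean = total / len(values)  # what the default (mean) reduction should return

# Compare against the values reported in the two outputs above.
sum_matches = abs(total - 2.5974) < 1e-3
mean_matches = abs(mean - 0.2164) < 1e-3

print(total, mean, sum_matches, mean_matches)
```

So the values without requires_grad are consistent with each other, and the single value with requires_grad matches the sum.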

albanD · December 8, 2018, 10:12am

Hi,

Which version of pytorch are you using?
The implementation is fairly simple and looks correct to me, see here.
Maybe you need to upgrade pytorch?

jackstraw · December 8, 2018, 4:46pm

I’m using 0.4.1. Isn’t this unexpected behavior even for 0.4.1?

albanD · December 8, 2018, 5:18pm

It is unexpected behavior.
But we don’t backport bug fixes into old versions, so it has most certainly been fixed since 0.4.1 was released a few months ago.
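If upgrading is not possible right away, one possible workaround is to compute the squared error directly and apply the reduction by hand, which avoids mse_loss altogether. This is only a sketch: manual_mse is a hypothetical helper name, not a PyTorch function, and it uses 'mean' as the name of the averaging mode where the calls above use 'elementwise_mean'.

```python
import torch


def manual_mse(input, target, reduction="mean"):
    """Hypothetical helper: squared error with the reduction applied explicitly."""
    squared_error = (input - target) ** 2
    if reduction == "none":
        return squared_error          # same shape as the inputs
    if reduction == "sum":
        return squared_error.sum()    # scalar: sum over all elements
    if reduction == "mean":
        return squared_error.mean()   # scalar: average over all elements
    raise ValueError("unknown reduction: {}".format(reduction))


torch.manual_seed(5)
x = torch.rand(4, 3).requires_grad_()
y = torch.rand(4, 3).requires_grad_()

per_element = manual_mse(x, y, "none")
total = manual_mse(x, y, "sum")
mean = manual_mse(x, y, "mean")

# The three results should be consistent with each other.
print(per_element.shape, total.item(), mean.item())
```

Because the helper uses only ordinary tensor operations, gradients should flow to both x and y as usual when calling backward() on the scalar results.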