Here I have some very simple test code:
```python
import torch
import torch.nn as nn
from torch.autograd import Variable

target = Variable(torch.tensor([[1, 0], [0, 1]]).float(), requires_grad=False)
output = Variable(torch.tensor([[0, 1], [0, 1]]).float(), requires_grad=True)

# loss = nn.L1Loss()(output, target)
loss = nn.MSELoss()(output, target)
print('LOSS\n', loss)
print(type(loss.grad_fn))

# call the backward node directly with an upstream gradient of 1
grad = loss.grad_fn(torch.tensor(1).float())
print('GRAD\n', grad)
```
The output I get is:
```
LOSS
 tensor(0.5000, grad_fn=<MseLossBackward>)
<class 'MseLossBackward'>
GRAD
 tensor([[-0.5000,  0.5000],
        [ 0.0000,  0.0000]], grad_fn=<MseLossBackwardBackward>)
```
Based on the documentation (https://pytorch.org/docs/stable/nn.html#mseloss) and this definition of its gradient (https://stats.stackexchange.com/a/312997), I thought the loss should be 1, since the sum of squared differences is 2 and N is 2; however, it is 0.5. Similarly, I was expecting [[-2, 2], [0, 0]] for the gradient, but it is different.
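For reference, here is a minimal sketch of the arithmetic behind my expectation (assuming N counts the 2 rows, and using 2 * (output - target) as the gradient, which is what I took away from the linked answer):

```python
import torch

target = torch.tensor([[1., 0.], [0., 1.]])
output = torch.tensor([[0., 1.], [0., 1.]])
diff = output - target                  # [[-1, 1], [0, 0]]

print((diff ** 2).sum() / 2)            # SSE = 2, N = 2 rows -> 1.0 (the loss I expected)
print(2 * diff)                         # 2 * (output - target) -> [[-2, 2], [0, 0]] (the grad I expected)
```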
I suspect that, because of the averaging operation, grad_fn rescales its result according to the upstream gradient I pass in (1 in this case). I would like to understand that logic as well.
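To make that suspicion concrete, here is the kind of experiment I have in mind (continuing from the snippet at the top; it reuses the same grad_fn call, so nothing here is new API, though I am assuming the node can be invoked repeatedly):

```python
# Pass different upstream gradients into the backward node and
# observe how the returned gradient scales with the value passed in.
for g in [1.0, 2.0, 4.0]:
    print(g, loss.grad_fn(torch.tensor(g)))
```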
Does anyone know the details of how these loss functions are implemented?