Question about MSELoss gradient

I was porting something written in Caffe to PyTorch and noticed a difference in the gradient of MSELoss, so I tested PyTorch's MSELoss with this script:

import torch
import numpy as np
import torch.nn as nn
from torch.autograd import Variable

grads = {}
def save_grad(name):
    def hook(grad):
        grads[name] = grad
    return hook

#pred = np.load('caffe-spn/data.npy')
#tar = np.load('caffe-spn/label.npy')
pred = np.random.rand(1,1,3,3)
tar = np.random.rand(1,1,3,3)

pred = Variable(torch.from_numpy(pred),requires_grad=True).cuda()
tar = Variable(torch.from_numpy(tar),requires_grad=False).cuda()
pred.register_hook(save_grad('pred'))

criterion = nn.MSELoss(size_average=False)  # summed (not averaged) squared error
loss = criterion(pred,tar)
#loss = torch.sum((pred - tar)**2)
loss.backward()

print(loss)
print(torch.sum((pred - tar)**2))
print('prediction:')
print(pred[0,0,:,:])
print('target:')
print(tar[0,0,:,:])
print('gradient:')
print(grads['pred'][0,0,:,:])

Here’s what I got:

Variable containing:
 0.7841
[torch.cuda.DoubleTensor of size 1 (GPU 0)]

Variable containing:
 0.7841
[torch.cuda.DoubleTensor of size 1 (GPU 0)]

prediction:
Variable containing:
 0.7884  0.7303  0.9125
 0.3856  0.5309  0.5715
 0.4588  0.4232  0.2187
[torch.cuda.DoubleTensor of size 3x3 (GPU 0)]

target:
Variable containing:
 0.9345  0.1918  0.6964
 0.1005  0.2449  0.1909
 0.2013  0.5728  0.0471
[torch.cuda.DoubleTensor of size 3x3 (GPU 0)]

gradient:
Variable containing:
-0.2922  1.0770  0.4321
 0.5702  0.5721  0.7612
 0.5150 -0.2991  0.3432
[torch.cuda.DoubleTensor of size 3x3 (GPU 0)]

My question: I thought the gradient of (pred - tar)^2 w.r.t. pred should be (pred - tar), but that does not seem to be the case with PyTorch's MSELoss. This could be a silly mistake, but I wonder what I did wrong here.

It's 2*(pred - tar): the derivative of (pred - tar)^2 with respect to pred is 2*(pred - tar), not (pred - tar), so it seems to work :wink:
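
For completeness, here is a minimal sketch that checks this numerically (assuming a current PyTorch release, where Variable is no longer needed and size_average=False is spelled reduction='sum'):

import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.rand(1, 1, 3, 3, dtype=torch.float64, requires_grad=True)
tar = torch.rand(1, 1, 3, 3, dtype=torch.float64)

# reduction='sum' is the modern spelling of size_average=False,
# i.e. loss = sum((pred - tar) ** 2)
criterion = nn.MSELoss(reduction='sum')
loss = criterion(pred, tar)
loss.backward()

# Analytically, d/d(pred) of sum((pred - tar) ** 2) is 2 * (pred - tar)
expected = 2 * (pred.detach() - tar)
print(torch.allclose(pred.grad, expected))  # prints: True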