I was porting something written in Caffe to PyTorch and noticed a difference in the gradient of the MSELoss, so I tested PyTorch's MSELoss with this script:
import torch
import numpy as np
import torch.nn as nn
from torch.autograd import Variable

grads = {}

def save_grad(name):
    # Returns a hook that stores the gradient flowing into a Variable.
    def hook(grad):
        grads[name] = grad
    return hook

#pred = np.load('caffe-spn/data.npy')
#tar = np.load('caffe-spn/label.npy')
pred = np.random.rand(1, 1, 3, 3)
tar = np.random.rand(1, 1, 3, 3)
pred = Variable(torch.from_numpy(pred), requires_grad=True).cuda()
tar = Variable(torch.from_numpy(tar), requires_grad=False).cuda()
# .cuda() returns a non-leaf Variable, so pred.grad would stay empty;
# the hook captures the gradient that reaches pred instead.
pred.register_hook(save_grad('pred'))

criterion = nn.MSELoss(size_average=False)  # sum over all elements, no averaging
loss = criterion(pred, tar)
#loss = torch.sum((pred - tar)**2)
loss.backward()

print(loss)
print(torch.sum((pred - tar)**2))  # should equal the MSELoss value above
print('prediction:')
print(pred[0, 0, :, :])
print('target:')
print(tar[0, 0, :, :])
print('gradient:')
print(grads['pred'][0, 0, :, :])
Here’s what I got:
Variable containing:
0.7841
[torch.cuda.DoubleTensor of size 1 (GPU 0)]
Variable containing:
0.7841
[torch.cuda.DoubleTensor of size 1 (GPU 0)]
prediction:
Variable containing:
0.7884 0.7303 0.9125
0.3856 0.5309 0.5715
0.4588 0.4232 0.2187
[torch.cuda.DoubleTensor of size 3x3 (GPU 0)]
target:
Variable containing:
0.9345 0.1918 0.6964
0.1005 0.2449 0.1909
0.2013 0.5728 0.0471
[torch.cuda.DoubleTensor of size 3x3 (GPU 0)]
gradient:
Variable containing:
-0.2922 1.0770 0.4321
0.5702 0.5721 0.7612
0.5150 -0.2991 0.3432
[torch.cuda.DoubleTensor of size 3x3 (GPU 0)]
My question: I expected the gradient of (pred - tar)^2 w.r.t. pred to be (pred - tar), which is what Caffe's EuclideanLoss produces since it includes a factor of 1/2, but that does not seem to be the case with PyTorch's MSELoss. This could be a silly mistake, but I just wonder what I did wrong here.
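For reference, here is a minimal NumPy check of the top-left entry, with the values copied verbatim from the output above (the names pred_00, tar_00, and grad_00 are just mine for this check, not part of the script):

import numpy as np

pred_00 = 0.7884   # pred[0, 0, 0, 0] from the output above
tar_00 = 0.9345    # tar[0, 0, 0, 0]
grad_00 = -0.2922  # grads['pred'][0, 0, 0, 0]

print(pred_00 - tar_00)        # -0.1461: the gradient I expected
print(2 * (pred_00 - tar_00))  # -0.2922: what PyTorch actually reports
print(np.isclose(2 * (pred_00 - tar_00), grad_00, atol=1e-4))  # True

So the observed gradient is exactly 2 * (pred - tar) rather than (pred - tar).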