nn criterions don't compute the gradient w.r.t. targets

I am implementing a really basic autoencoder in PyTorch.

import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
from torch.autograd import Variable

BATCH_SIZE = 16

criterion_mse = nn.MSELoss().cuda()
x = Variable(torch.FloatTensor(BATCH_SIZE, 10)).cuda()
l = nn.Linear(10, 10).cuda()
y = l(x)
loss = criterion_mse(x, y)

But this code gives the following error.

AssertionError                            Traceback (most recent call last)
<ipython-input-2-386981b1292e> in <module>()
     14 l = nn.Linear(10, 10).cuda()
     15 y = l(x)
---> 16 loss = criterion_mse(x, y)

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/loss.pyc in _assert_no_grad(variable)
      9 def _assert_no_grad(variable):
     10     assert not variable.requires_grad, \
---> 11         "nn criterions don't compute the gradient w.r.t. targets - please " \
     12         "mark these variables as volatile or not requiring gradients"
     13 

AssertionError: nn criterions don't compute the gradient w.r.t. targets - please mark these variables as volatile or not requiring gradients

The equivalent code works fine in TensorFlow.


By default, the criterions in the nn package indeed don't.

If you write MSE yourself as:

def mse_loss(input, target):
    return torch.sum((input - target) ** 2) / input.data.nelement()

Then you can indeed compute the gradient w.r.t. both input and target.
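
For example, here is a minimal sketch of using the hand-written loss above (the random inputs and shapes are illustrative, not from the original post):

import torch
from torch.autograd import Variable

def mse_loss(input, target):
    return torch.sum((input - target) ** 2) / input.data.nelement()

# Both arguments require grad -- nn.MSELoss would reject such a target.
x = Variable(torch.randn(16, 10), requires_grad=True)
y = Variable(torch.randn(16, 10), requires_grad=True)

loss = mse_loss(x, y)
loss.backward()

print(x.grad)  # gradient w.r.t. the input
print(y.grad)  # gradient w.r.t. the target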


I faced this problem too and followed your solution, but in my first attempt at a workaround I tried to clone the tensor into a new Variable, and that didn't work.
Do you plan to support this in the near future?
Also, note that the exponent has to be written as ** 2, not ^2, since ^ is bitwise XOR in Python.


Hi, any progress on this? The workaround seems tedious.

Hi, the target (or label) value shouldn't be a Variable with requires_grad = True. The target in your code is part of the computation graph, which is what causes requires_grad = True. If the target value is part of the computation graph, you can use the detach() or detach_() function to disconnect the target tensor from the graph. For example, you may need something like this:

y = l(x)
y.detach_()
loss = criterion_mse(x, y)
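
Equivalently, you can use the out-of-place detach(), which returns a new Variable that shares the same data but is excluded from the graph (a small sketch, same setup as above):

y = l(x)
# y.detach() returns a detached copy; y itself stays usable in the graph
loss = criterion_mse(x, y.detach())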

Good luck.
