Nn criterions don't compute the gradient w.r.t. targets error

machine · April 14, 2018, 9:23am

PyTorch's Variable is confusing to use, and I hit an error that I spent a whole day on. I still have no idea where the problem is.

Here is part of my code; 'train_batches' is just an iterator over my training set.

        for batch in train_batches:
            loss=0
            Encoder_optimizer.zero_grad()
            Attention_optimizer.zero_grad()
            Score_optimizer.zero_grad()

            for idx in range(args.batch_size):
                question=Variable(torch.LongTensor(batch['question_token_ids'][idx]))
                answer_passage=batch['answer_passage'][idx]
                label=torch.zeros(args.max_paragraph_num)
                label[answer_passage]=1
                label=Variable(label)
                label.requires_grad=True
                scores = Variable(torch.zeros(args.max_paragraph_num))
                Encoder.init_hidden()
                _,question=Encoder(question)
                j=0
                for pidx in range(idx*args.max_paragraph_num,(idx+1)*args.max_paragraph_num):
                    passage=Variable(torch.LongTensor(batch['passage_token_ids'][pidx]))
                    Encoder.init_hidden()
                    passage,_=Encoder(passage)
                    passage=Attention(passage,question)
                    score=Score(passage,question)
                    scores[j]=score
                    j+=1

                scores=F.softmax(scores,0)
                loss+=loss_func(label,scores.view(1,5))

The error is:

        Traceback (most recent call last):
          File "/home/k/PycharmProjects/PassageRanking/run.py", line 154, in <module>
            run()
          File "/home/k/PycharmProjects/PassageRanking/run.py", line 146, in run
            train(args)
          File "/home/k/PycharmProjects/PassageRanking/run.py", line 134, in train
            loss+=loss_func(label,scores.view(1,5))
          File "/home/k/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
            result = self.forward(*input, **kwargs)
          File "/home/k/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 677, in forward
            _assert_no_grad(target)
          File "/home/k/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 11, in _assert_no_grad
            "nn criterions don't compute the gradient w.r.t. targets - please "
        AssertionError: nn criterions don't compute the gradient w.r.t. targets - please mark these variables as volatile or not requiring gradients

Naruto-Sasuke · April 14, 2018, 10:04am

Take the following as an example,

>>> loss = nn.L1Loss()
>>> input = autograd.Variable(torch.randn(3, 5), requires_grad=True)
>>> target = autograd.Variable(torch.randn(3, 5))
>>> output = loss(input, target)
>>> output.backward()

The target is the second argument, not the first, so you need to swap the arguments.

machine · April 14, 2018, 12:40pm

But I still get the same error after swapping the positions of input and target 😭

Naruto-Sasuke · April 16, 2018, 2:17pm

That is because you set label.requires_grad=True; delete that line. Remember that we only need the gradient w.r.t. the input. In most situations the gradient w.r.t. the target is not needed.
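To make the two fixes concrete, here is a minimal sketch of a loss call with the prediction first and a target that does not require gradients. It uses the current tensor API rather than Variable. The criterion (nn.BCELoss), the random stand-in scores, and the values of max_paragraph_num and answer_passage are assumptions for illustration; the thread does not say which loss was used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
max_paragraph_num = 5   # assumed, matching scores.view(1, 5) in the question
answer_passage = 2      # hypothetical index of the correct passage

# Stand-in for the per-passage outputs of Score(...). In the real model these
# come from the network, so they are the part that needs gradients.
raw_scores = torch.randn(max_paragraph_num, requires_grad=True)

# The target is plain data: build it and leave requires_grad at its default (False).
label = torch.zeros(max_paragraph_num)
label[answer_passage] = 1

loss_func = nn.BCELoss()  # assumption: the original criterion is not named in the thread
scores = F.softmax(raw_scores, dim=0)

# Prediction first, target second, with matching shapes.
loss = loss_func(scores.view(1, -1), label.view(1, -1))
loss.backward()

print(raw_scores.grad)       # gradients reach the scores
print(label.requires_grad)   # the target stays out of the graph
```

If the target is itself produced by a network but should be treated as a constant, calling .detach() on it before passing it to the criterion has the same effect.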

adrien · September 25, 2018, 2:23pm

But why throw an error?

I came across a use case where I needed to minimize the MSE between intermediate features of an autoencoder: both the input and the target need to be differentiated here.

I had to trade nn.MSELoss()(encoder_i, decoder_i) for torch.sum((encoder_i - decoder_i)**2), which also does the job. However, I'm not 100% sure I didn't lose something with this fix (efficiency?). I don't understand why this use of nn losses is not permitted.

viniciusarruda · March 14, 2019, 8:06pm

@adrien Did you solve your problem? I am having exactly the same problem.

yunpei · October 8, 2019, 1:49pm

So, how do you solve this problem when both the input and the target need to be differentiated?
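One way to handle the case where both sides need gradients is to write the loss with ordinary tensor operations, as in the workaround above. The sketch below makes two assumptions: the feature tensors are random stand-ins named after the ones in that post, and a recent PyTorch release is installed. In such releases, as far as I can tell, F.mse_loss also accepts a target that requires gradients, so the assertion from the traceback no longer fires for this loss. Note that the default reduction of the built-in loss is a mean, so a torch.sum version differs from it by a constant factor (the number of elements).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Random stand-ins for the intermediate encoder/decoder features;
# both are marked as requiring gradients.
encoder_i = torch.randn(4, 8, requires_grad=True)
decoder_i = torch.randn(4, 8, requires_grad=True)

# Manual MSE from plain tensor ops: autograd differentiates w.r.t. both operands.
manual_mse = ((encoder_i - decoder_i) ** 2).mean()
manual_mse.backward()
grad_enc = encoder_i.grad.clone()
grad_dec = decoder_i.grad.clone()

# Reset the accumulated gradients before the second backward pass.
encoder_i.grad = None
decoder_i.grad = None

# Built-in loss (assumes a recent PyTorch version that allows a target with requires_grad=True).
builtin_mse = F.mse_loss(encoder_i, decoder_i)
builtin_mse.backward()

print(torch.allclose(manual_mse, builtin_mse))
print(torch.allclose(grad_enc, encoder_i.grad), torch.allclose(grad_dec, decoder_i.grad))
```

The two gradients are negatives of each other, which is what you would expect from differentiating a squared difference with respect to each side.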