"One of the variables need for gradient computation"

has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 2, expected version 1.

I know that this error implies that the computational graph has been used previously, and one should not use it twice. However, what does “[torch.FloatTensor [1]]” specify? And what do “version 2” and “version 1” mean?

My code is quite long but in short:


ref_grad = []
for i in range(2):
    layer_grad = utils.AverageMeter()
    k = 0
    # select parameters 0 and 2 of the decoder's dense layer (wrtt keeps the last match)
    for param in neuralnet.decoder.de_dense.parameters():
        if k == 0 or k == 2:
            wrtt = param
            print(param.shape)
        k = k + 1
    layer_grad.avg = torch.zeros(wrtt.shape).to(device)
    ref_grad.append(layer_grad)

for j in range(2):
    k = 0
    # select parameters 0 and 2 of the encoder's dense layer (wrt keeps the last match)
    for param in neuralnet.encoder.en_dense.parameters():
        if k == 0 or k == 2:
            wrt = param
            print(param.shape)
        k = k + 1
    # gradient of the reconstruction loss w.r.t. the selected encoder parameter
    target_grad = torch.autograd.grad(recon_loss, wrt, create_graph=True, retain_graph=True)[0]
    print(j)
    # accumulate the negative cosine similarity between target and reference gradients
    grad_loss += -1 * func.cosine_similarity(target_grad.view(-1, 1), ref_grad[j].avg.view(-1, 1), dim=0)
    grad_loss = grad_loss / l_2

neuralnet.optimizer.zero_grad()
l_tot.backward(retain_graph=True)
                

If I remove target_grad and grad_loss, the code is functional (but useless). So obviously target_grad is the problem here. How do I fix this?

Hi,

I know that this error implies that the computational graph has been used previously, and one should not use it twice.

No, that is not what this error means. It means that a Tensor that is needed to compute the backward has been modified inplace, and so the backward cannot be computed anymore.

However, what does “[torch.FloatTensor [1]]” specify? And what do “version 2” and “version 1” mean?

This means that the faulty Tensor (the one that was modified inplace) is a float Tensor of shape [1] and is at version 2 (it has been modified inplace twice), while autograd expects version 1 (its state after it was modified inplace once).
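
You can actually see this counter: every Tensor has a (private, subject to change) _version attribute that autograd bumps on every inplace modification. A small sketch:

import torch

t = torch.zeros(1)
print(t._version)  # 0 -> freshly created
t.add_(1)          # inplace op bumps the version counter
print(t._version)  # 1
t += 1             # += on a Tensor is also inplace
print(t._version)  # 2 -> this is what the error calls "version 2"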

Hello. Thank you for answering.

The code is functional if I remove

target_grad = torch.autograd.grad(recon_loss, wrt, create_graph=True, retain_graph=True)[0]

What I seem to understand is that this computes the gradient of recon_loss with respect to the tensor wrt, and also retains the graph. Why does this interfere later in the code when I create my loss function l_tot and call l_tot.backward()?

I think the grad_loss might be the problematic Tensor as it is of size 1 and is modified inplace when you do +=.
Could you change that to be out of place with grad_loss = grad_loss + ... to see if it fixes the problem?
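
For example, with the line from your snippet:

# inplace: mutates grad_loss and bumps its version counter
grad_loss += -1 * func.cosine_similarity(target_grad.view(-1, 1), ref_grad[j].avg.view(-1, 1), dim=0)

# out-of-place: builds a new Tensor and leaves the previous grad_loss untouched
grad_loss = grad_loss + -1 * func.cosine_similarity(target_grad.view(-1, 1), ref_grad[j].avg.view(-1, 1), dim=0)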

Sadly, I received another error.

Looks like an issue with a value saved by batchnorm? You might want to enable anomaly detection during the forward, as proposed in the warning, to find out which one is faulty.
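
Something along these lines, wrapped around your forward and loss computation (the forward/loss calls below are just placeholders for your own code):

import torch

# When the backward fails, anomaly detection prints the traceback of the
# forward operation that created the Tensor that was later modified inplace.
with torch.autograd.set_detect_anomaly(True):
    output = neuralnet(data)         # placeholder for your forward pass
    l_tot = compute_losses(output)   # placeholder for your loss computation
    l_tot.backward(retain_graph=True)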

Yep, it was that one batchnorm 🙂

Is there any canonical solution to this? Or is this a very rare error?

It is an error that happens quite often, but there is no canonical solution as it depends on what your code does.
Basically, your code modifies inplace a Tensor that this batchnorm needs to compute its backward. You need to remove that inplace operation, but where that inplace operation is depends a lot on your code and what it does 😕
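
As a self-contained illustration of this class of error and its fix (completely independent of the model in this thread):

import torch

x = torch.randn(4, requires_grad=True)

h = torch.exp(x)      # exp saves its output h for the backward pass
h += 1                # inplace: bumps h's version counter
# h.sum().backward()  # would raise "... modified by an inplace operation"

h = torch.exp(x)
h = h + 1             # out-of-place: new Tensor, the saved output stays untouched
h.sum().backward()    # works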