In-place operations and autograd

Hello everyone!
I am trying to understand the relationship between in-place operations and autograd. With the first snippet of code below I get "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation", but the second one runs without this exception.

First snippet:

import torch as T
from torch.autograd import Variable
from torch.nn import functional as F

x = Variable(T.rand(2, 2), requires_grad=True)
h = F.sigmoid(x)
h[:, 0] = 0  # in-place write into the output of sigmoid
loss = h.sum()
loss.backward()

Second snippet:

import torch as T
from torch.autograd import Variable
from torch.nn import functional as F

x = Variable(T.rand(2, 2), requires_grad=True)
h = F.relu(x)
h[:, 0] = 0  # in-place write into the output of relu
loss = h.sum()
loss.backward()

Can someone explain why this is the case? Is it because F.sigmoid calls a C function directly, while F.relu performs some computation before calling the C function, which allows PyTorch to handle the in-place operation? Also, is there a Tensor function that performs a non-in-place setitem? Something like this:

def setitem(self, key, value):
    self = self.clone()  # work on a copy so the original tensor is untouched
    self[key] = value
    return self
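
For context, a minimal sketch of the behavior I am after, using clone() followed by normal indexing on the copy (h2 is just a local name, not an existing API):

import torch as T
from torch.autograd import Variable
from torch.nn import functional as F

x = Variable(T.rand(2, 2), requires_grad=True)
h = F.sigmoid(x)
h2 = h.clone()   # out-of-place version of h[:, 0] = 0: modify a copy, keep h intact
h2[:, 0] = 0
loss = h2.sum()
loss.backward()  # no RuntimeError: sigmoid's saved output h was never modified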

Any help would be appreciated.

sigmoid uses the output to compute gradients: its derivative is y * (1 - y) with y = sigmoid(x), so autograd saves the output tensor for the backward pass.

relu uses the input to compute gradients: its derivative is 1 where the input is positive and 0 elsewhere, so autograd saves the input tensor instead.

Hence the different behaviors you see when modifying the output :). If you change

h[:, 0] = 0

in the second example to

x[:, 0] = 0

you will see a similar error.
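
For completeness, a minimal sketch of that modified second snippet (where exactly the error is raised, and its message, can vary across PyTorch versions):

import torch as T
from torch.autograd import Variable
from torch.nn import functional as F

x = Variable(T.rand(2, 2), requires_grad=True)
h = F.relu(x)
x[:, 0] = 0      # in-place write into the input that relu saved for backward;
                 # newer versions may already raise here (in-place op on a leaf
                 # that requires grad)
loss = h.sum()
loss.backward()  # otherwise the error surfaces here during the backward pass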
