Encountering the RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation


(Yongyi Tang) #1

I am defining my own layer. However, I encounter the RuntimeError "one of the variables needed for gradient computation has been modified by an inplace operation" while running backward().

I found that if I comment out the second for loop, 'for j in range(self.number_person):', or simplify the update to 'u_i[:,j,:] = (1 - self.lumbda)*u_i[:,j,:]', backward() runs fine.

I wonder where the in-place operation is and why it does not work. 'p_rnn_feature' and 'u_sum' have been computed before.

BTW, this code runs on PyTorch 0.19.7ad948f.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class myNet(nn.Module):
    def __init__(self):
        super(myNet, self).__init__()
        # do some init

    def forward(self):  # inputs omitted in this snippet
        # compute p_rnn_feature, u_sum (u_s, u_i, p_feature_sum, batch_size and
        # valid_person_num are also set up earlier and omitted here)
        p_rnn_feature = Variable(torch.ones(p_rnn_feature.size())).cuda()
        u_sum = Variable(torch.ones(u_sum.size())).cuda()
        for i in range(self.embedding_length):
            u_s = u_s.clone()
            u_i = u_i.clone()

            for j in range(self.number_person):
                alpha_i = Variable(torch.zeros(batch_size, self.number_person, 1)).cuda()
                comp_mask = Variable(j * torch.ones(valid_person_num.size())).cuda()
                comp_mask = torch.lt(comp_mask, valid_person_num)  # (batch_size, 1)
                comp_mask_ui = comp_mask.repeat(1, self.hyper_d)
                # size: (batch_size, 2*rnn_cell_size + hyper_d)
                tmp_x = torch.cat((p_rnn_feature[:, j, :], u_sum[:, j, :], u_s), 1)

                u_i[:, j, :] = (1 - self.lumbda) * u_i[:, j, :] + self.lumbda * F.relu(self.u_i_linear(tmp_x))
                u_i[:, j, :] = u_i[:, j, :] * comp_mask_ui.float()

                alpha_i[:, j, :] = F.tanh(self.alpha_i_linear(torch.cat((u_i[:, j, :], u_s), 1)))
                alpha_i[:, j, :] = alpha_i[:, j, :] * comp_mask.float()

            alpha_sum = torch.sum(alpha_i, 1)
            alpha_sum = alpha_sum.repeat(1, self.number_person, 1)

            gate = alpha_i / Variable(torch.max(alpha_sum.data, torch.ones(alpha_sum.size()).cuda())).cuda()
            gate = gate.repeat(1, 1, self.hyper_d)

            gated_ui_sum = gate * u_i
            gated_ui_sum = torch.sum(gated_ui_sum, 1)
            gated_ui_sum = torch.squeeze(gated_ui_sum, dim=1)

            # size: (batch_size, hyper_d + rnn_cell_size + hyper_d)
            tmp_s = torch.cat((u_s, p_feature_sum, gated_ui_sum), 1)
            u_s = (1 - self.lumbda) * u_s + self.lumbda * F.relu(self.u_s_linear(tmp_s))

        pred_tmp = torch.cat((torch.squeeze(torch.sum(u_i, 1), dim=1), u_s), 1)
        pred = self.pred_dropout(self.pred_linear(pred_tmp))
        pred = self.pred_linear_second(pred)
        return pred

(Adam Paszke) #3

Assignments to Variables are in-place operations and you’re doing a lot of them (u_i[:,j,:]). You’re using that variable in lots of other contexts and some of the functions require it to not change. This might help (I added some calls to clone):

u_i[:,j,:] = (1 - self.lumbda)*u_i[:,j,:].clone() + self.lumbda*F.relu(self.u_i_linear(tmp_x))
u_i[:,j,:] = u_i[:,j,:].clone()*comp_mask_ui.float()

alpha_i[:,j,:] = F.tanh(self.alpha_i_linear(torch.cat((u_i[:,j,:], u_s),1)))
alpha_i[:,j,:] = alpha_i[:,j,:].clone()*comp_mask.float()
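
A common alternative, sketched below with made-up shapes (not from the original reply), is to avoid the indexed in-place writes entirely: compute each slice out of place and stack the results once.

import torch

# Toy sketch (made-up shapes): instead of assigning into u[:, j, :] of a
# tensor that autograd has saved, build each slice out of place and stack.
batch, n, d = 2, 4, 3
x = torch.randn(batch, n, d, requires_grad=True)
u = x.exp()                                     # exp() saves its output u for backward

rows = []
for j in range(n):
    rows.append(torch.relu(u[:, j, :] * 2.0))   # out-of-place per-slice update
u = torch.stack(rows, dim=1)                    # same shape, no in-place writes
u.sum().backward()                              # backward succeeds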


(Yongyi Tang) #4

Thanks, it’s working. But what do you mean by ‘Assignments to Variables are in-place operations’? Is something like x = x + 1 an in-place operation, or is it only because I am indexing into a matrix?


(Adam Paszke) #5

x = x + 1 is not in-place: it takes the object pointed to by x, creates a new Variable, adds 1 to x putting the result in the new Variable, and rebinds the name x to the new Variable. There are no in-place modifications, you only change Python references (you can check that id(x) is different before and after that line).

On the other hand, doing x += 1 or x[0] = 1 will modify the data of the Variable in place, so that no copy is made. However, some functions (in your case *) require their inputs to never change after they compute the output, or they wouldn’t be able to compute the gradient. That’s why an error is raised.
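
A minimal sketch of the difference, using current PyTorch tensors with requires_grad rather than the old Variable wrapper (the tensor names are made up):

import torch

x = torch.ones(3, requires_grad=True)

# Out of place: x + 1 creates a new tensor and only rebinds the Python name.
before = id(x)
x = x + 1
assert id(x) != before        # a different object, nothing was mutated

# In place: exp() saves its output for the backward pass, so overwriting
# part of that output breaks the gradient computation.
y = x.exp()
y[0] = 0.0                    # modifies y's data in place
y.sum().backward()            # RuntimeError: ... modified by an inplace operation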


(Yongyi Tang) #6

Thanks! That’s a great explanation!


(Aaron Lai) #7

Nice explanation!!
I encountered the same problem and solved it thanks to this explanation!
Thanks


(Wasi Ahmad) #8

@apaszke I am facing a similar problem in my code. Is there any way to find out which Variable is being modified in place and causing the problem? I am kind of stuck at this point; any help would be appreciated.


(Yongyi Tang) #9

I think you can just try to clone the variable before you use it.
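
For example, a minimal sketch (made-up tensor names, not from this thread) of cloning before the indexed assignment, so the tensor a previous op saved stays untouched:

import torch

w = torch.randn(3, requires_grad=True)
u = w.exp()              # exp() saves its output u for the backward pass

# Writing u[0] = 0.0 here would raise the in-place RuntimeError on backward().
u = u.clone()            # work on a copy instead
u[0] = 0.0               # safe: the tensor exp() saved is untouched
u.sum().backward()       # gradients still flow back to w through the clone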


(Anand Bhattad) #10

I’m getting the same error in a different scenario. I am running an LSTM model, and my code works completely fine when the criterion is MSELoss, but it gives this error when I change the criterion to CrossEntropyLoss (of course, I am feeding the criterion inputs of the required type). I get the error when I call loss.backward(). Strangely, the code runs perfectly fine when I call loss.backward() at every time step inside the time loop instead of calling it once after the entire sequence has been processed.

Is it possible to get at least a pointer to the variable that is causing this trouble, or some other way to track down the source of the error?

Thank you in advance for your help.


(Thomas V) #11

Hi,

One thing you can do is check the line numbers in the traceback. If you look up the class the line belongs to in torch/autograd/_functions/*.py, you can tell which operation it bails out on and narrow it down.
I wonder whether it might be worth adding a "debug" mode that records the stack of each op in the forward pass and prints it on error in the backward. That way, it would point to the right line of code directly.
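
Later PyTorch releases did add this kind of debug mode as autograd anomaly detection; a minimal sketch assuming a recent version:

import torch

# With anomaly detection enabled, the forward pass records a traceback for
# every operation, and a failing backward() also prints the traceback of the
# forward call that produced the offending result.
with torch.autograd.detect_anomaly():
    w = torch.randn(3, requires_grad=True)
    y = w.exp()
    y[0] = 0.0           # the in-place write that breaks exp()'s backward
    y.sum().backward()   # error now points back at the `y = w.exp()` line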

Best regards

Thomas


(Asif Hossain) #12

x11[:, :, 0:int(w_f/2), 0:int(h_f/2)] = x11[:, :, 0:int(w_f/2), 0:int(h_f/2)]*xx1[0]
Suppose x11 is an autograd Variable. My issue is that when I run this code, it says:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
but if I do x11 = x11*xx1[0], there is no error and things work correctly.
I tried:
x11[:, :, 0:int(w_f/2), 0:int(h_f/2)] = x11[:, :, 0:int(w_f/2), 0:int(h_f/2)].clone()*xx1[0]
but it is still not working. @apaszke