Hi, I am trying to implement an RNN to learn a number pattern. Here is the code. When I run it, I encounter the following error. Help!
for i in range(1000):
    inp = torch.randn(1, 1)
    y = torch.randn(1, 1)
    ht = torch.tanh(whh.mm(ht_1) + wxt.mm(inp.float()))
    y_pred = why.mm(ht)
    loss = (y_pred - y.float()).pow(2).sum()
    loss.backward(retain_graph=True)
    with torch.no_grad():
        whh -= lr * whh.grad
        wxt -= lr * wxt.grad
        why -= lr * why.grad
        whh.grad.zero_()
        wxt.grad.zero_()
        why.grad.zero_()
    ht_1 = ht
    print(loss)
This error might be raised since you are keeping the graph via retain_graph=True, which would then in the second iteration try to backpropagate through all parameters twice. Is this your use case, or why are you using retain_graph=True?
If I remove retain_graph=True, it throws this error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
I’m not sure what your exact use case is, but the last RuntimeError could be raised if whh and ht_1 both require gradients, as they are reused in the second iteration. You could detach ht_1, if it fits your use case:
ht = torch.tanh(whh.mm(ht_1.detach()) + wxt.mm(inp.float()))
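To see why the detach is needed, here is a minimal toy sketch (my own example, not the thread's code) of the failure mode: carrying a non-detached hidden state into the next iteration makes the second backward() walk back into the first iteration's graph, which has already been freed.

```python
import torch

whh = torch.randn(3, 3, requires_grad=True)
ht_1 = torch.randn(3, 1)

# Iteration 1: backward() frees the graph built in this forward pass.
ht = torch.tanh(whh.mm(ht_1))
ht.sum().backward()

# Reusing ht *without* detach keeps it attached to that freed graph,
# so the next backward() fails with "backward through the graph a second time".
try:
    torch.tanh(whh.mm(ht)).sum().backward()
except RuntimeError as e:
    print("fails:", type(e).__name__)

# detach() cuts the history: each iteration now backpropagates
# only through its own freshly built graph.
ht_1 = ht.detach()
torch.tanh(whh.mm(ht_1)).sum().backward()
print("ok")
```

This is also why retain_graph=True "worked" at first: it kept every past iteration's graph alive and backpropagated through all of them, which is almost never what a simple training loop wants.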
Hi,
thanks for the reply. I tried to run this, but now the gradients are NoneType objects, as shown by this error:
17 with torch.no_grad():
---> 18 whh-=lr*whh.grad
19 wxt-=lr*wxt.grad
20 why-=lr*why.grad
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'
Here is how I initialise whh, wxt, why:
whh=torch.randn(10,10,requires_grad=True)
wxt=torch.randn(10,1,requires_grad=True)
why=torch.randn(1,10,requires_grad=True)
How are you initializing ht_1? Since you are not updating it, I assume it’s a standard tensor?
This code snippet seems to work:
whh = torch.randn(10, 10, requires_grad=True)
wxt = torch.randn(10, 1, requires_grad=True)
why = torch.randn(1, 10, requires_grad=True)
ht_1 = torch.randn(10, 1)
lr = 1e-3

for i in range(10):
    inp = torch.randn(1, 1)
    y = torch.randn(1, 1)
    ht = torch.tanh(whh.mm(ht_1) + wxt.mm(inp.float()))
    y_pred = why.mm(ht)
    loss = (y_pred - y.float()).pow(2).sum()
    loss.backward()
    with torch.no_grad():
        whh -= lr * whh.grad
        wxt -= lr * wxt.grad
        why -= lr * why.grad
        whh.grad.zero_()
        wxt.grad.zero_()
        why.grad.zero_()
    ht_1 = ht.detach()
    print(loss)
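As a side note, the manual update-and-zero steps in that loop map directly onto the standard optimizer API. A sketch of the same loop using torch.optim.SGD (my rewrite, not from the thread):

```python
import torch

whh = torch.randn(10, 10, requires_grad=True)
wxt = torch.randn(10, 1, requires_grad=True)
why = torch.randn(1, 10, requires_grad=True)
ht_1 = torch.randn(10, 1)

opt = torch.optim.SGD([whh, wxt, why], lr=1e-3)

for i in range(10):
    inp = torch.randn(1, 1)
    y = torch.randn(1, 1)
    ht = torch.tanh(whh.mm(ht_1) + wxt.mm(inp))
    y_pred = why.mm(ht)
    loss = (y_pred - y).pow(2).sum()
    opt.zero_grad()     # replaces the three .grad.zero_() calls
    loss.backward()
    opt.step()          # replaces the manual updates under torch.no_grad()
    ht_1 = ht.detach()  # still needed to cut the graph between iterations
```

Note that detach() is still required: the optimizer only handles the parameter updates, not the hidden-state history.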
Thank you very much, it worked. Should I read the docs to understand why this error occurs, or is there any other resource?
Hi, there is a weird issue. Everything was fine till now, but as soon as I restarted the kernel in Jupyter, the grad w.r.t. the variables is again NoneType. I think there is still some issue, as this error was resolved once I pasted and ran your code in a separate cell; after that I ran my loop, which I have provided above, and everything was fine. And by the way, I initialise ht_1 just like you did.
Oh, never mind.
I made a typing error. Instead of doing
y_pred = why.mm(ht)
I by mistake typed this, and it never came to my attention:
y_pred = why.mm(ht_1)
although I am still not sure why this gave me a NoneType grad.
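The NoneType grad from that typo has a plausible explanation: since ht_1 is a detached/plain tensor, y_pred = why.mm(ht_1) makes the loss independent of whh and wxt, so backward() never reaches them and PyTorch leaves their .grad as None (grad stays None until a gradient is first accumulated, rather than being zero). A toy sketch of this (my own example, not from the thread):

```python
import torch

whh = torch.randn(10, 10, requires_grad=True)
why = torch.randn(1, 10, requires_grad=True)
ht_1 = torch.randn(10, 1)  # plain tensor, like the detached hidden state

# Using ht_1 directly skips whh entirely, like the y_pred = why.mm(ht_1) typo.
loss = why.mm(ht_1).pow(2).sum()
loss.backward()

print(why.grad is None)  # False: why participated in the loss
print(whh.grad is None)  # True: whh never entered the graph, so .grad stays None
```

Then lr * whh.grad multiplies a float by None, which is exactly the TypeError above.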