One of the variables needed for gradient computation has been modified by an inplace operation

Hi, I am trying to implement an RNN to learn a number pattern. Here is the code. When I run it I encounter the following error. Help!!!

for i in range(1000):
    inp=torch.randn(1,1)
    y=torch.randn(1,1)

    ht=torch.tanh(whh.mm(ht_1)+wxt.mm(inp.float()))
    y_pred=why.mm(ht)

    loss=(y_pred-y.float()).pow(2).sum()
    loss.backward(retain_graph=True)

    with torch.no_grad():
        whh-=lr*whh.grad
        wxt-=lr*wxt.grad
        why-=lr*why.grad
        whh.grad.zero_()
        wxt.grad.zero_()
        why.grad.zero_()
        ht_1=ht
    print(loss)

@ptrblck kindly help !!

This error might be raised because you are keeping the graph alive via retain_graph=True, which would then, in the second iteration, try to backpropagate through all parameters twice.
Is this your use case, or why are you using retain_graph=True?
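To make that concrete, here is a minimal sketch (illustrative names, not taken from your code) of how retaining the graph plus an in-place parameter update triggers this error: the second backward walks back into the first iteration's graph, where the saved weight tensor has since been modified in-place.

import torch

w = torch.randn(3, 3, requires_grad=True)
h = torch.randn(3, 1)

for i in range(2):
    h = torch.tanh(w.mm(h))           # h still references the previous iteration's graph
    loss = h.pow(2).sum()
    loss.backward(retain_graph=True)  # iteration 1 also backprops through iteration 0
    with torch.no_grad():
        w -= 0.1 * w.grad             # in-place update bumps w's version counter
        w.grad.zero_()

The second call to backward() then fails with the same "modified by an inplace operation" message, because the retained graph still needs the original values of w.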

If I remove retain_graph=True, it throws an error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I’m not sure what your exact use case is, but the last RuntimeError could be raised if whh and ht_1 both require gradients, as they are reused in the second iteration.
You could detach ht_1, if it fits your use case:

ht=torch.tanh(whh.mm(ht_1.detach())+wxt.mm(inp.float()))
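To illustrate what detach() does (a small snippet with made-up tensors, not from your code): it returns a tensor that shares the same storage but has no autograd history, so a later backward stops there instead of walking back into earlier iterations, and retain_graph=True is no longer needed.

import torch

x = torch.randn(3, requires_grad=True)
h = torch.tanh(x * 2)
h_det = h.detach()                            # same data, no history

print(h.requires_grad, h_det.requires_grad)   # True False
print(h.data_ptr() == h_det.data_ptr())       # True: shared storage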

Hi,
thanks for the reply. I tried to run this, but now the gradients are NoneType objects, as shown by this error:

     17     with torch.no_grad():
---> 18         whh-=lr*whh.grad
     19         wxt-=lr*wxt.grad
     20         why-=lr*why.grad

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

Here is the way I initialise whh, wxt, why:

whh=torch.randn(10,10,requires_grad=True)
wxt=torch.randn(10,1,requires_grad=True)
why=torch.randn(1,10,requires_grad=True)

How are you initializing ht_1?
Since you are not updating it, I assume it’s a standard tensor?

This code snippet seems to work:

whh = torch.randn(10,10,requires_grad=True)
wxt = torch.randn(10,1,requires_grad=True)
why = torch.randn(1,10,requires_grad=True)
ht_1 = torch.randn(10,1)
lr = 1e-3

for i in range(10):
    inp=torch.randn(1,1)
    y=torch.randn(1,1)
        
    ht=torch.tanh(whh.mm(ht_1)+wxt.mm(inp.float()))
    y_pred=why.mm(ht)

    loss=(y_pred-y.float()).pow(2).sum()
    loss.backward()
    
    with torch.no_grad():
        whh-=lr*whh.grad
        wxt-=lr*wxt.grad
        why-=lr*why.grad
        whh.grad.zero_()
        wxt.grad.zero_()
        why.grad.zero_()
        ht_1=ht.detach()  # detach so the next iteration does not backprop through this graph
    print(loss)

Thank you very much, it worked. Should I read the docs to understand why this error occurs, or is there any other resource?

Hi, there is a weird issue. Everything was fine till now, but as soon as I restarted the kernel in Jupyter, the grad w.r.t. the variables is again NoneType. I think there is still some issue, as this error was resolved once I pasted and ran your code in a separate cell; then I ran my loop, which I have provided above, and everything was fine. And by the way, I initialise ht_1 just like you did.

Oh, nvm
I made a typing error. Instead of doing

y_pred=why.mm(ht)

I by mistake typed this and it never came to my attention:

y_pred=why.mm(ht_1)

although I am still not sure why this gave me a NoneType grad.
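My current guess, as a small sketch with illustrative shapes (simplified from the code above): backward() only populates .grad for leaf tensors that actually take part in the graph of the loss. With the typo, the loss only depends on why (since ht_1 carries no history), so whh.grad and wxt.grad stay None, and lr*whh.grad then raises the TypeError.

import torch

whh = torch.randn(10, 10, requires_grad=True)
why = torch.randn(1, 10, requires_grad=True)
ht_1 = torch.randn(10, 1)          # plain tensor, no history

ht = torch.tanh(whh.mm(ht_1))      # whh is used here...
y_pred = why.mm(ht_1)              # ...but the typo bypasses ht, so whh never reaches the loss
loss = y_pred.pow(2).sum()
loss.backward()

print(why.grad is None)            # False: why is part of the loss graph
print(whh.grad is None)            # True: whh never contributed to the loss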