Hi, I am trying to implement an RNN to learn a number pattern. Here is the code. When I run it, I encounter the following error. Help!
for i in range(1000):
    inp = torch.randn(1, 1)
    y = torch.randn(1, 1)
    ht = torch.tanh(whh.mm(ht_1) + wxt.mm(inp.float()))
    y_pred = why.mm(ht)
    loss = (y_pred - y.float()).pow(2).sum()
    loss.backward(retain_graph=True)
    with torch.no_grad():
        whh -= lr * whh.grad
        wxt -= lr * wxt.grad
        why -= lr * why.grad
        whh.grad.zero_()
        wxt.grad.zero_()
        why.grad.zero_()
    ht_1 = ht
    print(loss)
This error might be raised since you are keeping the graph via retain_graph=True, which would then in the second iteration try to backpropagate through all parameters twice. Is this your use case, or why are you using retain_graph=True?
If I remove retain_graph=True, it throws this error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
I’m not sure what your exact use case is, but the last RuntimeError could be raised if whh and ht_1 both require gradients, as they are reused in the second iteration. You could detach ht_1, if it fits your use case:
ht = torch.tanh(whh.mm(ht_1.detach()) + wxt.mm(inp.float()))
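To see why the detach is needed, here is a minimal toy sketch (my own example, not the thread's code) of the failure mode: carrying a non-detached hidden state into the next iteration makes the second backward() walk back into the first iteration's graph, which has already been freed.

```python
import torch

whh = torch.randn(3, 3, requires_grad=True)
ht_1 = torch.randn(3, 1)

# Iteration 1: backward() frees the graph built in this forward pass.
ht = torch.tanh(whh.mm(ht_1))
ht.sum().backward()

# Reusing ht *without* detach keeps it attached to that freed graph,
# so the next backward() fails with "backward through the graph a second time".
try:
    torch.tanh(whh.mm(ht)).sum().backward()
except RuntimeError as e:
    print("fails:", type(e).__name__)

# detach() cuts the history: each iteration now backpropagates
# only through its own freshly built graph.
ht_1 = ht.detach()
torch.tanh(whh.mm(ht_1)).sum().backward()
print("ok")
```

This is also why retain_graph=True "worked" at first: it kept every past iteration's graph alive and backpropagated through all of them, which is almost never what a simple training loop wants.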
Hi,
thanks for the reply. I tried to run this, but now the gradients are NoneType objects, as shown by this error:
17 with torch.no_grad():
---> 18 whh-=lr*whh.grad
19 wxt-=lr*wxt.grad
20 why-=lr*why.grad
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'
Here is how I initialise whh, wxt, why:
whh=torch.randn(10,10,requires_grad=True)
wxt=torch.randn(10,1,requires_grad=True)
why=torch.randn(1,10,requires_grad=True)
How are you initializing ht_1? Since you are not updating it, I assume it’s a standard tensor?
This code snippet seems to work:
whh = torch.randn(10, 10, requires_grad=True)
wxt = torch.randn(10, 1, requires_grad=True)
why = torch.randn(1, 10, requires_grad=True)
ht_1 = torch.randn(10, 1)
lr = 1e-3

for i in range(10):
    inp = torch.randn(1, 1)
    y = torch.randn(1, 1)
    ht = torch.tanh(whh.mm(ht_1) + wxt.mm(inp.float()))
    y_pred = why.mm(ht)
    loss = (y_pred - y.float()).pow(2).sum()
    loss.backward()
    with torch.no_grad():
        whh -= lr * whh.grad
        wxt -= lr * wxt.grad
        why -= lr * why.grad
        whh.grad.zero_()
        wxt.grad.zero_()
        why.grad.zero_()
    ht_1 = ht.detach()
    print(loss)
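As a side note, the manual update-and-zero steps in that loop map directly onto the standard optimizer API. A sketch of the same loop using torch.optim.SGD (my rewrite, not from the thread):

```python
import torch

whh = torch.randn(10, 10, requires_grad=True)
wxt = torch.randn(10, 1, requires_grad=True)
why = torch.randn(1, 10, requires_grad=True)
ht_1 = torch.randn(10, 1)

opt = torch.optim.SGD([whh, wxt, why], lr=1e-3)

for i in range(10):
    inp = torch.randn(1, 1)
    y = torch.randn(1, 1)
    ht = torch.tanh(whh.mm(ht_1) + wxt.mm(inp))
    y_pred = why.mm(ht)
    loss = (y_pred - y).pow(2).sum()
    opt.zero_grad()     # replaces the three .grad.zero_() calls
    loss.backward()
    opt.step()          # replaces the manual updates under torch.no_grad()
    ht_1 = ht.detach()  # still needed to cut the graph between iterations
```

Note that detach() is still required: the optimizer only handles the parameter updates, not the hidden-state history.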
Thank you very much, it worked. Should I read the docs to understand why this error occurs, or is there any other resource?
Hi, there is a weird issue. Everything was fine till now, but as soon as I restarted the kernel in Jupyter, the grad w.r.t. the variables is again NoneType. I think there is still some issue, as this error was resolved once I pasted and ran your code in a separate cell; after that I ran my loop, which I have provided above, and everything was fine. And by the way, I initialise ht_1 just like you did.
Oh, never mind.
I made a typing error. Instead of doing
y_pred = why.mm(ht)
I by mistake typed this, and it never came to my attention:
y_pred = why.mm(ht_1)
although I am still not sure why this gave me a NoneType grad.
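The NoneType grad from that typo has a plausible explanation: since ht_1 is a detached/plain tensor, y_pred = why.mm(ht_1) makes the loss independent of whh and wxt, so backward() never reaches them and PyTorch leaves their .grad as None (grad stays None until a gradient is first accumulated, rather than being zero). A toy sketch of this (my own example, not from the thread):

```python
import torch

whh = torch.randn(10, 10, requires_grad=True)
why = torch.randn(1, 10, requires_grad=True)
ht_1 = torch.randn(10, 1)  # plain tensor, like the detached hidden state

# Using ht_1 directly skips whh entirely, like the y_pred = why.mm(ht_1) typo.
loss = why.mm(ht_1).pow(2).sum()
loss.backward()

print(why.grad is None)  # False: why participated in the loss
print(whh.grad is None)  # True: whh never entered the graph, so .grad stays None
```

Then lr * whh.grad multiplies a float by None, which is exactly the TypeError above.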