Here is a snippet:
import torch
import torch.nn as nn
from torch.autograd import Variable

dtype = torch.FloatTensor
x = Variable(torch.randn(1, 25).type(dtype), requires_grad=True)
t = Variable(torch.randn(1, 25).type(dtype), requires_grad=False)
criterion = nn.MSELoss()
loss = criterion(x, t)
optimizer = torch.optim.Adam([x])
for i in range(5):
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
This seems simple enough that we could even compute the result by hand. However, the code doesn't work; it throws an error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-cace42d6c54e> in <module>()
13 for i in range(5):
14 optimizer.zero_grad()
---> 15 loss.backward()
16 optimizer.step()
RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.
Following that suggestion, I modified the code above as follows:
import torch
import torch.nn as nn
from torch.autograd import Variable

dtype = torch.FloatTensor
x = Variable(torch.randn(1, 25).type(dtype), requires_grad=True)
t = Variable(torch.randn(1, 25).type(dtype), requires_grad=False)
criterion = nn.MSELoss()
loss = criterion(x, t)
optimizer = torch.optim.Adam([x])
for i in range(5):
    optimizer.zero_grad()
    loss.backward(retain_variables=(i == 0))
    optimizer.step()
The same error is thrown again.
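One variant I can get to run without the error is rebuilding the loss inside the loop, so that a fresh graph is constructed on every iteration — though I am not sure whether this is the intended usage:

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

dtype = torch.FloatTensor
x = Variable(torch.randn(1, 25).type(dtype), requires_grad=True)
t = Variable(torch.randn(1, 25).type(dtype), requires_grad=False)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam([x])
for i in range(5):
    optimizer.zero_grad()
    # Recompute the loss each iteration so backward() has a fresh graph
    loss = criterion(x, t)
    loss.backward()
    optimizer.step()
```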
However, if we write an equivalent NumPy snippet:
import numpy as np
N = 5
x = np.random.randn(N)
y = np.random.randn(N)
learning_rate = 1e-2
for t in range(500):
    loss = np.square(x - y).sum()
    print(t, loss)
    # Back-propagate
    grad_y = 2.0 * (y - x)
    # Update
    y -= learning_rate * grad_y
We can see that it converges within a few iterations.
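To make the convergence claim concrete, here is the same loop with a fixed seed and a final check appended (the seed and the tolerance are my own additions, not part of the original snippet):

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible (my addition)
N = 5
x = np.random.randn(N)
y = np.random.randn(N)
learning_rate = 1e-2

for t in range(500):
    loss = np.square(x - y).sum()
    # Gradient of sum((x - y)^2) with respect to y is 2 * (y - x)
    grad_y = 2.0 * (y - x)
    y -= learning_rate * grad_y

# Each component of (y - x) shrinks by a factor (1 - 2 * lr) per step,
# so after 500 steps y is essentially equal to x
print(loss)
```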
Is this a bug? Or is there something wrong with my code?
Thanks.