I am using a neural network to fit data to a polynomial. To do this I use 20 powers of x (i.e. the constant term through x^19) paired with coefficients, and the coefficients are the output of my model. Since the model is retrained for every new dataset, the input to the network doesn't really matter: the model should simply drift towards the proper coefficients through backpropagation of the difference between the actual data and the polynomial. Until now I had been feeding in random numbers as the input, but I realized this won't teach the network anything, so I decided to just input a constant 0. However, now my network doesn't learn at all: the loss doesn't change, and when I print out the predicted coefficients they stay the same.
Here is my code: (full script on Pastebin; it starts with the matplotlib, numpy, random, and time imports)
Based on your code it seems you are detaching the model output, so the backward call will not calculate any gradients of the loss w.r.t. the used parameters.
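A minimal standalone illustration of that failure mode (a toy nn.Linear stands in for your model, and a dummy target plays the role of your data):

import torch

lin = torch.nn.Linear(1, 3)                    # stand-in for the real model
target = torch.zeros(3, requires_grad=True)    # plays the role of the data

out = lin(torch.ones(1))
detached = out.detach()                        # e.g. via .detach(), .item(), or numpy()
loss = ((detached - target) ** 2).mean()
loss.backward()                                # runs (target requires grad) ...
print(lin.weight.grad)                         # ... but prints None: no path back to lin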
I noticed this and updated my code. Here is the main training loop:
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = 0
acutal = torch.from_numpy(f[1])
acutal.requires_grad = True
print(acutal)
model.train()
while batch < 10000:
    coeffs = model(torch.ones(1))
    some = [i.item() for i in coeffs.clone()]
    p = np.polynomial.Polynomial(some)
    x, y = p.linspace(100, [-10, 10])
    y = np.clip(y, -1000, 1000)
    y = torch.from_numpy(y)
    loss = loss_fn(y, acutal)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if batch % 1000 == 0:
        print(some)
        print(loss)
        x, y = p.linspace(100, [-10, 10])
        line2.set_ydata(y)
        fig.canvas.draw()
        fig.canvas.flush_events()
        for name, param in model.named_parameters():
            if param.grad is not None:
                print(f'Parameter: {name}, Gradient norm: {param.grad.norm()}')
            else:
                print(f'Parameter: {name}, Gradient: None')
    batch += 1
When I print out the gradients I get None for every parameter. Am I properly using the data without detaching it?
No, since you are calling 3rd-party libraries which are not tracked by Autograd. You would either need to use pure PyTorch functions or implement a custom autograd.Function including its backward method.
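E.g. the polynomial evaluation itself can be expressed in PyTorch so the loss stays connected to the model parameters. A rough sketch along the lines of your loop (xs, powers, and y_pred are my names; model, loss_fn, optimizer, and acutal are from your snippet):

xs = torch.linspace(-10, 10, 100)                          # sample points, as in p.linspace
powers = torch.stack([xs ** k for k in range(20)], dim=1)  # (100, 20) power matrix
# (torch.vander(xs, 20, increasing=True) builds the same matrix)

coeffs = model(torch.ones(1)).squeeze()                    # (20,) and still in the graph
y_pred = powers @ coeffs                                   # differentiable polynomial values
y_pred = torch.clamp(y_pred, -1000., 1000.)                # analogue of np.clip

loss = loss_fn(y_pred, acutal.float())
optimizer.zero_grad()
loss.backward()                                            # now populates the parameter grads
optimizer.step()

Note that torch.clamp has a zero gradient outside the clipping range, so values that are clipped won't push the coefficients anywhere.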
So the problem is coming from
coeffs = model(torch.ones(1))
some = [i.item() for i in coeffs.clone()]
p = np.polynomial.Polynomial(some)
even though I am cloning the output? Is there a way to access or iterate over the model output during training without breaking autograd?
Not in your current code snippet, since you are transforming the tensors to plain Python scalars via the item() call. Plain Python (or 3rd-party) operations won't be tracked, and you are thus detaching the tensor from the computation graph.
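For logging or plotting you can still grab a detached copy; the important part is that the tensor which goes into the loss never passes through item() or numpy. Something like this (reusing the powers matrix from the sketch above):

coeffs = model(torch.ones(1)).squeeze()        # this tensor feeds the loss, keep it attached
loss = loss_fn(powers @ coeffs, acutal.float())

coeffs_np = coeffs.detach().cpu().numpy()      # detached copy, only for inspection
print(coeffs_np)
p = np.polynomial.Polynomial(coeffs_np)        # fine here, since it never touches the loss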