Hey,
Here is the standard approach:
optimizer = optim.Adam(NN.parameters(), lr=1e-3)
for i in range(train_steps):
    optimizer.zero_grad()
    loss = get_loss_fn()
    loss.backward()
    optimizer.step()
However, when I tried to set up the gradients manually:
optimizer = optim.Adam(NN.parameters(), lr=1e-3)
for i in range(train_steps):
    loss = get_loss_fn()
    grads = torch.autograd.grad(loss, NN.parameters())
    for p, g in zip(NN.parameters(), grads):
        p.grad.fill_(g)  # fails here: p.grad is None
    optimizer.step()
Can someone explain why the grad is None there and what the problem with the second approach is?
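
For reference, here is a minimal self-contained version of the second loop that reproduces the error for me. The toy linear model, random data, and MSE loss are just stand-ins for my actual NN and get_loss_fn:

import torch
import torch.nn as nn
import torch.optim as optim

NN = nn.Linear(4, 1)                      # stand-in for my real model
optimizer = optim.Adam(NN.parameters(), lr=1e-3)

x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss = nn.functional.mse_loss(NN(x), y)   # stand-in for get_loss_fn()

grads = torch.autograd.grad(loss, NN.parameters())
for p, g in zip(NN.parameters(), grads):
    print(p.grad)                          # prints None before the failing line
    p.grad.fill_(g)                        # AttributeError: 'NoneType' object has no attribute 'fill_'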