I do not want to use torch's default loss.backward() for gradient computation. Instead, I am computing the gradients manually from the loss (via torch.autograd.grad), but my gradients become zero after a few steps. The same code works if I use loss.backward(). Is there any hidden transformation that torch applies to the gradients under the hood (such as clipping the gradients to the range [-1, 1], detaching them, etc.)?
I am taking the gradients like this:
first_gradient = torch.autograd.grad(HSNR, NN.parameters(), retain_graph=True)
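For reference, here is a minimal sketch of how the gradient computation looks in isolation; the Linear model and squared-error loss below are just stand-ins for my actual NN and HSNR:

import torch
import torch.nn as nn

NN = nn.Linear(4, 1)                     # stand-in for my actual network
x = torch.randn(8, 4)
target = torch.randn(8, 1)

HSNR = ((NN(x) - target) ** 2).mean()    # stand-in for my HSNR loss

# Returns a tuple of gradient tensors, one per parameter, in the same
# order as NN.parameters(); nothing is written into param.grad here.
first_gradient = torch.autograd.grad(HSNR, NN.parameters(), retain_graph=True)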
Here is how I am updating the weights:
with torch.no_grad():
    for param, newgrad in zip(NN.parameters(), final_gradient):
        param.grad = newgrad
optimizer.step()
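Putting it together, this is roughly the loop I run (again with a toy model/loss and an assumed SGD optimizer; in my real code first_gradient is transformed into final_gradient before the update):

import torch
import torch.nn as nn

NN = nn.Linear(4, 1)                                   # stand-in for my actual network
optimizer = torch.optim.SGD(NN.parameters(), lr=0.01)  # assumed optimizer
x = torch.randn(8, 4)
target = torch.randn(8, 1)

for step in range(10):
    optimizer.zero_grad()
    HSNR = ((NN(x) - target) ** 2).mean()              # stand-in for my HSNR loss

    # Manual gradient computation instead of calling HSNR.backward()
    first_gradient = torch.autograd.grad(HSNR, NN.parameters(), retain_graph=True)
    final_gradient = first_gradient                    # real code applies a transformation here

    # Write the manually computed gradients into .grad, then step the optimizer
    with torch.no_grad():
        for param, newgrad in zip(NN.parameters(), final_gradient):
            param.grad = newgrad
    optimizer.step()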