I am logging the norm of the gradient of my loss with respect to one of the inputs, `Z` (the latent random vector in a GAN-like architecture). To this end I create the tensor `Z` with `requires_grad=True`, and after the backward pass I accumulate the norm of `Z.grad` for each batch in each epoch. This works fine for about 70 epochs, but then `Z.grad` starts coming back as `None` for some batches, and the number of such batches grows as training continues: when the problem first appears it affects 1 batch out of 50, and by epoch 200 it is around 25 batches out of 50.
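For context, here is a minimal sketch of the kind of loop I am running; the generator, data, and loss below are toy stand-ins for my actual architecture:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real GAN-like model and data pipeline.
latent_dim, batch_size, num_epochs = 16, 8, 5
generator = nn.Linear(latent_dim, 4)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)
data = [torch.randn(batch_size, 4) for _ in range(50)]  # 50 batches

for epoch in range(num_epochs):
    epoch_grad_norm = 0.0
    for target in data:
        # Z is a leaf tensor created with requires_grad=True, so the
        # backward pass should populate Z.grad.
        Z = torch.randn(batch_size, latent_dim, requires_grad=True)
        loss = criterion(generator(Z), target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # This is where Z.grad comes back as None for a growing number
        # of batches after ~70 epochs.
        if Z.grad is not None:
            epoch_grad_norm += Z.grad.norm().item()
    print(f"epoch {epoch}: accumulated grad norm = {epoch_grad_norm:.4f}")
```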
I read these related topics:
- After `loss.backward(requires_grad=True)`, no gradient can be found on some variables
- Grad is `None` even when `requires_grad=True`
- `None` type return while trying to find gradient of model's parameters
but my problem seems different.
I am using Python 3.9.12, PyTorch 2.0.1 (for CUDA 11.8), and Ubuntu.
Any ideas?