I am logging the norm of the gradient of my loss with respect to one of the inputs, `Z` (the latent random vector in a GAN-like architecture). To this end I create the tensor `Z` with `requires_grad=True`, and after the backward pass I accumulate the norm of `Z.grad` for each batch in each epoch. This works fine for about 70 epochs, but then `Z.grad` starts coming back as `None` for some batches, and the number of such batches grows as training continues: when the problem first appears it affects 1 batch out of 50, and by epoch 200 it is around 25 batches out of 50.
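For context, here is a minimal sketch of the kind of loop I am running; the generator, data, and loss below are toy stand-ins for my actual architecture:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real GAN-like model and data pipeline.
latent_dim, batch_size, num_epochs = 16, 8, 5
generator = nn.Linear(latent_dim, 4)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)
data = [torch.randn(batch_size, 4) for _ in range(50)]  # 50 batches

for epoch in range(num_epochs):
    epoch_grad_norm = 0.0
    for target in data:
        # Z is a leaf tensor created with requires_grad=True, so the
        # backward pass should populate Z.grad.
        Z = torch.randn(batch_size, latent_dim, requires_grad=True)
        loss = criterion(generator(Z), target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # This is where Z.grad comes back as None for a growing number
        # of batches after ~70 epochs.
        if Z.grad is not None:
            epoch_grad_norm += Z.grad.norm().item()
    print(f"epoch {epoch}: accumulated grad norm = {epoch_grad_norm:.4f}")
```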
I read these related topics:
- After `loss.backward(requires_grad=True)`, no gradient can be found on some variables
- Grad is `None` even when `requires_grad=True`
- `None` type return while trying to find gradient of model's parameters
but my problem seems different.
I am using Python 3.9.12, PyTorch 2.0.1 (for CUDA 11.8), and Ubuntu.
Any ideas?