- Generally speaking, is it normal to end up with `inf`/`nan` problems in a network even when the inputs do not contain them, or does that always indicate a bug somewhere in our code?
- If it is normal, how are we expected to handle it? Are we expected to add clipping or clamping of values somewhere?
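To make that second question concrete, here is the kind of thing I have in mind (a minimal sketch on a toy model, not code from my actual network): clamping activations in the forward pass, plus clipping gradient norms before the optimizer step.

```python
import torch
import torch.nn as nn

# Toy model, stand-in for the real network.
model = nn.Linear(8, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 8)
# Bound activations to a finite range before they can overflow?
out = torch.clamp(model(x), min=-1e4, max=1e4)
loss = out.pow(2).mean()
loss.backward()
# ...and/or rescale gradients so their total norm is at most 1.0
# just before the optimizer step?
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

Is either of these what you meant, or is there a more standard place to do it?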
- The hook you provided prints out the gradients, but not the scale factors or the unscaled values at each layer (the loss, in this case). Is there another hook I can add to do that?
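For reference, what I'm after is something like this (hypothetical sketch; the toy model is made up): a forward hook that records each layer's output range and whether it is still finite, to sit alongside the gradient hook you gave me.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
stats = {}

def make_forward_hook(name):
    def hook(module, inputs, output):
        # Record finiteness and the value range of this layer's output.
        stats[name] = (torch.isfinite(output).all().item(),
                       output.min().item(), output.max().item())
    return hook

# Attach one hook per child module of the toy model.
for name, module in model.named_children():
    module.register_forward_hook(make_forward_hook(name))

model(torch.randn(4, 8))
for name, (finite, lo, hi) in stats.items():
    print(f"{name}: finite={finite} range=[{lo:.3g}, {hi:.3g}]")
```

Would this give me the unscaled per-layer values, or does AMP change what the hook sees?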
- The unscaled loss is around 100, which seems quite reasonable/low. How do I find out why it is being scaled up to such a large value in the first place?
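My current understanding (please correct me if this is wrong) is that `GradScaler` multiplies the loss by its current scale before `backward()`, and the default initial scale is 65536 (2**16), which would already turn a loss of ~100 into ~6.5e6. This is how I've been inspecting it:

```python
import torch

# GradScaler multiplies the loss by scaler.get_scale() before backward.
# Scaling is only enabled when CUDA is available; when disabled,
# get_scale() reports 1.0 and scale() is a no-op.
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
device = "cuda" if torch.cuda.is_available() else "cpu"

loss = torch.tensor(100.0, device=device)
scaled = scaler.scale(loss)
print("scale:", scaler.get_scale(), "scaled loss:", scaled.item())
```

Is the large value I'm seeing just this scale factor at work, or is something else growing it?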
- I’m not sure whether PyTorch Lightning uses AMP in the backward pass. How do I find out?
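On that last point, one runtime check I thought of (a sketch on CPU with bfloat16; I'm assuming the same idea carries over to CUDA float16): an autocast-eligible op like a matmul should produce low-precision outputs only while autocast is active, so printing dtypes from inside the model's `forward` should reveal whether Lightning wrapped it in autocast. My understanding is that the backward pass itself is not run under autocast; gradients just come back in dtypes matching the forward ops. Is there a more direct way, e.g. via the Trainer's `precision` setting?

```python
import torch

# Inside autocast, an eligible op such as mm produces the low-precision
# dtype; outside autocast, the same op stays in float32.
a, b = torch.randn(2, 4), torch.randn(4, 4)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    inside = torch.mm(a, b)
outside = torch.mm(a, b)
print("inside autocast:", inside.dtype, "| outside:", outside.dtype)
```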
Thanks.