NaN or Inf Error during Training

DSL · September 4, 2021, 3:43pm

Hello all,

I am working on a depth estimation model, which I have extended with a discriminator as in https://arxiv.org/pdf/1611.07004.pdf. Now I constantly get “Warning: NaN or Inf found in input tensor” while training. Does this hurt my training, and what can I do about it? Does anyone have ideas or tips?
The generator uses a pre-trained model, while the discriminator is randomly initialized. Next, I will also initialize the weights from a normal distribution, as in the paper. If that doesn’t help, my idea is to pre-train the discriminator first and then train the combined model end to end.
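For reference, the pix2pix paper states that weights were initialized from a Gaussian with mean 0 and standard deviation 0.02. Below is a minimal PyTorch sketch of that. It assumes a small, hypothetical PatchGAN-style discriminator that takes an RGB image concatenated with a 1-channel depth map (4 input channels). The BatchNorm handling (scale drawn around 1, shift set to 0) follows a common convention in DCGAN/pix2pix-style code and is an assumption, not something from this thread:

```python
import torch.nn as nn


def init_weights_normal(module: nn.Module, std: float = 0.02) -> None:
    """Initialize conv/linear weights from N(0, std^2) and zero their biases."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        # Assumed convention: BatchNorm scale ~ N(1, std^2), shift = 0.
        nn.init.normal_(module.weight, mean=1.0, std=std)
        nn.init.zeros_(module.bias)


# Hypothetical conditional discriminator: RGB image (3 channels) concatenated
# with a 1-channel depth map gives 4 input channels. LeakyReLU slope 0.2.
discriminator = nn.Sequential(
    nn.Conv2d(4, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),
)

# Module.apply calls the function on every submodule recursively.
discriminator.apply(init_weights_normal)
```

Only the randomly initialized discriminator is passed to apply here, so the pre-trained generator weights are left untouched.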
I would be very grateful for any suggestions or ideas.

Sylvain_Ard · September 4, 2021, 4:32pm

Try decreasing the learning rate.
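One way to do this without rebuilding the optimizer is to scale the lr entry of each parameter group, which the optimizer reads on every step(). The following is a minimal sketch. It assumes Adam with the pix2pix paper's settings (lr=2e-4, betas=(0.5, 0.999)) and uses a hypothetical stand-in module for the discriminator. scale_learning_rate is an illustrative helper, not a PyTorch function:

```python
import torch
import torch.nn as nn


def scale_learning_rate(optimizer: torch.optim.Optimizer, factor: float) -> None:
    """Multiply the learning rate of every parameter group by `factor`."""
    for group in optimizer.param_groups:
        group["lr"] *= factor


# Hypothetical stand-in for the discriminator; Adam settings as in pix2pix.
discriminator = nn.Linear(16, 1)
d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Lower the discriminator's learning rate by 10x, e.g. after NaN/Inf losses show up.
scale_learning_rate(d_optimizer, 0.1)
print(d_optimizer.param_groups[0]["lr"])
```

Changing lr in place keeps Adam's running moment estimates. If you would rather lower it automatically, torch.optim.lr_scheduler.ReduceLROnPlateau reduces the learning rate when a monitored metric stops improving.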

AlphaBetaGamma96 · September 5, 2021, 2:56pm

This will probably affect training, since you don’t want NaNs or Infs anywhere in your computation. First, isolate which operation is producing the NaNs/Infs. Second, find out which inputs exactly are affected. I’d assume these are the inputs to your discriminator?
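To isolate which layer first produces non-finite values, one option is to register a forward hook on every submodule and check its output with torch.isfinite. The sketch below is minimal: find_nonfinite_modules, the Log layer and the toy model are hypothetical and exist only to demonstrate the idea. If you only care about the discriminator’s inputs, a single torch.isfinite(x).all() check before the forward pass is enough.

```python
import torch
import torch.nn as nn


def find_nonfinite_modules(model: nn.Module, *inputs: torch.Tensor) -> list:
    """Run one forward pass and return, in execution order, the names of
    submodules whose tensor output contains NaN or Inf."""
    offenders = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            # Only plain tensor outputs are checked; tuples/dicts are skipped.
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                offenders.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for handle in handles:  # always detach the hooks again
            handle.remove()
    return offenders


class Log(nn.Module):
    """Hypothetical layer that produces -inf for zero inputs."""

    def forward(self, x):
        return torch.log(x)


model = nn.Sequential(nn.Identity(), Log(), nn.Tanh())
x = torch.tensor([[0.0, 1.0]])
print(find_nonfinite_modules(model, x))  # -> ['1']
```

Only the Log layer ("1") is reported. The following Tanh maps -inf back to -1, which is why checking only the final output can hide where the problem starts.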

You can locate where the NaNs occur with torch.autograd.detect_anomaly. Its documentation is in the PyTorch 1.9.0 docs under “Automatic differentiation package - torch.autograd”.
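Here is a minimal sketch of how detect_anomaly reports the problem. It uses a deliberately broken toy computation (the square root of a negative number) rather than a real model. In anomaly mode, backward raises a RuntimeError naming the backward function that returned NaN, and PyTorch also prints the traceback of the forward call that created it. The exact function name in the message differs between PyTorch versions.

```python
import torch

x = torch.tensor(-1.0, requires_grad=True)
message = ""

# Anomaly mode records forward tracebacks and checks each backward function's
# output for NaN. It slows things down, so enable it only while debugging.
with torch.autograd.detect_anomaly():
    y = torch.sqrt(x)  # sqrt of a negative number is already NaN in the forward pass
    try:
        y.backward()
    except RuntimeError as err:
        message = str(err)

print(message)
```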

DSL · September 6, 2021, 10:07am

This command is great. In the end, though, I solved it differently by checking whether the values are infinite, and I have also fixed the underlying error: I switched the discriminator to cross-entropy with a sigmoid, and after that training worked. I assume the discriminator’s outputs were unstable at the beginning of training.
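The post doesn’t say which loss API was used. The sketch below assumes nn.BCEWithLogitsLoss, which the PyTorch docs describe as more numerically stable than a plain Sigmoid followed by BCELoss. The logits are made-up values chosen to show the effect. Applying the sigmoid and then the cross-entropy formula by hand turns the two extreme logits into inf. The fused loss stays finite and returns 200 for both.

```python
import torch
import torch.nn as nn

# Illustrative (made-up) discriminator logits; two of them are extreme.
logits = torch.tensor([-200.0, 0.0, 200.0])
targets = torch.tensor([1.0, 1.0, 0.0])  # 1 = "real", 0 = "fake"

# Naive version: sigmoid first, then the binary cross-entropy formula.
# In float32, sigmoid(-200) underflows to 0 and sigmoid(200) rounds to 1,
# so log(0) = -inf makes two of the losses inf.
probs = torch.sigmoid(logits)
naive = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs))

# Fused version: BCEWithLogitsLoss applies the sigmoid internally using the
# log-sum-exp trick, so every loss stays finite.
stable = nn.BCEWithLogitsLoss(reduction="none")(logits, targets)

print(naive)
print(stable)
```

With BCEWithLogitsLoss, the discriminator’s last layer should output raw logits, without its own sigmoid. nn.BCELoss on sigmoid outputs also avoids inf, because it clamps its log outputs to at least -100.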