Getting NaN after first iteration with custom loss

Currently your model uses the linear layers (fc1, fc2, fc3) without a non-linearity between them, so basically it’s just one single linear transformation. Is this on purpose or did you forget to add the relu or another activation function?
Might be unrelated to this issue, but might be worth a try as a first approach.


Thank you for your reply.

You are right, I did that on purpose because I am trying to mimic a paper that describes the network this way. However, I tried adding a non-linearity between the layers, but unfortunately that didn’t fix the NaN error.

While debugging the code, I noticed that the NaNs appear in the weights of the model after I call the optimizer step.

Could you check the gradients of the layers that contain NaNs after the update?
You can print them with print(model.fc1.weight.grad).

Sure, I printed the gradient after the backward() and it shows this:
tensor([[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

OK, thanks. Then we would need to see the loss function to track down these nasty NaNs.

You can see the loss function in here:
which seems pretty complicated (sorry about that).

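I also have my own norm functions, as follows:

import torch

def get_fib_norm(A):
    # Frobenius norm: sqrt(trace(A^T A))
    B = torch.sqrt(torch.trace(torch.mm(torch.transpose(A,0,1),A)))
    return B

def get_nuc_norm(A):
    # Trace of the elementwise sqrt of A^T A
    B = torch.trace(torch.sqrt(torch.mm(torch.t(A),A)))
    return B

def get_infinity_norm(A):
    # Maximum absolute row sum
    B = torch.max(torch.sum(torch.abs(A),dim=1))
    return B

def get_spec_norm(A):
    # Largest singular value
    l1, B, l2 = torch.svd(A,some=False)
    C = torch.max(B)
    return C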

As your script is quite complicated, you could try to build PyTorch from source and try out the anomaly detection, which will try to pinpoint the operation causing the NaNs.
You’ll find the build instructions here.
Let me know, if you encounter any problems.
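
For readers on a release that ships anomaly detection, here is a minimal sketch of how it reports the failing operation. The all-zero matrix and the Frobenius-style norm are illustrative assumptions, not the loss from this thread.

```python
import torch

# Illustrative input (an assumption): all zeros, so the sqrt argument is exactly 0.
A = torch.zeros(3, 3, requires_grad=True)

error_message = None
with torch.autograd.detect_anomaly():
    # Frobenius-style norm, equivalent to sqrt(trace(A^T A)); the forward value is a harmless 0.
    loss = torch.sqrt((A * A).sum())
    try:
        loss.backward()
    except RuntimeError as err:
        # Anomaly mode raises as soon as a backward function returns NaN
        # and names that function in the message.
        error_message = str(err)

print(error_message)
```

The accompanying warning also contains the traceback of the forward call that created the failing function, which is usually the line to inspect first.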

Alternatively, you could create an executable code snippet and I could try to run it on my machine.


Thank you

I will try the anomaly detection and let you know what I find. If I can’t figure out what is causing the problem, I will post executable code.

Thank you for your time and help again

I guess it is because you have zeros in your sqrt, which causes NaNs in backprop: the derivative of sqrt(x) is 1 / (2 * sqrt(x)), which is infinite at x = 0.
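
A small sketch of that effect, using toy scalars chosen for illustration (not the OP's loss):

```python
import torch

# Plain sqrt at zero: the gradient 1 / (2 * sqrt(0)) is +inf, not yet NaN.
x = torch.tensor(0.0, requires_grad=True)
torch.sqrt(x).backward()
print(x.grad)

# sqrt of a square at zero: the chain rule multiplies inf (from sqrt)
# by 0 (from the square), and inf * 0 is NaN.
z = torch.tensor(0.0, requires_grad=True)
torch.sqrt(z ** 2).backward()
print(z.grad)
```

Note that the forward value is a perfectly valid 0 in both cases, which is why the loss can look fine while the gradients are already broken.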


Thanks for pointing out anomaly detection! This was very helpful in finding where nans were coming from in my custom loss function. Protip: adding a tiny epsilon where you’re dividing or taking square roots will probably do the trick.
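
A minimal sketch of the epsilon trick, assuming non-negative arguments. The helper names `safe_sqrt` and `safe_reciprocal` and the value of `EPS` are hypothetical and not part of PyTorch:

```python
import torch

EPS = 1e-8  # assumed value; tune it to the scale of your data

def safe_sqrt(x, eps=EPS):
    # Hypothetical helper: shift the argument away from 0 so the gradient stays finite.
    return torch.sqrt(x + eps)

def safe_reciprocal(x, eps=EPS):
    # Hypothetical helper: same idea for an elementwise 1 / x with non-negative x.
    return torch.reciprocal(x + eps)

x = torch.zeros(3, requires_grad=True)
loss = safe_sqrt(x * x).sum() + safe_reciprocal(x * x).sum()
loss.backward()
print(x.grad)  # finite everywhere
```

The price is a small bias in the computed value, so the epsilon should be small relative to the typical magnitude of the argument.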


Here is a way of debugging the NaN problem.
First, print your model gradients, because that is where the NaNs are likely to show up first.
Then check the loss, and then check the inputs to your loss. Just follow the trail and you will find the bug that produces the NaNs.
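
The first step can be sketched as follows. The helper `params_with_bad_grads`, the toy model, and the deliberately broken loss are all assumptions made for illustration:

```python
import torch
import torch.nn as nn

def params_with_bad_grads(model):
    # Hypothetical helper: names of parameters whose gradient contains NaN or inf.
    return [
        name
        for name, p in model.named_parameters()
        if p.grad is not None and not torch.isfinite(p.grad).all()
    ]

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))  # toy model (assumption)
out = model(torch.randn(5, 4))

# Toy loss that is deliberately broken: sqrt of an exact zero.
loss = torch.sqrt(((out - out.detach()) ** 2).sum())
loss.backward()

print("loss finite:", torch.isfinite(loss).item())
print("bad grads:", params_with_bad_grads(model))
```

Here the loss value itself is finite while every parameter gradient is NaN, which is why checking the gradients before the loss is a sensible order.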

Here is some useful information about why the NaN problem can happen:
1. the learning rate


I was using torch.reciprocal_ (for element wise reciprocal) function at some point and I had to add a small epsilon to get rid of the ‘nan’ loss value. Thank you for the overall discussion.

Is there a way to have the anomaly detection on by default? I want to avoid inserting with autograd.detect_anomaly(): in different parts of the code.


You can add torch.autograd.set_detect_anomaly(True) at the beginning of the script to enable it globally.
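
A minimal sketch of the global switch. The failing operation is a toy example chosen for illustration:

```python
import torch

# Put this near the top of the script; it applies to every later backward pass.
torch.autograd.set_detect_anomaly(True)

x = torch.zeros(2, requires_grad=True)
caught = None
try:
    torch.sqrt(x ** 2).sum().backward()  # toy op with a NaN gradient (assumption)
except RuntimeError as err:
    caught = str(err)
print(caught)

# Anomaly mode slows training down, so switch it off once the bug is found.
torch.autograd.set_detect_anomaly(False)
```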


This solved one of my problems. Thanks!

@t.ouyang Thanks. Your advice solved my NaN problem.
My problem was caused by sqrt(0).


I am facing exactly the same issue. A similarity with the OP’s post is that we both compute a norm. As @Pengbo_Ma mentioned, the NaN vanishes when I add a small epsilon to linalg.matrix_norm(). Ideally, this should be supported as a parameter of the function and/or PyTorch should handle it internally. @ptrblck, can you please comment on what the best practice is?

I think adding a small eps value to the input matrix is the right approach as it’s explicitly guarding against invalid gradients as seen e.g. here:

import torch

x = torch.zeros(4, 4, requires_grad=True)
out = torch.linalg.matrix_norm(x)
out.backward()
print(x.grad)
# > tensor([[nan, nan, nan, nan],
#          [nan, nan, nan, nan],
#          [nan, nan, nan, nan],
#          [nan, nan, nan, nan]])

You could add e.g. 1e-6 to the input to avoid the NaN values in the gradient.
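
A minimal sketch of that guard, using the 1e-6 offset suggested above on the same all-zero input:

```python
import torch

eps = 1e-6  # offset suggested above
x = torch.zeros(4, 4, requires_grad=True)
out = torch.linalg.matrix_norm(x + eps)  # Frobenius norm by default
out.backward()
print(x.grad)  # finite values
```

The offset slightly changes the norm, so it should stay well below the scale of the matrix entries you care about.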


Dear @ptrblck ,
Is there any way to get the tensor’s value after set_detect_anomaly raises the error?
Specifically, it gives me this trace:

  File "/home/s1910442/Project/Master3/models/", line 111, in si_snr
    snr = 10 * torch.log10(target_norm / (noise_norm + eps) + eps)
 (function _print_stack)
Traceback (most recent call last):
  File "", line 128, in
  File "", line 57, in train
  File "/opt/conda/lib/python3.8/site-packages/torch/", line 256, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/", line 147, in backward
RuntimeError: Function 'Log10Backward' returned nan values in its 0th output.

What I want to ask is how can I get the values of target_norm and noise_norm? eps here is just 1e-9

You can directly print these tensors in the forward pass to get their values for debugging.
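
One way to do that is a small summary helper called right before the suspicious line. This is only a sketch: `describe` is a hypothetical helper, and the tensors below are made-up stand-ins for target_norm and noise_norm, not values from the real si_snr function.

```python
import torch

def describe(name, t):
    # Hypothetical debugging helper: print the range of a tensor and whether
    # it contains NaN or inf values.
    t = t.detach()
    summary = (
        f"{name}: min={t.min().item():.3e} max={t.max().item():.3e} "
        f"nan={torch.isnan(t).any().item()} inf={torch.isinf(t).any().item()}"
    )
    print(summary)
    return summary

eps = 1e-9
target_norm = torch.tensor([1.0, 0.0, 2.0])  # stand-in values (assumption)
noise_norm = torch.tensor([0.5, 0.0, 0.0])   # stand-in values (assumption)

describe("target_norm", target_norm)
describe("noise_norm", noise_norm)
snr = 10 * torch.log10(target_norm / (noise_norm + eps) + eps)
snr_summary = describe("snr", snr)
```

Printing the inputs of the flagged line shows whether they are already NaN, negative, or extremely small before log10 is applied.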