Gradient of Standard Deviation is nan

I encountered a strange behaviour that I don't really understand. When I want to normalize a Variable, I run into a division by zero during the backward pass when the standard deviation is zero.

For example:

import torch
from torch.autograd import Variable

a = Variable(torch.ones(1), requires_grad=True)
b = (a - a.mean()) / (a.std() + 1e-4)
b.backward()
print(a.grad)  # gives nan

I tracked the problem down, and it even occurs if I just divide by the std after adding a fairly large epsilon:

a = Variable(torch.ones(1), requires_grad=True)
b = 1/(a.std() + 1)
b.backward()
print(a.grad) # gives nan

But the epsilon should prevent the division by zero in the derivative, right?
So what is happening here? And how can I normalize without running into this error?

Well, sqrt(x)'s derivative at x=0 is undefined.
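Here is a minimal sketch of that point, using the current tensor API instead of Variable (the tensor sizes are just chosen for illustration, and the printed values are what recent PyTorch versions give):

import torch

# The derivative of sqrt(x) is 1/(2*sqrt(x)), which blows up as x -> 0.
x = torch.zeros(1, requires_grad=True)
torch.sqrt(x).backward()
print(x.grad)  # tensor([inf])

# std() is sqrt(var()); in the backward pass the inf from sqrt at 0 gets
# multiplied by the zero coming from (a - mean), and inf * 0 = nan.
a = torch.ones(5, requires_grad=True)
(a.std() + 1e-4).backward()
print(a.grad)  # nan for every element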

Thank you, of course. I totally forgot about the sqrt(x).

Sorry, could you share a link where I can read about that? I thought we could take the derivative of the square root.

the left limit doesn’t exist…


sqrt(x) is not defined for negative values (at least for real numbers), and evaluating it there results in nan. At x = 0 the value itself is defined, but the derivative 1/(2*sqrt(x)) is unbounded there, so the derivative only exists for x > 0.


Hi, how did you get rid of this issue? I need to use std; is there any solution to avoid nan in the backward pass?
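One common workaround (a sketch, not an official PyTorch recipe) is to put the epsilon inside the square root, i.e. compute the standard deviation from the variance yourself as sqrt(var + eps). The example below assumes an input with more than one element so that var() is well defined:

import torch

# A constant tensor: its variance is exactly zero, the case that produces nan.
x = torch.ones(5, requires_grad=True)

eps = 1e-6
# sqrt(var + eps) is differentiable even when var == 0,
# whereas sqrt(var) + eps still takes the derivative of sqrt at 0.
std = torch.sqrt(x.var() + eps)
y = ((x - x.mean()) / std).sum()
y.backward()
print(x.grad)  # tensor([0., 0., 0., 0., 0.]) -- finite, no nan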